LINGUIST List 14.1980

Mon Jul 21 2003

Calls: General Ling/USA; Computational Ling/France

Editor for this issue: Marie Klopfenstein <marie@linguistlist.org>

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.
To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

International Conference on Language in the Era of Globalization
Date: 02-OCT-03 - 04-OCT-03
Location: New York, NY, United States of America
Contact: Wayne Finke
Contact Email: wayne_finke@baruch.cuny.edu
Linguistic Sub-field: General Linguistics
Call Deadline: 01-AUG-03
Meeting Description:
THE AMERICAN SOCIETY OF GEOLINGUISTICS
Founded by Dr. Mario A. Pei in 1965
Announcing the International Conference on
LANGUAGE IN THE ERA OF GLOBALIZATION
Response to the call for papers has been especially good and,
as the announced deadline of 15 July arrives, we can report that we
have accepted proposals for presentations at this international
conference by scholars from Australia, Belgium, Cameroon, Cuba, India,
Iran, Israel, Japan, Singapore, Thailand, Taiwan, the United Kingdom
and other countries, including, of course, many from the United
States. The keynote speaker is from Denmark.
We are extending the call for papers until 1 August 2003
because so many proposals have arrived just before the deadline that
we believe others will come soon after. If you wish to participate in
this conference and have not yet responded to the call for papers,
please send your proposal of 100-200 words immediately.
Due to the generosity of The City University of New York, we
are happy to announce that the registration fee is only US $60 (US $40
for full-time students and retirees). Note that this includes daily
coffee breaks, one gala luncheon, and a copy of the proceedings when
printed. (Proceedings will be sent by surface mail; if overseas
airmail is desired, add US $10.) The proceedings of the previous
conference (Language and Identity, 2002) are just about ready to be
mailed. They include more than 40 papers.
The Hotel Madison, located a mere two blocks from the
conference site of Baruch College (CUNY), has agreed to a special low
rate for conference participants: $65 single, $75 double, including
taxes. E-mail: madihotel@aol.com
Please bring this conference to the attention of your friends
and colleagues who might be interested in attending and participating.
If you have any questions, contact Prof. Wayne H. Finke (e-mail:
wayne_finke@baruch.cuny.edu, or by mail at Prof. Wayne H. Finke, Modern
Languages, B6-280, Baruch College, 1 Bernard Baruch Way, New York, NY
10010-5585).

Role of Typography and Punctuation in Natural Language Processing
Short Title: Punctuation and NLP
Date: 22-Nov-2003 - 22-Nov-2003
Location: Paris, France
Contact: Ghassan Mourad
Contact Email: Ghassan.Mourad@paris4.sorbonne.fr
Meeting URL: http://www.lalic.paris4.sorbonne.fr/
Linguistic Sub-field: Writing Systems, Text/Corpus Linguistics,
Syntax, Semantics, General Linguistics, Computational Linguistics
Call Deadline: 30-Sep-2003
Meeting Description:
Objective:
Even though punctuation and typography are rarely treated as subjects
worth teaching, their role in reading and writing can hardly be
denied. The same holds for natural language processing, where
punctuation plays an important role.
Typographical and punctuation marks are "natural tags" of information
and indicators on which most of the processing should rely. It is
essential to inventory and study all the issues they raise in
multilingual, multi-script and multi-encoding processing.
The ATALA (Association pour le Traitement Automatique des Langues)
workshop is particularly concerned with current research on
punctuation, typography, coding and transcription issues in
linguistics and language processing, and with existing work in this
specific domain or directly related to it.
Issues:
Linguistic engineering and language processing are confronted with new
issues. It is now necessary to work not only on isolated sentences or
utterances, but also on entire structured or unstructured texts: for
example, texts from the Internet or from document databases kept by
companies or administrations, encyclopaedias, or even dictionary
articles.
Moreover, texts are rarely tagged or digitised. Yet text processing
requires pre-processing before syntactic, semantic and pragmatic
analysis can be carried out. In particular, each text has two
structures: formal and discursive. The latter depends on the
former. The formal structure expresses a certain intended meaning; it
results from coding in a typographical system and from "text-setting",
or text layout.
The pre-processing of a text must exploit its formal structure
(localisation of titles and subtitles; segmentation of the text into
sentences, paragraphs, utterances, propositions and words;
identification of quotations and item lists; consideration of spatial
layout; localisation of images, diagrams, captions, boxes...) before
any other task, in particular before exploiting the discursive
structure (identification of temporal, spatial, topic and event
frames; relations between concepts, terms and events; anaphoric links;
enunciative phenomena...).
Without complete control over the exploitation of the formal
structure, text processing cannot be truly operational. This issue
naturally did not arise when we worked only on isolated sentences.
For semantic analysis, however, text must be segmented into linguistic
units that are larger or smaller than the normative sentence, by
taking into account semiotic marks that the computer can identify
clearly and formally. Punctuation and all typographic signs, taken as
indices, remain the most relevant elements, since they can provide
sharp indications for formal text segmentation and structuring; these
indications are the foundation of automatic textual linguistics.
We can distinguish three types of approach to segmentation:
(a) Numerical approaches (neural networks, N-grams, Markov models...);
(b) Approaches based on finite-state automata and regular expressions
(for instance, INTEX);
(c) Contextual exploration approaches based on punctuation marks (for
instance, SegATex).
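As a rough illustration of approach (b), the following minimal Python sketch splits text at sentence-final punctuation (. ! ?) followed by whitespace and a capital letter, then repairs splits made after known abbreviations. The regular expression, the abbreviation list and the function name `segment` are our own assumptions for illustration; this is not how INTEX or SegATex actually work.

```python
import re

# Hypothetical boundary rule: a sentence-final mark (. ! ?), then
# whitespace, then an uppercase letter. Only the whitespace is consumed.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Hypothetical exception list: abbreviations whose final period
# should not be read as the end of a sentence.
ABBREVIATIONS = {"Dr.", "Prof.", "Ch.", "e.g.", "i.e."}


def segment(text: str) -> list[str]:
    """Split text into sentence-like segments using punctuation alone."""
    text = text.strip()
    if not text:
        return []
    segments: list[str] = []
    for piece in SENTENCE_BOUNDARY.split(text):
        # If the previous segment ends with a known abbreviation, the
        # split was spurious: glue this piece back onto it.
        if segments and segments[-1].split()[-1] in ABBREVIATIONS:
            segments[-1] += " " + piece
        else:
            segments.append(piece)
    return segments


print(segment("Quoting Ch. Gouriou is useful. Punctuation matters! Does it?"))
```

Even this toy rule must special-case the period in "Ch.", which shows how a single punctuation mark can carry more than one function.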
Traditional theories of punctuation (treatises, handbooks) are
generally normative and do not allow the expression of precise rules
that could lead to automatic segmentation. Furthermore, these
treatises did not consider the semantic analysis of highly polysemous
marks such as the comma, semicolon, colon, dash, parentheses, ...
However, these marks play a very important role in semantic
structuring; analysing them can improve both the segmentation process
and the discursive structuring of text.
Text processing tools offer enormous potential for typographic
variation, for example to highlight a quoted term, to give an
example, or to disambiguate an expression. As Ch. Gouriou puts it:
"For every problem raised by the transcription of thought, typography
must provide at least one solution; it offers several as soon as it is
called upon to bring out nuances or subtleties." However, the
interpretation to be given to these variations is not regular and
depends on other contextual (typographic and punctuation) elements;
for example, an italicized expression does not have the same value
(meaning) depending on whether it is capitalized or placed between
quotation marks. Indeed, it is a combination of typographic marks,
varying from text to text, that determines the value of a given
typographic change. Text processing must resolve these linguistic and
computational issues.
Theme:
Submissions may also address cross-domain topics related to:
- Formal segmentation of text,
- Discursive segmentation of text based on punctuation and typography
marks,
- "Textual architecture",
- The role of punctuation (particularly the comma) in syntactic
analysis,
- The contribution of punctuation to the coding of prosody, and of
typography to the coding of intonation,
- The contribution of punctuation to the identification of proper
names, compound words, abbreviations, initials, ...
- Comparison of punctuation across linguistic systems (Arabic,
Chinese...),
- Coding and transcription issues in various linguistic systems,
- ...
Modalities:
Submission: a 2-4 page summary.
We ask authors to indicate whether their submission:
- presents work in progress or is a position paper;
- presents completed theoretical or applied work.
The 2-4 page summary must be sent by e-mail before 30 September 2003,
in plain text, .rtf, .doc or .pdf format, to:
Ghassan.Mourad@paris4.sorbonne.fr
and
Jean-Pierre.Descles@paris4.sorbonne.fr
Acceptance notifications will be sent by 20 October 2003.