
The goal of the workshop is to provide a forum for researchers to share ongoing research on spatial language processing, with the aim of moving towards a set of community standards. We invite submissions of papers and demonstrations related to the development or evaluation of resources, tools, and frameworks for understanding and generating spatial expressions in natural language.

This workshop will be held at the Sixth International Conference on Language Resources and Evaluation, LREC 2008 (http://www.lrec-conf.org/lrec2008/), in Marrakech, Morocco, on 31 May 2008. The main conference will be held on 28-30 May 2008.

Rationale

The time is ripe for the development and standardization of computational resources for processing spatial language. The ubiquitous use of digital geographic resources (e.g., MapQuest and Google Earth) has resulted in a surge of practical interest in location-based services; spoken-language interfaces for navigation systems are becoming widespread; publishing geographically relevant information in Google Earth's Keyhole Markup Language (KML) and other formats is now common; and several commercial products for geocoding text in different languages are now available and have a growing user base. However, many of the technologies and resources used are proprietary and task-specific.

There is a need for versatile and comprehensive methodologies for mapping natural language expressions that describe locations, orientations, and paths to the geospatial entities they refer to, and for encoding the spatial relationships among the entities described. This workshop aims to address this need and to focus research on the development of standardized resources that support the understanding and generation of spatial language on a large scale. These resources include spatial annotation schemes, spatial ontologies, and systems for spatial reasoning, and could be used in applications such as information retrieval, visualization, and data mining. In addition, research into spatial language processing may be informed by results from psycholinguistics, particularly on the acquisition and processing of spatial language, as well as by theoretical perspectives such as those offered by cognitive linguistics, artificial intelligence, and usage-based approaches.

Submission Format

Technical papers should be no more than 8 pages long and should follow the style for submissions to the main LREC conference.

We also invite submissions of short papers (3 pages long) describing demonstrations, in the same LREC style. Demo papers must include a concise abstract describing what the demo is intended to convey, and should also include screenshots. As the computing facilities in the workshop room are limited, demonstrations are possible only if no additional facilities are needed. Please contact tenbrink@uni-bremen.de for details.

The Fourth Workshop on Collecting and Processing Linguistic Data from the Web

Submission deadline: 29 February 2008

Description

Commercial Web search engines offer fast search over huge amounts of text, combined with increasingly clever ranking and data analysis algorithms, but their content-centric services do not cater to the needs of the computational linguistics and natural language processing (NLP) communities. The leading theme of this workshop, the fourth in a series of successful Web as Corpus meetings, is how to combine the power and scalability of modern search engine technology with sophisticated linguistic annotation and query processing.

We invite papers on various topics concerning the use of Web resources for corpus research and NLP applications, including (but not limited to) the following:

- linguistic Web crawler technology and Web corpus collection projects
- applications of Web-derived corpora and other kinds of Web data
- how far does the "easy way" get you? (using search engines, or Google's n-gram lists; we are particularly interested in a critical discussion of the usefulness and limitations of such approaches)
- methods and tools for "cleaning" Web pages to turn them into a corpus (contributors to this topic will be encouraged to participate in the second CLEANEVAL competition to be held in 2009)
- automatic linguistic annotation of Web data: tokenisation, POS tagging, lemmatisation, semantic tagging, etc. (established tools often perform very poorly on Web data)
- search engine architectures for linguists: bringing linguistics to commercial search engines, or high-performance search technology to linguistics?
- search engine-related topics such as result ranking (e.g. how to identify "typical" uses rather than returning 50 very similar matches on the first page)
- duplicate detection, interactive query refinement, etc.
- reviews and clever uses of search engine APIs (Google, Yahoo, AltaVista, and in particular Microsoft's current generous Live Search API)

This workshop is endorsed by the Special Interest Group on the Web as Corpus (SIGWAC) of the Association for Computational Linguistics (ACL).

Submission Information

Authors are invited to submit full papers on original, unpublished work in the topic area of this workshop. Submissions should follow the format of the LREC proceedings and should not exceed eight (8) pages, including references. We strongly recommend using the LREC LaTeX or Microsoft Word style files tailored for this year's conference. Details on the submission procedure will be posted on the conference website shortly.