Language Engineering Resources Questionnaire: Indigenous Minority Languages of the British Isles and Ireland

This questionnaire has been taken off-line.

The Department of Linguistics at Lancaster University is currently
undertaking an EPSRC-sponsored project investigating the needs of the language
engineering community with regard to corpus building
in the indigenous minority languages of the British Isles and Ireland (LER-BIML), namely
Cornish, Scottish Gaelic, Irish, Manx, Scots, Ulster Scots (Ullans) and Welsh.
This web questionnaire has been designed to assess those needs. Answers will be
anonymised and will contribute to a report on the needs of the
language engineering community. If you would like a copy of this
report, please mark this box.

Even if you do not work directly with these languages at present, you may do so
at some point in the future, and it would be helpful if you could complete the
questionnaire with this in mind.

2a From the list below please indicate those languages for which you would like to see
(more) corpus resources available:

Cornish

Scottish Gaelic

Irish

Manx

Scots

Ulster Scots

Welsh

2b For which other languages would you like to see corpus resources
available?

3a Which of the following corpus types would you be most
interested in building for these languages?

Monolingual

Bilingual

Multilingual

All

Any

3bi If you answered "Bilingual", "Multilingual", "Any" or "All"
to the question above, which language(s) would you like your corpora
to contain (e.g. English/Welsh)?

3bii ...and which of the following would you prefer a
bilingual or multilingual corpus to contain?

Word-aligned translations of the same texts in each language

Sentence-aligned translations of the same texts in each language

Unaligned translations of the same texts in each language

Different texts from equivalent genres in each language

Different texts from different genres

4 Would you prefer to see mostly written or mostly spoken corpora
built for these languages?

Both about equally

Written

Spoken

Both, but with an emphasis on written

Both, but with an emphasis on spoken

5a Would you prefer balanced corpora of these languages,
or corpora that focus on specific genres?

Balanced

Focused

Either

5b What genres would you like to see represented in such
a corpus? (Mark as many as apply)

News

Legal

Health

Fiction

Letters/Diaries

Leisure

Commerce

Government

Scientific/Academic

Historical

Children

Manuals

5c Any other type of genre?

6a How would you like the data to be
linguistically annotated? (Mark as many as apply)

Part-of-speech

Parsed

Phonemic

Prosodic

Semantic

Just plain text

6b Any other type of linguistic annotation?

7 By which media would you prefer to receive corpus data?
(Mark as many as apply)

Diskette

CD

FTP

DAT tape

Internet

8 Would you be interested in seeing any of the following textual
mark-up in the corpus?

TEI-Lite

TEI

SGML

XML

HTML

CHILDES/LIDES

9 For each feature below, please mark how important it
would be for your corpus. To save you time, the default option is
"No Opinion".

Preferred status (choose one per feature): Essential / If Possible / No Opinion / Not wanted

Feature                                  Example

Header elements
  Creator(s) of corpus                   creator=
  Date created/updated                   date.created=
  Author                                 <author>
  Extent: words/bytes                    <wordCount>
  Source of data                         <sourceDesc>
  Project description                    <projectDesc>
  Sampling description                   <samplingDecl>
  Editorial description                  <editorialDecl>
  Revision description                   <revisionDesc>
  Language usage                         <langUsage>

Primary data
  Top-level structure                    <cesCorpus>
  Text body                              <body>
  Text divisions                         <div>
  Head elements                          <opener> <head>
  Closer elements                        <closer> <byline>

Paragraph-level elements
  Paragraph                              <p>
  Quote                                  <quote> <q>
  Poem                                   <poem>
  Figure                                 <figure>
  Note (footnote)                        <note>
  Table                                  <table>
  List                                   <list>
  Foreign                                <foreign lang=>
  Sentence unit                          <s>
  Punctuation                            <punc type=colon>
  Rendition information (e.g. bold/italics)  rend=BO

Spoken data
  Overlapping speech                     <anchor>
  Non-lexical vocalisations              <vocal coughs>
  Stress                                 <shift feature=loud>
  Pauses                                 <pause dur=2>
  Unclear speech                         <unclear>
  Actions/gestures/events                <event>
  Setting                                <settingDesc>
  Participants                           <particiDesc>

10a (For language engineers) Imagine you have a CD of
corpus data for a range of indigenous minority languages in
both written and spoken formats. What applications would you want to use
this data to build?

10b (For linguists) Imagine you have a CD of corpus data
for a range of indigenous minority languages in both written
and spoken formats. What sort of questions would you want to explore with
such a corpus?

11 What kinds of support tools would you like to use with
this hypothetical corpus?

12 How likely are you to be working with such indigenous minority languages in the future?

Very likely

Possibly

Unsure

Probably not

Very unlikely

Finally, it would be helpful if you could forward the URL of
this page to anyone else you think might be interested in completing
it.