Ciprian

make a schema/dtd description of the lexC-file (experiment with Komi and Romanian)

transform sme-lexC files into XML format

restructure and clean the script catalogue as suggested in the newsgroup

todo

GT web:

coloring output of the disambiguation

add a tree visualizer for the dependency trees

add input help for special characters on the tool sites

automate the web statistics

filter (English, German, etc.) input using language detection tools

put a note on the sites that these are NOT MT tools

input help for generating wordforms (dropdown menus).

todo

Bodø Oahpa:

generalize for semantic classes and pos

implement a login mechanism

implement an upload data mechanism

set up the Bodø version on victorio

ongoing

Sandbox Oahpa:

finish the db installation for sb_oahpa

implement and test the Finnish lemmata (and, if it works, transfer it to the official OAHPA implementation)

Numra for Skolt Sámi and Finnish

todo

Running Oahpa:

add an Oahpa clock and date exercise (cf. Numra)

Generate fin/xml/{nouns|verbs|adjectives}.xml, and implement the new Leksa dropdown menu (but first run the sb_oahpa test)

email notification when the server goes down

dictionaries, generally:

synchronize the source-language entries of a given dictionary with the entries in the morphology component (currently especially for sma): in practice this means putting the dictionary entries that are NOT analyzed into the lexC files

todo
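The synchronisation task above amounts to running the dictionary lemmas through the analyser and collecting the ones that get no analysis. A minimal Python sketch of that check follows. It assumes the analyser output is tab-separated with the word form first and the analysis last, and that an analysis ending in "+?" marks an unknown word; both the layout and the function name unanalysed_lemmas are assumptions for illustration, not a description of the actual scripts.

```python
def unanalysed_lemmas(dict_lemmas, lookup_output):
    """Return the dictionary lemmas that the analyser did not recognise.

    dict_lemmas: iterable of lemma strings taken from the dictionary.
    lookup_output: analyser output as text, one tab-separated line per
    analysis (assumed layout: form first, analysis last, '+?' = unknown).
    """
    unknown = set()
    for line in lookup_output.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        form, analysis = fields[0], fields[-1]
        if analysis.strip().endswith("+?"):
            unknown.add(form)
    # Keep the dictionary order so the result is easy to review by hand.
    return [lemma for lemma in dict_lemmas if lemma in unknown]


# Illustrative input only: 'xyzzy' stands in for a lemma missing from lexC.
sample_output = "manne\tmanne+Pron+Pers\nxyzzy\txyzzy\t+?\n"
missing = unanalysed_lemmas(["manne", "xyzzy"], sample_output)
```

The returned list would be the candidates to add to the lexC files; choosing a continuation lexicon for each of them remains manual work.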

KomEngFin:

compile Komi dictionary for Mac

compile only one version of dictionary and use css for user preferences

convert comment tags to real lexc tags when the discussion is over (Tomi)

General list

Bugzilla and Jabber are out of service after upgrading to Snow Leopard Server. Bugzilla is hampered by a strange compilation problem, and Jabber fails to start properly for unknown reasons. Børre is working on it.

To accommodate future enhancements in different directions (in rough order of importance):

test bench for all parts of our language technology efforts

test bench enhanced, but not yet complete

set up the Snow Leopard Server features for collaborative support:

iCal server / group calendars

wiki

wiki? on G5 (is part of Snow Leopard Server) or other web-based documentation

make a test-all target that runs all tests we have (Ciprian, Sjur, Trond)

delayed until we have restructured the make/build process

define and document testing routines (Ciprian, Sjur, Trond)

delayed until we have restructured the make/build process

Linguistics

North Sámi

(nothing new, see proofing bugs below)

Lule Sámi

(nothing new, see proofing bugs below)

South Sámi

Numerals:

Bergsland, and Spiik for smj as well: gøøkte luhkie gøøkte

Majja, and Nickel for sme: gøøkteluhkiegøøkte

Trond's interpretation: Bergsland and Spiik do not reflect the norm, but a pedagogical convention to make it easier for readers to understand

Linguistically, joined writing is correct (ref: TO, the native southerner)

Classification of pronouns etc:

personal: manne "I", dïhte "he/she/it"

demonstrative: daate "this one"

determiner: daennie gåetesne "to that house"

Personal vs demonstrative:

dihte Pron Pers

dihte Pron Dem

Given that "dihte maana" is out, the analysis should be +Pron+Pers and not +Det.

We do not want to disambiguate persons and things as referents, therefore we do not want two analyses +Pron+Pers and +Pron+Dem.

Anna Jacobsen: Dihte Kovlasaemiej åvtohke

Demonstrative vs. determiners

daate: demonstrative pronoun (Norwegian "påpekende pronomen")

gen: daen

ine: daesnie (Trond: pronoun, sma.fst: +Pron+Dem)

ine: daennie gåetesne (Trond: determiner, +Det) (sme: Gen)

Here, we should have two paradigms, one pronominal +Pron+Dem, and one +Det, disambiguating to pre-nominal use.

Three types of problem areas:

missing closed POSes: errors in transducer, or in PL conversion

missing words found in the smanob dictionary: 110 verbs, 80 adjectives, all core sma words (Cip: these statistics are NOT complete; as far as I know, Trond has forgotten to take the Swedish verb file into account!)

The ATTRSUFF-PREDSUFF-STEMTYPE lexica now go to EVENCOMP and ODDCOMP only.

add compound tags (Thomas)

Name lexicon/risten.no infrastructure

Tomi has played with couchdb as a replacement for eXist in risten.no and general dictionary-related work. Seems much lighter and easier to work with.

Instead of building our own webforms and back-end update scripts, use XForms with a premade connection to our xml db. Orbeon XForms is such a tool (opensource).

From the meeting with the terminology and IT teams last week:

no major rework on the present search interface now

no work on the editing section; instead:

add existing lists of sanctioned terminology as separate term entities

add a dictionary if we can make one with sufficient quality

This means the following tasks:

find already approved lists, in paper or electronic form (term team)

convert paper lists to electronic lists (term team)

convert lists to standard XML (Sjur, Tomi)

add prepared lists to risten.no (Sjur, Tomi)

TODO:

send eXist log files to Ciprian (Sjur)

fix i18n bug in risten.no/G5 (so they will work without the proper locale request) (Sjur)

fix bugs in lexc2xml; add comments to the log element (Saara)

finish first version of the editing (Sjur)

test editing of the xml files. If ok, then: (Sjur, Thomas, Trond)

generate terms-smX.xml automatically from propernoun-sme-lex.xml (add nob as well) (the morphological section should be kept intact, in e.g. propernoun-sme-morph.txt) (Sjur, Saara)

make propernoun-($lang)-lex.txt a file derived from the common xml files (Sjur, Tomi, Saara)

implement data synchronisation between risten.no and the cvs repo, and possibly other servers (i.e. the G5 as an alternative server to the public risten.no; it might be faster and better suited than the official one, and local installations could be treated the same way)

start to use the xml file as source file

clean terms-sme.xml such that all names have the correct tag for their use (e.g. @type=secondary) (Thomas, linguists)

merge placenames which are erroneously in different entries: e.g. Helsinki, Helsingfors, Helsset (linguists)

publish the name lexicon on risten.no (Sjur)

add missing parallel names for placenames (linguists)

add informative links between first names like Niillas and Nils (linguists)

Dictionaries

sma: nob-swe

over 400 multiplied entries found (these are caused by not-so-clean work with the data)

done: deleted automatically about 300 of them

all multiple entries now cleaned

ongoing: Maja-Lisa is checking the entries for correctness

ongoing: Maja-Lisa is translating the missing words from her own diploma thesis into nob

ongoing: synchronize data from the dictionary with that from the lexC files so that we are able to analyze everything from the dictionary (Cip: this should be valuable for ALL dictionaries)

Maja priority list, in the following order:

pronouns

negation verb

copula

other closed POSes?

translations of examples
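The automatic clean-up of multiplied entries mentioned for the sma dictionary can be illustrated with a small Python sketch that reports every lemma/pos pair occurring more than once. The element names "e" (entry) and "l" (lemma), the "pos" attribute, and the function name find_multiplied_entries are assumptions made for this example; the real dictionary files may be structured differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_multiplied_entries(xml_text):
    """Return {(lemma, pos): count} for every pair found more than once."""
    root = ET.fromstring(xml_text)
    seen = defaultdict(int)
    for entry in root.iter("e"):      # 'e' = entry element (assumed name)
        lemma = entry.find("l")       # 'l' = lemma element (assumed name)
        if lemma is None or lemma.text is None:
            continue                  # entry without a usable lemma
        key = (lemma.text.strip(), lemma.get("pos", ""))
        seen[key] += 1
    return {key: count for key, count in seen.items() if count > 1}
```

A report like this only finds the candidates. Deciding which copy of an entry to keep, or how to merge differing translations, still needs a rule or a human check, which matches the split above between automatically deleted entries and the remaining ones.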

fkv: nob and nob: fkv

started to work on them: waiting for Verena to get rid of multiplied entries

kom: fin-eng

moved the original kom-lex.xml to the inc-dir and froze it

split it by pos into the working_file dir, the ONLY place to work with the dictionary entries

now, the lexC files are generated via XSLT sheets, no perl scripts

adjusted the Makefile

prepared the pipeline for compiling the mac dict

todo: make a pipeline for StarDict also (as far as I know, Jaska has a Linux machine)
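The kom pipeline above splits the dictionary by pos and derives lexC entries from the XML. The actual conversion is done with XSLT sheets; the following Python sketch only illustrates the same idea in another language. The element name "l", the attributes "pos" and "contlex", and the function name lexc_by_pos are hypothetical and not taken from kom-lex.xml.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def lexc_by_pos(xml_text):
    """Return {pos: [lexC entry lines]} built from the dictionary XML."""
    root = ET.fromstring(xml_text)
    by_pos = defaultdict(list)
    for entry in root.iter("e"):          # 'e' = entry element (assumed)
        lemma = entry.find("l")           # 'l' = lemma element (assumed)
        if lemma is None or not (lemma.text or "").strip():
            continue
        pos = lemma.get("pos", "unknown")
        contlex = lemma.get("contlex", "UNKNOWN")  # continuation lexicon
        # In lexC a literal space has to be escaped with '%'.
        form = lemma.text.strip().replace(" ", "% ")
        by_pos[pos].append(f"{form} {contlex} ;")
    return dict(by_pos)
```

Each list could then be written to its own file per pos. Other lexC special characters would need the same "%" escaping, which this sketch leaves out.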
