1. Introduction

These specifications cover the Danish lexical-semantic wordnet DanNet Version 2, which has been developed by the Centre for Language Technology, University of Copenhagen, and the Society for Danish Language and Literature. The specifications describe the types of information contained in DanNet and how they are to be interpreted linguistically. For a more in-depth treatment of the methodological and linguistic choices we have made, as well as information about the sources used for the project, we refer to Pedersen et al. 2009, EuroWordNet (Vossen (ed.) 1999), Princeton WordNet (Fellbaum (ed.) 1998), and the Generative Lexicon (Pustejovsky 1995). Additionally, see our list of publications on wordnet.dk.

DanNet now contains 65,000 so-called synsets, or sets of synonyms. A synset is the set of words in a given language that refer to the same concept. It is thus possible to equate a synset with a concept. Synsets are the central building blocks, or nodes, of the wordnet, and they are related to one another by semantic relations. Synsets and relations are further specified by semantic characteristics or features.

All 65,000 synsets are provided with an ontological type and a link to the closest hypernym. Presently, more information is supplied about concrete nouns, which have on average four semantic relations to other synsets. 5,000 Danish synsets are linked to the equivalent English synset in Princeton WordNet by the relation eq_has_synonym; these are also labelled Princeton Core. Approximately 30% of the material has been produced semi-automatically with no further annotations; this is especially true for actions, events, properties and abstract entities. Concrete objects have received the most attention and as such have more relations than other concepts (for more information on the description of artifacts in DanNet, see Nimb, 2009). Approximately 2% of the material has been validated by people other than the DanNet editors; see section 7.
Figure 1 shows a small excerpt of the wordnet from the area concerning musical instruments. The squares correspond to synsets and the lines to relations. The arrows illustrate the hyponymy relation, i.e. the closest hypernym. The relations are inherited such that trompet (trumpet) inherits all the relations that its hypernym has, meaning made_by fremstille (manufacture), used_for frembringe musik (making music) and blæse (blow), has_mero_part mundstykke (mouthpiece), rør (pipe) and ventil (valve), as well as has_mero_made_of metal.
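The inheritance described above can be sketched in a few lines of Python. This is only an illustration, not DanNet's actual data format: the dictionaries, the function name inherited_relations, the intermediate synsets messinginstrument and blæseinstrument, and the level at which each relation is stated are all assumptions made for the example. Only the relation names and their targets come from the text.

```python
# Toy data (assumed hierarchy): synset -> closest hypernym.
# None marks the top of this small excerpt.
HYPERNYM = {
    "trompet": "messinginstrument",
    "messinginstrument": "blæseinstrument",
    "blæseinstrument": "musikinstrument",
    "musikinstrument": None,
}

# Toy data: relations stated directly on a synset. Which level carries
# which relation is an assumption made for this example.
OWN_RELATIONS = {
    "musikinstrument": [
        ("made_by", "fremstille"),
        ("used_for", "frembringe musik"),
    ],
    "blæseinstrument": [
        ("used_for", "blæse"),
        ("has_mero_part", "mundstykke"),
        ("has_mero_part", "rør"),
    ],
    "messinginstrument": [
        ("has_mero_part", "ventil"),
        ("has_mero_made_of", "metal"),
    ],
}


def inherited_relations(synset):
    """Collect the relations stated on every hypernym above `synset`."""
    collected = []
    current = HYPERNYM.get(synset)
    while current is not None:
        collected.extend(OWN_RELATIONS.get(current, []))
        current = HYPERNYM.get(current)
    return collected


for relation, target in inherited_relations("trompet"):
    print(relation, target)
```

Walking the hypernym chain upwards keeps the stored data small: a relation such as has_mero_made_of metal is stated once and becomes visible on every hyponym below it.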

Figure 1: Graphical display of trompet and the relations it inherits from its hypernyms.

Figures 2 and 3 show two other graphical representations of musikinstrument (musical instrument) from the webpage andreord.dk, where it is possible to browse DanNet online with Danish relation names (for a more thorough review of this browser, see Johannsen & Pedersen 2010). Figure 2 shows all relations given or inherited by musikinstrument, while figure 3 shows the largest groups of hyponyms for this concept.

Figure 2: Musikinstrument shown with all its relations

Figure 3: Musikinstrument shown with the largest groups of hyponyms such as wind instruments, string instruments etc.

2. Overview of editorial principles

Word selection

The vocabulary in DanNet is a subset of Den Danske Ordbog (DDO), a corpus-based monolingual dictionary of modern Danish (online version: ordnet.dk/ddo). Two principles guided the selection:

- Frequency
- Focus on concrete objects

This means that highly frequent terms are generally in DanNet if they are concrete objects, while some frequently used words from other semantic domains, such as actions and abstract entities, may still be missing.

Organization of the lexical network

DanNet is organized by closest hypernym. Hypernyms were automatically extracted from DDO but adjusted by the DanNet editors. In addition to the closest hypernym, each synset has an ontological type from the EuroWordNet ontology, cf. section 5.

The hypernyms used in DDO's meaning definitions were not taken from an established ontological system, but chosen by the individual editor in accordance with the general editorial guidelines for the dictionary. This means that one DDO editor has chosen lære as the hypernym for informatik (informatics) and bromatologi (food science), while another has chosen fag as the hypernym for samfundsfag (social science) and a third has chosen videnskab for datalogi (computer science). In these cases it has been the task of the DanNet editors to harmonize the wordnet as far as possible, which is often done by collecting words such as fag, lære, and videnskab in one synset, at least where there is no good reason for maintaining the division.

In the 'classical' taxonomy, sisters are often disjunctive, i.e. they rule out one another (shirt vs. trousers) and cannot be hypernyms of one another. Often, however, the hyponymy relation denotes an attribute or function which runs contrary to this classical taxonomy, for instance in the case of a lemma which designates a specific function rather than a 'kind'. In such cases, the hyperonymy relation is stated as has_hyperonym with the feature ortho (orthogonal) to show that it refers to a different dimension than the classical 'kind of'. Examples are vejtræ, which is any kind of tree lining a road, and fødselsdagskage, which is any kind of cake eaten at a birthday. Such sister concepts are not mutually exclusive; a vejtræ can also be a klatretræ (tree for climbing). Generally, one has to be aware that a large part of the ordinary vocabulary has been described as non-taxonomic, and that this affects the structure of the net to a large degree. It has not always been obvious which perspective to adopt when constructing the wordnet, and often it may be a good idea to adopt multiple perspectives at the same time.
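A small Python sketch may help to show how the ortho feature can be used when processing the net. The record type HypernymLink and both functions are hypothetical names introduced here, and treating klatretræ as an ortho hyponym of træ is an assumption; only vejtræ is explicitly described as such in the text.

```python
from typing import NamedTuple


class HypernymLink(NamedTuple):
    """One has_hyperonym link; `ortho` marks a non-taxonomic (orthogonal) link."""
    synset: str
    hypernym: str
    ortho: bool = False


# Toy data. The ortho flag on klatretræ is an assumption for this example.
LINKS = [
    HypernymLink("vejtræ", "træ", ortho=True),
    HypernymLink("klatretræ", "træ", ortho=True),
    HypernymLink("stol", "siddemøbel"),
]


def taxonomic_links(links):
    """Keep only the classical 'kind of' links, dropping orthogonal ones."""
    return [link for link in links if not link.ortho]


def may_overlap(first, second):
    """Sisters under the same hypernym can overlap if either link is ortho."""
    return first.hypernym == second.hypernym and (first.ortho or second.ortho)


print([link.synset for link in taxonomic_links(LINKS)])
print(may_overlap(LINKS[0], LINKS[1]))
```

An application that needs mutually exclusive categories would work on the output of taxonomic_links only, while one that looks for functional groupings would keep the ortho links as well.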
For DanNet we have adopted a layman's approach to the overall structure of the wordnet, so that we do not develop a deeper hyponymy structure than is obvious to a non-specialist. The taxonomic hypernyms for stol (chair) that are described in DanNet are thus siddemøbel (piece of furniture for sitting in), møbel (piece of furniture), and genstand (object). This does not mean that other perspectives on furniture are not found in the wordnet. In insurance terminology, for instance, the terms bohave and indbo are often used for all the possessions found in a dwelling, and more specifically løsøre for all the movable objects in a dwelling. These concepts are in DanNet, but not necessarily in the taxonomical structure except that they have a hypernym; in this case samling (collection), in the sense of a number of objects. Likewise, the net will often appear somewhat heterogeneous, in that e.g. genstand has subordinate categories such as møbel (furniture), bygningsværk (building) and transportmiddel (vehicle), but also quite specific concepts like gulvtæppe (carpet) and dyne (blanket), which do not have any immediately obvious hypernym except for genstand (object/entity).

If one wishes to use DanNet to extract groups of semantically related concepts, it is important to note that this can be done in two ways. One can try to identify the hypernym(s) best describing the group of concepts one is interested in, e.g. legemsdel (body part) or føde (food). As it is not always easy to predict whether the concepts one wishes to find have the same hypernym, it is also possible to find a larger group of related terms by extracting concepts via the ontology. The ontology makes it possible to search for types such as COMESTIBLE or BODY_PART. In this way, muskler (muscles), knogler (bones), and organer (organs) are returned when searching for BODY_PART, even though they do not have legemsdel as a common hypernym.
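The two extraction routes can be contrasted in a short Python sketch. The in-memory dictionary, the function names, the synset arm, and the hypernyms given for muskel, knogle and organ are placeholders invented for the example; they are not taken from DanNet. The point is only that a search by ontological type can return concepts that a search by hypernym misses.

```python
# Toy data: each synset has a closest hypernym and a set of ontological types.
# Hypernyms other than legemsdel are invented placeholders.
SYNSETS = {
    "legemsdel": {"hypernym": None, "types": {"Part", "BodyPart"}},
    "arm": {"hypernym": "legemsdel", "types": {"Part", "BodyPart"}},
    "muskel": {"hypernym": "væv", "types": {"Part", "BodyPart"}},
    "knogle": {"hypernym": "skeletdel", "types": {"Part", "BodyPart"}},
    "organ": {"hypernym": "del", "types": {"Part", "BodyPart"}},
    "stol": {"hypernym": "siddemøbel", "types": {"Artifact", "Furniture"}},
}


def hyponyms_of(target):
    """Route 1: every synset that has `target` somewhere in its hypernym chain."""
    found = set()
    for name, entry in SYNSETS.items():
        current = entry["hypernym"]
        while current is not None:
            if current == target:
                found.add(name)
                break
            # Hypernyms missing from the toy data simply end the chain.
            current = SYNSETS.get(current, {}).get("hypernym")
    return found


def with_type(ontological_type):
    """Route 2: every synset whose ontological type includes the given type."""
    return {
        name
        for name, entry in SYNSETS.items()
        if ontological_type in entry["types"]
    }


print(sorted(hyponyms_of("legemsdel")))
print(sorted(with_type("BodyPart")))
```

In this toy data the hypernym route finds only arm, whereas the ontology route also returns muskel, knogle and organ.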

Polysemy and sense distinctions

DDO is the starting point for the sense distinctions established in DanNet. We have, however, merged senses where we thought DDO too fine-grained for what we can express in DanNet by relations and features. This is especially true for the subsenses of verbs. Generally we have tried to give different hypernyms for polysemous terms, or otherwise to differentiate subsenses formally by relations or features. An example would be the two closely related senses of frokost (lunch) in DDO:

1) det kolde måltid der indtages midt på dagen ("the cold meal eaten in the middle of the day")
2) måltid der serveres (for gæster) midt på dagen ("meal served (for guests) in the middle of the day")

Both have måltid as genus proximum in DDO's definition, but sense 1 is given the hypernym måltid and sense 2 the hypernym sammenkomst. In such cases the definition of sense 2 is adjusted to sammenkomst midt på dagen hvor der serveres varm eller kold mad ("gathering in the middle of the day where hot or cold food is served"). A formal differentiation of polysemous words is, alas, not always possible.

In the database, different senses are separated by means of homograph numbers, main sense numbers, and subsense numbers, which come from the printed version of DDO. An entry always has a main sense number, which is added to the lemma following an underscore: bil_1 refers to the first (and coincidentally only) main sense of the lemma bil (car). Entries with the main sense number _0 are not yet included in DDO. Subsenses have both a main sense number and a subsense number, such that karet_1_2 refers to the second subsense of the first main sense of the lemma karet (horse-drawn carriage). From the name it can be seen that there are at least two other senses of karet: a main sense (karet_1) and a subsense (karet_1_1); there is, however, no guarantee that these are actually included in DanNet.
Homograph numbers are separated from the lemma with a comma, such that slæde,1_3 signifies the third main sense of the first homograph of slæde (sleigh).

Systematic polysemy is when groups of words exhibit the same variation in meaning. Terms referring to institutions, for instance, can often be used both in the sense of the building housing them and the people they represent. In these cases, each sense is coded as a separate synset, i.e. one for the institution, one for the building, and one for the group of people, and these are connected to one another by the relation "reg_polysem" (regular polysemy). This has also been attempted in cases where these senses are not consistently listed in DDO. In DDO, most animals have an extra sense denoting the animal as food (e.g. lam, which can mean either lamb or mutton), but not all words for edible animals have this subsense. Since systematic polysemy is rather comprehensive, and covering it requires new corpus research because many of the senses are missing from DDO, it must be noted that systematic polysemy is not fully covered for all relevant semantic groups in DanNet.
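The sense identifiers described above (bil_1, karet_1_2 and slæde,1_3) follow a regular pattern, so they can be split into their parts mechanically. The following Python sketch is inferred from these three examples only; the class name SenseId, the function parse_sense_id and the assumption that all numbers are plain decimal digits are ours and are not part of the DanNet distribution.

```python
import re
from typing import NamedTuple, Optional


class SenseId(NamedTuple):
    lemma: str
    homograph: Optional[int]   # None when no homograph number is given
    main_sense: int
    subsense: Optional[int]    # None for a main sense

    @property
    def in_ddo(self):
        """Main sense number 0 marks entries not yet included in DDO."""
        return self.main_sense != 0


# Assumed pattern: lemma[,homograph]_main[_sub], all numbers as decimal digits.
_SENSE_PATTERN = re.compile(
    r"^(?P<lemma>.+?)(?:,(?P<homograph>\d+))?_(?P<main>\d+)(?:_(?P<sub>\d+))?$"
)


def _to_int(value):
    return int(value) if value is not None else None


def parse_sense_id(text):
    """Split an identifier such as 'slæde,1_3' into its numbered parts."""
    match = _SENSE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a sense identifier: {text!r}")
    return SenseId(
        lemma=match["lemma"],
        homograph=_to_int(match["homograph"]),
        main_sense=int(match["main"]),
        subsense=_to_int(match["sub"]),
    )


for identifier in ("bil_1", "karet_1_2", "slæde,1_3"):
    print(parse_sense_id(identifier))
```

The lemma group is matched non-greedily so that trailing numbers are read as sense numbers, while a multiword lemma written with an underscore (such as gifte_sig) still stays in one piece when a sense number follows it.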

Synonymy

Synonyms are as a rule given in the same synset corresponding to one concept, e.g. hustru, viv and kone (spouse, wife) are seen as the same concept. This is true even if they belong to different registers. There are also examples where two concepts are listed as near-synonyms of one another by way of a relation; this is done in cases where the semantic distance is seen as too big to merge them.

Connotation and gender

DanNet contains certain features, such as connotation (negative/positive) and gender (male/female). For illustration, words like rappenskralde (glossed in DDO as loud-mouthed woman) and knag (glossed as a person one likes) are coded with negative and positive connotation respectively, and rappenskralde is furthermore given the feature 'female', whereas knag is gender-neutral and therefore underspecified for this feature. For most terms given a connotation it is also stated which aspect this connotation applies to, i.e. appearance, behaviour, intelligence, etc.; see figure 4, which shows how negative and positive terms for men and women are distributed over different areas. For more information, see Pedersen & Braasch.

Figure 4: Negative and positive terms for men and women distributed on different aspects of the denotation such as appearance, sex appeal, personality, conduct, status, age etc.

3. Relations and features

Version 2 contains the relations given in the figure below. Relations in bold face are specific to DanNet (and thus neither in Princeton WordNet nor EuroWordNet).

Relation          Example
concerns          fodboldmål concerns sport
used_for          hammer used_for hamre
used_for_object   clipse used_for_object clips
made_by           bagværk made_by bage

connotation, i.e. fjols connotation = negative)
Sex = male; female (The synset refers to a male or a female)
Domain = (see below)
ungkarl role_agent gifte_sig)
Ortho (The relation is orthogonal, i.e. non-taxonomical; only related to has_hyperonym, as in vejtræ is_a træ, ortho)
Restrict (The relation is specified in relation to an inherited relation, as in maleri: involved_agent maler; restriction of the inherited involved_agent kunstner)

4. Domain

Information about domain, called 'sysfag' in DDO, has been automatically inserted in DanNet, in part by the relation 'domain', in part by the feature 'domain' (Domain from DDO). The attribute value consists of the same abbreviations for the individual domains as are used in DDO, which are explicated below.

Background: Sysfag in DDO

The element Sysfag (theme) had two purposes when DDO was edited:

1. It made it possible to check how well a given subject was being covered by entries while work was ongoing on the first edition.
2. It was intended to make it possible, at a later date, to extract a given subject's vocabulary for a subject-specific dictionary.

The guiding principle was that Sysfag should be marked wherever possible, including for words or senses which are mostly or wholly part of everyday language. For instance, words like penge ("money") and værktøj ("tool") were given Sysfag. The editors added sysfag based on their first intuition and without systematically examining how similar words were marked, and because it was only possible to add one sysfag element to each word, words from similar semantic areas - even synonyms - might have different sysfag values (e.g. tallerken ("(dinner) plate") has "mad" ("food"), but kop ("cup") has "bolig" ("dwelling")). This has later been partly revised for greater consistency, but some undesirable inconsistency certainly still exists.
While making the digital edition of DDO it was decided to allow as many sysfag for a sense as are thought relevant, and in the future DDO will contain more information on sysfag which can be transferred to DanNet. Some senses were not marked with information on sysfag due to an oversight by the DDO editor. In other cases the information may be missing because the DDO editor decided not to mark a concept which belongs to several domains, because adding multiple sysfag was not allowed and choosing one over another would skew the distribution. Finally, the information may be faulty because the field was not visible on the printout that was checked during the second round of editing. In the long term DDO will be expanded with information about sysfag for words that are currently lacking it, both existing articles and the undefined b-words.

5. Ontological type

Every synset in DanNet has an ontological type. Most of the ontological types in DanNet are from the EuroWordNet ontology. Ontological types added to DanNet are set in bold in the tables below. For more information about the ontology, see Vossen

Concrete objects (1st Order Entities)
Origin: Natural, Artifact, Living, Plant, Human, Creature, Animal
Form: Substance, Solid, Liquid, Gas, Object
Composition: Part, BodyPart, Group
Function: Vehicle, Representation, MoneyRepresentation, LanguageRepresentation, ImageRepresentation, Software, Place, Occupation, Instrument, Garment, Furniture, Covering, Container, Comestible, Building, Artwork

Actions, events, properties and abstract objects (2nd and 3rd Order Entities)
SituationType: Dynamic, BoundedEvent, UnboundedEvent, Static,

Property, Relation
SituationComponent: Cause, Agentive, Phenomenal, Stimulating, Communication, Condition, Existence, Experience, Location, Manner, Mental, Modal, Physical, Possession, Purpose, Quantity, Social, Time, Usage

6. Examples

DanNet was created with so-called coding templates which were available in the coding tool. The templates follow the ontological types and work as guidelines for which relations are relevant to which ontological types. For instance, it is relevant for the ontological type Part to state what something is a part of, whilst for an Artifact it is relevant to state what it is used for. To illustrate the principles used in creating DanNet, some of the prototypical templates representing different areas of the ontology are presented below. It must be noted that the templates are for compound ontological types consisting of several of the types listed in the ontology above. A foodstuff will always have the type Comestible, but depending on its nature it may also be Natural or Artifact, Object, Liquid, Part or Group and so on. It can also be seen that the different relations are grouped according to Pustejovsky's so-called qualia roles (Formal, Constitutive, Agentive and Telic), which roughly correspond to hypernym, part-whole, origin and purpose (cf. Pustejovsky 1995).

The ontology has generally been clearly applicable to concrete objects, while it has been more difficult to use consistently for abstract concepts and for actions, events, and properties. While coding we have attempted to distinguish clearly between intentional and non-intentional actions (+/- Agentive), whereas the distinction between Bounded and Unbounded is less clear in Danish at the lexical level. The ontological type Mental is used for concepts referring to mental activity; Social when other people or communities are involved; Physical for physical actions like movement; and Location for actions taking place at a specific place or in a specific direction.
It will be noted that semantic relations are most prevalent for concrete objects, unlike, say, actions and abstract entities. In the current version of DanNet the latter in most cases have only a hypernym relation specified.
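The idea of a coding template for a compound ontological type can be illustrated with a minimal Python sketch. The table TEMPLATE_HINTS and both functions are hypothetical; they only encode the two examples given above (Part and Artifact), and the relation name has_holo_part for "what something is a part of" is an assumption borrowed from EuroWordNet naming, not confirmed by this document. The actual templates in the coding tool are more detailed.

```python
# Hypothetical hints: ontological type -> relations worth filling in.
TEMPLATE_HINTS = {
    "part": ["has_holo_part"],   # assumed relation name for "is a part of"
    "artifact": ["used_for"],
}


def split_type(compound):
    """Split e.g. 'Comestible+Artifact+Part' into lower-cased components."""
    return [part.strip().lower() for part in compound.split("+") if part.strip()]


def suggested_relations(compound):
    """List the relations a template for this compound type would prompt for."""
    # Every synset is linked to its closest hypernym.
    relations = ["has_hyperonym"]
    for component in split_type(compound):
        for relation in TEMPLATE_HINTS.get(component, []):
            if relation not in relations:
                relations.append(relation)
    return relations


print(suggested_relations("Comestible+Artifact+Part"))
print(suggested_relations("3RDORDERENTITY+MENTAL+PURPOSE+MANNER"))
```

Components are lower-cased before lookup because the compound types appear both in capitals and in mixed case in the templates. A type with no entry in the table, as with most abstract types here, yields only the hypernym relation.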

ArgStr: sygdom der skyldes unormal vækst af celler ("disease caused by abnormal growth of cells")
has_hyperonym sygdom (disease)

Examples of abstract entities (3rd Order Entities)

Ontological type: 3RDORDERENTITY+MENTAL+PURPOSE+MANNER
Test/explanation: Abstract entities that have a purpose and a manner specification
Examples: kur (cure), metode (method), måde (manner)
Comments:

Template 3rdOrderEntity+Mental+Purpose+Manner
has_hyperonym
near_synonym //optional//

Example 3rdOrderEntity+Mental+Purpose+Manner
kur: foranstaltning eller metode til behandling af sygdom eller andre problemer der vedrører kroppen ("measure or method for treating illness or other problems concerning the body")
has_hyperonym metode (method)

7. Validation

2% of the material in DanNet has been validated. The focus of the validation has been the same as the editorial focus. Validation has shown that there are some inconsistencies in the semantic description. While the hyperonymy relation is mostly consistent throughout the material, the other relations vary in the amount of detail with which they are specified. This is unsurprising and is
