May 2012

Isan

The Isan language (w:Isan language) is a messy situation that we should come to a consensus on before somebody decides to add entries in it. It has its own ISO code ({{tts}}), but 'pedia says it is just a "collective name for the dialects of the Lao language as they are spoken in Thailand," and I am inclined to agree. Thai and Lao are already mutually intelligible, especially along the border, and Isan is more like Lao.

I can't find a single word unique to Isan - AFAICT it's all Lao with a handful of words borrowed, unchanged, from Thai. The only reason that we can't just merge entries easily is that Thai and Isan are written in the Thai script, but Lao is written in the similar Lao script. I recommend that we merge Isan with Thai, and add those words which Lao and Isan share, but Thai does not use, under the L2 header for Thai with {{context|Isan}} in front of the definition and with a usage note (which will be a template about Isan vs Thai so nobody uses it as a Standard Thai word by mistake). What do you think? --Μετάknowledgediscuss/deeds 05:08, 2 May 2012 (UTC)

If it's more like Lao than Thai, wouldn't it be better to fold Isan under Lao and, in most cases, have definitions like "Isan spelling of X"? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:38, 2 May 2012 (UTC)

Since Isan is written with the Thai script, unless the differences in the scripts are pretty trivial, bundling Isan and Lao would result in 100% non-matches with Lao, and so putting those two together would not make sense. --BB12 (talk) 18:00, 2 May 2012 (UTC)

Is this at all similar to the situation with some of the Balkan languages, where the same language when spoken is written in one of two scripts (Latin or Cyrillic)? WP calls this phenomenon digraphia, as in the intro header to the w:Serbian language article. -- Eiríkr Útlendi │ Tala við mig 18:13, 2 May 2012 (UTC)

It appears to be more as though US English were written in Cyrillic. Beginning with the noun section of the Wikipedia Isan article, a number of tokens are provided that are the same as or different from Lao and Thai. --BB12 (talk) 18:22, 2 May 2012 (UTC)

@Widsith & Doremítzwr: In an ideal world, we would do that. However, in practice that would be a repetition of almost all our Thai entries under another L2 header with identical information. It is thus much more manageable to merge it with Thai, which is extremely similar anyway.

@Eiríkr: It is a similar situation, but as Benjamin points out, there are minor lexical differences. Certainly a trans-script fix like SC would be unwieldy here.

@Benjamin: The scripts are extremely similar, but of course that still results in 100% non-matches; compare M (Latin) and М (Cyrillic). They make the same sound, are etymologically the same, and are written identically, but never end up on the same page on Wiktionary. --Μετάknowledgediscuss/deeds 03:54, 3 May 2012 (UTC)
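To make the M/М comparison concrete, here is a minimal Python sketch using only the standard library. It only illustrates that the two letters are distinct code points, so strings containing them never compare equal; how Wiktionary itself resolves page titles is not modelled here.

```python
import unicodedata

latin_m = "M"            # U+004D
cyrillic_em = "\u041c"   # U+041C, renders as М

# Print the code point and the official Unicode name of each letter.
for ch in (latin_m, cyrillic_em):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The glyphs look alike, but the strings are not equal.
same_title = latin_m == cyrillic_em
```

The same holds for visually similar Thai and Lao letters, which live in separate Unicode blocks.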

The comparison of the Latin and Cyrillic troubles me in that if the scripts and language varieties are that close, then maybe they do need to be bundled together. In the opposite direction, another concern I have is what users will look for. If they want an Isan word, it would be burdensome to expect them to look for a Thai (or Lao) word. Also, if they look up a Thai (or Lao) word and no note says "this is the same in Isan," they will be left wondering whether they have the correct Isan word or not. (Since it is a dialect continuum situation, you can argue that this problem applies to lots of varieties anyway, but if speakers have the mind-set that Isan is a separate language, then perhaps separating the languages that way is best.) --BB12 (talk) 10:38, 7 May 2012 (UTC)

I agree. As it has its own ISO code, users should be allowed to create entries for it, and readers to look for them. This is the best way not to lose any information. I understand that it's more or less the same case as Hindi and Urdu. Lmaltier (talk) 20:32, 11 May 2012 (UTC)

My departure

Just to let you all know, I’m ceasing editing here. Simply put, this is something I need to get paid for. I am very glad for my experiences contributing to this project, working alongside knowledgeable and helpful editors. I hope now to take these skills, which I have acquired in my amateur efforts, on to professional employment. I consent to the removal of my administrator privileges upon one year’s inactivity, or sooner, as the community deems appropriate. I wish you all the greatest good fortune in your endeavours to build this nonpareil resource. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 02:46, 3 May 2012 (UTC)

I am sorry to see you go, but glad that you have enjoyed and benefited from your time here. Thank you so much for all your contributions here, and good luck to you as well. --Μετάknowledgediscuss/deeds 04:00, 3 May 2012 (UTC)

Best of luck in your new endeavors. DCDuringTALK 02:19, 4 May 2012 (UTC)

I would never remember to do it after a year, so I have removed your sysop status now. If you have second thoughts, just ask me or another -crat and this can be reversed without a vote. SemperBlotto (talk) 07:23, 4 May 2012 (UTC)

Your contributions will stand as a testament to your diligent efforts to expand and improve the English Wiktionary. Best wishes, --EncycloPetey (talk) 18:48, 6 May 2012 (UTC)

Thank you for your irreplaceable contributions to the human meta-organism. It was a great intellectual pleasure reading your posts and discovering many unfamiliar English words. I hope you reconsider your monetary-driven motivations sometime in the future :) Cheers! --Ivan Štambuk (talk) 19:39, 6 May 2012 (UTC)

That's a pity. If you end up not finding that kind of career, we'd like to have you back :) Equinox◑ 21:15, 6 May 2012 (UTC)

I'm sorry to see you go. Take care, —RuakhTALK 14:55, 9 May 2012 (UTC)

Microsoftify isn't a trademark, of course (though Microsoft is). I believe consensus is not to include this TM sign, but the {{trademark}} gloss can be used. Equinox◑ 15:34, 3 May 2012 (UTC)

Ah, my point was that the definition given of googlewhack (also not a trademark) is "A Google™ search result consisting of a single hit...", while Microsoftify is just "To assimilate into a Microsoft framework." Sorry I wasn't clearer. Smurrayinchester (talk) 15:40, 3 May 2012 (UTC)

We should not be using ™ designations in definitions at all. First, we have no legal obligation to do so; second, they are not linguistically informative; third, trademarks expire eventually, which means that we would need to regularly check the trademark status of each word so designated, to remove the designation when that event occurred. I would propose that at most either a usage note or an etymology note (as appropriate where the word derives from a trademark) be used to indicate that the word so designated had a trademark status at the time when the usage came about. bd2412T 16:04, 3 May 2012 (UTC)

I think they are linguistically informative. A person writing e.g. an instruction manual would want to ensure their choice of word would be understood as a generic one and not referring incorrectly to only one brand. 82.113.133.21 16:24, 3 May 2012 (UTC)

Why would we want to carry the trademark holder's water? Why would we want to create the expectation that we were a reliable source of trademark information when we are barely a reliable source of definitions? DCDuringTALK 16:59, 3 May 2012 (UTC)

Why would we want to mislead someone into thinking that a trademarked term will be generally understood as the generic, when it often will not? Equinox◑ 17:05, 3 May 2012 (UTC)

We are an international dictionary, but trademarks are not international. So a single tm-sign isn't really very informative at all, because it doesn't say in what countries the trademark applies. —CodeCat 17:42, 3 May 2012 (UTC)

I agree with BD2412 and CodeCat: we shouldn't use the trademark symbol, because trademarks are temporary and country-specific, because there is no expectation or requirement that we do, and finally because doing so frankly looks sarcastic on our part. —Angr 18:42, 3 May 2012 (UTC)

Whether we should indicate trademarks as such, and whether we should use the ™ symbol to do so are two different questions.

The symbol is used to protect a trademark by its owner, by ensuring that every single mention of it is annotated. We don't have any incentive or obligation to protect anyone's trademarks. Please don't use the ™ in Wiktionary because it's inappropriate. —MichaelZ.2012-05-03 23:53 z

I would just like to add that there are literally millions of words for which a trademark is or has been registered. The word "Please" has been registered by a half dozen different users, with respect to different goods and services. "Hello" has had over a dozen registrations. There are even subsisting registrations for "The". If™ we™ were™ to™ indicate™ every™ word™ with™ a™ trademark™, our™ sentences™ would™ end™ up™ looking™ like™ this™. bd2412T 14:44, 8 May 2012 (UTC)

Not true. It should only be used (if at all) when referring to the products/services trademarked with that name — not when using the word in its dictionary sense. For example, hello™ must refer to something, say a telephone system; it is not a trademark on the everyday greeting. Equinox◑ 12:18, 9 May 2012 (UTC)

In case someone still thinks it's a good idea, or even acceptable at all, to use these marks, here's some advice:

AMA Manual of Style

Under the US Federal Trademark Dilution Act, restricted use of trademark names applies mainly to commercial use of trademarks, not to editorial use in publication. For example, a photography magazine may not use the word “Kodak®” as part of its cover design and a computer manufacturer may not place the word “Kodak®” on the front of a computer. However, an author or editor may include the word “Kodak”—without the trademark symbol—in an article about cameras and film development without risking trademark infringement.

The symbol ®, or letters TM or SM, should not be used in scientific journal articles or references, but the initial letter of a trademarked word should be capitalized.[1]

Chicago Manual of Style Online

In publications that are not advertising or sales materials, all that is necessary is to use the proper spelling and capitalization of the name of the product. A trademark attorney can tell you when the use of the symbol is required.[2]

Although the symbols ® and ™ (for registered and unregistered trademarks, respectively) often accompany trademark names on product packaging and in promotional material, there is no legal requirement to use these symbols, and they should be omitted wherever possible.[3]

IEEE Computer Society Style Guide

The registered trademark (®) symbol indicates that the trademark is registered in the US Patent and Trademark Office; (™) indicates the trademark is pending. Avoid using trademark symbols in text.[4]

MLA Style Manual

Because the fair and consistent use of these symbols (or of footnotes denoting the trademark owners) requires exhaustive verification and vigilance on the part of the editor and because the use of these symbols (or footnotes) is not required by law, do not add trademark symbols, registered-trademark symbols, or trademark-denoting footnotes to trade names in MLA publications. In the interest of consistency, editors should also delete such references when inserted by authors.[5]

National Geographic Style Manual

The trademark symbols ® or TM are not usually used in editorial text. For use of the marks in other cases, consult our legal office and any licensing agreement that may apply.[6]

This user was unfairly blocked, since he is clearly not a vandal (see contributions), and he cannot even appeal the block. This is not the first time, and the blocker has long been known for abusive blocks.

@81.185.159.128 you're not aware of Wiktionary:Blocks and restrictions/Wonderfool. It's kinda more complicated than that: from what I can tell, he starts off editing well (well is too strong a word, even competently is a little too strong), then deliberately gets himself blocked with patent vandalism, or requests a block on an administrator's talk page and vandalises administrator talk pages until someone agrees. As for why, fuck knows. Mglovesfun (talk) 15:08, 4 May 2012 (UTC)

And he's had so many names that he can't remember them. This time it was "Pixselax". SemperBlotto (talk) 15:10, 4 May 2012 (UTC)

I have heard of it, but I couldn't guess that it was him. And recognize that SemperBlotto has made a pretty high number of abusive blocks and doesn't care a jot about the numerous claims he has received. That's why I reported it. 81.185.159.128 15:16, 4 May 2012 (UTC)

No, I don't recognize that. First of all, it's a matter of opinion what's abusive and what isn't; secondly, my opinion is that SemperBlotto doesn't block abusively. Some of them are blocks I wouldn't make myself, granted, but there's a line between disagreeing and saying that, because one disagrees, it's a form of abuse. Mglovesfun (talk) 15:19, 4 May 2012 (UTC)

Think what you want, but he did receive claims quite a lot of times (just see his talk pages), and see his reactions: "I don't care". This is not in my imagination, and you cannot say all this does not exist. There is a kind of problem, and I'm not the only one thinking he's quite abusive with blocking and so on. 81.185.159.128 15:27, 4 May 2012 (UTC)

I'm not saying none of what you refer to has happened or that nobody agrees with you, just clearly, not enough people agree with you to make it an issue. In fact that only reason we're talking about this is because I keep replying; if I didn't you'd be forced to talk to yourself about it. Mglovesfun (talk) 23:12, 4 May 2012 (UTC)

SemperBlotto is not a problem, but Wonderfool can be. Can any veterans give me some tips on recognizing him? Somehow I still get duped every time, until something really obvious happens (and then he gets blocked). --Μετάknowledgediscuss/deeds 23:55, 4 May 2012 (UTC)

Well, he often asks for tips on recognizing Wonderfool. --Vahag (talk) 01:01, 5 May 2012 (UTC)

LOL (for real, I actually did laugh out loud). I can still remember when I thought "WF" referred to the Wikimedia Foundation, and you can imagine how confusing that got. --Μετάknowledgediscuss/deeds 16:18, 5 May 2012 (UTC)

He usually edits competently in Romance languages; often cites from UK sporting columns (football etc.); and eventually goes mad and deletes the main page or starts adding blatantly ridiculous entries. Hurrah! Equinox◑ 19:24, 5 May 2012 (UTC)

He seems to me to be approaching this as a game, with extra points for how well he can pass as a normal and productive contributor until he gets bored. He then finishes things off by disrupting things, with points for how outrageously and/or creatively he can do so. As long as WF gets to play by his rules, we end up like Charlie Brown to WF's Lucy where we're just trying to kick the ball and he's trying to make us fall on our butts. Chuck Entz (talk) 02:53, 6 May 2012 (UTC)

What to do when the traditional 'lemma form' isn't actually a word?

I've recently been trying to improve some coverage of Zulu but I've come across a problem. Most dictionaries of the language list the stem of words, especially verbs. However, this stem isn't actually an attestable word, and it's not used by itself but always with a prefix added (for example bona "see" has the infinitive ukubona and only the latter is an actual word). Other Bantu languages probably have the same problem, especially those closely related to Zulu (I don't know how it is for Swahili). How should this be solved? Should we break with tradition and use the infinitive as the lemma (which would mean that all verb lemmas will end up beginning in uku-) or should we use the stem as the lemma, even though it's not a word? —CodeCat 21:13, 4 May 2012 (UTC)

FWIW some of the Old French and Middle French infinitives I've created are based on non-infinitive citations. But that doesn't mean there aren't any infinitive citations, just that it's a possibility. Mglovesfun (talk) 23:13, 4 May 2012 (UTC)

It's not a matter of citations though. The issue is that there is actually no such thing simply by the rule of grammar for those languages. I found out that in Zulu the imperative is the same as the stem for many verbs, so we could claim it to be that and call it a solved problem at least for those verbs. But there are some verbs that have a prefix even in the imperative, so that the bare stem is not a proper word. And I also found some Zulu nouns on Wiktionary that were given as bare stems, which I'm quite sure are ungrammatical, incomplete words - it is more or less equivalent to creating an entry bell for Latin bellum, or even ing for English -ing. Nouns and (most) verbs in Bantu languages always have a prefix attached to them, similar to how many ancient Indo-European languages always attach a case ending to a noun. Sometimes the ending happens to be no-ending, which also occurs in some Bantu languages for some noun classes, but not in Zulu. —CodeCat 23:26, 4 May 2012 (UTC)

isiZulu and Dicts.info both appear to use the stem as the citation form. I like the isiZulu better because it has a hyphen in front of it, warning the reader that it cannot stand alone. --BB12 (talk) 04:23, 5 May 2012 (UTC)

The isiZulu dictionary uses a scheme where the 'name' of each entry (the word you look up) has no hyphen, but the 'headword line' has one when appropriate. See http://isizulu.net/?na for example. I like that scheme because it's not always clear when a morpheme can stand alone and when not, and some can do both. So I would like to propose that for Zulu words we use a similar way of organising terms: the hyphenless form is used as the entry name, but we include a hyphen in the headword-line when it can't stand alone, such as in bona. Is that ok? —CodeCat 14:15, 5 May 2012 (UTC)
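A minimal Python sketch of the scheme proposed above, for illustration only: the entry name never carries a hyphen, while the headword line gets one when the form cannot stand alone. The function name `headword_line` and the `stands_alone` flag are hypothetical, not part of any existing Wiktionary template.

```python
def headword_line(entry_name: str, stands_alone: bool) -> str:
    """Return the display form: hyphen-prefixed if the stem cannot stand alone."""
    return entry_name if stands_alone else "-" + entry_name

# Entry names (page titles) mapped to whether the form is a free-standing word.
entries = {"bona": False, "ukubona": True}

lines = {name: headword_line(name, alone) for name, alone in entries.items()}
for name, line in lines.items():
    print(f"entry name: {name!r}, headword line: {line!r}")
```

Under this sketch the stem is looked up as bona but displayed as -bona, while the infinitive ukubona is displayed unchanged.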

I think that will be the best solution, but if there was no precedent, I would have pushed for the imperative for verbs as the lemma form. --Μετάknowledgediscuss/deeds 16:12, 5 May 2012 (UTC)

At first I thought that would be possible, but the imperative isn't always identical to the stem. Single-syllable stems have an extra yi- prefix in the imperative. —CodeCat 00:01, 6 May 2012 (UTC)

Sanskrit nouns and adjectives are cited in the bare-stem form, as is the common lexicographical practice, and these forms generally do not occur "alone" as words, because of the little thing called sandhi. Lemma forms are, as I understand, exempt from the "must be a citeable word" rule, because for some languages it simply doesn't make sense (e.g. polysynthetic, or extensive sandhi at word boundaries). The best course of action IMHO would be to follow the most common approach of the respective language's dictionaries, and not to invent something that follows our petty norms but is used nowhere. --Ivan Štambuk (talk) 19:52, 6 May 2012 (UTC)

Latin verb lemmata

I believe we should change Latin verb lemmata to be the present active infinitive (they are currently the first-person singular present active indicative). For example, the definitions and conjugation table would go at confirmare instead of at confirmo. Why should we do this?

The infinitive gives a lot more information about how to conjugate the verb, whilst the current lemmata give almost no information.

The Romance languages already use the infinitive as the lemma form (like Italian confermare), and this would make matching up cognate verbs easier.

Many etymology sections for English and Romance languages already link to infinitives only, as if they were the lemmata (for example, see fracture#Etymology).

I am not aware of any Latin dictionaries that do this, so there may be a problem they are avoiding that I have not realized. Otherwise, this seems like it would be a good improvement. --Μετάknowledgediscuss/deeds 16:38, 5 May 2012 (UTC)

The infinitive doesn't actually give any more information than the first-person singular. There are three different first-person endings (-o, -eo, -io) and three different infinitive endings (-are, -ere, -ire). —CodeCat 00:04, 6 May 2012 (UTC)

I guess I exaggerated when I said "a lot". However, it does give more information. For one thing, even when seen without macrons, it unifies by conjugation (-are=1st, -ere=2nd and 3rd, -ire=4th). By contrast, 3rd conjugation verbs in our current lemma forms are divided into -o (i.e. rego) and -io (i.e. capio). Also, certain verbs, like fero, seem regular until viewed in the infinitive. It is not massively more informative, but it complements the other benefits listed above. --Μετάknowledgediscuss/deeds 00:32, 6 May 2012 (UTC)
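The grouping by unmacroned infinitive ending described above can be sketched in a few lines of Python. This is a naive illustration, not a real conjugator: the function name is made up, and verbs that merely happen to end in one of these strings are not handled specially.

```python
def conjugation_from_infinitive(infinitive: str) -> str:
    """Guess the conjugation from an unmacroned present active infinitive."""
    if infinitive.endswith("are"):
        return "1st"
    if infinitive.endswith("ere"):
        return "2nd or 3rd"  # ambiguous without macrons
    if infinitive.endswith("ire"):
        return "4th"
    return "irregular or unrecognized"

# confirmare, regere (rego), capere (capio) and ferre (fero)
examples = ["confirmare", "regere", "capere", "ferre"]
results = {verb: conjugation_from_infinitive(verb) for verb in examples}
for verb, guess in results.items():
    print(verb, "->", guess)
```

Note how regere and capere land in the same group even though their first-person forms end differently, and how ferre falls out as irregular.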

Latin does have a very strong precedent of using the PAI1S as the lemma. I've never seen a Latin dictionary (or any Classically inclined dictionary) use any other form. However, I'm not sure if it poses any real benefits, and it certainly does make the relationships between it and its daughter languages (which use the infinitive) somewhat complicated to explain (I have to admit I don't find the other arguments in favor of the infinitive terribly convincing). We should definitely wait and see what EncycloPetey has to say on this, as he was the primary proponent of the current system, as well as our most consistent Latin contributor. -Atelaesλάλει ἐμοί 01:02, 6 May 2012 (UTC)

I don't know why the 1st-person singular is chosen, but I would suspect that the finite vs. infinitive decision probably has something to do with frequency of use: one would want the lemma to resemble forms commonly encountered (though the classical languages do seem to have quite a fondness for participles and infinitives). Chuck Entz (talk) 02:29, 6 May 2012 (UTC)

Just for reference, I looked up habito in the Collins Gem Latin Dictionary (2004) and it does indeed list the entry under habito not habitare. Mglovesfun (talk) 10:47, 6 May 2012 (UTC)

Classical languages like Ancient Greek and Latin use the 1st-person singular present active indicative as the lemma form. This is the lemma form that has been used in Latin dictionaries since the Renaissance, when major Latin dictionaries were first published. It is the "first" form of the four principal forms of the Latin verbs taught in school, and we list those four forms in the entry in the same sequence in which students are asked to memorize them. Latin textbooks use this as the lemma form, so Latin students and people educated in Latin will look for the 1st principal part as the lemma. Some dictionaries omit the infinitive altogether, and replace it with a number identifying the present active infinitive form.

So, the only argument I see made for the proposed change is that many etymology sections of Romance verbs link (incorrectly) to the Latin present active infinitive (note that it is not the infinitive, because Latin verbs have more than one infinitive form). I would say the obvious answer is to just correct the etymologies. Nothing is to be gained by breaking with hundreds of years of Latin lexicographical tradition, but much would be lost. --EncycloPetey (talk) 18:42, 6 May 2012 (UTC)

I have no particular preference for Latin lemma forms -- but I would dispute the "incorrectness" of linking to the infinitive from Romance etymology sections, since it's precisely from the Latin present active infinitives that Romance lemmata derive. Ƿidsiþ 19:32, 6 May 2012 (UTC)

Perhaps, but paradigmatic leveling can muddy the waters quite a bit. It wouldn't surprise me to find many instances where the stem comes from some other form and the infinitive is really just a back-formation from that other form. I believe that's the way it is with the nouns, where everything is usually based on back-formation from the accusative plural. Chuck Entz (talk) 21:33, 6 May 2012 (UTC)

Well, explaining that is exactly what Etymology sections are there for. The fact that many of them currently are not that specific is less by design and more a product of the vagueness of some editors' sources. Ƿidsiþ 21:45, 6 May 2012 (UTC)

I don't see what would be lost by switching, but the gain is not great enough and there is scant support, so I will not pursue this. --Μετάknowledgediscuss/deeds 21:41, 6 May 2012 (UTC)

However, it should be remembered that, in most of these cases, the etymology is not talking about the infinitive form specifically, it's talking about the word, with all of its forms, viewed collectively. I wish we had better language for explicating that. -Atelaesλάλει ἐμοί 21:29, 6 May 2012 (UTC)

Its paradigm, all of its definitions, its etymology, the whole bit. -Atelaesλάλει ἐμοί 22:14, 6 May 2012 (UTC)

I also support tradition for Latin (and Ancient Greek of course) entries. On the other hand, if we want to keep the present infinitive form in etymologies, we could use there something like [[habito|habitare]] or find another way to mention both forms.

Another related issue is interwikis. Some wiktionaries use the infinitive as lemma form, some others prefer the PAI1S. It would be nice if we could find a way to communicate with editors of Latin entries in sister projects and seek a common stance or at least an agreement to create redirects from infinitives to PAI1S (and vice-versa). --flyax (talk) 22:25, 6 May 2012 (UTC)

head, sg, current, pos

Is there consensus on when to use head, sg, current and pos in headword-line templates? I much prefer head as I think it's the most widely used, and the most widely supported by our templates. Most notably, {{head}} supports head but NOT sg, current and pos. User:Yair_rand/usenec (which I love) also only uses head. sg is very common, while current and pos are much rarer. pos seems to be used in some English templates, but as far as I can tell, not for any other language.

Obviously we're not talking about massively important issues here. On my user talk, CodeCat offered to replace sg, current and pos with head. I'd like that, I think using only one of these is better for usability. This doesn't mean that sg, pos and current should be banned, just that if a bot can make some minor edits to have more consistency within our entries, I'm all for it. Mglovesfun (talk) 11:15, 6 May 2012 (UTC)

There are some templates where we might need to keep sg, since it's not always the form found as the headword for the entry where it's used. I'd expect pos to identify the part of speech, not provide a particular form of the word. I've no idea why anyone is using current. --EncycloPetey (talk) 18:45, 6 May 2012 (UTC)

I support forcing all headline templates to use only head for such an alternate display, but I'm willing to settle for forcing all headline templates to at least accept head. EP, can you offer us an example on when sg might be needed? -Atelaesλάλει ἐμοί 10:32, 7 May 2012 (UTC)

For clarity, I'm not saying sg should not be used in this sort of situation, where head and sg are not equivalent. Having said that, this is the only template I'm aware of that doesn't use them equivalently. Mglovesfun (talk) 11:23, 7 May 2012 (UTC)

As EP says, pos= is strange and should go. It was originally used in {{en-adj}} and {{en-adv}}, was short for "positive" (as opposed to "comparative" and "superlative"), and was clearly analogous to the use of sg= in {{en-noun}} and {{en-proper noun}}, but that meaning was apparently forgotten over time — the relevant sense of "positive" is not so common as that of "singular", and our use of "POS" to mean "part of speech" offered a conflicting (if contextually bizarre) interpretation — so pos= came to be used in other English headword templates where it is not short for "positive".

As for the others — I think it's safe to deprecate current= in favor of head=. Dunno about sg= (in templates where it means head=).
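For what a bot pass deprecating current= might look like, here is a minimal Python sketch. It is an assumption-laden illustration, not an actual bot: `rename_param` is a made-up helper, the template name `xx-noun` in the sample is hypothetical, and the regex blindly rewrites every `|current=` it sees, so a real run would need to restrict itself to the relevant headword templates. It should not be applied to sg= wherever sg= and head= are not equivalent.

```python
import re

def rename_param(wikitext: str, old: str, new: str = "head") -> str:
    """Rename a named template parameter, e.g. |current= to |head=."""
    pattern = re.compile(r"\|\s*" + re.escape(old) + r"\s*=")
    return pattern.sub("|" + new + "=", wikitext)

# Hypothetical headword-line call using the deprecated parameter name.
sample = "{{xx-noun|current=[[foo]] [[bar]]|s}}"
converted = rename_param(sample, "current")
print(converted)
```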

Normalized spellings of Middle English

There was a discussion about this topic a month ago, but I want to revisit it with a different case. I own a print copy of this wonderful little book, a poem about King Arthur in Middle English (see Template:R:Furnivall 1864). Furnivall was kind enough to add in extra letters (in italics) so that forms in here match what he considered to be standard Middle English. I am intending to add a lot of words from this corpus, but I was wondering whether they should be added under normalized spellings (as if the italics had been there in the first place) or exactly as written in the manuscript. --Μετάknowledgediscuss/deeds 22:35, 6 May 2012 (UTC)

It looks like he's not just adding letters, he's probably also expanding abbreviations. He writes "honour" and "presence", but I suspect the manuscript doesn't have simply "hono" and "psence"; probably there's some little diacritic mark indicating an abbreviation. He also writes "þat", but I wonder if the MS really has "þt" or if it has ꝥ. At any rate, those abbreviated spellings are quite different from things like "pendragone" and "walysche", where the MS spelling probably really represents the author's pronunciation. I think [[pendragon]] and [[walysch]] can be entered, and perhaps called alternative spellings of [[pendragone]] and [[walysche]] depending on what the most common spellings are, but I wouldn't say that Middle English actually has words spelled [[hono]], [[psence]], and [[þt]]. —Angr 06:10, 7 May 2012 (UTC)

Why would they be different from pendragone? It was entirely normal in Middle English and Early Modern English to abbreviate a trailing e with a tilde/macron/mark above the preceding character; pendragone was almost certainly written pendragoñ in the text.--Prosfilaes (talk) 06:40, 7 May 2012 (UTC)

In that case they aren't different and these are all expanded abbreviations rather than normalized spellings. —Angr 06:51, 7 May 2012 (UTC)

I agree with Angr. While it would be lovely to see exactly what's on the manuscript, Furnivall wasn't modernising any spellings, just expanding the usual scribal contractions. So yes, the expanded forms should indeed be the headwords and not the contracted forms. Ƿidsiþ 06:15, 7 May 2012 (UTC)

It's also available from Project Gutenberg, and it says in the introduction "The expansions of the contractions are printed in italics, but the ordinary doubt whether the final lined n or u—for they are often undistinguishable—is to be printed ne, nne, or un, exists here too."--Prosfilaes (talk) 07:22, 7 May 2012 (UTC)

I note that [[psence]] is a bluelink. Should we keep it (and more generally, all such abbreviations, Artho, etc), changing the definition to something like "{{abbreviation of|presence}}", or delete it? - -sche(discuss) 07:26, 7 May 2012 (UTC)

So: psence and pals are all my fault, and they're from a couple/few months ago. I think (personally) we ought to RFD them as a group (all my enm contractions) as if we reach consensus I will manually delete them and re-enter forms with italics as part of the word. Alternatively, we can make them all into abbreviations of x. Which option do we prefer, deletion or soft redirection? --Μετάknowledgediscuss/deeds 23:46, 7 May 2012 (UTC)

Use of ɻ in American pronunciations?

I've had a look at the discussion we had back in January about using /ɹ/ to transcribe RP "r", but are we also now using /ɻ/ for GenAm "r"? Generally British and American pronunciations of words containing "r" differ only in their rhoticity, so should we be distinguishing /ɹ/ and /ɻ/ where "r" is pronounced in RP? It seems to me that if we are using the former we should also be using the latter for consistency, because if "r" is not represented by /r/ in RP, it is not /r/ in GenAm either. — Paul G (talk) 09:41, 7 May 2012 (UTC)

I thought there had been a decision to use /ɹ/ for both, although I'm not sure now where I got that idea from. Ƿidsiþ 10:21, 7 May 2012 (UTC)

What was that vote supposed to achieve? The current broad transcription for English seems to be cut neither towards easy access nor trueness to the IPA. (rat for example is neither /rat/ nor /ɹæt/) Korn (talk) 15:52, 7 May 2012 (UTC)

On the contrary, rat is both /rat/ and /ɹæt/, as well as /ɹat/ and /ræt/. It may not necessarily be [ɹæt] every single time it's uttered, but it's not a dictionary's job to give a narrow phonetic transcription of every possible pronunciation of a word in every imaginable context. —Angr 21:17, 7 May 2012 (UTC)

What? It is neither nor nor nor. I was referring to the IPA in the rat-entry, which has "/ræt/" only. While the /r/ in the entry seems to concede to ease of access on the keyboard - where red uses /ɹ/ instead, according to the vote - the /ae/ seems to aim towards proper pronunciation, although there is no contrasting /a/ phoneme. Korn (talk) 22:17, 7 May 2012 (UTC)

ps.: Am I mistaken that the situation is this: Phonemically, RP and GenAm are completely identical and what we give as RP and GenAm are actually phonetic transcriptions of two former standards of some sort that are no longer in widespread use by native speakers? Korn (talk) 22:28, 7 May 2012 (UTC)

I was talking about the phonemic representation of the word rat, not what's currently present at our entry [[rat]]. Ideally we should be following Appendix:English pronunciation, which would give /ɹæt/ rather than /ræt/ as is currently used in the entry. Phonemically, RP and GenAm are identical in the word rat and in a lot of other words, but in many words they're distinct, and what we indicate is the current pronunciations, not former ones. The specific IPA symbols we use to indicate sounds are the ones that have the weight of many decades of tradition behind them, but that doesn't make our transcriptions outdated. —Angr 22:48, 7 May 2012 (UTC)

Wiktionary:Votes/2008-01/IPA for English r got forgotten for a long time, and to a large extent was never implemented in the first place, so there are thousands of entries which violate this vote. I change them from time to time, but really, it is a bot job given the massive numbers involved. Mglovesfun (talk) 09:14, 9 May 2012 (UTC)

What is Sum-of-Parts?

This topic does come up occasionally but there has never really been a conclusive answer that I can tell. Our main mission as a dictionary is to include all words in all languages. And our current practice seems to be to treat any word as idiomatic. However, there are many languages where a single word may be SoP. In German and Dutch for example, nouns may be combined into compounds which have pretty transparent meanings. See for example WT:RFD#Plastikschwanz. I also recently came across some features of Zulu grammar; in Zulu, not just the subject but also the object is included in verb conjugation, and subject and object are both conjugated for noun class (of which there are about a dozen) so that leads to 150 forms for the present tense alone! In some languages, particularly those in America, entire sentences may be constructed out of one word. So, for those languages, 'all words' could well mean 'all sentences', and I don't think that is what our mission intends. So what exactly is SoP? Which attestable words should not be included? —CodeCat 12:19, 9 May 2012 (UTC)

My gut feeling, at least for languages like Dutch and German, is that something is sum of parts if it can be broken down into elements that all stand on their own, and whose meanings obviously combine to produce the compound (i.e. that it consists of adjectives and attributive nouns modifying a base noun). In other words, words made by applying suffixes and prefixes - even ones with perfectly systematic meanings - to idiomatic words should be included as long as they're attestable. For instance zerbrechen ("shatter") is easy to work out from zer- ("into pieces") and brechen ("break"), but zerbrechen seems like a perfectly cromulent entry to me. Of course, this approach would require some fairly indefensible hypocrisy on our part - Kopfschmerz is just "Kopf" + "Schmerz", but then headache is just "head" and "ache" - which is why I'd also suggest that, as a kind of COALMINE-esque hack, if a foreign SOP word is defined as an idiomatic English word/phrase, then that's an automatic keep. Smurrayinchester (talk) 13:21, 9 May 2012 (UTC)

(I'd also say that SOP words with unusual grammar - such as German separable verbs - should be kept.) Smurrayinchester (talk) 13:38, 9 May 2012 (UTC)

To what extent should this matter be decided on a language-by-language basis by those qualified to opine on the linguistic and lexicographic merits, possibly in conjunction with the Wiktionary for the language involved? Those languages that have a significant number of qualified contributors weighing in on the matter may provide a useful model for other languages. The community as a whole can suggest what matters should be taken into account and possibly criteria, but I doubt that any but the broadest guidelines are appropriate.

I also note that a policy of "atoms before molecules" seems like a good idea for all languages, without prejudicing the eventual inclusion of at least some molecules. DCDuringTALK 14:36, 9 May 2012 (UTC)

"What is Sum-of-Parts?" It's a silly policy that openly contradicts NOTPAPER, artificially and arbitrarily restricts the number of entries, and needs to be abolished ASAP Purplebackpack89(Notes Taken)(Locker) 14:12, 9 May 2012 (UTC)

Not really. Everyone agrees that bright sunny day doesn't belong, and everyone agrees that machinegun does belong, but in between there is a large grey area. To be honest whatever rules we have it will always be in some way subjective, that is the point of the relevant discussion pages. Ƿidsiþ 14:26, 9 May 2012 (UTC)

Rubbish. It is anything but arbitrary. Equinox◑ 22:20, 9 May 2012 (UTC)

A problem with words such as the German one listed is that, if they contain more than two syllables, they can conceivably be broken down in multiple ways. Thus "nonagonist" could conceivably be either a "non-agonist" (someone who isn't an agonist), or "nonagon-ist" (someone with a special fondness for nine-sided figures). Thus we would need an entry for the term, so as to let people know which it is. SemperBlotto (talk) 14:45, 9 May 2012 (UTC)

I don't see that argument; even if some mathematicians start talking about nonagon-ists, that's not going to stop biologists from talking about non-agonists. It's analogous to things we accept as SOP; a red dog could be a canine that reflects light in the 670nm range, or it could be an ugly communist girl. We don't tell people that a "red dog" is virtually always the first, nor does that stop people from meaning the second.--Prosfilaes (talk) 19:46, 9 May 2012 (UTC)

The question regarding Dutch and German terms should be considered from the position of this being an English-language dictionary. I don't speak German. If I were to turn to the dictionary to translate a German passage, I wouldn't know where to split words in order to look up the component parts. If a "word", in the sense of a continuous set of characters uninterrupted by a space or by word-ending punctuation, is attestable, then we should include it. As for the usual English SOP situation, yes "red dog" typically means a canine of that color, and other uses are really just alternative senses of "red" coupled with alternative senses of "dog". However, I would suggest that where the most common meaning of a combination departs from the most common meaning of the individual terms in the combination, then that combination should be included. bd2412T 20:05, 9 May 2012 (UTC)

But then does that mean that we include every single thing that can be plastic? Everything that can be Ersatz? (Seriously, with 5 minutes of Googling I can attest Ersatzmauer, Ersatztorwart, Ersatzkraftwerk, Ersatzgroßvater, Ersatzsauerstoff and Ersatzhandschuh, and I expect there are literally thousands more of these - Ersatzhimmel, Ersatzbrücke, Ersatzherz, Ersatzsuppe, Ersatzzündkerze...) It would be impossible to have entries for everything that could be created this way - and the systematic way this lets people build nonce words (I couldn't find any use of Ersatzeiswagen (replacement ice cream van), but AFAIK there's nothing to stop someone using this compound if the need arises) means it's unlikely we ever could collect all the possible German compounds (the situation is even worse for something like Nuu-chah-nulth, where a "word" conveys the same amount of meaning as an English clause - we'd effectively have to create The Library of Babel to categorise that one.) Our search function currently automatically finds words that begin with the letters that you're typing in - start typing "Ersatzeiswagen" and "Ersatz" pops up. While I agree it's not perfect, it's a start to finding word boundaries. I think the only proper way to deal with these sorts of compounding languages would be an overhaul of the search function (perhaps allowing searches to be restricted by language, for instance), though I'll admit that is very unlikely to happen. Smurrayinchester (talk) 21:14, 9 May 2012 (UTC)

I think this problem is obviated by our requirement that all forms be attested three times over at least a year (for living languages). If you can't find three cites for Ersatzeiswagen, we can't include it. The same goes for the Nuu-chah-nulth word for "My hovercraft is full of eels": if no one has ever used it in print (or durably archived on the Internet), it won't be added here. —Angr 22:13, 9 May 2012 (UTC)

What about non-living languages? We could end up categorizing every sentence attestable in some languages that way.--Prosfilaes (talk) 10:09, 10 May 2012 (UTC)

Well, isn't that a good thing? That's certainly what I imagine "every word in every language" to entail. —Angr 21:08, 10 May 2012 (UTC)

So, if English were written without spaces, would you expect Wiktionnaire to include every sentence from every well-known English work? —RuakhTALK 21:46, 10 May 2012 (UTC)

If English were exactly like English, but written without spaces, no, because spaces are not what define what words are. Language is independent of writing. —Angr 21:50, 10 May 2012 (UTC)

If Nuu-chah-nulth had taken over the world instead of English, would you really be encouraging us to have entries on every single sentence?--Prosfilaes (talk) 00:51, 11 May 2012 (UTC)

I'd be encouraging us to have entries on every triply attested Nuu-chah-nulth word, yes. I don't actually know Nuu-chah-nulth, but I know roughly how polysynthetic languages work, and it's an exaggeration to say that every sentence is a word. Of course, there are sentences (containing a finite verb) that consist of a single word, as indeed there are in Latin (e.g. Flevit "He wept"), but most sentences are multiple words. I strongly suspect that while "It's full of eels" could potentially be a single word in Nuu-chah-nulth, "My hovercraft is full of eels" is probably at least two words long ("my-hovercraft" and "is-full-of-eels"), while "My hovercraft, which I had just picked up from the garage, was full of eels, so I took them home to my wife, who made a delicious eel pie out of them" is many words. —Angr 19:54, 11 May 2012 (UTC)

"Every word a sentence" isn't necessary to make creating entries for every "word" in polysynthetic languages unwieldy. Do we really want separate entries for "I gave it to him", "I gave those two things to him", "I gave those two things here to him", "I gave those two things there (nearby) to him", "I gave those two things there (far away) to him", "I gave it to her", "I gave it to you", "I gave it to them", etc., ad (almost) infinitum? Those all exist, though perhaps not all in the same language- and there are many, many more, often based on what would be expressed in English by separate adverbs, prepositions, articles, etc. Even more familiar languages, such as Hebrew, have similar problems: Hebrew has very common prefix versions of many prepositions, "and" and "the", and suffix versions of personal pronouns. To implement this, we would need an entry starting with ה־‎ for every Hebrew word that can take a definite article- and most of those would be attestable, since it's a basic part of the grammar. Chuck Entz (talk) 21:07, 11 May 2012 (UTC)

Well, we have already decided not to have entries for English nouns with 's added, so perhaps a similar decision can be (or has already been) made for Hebrew nouns preceded by ה־‎ (or for that matter ב־‎ or ל־‎). It would have to be decided on a case by case basis whether a certain language's clitics are to be treated like 's or not, but in principle I see nothing wrong with having separate entries for all of the things you listed above. Really they're no different from the English word dogs, which is also SOP as dog + -s, and yet we keep it. We're not going to run out of space, and there is no deadline. —Angr 21:31, 11 May 2012 (UTC)

Actually, we do allow such words if (like butcher's) they are the names of types of shops. SemperBlotto (talk) 21:36, 11 May 2012 (UTC)

┌────────────────────────────────────────────────────────────────────────────────────────────────────┘
I notice two different arguments happening here. One seems to say "we cannot be expected to include everything → it's too much work", and the other seems to focus on "if a given term meets CFI and if there is a call for having it here, let's include it."

These strike me as orthogonal arguments.

If a term meets CFI and if there are grounds for including it here, I say, fine. Let someone interested put in the work. I don't think that "every word in every language" means that those of us here are under any duty to put that work in ourselves; we are all volunteers, after all. However, I *do* think that "every word in every language" means that, provided a term passes CFI, we should not be opposed to someone adding the term.

To sum up:

Those opposing a broader stance on SOP appear to be opposed to any imposed duty -- to quote Chuck just above, "[this will] make creating entries for every "word" in polysynthetic languages unwieldy" suggests the need to do all the building out ourselves.

Point 1: I don't think that's necessary. Let other interested editors put in that work.

Those opposing a narrower stance on SOP appear to be opposed to potential usability issues from the necessarily higher knowledge requirements for users -- to quote BD2412 further above, "If I were to turn to the dictionary to translate a German passage, I wouldn't know where to split words in order to look up the component parts," suggesting a higher barrier to entry for users of EN WT, as a user must know that a given term is SOP and know how to break it down into constituent parts before they could find anything useful.

Point 2: We (we = editors) might need to revisit the issue of who our intended audience is, as this would help clarify whether higher barriers to entry are acceptable. -- Cheers, Eiríkr Útlendi │ Tala við mig 21:24, 11 May 2012 (UTC)

Yes, this is the "slippery slope" argument. Just because we allow a certain class of word does not mean that anyone is under any pressure to add them all. I think that was agreed years ago. SemperBlotto (talk) 21:36, 11 May 2012 (UTC) (That might have been added at the wrong indent - this section is getting impossible to edit!)

But the "slippery slope" argument is the foundation and raison d'etre for SOP, so it's relevant. Chuck Entz (talk) 22:51, 11 May 2012 (UTC)

You certainly haven't summed up my view. I think that the dictionary is actively harmed by including entries for non-lexical expressions. I'm not worried that anyone will expect me to add entries for all sentences of a polysynthetic language, if only because I don't speak any such language; rather, I'm worried that someone will themself add such entries. Anyone who's ever voted "delete" at WT:RFD on the grounds that something isn't an idiom should recognize that they've already taken the stance that non-idiomatic expressions are harmful to the dictionary. They should either recant that stance, or else recognize that it also applies to things in other languages that an English-speaker might mistake for "words". (Similarly, when I object to the addition of encyclopedic information, it's not because I'm worried that anyone will force me to add such information.) —RuakhTALK 22:10, 11 May 2012 (UTC)

Not necessarily. People may have voted delete at RFD on the grounds that something isn't an idiom merely because our policy is to exclude non-idiomatic phrases, not because they actually believe non-idiomatic phrases are harmful. I'm not worried about someone adding all (triply attested) verb phrases of a polysynthetic language, indeed I would welcome it. But I am worried about us deleting forms like birds and walked because they are also transparent SOPs. —Angr 22:28, 11 May 2012 (UTC)

To Smurrayinchester, I would say yes. The attestation rule that Angr refers to is meant to limit our offerings to words that someone might come across and wish to have defined. So what if that means that thousands of compounds might be added? The rule doesn't go on to command you to find and add these compounds. bd2412T 02:46, 10 May 2012 (UTC)

Bd2412 and Angr have hit the nail on the head IMO.​—msh210℠ (talk) 07:15, 10 May 2012 (UTC)

@SemperBlotto There are languages that don't separate words at all - Japanese, I'm looking at you - and while I can certainly attest and cite, say, "何時でしょうか" ("What time is it?") for the benefit of people who don't know where the words begin and end, it doesn't seem like defining every sentence ever used in a Japanese book is within the scope of Wiktionary. Without knowing at least a little of the grammar of a language, a dictionary is never going to be much use. Smurrayinchester (talk) 09:28, 10 May 2012 (UTC)

I don't think "word" is defined as "a string of letters separated by spaces in writing". Surely there's an adequate definition of "word" for languages like Japanese that are written without spaces. —Angr 21:08, 10 May 2012 (UTC)

Japanese certainly has its own definition of a word (the rōmaji method of writing Japanese even puts spaces in to differentiate words). My point was that although an English speaker would not necessarily be able to recognise Japanese word boundaries in the usual Japanese alphabets, it's not practical or desirable to build Wiktionary around every possible combination of "superwords" (anything that an inexperienced user of the language might think was one word) - one of the main arguments given in this debate seems to be that because a non-German speaker who didn't know the words Ersatz or Torwart wouldn't know whether a Ersatztorwart was an Ersatz Torwart, an Ers Atztorwart, an Ersatzt Orwart or an Ers Azt Or Wart, we should include Ersatztorwart to help them find the pieces of the compound. Japanese is an (admittedly extreme) example of why this might not be practical or desirable. Smurrayinchester (talk) 22:09, 10 May 2012 (UTC)

I think we'll all be much happier once we realize that a bilingual dictionary can never, by itself, be a sufficient tool to enable translation between a language that you know and a language that you don't. (If it were, then machine translation would have been a solved problem by now.) A dictionary is a repository of lexical information, and translation requires more than that. End of story. This doesn't mean that all compounds should be deleted — many compounds really should be thought of as lexical items (albeit morphologically transparent ones) — but it does mean that not all of them should be kept. —RuakhTALK 21:44, 9 May 2012 (UTC)

This is a good point. It doesn't seem fair to our users to offer compound translation if that will only ever have spotty coverage of the combinatorial explosion of possible compounds. That said, I'm increasingly unsure about where the line between lexical objects and obvious compounds lies. Ersatzreifen (spare tyre) seems like a word we should have, Ersatzzündkerze (replacement spark plug, totally attestable) doesn't, but I'm struggling to come up with a concrete reason why I think this. Smurrayinchester (talk) 09:28, 10 May 2012 (UTC)

@Purplebackpack89 SoP isn't a policy but a slang term used by mostly experienced editors. It doesn't contradict WT:PAPER, chiefly because it doesn't even exist. And referring to the Wikipedia policy, last I checked Wikipedia also says that it is not a miscellaneous compilation of information. As Equinox pointed out, we have enough space for lots of pictures of kittens, but that doesn't mean we should include them just because it's practically possible to do so. Mglovesfun (talk) 22:29, 9 May 2012 (UTC)

I am convinced that we should not be excessively restrictive about the inclusion of SOP terms. After all, we are not paper, so it is no major goal to keep the database small. On the other hand, in analogy to Wikipedia, a mass deletion of valid articles is likely to be misunderstood as a sign of an arrogant and square censor-mentality of the community. Moreover I think the discussion on RFD about the presumed SOPness of particular terms leads nowhere and is a complete waste of energy. Such RFD votes could be avoided or at least reduced if we come to a consensus about some set of rules à la WT:COALMINE, which qualify SOP terms for inclusion. For example a generalization of WT:COALMINE could be to allow: (i) SOP-terms which have less common non-SOP synonyms (ii) SOP-terms which have non-SOP translations. Additionally some rules, which qualify a SOP term as translation target could be established, e.g. if translations cannot be easily derived from the English parts or if the term is covered by a number of Wikipedia articles in different languages. What do you think? Matthias Buchmeier (talk) 08:35, 10 May 2012 (UTC)

I'd like to know why we would want to restrict the number of terms entered in the first place. After all, we can include every term and sentence of every language. And while I would not feel too well about it, I cannot come up with a convincing argument why we shouldn't. We have the phrasebooks which make a first step into sentence-permission. Korn (talk) 11:34, 10 May 2012 (UTC)

One convincing argument is from practicality. While we don't have a strictly limited space to add entries in, we do have a limited amount of eyes to watch over, fix, clean up and improve all those entries. The more we have compared to the amount of editors, the less attention each one will receive. Furthermore, if we include too many phrases, it would be harder to find individual words unless the search is improved to find words first and phrases only if there is no word. —CodeCat 11:44, 10 May 2012 (UTC)

Same reason that a maths textbook doesn't contain every possible maths problem and answer (1+1, 1+2, 1+3, ... 2+2, 2+3, ... 999+1, ...). It's absurd. There is an infinite number of them, and they can be formed using rules. Sentences are formed using rules of language. We are a dictionary, not a grammar book, and even a grammar book only gives the rules, not every possible application of those rules. Equinox◑ 11:58, 10 May 2012 (UTC)

I do not see why it is absurd. If we do not want entries which can be compiled by grammatic rules, we have to delete every compound which has no completely new meaning. And while that sounds like current SOP, it would include every form of coal/mine, headache, Rathaus etc. And we'd certainly have to delete the phrasebook, which is just that. Regarding CodeCat: While your reasons certainly are reasonable, since every entry which then would have to be cleaned up by hand is now added to RFD and discussed, it doesn't seem like such a big step regarding amount-of-work.

Several users posted into my comment at this position. I (Korn) moved them below my comment.

That said: I'd vote that we take SOP literally, use only semantics to define 'parts' and exclude prefixes. Non-English SOP-entries should be kept if they are a translation for an English non-SOP term and thus necessary to have a translation for that, but not the other way round. I agree that, while an Anglophone might not know how to break up 'Plastikschwanz', the search will always give 'Plastik' as a first result; which is not the best solution but it is one. As said, we are only partially here to teach Grammar (We do have inflection tables.) and if one does not want to include every German word ever used three times, I don't think we can go another way. Since, however, prefixes are sometimes very abstract, it's always a game of chance to make out the meaning by its parts. Korn (talk) 13:01, 10 May 2012 (UTC)

Adding an example: The city hall is the place where the government sits, which cannot be deduced from city+hall. Hence Rathaus (government-house), which is SOP, should be kept. Stadthalle (city hall) would have to be deleted since it is SOP: A hall in the city. And yes, that would also mean deleting headache. Korn (talk) 13:06, 10 May 2012 (UTC)

Following are the comments removed above:

Not true, as someone might misinterpret coalmine as co- + *almine rather than coal + mine. Where the compound has no helpful space or hyphen, there is this reason to have it, to show where the break in the words is. Equinox◑ 13:16, 10 May 2012 (UTC)

Well, that's the same problem as in German and Japanese. Then we'd have to include every word and sentence citable. Korn (talk) 16:41, 10 May 2012 (UTC)

I don't think it makes sense to conflate languages written with an alphabet (e.g. English, German, and Russian) with the very different schemes of languages like Japanese and Chinese. Words in the English language are written in the Latin alphabet, so English speakers will tend to read other languages written in the same or similar alphabets as having the same rules defining what constitutes a distinctive word. An English speaker is much less likely to look at a lengthy string of Japanese characters and conclude that it is a "word". bd2412T 17:22, 10 May 2012 (UTC)

But the problem is the same: The English speaker looks at a string of characters, word or sentence or whatnot, and does not know what comprises a separable term which one could look up in a dictionary. The basic decision here is whether we want users to already know enough about the language to tell things apart or whether we want to do that work for him. Korn (talk) 17:30, 10 May 2012 (UTC)

I think the real distinction of languages written with the Latin alphabet and some variations of it is that they have things that look like words - pronounceable (more or less) strings of letters with consonants and vowels separated at reasonable intervals by spaces and punctuation. An English speaker will look at a sentence like "Die Diskussion läuft etwa zwei bis vier Wochen, danach kann ein Administrator unter Berücksichtigung der in der Diskussion erbrachten Argumente eine Entscheidung treffen" and perceive a group of individual words, whereas a sentence like "中文版维基词典现在有管理员执行删除操作，所以请把所有有待删除的页面標示" (despite the punctuation mark and space in the middle) will not yield such a perception. bd2412T 15:26, 11 May 2012 (UTC)

And how does this lead you to the conclusion that German compounds and (I guess) Chinese compounds should be treated differently? Because I see my former point still standing: They are the same in that one looks at an uninterrupted glyph-line without knowing where a single lexical term ends. Korn (talk) 18:33, 11 May 2012 (UTC)

The difference lies in the nature of the characters. Very specifically, an English speaker would see the German sentence containing words composed of characters in the Latin alphabet, and having the sort of syllabic construction familiar to English speakers; words like "erbrachten" and "Berücksichtigung" look like individual words for which things like emphasis and pronunciation can be puzzled out. There is no such familiarity in "所以请把所有有待删除的页面標示" from which to draw out pronunciation, identify prefixes and suffixes, or the like. To someone unfamiliar with this character set, each character might just as well be an individual word. This is particularly exacerbated by the absence of spaces, which occur only in conjunction with punctuation, and not organically between collections of characters. bd2412T 19:11, 11 May 2012 (UTC)

Perhaps we should have more restrictive attestation requirements for German phrases that an English speaker, with absolutely no knowledge of the language, would assume are individual words? For example, maybe a cite should only "count" if it uses the phrase within the first ten words of a paragraph? After all, no such speaker will get more than ten words in without starting to realize that maybe they're not taking the right approach. —RuakhTALK 19:31, 11 May 2012 (UTC)

End of the comments removed from above.

Another practical thought: how would you define bright sunny day in a way simpler than bright, sunny and day do? Defining unidiomatic utterances would be very difficult indeed, much harder than simply having the user look up the words they don't understand. Mglovesfun (talk) 13:19, 10 May 2012 (UTC)

It's not just "what is SOP?", but, more basically, "what is a word?". I remember an extreme example given by my phonology professor: it consisted of nothing but (lots of) consonants, and the translation was "I just saw those two women come this way out of the water". Dictionaries are great for languages where parts of speech reside conveniently in separate words, but with polysynthetic languages there are affixes representing subject, direct and indirect objects, adverbs, etc. To make matters worse, phonological interactions make it hard for non-fluent speakers to figure out what the parts are. I've seen dictionaries where all the entries for pages and pages share the same subject and object pronouns because those are prefixes and thus determine where the "word" goes in alphabetic order. On the other extreme you have German separable prefixes that are an integral part of the verb, yet can have all kinds of verbiage in between them and the main verb. Chuck Entz (talk) 13:31, 10 May 2012 (UTC)

@Chuck Entz yes. I've always argued the same about Spanish contractions too, such as verle (see him). They aren't words, they are two words written with no space in between. But, to someone not competent in Spanish, they appear to be words, so they may want to look them up. Mglovesfun (talk) 11:24, 11 May 2012 (UTC)

At least, everything considered as a word by the language should be includable, including long compound German words (but only those actually used, of course, not all words that could possibly be built) and contractions such as the French word du or the Portuguese word no. And, more generally, all elements belonging to the vocabulary of the language (e.g. Atlantic salmon, because it belongs to the vocabulary despite its SOP character). I also agree that other cases (such as verle) might be includable when their inclusion is considered as really useful after discussion. Each kind of additional case should be discussed independently. Lmaltier (talk) 20:17, 11 May 2012 (UTC)

It looks to me as if we need a better definition of word, where the definition depends upon the language class. For typical European languages "a string of characters bounded by a space or punctuation" looks pretty good to me. I have no knowledge of other language types so can't contribute there. SemperBlotto (talk) 21:21, 11 May 2012 (UTC)

Remember, all numbers from 1 to 999,999 are written together in German. I could easily set up a bot to add 900,000 new entries on German numbers to Wiktionary. They're all "words" according to your definition, but this can't be what you want. -- Liliana• 21:40, 11 May 2012 (UTC)

See "slippery slope" elsewhere in this discussion. (the same goes for Italian numbers) SemperBlotto (talk) 21:42, 11 May 2012 (UTC)

I'm not profoundly concerned about that. Unlike most of these words, they're easily upkept by bot and there's no controversy over their definition.--Prosfilaes (talk) 23:52, 11 May 2012 (UTC)

Ad "what is a word": Let's face it -- "word" is understood by 99% of all people, including our users, as a string of characters without spaces in between. This definition says that German Hausschlüssel is a word while its English equivalent house key is not. I don't think there are any linguistic criteria other than orthography (if you want to count that) to distinguish between these two expressions. So I'm all for using criteria independent from orthography. Problem is: there is more and more doubt among linguists as to whether the unit "word" does really exist linguistically and universally and if it does, whether it can be defined in any practical way. Considering this, using orthography as a criterion at least for some languages doesn't seem to be such a bad idea after all, e.g. in English which doesn't have any officially determined orthography and thus writing conventions tend to reflect speakers' intuitions as to what is lexicalized enough to count as a word (what is felt as being one unit tends to be written as one string, though of course that doesn't always work, as perhaps the house key example shows). But then, languages like German which have an officially determined orthography show how arbitrary that can be. For example, the latest reform defined that daheimbleiben (“to stay home”) is to be written as one string, whereas it was written daheim bleiben before. While lexicalization considerations certainly played a role when the spelling changes were made, this certainly can't be considered proof that daheimbleiben is now more of a word than some years before. (Whether it is to be considered a word is indeed a very interesting question. There's a huge grey area between "clearly a word" and "clearly not a word".) Longtrend (talk) 11:49, 12 May 2012 (UTC)

Minor point -- China alone accounts for roughly 1/6th of the global population, and Chinese does not use spaces -- so 99% of all people would most definitely *not* necessarily conceive of a "word" "as a string of characters without spaces in between". -- Eiríkr Útlendi │ Tala við mig 19:11, 12 May 2012 (UTC)

True, but this is an English-language dictionary, and it is much more reasonable to suggest that 99% of all English-speaking people conceive of words written in Latin-derived alphabets "as a string of characters without spaces in between". Even people born and raised in China, when they learn English or Spanish or Polish, are taught to distinguish words in those languages by the spaces between them. (I know this for a fact, because I've been married to one of them for ten years now). bd2412T 15:23, 18 May 2012 (UTC)

Yes, but even on EN WT, we have entries in Chinese and Japanese, two notable languages that do not use spaces. The "Latin-derived alphabet" qualification is an important one. :) -- Eiríkr Útlendi │ Tala við mig 16:23, 18 May 2012 (UTC)

Any more input, perhaps? It would be a shame if we had this superlong discussion without coming to any consensus again. Longtrend (talk) 14:01, 18 May 2012 (UTC)

Well, a vote would force people to do something about the situation. We could for example decide whether SOP should be part of deletion policy or not, which in turn would force us to decide definitions and exemptions. Korn (talk) 14:19, 18 May 2012 (UTC)

I'd say SOP is already part of deletion policy, de facto at least; the problem is that different people have different impressions about when a term is SOP and when it isn't. It isn't the sort of thing that can be unambiguously defined, as it relies too much on subjective impressions. I don't think a vote would change that. It's like notability at Wikipedia: almost everybody agrees that articles on nonnotable subjects should be deleted, but people don't agree on what is notable and what isn't. —Angr 14:38, 18 May 2012 (UTC)

(After edit conflict)

Yes, very much what Angr says above. SOP can be blindingly obvious to someone well-versed in the relevant language and completely unclear to others, and once the semantics and mechanics of the term are explained, you'll still find that some people just might not see the term as SOP due to differences in how people think, or some folks might argue for the term's inclusion even so due to the structure of the term. Navajo shimá (“my mother”) is basically SOP as shi (“I, me, my”) + amá (“mother”), but due to the mechanics of the language, shimá is considered to be a single integral term. Japanese 貨物輸送運賃 (kamotsu yusō unchin, “freightage, shipping costs”) is basically three words as 貨物 (kamotsu, “freight, cargo”) + 輸送 (yusō, “transportation, shipping”) + 運賃 (unchin, “fare, rate, charge”), but it's still included in a number of J-E dictionaries, presumably as a translation target since this can be rendered as a single word in English.

So SOP does appear to be an important criterion by which we decide whether to keep an entry -- but it's also a gray area, and voting wouldn't do much to clarify things, as the gray-ness is due to the murkier problems of working between languages. -- Eiríkr Útlendi │ Tala við mig 16:15, 18 May 2012 (UTC)

I have been thinking about this a great deal, and I think we are looking at the question the wrong way. I just added a definition for market order, a term that is peculiar to the stock exchange, and has a very specific meaning not discernible from reading the individual parts. Still, it is not exactly correct to say that market order is a "word". Clearly it is two words that come together to form an expression that means something discernibly different than the individual words of which it is composed. So let's stop pretending that we are disputing whether an expression of two or three words is "a word" and recognize that what we are really doing is making a dictionary of "all words and expressions in all languages". This is not a call for a radical change to our rules, since it remains the case that "brown leaf" or "the weather in London" is not an expression at all different from the combination of words from which it is made; it is merely a proposal that we recognize that many of the disputes we have at RfD are about whether we should include expressions that can to some degree be figured out by looking at the words that go into them. However, since we are writing a dictionary here, which is intended to be a resource for people to discover meanings that they could not confidently puzzle out on their own, we should lean towards being helpful and inclusive of expressions for which someone might reasonably experience such difficulty. Cheers! bd2412T 15:35, 18 May 2012 (UTC)

@BD2412 -- that might be why some folks use the word term to refer to "a lexical unit", as this can include lexical units consisting of multiple words. -- Eiríkr Útlendi │ Tala við mig 16:23, 18 May 2012 (UTC)

"Term" could still be argued to be synonymous with a single word. I realize you are not using it that way, but "expression" removes all doubt. Cheers! bd2412T 16:44, 18 May 2012 (UTC)

Actually, in linguistics at least, "expression" is regularly used for pieces of linguistic data regardless of complexity or SOP-ness, so it's not that fitting either. Longtrend (talk) 16:50, 18 May 2012 (UTC)

Could you differentiate the role of an encyclopedia from that of a dictionary? DCDuringTALK 15:49, 18 May 2012 (UTC)

There are going to be a lot of distinctions in coverage between an encyclopedia and a dictionary, but I think it is important to recognize that there is also going to be a lot of overlap, and that is not a bad thing. Wikipedia has an article on piano because that is clearly an encyclopedic topic, but that doesn't mean that we should not have an entry for piano; the difference is that our entry exists to tersely define the word piano, and not to list famous pianists or piano concertos. We are not about to start having tens of thousands of biographical entries, or entries on topics like Supreme Court of Thailand or Death of Michael Jackson or The Curious Case of Benjamin Button, but that shouldn't stop us from having entries on terms like stock market and tennis racquet and predatory pricing. bd2412T 16:08, 18 May 2012 (UTC)

Well, it would follow from some suggestions made on these pages that the proper (official?) English name or English translation of the Thai name should be in Wiktionary.

I find it hard to take seriously hortatory proposals (and slogans) that do not grapple closely with the question of the limits on what is to be excluded. Your suggestions about cases that are far from the border you would favor do not do much to help us understand where you would recommend the border be. And as cases like the "Supreme Court of Thailand" might indicate, not everyone agrees that the border is in the range you dismiss so offhandedly. As you have given the matter thought, perhaps you could more narrowly locate the border. DCDuringTALK 18:44, 18 May 2012 (UTC)

Sure, the question of what is a "word" is only part of the problem discussed here. But sometimes it's quite major; see the example of German Hausschlüssel vs. English house key that I gave above. AFAICT, both expressions differ only in that the second includes a space while the first does not. Since we attempt to include "all words in all languages" and since many people have a very specific, orthography-based understanding of the word "word", the implication would be to include Hausschlüssel (as well as random one-"word" sentences from polysynthetic languages) but not house key. I don't know whether this is a desirable approach. Longtrend (talk) 15:54, 18 May 2012 (UTC)

How exactly is the definition of a word relevant to this? Has somebody ever proposed any specific treatment for non-word entries? Korn (talk) 16:16, 18 May 2012 (UTC)

Whatever might be the exact definition of SOP used: Hausschlüssel and house key are probably SOP to the same degree. Yet, I bet nobody would include house key here, but at the same time few would want to exclude Hausschlüssel. The reason for this is alleged wordhood of the latter and alleged non-wordhood of the former. So the definition of a word is definitely relevant to this discussion. Longtrend (talk) 16:44, 18 May 2012 (UTC)

In fact, this whole conversation started because of the proposed deletion of Plastikschwanz, which is a single word, but whose internal morphology makes its meaning transparent. As I understand it, the agreement at Wiktionary has always been to include single words even if their internal morphology is "SOP", as with English birds and walked, which I trust no one wants to delete. But we start to get into gray areas with compounds like birdhouse and fishtank (and Plastikschwanz belongs to that group) and even more so with words in polysynthetic languages like Chukchi təmeyŋəlevtpəγtərkən "I have a fierce headache". —Angr 16:55, 18 May 2012 (UTC)

My father would shrug and say, "it can't hurt, and it might help". I think that is a useful guideline. Since no one is required to add anything to the dictionary, it puts no extra work on any of us to allow the hash brownies and birdhouses and Plastikschwänze to have entries, and it is not unreasonable to expect that these entries might help someone. We are, after all, writing a dictionary to serve as a resource for readers, and not for our own insular purposes. bd2412T 17:03, 18 May 2012 (UTC)

But (to re-stress your statement) we are writing a dictionary to serve as a resource for readers, and IMO we must fight the addition of any content not suitable for a dictionary — as measured by traditional dictionaries made by people better qualified than we are. If we gradually, through apathy or reluctance to interfere, sink into allowing everything that might be useful to anyone, we will just become a dump for everyone's crap of any kind. I think we must be vigilant. Equinox◑ 17:10, 18 May 2012 (UTC)

My friend, I think you give far too much credence to "traditional dictionary" writers. Noah Webster was famous for his proscriptive biases. Traditional dictionaries have been constrained by the available technology and the limitations of being written on paper. A trained lexicographer might be able to trace the Greek or Latin or Arabian roots of a thousand words, and yet not have a clue what a hash brownie is. Our attestation rules keep out the made-up stuff, so what we should be most vigilant for is the straight-up hoax, not the well-worn phrase that combines words of arguable ambiguity. bd2412T 17:36, 18 May 2012 (UTC)

Users (and contributors) are today not completely sold that a dictionary should not be prescriptive and proscriptive, a battle that was largely supposed to have been won by Mr. Gove's Third edition of Webster's unabridged in the 60s. As long as we think it is our obligation to define any attestable term taken out of the context that would enable it to be decoded from its parts there is no practical limit (not even being a phrase [or constituent]). It is hardly unreasonable to expect human users of a dictionary to decode terms consisting of one or more polysemous words.

In fact, there are systematic biases in what we include in Wiktionary. For common (boring) words (and many others, dated and otherwise) we could not rely on lexicographically inclined contributors, but have relied on lightly edited copies of Webster's 1913 entries, with its dated, even incomprehensible, wording and all. New entries and senses are added principally in areas that reflect the techno-geek, linguistic, and youth-related interests and biases of our user base, with occasional serious contributions from other PoV pushers, sometimes coming here from WP. We also have some nostalgia bias from our older contributors. We are highly unlikely to become a balanced resource for the population of the world at large if we dilute and squander our efforts and technical resources on phrasal entries for which we do not have the equivalent of Webster's 1913 to provide the balance that we lack. DCDuringTALK 19:16, 18 May 2012 (UTC)

The beauty of this being a wiki is that you can direct your resources to working on whatever you feel needs to be worked on. I make appendices of letter variations, and no one has told me that I shouldn't do that because other things require attention. It doesn't expend any of your resources or squander any of your efforts if another editor wants to add something that you would have put as your lowest priority. bd2412T 20:51, 18 May 2012 (UTC)

We should work only within the scope of the service that we purport to be providing or that the funders and users think we provide. I suppose that as we actually are just free riders on software much better suited for an encyclopedia and get only a small fraction of the hits that WP gets that resources aren't much of an issue. I get the impression that Mediawiki is none too responsive to our special needs. That all Wiktionaries still get less than one fourth the hits that MWOnline gets need not trouble us I suppose. Nor that the hits for this April are down about 15% from last April. But I am concerned with the competitive weakness of Wiktionary. DCDuringTALK 23:42, 18 May 2012 (UTC)

I don't think the answer to competitive weakness is to offer less. There are two bottom line questions facing us. How do we get more eyes on our pages, and how do we get more people who feel as compelled to volunteer their time to improving the project as those of us who are participating in this discussion right now. There are practical limitations to how far we can go to achieve either goal. Obviously, if we had a definition of "Kim Kardashian" it would draw a lot of curious eyes, but that alone is not a reason to "define" that particular term. What we can do, however, is go bigger in terms of the definitions that we can present with a straight face. That is one of the reasons why I have sought to import public domain medical dictionaries, law dictionaries, and other technical sources, and that is why I proposed in the past that we should pick at least one foreign language to double down on and get as complete a coverage as possible. We need to be offering more than everybody else, not less, and we need to be offering things in terms of both scope and depth of content that no one else is. Don't forget, others can copy what we have built and add the content that we pass up on, and use that to draw eyes to their sites (for profit), so we need to always be putting another foot ahead of the game. bd2412T 01:06, 19 May 2012 (UTC)

We already offer "more". It is our reliability and quality that are the issue, IMO. If someone wants to look up the meaning and use of that, one of the top look-ups at MWOnline according to them, do you think a user would rely on what we present? I think importing specialized glossaries is wonderful, subject to copyright concerns. Much of that content, if attestable, would probably not be SoP, but some is, in my experience.

It would be interesting to find what the total traffic of all those who have copied our content is. How could we find out names of some relatively popular websites that have done so, especially in the last year or so? Should we Google some content that only we have that was added 13 months ago? DCDuringTALK 15:30, 19 May 2012 (UTC)

SOP or not SOP

Inspired by: Wiktionary:Requests_for_deletion#Plastikschwanz and Talk:Zirkusschule
While there are discussions about nuances and idiomatic value, what we have there is less a discussion about dildos and the circus and more one about the SOP-rule, really. If I look at the discussion, I think it can be boiled down to three views:

SOP-words should be deleted

SOP-words should be kept

SOP-words should be kept if English speakers cannot tell apart the parts of the compound

While I do not have an opinion on that, it seems necessary to take this to a more basic level before it flames again and again with every German compound ever entered. Korn (talk) 19:40, 9 May 2012 (UTC)

No; I just saw it this second. I must blame this embarrassment on my browser. Korn (talk) 19:45, 9 May 2012 (UTC)

Proposal: Starting points

I'm quite new here and when I want to open a category, I must look up a word and then scroll down to the categories. So if I wanted to enter a phrasebook, I'd need to find a term in the phrasebook and open the category. My idea is this:
We turn the lvl. 2 headers (==German==) into links to overview pages which contain links to all the interesting bits like the IPA for that language, the WT: About ..., the Phrasebook and the part of speech categories.
Should such pages already exist, they still should be easier to find. So how about it? Korn (talk) 11:41, 10 May 2012 (UTC)

Not a bad idea. I'd prefer it if the link text remained black, however. Ƿidsiþ 14:16, 10 May 2012 (UTC)

Sounds good to me, too. I'm all for improved usability and discoverability, and this change would increase both. -- Eiríkr Útlendi │ Tala við mig 15:38, 10 May 2012 (UTC)

Note that this wouldn't work with Tabbed languages. —RuakhTALK 15:44, 10 May 2012 (UTC)

What's a tabbed language? But yes, I mean it should link to Category:German language. And while I can see that keeping black text might be easier on the eye, how would people know it is a link? Maybe we could insert a disclaimer like: __ Portal: German __ under the header without being obtrusive? I'd also like to propose that we change explanations like German groups of words elaborated to express ideas, which doesn't tell the user what it links to: Phrasebook, idioms, aphorisms, proverbs. Korn (talk) 16:33, 10 May 2012 (UTC)

Re: "Would the proposed addition break this function or just don't work with it?": I don't know if it would break it — and even if it would, that's fixable — but I just meant that Tabbed Languages already uses the text of the L2 header as an active/clickable area, so it can't be used as a link. —RuakhTALK 20:00, 10 May 2012 (UTC)

Well, if we use the example shown below (made by someone in Yair's discussion), it would be right at the top of the page with tabbed languages, wouldn't it? Sounds like the perfect solution to me. Korn (talk) 20:20, 10 May 2012 (UTC)

But the link isn't to information about German (that's on Wikipedia), so "All about German" doesn't really fit; perhaps "German on Wiktionary"? That's if these links are desired, which I'm not convinced of. (No litotes intended.)​—msh210℠ (talk) 20:53, 10 May 2012 (UTC)

To begin with, I think that such links could be very, very useful. What I would suggest is a default link to the category page, with the option of linking to something along the lines of our About languageX pages. So, for example, we have WT:AGRC, which is intended for editors, and shows how to edit grc entries. If we had a prominent place to put a link, I'd happily make Wiktionary:On Ancient Greek (or some other title to be determined) which gives background on the language, tells the reader how to interpret some of the information, where to find certain things, links to appendices, all sorts of fun stuff. This leaves us room to make something super intuitive for users, without having to write one for every language right away (because we have the languageX category link as the default). -Atelaesλάλει ἐμοί 02:35, 11 May 2012 (UTC)

This is really wonderful, especially because it leaves room to grow. 'German on Wiktionary' seems the best option to me for wording. --Μετάknowledgediscuss/deeds 06:05, 12 May 2012 (UTC)

I like 'portal' personally, because it ties in with the terms used on Wikipedia. It's what people might already be familiar with. —CodeCat 11:43, 15 May 2012 (UTC)

Uhm...while I was the one to propose it, I am not going to be the one to implement it. I have no idea of any code-working whatsoever. So...maybe someone wants to make it happen or start a vote or whatever's necessary. Korn (talk) 03:46, 3 June 2012 (UTC)

Splitting "About" pages?

As noted peripherally in the "Proposal: Starting points" topic above, most of the About... pages are primarily aimed at bringing editors up to speed on the aspects of the language necessary to properly create and edit entries in that language. What about those who just want to look up words?

While it might be ok to refer to the relevant Wikipedia articles on the language in question, it would be nice to have an explanation of the features of a language in the context of how Wiktionary organizes and presents the language. Helpful hints such as how to find the lemma forms, how to distinguish parts of a word that are dealt with in separate entries, what is the significance of the different inflectional categories, etc. would be good to include, as well.

In some cases, this is covered in appendices, but it would be nice to be more systematic about meeting the specific needs of a novice to the language. Often the appendices seem to be aimed at those who already know something about the language, but want to expand their knowledge. In Appendix:Hebrew verbs, for instance, the term "binyan" is used consistently throughout, but never defined. What's more, Wiktionary has no entry yet for binyan, only a rather general one for the Hebrew בניין.

I would like to see us develop either separate pages or separate subpages to form an introduction to what someone needs to know in order to use the Wiktionary entries for a given language, maybe having titles such as "About Hebrew (editors)" and "About Hebrew (users)", for example. Chuck Entz (talk) 22:34, 11 May 2012 (UTC)

I think the problem with Appendix:Hebrew verbs is not that it's unsystematic (unless that means "incomplete", in which case, well, this is a wiki, these things take time), and not that it's aimed at those who already know something about the language (note that the only thing the page does is explain what binyanim are, so I don't accept your supposition that it's intended for readers who already know); rather, the problem with Appendix:Hebrew verbs is that it's terrible and needs to be completely rewritten. (I mean no insult to the editor who wrote it; I've tried to rewrite it at least a dozen times now, and y'know what? It's hard!) I don't think that (say) Wiktionary:About Hebrew (users) is very likely to be any better. —RuakhTALK 22:55, 11 May 2012 (UTC)

Perhaps we should have something along the lines of -pedia's Portals. SemperBlotto (talk) 07:18, 12 May 2012 (UTC)

If this proposal is followed through on (which I support) then the current 'about' pages should be moved to the Wiktionary namespace, because they would then be targeted primarily at editors, not users. —CodeCat 00:29, 16 May 2012 (UTC)

The current "about" pages already are in the Wiktionary namespace! —RuakhTALK 00:33, 16 May 2012 (UTC)

People generally seem to be in favor of language portals separate from the about pages,

The French Wiktionary has them; see "Autres langues traduites en français" on that page,

There is doubt whether there are enough editors on EN Wiktionary to do something like that, though it's noted that once a language portal goes up, little modification is ever needed

A sample page should be made up. --BB12 (talk) 00:46, 16 May 2012 (UTC)

Actually, what you mention about the French Wiktionary are language categories. But there are a few language portals, e.g. fr:Portail:Espéranto. Lmaltier (talk) 21:09, 18 May 2012 (UTC)

IPA: Central A

Wikipedia on the sound in question. For correctness' sake I'd like to request that [ä] is henceforth considered the correct sign in IPA-brackets rather than [a], which does not denote a central vowel. Languages concerned would naturally be those with central A such as Spanish, Latin, Polish, German... The current situation seems to be that brackets are required for all languages but English, but the narrowness of their content is laissez-faire. Korn (talk) 14:57, 12 May 2012 (UTC)

Thank you for starting a discussion here. As I put on your talk page, my position is that the broad-narrow distinction is a continuum, so there's no harm in putting the more general but easier to read [a] instead of [ä] in square brackets. I believe that for languages other than English, we should offer a transcription that is as close to physical correctness as possible without being awkward to read, in addition to a solely phonemic one (which is easy to read, but often useless unless you know all a language's phonological rules, as it abstracts away from them) and an extremely narrow phonetic one (which is often applicable only to certain regions and very hard to read for non-specialists). I do believe [ä] (which is just [a] with diacritics) is rather awkward to read, but in case it turns out as consensus to use it, I will do that too of course. Longtrend (talk) 15:14, 12 May 2012 (UTC)
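For anyone who wants to check the remark that [ä] is "just [a] with diacritics", here is a minimal Python sketch using only the standard library. The function names describe and broaden and the sample transcription [ˈkäsä] are made up for illustration; nothing here reflects existing Wiktionary tooling or policy.

```python
import unicodedata

# U+0308 COMBINING DIAERESIS is the IPA "centralized" diacritic.
CENTRALIZED = "\u0308"

def describe(symbol: str) -> list[str]:
    """Return the Unicode names of the code points in the decomposed symbol."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", symbol)]

def broaden(transcription: str) -> str:
    """Drop only the centralization diacritic, leaving every other mark alone.

    Hypothetical helper: it turns a narrower transcription such as [ä]
    into the broader [a] without touching length marks, stress marks, etc.
    """
    decomposed = unicodedata.normalize("NFD", transcription)
    stripped = decomposed.replace(CENTRALIZED, "")
    return unicodedata.normalize("NFC", stripped)

print(describe("ä"))
print(broaden("[ˈkäsä]"))
```

The conversion only works in one direction: a script can derive [a] from [ä], but it cannot tell which existing [a] transcriptions were meant as central vowels.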

It's a mistake to say that IPA [a] does not denote a central vowel. While the cardinal vowel denoted by [a] is front, not central, in practice IPA vowel symbols are rarely used to indicate their cardinal values, since very few languages actually pronounce vowels at their cardinal positions. Rather, the vowel symbols are used to denote the vowels in a particular language that come closest to the cardinal positions. For example, the cardinal value of [i] is defined as the vowel "produced with the tongue as far forward and as high in the mouth as is possible (without producing friction), with spread lips", but not many languages' [i] sound is actually that far forward and that high. Certainly the [i] of English knee and German nie isn't; yet we happily (and correctly!) transcribe both as [niː], because the vowel in question is the farthest forward and highest vowel that the respective language has. By the same token, it is 100% correct to transcribe a given language's most open vowel as [a] even if it doesn't happen to be fully front, as long as the vowel is closer to cardinal [a] than it is to cardinal [ɑ]. Diacritics like [¨] are used only in narrow transcriptions when the qualities they indicate are relevant to the discussion at hand. (For example, it would be important to distinguish between [a] and [ä] when discussing a language where the phoneme /a/ has a more front realization in some contexts and a more centralized realization in another.) But for the purposes of a dictionary showing the lexical level of representation, it is not merely unnecessary to use such diacritics, it's downright misleading. —Angr 16:43, 12 May 2012 (UTC)

The point of languages with a true central is that it is neither closer to [a] nor to [ɑ].

Even if it was closer to one of those, why break it down into a front-back dichotomy, why not accept central as a separate value and consider the vowel closest to that position?

I do not find the example with [i] convincing. And I am not sure what you want to say with it. If the highest, frontest vowel of a language was [e] and the language would have no other vowel proximate to it, one would not be using <[i]> just because it is the highest and frontest. That would be misleading. One would use [e] because it is closest to the actual value. Just as we do use [i] because it is closest to the actual value. And concerning that: See points 1 and 2.

If /ɹ/ passed, we'd be contradicting ourselves not to use [ä]. Why be precise on the one but sloppy on the other?

Who says we want to depict lexical levels? For languages other than English, if I remember rightly, our policies decree that both phonemic and actual pronunciation are to be depicted. See here. If I look up a pronunciation here, I don't want to know that <syv> consists of the phonemes for s, y and v, I want to know how to pronounce it. And if a language had central vowels and dental consonants, I'd feel outright deceived if it wouldn't be depicted. Most especially in square brackets which at least I understand to be used with as much (rather than as little) detail as possible. Mind you that this is not a bilingual dictionary which explains its own transcription beforehand. We can neither assume the user to have a complete knowledge about the languages' phonology nor can we assume the Wikipedia entries we link to to be helpful for the transcription our users entered. Korn (talk) 13:54, 15 May 2012 (UTC)

I take it you have some linguistics training? I don't think most of our editors do, nor do most of our readers. So our goal should be to use something that we can reliably enter and that our readers can reliably understand. Using [a] instead of [ä] helps that. Certainly few of our readers can tell or reproduce the difference.--Prosfilaes (talk) 19:17, 16 May 2012 (UTC)

ps.: [a] does not denote a central vowel in that if someone like you (not meant as an insult, I just mean that you feel that [a] suffices as a sign) puts it into a word, it doesn't tell me whether you actually meant [a] or just thought it would suffice for [ä]. Korn (talk) 14:00, 15 May 2012 (UTC)

Non-written attestations

WT:CFI#Attestation does say "Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived." How exactly does this work? Non-written attestations obviously do not have spellings at all. So, do we use transcriptions? If so, how? Can we make our own transcription of audio sources, or do we need durably archived ones? Like song lyrics, obviously a good source of spoken language, but can we use any old site to transcribe the lyrics? Of course if we use lyrics for example from the CD sleeve, then that's actually written down anyway. So any audio sources that have a durably-archived written counterpart are nonissues anyway; just use the written version. For ones with no durably archived written sources, do we just assume good faith or what? Mglovesfun (talk) 23:16, 12 May 2012 (UTC)

Perhaps this was in reference to cases where only one spelling is possible and the audio or video confirms usage. Chuck Entz (talk) 00:22, 13 May 2012 (UTC)

Indeed, we often RFV a specific sense of a term, or an idiomatic expression whose component words are clear. In both of these cases, it can sometimes be quite clear what the spelling is. And even in cases where the spelling isn't otherwise clear, and where we therefore wouldn't want to depend on three audio or video cites, I don't think it would be a problem if one of the cites is non-written — or maybe even if two are. (BTW, I'm not sure about the accuracy of official song lyrics. In my experience they sometimes seem to be quite different from what's actually on the recording.) —RuakhTALK 00:50, 13 May 2012 (UTC)

Yes, for various reasons, song lyrics on an insert are often (sometimes wildly) different from what is sung. Equinox◑ 23:07, 14 May 2012 (UTC)

I would say that if a song clearly says X, but an insert prints it as Y, the song can be used to cite X and the insert can be used to cite Y. So it doesn't really matter if we can actually tell that X and Y are "wildly different". --Μετάknowledgediscuss/deeds 04:15, 15 May 2012 (UTC)

I disagree; I don't think that we should accept quotations from inserts at all. Durable archival of a song does not entail durable archival of the insert. —RuakhTALK 14:33, 15 May 2012 (UTC)

To clarify, I was assuming the insert is "durably archived" for our purposes. --Μετάknowledgediscuss/deeds 00:12, 16 May 2012 (UTC)

Inserts are held by libraries along with the CDs.--Prosfilaes (talk) 19:19, 16 May 2012 (UTC)

Hyperlink change

As per Yair rand on the talk page for the vote for languages with limited documentation, I would like to propose that if the vote passes, the link "Wiktionary:CFI/Languages with limited online documentation" be changed to "Wiktionary:Criteria for inclusion/Languages with limited online documentation". It is trivial, but consensus is needed to change the CFI page. --BB12 (talk) 05:45, 13 May 2012 (UTC)

I support this proposal, and support its implementation if consensus here is in favor (i.e., without a formal vote).​—msh210℠ (talk) 05:57, 13 May 2012 (UTC)

Filter 1

Special:Abusefilter/1 - should this be armed (i. e. set to Disallow)? From what I can tell, there haven't been any false positives, and this filter could prevent a lot of vandalism we're getting. -- Liliana• 19:02, 15 May 2012 (UTC)

What do you mean by "false positives"? There have certainly been cases where legitimate entries were started without L3 headers. —RuakhTALK 19:18, 15 May 2012 (UTC)

Since the filter only tags IPs and new (non-autopatroller) users anyway, I beg for examples. The idea is to prevent creation of pages that are obviously vandalism - i. e. which contain just a single line of text, with no headers at all. Such entries are usually deleted on sight anyway, so letting them through makes little sense. And given the volume of such edits, it would lessen the strain on administrators. -- Liliana• 19:22, 15 May 2012 (UTC)

Examples include [[shilpit]] and [[turn the table]]. And I disagree with you either about the meaning of "vandalism" or about the meaning of "i. e.", because I don't think [[Χλόη]] was vandalism (and obviously it wasn't deleted on sight). As for "lessen[ing] the strain on administrators", I don't see how that answers my question. —RuakhTALK 19:30, 15 May 2012 (UTC)

Yeah, then what I just said makes more sense, creating a new filter to restrict pages with no headers at all. -- Liliana• 19:37, 15 May 2012 (UTC)

Wait, what? So you do think that [[Χλόη]] was vandalism? —RuakhTALK 19:46, 15 May 2012 (UTC)

Re: "Certainly the original version of Χλόη was vandalism": I hope that you left out a word, and meant that it certainly wasn't vandalism . . . because it wasn't. An editor added a page for a real word in a real language, with accurate information about it, including the correct definition. How can that be vandalism? —RuakhTALK 00:40, 16 May 2012 (UTC)

Er, I'm referring to this revision by an IP, which is composed of badly formatted material, which is a mixture of extraneous facts not relevant to the page (or that would be obvious in a standard Wiktionary page) and facts that are slightly incorrect. We are talking about the same thing, right? --Μετάknowledgediscuss/deeds 00:50, 16 May 2012 (UTC)

Yes, we're referring to the same thing — and it's absolutely not vandalism. Do you really think that the editor was trying to format the material badly? —RuakhTALK 01:16, 16 May 2012 (UTC)

Alright, alright, not vandalism. How about this: it was functionally equivalent to vandalism. It required more work on our editors' part than most vandalism does, in fact. The information, most of which wouldn't belong on such a page in any case, was not even quite accurate if one considers (as we do as official policy here) that Greek and Ancient Greek are linguistically distinct as languages in their own rights. Atelaes and Saltmarsh did transform it into a good entry, but as Atelaes said below, it was "not really worth the time we spent cleaning it up." --Μετάknowledgediscuss/deeds 02:18, 17 May 2012 (UTC)

I haven't heard of this feature before. Would someone be willing to explain to me what the results would be if we turned the aforementioned switch? Would the original editor of Χλόη (Chlói) not have been able to save, or would the entry have been auto-deleted a moment later, or would a bot send them an angry letter to make them feel bad? -Atelaesλάλει ἐμοί 01:27, 16 May 2012 (UTC)

Liliana is suggesting that we click the "Prevent the user from performing the action in question" checkbox. To see what that does, you can trigger Special:AbuseFilter/5 by logging out (or opening a browser where you're not logged in) and trying to create an entry with page-text equal to its entry title. (You'll find that the software completely disallows it.) Another option — not what Liliana is suggesting — is to click the "Trigger these actions after giving the user a warning" checkbox. To see what that does, you can trigger Special:AbuseFilter/14 by trying to edit an entry such that it has <ref> but not <references. (You'll find that the software gives you a warning, but will let you save the changes if you insist.) —RuakhTALK 01:40, 16 May 2012 (UTC)

I see. Thanks for the info. Is there any way to add some more info to the explanation of why the page-save is disallowed? My screen only said "This action has been automatically identified as harmful, and therefore disallowed. If you believe your edit was constructive, please inform an administrator of what you were trying to do." While this isn't devoid of useful information, I strongly suspect we could make it better, perhaps a link to a brief page on the most basic Wiktionary syntax. What might also be nice is if we could have a super easy way to add the entry to the appropriate requests page. That being said, I think I would support turning this feature on. Χλόη (Chlói) was certainly not vandalism, it was clearly a good-faith attempt at an entry we lacked, but it was a very poorly informed good-faith entry, and not really worth the time we spent cleaning it up. Also, I have to admit that I basically never patrol. It's an incredibly tedious process that I just can't bring myself to do more than once in a blue moon. Anything which can lighten the load on those saintly folks who do subject themselves to this necessary task receives my support. -Atelaesλάλει ἐμοί 02:15, 16 May 2012 (UTC)

The majority of those edits don't look like vandalism, they just look badly formatted (though that said, there may be lots of deleted vandalism I can't see there). I don't like the idea of blocking those altogether - I think simply offering a warning, a link to the guidelines and perhaps a link to the New Entry Creator would be better if we don't want to scare these new editors away. Smurrayinchester (talk) 07:54, 16 May 2012 (UTC)

I agree with not blocking them.​—msh210℠ (talk) 15:00, 16 May 2012 (UTC)

To answer your question, yes the message can be customized, even separately for every filter. -- Liliana• 04:33, 16 May 2012 (UTC) (I think? Someone please confirm this for me)

In theory, but not always in practice. The ref-no-references filter was designed to have a custom text, but that custom text doesn't display; instead, the generic text Atelaes saw displays. - -sche(discuss) 15:43, 16 May 2012 (UTC)

Can we improve practice?

Generally anything we can do to encourage constructive engagement with potential contributors is good. In some languages it seems essential. However, the signal-to-noise ratio for English-language contributions seems to be getting low. Should we be directing would-be English-language contributors to various specific pages (WT:REE, WT:CFI, WT:ELE or simplified versions thereof)? Should we differentiate by language (ie, mere inclusion of language name) or script used? (Can we?) DCDuringTALK 15:47, 16 May 2012 (UTC)

If we're going the "warning message" way, what should it look like? I'd prefer it to contain some kind of example entry and a link to ELE and CFI. What do you think? -- Liliana• 20:33, 16 May 2012 (UTC)

I think linking to CFI and ELE will scare potential contributors away faster than anything else. When people try and convert others to Christianity, they don't hand out entire Bibles, they hand out little tracts. We need something digestible in 30 seconds, gives the basics, and links to our venerable articles. -Atelaesλάλει ἐμοί 22:24, 16 May 2012 (UTC)

Hmm. So you mean something like "here's an example entry, and to get more ideas look at entries like hinder"? That'd work as well. -- Liliana• 22:26, 16 May 2012 (UTC)

(Re what -sche said about the custom text's not displaying.) I've filed this as a bug.​—msh210℠ (talk) 22:27, 16 May 2012 (UTC)

I oppose enabling this per Ruakh. Furthermore I have seen some false positives, something like {{subst:new en plural|validwordhere}} would be disallowed because the system must check before the subst, because afterwards it does have headers. Ditto the editor who was using [[mg:{{subst:PAGENAME}}]] who was invoking the 'bad interwiki' tag with every edit. Mglovesfun (talk) 23:04, 20 May 2012 (UTC)
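The false positive described above can be illustrated with a rough sketch. It assumes, purely for illustration, that a filter inspects only the raw submitted wikitext for a header line. The function name `has_header`, the regular expression, and the sample "expanded" text are all hypothetical; none of this is the actual AbuseFilter rule or the real output of the template.

```python
import re

# Hypothetical check: does the wikitext contain a line such as ==English==
# or ===Noun===? The same number of "=" signs must open and close the line.
HEADER_RE = re.compile(r"^(={2,6})[^=\n].*?\1[ \t]*$", re.MULTILINE)


def has_header(wikitext: str) -> bool:
    """Return True if any line of the wikitext looks like a section header."""
    return HEADER_RE.search(wikitext) is not None


# What the editor actually submits: a single substituted template call.
raw_submission = "{{subst:new en plural|validwordhere}}"

# Hypothetical stand-in for the text after substitution (assumed, not the
# real template output); the point is only that it contains headers.
after_subst = "==English==\n\n===Noun===\n# plural of validwordhere"

# The kind of one-line page the filter is meant to catch.
one_line_page = "validwordhere is a cool word"

print(has_header(raw_submission))  # checked before substitution
print(has_header(after_subst))     # what the saved page would contain
print(has_header(one_line_page))
```

Under these assumptions, a check made before substitution cannot tell the template call apart from the one-line page, which is the concern raised in the comment above.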

I have recently come across this duplicate entry. From what I have read at alternative spelling, it looks like the duplicate definitions should not exist and one or the other should point to the more commonly used, whichever that be. What are people's thoughts? Speednat (talk) 05:22, 19 May 2012 (UTC)

Note. This problem has been discussed very many times before, most especially in terms of color / colour. No solution acceptable to everyone has ever been found. SemperBlotto (talk) 08:00, 19 May 2012 (UTC)

I can do no better than go over old ground. 'More commonly used' only really works when it's clear cut. With something like humor and humour, it's not clear cut. A simple Google search won't suffice either, as humor is used in ten languages (according to our page) and humour in just three, and that will skew the results. I prefer our current solution as the least poor of a set of poor options. Mglovesfun (talk) 15:31, 19 May 2012 (UTC)

Can we prefer the more etymological form for the lemma? This might yield a mix of current British and US spellings rather than favouring one (viz. the note in -ize/-ise). —MichaelZ.2012-05-19 20:26 z

Which etymological form, the Latin one or the Norman French one from which it was borrowed? —CodeCat 21:41, 19 May 2012 (UTC)

Either, consistently. We could choose to lemmatize the most common surviving early form of a word in the language, or the form most true to its immediate precursor in another language, or the form most true to its earliest known relevant ancestor, or better, a prioritized list of these criteria. (Although the note might imply that -ize has always occurred in English for words with Greek roots.)

Anyway, we have some problems with lemmatizing that require attention. It's not good to have four entries for labour/labor/Labour/Labor. —MichaelZ.2012-05-22 18:48 z

I think that keeping full entries for both is not a poor solution, it's a good solution, because it's the best way to be neutral (NPOV). We are not in a hurry: with time, both entries will become perfect, or almost perfect, but perfection can be reached only if correct and relevant information is always kept in pages. Lmaltier (talk) 12:10, 20 May 2012 (UTC)

Well dunno, I like the solution we use for mold / mould. -- Liliana• 15:09, 20 May 2012 (UTC)

So do I, but there are plenty of British readers who will get their knickers in a twist at seeing their preferred spelling relegated to a mere note calling it the "UK spelling of such-and-such US spelling", and equally many American readers who would if it were the other way around. —Angr 16:50, 20 May 2012 (UTC)

One point to keep in mind is that the US spelling is used in the US only as far as I can tell. (Is it used anywhere else?) But what is called the British spelling is used in dozens of other English speaking countries, possibly about 50 at a rough guess. British spellings tend to also be used in places like Spain where there are a lot of tourists visiting.--Dmol (talk) 21:22, 20 May 2012 (UTC)

U.S. spellings are frequently used also by the Canadians. British spellings are used by more English-speaking countries, but U.S. spellings are used by more English-speaking people (counting native speakers only). Of approximately 400 million native English speakers, some 330 million are Americans or Canadians. American spellings also tend to be used in places like Mexico, Panama, and Colombia for the visiting tourists. —Stephen(Talk) 10:40, 21 May 2012 (UTC)

Another point to keep in mind is that approximately two-thirds of the world's native English speakers are Americans. Or, better, we could leave statistics out of it altogether. —Angr 21:40, 20 May 2012 (UTC)

But you're missing the point. The USA is the odd one out, not the rest of the English speaking world.--Dmol (talk) 21:51, 20 May 2012 (UTC)

US English is also a widely used form of International English - outside Europe and the Commonwealth, I believe it dominates in terms of second language users (though British English is the standard in India, which of course has huge numbers of ESL speakers). British/Commonwealth and American are both widely used, and I don't think either is broadly used enough to call standard. Smurrayinchester (talk) 22:04, 20 May 2012 (UTC)

My dictionaries always had it labeled like this: "humor (US/UK) / humour (UK)". As in, the former is acceptable in the whole world, the latter in the UK only. Thus it makes sense to choose the form which is correct everywhere rather than just in one country. -- Liliana• 05:05, 21 May 2012 (UTC)

The Concise Oxford has the main definition at humour, and its definition of humor is simply "US spelling of humour" (and as I'm typing this, the British English dictionary on Firefox is marking "humor" as a spelling mistake). Unlike organize/organise, where all US speakers would say "organize", but British speakers are split between both (often interchangeably), I don't think either word is acceptable on the opposite side of the pond. Smurrayinchester (talk) 08:04, 21 May 2012 (UTC)

Is it desirable to have some sort of central definition page, with translations etc. on it, and transclude it onto humor and humour, with a bit of template logic to adjust the spelling accordingly? It would mean that both pages would always be the same, but it would be less user friendly (we'd have to make sure users clicked on edit on the actual page, rather than in the bar at the top, perhaps by protecting the page without cascading protection). It would also be possible to have a bot to keep them the same if it's important, but that seems a bit clunky and inelegant. Smurrayinchester (talk) 21:59, 20 May 2012 (UTC)
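As a rough illustration of the central-page idea, the sketch below keeps one shared entry body and fills in the spelling per page. Everything in it is hypothetical: the names `SHARED_BODY` and `render_entry`, and the placeholder definition line, are assumptions made for this sketch and do not describe any existing Wiktionary template.

```python
# Hypothetical shared entry body; {spelling} is filled in for each page.
SHARED_BODY = (
    "==English==\n"
    "===Noun===\n"
    "# (placeholder definition text for {spelling})\n"
)


def render_entry(page_title: str) -> str:
    """Return the shared body with the page's own spelling substituted in."""
    return SHARED_BODY.format(spelling=page_title)


# Both pages are produced from the same source, so they cannot drift apart.
for title in ("humor", "humour"):
    print(render_entry(title))
```

A sense that exists under only one spelling would need extra per-page logic that this sketch does not attempt.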

The subtemplate idea has been discussed before. It'd pose a few problems. Inexperienced editors won't know how to get to the template to edit it. Also in some cases, there may be a US-only meaning of say humor that is not ever spelled humour. The bot idea poses the same problem as this last one. Mglovesfun (talk) 22:56, 20 May 2012 (UTC)

(One other point - humor currently seems like the better page than humour. It puts the most widely used definition at the top, rather than burying it under obsolete and clinical senses, and it manages to define humour far more succinctly without any loss of meaning. If we merge them at the end of this discussion, I'd pretty much just copy humor over to humour.) Smurrayinchester (talk) 10:08, 21 May 2012 (UTC)

Dictionary entries should be lemmatized at a single capitalization and spelling, not duplicated, triplicated, or fourplicated (as labour/Labour/labor/Labor). This results from the fact that we actually have entries for spellings, rather than for terms (i.e., lexical units). For one thing, this reinforces a fallacious binary Britishistic/USican view of the language.

In Canada, the spelling is chiefly humour, also humor. How can we make that clear with the current arrangement? —MichaelZ.2012-05-22 19:19 z

I think transcluded duplicate entries don't work well: we need to pick one form and stick with it, although it doesn't necessarily have to be the same with every one. With -ise/-ize it's easy, because -ize spellings are acceptable everywhere. With -or/-our and similar it's a bit more awkward... the principle I thought we were operating under, which we must have discussed years ago, is that we take the spelling which was created first on Wiktionary and use that as the main form. I can live with that, although it does in effect mean that US spellings tend to be prioritised. Ƿidsiþ 19:43, 29 May 2012 (UTC)

sub-pages in the Etymology scriptorium

Yair rand was so kind as to insert some scripting magic into the Wiktionary:Etymology scriptorium, and I have successfully created a first test page.
Feel free to try out and test.

AFAICS, subpages are in every respect inferior to the labeled section transclusion format formerly preferred at the Etymology Scriptorium. It is LST that should be replacing everything else, not the other way around. If only it were not so "odd and complicated", eh?! --Ivan Štambuk (talk) 19:15, 20 May 2012 (UTC)

Hatred

Wiktionary should not be promoting hatred. Yes, the English language (among others) is filled with hateful speech, reinforcing hate against women, racial minorities, LGBT and other sexuality-based minorities, and even children. Examples include the words "child prostitute", when it's clear adults are doing the prostituting of children; "cunt", which as an insult is degrading to women and the process of birth; and "gay" as lame or stupid, which is degrading towards LGBT people. How can we allow and even promote this kind of language, as definitions, as "official"? What is most insidious is not the hate leveled by others as insults, but the kind of shame generated as self-hate, and the language that promotes it. I demand action on this topic; a simple brush-off of "Oh, but it's English" is an invitation for civil rights groups to pounce on Wiktionary. Doseiai2 (talk) 14:54, 20 May 2012 (UTC)

There's a difference between promoting hatred and documenting it. Like it or not, people use words in horrible ways. Since Wiktionary is a descriptive, not a prescriptive dictionary, we don't censor entries to promote good use of language. We present as clear as possible a picture of the way words are actually used, so that anyone can understand such language when they encounter it. We try to indicate in context labels such as "vulgar", "pejorative", "offensive", etc. and in usage notes when words should not be used, but we don't censor the words or definitions themselves. Chuck Entz (talk) 15:16, 20 May 2012 (UTC)

Can't we have a hatebox to place hateful language in? It's really disturbing that it is kept as definitions along with others.
The hatebox can explain language that we, as intelligent human beings, shall choose to not use, fight against, or otherwise aim to change. Doseiai2 (talk) 15:06, 20 May 2012 (UTC)

[after edit conflict] Wiktionary promotes neither hatred, nor any other thing. It describes them. And I see no reason for descriptions of hatred to be presented in a fundamentally different way from descriptions of love, joy, logic, science, fantasy, or anything else. (We can, and do, mark offensive terms as such, but somehow I don't think that's what you mean by "hatebox"?) —RuakhTALK 15:18, 20 May 2012 (UTC)

We are not "promoting" hatred and are not about to politicise our inclusions policy to satisfy your personal biases about words. Equinox◑ 15:21, 20 May 2012 (UTC)

We do not have any hateful entries (as mentioned by previous editors). We have quite a few poor quality ones, but that's because we have a shortage of good editors. SemperBlotto (talk) 15:53, 20 May 2012 (UTC)

Is Ungoliant saying that all pejoratives are hateful? Because some of them are considerably less PC than others. Siuenti (talk) 16:11, 20 May 2012 (UTC)

Yeah, I don't think "pejorative" is at all the same as what the OP has in mind. "Child prostitute", for example, is not pejorative. The OP means something closer to "terms that I would outlaw if I could". —RuakhTALK 16:23, 20 May 2012 (UTC)

Of course, my list of "terms that I would outlaw if I could" is probably quite different from the OP's. I would like to outlaw irregardless and lay when it means lie and beg the question when it means "raise the question", and I certainly consider these expressions hateful. This is why being descriptive is better than being prescriptive: because it's so much easier to be objective. —Angr 16:40, 20 May 2012 (UTC)

Well, the OP specified that "we, as intelligent human beings" will determine what is hateful. Presumably if you disagree with his or her judgments, then you're booted from the "we" — and perhaps also from "intelligent" and/or from "human beings", I'm not sure — but regardless, it's clear that (s)he is not asking for your opinion in that case. —RuakhTALK 17:45, 20 May 2012 (UTC)

The potential problem is worse than that, IMO. I would rather we did not depend on a vote of a majority of users, registered contributors, admins, or other solons either. The norm of descriptiveness at Wiktionary is also a defense against a "tyranny of the majority", albeit a weak one. DCDuringTALK 18:00, 20 May 2012 (UTC)

You say 'okina, I say ʻokina...

I recently "discovered" that all of my Hawaiian edits using the glottal-stop character [ʻokina] were different from pretty much all of the existing entries, because I was using an apostrophe, and the others were using a different character that's more like a curved single quote. In other Polynesian languages, however, our entries almost all use the apostrophe. Mostly that's because Metaknowledge has been adding a lot of entries, and uses the one that's easiest to find on the keyboard.

Looking into it further, though (see w:ʻOkina), it seems that even those countries that have adopted the curved quote as their standard still have most native speakers using the apostrophe rather than the curved quote for communicating online. Although all of the languages that have made an explicit choice between the two have gone with the curved quote, it looks like most of the actual usage (except, perhaps for Hawaiian) goes with the apostrophe.

This may seem academic, but our Wiktionary search won't find the curved quote if you type in an apostrophe, and vice versa. So far, I don't believe I've seen dueling apostrophe/single-quote entry pairs in the same language, but it wouldn't surprise me.

It looks to me like it boils down to this:

The curved quote looks better and is the designated character for some languages, with no Polynesian languages designating the apostrophe. For Hawaiian, there may very well be a consensus to use it to the exclusion of the apostrophe.

The apostrophe is much easier to find on the keyboard, and is much more commonly used. Indeed, many Polynesian languages have no standard, either way. A non-native user is not likely to use the curved quote unless someone tells them to, or unless they're copying-and-pasting from an online source (I'm not sure how "smart quotes" figure into this, though). I should mention also that non-Polynesian languages with equivalent characters seem to use the apostrophe for entries here, and some of their terms are spelled the same as those in some Polynesian languages.

I can see several likely ways (among many) of proceeding:

Decide to do nothing

Require the apostrophe for entry titles, but allow display of the single quote via "head=" parameters and the like.

Allow different languages to set different standards (though many don't have their own regular contributors, so there may be no one to make the decision).

Make the two characters interchangeable and indistinguishable in the Wiktionary search, etc, so users can create entries with either and it won't matter.

I just spent a bit of time converting my Hawaiian entries to the curved quote, so I'd prefer an actual decision to minimize changing things back and forth later (once more for a new standard would be ok). Chuck Entz (talk) 23:33, 20 May 2012 (UTC)
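The fourth option above, treating the characters as interchangeable for lookup, could be sketched roughly as follows. This is only an illustration of the idea and not how the Wiktionary search works; the names `APOSTROPHE_LIKE`, `FOLD_TABLE`, and `search_key` are hypothetical, and the set of characters folded is an assumption that would need to be agreed on.

```python
# Apostrophe-like characters folded to the plain ASCII apostrophe (U+0027).
# The choice of characters here is an assumption made for this sketch.
APOSTROPHE_LIKE = {
    "\u02BB": "'",  # MODIFIER LETTER TURNED COMMA (the ʻokina)
    "\u02BC": "'",  # MODIFIER LETTER APOSTROPHE
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}
FOLD_TABLE = str.maketrans(APOSTROPHE_LIKE)


def search_key(title: str) -> str:
    """Return a lookup key in which all apostrophe-like characters match."""
    return title.translate(FOLD_TABLE)


# A title typed with the apostrophe and one typed with the ʻokina
# produce the same key, so either form would find the same entry.
typed = "Hawai'i"
stored = "Hawai\u02BBi"
print(search_key(typed) == search_key(stored))
```

The entry title itself would stay as created; only the key used for matching is folded, so the displayed form can still follow whatever standard a language adopts.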

As it is, I would prefer the apostrophe ( ' ) for every such language except Hawaiian, where I think a curved quote ( ʻ ) is more appropriate. However, that is simply what the current de facto state of affairs seems to be. The fourth option, a technical solution, is the real way to deal with it, if we can. In the meantime, we should use redirects and {{also}}s to come as close to that as possible.

However, I already made an informal decision of sorts (as you said, based on my keyboard) that would be very hard to reverse now. I would welcome others' opinions, especially because this could be bot-mediated, however consensus goes. --Μετάknowledgediscuss/deeds 04:02, 21 May 2012 (UTC)

This is a quandary for a number of languages. Hawaiian is simple: the use of ʻ is standard. For Navajo, we use ʼ. But for other Polynesian languages, Athabaskan languages, Algonquin languages, and Romance languages, it seems like everybody does it differently. For the French, we use the straight ', while French Wiktionary and Wikipedia use ’. The straight ' is the most common because it’s the easiest to input, but sometimes I see different ones, even `. —This unsigned comment was added by Stephen G. Brown (talk • contribs) at 07:11, 21 May 2012‎ (UTC).

I don't think French is comparable. The question about the encoding of what's unmistakably the apostrophe should have little impact on the encoding of the okina or any other case where the character is not simply the apostrophe.--Prosfilaes (talk) 09:12, 21 May 2012 (UTC)

Sure it's comparable. What we do in either case depends on whether we decide to prefer more correct and specific forms, in terms of typography, Unicode, and/or national or international standards, or prefer more generalized and easy-to-enter forms. The Unicode code point for the ʻokina (modifier letter turned comma U+02BB) can be decomposed as left single quotation mark U+2018, which can be decomposed as an apostrophe U+0027. The latter can be used to type both French c'est and Hawaiian Hawai'i – correctly, but lacking the distinction of c’est vs. Hawaiʻi. So these can all be considered equivalent, and it would be nice if we had a general policy on how to resolve these cases. (And this reference work would deserve more respect if we had a policy favouring precision, correctness, and refinement.) —MichaelZ.2012-05-22 17:49 z

I don't think "would deserve more respect" counts for much, or is necessarily true. We should try and balance pedantic correctness with usability and completeness. The second requires that people can find and link to entries, and the third requires that they can create acceptable entries.--Prosfilaes (talk) 20:59, 22 May 2012 (UTC)

The second is satisfied using redirects and alternative form-of entries. The third is satisfied by welcoming any constructive edits and continuing to improve them. This is all routine.

Currently balancing “pedantic” correctness is a persistent lack of consistency in what would be considered elementary proofreading in any other publication. What I mean is that we can't agree to clean up our sloppy dictionary, whether that means preferring plain ASCII or good typography. —MichaelZ.2012-05-22 22:36 z

Would not redirects be a good and feasible solution (with a full page when the page would have to redirect to two different pages, if the case arises)? Lmaltier (talk) 21:11, 22 May 2012 (UTC)

to

I don't know whether this is the correct forum for this, but I think many of the "translations" for to as infinitive marker are off the point, if not plain wrong. For example in Spanish the infinitive of a verb always ends with -ar, -er or -ir, but that's not an infinitive marker in the same sense as to:

I want to read

Yo quiero leer.

If I was to define this, I would say there's no infinitive marker in Spanish, but there are three different infinitive endings. A similar comment applies to Catalan, French, Italian, Portuguese and Russian, probably also to Armenian, Azeri, Croatian, Czech, Esperanto, Greek, Hungarian, Ottoman Turkish, Persian and Turkish, but I don't know enough of those languages to say one thing or another. The Finnish entry used to list a dozen ways to end a verb, but I already changed that. A simple "does not exist" would probably be sufficient for Finnish and the other languages listed above. The German and Icelandic translations take a combined approach. These languages use an infinitive marker (zu and að respectively), but the standard infinitive ending (these languages only have one) is also shown without explanation, as if it were an alternative infinitive marker. The Romanian translation seems to be composed in the same manner, but again, I don't know enough to be sure. From this second group of languages, I would remove the verb endings and leave only the infinitive marker. --Hekaheka (talk) 05:11, 21 May 2012 (UTC)

I disagree. According to w:Marker (linguistics), suffixes are also used as markers. In English the infinitive marker happens to be a standalone word, and in some other languages it isn’t, but I don’t think that is a good reason not to have them as translations. Ungoliant MMDCCLXIV 05:39, 21 May 2012 (UTC)

I think it's more complicated than that. In Dutch, infinitives end in -(e)n, but in some cases it uses te in a similar way to English. Similar, but definitely not all cases. So from the English perspective, Dutch usually uses a bare infinitive (-en) and sometimes uses an infinitive with a particle (te ...-en). For example, I want to read is 'Ik wil lezen' with no particle. But 'I have little to eat' is 'Ik heb weinig te eten' with the particle. Without 'te' in that second case, it would translate to 'I have little food' which is not quite the same (because eten can also be a noun). Compare 'Ik heb weinig water' "I have little water" and 'Ik heb weinig te doen' "I have little to do" where 'doen' can't be a noun and 'water' can't be a verb. —CodeCat 12:07, 21 May 2012 (UTC)

I see that the choice of the verb "to want" as example was not the best possible. In Dutch like in many other Germanic languages the verb willen (“to want”) is considered a modal auxiliary and therefore "te" is not required. Another complication is that in Spanish certain verbs require "a" or "de" between the verb and the infinitive in contexts where English would use "to". The French equivalent is "de" and in Italian it is "di". Curiously, we do not currently mention them. --Hekaheka (talk) 20:00, 21 May 2012 (UTC)

French, like Spanish, uses à sometimes, as in « c’est difficile à voir » ("it's hard to see"), « quelque chose à voir » ("something to see"), etc. But these don't really serve to mark the infinitive, IMHO; it's just that Standard English forbids to-infinitives from being introduced by prepositions, whereas French and Spanish allow it. Plenty of other prepositions, such as pour/para ("for") and après/después de ("after"), can also take infinitives as complements. —RuakhTALK 20:35, 21 May 2012 (UTC)

I'm not so sure that's what the distinction is. I would say that "to" is a preposition along the same lines as à, and English is only different in allowing just the one. Chuck Entz (talk) 04:55, 22 May 2012 (UTC)

I'm sure that historically that was the case, but I think it's hard to explain the "to" in "To err is human" as a preposition. In French and Spanish, an infinitive can function roughly as a noun, and can be governed by the same sorts of prepositions as true nouns; in English, it's the to-infinitive that functions roughly as a noun, but can't be governed by the same sorts of prepositions as true nouns. (Note 1: English has a different non-finite verb form, the gerund-participle, that can function even more like a noun, and can be governed by a preposition: "his love of trying new things", "due to not wanting to cause problems", and so on. Note 2: All of the above is about Standard English. Some dialects do have certain constructions where a to-infinitive is introduced by a preposition, as in "going to Louisiana for to see my Suzie-Anna".) —RuakhTALK 11:31, 22 May 2012 (UTC)

You're right. Historically, the to-infinitive governed the dative case of the infinitive, so it was a noun. And the "for to" construction has a parallel in Dutch "om te", where "om" is optional in some cases. —CodeCat 11:52, 22 May 2012 (UTC)

It's very easy for West Germanic languages: The infinitive has an infinitive-ending, which is -en for continental languages and zero for English. English uses 'to' to mark the infinitive in order to compensate for this lack of ending. (Cf. North Germanic languages, which use 'at'.) These languages share the same infinitive-constructions. Some of these constructions require 'to', some do not. When it is required, to/te/zu is considered part of the infinitive. Only in English, since the infinitive is always used with 'to', you might notice the difference less often:

doen / tun / to do

Ik moet doen / Ich muss tun / I must do.

Ik hebbe dit te doen / Ich habe dies zu tun / I have this to do.

The translations for West Germanic languages could be split, since one is the plain infinitive ending and the other the marker of an infinitive in a grammatical construction. Korn (talk) 22:21, 21 May 2012 (UTC)

I think you meant heb, hebbe is the subjunctive... —CodeCat 22:26, 21 May 2012 (UTC)

If I remember correctly, there are actually quite a few constructions that use the infinitive without "to" in English, but they're hard to spot because there's so little distinctive inflectional morphology to help in figuring out which grammatical form you're looking at. Chuck Entz (talk) 04:47, 22 May 2012 (UTC)

If Germanic languages are anything to go by, then any construction with a modal verb must use the infinitive -- I will go, or I must do as above. -- Eiríkr Útlendi │ Tala við mig 06:33, 22 May 2012 (UTC)

Also, I think, the second verb in many other two-verb constructions: "make him stop", "let them have it". (But compare "get him to stop", "allow them to have it", where to is retained.) Equinox◑ 09:03, 22 May 2012 (UTC)

I think that's more a general tendency to not have two verbs conjugated for the same subject in the same phrase. I'm sure other Indo-European and non-IE languages have similar restrictions. Finnish for example has the same restriction, although it also has verbs not conjugated for subject but still for tenses and moods: menisin "I would go", en menisi "I would not go". But that's a special case use with negation. With other auxiliary verbs the construction is like in Germanic languages, using the second verb in the infinitive: tahdon mennä "I want to go". —CodeCat 11:30, 22 May 2012 (UTC)

I would propose to add the following three definitions (or similar ones) and distribute the translations accordingly. The brackets are just here to show examples of necessity and are not part of the definition:

The infinitive-marker of the English language, indicating that the verb is not inflected in any way. ((present) infinitive-marker=de '-en', lat '-ire')

The infinitive-marker, indicating the lexical form of an English verb. (marker of lexical form=de '-en', lat '-eo')

A particle used with an infinitive in some constructions. (grammatic particle=de 'zu', fr 'à') Korn (talk)

The list of entries in the 'active' part of the wanted list (the part that gets shown in pages like recent changes and watchlist) is rather small and it could be easily extended. The advantage is that if there are more entries listed, the chance that someone will know and be able to create at least one of them is higher, which in turn increases the speed at which the list is processed. Not a bad thing I think? —CodeCat 12:53, 21 May 2012 (UTC)

It always used to be a little longer, and could be extended a little (but not so much that it goes onto two lines). I'll increase it to the maximum for my (restricted) screen setup. SemperBlotto (talk) 12:58, 21 May 2012 (UTC)

Your screen is not very wide then. (But I have a HD screen so I can't really compare...) —CodeCat 13:40, 21 May 2012 (UTC)

The physical screen is reasonably wide but, due to bad eyesight, I have text larger than normal. SemperBlotto (talk) 06:49, 22 May 2012 (UTC)

Is it possible to show some kind of bar above the list as a gauge to judge how wide the list can be? I've added one now but can you adjust the width so it fits your screen? (Keep in mind that the text is shown smaller on recent changes than it is on the wanted entries page) —CodeCat 12:16, 22 May 2012 (UTC)

Longest words

What are the longest words in Wiktionary (i.e. longest entry titles, omitting anything with a space)? Is there an easy way to list them? Equinox◑ 16:16, 21 May 2012 (UTC)
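One possible way to list them offline is sketched below in Python. It assumes you already have the entry titles as plain strings, for example one per line from a title listing; that input, and the function name `longest_titles`, are assumptions made for illustration and not an existing Wiktionary tool. Titles containing a space are skipped, and so are titles containing an underscore, since title listings often write spaces that way.

```python
import heapq
from typing import Iterable, List


def longest_titles(titles: Iterable[str], n: int = 10) -> List[str]:
    """Return the n longest single-word titles, longest first.

    Hypothetical helper: `titles` is any iterable of entry titles,
    e.g. the lines of a title listing (an assumption, not a real API).
    """
    candidates = []
    for raw in titles:
        title = raw.strip()
        # Skip empty lines and multi-word titles ("_" often stands for a space).
        if not title or " " in title or "_" in title:
            continue
        candidates.append(title)
    # Length is measured in Unicode code points, not bytes.
    return heapq.nlargest(n, candidates, key=len)


if __name__ == "__main__":
    sample = [
        "cat",
        "ice cream",
        "antidisestablishmentarianism",
        "floccinaucinihilipilification",
        "New_York",
    ]
    for title in longest_titles(sample, 2):
        print(len(title), title)
```

Measuring length by code points rather than bytes keeps non-Latin entries from being over-counted; whether combining characters should count separately is a judgment call this sketch does not settle.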

What is most important?

The requested entries page is very long and there is no sense of what is the most important. Perhaps a team of people could maintain a top twenty 'most wanted' list. I appreciate that there is some subjectivity but I think there could be consensus on which missing words/phrases users of the site would be most likely to want to look up.

Template:wikipedia in a Mobile View

When I click the "Mobile view" link at the bottom of an article that has a {{wikipedia}} template, the mobile view comes up fine, but when I click the Wikipedia link in the mobile view, the regular Wikipedia article is displayed instead of the mobile version. Is this how it should work? --Panda10 (talk) 19:47, 27 May 2012 (UTC)

The best solution would I think be for the WMF folks to make w: links from our mobile site go to the mobile WP site.​—msh210℠ (talk) 18:32, 29 May 2012 (UTC)

Foreign language "sums of parts"

One of the Greek translations of vomit is κάνω εμετό ≡ "I make" + "vomit". On Google it appears to be more common than the one-word terms εμώ or ξερνώ. How do other people enter such sums of parts in translation sections? — Saltmarshαπάντηση 05:37, 28 May 2012 (UTC)

{{onym}} seems to do the job but is "rfd'd" (but is not the same as {{l}}). I had assumed that {{t}} was preferred for translation sections. — Saltmarshαπάντηση 05:07, 30 May 2012 (UTC)

{{t}} is preferred because it provides a link to the entry on the foreign-language Wiktionary; but for an SOPism we don't want such a link. —RuakhTALK 11:51, 30 May 2012 (UTC)

I have used this for Finnish translations, which are SoP in Finnish: [[word1#Finnish|word1]] [[word2#Finnish|word2]] {{t|fi|wordn}}. Then I have added to some or all of the Finnish "wordX" entries a link to the English entry either as separate sense or a usex. Example: fi translation of the verb to kayak. --Hekaheka (talk) 06:47, 30 May 2012 (UTC)
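The pattern described above (plain section links for the leading words, {{t}} only on the last word) could be generated mechanically. The following Python sketch is only an illustration; the function name `sop_translation` and its parameters are made up here and are not part of any Wiktionary tool.

```python
from typing import Sequence


def sop_translation(words: Sequence[str], lang_code: str, lang_name: str) -> str:
    """Build wikitext for a sum-of-parts translation.

    Hypothetical helper: every word except the last becomes a plain
    section link, and the last word is wrapped in {{t}}.
    """
    if not words:
        raise ValueError("at least one word is required")
    # Plain links such as [[word1#Finnish|word1]] for the leading words.
    links = [f"[[{w}#{lang_name}|{w}]]" for w in words[:-1]]
    # The final word keeps the {{t}} template, as in the pattern above.
    links.append("{{t|" + lang_code + "|" + words[-1] + "}}")
    return " ".join(links)


if __name__ == "__main__":
    print(sop_translation(["word1", "word2", "wordn"], "fi", "Finnish"))
```

Which word should carry {{t}} is an editorial choice; this sketch simply follows the order given in the comment above.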

Things I don't know what they are

Grammar question. Earlier I caught myself saying "things I don't know what they are" (meaning things whose identity/nature I do not know). Then I started to wonder about the correct way to say that, if in fact there is one.

"Things I don't know what are" seemed plausible, but sounds like an alien trying to learn English. ("Things I don't know who owns" is fine, but apparently this relates to word order: "who owns them" vs. "what they are". Compare "things I don't know why exist", which also sounds bizarre.)

Your formulation is not uncommon at Google Books, but somewhat more common is punctuation indicating construal of "I don't know what they are" as in apposition (set off by commas) or isolation (dashes) to "things". Not so common is insertion of which or that. DCDuringTALK 03:09, 29 May 2012 (UTC)

I don't think there's a grammatical way to say it: "things" is the implicit subject of the clause ([things] are [y]) which forms the object of the clause (I don't know [x]) which modifies it, and all together would form the object of whatever the main clause would be (I think- it's all very complicated and tangled). You have to pass that connection through multiple layers of nesting, subject-object inversion and pronoun reference. The brain's grammatical processes can't keep it all straight. Using apposition, at least you're severing one of those grammatical links, which simplifies the whole mess a bit- but it still doesn't quite work. Chuck Entz (talk) 05:02, 29 May 2012 (UTC)

I'm not sure about that article's analysis: "That's the girl that I don't know what she did." looks pretty ungrammatical to me. Chuck Entz (talk) 05:56, 29 May 2012 (UTC)

I agree, most of those examples look very ungrammatical to me. In Arabic, however, the resumptive pronoun is standard fare: البنت التي رأيتني معاها حبيبتي — "the girl that you saw me with her is my girlfriend." (The resumptive her is obligatory.) —Stephen(Talk) 06:43, 29 May 2012 (UTC)

In my experience, native English speakers often generate resumptive pronouns in this situation, where a gap would violate one of Ross's island constraints. When such a relative pronoun occurs in conversation, no one ever seems to notice anything odd about it; but I'd avoid it in edited prose. Unfortunately, the only way to avoid it is to completely recast to avoid an island: maybe "things I don't recognize" or "things I can't identify"? —RuakhTALK 12:28, 29 May 2012 (UTC)

Things that I don't know what they are are crawling on me sounds much better to me than things I don't know what they are are crawling on me.​—msh210℠ (talk) 18:41, 29 May 2012 (UTC)

By the way, I find ?"things I don't know who owns", meaning "things that I don't know who owns them", to be about as bad as *"things I don't know what are". I'd be O.K. with it if it were read as "things {I don't know who} owns", that is, as synonymous with "things owned by I-don't-know-who", but I take it that that's not the reading you have in mind? —RuakhTALK 13:59, 29 May 2012 (UTC)

Use of "sense" template in antonyms section.

In antonyms sections we often use the {{sense}} template to specify which sense we are antonyming. This specifies the sense of the word itself, but people very often think it should specify the opposite sense (that of the antonym) and change it. I am forever having to revert these changes (always without a block). Is there anything we can do to make the situation clearer for our users?

One obvious thing to do is not use the "sense" template when the term only has a single sense. I am removing these as I come across them. SemperBlotto (talk) 15:46, 29 May 2012 (UTC)

Creating a redirect with a name like {{antonym of}} or {{opposite sense to}} might help. At least then users might come across it while editing and realise they're making a mistake. Otherwise, the only thing that might help would be the awkward, cumbersome task of having a bot add a hidden comment like <!--The {{sense}} template indicates what meaning of <WORD> each entry is the antonym of. It does not define the antonym itself-->. That's possibly using a sledgehammer to crack a nut, though. Smurrayinchester (talk) 17:25, 29 May 2012 (UTC)
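For what it's worth, the text-processing half of such a bot task is small. Below is a rough Python sketch of it, assuming the entry's wikitext is available as a string and that antonym sections use a heading of the form ====Antonyms==== (three to five equals signs). The function names are invented for illustration, the hidden comment uses the standard <!-- ... --> syntax, and fetching or saving pages is deliberately left out.

```python
import re

# Matches a heading line such as "====Antonyms====" (3 to 5 equals signs).
ANTONYMS_HEADING = re.compile(
    r"^(={3,5})[ \t]*Antonyms[ \t]*\1[ \t]*$", re.MULTILINE
)


def hidden_note(word: str) -> str:
    """Return the hidden comment text for the given headword."""
    return (
        "<!--The {{sense}} template indicates what meaning of "
        f"{word} each entry is the antonym of. "
        "It does not define the antonym itself-->"
    )


def add_antonym_note(wikitext: str, word: str) -> str:
    """Insert the hidden comment under each Antonyms heading.

    Hypothetical helper: returns the text unchanged if the note is
    already present, so running it twice is harmless.
    """
    note = hidden_note(word)
    if note in wikitext:
        return wikitext
    return ANTONYMS_HEADING.sub(lambda m: m.group(0) + "\n" + note, wikitext)


if __name__ == "__main__":
    entry = "====Antonyms====\n* {{sense|not light}} [[light]]\n"
    print(add_antonym_note(entry, "dark"))
```

The already-present check matters for a bot, since it may revisit the same entries on later runs.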

I agree with SB, and I like SMurray's suggestion for multisense PoSes. DCDuringTALK 18:49, 29 May 2012 (UTC)

SMurray's idea does have the problem of inconsistency with the universal pattern, used not only in Synonyms, but also in Hypernyms, Hyponyms, Coordinate terms, and some other nym sections. DCDuringTALK 18:53, 29 May 2012 (UTC)

I might be the only one, but I think any solution we come up with here might be inferior to just reverting the offending edits. I like things as they are, at least until someone comes up with a better solution. Mglovesfun (talk) 20:34, 29 May 2012 (UTC)

Perhaps we could just have the word of in front of {{sense}}. For example, in dark:

I also disagree that we shouldn’t use {{sense}} in single sense terms. Because when someone adds a new definition it will be hard to know which sense the -onyms are referring to. I usually add sense to one-sense terms unless it’s unlikely to have more definitions. Ungoliant MMDCCLXIV 17:18, 30 May 2012 (UTC)

Hear, hear, perhaps editing {{sense}} rather than removing it is the way to go. Mglovesfun (talk) 16:38, 31 May 2012 (UTC)

"Getting" the database?

Is the database available for someone to have to incorporate into a software project?

If my question raises philosophical questions, then here is my defense: If something is public domain, it should be by that definition available for inclusion in any kind of project, profit or not. If there are useful things made available for free (public domain or CC-BY-SA-NC), then people of good will should respond by donating towards its maintenance. But the two issues are completely independent.

Voting for oneself

A little rule I think we ought to have: "any user whose user rights are the subject of a vote (i.e. for sysopping/desysopping, checkusering/decheckusering, bureaucrating/debureaucrating, and blocking/unblocking) may not cast a vote in support or opposition." We could just modify Wiktionary:Votes/header to change this. So, a couple questions:

In that case I support having such rule and I think it IS important enough for a vote. Ungoliant MMDCCLXIV 18:14, 31 May 2012 (UTC)

I naïvely thought this was already the case: it just makes sense, ethically, that one would not vote on issues with such clear possibilities for conflict of interest. (Not sure about the consensus or if this requires a vote; just stating my support for such a restriction.) -- Eiríkr Útlendi │ Tala við mig 16:24, 31 May 2012 (UTC)

I don't really see the harm in it. Mglovesfun (talk) 16:34, 31 May 2012 (UTC)

Actually, out of the three cases I cited, the one where you voted against your own desysopping was the only one where I think it was appropriate to do so. However, there was so much community support that it didn't matter in the least. As for the other two, Connel is extremely inactive and Wonderfool is extremely disruptive, and I think their votes should not have been counted. --Μετάknowledgediscuss/deeds 16:41, 31 May 2012 (UTC)

That to me isn't a good argument to support blanket banning (i.e. banning entirely with no exceptions) editors voting for/against themselves. In the two cases you've just cited, there were two other reasons why it wasn't a good idea. Mglovesfun (talk) 16:45, 31 May 2012 (UTC)

I think the real reason is what Eiríkr said above. I was just explaining how, in each case, the votes were inappropriate or ineffective. If you think blanket banning is a mistake, what parameters would you put on this? --Μετάknowledgediscuss/deeds 17:14, 31 May 2012 (UTC)

None, other than the rules we already have, one vote per person seems fine to me. Mglovesfun (talk) 18:17, 31 May 2012 (UTC)

I don't see the harm in voting for oneself. Is the person who's the subject of a vote not an editor also? Is his view worthless? Is everyone else unbiased? Compare governmental elections and recalls, where the candidate/officeholder (respectively) can vote as well as any other citizen.—msh210℠ (talk) 18:22, 31 May 2012 (UTC)

I agree 100%. Voting for yourself to receive a right would be in rather poor taste, but I don't see why it should be forbidden. Voting for yourself not to lose a right is more interesting, but I don't think we need a special rule there. Actually, those votes are too murky in general: the latest vote was closed with the assumption that we need a strong majority to revoke privileges, but it's been argued in the past that we should actually need only a strong minority (say, perhaps, 40%), on the grounds that someone shouldn't have privileges if they don't have the community's strong support. (I don't think this vote was closed wrongly, mind you. But the question of whether Connel should have been allowed to vote seems like a very minor point compared to the rest of it.) —RuakhTALK 18:48, 31 May 2012 (UTC)

I think the trust implicit in increased user rights is coming from the rest of the community. That leads me to believe that it is the rest of the community who should decide. Moreover, community biases run both ways; personal biases are always in favor of oneself. In any case, most governmental elections involve such large numbers that it doesn't make a difference; around here, admins get created with a handful of votes.--Μετάknowledgediscuss/deeds 19:07, 31 May 2012 (UTC)

I don't see any harm in self-voting. I'm sure it is only a matter of time before my own desysopping is put to a vote. (Not sure which way I'd vote on that, but I would probably have an opinion) SemperBlotto (talk) 07:21, 1 June 2012 (UTC)

I suppose I am in favour of this, but from the kinds of vote I've seen on Wikt I don't think it is a big deal or a priority. Equinox◑ 21:03, 1 June 2012 (UTC)

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.