Add 'sum of parts' as a deletion reason?

I just deleted the entry chronological order which was badly formatted, but more importantly it was (to me) obviously SoP. But I couldn't really find any deletion reasons that reflect this adequately. We already have one for protologisms that don't meet CFI, so would it be ok to add a reason for SoP entries, which also don't meet CFI? —CodeCat 23:59, 1 October 2012 (UTC)

I don't think SOP should be a speedy deletion reason, i.e. deletion without discussion. There are too many cases at RFD where the nominator believes something to be blatantly and obviously SOP, but new evidence is brought forward that it isn't after all. I think there should be a discussion in cases of suspected SOP. —Angr 09:15, 2 October 2012 (UTC)

I think "not dictionary material" covers it. (Angr is right, though, that "chronological order" warrants discussion. It is SOP, period, end of story, no doubt about it, but not all editors agree that all SOP terms merit deletion, and this certainly seems like the type that some will argue for. Adjective+noun and noun+noun combinations always have supporters. And actually, for that matter, *I* don't agree that all SOP terms merit deletion, either, though in this case our entry for chronological is completely sufficient.) —RuakhTALK 11:38, 2 October 2012 (UTC)

Ok, but would there be a way to mark such entries without a rather blatant RFD? I don't really feel that all SoP entries should go either, because sometimes they are common collocations that translate differently into other languages. So I don't really want to nominate them for deletion, just for discussion and only possible deletion depending on the outcome. —CodeCat 11:52, 2 October 2012 (UTC)

I think it is a good addition to the list of reasons, because it is actually the motivation for some deletions. Properly applied, it is appropriate. Having the deletion reason displayed would give us a basis for re-examining some of the deletion choices to save the occasional questionable deletion. For example, see chronological order at OneLook Dictionary Search, which shows that RHU and a travel industry glossary have the term. I couldn't predict how an RfD discussion would turn out.

I assume that patrollers make deletion decisions based on a quick intuitive weighing of many considerations, including formatting quality and guesses about attestability and SoPitude. The single stated reason should be the one that best educates the contributor. If "SoP" causes a contributor to challenge the deletion more intelligently than "not dictionary material", that would probably be a good thing. DCDuringTALK 11:57, 2 October 2012 (UTC)

That seems like pretty good wording. DCDuringTALK 12:25, 2 October 2012 (UTC)

Not really related to the topic, but wouldn't it be possible to easily and automatically flag all SOP entries for checking when they are created, and keep them flagged until they are older than a certain age? It could make the process of separating the appropriate ones from the invalid ones easier. -FedsoTALK 15:22, 2 October 2012 (UTC)

The translation is not always the sum of the translation of the parts, though. I think Wiktionary could benefit from having more phrases. --LA2 (talk) 05:35, 13 October 2012 (UTC)

This is a dictionary. Not a universal language translator. --WikiTiki89 (talk) 10:00, 13 October 2012 (UTC)

Isn't the point of "all words in all languages" to facilitate translation? I agree that we should not have "this is a dictionary", but surely we should have some phrases which would border on SOP to an English speaker, but which translate differently to other languages or other language-speakers. bd2412T 01:43, 15 October 2012 (UTC)

Latin deponent verbs

Deponent verbs are verbs that only have passive forms, but have a meaning that is closer to active or mediopassive. I just came across diff and it made me think. This edit isn't technically wrong, but why was it not shown as passive in the first place? Why do we treat these verbs as having active forms, when they are morphologically passive? —CodeCat 17:42, 3 October 2012 (UTC)

Sorry, I just noticed this thread. I reverted it, because yes, the anon's edit is technically wrong. Wiktionary, as well as Latin scholarship, has a working consensus that deponent verbs are "passive in form but active in meaning" (that phrase in particular was drilled into my head). Morphology is a neat tidbit, and it helps us in terms of etymology, using inflected forms, &c., but it's essentially unimportant in terms of the meaning. labaris is active because it's used in an active sense. The fact that it looks as if it's passive doesn't change that fact. --Μετάknowledgediscuss/deeds 01:26, 6 October 2012 (UTC)

St versus St.

I created an entry for St Martin's summer. When I created it, I found pages that omitted the period, so I omitted it, but afterwards, I found "St. Luke's summer." There doesn't seem to be consistency, as the St. Luke's summer entry mixes the period use with non-period use. My understanding is that UK usage generally omits this period and US includes it. My citations for St. Martin's have a period. Is there a best way to handle this? --BB12 (talk) 18:47, 3 October 2012 (UTC)

I would go with the cites. Maybe we could get a bot to create either alternative form entries or redirects wherever they are missing. I wouldn't waste effort doing it manually. If there were a consensus to standardize, especially on the non-period version, that would be fine with me too. DCDuringTALK 18:53, 3 October 2012 (UTC)

That sounds good, thank you. I've moved everything to the period-ful page and made the period-less one an alternative UK spelling. --BB12 (talk) 20:11, 3 October 2012 (UTC)
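As a rough illustration of the bot idea raised above, the following Python sketch lists which period/period-less counterparts are missing from a set of page titles. It is only a sketch under assumptions: the function names `st_variants` and `missing_variants` are hypothetical, the titles are passed in as a plain list rather than fetched from the wiki, and nothing here creates entries or redirects.

```python
import re


def st_variants(title):
    """Return (dotted, undotted) spellings of a title containing 'St' or 'St.'.

    Only 'St' / 'St.' followed by whitespace is touched, so a word such as
    'Stone' is left alone.
    """
    undotted = re.sub(r"\bSt\.(?=\s)", "St", title)
    dotted = re.sub(r"\bSt(?=\s)", "St.", undotted)
    return dotted, undotted


def missing_variants(titles):
    """List variant spellings that do not yet exist among the given titles."""
    existing = set(titles)
    missing = set()
    for title in titles:
        for variant in st_variants(title):
            if variant not in existing:
                missing.add(variant)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical input: titles that already exist.
    for variant in missing_variants(["St Martin's summer", "St. Luke's summer"]):
        print(variant)
```

Whether each missing title should become an alternative-form entry or a redirect is a policy choice the sketch deliberately leaves open.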

Are both St. Petersburg and St Petersburg the alternative forms of Saint Petersburg? The former is currently a redirect to Saint Petersburg, the latter doesn't exist. I also think that when referring to St. Petersburg, Florida the dotted spelling is more appropriate than to Saint Petersburg, Russia (Sankt-Peterburg) - more commonly written in full. --Anatoli(обсудить/вклад) 04:13, 4 October 2012 (UTC)

Excellent example. [1] indicates that "St." is the correct form for the Florida city, and [2] that "St." is correct for the borough in Pennsylvania, but I find other forms on Google. The Wikipedia article for the Russian city has a mix of all three forms. I agree with your judgment about which ones to spell out, but it looks like all three forms are used for all three locations. --BB12 (talk) 08:02, 4 October 2012 (UTC)

Synonyms vs Abbreviations

Shouldn't we differentiate synonyms from abbreviations? For example on [[Санкт-Петербург]], all of the listed "synonyms" are really just abbreviations. The real "synonyms" would be Ленинград (Leningrad) and Петроград (Petrograd) (not sure about the latter but many people still use the former to refer to the city). --WikiTiki89 (talk) 08:34, 4 October 2012 (UTC)

There is an ====Abbreviations==== header, but I prefer just (abbreviation) as it keeps all the synonyms in one section. Mglovesfun (talk) 10:41, 4 October 2012 (UTC)

Wait, if a word is both a synonym and an abbreviation, can't we safely put it under both headers? --Dixtosa-wikified me 13:31, 4 October 2012 (UTC)

There are also entries with the header "Scientific names", which could (should, IMHO) be presented as synonyms. Extra headers consume (waste) vertical screen space. We still depend on people's judgment as to when an abbreviation is an adequate synonym in the usage they have in mind, just as we depend on them to select the best from a list of synonyms which are not abbreviations (or scientific names). DCDuringTALK 14:53, 4 October 2012 (UTC)

My point is that abbreviations are not worthy of being given the status of "synonym". I'd sooner call them "alternative forms". --WikiTiki89 (talk) 15:58, 4 October 2012 (UTC)

I agree, though it does depend on the abbreviation. Mr. and Mister are definitely only "alternative forms", not "synonyms", since they're both written representations of the same word; but BO and body odor are "synonyms", IMHO (as well as being "alternative forms"). —RuakhTALK 17:06, 4 October 2012 (UTC)

So should the rule be if they are abbreviated in spoken language also then they are synonyms? --WikiTiki89 (talk) 17:12, 4 October 2012 (UTC)

And otherwise they are Alternative forms? Will our users understand that logic? Will we need to slap them down for good-faith efforts to move things to fit whatever pattern they have experienced? DCDuringTALK 17:54, 4 October 2012 (UTC)

@Wikitiki89, DCDuring: Sorry, I was making an epistemic claim about reality, not a deontic claim about how we should format entries. Naturally it's nice when our entries bear some relationship to reality, but that's not the only consideration . . . —RuakhTALK 21:46, 4 October 2012 (UTC)

Just a point... you could also consider Mr. and Mister to be homophones, or alternative spellings like colour and color. —CodeCat 22:05, 4 October 2012 (UTC)

@Ruakh: We need words about what we hope is linguistic reality to help shape our norms, but we need to think about and discuss the likely realities of contributor and user expectations and behavior, too. DCDuringTALK 22:43, 4 October 2012 (UTC)

@CodeCat: I would not consider Mr. and Mister or color and colour to be homophones for the same reason I wouldn't call them synonyms. --WikiTiki89 (talk) 08:14, 5 October 2012 (UTC)

But they are pronounced the same. Whether they should be presented as homophones is a separable question. DCDuringTALK 12:42, 5 October 2012 (UTC)

Would you consider apple to be a homophone of apple just because they are pronounced the same? --WikiTiki89 (talk) 13:12, 5 October 2012 (UTC)

FWIW, it seems reasonable to say "Mr." and "Mister" are homophonous, but calling them "homophones" is odd. The same with "apple" and "apple," though that's like claiming that they rhyme. --BB12 (talk) 16:35, 5 October 2012 (UTC)

This is an example of a moderately frequent construction whereby a noun-based compound ("of new-time" = "modern") exists only in the genitive form, used adjectivally. I'm of two minds as to which PoS header to use: ===Adjective===, because such words are used only adjectivally ("modern")? Or ===Noun===, because the main word is a noun, and this is a genitive, indeclinable form (with "used only adjectivally" somewhere in the entry, perhaps as a usage note)? Or maybe something else, like ===Genitive adjective===? (I note that structurally comparable cases in English, like bug-eyed, were treated as invariable adjectives; that's what I'm doing also for the time being, but I thought it would be better to check with you guys in case there is already some consensus solution to this kind of problem.) --Pereru (talk) 11:10, 4 October 2012 (UTC)

If it is treated as a noun, it can have a usage note or {{context}} that says "only adjectivally". And if it is treated as an adjective, its origin as the genitive of an otherwise nonexistent noun can be noted in an etymology, perhaps by template (to standardise the message displayed, if there are many words like this). - -sche(discuss) 16:15, 5 October 2012 (UTC)

{{rfex}} currently does not display anything on the page. I think it should display something similar to {{rfdef}} so that people reading the page might see it and add an example. --WikiTiki89 (talk) 10:01, 4 October 2012 (UTC)

The wisdom of that depends on what we think of the typical quality of usage examples from those motivated by such a display. Even without such encouragement users produce "[X] is [headword]." as a usage example for an adjective headword.

Now we depend on those who have hidden categories visible, work the rfex list, or are visiting the entry to make other changes in the relevant PoS section. We should at least have some kind of link to a section of WT:ELE that explains or links to an explanation of what makes a good usage example. DCDuringTALK 14:46, 4 October 2012 (UTC)

Perhaps it should be visible the way {{attention}} (q.v.) is. —msh210℠ (talk) 03:45, 5 October 2012 (UTC)

The point isn't to be able to find it on the page but to get people who know the language or dialect and happen to be on that page to add an example. It would be better if anons could see it, although I guess we should provide a link to a guide for usage examples to help reduce the amount of junk. --WikiTiki89 (talk) 08:45, 5 October 2012 (UTC)

It's a cool tool, but the fundamental issue is probably one for WT:BP: Are usage examples among the things that we want to actively solicit from users? DCDuringTALK 12:36, 5 October 2012 (UTC)

When we really need a usage example, yes. I don't propose that we add this to every definition without a usage example, just to those that really need one. --WikiTiki89 (talk) 13:11, 5 October 2012 (UTC)

More or less: the issue is, with the template being placed as it is now, do we want to encourage less regular contributors to add usage examples? DCDuringTALK 13:39, 5 October 2012 (UTC)

I think in some desperate cases we do and in others we don't. Maybe we should have two templates? --WikiTiki89 (talk) 13:42, 5 October 2012 (UTC)

Either a switch in {{rfex}} or a new template. But, if there is a consensus that we want to solicit more input from more casual contributors, we wouldn't need that. DCDuringTALK 15:00, 5 October 2012 (UTC)

Phonemes and phones in dialects

Dutch has many dialects, the two mainstream ones being Netherlandic and Belgian Dutch. Both of them have the same phonemes, but differ in how some of them are realised. So a single phonemic representation should really be enough to cover them both. But that brings up the question of how to transcribe them. The phoneme that descended from Germanic /w/ is now [ʋ] in most of the northern Netherlands, [β̞] or [w] in the south and Belgium, and [w] in Suriname. Out of those, [ʋ] and [β̞] are quite common whereas a true [w] is rarer and is likely to be considered marked by the majority of speakers. So which symbol should be used for this phoneme? —CodeCat 20:41, 5 October 2012 (UTC)

What symbol do phonologists who describe Dutch use? We're not working in a vacuum; surely there are precedents for such a widely known and described language. —Angr 20:54, 5 October 2012 (UTC)

Why wouldn't we just give both the most common Netherlandic and the most common Belgian pronunciations, just as we give separate UK and US pronunciations of English words that contain /ɹ/ (or don't)? - -sche(discuss) 21:19, 5 October 2012 (UTC)

UK and US English actually do differ phonemically though. What is final -r in one is a long vowel in the other, which causes differences in homophony among other things. Netherlandic and Belgian Dutch are much more alike in that respect, they have the same sets of homophones and rhymes. As for the standard symbol... it seems to be [ʋ], but that's more out of the Hollandic traditional domination I imagine than to be a true representation of the language as a whole. And historically it isn't even the most conservative realisation, since the southern [β̞] is closer in articulation to the original [w]. Those same traditions also transcribe [x] and [ɣ] both as [χ], even though they are distinct in many of the dialects and actually have a more fronted palatal-like articulation in the south. So we can't really always rely on tradition. —CodeCat 21:28, 5 October 2012 (UTC)

What CodeCat wants, and I don't blame her, is to be able to give one single phonemic representation that covers all the dialects, and then give each dialect's phonetic representation separately, since most of the time the differences between the dialects are just in the realization of phonemes and not in the phonemic representation itself. Phonemes are abstract concepts anyway, so I say pick a symbol and go with it. The advantage to /ʋ/ is (if I understand CodeCat's answer correctly) that it's the symbol most widely used because of the Netherlandic bias of the field; the advantage to /w/ is that it is the historically oldest. Under the circumstances, my preference is for /ʋ/ if that's what people familiar with Dutch phone{m/t}ic transcriptions from other dictionaries are most familiar with. But it doesn't really matter what symbol we pick as long as we use it consistently in all relevant entries and at Appendix:Dutch pronunciation. —Angr 22:36, 5 October 2012 (UTC)

Got it. In that case I also prefer ʋ. I’ve read a little about Dutch phonology before I joined Wiktionary and it’s the only one I’ve ever seen used. — Ungoliant(Falai) 00:05, 6 October 2012 (UTC)

Poll: Part of speech of some number words

Should the part-of-speech headings of the senses of cardinal numbers for the words hundred, thousand, million, billion, milliard, billiard, trillion, and quadrillion be changed to "Noun"? Now, some of them use "Numeral", others use "Cardinal numeral", and others use "Noun". The question only pertains to English. Entries of other languages are not directly affected by this poll, as the grammar of the other languages that concerns these number words may be different.

You can use this in the poll: {{subst:agree}}. This is a poll combined with a discussion: each poll participant is welcome to explain their preference and to question the reasoning of other poll participants. --Dan Polansky (talk) 07:29, 6 October 2012 (UTC)

The listed number words should have the part of speech heading of the cardinal number sense changed to "Noun"

Agree Dan Polansky (talk) 07:29, 6 October 2012 (UTC) They take plurals and indefinite articles like nouns, so their part of speech is "noun". --Dan Polansky (talk) 07:29, 6 October 2012 (UTC)

Support —CodeCat 11:30, 6 October 2012 (UTC) 'Ten' can also take a definite article or a plural, but only when referring to something labelled ten, not for something that numbers ten. So 'ten' does have a noun sense, but it's separate from its numeral sense.

The listed number words should not have the part of speech heading of the cardinal number sense changed to "Noun"

Agree (and apologies in advance for the long winded reasoning): The fact a number behaves like a noun in some contexts does not make it a noun. Oxford Dictionaries Online (and, presumably, the OED) marks "million" as a cardinal number with a plural, "millions" (they don't explicitly give "millions" a part of speech, but presumably it's still a cardinal number). This makes more sense to me. Having some noun properties doesn't make a number a noun, just as having some properties of articles doesn't make a determiner an article. The cardinal numbers can sometimes behave as nouns ("a million", "hundreds"), but they also behave sometimes more like adjectives ("the twenty cakes"), sometimes like determiners ("Twenty cakes are better than no cakes" - to me this is their most important function), sometimes pronouns ("Ten were eaten."). Marking them as nouns loses all this nuance. Smurrayinchester (talk) 13:42, 6 October 2012 (UTC)

You do have some good points. Originally 'thousand' was a noun and a phrase like 'two thousand men' would have been said as 'two thousands of men', with 'two' modifying 'thousands' and agreeing with it in case and gender. It's obvious that this is not the case anymore. But we do still have 'a thousand men' with an article, which seems unique to the numbers that were originally nouns. So there is still a slight difference, although I'm not sure if that is enough to call it a separate PoS. —CodeCat 13:52, 6 October 2012 (UTC)

There is also 'thousands of men' which is not possible with the original 'true' numerals, so you can't say *tens of men. On the other hand you can say 'dozens of men' since dozen is/was originally a noun. This actually reminds me of Dutch where speakers have found a way around this limitation: *tens of men becomes tientallen mensen, literally 'decades of men' (decade being a multiple of ten), while hundreds of men is still honderden mensen with the plural of 'hundred'. —CodeCat 13:57, 6 October 2012 (UTC)

Ok that is true, I didn't think about it. It looks like noun-like properties like having a plural (tens) are gradually being extended to the original numerals, and numeral-like properties like not needing a plural when modified (two millions) are being extended to the original nouns. But still, it seems that 'tens of' isn't quite as common or easy to say as 'hundreds of'. It's like English is currently still stuck in the process of equalising the grammatical treatment of these words and hasn't quite completed it yet. —CodeCat 14:31, 6 October 2012 (UTC)

@Smurrayinchester: What are the grammatical properties of "hundred" that nouns do not have? Put differently, what makes "hundred" not a noun? "Twenty" is not within the scope of the question, as its grammatical properties differ from those of "hundred". Unlike "hundred", "twenty" does not require a determiner and it usually does not take a plural. The poll question is not about all cardinal numbers, only about those that need a determiner and usually take plural, as incompletely listed above: hundred, thousand, million, etc. --Dan Polansky (talk) 15:12, 6 October 2012 (UTC)

Apart from the consistency argument (it seems bizarre that "twenty" would be a cardinal number but "million" would not), million can directly modify a noun - "a million eggs" - in a way that most nouns denoting number do not - *"a gross eggs", *"a score eggs". This, to me, makes it a determiner. dozen admittedly does act that way - "a dozen eggs" not *"a dozen of eggs" - but a lot of dictionaries (Collins, Macmillan, Cambridge Advanced Learners) class that as a use of it as a determiner, not a noun (Oxford and Merriam-Webster do call it a noun, American Heritage calls it an adjective). Our entry at couple even draws a distinction between "a couple of eggs" (noun) and "a couple eggs" (determiner). Since "a million of eggs" is (in modern writing) wrong, but "a million eggs" is right, it seems sensible not to lump "million", "thousand", "hundred" etc in with the nouns. Smurrayinchester (talk) 15:52, 6 October 2012 (UTC)

Re: "consistency argument": if "twenty" and "million" have different grammatical properties, they do not need to have the same part of speech.

The missing "of" in "a million eggs" is a good point: this is a grammatical peculiarity. However, lumping "million" together with nouns such as "load" seems grammatically no more wrong than lumping "million" together with "twenty": the grammatical differences between the latter pair are larger than those between the former pair. --Dan Polansky (talk) 16:08, 6 October 2012 (UTC)

Hmm, I do see what you're getting at. I suppose "hundred", "thousand", "million" are not cardinal numbers on their own, but are what I've seen referred to as "place numbers" or "place values" - they are words that refer to the third, fourth and seventh digits of a number - and "place number" is not an accepted part of speech. (The reason old-fashioned number words like "score" ("three score years") and "dozen" ("five dozen eggs") act the same way is that they themselves were place numbers for the vigesimal and duodecimal folk number systems. "myriad" also seems like it can act in the same way.) I'd still prefer not to call this a noun; at least, not exclusively. In most uses, "million" seems more like a determiner than a noun to me, albeit one whose plural is a noun ("three million eggs" vs. "millions of eggs") and possibly a pronoun ("His farm produced too many eggs. Millions were thrown out each day."), with ten then being a hybrid that has properties of both cardinal numbers and place numbers (as Angr says, "tens of volts" is perfectly good English, as is "ten volts"). Still, I'll concede that they aren't just cardinal numbers the way "twenty" is. I've changed my "vote" accordingly. Smurrayinchester (talk) 19:16, 6 October 2012 (UTC)

It is easy to find "tens of men", even "forties of men".

There are numerous uses for all the simple number words where they are referring in some way to a number but are used as nouns. "To count by threes/sevens/hundreds/billions" would all be attestable, as would the use of all the simple number words with prepositions like in and as subject of clauses. In the teaching of arithmetic, one can find almost the whole range of grammatical possibilities for simple number words as nouns. "Add (a, the) seven and what do you get?"

To me, whatever else all these simple number words are (quantifier, a type of determiner? or adjective in traditional PoS terms), they are currently nouns in large classes of usage. And that is quite apart from their use to refer to objects associated with a given numerical value: "Can you give me two fives for a ten?" "I've got a full house, two sixes and three sevens". DCDuringTALK 19:40, 6 October 2012 (UTC)

I don't know, or I don't care or other option

Even a poll seems premature to me, though it may be necessary to stimulate the discussion.

I am reasonably sure that all of the simple English number words used, among other things, to indicate cardinal numbers are, at least sometimes, grammatically nouns. They form plurals. Their plurals can also be used in compound number words that they head to form plurals of compound number words: "They came in five hundreds".

The use of words like million has changed over time. For example, in the 19th century, five millions of dollars was much more common than five million dollars.

I am not sure how to present the quantifying function of any of these words.

As a dictionary, we need to present the number words without regard to the numbers they represent, except semantically.

I sometimes wonder whether we shouldn't have tables at the Translingual entries for numbers in symbolic form that show how they are represented in words in various languages, including English. DCDuringTALK 12:43, 6 October 2012 (UTC)

Comment: per DCDuring's second point, we should check if these words are multiple parts of speech. - -sche(discuss) 19:24, 6 October 2012 (UTC)

Other dictionaries' presentations fall into two classes: those calling twelve and million "noun" and "adjective" and those calling them "number". Only MacMillan and Cambridge (Advanced Learners and American English) call them "number". AHD, RHU, MWOnline, WNW, Collins, MW 1913, Century etc call them noun and adjective. They are not all that much like adjectives, though. DCDuringTALK 21:33, 6 October 2012 (UTC)

Traditional grammar didn't use to distinguish between determiners and adjectives. Nowadays we'd say twelve is a determiner. million is not, though, as it is more like a noun. --WikiTiki89 (talk) 21:40, 6 October 2012 (UTC)

Lexicographers are much more conservative than linguists. Not very many dictionaries even have 'determiner' as a word class. Longmans DCE is one of the few that does. They have both as nouns, determiners, and pronouns. CALD, which has determiner as a word class doesn't put these number words in it. DCDuringTALK 22:47, 6 October 2012 (UTC)

I'm now in favour of ditching the "cardinal number" description, but instead of just having "noun", dividing the entries into "determiner" and "noun". Noun on its own loses too much of the meaning of the place numbers - "a million eggs" would not be grammatical if "million" was just a noun that meant "1,000,000 of something". Smurrayinchester (talk) 19:17, 6 October 2012 (UTC)

I agree that the presentation as a noun only is not really satisfactory. They do not really behave much like true adjectives. In contrast, the cardinals do behave enough like adjectives. Maybe determiner is a good enough category. DCDuringTALK 21:33, 6 October 2012 (UTC)

OED calls this a "countable determiner"! according to The Integration of Million Into the English System of Number Words: A Diachronic Study, by Donald Sims MacQueen, which I stumbled across at Google Books. This would make for an economical presentation, requiring a modification of {{en-det}} to facilitate implementation. I fear that contributors will nevertheless regularly add PoS sections that conform to their determiner-less set of word class names, i.e. "Adjective", "Number"/"Numeral", "Noun". DCDuringTALK 16:06, 7 November 2012 (UTC)

It's used in entries like [[Noah]], and generates (biblical) and puts entries into Category:en:Biblical characters. But if this template is intended as a context, it seems wrong: as Pass a Method points out, other works also feature Noah (so the character isn't only in the Bible), and even non-religious people use the term "Noah" to mean "Old Testament character who [is said to have] built an ark" (so the term isn't used only by adherents of Biblical faiths). Or if the template is intended to bring about categorisation, that seems wrong: it seems like putting {{mammal}} in front of [[bear]], something we don't do. We put "Category:en:Mammals" in [[bear]] 'manually'. So, should we stop using Template:biblical character? - -sche(discuss) 18:54, 7 October 2012 (UTC)

Furthermore, some faiths have such characters in other texts, such as the hadith, the Book of Mormon, the Kitab-i-Iqan etc. Pass a Method (talk) 22:05, 7 October 2012 (UTC)

Keep. Although there are many other contexts in which biblical characters may appear, these are found in the Christian Bible, which is an important part of Western history and culture. Although there are many other sacred texts belonging to many faiths, the term "biblical" very clearly relates to this text (probably also the Hebrew Tanakh, since there's a great deal of overlap between the two- I'm not familiar with the usage in Judaism). I would have no problem with categories of "Quranic characters", "Vedic characters", or others named after other sacred texts. By all means, the Christian Bible should be treated on even footing with any other sacred text, but there's no need to expunge it from all our categories- we just need to use terminology that's NPOV in the documentation, etc. It might be different if it were "Old Testament characters", since that inherently contains a judgment about Christianity vs. Judaism in its name. Chuck Entz (talk) 00:49, 8 October 2012 (UTC)

I don’t see why we should stop using it. Another template (or {{biblical character}} in addition to other templates) should be used for characters also found in other texts. — Ungoliant(Falai) 01:58, 8 October 2012 (UTC)

People, this is not RFDO. Please go here. Why do I need to babysit this place every time? -- Liliana• 04:57, 8 October 2012 (UTC)

I raised the issue here because I think it's a policy issue, or synecdoche for one ("do we want to use this class of templates, typified by {{biblical character}} and {{mammal}}?"), rather than a mere deletion request, given the number of users who've expressed views which amount to "disregard policy, keep the template", which IMO necessitates a discussion on changing policy. - -sche(discuss) 05:30, 8 October 2012 (UTC)

Misuse of context labels

There seems to be a huge problem of what I think are misused context labels where specific contexts are added to generally applicable definitions. Here is a good example [[uncountable]] and here is a bad example [[finger]]. uncountable gives the general definition and then two context-specific definitions. finger, on the other hand gives the general definition a context-specific label (anatomy). The word finger retains this definition outside the context of anatomy and therefore should not be tagged with {{anatomy}}.

I agree that "finger" is not a purely anatomical term and should not be glossed "anatomy". Equinox◑ 11:37, 9 October 2012 (UTC)

I agree with the specific cases and believe the problem is widespread, in part due to a belief that topical labels are a finding aid, especially for highly polysemic words. For options we could go with:

no topical labels, only usage contexts (followed by most dictionaries)

I think we should only have usage contexts on the definition line. Topics are what categories are for and I think part of the problem is that people often use context labels solely for their categories. It seems that putting raw category tags (like [[Category:en:Anatomy]]) is going out of style with our heavy templatization of pages. --WikiTiki89 (talk) 13:10, 9 October 2012 (UTC)

Two more examples: the one discussed in the preceding section and on RFDO, {{biblical character}} in [[Noah]] (which wrongly implies the term isn't used outside of the Bible or isn't used except by Biblical characters, because it is being misused as if it were part of the definition), and {{sports}} in [[BASE jumping]], which is likewise misused (the term denotes a sport, it isn't limited to sports). (As I wrote above, this is why I listed {{biblical character}} here rather than at RFDO: because a number of users have expressed views which amount to "disregard policy, keep the template" (apparently because they're too lazy to add categories manually), which is a discussion on changing policy, and such discussions are suited to the BP rather than RFDO.) - -sche(discuss) 17:17, 9 October 2012 (UTC)

We could address the contributor laziness (or rather, efficiency) issue by having topical categories default to a categorize-only no-display mode, requiring a switch (eg, "disp=1") to display as a usage context as well. Conversely, as we don't seem to want to categorize by usage context, we might need "nocat=1" as an option for items that are not used outside of a given context, but are not about the topic that corresponds to that context. (For example, a term used in the military to refer to, say, civilian women.) DCDuringTALK 18:07, 9 October 2012 (UTC)

A disp=1 parameter would not work that well, because there would be no way to say which ones to display and which ones to keep hidden. —CodeCat 14:27, 12 October 2012 (UTC)

Well, one could break the context labels into two groups, the "display" group and the "no-display" group. That would be a robust solution. But is it even logically possible for such a use case to arise? DCDuringTALK 20:16, 12 October 2012 (UTC)

Or, much simpler (and not violating any votes) is just to write the categories out at the bottom of the page the good old fashioned way. Mglovesfun (talk) 21:25, 12 October 2012 (UTC)

Having them at the sense line would allow for the option of selective display using either global preferences or something like what we have for the expanded or contracted display of quotations, translations and whatever is under {{rel-top}} and its relatives. DCDuringTALK 22:54, 12 October 2012 (UTC)
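To make the disp=1 / nocat=1 idea above concrete, here is a minimal sketch of how such a label might behave. It is only an illustration of the proposal as described in this thread: the function name `renderLabel`, the option names and the category format are assumptions, not an existing template or module.

```javascript
// Hypothetical model of the proposed behaviour; not an existing Wiktionary template.
// By default a label only categorizes. disp=1 also displays it on the sense line,
// and nocat=1 suppresses the category.
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function renderLabel(label, options = {}) {
  const { disp = false, nocat = false } = options;
  return {
    // Text shown before the definition, empty when the label is categorize-only.
    text: disp ? "(" + label + ")" : "",
    // Category added to the page, or null when categorization is switched off.
    category: nocat ? null : "Category:en:" + capitalize(label),
  };
}

// A topical label on a general definition: categorize, but show nothing.
const topical = renderLabel("anatomy");
// A usage context for a term that is not about the topic: show, but do not categorize.
const usageOnly = renderLabel("military", { disp: true, nocat: true });
```

In this model the objection raised above still applies: somebody has to decide, sense by sense, which switch to set.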

Verb conjugation templates

Out of curiosity, is there a reason why, for most (all?) languages here with complex verb conjugation paradigms, the conjugation tables are built so that the tense-aspect categories are the rows and the person categories are the columns (e.g., the table in Latin abdico)? I note the current Latvian table does that (e.g., celt(“to build”)). I had thought about changing the Latvian template so that the persons are the rows and the tense-aspect categories the columns -- the more traditional format, which I tend to prefer -- but I noticed that the opposite is much more frequent here and wondered if this is the result of a past cross-Wiktionary community decision or policy. --Pereru (talk) 12:55, 12 October 2012 (UTC)

The Germanic languages seem to follow the vertical arrangement of person and number too, as do many of the Slavic languages as far as I'm aware. There isn't really a policy. People probably started off making tables from scratch and then copied existing ones to make new ones. The tables for the Romance languages are almost identical so it's very obvious there that it started with one and was then re-used for other languages. —CodeCat 14:11, 12 October 2012 (UTC)

The current format results in tables that are longer than they are wide. The other orientation would result in our readers having to use horizontal scrolling - normally considered a bad design feature. SemperBlotto (talk) 14:16, 12 October 2012 (UTC)

(Edit conflict) I think it's mostly neatness. Dividing it by person gives 6 rows, which should be more or less constant all down the page (in Latin, this also fits neatly with the triplets of imperative and infinitive forms). If divided by tense, there would be a different number of columns for each mood/whatever, and wouldn't look as good on the screen (especially on small displays, where languages with lots of tenses might have conjugation tables too wide to read). It's not consistent across all languages - Dutch uses the method you propose, with person on the row and tense on the column (see uitgaan#Conjugation), while German uses a hybrid system where mood is the column and tense is the row (see schwimmen#Conjugation). It looks like there's no order to it, we just use whatever made most sense to the template designer at the time. Smurrayinchester (talk) 14:27, 12 October 2012 (UTC)

(Of course, Dutch is a very simple language to conjugate, since like English it only conjugates the past and present tense - presumably, Dutch conjugation templates are laid out that way because of neatness too, to keep them as narrow as possible and to keep the number of columns constant.) Smurrayinchester (talk) 14:31, 12 October 2012 (UTC)

Ah, so the point is keeping the tables narrow, and as close to rectangular as possible! Hm, there may be some creative formatting that can be done to achieve these goals without having the columns be the persons -- I find this a somewhat disconcerting format. But I get the point: keep the tables narrow, so people with small displays can still see them. --Pereru (talk) 16:29, 12 October 2012 (UTC)

Not all languages have only 6 columns. Slovene has 9, see vedeti. —CodeCat 16:46, 12 October 2012 (UTC)
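The two layouts discussed in this thread hold the same data and differ only in orientation, so switching between them is a transposition. The sketch below shows that on a tiny sample table; the function name `transpose` and the sample forms (Latin amo, present and imperfect singular) are chosen purely for illustration, and the table is assumed to be rectangular.

```javascript
// Swap rows and columns of a rectangular table (array of equal-length rows).
function transpose(table) {
  if (table.length === 0) return [];
  return table[0].map((_, col) => table.map((row) => row[col]));
}

// Tense-aspect categories as rows, persons as columns.
const tensesAsRows = [
  ["", "1sg", "2sg", "3sg"],
  ["present", "amo", "amas", "amat"],
  ["imperfect", "amabam", "amabas", "amabat"],
];

// Persons as rows, tense-aspect categories as columns.
const personsAsRows = transpose(tensesAsRows);
```

The width concern raised above shows up directly: the transposed table has one column per tense, so a language with many tenses gets a wide table.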

Church Slavonic language code

Why is there no language code for Church Slavonic (as opposed to Old Church Slavonic)? Wikipedia gives the same ISO codes for both (cu) and here {{cu}} gives Old Church Slavonic. So can we make a separate code for Church Slavonic? --WikiTiki89 (talk) 23:43, 13 October 2012 (UTC)

Hm, good question. The fact that they have the same ISO code is probably the reason no one has distinguished them yet. We could make a code for the non-Old variety of Church Slavonic: if I'm remembering the details of our arcane language-code naming-scheme correctly, it could be {{zls-chs}} or similar (I suggest "chs" rather than "chu" so that it isn't mistaken for an Old Church Slavonic code). But there might be people who favour treating them as one language, like we treat Biblical and modern Hebrew as one language. I suppose the Beer Parlour (or perhaps WT:RFM) is the best place to raise the question. - -sche(discuss) 00:03, 14 October 2012 (UTC)

The difference between this and Hebrew is that {{he}} = "Hebrew" which can refer to both biblical and modern. While {{cu}} = "Old Church Slavonic" which cannot refer to modern Church Slavonic. So if we want to consider them to be one language, then the name has to be changed from Old Church Slavonic (which I don't like). --WikiTiki89 (talk) 10:12, 14 October 2012 (UTC)

On the one hand, Church Slavonic as it is used today "adapt[s] pronunciation and orthography and replac[es] some old and obscure words and expressions with their vernacular counterparts" (as Wikipedia puts it). On the other hand, Latin as it is used today does all of that (IVVENIS, juvenis), yet it is not distinguished from Classical Latin. And the name "Old Church Slavonic" doesn't have to be changed if we decide that the lect used today is still Old Church Slavonic. - -sche(discuss) 21:47, 14 October 2012 (UTC)

The only reason Old Church Slavonic is called Old Church Slavonic is to distinguish it from modern Church Slavonic. So if we decide they are the same language, then I think we should drop the Old. --WikiTiki89 (talk) 09:15, 15 October 2012 (UTC)

Except that the language is far more widely known as Old Church Slavonic, not Church Slavonic. —Angr 20:41, 15 October 2012 (UTC)

Are they even the same language? How different is CS from OCS? —CodeCat 20:48, 15 October 2012 (UTC)

About as different as Medieval Latin from Classical Latin. —Angr 20:54, 15 October 2012 (UTC)

Ok, I kind of suspected that. But we use Classical Latin as the standard, especially concerning pronunciation and grammar, which is rather different in more modern usage and more tuned to local languages. It's probably the same with (O)CS, too. So that would mean that we should really use OCS as the standard, not modern CS. And like you said, the old language is the one that is known, and the one that gets studied. —CodeCat 20:58, 15 October 2012 (UTC)

Well then can we have some sort of etymology-only code similar to {{etyl|LL.}}? --WikiTiki89 (talk) 07:07, 16 October 2012 (UTC)

German obsolete forms due to the 1996 Rechtschreibreform

I don't think all these spellings are actually obsolete. They just are no longer "standard". Most of them should still be attestable today. --WikiTiki89 (talk) 10:05, 14 October 2012 (UTC)

My limited experience with these 'European' spelling reforms (limited other than with the French one) is that the reformed spellings aren't actually used as much as the previous spellings. I actually got corrected using gout instead of goût in a university-level French lesson. It's probably the reformed spellings in French which are nonstandard. I seem to think someone told me on Wiktionary that the same is true of German, about 2/3 of adults reject the reformed spellings (cannot remember where I read this so cannot link to it). WT:About German should probably cover this, WT:About French does for its reforms. Also 1996 seems a bit recent to be talking about 'obsoleteness' to me. Mglovesfun (talk) 18:56, 14 October 2012 (UTC)

I agree with both of you. When I've entered pre-1996 spellings, I've called them dated, but only because de.Wikt does. I'd rather call them alternative forms and have a usage-note-template to explain the spelling reform. The spelling reform only affects the language as taught in schools; outside of class, people still write as they please, and since most people alive today learnt to spell before 1996,... - -sche(discuss) 21:20, 14 October 2012 (UTC)

On a related note Portuguese had a spelling reform which came into effect in 2009. Most people and publications switched very quickly to the new orthography (even though it’s still transition period). Ideally (IMO), there should be templates for displaying things like “alternative spelling of foo, made obsolete by such and such spelling reform(s).”, and it should be categorised as an obsolete form. — Ungoliant(Falai) 21:32, 14 October 2012 (UTC)

I agree with Wikitiki89 in that calling them "obsolete" doesn't quite fit our descriptive approach. I like your proposal to call them "non-standard" because that's what they are, and to include a usage note, as -sche proposed. We shouldn't treat them just like the "correct" spellings -- even as a descriptive dictionary, we can and should state what's standard and what's not (after all, we do this with non-spelling issues, too). Longtrend (talk) 06:45, 15 October 2012 (UTC)

Tabbed languages and Definition editing options poll

The Tabbed languages gadget and the Definition editing options gadget have been sitting around for quite a while now... What do people think about finally turning them on? There have been some improvements since the last time they were discussed, such as the tabs moving to the top of the page if the window isn't quite wide enough to display them on the side comfortably, and the icons to the left of definitions being changed to be less intrusive.

I have no idea how much support either of these gadgets have, so a quick poll before running straight into a full vote seems like a good idea. I'm going to leave these open for about a week, and then if it looks like there's at least a reasonable chance of a full vote passing, I'll start it/them at WT:V. --Yair rand (talk) 17:07, 14 October 2012 (UTC)

Tabbed Languages

Support enabling Tabbed Languages

Support —CodeCat 17:37, 14 October 2012 (UTC) But we still need to fix thousands of instances of {{term}} that currently link to no language.

Support, with the understanding that, even once we approve turning it on by default, we'd still have to make various changes before we actually did so. Example #1: most lead-section content would need to be moved into language-sections. Example #2: either as part of this gadget or as a separate script, we'd need to add some sort of JS that tacks #English or other anchors onto links, so that a definition like # [[pain]] at [[douleur#French]] links to [[pain#English]] rather than to [[pain#French]]. (Naturally this would require some thought.) —RuakhTALK 18:25, 14 October 2012 (UTC)

$(".languageContainer a[href]:not(.extiw):not(.external):not([href*=\":\"]):not([href*=\"#\"]):not([href*=\"?\"])").attr("href",function(a,b){return b+"#English"}) takes 15-25 milliseconds to run on Chrome on a, and probably much longer on older browsers, though there's a decent chance that there's a much more efficient way to do it that hasn't occurred to me.

I agree that both of these points need to be resolved before the deployment. --Yair rand (talk) 20:55, 14 October 2012 (UTC)
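For anyone who wants to try the logic of that one-liner outside a browser, the selector's conditions can be restated as a plain function. This is only a sketch: the name `addEnglishAnchor` and the idea of passing the link's classes as an array are assumptions made for illustration, and it says nothing about the performance question raised above.

```javascript
// Mirror of the selector's filters: skip interwiki and external links, and skip
// hrefs that contain ":" (namespaces), "#" (existing anchor) or "?" (query string).
function addEnglishAnchor(href, classes = []) {
  const skippedByClass = classes.includes("extiw") || classes.includes("external");
  const skippedByHref = /[:#?]/.test(href);
  return skippedByClass || skippedByHref ? href : href + "#English";
}

// A plain entry link gets the anchor; the others are left alone.
const plain = addEnglishAnchor("/wiki/pain");
const anchored = addEnglishAnchor("/wiki/pain#French");
const namespaced = addEnglishAnchor("/wiki/Category:en:Anatomy");
const query = addEnglishAnchor("/w/index.php?title=pain");
const interwiki = addEnglishAnchor("/wiki/pain", ["extiw"]);
```

Like the original, this always appends #English; choosing the right anchor for definitions written in other languages is the part that would need more thought.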

Some time ago I proposed using a small and simple template to provide links to definitions. It wasn't considered necessary at the time, but maybe now? —CodeCat 20:57, 14 October 2012 (UTC)

Strong support. I've been thinking of doing this for about a month. I think that it will make a major difference in anons' opinions of the 'neatness' of our layout, and it improves the viewing experience in general. @CodeCat: No, I still very much oppose your linking template, but your point about {{term}} is valid. @Ruakh: I agree, but those fixes aren't beyond (y)our ability, right? Whether or not we get TL, the changes would still be welcome. @DCDuring: I have no clue what you're talking about. Can you elaborate? —Μετάknowledgediscuss/deeds 23:43, 14 October 2012 (UTC)

Support I have noticed that, when the MediaWiki preference "Auto-number headings" (under "Appearance") is enabled, the script fails to automatically go to the correct tab based on the URL anchor. I think that this preference is very important for Wiktionary editors as it instantly shows up problems with heading levels, etc. However I understand that not many people use this preference, so we might as well enable Tabbed Languages regardless! This, that and the other (talk) 22:05, 15 October 2012 (UTC)

Support in light of Spinningspark's comments below about how far down readers have to scroll to find the English definition of a word on a page with a long TOC. With tabbed languages, that problem disappears, but that's no help to the non-logged-in readers who don't have Preferences they can set. —Angr 17:12, 16 October 2012 (UTC)

What if instead of hiding the option in preferences, we put some sort of javascript link at the top that toggles tabbed languages? Then even non-logged-in users can use it. --WikiTiki89 (talk) 18:45, 16 October 2012 (UTC)

I would support that idea, but I still think tabbed languages is better as the default. —CodeCat 18:56, 16 October 2012 (UTC)

Oppose enabling Tabbed Languages

Support DCDuringTALK 18:02, 14 October 2012 (UTC) A long run for a short slide.

Support --WikiTiki89 (talk) 09:08, 15 October 2012 (UTC) Not ready yet. Also I don't think it should ever be default. Maybe there should be a quick link to change the view at the top of the page rather than hiding it in prefs.

Could you elaborate what is not ready yet? —CodeCat 11:40, 15 October 2012 (UTC)

Honestly I don't know what I meant by that. But I stand by the other two things I said. --WikiTiki89 (talk) 11:57, 15 October 2012 (UTC)

Oppose enabling Definition editing options

Discussion

As long as I can opt out forever, I don't really care. Mglovesfun (talk) 20:56, 14 October 2012 (UTC)

I seem to recall reporting some problems with the definition-editing thingy. (I don't recall where.) Any idea whether those have been resolved? —msh210℠ (talk) 21:51, 14 October 2012 (UTC)

Am I correct in understanding that all of the content of a given headword would be downloaded, but only the selected language would be displayed? Would an image be downloaded only when the L2 section in which its link is placed is selected? We would want images for each language tab, wouldn't we? DCDuringTALK 13:25, 15 October 2012 (UTC)

For someone who has opted out, assuming that we don't have a dirigiste approach, would multiple copies of the same image download? DCDuringTALK 13:28, 15 October 2012 (UTC)

The whole page is downloaded first, and only when it is fully loaded, the tabs are put into place. So the content will only have to be downloaded once. —CodeCat 14:28, 15 October 2012 (UTC)

Having the same image appear on a page multiple times does not cause the image file to be downloaded twice.

All content in an entry would be loaded, just like it is now. Tabbed languages does not cause any improvement in the loading time.

I think we should have images for each language section regardless of whether tabbed languages is enabled. We shouldn't assume that the reader looked at the top section or the English section, and the reader can't assume that the English section's images would apply to the other sections, anyways. --Yair rand (talk) 14:33, 15 October 2012 (UTC)

Especially because many (and in the future hopefully, all) of our links link to specific sections on the page, which may cause the image to scroll out of view so that the user will never see it. —CodeCat 14:38, 15 October 2012 (UTC)

So [[jaguar]] will need nine copies of the image. What is the total size of the download for [[jaguar]] now? DCDuringTALK 14:42, 15 October 2012 (UTC)

Yes, but it will not be downloaded nine times. Browsers are smart enough to realise they don't need to download it each time because of caching. —CodeCat 14:44, 15 October 2012 (UTC)

The problem here is a discrepancy between what looks good on the tabless layout and what looks good on the tabbed layout. --WikiTiki89 (talk) 15:52, 15 October 2012 (UTC)

Quality requirement in CFI

A recent vote to allow citations from WebCite failed, partly over concerns that if passed it would allow poor quality citations supporting misguided entries. This has highlighted the lack of a quality requirement in the CFI and I am considering a proposal to add this and reintroduce the WebCite proposal. I have a draft at User:Spinningspark/Quality proposal; comments, criticisms and suggestions are all welcome. SpinningSpark 00:30, 16 October 2012 (UTC)

Increasing the necessary amount of citations for usenet should be an optional part of the vote. Regardless, you can count on my support vote! — Ungoliant(Falai) 00:45, 16 October 2012 (UTC)

Wouldn't that also invalidate many of our currently verified terms? —CodeCat 01:06, 16 October 2012 (UTC)

Yeah. That’s why it should be optional. Otherwise some people might vote against the entire proposal solely because of that bit. — Ungoliant(Falai)

I cannot see any real justification for making the number different between Usenet and Webcite and would not want to make that a possibility in the proposal. If there are entries with only marginal verification from Usenet then perhaps they should go through RfV again. But I agree it would be good to make the whole thing about different numbers and times optional. That way they both change together (if at all). SpinningSpark 07:08, 16 October 2012 (UTC)

Do we have any current policy on audio citations? --WikiTiki89 (talk) 07:14, 16 October 2012 (UTC)

I'd be happy enough for stuff that's literally only available on Usenet to fail. From my perspective, it would make it much, much harder for me to cite names of juggling patterns, but overall I think that it could be an improvement still. Mglovesfun (talk) 08:42, 16 October 2012 (UTC)

I see at least two reasons to make the number different between Usenet and Webcite. For one, stuff is explicitly tagged with people's names on Usenet. It's fakable, but still makes it a little easier to tell independence. For another, Usenet is mostly a conversation; it seems more likely people are concerned about being understood in a conversation than when they're writing poetry or monologuing. --Prosfilaes (talk) 04:19, 17 October 2012 (UTC)

I have now separated out the number/time requirement into an option as requested. I have also added something on determining dates, which was an issue raised in the previous vote. SpinningSpark 09:18, 16 October 2012 (UTC)

That an archive should "be robust enough that the material would survive the collapse of the organization maintaining it" explains why Usenet is allowed, but isn't the (whole) reason the Wayback Machine is disallowed but WebCite is viewed differently. That has to do with the relative ease and difficulty, respectively, of person B effecting deletion of archived copies of person A's page.

It will be difficult to determine if online citations are independent of each other.

I think 6 citations over 2 years is an extremely low threshold. It would take someone a mere ten minutes and a little pinging reminder from the calendar on his phone or e-mail to set up blogs under five different usernames, and then (2 years later) a sixth, post things like "because he was a geoffrey (awesome person), he had already scored tickets", report them all to WebCite, and have his name added to the dictionary as a word meaning "awesome person". (We explicitly allow uses-with-definitions like that.) - -sche(discuss) 22:16, 16 October 2012 (UTC)
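The "6 citations over 2 years" threshold under discussion can be written down as a small check, which also shows why the scenario above would pass it. This is a sketch under assumptions: the draft proposal itself is not quoted here, so reading "over 2 years" as "earliest and latest citation at least two years apart" is an interpretation, and the function name `meetsThreshold` and the sample dates are hypothetical.

```javascript
// Assumed reading of the rule: at least minCount citations, with the earliest and
// latest at least minSpanYears apart. Dates are ISO strings such as "2012-10-16".
function meetsThreshold(citationDates, minCount, minSpanYears) {
  if (citationDates.length === 0 || citationDates.length < minCount) return false;
  const times = citationDates.map((d) => new Date(d).getTime());
  const earliest = new Date(Math.min(...times));
  const latest = Math.max(...times);
  // Move the earliest date forward by the required span and compare.
  const required = new Date(earliest.getTime());
  required.setUTCFullYear(required.getUTCFullYear() + minSpanYears);
  return latest >= required.getTime();
}

// Five posts in one week, then a sixth two years later: passes the 6-over-2 check.
const staged = [
  "2012-10-16", "2012-10-17", "2012-10-18", "2012-10-19", "2012-10-20",
  "2014-10-21",
];
const stagedPasses = meetsThreshold(staged, 6, 2);
```

A check like this only counts and measures dates; whether the six citations are independent of one another is the judgment it cannot make.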

That's still quite an effort to achieve that, a two-year plan to put one over on Wiktionary is way above normal vandalism and disruption. I am willing to bet that the vast majority of "geoffrey" type entries come with no proper citations at all and involved absolutely minimal effort on the spur of the moment. Difficulty of proving independence and openness to abuse are criticisms that could equally be levelled at Usenet. Do we actually get this kind of problem from Usenet at all? I'm happy to set the timespan a lot higher, say five years, if that would satisfy you, but it really would not be worth proceeding with the proposal if the number of citations were set too high. People just would not bother. Three is a fairly reasonable number to write up, six is becoming very onerous and significantly higher would never get used.

Besides, the kind of sockpuppetry you describe is usually easily spotted. The puppetmaster gives himself away through writing style, subjects addressed, or characteristic errors being the same across all puppets. I am sure a "geoffrey = awesome person" entry would immediately set off alarm bells and would be looked at with great care. SpinningSpark 01:17, 17 October 2012 (UTC)

Yes, this is also possible with Usenet... that doesn't mean I don't worry about making it easier to do.

You have a good point that increasing the number of citations (which would have been my first thought) would make it hard on citers. And you're right that it would take much higher-caliber vandalism than we are used to. (Of course, we have had high-caliber, deep-game vandals before: Primetime, Wonderfool... speaking of which: @Wonderfool, wherever you are: I trust you'll think of something clever in regard to this. "Wonderfool" is already defined as a variant of "wonderful", backed up by books, so I'm not sure what to do to top that with mere blogs, but I'm sure there's something.) And I suppose if someone is going to do it, they won't be much more dismayed by a 3- or 5-year wait than by a 2-year wait.

I'll try to think of a way the other criterion which has come up in discussions of durability of online archives, which I phrased above as "the relative ease and difficulty, respectively, of person B effecting deletion of archived copies of person A's page", could be added to the paragraph on durability. - -sche(discuss) 03:51, 17 October 2012 (UTC)

This looks amazing. I'm very impressed; if the vote were right now, I'd vote "strong support". (Unfortunately, what's likely to happen is that as the vote gets closer, people will point out various problems that I haven't thought of, and by the time of the actual vote I'll have shifted down to "abstain". But still, right now, I think it's great.) —RuakhTALK 02:59, 17 October 2012 (UTC)

I'm not sure it's a good idea to say that internet-only sources should always be worth less than print. Newsweek magazine is ending its print edition soon, The Onion did so a while ago and there are rumours that The Guardian will do so soon (besides, most newspaper websites already have some features which are not included in their print editions). It seems odd to say that one citation from the December (paper) edition of Newsweek is worth two citations from the January (digital) edition. I wouldn't necessarily vote no, but surely there's some way of fixing it? I know the British Library has the power to request archival copies of digital news, as well as printed news - perhaps websites covered by such schemes (i.e. ones more selective than WebCite) should be exempt from the criteria? Smurrayinchester (talk) 12:56, 19 October 2012 (UTC)

In many newspapers, online-only articles seem to undergo much less copyediting, and editing in general, than print articles. I suspect that this applies even when they eliminate print editions entirely: compared to the print editions, the Web-sites will probably have more and speedier content, but probably not more and speedier copyeditors. —RuakhTALK 13:04, 19 October 2012 (UTC)

Sure, but on the other hand, a web article will probably get fixed within a couple of days (which is a helpful way of knowing if it's a nonstandard form or just a typo). For a print document, the only way to tell for sure if something is in error is to flip through the next few days worth of corrections columns or hunt down the publishing errata. If nothing else, web-only academic journals, with their peer review and editorial processes, are often as good as paper journals (unless they're like Journal of Cosmology). As an aside, would this also apply to e-books that don't get published in paper format (but are from reputable publishers)? Could we have used a quote from Stephen King's w:Riding the Bullet, for instance, which was released digitally in 2000 but not printed until 2009? Smurrayinchester (talk) 13:22, 19 October 2012 (UTC)

To comment on some of these points,

This proposal makes no assertion that online sources are worth less than print (even if in some cases that is actually true). The requirement for more cites from online is purely due to concerns over possible non-durability of online archives.

The proposal to require more cites from online is an option. It will be possible to vote for the proposal but against an increase in number of citations.

The quality requirement does assert that sources without a review mechanism are worth less than those with, regardless of whether or not they are online.

If it can be shown that the British Museum or some other institution is archiving an online journal then that journal is no longer online only, but durably archived without qualification.

The proposal is about online archiving, not electronic media in general, so it will have no effect on the existing status of e-books. Admittedly, this status is unclear and perhaps there should be a vote on that as well.

Pages getting too big?

It has just occurred to me that if ever Wiktionary's goal -- all words in all languages -- gets realized, the current format implies that some pages will have prohibitive lengths. Say, a -- most languages have some word or morpheme a (besides the letter a); won't this page become too big to be usable if it has, say, 500 sections from 500 languages? Also, the translation tables for English headwords -- won't they become difficult to use and parse once there are thousands of translations in them? Just curious. --Pereru (talk) 14:45, 16 October 2012 (UTC)

I support the idea of having languages on subpages (like a/en, a/fr, a/es, a/sh, etc. for a) and then have the parent page automatically display all subpages. This would leave the layout of the page itself essentially the same. But people seem to think that it's a waste of effort. Also this does not fix the translation tables problem. --WikiTiki89 (talk) 14:57, 16 October 2012 (UTC)

The whole concept of mixing all languages on a single page has always struck me as misguided, but it is going to be extremely difficult/impossible to do anything about it now. SpinningSpark 15:00, 16 October 2012 (UTC)

About the translation tables, please see a live demonstration on water. On pages, no, the pages themselves won't become too large, but overcomplicated templates like {{list}} are going to seriously slow down page processing and account for high loading times. -- Liliana• 15:04, 16 October 2012 (UTC)

We could move the translations of water to a subpage and leave only those of major languages in the entry. — Ungoliant(Falai) 15:08, 16 October 2012 (UTC)

When we finally get Scribunto these high use, complicated templates can be rewritten in module form - supposed to be much faster. (Italian conjugation templates tested and waiting to be implemented) SemperBlotto (talk) 15:59, 16 October 2012 (UTC)

I would also support language subpages, because that way I can watchlist only the language(s) that interest me instead of everything. In the case of [[a]], I would watchlist [[a/ga]], [[a/sga]], and [[a/cy]], but as it is I don't watchlist it at all because I expect most edits to be to the Translingual and English sections, which I can't be bothered with. —Angr 16:01, 16 October 2012 (UTC)

If we were to have separate subpages, would it make sense to group languages in the same family that use a common script onto common subpages to speed the entry of identical headwords in those languages? The most practical objection I've heard to complete separation was from Metaknowledge to the effect that he liked being able to add terms on the same page from related languages (if I remember correctly). DCDuringTALK 16:26, 16 October 2012 (UTC)

We could, as a trial, split up only the pages a and water, and see how it goes. As long as all the languages are transcluded, there shouldn't be any problems. We might want to have an 'add language' button of some sort though... maybe we could co-opt the 'new discussion' button for this purpose?

I just realised, though, that this practice might clash with magic words like {{PAGENAME}}. We would have to enable subpages for the main namespace so that we could then use {{BASEPAGENAME}}. But that would then clash with our current practice, which uses {{SUBPAGENAME}} because of how we name reconstructed terms. They are named "Name/term" whereas the currently proposed scheme uses "term/code", and there is no magic word that could give "term" in both instances. We will need to re-evaluate the naming of reconstructed terms carefully. —CodeCat 16:31, 16 October 2012 (UTC)
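The clash described above can be seen with two simplified stand-ins for the magic words. This is a sketch, not MediaWiki's implementation: it assumes {{BASEPAGENAME}} yields everything before the last slash and {{SUBPAGENAME}} everything after it, and it uses the literal "Name/term" and "term/code" patterns from the comment as page titles.

```javascript
// Simplified stand-ins for the magic words, splitting on the last slash only.
function basePageName(title) {
  const i = title.lastIndexOf("/");
  return i === -1 ? title : title.slice(0, i);
}

function subPageName(title) {
  const i = title.lastIndexOf("/");
  return i === -1 ? title : title.slice(i + 1);
}

// Proposed language subpages: the term is the base page name.
const fromLanguageSubpage = basePageName("term/code");
// Current reconstructed-term pages: the term is the subpage name.
const fromReconstruction = subPageName("Name/term");
```

Each function returns "term" for only one of the two naming schemes, so a template shared by both would need to know which scheme the page follows.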

I think there is too much focus on what is most convenient for editors. The main focus should be on what is most convenient for readers. I would imagine that the vast majority of accesses to en.Wiktionary are simply to find the definition of a word in English, so that is what should be at the top, but it isn't. Take banana for instance. Just the table of contents takes up the whole of the first screen (and more besides) even at the relatively high screen resolution height of 1080 px. On my notebook with a screen height of 600 px I have to page down four pages before I get to the English definitions (six if I had wanted the adjective). This is mostly due to the TOC being largely filled up with other language headings - although there are other things which could be done to help such as moving the etymology further down. Something really needs to be done about this. SpinningSpark 16:45, 16 October 2012 (UTC)

Can't this also be fixed with float:left or float:right? --WikiTiki89 (talk) 18:55, 16 October 2012 (UTC)

If you're going to use language subpages, how would you avoid conflicts? Like s/he meaning either "Hebrew page on s" or "word s/he in all languages". -- Liliana• 16:51, 16 October 2012 (UTC)

Huh? I don't follow. There is no Hebrew entry at s/he. SpinningSpark 16:57, 16 October 2012 (UTC)

Is that a made up example? There is no Hebrew entry at s either. Are there any real examples of this problem? SpinningSpark 20:05, 16 October 2012 (UTC)

Usability of large translation tables is already an issue, and will get worse as the tables get longer. The solution would probably be some version of the targeted translations script, but the current "star the target language from the list" system isn't great from a usability perspective. If anyone has ideas on how to make it more usable, I'd be happy to try and implement them. --Yair rand (talk) 17:40, 16 October 2012 (UTC)

I see this is a question many people have been thinking about. I probably don't know enough about Wiktionary to opine very meaningfully, but for what it's worth...

on long pages and long TOCs, I do tend to support the idea of subpages, also because of loading time. I support the suggestion to do a test trial on one specifically big page -- be it water, a or banana (I like a because it mixes many different things -- actual words, morphemes, the letter "a"; I'd like to see how this could all be sorted out). Isn't that what is done in some other Wiktionaries, by the way?

on long translation tables: these are basically lists; perhaps we could have them have a fixed length, with internal scrollbars (just as is already done for some complicated conjugation/declension tables)? Or only have the name of the languages, without the translations, unless the user clicks on a specific language (a little like interwiki links)? Perhaps there could be a gadget where the user decides, for a given page (or for all pages) which languages s/he wants (or never wants) to see in a translation table? Maybe an input box where the user could enter the name or code of the language s/he wants a translation into? Maybe -- as is done with the content of large categories -- display only the first 50, 100, or 200 languages, with an alphabetic guide at the beginning so the reader can click on the first letter of the name of the language s/he wants a translation into? Maybe also make the translation tables have three, maybe four, columns rather than only two? --Pereru (talk) 17:58, 16 October 2012 (UTC)

I believe I had the idea of sorting the languages in big translation tables by language family, to make them easier to find. To find a specific language, just Ctrl+F it. -- Liliana• 18:01, 16 October 2012 (UTC)

@Pereru and SpinningSpark: We already have a means of preventing the ToC from forcing content off the landing screen: Right-hand side table of contents. It is available in Preferences (browser specific) and could be made available by default. You can test it. I couldn't imagine living without it under our current page structure. DCDuringTALK 18:41, 16 October 2012 (UTC)

Sorting by language family may make sense to us, but it probably won't make much sense to the average user. Alphabetical lists would make more sense because they are more universal. —CodeCat 18:43, 16 October 2012 (UTC)

Agreed. Many of our readers probably don't know what a language family is, and certainly don't know what languages are part of what families. We should also not assume that the reader knows how to use Ctrl+F. --Yair rand (talk) 19:45, 16 October 2012 (UTC)

But, DCDuring, this doesn't solve the problem of the ToC itself looking too complicated for use by the casual user. --Pereru (talk) 02:38, 17 October 2012 (UTC)

I support the idea of doing a test on a. That is if we can resolve the subpage problem. --WikiTiki89 (talk) 18:55, 16 October 2012 (UTC)

Could you use a different character to define subpages? For instance the backslash, so s (Hebrew) is s\he instead of s/he. There cannot be many entries that contain a backslash. SpinningSpark 20:01, 16 October 2012 (UTC)

That would make it impossible to use {{BASEPAGENAME}} in templates. Lua would obviate the problem, though. If subpages are to be used, they would probably be best placed outside the main namespace, to distinguish them from regular top-level pages, anyway. I don't support switching to subpages, as it would require a horrendous amount of template rewriting, javascript, and performance losses, for no real benefit. Categories wouldn't show just entry names in lists. If sections were loaded by Ajax, the pages would be inaccessible to those without JS, and have generally much worse load times for those who do. Alternatively, if links were used, that would make the site far less usable, much more confusing, and much, much slower and more tedious. We shouldn't destroy the readers' experiences just so that editors can have a more convenient watchlist. --Yair rand (talk) 20:15, 16 October 2012 (UTC)

But what alternatives are there? —CodeCat 20:21, 16 October 2012 (UTC)

Keeping it just as it is now. Tabbed languages and targeted translations can take care of the usability issues. We aren't actually likely to get to the point where pages are too large, byte-wise. a is less than 78KB, far smaller than any reasonably-sized Wikipedia article. water is less than 86KB. The pages could quadruple in size and probably still not get to the point where they pass the size of any of our images when accessed from somebody's iPad. We don't actually have a problem here. --Yair rand (talk) 20:36, 16 October 2012 (UTC)

Have you ever tried loading or editing a? Most of the time I get time-outs. It's not much better with water. —CodeCat 20:50, 16 October 2012 (UTC)

Is the problem on the server side, accessing all the templates? DCDuringTALK 20:56, 16 October 2012 (UTC)

This is about a. {{list}} uses some overly-complicated logic which causes it to eat a lot of processing power. Just watch the limit report in the page source and see how it is right up to the maximum limit! -- Liliana• 21:37, 16 October 2012 (UTC)

That's very peculiar. I don't have any problem loading or editing either of those pages. I don't see why there would be a problem; neither the HTML nor the wikitext are particularly long. Confusing. Do you also get time-outs on, say, w:Barack Obama? Its wikitext is more than twice as long as either a or water, and the HTML is 115KB, again larger than a or water. @DCDuring: Aren't the templates only accessed once the page (or a relevant template or page) is edited or purged? I don't think there would be any need to check them on every page load. --Yair rand (talk) 21:00, 16 October 2012 (UTC)

w:Barack Obama loads only within a few seconds. I really don't think the length of the text is the problem, it's the templates. The page loads slowly, not because it takes long to transfer over the internet, but because it takes Wiktionary's servers too long to process them.

Ok, I just tried it. I went to edit a and clicked save, without making any changes. My browser displayed 'connecting...' for two minutes, and then I got a 504 Gateway Time-out error. Then I tried loading the page, and it took about 70 seconds to load. I reloaded, and it took 2 seconds. So the slowdown definitely happens in server-side processing, not in transferring the page. But there is also some kind of caching that makes it load faster after the first try, although I have no idea what it is. —CodeCat 21:17, 16 October 2012 (UTC)

I would say that it's not simply the size, but how confusing a page like a looks -- how difficult it is to find what you're looking for, how lost you feel when looking through all those language sections... and that, if you know the structure of what you're looking at. If you're just a casual user... To me, the page is not too big in an absolute sense -- this may even be true, too, but the biggest problem is, frankly, how screwed up it looks. Some say tabbed languages makes it better; I tried it in my gadget box, and the result didn't look much better. --Pereru (talk) 02:32, 17 October 2012 (UTC)

Only a small percentage of pages will ever have more than two language sections, let alone be prohibitively large. I estimate that a supermajority will only ever have one language section: the inflected forms of Georgian verbs, for example, are unlikely to have homographs; in fact, most inflected forms are unlikely to have homographs: sure bats has some, but fugiebamus? arrodillasen? Likewise words containing clicks (ǃʻûĩ ǂʻàn ǀàũ), hieroglyph transliterations (m3-ḥs3), etc. I vehemently oppose splitting pages by language, but if pages are to be split, I suggest they should be split only after a certain threshold is passed, e.g. only once they contain 2+ language sections or surpass a certain byte size... otherwise, a tiny tail will be wagging an enormous dog. Are there any prohibitively large pages not in Latin script?

If pages are split, how will users know to type rottweiler/en and rottweiler/fi to find the definition of "rottweiler" in English and Finnish, respectively? What will plain rottweiler look like? Will rottweiler transclude all the subpages, so that its display is unchanged? (I could live with that.) Or if you want main pages to be stripped-down disambiguation pages saying "for the Finnish definition, click here", what happens to users who don't know what language a word they want to look up is in, or who want to know what water means in all the languages which use it? They have to click through to each subpage, then go back to the main page and click to the next subpage, to slowly get a picture of the definitions in each language?

I disagree with SpinningSpark's assertion that the current format is only convenient for editors and that subpages would be easier for readers; I think the current format is easier for readers. I agree with Yair. water and a are nowhere near the size of Wiktionary:Requests for verification, and none of us seem to have trouble editing that page.

From this discussion, I gather that the problem is not the size of the pages at all, it is the number of templates, the few bad templates, and the translations tables. I support splitting translations tables off onto subpages once they pass a certain size. And @Liliana, let's orphan {{list}} and replace it with better templates. - -sche(discuss) 22:13, 16 October 2012 (UTC)

Splitting translation tables will not help with the page load times. Making a subpage for translations and transcluding it is no faster than if the content is put on the main page. What it will help with, though, is the size of the wikitext, but this only affects editing the page, not viewing. —CodeCat 22:25, 16 October 2012 (UTC)

If we need to split large translations tables off the main pages and not transclude them, but instead point to them using something like {{trans-see}}, I'm OK with that. - -sche(discuss) 22:35, 16 October 2012 (UTC)

Liliana has held that "complicated" templates are the problem. But what is the nature of the complexity? My hypothesis is that it has to do with the number of database accesses (especially to uncached parts of the database) required to construct the page prior to download. That could conceivably be transclusions (but presumably widely transcluded templates would be cached if they need to be accessed before download) or, more likely, it could be the use of "IFEXIST" and the construction of content from a category as {{list}} and {{suffixsee}} do. Any one instance of such an access might be trivial, but we may have hundreds of IFEXIST uses if I understand correctly. I'm pulling it together out of snippets rather than comprehensive knowledge so it may have major flaws. Can someone test this hypothesis or an improved version? DCDuringTALK 22:27, 16 October 2012 (UTC)

Page accesses are really not that slow. Wiktionary, like most websites, uses a relational database. Databases have all kinds of tricks to speed up the access, they are optimised for this purpose. Templates, on the other hand, are not. We found this out the hard way some time ago, when {{catboiler}} didn't exist yet and we instead had a single giant template with a switch containing all possible subtemplates. So what I think may be slow is the parsing and processing of the wikitext itself, in particular parser functions. I do believe that for {{list helper}} the 60 if-expressions and the switch with 60 items are the major culprit. I don't know what is causing water to slow down though. But you can quickly check the processing time of a page by editing the page and clicking preview. That way, you can cut out parts of a page and see if they make a big difference in the time. So if you edit water and click preview, you have an idea for the current time, then you edit again and remove all the translations and click preview. If that causes a significant speedup, you know where the problem is and you can try to localise it further. —CodeCat 23:09, 16 October 2012 (UTC)
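The preview-and-remove procedure described above amounts to measuring how much time each part of the page contributes. A rough Python sketch of that idea follows; `savings_by_section` and the `measure` callback are hypothetical names, and in practice `measure` would be someone timing a preview by hand rather than a function call.

```python
from typing import Callable, Dict


def savings_by_section(
    sections: Dict[str, str],
    measure: Callable[[str], float],
) -> Dict[str, float]:
    """Return, per section, how much cost disappears when it is removed.

    `measure` stands in for "edit, click preview, and time it".
    """
    full_cost = measure("\n".join(sections.values()))
    savings = {}
    for name in sections:
        # Rebuild the page text without this one section.
        remainder = "\n".join(
            text for other, text in sections.items() if other != name
        )
        savings[name] = full_cost - measure(remainder)
    return savings


# Stand-in cost model (an assumption): one unit per template call.
def count_templates(wikitext: str) -> float:
    return float(wikitext.count("{{"))


page = {
    "Definitions": "{{context|chemistry}} A clear liquid.",
    "Translations": "{{t|fr|eau}} {{t|de|Wasser}} {{t|nl|water}}",
}
print(savings_by_section(page, count_templates))
```

The section with the largest saving is the first candidate for closer inspection, which can then be repeated on its parts.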

Re the TOC: I was going to bring up fr.Wikt's way of handling the TOC, but ugh, look how awful fr:a looks. That said, look at fr:run. Would a TOC with collapsible subsections like that work here? - -sche(discuss) 03:48, 17 October 2012 (UTC)

My two cents (caveat: I know very little about the technical side of Wiktionary, only the contributing side) - I support putting massive lists of translations (maybe, say, 50+?) in a separate, clickable page. I think that's a good start to making some pages a little more user-friendly. As for the one-page-many-languages issue, I still think adding a "language preference" setting on the main page is the way to go. I don't know if there are any alternatives, since we have no way of knowing what language each user is after unless we ask them, right? ---> Tooironic (talk) 22:16, 17 October 2012 (UTC)

…in theory, the oldest uses of a word should be listed first, but in the absence of historical data this is difficult to do. Furthermore, definitions are sometimes related, and should be close to each other in such circumstances. Determining which usage is most common or popular requires expressing a Point of View, and should be avoided.

Note that I am not looking to start a debate on what the policy should be—I am asking only whether there is or was a policy, and if so, where it is recorded. —Psychonaut (talk) 13:47, 17 October 2012 (UTC)

I don't know of any policy. I would suspect that "there has apparently been a lot of past debate" = "no consensus" Chuck Entz (talk) 14:06, 17 October 2012 (UTC)

Chronological only makes sense if all senses are still current. Putting an archaic or even obsolete sense at the top isn't really the best idea. For some terms it makes more sense to put the most frequent definition at the top, especially if it's overwhelmingly more common. —CodeCat 14:21, 17 October 2012 (UTC)

I disagree, I think it does make sense to put them chronologically even (and especially) if this puts an obsolete sense at the top. --WikiTiki89 (talk) 14:31, 17 October 2012 (UTC)

Thanks, but I'm not trying to start another debate on the merits of chronological vs. frequency. I just want to know whether the community had formally decided anything about this. Perhaps a separate section could be created below if people want to debate the issue again. —Psychonaut (talk) 14:34, 17 October 2012 (UTC)

In the absence of consensus, we are fortunate that often frequency, literalness, and order of development coincide. In general it is case-by-case. Also subsenses, can be useful in clarifying the relationship of senses. Such a structure is highly likely to conflict with other kinds of ordering. DCDuringTALK 16:45, 17 October 2012 (UTC)

As far as I remember, there has never been an agreement in English Wiktionary about the order of definitions. In particular, there is no agreement on whether obsolete senses should be listed as first or as last. --Dan Polansky (talk) 19:24, 19 October 2012 (UTC)

FWIW My policy is essentially "barring special cases (flower comes to mind), if it has a context tag, it goes after anything without context tag". Circeus (talk) 20:45, 14 November 2012 (UTC)

Fundraising localization: volunteers from outside the USA needed

Please translate for your local community

Hello All,

The Wikimedia Foundation's Fundraising team has begun our 'User Experience' project, with the goal of understanding the donation experience in different countries outside the USA and enhancing the localization of our donation pages. I am searching for volunteers to spend 30 minutes on a Skype chat with me, reviewing their own country's donation pages. It will be done in a 'usability' format: I will ask you to read the text and go through the donation flow, and will ask for your feedback along the way.

The only prerequisite is for the volunteer to actually live in the country and to have access to at least one donation method that we offer for that country (mainly credit/debit card, but also real-time banking like IDEAL, e-wallets, etc.) so we can do a live test and see if the donation goes through. All volunteers will be reimbursed for any donations that succeed (and they will be low amounts, like 1-2 dollars).

By helping us you are actually helping thousands of people to support our mission of free knowledge across the world. Please sign up and help us with our 'User Experience' project! :)
If you are interested (or know of anyone who could be) please email ppena@wikimedia.org. All countries needed (excepting USA)!

News for editors

Is Wiktionary:News for editors being maintained at all these days? The last entry is April, but I've just discovered (by chance) that since then {{proto}} has been replaced by {{recons}} and is nominated for deletion(!) without a whisper on the page designed to keep occasional editors up to speed with changes.

It makes me wonder what other major changes I'm not aware of? I don't expect to be kept personally informed, but given there is a page explicitly designed so that editors like me can familiarise themselves with the latest goings on we need to know about, it's important to actually keep it updated. Thryduulf (talk) 20:37, 19 October 2012 (UTC)

As any editor effects such a change, or notices one being effected, he should annotate NFE.​—msh210℠ (talk) 06:29, 24 October 2012 (UTC)

Quote templates for individual books

I was searching for a page that contains an index to all the "Template:quote-" and found these two templates amongst the hits. This doesn't seem like the right use of templates as we'd end up with one for every book, etc. out there. I understand the creator's intentions, but this doesn't seem like the way to go about it. I'm not really up on how one deletes a page. -- dougher (talk) 14:50, 20 October 2012 (UTC)

You can nominate templates for deletion by adding <noinclude>{{rfd}}</noinclude> to the template and creating a section about it on WT:RFDO. --WikiTiki89 15:10, 20 October 2012 (UTC)

You're right, it's silly and short-sighted. Unfortunately a lot of editors like them for some reason I can't fathom, so I don't think nominating them for deletion will get you anywhere. DTLHS (talk) 15:41, 20 October 2012 (UTC)

I would rename them {{RQ:Don Quixote}} and {{RQ:Fanny Hill}} to match the other members of Category:Quotation reference templates. And it is indeed extremely helpful to have templates for specific references, so you don't have to keep typing out the full title and tracking down the year and the URL and whatnot every time you want to cite a book. —Angr 16:00, 20 October 2012 (UTC)

English pronunciation: Unifying phonemic transcriptions

By our current conventions, we have different phonemic transcription systems for different dialects of English (mainly US and UK, sometimes Australian or South African pops up). For example, voter requires two different transcriptions for the same phonemes. Namely /ˈvəʊtə(ɹ)/ for UK and /ˈvoʊtɚ/ for US.

I propose that we unify our phonemic transcriptions so that entries like voter that have the same exact phonemes across all dialects will not need to have multiple transcriptions. There are two solutions, either choose a dialect to use as a standard (probably RP) or create our own compromise. I have created a suggestion for such a compromise at User:Wikitiki89/English pronunciation. Under my suggestion, voter will be transcribed as /ˈvəʊtə(ɹ)/. We can then go on to have narrower transcriptions such as [ˈvəʊtə] and [ˈvoʊɾɚ] for each dialect if we want.

Note: Words that have different phonemes in different dialects will still need multiple transcriptions. But I think these transcriptions should also use the unified transcription system. For example lever, instead of being /ˈliːvə(ɹ)/ and /ˈlɛvɚ/, will be /ˈliːvə(ɹ)/ and /ˈlɛvə(ɹ)/.

Also note: This only applies to vowels, as we don't currently differentiate between consonant phonemes in different dialects.

So basically, there are two questions here:

Do we want to unify the transcriptions?

What system should we use for the unified transcription, RP, my suggestion (possibly with some amendments), or other?

So then why aren't we using them? It sounds like a good system. --WikiTiki89 13:25, 21 October 2012 (UTC)

A benefit of our current system is that each contributor, including newer contributors, can add transcriptions in a standard (GenAm or RP) with which they are familiar (familiar with the sounds if not the IPA). I fail to see any benefit to switching to an arbitrary system; that contributors would have to learn the arbitrary system before they could start adding transcriptions seems like a drawback. The only difference that switching to RP as the single standard would produce, AFAICT, is the loss of broad US transcriptions: many of our entries already contain only the RP—but marked as such, rather than unhelpfully presented as a universal pronunciation. Stripping the dialectal qualifiers off of the various pronunciations of "lever" would be particularly and unnecessarily unhelpful: why would we want to muddle the issue of which vowel is used in which dialect? Yes, we could add narrow transcriptions to clear some of these issues up, but that seems like more work for no gain (compared to our current system). We would also have to decide whether to present only [ˈvoʊtɚ] as the narrow US transcription, or only [ˈvoɾɚ], or both, because as is often pointed out in discussions around here, the narrower a transcription is, the more confusing it is for those uninitiated into obscurer IPA symbols. - -sche(discuss) 19:20, 21 October 2012 (UTC)

I agree with all that. Also, worth saying that we're perfectly free to distinguish between dialects using phonemic transcription; there's no inherent problem with assigning a different phoneme set to each dialect if we're treating them differently. Ƿidsiþ 19:43, 21 October 2012 (UTC)

I never said we would strip the dialect qualifiers. Especially on lever they are necessary. My point is if we use the same phonemic transcription system for all dialects, it will be easier to see the phonemic differences between (RP) /ˈliːvə(ɹ)/ and (GenAm) /ˈlɛvə(ɹ)/. And voter would only need one transcription that would cover all dialects (/ˈvəʊtə(ɹ)/) and the RP and GenAm narrow transcriptions can be regularly derived from it so we don't even need to have them. Another reason I don't like having different phonemic transcriptions (or maybe this is a problem specific to GenAm) is that it doesn't do justice to smaller dialects within America. Our GenAm transcription currently merges /ɑː/ with /ɒ/, which is not universal in America, which is not such a huge problem, but it is an unnecessary problem. I think it's misleading and completely useless to show differences such as between /əʊ/ and /oʊ/ in phonemic transcriptions. Basically, pronunciation sections are supposed to show how to pronounce a word and not how to fake a British or American accent. If someone wants to learn that, they can look at the Wikipedia articles about British and American accents as that would help them more than looking up individual words one at a time. The differences in pronunciation of lever, however, are not just a funny accent, but just a different pronunciation and that is what we do need to distinguish between. I think if our pronunciation sections are more succinct like that, they will be easier to read and follow and more useful. When people look up a word for the pronunciation, they are not looking for the subtle differences in vowel quality between dialects. --WikiTiki89 20:08, 21 October 2012 (UTC)
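As a rough illustration of what "regularly derived" could mean, the Python sketch below rewrites a unified transcription into dialect-specific forms by plain string substitution. The rule table is a hypothetical toy that covers only the correspondences visible in the voter and lever examples in this thread; it is not a description of any existing template or of a complete sound system.

```python
# Hypothetical rule tables, limited to the substitutions seen in the
# thread's own examples. Order matters: "ə(ɹ)" must be handled before
# "əʊ" so that the final syllable is rewritten as a unit.
RULES = {
    "GenAm": [("ə(ɹ)", "ɚ"), ("əʊ", "oʊ")],
    "RP": [("(ɹ)", "")],
}


def derive(unified: str, dialect: str) -> str:
    """Apply the dialect's substitutions, in order, to a unified form."""
    result = unified
    for old, new in RULES[dialect]:
        result = result.replace(old, new)
    return result


for word in ("ˈvəʊtə(ɹ)", "ˈlɛvə(ɹ)"):
    print(word, "->", derive(word, "GenAm"), "/", derive(word, "RP"))
```

Words whose vowels differ unpredictably between dialects, such as the two pronunciations of lever, would still need separate unified forms; a table like this only handles the regular correspondences.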

It seems to me that this does the same as what the enPR system intends to do, except using IPA as the representation instead of some ad hoc scheme. I find IPA to be far more readable than enPR, so I support this. But I also suggest making a visible distinction between phonemic and diaphonemic representation. Essentially, diaphonemes add another 'layer' of abstraction: at the base level there are the phones, denoted with [ ], above that are the phonemes recognised within that dialect, with / /, and above that still are the diaphonemes which are common to all of English. The article w:Diaphoneme suggests several possible ways to denote diaphonemic transcription. Personally I like the double slashes: // //. —CodeCat 20:30, 21 October 2012 (UTC)

Just to clarify, what I am saying is that this should be allowed in addition to dialect-specific phonemes, not instead of. So an entry like voter would look like:

(after edit conflict, @Wikitiki:) In previous discussions, some users have objected to the use of what I'll term Schrödinger's ɹ, the parenthetical (ɹ), in UK transcriptions, on the grounds that the ɹ either is pronounced or it isn't; it doesn't exist in limbo. (And the standard is that it isn't pronounced.) I'm not sure if you're proposing that "voter" and "lever" would be /broadly/ transcribed something like this:

but the first option wrongly extends the false sense of ɹ limbo to GenAm, implying that the ɹ is optional across all dialects of English, when in fact it is always omitted in one standard (RP) and always pronounced in the other (GenAm).

The second option pulls our transcription scheme out of sync with one (or conceivably both) standards, which is simply bizarre—why only convey the different pronunciations of the second syllables of those words, and not the first?

I can't tell if you'd also conflate /ɑːt/ and /ɑɹt/ or not, but that would be even more wrong. As CodeCat wrote above, when the (ɹ) is preceded by something stronger than a schwa, "UK and US English actually do differ phonemically though. What is final -r in one is a long vowel in the other, which causes differences in homophony among other things."

First of all, GenAm ≠ US and RP ≠ UK. Yes it's true that in RP, the parenthetical /(ɹ)/ is not pronounced and in GenAm it is, but RP does not represent all of the UK and GenAm does not represent all of the US, in fact they both only represent a small fraction of their respective countries. Also, maybe in the UK, the /(ɹ)/ either is pronounced or it isn't, but I'll tell you, in New England, I have heard it pronounced, unpronounced, and everything in between. What the parentheses indicate is that this /ɹ/ is dropped or partially dropped in some dialects. Unparenthesized /ɹ/ would indicate one that is (almost) never dropped (such as that in leverage). So from your two examples, the first one is the one I was referring to.

As for which vowel precedes the /(ɹ)/, I don't think it makes a difference whether it's stronger than a schwa or not. Either way it causes differences in homophony. In non-rhotic dialects, pa is a homophone of par just as much as pita is a homophone of Peter. In GenAm, sake is a homophone of socky. There is nothing special in the case of /ɑɹ/ becoming [ɑː]. The merger of /ɑɹ/ and /ɑː/ in RP is no different than the merger of /ɑː/ and /ɒ/ in GenAm. So we can continue to handle the homophones and rhymes the way we do now.

As I said before, the fact that GenAm doesn't do justice to smaller dialects is not that big of a problem, but it is an easily avoidable problem by using broader transcription. --WikiTiki89 22:07, 21 October 2012 (UTC)

That we use {{a|US}} where {{a|GenAm}} belongs is a convention we should revisit. - -sche(discuss) 01:20, 22 October 2012 (UTC)

The main reason Wikipedia uses diaphonemic transcriptions is to save space. In an encyclopedia article, you don't want to spend lines and lines of text explaining all the different translations of every word; you want to get to the meat of the article quickly. After all, w:lever is supposed to be an article about levers themselves, not about the word lever. But a dictionary entry is about the word itself; we have the space to give all the pronunciation information in full detail without distracting from the point of the entry, because the pronunciation information is part of the point of the entry. So I'm opposed to introducing diaphonemic transcriptions here. —Angr 21:56, 21 October 2012 (UTC)

We are currently regularly giving pronunciation information in two dialects, with occasionally some others popping up. A diaphonemic transcription would serve to cover most English dialects all at once, so even when we are missing a specific dialect, it would be covered by the diaphonemic transcription. Also, keeping the pronunciation section concise makes it more readable and more useful. --WikiTiki89 22:07, 21 October 2012 (UTC)

No, it makes it more readable but less useful, because people then have to go to the pronunciation appendix to find out how to interpret the various symbols in their dialect. We have enough trouble at Wikipedia trying to convince people that /njuː ˈmɛksɪkoʊ/ and /ˈbɜrfərd/ are the correct transcriptions of New Mexico and Burford (Oxfordshire) respectively without bringing those headaches over here too, where they're completely unnecessary. —Angr 22:21, 21 October 2012 (UTC)

Well if we give the diaphonemic transcription along with a minimum of RP and GenAm, I think we're good. That's fairly concise and smaller dialects remain happily covered by the diaphonemes. --WikiTiki89 22:34, 21 October 2012 (UTC)

But why give a diaphonemic transcription at all? The further a pronunciation transcription goes into Abstractness-Land, away from actual dialects, the more often it will be simply wrong (found in no dialect). It's better, in my view, to indicate how the word is actually pronounced in various dialects. If {{a|South African}}{{IPA|/nœʉ/}} is missing from [[no]], I'd rather add it than add {{IPA|//noʊ//}} and expect people to believe, even if they track down some Appendix: page on the subject, that {{IPA|//noʊ//}} somehow covers {{IPA|/nœʉ/}}. Our best entries, like háček, already include an array of dialects.

Besides, individual words may change sounds in ways other words of their class don't, e.g. two words that have the same vowel in other dialects may exceptionally have different vowels in one dialect, meaning a diaphonemic transcription is no more likely to cover (or to enable a person to deduce) a particular other dialect's pronunciation than a GenAm or RP transcription is. - -sche(discuss) 01:20, 22 October 2012 (UTC)

We can't cover every single minor dialect on every page though. There has to be some way to derive smaller dialects. As for words that have the same vowel in some dialects but different vowels in others, that is exactly the situation I was describing with lever and this would also apply to words like bath (//bæθ// vs. //bɑːθ//) and cloth (//klɒθ// vs. //klɔːθ//). As for including pronunciations that are not found in any dialect, practically none of our current phonemic transcriptions are actually found in any dialect. --WikiTiki89 09:04, 22 October 2012 (UTC)

We do cover detail needed to distinguish between significant dialectal differences in some cases, but most of our pronunciation sections are sub-optimally formatted stubs. For example at moor we note the significant distinction between the one-syllable southern and two-syllable northern UK pronunciations where the word is homophonous with more and mooer respectively. True we cannot cater to every minor dialect in every case, but we derive from real dialects that editors, and more importantly readers, are familiar with rather than one attempting to cover the entire anglosphere. Thryduulf (talk) 18:31, 13 November 2012 (UTC)

Idiomaticity and compounds

Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.

For example, mega- can denote either a million (or 2^20) of something or simply a very large or prominent instance of something. Similarly star might mean a celestial object or a celebrity. But megastar means "a very prominent celebrity", not "a million celebrities" or "a million celestial objects", and only rarely "a very large celestial object" (capitalized, it is also a brand name in amateur astronomy).

This is a terrible paragraph. It assumes that all languages form compounds like English does. It does not take into account the peculiarities of ideograph-based writing systems like Chinese, or those which simply do not use spaces like Thai. Nor does it account for cases like German, where generally multi-word terms are written together like a compound.

I'd like to see this paragraph removed entirely, to be handled by the general rule on idiomaticity. Of course, this would have several implications:

Terms like coalmine, which have in the past been kept solely because they were written together, would have to go. (Everyone rejoice, for we no longer will get troll entries like Chineseman.)

However, compounds which are idiomatic, like laptop, can still be kept.

Note that the effects of this are much smaller than many of you would think. The above rule only applies to compounds. A compound, as per our definition, is composed of two nouns. Words formed by appending or prepending prefixes/suffixes, like the megastar example used in the above example, are totally unaffected. (Note the irony of using an example that doesn't even apply to the paragraph in question.)

I would say scrap it altogether. The status of compounds should be decided separately for each language, but the general rule should be that compounds that are formed regularly are SOP. English words such as coalmine, however, should be included because there is no way for someone to know whether it is ok to spell it in one word; coal mine would be the regular formation. In German, however, the story is different because single-word compounds are regularly formed. English affixed forms, however, can be single words and still be regularly formed; for example, I think chickens is a regular formation of chicken + -s and so does not merit its own entry (I am unlikely to win on this point though). Words formed with hyphenated affixes are more likely to be SOP: pro-abortion is clearly pro- + abortion and does not merit its own entry. --WikiTiki89 22:29, 21 October 2012 (UTC)

I wouldn't mind if Wiktionary:About English basically copypasted this very paragraph. But it shouldn't be the default for all languages. -- Liliana• 22:31, 21 October 2012 (UTC)

Many of the ones you listed are not entirely SOP. --WikiTiki89 22:36, 21 October 2012 (UTC)

Angr, it seems you are totally skipping the "compounds which are idiomatic can still be kept" part. Without looking at the entries you linked, I'm saying most, if not all of them, are idiomatic. -- Liliana• 22:37, 21 October 2012 (UTC)

I specifically listed ones that don't seem idiomatic to me, with the exception of the figurative meaning of arsehole. They're all terms that at least some people would call SOP if they were written as two words and brought to RFD. —Angr 22:51, 21 October 2012 (UTC)

At the very least, the afore ones would stay, afore- is a prefix. Some of these are pretty borderline, especially accident-prone (which is I presume only kept because accidentprone is attestable by virtue of COALMINE). -- Liliana• 22:55, 21 October 2012 (UTC)

(e.c.) I agree with you that there is a problem and it needs to be fixed. I’m not convinced removing the paragraph is the best solution, although it’s an OK one. Another solution would be replacing it with something along the lines of “Each language has different rules regarding the idiomaticity of compounds. See the WT:AXX page for the individual language for guidance.” (but because most languages don’t have a WT:AXX page, a generalised guideline follows).

As for preventing COALMINE abuse, in the past I’ve thought of amending it to say “the compound form has to be reasonably common for COALMINE to apply.” (intentionally vague). — Ungoliant(Falai) 22:39, 21 October 2012 (UTC)

We need to clearly distinguish orthographic "compounding" from true compounding in the morphological sense. The way in which English writes compounds with a space between the parts is actually a peculiarity of English. No other Germanic languages do it that way. In many cases, a calque of an English two-word term would be considered a compound in other Germanic languages, despite having the same lexical structure as the English term. coal mine is just one example among many, many more. For me, what is much more telling is the accentuation: compounds are single accentual units and have only one primary stress, on the first part of the compound as is common to all Germanic languages. This also includes Chineseman (which is clearly not "Chinese man" because of this difference in accentuation!), blackbeard (which is identical), skinhead etc. Compounds do not have to be nouns either, consider bittersweet or sleepwalk; the part of speech is determined by the last part. —CodeCat 22:46, 21 October 2012 (UTC)

I agree that the paragraph on compounds bears no relationship to our actual policy (our practice) on compounds and needs to be rewritten. The paragraph also misses the point that we have hitherto included compounds because they are single words, not out of concern for the polysemy of their parts, and megastar is not a good example of either our actual practice or the thing the paragraph talks about. However, I disagree with the idea of excluding English compounds, and I agree with Angr that the impact will be large, not as small as is claimed; of the examples Angr cites, I think all could easily be argued to be SOP. I support replacement of the current section on compounds with a better section, but I oppose replacement of it with nothing / a free-for-all deletion-fest. - -sche(discuss) 02:23, 22 October 2012 (UTC)

Are you sure? I'd be happy to go through every single of the entries Angr linked to show how they're idiomatic. -- Liliana• 08:05, 22 October 2012 (UTC)

I started to go through entries in Category:English compound words that are erroneously listed as compounds, when they're really not. You can help! At the end, you'll see how many entries would actually be affected by the CFI change. -- Liliana• 17:23, 22 October 2012 (UTC)

You may not be interested in Chineseman. That's fine, don't look it up. I am interested in words like this, their history and usage, and I think it would be insane to have to demonstrate "idiomaticity" (whatever you think that means exactly) for single words. Incidentally, I don't understand the obsession with how "obvious" the definitions are. Dictionary entries are not solely about definitions; they are also about usage and etymology and pronunciation of words in our language. Ƿidsiþ 17:54, 22 October 2012 (UTC)

I tell you again, if you want that for English, that's fine with me, you can readd it in Wiktionary:About English. But it should not be the default for all languages! Why would you want to include entire Chinese books (!!!) as entries, just because Chinese doesn't use spaces at all? -- Liliana• 17:59, 22 October 2012 (UTC)

There was already agreement in at least one of the many previous discussions of this subject (link to be provided when I find it) that Asian languages which have very different orthographic traditions and which do not use spaces are split into words in a way different from the way languages like English and German are split into words. - -sche(discuss) 21:10, 22 October 2012 (UTC)

This proposal assumes that it's obvious how to form compounds, but it's not that easy. In Dutch, compounds can be formed in different ways:

just glue 2 words together

glue 2 words together, inserting en

glue 2 words together, inserting e

glue 2 words together, inserting s

glue 2 words together, inserting -

insert ' in some types of derivations

write the 2 words separately

There are some rules, but those don't cover all cases; people who don't know the compound yet would need a dictionary to look up what the compound looks like. Some random examples:

The point is that a grammar book isn't very helpful. It can not explain why it's tandenborstel (tand & borstel), but tandpasta (tand & pasta). You need a dictionary for that. It's not fair to come up with an extreme example; the compounds are a serious problem for someone who is learning Dutch. -- Curious (talk) 20:59, 22 October 2012 (UTC)
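To make the point about the Dutch joining patterns concrete, here is a minimal sketch that generates every spelling the listed patterns would permit for a pair of words. The function name and the small set of attested forms are hypothetical, taken only from the two examples cited in this thread; the apostrophe pattern is left out because the list does not say which derivations it applies to.

```python
def candidate_compounds(first: str, second: str) -> list[str]:
    """Return the spellings the listed joining patterns would allow."""
    # Glue directly, or insert "en", "e", "s" or a hyphen between the parts.
    linkers = ["", "en", "e", "s", "-"]
    candidates = [first + link + second for link in linkers]
    # The last pattern: write the two words separately.
    candidates.append(first + " " + second)
    return candidates


# Only the two forms cited in the discussion; not a real word list.
ATTESTED = {"tandenborstel", "tandpasta"}

for first, second in [("tand", "borstel"), ("tand", "pasta")]:
    forms = candidate_compounds(first, second)
    matches = [form for form in forms if form in ATTESTED]
    print(first, "+", second, "->", matches, "out of", len(forms), "candidates")
```

Each pair yields six candidate spellings, and nothing in the patterns themselves says which one is in use; that choice has to come from a word list, which is the role a dictionary entry plays.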

But we will have to deal with extreme examples too. Someone suggested having entries on all numbers from 1 through 999999. That is pretty ridiculous. -- Liliana• 21:07, 22 October 2012 (UTC)

But those compounds have to be attestable and finally someone will have to create them. If you fear that someone might create entries for numbers 1-999999999999 then it will be sufficient to disallow bot-based creation of compound words. Matthias Buchmeier (talk) 07:39, 23 October 2012 (UTC)

It is impossible to prove that entries have been created by bot (or, the opposite, have not been created by bot). And I doubt such a restriction will find consensus, when we already use bots to create masses of inflected forms. -- Liliana• 08:04, 23 October 2012 (UTC)

Re: "I doubt such a restriction will find consensus": According to Wiktionary:Bots, bot-edits require consensus; so at least as a matter of policy, a restriction on bots doesn't require consensus. (As a matter of practice, you may be right that such a restriction would be hard to enforce.) —RuakhTALK 13:37, 23 October 2012 (UTC)

We already make an exception to disallow systematically formed chemical names, I believe (if we don't, we probably should). I don't see why we couldn't do the same for uninteresting numbers. A simple line like "Compounds formed systematically, such as those denoting chemicals, numbers or undiscovered elements, are generally not considered idiomatic even when written as a single word" should suffice. To be honest, while I see the problem with compounds in Germanic languages, I don't like the idea of deleting, say, bookshop or windowpane, even though they are unquestionably SOP (a shop for books, a pane for a window) and the equivalent German terms, Buchladen and Fensterscheibe, are equally so. Smurrayinchester (talk) 22:54, 23 October 2012 (UTC)

As far as I know, there is no exception for chemical compounds at all; they too are allowed per our CFI and could be created by anyone. We had Unsupported titles/Protein which failed only due to being unattestable, not due to being SoP. -- Liliana• 13:05, 27 October 2012 (UTC)

The key question should be: is it attested? and is it considered as a word in the language? (including set phrases). This is the solution to this issue. In Chinese, a whole book is not considered as a word... And in German, all compound words are considered as words, but only attested ones should be included. For numbers and other infinite series, a specific attestation rule could be designed. Lmaltier (talk) 21:23, 28 October 2012 (UTC)

Where can we discuss minor changes to the tools? --WikiTiki89 12:02, 22 October 2012 (UTC)

On the scripts talk pages (TL, DEO), I guess. Or right here would also be fine. --Yair rand (talk) 14:47, 22 October 2012 (UTC)

Ok, it seems that the only mildly annoying thing about the definition editor is how the links under the textbox ("+Add example sentence", "+Add quotation", and "More ►") hide when the mouse is not near the textbox. It just creates too much motion on the page. I think it would be better if they just remained there since they don't really get in the way of anything. Or if you really want to hide them then maybe just fade them out or something but without collapsing them and thereby causing the page to continuously increase and decrease in size. --WikiTiki89 15:08, 22 October 2012 (UTC)

Leaving them just faded out leaves a large empty space between the definitions, which I'm not sure is an improvement:

Lorem ipsum.

.

...

Currently there's a 200 millisecond delay (during which the change is canceled if the mouse hovering status changes) before expanding or hiding the links, the intention being that it doesn't change back and forth repeatedly, only when it's at least somewhat apparent that the user wants to use those buttons. (This only works for browsers that support transitions.) I suppose the delay could be increased somewhat, which would probably reduce the amount of movement going on. The speed of expansion could also be slowed, which may or may not help. I think it's somewhat important to give a general impression of everything outside the green outline being the same as before, when the user has edits previewed but isn't actively trying to edit, so as to clearly show what has and hasn't changed as a result of their edit. I don't know what actions other than increasing the transition delay/speed can be taken to reduce the amount of movement. --Yair rand (talk) 15:45, 22 October 2012 (UTC)
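The delay-and-cancel behaviour described above can be modelled roughly as follows. This is only a sketch of the logic, not the editor script itself: the class, its method names and the explicit clock values are all assumptions made for illustration, and the real tool relies on browser transitions rather than a hand-driven timer.

```python
class DelayedToggle:
    """Apply a hover change only if it persists for delay_ms."""

    def __init__(self, delay_ms: int = 200) -> None:
        self.delay_ms = delay_ms
        self.expanded = False   # whether the extra links are currently shown
        self._pending = None    # (target_state, due_time_ms) or None

    def hover_changed(self, hovering: bool, now_ms: int) -> None:
        if hovering == self.expanded:
            # Hover status went back to match what is shown: cancel the change.
            self._pending = None
        elif self._pending is None or self._pending[0] != hovering:
            # Schedule the change; it only takes effect after the delay.
            self._pending = (hovering, now_ms + self.delay_ms)

    def tick(self, now_ms: int) -> bool:
        """Advance the clock and return the visible state afterwards."""
        if self._pending is not None and now_ms >= self._pending[1]:
            self.expanded = self._pending[0]
            self._pending = None
        return self.expanded


toggle = DelayedToggle(delay_ms=200)
toggle.hover_changed(True, now_ms=0)     # mouse passes over the definition
toggle.hover_changed(False, now_ms=100)  # and leaves before the delay is up
print(toggle.tick(now_ms=300))           # links never expanded
```

Raising delay_ms in this model means brief passes of the mouse are ignored for longer, which corresponds to the suggestion that a longer delay would reduce movement on the page.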

I don't think anyone would mistake those links as being part of the entry. I honestly see no point in hiding them. But I just had another idea: What if they slide out over the content under them rather than pushing everything down? Also I noticed a glitch: After you press discard edit hovering over the definition still makes those links show up. --WikiTiki89 15:58, 22 October 2012 (UTC)

Shouldn't it? Users might want to add example sentences or quotations without editing the definition itself, and it doesn't need to be mandatory that the form stay like that. Having the extra buttons cover the lower definition sounds like it might get annoying, and when users are just editing the definition without doing anything else, which will probably be most of the time, having the extra buttons stick around might also be annoying. I don't know. Anyone else have an opinion on this? --Yair rand (talk) 17:46, 22 October 2012 (UTC)

Well if they want to add examples, they can press edit again. It doesn't make sense for those buttons to be there only for definitions that you started editing and then canceled. Also, having the page shift every time you move the mouse is a lot more annoying in my opinion than having a piece of the next definition covered and only when your mouse is near the definition you're editing so if you want to see the next definition, just move the mouse away, it's no different than a hover drop-down menu. --WikiTiki89 19:33, 22 October 2012 (UTC)

the presence of Irish and Welsh online

Following the RFV discussion of cothromas (which see for background), I propose that Irish and Welsh be removed from the list of languages which are well-attested online. I propose them together because, as I wrote in the previous discussion, they seem to be attested online to approximately the same degree: but it has become apparent that this degree is not very great. As Angr pointed out in the RFV discussion, terms like cothromas are attested in textbooks and other works, but these works are durably archived on the shelves of Irish libraries, not easily available for perusal online. Please indicate whether you support removing either language (allowing its terms to meet CFI with just one citation of use), or oppose that. - -sche(discuss) 08:08, 24 October 2012 (UTC)

The discussion was about Irish, I had the impression that Welsh is more used but whether that means it's better attested online is another issue. Mglovesfun (talk) 10:56, 24 October 2012 (UTC)

I think Irish is much better attested online than Welsh, since all acts of the Irish government are available in both Irish and English online. I'm not aware of anything like that in Welsh. I think there are also more Irish-language print newspapers and magazines with online editions than there are for Welsh. —Angr 17:50, 24 October 2012 (UTC)

Acts and Measures of the Welsh Assembly are also produced in both languages, although the Welsh Government doesn't have as wide a scope as the Irish Government (the powers of the Welsh Assembly are similar to the powers of state government in the USA; they control education, health, tax, and so on, but foreign policy, the military and most areas of criminal law are out of their hands) - as a random example, here's the Mental Health (Wales) Measure 2010 in Welsh - and by law, companies working in the public sector have to treat English and Welsh equally. I'm not sure how many durable citations that would produce (a Welsh language prescription or electricity bill would be full of useful words, but not durable) but it's worth noting that the British Library keeps a secure archive of a lot of Welsh sites, including Welsh Language Board's website. Smurrayinchester (talk) 11:54, 30 October 2012 (UTC)

Other discussions have given me the impression that a vote is not required to modify the list. Is this correct? If so, is there objection to removing these languages from the list? - -sche(discuss) 03:42, 29 October 2012 (UTC)

Per Angr’s comment Irish might need some more discussion, but I won’t mind if they are removed (they aren’t widely used languages anyway). If it turns out too much garbage is passing RFV, we can always add them back to LDL. — Ungoliant(Falai) 04:04, 29 October 2012 (UTC)

I have removed Irish and Welsh from the list of well-attested-online languages. This affects several ongoing RFV discussions. - -sche(discuss) 22:29, 12 November 2012 (UTC)

the presence of OTHER LANGUAGES online

I didn't get around to suggesting some changes earlier, but I suggest removing all Arabic dialects and leaving Modern Standard Arabic or just Arabic. They (all dialects) are present online, especially Egyptian, but they are by no means well documented, which follows from their being dialects. Dialects are not taught at school and no official body dictates what is right. There are occasional "teaching" sites and textbooks but they are of rather low quality and lack any sort of completeness.
I also suggest removing Malayalam. The resources are extremely poor. Most Malayalam words can only be added by native speakers or imported from Malayalam Wiktionary. --Anatoli(обсудить/вклад) 23:00, 12 November 2012 (UTC)

I support removing the Arabic dialects from the list. (Who thought they were well and durably attested online in the first place?) I have no comment regarding Malayalam. - -sche(discuss) 01:47, 13 November 2012 (UTC)

Translation indentation analysis

User:DTLHS/translation indent analysis: list of languages with some kind of indentation in translation tables (I think I missed some, will re-run later). A lot of junk in there that should probably be reformatted. DTLHS (talk) 02:59, 25 October 2012 (UTC)

On Goguryeo

User:Joseon814 has been adding Goguryeo data. Also known as (Old) Koguryo, Goguryeo was spoken until the seventh century CE. They (Joseon814) evidently get their data for these entries from "Koguryo, the Language of Japan's Continental Relatives: An Introduction to the Historical-comparative Study of the Japanese-Koguryoic Languages with a Preliminary Description of Archaic Northeastern Middle Chinese" by w:Christopher I. Beckwith. As per Ruakh, 'According to Buyeo languages#Japanese–Koguryoic hypothesis, "Beckwith reconstructs about 140 Goguryeo words, mostly from ancient place names" (emphasis mine), so apparently these are scholarly hypotheses rather than actual attested terms.'

I have e-mailed Joseon814, but they are not forthcoming with further information, so we are left to speculate (beyond the book's title) on the validity of their data. I have suggested putting the data in an appendix as one solution, but what do others think? --BB12 (talk) 20:16, 25 October 2012 (UTC)

Are there any attested writings, or even words quoted in other sources, in Goguryeo? If not I suggest we can the language whole. -- Liliana• 20:33, 25 October 2012 (UTC)

That last part of your sentence got clipped :) --BB12 (talk) 20:44, 25 October 2012 (UTC)

Did it? “If not” = “If there aren't any attested writings, or even words quoted in other sources, in Goguryeo”; “I suggest we can the language whole” = “I suggest [that] we can the language whole” = “I suggest that we delete all Goguryeo content”. —RuakhTALK 21:56, 25 October 2012 (UTC)

LOL, thank you. "Shitcan" is a word I understand, but "can," not so much.... --BB12 (talk) 22:35, 25 October 2012 (UTC)

I think it comes from the "trash can" sense of "can": "can it" = "throw it in the can". —RuakhTALK 22:40, 25 October 2012 (UTC)

Re Liliana: LinguistList says: "The Archaic Koguryo corpus dates to the third and fourth century A.D. and consists of about a dozen identifiable lexemes recorded in Chinese historical and geographical accounts of the Koguryo kingdom. The Old Koguryo corpus, largely dating to the seventh and eighth centuries, consists of over a hundred lexemes found in the form of glossed toponyms, plus a small number of words recorded in Chinese historical and geographical accounts." From what I've seen, I assume that Beckwith is working off this data. —Μετάknowledgediscuss/deeds 23:56, 25 October 2012 (UTC)

The problem is very simple: things are set up right now to reward vagueness and uncertainty. If there's any chance the entries don't meet CFI, stalling might seem like a good way to avoid that fate until everyone gets bored and moves on to other problems. That might not be what's really happening here, but even if it's unintentional, the result might be the same.

The issues involved aren't that complicated, though: the question you're posing is whether these entries meet CFI, so let's find out. We have a very simple, easy way to get a response: start rfving selected words. This will force Joseon814 to produce some kind of evidence, or at least an explanation. Either that, or the entries will get deleted, and the problem will be on its way to being solved that way. Chuck Entz (talk) 02:12, 29 October 2012 (UTC)

Fortunately, it may not come to that. I see that at least Joseon814 is aware of the problem and seems cooperative. Disregard the above. Chuck Entz (talk) 04:07, 29 October 2012 (UTC)

I wouldn't mind taking over for the time being. Is there somewhere I can find out the process for updating the WotD template? Astral (talk) 07:13, 27 October 2012 (UTC)

If I were you, I'd ask Sche for pointers. Among other things, the reason WOTD takes so much work is that it requires audio and some other stuff. (Speaking of that, I'd better do some FWOTD work while I have a chance!) —Μετάknowledgediscuss/deeds 15:52, 27 October 2012 (UTC)

I did October 28-31 by looking over diffs to see what others had done. Will definitely seek Sche's wisdom before proceeding any further. Thanks! Astral (talk) 00:25, 28 October 2012 (UTC)

en-verb2

¶ All correct then. So, do you remember when I was talking about creating a different verb template with archaic additions? After a lot of scrambling, I managed to create a template that allows these forms, and I think that it functions correctly for the most part. The only problem is that, in manually setting each inflexion, it is out of numerical order. I suppose that that should not be too difficult to fix, but I really do not feel like jacking around with it more tonight.¶ I just need your consent so that I (or also others) can incorporate this template into the words that possess these extra inflexions.

Then, I oppose the use of this template. —RuakhTALK 21:02, 28 October 2012 (UTC)

I like the idea in general, but it needs some adjustments. It either does not belong in the header line, or the archaic forms need to be collapsible. Also, it should only be used for verbs that are very common and well attested in these forms (such as in the works of Shakespeare or the KJV). It would also need to take variations such as -est/-st and -eth/-th into account. --WikiTiki89 08:26, 29 October 2012 (UTC)

Translation requests

Having a category of requests (via {{trreq}}) such as Category:Translation requests (Greek) is extremely useful. I just went to enter a translation of collector and found the translation table packed with translation requests - actual translations are difficult to find. What was this template's intended use? — Saltmarshαπάντηση 06:03, 27 October 2012 (UTC)

I've only ever used it when I know someone is actively checking the relevant category for requests. DTLHS (talk) 00:30, 28 October 2012 (UTC)

On one hand, it looks pretty awful on a page like others, but it did remind me to add the Latin translation by helpfully categorizing it. Maybe excess application ought to be avoided. —Μετάknowledgediscuss/deeds 00:33, 28 October 2012 (UTC)

Personally I would love to have a category or list with all words lacking a translation into a specific language, like Dutch. Right now I have no idea how much needs doing and what. —CodeCat 00:37, 28 October 2012 (UTC)

Such a list can be automatically extracted from a dump. When extracted automatically, the list is complete. I would just ditch {{trreq}}; I find it annoying. This addition of trreq to "others" is pointless, as most of the trreqs are just going to sit there for ages, as we have almost no contributors in most of the languages tagged. --Dan Polansky (talk) 09:24, 28 October 2012 (UTC)
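As a rough illustration of what such an extraction could look like, the sketch below scans entry wikitext for translation tables that have no line for a given language. It assumes the dump has already been read into a mapping of page title to wikitext, and it assumes translation lines have the form "* Language: ..." inside {{trans-top}} tables; the function name and the sample pages are hypothetical. It checks a page as a whole rather than each sense's table separately.

```python
import re


def missing_translation(pages: dict[str, str], language: str) -> list[str]:
    """Titles that have a translation table but no line for the language."""
    # Matches lines such as "* Dutch: ..." or nested "*: Dutch: ...".
    line = re.compile(
        r"^\*+:?\s*" + re.escape(language) + r"\s*:", re.MULTILINE
    )
    missing = []
    for title, wikitext in pages.items():
        has_table = "{{trans-top" in wikitext
        if has_table and not line.search(wikitext):
            missing.append(title)
    return sorted(missing)


# Illustrative sample data standing in for pages read from a dump.
sample_pages = {
    "collector": "{{trans-top|person who collects}}\n"
                 "* Greek: {{t|el|συλλέκτης}}\n{{trans-bottom}}",
    "bald": "{{trans-top|having no hair}}\n"
            "* Dutch: {{t|nl|kaal}}\n{{trans-bottom}}",
    "aforesaid": "An entry with no translation table at all.",
}

print(missing_translation(sample_pages, "Dutch"))
```

Because the list is computed from the page text, it needs no request template on the pages themselves, which is the advantage claimed for this approach in the comment above.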

I agree. I have not seen {{trreq}} make a translation appear somehow faster. -- Liliana• 09:32, 28 October 2012 (UTC)

I have. Most productively, I have been inserting {{trreq}} in the English vernacular names of species to cover languages spoken in the natural range of the species, where that range was not universal. Some contributors pay attention to such things. Can it be misused and overused? Of course. I've done so myself. That is how it is with many templates here. A maintenance/cleanup/request category can much more easily be overfilled with the assistance of bots than with a template that is inserted manually. A specific person can usually be dissuaded from overusing it fairly easily. DCDuringTALK 13:43, 28 October 2012 (UTC)

This template serves a useful purpose - it enables just that "requesting". It is not useful if it is used to indicate every future translation that Wiktionary might contain. For Greek translations (which I try to do) I need a list that people want - not a long list where "real" requests will get lost. Perhaps it should either be hidden from users, or placed at the bottom of translation sections. — Saltmarshαπάντηση 07:08, 30 October 2012 (UTC)

I'm starting to think that the requests and the links to words should be separate, maybe similar to how translations-to-be-checked are separate, maybe this hypothetical template: {{trreq-top}}. Any opinions on that? --Lo Ximiendo (talk) 07:40, 30 October 2012 (UTC)

Those for checking appear at the bottom, generally because it is uncertain which sense:translation they apply to. It is usually self-evident which translations are lacking. So {{trreq}} should be used when a specific language:translation is wanted by a Wiktionary user. That way translators have some hope of keeping up with requests, which is impossible if the list they're processing contains potentially thousands of blanket additions. — Saltmarshαπάντηση 05:48, 2 November 2012 (UTC)

I have been actively filling a number of translation requests but I have to say that many of the requests will have to sit there for a very long time or will have to be removed. We have no skills in Assamese, Oriya, Aymara, Cebuano, Kirundi, etc. and native speakers are unlikely to join. I have some limited resources and skills (apart from the large state European languages and the languages on my Babel list, esp. Mandarin, Japanese and Slavic languages) for the following: ar, th, my, lo, mn, kk, ky, az, uz, tk. I add mt, ta, te, kn, si only if a translation is easy to find. Perhaps Lo Ximiendo could kindly reduce the number of languages she adds. I agree that we need to boost some unpopular languages; at the moment we fail to provide even basic vocabulary for important languages with millions of speakers. The requests are a reminder that there is still a lot of work to do.

There is also some weird technical problem with adding a translation next to a {{trreq}}. E.g., I can't add a quick Mongolian translation if there are requests for Mirandese, I have to add it manually in the edit mode. Can anyone look into this? --Anatoli(обсудить/вклад) 01:18, 9 November 2012 (UTC)

Croatian -> Serbo-Croatian in translation table

Don't know if you guys are aware but the automatic change of all "Croatian" translations to "Serbo-Croatian" has put the translation table out of alphabetical order. See bald for example. Any way to fix this? ---> Tooironic (talk) 06:00, 28 October 2012 (UTC)

Yeah. Manually or by bot. In this case, I'm just removing it, because Serbo-Croatian is already listed in its correct alphabetical order. —Angr 08:02, 28 October 2012 (UTC)
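For the bot route, a minimal sketch of the clean-up might look like this. It assumes each translation line has the form "* Language: ..." and mirrors the manual fix described above: the renamed line is dropped when the new language is already listed, otherwise it is relabelled, and the table is then re-sorted by language name. The function name and sample lines are hypothetical, and a real bot would probably want to merge the dropped line's translations rather than discard them.

```python
def rename_and_sort(lines: list[str], old: str, new: str) -> list[str]:
    """Relabel a language, drop it if the new label exists, then re-sort."""

    def label(line: str) -> str:
        # "* Croatian: ćelav" -> "Croatian"
        return line.lstrip("* ").split(":", 1)[0].strip()

    existing = {label(line) for line in lines}
    result = []
    for line in lines:
        if label(line) == old:
            if new in existing:
                continue  # new language already listed: drop this line
            line = line.replace(old, new, 1)
        result.append(line)
    # Alphabetical by language name, ignoring case.
    return sorted(result, key=lambda line: label(line).casefold())


table = [
    "* Bulgarian: плешив",
    "* Croatian: ćelav",
    "* Dutch: kaal",
    "* Russian: лысый",
]
for line in rename_and_sort(table, "Croatian", "Serbo-Croatian"):
    print(line)
```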

I'm on pace to empty this before the end of December. Hurrah! Mglovesfun (talk) 14:17, 28 October 2012 (UTC)

On Goguryeo in an appendix

Nobody has really addressed the issue above on Goguryeo. Would it be acceptable to put the data in an appendix? --BB12 (talk) 21:50, 28 October 2012 (UTC)

Has it been confirmed whether the terms Joseon814 added are attested or reconstructed? — Ungoliant(Falai) 00:54, 29 October 2012 (UTC)

Despite multiple requests to discuss this issue, Joseon814 continues to be uncommunicative (though they asked a short question above). Ruakh has blocked them for a week, which seems appropriate. --BB12 (talk) 20:14, 30 October 2012 (UTC)

IPA of multi-word phrases

I find that many IPA transcriptions of multi-word phrases use spaces between the words. I think that this is wrong in cases where word boundaries are not phonemic. --WikiTiki89 09:13, 29 October 2012 (UTC)

Good point, I've always wondered what a space actually represents in IPA. Pauses are real of course, are they represented by a space? But of course, if nothing is pronounced, nothing should be recorded. This came up with prefixes and suffixes, where users were writing /ɛks-/ for ex-, which is wrong as the hyphen is not pronounced! Mglovesfun (talk) 09:24, 29 October 2012 (UTC)

Spaces have the same function in IPA transcription as they do in regular orthography: splitting up words. Where did you get the idea that this is "wrong"? —Angr 23:02, 29 October 2012 (UTC)

Because word boundaries are not phonemic in most multi-word phrases we include here. Do coalmine and coal mine have different pronunciations just because one has a space? And if the pronunciations are the same, why should the transcriptions be different? --WikiTiki89 08:24, 30 October 2012 (UTC)

Word boundaries are never phonemic; they can never be phonemic because they're not sounds. I'd say coal mine is a single phonological word regardless of how it's spelled because it has only one primary stress. I don't mind leaving the space out of its transcription, but I would certainly balk at leaving the spaces out of the phonetic transcription of the various phrases and proverbs we have here. (Though actually I'd leave phonetic transcription of those out altogether, as the individual words can be looked up separately.) —Angr 13:06, 30 October 2012 (UTC)

In which case cow mine, moose mine, book mine, etc., should all be single phonological "words", since they have the same pattern of stress. Which really stretches the definition of word for me.--Prosfilaes (talk) 20:04, 30 October 2012 (UTC)

Not everything that's a possible phonological word actually exists in the lexicon. If stetch were a word of English, it would be a phonological word too. The fact that English doesn't have the compounds you mention is no argument against the wordhood of coal mine. —Angr 21:09, 30 October 2012 (UTC)

I propose that homophones should have the same IPA transcription. Mglovesfun (talk) 21:21, 30 October 2012 (UTC)

Also consider pico de gallo. It is undoubtedly not a single word. But, pronunciations generally don't separate words in any special way (which is pretty inconvenient for language learners), so why should the IPA transcriptions? I think spaces should, if anywhere, be used only where a speaker would insert some sort of boundary such as a pause. --WikiTiki89 07:58, 31 October 2012 (UTC)

I agree with Angr for the most part. I would balk at removing the spaces from phrases and proverbs; I could go along with removing them from words that have run-together alternative forms. - -sche(discuss) 02:13, 1 November 2012 (UTC)

Anyone have access to the IPA Handbook? Perhaps there is an official recommendation on spaces and hyphens, &c. —MichaelZ.2013-04-09 20:39 z

“In phonemic and allophonic transcriptions it is common to include spaces to aid legibility, but their theoretical validity is problematic.”

“White spaces can be used to indicate word boundaries. Syllable breaks can be marked when required. The other two boundary symbols are used to mark the domain of larger prosodic units. There is also a linking symbol that can be used for explicitly indicating the lack of a boundary.” Refers to spaces and . (syllable break) | (comma) ‖ (full stop) ͜ (link), with examples of use. This seems quite applicable to our transcription of longer phrases. —MichaelZ.2013-04-09 20:54 z

I'm not particularly sensitive to or knowledgeable about pronunciation, but it seems to me that the stress pattern for something like coalmine/coal mine depends on the focus of the conversation. I would expect roughly equal stress for cases where coal mines are mentioned along with other kinds of mines. Further, I would expect mine to be used alone much more often than coal mine/coalmine where the kind of mine is immaterial or well-known, which would cover a great deal of usage, I expect. Is there any source of actual facts about individual terms or are we dependent on general theory, reasoning by analogy, expert opinion, etc ? DCDuringTALK 23:04, 9 April 2013 (UTC)

On fr.wikt, for French words, we use spaces, syllable breaks and links, e.g. au fur et à mesure /o fy.ʁ‿e a mə.zyʁ/. This is more readable than /o.fy.ʁe.a.mə.zyʁ/. Dakdada (talk) 18:48, 10 April 2013 (UTC)

That is clearer. The undertie should be a combining double breve below, U+035C: /o fy.ʁ͜e a mə.zyʁ/ —MichaelZ.2013-04-10 20:41 z
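A minimal Python sketch of the two points raised in this thread, offered only as an illustration. The helper name `strip_boundaries` and the set `BOUNDARY_MARKS` are hypothetical. The sketch assumes that "same pronunciation" can be tested by removing the boundary marks discussed above (space, syllable break, undertie, combining link), and it looks up the Unicode names of the two linking characters with the standard `unicodedata` module.

```python
import unicodedata

# Boundary marks mentioned in this thread (an assumed, non-exhaustive set):
# space, syllable break ".", undertie U+203F, combining link U+035C.
BOUNDARY_MARKS = {" ", ".", "\u203f", "\u035c"}

def strip_boundaries(ipa):
    """Return the transcription with all boundary marks removed."""
    return "".join(ch for ch in ipa if ch not in BOUNDARY_MARKS)

def describe(ch):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The two transcriptions of "au fur et à mesure" quoted above.
spaced = "o fy.ʁ\u203fe a mə.zyʁ"
syllabified = "o.fy.ʁe.a.mə.zyʁ"

# Both reduce to the same string of segments once boundaries are ignored.
print(strip_boundaries(spaced) == strip_boundaries(syllabified))
print(describe("\u203f"))
print(describe("\u035c"))
```

Comparing the stripped forms is one way to check that two spellings of a term carry the same transcription while still letting the displayed form keep spaces for legibility.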

Translation nesting

While we're having a good go at cleaning up translations, can we look at translation nesting? For example, there's a lot of *: Valencian under Catalan entries. Do we want these, or rather {{qualifier|Valencian}} instead? Mglovesfun (talk) 09:21, 29 October 2012 (UTC)

Or just nothing and have the {{context|Valencian}} on the entry itself. --WikiTiki89 09:28, 29 October 2012 (UTC)

I guess it makes sense to keep the nesting as Valencian is quite different from Catalan spoken in Catalunya (Eastern Catalan). Officially it even has the status of a separate language. Matthias Buchmeier (talk) 09:40, 29 October 2012 (UTC)

I think that having too many subsections or qualifiers makes it harder to find the actual translations, which is why such information should be kept to a minimum in translations tables but explained fully in the language's entry, using context labels, usage notes and so on. I wasn't really wanting to discuss Valencian so much as bring up the issue in general of where we draw the line. Mglovesfun (talk) 10:09, 29 October 2012 (UTC)

I would nest lects that are sometimes considered separate languages (like the Chinese or Romani lects), and use qualifiers for dialects (like Quebec French). - -sche(discuss) 14:35, 29 October 2012 (UTC)

Note that Wiktionary:Translations has no mention of nested translations. Would it be possible to compile a list of permitted languages / dialects and place it there? DTLHS (talk) 22:12, 29 October 2012 (UTC)

Review of the topical category tree

I've always thought it strange that we consider things like Category:en:Colors to be subcategories of Category:en:Physics, that Category:en:Trees is a subcategory of Category:en:Botany, and that Category:en:Water is a subcategory of Category:en:Chemistry. While this is a valid way to categorise topics, it does not agree with how language speakers generally categorise the concepts they speak of. People do not usually regard trees to be a specific part of botany, nor do people think of physics when they name an object's colour, nor do people envision chemical reactions in the context of water. In a sense, it is like our topical categories are backwards, listing basic concepts as subcategories of more advanced concepts when it should really be the other way around. This really bothers me when we look at categories for old languages like Gothic and see things like Category:got:Social sciences, Category:got:Linguistics and Category:got:Onomastics, none of which contain any entries and likely never will, just for the sake of their subcategory Category:got:Demonyms. What do you think of this situation? Should it be cleaned up, structured differently? How could it be improved? —CodeCat 20:18, 29 October 2012 (UTC)

If we're to have topical categories (and, as I've said before, I'm against it, but I know I'm outnumbered, so, if we're to have them) then they should be distinguished from jargon categories. A category for terms about physics and a category for terms in physicists' jargon should be two distinct categories: the first might perhaps include a category for terms about colors and the second would be populated (at least, populated for the most part) by {{context|physics}}. This would have the following benefits: (1) It'd be easier to find physicists' jargon terms: the category structure would generate a list of them. (2) Right now we must have a physics category because we need it for physicists' jargon; if we have a separate physicists'-jargon category, then we could discuss whether a physics category is even worth having, and if the answer is no then that obviates your (CodeCat's) question. (3) Even if we do keep the physics category, we can structure our topical-category tree and our jargon-category tree differently from one another: the jargon categories would naturally fall into categories like we have now: science→mathematics→topology, but the topical categories could be according to some other system and, in particular, need not follow the subject-matter-as-studied-in-universities hierarchy that leads to physics→colors. —msh210℠ (talk) 01:21, 30 October 2012 (UTC)

We might want to consider just eliminating all categorization of categories except where a user might think the entry would be found in the potential parent category instead of the more specific subcategory. --Yair rand (talk) 04:48, 31 October 2012 (UTC)

The categories aren't as closely subcategorised as you make out. en:Water isn't directly in en:Chemistry, it's in en:Liquids, which is in en:Matter, which is in en:Chemistry. Similarly, en:Colours is in en:Light, and en:Trees is in en:Plants. That doesn't seem too bad to me - in fact, the categories are laid out just as I'd expect. High categories represent high levels of abstraction, low categories represent lower levels of abstraction. It's similar to how the Dewey Decimal system would categorise books about these topics. A book on colour for instance would be categorised with the number 535.8 - 500 for science, 30 for physics, 5 for light, .8 for colour. Certainly, any scheme that tried to reverse this, with "basic concepts" above "advanced concepts", would be incredibly unwieldy. I'm not sure where colour would go, if not under "light" and "vision", nor what these are if not subcategories of physics and physiology respectively.

Obviously this is kind of a silly scheme to use in languages that died out before the invention of the scientific method, but I don't see it being actively harmful, and the slight tidiness you'd get in a language like Gothic by collapsing or rearranging the category tree would surely be outweighed by the mess that big modern languages like English or German would become. Smurrayinchester (talk) 08:40, 31 October 2012 (UTC)
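To make the nesting concrete, here is a rough Python sketch of the parent chain described in the comment above. It is not how the category templates actually work. The table `PARENTS` and the function `ancestors` are hypothetical, each category is given a single parent (a simplification), and the links "Light → Physics" and "Plants → Botany" are assumptions inferred from the thread rather than checked against the live categories.

```python
# Parent links as described in the comment above. "Light -> Physics" and
# "Plants -> Botany" are assumptions inferred from the discussion.
PARENTS = {
    "Water": "Liquids",
    "Liquids": "Matter",
    "Matter": "Chemistry",
    "Colours": "Light",
    "Light": "Physics",
    "Trees": "Plants",
    "Plants": "Botany",
}

def ancestors(category):
    """Walk up the tree and return every parent category, nearest first."""
    chain = []
    while category in PARENTS:
        category = PARENTS[category]
        chain.append(category)
    return chain

# Each ancestor is a category page that has to exist for every language
# that uses the leaf category.
for leaf in ("Water", "Colours", "Trees"):
    print(leaf, "->", " -> ".join(ancestors(leaf)))
```

The length of each chain is the number of parent categories that a language needs before the leaf category can be created without redlinks, which is the cost being weighed for languages such as Gothic.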

The issue I have with the current system is that it causes more basic terms to be nested many levels deep, requiring the creation of many parent categories that serve no purpose. I don't think that Matter belongs within Chemistry at all, since Chemistry is the study of interactions between matter, not something that 'contains' matter all in itself. I kind of like Msh210's idea of keeping human discipline separate from the things it studies, as to me Matter does not exist as part of Chemistry but alongside it. Matter still is matter even if nobody studies it. —CodeCat 02:06, 1 November 2012 (UTC)

The fact that en:Matter is a subcategory of en:Chemistry doesn't mean that all matter is completely described by chemistry, it just means that chemistry is one of the elements that helps describe matter (en:Matter is in both en:Chemistry and en:Physics, since both are involved in the study of matter). Matter is still matter if no-one studies it, sure, but people have studied it, and it seems odd not to take that into account when building a dictionary. There's always room for multiple category systems if necessary; the scientific systems can sit alongside other ones. I wouldn't object to the existence of Category:en:Greek classical elements, for instance, containing en:Fire and Category:en:Water (and of course, air, earth, ether, quintessence, etc). The way our category system works, these can sit happily side by side. Smurrayinchester (talk) 17:53, 2 November 2012 (UTC)

What I meant is, we should be able to have Category:en:Matter without creating a redlink to Category:en:Chemistry. I am not denying that there is a link between them, but it is more than simply saying that chemistry concerns itself with all matter (which is what a subcategory implies). Using subcategorisation as a kind of "related topics" seems like a misuse of the category tree to me. —CodeCat 17:58, 2 November 2012 (UTC)

Reviewing Translingual Han characters

My proposal is that the Translingual section of individual Han characters should not contain lexical definitions. By way of comparison, a does not contain a reference to the English indefinite article. Such definitions, where applicable, should be in the individual languages. The Han character should contain only information about the character, such as perhaps "a Han character with six strokes".

Then what would be the purpose of it? You can tell how many strokes it has just by looking at it, without having to read it in a dictionary. --WikiTiki89 12:12, 30 October 2012 (UTC)

Well, I can't tell how many strokes it has. Mglovesfun (talk) 12:19, 30 October 2012 (UTC)

These meanings aren't as translingual as they may seem. History has already shown that Han characters can change in meaning and usage, and there is of course no guarantee that they change the same way in every language. So I support. —CodeCat 12:25, 30 October 2012 (UTC)

I see nothing wrong with giving the original meaning as translingual. It would be like giving the meaning "ox" for the letter 'a', even though it certainly does not any longer mean ox in any language I know of. --WikiTiki89 12:31, 30 October 2012 (UTC)

I object to this proposal. It would make SOME sense if every single character had a definition in each CJKV language. As it is, "Translingual" is essentially "CJKV", and the definitions reflect how speakers of each of the four languages (including all Chinese topolects) describe the Han characters when they see them, not necessarily and not always as words in a normal sense. Many characters have meanings, often symbolic, which work across multiple parts of speech, understood by CJKV speakers, even if they are not used in the modern languages or need something to become a real word in a language. E.g. 食 is a character that means "to eat", "eating", even if in modern Japanese, it's only a noun, 食べる (taberu) is the verb. Still, the Japanese will refer to the character on its own as 食べる (taberu). Modern Mandarin seldom uses the character on its own to mean "to eat" (吃 is used instead) but Cantonese does. Vietnamese uses ăn but 食 is used in many Sino-Vietnamese derivations. When making up multicharacter words, individual characters are described as they are in the translingual sections, even if the resulting words have different or unrelated meanings. In any case, we don't have experts and volunteers redefining Hanzi. Moving definitions from translingual to individual languages, even just Mandarin, is not safe. The character meaning and the word meanings are related but not identical. Removing from translingual would strip the entries of important info. --Anatoli(обсудить/вклад) 13:09, 30 October 2012 (UTC)

The individual language sections, as seen on 食, for example, provide the meanings for those languages, and the translingual page seems like a convenient way to bring all the information together. There are times, when looking at a character in Japanese, for example, that the meaning used is listed in the translingual section, not the Japanese section. The translingual page makes it much easier to find that information. It would be very unusual for someone using a Phoenician-derived script to be concerned about the various meanings of "a" over the centuries, but that is not the case for CJKV languages. --BB12 (talk) 16:07, 30 October 2012 (UTC)

I'd like to add a bit more. Chinese and Japanese dictionaries are divided into character and word dictionaries (like any other language dictionary). The character dictionaries 字典 (cmn: zìdiǎn / ja: jiten / ko: jajeon / vi: tự điển) contain what our translingual sections do but specify the pronunciation for the particular language, which is split into languages at Wiktionary. They don't specify parts of speech. The Chinese linguists always distinguish 詞 (cí) (word) and 字 (zì) (character) for definitions (even if a word is just one character). A Unihan dictionary (the well-known CJKV dictionary) contains not just technical info (stroke orders, radicals) but generic definitions applying to all languages that use(d) Han characters. Removing the definitions from translingual sections would require copying all definitions to all language sections (without parts of speech). There are minor differences in definitions depending on the publisher but generally definitions of hanzi (hon3 zi6, etc.)/kanji/hanja/Hán tự just duplicate each other. --Anatoli(обсудить/вклад) 22:21, 30 October 2012 (UTC)

I also object to this proposal. Putting linguistic arguments aside, the reality is, the vast majority of Chinese characters do not have separate Mandarin entries. Removing thousands upon thousands of useful translingual definitions is going to do nothing to help the average user who just wants to know what a certain character means. ---> Tooironic (talk) 06:34, 31 October 2012 (UTC)

Yes, Mandarin lacks definitions as words or components. The actual character definitions are the same for each CJKV language, including all Chinese dialects. To fix the Category:Mandarin definitions needed, definitions would need to be copied (not moved) into Mandarin under ===Hanzi=== (like Japanese ===Kanji===) header and make that sufficient to be removed from Category:Mandarin definitions needed - it seems ===Hanzi=== is currently insufficient for entries to be removed from the category. Like I said, character definitions under Translingual are language-neutral, all CJKV languages define character 食 as "to eat" (or "food") (吃, 食べる, 먹다, ăn) even if their corresponding readings are shí, 식 (sik), しょく/た.べる (shoku/ta-beru), thực. Copying into other languages and dialects would create unnecessary duplications. There's no quick solution, IMO, editors should be encouraged to work on single character definitions for Mandarin, which can't be done by a bot. --Anatoli(обсудить/вклад) 21:55, 31 October 2012 (UTC)

Just a point of clarification: character definitions differ according to language. An example is 安, which has the meaning in Japanese (and only Japanese AFAIK) of "inexpensive." The translingual definitions should be used for common definitions, and language-specific definitions used when needed. --BB12 (talk) 22:05, 31 October 2012 (UTC)

Yes, that's correct but the generic character info would still apply in case the Japanese entry lacked the specifically Japanese meanings. In other words, the sense "peaceful, tranquil, quiet" (original Chinese and common CJKV) applies to the Japanese Kanji + "cheap" (specifically Japanese). Mglovesfun suggests fixing Mandarin entries, so Japanese, Korean and Vietnamese are not affected. --Anatoli(обсудить/вклад) 22:20, 31 October 2012 (UTC)
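The arrangement described in the last two comments (common definitions kept once, language-specific senses added on top) can be sketched in Python as follows. This is only an illustration under stated assumptions: `TRANSLINGUAL`, `LANGUAGE_SPECIFIC` and `definitions` are hypothetical names, the data is limited to the 安 example given above, and nothing here reflects how entries are actually stored.

```python
# Common (translingual) definitions, stored once per character.
TRANSLINGUAL = {
    "安": ["peaceful, tranquil, quiet"],
}

# Senses that exist only in one language, keyed by (character, language code).
LANGUAGE_SPECIFIC = {
    ("安", "ja"): ["inexpensive"],
}

def definitions(char, lang):
    """Return the common definitions followed by any language-specific ones."""
    common = TRANSLINGUAL.get(char, [])
    extra = LANGUAGE_SPECIFIC.get((char, lang), [])
    return common + extra

print(definitions("安", "ja"))   # common sense plus the Japanese-only sense
print(definitions("安", "cmn"))  # common sense only
```

Under this layout the shared sense is written once and each language section only records what differs, which avoids the duplication that copying definitions into every language would create.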

Well I'm not proposing to not fix other languages. If a word/character means the same thing in six languages, can't we just say so? Surely nobody would say that we shouldn't have Mandarin definitions in a Mandarin section (and so on for all languages). Mglovesfun (talk) 22:26, 31 October 2012 (UTC)

If you want to start with Mandarin, ===Hanzi=== should have at least the translingual definitions without parts of speech (copied, not moved!), I'm sure the original was created from a Chinese character dictionary, anyway. As I said, it won't fix the Category:Mandarin definitions needed issue you mentioned but it's a good start. The word definitions with parts of speech should be done by a person who understands Chinese and the differences between 詞(cí)/词(cí) and 字(zì). E.g., it would be wrong to create a noun section with the definition "coffee" for "zì" 咖 or 啡, the "cí" for "coffee" is 咖啡, 咖 and 啡 are only rare phonetic components used to make some words (not sure what heading they should have) apart from ===Hanzi===. --Anatoli(обсудить/вклад) 22:40, 31 October 2012 (UTC)