November 2010

What does it mean to say, in an etymology section that Alemannic German Chue comes from a "Germanic source" but that Mànn comes from "Proto-Germanic"? Didn't Chue derive from Proto-Germanic? Didn't Mànn derive from some Germanic source later than Proto-Germanic? (Heck, even Proto-Germanic itself is Germanic, I suppose, but no matter.) Shouldn't we merge these two categories, and all similar pairs? (Note, some words, like camouflage, appear in both categories!)​—msh210℠ (talk) 19:20, 15 November 2010 (UTC)

I believe the first category is intended for words borrowed from an unknown Germanic language, whereas the second is used when we know the derivation occurred from Proto-Germanic specifically. —CodeCat 19:26, 15 November 2010 (UTC)

Not if the word can be shown to have been derived from a source later than Proto-Germanic. It's also possible that the source of derivation is uncertain and may or may not have been Proto-Germanic. —CodeCat 19:33, 15 November 2010 (UTC)

But doesn't everything that derives from — for example — English derive from Proto-Germanic? And doesn't everything that derives from Proto-Germanic derive from a (possibly unknown) Germanic language?​—msh210℠ (talk) 19:45, 15 November 2010 (UTC)

Yes, but the distinction should be made clear. We shouldn't list the term as being derived from Proto-Germanic if it was really derived from Old Saxon but we can't be sure. —CodeCat 20:24, 15 November 2010 (UTC)

Ah, I see what you mean now. If a Hebrew word derives from Latin via French via English, then it derives from a Germanic language but not from Proto-Germanic. Right, good point. But surely nothing should be in both categories: one implication still holds: if it derives from Proto-Germanic then it derives from a Germanic language, right?​—msh210℠ (talk) 20:42, 15 November 2010 (UTC)

Yes. One is a subset of the other, so the category structure should reflect that. Also, even a native Germanic word might not have been borrowed from Proto-Germanic directly. For example if a language were to borrow hand in that form, then it could have come from any number of Germanic sources. But Proto-Germanic itself can be ruled out because its own term is *handuz. This method is used in for example Finnish etymology to distinguish Proto-Germanic from later Germanic (usually Old Norse/Swedish) borrowing. —CodeCat 20:54, 15 November 2010 (UTC)

My BP discussions

Well, Dan basically said above (off-topic into another discussion) that I create too many discussions here: one per day in November, on average. My personal POV is that if I want to create 100 or 1000 or any other number of relevant BP sections, they should be created. However, I suppose it's worth checking this opinion with more people. Do others agree with him? From the discussed subjects, questions, etc., should I refrain from creating discussions? Thanks. --Daniel. 04:46, 16 November 2010 (UTC)

I think all of the discussions started by you recently have made sense to start except for this one. --Yair rand (talk) 05:04, 16 November 2010 (UTC)

Oh, I was just testing how unfriendly the situation was. :) I never had the intention of fulfilling Dan's request anyway. --Daniel. 05:18, 17 November 2010 (UTC)

I think it's worth considering whether a given discussion is worthwhile, but if you think a discussion is worthwhile, I don't think you should be deterred by having started one the previous day. (There is some frequency past which too many discussions is a problem, but I don't think you've hit it.) —RuakhTALK 19:04, 17 November 2010 (UTC)

1 = "5 minutes", on a clock

By extension, we would have similar definitions for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and 12 as well. --Daniel. 20:45, 23 November 2010 (UTC)

I've never seen that definition of "1" used in Swedish literature, so it isn't translingual. Add it to English, if you find citations. --LA2 20:49, 23 November 2010 (UTC)

Are Swedish analog clocks written with Roman numerals? Anyway, "Translingual" does not mean "panlingual", so I don't think the reason that you presented alone is enough to demote the status of this Translingual definition. --Daniel. 21:41, 23 November 2010 (UTC)

I don't think that that's a meaning of 1. (You don't, for example, see 1 on a digital clock or in running text; while that's not conclusive, it's indicative.) Rather, 1 refers only to the number (one) of hours after twelve, and it happens to be that a hand points to it at five minutes/seconds after the hour/minute: a mechanical rather than a lexical fact.​—msh210℠ (talk) 21:49, 23 November 2010 (UTC)

A very creative and novel concept of linguistic meaning. Where is there evidence of use in any language? DCDuringTALK 00:05, 24 November 2010 (UTC)

To continue msh210's above point: 1 doesn't indicate five minutes or five seconds on a 24-hour analog dial/clock (nor would it on a Thai six-hour analog dial). — Beobach 00:25, 24 November 2010 (UTC)

On a whim, I went looking for all the ways 13:00 is represented in text:

On Thursday we left at 1 and I went to the Waldorf to lunch and stayed on until the dance tea — I only daced once — a fox trot — I don't feel a bit like dancing darlint — I think I must be waiting for you. We left the Waldorf at 6.20 and met Alvis at 6.30 and went with her to buy a costume — getting home about 9.

(It was actually quite difficult to find examples of "1", in that form, with no "am" or ":00" after it.) Now, is any variation of 1 or one ever used in text to mean five minutes or five seconds (past), the way that all of the above variations mean 13:00? If not, it doesn't meet CFI. — Beobach 00:56, 24 November 2010 (UTC)

I would look for books about making analog clocks, or books about teaching children how to read analog clocks. By the way, my suggested definition can be tweaked to avoid conflict with the 24-hour and 6-hour analog clocks that you mentioned. --Daniel. 03:45, 25 November 2010 (UTC)

Try searching Google books for the phrase "when the little hand is on the". You get instances such as “When the big hand is on the twelve, and the little hand is on the one, your lunch hour is officially over.” SemperBlotto 08:29, 25 November 2010 (UTC)

Ah, but that's just a description of where the hand is; "one" in such a phrase doesn't mean "five minutes". — Beobach 22:02, 27 November 2010 (UTC)

Distinguishing affixes by part of speech

As far as I know, the only categorisation we have for affixes is '(lang) suffixes', '(lang) prefixes' and such. However, in most languages, affixes characteristically derive words with just one particular part of speech. For example, -less always creates adjectives, while -ness creates nouns. I think this distinction is important enough that we should have separate categories for them, such as Category:English noun suffixes. This category would then be made a subcategory of Category:English nouns as well. Thoughts? —CodeCat 18:53, 17 November 2010 (UTC)

'Productive' doesn't quite work for languages that aren't spoken anymore, though. Nor does it work for affixes that were productive before but no longer are. —CodeCat 19:17, 17 November 2010 (UTC)

True. The reason I added it is that "English suffixes forming nouns" is something that people are going to add any suffix to which forms even one noun, which you seem not to want to include (and I agree).​—msh210℠ (talk) 19:39, 17 November 2010 (UTC)

Support. There are some things it might be nice to clarify — for example, is non- an "English adjective prefix", or do we only categorize what you might call "adjectivizing" prefixes? — but I support this regardless of which way it gets clarified. —RuakhTALK 19:42, 17 November 2010 (UTC)

[e/c] Yeah, I should clarify what I wrote above: I support it with the name "English noun suffixes" too. But I preferred a different name. Not sure I still do, now, though, in light of the discussion above. Well, "suffixes forming nouns" is still better than "noun suffixes" (which can mean "suffixes attached to nouns") IMO.​—msh210℠ (talk) 19:47, 17 November 2010 (UTC)

non- does not turn the affixed word into an adjective as far as I can tell. I think we would keep this one in the old category. I would say the new categories are only for cases where all derived words, regardless of origin, will always have the same part of speech. —CodeCat 19:46, 17 November 2010 (UTC)

This seems promising. Is this intended to include only affixes that have been productive in the language of the applicable L2 section? What about morphemes that were prefixes in an ancestral language and respelled in the entry's language? I'm not trying to create roadblocks, just seeking clarification for this proposal and related matters.

Wouldn't it be better to have the category structure include source and destination PoS? DCDuringTALK 22:30, 17 November 2010 (UTC)

As far as I'm concerned, if it's in Category:English prefixes or similar, it's eligible. And as for including the destination PoS too: the problem with that is that it creates a lot more categories if you have one for each source-and-destination PoS combination. —CodeCat 23:33, 17 November 2010 (UTC)

What are the resource costs of having categories? Of using them? From the length of the delays I experience, it seems that the absolute size of the category matters, but that's not much to go on. DCDuringTALK 23:41, 17 November 2010 (UTC)

It's not so much the resources it takes from the server as much as those it takes from us. We have to manage the category structure, and if we have a lot of small categories in a tightly-arranged structure we could easily get lost. I know I would! —CodeCat 23:54, 17 November 2010 (UTC)

I like the capacity to actually have lexical categories that no other dictionary has and apparently no on-line resource, possibly that exist nowhere else. The structure for this set of categories would be easy to maintain because it would be quite transparent. DCDuringTALK 00:07, 18 November 2010 (UTC)

Even when the suffixes are used exclusively to form words in a given lexical category, there's no guarantee that those words will stay in that category. Take the claim above that -ness always produces nouns. Clearly, witness has become a verb. Nouning and verbing may be more or less common in other languages, but are ubiquitous in English. Another problem I see is in positioning these affixes as a special type of noun. Nouns are words. Affixes are not. --Brett 00:29, 18 November 2010 (UTC)

As for the first point, the fact that witness became verbed does not cancel the fact that witness is still a noun and that -ness is a nominal suffix... e.g., I doubt that likeness will ever become verbed... —AugPi(t) 00:55, 18 November 2010 (UTC)

That certainly likenesses the use of many other nouns and adjectives that have since become verbs. The fact that English infinitives are morphologically unmarked might have something to do with it.

That aside, my primary motivation for this proposal was the realisation that a POS header and category of 'suffix' is too vague. In many languages, suffixes behave like the part of speech they 'generate'. They have declension or conjugation, inflected forms, and so on. For example, if the Dutch noun-forming suffix -heid is added to an adjective, the newly formed word would be expected to have a plural (-heden) and diminutive (-heidje) automatically. The suffix -heid itself behaves like it were a noun, the only difference is that it can't be used on its own. —CodeCat 13:19, 18 November 2010 (UTC)

To me, that sounds more like a reason for an Appendix than for a category. --EncycloPetey 20:15, 19 November 2010 (UTC)

Re: "However, in most languages, affixes characteristically derive words with just one particular part of speech." Do we have evidence for this claim, or are we having this whole conversation on the assumption that most world languages behave like English? I certainly know of no reason to think that pre- or micro- form words of just one part of speech, so I assume what's really being discussed are suffixes, not affixes as a whole.

The claim isn't as true for modern English suffixes as it once was, since nouns in "-ship" are often now used as verbs (We fellowshipped together.). In a number of languages (including English) participles can function as adjectives or nouns, in addition to being verb forms. For Classical Latin, the distinction between adjective and noun is not as strong as might be expected, since nearly any adjective could be used as a substantive. The major analytical grammars of Latin typically don't sort suffixes by the part of speech formed, but by whether they are primary word formers (from a root "verb") or form words secondarily (added to an existing word to create a new word). This grouping is potentially more useful. --EncycloPetey 19:49, 19 November 2010 (UTC)

Re: evidence vs. assumption: I disagree. If there is not generally a very strong correlation between affixes and the POS of words formed using that affix, then the proposed category system would be effectively useless. I think an Appendix is a better idea. --EncycloPetey 03:06, 20 November 2010 (UTC)

I don't think there needs to be a particularly strong correlation. As long as there's a decent number of POS-ifying affixes, they can usefully populate those categories, even if they're totally dwarfed by non–POS-ifying ones. (BTW: This hasn't been made explicit, but I'm taking for granted that the members of Category:English noun suffixes would also be members of Category:English suffixes. That is, I'm assuming we're only talking about augmenting the category structure by adding some categories, not about changing anything that's already there.) —RuakhTALK 03:37, 20 November 2010 (UTC)

If a significant fraction of affixes end up in multiple categories, then such a category system isn't useful. Consider that -ly can form adverbs or adjectives, depending on the part of speech it's attached to (man + -ly yields an adjective from a noun, but open + -ly yields an adverb from a preposition/adjective). --EncycloPetey 03:45, 20 November 2010 (UTC)

This point could be addressed by having the category be more specific, including both origin and destination PoS in the lowest-level category names. Do we want to restrict membership in such categories to those affixes that have attestably formed multiple (three is our magic number) words of each given PoS, at least for well documented modern languages? DCDuringTALK 11:49, 20 November 2010 (UTC)

I think a useful, if somewhat subjective, criterion could be to look at how systematic a certain part of speech is for that suffix. -ness only very incidentally forms verbs, and those are always verbs based on an existing noun. The 'intuition' of the vast majority of English speakers will treat a word suffixed with -ness as a noun. For -ly it is more complicated, but there is still a system: when suffixed to adjectives it systematically forms adverbs of manner, while when suffixed to nouns it generally forms adjectives. For example the (nonce) word cloudily would be assumed to be an adverb because cloudy is an adjective. But cloudly would normally be an adjective because cloud is a noun. —CodeCat 14:45, 24 November 2010 (UTC)

Linking between pico and picco

Should pico and picco link to each other, by their topmost "See also:"? By extension, is there the guideline of linking between entries that differ only in the quantity of their characters? I recall seeing this apparent practice in action, rarely. --Daniel. 02:02, 9 November 2010 (UTC)

Do we so link? I've seen people say so; template:also's documentation doesn't indicate as much. Should we? IMO no: they're unlikely to be deliberately mistyped for one another. (By "deliberately mistyped" I mean typed incorrectly, but where the typist knew what he was typing, as one might type carre for carré if he has a U.S. keyboard, or bang for bằng, as opposed to real mistypings like pico for picco or picco for pico or tribuklaiton for tribulation or fatsisiouds for fastidious, which don't deserve "also" links IMO.)​—msh210℠ (talk) 16:32, 9 November 2010 (UTC)

(Or, to pick bluelinked examples of real mistypings that should not IMO link to one another, earth and warty.​—msh210℠ (talk) 17:23, 10 November 2010 (UTC))

No, they should not link. There's no end to it if we start duplicating every letter. --Makaokalani 17:26, 10 November 2010 (UTC)

Not to be confused with

I wanted to place reciprocal links on gambit and gamut as words which people might easily confuse, or they may be remembering one but looking for the other. Do we have any policy on adding such "useful" links? __meco 15:13, 3 November 2010 (UTC)

We have {{resembles}}, although I have no idea where to use it. I've seen it at the top of a page like {{also}} and it doesn't work so well. Mglovesfun (talk) 23:02, 4 November 2010 (UTC)

I'd think top-of-the-page should be for words that look alike, whereas words that sound alike — which is highly language-dependent — should be sub a ==Language==. Perhaps use {{resembles}} sub ===See also===? (Iff they're real homophones in some dialect, you can list them as such instead.)​—msh210℠ (talk) 23:06, 4 November 2010 (UTC)

I usually put this information in a "Usage notes" section and/or list the potentially confused word in "See also" (as I've done at Latin servō). However, this should be done with words that are confused, not with words we think might be confused. I doubt that gambit and gamut are used often enough to be confused. --EncycloPetey 02:01, 5 November 2010 (UTC)

Those are numbers, not proof of error. I took a look at the context of "whole gambit", and the majority I came across were using the word correctly in the sense of a chance taken. We need evidence that the term is being widely used as an error, not evidence that a particular collocation exists. --EncycloPetey 19:40, 6 November 2010 (UTC)

Inflection Templates

I'm doing some technical research using Wiktionary data and I've noticed a couple of things that are confusing and not well documented. In the English site most words have inflection tags that identify the word and how it is inflected (i.e. the en-noun tag). The guidelines also document how these tags are formatted as well as how to tag in many languages. I've been reading up on the German inflection tags and trying to understand how they work. However, when I started doing some searches on the German site (de.wiktionary.org) I noticed that none of the entries I tried contained the tags. They instead contained an inflection table that contained the information. Then I went back to the English site and found that many words from German existed there and were using the tags. But, the inflections from the English site using the tags and the German site didn't match up, specifically for datives.

My questions are: why are all Wiktionary sites using these standard tags? why are there German words in the English site? Why are the German words in the English site not fully inflected? Is there are standard format for the inflection tables on the German site? How are inflections handled for other languages? —This unsigned comment was added by Voidmain (talk • contribs) at November 3 2010.

I find this difficult to comprehend. What tags are these? Context labels? Wiktionary:About German may (should) provide some answers. I think part of the answer may lie in the fact that English doesn't inflect much, i.e. singular and plural only, where German has a case system, each case with a singular and plural. So, to avoid putting all those on one line, we use a collapsible table. If you can clarify your questions a bit more, I'll happily answer. Mglovesfun (talk) 23:47, 3 November 2010 (UTC)

By "tags" (s)he apparently means "inflection templates". (I'm guessing this is "tag" along the lines of "HTML tag", rather than along the lines of "annotation".) Note that this section is headed "Inflection Templates", and the first example "inflection tag" was "the en-noun tag". —RuakhTALK 00:22, 4 November 2010 (UTC)

To answer your questions:

Re "why are all Wiktionary sites using these standard tags?": The templates, what you call "tags", make it easier to standardize the appearance, formatting, and the choice of information items to be present at a given location. Some templates even provide semi-automatic inflection and declension. So {{en-noun}} automatically shows plural ending in "s" unless told otherwise.

Re "why are there German words in the English site?": English Wiktionary is a multilingual dictionary; it documents all languages, and uses English as the language of the documentation, or as the meta-language. Thus, a German entry in the English Wiktionary carries a heading "Synonyms" rather than "Synonyme". A usage note on a German term is written in English. The word "Katze" is located in a category whose name contains the term "animals" rather than "Tiere".

Re "Why are the German words in the English site not fully inflected?": Because no one has done the work yet. Some German words in the English Wiktionary are already fully inflected. An example of a word that is fully inflected (or rather fully at least) is machen, in its section "Conjugation"; don't forget to unfold the collapsible inflection tables.

Re "Is there are standard format for the inflection tables on the German site?" Most probably; you have to look at de.wiktionary.org. German Wiktionary is a project rather independent of English Wiktionary, with different policy, templates and common practices.

Re "How are inflections handled for other languages?": Just look around and see for yourself. In the English Wiktionary, a smaller part of the inflection information is in inflection lines just above definitions, while the larger part is in "Declension" and "Conjugation" sections. However, a great many entries do not have a "Declension" or "Conjugation" section yet; no one has added them yet.

Each of the questions could be given a much longer answer, but that would really be too much work for me to do. --Dan Polansky 08:37, 4 November 2010 (UTC)
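As a rough illustration of the "plural ending in 's' unless told otherwise" behaviour mentioned above for {{en-noun}}, here is a minimal Python sketch. It is not the template's actual code: the function name `default_plural` and its `plural` parameter are hypothetical, chosen only to show the idea of a default with an optional override.

```python
from typing import Optional


def default_plural(headword: str, plural: Optional[str] = None) -> str:
    """Return the plural form to display for a noun headword.

    Hypothetical sketch: if the editor supplies an explicit plural,
    use it; otherwise fall back to the headword plus "s".
    """
    if plural is not None:
        return plural
    return headword + "s"


# Default case: no plural supplied, so "s" is appended.
print(default_plural("cat"))
# Override case: the editor states the irregular plural explicitly.
print(default_plural("child", plural="children"))
```

The point of such a default is that most entries need no extra input, while irregular nouns are handled by stating the form explicitly.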

"Atypically long, this word is sometimes employed to imply that the user has an above-average intellect." Is it? Examples? Equinox◑ 20:45, 3 November 2010 (UTC)

Should be at the Tea room? Mglovesfun (talk) 23:50, 3 November 2010 (UTC)

Projects

These two pages are the result of a recent epiphany. I believe they are self-explanatory, as "projects" listing related tasks, policies and goals. I propose using them, and creating new projects when necessary.

Fourthed. Basically, you can't tell people what they should be working on. Even if they are wasting their time on appendices that nobody is ever going to look at. SemperBlotto 11:53, 5 November 2010 (UTC)

I like the idea of "Project:Citations and quotations", but I concur that there aren't enough users to make a dedicated project page worthwhile. Equinox◑ 11:57, 5 November 2010 (UTC)

"The normal standard for modern languages is three independent attestations. However, Ancient Greek, as a dead language, requires only one attestation."

Good idea? I'd say yes. This is what Prince Kassad was talking about during the WT:RFV#wardon debate. Mglovesfun (talk) 15:13, 4 November 2010 (UTC)

Yes, good idea. I have made Wiktionary entries for several hapax legomena in Old Armenian, a dead language. --Vahag 15:28, 4 November 2010 (UTC)

Though it should be probably hard wired into CFI. It seems a little weird that a language-specific page can override the criteria for inclusion. -- Prince Kassad 15:55, 4 November 2010 (UTC)

Should perhaps be in CFI that it's three citations, unless overridden by an 'about' page. Middle French isn't too poorly attested, so I'd generally expect at least two citations, not just one. But for much older languages, one seems ok. Unless editors can't agree on the definition from a single citation. That's the problem. Mglovesfun (talk) 18:53, 4 November 2010 (UTC)

In the case where the definition is not universally agreed upon, it should be okay to write {{non-gloss definition|A word with unknown meaning}}, or something. -- Prince Kassad 19:44, 4 November 2010 (UTC)

I don't think that the CFI should be overridden by an 'about' page, because the 'about' pages are typically (AFAICT) not paid attention to at all except by the language's editors.​—msh210℠ (talk) 19:55, 4 November 2010 (UTC)

Also, "about" pages aren't typically policy: anyone can edit them. What happens when Wiktionary:About Esperanto explains that a word doesn't ever have to have been used, as long as it's officially sanctioned by a founding dictionary? (Actually, Esperanto is an easy target, but lots of languages have academies or other official language authorities that people half-listen to. Ha'Akadémya LaLashón Ha'Ivrít has coined lots of Hebrew words that people have adopted, and lots of words that people haven't. Our colleagues at he.wikt pay great heed to the Akadémya, but we shouldn't.) —RuakhTALK 21:32, 4 November 2010 (UTC)

Well, this would be a good justification for inclusion: explaining that a word has been recommended or coined by some academy or official language authorities, but probably never (or almost never) used. Lmaltier 21:48, 4 November 2010 (UTC)

This amendment for extinct languages was proposed in Wiktionary:Votes/pl-2007-12/Attestation criteria (Terms in extinct languages require only a single citation of use, though reconstructed terms are inherently unattestable and belong in an appendix.) but unfortunately it was not yet incorporated into CFI. Also what should be covered is the lemmatization: for well-attested ancient languages with reasonably consistent orthography, some arbitrary lemma form that can be trivially reconstructed should be used even if it's not directly attested. For poorly or relatively poorly attested ancient languages (i.e. the majority of them) or those using inconsistent orthography (e.g. pretty much all cuneiform writings), only actually attested forms should be added. --Ivan Štambuk 11:41, 5 November 2010 (UTC)

slow script warning when accessing any page

Every time I load a page on Wiktionary, I get a dialog box saying that a script is causing the page to run slowly. Choosing not to abort the script will cause my browser to freeze. I usually use IE8, but the problem disappeared when I switched to Firefox. Is anyone else experiencing this? --Ixfd64 23:43, 5 November 2010 (UTC)

Note that this was actually a recent change done by User:Mglovesfun. I'm not sure whether that was a really good idea, it seems to only overcomplicate things and we certainly don't need to become even more complex (we're already the hardest wiki to get into) -- Prince Kassad 19:34, 6 November 2010 (UTC)

Um, what was a recent change? --EncycloPetey 19:36, 6 November 2010 (UTC)

The etymological category structure is intended to show primarily the passing of words from one specific language to another specific language. The fact that there are few or no items currently in a category is no reason to change that, as the category structure (1) anticipates continued growth of Wiktionary, and (2) is parallel across all the hundreds of languages that exist. It would be maddening to try to have that structure different for each language with no commonality. --EncycloPetey 19:36, 6 November 2010 (UTC)

Yes, I did realize this was (or could be) a consequence of this sort of etymological categorization. The idea is not to have everything pouring directly into Category:Etymology. However for languages with smaller etymological category trees, such as Vietnamese, it has the effect of dividing up a category that isn't very big to start with. Mglovesfun (talk) 21:40, 6 November 2010 (UTC)

The idea is not to have everything pouring directly into Category:Etymology. - why? Do we have a situation where any given language (including English) derives words from more than 200 languages? -- Prince Kassad 21:45, 6 November 2010 (UTC)

Category:Etymology: "This category has the following 194 subcategories, out of 590 total." I'm not sure that 200 (the usual limit for all categories to be displayed on one page) is such a good marker. Mglovesfun (talk) 01:37, 7 November 2010 (UTC)

I believe that small categories like vi:Russian derivations can be combined with those above, as I see no apparent usefulness in creating new cats for "future" entries—which can be put in the top cat, and then, when it grows too large, divided up later. And I think anyone who looks up what Russian means (and after that follows the link to the Wikipedia article w:Russian) can easily figure out that it is part of the Slavic language family.

Still, if consensus is against deleting these categories, I propose that some mention of this discussion or its results be noted at WT:Categorization for future reference. TeleComNasSprVen 04:16, 7 November 2010 (UTC)

To clarify a bit, I don't oppose these categories being deleted per a community consensus (if and when there is one). Furthermore, we're missing the point a bit - this is generally how {{topic cat}} works, it can provide very deep category trees. While this works pretty well for English, the non-English languages use the same structure with way less content to fill it. So it's a topic cat issue. Mglovesfun (talk) 12:12, 7 November 2010 (UTC)

{{topic cat}} does not force the creation of deep category structures; it only enforces that the category tree is the same for all languages. Each subtree of the large category tree can be designed to be either deep or shallow.

You're right, it doesn't 'force' depth, but that is currently how we use it, which is what TeleComNasSprVen is complaining about. Does anyone feel that any non-topical categories are too subdivided? Mglovesfun (talk) 11:05, 8 November 2010 (UTC)

I'm not convinced the new system is any worse - nor any better, mind you. I think classifying by language family is hardly 'off-topic'. Mglovesfun (talk) 15:10, 8 November 2010 (UTC)

Blocking of CentralNotice banners

Hi all,

I'd like to alert you to a bug currently on the English Wiktionary main page. This particular change is blocking sitenotices (which includes our 2010 Fundraising banner) from displaying on the main page. While I'm not sure what brought about those edits, the 2010 fundraiser will be kicking off next week, and we really do need the banners to be displayed on what is arguably our second most successful project. I'd like to ask the community to undo these changes, at least for the duration of the fundraiser; and if they need to be reviewed for reinsertion, that can be done once the fundraiser has ended. Regards,

As you point out, it's not a bug, rather an edit made by (I think) popular demand. However my personal view is when I'm getting something for free, I'm prepared to watch some advertizing to get it for free. Mglovesfun (talk) 01:22, 7 November 2010 (UTC)

Sitenotices really mess up the look of the main page, but I suppose the fundraiser takes priority. Anyone know how to make it only hide local sitenotices (if that's possible)? --Yair rand (talk) 01:32, 7 November 2010 (UTC)

It looks like using #localNotice rather than #siteNotice might do the trick. —RuakhTALK 16:43, 7 November 2010 (UTC)

Yeah, I know it's not a bug, we just logged it that way on our list :). Unfortunately, the fundraiser does need to take priority in the meantime. Can someone post here once it's been fixed, so I can let the tech team know? Thanks. Drosenthal 20:12, 7 November 2010 (UTC)

Correction: I tried switching from #siteNotice to #localNotice, and it didn't work: the contents of the site-notice were hidden on the main page, but it still took up the vertical space, and the bracketed "dismiss" link was still there. I think that's actually worse. :-P So, I think we just have to suck it up and remove that bit of CSS until the fundraiser is over. Does anyone disagree, or can I go ahead with that? (Note that it can take a while for CSS changes to propagate to readers — there's a lot of caching, both server-side and client-side — so whatever change we decide to make, we should make it ASAP.) —RuakhTALK 21:26, 7 November 2010 (UTC)

Would #mw-dismissable-notice work? (There isn't a centralnotice up now, and I've never bothered to check whether it's placed in the #mw-dismissable-notice or not...) --Yair rand (talk) 21:42, 7 November 2010 (UTC)

I don't know — but I just discovered that by "kicking off next week", Drosenthal meant that it starts tomorrow (Monday). I'm not sure exactly what time tomorrow, but regardless, I've disabled the CSS. There was no time to waste. :-P Once the fundraiser starts, we'll be able to see that sort of thing, and test in our own user-CSS pages. —RuakhTALK 22:56, 7 November 2010 (UTC)

Thanks, folks, from the fundraising team. We actually launch this Friday, not today. Philippe (WMF) 18:15, 8 November 2010 (UTC)

Heh, yeah was just about to correct that. We were originally scheduled for today, but we moved it back. Thanks guys for being so quick to get on this, it could have seriously interfered with the fundraiser and that's not good for anyone. Drosenthal 18:22, 8 November 2010 (UTC)

To do this, there are still a lot of unanswered questions left in that discussion. I'm not necessarily against the idea, but we need a specific, demonstrable goal (modelled) for whatever changes we're going to make. --EncycloPetey 19:58, 9 November 2010 (UTC)

I agree. It seems a bit premature to deprecate three widely-used templates without knowing what all instances of them should be replaced with. —RuakhTALK 20:24, 9 November 2010 (UTC)

I agree. The discussion seems to have run out of gas before questions were addressed. Whether we need a vote I don't know, but we need some examples of how hard cases will be handled. This is also an opportunity to consider some questions, not strictly part of the matter narrowly defined, that have not been addressed. Some questions I have are:

When is a separate pronunciation (distinct from what is implicit in initialism or acronym) allowed or encouraged?

When is a separate etymology (distinct from what is implicit in initialism or acronym) allowed or encouraged?

Are proper nouns that are abbreviations of proper nouns more Translingual than the original proper noun? Eg, IBM seems quite translingual to me.

Similarly, what about Latin-derived abbreviations like ibid, et al, etc?

Not all need to be answered, but perhaps the work to implement this reform could also correct other problems with these entries. DCDuringTALK 20:28, 9 November 2010 (UTC)

I support the phase-out of the majority of the uses of the templates in headers.

Since acronym can be ambiguous and initialism is not a commonly known word, I think pronunciations for both would always be encouraged. It would be nice to have a special template for initialism-style pronunciations to standardize display. Would we link to the letter name entries or provide phonetic transcripts directly? As there are multiple pronunciations for several letters, the former approach may be cleaner.

What difficult cases are you thinking of?

I don't think there's anything different about acronyms/initialisms in terms of Translingualness. Not that we have very good criteria for this, but I think we can punt.

Some examples that would help me understand are the revised versions of AO, ASF, ATM, and AU. I think that the English sections will occupy about 3 times the vertical screen space of the current entry. That is not a fatal flaw, but it is a drawback. DCDuringTALK 23:46, 21 November 2010 (UTC)

Sure, thanks for mentioning it. I'll remember to cover this option in the vote. --Daniel. 11:59, 9 November 2010 (UTC)

PK, you're suggesting the category-containing categories should be named "Foo language categories", and the entry-containing categories should be named "Foo languages" — but "Sign languages" and "Constructed languages" should be exceptions, being category-containing categories? Why?​—msh210℠ (talk) 16:16, 9 November 2010 (UTC)

I can't speak for Kassad, but I don't see clear evidence to assume that he approves the simultaneous existence of both "Foo language categories" and "Constructed languages". In the discussion WT:RFM#Category:All sign languages and similar categories, Kassad said "I agree with Yair Rand. The topical categories are a bit overspecific and should probably be just deleted. -- Prince Kassad 13:10, 19 October 2010 (UTC)", which I personally interpret as him possibly wanting "Constructed languages", "Sign languages", "West Germanic languages", to contain categories, not entries. --Daniel. 17:01, 10 November 2010 (UTC)

Perhaps an option should be to have the entry-containing categories collapsed into one (Languages), and the category-containing categories named as you (Daniel) suggest ("Foo language categories").​—msh210℠ (talk) 16:16, 9 November 2010 (UTC)

"inflection-line" sounds like something native German speakers would use when trying to write this English word. I think it should be "inflection line". -- Prince Kassad 00:39, 9 November 2010 (UTC)

Not necessarily. In the name of the category, the term "inflection line" is used as a modifier of the term "templates". When used as modifiers, noun terms are often joined with dashes. So my understanding anyway; I hope any of the native speakers here corrects me if I am wrong. --Dan Polansky 08:37, 9 November 2010 (UTC)

I'd prefer it without the dash but that's just because it's easier to remember. —CodeCat 09:34, 9 November 2010 (UTC)

I prefer inflection-line (or without a hyphen) as the table/line distinction is a useful one for editors. Mglovesfun (talk) 09:36, 9 November 2010 (UTC)

"Category:Eye dialect" and "Category:English informal spellings"

I think so. "Eye dialect" is the written representation by an author of someone else's dialect speech in which words are spelled in a manner which indicates a non-standard pronunciation. In contrast, at least some of the examples are intentionally different or short spellings not put in someone else's mouth. DCDuringTALK 01:23, 9 November 2010 (UTC)

Citations for fiction

Currently, Appendix:Marvel Comics/mutant has some quotations, together with their respective definitions. At least once, in previous discussions, there was the suggestion of never placing quotations within appendices for fictional terms.

"See also" in Wikisaurus

The page Wikisaurus:person contains only one sense of "person": that is, "human being", "individual". That word naturally has multiple different definitions. It is reasonable, for example, that someone searches for a grammatical person (first-person, etc.) within that page. However, in that case, the proper Wikisaurus page would be Wikisaurus:grammatical person.

I disagree. In particular, I disagree with Wikisaurus having "see also" links that imitate {{also}}, as is seen in this revision of WS:person. If there should be a link from WS:person to WS:grammatical person (of which I am not convinced), it should be in the ordinary "See also" section, such as the one seen in WS:shoe, located at the end of a Wikisaurus entry. --Dan Polansky 08:30, 9 November 2010 (UTC)

I maintain my request for a "very visible link", so I disagree with the second proposal. If I couldn't precisely represent your proposal, please correct it. It is natural to link between Wikisaurus:shoe and Wikisaurus:clothing within the "Noun" section because there is an indirect relation between these two concepts; on the other hand, it is user-hostile to effectively hide the link between Wikisaurus:grammatical person and Wikisaurus:person, because it serves the purpose of disambiguation: if a person comes to the latter expecting to see grammatical persons, he or she should promptly have the chance of going to the correct page, without having to scroll down, effectively having to scan the whole page, until finding the correct link and recognizing it as such. --Daniel. 11:37, 9 November 2010 (UTC)

Attesting romanizations

I would expect this type of information be kept in a policy or a guideline somewhere, but I couldn't find it, so please feel free to point me to any page that enlightens this issue, if possible. Now to the questions.

Should our romanized entries (in Chinese pinyin, Japanese Hepburn, etc.) be attested like other entries, with three independent citations each?

For instance, strawberry in Japanese Hepburn is ichigo; if we can't find three citations for ichigo, should its respective entry be deleted?

In addition, if we do find three citations for a word from another romanization system, should we define it? For instance, if there are three independent citations for itigo, which is the same word in Kunrei romanization, would it merit an entry as well?

Our current practice seems to be to accept or reject an entire script, rather than individual examples of it. In the case of pinyin, not every Chinese word will necessarily have a pinyin spelling we can include, since pinyin depends on a known pronunciation in a topolect that pinyin supports, but in the case of Japanese, any word with an attested kana spelling will have a known Hepburn romanization. (I view this as being like forms of words: if a verb is regular, we don't require that each form be attested, or that any specific form (such as the lemma form) be attested, only that the verb as a whole be attested. Hepburn romanization is "regular" in the same way.) —RuakhTALK 03:28, 9 November 2010 (UTC)
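To illustrate the sense in which romanization is "regular" here, the following is a minimal sketch, not any tool used on Wiktionary. The two tables are a tiny hand-picked subset chosen for this example (a real mapping would also need digraphs, long vowels and the sokuon), and the function name `romanize` is hypothetical. It shows that once a kana spelling is known, both the Hepburn and the Kunrei forms follow mechanically.

```python
# Illustrative subset only: a real table would cover every kana,
# digraphs, long vowels and the sokuon.
HEPBURN = {"い": "i", "ち": "chi", "ご": "go", "し": "shi", "つ": "tsu", "ふ": "fu"}
KUNREI = {"い": "i", "ち": "ti", "ご": "go", "し": "si", "つ": "tu", "ふ": "hu"}


def romanize(kana: str, table: dict[str, str]) -> str:
    """Romanize kana one character at a time.

    Raises KeyError for any kana that is not in the (deliberately small) table.
    """
    return "".join(table[ch] for ch in kana)


print(romanize("いちご", HEPBURN))  # ichigo
print(romanize("いちご", KUNREI))   # itigo
```

The same kana input gives ichigo under one system and itigo under the other, which is why the question of which systems merit entries is separate from the question of attestation.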

"For instance, strawberry in Japanese Hepburn is ichigo; if we can't find three citations for ichigo, should its respective entry be deleted?" IMO yes. We've deleted plurals for hypothetically countable nouns before. This wouldn't mean the romanization should be removed from the entry, just that it should remain a red link. Mglovesfun (talk) 09:39, 9 November 2010 (UTC)

Good point, Martin. As a note of design, the inflection tables of multiple languages (if I remember correctly, some examples are Latin, Greek and Portuguese) display "black redlinks" (in other words, black text that is linked to nonexistent entries). This practice may be imitated in Japanese and Chinese, by leaving black text when the romanized entry does not exist. However, it should be noted that there are various romanization systems and they are not much used in Japanese, in comparison to katakana, hiragana and han, so there would be many nonexistent Japanese entries in Latin script. The hypothetical rule of "only keeping attested romanizations" is conceivably almost synonymous with the other hypothetical rule "do not define Japanese romanizations at all". --Daniel. 11:18, 9 November 2010 (UTC)

English conjugation tables

The English verb be displays a conjugation table that includes conditional subjunctives and future indicatives. I like this idea. (though the design could be improved and collapsible) Shouldn't other English verbs display conjugation tables as well? (Or, should "be" continue being a special case? I can fathom its high irregularity as a reason to have this unique treatment, but most conjugations in that table are not that irregular) --Daniel. 10:44, 9 November 2010 (UTC)

Most of those are made up. English doesn't have a future indicative per se, and though I don't really object to claiming that it does, we shouldn't claim that "I shall be", "I will be", and "you will be" are all future but that "you shall be" is not. It also doesn't have a future subjunctive, at all; and the "conditional subjunctive" is just the conditional (though I wonder how come that column has only "would" forms, not "should" forms). Also, "thou beest" is a bit anachronistic; when "thou" was in widespread use, its usual subjunctive was "be". —RuakhTALK 16:41, 9 November 2010 (UTC)

Agree with Ruakh about the fictional status of much of the table. It should be removed rather than spread.--Brett 13:11, 12 November 2010 (UTC)

Separators of numbers

I have a very simple request that can be easily fulfilled by anyone who knows the usage of Hindu-Arabic numerals in one or more languages.

I've extended this vote by 7 days as it hasn't been announced here. It's for JackBot (talk • contribs) not JackBot2, it means User:JackBot (2) as it previously failed for lack of votes, not actually for lack of support in terms of percentage of vote. Mglovesfun (talk) 13:16, 11 November 2010 (UTC)

I've seen dictionary notes before, but never realized we had a category for them. Apparently, even according to the person adding the notes, it's an 'experimental' project from 2005. I don't really see what purpose these serve; see Old Italian for an example. We have lots of words that other dictionaries don't have. Quite why we should put that in our entries is beyond me. Mglovesfun (talk) 19:18, 11 November 2010 (UTC)

As I commented once before, this is particularly useless when nobody ever mentions the edition of the dictionary in which the word appears. After all, it might be in a later or earlier edition of the same dictionary. Equinox◑ 21:10, 11 November 2010 (UTC)

It seems awfully tempting to me to delete the whole lot. I'd like to see an example that gives any genuinely useful information. Mglovesfun (talk) 12:49, 12 November 2010 (UTC)

Rikishi is even more bizarre, it lists two dictionaries which do have an entry for it. Mglovesfun (talk) 22:11, 13 November 2010 (UTC)

Delete.

Possible purposes:

scare-quotes: implying that a word is inferior (by indicating that other dictionaries don't include it).

bragging: implying that Wiktionary is superior (for including words that other dictionaries don't).

opposite-of-scare-quotes: implying that a word is valid (by indicating that other dictionaries do include it).

references-ish: implying that Wiktionary is reliable (for verifying our content against other dictionaries').

meta-information: hey, maybe some of our readers (or editors) are genuinely interested in what words other dictionaries do and don't include.

I don't know which of those reasons, if any, is the real reason. Maybe a combination. None of them seems very persuasive to me.

Personally, I really hate it when an "other dictionaries don't include this" note gets added to an entry I've put a lot of effort into citing. Maybe if I knew the real reason, I wouldn't hate it so much.

The only place I've seen this done that makes sense to me is for languages that have official governing bodies that maintain official dictionaries or word-lists. It can be useful to know that a particular Spanish word isn't in the RAE's dictionary, for example. For English entries, I can imagine only a few potentially useful situations, such as when a word is found only in dictionaries, or appeared in a dictionary but has since been demonstrated to be an error or a hoax. However, I'm not certain this warrants its own section header, as opposed to "Usage notes". --EncycloPetey 19:56, 19 November 2010 (UTC)

Delete. I don't see that a category is the right structure for a repository of this kind of information. I could see having a metainformation namespace that contained links to as many online dictionaries as possible, references to editions of print dictionaries, negative results about dictionary coverage, as well as various other kinds of metainformation about the entry, including certain kinds of maintenance information. Searching for the Dictionary notes and References headers would be a more reliable and inclusive approach to finding all of this kind of information than using this category, so it has little transitional value either. DCDuringTALK 22:46, 19 November 2010 (UTC)

Per EncycloPetey, ====Usage notes==== seems like a better header when this information is keepable in some way. Mglovesfun (talk) 22:54, 19 November 2010 (UTC)

I don't know if ====Dictionary notes==== can be made into something useful or not (and if just a list should probably be on the Citations page) but a category for these seems entirely useless. DAVilla 19:50, 4 December 2010 (UTC)

Hindu-Arabic

The coverage and maturity of Hindu-Arabic in Wiktionary are growing. Currently, we have, among other pages:

The categories with a script in their names display a standard text from a boilerplate. For instance, the text of Category:Cyrillic script is the result of {{scriptcatboiler|Cyrl}}. However, Hindu-Arabic cannot share automatic standard texts for their categories due to its lack of a code. I then propose creating a code for it too, which may be a non-four-letter code for the purpose of never conflicting with ISO 15924, like Har. (Other non-ISO script codes that we use are "None", "Latinx", "Xyzy", "polytonic" and "musical".) As a result, Category:Hindu-Arabic numerical script might have {{scriptcatboiler|Har}}.

Hindu-Arabic is not a particularly clear name even with the addition of numerical; leaving it out makes it absolutely confusing.--Prosfilaes 12:04, 12 November 2010 (UTC)

I don't think that "Hindu-Arabic numerical script" is valid, and I think that "Hindu-Arabic script" is even worse. "Hindu-Arabic" refers to a number system, not a specific script. In Arabic, for example, writers use the same Hindu-Arabic number system, but with these ten digits: ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩. I'm not sure if these categories and appendices are intended to cover the entire Hindu-Arabic number system, in which case they shouldn't have the word "script" and we shouldn't be modeling them on the actual script categories; or if they're only intended to cover the script of 0123456789, in which case they shouldn't have the term "Hindu-Arabic". Wikipedia calls 0123456789 "Western Arabic"; we should consult other sources and see if that term is standard. —RuakhTALK 13:01, 12 November 2010 (UTC)

I suppose Appendix:0123456789 would be an understandable name, though I'd disagree with it for other reasons.

From ISO 15924[1], that is our main resource of script codes, the distinction between "script" and "system" seems meaningless, especially since Braille and Mathematical notation are listed as scripts. In addition, we have one script code for musical notation.

Re: "the distinction between 'script' and 'system' seems meaningless": Tell me: do you consider ٠١٢٣٤٥٦٧٨٩ and 0123456789 to be in the same "script"? If you don't, then you are making that distinction, and these categories and appendices are ill-named, because there is no sense of "Hindu-Arabic" that applies only to the latter. If you do, then I have to ask why the categories and appendices seem to be carefully written in such a way as not to include the former. —RuakhTALK 16:15, 16 November 2010 (UTC)

I have attested "Hindu-Arabic" as 0123456789 through Citations:Hindu-Arabic numeration system and Citations:Hindu-Arabic numeral. I am also able to attest other combinations of words, including "Hindu-Arabic number" and "Hindu-Arabic system" as meaning 0123456789 too. From my research, I discovered that the symbols ٠١٢٣٤٥٦٧٨٩ are called "Arabic-Indic numerals", not "Hindu-Arabic numerals". Feel free to correct me by attesting them differently, if you would.

These citations pages also provide historical details that may or may not be relevant, such as "It is called the Hindu-Arabic system because it was first developed in India (around A.D. 800) and then refined by the Arabs." --Daniel. 17:54, 16 November 2010 (UTC)

Good point. Furthermore, one lists them alongside the "European" numerals (rather than "Hindu-Arabic" numerals) 0123456789, and while I can't see the relevant page in the other one, it's a book about the Unicode standard. Unicode treats 0123456789 not as "Hindu-Arabic numerals" but as "ASCII digits". I don't object to borrowing Unicode's term "Arabic-Indic", but we can't contrast it with "Hindu-Arabic", because it's a subset of "Hindu-Arabic". —RuakhTALK 18:51, 16 November 2010 (UTC)

@Daniel.: Your citations don't really support your claim. Even with the ones that specifically list the digits — well, people frequently refer to "ABCD…" as "the English alphabet", but that doesn't mean that "abcd…" aren't also "the English alphabet". For that matter, they frequently refer to them simply as "the alphabet", but that doesn't mean that "αβγδ…" aren't also "the alphabet". Many of your citations make clear that Hindu-Arabic numerals were invented in India and came to us via the Arabic world, so obviously either (1) they think "Hindu-Arabic" does include those other symbols, or (2) they don't know what they're talking about. I think #1 is more likely; I suppose #2 is possible, but in that case, it doesn't speak for using those sources as a guide to our own usage. —RuakhTALK 18:12, 16 November 2010 (UTC)

Ruakh, here is a fact that may or may not be relevant here: if Hindu-Arabic only refers to 0123456789 and never to ٠١٢٣٤٥٦٧٨٩, then it would be extremely difficult to prove it by finding reliable sources that state "Don't call ٠١٢٣٤٥٦٧٨٩ Hindu-Arabic"; it would be redundant and unnecessary, like "Don't call 愛 a Latin letter".

If, hypothetically, all our citations call ABCDE "the alphabet" and none of our citations call αβγδ "the alphabet", I can see two reasonable conclusions: either (1) "the alphabet" only refers to ABCDE, or (2) we are missing one nuance of "the alphabet", and may show it by new citations.

Similarly, since I have attested "Hindu-Arabic numerals" as 0123456789 and there is no citation to prove otherwise, I would appreciate very much if you either recognized 0123456789 as the only Hindu-Arabic numerals, or provided sources that call ٠١٢٣٤٥٦٧٨٩ "Hindu-Arabic" too. --Daniel. 06:00, 17 November 2010 (UTC)

Re: "Similarly, since I have attested 'Hindu-Arabic numerals' as 0123456789 and there is no citation to prove otherwise": I disagree. Your 1988 citation explicitly applies the term in such a way that it must cover the forms used in India and the Arab world; and your 1912, 2001, and 2006 cites all come from books that do the same (though not in the specific sentences you quoted). Your 2004 cite seems to do the same as well, though I'm not positive (b.g.c. shows me very few pages in that book, so I have to judge mostly by the table of contents). The closest thing to an exception is the 2007 cite: like the other books, its text applies "Hindu-Arabic" to non-Western forms as well, but interestingly it has a table of forms that seems to label only the Western forms as "Hindu-Arabic". (I say "seems to" because the table caption doesn't mention that the table includes the Western forms at all, so I may be misunderstanding the labeling.)
But if you have any doubt, take a look at google images:"Hindu-Arabic".
—RuakhTALK 19:26, 17 November 2010 (UTC)

google images:"Hindu-Arabic" was not helpful because it mainly presented comparisons between various numerical systems. google images:"Chinese numbers" displays the same phenomenon, by listing various instances of 0123456789 alongside Chinese numbers. However, I didn't wish to strain my fragile internet connection by waiting for more than a few pages to appear, so I may have missed something from more distant pages.

As for my 1988 quote, it seems to be simply comparing the modern Hindu-Arabic numerals ("Our numerals as we now use them") with an older system of Hindu numbers that originated it.

These Hindu numbers are called Hindi numerals in the following phrase from Wikipedia (that refers to multiple books): "Arabs, on the other hand, call the system 'Hindu numerals', referring to their origin in India. This is not to be confused with what the Arabs call the 'Hindi numerals', namely the Eastern Arabic numerals (٠.١.٢.٣.٤.٥.٦.٧.٨.٩) used in the Middle East, or any of the numerals currently used in Indian languages (e.g. Devanagari: ०.१.२.३.४.५.६.७.८.९)".

Would you please let me see a link to any page of these (or other) books that explicitly calls ٠.١.٢.٣.٤.٥.٦.٧.٨.٩ Hindu-Arabic? To me, they too are simply talking about comparisons and origins of different numerical systems.

Anyway, I would probably be disappointed in learning that English is so poor as to not have an unambiguous and individual term for its own numerical system. From the discussion above, other possibilities for names to be used in Wiktionary might be "European Hindu-Arabic numerals" or even Appendix:Hindu-Arabic numerals/0123456789 (though I do not necessarily agree with them). --Daniel. 08:17, 22 November 2010 (UTC)

Re: "Anyway, I would probably be disappointed in learning that English is so poor as to not have an unambiguous and individual term for its own numerical system": O.K., I think I'm understanding your confusion. English does have an unambiguous and individual term for its own numerical system: it's the Hindu-Arabic system. But this system is not determined by the specific forms of digits (5 vs. ٥) or separators (period vs. comma). See this book for a list of the "[f]ive basic characteristics [that] define the Hindu-Arabic number system".
For a more interesting example, see this book, which uses the phrase "Hindu-Arabic, or Arabic, digits" to refer to the symbols that eventually came to be used in the Hindu-Arabic number system. (It's a bit anachronistic, like using "letters" to refer to logograms that eventually were incorporated into an alphabet, but it's understandable.)
But if you just want "any page of these (or other) books that explicitly calls ٠.١.٢.٣.٤.٥.٦.٧.٨.٩ Hindu-Arabic", see this cite, which speaks of “the 10 ‘Hindu-Arabic’ digits ٠١٢٣٤٥٦٧٨٩”. No ambiguity there.
—RuakhTALK 17:43, 22 November 2010 (UTC)

So "Hindu-Arabic" also applies to ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩ after all. Then, I think they should be added to the relevant appendix and categories eventually. Thank you for your explanation and sources. --Daniel. 01:06, 26 November 2010 (UTC)

As for ISO 15924, why not use the code prefix Qasc, analogous to qfa? It would prevent conflicts with language codes and generally avoid confusion. -- Prince Kassad 19:05, 14 November 2010 (UTC)

I disagree with using "Qasc" as a prefix, because the codes would be too long (one particular old suggestion was creating codes like Qasc-Hara, which are not feasible). --Daniel. 19:16, 14 November 2010 (UTC)

They are only two characters longer than the qfa codes. Is that too long already? -- Prince Kassad 19:45, 14 November 2010 (UTC)

"Qaaa-Hara" is four keystrokes longer than "qfa-tor" (counting two Shift keys). In other words, "Qaaa-Hara" is 57% longer than "qfa-tor" in keystrokes. It is more keystrokes than you need to type the name of most scripts, which usually would include only one capital letter. --Daniel. 19:59, 14 November 2010 (UTC)
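The keystroke figures above can be checked with a short sketch. It assumes the counting rule stated in the comment (one keystroke per character, plus one Shift press per capital letter); the function name `keystrokes` is made up for this illustration.

```python
def keystrokes(code: str) -> int:
    """Count one keystroke per character plus one Shift press per uppercase letter."""
    return len(code) + sum(1 for ch in code if ch.isupper())


long_code = keystrokes("Qaaa-Hara")   # 9 characters + 2 Shift presses
short_code = keystrokes("qfa-tor")    # 7 characters, no Shift
extra = long_code - short_code
percent_longer = round(100 * extra / short_code)

print(long_code, short_code, extra, percent_longer)  # 11 7 4 57
```

Counting characters alone, without Shift, the difference is two (9 versus 7), which matches the "two characters longer" figure in the preceding comment.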

Alternatively, use anything longer than four characters so it does not conflict with ISO codes. -- Prince Kassad 09:52, 16 November 2010 (UTC)
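As a rough illustration of the "avoid the ISO shape" idea, the sketch below tests only whether a code has the form of an ISO 15924 alphabetic code (four letters, the first capitalized, as in Cyrl). It is an assumption-laden check: it says nothing about whether ISO has actually assigned a given code, and the name `could_collide_with_iso` is hypothetical.

```python
import re

# ISO 15924 alphabetic codes are four letters with the first one capitalized (e.g. "Cyrl").
ISO_15924_SHAPE = re.compile(r"[A-Z][a-z]{3}")


def could_collide_with_iso(code: str) -> bool:
    """Return True if the code has the same shape as an ISO 15924 alphabetic code."""
    return ISO_15924_SHAPE.fullmatch(code) is not None


# Codes mentioned in this discussion, plus "Cyrl" as a known ISO-shaped example.
for code in ["Har", "Latinx", "polytonic", "musical", "None", "Xyzy", "Cyrl"]:
    print(code, could_collide_with_iso(code))
```

Under this check, Har, Latinx, polytonic and musical cannot collide by construction, while None and Xyzy do have the four-letter shape and so rely on ISO never assigning those particular codes.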

Rhymes adder

I've been working on a script to simplify adding rhymes to rhymes lists (User:Yair rand/rhymesedit.js, available in WT:PREFS). It works much like the existing translations adder, placing input forms at the end of each section, and also automatically updates the entry for the added rhymes upon saving. Right now it's quite likely that there are a lot of bugs in it (due to it being pretty much untested), but assuming it can be made mostly bug-free, do people think that this is something we would want to have enabled by default at some point? --Yair rand (talk) 06:40, 12 November 2010 (UTC)

What happens when the target rhymes page does not exist (which I find is often the case)? --EncycloPetey 19:44, 12 November 2010 (UTC)

I'm not completely sure if this is what you're asking, but if the appropriate language section is not available in the entry or the entry does not exist, nothing is added to the entry, but the red link is still added to the rhymes list. --Yair rand (talk) 01:11, 14 November 2010 (UTC)

It pops up an error message. The rest of the rhymes get added OK tho. — lexicógrafa | háblame — 01:38, 14 November 2010 (UTC)

OK, I'm confused even more now. At first, it sounded as though you were adding rhyming words to lists on pages in the Rhymes namespace. Now it sounds as if you're adding a {{rhymes}} link to the pronunciation section of an entry based on what's extant in the Rhymes namespace. Which is it? --EncycloPetey 19:53, 16 November 2010 (UTC)

Input boxes appear on the list in the Rhymes namespace, which are used for adding words to the list, and then the script also automatically adds {{rhymes}} to the entries of each of the newly added rhymes upon clicking the save button. --Yair rand (talk) 22:22, 16 November 2010 (UTC)

Is it case-sensitive? It shouldn't be; that causes problems for German. -- Prince Kassad 23:30, 16 November 2010 (UTC) (edit: it also does not work with {{top4}} which is a problem for long rhymes lists...)

It's not case-sensitive in its sorting, and I'm pretty sure it does work with {{top4}}. (Note: I just fixed a bug in the script a few minutes ago that was causing it to mess up on any page that had the beginning of a header as the first character of the page. There's a decent chance that the problems were due to that bug.) --Yair rand (talk) 00:11, 17 November 2010 (UTC)
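The behaviour described in this thread can be modelled in a few lines. This is only a sketch of the described logic, not the code of User:Yair rand/rhymesedit.js: the data structures, the function name `add_rhyme` and the sample words are all invented for illustration, and adding the rhyme to a set stands in for inserting {{rhymes}} into the entry.

```python
def add_rhyme(rhymes_list, entries, word, language, rhyme):
    """Add `word` to a rhymes list and tag its entry where possible.

    rhymes_list: words on the rhymes page (mutated, kept sorted ignoring case)
    entries: dict of page title -> dict of language -> set of rhymes already tagged
    Returns True if the entry was tagged, False if only the list was updated.
    """
    if word not in rhymes_list:
        rhymes_list.append(word)
        rhymes_list.sort(key=str.casefold)  # case-insensitive sorting

    section = entries.get(word, {}).get(language)
    if section is None:
        # No entry, or no section for this language: the link stays on the list only.
        return False
    section.add(rhyme)  # stands in for adding {{rhymes}} to the entry
    return True


rhymes_list = ["aus", "Haus"]
entries = {"Maus": {"German": set()}, "raus": {"English": set()}}

print(add_rhyme(rhymes_list, entries, "Maus", "German", "aʊs"))  # True
print(add_rhyme(rhymes_list, entries, "raus", "German", "aʊs"))  # False
print(add_rhyme(rhymes_list, entries, "Laus", "German", "aʊs"))  # False
print(rhymes_list)  # ['aus', 'Haus', 'Laus', 'Maus', 'raus']
```

In all three calls the word lands on the rhymes list; only the first also updates an entry, because the other two lack either a German section or an entry altogether.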

ws header: link= or hyperlink=

A short and somewhat annoying edit war occurred between me and Dan Polansky over a parameter of a template. He suggests {{ws header|hyperlink=}}, while I expressed reasons for the shorter {{ws header|link=}} in the discussion WT:GP#Perpetual redlinks at Wikisaurus headers. He apparently is trying to dictate his opinion by ignoring the discussion and letting only the parameter "hyperlink=" exist in that template. Since then, I equally programmed both parameters, and protected the template. He subsequently asked me to unprotect it in order to remove "link=". I declined, and created this discussion instead. --Daniel. 09:02, 12 November 2010 (UTC)

I am dictating my opinion over your opinion, given I am a major Wikisaurus contributor, and you are a mere Wikisaurus meddler whose first edits to Wikisaurus were so horrible that I am still not sure whether they were not mere provocation. If more people support "link=" over "hyperlink=", I will not stand in the way. But as long as it is one me against one you, I should prevail.

I ask you for the fourth time to unprotect {{ws header}}, a template which was a subject of conflict between you and me. --Dan Polansky 09:16, 12 November 2010 (UTC)

Dan, discounting the fact that you don't like it that way, if both link and hyperlink achieve the same functionality, why do you not want them both as valid parameters? - [The]DaveRoss 14:17, 13 November 2010 (UTC)

The reason why I do not want to have two valid parameters is that I want no disunity of use of parameter names and template names. If the community prefers "link=" over "hyperlink=", let us go with "link=" and deprecate "hyperlink=".

Anyway, this conflict over link= vs hyperlink= must seem pretty petty to everyone uninvolved. The reason why I so stubbornly revert Daniel Dot (two times so far in {{ws header}}) is that this is not only a conflict over link= vs hyperlink= but also a conflict between me and his style of proceeding in Wiktionary. He likes to proceed unilaterally in spite of disagreement of other people, with the justification that he has given his reasons and arguments, and that no one has convinced Daniel Dot that his arguments are wrong or insufficient. --Dan Polansky 07:47, 15 November 2010 (UTC)

It only applies to one version of one video game. Why do you want to pollute Wiktionary with these useless minutiae? Equinox◑ 14:05, 12 November 2010 (UTC)

I am not trying to "pollute" anything; I'm rather discussing this issue. Should I know any practice that draws a line where "slide delay" (as a term from one version of one game) should not be defined on Wiktionary? Or, alternatively, do you have such a proposal? If yes, why? (Especially in comparison with many rare words that are questioned through RFV and RFD regularly, but follow the rule of being attestable if they have three independent citations for one definition.) --Daniel. 17:48, 12 November 2010 (UTC)

Whether they are "useless minutiae" or not is not up to us to decide. They are real words, real as in "in use". We already have thousands of words used in extremely specific contexts, regions or time periods, or not used anymore at all (archaic or obsolete, not to mention those from extinct languages). All of these words ought to be relocated to the main namespace and this ridiculous discrimination based on the alleged fictitiousness (alleged because we already have thousands of other fictional universe words, e.g. those from religious works, but which are "OK") needs to stop. --Ivan Štambuk 19:41, 12 November 2010 (UTC)

I really think WT:FICTION is unfair to a lot of terms in that it essentially considers a very large corpus to be non-independent and furthermore invalidates even a single quotation from that corpus. By the same logic the modern sense of truthiness should be removed because all but two citations refer to Stephen Colbert. It would make a lot more sense to me to require all but one quotation to be independent of reference to the universe, or better all but one to be from a work that is not immersed in nor principally concerns that universe, but even that would leave this stricter interpretation of "independent" than is otherwise applied. DAVilla 19:42, 4 December 2010 (UTC)

I'm not sure I understand the definition, but it seems like this isn't a "term originating in a fictional universe". That said, I don't see how it meets the CFI. I just looked through the Google Books, Google Groups, Google Scholar, and Google News Archive hits, and not one seemed to be relevant. Not even a mention. —RuakhTALK 19:27, 12 November 2010 (UTC)

Agreed, it did not originate from the work, so WT:FICTION does not apply in searching for citations, but it still does have to be cited. DAVilla 19:36, 4 December 2010 (UTC)

Modifying widely-transcluded templates.

My understanding had been that we're not supposed to modify widely-transcluded templates unless the modifications have been discussed and are supported. However, a number of administrators do make such edits, and when I notice them and roll them back, I often take flak for it. So, question: Was my understanding mistaken? Is there no expectation of previous discussion? (And if not — did there use to be, but isn't any longer? Or was there just never such an expectation, and I've been walking around these past four years believing something that was never true?) —RuakhTALK 21:26, 12 November 2010 (UTC)

I'm for being more bold. The discussion mechanism on Wiktionary is broken: look at the last fifty topics at the BP, how many of them reached a conclusion? --Vahag 07:44, 13 November 2010 (UTC)

I tend to agree. While there is nothing wrong with discussing problems, only a few discussions ever really have a useful result. Most of the time they either die off before anything happens, or people just keep discussing endlessly. —CodeCat 10:06, 13 November 2010 (UTC)

To clarify: I wasn't expecting discussions "at the BP", nor "discussing problems". What I do, when I want to change such a template, is I post a note on the template talk-page along the lines of:

Anyone object to my doing [change]? Because [reason for change, if not obvious].

True, but not everyone has such templates in their watch lists. Then again, not everyone follows BP or GP either, so I guess neither solution is really perfect. —CodeCat 14:05, 13 November 2010 (UTC)

I would propose that people be free to make changes they see as necessary, but also be very quick to roll back any changes they make if anyone voices concern. If there are concerns then discussion can take place; if no concern is expressed then changes don't have to wait for potential discussion. Ruakh, it might reduce the amount of flak you get if you don't roll back directly but instead leave a note on the editor's talk page saying that you have a concern about the particular changes. - [The]DaveRoss 14:09, 13 November 2010 (UTC)

The incident that sparked this was my blowing up over Ran rolling back my edits to {{el-noun}}. The revision, and the summary of my mods:

I used <!-- --> to space out the coding, because it makes it easier to see what's going on and make changes.

I removed some extra spaces and the colon between "plural" and the linked plural form.

Nothing substantial was really changed with the exception of the spaces and colon, but it still apparently warranted a rolling back, which coupled with my bumpy past with Ran just really made me blow a gasket. — [ R·I·C ] opiaterein — 18:38, 14 November 2010 (UTC)

Per Opi, depends what the modification is. If it will affect how editors use the template, or significantly changes what it displays, then yes. If we're talking about adding or removing spaces, adding sc=Grek or something like that, it would be mad to revert as you'd just load the job queue even further, with no related benefits. Mglovesfun (talk) 15:10, 15 November 2010 (UTC)

Listing some nonfictional terms

For the convenience of editors interested in the distinction between "fiction" and "nonfiction" (as explained on WT:CFI and WT:FICTION), I am listing here the terms that are nonfictional but are directly related to fictional contexts.

They have been kept in appendices by me; however, in the light of recent discussions, they should be moved to the main namespace. (Or, to comply with the opinions of some people, some are to be outright deleted.)

I believe that all of them are citable, especially the ones from famous franchises such as Pokémon. I also have a personal list of some hundreds of nonfictional terms related to Pokémon that I may add to Wiktionary eventually.

As I mentioned above, I don't believe that slide delay is citeable. Nor Invisible Shiny Bulbasaur. Most of the others would fall under the "names of specific entities" rule, and I find it hard to believe that any of said would survive RFD, with the possible exception of Star Trek (though that one may actually be subject to the brand-name requirements, I'm not sure). A few may actually be valid, but even for those, honestly, I'd be inclined to speedy-delete mainspace entries that weren't well-cited. They don't seem worth RFVing. —RuakhTALK 02:32, 13 November 2010 (UTC)

The distinction between "fictional terms" and "nonfictional terms" is irrelevant. These are words that are used in spoken/written language, and that's all that matters. Whether they are citeable in the works of a particular franchise or also outside it is irrelevant. If they can be cited in the works of 3+ different authors, we must include them. We already have thousands of words from all sorts of religious fiction which are cited/citeable exclusively within the works of the same franchise of religious fiction. This discrimination harkens back to the elitist days of early lexicography when dictionary compilers used only works from "established" writers, i.e. upper-class folks with formal academic training, and ignored what was perceived as peasant speech of the unsophisticated and illiterate vulgus. Now we repeat the same discriminatory attitudes by selectively ignoring the immense corpus of literary/non-literary fiction, because it's not "serious" enough, despite the fact that these words are known to hundreds of millions of people. --Ivan Štambuk 11:44, 13 November 2010 (UTC)

"If they can be cited in the works of 3+ different authors..." Good luck. Equinox◑ 01:29, 14 November 2010 (UTC)

Hm, this view is starting to make sense to me. Given that we do include other context-specific terms in the mainspace, it looks like the only thing holding back terms from fictional universes is the "unprofessional" nature of them. I don't want Wiktionary to turn into Urbandictionary, but not including words just because they came from fictional universes sounds a bit hypocritical... --Yair rand (talk) 03:29, 14 November 2010 (UTC)

For feedback, I have recently created these few nonfictional terms directly related to fictional universes, in the main namespace:

How exactly am I wasting the time of anyone? --Daniel. 13:45, 15 November 2010 (UTC)

I don't think you should be able to get round the fictional universe criteria by using "in the Harry Potter video game" or "in Pokémon card games". If I watch an episode of Dr Who, it's a fictional universe, but the TV program itself does exist, it's just fictional in its nature. Mglovesfun (talk) 15:08, 15 November 2010 (UTC)

But the TV program itself is not an entity originating in a fictional universe. The TV program comes under the broad head of names of specific entities. A TV program such as "Dr Who" would be probably voted down in RFD, but not because of the section WT:CFI#Fictional universes. Put more broadly, names of literary works do not come under the head of WT:CFI#Fictional universes. --Dan Polansky 15:44, 15 November 2010 (UTC)

I have not proposed the creation of Dr Who as an entry, but from Ruakh's comment about Star Trek, I suppose it may be created if it meets WT:BRAND.

If "Basic Pokémon" and these other words are somehow unwanted even if attestable, I suggest clarifying (that is, amending, changing) the CFI to convey this fact with different criteria, closer to the consensus. As a related example, chess is another game that is arguably "fictional in nature", with its "bishop", "king", etc. as characters engaged in a war. --Daniel. 15:49, 15 November 2010 (UTC)

New Free e-book on dictionary use

If you're interested in learning more from empirical studies about how people actually used dictionaries, you should find Herbert A. Welker's new e-book very helpful. Entitled Dictionary Use: A General Survey of Empirical Studies, it contains an introduction and surveys or abstracts of 320 empirical studies, arranged as follows:

Table of Contents

Introduction

Situations of use

Different divisions of research topics

Methods

Questionnaire surveys

Interviews

Observation

Protocols

Tests and experiments

Log files

The “dictionary use scenario”

Reference needs and skills

Reference needs

Reference skills

Criticism

Surveys

Surveys of native speakers’ use of monolingual dictionaries (L1Ds)

Surveys of the use of monolingual learners’ dictionaries or of bilingual dictionaries

Studies of actual dictionary use

Studies of the effects of dictionary use

Reading Comprehension

Writing

Translation

The effects of dictionary use during translations done by FL learners

The effects of dictionary use during translations done by translators or “trainee translators”

The effects of dictionary use on vocabulary learning

Studies of specific dictionary features and of specific dictionaries

Research on the use of specific dictionary features

Actual use

The effects of specific dictionary features

Studies of the use of specific dictionaries (or of a specific dictionary type)

Rhymes adder

I've been working on a script to simplify adding rhymes to rhymes lists (User:Yair rand/rhymesedit.js, available in WT:PREFS). It works much like the existing translations adder, placing input forms at the end of each section, and also automatically updates the entry for the added rhymes upon saving. Right now it's quite likely that there are a lot of bugs in it (due to it being pretty much untested), but assuming it can be made mostly bug-free, do people think that this is something we would want to have enabled by default at some point? --Yair rand (talk) 06:40, 12 November 2010 (UTC)

What happens when the target rhymes page does not exist (which I find is often the case)? --EncycloPetey 19:44, 12 November 2010 (UTC)

I'm not completely sure if this is what you're asking, but if the appropriate language section is not available in the entry or the entry does not exist, nothing is added to the entry, but the red link is still added to the rhymes list. --Yair rand (talk) 01:11, 14 November 2010 (UTC)

It pops up an error message. The rest of the rhymes get added OK tho. — lexicógrafa | háblame — 01:38, 14 November 2010 (UTC)

OK, I'm confused even more now. At first, it sounded as though you were adding rhyming words to lists on pages in the Rhymes namespace. Now it sounds as if you're adding a {{rhymes}} link to the pronunciation section of an entry based on what's extant in the Rhymes namespace. Which is it? --EncycloPetey 19:53, 16 November 2010 (UTC)

Input boxes appear on the list in the Rhymes namespace and are used for adding words to the list; the script then also automatically adds {{rhymes}} to the entry of each of the newly added rhymes upon clicking the save button. --Yair rand (talk) 22:22, 16 November 2010 (UTC)

Is it case-sensitive? It shouldn't be; that causes problems for German. -- Prince Kassad 23:30, 16 November 2010 (UTC) (edit: it also does not work with {{top4}}, which is a problem for long rhymes lists...)

It's not case-sensitive in its sorting, and I'm pretty sure it does work with {{top4}}. (Note: I just fixed a bug in the script a few minutes ago that was causing it to mess up on any page that had the beginning of a header as the first character of the page. There's a decent chance that the problems were due to that bug.) --Yair rand (talk) 00:11, 17 November 2010 (UTC)
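The behaviour described in this thread can be sketched roughly as follows. This is not the actual rhymesedit.js code; it is a minimal in-memory Python model, and the function name, the data layout, and the exact form of the {{rhymes}} argument are all assumptions made for illustration. It shows the three points raised above: the word always lands on the rhymes list, the list is sorted without regard to case, and the entry is only touched when it already has the right language section.

```python
def add_rhyme(rhymes_list, entries, word, language, rhyme):
    """Hypothetical model of the rhymes adder.

    rhymes_list: words already on the Rhymes page (modified in place)
    entries: page title -> {language name -> list of template strings}
    Returns True if the entry was updated, False if it was left alone.
    """
    if word not in rhymes_list:
        rhymes_list.append(word)
    # Case-insensitive sort, so German capitalised nouns sort with the rest.
    rhymes_list.sort(key=str.casefold)

    section = entries.get(word, {}).get(language)
    if section is None:
        # Missing entry or missing language section: the list keeps a red link only.
        return False

    tag = "{{rhymes|" + rhyme + "}}"  # assumed template call format
    if tag not in section:
        section.append(tag)
    return True


rhymes = ["Katze", "Tatze"]
entries = {"Matratze": {"German": []}}

updated_existing = add_rhyme(rhymes, entries, "Matratze", "German", "atsə")
updated_missing = add_rhyme(rhymes, entries, "fratze", "German", "atsə")
```

In this model a word without an entry still appears on the list, which matches the "red link is still added" behaviour described above.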

Okay, this has been sitting here for quite a while. If there are no objections, I'm turning this on in a few days from now. --Yair rand (talk) 19:19, 21 January 2011 (UTC)

The rhymes adder is now enabled by default. --Yair rand (talk) 19:51, 23 January 2011 (UTC)

Geographic language categories

There seems to be no real consensus on how to handle geographic language categories. This is evidenced by the recent RFDOs on Category:Languages of Tibet and Category:Languages of New Mexico, as well as the historic RFDO on Category:Languages of the United States Virgin Islands (passed as no consensus). Some people are inclusionists and think we should have any conceivable area, others think it is redundant and useless. There is as of yet no policy on this, only the draft Wiktionary:Language categories created by myself which generally only allows language categories for internationally accepted sovereign countries, with other areas being decided on a case-by-case basis. The question is whether this should be elevated to policy, or whether we want a completely new idea. -- Prince Kassad 16:29, 16 November 2010 (UTC)

As for your specific concern of inclusionism, it makes sense to create categories for small areas if there are languages restricted to them: for example, Category:Languages of Vatican. --Daniel. 05:23, 17 November 2010 (UTC)

Well, that's a country anyway. But let's say, er, Languages of Naples (Neapolitan) and Languages of Dalmatia (Dalmatian)? I think not. I'm for restricting to countries (whatever they might be. I suppose internationally recognized countries, or some such. Including past ones, like the Holy Roman Empire, though I think that including all of the USSR, the w:Russian Democratic Federative Republic, the w:Russian Empire, and the w:Tsardom of Russia is too much. Where to draw the line...).​—msh210℠ (talk) 06:06, 17 November 2010 (UTC)

Per my RFDO comments, I see no linguistic value in these. Let Wikipedia handle it. I don't really feel strongly about deleting the lot, as I imagine the page traffic for these categories is very, very low. I suppose it's one way to find obscure languages, like people browsing Category:Languages of Italy and finding out there's a Venetian language. That's the only positive point I can think of. Mglovesfun (talk) 23:57, 21 November 2010 (UTC)

We can do it geographically rather than politically: Languages of [the various continents], of the Americas, of the Caribbean, of Iberia, of the Middle East, of the Arabian Peninsula, of the Sahara, of Asia Minor, of the Himalayas, of the Alps, of the Italian Peninsula, of the Great Plains, of the Appalachians, of the Canadian Shield, of Carpathian Ruthenia.​—msh210℠ (talk) 18:38, 1 December 2010 (UTC)

That is even worse. Political borders are clear and precise. Geographical borders are not. -- Prince Kassad 18:41, 1 December 2010 (UTC)

Geographical borders are not, but neither are language borders (if you will: I mean borders of regions in which specified languages are respectively spoken). I suspect the two may match up somewhat, though, not that I know much about this.​—msh210℠ (talk) 18:57, 1 December 2010 (UTC)

I'm just thinking that users are probably more familiar with political countries than geographical areas of the earth. -- Prince Kassad 19:06, 1 December 2010 (UTC)

True, but we run into the problems mentioned above. Category pages can have little maps, and the categories can be subcats of the cats for the areas their referents are parts of.​—msh210℠ (talk) 19:57, 1 December 2010 (UTC)

Wiktionary:Criteria for inclusion#Attestation allows "[a]ppearance in a refereed academic journal" to bypass any other form of attestation. It doesn't require that the term be used (as opposed to mentioned); it doesn't require that the term have been around for at least a year; and it doesn't even require that the journal be durably archived, which doesn't usually come up, but personally I'm not so sure about the durable archival of the e-journal that mentions fluffragette (see its RFV discussion). Oh, and technically, it doesn't explicitly require that the appearance itself be in a refereed article, as opposed to a letter-to-the-editor or something; but I take that to be implicit, so that aspect doesn't worry me. (And, interestingly, it doesn't put any restrictions on what the journal says about the word; technically one could argue that even something like "the form *foobar is unattested, and perhaps impossible" would count. Dunno why we currently shuffle reconstructed forms off to appendices. ;-) )

Does anyone support the current version? (Not necessarily the weirdest quirks resulting from a naïve or too-literal reading, but the overall idea: that a single use or mention in a peer-reviewed paper should bypass all other attestation requirements?)

I think the principle that a refereed academic journal can be assumed to have a higher level of authority than other literature is sound, and that a single citation is therefore sufficient. The lack of a requirement for a use-mention distinction is concerning. I wouldn't go so far as to say that we must require only uses, however. I am more partial to option three in your vote, though not the exact wording. Academic articles like these, because of their authority, should be trustworthy sources on words that they document without using. A list of phobias on a trivia site is a good example of why we disallow mentions as sources; a list of words in a journal of linguistics or lexicography documenting real words from a linguistic community which may not appear in literature (as from a spoken language, or even perhaps words in written languages that don't appear in print) seems like a good source.

The distinction that I would draw in academic sources, then, is not whether the word is used, but whether the context of the article suggests that the word is an attestable member of a particular language. I realize that's not completely objective, but I think we could make it work. The problem I have with the article where "fluffragette" is found is not that it is a list of words, but that the article itself is about word formation and indicates that the list is of neologisms verbally observed by the author, possibly only once. A field linguist's list of common animal words in Shabo, on the other hand, would seem worthy of pulling from.

To me, it makes sense to extend this principle to all peer-reviewed academic works (like university press books, and not just articles), but I don't like the broadness introduced by "reliable source." I would also assume that when this criterion was introduced, it was not contemplated that a peer-reviewed academic journal could be not durably archived; it would make sense to have that apply to all quotations universally.

And really, if we are looking at the standards for attestation, "Clearly widespread use" is the one that really bugs me. If it is so clear, just demonstrate it with quotations fitting one of the other criteria. Dominic·t 12:08, 15 November 2010 (UTC)

I agree that we need to do something about tiny languages (and perhaps even tiny dialects of English), but I think it might be better to do that explicitly, rather than by trying to create some general principle that will de-disadvantage those languages. BTW, feel free to modify that proposal in whatever way you see fit. So far you're the only one who's commented supporting it, so for now, you own it. ;-) —RuakhTALK 15:45, 15 November 2010 (UTC)

I may have overemphasized the point about languages without literature (because it seemed like the clearest example) but the point that I am really trying to make applies to all languages we are documenting. Because it is languages we are documenting, after all. Our mission is to document language, not literature. The privileging of the written word over the spoken word, and, at that, the durably archived written word over the word in more transitory media, is something we and all dictionaries must struggle with. Basically, the use requirement forces us to find a word used by its speakers in their own voice (whether fictional or not), but not all voices are recorded that way. That doesn't even hurt just the disadvantaged speakers and their words—there is lots of technical language that is also difficult to cite because it is language that might not see publication in the voice of its speakers either. If a (durably archived) peer-reviewed academic work says something is a real word without necessarily using it, even if we cannot find it in use elsewhere, that is acceptable to me as proof. I do think the questions of use vs. mention and of one or three quotations are entirely separate though. Dominic·t 10:47, 16 November 2010 (UTC)

I'd go for "Proposed Change 1: Remove", in the absence of a rationale that points otherwise. --Dan Polansky 14:36, 15 November 2010 (UTC)

I'd be happy enough to remove the academic journal criterion - count them as durably archived and therefore as valid citations; yes. To put them ahead of other durably archived sources, no! Regarding "I think the principle that a refereed academic journal can be assumed to have a higher level of authority than other literature is sound" (Dominic) I don't think that's the case. As sound yes, ok, but authority with respect to written language? No. Mglovesfun (talk) 15:02, 15 November 2010 (UTC)

Like Dominic, I like option 3. An alternative is to count mentions in journals (but only uses elsewhere), while requiring at least one use (somewhere). That's what's done in Wiktionary:Votes/pl-2007-12/Attestation criteria. (I like Ruakh's option 3 better than that, though.)​—msh210℠ (talk) 18:47, 15 November 2010 (UTC)

A problem with mentions in articles, particularly linguistics articles, is that transliterations are used of languages that are not written, and even of languages that are not written using Latin letters (example). We don't want the latter, and (I think) we want the former iff the transliteration scheme is standard. AFAICT this problem applies also to option 3 in Ruakh's vote, but not to the alternative I suggest just above, as that requires some attested use.​—msh210℠ (talk) 18:56, 15 November 2010 (UTC)

Furthermore something that is mentioned once, with no usage (anywhere else) there's no way it can support any definition. If I were to write 'gollygalf is an interesting word', that couldn't support any possible sense of gollygalf other than 'we don't know what it is'. Mglovesfun (talk) 19:02, 15 November 2010 (UTC)

Each of the proposed changes is better than the status quo, but I think Proposal 1 (remove the privilege) is best. (At a minimum, we should require that appearances be in durably-archived journals... but I wouldn't add that as an option to this vote, because I fear enough votes might be diluted between it and Proposal 2 that none of the options would have a majority. Perhaps we could have a runoff, if no one option has a majority, but change has more support than the status quo.) One clarification question, though: in the event Proposal 3 (require assertion of common use) passed, and we found a word used in two books and a journal article — but only used, not asserted to be common — it would meet CFI, correct? — Beobach972 19:11, 15 November 2010 (UTC)

I'd been hoping that people would improve the proposals, but no one has . . . I've set the vote to start in three days, to give a last chance for improvements before the vote starts. —RuakhTALK 02:09, 24 November 2010 (UTC)

Does it make sense to differentiate by language characteristics? For example, should we be more willing to rely on academic sources for extinct languages and/or tiny languages? I don't really see any reason to enshrine academic journals in any way for usage in modern languages with abundant online corpora like English. Furthermore academic journals are not readily searchable except by those blessed with access. Public libraries afford only the most limited access to such specialized and costly resources. DCDuringTALK 11:31, 24 November 2010 (UTC)

Non-language three-letter templates

What are our three-letter templates that are not language codes? I remember {{rfd}}, {{rfc}}, {{rfv}}, {{rfp}}, {{rfe}} and {{sic}}. --Daniel. 10:57, 15 November 2010 (UTC)

This is the 15-th discussion in Beer parlour that you have initiated in November. That makes one new Beer parlour thread per day, on average. What about you reduce the rate to one half? Or what about doing something uncontroversial for a while, such things that do not require much discussion? --Dan Polansky 14:31, 15 November 2010 (UTC)

Also {{art}}, {{dat}} and {{gen}}. More specifically, three-letter, no caps, Latin script, no diacritics. Or, using only abcdefghijklmnopqrstuvwxyz. We had a drive to get rid of these on fr: and now apart from one, only a couple exist as redirects, and the redirects are listed as deprecated templates. I've always wondered why we don't rename stuff like rfv and rfd to avoid future clashes. Mglovesfun (talk) 15:05, 15 November 2010 (UTC)
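The name shape described above (exactly three lowercase, unaccented Latin letters) is easy to test mechanically. The following is only an illustrative Python sketch: the function name is hypothetical, and it checks the shape of a name, not whether a language code of that name has actually been assigned.

```python
import re

# Exactly three letters from a-z: no capitals, no diacritics, no digits or spaces.
THREE_LETTER_NAME = re.compile(r"[a-z]{3}")


def has_language_code_shape(template_name: str) -> bool:
    """Return True if the name could be mistaken for a three-letter language code."""
    return THREE_LETTER_NAME.fullmatch(template_name) is not None


names = ["rfd", "rfv", "sic", "art", "ws header", "el-noun", "top4", "Rfd", "één"]
possible_clashes = [n for n in names if has_language_code_shape(n)]
```

Running such a check over a list of template titles would give candidates for renaming; deciding which ones really clash would still need a list of assigned codes.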

From Dan's apparent criticism above, I suppose I would not be able to satisfy everyone if I tried to. Two months ago, there were people claiming that I don't discuss enough.

Thank you for listing these templates, Martin. --Daniel. 15:31, 15 November 2010 (UTC)

The complaints about your not discussing enough were not about the absolute volume of discussion you generate, but rather about your doing too many possibly controversial changes without a discussion. If you reduce the volume of controversial changes, you will be able to reduce the volume of accompanying discussion you generate in Beer parlour. Anyway, maybe other people see it differently from me. --Dan Polansky 15:37, 15 November 2010 (UTC)

Hmm, I don't remember ever doing a controversial change without it being discussed first, but perhaps my views on controversy and necessity for discussions are different from those of other people.

By the way, I like to point out suggestions for improvement of Wiktionary and question its practices when they are too obscure. I don't feel the need for reducing the absolute volume of discussions created per day by me. --Daniel. 16:08, 15 November 2010 (UTC)

Re "I don't feel the need for reducing the absolute volume of discussions created per day by me": I know. That is why I have pointed out that I do feel the need that you reduce the volume. I do not know how other people see it, though. --Dan Polansky 16:11, 15 November 2010 (UTC)

You have quite often (I remember four or five occasions) edited major templates without prior discussion and caused problems that broke hundreds or thousands of entries. Those changes might not be controversial in nature, but they were damaging in effect, and the breakage might have been avoided if others had had a chance to review them first. Equinox◑ 00:46, 16 November 2010 (UTC)

As of the last dump, I think the only ones not yet mentioned are {{voc}}, {{inv}}, {{rfi}}, {{wse}}. --Bequw→τ 00:44, 17 November 2010 (UTC)

Thanks to msh210 and Bequw too, for replying. Another template worthy of mentioning is {{see}}, that should be used only as a language template (see is the code for Seneca), but also retains the functionality of being equal to {{also}}. --Daniel. 08:57, 18 November 2010 (UTC)

Preventing creation of new entries by anons

Wikipedia prevents "anons" (IP addresses, who aren't logged in to a named user account) from creating pages. I think this might be a good idea on Wiktionary too, because a lot of IP-created entries are obvious vandalism, and (relatively speaking) we have far fewer users patrolling vandalism than WP does. While it's slightly annoying to have to sign up (I didn't bother for a few months when I first came here circa 2008), I think it's a reasonable thing to ask, and restricting entry creation to registered users would probably kill a significant category of vandalism — plus we have Requested entries for those who want to suggest an entry but don't know how to write it properly, or don't want to bother. 1. Is this something that could be rolled out across Wiktionary? 2. If so, how do people feel about it? Equinox◑ 00:42, 16 November 2010 (UTC)

I'd want to see some data about how many pages from anons are immediately deleted. Maybe we can get in touch with someone with access to that kind of information (or is it available in the database dumps?) Nadando 00:57, 16 November 2010 (UTC)

As we weigh whether or not to require new would-be editors to create accounts before creating pages, we should also consider that some of our established editors sometimes edit without logging in, if they are for example on public computers. When I look at Special:NewPages, the vandalism I find is (as you say, Equinox) obvious (and thus easy to spot and delete)... to me, preventing anons from creating pages is unnecessary. Furthermore, that new anons sometimes give us useful content, and that established users sometimes choose not to log in before giving us useful content — to me, each of those things justifies allowing users to create pages without needing to create or log in to accounts first. — Beobach972 04:30, 16 November 2010 (UTC)

As for the statistics Nadando does well to request: this is only a day's anecdote, but may give some idea: On the 15th, anons gave us meningsverschil, Daygo, chính trị, bộ chính trị, and řádka. Daygo is currently undergoing RFV (rightly doubted and listed by Equinox), but the other 4 are good Dutch, Vietnamese, and Czech entries. Meanwhile, 32 of the ~83 pages deleted on the 15th were created by anons. Thus, slightly more than 10% of anon contributions on that day were good, while anon vandalism represented slightly less than 40% of what was deleted (the rest being vandalism by logged-in users or miscarriages by bots or logged-in users). — Beobach972 04:30, 16 November 2010 (UTC)
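The two percentages above can be reproduced with a quick calculation. This sketch assumes that the day's anon-created pages are the five surviving entries plus the 32 deleted ones, and it treats the approximate figure of 83 deletions as exact; the variable names are illustrative only.

```python
good_anon_entries = 4      # meningsverschil, chính trị, bộ chính trị, řádka
doubted_anon_entries = 1   # Daygo, under RFV
deleted_anon_pages = 32
deleted_pages_total = 83   # given as "~83" in the comment; assumed exact here

# Assumption: every anon-created page either survived or was deleted that day.
anon_pages_total = good_anon_entries + doubted_anon_entries + deleted_anon_pages

good_share = good_anon_entries / anon_pages_total
anon_share_of_deletions = deleted_anon_pages / deleted_pages_total

print(f"good anon pages: {good_share:.1%}")
print(f"anon share of deletions: {anon_share_of_deletions:.1%}")
```

Under these assumptions the results are 4/37 and 32/83, consistent with "slightly more than 10%" and "slightly less than 40%".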

I don't really like it, it seems to me that the whole idea of a Wiki is that anyone can create or edit an article. To be honest it's our job to ditch the ones that aren't any good. Ƿidsiþ 09:47, 16 November 2010 (UTC)

Just as point of fact, Wikipedia's current hostility to anonymous editing is an attempt to prevent libel issues from editors who would use the site to cause very serious harm in the real world. This is an issue which we do not yet have, if we ever will. Anonymous article creation and semi-protection were developed in the wake of the John Seigenthaler controversy not because of the overall quality of their edits (which are good, for the same reason the wiki system works), but because of the potential risk that even a single instance of a certain type of vandalism represented. I think it is important to understand that we don't have the same equation here; even on Wikipedia it is recognized that losing anonymous article creations was a trade-off, but we have far less to lose and far more to gain from it.

Also, the data only gets us so far, as the intangible aspects of anonymous editing may be even more important than the contributions themselves. All of us started out anonymous, and anonymous editing, even clueless, bad anonymous editing, is the gateway to becoming a good editor. It is also in line with the open and meritocratic principles that sustain the project. Targeting it is a form of collective punishment, of creating security through creating barriers to all newcomers, good and bad. It doesn't matter if registration is simple and easy; just as the vandals who are not committed won't bother, so too the casual outsiders with something to add won't take the extra step either. Dominic·t 10:23, 16 November 2010 (UTC)

I oppose. I know many productive anon users, I don't want to scare them away either. --Anatoli 01:47, 18 November 2010 (UTC)

If we believe that new entries by anons are particularly likely to be vandalism, then I think we should first try to make entry-creating edits easier to patrol. Right now a patrolling admin can ask the recent-changes interface to show only unpatrolled anonymous edits, but (s)he can't ask it to show only unpatrolled anonymous entry-creating edits. Such a feature should lower the "cost" of allowing anons to create entries. If we try that sort of step, and we still find this to be a problem, then we can consider stronger measures afterward. —RuakhTALK 04:41, 18 November 2010 (UTC)

Or this (last updated in 2009; might not still work). Nadando 05:14, 18 November 2010 (UTC)

Every time this comes up I have to say that creation of new entries and editing in general must be given the same treatment in terms of access privilege. If you want to make it impossible for anonymous users to create new entries then that's fine with me as long as anonymous users are not able to edit pages either. Otherwise you will not deter the deposit of cruft, you will only redistribute its accumulation. People will start defining terms where they are listed in derived and related terms rather than on a separate page where they belong, and where the merits of including or excluding the terms are more easily weighed. In fact they already do do this, which is a royal pain but nowhere near the landfill of carnage that will be brought on with a shortsighted policy change. DAVilla 19:21, 4 December 2010 (UTC)

I am rather late to this discussion, and I understand that, but I would like to say that I oppose the concept (though I see it's been significantly opposed as it is). For myself, I know that there have in the past been several productive anon editors of languages such as Vietnamese. I don't want to scare away people who want to add a few words in their language but don't necessarily want to make an account, because they don't feel that level of dedication to the project. Anyway. --Neskaya … gawonisgv? 20:19, 13 January 2011 (UTC)

As usual, I used Google Books as a durable source of works from independent authors, with the exception of plotkai, which has dozens of quotes from webpages (that is, articles and forums) and employs Archive.org as the durable source (though I'm not sure if I used Archive.org correctly here); as a result, I am pondering whether or not it fits the label "Internet slang".

Actually it looks like damage counter will be deleted. It doesn't seem to be used outside the Pokemon universe, and the definition itself is SOP and unnecessarily specific. ---> Tooironic 11:17, 17 November 2010 (UTC)

The current definition is of a real game, and specific enough to avoid being SOP: it is not anything that counts damage; it means 10 points of damage in that specific trading card game.

In more than one conversation, I have compared Pokémon TCG to other games for purposes of inclusion to the dictionary. For example, there is the fact that development is defined as both

(uncountable) The process of developing; growth, directed change

(chess, uncountable) The active placement of the pieces, or the process of achieving it

The latter definition is a more specific version of the former, but conveys a linguistic nuance restricted to the game of chess. The same applies to damage counter of Pokémon TCG, in comparison to any counter of damage.

Note: When I commented this before, there was at least one reply suggesting that chess terms are more important and more worthy to be included than terms of Pokémon TCG. I fundamentally disagree with this idea. --Daniel. 11:55, 17 November 2010 (UTC)

I think our fictional universe rule requires citations independent of the universe. Chess is not a fictional universe, Pokemon on the other hand is. -- Prince Kassad 13:44, 17 November 2010 (UTC)

Pokémon, basically, is not a fictional universe; it is a franchise that depicts multiple fictional universes.

There are words coined to represent fictional concepts from Pokémon, such as "Pikachu", "S. S. Anne", "Goldenrod" and "Oran", that are under the rules of WT:FICTION for the purpose of being or not defined on Wiktionary.

There are other words that represent real concepts directly related to Pokémon, that are not represented by special rules. For example, game mechanics and strategies such as "F.E.A.R." and "Masuda method". --Daniel. 14:47, 17 November 2010 (UTC)

I don't think the Pokémon thing is a more specific sub-sense of damage counter; it is just an instance of one. Daniel's pending additions in Appendix:Chip's_Challenge include block (“a brown object that can be moved by Chip, by ices or force floors”) and creature (“any of a set of harmful moving things”). These are blocks and creatures, and they have particular aspects to match the game's needs, but that doesn't make them anything more than blocks and creatures. Hundreds of video games feature "monsters" and "zombies", but I wouldn't want to see hundreds of separate senses, one for each game, simply because they have different abilities and colours. Equinox◑ 14:31, 17 November 2010 (UTC)

I believe Archive.org does the job of being durably archived. Multiple citations on the page Citations:plotkai include the piece of text "acessed on" with a link to Archive.org. --Daniel. 21:46, 17 November 2010 (UTC)

"For example, the Wayback Machine maintained by Archive.org is not considered usable for attestation, because the archive of a site can be erased at the request of the site owner." Equinox◑ 21:55, 17 November 2010 (UTC)

(Interestingly, Google Groups will also remove Usenet posts at the author's request. But Usenet is archived in other places, presumably...?) Equinox◑ 21:58, 17 November 2010 (UTC)

This reminds me, btw, of a bit on WT:CFI about pages being "durably archived" on Google. I know it's the case that after a page has been down awhile it no longer appears in search results — does the cache still remain after that? I think if a durable archive is to be suggested it should probably be the Wayback Machine. —Muke Tever 04:57, 12 October 2005 (UTC)

[...] Another point of inclarity in your post is that it seems to be assuming archive.org (the Internet Wayback machine) is accepted as a source, and arguing that webcitation.org is no worse; in fact, though, we don't accept the Wayback Machine as a source of attestation.​—msh210℠ (talk) 15:21, 9 August 2010 (UTC)

Aha, I might have quoted from an inappropriate place, because I thought the rule existed so I used Google to search pages on Wiktionary. Equinox◑ 22:46, 17 November 2010 (UTC)

No, Daniel, you misread what I'd written. I wasn't opining we shouldn't use it: I was stating what I thought was an already decided-upon practice that we don't.​—msh210℠ (talk) 01:43, 18 November 2010 (UTC)

Actually, I didn't say that you were opining we shouldn't use it. Your quote is pretty clear as stating a supposed fact (that we have the practice of never using the Wayback Machine as a source of attestation), rather than an opinion (that you, rather than the community as a whole, prefer to dismiss the Wayback Machine as counting for attestation).

I assumed that if a consensus was hypothetically attained, someone must have opinions supporting said consensus. (Or, rather, since you assumed we had a consensus, you must have assumed that someone supports said consensus.)

Apparently I have skipped a few steps of thought in that message. I apologize for that. --Daniel. 11:50, 18 November 2010 (UTC)

In that case, I misread what you wrote. ;-) Sorry.​—msh210℠ (talk) 16:31, 18 November 2010 (UTC)

What are the durably archived sources?

I would like to know the answer to the question above. If anyone knows a durably archived source that has not yet been mentioned in this conversation, please mention it, especially if your source is accessible from the Internet.

The ones I remember are: Books (including the ones that can be viewed through Google Books and Wikisource), movies, video games and Usenet.

Wikimedia projects, including Wikipedia and Wikibooks, and presumably other Wikis such as Wikia or Uncyclopedia are "durably archived" by definition, but I can see a consensus not to count them towards the 3-cite rule of attestation, apparently because these sites can be easily edited by anyone.

Finally, there is the Wayback Machine from Archive.org, that has raised some controversy because "the archive of a site can be erased at the request of the site owner" according to WT:CFIEDIT.

I'd qualify some of the above, but you asked for more, not less, so: laws, court decisions, legislative minutes, any of which are published officially; archived periodicals; engravings on monuments, tombstones, Walk-of-Fame-star-type things.​—msh210℠ (talk) 08:40, 18 November 2010 (UTC)

Good examples. Magazines and newspapers also serve as durably archived sources.

Oh, msh210, you have my formal permission to express disagreement with any of the sources that I have mentioned. --Daniel. 17:25, 18 November 2010 (UTC)

One of the things in CFI I have a problem with. Since durably is comparable, how durably? Does it have to be forever? How can we know that these sources will last forever? I'd much rather we use Wikipedia's "reliable third-party sources", with a bit more qualifying. Wiktionary:Durably archived sources would be a massive help. To be honest, I don't think anyone gives a sh*t what 'durably archived' actually means. It's just one of those things, if we can't fix it, just ignore it. Mglovesfun (talk) 17:31, 18 November 2010 (UTC)

"Durably" means - will last at least as long as this wiki. SemperBlotto 17:34, 18 November 2010 (UTC)

Sure, but we can't know that, can we? We don't have a crystal ball. Mglovesfun (talk) 17:36, 18 November 2010 (UTC)

The non-parametric statistical theory of extreme values suggests that for any randomish phenomenon measured over a period of time, T, with extreme value Xhigh and Xlow, there is a 50% chance the phenomenon will register values higher than Xhigh and a 50% chance that it will register values lower than Xlow over the next time period of length T. What percentage of the works of Classical Greece and Rome were lost each century over the last 2000 years? What was the highest level of loss: 10%, 20%, 40%, more? I think some similar statistical reasoning leads one to estimate the expected remaining life of a phenomenon that has lasted T years to be another T years. Another way of looking at it is economically: how much does it cost to maintain an archive and access to it once its immediate economic utility is negligible relative to the resources of the civilization that might incur the cost? High-value religious, official, and literary texts seem to last a long time, outlasting their languages. But low-value text seems likely not to be worth the effort of transcribing and may not outlast its initial physical medium. There are even questions of how long it pays to maintain a community of scholars that can decode ancient scripts and translate ancient texts into more modern forms.

In our case, even high-acid paper has lasted 100 years or so and better quality papyrus, parchment, and paper much longer. How old is the oldest analog sound recording? How old is the oldest digital data? The means of access for electrically recording information are also problematic, with old formats requiring retranscription, which raises questions of economics.

How long will it be possible or worthwhile to maintain electronic copies of blogs, organizational web pages, and news forums that are out of reach of erasure by the original owner of the data?

So, w:Wiktionary hasn't been around for 8 years yet. w:Usenet covers 28 years, but archiving by commercial enterprises is only 15 years old. The w:Wayback machine/w:Internet archive is about 14 years old. By the statistical logic and SB's threshold criterion alone, we should accept all of these. A problem with the Internet Archive is that the owners of the content (and others, such as the Scientologists) have the power to have content removed or to prevent their content from being archived. DCDuringTALK 20:25, 18 November 2010 (UTC)
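The comparison above can be written out as a rough sketch. It assumes the rule of thumb stated earlier (something that has lasted T years is expected to last about another T years) and uses the ages quoted in this comment; the function and variable names are illustrative only, and the figures are the poster's estimates rather than verified data.

```python
# Ages in years as quoted above (late 2010).
WIKTIONARY_AGE = 8
archives = {
    "Usenet (commercial archiving)": 15,
    "Wayback Machine / Internet Archive": 14,
}

def expected_remaining_life(age_years):
    """Rule of thumb from the discussion: a phenomenon that has lasted
    T years is expected to last roughly another T years."""
    return age_years

def meets_threshold(archive_age, wiki_age=WIKTIONARY_AGE):
    """SB's criterion: the archive should last at least as long as the wiki."""
    return expected_remaining_life(archive_age) >= expected_remaining_life(wiki_age)

for name, age in archives.items():
    print(name, meets_threshold(age))
```

Under these assumptions both archives pass, which matches the conclusion drawn in the comment; the estimate says nothing about deliberate removal by content owners.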

If a Wiktionarian successfully quotes something from the Wayback Machine, then the owner evidently did not "prevent their content from being archived"; it leaves only the problem of the owner being able to remove them later. Couldn't it be solved by a bot checking for dead links from the Internet Archive once in a while? Perhaps once every five or ten years? --Daniel. 10:13, 21 November 2010 (UTC)

There is nothing that prevents the owner from requesting removal at any time. The owners of the Internet Archive need to comply to avoid hostile judicial and legislative action which would jeopardize the very existence of the archive as a public resource. DCDuringTALK 12:29, 21 November 2010 (UTC)

What would be the purpose of re-assessing our evaluation of a term every five or ten years? Would a barely legal word suddenly not be considered a word just because some guy decided he didn't want people to see something from the past on a completely unrelated (or worse, an entirely pertinent) concern? No, the criteria are for a durable source precisely for this reason. Barring a change to the criteria themselves, once a term is accepted it is accepted for good. Consider that when you brandish dubious terms like MissingNo.. We expect these to be here for all time. DAVilla 08:21, 4 December 2010 (UTC)

I don't consider video games to be durably archived. Nor wikis (WMF or otherwise). Nor web-pages archived by archive.org. And for books, I'd only count "real" books that are actually published to print; there's some book-like content on Google Books that I'm not sure has ever been to paper. —RuakhTALK 21:04, 18 November 2010 (UTC)

I consider a given video game or WMF wiki to be durably archived, basically because millions of people have copies of it. Modern books would fall into the same (personal) criterion. --Daniel. 10:13, 21 November 2010 (UTC)

In the spirit of openness we disfavor all sources that are not available without incremental cost, i.e., from libraries. Games are available from few libraries. —This unsigned comment was added by DCDuring (talk • contribs).

Well, I suppose there are not many, if any, words that can only be cited from video games, so that is not much of an issue. Even video game-related terms such as HP and special attack are citeable from other places, such as books about video games. --Daniel. 07:24, 22 November 2010 (UTC)

WT:SEA was built based on the types of sources that have been generally agreed to be durable. DAVilla 08:21, 4 December 2010 (UTC)

In addition, there is Wayback and, for French sites only, archiving performed by BNF (Bibliothèque nationale de France), which are accessible to researchers. BNF archives everything published in France (books, magazines, etc.), indefinitely.

Therefore, it seems that even normal Internet pages are durably archived.

About wikis, there is no reason to exclude them, when it is clear that the use is a natural use, not an artificial use coined to deceive us. When there is any doubt, the citation should be excluded, but only in this case. Lmaltier 09:02, 4 December 2010 (UTC)

Does the Google cache reflect that the page hasn't been re-visited by the web-crawling spider, or does Google intentionally archive these, and if so for how long? Wayback is mentioned specifically, and the argument against its use is not my own invention. This is not a policy but a reflection (and interpretation, of course) of how the community feels, so although you're certainly welcome to question it, realize there will have to be a lot of minds to change.

On the other hand, BnF is durable by the sound, look, and feel of it, being the national library of France and all. That's exciting because one of its programs, Gallica, has started archiving e-books.

Wikis are neither included nor excluded. If durable then certainly the quotations would count (although frankly I can't think of any wikis that are considered durable except where they are published in another form). The text there just comments on how difficult it is to cite them. DAVilla 18:22, 4 December 2010 (UTC)

About Google, the page I mention was deleted from the site a long time ago, and is not found by normal searches, but is still accessible nonetheless. For how long, I don't know.

The RFM discussion also dealt with two other moves, but in this poll I am focusing only on the categories for templates that belong to an inflection line.

The rationale for the move was basically that a template with a conjugation table is an inflection template but not one that belongs to an inflection line, so the name "Spanish inflection templates" is misleading, as the category is only for templates that belong on the inflection line.

If you basically agree with the move but prefer "Category:English inflection line templates" to "Category:English inflection-line templates" (the difference is only in the missing dash or hyphen), please indicate so in your cast vote. --Dan Polansky 11:17, 19 November 2010 (UTC)

Support —CodeCat 16:54, 19 November 2010 (UTC) but without the hyphen.

In this case you need to vote oppose. -- Prince Kassad 18:49, 19 November 2010 (UTC)

Why? This is as intended: if you basically agree, you support. If you strenuously disagree with the hyphenated form, then an oppose would be in order, but if you merely prefer the form without hyphen, you support and state your preference. Anyway, I am surprised that you oppose the move on the grounds that the new name does not sound properly English when native speakers have had no problem with the new name so far. --Dan Polansky 18:56, 19 November 2010 (UTC)

Support, even though I never liked the name inflection line. I prefer headword line. --Vahag 17:26, 19 November 2010 (UTC)

Support Daniel. 19:41, 19 November 2010 (UTC) I, too, prefer headword, rather than inflection, in this case; "headword" was my first suggestion when I proposed deprecating the name "Category:English inflection templates". --Daniel. 19:41, 19 November 2010 (UTC)

Should it be "Category:English headword line templates" or "Category:English headword templates", per your preference? --Dan Polansky 20:05, 19 November 2010 (UTC)

Thanks for asking. I prefer Category:English headword line templates. After pondering on this subject, I came to the conclusion that I don't like the other alternative, Category:English headword templates, because headword lines includes other items, not just headwords, such as genders, inflections and transliterations (and parentheses, for that matter). --Daniel. 09:20, 20 November 2010 (UTC)

Support —Saltmarshαπάντηση 17:03, 22 November 2010 (UTC) But (for what it's worth) (1) the hyphen seems unnecessary, (2) I would have preferred Headword line template etc.

The hyphen is there to indicate that the noun phrase "inflection line" is being used attributively, to indicate that "inflection line" modifies "template" rather than "inflection" modifying "line template." —AugPi(t) 04:54, 25 November 2010 (UTC)

Quote signs in category names, appendices

Hi, there is a policy that words or phrases with an apostrophe are entered with a straight ASCII apostrophe, for technical reasons, unfortunately. Hopefully one day we will overcome this imperfection. On the other hand, it is accepted that curly quotes should be used in articles wherever possible. In that respect we are more advanced than Wikipedia. However, we need to come up with some policy: I have been changing links to stuff like Appendix:Variations of "man" to Appendix:Variations of “man” and then making a redirect to the page with the straight quotes. For Categories, there is a problem: the redirect redirects you when you visit the category, but doesn’t redirect the pages which are added to the category. Compare Category:Nouns ending in “-ism” by language and Category:Nouns ending in "-ism" by language (please leave like this for now, until this discussion is resolved).

What policy shall we use here? I would rather not use the quote signs at all in those categories, which also already is done sometimes now: Category:Danish words suffixed with -isme. It avoids the problem and doesn’t look bad to me.

Of course, my vote would be to abandon straight quotes altogether and fix the software, but… H.(talk) 08:02, 20 November 2010 (UTC)

Re "... it is accepted that curly quotes should be used in articles wherever possible." It seems you get this wrong. I know of no consensual support of this by the community. There may be a plain majority of supporters of such a thing, judging from Wiktionary:Votes/pl-2008-12/curly quotes in WT:ELE, which ended (8-8-0) at the end of voting period, (9-8-1) counting the late votes. It seems at best tolerated when lovers of curly quotes place them at various places, to avoid an edit war. I certainly do not feel obliged to use curly quotes whenever possible. --Dan Polansky 08:58, 20 November 2010 (UTC)

I have reverted some of your changes in category names, such as this. Category names do not use typographic or curly quotes; if you want to change this, you have to garner consensus or at least some support in a discussion. --Dan Polansky 09:10, 20 November 2010 (UTC)

We could comment them out, leaving them just in the wikitext. This would be helpful to editors and I bet no automated spider will check there. --Bequw→τ 05:01, 22 November 2010 (UTC)

@msh210&Bequw: Well, Google Groups hides people's e-mail addresses behind a CAPTCHA. If that is so useless, I wouldn't worry about hiding the e-mails here too. As for leaving them just in the wikitext, I would disagree because it seems pointless to add this information where it can't be read immediately. Besides, eventually spammers would learn that Wiktionary has a bunch of e-mails in wikitext and how to gather them at once. --Daniel. 06:36, 22 November 2010 (UTC)

I quite agree with your last point: If we agree to hide e-mail addresses, then they shouldn't be in comments either.​—msh210℠ (talk) 07:20, 23 November 2010 (UTC)

I don't necessarily disagree with the cool appearance of the at-signs of your example, but it would be extremely easy to convert automatically the piece of text info{{@}}wikimedia.org into info@wikimedia.org in order to spam it. --Daniel. 03:53, 25 November 2010 (UTC)
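To make the point concrete, here is a minimal sketch of how little work the reverse conversion takes. The function name is hypothetical; the placeholder `{{@}}` and the example address are the ones used in the comment above.

```python
def deobfuscate(wikitext, placeholder="{{@}}"):
    """Replace every occurrence of the obfuscation placeholder with a plain at-sign."""
    return wikitext.replace(placeholder, "@")

print(deobfuscate("info{{@}}wikimedia.org"))  # info@wikimedia.org
```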

Right, but AFAICT anything we do to hide e-mail addresses will have that problem if spammers read our dumps and catch on to our system of obfuscation, so we have to ignore that problem. I've made {{@}} meanwhile, q.v. (Of course, another solution, as you suggested originally, is to remove the addresses altogether. (And presumably to delete old revisions? Well, whatever.) But the benefit of having a better-cited citation overrides IMHO any concern for the privacy of Usenet posters who, remember, don't really have it anyway. What do others think?)​—msh210℠ (talk) 08:23, 25 November 2010 (UTC)

Late again, but please also remember that most users do not enter typographic quotation marks directly from their keyboards. Putting these in category names for which the typographic quotation is not the absolute rule makes the category substantially less accessible, and adds a need for yet more redirects. --Neskaya … gawonisgv? 20:27, 13 January 2011 (UTC)

Citation tools and templates

Given the numerous citation tools for Wikipedia, should we provide template name & parameter compatibility with their citation templates? I imagine ours are a bit different (if at least more restrictive). Our display can/will differ, but several of these tools are quite helpful and easy to use. --Bequw→τ 01:03, 22 November 2010 (UTC)

How to list word pairs and their translations

Many pairs of verb + preposition have their own entries, not only go on and take back, but also believe in. Is there any guideline for when to create a separate entry like that? I'm asking from the perspective of a non-English language. The Swedish translation to "believe in" is "tro på". This is actually a non-trivial piece of information, since "in" normally doesn't translate to "på". Swedes believe "on" things, not "in". Likewise, "believe of" (to believe something of someone; which doesn't have its own entry) translates to "tro om" (believe about). So, do I really need to create separate entries for "tro på" and "tro om"? That would lead to very many entries, with the benefit that I can link directly to each entry, but also with the drawback that a reader looking at the separate entry would not get the whole picture of how the verb is used. Or should I fit this knowledge into the article for the verb, and how? Just as one more example sentence? Are there any examples of how this dilemma has been solved or addressed for other non-English languages? --LA2 04:02, 22 November 2010 (UTC)

I made two examples: tro#Swedish (the verb, definition 1) uses example sentences, while vika#Swedish uses separate inflection lines for each phrase, under the same Verb heading, above the same Conjugation heading. --LA2 07:14, 22 November 2010 (UTC)

In the sentence "the painting will go on sale next week", the verb "go" happens to be followed by "on", but this is not the idiomatic phrasal verb "go on", is it?

I wonder if this is not also the case with "I didn't have anything to go on", listed as definition 3 of go on (added by Taxman in April 2006).

In the sentence "let's go on with the show", the phrase "go on" happens to be followed by "with", but we don't have a separate article for "go on with". Should we?

Maybe there are lots of false friends among these phrasal verbs. --LA2 02:30, 23 November 2010 (UTC)

Poll: Inflection to inflection-line 2

In a recently started poll about renaming categories for certain templates (still running), some people preferred a naming option that was not explicitly offered from the start: "headword line". Let me ask two more sets of questions, to resolve the possible naming options.

Old name:

Category:Spanish inflection templates

Renaming options:

(a) Category:Spanish inflection-line templates

(b) Category:Spanish inflection line templates

(c) Category:Spanish headword-line templates

(d) Category:Spanish headword line templates

The two sets of questions are such that each disregards one aspect: one asks about the choice of phrase to the disregard of hyphenation, the other asks about hyphenation to the disregard of the phrase.

Please post "support" under those statements that capture your preference. You would post two preference votes in total, one per triplet of preference statements.

Support The appellation "inflection line" in Wiktionary historically comes from an English bias. Because English is almost a non-inflected language, it can accommodate all information about inflection in the headword line. Other languages use the headword line for other stuff, e.g. gender, transliteration, perfective/imperfective, alternative script, etc. and put the inflection under Declension and Conjugation. I remember when I was new I was very confused when people referred to "inflection line". It's a misleading and unintuitive name.--Vahag 16:53, 22 November 2010 (UTC)

Support Daniel. 17:56, 22 November 2010 (UTC) As I stated in one or more previous discussions, I agree with what Vahagn said now. It is possible and very common for a headword line to be devoid of inflections, therefore "inflection line" is imprecise and should be discouraged in favor of the most intuitive name "headword line". --Daniel. 17:56, 22 November 2010 (UTC)

per Ruakh (06:10, 23 November 2010 (UTC) in this section 1.2) and Beobach (04:24, 23 November 2010 (UTC) in section 1.3, just below). —This unsigned comment was added by msh210 (talk • contribs) at 23 November 2010.

Support, as per 1 and 2 above. If the change is made every effort should be made (I'll help) to make sure that this nomenclature change is made throughout the project. —Saltmarshαπάντηση 11:44, 24 November 2010 (UTC)

A "line template" is a template that generates a line, such as definition line templates and headword line templates, which basically defeats the purpose of your distinction. (: But I still prefer the hyphenated version, for other reasons. --Daniel. 01:52, 23 November 2010 (UTC)

I realized that the original reason for adding line to the category name (viz, to avoid confusion with a category of inflection templates) doesn't apply if headword is used in the category, so I'm starting another straw poll: "headword templates" vs. "headword[- ]line templates".​—msh210℠ (talk) 16:10, 24 November 2010 (UTC)

3.1. I prefer "headword" alone over either "headword-line" or "headword line".

I do not know, but let me note that the templates generate not only the headword but also other parts of the headword line. --Dan Polansky 18:07, 24 November 2010 (UTC)

Yes, Dan Polansky's statement is correct. msh210, please read the first two messages (Vahagn's and mine) about this subject in the section "2.1 I prefer a hyphenated version to a version without hyphen:". --Daniel. 21:03, 24 November 2010 (UTC)

Vahagn didn't write anything in 2.1 AFAICT, but I had read your comment there. "Headword templates" is shorter and as accurate as "headword[- ]line templates", as the templates do generate the headword (if often other things also). MHO, of course; the straw poll will determine what people like. If everyone except me (who already put my name in 3.1) and Yair (who already put his in 3.2) puts his in 3.4, I'll be glad to yield to Yair.​—msh210℠ (talk) 07:54, 25 November 2010 (UTC)

I'm sorry, I meant section "1.2". That is:

The appellation "inflection line" in Wiktionary historically comes from an English bias. Because English is almost a non-inflected language, it can accommodate all information about inflection in the headword line. Other languages use the headword line for other stuff, e.g. gender, transliteration, perfective/imperfective, alternative script, etc. and put the inflection under Declension and Conjugation. I remember when I was new I was very confused when people referred to "inflection line". It's a misleading and unintuitive name.--Vahag 16:53, 22 November 2010 (UTC)

As I stated in one or more previous discussions, I agree with what Vahagn said now. It is possible and very common for a headword line to be devoid of inflections, therefore "inflection line" is imprecise and should be discouraged in favor of the most intuitive name "headword line". --Daniel. 17:56, 22 November 2010 (UTC)

Yes, I'd read those, too. My reply immediately above was to them rather than to anything from 2.1.​—msh210℠ (talk) 08:43, 26 November 2010 (UTC)

3.2. I prefer either "headword line" or "headword-line" over "headword" alone.

Support. Most of the templates aren't simply headword templates. They display the headword, usually along with some inflections and/or other information about the word. They produce content to fill the headword line. --Yair rand (talk) 04:02, 25 November 2010 (UTC)

Support Dan Polansky 08:33, 26 November 2010 (UTC) I am not wholly sure but I'll take a stance. I am willing to yield to plain majority. --Dan Polansky 08:33, 26 November 2010 (UTC)

3.3. I expressed a preference in the hyphenated vs. unhyphenated poll above (2.1 or 2.2) and prefer that form of "headword[- ]line" over "headword" alone, but "headword" alone over the other form of "headword[- ]line".

The introduction of section 3.3 is too long and unnecessarily complex, because it repeats a few sections while effectively deprecating others. --Daniel. 21:03, 24 November 2010 (UTC)

"Redundant" languages

I'm sure that this has been talked about before, though how generally I don't know. I'd also like to split the debate in two, as I've been criticized before for bringing up too many points in one discussion.

There are essentially some languages with ISO 639-1 or ISO 639-3 codes that we don't (currently) allow in NS:0. Ignoring constructed languages (a different issue) we don't allow Ancient Hebrew (known as Classical Hebrew) as we treat it as Hebrew (he as opposed to hbo). We also don't allow Chinese, but only Mandarin, Cantonese, Min Nan, etc.

The current debate is on Flemish, whether Flemish sections are 'redundant' to Dutch ones. This could apply to quite a few other languages - Anglo-Norman, Norwegian Nynorsk, Norwegian Bokmål, Scots to name just four. So, what do we do? Decide them all on an individual basis I suppose. But using what criteria? Is there a minimum burden of proof, or is it just voting? Mglovesfun (talk) 15:44, 22 November 2010 (UTC)

To answer the question "using what criteria?":

Consulting other authorities, at least on living speeches (to use the Middle English meaning of that word). We might ask: is the speech protected as a language by international treaties or inter-governmental organisations? Scots is a protected minority language (protected by the English), it is not English. We will definitely ask: do dictionaries of the speech(es) agree on what language they are dictionaries of? Dictionaries which contain "Flemish" speech and dictionaries which contain "Dutch" speech agree that they are dictionaries of the Dutch language, not dictionaries of separate languages. In this way, we avoid as much as we can the capriciousness of votes on each, and the bogeymen (or, in Middle English, bugge-men) of "original research" and the "slippery slope". (For example, the differences between Old Norse and Icelandic are as small as or smaller than the differences between Middle and modern English: if we combined Middle and modern English, we would struggle not to slide down the "slippery slope" of "having" to conflate very-similar languages because we'd already combined less-similar ones. However, if we did conflate Old Norse and Icelandic, we would be in a small minority; we might even be the first to do so: "original research".) — Beobach 05:41, 24 November 2010 (UTC)

The Oxford English Dictionary includes Scots and Middle English, the latter silently except for dates, so I don't see your solution as a magic one. Nor, particularly, do I see any problem with distinguishing any pair of languages we like, no matter what we conflate. If it would serve the purposes of our users to separate Moldovan and Romanian, then separate we should. And original research is an issue for Wikipedia, not here.--Prosfilaes 06:30, 24 November 2010 (UTC)

The OED includes Scots (and English) and calls itself a dictionary of the English language, but the Dictionary of the Scots Language calls itself a dictionary of the Scots language — ie, the answer to the question "do dictionaries of the speech(es) agree on what language they are dictionaries of?" is "no". (When we see that Scots is protected as a minority language, we then see more evidence that Scots and English are separate languages.) The idea of consulting such authorities is not magical; we have been doing it all along: if someone proposed merging Spanish and Russian, the idea would sooner be shot down as unsound "because no-one considers them the same language" than as a disservice to our users or our editors. The idea is not a solution, either: if authorities agree, as on Spanish/Russian or Old Norse/Icelandic, there's no problem to solve (except perhaps, in that hypothetical situation: why do some of our editors want to combine Spanish and Russian?!); if authorities are quite divided, or if there are no authorities for a particular speech, they're not much help at all. It's simply an answer to Mglovesfun's question, "is there a minimum burden of proof, or is it just voting?" — Beobach 21:59, 24 November 2010 (UTC)

The OED's use of Middle English is very specific: it doesn't include ME words for the sake of it. But words which exist in modern English are traced back to their earliest appearance in the language, including in Old English or Middle English periods. This means that most OE words and many ME words are necessarily excluded. Ƿidsiþ 09:15, 3 December 2010 (UTC)

I don't generally support the idea that we shouldn't blindly follow ISO 639 - which we don't, we already know there are ISO-639 codes we don't use. ISO 639 wasn't designed to be used for Wiktionary; while it is pretty important for us to have shorter codes for longer and/or difficult-to-type language names (like Old Provençal), we shouldn't "handcuff" ourselves with somebody else's system. Mglovesfun (talk) 15:16, 3 December 2010 (UTC)

I don't generally support the idea that we shouldn't blindly follow ISO 639 - so we should blindly follow ISO? -- Prince Kassad 16:09, 3 December 2010 (UTC)

Why look for problems, for controversies? The only way to prevent controversies is the use of an external source. On fr.wikt, we allow at least all languages recognized by the foundation + all languages with an ISO-639 code, and, as ISO-639 is incomplete, we have defined clear criteria for inclusion of other languages. Note that allowing a language does not mean that sections for this language are encouraged. Lmaltier 20:22, 5 December 2010 (UTC)

That is a remarkably sane approach, Lmaltier. I, for one, would endorse it on en.WT. - Amgine/talk 02:52, 30 December 2010 (UTC)

Middle English/English crossover (for example)

A bit of research tends to show that anything that's attestable in Middle English is attestable in Early Modern English too. With respect to Old/Middle/Modern French, see estre where we have four entries. Does anyone 'mind' having this 'duplication'? I mean they're all valid, attestable in the given languages, but does having four entries for what are essentially four different stages of the same language (or even three, combining Anglo-Norman into Old French) make sense? Mglovesfun (talk) 15:44, 22 November 2010 (UTC)

A point you should note though is that while there will be words that are identical in both stages, there will also be some that differ. It makes little sense to say 'they are the same except when they are not' and then decide to include only words in the earlier language that differ from those in the later one. And don't forget that while MidE and EME share many words, there are some quite significant pronunciation differences. If we start adding pronunciation sections (as we have for OE) then we will definitely want to keep them apart. —CodeCat 21:50, 22 November 2010 (UTC)

Part of that is our normalizing of Middle English spelling towards Modern English trends, which I'm personally not thrilled with. I also suspect that early Middle English had many more words that didn't survive to Modern English. In any case, the fact that a word can be attested in Middle English, and the definitions for which it can be attested, do some service towards giving a history of the word that we don't usually give elsewhere.--Prosfilaes 03:15, 23 November 2010 (UTC)

I think the tricky part is in deciding where one language ends and the next begins. The really comprehensive English dictionaries bypass the problem by including plenty of Middle English words in modernized spellings; the really comprehensive Hebrew dictionaries bypass the problem by including Ancient Hebrew words; and from what I understand, the really comprehensive Dutch dictionaries more or less bypass the problem by including Flemish words, except that they apparently reintroduce the problem somewhat by trying to tag the Flemish-only words. We are a bit unusual in trying to be really comprehensive, while also trying to distinguish all these languages. One approach is to declare, rather arbitrarily, that Middle English ceased to exist on January 1st, 1550 (when its native speakers adopted Modern English en masse), that Flemish ceases to exist when you cross the boundary from King Albert's domain to Queen Beatrix's, and so on. (Note: any Dutch works produced outside the Low Countries will be arbitrarily classified as English.) Once we have such a dictum, I don't see a problem with duplicate entries for any words that dare flout it. —RuakhTALK 06:42, 23 November 2010 (UTC)

"Dutch works produced outside the Low Countries will be arbitrarily classified as English" haha! :)

As far as I know, the comprehensive Dutch dictionaries distinguish between Flemish Dutch and Netherlands Dutch natiolects (like American vs British English) — not asserting them to be different languages. We should consider that approach here: tagging words as {{context|Flemish|lang=nl}}/{{context|Belgium|lang=nl}} vs {{context|Netherlands|lang=nl}} if they are specific to one region, and tagging them as "pan-Dutch", or leaving them tagless, if they are used in both places. — Beobach 07:35, 23 November 2010 (UTC)

That's certainly the most reasonable approach, but how will it help in our war with the French? (And yeah, I think pan-Dutch words should be tagless.) —RuakhTALK 13:09, 23 November 2010 (UTC)

I like the system using arbitrary dates to define the difference from one language to another. It's far, far from perfect, I'd say it's equivalent to having a minimum age for the consumption of alcohol or sexual consent in that it's better than nothing. Using spelling is a bit POV, plus in the same text, one word could be Middle English, and the next word only Modern English. Per CodeCat, sometimes the word may be the same, but the pronunciation, gender, inflected forms (etc.) may be different. Mglovesfun (talk) 13:40, 23 November 2010 (UTC)

To me, Middle English is a stage of the English language, not something separate from it, and considering it as a different entity helps no one. It even weakens our modern English sections by having words appear to come out of nowhere. It is uniquely awkward, because English was a flourishing literary language during exactly the period when it supposedly switched over, and since I work with quite a lot of stuff published around then -- like Malory -- I'm very aware that picking one or other language is arbitrary. This is very different from the situation between Old English and Middle English, which is characterised by a near-total lack of attested writings (allowing us to draw a convenient line between the two), and during which time the language lost grammatical gender and very quickly absorbed thousands of new words from a completely new source. There the historical record has left us with a very clear language change. Above, someone suggested that ME entries are useful for things like pronunciation detail. This is fair, although singling out a 14th-century pronunciation still seems rather random given that there are vast periods of "modern English" not covered by our modern English pronunciation sections: during Shakespeare's time there was no such sound as /ʌ/, and love rhymed with cove, to take one obvious example. Nor were spellings fixed in ME, making lemmata very awkward. The same can be said for most pre-modern languages, but in the case of ME there is a good solution available in merger with "English". Actually, when it comes right down to it Middle English wouldn't even have to disappear -- I think the sections are potentially useful for those interested in the period -- I just don't think their existence should exclude ME data and citations from modern English entries, that's what annoys me about it.

(On the French issue: for similar reasons, I would favour merging Middle French into obsolete form of entries under "French", but retaining Old French as separate for reasons of grammatical inflection etc.) Ƿidsiþ 13:47, 23 November 2010 (UTC)

Yes, w:fr:Moyen français was a redirect to w:fr:Français for nearly a year, until 2008. Anglo-Norman should really be merged into Old French too - I know it had a distinct influence on Middle English (and therefore English) but I'd rather use context labels, which is what I was doing before I discovered that Anglo-Norman had an ISO 639-3 code. Mglovesfun (talk) 13:55, 23 November 2010 (UTC)

On the Dutch thing, I suppose there are two points that need to be addressed here. The first is Flemish. I think quite simply that it makes little sense to distinguish it as a separate language. There are some obvious differences, yes, but they rely on the same written standard for the most part. The differences between dialects in the extremes of the Netherlands (excluding those that are now recognised as languages, of course) are generally no greater than the difference between Netherlandic Dutch and Belgian Dutch. The Dutch as spoken in Belgium, even if it is not called Dutch, is readily readable and understandable by anyone from the Netherlands, particularly those in Brabant.

The second point regarding Dutch is the treatment given to Middle English and Middle French. Middle Dutch as a language is already quite similar to modern Dutch, just as Middle English resembles modern English to a degree. The pronunciation differences are about the same, too. But a modern Dutch speaker can not understand a complete text, just as a modern English speaker can't easily understand Chaucer without at least a glossary. What exactly the grounds are for differentiating what are essentially 'dialects' of one language spaced apart not in space but in time, I don't know. But I think presumed mutual intelligibility would play a big part, not just of the writing but also of the pronunciation. If a Middle text is read out to a modern speaker using authentic pronunciation (so far as we can tell what it was), would a modern speaker be able to make good sense of it? I think in that regard the difference between Middle English and Modern English might be as great as the difference between Modern English and Modern Scots. And we do treat Scots as a separate language. —CodeCat 14:24, 23 November 2010 (UTC)

Re: "If a Middle text is read out to a modern speaker using authentic pronunciation (so far as we can tell what it was), would a modern speaker be able to make good sense of it?": In the case of Middle English, I'd say the answer is "no". When we were reading The Canterbury Tales in high school English, one of the other English teachers dropped in and recited the Pardoner's Prologue for us, so we could get a sense of the rhythm and rhyme and whatnot, and I literally did not understand a single word. (And none of my classmates seemed to, either, though of course I can't say that for sure.) But that's only if we ask what a 2010 (or 1998) speaker would understand; a 1551 speaker might well have understood more, and certainly a 1551 speaker would have no problems with a 1549 work (provided it was from the same dialect). It's like a dialect continuum in time. —RuakhTALK 14:54, 23 November 2010 (UTC) edited 16:03, 23 November 2010 (UTC) per CodeCat's comment below

Plus if Shakespeare had appeared in your class and recited one of his sonnets, even assuming you could get over your astonishment and concentrate, I suspect you would hardly have understood a word of that either. Ƿidsiþ 15:08, 23 November 2010 (UTC)

Nah, we also watched some clips from Shakespeare in Love, and had no problem understanding it. ;-) —RuakhTALK 16:03, 23 November 2010 (UTC)

Realistically it's more like a dialect continuum. So we would use the same approach that is used for such dialects. Either the definition is geographically arbitrary, or it is formed by an isogloss, the analogues of which are respectively a fixed year and a sound change in diachronic linguistics. —CodeCat 15:03, 23 November 2010 (UTC)
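The "fixed year" option discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything Wiktionary actually uses: the function name `english_stage` and the constant `MIDDLE_ENGLISH_END` are made up here, and the cutoff is the arbitrary 1 January 1550 date floated earlier in this thread.

```python
from datetime import date

# Hypothetical cutoff, taken from the thread's example date of
# 1 January 1550; any other arbitrary date would work the same way.
MIDDLE_ENGLISH_END = date(1550, 1, 1)


def english_stage(written: date, cutoff: date = MIDDLE_ENGLISH_END) -> str:
    """Label a work purely by its date, ignoring spelling and dialect."""
    return "Middle English" if written < cutoff else "English"


print(english_stage(date(1549, 6, 1)))
print(english_stage(date(1551, 6, 1)))
```

A rule like this is trivially consistent, which is its appeal, but it will label a 1549 work and a 1551 work differently even if their language is indistinguishable. The isogloss-style alternative would test for a particular sound change instead of a date.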

Akan/Twi/Fante

Apparently, they used to have a different literary tradition but now they're all written the same. If that's true, certainly we don't need to differentiate. (There's a similar discussion on Meta about merging ak.wikipedia and tw.wikipedia due to this situation.) -- Prince Kassad 20:42, 5 December 2010 (UTC)

User:Daniel./Nonfiction

In particular, there are Citations:MissingNo., Citations:Curselax, Citations:Matrixism, Citations:Jediism, Citations:Narutard and Citations:Eevolution as recently attested entries. "Curselax" is a game strategy, "MissingNo." is a glitch, "Narutard" is a fan of a series, "Matrixism" and "Jediism" are religions, and "Eevolution" is a fanmade word that represents a group of fictional creatures. I find the last one very controversial as confronting the basic idea that fictional characters shouldn't ever be defined here, though not from a wikilawyerist point of view, because it does not fall under the criterion of "originating in fictional universes". Finally, I removed Citations:Potterism's status of "attested", because it indeed isn't. --Daniel. 17:49, 22 November 2010 (UTC)

These are mostly specific to their universes, as has been pointed out to you before. The fact that they can refer to "real" things does not mean they can't be universe-specific. They have no place here and are pernicious. Equinox◑ 23:24, 22 November 2010 (UTC)

Yes, I agree on those. I did say "mostly" — actually, it's only half, three of the six, but I think there are further Pokémon-specific entries still extant by Daniel. Equinox◑ 23:38, 22 November 2010 (UTC)

Equinox, you are right: I estimated in another discussion that I am able to define at least some hundreds of nonfictional terms whose context is of Pokémon. Do you have any suggestion for a guideline for inclusion that excludes all "universe-specific" words (which would deprecate or amend WT:FICTION), particularly one that additionally allows the existence of entries for Jediism, Matrixism and Narutard?

Your objections, as I remember them, are merely nuances of "[these words are] polluting this otherwise-useful project". I am aware that practices from other Mediawiki projects do not apply here, but I feel a direct analogy with Wikipedia's short policy named WP:UNENCYCLOPEDIC. Basically, "Delete it from the [dictionary] because it does not belong in the [dictionary]." without further elaboration is a personal opinion or circular reasoning, rather than a relevant logical argument. --Daniel. 01:47, 23 November 2010 (UTC)

Jediism is (until someone can find evidence to the contrary) never used in the Star Wars franchise; it's an invention of people who are referring to the Star Wars franchise, and seems to be cited for CFI. Mglovesfun (talk) 13:46, 23 November 2010 (UTC)