"Hack" in the computer sense initially meant a creative solution. It was only later that it came to mean compromising someone's Hotmail account or stealing files from three-letter agencies. Since we have it backwards, I'm soliciting feedback for how to fix it. Thoughts? —Justin (koavf)❤T☮C☺M☯ 02:51, 1 February 2018 (UTC)

I fear this is part of a larger project. The relationships among the various senses and etymologies of hack and hacker are not at all settled. See, for example, “hack” in Douglas Harper, Online Etymology Dictionary, 2001–2019. Also, is hack ("cough") onomatopoetic? DCDuring (talk) 05:58, 6 February 2018 (UTC)

To me, the verb hack in this sense is odd. I only recall hearing it used as a noun: 10 hacks to improve something or other (where hack means "clever idea"). If someone wants to hack their love life, I would probably understand it to mean they want to stop it. —Stephen(Talk) 10:34, 6 February 2018 (UTC)

@Koavf, one solution would be to put the computing definitions (and their extensions) in chronological order of their development, grouping related terms together. So first "hacking" for expert coding (and thence optimising daily processes), and then "hacking" for breaching security.

@Stelio: This is beautiful and probably better than anything I could have made. May I suggest that in the future, you don't use green/red in case any of the readers out there are color blind? —Justin (koavf)❤T☮C☺M☯ 10:38, 21 February 2018 (UTC)

Indeed yes, I'm aware of colour blindness as a barrier for visual comparison; I take pains to distinguish colours on graphs I put in professional presentations and fully support the Web Accessibility initiative. The headers I used are meant as the main differentiators; the colouring was just for some quick visual impact. Red-blue is a safer combination, and one I usually use; mea culpa for publishing before thinking deeper. Fixed! -Stelio (talk) 10:48, 21 February 2018 (UTC)

I've gone ahead and done that reordering, in the absence of any comments. -Stelio (talk) 09:17, 13 March 2018 (UTC)

This month, we suggest focusing on words used to talk about radio.

Well, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at other communities' improvements on the selected topic to improve our own pages. It has already brought in new collaborators who contributed for the first time on a suggested topic. If you participate, please let us know here or on Meta, so that we can keep track of the evolution of LexiSession. I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects.
Noé 09:45, 1 February 2018 (UTC)

A new user, Kakiyomi2 (talk • contribs), has been creating entries for Middle Japanese (or Classical Japanese as the user calls it), but we don't have a code for the language. --Lo Ximiendo (talk) 16:37, 2 February 2018 (UTC)

That's me. I've added entries for かはす, かはる, かへる, かへす, かふ, and かめ, partly just to get interest going in this. I intend now to leave it some months to see what feedback it generates before starting to add more from my extensive materials.--Kakiyomi2 (talk) 17:18, 2 February 2018 (UTC)

For historical and political reasons, jap is generally eschewed in favor of ja or jp where a two-letter code might suffice, and jpn, where a three-letter code is needed.

In monolingual Japanese sources, the stage of the written language from roughly the Heian period (800s) through to the Meiji period (late 1800s) is broadly described as 文語(bungo, literally “literary language”), in contrast to 口語(kōgo, “spoken language, vernacular”, literally “mouth language”). There is precedent for using some variation of the term literary, as in the three-letter code ltc for Middle Chinese / Classical Chinese (presumably derived from literary Chinese). By extension, I'd prefer ltj if we can use just a three-letter code. If we need a 3-3 code, I'd suggest jpn-ltj. ‑‑ Eiríkr Útlendi │Tala við mig 19:24, 2 February 2018 (UTC)

PS: To my knowledge, the ISO only has codes for Old Japanese (ojp) and Japanese (ja). I'm not aware of any extant standardized codes for anything in between circa 800 and the modern era. ‑‑ Eiríkr Útlendi │Tala við mig 19:30, 2 February 2018 (UTC)

We do not get to make up two or three letter codes. Japanese is ja as a two-letter, ISO 639-1 code, and jpn as a three-letter, ISO 639-2/3 code. "jap" is an obsolete code for Madi, now merged into Yamamadi (jaa).

If we need a three-letter code, qaa–qtz are reserved for local use. ojp covers "7th-10th centuries AD", according to the Linguist List, which basically controls the extinct section of ISO 639-3. Japanese is listed as the child of Old Japanese, so presumably it covers everything from then until now.--Prosfilaes (talk) 21:50, 2 February 2018 (UTC)

Right, we can't make up a two- or three-letter code (because the ISO might later assign it, and besides it'd be confusing). If a code is needed, the customary naming scheme, described in Wiktionary:Languages, is to use the nearest ISO family code and then three letters that approximate the language named, so the code should be "jpx-ltj" if we call it "Literary Japanese", or "jpx-mja" if we call it "Middle Japanese", or something else starting with "jpx-". - -sche(discuss) 23:04, 2 February 2018 (UTC)
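The two rules mentioned in this exchange (the qaa–qtz range reserved for local use, and the "family code plus three letters" scheme) can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the check assumes a plain three-letter family code such as jpx, which is simpler than some family codes actually in use.

```javascript
// Hypothetical helpers, written only to illustrate the two rules above.

// ISO 639 reserves the range qaa..qtz for local use:
// first letter q, second letter a-t, third letter a-z.
function isReservedForLocalUse(code) {
  return /^q[a-t][a-z]$/.test(code);
}

// Build a code of the form "<family>-<three letters>", e.g. "jpx-ltj".
// Assumption: both parts are exactly three lowercase ASCII letters.
function makeExceptionalCode(familyCode, suffix) {
  const threeLetters = /^[a-z]{3}$/;
  if (!threeLetters.test(familyCode) || !threeLetters.test(suffix)) {
    throw new Error("expected two three-letter lowercase parts");
  }
  return familyCode + "-" + suffix;
}
```

For instance, makeExceptionalCode("jpx", "ltj") yields the "jpx-ltj" suggested above, while isReservedForLocalUse("jpn") is false because jpn lies outside the reserved range.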

Re: codes, thank you both for the pointers. I dimly remembered that there was a mechanism for creating our own codes (the prefix of three letters from qaa through qtz), but I encounter such issues so rarely that I couldn't recall any useful details. I'm happier with the jpx- prefix, as that's a lot easier to remember than anything beginning with q.

Re: dating, there's some terminology confusion. OJP is variously described in English as including everything textual prior to the Heian period (i.e. 794 and before), or up through the end of the Heian period (1185), or until some relatively arbitrary point in the middle of the Heian period (probably where Linguist List gets its dating). For EN WT purposes, so far as I've understood it, we're using the earlier dating, in alignment with Japanese sources. The main inflection point in the development of the language is the loss of certain vowel distinctions recorded using w:Jōdai Tokushu Kanazukai, which shift was apparently complete by the start of the Heian period. The EN WP article on w:Early Middle Japanese describes some of this in more detail. (NB: Anything pre-historic, i.e. before the first texts, is usually described as Ancient Japanese, Proto-Japanese, or Proto-Japonic.) ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 3 February 2018 (UTC)

It is not bad at all to use ojp if there is no other appropriate code. In the Middle Ages, spoken Japanese changed but written Japanese stayed similar. — TAKASUGI Shinji (talk) 01:12, 3 February 2018 (UTC)

@everyone who finds this thread relevant: I did some reformatting of the provisional entry at かへる, to: 1) bring the formatting more in line with WT entries in general and other JA entries more specifically; 2) add in kanji usage information in a way that mirrors JA WT and other monolingual dictionaries. I'm less certain about #2, since what I added isn't deeply researched (mostly I wanted to provide a quick-and-dirty visual example), and it's based on modern sources and historical kanji usage can be very divergent. Thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 5 February 2018 (UTC)

Also, I was just looking at Okinawan today, and the literature suggests an Old (first documented until early 1600s), Middle (1600s to 1800s), and Modern stratification. If you all think it's appropriate, we could add those languages as well. DerekWinters (talk) 18:16, 2 February 2018 (UTC)

For modern Japanese, hiragana entries are only ever soft redirects to the kanji spellings (except for those words that have no associated kanji).

The かへる entry in its current state is laid out as the lemma for the classical form of modern 帰る (kaeru, “to return, to go back to one's starting point, to go home”, intransitive).

I see a few problems with this.

This is inconsistent treatment between modern JA and classical.

This is incomplete treatment: the かへる spelling applies to at least one noun (蛙・蝦), the lemma form of at least three different verbs (one with five discrete kanji spellings: 代へる・換へる・替へる・変へる・易へる, one with four: 反る・返る・帰る・還る, and one with one: 孵る), and the non-lemma conjugated forms of six more verbs (two with three discrete kanji spellings each: 代ふ・換ふ・替ふ, and 買ふ・代ふ・換ふ, several verbs with one: 交ふ, 肯ふ, 支ふ, 飼ふ).

(For our readers unfamiliar with the Japanese language, I hope the above makes clear some of the lexicographical challenges inherent in the language and its writing system.)

I feel rather strongly that we should have the same policy for both modern Japanese and older stages of the language with regard to choosing lemma spellings.

I'd suggest one of two approaches:

Align Middle / Classical Japanese practice with modern Japanese, and use the kanji spellings for the lemma, with hiragana entries as soft redirects.

Pros:

We already have this practice, editors are used to it, and we can likely repurpose a good bit of the supporting infrastructure (templates, modules, etc.).

Cons:

As illustrated above, using kanji for the lemmata obscures the fact that, in many cases, we have one word spelled in multiple ways, with each spelling imparting a shade of meaning, but not fundamentally altering the basic theme.

We must also either duplicate a lot of data, or arbitrarily choose one kanji as the "main" and create the others as soft-redirect "alternative form" entries. This can also obscure relationships between senses and spellings.

When one kanji spelling has multiple readings, and more than one reading belongs to the same category, only the last reading on the page actually gets added to the category. This appears to be a fundamental flaw in the underlying MediaWiki database software. See 避く(saku, yoku, “to dodge, to avoid”) as one such example -- although both readings are marked for inclusion in [[Category:Japanese_shimo_nidan_verbs]], only the yoku reading actually appears on that page.

Drastically rework our approach to Japanese to use hiragana spellings as the lemma, breaking each derivation out under its own ===Etymology=== section, indicating on each sense line which kanji spelling is most commonly used. Kanji-spelling entries would instead be stubs redirecting to the hiragana entries.

Pros:

This aligns with the common practice of monolingual Japanese dictionaries, including JA WT, and is also closer to how many bilingual dictionaries function.

This is easier for learners, who may know how a word sounds (and can thus work out the hiragana), but might not know how to spell it in kanji.

This is easier for learners when looking for the various meanings that might apply to a particular verbal or otherwise-non-kanji context. In our current setup, unless the hiragana entries include glosses for all the kanji spellings, users have to click through each separate spelling to try to find the appropriate meaning. Maintaining glosses in multiple places can be difficult.

Categories will index more appropriately. While a single kanji spelling might have multiple readings that must all be indexed within the same category (but cannot be due to the software), a single hiragana spelling is already the reading, and will thus only need to be indexed once within the same category.

Cons:

We'd need to rework all of our existing entries and infrastructure.

In terms of simple numbers of pros versus cons, it seems clear that hiragana spellings would be the better choice. However, that one con is a huge one. If we were starting from scratch, I'd definitely argue wholeheartedly that we go that route. Given the current state, I still argue in favor of hiragana spellings, for both modern Japanese and older forms, albeit with an awareness of the enormousness of the work required to convert our existing entry base.
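The category-indexing point can be shown with a toy model. It assumes, as described above, that a page can be listed in a given category only once, with a single sort key, so a second reading on the same page overwrites the first. The class and variable names are invented for this sketch and are not part of MediaWiki.

```javascript
// Toy model of category membership: one sort key per (category, page) pair.
// This illustrates the behaviour described above; it is not MediaWiki code.
class CategoryIndex {
  constructor() {
    this.categories = new Map(); // category name -> Map(page title -> sort key)
  }

  // Adding the same page to the same category again replaces its sort key.
  add(category, page, sortKey) {
    if (!this.categories.has(category)) {
      this.categories.set(category, new Map());
    }
    this.categories.get(category).set(page, sortKey);
  }

  // Return the entries a reader would see listed in the category.
  listing(category) {
    const pages = this.categories.get(category) || new Map();
    return Array.from(pages, ([page, sortKey]) => ({ page, sortKey }));
  }
}

const CAT = "Japanese shimo nidan verbs";

// Kanji lemma: one page, two readings, so the second sort key wins.
const kanjiLemmas = new CategoryIndex();
kanjiLemmas.add(CAT, "避く", "さく");
kanjiLemmas.add(CAT, "避く", "よく");

// Hiragana lemmas: each reading is its own page, so both are listed.
const kanaLemmas = new CategoryIndex();
kanaLemmas.add(CAT, "さく", "さく");
kanaLemmas.add(CAT, "よく", "よく");
```

Under this model the kanji-lemma index lists 避く once, under よく only, whereas the hiragana-lemma index lists both さく and よく.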

I also like hiragana entries as far as Classical Japanese is concerned. Modern Japanese officially uses mixed spellings and we can very easily check real usage. — TAKASUGI Shinji (talk) 07:41, 3 February 2018 (UTC)

In Wiktionary:Entry_layout#Part_of_speech it says "Some POS headers are explicitly disallowed:" which includes "Abbreviation" and "Initialism" but it doesn't suggest what should be used instead in those cases. And of course they end up in a cleanup category. DonnanZ (talk) 14:04, 3 February 2018 (UTC)

Please use the actual part of speech as if it were a normal word or phrase used in the same context, like PC (personal computer) is "Noun". SNAFU has both "Phrase" and "Noun". --Daniel Carrero (talk) 14:12, 3 February 2018 (UTC)

OK, I guess that can be done, but a guidance note there wouldn't go amiss. Another odd one is "Symbol" which isn't even mentioned, but is used at Translingual CH. DonnanZ (talk) 14:20, 3 February 2018 (UTC)

Absolutely, I support having some guidance note. But actually "Symbol" is mentioned, in the part that says "Symbols and characters: Diacritical mark, Letter, Ligature, Number, Punctuation mark, Syllable, Symbol". --Daniel Carrero (talk) 14:25, 3 February 2018 (UTC)

Heh, so it is, even though CH may be an abbreviation.... DonnanZ (talk) 14:37, 3 February 2018 (UTC)

I think it makes sense to say that "CH" is a "Symbol", because chemical elements and formulae are not phrases and so don't use nouns; they use letters as symbols, which may be in diagrams as opposed to text. That's just my personal interpretation. Feel free to disagree if you want. Aside from that, it seems normal in the English Wiktionary to use "Symbol" for chemical elements and formulae, so at least it's consistent if nothing else. --Daniel Carrero (talk) 16:10, 3 February 2018 (UTC)

Another one: Even though "Idiom" is specifically disallowed, it can still be selected if you use NEC (new entry creator). I didn't check whether there are other disallowed ones there, but I think there are. DonnanZ (talk) 14:42, 4 February 2018 (UTC)

What should be used instead of Idiom? There are many such lexical items in Japanese that don't fit into POS categories (they aren't nouns, verbs, adjectives, etc, but rather four-character set phrases in some cases, or whole sentences in others). ‑‑ Eiríkr Útlendi │Tala við mig 18:20, 7 February 2018 (UTC)

Many entries use "Definitions", which doesn't have a voted consensus. What should be done with those? —Rua (mew) 18:25, 7 February 2018 (UTC)

So far I've only encountered a Definitions header in Chinese entries, for which the Chinese editor community has made a strong argument (as I've understood it, largely based on Chinese terms not fitting nicely into POS categories). Is this header in use in the entries of any other languages? ‑‑ Eiríkr Útlendi │Tala við mig 18:29, 7 February 2018 (UTC)

Apparently the cuneiform characters given by Unicode are wrong. I've recognized some of the erroneous signs, including "e", "ku", "an", "kan". I've also noticed that the Neo-Assyrian sign for "an" mentioned here perfectly matches the Hittite "an" sign, while here in Wiktionary we write "an" as "𒀭". It looks as if Hittite was written in Neo-Assyrian script, but we are instead writing it in some earlier stage. As far as I've read, it seems that Unicode doesn't have signs for Neo-Assyrian cuneiform.

The site assyrianlanguages.org also offers Hittite texts with their corresponding transliteration. I don't know what we're supposed to do in this kind of situation. --Tom 144 (𒄩𒇻𒅗𒀸) 15:05, 4 February 2018 (UTC)

I am afraid this is a problem with all paleoscripts. There is not a Hittite Unicode but a cuneiform script not specific to any alphabet. Unicode block for cuneiform script does not cover all variants of different alphabets nor its allophones, only a standard representation of each glyph based on the most common language. Generally it is hard to reproduce an original inscription in any paleoscript with Unicode. --Vriullop (talk) 12:27, 5 February 2018 (UTC)

@Vriullop: In that case, should we use pictures as in Egyptian?--Tom 144 (𒄩𒇻𒅗𒀸) 01:42, 6 February 2018 (UTC)

Probably. Doing so should be easier for Hittite than for Egyptian, if the Hittite signs are not stacked in various ways like hieroglyphs are: we could probably just get images of every variant sign and make a template which would convert input text like "fu bar2" into the sign for "fu" and the second of two variant signs for "bar". Although he's busy (aren't we all?), I think @JohnC5 had interest in doing something similar for Italic languages and might be interested in this idea. Wikimedia Commons hopefully already possesses all the needed images. - -sche(discuss) 01:51, 6 February 2018 (UTC)
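A rough sketch of the conversion step such a template would need is given below. It assumes a trailing number selects the variant and that a sign without a number means the first variant. The function names and the image file-naming pattern are hypothetical, chosen only for illustration.

```javascript
// Split one token such as "bar2" into a sign name and a variant number.
// Assumption: a token without trailing digits refers to variant 1.
function parseSignToken(token) {
  const match = /^([^\d\s]+)(\d*)$/.exec(token);
  if (!match) {
    throw new Error("unrecognised sign token: " + token);
  }
  return {
    sign: match[1],
    variant: match[2] === "" ? 1 : Number(match[2]),
  };
}

// Convert input text like "fu bar2" into a list of image file names.
// The "Hittite sign <name> <variant>.svg" pattern is a made-up convention.
function signsToImages(input) {
  return input
    .trim()
    .split(/\s+/)
    .filter(Boolean) // ignore empty input
    .map(parseSignToken)
    .map(({ sign, variant }) => "Hittite sign " + sign + " " + variant + ".svg");
}
```

With this scheme, signsToImages("fu bar2") would name the first image for "fu" and the second image for "bar"; the real work lies in collecting the images themselves.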

I'm afraid we don't have them all. I've looked up some online sources but they contradict each other. I'm thinking of trying to contact some known author, or would this be too much?--Tom 144 (𒄩𒇻𒅗𒀸) 23:20, 6 February 2018 (UTC)

You could try to make them yourself if you have the necessary expertise. DTLHS (talk) 23:25, 6 February 2018 (UTC)

On behalf of the program committee of Wikimania 2018 - Cape Town, we are pleased to announce that we are now accepting proposals for workshops, discussions, presentations, or research posters to give during the conference. To read the full instructions visit the event wiki and click on the link provided there to make your proposal:

Actualités reaches for the stars! Despite a missing admin, this edition offers you four articles: about thesauri in the French and English Wiktionaries; new words coined by the French government; a funny dictionary; and the suffix -gate. Surrounded by shiny pictures are the shorts, galactic stats, nice videos and a note about the last LexiSession. Big news: we now include stats for the number of pictures included in French Wiktionary, and we plan to reach 100,000 this year!

This issue was written by eight people and was translated for you by Pamputt and me. This translation may be improved by readers (wiki-spirit), as it was last month by Stephen G. Brown and Xbony2 (thanks mate!). We still receive zero money for this publication and your comments are welcome. To celebrate this new year, I worked on a description of our workflow to explain how we make our journal, and I translated it into English for you! I'll be happy to help if you want to start your own journal here in the future. Noé 20:53, 5 February 2018 (UTC)

About a week ago, Rua tried to remove the distinction between “epicene” nouns and those with sociolinguistic variation in gender usage from the Portuguese noun headline module. This caused {{pt-noun}} to display incorrect information and filled Category:Portuguese nouns with varying gender with thousands of entries that were never intended to be there. She had already tried to do this many times, and as before I had to stop what I was doing to write a hasty fix.

More recently, she speedied {{pt-noun-form}}. The reason given was “Deleted per RFD, RFDO”. Where is the RFD? I recall that some HWL templates that were redundant to {{head}} were RFDed, but pt-noun-form was not completely redundant: it had a parameter that made it display information about metaphonic plurals and add the entry to the appropriate category. Rua had her bot convert the template, in some cases manually removing said parameter. As a result, Category:Portuguese metaphonic plurals is now empty and the information about metaphonic plurals is gone.

As many here may remember, Rua (then CodeCat) was pulling the same crap on our Thai content a while back, trying to meddle with languages that she neither contributes to nor understands. I hope I can avoid a drama as big as that one, but I confronted her about the removal of information and she didn’t respond, so I have to post here. The bot guidelines say that an operator must undo damage caused by their bot, and that’s what she should do. — Ungoliant(falai) 13:34, 6 February 2018 (UTC)

Rua has committed three fouls, by my count. She has removed valuable information from entries, which is easily solved by doing a bot run to undo what she did. Secondly, she deleted a template that does not seem to have been RFDed (or at least it was not linked to in any RFDO discussions) as having failed RFD, which is a misrepresentation. Thirdly, she did not respond to Ungoliant's question, which is irresponsible both as an admin and as a conscientious fellow editor. These are all related to problems in the past, which she swore would be nullified by her seeking consensus. @Rua, Chuck Entz —Μετάknowledgediscuss/deeds 17:19, 7 February 2018 (UTC)

Spite fights like with "Redoing the work that was too hard for poor Ungoliant" are not what edit summaries are for, that's my two cents. mellohi! (僕の乖離) 20:57, 7 February 2018 (UTC)

I would like to update that she has edited this discussion page after this was posted, and therefore is actively ignoring this thread. —Μετάknowledgediscuss/deeds 21:50, 7 February 2018 (UTC)

I have no interest in participating in a show trial. I can only make it worse, so I'm staying quiet and awaiting the inevitable storm. —Rua (mew) 23:05, 7 February 2018 (UTC)

@Rua: No, you can make it better. I listed the solutions in my earlier comment. To make it abundantly clear: if you restore the template and do a bot run to replace it where you removed it (or choose another solution for denoting metaphonicity in those entries, it need not be that particular template), and acknowledge that you made a change without consensus and will avoid that in the future, I (and I expect everyone else, including Ungoliant) will be satisfied. The storm is not inevitable unless you choose it to be. There would never have been a BP thread if you had responded to the message on your talk page, and the thread need not continue if you choose to fix the problem. —Μετάknowledgediscuss/deeds 00:44, 8 February 2018 (UTC)

This is what I mean. All eyes are on me, I'm the only one who does everything wrong. —Rua (mew) 13:23, 8 February 2018 (UTC)

That's a straw man. You removed lexicographical information from dictionary entries, so this is a clear-cut issue. You can choose to fix it, or indulge in an ill-advised persecution complex. —Μετάknowledgediscuss/deeds 15:29, 8 February 2018 (UTC)

No, I'm not happy. There are entries where you removed the information and have not replaced it. It used to display (show the reader the information) on the plural form, which is the actual one affected by the process; now all we have is categorisation of the lemma (with no display there either). Indeed, it is true that you have to earn good will. —Μετάknowledgediscuss/deeds 15:55, 8 February 2018 (UTC)

@Rua, you are ignoring me. You have not fixed the problem, but instead did a little bit of work toward fixing it. That is not acceptable. —Μετάknowledgediscuss/deeds 01:55, 10 February 2018 (UTC)

I'd like to mention that both Rua and Ungoliant engaged in wheel-reversion far more times before seeking community help than good practice would permit. Please try to be more aware about becoming trapped in unproductive behaviour. Korn [kʰũːɘ̃n] (talk) 10:31, 8 February 2018 (UTC)

What reason would CodeCat/Rua have to take corrective action? Similar faits accomplis have worked for them in the past. After Wiktionary:Votes/sy-2017-11/Desysopping CodeCat aka Rua failed spectacularly, why should CodeCat/Rua bother? The best strategy for them is to do nothing, and continue in the same vein as they have for the last several years. I do not remember CodeCat ever cleaning up after themselves when it turned out there was no consensus for something, but my memory may fail me. I mean, if you misbehave for multiple years, and after that, the community gives you a resounding approval, why would you, at that point, introduce a behavior change? --Dan Polansky (talk) 10:32, 18 February 2018 (UTC)

It would be great if we had a code for Proto-Munda, the common ancestor of the Munda languages (mun). There are a lot of reconstructions here. @-sche. —AryamanA(मुझसे बात करें • योगदान) 16:16, 8 February 2018 (UTC)

How long has this been around? I just noticed it on an Armenian entry only now. Someone should have told the word dewd about this. This can actually be a useful distinction in the case of some languages, like Romance ones, and I've been particularly interested in using something like this for Albanian to distinguish between terms it borrowed/took from Vulgar Latin in ancient times as a natural process of prolonged interaction versus the much later learned borrowings from Classical Latin in the last couple of centuries. One reason why I've been using the der template for ancient Latin loans as opposed to calling them explicitly borrowings, since the process was much different. But now I don't have to do that. Word dewd544 (talk) 22:22, 10 February 2018 (UTC)

@Word dewd544: We'd have thousands of entries to fix, that's why I've never bothered with it... But yes, it could be an interesting distinction. I wish it were developed a bit more in the documentation page though. --Per utramque cavernam (talk) 11:57, 14 February 2018 (UTC)

Yeah, I know. I don't have the stamina to start using it for many languages. There's too much to do. For most other languages it's understood that borrowings from Latin were learned. Albanian was just a unique case since there were at least two distinct "layers" or periods of borrowing/incorporation, the first of which happened organically in the distant past, sometimes from vulgate terms that weren't even fully attested. And Armenian can be an applicable language too, I guess, although just using the regular 'borrowed' from Old Armenian wouldn't really be that different. Same can go for modern Greek words borrowed from its Ancient counterpart. I also agree that it should've been described in more detail in the doc page; that could have been useful. I guess it never really took off. Word dewd544 (talk) 22:02, 14 February 2018 (UTC)

Can an administrator or template editor please add ["km"]="Module:km" to the list phonetic_extraction on Module:links (with a comma after the Thai line)? The relevant discussion can be found at Wiktionary talk:Khmer romanization. Thanks! Wyang (talk) 13:04, 13 February 2018 (UTC)

Can we rename this to Inuvialuktun? This is the name that I believe is more commonly used throughout the literature (especially modern literature) and is simpler than Western Canadian Inuktitut. DerekWinters (talk) 08:32, 15 February 2018 (UTC)

Well-spotted. I agree it should be renamed. It looks like about 70 entries (translations tables, modules, etc) will be affected. I can rename it in a day or so, if no-one wants to beat me to it (feel free to!), or raise objections. - -sche(discuss) 09:51, 15 February 2018 (UTC)

Do we want to eliminate Proto-CMP (plf-pro) and replace it with Proto-CEMP (poz-cet-pro)? Some of the Proto-CMP entries already have identically spelled Proto-CEMP correspondents; but the others would have to be moved.

If so, is someone with a bot willing to change all instances of plf-pro to poz-cet-pro in mainspace?

Merging the ones that are spelled identically (and not just changing the code in mainspace entries but deleting then-redundant links like so) is a no brainer; I'll take a go at mainspace entries where CMP can be merged into CEMP that way with AWB. The other (Proto-CMP) entries should, I think, be moved, per the linked-to discussion. - -sche(discuss) 21:32, 15 February 2018 (UTC)
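For the code change itself, the core of what a bot would do might look like the sketch below. It assumes the code appears as a positional template parameter, that is, directly after a pipe and before the next pipe or the closing braces. The function names and the sample wikitext are invented for illustration, and a real run would still need review for entries that have to be merged or moved by hand.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace a language code wherever it is a whole positional template
// parameter: preceded by "|" and followed by "|" or "}}".
// Named parameters (e.g. "|lang=...") are deliberately left alone here.
function replaceLanguageCode(wikitext, oldCode, newCode) {
  const pattern = new RegExp(
    "\\|" + escapeRegExp(oldCode) + "(?=\\||\\}\\})",
    "g"
  );
  return wikitext.replace(pattern, () => "|" + newCode);
}

// Made-up sample text; "*foo" and "*bar" are placeholders, not real forms.
const sample = "From {{der|en|plf-pro|*foo}}; compare {{m|plf-pro|*bar}}.";
const updated = replaceLanguageCode(sample, "plf-pro", "poz-cet-pro");
```

Anchoring the match on the surrounding pipes keeps the replacement from touching the code where it appears in running text or as part of a longer code.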

It's never been clear to me what our end goal is in terms of grouping languages. If we want to eventually provide a code for every well demonstrated monophyletic grouping of languages, then #2 is the way to go. —Μετάknowledgediscuss/deeds 23:59, 15 February 2018 (UTC)

Providing a code for every well demonstrated and widely accepted monophyletic grouping seems to definitely be our goal for the Indo-European languages, so why not for other families? I guess what my question amounts to is this: are EP and NP well demonstrated and widely accepted as being both monophyletic and clearly distinct from general Polynesian, with the members listed on Wikipedia? —Mahāgaja(formerly Angr) · talk 09:54, 16 February 2018 (UTC)

Having codes for groupings is fine, but my opinion on the proto-languages is the same as three years ago: we do not need proto-languages with only minuscule differences from their parent as separate languages, and they are probably best treated as simply dialect labels of their parent. --Tropylium (talk) 00:32, 18 February 2018 (UTC)

I agree, but that's beside the point at this stage. We do currently have the proto-languages but not the groupings; I'm looking for agreement to add the groupings. Whether we want to remove the proto-languages is a different issue, and one I don't know enough about Polynesian linguistics to weigh in on. —Mahāgaja(formerly Angr) · talk 15:11, 18 February 2018 (UTC)

Hello. I am an admin from Turkish Wiktionary. I need some help with storing JavaScript arrays in data files, just like we do with Lua modules. We have this JS file (tr:MediaWiki:YeniMadde.js), which helps users who don't know how to create a new entry, and it uses some arrays. I have also created this page to store all the arrays: tr:MediaWiki:YeniMadde.js/Menüler.js. But I couldn't manage to access them from the main JS file. I have read the mw:Manual:Interface/JavaScript page, which has useful information, but I still do not understand how I can access an array from an external JS file. If anyone could help me, I would appreciate it. Thanks! ~ Z (m) 10:23, 16 February 2018 (UTC)

@HastaLaVi2: The way in which I transfer items between scripts is, in script 1, placing the items in the window object, then in script 2 loading script 1 with jQuery.getScript and using the items in a callback: jQuery.getScript(/* script URL */,function(){/* code that uses the items in this script */}). You can see an example of this technique in MediaWiki:Gadget-AcceleratedFormCreation.js, where User:Conrad.Irwin/creationrules.js is loaded and its function window.generate_entry is used. (That's where I got the technique originally.) Maybe there is a more elegant way to do this, I don't know. I like the Lua way, in which modules don't write to the global object. 19:39, 16 February 2018 (UTC)
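The technique described above can be sketched roughly as follows. The global name yeniMaddeMenus, the array contents and the helper names are assumptions made for this example. Outside a browser with jQuery, a stand-in runs the data script directly so that the sketch can be tried on its own.

```javascript
// Use the browser's window object when present; fall back to globalThis
// so the sketch also runs outside a browser.
const root = typeof window !== "undefined" ? window : globalThis;

// "Script 1": the data file. On the wiki this would be a separate page;
// the property name and array contents here are hypothetical.
function runDataScript() {
  root.yeniMaddeMenus = {
    partsOfSpeech: ["Ad", "Eylem", "Zarf"],
  };
}

// Load the data script, then call back. With jQuery available this uses
// jQuery.getScript(url, callback); otherwise the stand-in above is run.
function loadDataScript(url, callback) {
  if (root.jQuery && typeof root.jQuery.getScript === "function") {
    root.jQuery.getScript(url, callback);
  } else {
    runDataScript();
    callback();
  }
}

// "Script 2": the main script. The arrays are only touched inside the
// callback, once the data script has finished loading.
function withMenus(url, useMenus) {
  loadDataScript(url, function () {
    useMenus(root.yeniMaddeMenus);
  });
}
```

The important detail is that the main script reads the arrays only inside the callback. Code placed after the getScript call but outside the callback may run before the data file has loaded.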

Thanks a lot for your response! Now I see it: using the window object is actually a good way of doing this. I agree with you on that now. I am really new to coding extensions for the wiki, but I hope to get better in time. So thanks again for your help! :) ~ Z (m) 10:30, 17 February 2018 (UTC)

Hi. I am already a rollbacker on Simple English Wiktionary. I am also autopatrolled here. I am trusted here and I regularly look into recent changes and revert vandalism. Therefore, I would like to request the rollback right. Pkbwcgs (talk) 15:11, 16 February 2018 (UTC)

@Pkbwcgs: Looking at your edits, I don't actually see many which are undoing edits other than your own; most of your work seems to be fixing systematic formatting problems, which is still very helpful, thank you! Still, you've been around here and around Simple for a year and you are a rollbacker there, and I see no reason to deny this request (as another admin pointed out once, people can just undo edits or write js to acquire the same one-click functionality as the rollback feature; it's not a restricted ability the way being able to delete things or block people is), so I have granted it. - -sche(discuss) 15:49, 18 February 2018 (UTC)

I'd like to start a discussion about the ancestor of the various Middle Indo-Aryan lects. As stated by {{R:inc:Kobayashi:2004}} "Vedic was probably a specific dialect of Old Indo-Aryan; it was quite close to, but not identical with the language from which Middle Indo-Aryan developed." This is clearly illustrated by various archaisms found in MIA, such as no *gẓʰ-*kṣ merger in Gandhari and Pali, so to say they are descended from Vedic is demonstrably inaccurate:

@Victar: This issue has been raised before. Like last time, I think we should treat Vedic as representative of all Old Indo-Aryan dialects (which is the status quo now); it's just a technicality, and in 99% of cases the Sanskrit and MIA forms match perfectly. And if we take Vedic as representing all OIA dialects, it's not "demonstrably false" at all. Furthermore, MIA languages underwent later standardization where the thorn cluster Sanskrit क्ष् (kṣ) was standardized to kh (ch in Maharashtri Prakrit). For example we have Sindhi [script needed] (khã̄iṇu) and Kashmiri [script needed] (chawun) for the word you give as an example. (and oh look the Dardic matches the Sanskrit, how interesting)

Also, the layout you gave does not reconcile the Sanskrit dialects. How can *झापयति (jhāpayati) lead to क्षापयति (kṣāpayati)? The example also completely ignores the Prakrits, which are IMO equally if not more important than the languages here. It is also generally accepted that Sauraseni Prakrit is a direct descendant of Rigvedic Sanskrit. Ashokan Prakrit is missing too, which is of much greater antiquity than either Gandhari or Pali, comprising the "Early Middle-Indo-Aryan" stage. They're important if we intend to discuss the ancestor of all MIA languages.

I totally refuse to format etymologies in this manner. I make a *lot* of Hindi entries, and I am not changing anything to reconstructed Sanskrit unless it is necessary (like at Hindi झरना (jharnā), where Proto-Indo-Aryan is more than enough).

Woah, @AryamanA, slow your roll. No one is telling anyone to do anything -- I was just opening it up to dialog. I'm totally fine having "Sanskrit" represent all dialects of OIA; it's only when we start calling it "Vedic" that I find we run into a problem, which current literature would agree with. And I'm not "ignoring" Prakrits in my example. My intention wasn't to detail the whole of the IA tree; I was simply illustrating the *gẓʰ-*kṣ merger discrepancy I mentioned above it. If we ever added reconstructions for these unattested Sanskrit forms, I haven't even put thought into the transcription of it. I'm also well aware of the Sanskritization process.

I certainly would be opposed to creating a bunch of reconstructed Sanskrit entries that are identical to Vedic Sanskrit, but I don't see a problem with creating Sanskrit reconstruction entries for differing ancestral dialectal forms. I also don't see a problem reflecting this dialectal form in descendant trees, if not as a separate level, perhaps on the same line, e.g. Sanskrit: kṣāpayati, *jhāpayati. All in all, it's not very different from what we already do for Latin. What are your thoughts on that? --Victar (talk) 07:04, 17 February 2018 (UTC)

@Victar: Sorry if my response was too aggressive; I'm just putting all my cards on the table so this discussion doesn't drag on like our previous discussion on this topic. I think Sanskrit *झापयति (jhāpayati) is unnecessary if we already have Proto-Indo-Aryan *gẓʰāpayati. Maybe we could keep it unlinked or something, but I feel that having a full-blown entry for Sanskrit *झापयति (jhāpayati) is redundant.

@AryamanA: No worries. If I were to sum up your previous discussion on this topic, it was that we're treating Sanskrit as Latin, placing all forms in a developmental and dialectal continuum. I'm on board with that, but then, like with Latin, we need to address even the unattested forms. Compare *accatto to *झापयति (jhāpayati). I still take issue with calling *झापयति "Vedic" because it nullifies the whole advantage of the temporal and dialectal vagueness of a unified Sanskrit. Why not just K.I.S.S., as we do for Old French and Anglo-Norman French, and simply keep them all on the same line, as so? --Victar (talk) 18:57, 17 February 2018 (UTC)

@Victar: yeah I too feel the actual ancestors of IA languages were so close to Sanskrit that distinguishing between them is often pointless. I don't oppose reconstructing Sanskrit terms if someone can. A slight problem may be posed if there's an IIR/IE etymon and we use {{inh|sa}} or {{der|sa}} in the reconstructions as it's going to cause CAT:Sanskrit terms derived from Proto-Indo-European to display unattested words. It can be resolved by entering "see kṣāpayati" in the etymology. -- माधवपंडित (talk) 09:49, 17 February 2018 (UTC)

As to the matter of chronology to keep in mind, Middle Indic dialects existed at the same time as Vedic Sanskrit. Even the Rig Veda has many words that clearly come from synchronic basilects spoken daily (as opposed to the conservative, ceremonial acrolect used in the Rig Veda). These dialects gave vocabulary, phonology, and morphology which appear all over the Rig Veda. It's a very frustrating issue, since Vedic Sanskrit cannot be their ancestor but existed within a dialectal continuum with them at the time of the composition of the hymns. Our lexicographical issue stems from the fact that only one dialect is recorded from this period. I'm not proposing a solution to this issue, but merely ensuring that when we talk about MI potentially “coming from Vedic,” we realize that this is deceptive because MI already existed by then. —*i̯óh₁nC[5] 11:10, 17 February 2018 (UTC)

I thought that Sanskrit is only an excellent proxy for the ancestor of Indo-Aryan languages and not the thing itself, so that we use it as such for convenience. Making reconstructed Sanskrit entries seems to me both inconvenient and technically incorrect. I feel the same way about reconstructed Ashokan Prakrit, but ultimately I believe that decisions like these should belong to those who do the work. Crom daba (talk) 13:54, 17 February 2018 (UTC)

@Crom daba: Who's to say the term Sanskrit can't refer to the collection of OIA lects, of which only one was standardized and made the prestige dialect. If we look at it that way, the Sanskrit reconstructions of other dialectical forms are perfectly correct. DerekWinters (talk) 15:39, 17 February 2018 (UTC)

We could say that if it pleases us. But there does seem to be an understanding philologically that when we speak of Sanskrit we mean a specific corpus of texts (especially when we talk of Vedic Sanskrit and so on) and a certain usage of the language (as a language of religion and higher learning); if I'm not mistaken, its very name refers to this.

We could also reconstruct Old Church Slavonic or 18th-century Slaveno-Serbian or Classical Mongolian or Old Turkic, but it seems inconvenient and not necessarily correct. Crom daba (talk) 16:38, 17 February 2018 (UTC)

@माधवपंडित: If we're calling Sanskrit an OIA continuum, a Pali etymology reading "from {{inh|pi|sa|*झापयति|tr=jhāpayati}}, {{m|sa|क्षापयति|tr=kṣāpayati}}" would be just fine. --Victar (talk) 19:41, 17 February 2018 (UTC)

One could also do something like from dialectal {{inh|pi|sa|*झापयति|tr=jhāpayati}} (compare {{m|sa|क्षापयति|tr=kṣāpayati}}), from... --Victar (talk) 22:49, 17 February 2018 (UTC)

@Victar: That's fine. However in the reconstructed Sanskrit entries, the user may be directed to the attested variation for further etymology. -- माधवपंडित (talk) 02:31, 18 February 2018 (UTC)

Shouldn't "Proto-Indo-Aryan" already cover this distinction? Or is there a language that descends from PIA, but not (pre-Vedic) Sanskrit? Crom daba (talk) 09:54, 17 February 2018 (UTC)

Also, just had a thought this morning. Ashok became Buddhist. Does that mean Pali predated Ashokan Prakrit o_O? DerekWinters (talk) 12:32, 17 February 2018 (UTC)

@DerekWinters: I think the Buddhist Canon was transcribed during or after the time of Ashoka, so it didn't really "predate" it, but was probably only a little later. An interesting thing to note is that the Girnar dialect of Ashokan Prakrit and Pali share a lot of features. —AryamanA(मुझसे बात करें • योगदान) 14:56, 17 February 2018 (UTC)

@AryamanA: But didn't Buddha speak Pali natively (or a very similar version)? Also yeah, the Gujjars came in from the northwest so I wonder how much of the Girnar dialect they absorbed. DerekWinters (talk) 15:29, 17 February 2018 (UTC)

@DerekWinters: Hmm, according to Masica both Pali and Ashokan Prakrit are of the same stage, the "Early Middle Indo-Aryan", along with Old Ardhamagadhi. I guess they were both spoken at the same time. And anyways, Ashokan Prakrit is not really one language, more of a pan-India group of early Prakrits that were mutually intelligible. —AryamanA(मुझसे बात करें • योगदान) 16:07, 17 February 2018 (UTC)

If we want to have various PIA dialects as "Sanskrit", I agree with Victar that the sub-label "Vedic Sanskrit" needs to be limited to the actual attested Vedic, and not for other early forms from the same period. Merging things as Proto-Indo-Aryan instead of Sanskrit would probably work. I do not think Mitanni Aryan is a major issue here, since last I checked, the evidence to consider it a part of Indo-Aryan specifically at all, instead of simply early Indo-Iranian NOS, is pretty weak. It's clearly neither Nuristani nor Iranian, but that doesn't mean it has to be IA.

Huh, I guess that requires an OIA dialect that merged the "thorn" clusters by POA, but not by voice (*kš, *ĉš > *ch, but *gž, *ĵž > *jh)? Of course ch as the usual correspondence/reflex of kṣ in parts of MIA already suggests something of the sort. --Tropylium (talk) 10:17, 18 February 2018 (UTC)

I'm sorry, Metaknowledge, I don't know much about these languages, so I can't check their edits. -- Curious (talk) 11:32, 18 February 2018 (UTC)

They seem somewhat negligent, but they put in work. The Kumyk and Karakalpak changes are correct as far as I can tell (although the gsub function doesn't work that way, so they aren't currently working as intended). The Tofalar module change (I didn't think we had an automatic Tofalar transliteration, weird) was consistent with WT:Tofa language (which was apparently made by an earlier incarnation of the user), but seems to have deleted the h character, probably by accident.

Proto-Turkic entries they made seem basically correct but also riddled with mistakes. This user requires cleaning after, but I'm hoping they can evolve into an asset. Crom daba (talk) 22:56, 17 February 2018 (UTC)

The problem is that we don't know what sources they're using, and they won't communicate with us. They're still making lots of mistakes after (at least) two years, and they've been quite disruptive at times, even apparently creating a throw-away new account to continue an edit-war (see the revision history at Module:ba-translit). They won't improve if they refuse to listen to us. Chuck Entz (talk) 02:34, 18 February 2018 (UTC)

He is somewhat trying to be useful but leaving lots of mess around to deal with. I believe there are a couple of accounts linked to him, along with various IP addresses that do the same. Recently I created *ï (“plant”) and mentioned its possible relation to *ïgač (“tree”). He immediately created two entries for one root: one as "ïgač" (what I mentioned) and the other as "ɨ(ń)gač" (from Starling), probably not realizing they are the same. Part of the mess he leaves behind is related to orthography: in the entry "ɨ(ń)gač", he transliterated the Old Uyghur word as "îġać", which is amusing, along with other orthographies. It is rather interesting to see such dedication to adding stuff that is so wrong. I witnessed him trying to add transliterations for Old Uyghur just by looking at some words I listed on PT pages; it seems that he has no idea and is trying to come up with what might resemble the transliteration. He created *jāg (“fat”) and immediately decided that *jagɨ (“enemy”) should have a long /a/ as well and created that entry. He put "Starostin, Sergei; Dybo, Anna; Mudrak, Oleg (2003), “*jāgɨ”" in his reference, not even bothering to notice that the source has the /a/ short.

A lot of the time he is just inventing stuff and being annoying to deal with as he seems to be running alternative accounts. --Anylai (talk) 09:04, 18 February 2018 (UTC)

Maybe we need a Korean regular to communicate some rules to them. If they keep making more mistakes than we have the manpower to handle, blocking them might be a better option. Crom daba (talk) 11:51, 18 February 2018 (UTC)

As long as they preface it with "According to the controversial Anglo-Uralic theory" I have no problem with this. Crom daba (talk) 20:20, 18 February 2018 (UTC)

Reminds me of this "journal" article. I love the disclaimer: "Individual authors are responsible for facts included and views expressed in their articles". So much for peer review... Chuck Entz (talk) 07:18, 19 February 2018 (UTC)

Let me voice my opinion.

This user leaves behind mess that needs cleaning. For some time now, each of my sessions has begun with looking at my Watchlist and cleaning up what this user has done recently.

This user does not appear to consult dictionaries, and invents stuff, e.g. based on cognates.

This user misses some of the fundamentals of Turkology - this is unfortunate, as s/he often edits the Etymology section.

This user won't communicate, although I have proposed that s/he register so we could communicate.

All of this is a pity. It would be nice to have communication with this user. It would be ideal to see this user grow into a reliable and responsible editor — every contributor can potentially make a difference. Borovi4ok (talk) 08:41, 19 February 2018 (UTC)

Although some users don't care for the anagram sections in entries, I've gotten so that I rather enjoy them (just cracked a smile, for example, when I saw that gone to the dogs and get the goods on are a pair). This is one of those quirky extras that contributes to Wiktionary's thoroughness and uniqueness.

Word lovers might also appreciate a system-maintained list of all the anagrams in English Wiktionary, perhaps in the form of an alphabetical appendix containing 2 entries for each anagram (1 for each member of the pair). I'm not a programmer, but expect that bots could probably build and maintain it. Does anyone else like this proposal? -- · (talk) 00:31, 18 February 2018 (UTC)
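
A bot building such a list would mostly need to group terms by their letters. Here is a minimal sketch, assuming a plain array of entry titles as input; anagramKey and buildAnagramIndex are hypothetical names, and only the letters a to z are compared, so spaces, punctuation and diacritics are ignored.

```javascript
// Reduce a term to its sorted letters, so that anagrams share the same key.
function anagramKey(term) {
  return term.toLowerCase().replace(/[^a-z]/g, "").split("").sort().join("");
}

// Build an index with one entry per term that has at least one anagram,
// each entry listing the other members of its anagram group.
function buildAnagramIndex(terms) {
  const groups = new Map();
  for (const term of terms) {
    const key = anagramKey(term);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(term);
  }
  const index = {};
  for (const members of groups.values()) {
    if (members.length < 2) continue; // no anagram partner, skip
    for (const term of members) {
      index[term] = members.filter((other) => other !== term);
    }
  }
  return index;
}

const demoIndex = buildAnagramIndex(["gone to the dogs", "get the goods on", "hack"]);
```

With this input, demoIndex has two entries (one for each member of the pair), matching the "2 entries for each anagram" layout proposed above.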

I've been adding a number of Ligurian entries as of late, and it occurred to me that there should probably be some kind of consensus on which orthography to use for the entries. The "official orthography" I use does not have an actual "official" status (it is promoted by the Académia Ligùstica do Brénno, which deals with Genoese, which - AFAIK - is the variant Ligurian is based on), and uses different levels of "accuracy".

marking every long vowel (except for the stressed ones, unless they fall into one of the above cases)

Using every accent, in a didactic context (which is what I do, so that a word's phonemic realization is clear)

Since no official orthography exists, I wanted to see if anyone has any thought on this. Also, summoning @Lo Ximiendo, as a seemingly active user regarding the Ligurian language -- GianWiki (talk) 14:20, 18 February 2018 (UTC)

@GianWiki Could we make a survey for speakers of Ligurian in order to see what they think of this? --Lo Ximiendo (talk) 18:07, 18 February 2018 (UTC)

@GianWiki, I'd choose the one that most closely approximates how people actually write and add more diacritics in the headword line, but I don't know anything about Ligurian. —Μετάknowledgediscuss/deeds 19:14, 18 February 2018 (UTC)

@GianWiki, for etheric guidance, pray to either the Christian God, or even to Indo-European gods and goddesses such as Wotan or Freyja or whatever is appropriate. --Lo Ximiendo (talk) 20:53, 18 February 2018 (UTC)

Most Ligurian entries I added came from A Compagna, the magazine published by Académia Ligùstica do Brénno. I recall that the majority of articles were written without the extra accents but, unlike in Italian, it was not rare to find running text with didactic accents.

I have no preference. Whatever GianWiki supports I will support. — Ungoliant(falai) 11:18, 19 February 2018 (UTC)

Hey all, I wanted to start a discussion about the formatting of declension tables across PIE descendants. Right now, they're very unstandardized, as demonstrated here. I'm wondering if it wouldn't be a good idea to write a unified module that all these languages can piggyback on instead. What are people's thoughts on that? @Erutuon, JohnC5, Rua, Metaknowledge, Mahagaja, Per utramque cavernam --Victar (talk) 18:14, 18 February 2018 (UTC)

I don't see any need to standardise them, unless we intend to standardise all inflection templates across all languages (which I think would meet with a great deal of resistance). They have different aesthetics, many of which were created in concert with other templates for those languages, and that's just fine. —Μετάknowledgediscuss/deeds 18:26, 18 February 2018 (UTC)

I think if you go beyond PIE descendants, it becomes more difficult to standardize. Also, I'm just talking about declension tables, not verbal inflection tables, etc. Looking past just formatting, many languages lack declension tables, and creating a sort of plug-and-play module would help remedy that. --Victar (talk) 18:52, 18 February 2018 (UTC)

This seems like a conspiracy to introduce the barbaric practice of placing the accusative before the genitive to more languages, I oppose it totally. Crom daba (talk) 19:04, 18 February 2018 (UTC)

NAGD is very reasonable for West Germanic, I'll have you know. Glory to its creator, boo Germany for not picking up on it. Jokes done, I don't see a need for unification either. The current practice allows for tailored tables and shows no major detriments. Writing some IE-module can be done without changing the current tables and the covens of the individual languages then can decide whether to migrate. Korn [kʰũːɘ̃n] (talk) 19:59, 18 February 2018 (UTC)

NAGD isn't very reasonable for West Germanic. NADG would be reasonable. Accusative and dative (sometimes merged into a single accu-dative case) are more similar than genitive and accusative or genitive and dative. Instrumental and vocative, however, are another matter. For Latin, something like NV[Acc]-G(L)[Abl]D and at the same time NV[Acc]-GD[Abl](L) might be more reasonable: the optional locative is somewhere between genitive and ablative, and dative is between genitive (1st and 5th declension sg.) and ablative (1st and 3rd to 5th declension pl., 2nd declension). Considering Vulgar Latin and Romance languages, ablative should be near accusative, as in NV[Acc][Abl]D(L)G. This would even fit with the basic West Germanic NADG. For all of PIE, however, a sorting based on tradition such as NGD[Acc]V[Abl]LI might be less controversial and might make more sense. -80.133.110.226 20:46, 18 February 2018 (UTC)

LOL, I laugh because I think you're joking, but I honestly don't know. You could always make order an option of the module. --Victar (talk) 22:04, 18 February 2018 (UTC)

I am joking (mostly, NAGD does irk me), but it was meant to point out that standardization may be incompatible with the respective grammar traditions of languages (such as ordering).

As far as diversity of table styles is concerned, I like it, but maybe it could be seen as unprofessional, no strong feelings either way. Crom daba (talk) 23:22, 18 February 2018 (UTC)

I agree too. About the ordering of cases, this shouldn't be a problem. Erutuon has already written a script for rearranging them in one's preferred order. --Per utramque cavernam (talk) 23:13, 18 February 2018 (UTC)

As context for this debate, this discussion should be mentioned. Part of the issue is whether to display transliterations in Sanskrit on a separate line, as we do in Russian, Arabic, and Ancient Greek, which I feel is clean and clear and lets you easily read the table either in the native alphabet or in transliteration. Victar feels that we should have each transliteration follow every term. That should certainly be part of this discussion, though there does not seem to be much impetus to standardize them at this point. —*i̯óh₁nC[5] 05:22, 19 February 2018 (UTC)

That discussion did rejog my interest in this, but how to display non-Latin text next to transliterations is a conversation to be had down the road. The more important discussion at hand is the technical feasibility of, and community support for, the idea, and I'd rather not muddy things by interjecting my personal formatting opinions, but people are welcome to chime in at the other discussion. --Victar (talk) 05:37, 19 February 2018 (UTC)

There was a giant programming project undertaken by someone to make a general inflection table interface module (maybe for Uzbek?) that could be used for all languages. @Erutuon, do you remember what I'm talking about? —*i̯óh₁nC[5] 05:53, 19 February 2018 (UTC)

I don't care to bikeshed it, especially not the order of the cases, which can remain unstandardized for all I care, but something a little less random in the colors and typesizes would be nice.--Prosfilaes (talk) 23:04, 19 February 2018 (UTC)

A simpler first step towards standardisation would be to use one CSS style for all of these templates (so that the look at least is consistent). Any future changes can then be made in one place instead of having to maintain them all separately. This has the additional advantage in that users could then choose to override the style formatting in their personal CSS file, and it would cascade through to all tables in all languages. The wikitable class is one example standard. -Stelio (talk) 16:06, 22 February 2018 (UTC)

This makes perfect sense to me. While different languages have different requirements in the structure and content of their declension and conjugation tables, that does not mean we cannot employ a single, uniform style across all of those tables. Each language can keep the structure which is most appropriate (as is a concern of several above) and still look like similar content across the whole project. - TheDaveRoss 13:43, 1 March 2018 (UTC)

Am I the only one who actually likes the variety in language-specific styles of declension tables? mellohi! (僕の乖離) 13:56, 1 March 2018 (UTC)

Hear, hear! A different visual style for a given language has the added usability advantage of making it immediately obvious that you're looking at something specific to that language. Frankly, I don't want a unified visual style for all languages. I like that Spanish tables have their own color coding (see abrir), different from Finnish (kaahata), different from Navajo (atłʼó), different from German (gucken), different from Japanese (開く(aku)), etc. etc. ‑‑ Eiríkr Útlendi │Tala við mig 18:03, 1 March 2018 (UTC)

The 500,000th English lemma is motlopi. Congratulations and here's to the next 500,000. DTLHS (talk) 21:03, 19 February 2018 (UTC)

Exciting! And (on the subject of milestones) before the month is out, we should make it to 5.5 million entries. We also have entries from about half of the world's languages at this point. - -sche(discuss) 21:29, 19 February 2018 (UTC)

I'm surprised it's linear. Are we never going to run out of words? (Also Hindi just hit 10,000, but that's hardly anything) —AryamanA(मुझसे बात करें • योगदान) 00:05, 20 February 2018 (UTC)

Obviously, the pool of English lemmas in existence has an input rate lower than our rate of adding English lemmas to Wiktionary, but not all such lemmas are equally easy to add, so we're not going to run out so much as come up against words that are increasingly difficult to define and cite (and you could say that we're already seeing the first signs of that). We're currently mainly limited by effort, so the number of lemmas in existence is irrelevant; while growth is linear, we can't "feel" the ceiling. —Μετάknowledgediscuss/deeds 00:12, 20 February 2018 (UTC)

Still a lot to do before reaching the ceiling for foreign languages...

It would be nice to have lemma figures to go with the pie chart. DonnanZ (talk) 13:30, 20 February 2018 (UTC)

Congrats! That's great! It's a beautiful milestone! To compare, French Wiktionary only has 360,000 lemmas for French (but it has about a third as many people contributing, and the project is a year and a half younger). @Wyang I would be very interested if you could make a chart for French here, to discuss with my colleagues. Noé 12:38, 21 February 2018 (UTC)

I had suggested some time ago that we have something like a "language of the month" where we pick a language that could use a lot of expansion and focus on that for a month. I still think it would be an interesting thing to try. I am curious as to whether we have exhausted the supply of translation dictionaries in the public domain. bd2412T 14:27, 20 February 2018 (UTC)

@BD2412: No way, there is so much literature we haven't even touched, coming from the context of Indian languages. But I totally agree that "language of the month" would be a good idea. —AryamanA(मुझसे बात करें • योगदान) 21:00, 20 February 2018 (UTC)

I am rather worried about what might result if people all started adding entries to a language with few native speaker editors, leaving them swamped with work just to catch our inevitable errors. —Μετάknowledgediscuss/deeds 21:02, 20 February 2018 (UTC)

Well Wyang you put my crappy graph to shame :D Thanks for your nice chart. Let us fight over the sweet sugar of the 5,500,000 milestone, I WANT IT. Equinox◑ 15:58, 21 February 2018 (UTC)

What do people think about renaming {{PIE root}} to {{root}} and changing the format to {{root|ine-pro|iir-pro|*h₃er-}}? That seems more in line with the other etymology templates, and we can potentially use it for other languages, like Sanskrit, i.e. {{root|sa|hi|घट्}}. @Rua, Erutuon, AryamanA, JohnC5 --Victar (talk) 02:02, 20 February 2018 (UTC)

@AryamanA: Yeah, {{root}} would only be used on child language entries, like Hindi entries with {{root|sa|hi|घट्}}. --Victar (talk) 02:39, 20 February 2018 (UTC)

@Victar: I'd be interested in this proposal, but @Rua is the one to convince. —*i̯óh₁n̥C[5] 03:13, 20 February 2018 (UTC)

I might suggest that the first and second params be switched, to better match the current behavior of {{bor}}, {{inh}}, etc. ‑‑ Eiríkr Útlendi │Tala við mig 02:07, 20 February 2018 (UTC)

@Eirikr, if we could do the same thing in PIE using {{ine-noun|root=*h₃er-}}, then it wouldn't be a problem switching the |lang= order, but otherwise it might be confusing, having to also use {{root|ine-pro|*h₃er-}}. Maybe not. --Victar (talk) 02:27, 20 February 2018 (UTC)

Apparently it makes sense to have a {{root}} template for roots supposed for any one language and a root template for ancestor languages used in entries of their children languages. Hence I conclude that {{PIE root}} should be generalized and there be made, just to give an example name, a template {{ancestral root}} or {{aroot}}, so that, for instance, one can use for Proto-Afro-Asiatic and Proto-Turkic the same root template as for Proto-Indo-European. So we won’t need any {{AFA root}} or {{TRK root}} to complement {{PIE root}} and this latter will be replaced by {{aroot}}, and we won’t need any {{arc-root}} or {{gez-root}} to complement {{syc-root}} and {{ar-root}} and {{HE root}} and these three latter will be replaced by {{root}}. Fay Freak (talk) 23:57, 27 March 2019 (UTC)

I broadly agree that deliberately nonstandard spellings are worth categorizing, but I'm not sure it's right to call "Amerikkka" (for example) a "misspelling", since it is deliberately/intentionally used to invoke "KKK"; it's not an error made out of ignorance. Deliberately nonstandard grammar seems harder to separate from dialectally typical grammar (of e.g. an immigrant community) from which I would expect most examples to derive; it also seems similar to uses of fossilized / no longer standard grammar ("if need be", etc), so the criteria for inclusion in such a category seem fuzzier, although they may not be a barrier to having such a category. - -sche(discuss) 21:08, 20 February 2018 (UTC)

I've just noticed that @Victar has been using the |tr= parameter to enter both transliteration and transcription of a term in Proto-Iranic entries (for example Sogdian [script needed] (ʾʾsʾwk’ /āsūk/, “gazelle”)).

I like how this looks and it satisfies the need I've been talking about in previous discussions on transliteration and transcription. I would suggest that an additional parameter be added, for example |tsc=, that will produce the same formatting while allowing transliteration to be automatically generated.

This could be used for: languages written in cuneiform, sparsely attested languages written in abjads (Middle Iranian and Middle Turkic languages, Arabic Middle Mongol), Old Turkic, Khitan and Jurchen (once they're properly encoded), and even Kalmyk (phonemic schwas are unwritten).

Support. It's remarkable that we've had so many discussions and conflicts regarding this, but it still has not come to pass. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)

Support. I've been doing the same thing as Victar for Middle Persian and Old Persian. —AryamanA(मुझसे बात करें • योगदान) 21:02, 20 February 2018 (UTC)

Support: My only stipulation is that it be made clear that it shouldn't be used for IPA pronunciations. --Victar (talk) 00:32, 21 February 2018 (UTC)

I agree that the parameter should not be used for IPA, because the example /āsūk/ isn't IPA, and in order for IPA to be correctly formatted (and non-IPA not to be formatted incorrectly as IPA), the parameter can only be used for one or the other. — Eru·tuon 07:27, 2 March 2018 (UTC)

Also, I vote for |ts= (transcription). (And maybe we can rename |tr= to |tl= in the future.) --Victar (talk) 08:38, 2 March 2018 (UTC)

Support: I fully support this, if you are willing to solve this before I am able. Let me know if I can help with advice. Thank you! Isomorphyc (talk) 21:33, 4 March 2018 (UTC)

@Crom daba, this seems to have gotten wide support. Did you want to move forward? --Victar (talk) 04:40, 2 March 2018 (UTC)

@Crom daba, sorry, I thought you were a coder when I put that to you. Isomorphyc's module is too divergent from the standard module. @Erutuon, did you want to try your hand at implementing this? --Victar (talk) 21:20, 4 March 2018 (UTC)

@Erutuon, I threw this together. Did you want to check to see if it's too "hacky". :) {{User:Victar/Template:link|xpr|𐫙𐫢𐫗𐫇𐫍𐫡|t=grace, gratitude|ts=išnōhr}} → {{User:Victar/Template:link|xpr|𐫙𐫢𐫗𐫇𐫍𐫡|ts=išnōhr|t=grace, gratitude}}. --Victar (talk) 22:34, 4 March 2018 (UTC)

@Erutuon OK, span added to the transcription as well. I thought about having a comma, but I think it isolates it from the transliteration. Is there any reason you don't wish to implement this? --Victar (talk) 21:28, 6 March 2018 (UTC)

@Victar: I have no objection to the feature; I don't feel like working on any major projects at the moment. I might be willing to fix bugs, though. — Eru·tuon 00:42, 7 March 2018 (UTC)

@Erutuon Cool. Hope things are OK with you. It's already coded, just looking for approval to be added to the module. If you don't have time, could you ping someone else with edit access that you trust? Thanks. --Victar (talk) 00:51, 7 March 2018 (UTC)

Is it typical for non-IPA transcriptions to be enclosed in slashes? That is likely to cause confusion about which transcriptions are IPA and which aren't, and about the purpose of the parameter: is it intended for IPA because the transcription contains slashes? Then again, there probably has to be something to distinguish the strict transliteration from the transcription in the output of the template, and slashes serve that purpose. — Eru·tuon 07:27, 2 March 2018 (UTC)

Could ⟨orthography brackets⟩ be used for the transliteration (letter-to-letter mapping), or is that also inappropriate? —suzukaze (t・c) 07:32, 2 March 2018 (UTC)

One example is Durkin-Meisterernst's (a single person) Dictionary of Manichaean Middle Persian and Parthian, which uses slashes for non-IPA transcription.

MacKenzie's Pahlavi has non-IPA transcription as a headword with transliteration(s) written inside square brackets.

I agree that the ⟨orthography angle brackets⟩ can be hard to distinguish from (regular parentheses) without looking closely. What about <regular angle brackets>? As far as I know, these aren't used for anything in IPA, and they're visually more distinct. ‑‑ Eiríkr Útlendi │Tala við mig 17:21, 2 March 2018 (UTC)

I suppose I'm also just used to //. This is how entries look in {{R:xpr:DMMPP}}: hwfryʾd Pa/MP /bufrayād/ a. 'helping well, helpful; helper'. --Victar (talk) 05:05, 4 March 2018 (UTC)

Just to mention, the param would need to be |tsN=, because there are often several variants. This is surely pushing it, but I was also wondering if some sort of transcription qualifier would be warranted, i.e. {{l|pal|tr=bʾčwk|ts1=bāzūk|ts2=bāzūg|tsq1=early|tsq2=late}} → [script needed](bʾčwk /bāzūk (early), bāzūg (late)/). Or, this could be pointing to the need of a separate module. --Victar (talk) 18:28, 2 March 2018 (UTC)

@Victar Why not simply {{l|pal|tr=bʾčwk|ts=bāzūk (early), bāzūg (late)}}? Simpler templates seem to hold up better over time. If it gets more complicated than that (and possibly already at that point), that information should be at the main entry. Crom daba (talk) 22:28, 2 March 2018 (UTC)
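To make the output format under discussion concrete, here is a minimal Python sketch of how a transliteration and several qualified transcriptions could be joined into the parenthetical shown above. It is only an illustration: the real implementation would live in a Lua module, and the function name `format_annotations` and its argument shapes are hypothetical.

```python
def format_annotations(tr, transcriptions):
    """Join a transliteration with zero or more (transcription, qualifier) pairs.

    Hypothetical helper: `tr` stands for the |tr= value and `transcriptions`
    for the |tsN= / |tsqN= pairs discussed above (qualifier may be None).
    """
    parts = []
    for ts, qualifier in transcriptions:
        # Append the qualifier in parentheses only when one was given.
        parts.append(f"{ts} ({qualifier})" if qualifier else ts)
    inner = tr
    if parts:
        # Slashes set the transcriptions apart from the transliteration.
        inner += " /" + ", ".join(parts) + "/"
    return f"({inner})"


print(format_annotations("bʾčwk", [("bāzūk", "early"), ("bāzūg", "late")]))
```

With a single free-text |ts= value, as suggested in the reply, the same helper could be called with one pair and no qualifier, which keeps the template interface smaller at the cost of unstructured qualifiers.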

As will surprise no one, since I have advocated for this multiple times in the past, I Support this proposal. It did occur to me that we should perhaps limit the languages that may use this parameter to prevent abuse. As @Erutuon worried, I think users might misuse this parameter by giving IPA transcriptions to words. Furthermore, I think that in confusion, users might use |ts= instead of the more appropriate |tr= for normal transcriptions. I think we could greatly alleviate both issues by implementing a system similar to the one used for overriding manual transliteration. In this way, we can specify which languages would allow |ts= based on the languages' less informative writing systems (partial syllabaries, abjads, cuneiform nightmare-scapes, etc.). Indeed, we could even prevent |ts= from functioning in the absence of |tr=. This would prevent the bad behavior of only providing a transcription instead of both—a bad habit I've noticed with some frequency. The transcription should (in my opinion) always be secondary to the transliteration, as transcriptions are often a reconstructed abstraction from the actual, attested symbols. I'd also mention that if we implement this, some cleanup will be necessary for instances where users have used |pos= to provide transcriptions for Mycenaean Greek, which prevents giving transcriptions through the overridden |tr= parameter. I think this can be mostly accomplished by finding all the uses of |pos= that contain at least 2 / characters and converting them to |ts= manually. —*i̯óh₁n̥C[5] 11:16, 5 March 2018 (UTC)
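The two safeguards and the cleanup search described above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the module's actual code: the whitelist contents, the names `TS_ALLOWED_LANGS`, `check_ts` and `find_pos_transcriptions`, and the regular expression are all hypothetical, and a real cleanup would more likely run over a database dump with a proper wikitext parser.

```python
import re

# Hypothetical whitelist of language codes allowed to use |ts=.
TS_ALLOWED_LANGS = {"xpr", "pal", "gmy"}


def check_ts(lang, tr=None, ts=None):
    """Return an error message if |ts= is used improperly, else None."""
    if ts is None:
        return None  # nothing to validate
    if lang not in TS_ALLOWED_LANGS:
        return f"|ts= is not enabled for language '{lang}'"
    if not tr:
        return "|ts= requires |tr= to be given as well"
    return None


# Matches a |pos= value that contains at least two slash characters.
POS_WITH_SLASHES = re.compile(r"\|\s*pos\s*=\s*([^|}]*/[^|}]*/[^|}]*)")


def find_pos_transcriptions(wikitext):
    """List |pos= values that look like slash-delimited transcriptions."""
    return [m.group(1).strip() for m in POS_WITH_SLASHES.finditer(wikitext)]
```

The search only flags candidates; as suggested above, converting them to |ts= would still be done by hand.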

@JohnC5, yeah, that was my concern, people using this as an IPA transcription. We could make it so that it only works for languages with non-Latn scripts. I'm not sure if that's possible, or how much of a drain that query would be. --Victar (talk) 17:53, 5 March 2018 (UTC)

Those both sound like good proposals, although we'll want to spell characters correctly. —Μετάknowledgediscuss/deeds 06:28, 9 March 2018 (UTC)

@JohnC5 |fNts= still needs to be added to {{head}}; I just tried to use it to update the formatting of 𒍣. Then it can be documented there. Fay Freak (talk) 15:54, 3 August 2018 (UTC)

That was decided against in favor of simply using |ts=foo, bar --Victar (talk) 17:35, 3 August 2018 (UTC)

I don’t understand what you are telling me, @Victar. Is |fNtr= somehow an obsolete parameter? As it looks on 𒍣, it is used for differently written inflected forms that need to be added to the headword which need to be transliterated and transcribed separately, not for alternative transcriptions. Fay Freak (talk) 19:33, 3 August 2018 (UTC)

Does this wiki have a policy on how spoken-only (unwritten) languages should be collected? Must their entries be made in Latin script, in IPA, or not at all? --Octahedron80 (talk) 20:42, 20 February 2018 (UTC)

You should follow the conventions of any published materials that exist, such as scholarly papers or dictionaries. DTLHS (talk) 20:55, 20 February 2018 (UTC)

(e/c) We have to select an orthography for unwritten languages, and document it on the relevant About page. If only linguists have documented it, and those linguists use IPA, then it may be most appropriate to follow their lead and add entries in IPA. If they have a working orthography (maybe Latin script, to make documentation work easier), or if any orthography is being taught to the native speakers (maybe using a modification of Thai script if this is a regional language in Thailand), that should be selected instead. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)

I dropped in to look around. Glad it's got a way to join without "signing up" or installing anything, but I do feel that a free open project like Wiktionary should use free open tech like IRC, not a closed-source thing that may require a specific client, or be restricted by the makers in future (this does happen, e.g. LogMeIn). Equinox◑ 21:42, 20 February 2018 (UTC)

Discord is nearly malware in my view, it is almost impossible to remove once you have installed it on PC. IRC, with all of its problems, is my preference. - TheDaveRoss 21:49, 20 February 2018 (UTC)

Yeah, before someone accuses me of being a Luddite who hates graphics, sounds, etc.: I'd be fine with those things built optionally on top of free open tech. (Imagine the uproar if Wikimedia replaced e-mail as a contact medium with Facebook. Heh.) I'm gonna go on IRC RIGHT NOW and raise a ruckus <3 Equinox◑ 21:57, 20 February 2018 (UTC)

Dave, if you're afraid of installing it, you can use Discord in your browser. Eq, I can't remember whether you hate xkcd or not, so have this. —Μετάknowledgediscuss/deeds 00:06, 21 February 2018 (UTC)

That comic accuses an open-source fan of being smug and autistic for wanting free open tech. I am the exact opposite of that, which is why I dislike xkcd, which is reliably smug and autistic. The comic doesn't even have a point. Huh! (P.S. The Wiktionary IRC is still pretty good, though I go there less than once a month and see the same four or five faces. Smart faces. Heheh.) Equinox◑ 00:42, 21 February 2018 (UTC)

I would suggest that a character in the comic makes those accusations and that character is the butt of the joke, but that isn't really important. Also, it would be great if more people used the IRC if only on a semi-regular basis like Equinox. I think the Wiktionary community was closest back when a good chunk of regular contributors (20%?) were frequently able to engage in casual conversation. - TheDaveRoss 13:05, 22 February 2018 (UTC)

"Esperanza was a Wikipedia project founded on 12 August 2005..." Equinox◑ 00:41, 21 February 2018 (UTC)

That's a really weird story, but I suppose it works as a cautionary tale too. —AryamanA(मुझसे बात करें • योगदान) 22:18, 21 February 2018 (UTC)

There are rather a few self-hosting alternatives, w:Mumble (software) and w:TeamSpeak (proprietary iirc, but freeware) being two off the top of my head. (Also, Nextcloud's Talk, FLOSS implementation of Spreed, allows video conferencing - but my server is bandwidth limited to 9Mbs) The problem with Discord is the time-worn observation that if anything on the internet is free, you are the product being sold. Quite a few of us maintain servers of varying abilities on the internet, which we prefer to "free" services. - Amgine/t·e 04:26, 21 February 2018 (UTC)

I doubt many, if any, project members are going to use the Discord voice chat feature, but it makes for a good chatroom software. I'm perfectly fine with them selling everything I write, just as anyone has open access to what I write in public discussions here. It makes no difference to me. --Victar (talk) 22:40, 2 March 2018 (UTC)

Is there a reason why we don't separate English thesaurus entries from other languages? E.g. thesaurus:die and thesaurus:死亡 are put in the same category. Seems strange to me. ---> Tooironic (talk) 04:38, 21 February 2018 (UTC)

This was discussed several times (for example here, see Dan Polansky's vote and the talk page; or here), but no solution has been reached yet. --Per utramque cavernam (talk) 15:22, 21 February 2018 (UTC)

The stance on their inclusion in Biblical canons varies across definitions. Baruch and 1 and 2 Maccabees only mention being apocryphal, Sirach and Wisdom don't mention it at all, Tobit only mentions being in the Catholic canon, not Eastern Orthodox ones, and only Judith actually seems unbiased. I propose using the definition given for Tobit for all seven, but with the additional mention of the Eastern Orthodox. (Sirach is especially interesting, because the synonym Ecclesiasticus mentions some groups not considering it canonical)

As a tangential note, I also updated Appendix:Books of the Bible to include them in the Catholic canon listed. That one was also odd because even though our listed source, Catholic Online, includes them now, I checked the Wayback Machine, and it didn't when the list was previously said to have been retrieved.

Be bold and have at it! :) And I suggest you link to this discussion in your edit summaries so that anyone who wants to propose some other wording can do so in this central discussion. - -sche(discuss) 19:06, 23 February 2018 (UTC)

You may have heard already that the Wikidata people are very interested in the Wiktionaries' data. They are now at the stage of creating a dedicated Lexeme namespace in Wikidata. Lydia, who is in charge of this project, has called for a vote on the licensing of this new namespace. I think we Wiktionarians are concerned by this vote, because it may change the kind of connections we can make between the Wiktionaries and Wikidata. Lydia only offered arguments for CC0, but there are many arguments against it as well. I summed up some of them there, but I call on your expertise and judgment on this matter. I think it is not so much on the legal side as on the psychological and ethical aspects that we can offer a different perspective, since we are, and we know, people who have lexicographical data to share and people who reuse the Wiktionaries' data.

I think we need to imagine some possible scenarios, because they may have worked some out but they didn't share the potential consequences of each possibility, and I am quite worried about their agenda. With this in mind, the Wikidata team asked for a Wikilegal note about lexicographical data, but it is a draft that needs to be substantially improved, as so far it doesn't cover some fundamental aspects of the Wiktionaries. Your comments on this essay are welcome too.

Well, sorry if you feel this doesn't concern you. I think it can't hurt to know more, so that we are able to collaborate rather than be notified of an undesired change too late. Noé 08:35, 23 February 2018 (UTC)

I'm curious how this will play out in practice. I'm all pro-sharing and making data available as widely as possible, but this basically means that Wikidata has to start from scratch, and that the collaboration between the projects will always be complicated (at least in one direction; taking data from Wikidata is fine). But if I write a bot to update Wikidata items from Wiktionary I would technically violate licensing terms. – Jberkel 10:24, 23 February 2018 (UTC)

For the last part, yes, and I am curious to know how they will prevent violations of the SA in CC BY-SA. As for the first part, having some pieces of a page written under CC0 but displayed as CC BY-SA may also be considered copyfraud, I think. So the two projects may end up independent and not compatible in any way. Strange. Noé 12:47, 23 February 2018 (UTC)

There would be no issue with using CC0 content within any other context. - TheDaveRoss 14:10, 23 February 2018 (UTC)

True, if there were scrupulous curation of the data that aimed at not including infringing material. But so far Wikidata is just making massive imports regardless of the license of the source. They didn't go as far as dumping attentive communities like OSM, but they seem reckless about massive extraction from miscellaneous Wikipedias, for example. This casts doubt on the legality of the whole database, which propagates to any project using it. --Psychoslave (talk) 16:12, 27 February 2018 (UTC)

I am rather concerned by this proposal, but also how all these people interested in lexicographical data have not even bothered to engage with the bigger Wiktionaries, where lexicographical data are handled. @Lydia Pintscher (WMDE), can you explain this? —Μετάknowledgediscuss/deeds 22:11, 23 February 2018 (UTC)

@Pigsonthewing: Léa has not edited here since September. Wikidata is a different wiki, and does not count as engaging with this community. We are not informed of what goes on. —Μετάknowledgediscuss/deeds 20:31, 27 February 2018 (UTC)

I think it's sad that licensing issues led to this situation, but I don't know the best way out of it.

Question: do contributors to Wikimedia projects have the rights to republish their contributions under a more permissive license? If some users think that it is acceptable (or even preferable) that their work is published under CC0, that should slightly reduce the issue of duplication.

The consequences of this proposal for the wiktionaries if it goes through, and of the introduction of lexicographic data on Wikidata more generally, are difficult to predict. Some amount of time that users would otherwise spend working on the wiktionaries locally will likely be lost through them working on Wikidata instead. On the other hand, a lot of work might become redundant locally through the increased efficiency that centralisation of data in theory is capable of providing. Furthermore, some users who would otherwise work little or not at all with lexicographic data on the wiktionaries could end up working a lot with such data on Wikidata because that format appeals more to them (and they could also end up doing more work (or indeed any at all) directly on the wiktionaries as a consequence of this).

The relative strengths of such effects are difficult to predict, and hence whether the introduction of lexicographic data on Wikidata will have a net positive or net negative effect on the wiktionaries. --Njardarlogar (talk) 11:21, 25 February 2018 (UTC)

My 2 cents. In the same way that you can import public domain data to Wiktionary under CC-BY-SA, there is no problem importing CC0 data from Wikidata. The other way round, you cannot republish CC-BY-SA data under CC0. The underlying question is that facts are not copyrightable. So, what is a fact and what is a creative work in Wiktionary? As far as I understood, Wikidata will import facts such as the information in the heading line and lexical categories, but not definitions at all. Other data, such as pronunciation, may be borderline. --Vriullop (talk) 17:02, 27 February 2018 (UTC)

When should we mark a term with plurale tantum as opposed to using {{en-plural noun}} (which produces the gloss "plural only")? I tend to prefer the latter as it avoids jargon. Is there a real difference? Equinox◑ 13:13, 24 February 2018 (UTC)

I don't think there is a real difference. For English, at least, our users would probably prefer "plural only", which doesn't need much explanation. DCDuring (talk) 16:32, 24 February 2018 (UTC)

I would say "plurale tantum" should be automatically converted to "plural only". Andrew Sheedy (talk) 19:08, 24 February 2018 (UTC)

Yes, whichever wording we decide on should be displayed by both templates/values (templates should accept one as an alias of the other). This problem has been noted for ten years, by the way! - -sche(discuss) 23:09, 24 February 2018 (UTC)

I have changed it to "plural only" in all the modules I could find, and will now look at entries. (Ideally, all templates/modules that currently accept one should be made to accept both as input, and just display plural only for both.) After ten years, let's finally fix this! - -sche(discuss) 22:37, 12 March 2018 (UTC)
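The aliasing idea (accept either wording as input, always display "plural only") could look something like the following Python sketch. The mapping and the function name `display_label` are hypothetical and only illustrate the approach; the actual templates and modules are written in wikitext and Lua.

```python
# Hypothetical alias table: every accepted input maps to one display form.
DISPLAY_LABELS = {
    "plurale tantum": "plural only",
    "plural only": "plural only",
}


def display_label(label):
    """Return the canonical display text for a grammatical label."""
    key = label.strip().lower()
    # Unknown labels are passed through unchanged.
    return DISPLAY_LABELS.get(key, label)
```

Keeping the aliases in one table means the displayed wording can later be changed in a single place.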

Inconsistent and confusing romanisation formats given by various templates and modules

Example: русский(russkij), where about half of the romanisations are italic, and half are not.

Something I have wondered for a long time ― why is there a need to format romanisations differently in {{l}}, {{m}} and {{head}}? Why not italicise romanisations by default? And, is it really necessary to have both {{l}} and {{m}} for languages written in scripts not affected by italicisation? Wyang (talk) 00:30, 25 February 2018 (UTC)

@Wyang: I believe the notion is that for scripts which we don't italicize in mentions (Russian, Greek, etc.), we italicize the romanization to show the distinction between the mention and non-mention formats. Was it @Erutuon who implemented this? —*i̯óh₁n̥C[5] 00:46, 25 February 2018 (UTC)

@JohnC5: I think transliteration was italicized before I started messing with stuff. I just added extra classes to transliteration so that it could be located by CSS and JavaScript. — Eru·tuon 02:28, 25 February 2018 (UTC)

I think {{l}} and {{m}} have been generating differently formatted romanisations like this for quite some time, although I never really understood why romanisations are unitalicised in {{l}} and {{head}}. It is inconsistent and looks unprofessional on entries, when romanisations are differently formatted, some are rússkij and some are rússkij. Wyang (talk) 02:42, 25 February 2018 (UTC)

I'll chime in from a Japanese-entry editor standpoint to state that I agree that the difference is weird, and I'd prefer it if {{l}}, {{m}}, and {{head}} were aligned to show romanizations in italics. ‑‑ Eiríkr Útlendi │Tala við mig 04:35, 25 February 2018 (UTC)

The way in which we reconstruct PIE's morphology is anachronistic. There is a general consensus that Anatolian left early, and that many features of traditional reconstructions are really post-PIE innovations, such as the feminine gender, the optative & subjunctive moods, the reconstructed dative and ablative plurals, dual number(?). I think we should update our terminology, and reconstruct PIE in two stages. One would be the closest common ancestor of all IE languages excluding Anatolian (Proto-Nuclear-Indo-European), and the other would be the common ancestor of PNIE and PAnatolian (Proto-Indo-European). This would mean that we would need to move all PIE pages to PNIE, and remove the Anatolian descendants and place them under cognates or add them to the etymology section. I realize that this idea is probably going to face a lot of opposition, but I believe it's necessary if we want to accurately represent PIE. What do you think? @Rua, JohnC5, Victar, AryamanA, Mahagaja, Chuck Entz. --Tom 144 (𒄩𒇻𒅗𒀸) 01:07, 25 February 2018 (UTC)

I support this, having proposed this before. —*i̯óh₁n̥C[5] 01:46, 25 February 2018 (UTC)

Right, I forgot to credit you and to link to the original discussion.--Tom 144 (𒄩𒇻𒅗𒀸) 02:13, 25 February 2018 (UTC)

I oppose moving all of our PIE material to "PNIE", that's just ridiculous. I encourage you to add the relevant information to our entries, perhaps under the Reconstruction sub-heading, and generally think of reconstruction pages as places to organize our current knowledge about an etymon rather than absolutely final representations of words as they really were.

If this information really is that huge and incompatible with our current PIE entries, then I'd approve of adding an Indo-Hittite (I don't like this term either) language to host it. 03:22, 25 February 2018 (UTC) —This unsigned comment was added by Crom daba (talk • contribs).

I think that would be quite a drastic move, even if it is more accurate. Maybe an Indo-Hittite language would be better for the common ancestor of PIE and PAnatolian. Also pinging @माधवपंडित who is more knowledgeable in PIE (and Old Indo-Aryan) than me. —AryamanA(मुझसे बात करें • योगदान) 03:38, 25 February 2018 (UTC)

I proposed this somewhere, but the term Indo-Hittite isn't very popular. Maybe we could work around this using synonymous terminology such as Early-PIE and late-PIE, but this terminology is generally used by revival sites, and after reading about them it is difficult to take them seriously. Proto-Indo-Anatolian does not sound better. Certainly the most aesthetic solution is to redirect everything. --Tom 144 (𒄩𒇻𒅗𒀸) 03:57, 25 February 2018 (UTC)

Like @AryamanA said, this is a very big change. Because of the human tendency to resist change, this idea is alarming to me right now (moving most PIE entries to a new and comparatively unheard of language, PNIE!), but this is not something I cannot get behind. If the literature confirms this, we can implement this change. Wiktionary should strive to be correct and consistent so that in the future people can refer to and rely on Wiktionary's information rather than having to consult various sources. -- माधवपंडित (talk) 05:29, 25 February 2018 (UTC)

@माधवपंडित: I'll leave these quotes from respected authors that support the view:

"Interestingly, there is by now a general consensus among Indo-Europeanists that the Anatolian subfamily is, in effect, one half of the IE family, all the other subgroups together forming the other half; and it is beginning to appear that within the non-Anatolian subgroup, Tocharian is the outlier against all other subgroups." – Ringe, Don (2006) From Proto-Indo-European to Proto-Germanic, Oxford University Press, page 5

"If we compare the New Zealand tree of IE with the Pennsylvania tree, we see that they share some fundamentals on the interrelationship of the IE languages. In both models, the first split in the tree is between the Anatolian group of languages and all the others, and the second is between Tocharian and the rest of the family. This is in accordance with the views of the majority of Indo-Europeanists at present. Anatolian is radically different from the rest of the family in many respects..." – Clackson, James (2007) Indo-European Linguistics: An Introduction (Cambridge Textbooks in Linguistics), Cambridge: Cambridge University Press, page 13

"Support for the Indo-Hittite scenario (sometimes under a different name) has increased in recent years (since 1995). There is a growing body of evidence which is best explained on the assumption that Proto-Anatolian did not share all the common changes which characterize the other IE languages." – Beekes, Robert S. P. (2011) Comparative Indo-European Linguistics: An Introduction, revised and corrected by Michiel de Vaan, 2nd edition, Amsterdam, Philadelphia: John Benjamins Publishing Company, page 31

"The ‘Indo-Hittite hypothesis’ has been much discussed over the years, even resulting in a monograph (Zeilfelder 2001). Although at first scholars were sceptical, in the last decade it seems as if a concensus is being reached that the Anatolian branch indeed was the first one to split off of the Proto-Indo-European language community." – Kloekhorst, Alwin (2008) Etymological Dictionary of the Hittite Inherited Lexicon (Leiden Indo-European Etymological Dictionary Series; 5), Leiden, Boston: Brill, →ISBN, page 22

"But evidence has been growing that Anatolian split off at a time when the development of some of these categories (such as the s-aorist) was only nascent." – Fortson, Benjamin W. (2004) Indo-European Language and Culture: An Introduction, first edition, Oxford: Blackwell, page 155

Experts from all schools of thought and all Universities tend to agree on this topic. I do not know any respectable source that argues against this.--Tom 144 (𒄩𒇻𒅗𒀸) 15:00, 25 February 2018 (UTC)

Are we going to be able to say anything different in the new framework that we couldn't before, or are we just putting a new (albeit more accurate) label on the same can? What PIE content do we have that's identifiably not NPIE? After all, we can't call anything IE if it's not attested in some form in a descendant of NPIE, so if NPIE is an abstraction, then PIE is an abstraction of an abstraction.

A change that makes what all the easily-accessible sources call PIE different from what we call PIE is going to require extra effort to avoid confusion. I'm not saying we shouldn't do it, but we need to be sure we're clear about what we're doing, and why we're doing it. Chuck Entz (talk) 06:10, 25 February 2018 (UTC)

@Chuck Entz: It's not really about what PIE has that PNIE does not, but rather the other way around. PNIE has a feminine gender, an optative, a subjunctive, a well-formed dual, simple thematic verbs, a perfect, adjectives in *-to-, and comparatives in *-yos-, while PIE doesn't have any of those things, and some of them had different meanings (such as the perfect < stative). A whole bunch of suffixes and word formations are not reconstructible for PIE.

Those are the issues concerning morphology, but we also have the lexicon. Not all etymons reconstructed in Wiktionary have descendants in Anatolian. As a rule of thumb, we shouldn't extrapolate a secure PNIE reconstruction to PIE, even if it has a morphologically archaic look. For example, the word "*udōr" ~ "*udnés" is reconstructible for PNIE with a "plural" collective in "-eh₂". If we assumed that, because r/n stems are archaic, we can extend it to PIE, we would be mistaken, because the Anatolian evidence actually shows that "*udōr" was instead the collective of *wódr̥. We should only reconstruct something for PIE when we can do it for PAnatolian and PNIE too.

There shouldn't be any confusion since we are not changing the definition of PIE. It is still the closest common ancestor of all IE languages including Anatolian. We are just adding the extra term PNIE for reconstructions that are not suitable for Anatolian, which happen to be the majority.

I understand skepticism, honestly I thought this would get a lot more opposition. But I believe this will have to be fixed eventually, we cannot just simply ignore the contradictions in our reconstructions --Tom 144 (𒄩𒇻𒅗𒀸) 16:04, 25 February 2018 (UTC)
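The rule of thumb stated above (reconstruct for PIE only when both Anatolian and non-Anatolian evidence exist) can be written out as a small decision procedure. The following Python sketch is purely illustrative: the function name `reconstruction_level`, the branch names and the returned labels are hypothetical, and it deliberately ignores complications such as accidental gaps in attestation.

```python
ANATOLIAN = "Anatolian"


def reconstruction_level(branches):
    """Suggest the deepest level an etymon could be labelled with.

    `branches` is the collection of IE branches with attested reflexes.
    """
    branches = set(branches)
    has_anatolian = ANATOLIAN in branches
    has_other = bool(branches - {ANATOLIAN})
    if has_anatolian and has_other:
        return "PIE"  # reflexes on both sides of the first split
    if has_other:
        return "PNIE"  # no Anatolian evidence
    if has_anatolian:
        return "Proto-Anatolian"  # Anatolian evidence only
    return "unattested"


print(reconstruction_level({"Germanic", "Italic", "Tocharian"}))
```

The same classification could drive a label on existing entries instead of a page move, since it only needs the list of descendants already recorded for each etymon.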

I'm not skeptical. I'm well aware of all that. The main question is whether we need to rework our basic structure in response rather than noting it in the entries where relevant. The point about PNIE content that doesn't apply to Anatolian is a good one, though. Chuck Entz (talk) 16:52, 25 February 2018 (UTC)

As to whether we're changing the definition of PIE: we're changing the definition of the part of PIE that most people are interested in. I also don't look forward to our having to go through all the etymologies and decide whether a reconstruction is PIE s.l. or NPIE. It seems to me like incorporating the distinction into our structure will force us into making that judgment in order to avoid misrepresentation. We're going to have to decide whether w:Schroedinger's cat is alive or dead, rather than leaving it conveniently undefined. Chuck Entz (talk) 17:10, 25 February 2018 (UTC) Update: The question I asked last night basically eliminates this concern: if everything we have inherently includes NPIE by virtue of requiring presence in some form in NPIE to be IE, there's no need to decide whether it's also PIE in the broader sense. Chuck Entz (talk) 21:30, 25 February 2018 (UTC)

This does seem like the sort of thing that would be easier to handle with labels. @Florian Blaschke may wish to comment; when I googled the phrase "Nuclear Indo-European" one of the results was him opining on Wikipedia that "There is no clear evidence for a 'Nuclear Indo-European' excluding Anatolian, either; instead, Anatolian can simply be a peripheral branch that became isolated early on and did not take part in some developments common to all or most other languages, without those forming a monophylum, but instead a dialect continuum or language area like early medieval Common Slavic or Old High German through which innovations could still spread (and instead, Anatolian went through innovations of its own)." - -sche(discuss) 17:41, 25 February 2018 (UTC)

@-sche: Mmm… I don't understand how this contradicts the existence of PNIE. Pooth has some ideas that may be similar, but easier to understand. He argues that during the period when the PIE branches were splitting, they still had some common innovations. He calls this Vulgar-PIE; it's not the closest common ancestor of anything, but a "dialectal continuum". Of course, he makes this assumption to account for the absence in any PIE branch of an ergative alignment, a partitive in "*-ém", an allative in "-m", an absolutive plural in "-e", a locative plural in "-is", an allative plural in "*-ms", a sociative-associative plural in "-eh₁", and many other weird things he reconstructs. In other words, he does not believe this because he has very good arguments that support his view, but because he needs to assume this so his reconstructions don't fall apart. It wouldn't be too crazy to say that he is biased. The dialectal continuum is necessary for PNIE's subgrouping after the split of Tocharian. We know this because common innovations such as satemization and centumization, double dentals, northern replacement of *bh by *m, augment, and others cannot be traced back to a tree model. But unsurprisingly, PAnatolian does not figure in those issues. PAnatolian has the triple reflex, has neither *-bh- nor *-m-, and shows no trace of an augment. There are plenty of PNIE common innovations that have no trace in Anatolian. As I showed in the above citations, this isn't a controversial issue; there is a wide consensus on this topic.--Tom 144 (𒄩𒇻𒅗𒀸) 19:45, 25 February 2018 (UTC)

I don't see what's so hard to understand about that. Anatolian was definitely spoken in Asia Minor throughout the 2nd mill. BC and most probably in the 3rd mill. BC too (at least that's what most people seem to assume nowadays, who might even go back further, into the 4th mill. BC). In the Corded Ware period, Indo-European must basically have been a vast dialect continuum centred on Europe (Tocharian is thought to have been isolated in Asia early too, as early as c. 3000 BC, prior to the expansion of Indo-Iranian out of Eastern Europe). Innovations can spread through such a continuum. Anatolian was geographically isolated, separated from this continuum by the Black Sea, the Sea of Marmara and the Aegean, so you would not expect it to participate in common Nuclear IE innovations. There is no clear evidence for a "Proto-Nuclear-IE", and when Anatolian (and Tocharian) split off, dialectal divisions within the remainder, NIE, can already have existed, such as the isogloss middle endings in *-y (Indo-Iranian, Greek, Germanic) vs. middle endings in *-r (Italic, Celtic, Tocharian, Phrygian), where Anatolian plainly belongs to the second group. In fact, the *-bʰ-/*-m- story is not so clear; there is evidence for both inside Anatolian too, and in fact inside other branches, which contradicts the idea of a simple ancient isogloss separating the "m-branches" Germanic and Balto-Slavic from the rest, and there is an alternative way to account for the alternation (see here, jump to ch. 3 starting on p. 178).

Consider a parallel: Old Icelandic was closest to Old West Norwegian, Old Norse already exhibiting clear dialectal divisions at the period of our earliest literary attestations (12th century), but Icelandic has been largely isolated from the mainland languages since after the high medieval period, and thus did not participate in the changes of the late medieval period that saw the strong influence (mostly but not exclusively lexical) from Middle Low German in Mainland Scandinavian and an erosion of word endings and morphology, as well as common phonological developments. So Mainland Scandinavian now superficially looks like it descends from a "Proto-Mainland-Scandinavian" and Icelandic separated before, but this division is not the historical truth, as we know. --Florian Blaschke (talk) 03:45, 26 February 2018 (UTC)

Also, even if you do decide to believe in Indo-Hittite sensu stricto (in the absence of compelling evidence for it, it's better to default to treating Anatolian like any other primary branch), Anatolian is poorly attested outside of Hittite, and even the Hittite evidence is often sketchy, obfuscated by the script. Various lexemes are suspected to be hidden under Akkadograms and Sumerograms. Many numerals attested everywhere else, even in Tocharian, are missing, but sometimes indirect evidence is found that may however be debated. I don't think you'll find much enthusiasm for scrapping the numerals. Relying that strongly on the absence of evidence in Anatolian is a terrible idea. Just because something is missing in Anatolian does not mean that PIE never had it; accidental failure of attestation, or loss, is always possible.

It's a lot like Proto-Germanic. Many lexemes well-attested in West Germanic, and often also in North Germanic, are missing in Gothic (including Crimean Gothic), but there is frequently no other, more compelling or inherent reason not to reconstruct them for Proto-Germanic. Earlier generations of scholars refused to reconstruct any lexeme for Proto-Germanic not attested in Gothic, but Kroonen has no such compunctions; he does reconstruct even many lexemes limited to West Germanic for Proto-Germanic, and so do we. We even reconstruct the instrumental as a noun case for Proto-Germanic, even though Gothic lacks it (Old High German has it), so even grammatical categories are affected! This is a methodological decision, but it makes sense (new evidence keeps cropping up, and not limiting reconstructions unreasonably helps). Although it is generally accepted now that Gothic is the first split from Germanic, its attestation is limited. Gothic may never have had the instrumental as a noun case, or at least some of these lexemes, but we don't care. It could have lost them; it happens. Methodological fundamentalists may object to this practice, but Wiktionary has taken a stand already and sided with the younger scholars who view the issue differently, preferring to err on the side of inclusivity. --Florian Blaschke (talk) 04:16, 26 February 2018 (UTC)

I agree with Chuck that you seem to be missing the point, Tom. You continue to muster more evidence, but we're talking about how to build an effective dictionary. For the time being, people will look for this information under PIE, and we have a pretty good infrastructure for keeping it as PIE. That means that the wisest course of action is probably finding some way to mark entries (with a context label, or maybe a custom template?) so that readers know to what level a reconstructed word or morphological feature can actually be reconstructed. —Μετάknowledgediscuss/deeds 20:04, 25 February 2018 (UTC)

If we do decide to keep a unified PIE, I think we should definitely include a disclaimer that what we call PIE is really NPIE unless we specifically say otherwise, and that applies especially to inflection sections. This reminds me a lot of Ancient Greek, where the Attic dialect, which had several significant (though far less so than in NPIE) innovations not found in the other dialects, eclipsed all the others during the Hellenistic period. We don't call Koine and Byzantine Greek "Nuclear" or "Attic" by name, even though they clearly are. Perhaps we should mark the NPIE-specific morphology the way we do the Attic declensions and other Attic-specific inflectional morphology. Chuck Entz (talk) 21:30, 25 February 2018 (UTC)

Well, PIE entries are all listed under one and the same reconstruction page. We could separate them under different headings and, of course, add a disclaimer or something similar clarifying the different reconstructed stages. That way we can include both PIE and PNIE without changing much of the current infrastructure. Since all PIE pages would presuppose a PNIE page, the obvious solution is to keep the PNIE form as the lemma. That way we wouldn't need to redirect anything, and we would present both stages whenever possible in a consistent way. If the heading solution isn't supported, then I would obviously support the solution analogous to our treatment of the Greek dialects. --Tom 144 (𒄩𒇻𒅗𒀸) 21:53, 25 February 2018 (UTC)

I don't know a whole lot about this topic, but I would oppose introducing either PNIE or PIH as a new proto-language. I think we can say everything we need to say with usage notes and labels while retaining ine-pro as the only IE ancestor language. —Mahāgaja(formerly Angr) · talk 07:26, 26 February 2018 (UTC)

Should I start a vote about the usage of labels then? --Tom 144 (𒄩𒇻𒅗𒀸) 22:50, 2 March 2018 (UTC)

I believe Florian Blaschke pointed out the difficulty involved in attaching a label. There already is a disclaimer on every PIE page linking to Wiktionary:About_Proto-Indo-European, which could use such information (I believe the template understates the uncertainty of the reconstructions, too).

Ideally, the list of descendants should already show whether Hittite or anything else is included under the root. You would only double your work if additional labels had to be added. In one way, the appendix is just a way to collect cognates under a root. There is also the perspective that the roots represent parts of PIE that might have been spoken that way for real. However, given the variability of language as we know it, it would be fantastic to have a language as consistent as implied, over thousands of years -- so it should be immediately clear to everyone with more than a passing interest that the *PIE label is highly, let's say, probabilistic. A PNIE label might only add to the confusion (read: my confusion, for what it's worth, if nothing else, etc. p. p.). Rhyminreason (talk) 03:30, 29 March 2018 (UTC)

Should there be any standardization on how these entry names should be formatted? — justin(r)leung{ (t...) | c=› } 05:12, 26 February 2018 (UTC)

Standardization seems like a good idea to me. —suzukaze (t・c) 21:10, 27 February 2018 (UTC)

I think we should strive to use the dash-separated full names whenever possible; it is pretty much the standard practice in education. Wyang (talk) 23:25, 27 February 2018 (UTC)

I would suggest redirects from whichever format isn't used (generally or for any particular/exceptional entries) to whichever format is used, if both formats are attested. - -sche(discuss) 00:49, 28 February 2018 (UTC)

In April I will go to Berlin to attend the Wikimedia Conference as Wiktionary "representative". I put the term in quotes because I understand that not everybody, myself included, likes this terminology, as I obviously can't represent our whole diversity. Nonetheless, I would like to be as representative of the Wiktionary community as I can, so I come here to ask for your feedback. It's important to me to go there confident that I have gathered a decent overview of the Wiktionary community's goals, issues and needs, as well as any points that are agreed by consensus to be important.

There is already a set of questions on Meta to prepare for the conference. What messages do you want to see passed on there? What information do you want to get? You can reply to that set of questions on Meta, but you are also welcome to reply freely here; I'll do my best to work with you.

Please spread the word in any Wiktionary where you are active, or even where you can simply translate this message; translated feedback would also be extremely welcome input.

Be bold with notifications anytime you think there is something I should read regarding this topic, and feel free to ask me anything you want. --Psychoslave (talk) 16:44, 27 February 2018 (UTC)

Anybody want to have a little bit of fun on the side? I started a subpage of my user page that anyone can edit. I'm attempting to document Wiktionary username etymologies. User:PseudoSkull/Etymologies of usernames Please edit this page with any further information you know. You can put yourself on the page, too, if I forgot to add you, and do not feel hesitant to edit it. This is actually something I'm always curious about. I'm good at memorizing people's usernames, but sometimes I really wonder hard where those usernames came from. PseudoSkull (talk) 20:34, 27 February 2018 (UTC)

The latter is currently a subcat of the former, but this is a mistake; prepositional phrases ≠ phrasal prepositions. So if nobody objects, I'm going to remove this. --Per utramque cavernam (talk) 11:50, 28 February 2018 (UTC)

@Per utramque cavernam: It might be best to keep it in the category so that it can be easily found, but to make the breadcrumbs show "Phrases" instead. — Eru·tuon 18:51, 28 February 2018 (UTC)

I know that Google is not an acceptable reference (as per WT:REF), but could we use the number of Google results as a kind of unofficial criterion? For example, "exsercize" (for some reason how I spell exercise) has only 856 results on Google, but "excercise" has 26.8 million, so I think it's obvious that "excercise" should be included but not "exsercize". If this seems acceptable, we could compare "misspelling" to "accepted spelling" and come up with a ratio maximum (minimum?). For example, zymography:zimography is about 197:1 based on Google searches, so it seems acceptable to include despite zimography only having 1,560 results.

Of course, this is not a perfect method, as a lot of misspelling results are just mentions, and for misspellings that are words in their own right, this method is useless (e.g. collage for college, homophones), and misspellings could be proper spellings of terms not in Wiktionary (e.g. zuchetto; "zuchetto" returns many surnames). But I think it gives a good rough estimate of how widely used misspellings are. – Gormflaith (talk) 15:57, 28 February 2018 (UTC)
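
To make the ratio idea above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the function names, the cutoff of 200, and the accepted-spelling count of 307,320, which is simply back-computed from the 197:1 ratio and the 1,560 hits quoted above rather than taken from an actual search. It says nothing about whether raw hit counts are reliable in the first place.

```python
def spelling_ratio(accepted_hits: int, misspelling_hits: int) -> float:
    """Return how many accepted-spelling hits there are per misspelling hit."""
    if misspelling_hits <= 0:
        raise ValueError("misspelling_hits must be positive")
    return accepted_hits / misspelling_hits


def is_common_enough(accepted_hits: int, misspelling_hits: int,
                     max_ratio: float) -> bool:
    """True if the misspelling is frequent enough relative to the accepted form.

    max_ratio is a hypothetical cutoff; no such value has been agreed on.
    """
    return spelling_ratio(accepted_hits, misspelling_hits) <= max_ratio


# Hypothetical figures: 1,560 hits for "zimography" (quoted above) and an
# accepted-spelling count back-computed from the quoted 197:1 ratio.
zymography_hits = 197 * 1560
zimography_hits = 1560

ratio = spelling_ratio(zymography_hits, zimography_hits)
include = is_common_enough(zymography_hits, zimography_hits, max_ratio=200)
```

With a cutoff of 200 the 197:1 pair would pass; with a cutoff of 100 it would not, which is why the choice of maximum matters more than the arithmetic.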

No, the "number" of Google results is utter bullshit and shouldn't be used for anything. Google Books ngrams are more useful. DTLHS (talk) 19:02, 28 February 2018 (UTC)

Just try a search that claims that there are, say, a thousand hits on Google Books. Then page through the results as quickly as you can. I just did it for the word figpecker. (Don't ask.) The first page said 4,000 hits, the last page said 430. Sometimes the results are not as dramatic and sometimes much more. DCDuring (talk) 02:37, 1 March 2018 (UTC)

Unfortunately, even apart from the result-counting problems, there are serious numbers of content-free spam pages that copy each other, especially ones that use rare words to try to attract specialised searchers. Equinox◑ 10:35, 2 March 2018 (UTC)