<h1>J. K. Tauber: at the intersection of computing, linguistics, philology, and learning science</h1>
<h1><a href="http://jktauber.com/2019/04/30/tour-greek-morphology-part-28/">A Tour of Greek Morphology: Part 28</a></h1>
<p>2019-04-30 &middot; James Tauber</p>
<p>Part twenty-eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In this post, we look systematically at the imperfect active distinguishers in much the same way as we did the present active distinguishers in <a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">Part 13</a>.</p>
<p>Before we summarise all the distinguisher paradigms we&rsquo;ve seen so far, there are actually three forms in the SBLGNT not covered yet: εἰσῄει, παρῆσαν, and συνῆσαν (all in Luke/Acts). εἰσῄει is from εἰς+εἶμι (making it a compound of <strong>IA-11</strong>) and παρῆσαν is παρά+εἰμί (making it a compound of <strong>IA-10</strong>). In our text, συνῆσαν is from σύν+εἰμί but <em>could</em> be from σύν+εἶμι. Either way, for completeness we need to add <strong>IA-10-COMP</strong> and <strong>IA-11-COMP</strong>.</p>
<p>So with those, here are all the imperfect active distinguisher paradigms we&rsquo;ve discussed:</p>
<table class="table table-condensed table-bordered">
<tr><th> <th>IA-1 <th>IA-2 <th>IA-3 <th>IA-4 <th>IA-5 </tr>
<tr><th>1SG <td>Xον <td>Xουν <td>Xουν <td>Xων <td>Xων </tr>
<tr><th>2SG <td>Xες <td>Xεις <td>Xους <td>Xᾱς <td>Xης </tr>
<tr><th>3SG <td>Xε(ν) <td>Xει <td>Xου <td>Xᾱ <td>Xη </tr>
<tr><th>1PL <td>Xομεν <td>Xοῦμεν <td>Xοῦμεν <td>Xῶμεν <td>Xῶμεν </tr>
<tr><th>2PL <td>Xετε <td>Xεῖτε <td>Xοῦτε <td>Xᾶτε <td>Xῆτε </tr>
<tr><th>3PL <td>Xον <td>Xουν <td>Xουν <td>Xων <td>Xων </tr>
</table>
<table class="table table-condensed table-bordered">
<tr><th> <th>IA-6 <th>IA-7 <th>IA-8 <th>IA-9 <th>IA-9b </tr>
<tr><th>1SG <td>Xῡν <td>Xην/Xειν <td>Xουν <td>Xην <td>Xην </tr>
<tr><th>2SG <td>Xῡς <td>Xεις <td>Xους <td>Xης <td>Xης/Xησθα </tr>
<tr><th>3SG <td>Xῡ <td>Xει <td>Xου <td>Xη <td>Xη </tr>
<tr><th>1PL <td>Xυμεν <td>Xεμεν <td>Xομεν <td>Xαμεν <td>Xαμεν </tr>
<tr><th>2PL <td>Xυτε <td>Xετε <td>Xοτε <td>Xατε <td>Xατε </tr>
<tr><th>3PL <td>Xυσαν <td>Xεσαν <td>Xοσαν <td>Xασαν <td>Xασαν </tr>
</table>
<table class="table table-condensed table-bordered">
<tr><th> <th>IA-10 <th>IA-11 <th>IA-10-COMP <th>IA-11-COMP </tr>
<tr><th>1SG <td>ἦ/ἦν <td>ᾖα/ᾔειν <td>Xῆ/Xῆν <td>Xῇα/Xῄειν </tr>
<tr><th>2SG <td>ἦς/ἦσθα <td>ᾔεις/ᾔεισθα <td>Xῆς/Xῆσθα <td>Xῄεις/Xῄεισθα </tr>
<tr><th>3SG <td>ἦν <td>ᾔει(ν) <td>Xῆν <td>Xῄει(ν) </tr>
<tr><th>1PL <td>ἦμεν <td>ᾖμεν <td>Xῆμεν <td>Xῇμεν </tr>
<tr><th>2PL <td>ἦτε <td>ᾖτε <td>Xῆτε <td>Xῇτε </tr>
<tr><th>3PL <td>ἦσαν <td>ᾖσαν/ᾔεσαν <td>Xῆσαν <td>Xῇσαν/Xῄεσαν </tr>
</table>
<p>It will be worth taking some future posts to talk about the -σθα ending that crops up in the <strong>2SG</strong> as well as some of the more extraordinary forms in <strong>IA-10</strong> and <strong>IA-11</strong> (along with compounds).</p>
<p>But for now, just capturing the common element in each row (like we did in Part 13):</p>
<table class="table table-condensed table-bordered">
<tr><th> <th nowrap>IA-1 <th nowrap>IA-2 <th nowrap>IA-3 <th nowrap>IA-4 <th nowrap>IA-5 <th nowrap>IA-6 <th nowrap>IA-7 <th nowrap>IA-8 <th nowrap>IA-9 <th nowrap>IA-10 <th nowrap>IA-11 </tr>
<tr><th>1SG <td colspan=11 class="text-center">-ν </tr>
<tr><th>2SG <td colspan=9 class="text-center">-ς <td colspan=2 class="text-center">-ς/-σθα </tr>
<tr><th>3SG <td colspan=9 class="text-center">- <td colspan=2 class="text-center">-(ν) </tr>
<tr><th>1PL <td colspan=11 class="text-center">-μεν </tr>
<tr><th>2PL <td colspan=11 class="text-center">-τε </tr>
<tr><th>3PL <td colspan=5 class="text-center">-ν <td colspan=6 class="text-center">-σαν </tr>
</table>
<p>As with the present active paradigms, some cells across inflectional classes have identical distinguishers and so those cells alone can&rsquo;t identify the inflectional class (and hence all the other forms in that class). In particular:</p>
<ul>
<li>The <strong>1SG</strong> can&rsquo;t distinguish within the set {<strong>IA-2</strong>, <strong>IA-3</strong>, <strong>IA-8</strong>} or within the set {<strong>IA-4</strong>, <strong>IA-5</strong>} or within the set {<strong>IA-7</strong> (if η), <strong>IA-9</strong>}</li>
<li>The <strong>2SG</strong> and <strong>3SG</strong> can&rsquo;t distinguish within the set {<strong>IA-2</strong>, <strong>IA-7</strong>} or within the set {<strong>IA-3</strong>, <strong>IA-8</strong>} or within the set {<strong>IA-5</strong>, <strong>IA-9</strong>}</li>
<li>The <strong>1PL</strong> can&rsquo;t distinguish within the set {<strong>IA-2</strong>, <strong>IA-3</strong>} or within the set {<strong>IA-4</strong>, <strong>IA-5</strong>} or within the set {<strong>IA-1</strong>, <strong>IA-8</strong>}</li>
<li>The <strong>2PL</strong> can&rsquo;t distinguish within the set {<strong>IA-1</strong>, <strong>IA-7</strong>}</li>
<li>The <strong>3PL</strong> can&rsquo;t distinguish within the set {<strong>IA-2</strong>, <strong>IA-3</strong>} or within the set {<strong>IA-4</strong>, <strong>IA-5</strong>}</li>
</ul>
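<p>The ambiguity sets above can be verified mechanically. Here&rsquo;s a small sketch (not part of any published tooling; for brevity the paradigm dictionary transcribes just the <strong>IA-1</strong> through <strong>IA-5</strong> table above) that groups classes by identical distinguishers in each cell:</p>

```python
# For each cell (person/number combination), find the sets of classes whose
# distinguishers are identical and hence can't be told apart by that cell
# alone. X stands for the part of the form shared across a lexeme's paradigm.
from collections import defaultdict

PARADIGMS = {
    "IA-1": ["Xον", "Xες", "Xε(ν)", "Xομεν", "Xετε", "Xον"],
    "IA-2": ["Xουν", "Xεις", "Xει", "Xοῦμεν", "Xεῖτε", "Xουν"],
    "IA-3": ["Xουν", "Xους", "Xου", "Xοῦμεν", "Xοῦτε", "Xουν"],
    "IA-4": ["Xων", "Xᾱς", "Xᾱ", "Xῶμεν", "Xᾶτε", "Xων"],
    "IA-5": ["Xων", "Xης", "Xη", "Xῶμεν", "Xῆτε", "Xων"],
}
CELLS = ["1SG", "2SG", "3SG", "1PL", "2PL", "3PL"]

def ambiguous_sets(paradigms):
    """Map each cell to the groups of classes sharing the same form there."""
    result = {}
    for i, cell in enumerate(CELLS):
        groups = defaultdict(list)
        for cls, forms in paradigms.items():
            groups[forms[i]].append(cls)
        result[cell] = [sorted(g) for g in groups.values() if len(g) > 1]
    return result

for cell, groups in ambiguous_sets(PARADIGMS).items():
    print(cell, groups)
```

<p>Running it reproduces the collisions listed above for these five classes: {<strong>IA-2</strong>, <strong>IA-3</strong>} and {<strong>IA-4</strong>, <strong>IA-5</strong>} in the <strong>1SG</strong>, <strong>1PL</strong>, and <strong>3PL</strong>, with the <strong>2SG</strong>, <strong>3SG</strong>, and <strong>2PL</strong> fully distinguishing the classes.</p>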
<p>The distinctions involving <strong>IA-7</strong> and up are less important because those are tiny, non-productive classes. Looking at just <strong>IA-1</strong> through <strong>IA-6</strong>:</p>
<ul>
<li>{<strong>IA-2</strong>, <strong>IA-3</strong>} can&rsquo;t be distinguished by <strong>1SG</strong>, <strong>1PL</strong>, or <strong>3PL</strong> but <em>can</em> by <strong>2SG</strong>, <strong>3SG</strong>, or <strong>2PL</strong>.</li>
<li>{<strong>IA-4</strong>, <strong>IA-5</strong>} also can&rsquo;t be distinguished by <strong>1SG</strong>, <strong>1PL</strong>, or <strong>3PL</strong> but <em>can</em> by <strong>2SG</strong>, <strong>3SG</strong>, or <strong>2PL</strong>.</li>
</ul>
<p>So at least for the first six classes, any of <strong>2SG</strong>, <strong>3SG</strong>, or <strong>2PL</strong> uniquely identifies the class (at least within the imperfect active system).</p>
<p>It is interesting, then, that the <strong>2SG</strong> and <strong>3SG</strong> are the very cells most likely to cause confusion within the sets {<strong>IA-2</strong>, <strong>IA-7</strong>}, {<strong>IA-3</strong>, <strong>IA-8</strong>}, and {<strong>IA-5</strong>, <strong>IA-9</strong>}. In those cases, it is the <strong>1PL</strong> or <strong>3PL</strong> that comes to the rescue in identifying the class (although the value of X itself can often do so, given the tiny size of the <strong>IA-7</strong>, <strong>IA-8</strong>, and <strong>IA-9</strong> classes).</p>
<p>If we try to group our classes along the lines we did in <a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">Part 13</a>, we get a hierarchy very similar to that in the present:</p>
<table class="table">
<tr><td colspan=3><b>IA-</b>{<b>1</b>, <b>2</b>, <b>3</b>, <b>4</b>, <b>5</b>} <td colspan=3><b>3PL</b> in -ν; <b>1SG</b> and <b>3PL</b> identical
<tr><td>&nbsp;<td colspan=2><b>IA-</b>{<b>2</b>, <b>3</b>, <b>4</b>, <b>5</b>} <td>&nbsp;<td colspan=2>long vowels before the endings; circumflexes in the <b>1PL</b> and <b>2PL</b>
<tr><td>&nbsp;<td>&nbsp;<td><b>IA-</b>{<b>2</b>, <b>3</b>} <td>&nbsp;<td>&nbsp;<td>ου in <b>1SG</b>, <b>1PL</b>, and <b>3PL</b>
<tr><td>&nbsp;<td>&nbsp;<td><b>IA-</b>{<b>4</b>, <b>5</b>} <td>&nbsp;<td>&nbsp;<td>ω in <b>1SG</b>, <b>1PL</b>, and <b>3PL</b>
<tr><td colspan=3 nowrap><b>IA-</b>{<b>6</b>, <b>7</b>, <b>8</b>, <b>9</b>, <b>9b</b>, <b>10</b>, <b>11</b>, <b>10-COMP</b>, <b>11-COMP</b>} <td colspan=3> <b>3PL</b> in -σαν
<tr><td>&nbsp;<td colspan=2><b>IA-</b>{<b>6</b>, <b>7</b>, <b>8</b>, <b>9</b>}<td>&nbsp;<td colspan=2> <b>2SG</b> only in -ς
<tr><td>&nbsp;<td colspan=2><b>IA-</b>{<b>9b</b>, <b>10</b>, <b>11</b>, <b>10-COMP</b>, <b>11-COMP</b>}<td>&nbsp;<td colspan=2> <b>2SG</b> in -ς/-σθα
</table>
<p>along with cross-cutting categories such as:</p>
<table class="table">
<tr><td><b>IA-</b>{<b>2</b>, <b>3</b>, <b>8</b>} <td>ουν in <b>1SG</b>
<tr><td><b>IA-</b>{<b>2</b>, <b>7</b>} <td>ει in <b>2SG</b> and <b>3SG</b>
<tr><td><b>IA-</b>{<b>3</b>, <b>8</b>} <td>ου in <b>1SG</b>, <b>2SG</b>, and <b>3SG</b>
<tr><td><b>IA-</b>{<b>1</b>, <b>7</b>} <td>ετε in <b>2PL</b>
</table>
<p>and, ignoring accents:</p>
<table class="table">
<tr><td><b>IA-</b>{<b>4</b>, <b>9</b>} <td>ατε in <b>2PL</b>
</table>
<p>But given the closed nature of <strong>IA-7</strong> and up, many of these will be easy to disambiguate. We&rsquo;ll go through the details in the next post.</p>
<h1><a href="http://jktauber.com/2019/04/20/consolidating-vocabulary-coverage-and-ordering-too/">Consolidating Vocabulary Coverage and Ordering Tools</a></h1>
<p>2019-04-20 &middot; James Tauber</p>
<p>One of my goals for 2019 is to bring more structure to various disparate Greek projects and, as part of that, I&rsquo;ve started consolidating multiple one-off projects I&rsquo;ve done around vocabulary coverage statistics and ordering experiments.</p>
<p>Going back at least 15 years (when I first started blogging about <a href="https://jktauber.com/2004/11/26/programmed-vocabulary-learning-travelling-salesman/">Programmed Vocabulary Learning</a>) I&rsquo;ve had little Python scripts all over the place to calculate various stats, or try out various approaches to ordering.</p>
<p>I&rsquo;m bringing all of that together in a single repository and updating the code so:</p>
<ul>
<li>it&rsquo;s all in one place</li>
<li>it&rsquo;s usable as a library in other projects or in things like Jupyter notebooks </li>
<li>it can be extended to arbitrary chunking beyond verses (e.g. books, chapters, sentences, paragraphs, pericopes)</li>
<li>it can be extended to other texts such as the Apostolic Fathers, Homer, etc (other languages too!)</li>
</ul>
<p>I&rsquo;m partly spurred on by a desire to explore more stuff <a href="https://thepatrologist.com">Seumas Macdonald</a> and I have been talking about and to be more responsive to the occasional inquiries I get from Greek teachers. I also have a poster, <em>Vocabulary Ordering in Text-Driven Historical Language Instruction: Sequencing the Ancient Greek Vocabulary of Homer and the New Testament</em>, that was accepted for <a href="https://sites.uclouvain.be/eurocall2019/">EUROCALL 2019</a> in August, and this code library helps me not only produce the poster but also make it more reproducible.</p>
<p>Ultimately I hope to write a paper or two out of it as well.</p>
<p>I&rsquo;ve started the repo at:</p>
<p><a href="https://github.com/jtauber/vocabulary-tools/">https://github.com/jtauber/vocabulary-tools/</a></p>
<p>where I&rsquo;ve basically rewritten half of my existing code from elsewhere so far. I&rsquo;ve reproduced the code for generating core vocabulary lists and also the coverage tables I&rsquo;ve used in multiple talks (including my BibleTech talks in <a href="https://jktauber.com/2010/03/28/my-bibletech-2010-talk/">2010</a> and <a href="https://jktauber.com/2015/05/06/my-bibletech-2015-talk/">2015</a>). </p>
<p>I&rsquo;ve taken the opportunity to generalise and decouple the code (especially with regard to the different chunking systems) and also make use of newer Python stuff like <code>Counter</code> and dictionary comprehensions which simplifies much of my earlier code. </p>
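<p>As a hedged illustration (this is not the vocabulary-tools API itself, and the lemma list is invented), here is the kind of coverage calculation that <code>Counter</code> plus a dictionary comprehension reduces to a few lines: what fraction of running text do the <em>n</em> most frequent lemmas cover?</p>

```python
# Toy coverage stats: fraction of tokens covered by the n most frequent
# lemmas, for a few values of n, over a made-up lemmatised token stream.
from collections import Counter

lemmas = ["ὁ", "καί", "ὁ", "λέγω", "ὁ", "καί", "αὐτός", "λέγω", "ὁ", "δέ"]

counts = Counter(lemmas)
total = sum(counts.values())
coverage = {
    n: sum(c for _, c in counts.most_common(n)) / total
    for n in (1, 2, 3)
}
print(coverage)  # → {1: 0.4, 2: 0.6, 3: 0.8}
```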
<p>There are a lot of little things you can do with just a couple of lines of Python and I&rsquo;ve tried to avoid turning those into their own library of tiny functions. Instead, I&rsquo;m compiling a little tutorial / cookbook as I go which you can read the beginnings of here:</p>
<p><a href="https://github.com/jtauber/vocabulary-tools/blob/master/examples.rst">https://github.com/jtauber/vocabulary-tools/blob/master/examples.rst</a></p>
<p>There&rsquo;s still a fair bit more to move over (even going back 11 years to some stuff from 2008) but let me know if you have any feedback, questions, or suggestions. I&rsquo;m generalising more and more as I go so expect some things to change dramatically.</p>
<p>If you&rsquo;re interested in playing around with this stuff for corpora in other languages, let me know how I can help you get up and running. The main requirement is a tokenised and lemmatised corpus (assuming you want to work with lemmas, not surface forms, as vocabulary items) and also some form of chunking information. See <a href="https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data">https://github.com/jtauber/vocabulary-tools/tree/master/gnt_data</a> for the GNT-specific stuff that would (at least partly) need to be replicated for another corpus.</p>
<h1><a href="http://jktauber.com/2019/02/01/initial-apostolic-fathers-text-complete/">Initial Apostolic Fathers Text Complete</a></h1>
<p>2019-02-01 &middot; James Tauber</p>
<p>Exactly three months ago to the day, I announced that Seumas Macdonald and I were working on a corrected, open, digital edition of the Apostolic Fathers based on Lake. That initial work is now complete.</p>
<p><a href="https://jktauber.com/2018/11/01/preparing-open-apostolic-fathers/">Preparing an Open Apostolic Fathers</a> discussed the original motivation and the rather detailed process we went through.</p>
<p>The corrected raw text files are available on GitHub at <a href="https://github.com/jtauber/apostolic-fathers">https://github.com/jtauber/apostolic-fathers</a> but I also generated a static site at <a href="https://jtauber.github.io/apostolic-fathers/">https://jtauber.github.io/apostolic-fathers/</a> to browse the texts. The corrections will be contributed back to the OGL First1KGreek project.</p>
<p>The next step for us will be to lemmatise the text and there has already been some interest from others in getting the English translation corrected and aligned as well.</p>
<p>Recall that, while we were essentially correcting the Open Greek and Latin text, we used the CCEL text and that in Logos to identify particular places to look at in the printed text. We did this by lining up the CCEL, OGL and Logos texts and seeing where any of them disagreed. Those became the places we went back to, in multiple scans of the printed Lake, to make our corrections to the base text we started with from OGL.</p>
<p>How often did each of those three &ldquo;witnesses&rdquo; disagree? Here are some stats. <strong>A</strong> = CCEL, <strong>B</strong> = OGL, <strong>C</strong> = Logos. And so <strong>AB/C</strong> is where CCEL and OGL agreed against Logos, <strong>AC/B</strong> is where CCEL and Logos agreed against OGL, <strong>A/BC</strong> is where OGL and Logos agreed against CCEL, and <strong>A/B/C</strong> is where all three disagreed.</p>
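<p>The classification just described is simple to sketch in Python. This is an illustrative reconstruction, not the actual comparison script, and the aligned token triples here are invented:</p>

```python
# Classify each aligned position of the three witnesses into agreement
# patterns: AB/C, AC/B, A/BC, A/B/C (plus full agreement, ABC).
from collections import Counter

def pattern(a, b, c):
    """Classify one aligned reading from CCEL (A), OGL (B), Logos (C)."""
    if a == b == c:
        return "ABC"      # full agreement (not tabulated in the post)
    if a == b:
        return "AB/C"
    if a == c:
        return "AC/B"
    if b == c:
        return "A/BC"
    return "A/B/C"

aligned = [
    ("λόγος", "λόγος", "λόγος"),   # all agree
    ("θεοῦ", "θεοῦ", "θεου"),      # Logos differs
    ("καὶ", "και", "και"),          # CCEL differs
]
print(Counter(pattern(*triple) for triple in aligned))
```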
<table class="table">
<tr><th>FILE <th>AB/C <th>AC/B <th>A/BC <th>A/B/C</tr>
<tr><td>001 <td>1.29% <td>1.15% <td>7.97% <td>0.32%</tr>
<tr><td>002 <td>0.76% <td>1.20% <td>3.39% <td>0.37%</tr>
<tr><td>003 <td>1.58% <td>2.20% <td>4.97% <td>0.28%</tr>
<tr><td>004 <td>0.57% <td>1.33% <td>7.01% <td>0.28%</tr>
<tr><td>005 <td>1.05% <td>1.79% <td>6.21% <td>0.84%</tr>
<tr><td>006 <td>0.88% <td>1.18% <td>7.54% <td>0.69%</tr>
<tr><td>007 <td>0.39% <td>0.88% <td>3.34% <td>0.20%</tr>
<tr><td>008 <td>0.79% <td>0.87% <td>5.41% <td>0.44%</tr>
<tr><td>009 <td>0.25% <td>1.53% <td>2.68% <td>0.38%</tr>
<tr><td>010 <td>0.44% <td>4.05% <td>4.36% <td>0.25%</tr>
<tr><td>011 <td>0.36% <td>1.86% <td>4.23% <td>0.14%</tr>
<tr><td>012 <td>0.92% <td>1.15% <td>5.59% <td>0.43%</tr>
<tr><td>013 <td>1.29% <td>0.90% <td>6.08% <td>0.34%</tr>
<tr><td>014 <td>1.25% <td>0.34% <td>4.91% <td>0.08%</tr>
<tr><td>015 <td>0.96% <td>0.65% <td>6.74% <td>0.50%</tr>
<tr><td><b>TOTAL</b> <td><b>1.11%</b> <td><b>1.12%</b> <td><b>5.98%</b> <td><b>0.34%</b></tr>
</table>
<p>One can immediately see CCEL diverged the most from the others (it had considerable lacunae for a start). The numbers involving Logos diverging are probably overly high because there was a weird systemic error we only noticed after work had started that a middle dot was often erroneously added after eta. This ultimately didn&rsquo;t affect anything other than perhaps flagging places Seumas and I had to check that we otherwise wouldn&rsquo;t have needed to.</p>
<p>But at the end of the day, how much did we change? How much of the OGL original remained? How similar was our result to the text on CCEL? And for a bit of fun, how often was my first correction and Seumas&rsquo;s first correction the same as what we ended up with after consensus was achieved? Here&rsquo;s the breakdown by work:</p>
<table class="table">
<tr><th>FILE <th>CCEL <th>OGL <th>JT <th>SM</tr>
<tr><td>001 <td>91.27% <td>99.02% <td>99.85% <td>99.91%</tr>
<tr><td>002 <td>96.02% <td>98.90% <td>99.77% <td>99.90%</tr>
<tr><td>003 <td>94.58% <td>97.63% <td>99.77% <td>99.60%</tr>
<tr><td>004 <td>92.42% <td>98.48% <td>99.91% <td>100.00%</tr>
<tr><td>005 <td>92.32% <td>98.32% <td>99.79% <td>99.89%</tr>
<tr><td>006 <td>91.28% <td>98.82% <td>98.82% <td>99.80%</tr>
<tr><td>007 <td>96.07% <td>98.92% <td>99.90% <td>99.90%</tr>
<tr><td>008 <td>93.89% <td>99.30% <td>100.00% <td>99.91%</tr>
<tr><td>009 <td>96.82% <td>99.75% <td>98.60% <td>99.87%</tr>
<tr><td>010 <td>94.94% <td>96.27% <td>99.87% <td>99.68%</tr>
<tr><td>011 <td>95.04% <td>98.54% <td>99.77% <td>99.91%</tr>
<tr><td>012 <td>93.86% <td>98.78% <td>99.87% <td>99.90%</tr>
<tr><td>013 <td>93.15% <td>99.20% <td>99.87% <td>99.83%</tr>
<tr><td>014 <td>94.90% <td>99.62% <td>99.92% <td>99.74%</tr>
<tr><td>015 <td>92.69% <td>99.16% <td>99.96% <td>99.62%</tr>
<tr><td><b>TOTAL</b> <td><b>93.32%</b> <td><b>98.97%</b> <td><b>99.83%</b> <td><b>99.84%</b></tr>
</table>
<p>You just beat me, Seumas :-)</p>
<h1><a href="http://jktauber.com/2019/01/14/more-thoughts-different-morphological-analyses/">More Thoughts on Different Morphological Analyses</a></h1>
<p>2019-01-14 &middot; James Tauber</p>
<p>In <a href="https://jktauber.com/2018/12/10/five-types-morphological-analysis/">Five Types of Morphological Analysis</a> I outlined five distinct ways of approaching morphological (or potentially any linguistic) analysis. In support of some of these, I have some additional examples from a pair of papers I&rsquo;m reading and a conference I just attended.</p>
<p>Baayen et al (2018) (co-written by Jim Blevins, my undergraduate advisor from 25 years ago and still a mentor), in describing their own word-based, discriminative approach to morphology, contrast it with both widespread morpheme-based approaches and increasingly popular exponent-focused realizational approaches. I&rsquo;ll leave a discussion of these different approaches to another time, but what is relevant to my previous post is this comment:</p>
<blockquote>
<p>[morpheme-based and realizational analyses] may be of practical value, especially in the context of adult second language acquisition. It is less clear whether the corresponding theories, whose practical utility derives ultimately from their pedagogical origins, can be accorded any cognitive plausibility.</p>
</blockquote>
<p>Note the distinction they are making between analyses of practical (adult SLA, pedagogical) value and cognitive plausibility.</p>
<p>Again, it&rsquo;s not the point of this post to describe (much less assess) their arguments for why morphemes and exponents might not be cognitively plausible and what the alternative is, merely that they acknowledge certain analyses might be useful for pedagogical purposes independent of their cognitive plausibility (thereby agreeing with my <strong>psychological</strong> vs <strong>pedagogical</strong> distinction).</p>
<p>Perhaps <strong>cognitive</strong> would be another word for my <strong>psychological</strong> category.</p>
<p>They furthermore suggest:</p>
<blockquote>
<p>Constructional schemata, inheritance, and mechanisms spelling out exponents are all products of descriptive traditions that evolved without any influence from research traditions in psychology. As a consequence, it is not self-evident that these notions would provide an adequate characterization of the representations and processes underlying comprehension and production. It seems particularly implausible that children would be motivated to replicate the descriptive scaffolding of [these] theoretical accounts&hellip;</p>
</blockquote>
<p>Terms like &ldquo;descriptive traditions&rdquo; and &ldquo;descriptive scaffolding of theoretical accounts&rdquo; refer to what I had in mind with my <strong>synchronic</strong> category of analysis. Perhaps <strong>descriptive</strong> and <strong>theoretical</strong> would be other words for that category.</p>
<p>In a related paper, Baayen et al (2019), they talk about three possible responses to the challenge posed to linguistics (or at least linguistically-informed natural language processing) by the success of machine learning. </p>
<p>Again, it&rsquo;s outside the scope of this post to get into those details, but in short, their suggested possible responses are: (1) admit defeat, (2) claim the hidden layers reflect traditional linguistic representations, (3) rethink the nature of language processing in the brain. They go on to explore the third option in the context of morphology and the lexicon, stating that</p>
<blockquote>
<p>the model that we propose here brings together several strands of research across theoretical morphology, psychology, and machine learning.</p>
</blockquote>
<p>Note that this is essentially a claim that it&rsquo;s possible to reconcile at least three of the different approaches I&rsquo;ve outlined: the synchronic/description/theoretical, the cognitive/psychological, and the algorithmic/machine-learning.</p>
<p>(Missing here is any reference to diachrony or pedagogy, which I think they would agree are distinct approaches to what they are attempting to unify.)</p>
<p>Now last week, I attended the Society for Computation in Linguistics meeting, coinciding with the big annual meeting of the Linguistic Society of America. One of the goals of SCiL is to build bridges from the NLP community to the linguistics community so it was of particular interest to me.</p>
<p>But again one of the big things that came up in multiple talks was distinct approaches: the approach of the NLP practitioners, often referred to as the <strong>engineering</strong> approach, and that of the linguists, often referred to as the <strong>scientific</strong> approach. At their most self-deprecating, the NLP practitioners confessed their over-obsession with metrics on &ldquo;tasks&rdquo; and lack of regard for the underlying scientific &ldquo;questions&rdquo;. Noah Smith, in fact, joked that NLPers can annoy linguists by asking what their &ldquo;task&rdquo; is and linguists can annoy NLPers by asking what their &ldquo;question&rdquo; is.</p>
<p>The point of mentioning this is yet another example of a difference in approach and perspective.</p>
<p>Diachrony didn&rsquo;t feature at all in either the Baayen/Blevins papers or the SCiL talks, but certainly my other distinctions seem more broadly confirmed (albeit with alternative terminology). So I think we have:</p>
<ul>
<li><strong>algorithmic</strong> / <strong>engineering</strong> / <strong>task-oriented</strong></li>
<li><strong>diachronic</strong></li>
<li><strong>synchronic</strong> / <strong>descriptive</strong> / <strong>theoretical</strong></li>
<li><strong>psychological</strong> / <strong>cognitive</strong></li>
<li><strong>pedagogical</strong></li>
</ul>
<p>Now this is not to say some of these approaches can&rsquo;t be combined (as shown in the Baayen/Blevins papers). But even when one is attempting to combine some of them, I think it&rsquo;s useful to acknowledge (a) the multiple approaches being combined and (b) other approaches with distinct goals and evaluation procedures that aren&rsquo;t being considered but which may still be valuable in other contexts.</p>
<p>At the end of the day, I&rsquo;m trying to turn arguments of the form &ldquo;that isn&rsquo;t a good theory/description/implementation/explanation of morphology&rdquo; into a more nuanced &ldquo;it probably isn&rsquo;t good for this but it might be good for that&rdquo;.</p>
<h2 id="references">References</h2>
<p>Baayen, R. H., Chuang, Y. Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270.</p>
<p>Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.</p>
<h1><a href="http://jktauber.com/2018/12/10/five-types-morphological-analysis/">Five Types of Morphological Analysis</a></h1>
<p>2018-12-10 &middot; James Tauber</p>
<p>People talking about morphological analyses can often speak across each other because they have different purposes in mind. Here&rsquo;s an initial attempt to outline five possibly distinct notions one might be referring to.</p>
<p>I&rsquo;m tentatively labelling them:</p>
<ul>
<li>algorithmic</li>
<li>diachronic</li>
<li>synchronic</li>
<li>psychological</li>
<li>pedagogical</li>
</ul>
<p>although the labels matter less than being clear about the distinction.</p>
<p><strong>Algorithmic</strong> means I can go from an inflected form to a lemma + morphosyntactic properties (or vice versa) efficiently on a computer. The way this is achieved might not be psychologically plausible or historically accurate but it can be implemented in software to get the job done.</p>
<p><strong>Diachronic</strong> means I can explain (or at least speculate) how the inflected form came about: what the roots are, what grammaticalisation took place, what sound changes explain seeming irregularities, etc.</p>
<p><strong>Synchronic</strong> means I can describe the inflected forms without recourse to historical data or reconstruction. This might focus on perspicuity rather than computational efficiency or psychological plausibility.</p>
<p><strong>Psychological</strong> means the analysis is consistent with what I think is (or was) going on in the minds of native speakers. Some people may equate this with synchronic analyses but I think you can have a psychologically implausible yet still descriptively adequate synchronic analysis.</p>
<p><strong>Pedagogical</strong> means a useful way of explaining it to students. This <em>may</em> be diachronic, but might be more synchronic (whether psychologically plausible or not).</p>
<p>Analyses can obviously be compatible with more than one of these. But I think it&rsquo;s helpful to be clear what the goals of any morphological description are. If the goal is to lemmatise and tag a new text, then psychological or historical plausibility, or analytical or pedagogical clarity might not matter. If one&rsquo;s goal is a diachronically-informed analysis to help students, it should be clear why an otherwise perfectly adequate morphological parser might not be producing useful information.</p>
<p>Those who have been following my <em>Tour of Greek Morphology</em> know I&rsquo;ve tried to be careful distinguishing, for example, historical explanations from how I think native speakers internalise(d) word forms, or how students should learn them.</p>
<p>I still come across a lot of people who think the &ldquo;modern&rdquo; way of understanding morphology is learning the &ldquo;morphemes&rdquo; and rules, not memorising paradigms. Besides getting the history somewhat wrong, this is also making the mistake of conflating these different types of analyses and not recognising that one type of analysis might be perfectly valid for one purpose but not another.</p>
<p>Here&rsquo;s a fun game to play: how would you analyse/explain the form λαμβάνω? Or ἔλαβον (especially when 3rd plural) or λήμψομαι? Or μαθητής vs μαθητοῦ? Or ἔδωκεν vs δέδωκα vs δός?</p>
<p>Maybe I haven&rsquo;t quite nailed the labels yet. Maybe there are further distinctions to draw. I welcome people&rsquo;s input.</p>
<h1><a href="http://jktauber.com/2018/11/01/preparing-open-apostolic-fathers/">Preparing an Open Apostolic Fathers</a></h1>
<p>2018-11-01 &middot; James Tauber</p>
<p>I&rsquo;m working with Seumas Macdonald on an open, corrected digital edition of the Apostolic Fathers based on Lake.</p>
<p><a href="https://thepatrologist.com">Seumas Macdonald</a> asked me a few weeks ago what it would take to expand some of our text and vocab ordering experiments to the text of Apostolic Fathers (we&rsquo;re both desirous of more comprehensible input for Greek learners).</p>
<p>My reply was that we first of all needed to get a good open text and then lemmatise it. I thought the &ldquo;get a good open text&rdquo; would be trivial but it turned out not to be.</p>
<p>I asked around without much positive response. I found HTML versions of the Lake texts on the <a href="https://www.ccel.org">Christian Classics Ethereal Library</a> (CCEL) website but they turned out to be problematic quality-wise (see below).</p>
<p>It then occurred to me to check what was in the <a href="http://www.perseus.tufts.edu/hopper/">Perseus Digital Library</a>. It only had the Epistle of Barnabas but the related <a href="http://opengreekandlatin.github.io/First1KGreek/">First 1000 Years of Greek</a> at the Open Greek and Latin Project had done the rest.</p>
<p>The Perseus/OGL texts were considerably better than the CCEL ones, but were still not without problems. It was clear that the two collections had been produced independently, however, which is important for what follows.</p>
<p>I&rsquo;m almost certain the CCEL texts were keyed in. There is haplography and dittography galore! The haplography even corresponds almost perfectly to line breaks in the printed Lake editions I looked at.</p>
<p>The Perseus/OGL texts, on the other hand, are the results of OCR with some manual correction.</p>
<p>I wrote some code to extract both the CCEL and Perseus/OGL texts and put them in a comparable format. I then wrote a script to align the two. My thinking was to go through all the places where the two disagreed, check the printed Lake and correct the Perseus/OGL text accordingly.</p>
<p>I decided to throw the Lake text from Logos into the mix as well, not as an input to the correction itself but merely as another &ldquo;edition&rdquo; to flag differences with (to then check with the printed Lake).</p>
<p>Thus began a project Seumas and I have been working on the last few weeks. Once differences in any of the three texts are identified, they are flagged for review and Seumas and I independently look at the printed Lake and correct the Perseus/OGL base text.</p>
<p>If our corrections disagree, we continue to work on them until we come to consensus. This three-way comparison followed by two-way independent correction is proving to work very well (although it&rsquo;s a lot of work!).</p>
<p>All the code, the source texts (except Logos), and work-in-progress are available at <a href="https://github.com/jtauber/apostolic-fathers">https://github.com/jtauber/apostolic-fathers</a> and you can follow the status in the README. There are also more detailed notes on the whole process.</p>
<p>Once the candidate versions of all the texts are published, I&rsquo;ll do another post just with some interesting statistics on the nature of errors in the CCEL, Perseus/OGL, and Logos texts. The &ldquo;scribal errors&rdquo; in the CCEL text are particularly fascinating but even some of the Perseus/OGL OCR errors will be worth writing about.</p>
<p>Seumas and I will then contribute back the corrections to CCEL, Perseus/OGL, and Logos. Hopefully our texts will also be featured on the <a href="http://biblicalhumanities.org/dashboard/">Biblical Humanities Dashboard</a> as the go-to open digital text of the Apostolic Fathers (so no one else has to repeat this effort).</p>
<p>Finally, we&rsquo;ll start the process of lemmatisation so the Apostolic Fathers can be included in our open learning materials.</p>
<p><a href="http://jktauber.com/2018/10/18/tour-greek-morphology-part-27/">A Tour of Greek Morphology: Part 27</a> (2018-10-18) by James Tauber</p>
<p>Part twenty-seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Let&rsquo;s finish our survey of imperfect middle endings in the indicative with the athematic verbs.</p>
<table class="table">
<tr><th>&nbsp;<th>IM-6 <th>IM-7 <th>IM-8 <th>IM-9
<tr><th>1SG <td>Xύμην <td>Xέμην <td>Xόμην <td>Xάμην
<tr><th>2SG <td>Xυσο <td>Xεσο <td>Xοσο <td>Xασο/Xω
<tr><th>3SG <td>Xυτο <td>Xετο <td>Xοτο <td>Xατο
<tr><th>1PL <td>Xύμεθα <td>Xέμεθα <td>Xόμεθα <td>Xάμεθα
<tr><th>2PL <td>Xυσθε <td>Xεσθε <td>Xοσθε <td>Xασθε
<tr><th>3PL <td>Xυντο <td>Xεντο <td>Xοντο <td>Xαντο
</table>
<p>The classes are similar to their <strong>IA-</strong> equivalents except there is no ablaut between the singular and plural.</p>
<table class="table">
<tr><th>IM-6<td>-νυ- verbs like δείκνυμι<td>stem ends in ῠ
<tr><th>IM-7<td>τίθημι, ἵημι and their compounds <td>stem ends in ε
<tr><th>IM-8<td>δίδωμι and compounds<td>stem ends in ο
<tr><th>IM-9<td>ἵστημι and compounds<td>stem ends in ᾰ
</table>
<p>The intervocalic sigma in <strong>2SG</strong> generally does not drop out in the athematics, although it sometimes can, particularly in <strong>IM-9</strong>, which seems to be the class furthest along in merging with the thematics. Note, though, that the lack of circumflex in this case eliminates confusion with an <strong>IM-4</strong> <strong>2SG</strong>.</p>
<p>The lack of circumflex in the <strong>3SG</strong> and <strong>2PL</strong> also eliminates confusion with <strong>IM-4</strong> in those cells.</p>
<p><strong>IM-7</strong> can be confused for <strong>IM-1</strong> in the <strong>3SG</strong> and <strong>2PL</strong>, though.</p>
<p>In the next few posts we&rsquo;ll summarise the inference rules and ambiguities for the imperfect and look at some type and token frequencies, just like we did for the present.</p>
<p><a href="http://jktauber.com/2018/09/08/tour-greek-morphology-part-26/">A Tour of Greek Morphology: Part 26</a> (2018-09-08) by James Tauber</p>
<p>Part twenty-six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>We&rsquo;ve looked at the imperfect endings for the thematic actives and middles. Now let&rsquo;s look at the athematic active endings.</p>
<table class="table">
<tr><th>&nbsp;<th>IA-6 <th>IA-7 <th>IA-8 <th>IA-9 <th>IA-9b <th>IA-10 <th>IA-11
<tr><th>1SG <td>Xῡν <td>Xην/Xειν <td>Xουν <td>Xην <td>Xην <td>ἦ/ἦν <td>ᾖα/ᾔειν
<tr><th>2SG <td>Xῡς <td>Xεις <td>Xους <td>Xης <td>Xης/Xησθα <td>ἦς/ἦσθα <td>ᾔεις/ᾔεισθα
<tr><th>3SG <td>Xῡ <td>Xει <td>Xου <td>Xη <td>Xη <td>ἦν <td>ᾔει/ᾔειν
<tr><th>1PL <td>Xυμεν <td>Xεμεν <td>Xομεν <td>Xαμεν <td>Xαμεν <td>ἦμεν <td>ᾖμεν
<tr><th>2PL <td>Xυτε <td>Xετε <td>Xοτε <td>Xατε <td>Xατε <td>ἦτε <td>ᾖτε
<tr><th>3PL <td>Xυσαν <td>Xεσαν <td>Xοσαν <td>Xασαν <td>Xασαν <td>ἦσαν <td>ᾖσαν/ᾔεσαν
</table>
<p><strong>IA-6</strong> is the -νυ- verbs like δείκνυμι. There is ablaut between the singular and plural (ῡ vs υ).</p>
<p><strong>IA-9</strong> is ἵστημι and compounds. There is again the expected singular/plural ablaut (η vs α).</p>
<p><strong>IA-8</strong> is δίδωμι and compounds. There is a vowel alternation but it is ου/ο and not the ω/ο ablaut seen in the present.</p>
<p><strong>IA-7</strong> is τίθημι, ἵημι and their compounds. The vowel alternation here is ει/ε rather than the η/ε ablaut of the present, except for the η in the <strong>1SG</strong>.</p>
<p><strong>IA-9b</strong> is φημί which is like ἵστημι but with the added <strong>2SG</strong> Xησθα.</p>
<p><strong>IA-10</strong> and <strong>IA-11</strong> are εἰμί and εἶμι respectively. The -σθα <strong>2SG</strong> ending comes up again but there are other differences that we will eventually want to unpack.</p>
<p>For the most part, the endings follow those of the thematic imperfects. The consistent difference is the <strong>3PL</strong> -σαν (although see below).</p>
<p>We&rsquo;ll save for later posts what&rsquo;s going on with the -σθα ending and with various parts of the <strong>IA-10</strong> and <strong>IA-11</strong> paradigms. But I want to note something intriguing about the unexpected vowel alternations in <strong>IA-7</strong> and <strong>IA-8</strong>.</p>
<p>Xουν ~ Xους ~ Xου is what we see in <strong>IA-3</strong> and Xεις ~ Xει in <strong>IA-2</strong>. This suggests that these athematic verbs were starting to be inflected <em>as if</em> they were thematic.</p>
<p>Along similar lines, John 21.18 has ἐζώννυες with a theme vowel. Acts 27.1 has παρεδίδουν for the plural (yet παρεδίδοσαν in Acts 16.4).</p>
<p><a href="http://jktauber.com/2018/09/06/back-international-colloquium-ancient-greek-lingui/">Back from International Colloquium on Ancient Greek Linguistics</a> (2018-09-06) by James Tauber</p>
<p>Last week I attended the ninth International Colloquium on Ancient Greek Linguistics at the University of Helsinki.</p>
<p>It was an excellent conference with a lot of good linguistic and philological content featuring some nice quantitative analyses.</p>
<p>Some of the paper highlights for me:</p>
<ul>
<li><strong>Paul Kiparsky</strong> on a regular sound change explanation (via Optimality Theory) for various alternations usually explained via analogy
<br><a href="https://www.helsinki.fi/en/conferences/international-colloquium-on-ancient-greek-linguistics/abstracts-a-k#section-58193">abstract</a></li>
<li><strong>Robert Crellin</strong> on the ambiguity of Greek without vowels as part of an exploration of why Greek introduced written vowels in the first place <br><a href="https://www.helsinki.fi/en/conferences/international-colloquium-on-ancient-greek-linguistics/abstracts-a-k#section-58038">abstract</a></li>
<li><strong>Lucien van Beek</strong> on atelic perfects in Homeric Greek
<br><a href="https://www.helsinki.fi/en/conferences/international-colloquium-on-ancient-greek-linguistics/abstracts-k-z#section-58151">abstract</a></li>
<li><strong>David Goldstein</strong> on differential agent marking (dative vs prepositional phrase) in Herodotus
<br><a href="https://www.helsinki.fi/en/conferences/international-colloquium-on-ancient-greek-linguistics/abstracts-a-k#section-58055">abstract</a></li>
<li><strong>Sandra Rodríguez Piedrabuena</strong> on (im)politeness strategies in Ancient Greek
<br><a href="https://www.helsinki.fi/en/conferences/international-colloquium-on-ancient-greek-linguistics/abstracts-k-z#section-58138">abstract</a></li>
</ul>
<p>I may do individual follow-up posts to some of these as they inspired potential investigations of my own in the future.</p>
<p>It was also great just catching up with people I&rsquo;ve met the last couple of years at Greek and Indo-European conferences at UCLA, Oxford, and Cambridge.</p>
<p><a href="http://jktauber.com/2018/08/25/tour-greek-morphology-part-25/">A Tour of Greek Morphology: Part 25</a> (2018-08-25) by James Tauber</p>
<p>Part twenty-five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the <a href="https://jktauber.com/2018/07/29/tour-greek-morphology-part-24/">previous part</a> we looked at the endings of the active imperfects with theme vowels. Now we are going to look at the middles.</p>
<table class="table">
<tr><th>&nbsp;<th>IM-1 <th>IM-2 <th>IM-3 <th>IM-4 <th>IM-5
<tr><th>1SG <td>Xόμην <td class="info">Xούμην <td class="info">Xούμην <td class="warning">Xώμην <td class="warning">Xώμην
<tr><th>2SG <td>Xου <td class="info">Xοῦ <td class="info">Xοῦ <td class="warning">Xῶ <td class="warning">Xῶ
<tr><th>3SG <td>Xετο <td>Xεῖτο <td>Xοῦτο <td>Xᾶτο <td>Xῆτο
<tr><th>1PL <td>Xόμεθα <td class="info">Xούμεθα <td class="info">Xούμεθα <td class="warning">Xώμεθα <td class="warning">Xώμεθα
<tr><th>2PL <td>Xεσθε <td>Xεῖσθε <td>Xοῦσθε <td>Xᾶσθε <td>Xῆσθε
<tr><th>3PL <td>Xοντο <td class="info">Xοῦντο <td class="info">Xοῦντο <td class="warning">Xῶντο <td class="warning">Xῶντο
</table>
<p>The vowel differences between these five different classes of verb should largely be familiar to you by now as they&rsquo;re pretty much the same pattern we&rsquo;ve seen in the present active, present middle, and imperfect active—namely:</p>
<ul>
<li>The <strong>-2</strong> class historically had an ε before the theme vowel and this led (depending on whether the theme vowel was ε or ο) to ει or ου</li>
<li>The <strong>-3</strong> class historically had an ο before the theme vowel and this led (regardless of whether the theme vowel was ε or ο) to ου</li>
<li>The <strong>-4</strong> class historically had an α before the theme vowel and this led (depending on whether the theme vowel was ε or ο) to ω or ᾱ</li>
<li>The <strong>-5</strong> class is like the <strong>-4</strong> class but with an η in place of the ᾱ</li>
</ul>
<p>One difference in the above table from what we&rsquo;ve seen before is that the <strong>2SG</strong> ending is identical between <strong>IM-2</strong> and <strong>IM-3</strong> and between <strong>IM-4</strong> and <strong>IM-5</strong>.</p>
<p>The fact the distinguisher is a bare diphthong might remind you of the <strong>2SG</strong> in the present middle, which in <a href="https://jktauber.com/2017/07/23/tour-greek-morphology-part-9/">part 9</a> we partially explained as historically coming from a dropped intervocalic sigma (e.g. ε+σαι &gt; εαι &gt; ηι &gt; ῃ). This is indeed what happened here too.</p>
<p>The pattern is clearer put alongside the <strong>3SG</strong> and <strong>3PL</strong> as well.</p>
<table class="table">
<tr><th>&nbsp;<th>PM-1 <th>IM-1
<tr><th>2SG <td>ε+σαι &gt; ῃ <td>ε+σο &gt; ου
<tr><th>3SG <td>ε+ται <td>ε+το
<tr><th>3PL <td>ο+νται <td>ο+ντο
</table>
<p>We can see here that, prior to the dropping of the sigma (and subsequent contraction) to a long ο written as the spurious diphthong ου, the present and imperfect endings in the <strong>2SG</strong>, <strong>3SG</strong>, and <strong>3PL</strong> just differed in a final αι/ο alternation (which is tantalisingly close to an iota/no-iota alternation like we might expect).</p>
<p>If we try to summarise the historical origins of the personal endings, we might get something like the following:</p>
<table class="table">
<tr><th>&nbsp; <th>PA <th>IA <th>PM <th>IM
<tr><th>1SG <td>μι <td>μ <td>μαι <td>μην
<tr><th>2SG <td>σι <td>σ <td>σαι <td>σο
<tr><th>3SG <td>τι <td>τ <td>ται <td>το
<tr><th>1PL <td>μεν <td>μεν <td>μεθα <td>μεθα
<tr><th>2PL <td>τε <td>τε <td>σθε <td>σθε
<tr><th>3PL <td>ντι <td>ντ <td>νται <td>ντο
</table>
<p>There is a clear μ/σ/τ/ντ pattern in the <strong>1SG</strong>/<strong>2SG</strong>/<strong>3SG</strong>/<strong>3PL</strong>. Cross-cutting this there is a clear ι/-/αι/ο pattern in the <strong>PA</strong>/<strong>IA</strong>/<strong>PM</strong>/<strong>IM</strong>. The exception is the μην in the <strong>IM</strong> <strong>1SG</strong> (where we might expect μο).</p>
<p>The <strong>1PL</strong> and <strong>2PL</strong> seem to be playing by a different set of rules and notice they don&rsquo;t make a distinction between the present and imperfect at all.</p>
<p>Note that this summary of endings, while providing a historical background to the Greek forms we see, is really in the realm of Indo-European comparative linguistics rather than Greek. It&rsquo;s the foundation of how Ancient Greek came to be the way it was, but it doesn&rsquo;t reflect the way native speakers would have internalised inflections, nor should it suggest the way they should be taught nowadays.</p>
<p>The goal here is to explain some things once the <em>actual</em> endings are already familiar.</p>
<p><a href="http://jktauber.com/2018/07/29/tour-greek-morphology-part-24/">A Tour of Greek Morphology: Part 24</a> (2018-07-29) by James Tauber</p>
<p>Part twenty-four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Now let&rsquo;s look at the imperfect forms corresponding to the active omega verbs we looked at in the present way back in <a href="https://jktauber.com/2017/07/02/tour-greek-morphology-part-4/">part 4</a>.</p>
<p>We&rsquo;ll use <strong>IA-1</strong> through <strong>IA-5</strong> for the distinguisher patterns corresponding to the verbs that followed <strong>PA-1</strong> through <strong>PA-5</strong> in the present.</p>
<table class="table">
<tr><th>&nbsp;<th>IA-1<th>IA-2<th>IA-3<th>IA-4<th>IA-5
<tr><th>1SG<td>Xον<td class="info">Xουν<td class="info">Xουν<td class="warning">Xων<td class="warning">Xων
<tr><th>2SG<td>Xες<td>Xεις<td>Xους<td>Xᾱς<td>Xης
<tr><th>3SG<td>Xε(ν)<td>Xει<td>Xου<td>Xᾱ<td>Xη
<tr><th>1PL<td>Xομεν<td class="info">Xοῦμεν<td class="info">Xοῦμεν<td class="warning">Xῶμεν<td class="warning">Xῶμεν
<tr><th>2PL<td>Xετε<td>Xεῖτε<td>Xοῦτε<td>Xᾶτε<td>Xῆτε
<tr><th>3PL<td>Xον<td class="info">Xουν<td class="info">Xουν<td class="warning">Xων<td class="warning">Xων
</table>
<p>Recall:</p>
<table class="table">
<tr><th>PA-1<td>barytone omega verbs
<tr><th>PA-2<td>circumflex omega verbs with INF -εῖν / 3SG -εῖ
<tr><th>PA-3<td>circumflex omega verbs with INF -οῦν / 3SG -οῖ
<tr><th>PA-4<td>circumflex omega verbs with INF -ᾶν / 3SG -ᾷ
<tr><th>PA-5<td>ζάω + compounds
</table>
<p>It is clear that the imperfect endings shown above had a theme vowel (alternating ο/ε exactly as with the present) which historically contracted with the preceding vowel (if it existed) under exactly the same rules as with the present forms (explained in detail in <a href="https://jktauber.com/2017/07/17/tour-greek-morphology-part-8/">part 8</a>).</p>
<table class="table">
<tr><th>&nbsp;<th>theme vowel<th>ending
<tr><th>1SG<td>ο<td>ν
<tr><th>2SG<td>ε<td>ς
<tr><th>3SG<td>ε<td>-
<tr><th>1PL<td>ο<td>μεν
<tr><th>2PL<td>ε<td>τε
<tr><th>3PL<td>ο<td>ν
</table>
<p>Too often with paradigms we only look at the person/number alternations within a fixed tense/aspect/voice. Let&rsquo;s now look at the possible present / imperfect alternations in the endings we&rsquo;ve seen (ignoring the augment for now):</p>
<table class="table">
<tr><th>&nbsp;<th>present<th>imperfect
<tr><th rowspan=2>1SG <td>Xω <td>Xον
<tr> <td>Xῶ <td>Xουν or Xων
<tr><th rowspan=5>2SG <td>Xεις <td>Xες
<tr> <td>Xεῖς <td>Xεις
<tr> <td>Xοῖς <td>Xους
<tr> <td>Xᾷς <td>Xᾱς
<tr> <td>Xῇς <td>Xης
<tr><th rowspan=5>3SG <td>Xει <td>Xε(ν)
<tr> <td>Xεῖ <td>Xει
<tr> <td>Xοῖ <td>Xου
<tr> <td>Xᾷ <td>Xᾱ
<tr> <td>Xῇ <td>Xη
<tr><th rowspan=3>3PL <td>Xουσι(ν) <td>Xον
<tr> <td>Xοῦσι(ν) <td>Xουν
<tr> <td>Xῶσι(ν) <td>Xων
</table>
<p>The <strong>1PL</strong> and <strong>2PL</strong> endings are identical between the present and the imperfect.</p>
<p><a href="http://jktauber.com/2018/07/23/normalisation-column-morphgnt/">The Normalisation Column in MorphGNT</a> (2018-07-23) by James Tauber</p>
<p>Eliran Wong asked for a more detailed description of the &ldquo;normalisation&rdquo; column in MorphGNT so I promised him I&rsquo;d write a blog post about it.</p>
<p>I first outlined the objective of the column in a <a href="https://jktauber.com/2005/08/30/upcoming-new-morphgnt/">2005 blog post</a> but enough time has passed and new work done that I thought it was worthy of a new post.</p>
<p>The core idea of the normalised column is to give the inflected form as it would be stated in isolation.</p>
<p>To use the example from the 2005 post, consider the phrase in Matthew 1.20:</p>
<blockquote>
<p>τὴν γυναῖκά σου</p>
</blockquote>
<p>If you were to ask someone what the accusative singular feminine definite article is, you&rsquo;d expect the answer τήν and not τὴν. Similarly, if you asked what the accusative singular of γυνή is, you&rsquo;d expect the answer γυναῖκα and not γυναῖκά. The differences in Matthew 1.20 are contextual and, for many applications (particularly morphology), aren&rsquo;t of much interest.</p>
<p>And so years ago, I went about adding a new column that normalised this sort of thing. Similarly μετά, μεθ&rsquo;, μετ&rsquo;, and μετὰ all get normalised to μετά in this separate column.</p>
<p>Back in the 2005 post, I enumerated the normalisations as:</p>
<ul>
<li>existing text may exhibit elision (e.g. μετ&rsquo; versus μετά)</li>
<li>existing text may exhibit movable ς or ν</li>
<li>final-acute may become grave</li>
<li>enclitics may lose an accent</li>
<li>word preceding an enclitic may gain an extra accent</li>
<li>the οὐ / οὐκ / οὐχ alternation</li>
</ul>
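<p>To take just one of these, the grave-to-acute step is fully mechanical: in polytonic Greek a grave can only stand on the final syllable, so a blanket substitution on the decomposed Unicode form is safe. A minimal sketch in Python (a hypothetical helper, not the actual MorphGNT code):</p>

```python
import unicodedata

def normalise_grave(token: str) -> str:
    """Convert a final grave accent to the acute of the citation form.

    A grave only ever occurs on the last syllable, so replacing every
    combining grave (U+0300) with a combining acute (U+0301) in the
    decomposed (NFD) form is safe; recompose with NFC afterwards.
    """
    decomposed = unicodedata.normalize("NFD", token)
    return unicodedata.normalize("NFC", decomposed.replace("\u0300", "\u0301"))
```

<p>So μετὰ normalises to μετά and τὴν to τήν, whether the input uses precomposed or combining characters.</p>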
<p>When I published the SBLGNT analysis, another normalisation was added, namely the normalisation of capitalisation at the start of paragraphs or direct speech. The capitalisation is not an inherent part of the inflected form in isolation, only the particular context of the token, and so it is normalised.</p>
<p>In <a href="https://jktauber.com/2017/04/17/analysing-verbs-nestle-1904/">Analysing the Verbs in Nestle 1904</a> I covered some differences between the SBLGNT and Nestle 1904 analyses that normalisation would have smoothed over. Note that normalisation COULD go further (for example, spelling differences) but I chose not to do that in the normalisation column.</p>
<p>In brief, the things NOT normalised include:</p>
<ul>
<li>spelling</li>
<li>crasis (e.g. κἀγώ vs καὶ ἐγώ)</li>
</ul>
<p>In <a href="https://jktauber.com/2015/11/27/annotating-normalization-column-morphgnt-part-1/">Annotating the Normalization Column in MorphGNT: Part 1</a> I started talking about annotating WHY each token was normalised the way it was and you can see some counts there for how many tokens underwent normalisation of accent or capitalisation, and how many had elision or a movable nu or sigma.</p>
<p>In many cases, the normalisation can be automated without any need for human intervention (by having a list of elidable words, enclitics, etc). I&rsquo;ll soon publish my latest Python code for doing this. In some cases, manual checking is needed (although lemmatisation generally resolves a lot of the ambiguities). In <a href="https://jktauber.com/2016/01/17/direct-speech-capitalization-first-preceding-head/">Direct Speech Capitalization and the First Preceding Head</a> I talked about the start of some work to go through all capitalisation and identify the reason for it. Similarly <a href="https://jktauber.com/2017/02/15/new-morphgnt-releases-and-accentuation-analysis/">New MorphGNT Releases and Accentuation Analysis</a> discusses work on annotating the reason for all accentuation changes.</p>
<p>There is still lots more work to do this for the SBLGNT but I did apply the idea when working on Seumas Macdonald&rsquo;s <a href="https://github.com/seumasjeltzz/DigitalNyssa">Digital Nyssa</a> project. For that, I produced a file the first five lines of which are:</p>
<div class="codehilite"><pre><span></span>Ἦλθε ἦλθε capitalisation
καὶ καί grave
ἐφ’ ἐπί elision
ἡμᾶς ἡμᾶς
ἡ ἡ proclitic
</pre></div>
<p>Here each token is normalised in the second column with the third column giving the reason for any difference between the token and the normalised form (and also indicating proclitics).</p>
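<p>A file in this format is straightforward to consume; here is a sketch of a reader (the function name is made up, assuming whitespace-separated columns as in the sample above, with the reason columns optional):</p>

```python
def parse_annotations(lines):
    """Parse lines of 'token normalised [reason ...]' into tuples.

    The third column onwards, if present, lists the reasons the token
    differs from its normalised form (or marks it as a clitic).
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        yield fields[0], fields[1], fields[2:]
```
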
<p>The possible annotations (and there can be more than one on a token) are:</p>
<ul>
<li>grave</li>
<li>capitalisation</li>
<li>elision</li>
<li>movable</li>
<li>extra</li>
<li>proclitic</li>
<li>enclitic</li>
</ul>
<p>I hope to eventually be able to provide the same for the entire SBLGNT (and other Greek texts).</p>
<p>Doing all this normalisation has a number of benefits. It makes it easier to extract forms for studying morphology; it allows searches to work more as expected (you don&rsquo;t want to have to think up all the possible ways a form could actually be written in a text in order to search for it); and it makes it much easier to search for particular phenomena (for example, particular clitic accentuation).</p>
<p>It also allows for more rigorous validation of things like accentuation. Work in this area has already uncovered a number of accentuation errors in the SBLGNT text, for example, and could help with automated checking of OCR, etc.</p>
<p><a href="http://jktauber.com/2018/04/25/first-impressions-john-lees-accents-book/">First Impressions of John Lee&rsquo;s Accents Book</a> (2018-04-25) by James Tauber</p>
<p>John Lee&rsquo;s <em>Basics of Greek Accents</em> was released today. Here are some first impressions.</p>
<p>Like D. A. Carson&rsquo;s 1985 book <em>Greek Accents: A Student&rsquo;s Manual</em>, Lee&rsquo;s new book (based on notes from a class he taught at Macquarie University) is designed to backfill knowledge of Greek accents for those students whose beginning Greek skipped over them.</p>
<p>At least since Wenham&rsquo;s <em>Elements of New Testament Greek</em>, there has been a trend in beginning New Testament Greek (and perhaps Classical Greek) textbooks to do away with instruction about accentuation. I haven&rsquo;t investigated, but I suspect this correlates with a reduction in English-to-Greek exercises in textbooks too.</p>
<p>Lee, like Carson before him, considers an understanding of accents to be vital to learning Greek. The book, published by Zondervan, is clearly (in name and cover design) intended by them to fill the gap left by Mounce&rsquo;s <em>Basics of Biblical Greek</em>. </p>
<p>Lee&rsquo;s book is small—110 pages and about the size of a 5 x 7 photograph. It&rsquo;s compact but lucid nevertheless. The modern typography makes for more pleasant reading than both Carson&rsquo;s book and Probert&rsquo;s 2003 <em>New Short Guide to the Accentuation of Ancient Greek</em>.</p>
<p>It&rsquo;s a gentler introduction than either Carson or Probert. There are eight chapters or &ldquo;lessons&rdquo; and each has two sets of exercises (marked as &ldquo;In Class&rdquo; and &ldquo;Homework&rdquo;). All exercises involve adding accents to unaccented text. Examples and exercises are NT focused but not exclusively and the book would be more than suitable for Classical Greek students as well.</p>
<p>As is understandable given its goals, there are no theoretical underpinnings given and little historical explanation.</p>
<p>I&rsquo;ve found a few places where, given it&rsquo;s for beginners (albeit those who know some Greek), I wish Lee had been a little more explicit. For example he says that &ldquo;Aorist active infinitives in -σαι accent on second last&rdquo; but never explains when one might expect an acute versus a circumflex. A one line rule with several examples is typical. But it is rare that all the edge cases are covered.</p>
<p>After saying that the verb is generally recessive, he gives various forms of λύω including the subjunctive λυθῶ. He gives contraction as the reason for this one deviant form, but that is the last thing he says about subjunctives other than a remark a couple of pages later about ἀποδῷ being the pattern for compound -μι verbs.</p>
<p>While Lee is a gentler introduction, one thing I like about Carson&rsquo;s book on accents is he&rsquo;ll often be a little more exploratory, considering a new form and whether previous rules are adequate to cover the evidence, and only once motivated, introduce a new rule. In doing this, students are encouraged to think a little more about how the rules interact. In a way, Carson&rsquo;s approach is more like what I&rsquo;ve been trying to do with my morphology blog posts.</p>
<p>While there&rsquo;s much to commend it as a first introduction to accents, I do find Lee often misses the forest and instead just catalogs the trees. There&rsquo;s little view of the whole as a system, how the parts interact. I understand why you don&rsquo;t start with that, but I feel you need to get to it eventually.</p>
<p>As an example, I recently summarised the first and second declension noun accents as follows:</p>
<ul>
<li>by default the accent is persistent</li>
<li>however, if the ending is a different length than in the base form (nominative singular), the law of limitation may require an accent change (e.g. X́XS -&gt; XX́L, L̃S -&gt; ĹL, ĹL -&gt; L̃S)</li>
<li>if the base form is oxytone, it becomes perispomenon (X́-&gt;L̃) in oblique cases (genitive and dative)</li>
<li>in the 1st declension, the genitive plural is always perispomenon -ῶν (even if the base is not oxytone)</li>
</ul>
<p>I gave examples of contrasting pairs for every accentuation and syllable length combination in both the first and second declension, and highlighted various things like the importance of building an intuition for the L̃S ~ ĹL alternation (the σωτῆρα rule). I also pointed out that the oblique case perispomenon (XL̃) is only possible because all oblique case endings are long.</p>
<p>Now, I&rsquo;m not suggesting that this is sufficient—it needs a certain amount of unpacking and is jargon heavy. But this, or something similar, makes a nice summary that ties multiple things together in explaining the first and second declension. It covers the fact that persistence and the law of limitation might be in conflict and how that gets resolved. It explains what happens to oxytones in the oblique cases, and gives the exception of 1st declension genitive plural, pointing out this is not limited just to the oxytones like the previous rule.</p>
<p>In contrast, Lee covers the relevant rules but never brings them together in the context of a single paradigm (other than θεός which hardly demonstrates most of the points). The statement about the genitive plural is 28 pages later than the statement about circumflexes in the oblique when the base form is oxytone. His examples of the law of limitation do cover a couple of direct~oblique alternations but that is isolated from the chapter on noun accentuation and is never explained in the context of vowel length patterns in the noun endings.</p>
<p>All in all, however, I think Lee&rsquo;s book is a good first introduction to Greek accentuation and its presentation is undoubtedly cleaner than that of previous books. My main criticism is that it is incomplete and students would benefit from some consolidation of the principles taught. Some of that criticism may be mitigated in a classroom situation, for which it was originally intended. Students working alone might have more questions than the book answers. I would recommend something like Probert as a follow on (it will also make a better reference). That said, I think Lee achieves his aim in providing the &ldquo;basics&rdquo; and (to quote the back cover blurb) &ldquo;a foundation [students] will use as they continue their studies&rdquo;.</p>
<p><a href="http://jktauber.com/2018/05/26/tour-greek-morphology-part-23/">A Tour of Greek Morphology: Part 23</a> (2018-05-26) by James Tauber</p>
<p>Part twenty-three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Okay, so we want to contrast two forms of the indicative generally referred to as the &ldquo;present&rdquo; and &ldquo;imperfect&rdquo;.</p>
<p>As we always do with paradigms, we&rsquo;ll keep certain things constant (in this case, the lexeme, voice, and mood) and vary things along one axis (person / number agreement) and another axis (present vs imperfect).</p>
<table class="table">
<tr><th>&nbsp;<th>present<th>imperfect</tr>
<tr><th>1SG <td>λύω <td>ἔλυον </tr>
<tr><th>2SG <td>λύεις <td>ἔλυες </tr>
<tr><th>3SG <td>λύει <td>ἔλυε </tr>
<tr><th>1PL <td>λύομεν <td>ἐλύομεν </tr>
<tr><th>2PL <td>λύετε <td>ἐλύετε </tr>
<tr><th>3PL <td>λύουσι <td>ἔλυον </tr>
</table>
<p>There are numerous things which should stand out:</p>
<ul>
<li>the <strong>imperfect</strong> forms all have an initial ἐ-</li>
<li>this is then followed by the same λυ root found in the <strong>present</strong></li>
<li>this is then followed by an ε/ο &ldquo;theme&rdquo; vowel</li>
<li>the <strong>1SG</strong> and <strong>3PL</strong> are identical in the <strong>imperfect</strong></li>
<li>the <strong>present</strong> and <strong>imperfect</strong> share the same ending in the <strong>1PL</strong> and in the <strong>2PL</strong></li>
</ul>
<p>There&rsquo;s another perhaps more subtle thing you may notice:</p>
<ul>
<li>the endings in the <strong>imperfect</strong> <strong>2SG</strong> and <strong>3SG</strong> are the same as the <strong>present</strong> <em>without the ι</em></li>
</ul>
<p>Recall also that the -ουσι ending in the <strong>present</strong> <strong>3PL</strong> historically came from -οντι. Without the ι, that would be -οντ and given Greek words can only end in ν, ς, or a vowel, dropping the τ from -οντ would give us the -ον we see.</p>
<p>Furthermore, if we consider the <em>athematic</em> <strong>1SG</strong> ending -μι and drop the ι, we get -μ. This is not one of the sounds a Greek word can end in and, historically, it was changed to ν. This gives us the -ον we see in the <strong>1SG</strong>.</p>
<p>So it seems that <em>historically</em> the relationship between the two sets of endings has to do with the existence or non-existence of an ι. The only exceptions are the <strong>1PL</strong> and <strong>2PL</strong>. Interestingly these are the only two-syllable endings (counting the theme vowel).</p>
<p>It could even be stated (at least in the earlier history) as: <strong>imperfect</strong> has ἐ- but not -ι- and the <strong>present</strong> has -ι- but not ἐ-, except in the two-syllable ending cases where the only contrast is the existence or absence of ἐ-.</p>
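The pattern just described can be sketched in code. This is a rough illustration of my own, not from the original post: it builds the imperfect active forms as augment + stem + theme vowel + ending (ignoring accents and breathings) and checks them against the λύω paradigm above.

```python
import unicodedata

def strip_accents(word):
    # compare forms without accents/breathings to sidestep accent placement
    return "".join(ch for ch in unicodedata.normalize("NFD", word)
                   if unicodedata.category(ch) != "Mn")

# imperfect active endings (theme vowel included), read off the table above
IMPERFECT_ENDINGS = {"1SG": "ον", "2SG": "ες", "3SG": "ε",
                     "1PL": "ομεν", "2PL": "ετε", "3PL": "ον"}

def imperfect_active(stem):
    # augment ε- plus verbal stem plus theme vowel and secondary ending
    return {cell: "ε" + stem + ending
            for cell, ending in IMPERFECT_ENDINGS.items()}

# the generated forms match the λύω paradigm (accents aside)
for cell, form in [("1SG", "ἔλυον"), ("2SG", "ἔλυες"), ("3SG", "ἔλυε"),
                   ("1PL", "ἐλύομεν"), ("2PL", "ἐλύετε"), ("3PL", "ἔλυον")]:
    assert imperfect_active("λυ")[cell] == strip_accents(form)
```

Note that the <strong>1SG</strong> and <strong>3PL</strong> cells come out identical, exactly the ambiguity observed in the paradigm.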
<p>We&rsquo;ve only looked at λύω / ἔλυον so far, so in the next couple of posts we&rsquo;ll look to see how the imperfect endings work in other lexemes.</p>
<h2><a href="http://jktauber.com/2018/05/16/tour-greek-morphology-part-22/">A Tour of Greek Morphology: Part 22</a></h2>
<p>2018-05-16, James Tauber</p>
<p>Part twenty-two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>I’ve deliberated for a while about whether to follow the <em>present</em> with the <em>imperfect</em> or with the <em>aorist</em>. I had recently elected to go with the aorist but as I sketched out what I wanted to say, I realised it would be easier if I’d said some things about the imperfect first.</p>
<p>And so I’ve decided to do a few posts about the imperfect.</p>
<p>We won’t talk about the endings in this post. I want us to <em>start</em> thinking about the imperfect and its relationship to the present not in terms of endings but in terms of the overall paradigm structure.</p>
<p>In previous posts, we saw that the present comes in two voices: an active and a middle (although we haven’t yet touched on the notion of presents coming in <em>both</em> versus <em>just one</em> of these). Within each voice, we looked at six indicative forms (corresponding to patterns of person and number agreement) and an infinitive (which effectively just has no person or number). We haven’t yet covered this, but each present voice also has imperative forms, subjunctive and optative forms, and participles in each of three genders.</p>
<p>The imperfect, in contrast, only has the indicative forms. No infinitive, no participles, no imperative, no subjunctive, and no optative.</p>
<p>We might be tempted to think of this in terms of the imperfect somehow being “defective”, as if we were doing a feature comparison like this:</p>
<table class="table">
<tr><th>&nbsp;<th>present<th>imperfect</tr>
<tr><th>indicatives<td>✓<td>✓</tr>
<tr><th>infinitives<td>✓<td>✗</tr>
<tr><th>imperatives<td>✓<td>✗</tr>
<tr><th>subjunctives<td>✓<td>✗</tr>
<tr><th>optatives<td>✓<td>✗</tr>
<tr><th>participles<td>✓<td>✗</tr>
</table>
<p>But another way is to think of the imperfect as being <em>part</em> of the &ldquo;present&rdquo; family, providing a contrasting set of indicatives.</p>
<p>So we have:</p>
<ul>
<li>indicatives 1 (“present”)</li>
<li>indicatives 2 (“imperfect”)</li>
<li>infinitives</li>
<li>imperatives</li>
<li>subjunctives</li>
<li>optatives</li>
<li>participles </li>
</ul>
<p>This model suggests that, say, the infinitives, imperatives, or participles are just as much the infinitives, imperatives, or participles of the imperfect as they are of the present. </p>
<p>This also leads to the need for a new name for this entire family. Traditionally it’s referred to as the “present system” because of the shared stems, but as I&rsquo;ve ranted on this blog before, I think it’s unfortunate to use “present” for both the entire system and for one of the two types of indicatives within it.</p>
<p>For reasons we&rsquo;ll touch on later, the system could perhaps better be called the “imperfective system”. </p>
<p>But the remainder of the posts on the imperfect will focus on its endings and, in particular, the contrast with the other set of indicatives (the &ldquo;present&rdquo; indicatives we&rsquo;ve been talking about in the previous posts).</p>
<h2><a href="http://jktauber.com/2018/03/18/conference-time/">Conference Time</a></h2>
<p>2018-03-18, James Tauber</p>
<p>I&rsquo;m off for another string of conferences, this time in Copenhagen, Chicago, and New Orleans.</p>
<p>First is a workshop on <em>Original Language Resources for Bible Translation and Education</em> organised by Nicolai Winther-Nielsen of the Global Learning Initiative and Reinier de Blois of the United Bible Societies. David Instone-Brewer put it best when he responded to the workshop invitation with &ldquo;All the key people in one place with lots of time to talk and plan. How could I miss this?&rdquo; Perhaps most exciting for me is that I finally get to meet Ulrik Sandborg-Petersen for the first time after working together for more than twelve years!</p>
<p>I fly from Copenhagen to Chicago at the end of the week for the annual conference of the American Association of Applied Linguistics. It will be my first time attending the conference and I&rsquo;m looking forward to learning a lot (although in contrast to the Copenhagen workshop, I&rsquo;ll know virtually no one).</p>
<p>I have to leave AAAL slightly early though, to go down to New Orleans for the first US VueConf. Vue.js is an important technology in the Scaife Viewer and DeepReader reading environments. I went to the first European VueConf last year and gave a lightning talk on DeepReader. I had hoped to give a talk on the Scaife Viewer at VueConf US but my talk wasn&rsquo;t accepted, so I&rsquo;m hoping at least for another lightning talk.</p>
<h2><a href="http://jktauber.com/2018/03/10/tour-greek-morphology-part-21/">A Tour of Greek Morphology: Part 21</a></h2>
<p>2018-03-10, James Tauber</p>
<p>Part twenty-one of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>I started this series with</p>
<blockquote>
<p>I ultimately hope to cover everything that a beginner-intermediate grammar might but <strong>in a much more exploratory fashion</strong>. I’ll occasionally touch on morphological theory but I mostly want to point out phenomena in the language that students <strong>have already seen</strong> but perhaps have not thought about in any depth.</p>
</blockquote>
<p>(emphasis added)</p>
<p>In short, the primary goal has been (and will continue to be) to take data the reader is already assumed to know and to make observations and construct relationships that the reader perhaps didn’t already realise or know. The secondary goal is to talk a little bit about linguistic theory and historical linguistics in relation to the specific phenomena being discussed.</p>
<p>Now that we’ve finished our first pass over (particularly the endings of) the present indicatives and infinitives, I wanted to summarise a few key points we’ve touched on that are of a more conceptual nature.</p>
<ul>
<li>A paradigm is a way of showing related forms next to one another for comparison. We often keep some morphosyntactic properties constant while varying others. We often, but not always, keep the lexeme constant.</li>
<li>We can look at paradigms along (at least) three dimensions: (1) we can take one lexeme’s inflection and look at what stays the same and what changes in different cells; (2) we can take a morphosyntactic property set and look at what stays the same and what changes across different lexemes; (3) we can take a <em>subset</em> of morphosyntactic properties and vary them while keeping the rest of the set (and the lexeme) fixed.</li>
<li>Greek rarely has a one-to-one mapping between an individual morphosyntactic property and some surface property of the inflected form.</li>
<li>There are some cells in a paradigm that are highly predictable and others that are highly predictive.</li>
<li>There are relationships between cells which are often more helpful than relationships between a cell and its underlying or historical stem.</li>
<li>The primary role of morphology is to discriminate between alternatives, not build up compositional meaning.</li>
<li>Ambiguity in morphology can be tolerated if other things (syntax, context) help disambiguate.</li>
<li>There is a big difference between looking at patterns in the surface forms and exploring the historical reasons those patterns developed. While the latter is vital for answering “why”, it is not a crucial part of language acquisition. (Native English speakers don’t acquire strong verbs by understanding how Proto-Indo-European ablaut patterns led to Germanic inflectional classes!)</li>
</ul>
<p>As well as these conceptual points, we’ve talked about the actual endings, inflectional classes, vowel contractions, frequency effects, and which cells might be the best to use as a lemma.</p>
<p>We also spent time actually testing our models against the corpus data with some Python scripts and showed how that uncovered some patterns we hadn’t previously considered.</p>
<p>We haven’t looked at everything to do with the presents, but it’s time to move on, at least for a while, to a different part of the verbal system.</p>
<p>That said, if you have any questions about the previous twenty parts, or any questions you&rsquo;re hoping will be answered in subsequent posts, just leave a comment (or email me if you want to ask anonymously).</p>
<h2><a href="http://jktauber.com/2018/03/05/tour-greek-morphology-part-20/">A Tour of Greek Morphology: Part 20</a></h2>
<p>2018-03-05, James Tauber</p>
<p>Part twenty of a tour through Greek inflectional morphology to help get
students thinking more systematically about the word forms they see (and maybe
teach a bit of general linguistics along the way).</p>
<p>In <a href="https://jktauber.com/2017/10/16/tour-greek-morphology-part-17/">part 17</a>,
we went through counts for our present active (infinitive and indicative) classes. Now we&rsquo;ll wrap things up by doing the same for the middle.</p>
<p>Recall this is based on the analysis of 820 tokens available
<a href="https://gist.github.com/jtauber/accb8180f56fceee37f57a040faa4b8a">here</a>
which was described in the last two parts.</p>
<p>Let us first of all look at the number of distinct lemmas in each of our 14 classes.</p>
<table class="table">
<tr><th nowrap>PM-1 <td>barytone thematics with INF -εσθαι / 3SG -εται <td>105</tr>
<tr><th nowrap>PM-2 <td>circumflex thematics with INF -εῖσθαι / 3SG -εῖται <td>21</tr>
<tr><th nowrap>PM-3 <td>circumflex thematics with INF -οῦσθαι / 3SG -οῦται (ζηλόω, ἐλαττόω, λυτρόομαι, διαβεβαιόομαι) <td>4</tr>
<tr><th nowrap>PM-4 <td>circumflex thematics with INF -ᾶσθαι / 3SG -ᾶται <td>11</tr>
<tr><th nowrap>PM-5 <td>circumflex thematics with INF -ῆσθαι / 3SG -ῆται (χράομαι and compound) <td>2</tr>
<tr><th nowrap>PM-6a <td>INF -υσθαι / 3SG -υται (ἀπόλλυμι, ἐνδείκνυμι, συναναμίγνυμι) <td>3</tr>
<tr><th nowrap>PM-7 <td>INF -εσθαι / 3SG -εται (compound of τίθημι) <td>3</tr>
<tr><th nowrap>PM-8 <td>INF -οσθαι / 3SG -οται <td>-</tr>
<tr><th nowrap>PM-9 <td>INF -ασθαι / 3SG -αται (δύναμαι, compounds of ἵστημι) <td>8</tr>
<tr><th nowrap>PM-10 <td>ἧμαι <td>-</tr>
<tr><th nowrap>PM-10-COMP <td>compounds of ἧμαι (κάθημαι) <td>1</tr>
<tr><th nowrap>PM-11 <td>κεῖμαι <td>1</tr>
<tr><th nowrap>PM-11-COMP <td>compounds of κεῖμαι <td>7</tr>
<tr><th nowrap>PM-12 <td>οἶμαι <td>1</tr>
</table>
<p>Again, even the small counts are elevated due to compound verbs. Folding
compounds of the same base verb, only <strong>PM-1</strong>, <strong>PM-2</strong>, <strong>PM-3</strong>, <strong>PM-4</strong>,
and <strong>PM-6a</strong> have more than one or two members (and <strong>PM-6a</strong> only has three).</p>
<p>This is just looking at the number of unique lemmas in each class but there are
two other sets of numbers that are worth looking at:
(1) the total number of tokens in the SBLGNT;
(2) the distribution of classes amongst the hapax legomena.</p>
<table class="table">
<tr><th>class <th>lemmas <th>tokens <th>hapax <th>hapax details</tr>
<tr><th nowrap>PM-1 <td>105 <td>523 <td>45</tr>
<tr><th nowrap>PM-2 <td>21 <td>57 <td>7</tr>
<tr><th nowrap>PM-3 <td>4 <td>5 <td>3 <td>ζηλόω ἐλαττόω λυτρόομαι</tr>
<tr><th nowrap>PM-4 <td>11 <td>33 <td>4 <td>μυκάομαι κοιμάομαι καταράομαι ἐγκαυχάομαι</tr>
<tr><th nowrap>PM-5 <td>2 <td>2 <td>2 <td>χράομαι and συγχράομαι</tr>
<tr><th nowrap>PM-6a <td>3 <td>9 <td>-</tr>
<tr><th nowrap>PM-7 <td>3 <td>5 <td>2 <td>διατίθεμαι and μετατίθημι</tr>
<tr><th nowrap>PM-8 <td>- <td>- <td>-</tr>
<tr><th nowrap>PM-9 <td>8 <td>156 <td>4 <td>ἐξίστημι ἐφίστημι ἀνθίστημι ἀφίσταμαι</tr>
<tr><th nowrap>PM-10 <td>- <td>- <td>-</tr>
<tr><th nowrap>PM-10-COMP <td>1 <td>5 <td>-</tr>
<tr><th nowrap>PM-11 <td>1 <td>9 <td>-</tr>
<tr><th nowrap>PM-11-COMP <td>7 <td>15 <td>-</tr>
<tr><th nowrap>PM-12 <td>1 <td>1 <td>1 <td>οἶμαι</tr>
</table>
<p>Recall the hapax legomena matter because they give an indication of what
classes were still productive.</p>
<p>If we fold compounds under their base verb, only <strong>PM-1</strong>, <strong>PM-2</strong>, <strong>PM-3</strong>,
and <strong>PM-4</strong> have more than one hapax legomenon.</p>
<p>Let&rsquo;s now look at counts for each paradigm cell for each class:</p>
<table class="table">
<tr><th>&nbsp; <th nowrap>PM-1 <th nowrap>PM-2 <th nowrap>PM-3 <th nowrap>PM-4 <th nowrap>PM-5 <th nowrap>PM-6a <th nowrap>PM-7 <th nowrap>PM-8 <th nowrap>PM-9 <th nowrap>PM-10-C <th nowrap>PM-11 <th nowrap>PM-11-C <th nowrap>PM-12</tr>
<tr><th>INF <td>89 <td>15 <td>4 <td>8 <td>- <td>4 <td>- <td>- <td>12 <td>2 <td>- <td>3 <td>-</tr>
<tr><th>1SG <td>85 <td>17 <td>- <td>3 <td>- <td>1 <td>4 <td>- <td>9 <td>1 <td>1 <td>- <td>1</tr>
<tr><th>2SG <td>19 <td>1 <td>- <td>5 <td>- <td>- <td>- <td>- <td>7 <td>- <td>- <td>- <td>-</tr>
<tr><th>3SG <td>228 <td>7 <td>- <td>8 <td>- <td>- <td>- <td>- <td>74 <td>2 <td>7 <td>11 <td>-</tr>
<tr><th>1PL <td>20 <td>4 <td>- <td>3 <td>1 <td>3 <td>- <td>- <td>9 <td>- <td>1 <td>- <td>-</tr>
<tr><th>2PL <td>24 <td>9 <td>- <td>3 <td>- <td>- <td>1 <td>- <td>32 <td>- <td>- <td>- <td>-</tr>
<tr><th>3PL <td>58 <td>4 <td>1 <td>3 <td>1 <td>1 <td>- <td>- <td>13 <td>- <td>- <td>1 <td>-</tr>
<tr><th>&nbsp; <th>523 <th>57 <th>5 <th>33 <th>2 <th>9 <th>5 <th>- <th>156 <th>5 <th>9 <th>15 <th>1</tr>
</table>
<p>As in the active, the <strong>3SG</strong> and <strong>INF</strong> dominate with only a few interesting
exceptions. The third person (especially <strong>3SG</strong> but also <strong>3PL</strong>) is unusually low in
<strong>PM-2</strong>. In <strong>PM-9</strong>, the <strong>2PL</strong> is unusually high. This is almost certainly just
because of particular lexical items that happen to be in those classes rather than
an inherent characteristic of the class itself, although because the origins
of some classes are derivational, there may occasionally be tendencies on
semantic grounds.</p>
<p>If the goal is just to identify the person/number, not the class
(which is true in reception but not in learning), then most of these numbers
collapse because of shared endings. Here are the counts focused just on the
common endings (without accents):</p>
<table class="table">
<tr><td>INF <td>-σθαι <td>137
<tr><td>1SG <td>-μαι <td>122
<tr><td rowspan=2>2SG <td>-{ι} <td>25
<tr> <td>-σαι <td>7
<tr><td>3SG <td>-ται <td>337
<tr><td>1PL <td>-μεθα <td>41
<tr><td>2PL <td>-σθε <td>69
<tr><td>3PL <td>-νται <td>82
</table>
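<p>That collapse can be illustrated with a simple suffix classifier. This is a sketch of my own, not from the post: it assumes accents have already been stripped, and the contracted <strong>2SG</strong> forms (the -{ι} row) are deliberately left unhandled since they need more than suffix matching.</p>

```python
# common present middle endings from the table above, ordered so that
# longer endings are tried first (e.g. -σθαι before -ται, -νται before -ται)
ENDINGS = [("σθαι", "INF"), ("μεθα", "1PL"), ("νται", "3PL"),
           ("σθε", "2PL"), ("μαι", "1SG"), ("ται", "3SG"), ("σαι", "2SG")]

def classify(form):
    # return the person/number (or INF) implied by the ending, else None;
    # contracted 2SG forms are not covered by this sketch
    for ending, cell in ENDINGS:
        if form.endswith(ending):
            return cell
    return None
```

The ordering of the list is the one design point: since -ται is a suffix of both -σθαι and -νται, those must be checked first.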
<p>And that&rsquo;s it for the present middles. I&rsquo;ll do a brief summary post next and
then we&rsquo;ll start exploring beyond the presents.</p>
<h2><a href="http://jktauber.com/2018/02/03/new-draft-morphological-tags-morphgnt/">New Draft Morphological Tags for MorphGNT</a></h2>
<p>2018-02-03, James Tauber</p>
<p>I&rsquo;ve finally done the work in translating the MorphGNT tagging system to a new proposal for initial feedback.</p>
<p>At least going back to my initial collaboration with Ulrik Sandborg-Petersen in 2005, I&rsquo;ve been thinking about how I would do morphological tags in MorphGNT if I were starting from scratch.</p>
<p>Much later, in 2014, I had some discussions with Mike Aubrey at my first SBL conference and put together a <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-a-New-Tagging-Scheme">straw proposal</a>. There was a rethinking of some parts-of-speech, handling of tense/aspect, handling of voice, handling of syncretism and underspecification.</p>
<p>Even though some of the ideas were more drastic than others, a few things have remained consistent in my thinking:</p>
<ul>
<li>there is value in a purely morphological analysis that doesn&rsquo;t disambiguate on syntactic or semantic grounds</li>
<li>this analysis does not need the notion of parts-of-speech beyond purely <a href="https://jktauber.com/2015/11/05/morphological-parts-speech-greek/">Morphological Parts of Speech</a></li>
<li>this analysis should not attempt to distinguish middles and passives in the present or perfect system</li>
</ul>
<p>As part of the handling of syncretism and underspecification, I had originally suggested a need for a value for the case property that didn&rsquo;t distinguish nominative and accusative and a need for a value for the gender property like &ldquo;non-neuter&rdquo;.</p>
<p>In the absence of feedback beyond a vague feeling that something <em>like</em> this should be done, I didn&rsquo;t immediately make further progress but, a year later, started gathering more notes on <a href="https://github.com/morphgnt/sblgnt/wiki/Handling-Ambiguity">handling ambiguity</a>. That then led to a more concrete proposal just around <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-Gender-Tagging">gender</a> and <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-Case-Tagging">case</a> (although not without open questions).</p>
<p>I&rsquo;ve now implemented those smaller-scale proposals as a first draft for the MorphGNT SBLGNT and plan to apply them to other GNT texts soon. The <code>new-tags</code> branch for MorphGNT SBLGNT is available at: <a href="https://github.com/morphgnt/sblgnt/tree/new-tags">https://github.com/morphgnt/sblgnt/tree/new-tags</a>.</p>
<p>This adds a new column (the intention is not to replace existing analyses yet, just augment them) that:</p>
<ul>
<li>makes voice formal not functional (while still using <code>P</code> in the aorist and future for what Carl Conrad would call MP2)</li>
<li>does not give morphosyntactic properties for uninflected words</li>
<li>implements basic nominative/accusative case syncretism in the neuter with a single value</li>
<li>implements basic non-neuter, non-feminine, and (in most genitive plurals) complete gender syncretism with a value for each</li>
</ul>
<p>One immediate effect of this is that a list I have from Randall Tan of disagreements between the MorphGNT SBLGNT analysis and that of the Nestle 1904 largely goes away because many of them were merely different judgements of gender or case on non-morphological grounds. This new tag retains the uncertainty. Another benefit of the tagging scheme is that it provides a reasonable output for an automated morphological analysis system which can then, in a separate step, be disambiguated syntactically (or semantically), potentially with human input.</p>
<p>There are some important things to note, however, as just saying &ldquo;this is a purely morphological analysis that doesn&rsquo;t disambiguate&rdquo; oversimplifies things greatly.</p>
<p>Firstly, while punting distributional and semantic part-of-speech questions like &ldquo;is this an adverb or a conjunction&rdquo; or &ldquo;what type of pronoun is this&rdquo; is extremely helpful, there are still some questions that impact a purely morphological tagging such as whether to represent a fossilised verb acting as a particle as having morphological inflection.</p>
<p>Secondly, there are what I have called <strong>extended syncretisms</strong> not modelled where there can be uncertainty between properties taken as a pair. For example 1st person singular vs 3rd person plural in -ον, or 1st declension genitive singular vs accusative plural in -ας. It may be worth still conveying this ambiguity but just through disjunction, saying for example that a word is <code>GSF^APF</code>. These are almost always phonological coincidences rather than structural syncretism and so should be modelled differently.</p>
<p>Related to this is the &ldquo;double&rdquo; syncretism between accusative singular masculine and neuter on the one hand and nominative and accusative singular neuter on the other hand. If we model the latter as <code>CSN</code> then we&rsquo;ve lost the former (which, if by itself could be modelled as <code>ASY</code>). So, in a sense <code>CSN</code> and <code>ASY</code> are syncretic (but also share an overlapping cell). <code>CSN^ASY</code> doesn&rsquo;t quite seem right because of that overlap and the fact that this isn&rsquo;t just a phonological coincidence as best I know.</p>
<p>Thirdly, I have only modelled basic syncretism, not endings in wildly different parts of the paradigms (which would definitely not be called syncretism) that also happen to have converged by phonological change. For example both -ου and -ον can be nominal endings or unrelated verbal endings (with quite a few interpretations, mind you, especially for -ου). No attempt has been made to capture this in a single tag (although a disjunctive representation might be possible).</p>
<p>And finally (although related to the previous point), a certain amount of <em>lexical</em> disambiguation is applied. There are many cases where not being familiar with the lexeme makes a form highly ambiguous but that ambiguity goes away if the lemma is known. A simple example is imperfects versus second aorists where the principal parts resolve the ambiguity. The draft new tags for MorphGNT SBLGNT effectively assume the lemmatisation has been done and is correct.</p>
<p>In light of this, some people might be surprised, therefore, that υἱοῦ is tagged <code>GSY</code> and not <code>GSM</code> given it&rsquo;s lexically masculine. My current argument (at least in my own head) is that, regardless of a specific lexeme like υἱοῦ, <code>GSM</code>, as a morphological tag, doesn&rsquo;t really make sense in the Greek paradigmatic system because, by nature, genitive singulars have the same form in the non-feminines. I think there&rsquo;s definitely a difference, if subtle, between true ambiguity and underspecification. It&rsquo;s not that υἱοῦ is ambiguous as to gender, it&rsquo;s just that the cell doesn&rsquo;t distinguish masculine from neuter. Lexical knowledge is still being used, otherwise it could be feminine (or even a middle imperative!).</p>
<p>So, in short, syncretism inherent to the paradigmatic system is captured well but other forms of ambiguity will need to be handled other ways (potentially via a disjunctive list of possibilities). This seems a reasonable, practical compromise.</p>
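<p>To make the representation concrete, here is a hypothetical decoder for such tags. The letter values are inferred from the examples in this post (<code>CSN</code>, <code>GSY</code>, <code>GSF^APF</code>) rather than taken from any official specification, so treat the mappings as assumptions.</p>

```python
# hypothetical decoding tables for the draft case/number/gender tags;
# the C and Y values are inferred from the examples in the post
CASE = {"N": "nom", "G": "gen", "D": "dat", "A": "acc", "V": "voc",
        "C": "nom|acc"}       # C: nominative/accusative syncretism
NUMBER = {"S": "sg", "P": "pl"}
GENDER = {"M": "masc", "F": "fem", "N": "neut",
          "Y": "masc|neut"}   # Y: non-feminine syncretism

def expand(tag):
    # expand a disjunctive tag like "GSF^APF" into its separate readings
    return [(CASE[c], NUMBER[n], GENDER[g])
            for c, n, g in (tuple(alt) for alt in tag.split("^"))]
```

On this model a disjunction is a list of readings, while a syncretic value like <code>Y</code> stays a single underspecified reading, preserving the distinction drawn above between true ambiguity and underspecification.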
<p>Let me know your thoughts. There&rsquo;s definitely still more to do and I do plan on expressing more ambiguity with some form of disjunction. I&rsquo;ll probably do a post soon with some more thoughts (and stats) on that.</p>
<h2><a href="http://jktauber.com/2018/01/21/lexical-dispersion-greek-new-testament-gries-dp/">Lexical Dispersion in the Greek New Testament Via Gries's DP</a></h2>
<p>2018-01-21, James Tauber</p>
<p>Measures of dispersion are interesting to apply to a corpus because they tell you whether a word is distributed across parts of the corpus as expected or concentrated more in just some parts. I thought I&rsquo;d play around with Gries&rsquo;s DP as a measure of dispersion on the SBLGNT lemmas.</p>
<p>There are lots of measures of dispersion but Stefan Th. Gries&rsquo;s is perhaps the simplest (see [1] for a detailed survey of lots of different measures as well as the original definition of his own).</p>
<p>Here it is in Python for lemmas:</p>
<div class="codehilite"><pre><span></span>dp = sum(abs((p[part] / t) - (lp[lemma][part] / l[lemma])) for part in p) / 2
</pre></div>
<p>where:</p>
<ul>
<li><code>t</code> is the total number of words in the corpus</li>
<li><code>p[part]</code> is a dictionary mapping corpus part to the count of words in that part</li>
<li><code>l[lemma]</code> is a dictionary mapping lemmas to the count of that lemma in the corpus</li>
<li><code>lp[lemma][part]</code> is a dictionary of dictionaries mapping lemmas and parts to the count of the lemma in that part</li>
</ul>
<p>but see [1] for some simple worked examples.</p>
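<p>Here is a self-contained toy version of the snippet above (the three-book corpus is invented purely for illustration): an evenly spread lemma gets a low DP and a concentrated one a high DP.</p>

```python
from collections import Counter, defaultdict

# a toy corpus of three "parts" (e.g. books), already reduced to lemma lists
corpus = {
    "book1": ["ο", "και", "λεγω", "ο"],
    "book2": ["ο", "και", "αρνιον", "αρνιον"],
    "book3": ["ο", "και", "ο", "και"],
}

p = {part: len(lemmas) for part, lemmas in corpus.items()}  # words per part
t = sum(p.values())                                         # corpus size
l = Counter()                                               # lemma totals
lp = defaultdict(Counter)                                   # lemma -> part -> count
for part, lemmas in corpus.items():
    for lemma in lemmas:
        l[lemma] += 1
        lp[lemma][part] += 1

def dp(lemma):
    # half the summed absolute differences between each part's expected
    # share of the lemma (part size / corpus size) and its observed share
    return sum(abs((p[part] / t) - (lp[lemma][part] / l[lemma]))
               for part in p) / 2

assert dp("ο") < dp("αρνιον")  # evenly spread vs concentrated in one book
```

Using a <code>Counter</code> for the inner counts means parts where a lemma never occurs contribute their full expected proportion, which is what drives concentrated lemmas towards DP of 1.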
<p>One thing Gries doesn&rsquo;t talk about (email me if you know of any discussion of this) is how to handle very low frequency words as they&rsquo;ll dominate the high DP values.</p>
<p>Using <strong>books</strong> as the parts, here are the top 10 most evenly dispersed lemmas in the GNT:</p>
<div class="codehilite"><pre><span></span>0.0466 ὁ
0.1085 εἰς
0.1154 καί
0.1178 ὅς
0.1250 εἰμί
0.1358 ποιέω
0.1382 γίνομαι
0.1385 πολύς
0.1395 μετά
0.1420 μή
</pre></div>
<p>Here are the top 10 least evenly dispersed lemmas (including all frequencies, even hapax legomena):</p>
<div class="codehilite"><pre><span></span>0.9984 φιλοπρωτεύω
0.9984 ἐπιδέχομαι
0.9984 μειζότερος
0.9984 Διοτρέφης
0.9984 φλυαρέω
0.9982 χάρτης
0.9982 κυρία
0.9976 προσοφείλω
0.9976 ἑκούσιος
0.9976 ἄχρηστος
</pre></div>
<p>but this list looks very different if we, say, restrict ourselves to lemmas that occur 5 times or more:</p>
<div class="codehilite"><pre><span></span>0.9827 ἀντίχριστος
0.9752 καταλαλέω
0.9687 ἐπιφάνεια
0.9681 νήφω
0.9680 ἀρετή
0.9667 μῦθος
0.9641 Μελχισέδεκ
0.9568 πλεονεκτέω
0.9557 νόημα
0.9532 ἐνέργεια
</pre></div>
<p>or 30 times or more:</p>
<div class="codehilite"><pre><span></span>0.8952 ἀρνίον
0.8085 καυχάομαι
0.8024 θηρίον
0.7987 μέλος
0.7969 εἴτε
0.7266 συνείδησις
0.7202 περιτομή
0.7199 θρόνος
0.7139 ὑποτάσσω
0.7116 Παῦλος
</pre></div>
<p>If we use <strong>chapters</strong> as the corpus division, we get a slightly different top ten most evenly distributed by Gries&rsquo;s DP:</p>
<div class="codehilite"><pre><span></span>0.0677 ὁ
0.1440 καί
0.1913 εἰμί
0.2084 εἰς
0.2117 αὐτός
0.2259 ἐν
0.2366 οὗτος
0.2378 ὅς
0.2437 δέ
0.2561 οὐ
</pre></div>
<p>and obviously this is even more problematic for lower frequency words at the other end.</p>
<p>It&rsquo;s interesting to look, though, at chapters within a single book. For example, here are the most evenly distributed lemmas in John&rsquo;s gospel using chapters for parts:</p>
<div class="codehilite"><pre><span></span>0.0574 ὁ
0.0867 καί
0.0977 αὐτός
0.1331 οὐ
0.1391 οὗτος
0.1440 ὅτι
0.1480 λέγω
0.1569 δέ
0.1576 εἰμί
0.1658 εἰς
</pre></div>
<p>and here are the least evenly distributed lemmas that occur at least 10 times:</p>
<div class="codehilite"><pre><span></span>0.9470 σταυρόω
0.9414 Ἀβραάμ
0.9126 νίπτω
0.8958 Πιλᾶτος
0.8914 πρόβατον
0.8812 Λάζαρος
0.8493 καρπός
0.8426 ἄρτος
0.8371 προσκυνέω
0.8221 ψυχή
</pre></div>
<p>Obviously Gries&rsquo;s DP is extremely easy to calculate, and I plan to experimentally include it in the Greek Vocabulary Tool for the Perseus Project, but there are still some things to work out with low-frequency words.</p>
<p>It&rsquo;s very interesting, though, as a way of contrasting words that otherwise have the same frequency in a corpus. For example, here are all the lemmas that occur <em>exactly</em> 30 times in the SBLGNT, with their book-based Gries&rsquo;s DP:</p>
<div class="codehilite"><pre><span></span>0.3276 διδαχή
0.3558 ἐγγύς
0.3708 σκότος
0.4143 ἀγοράζω
0.5360 σκανδαλίζω
0.5833 συνέρχομαι
0.6230 ἴδε
0.6485 ἐπικαλέω
0.7266 συνείδησις
0.8952 ἀρνίον
</pre></div>
<p>There is a massive range in the DP which I think is quite illustrative.</p>
<p>Here is the list with their chapter-based DP (notice how high the lowest DP now is):</p>
<div class="codehilite"><pre><span></span>0.8769 ἀγοράζω
0.8821 σκότος
0.8869 συνέρχομαι
0.8958 σκανδαλίζω
0.9016 ἐγγύς
0.9016 διδαχή
0.9034 ἴδε
0.9083 ἐπικαλέω
0.9441 συνείδησις
0.9609 ἀρνίον
</pre></div>
<p>One of my reasons for exploring Gries&rsquo;s DP (and potentially other measures of lexical dispersion) is the application to language learning. My sense is that dispersion might be a useful input to deciding what vocabulary to learn. For example διδαχή or σκότος might be better to learn before ἀρνίον because, even though they all have the same frequency, you are more likely to encounter διδαχή or σκότος in a random book or chapter.</p>
<p>[1] Gries, Stefan Th. (2008) <a href="http://www.linguistics.ucsb.edu/faculty/stgries/research/2008_STG_Dispersion_IJCL.pdf">Dispersions and adjusted frequencies in corpora</a>. International Journal of Corpus Linguistics 13:4. John Benjamins.</p>
<h2><a href="http://jktauber.com/2017/12/24/some-unix-command-line-exercises-using-morphgnt/">Some Unix Command Line Exercises Using MorphGNT</a></h2>
<p>2017-12-24, James Tauber</p>
<p>I thought I&rsquo;d help a friend learn some basic Unix command-line skills (pretty comprehensive for this type of work, though) with some practical graded exercises using MorphGNT. It worked out well so I thought I&rsquo;d share in case they are useful to others.</p>
<p>The point here is not to actually teach how to use <code>bash</code> or commands like <code>grep</code>, <code>awk</code>, <code>cut</code>, <code>sort</code>, <code>uniq</code>, <code>head</code> or <code>wc</code> but rather to motivate their use in a gradual fashion with real use cases and to structure what to actually look up when learning how to use them.</p>
<p>This little set of commands has served me well for over twenty years working with MorphGNT in its various iterations (although I obviously switch to Python for anything more complex).</p>
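<p>As a glimpse of that Python escape hatch, here is a minimal sketch of the same kind of counting done in Python rather than with a pipeline. It assumes whitespace-separated columns with the lemma in the final column (check the actual layout of the files you clone; the column order here is an assumption).</p>

```python
from collections import Counter

def lemma_counts(paths):
    # tally lemmas across MorphGNT-style files, assuming whitespace-separated
    # columns with the lemma in the final column
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                cols = line.split()
                if cols:
                    counts[cols[-1]] += 1
    return counts
```

<p>This is roughly the Python equivalent of an <code>awk</code>/<code>sort</code>/<code>uniq -c</code> pipeline, and <code>counts.most_common(5)</code> then answers a Task 5-style question directly.</p>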
<h3 id="task-0">Task 0</h3>
<p>Clone https://github.com/morphgnt/sblgnt using git.</p>
<h3 id="task-1">Task 1</h3>
<p>Using <code>wc</code> and the concept of wildcards/globbing (and relying on the fact I have one word per line in those files) work out how many words are in the main text of SBLGNT.</p>
<h3 id="task-2">Task 2</h3>
<p>Using <code>grep</code> and <code>wc</code> work out how many times μονογενής appears. (You might be able to do it with just <code>grep</code> and appropriate options, but try using <code>grep</code> without options and <code>wc</code> and understand the concept of &ldquo;piping&rdquo; the output of one command to the input of another)</p>
<h3 id="task-3">Task 3</h3>
<p>How many verbs (tokens) are there in John’s gospel? (still doable just with <code>grep</code> and <code>wc</code>)</p>
<h3 id="task-4">Task 4</h3>
<p>How many <em>unique</em> verbs (lemmas) are there in John’s gospel?</p>
<p>(learn how to use <code>awk</code> to extract fields, and how to use <code>sort</code> and <code>uniq</code> in tandem)</p>
<h3 id="task-5">Task 5</h3>
<p>What are the 5 most common verbs (lemmas) in John’s gospel? (you might want to use <code>head</code>)</p>
<h3 id="task-6">Task 6</h3>
<p>Get counts in John’s Gospel of how many tokens appear in each tense/aspect (hint: use <code>cut</code>) and write the results to a file called <code>john.txt</code> rather than just output it in the terminal.</p>
<h3 id="task-7">Task 7</h3>
<p>Come up with your own question that you think could be answered using these types of operations, and try it out.</p>
<h2><a href="http://jktauber.com/2017/11/22/sbl-papers-now-online/">SBL Papers Now Online</a></h2>
<p>2017-11-22 · James Tauber</p>
<p>I&rsquo;ve put my two SBL papers this year (from both the recent Annual Meeting and the International Meeting) online and also sync&rsquo;d my Annual Meeting slides to audio I recorded on my iPhone.</p>
<ul>
<li><em>SBL 2017 Annual</em>: <strong>Linking Lexical Resources for Biblical Greek</strong> <br><a href="https://www.academia.edu/35220175/Linking_Lexical_Resources_for_Biblical_Greek">[slides]</a> <a href="https://vimeo.com/243936959">[video]</a></li>
<li><em>SBL 2017 International</em>: <strong>The Route to Adaptive Learning of Greek</strong> <br><a href="https://www.academia.edu/35220134/The_Route_to_Adaptive_Learning_of_Greek">[slides]</a></li>
</ul>
<p>For completeness, here are my other SBL talks:</p>
<ul>
<li><em>SBL 2016 Annual</em>: <strong>An Online Adaptive Reading Environment for the Greek New Testament</strong> <br><a href="https://www.academia.edu/30722025/An_Online_Adaptive_Reading_Environment_for_the_Greek_New_Testament">[slides]</a></li>
<li><em>SBL 2015 Annual</em>: <strong>A Morphological Lexicon of New Testament Greek</strong> <br><a href="https://www.academia.edu/18816954/A_Morphological_Lexicon_of_New_Testament_Greek">[slides]</a></li>
</ul>
<h2><a href="http://jktauber.com/2017/11/18/speaking-sbl-2017-linking-lexical-resources/">Speaking at SBL 2017 on Linking Lexical Resources</a></h2>
<p>2017-11-18 · James Tauber</p>
<p>I&rsquo;m again speaking at the SBL Annual Meeting, this time in Boston. My topic is basically the &ldquo;lemma lattice&rdquo; work Ulrik Sandborg-Petersen and I started back in 2006 but which I&rsquo;ve never presented in this sort of setting before.</p>
<p>Here&rsquo;s the official abstract:</p>
<blockquote>
<p><strong>Linking Lexical Resources for Biblical Greek</strong></p>
<p>As more resources for Biblical Greek, both old and new, become openly available, the opportunities for integrating them become greater. At the level of the word, it might seem a trivial task to match based on lemma. But no two texts are lemmatised the same way and no two lexicons will make the same choices of headwords. Numerical solutions such as Strongs and Goodrick-Kohlenberger solve some problems but introduce new ones. After surveying the various issues and challenges, this talk will provide both a framework for moving forward and a report on practical ways that a variety of texts, lexicons, and other resources such as principal-part lists are being linked in the service of open, biblical digital humanities.</p>
</blockquote>
<p>I&rsquo;ll certainly post my slides after my talk but I&rsquo;ll also try to record it on my iPhone like I did at BibleTech 2015.</p>
<h2><a href="http://jktauber.com/2017/10/16/tour-greek-morphology-part-17/">A Tour of Greek Morphology: Part 17</a></h2>
<p>2017-10-16 (updated 2017-11-04) · James Tauber</p>
<p>Part seventeen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>As mentioned in the <a href="https://jktauber.com/2017/09/07/tour-greek-morphology-part-16/">last post</a> in the series, we now have an inflectional class for all 5,314 present active infinitive or indicative forms in the MorphGNT SBLGNT in a file that looks like the following:</p>
<div class="codehilite"><pre><span></span>010120 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010123 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010202 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010206 εἶ 2SG PA-10 εἰμί PA-10
010213 μέλλει 3SG PA-1 μέλλω PA-1
010213 ζητεῖν INF PA-2 ζητέω PA-2
010218 εἰσί(ν) 3PL PA-10 εἰμί PA-10
010222 βασιλεύει 3SG PA-1 βασιλεύω PA-1
010303 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010309 λέγειν INF PA-1 λέγω PA-1
010309 ἔχομεν 1PL PA-1/PA-8 ἔχω PA-1
</pre></div>
<p>Where the columns are:</p>
<ul>
<li>the book/chapter/verse reference</li>
<li>the normalized form</li>
<li>the morphosyntactic properties</li>
<li>the inflectional classes possible without disambiguation</li>
<li>the lemma</li>
<li>the disambiguated inflectional class</li>
</ul>
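For reference, here is a minimal Python sketch of parsing rows in this six-column format (the sample data is embedded directly rather than read from a file; this is my own illustration, not the script used to produce the output):

```python
# Sample rows in the six-column format described above (an illustration,
# not the author's actual script).
SAMPLE = """\
010120 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010213 ζητεῖν INF PA-2 ζητέω PA-2
010309 ἔχομεν 1PL PA-1/PA-8 ἔχω PA-1
"""

def parse(lines):
    """Yield (ref, form, props, candidate_classes, lemma, cls) per row."""
    for line in lines:
        if not line.strip():
            continue
        ref, form, props, cands, lemma, cls = line.split()
        yield ref, form, props, cands.split("/"), lemma, cls

rows = list(parse(SAMPLE.splitlines()))
```

Note that an ambiguous row like the ἔχομεν one carries several slash-separated candidate classes in the fourth field but a single disambiguated class in the last.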
<p>Now it&rsquo;s time to do some counts.</p>
<p>Let us first of all look at the number of distinct lemmas in each of our 13 classes.</p>
<p>The numbers for classes <strong>PA-5</strong> and above are low enough that we should look at them individually:</p>
<table class="table">
<tr><th nowrap>PA-1 <td>barytone omega verbs<td>338</tr>
<tr><th nowrap>PA-2 <td>circumflex omega verbs with INF -εῖν / 3SG -εῖ<td>145</tr>
<tr><th nowrap>PA-3 <td>circumflex omega verbs with INF -οῦν / 3SG -οῖ<td>21</tr>
<tr><th nowrap>PA-4 <td>circumflex omega verbs with INF -ᾶν / 3SG -ᾷ<td>31 </tr>
<tr><th nowrap>PA-5 <td>ζάω + compound (συζάω)<td>2 </tr>
<tr><th nowrap>PA-6a <td>ὀμνύω; δείκνυμι + compound (ἀμφιέννυμι)<td>3 </tr>
<tr><th nowrap>PA-7 <td>τίθημι + compounds (ἐπιτίθημι παρατίθημι περιτίθημι);<br>compounds of ἵημι (ἀφίημι συνίημι)<td>6 </tr>
<tr><th nowrap>PA-8 <td>δίδωμι + compounds (διαδίδωμι ἀποδίδωμι μεταδίδωμι παραδίδωμι)<td>5 </tr>
<tr><th nowrap>PA-9 <td>compounds of ἵστημι (καθίστημι μεθίστημι συνίστημι);<br>compound of φημί (σύμφημι);<br>that one weird case of συνίημι<td>5 </tr>
<tr><th nowrap>PA-9-ENC <td>φημί<td>1 </tr>
<tr><th nowrap>PA-10 <td>εἰμί<td>1 </tr>
<tr><th nowrap>PA-10-COMP <td>compounds of εἰμί (ἄπειμι ἔξεστι(ν) πάρειμι)<td>3 </tr>
<tr><th nowrap>PA-11-COMP <td>compounds of εἶμι (ἔξειμι εἴσειμι)<td>2 </tr>
</table>
<p>Notice that even the small counts are elevated due to compound verbs. Folding compounds of the same base verb, the classes from <strong>PA-5</strong> on have only one or two members.</p>
<p>This is just looking at the number of unique lemmas in each class but there are two other sets of numbers that are worth looking at: (1) the total number of tokens in the SBLGNT; (2) the distribution of classes amongst the hapax legomena.</p>
<table class="table">
<tr><th>class <th>lemmas <th>tokens <th>hapax <th>hapax details
<tr><th nowrap>PA-1 <td>338 <td>2563 <td>151
<tr><th nowrap>PA-2 <td>145 <td>856 <td>65
<tr><th nowrap>PA-3 <td>21 <td>35 <td>15
<tr><th nowrap>PA-4 <td>31 <td>117 <td>16
<tr><th nowrap>PA-5 <td>2 <td>41 <td>1 <td>συζάω
<tr><th nowrap>PA-6a <td>3 <td>5 <td>2 <td>ὀμνύω ἀμφιέννυμι
<tr><th nowrap>PA-7 <td>6 <td>37 <td>3 <td>εἴσειμι παρίστημι παρατίθημι
<tr><th nowrap>PA-8 <td>5 <td>35 <td>2 <td>διαδίδωμι μεταδίδωμι
<tr><th nowrap>PA-9 <td>5 <td>9 <td>3 <td>συνίημι σύμφημι μεθίστημι
<tr><th nowrap>PA-9-ENC <td>1 <td>22 <td>0
<tr><th nowrap>PA-10 <td>1 <td>1551 <td>0
<tr><th nowrap>PA-10-COMP <td>3 <td>39 <td>1 <td>ἄπειμι
<tr><th nowrap>PA-11-COMP <td>2 <td>4 <td>1 <td>εἴσειμι
</table>
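All three sets of numbers can be derived from (lemma, class) pairs, one per token. A small sketch with toy data (my own illustration; "hapax" here means a lemma occurring exactly once):

```python
from collections import Counter

def class_stats(lemma_class_pairs):
    """From (lemma, class) pairs, one per token, compute per class:
    (distinct lemmas, total tokens, hapax lemmas)."""
    per_lemma = Counter(lemma_class_pairs)  # (lemma, class) -> token count
    stats = {}
    for (lemma, cls), n in per_lemma.items():
        lemmas, tokens, hapax = stats.get(cls, (0, 0, 0))
        stats[cls] = (lemmas + 1, tokens + n, hapax + (1 if n == 1 else 0))
    return stats

# Toy data: εἰμί occurs twice, συζάω once (so συζάω is a hapax)
demo = [("εἰμί", "PA-10"), ("εἰμί", "PA-10"), ("συζάω", "PA-5")]
stats = class_stats(demo)
# stats["PA-10"] == (1, 2, 0); stats["PA-5"] == (1, 1, 1)
```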
<p>Why do the hapax legomena matter? Well, they give an indication of which classes were still productive.</p>
<p>Note, however, that the hapax in <strong>PA-5</strong> and above are VERY low in number and, with the exception of ὀμνύω in <strong>PA-6a</strong>, they are all compounds. This strongly suggests that only <strong>PA-1</strong>, <strong>PA-2</strong>, <strong>PA-3</strong>, and <strong>PA-4</strong> were productive.</p>
<p>Notice that the token numbers for <strong>PA-6a</strong>, <strong>PA-9</strong>, and <strong>PA-11-COMP</strong> are particularly low too. Potentially relevant in the case of <strong>PA-6a</strong> and <strong>PA-9</strong> is that these are the classes most likely to have developed thematic alternatives. This might be worthy of a future post in this series!</p>
<p>Let&rsquo;s now look at counts for each paradigm cell for each class:</p>
<table class="table">
<tr><th>&nbsp; <th nowrap>PA-1 <th nowrap>PA-2 <th nowrap>PA-3 <th nowrap>PA-4 <th nowrap>PA-5 <th nowrap>PA-6a <th nowrap>PA-7 <th nowrap>PA-8 <th nowrap>PA-9 <th nowrap>PA-9-ENC <th nowrap>PA-10 <th nowrap>PA-10-COMP <th nowrap>PA-11-COMP</tr>
<tr><th>INF <td>394 <td>171 <td>5 <td>21 <td>13 <td>1 <td>11 <td>10 <td>1 <td>- <td>124 <td>3 <td>3</tr>
<tr><th>1SG <td>460 <td>116 <td>3 <td>21 <td>6 <td>1 <td>7 <td>10 <td>2 <td>4 <td>138 <td>1 <td>-</tr>
<tr><th>2SG <td>164 <td>46 <td>- <td>5 <td>2 <td>- <td>- <td>1 <td>- <td>- <td>92 <td>1 <td>-</tr>
<tr><th>3SG <td>923 <td>295 <td>16 <td>35 <td>13 <td>3 <td>11 <td>13 <td>5 <td>17 <td>896 <td>31 <td>-</tr>
<tr><th>1PL <td>141 <td>52 <td>2 <td>19 <td>5 <td>- <td>1 <td>- <td>- <td>- <td>52 <td>1 <td>-</tr>
<tr><th>2PL <td>218 <td>99 <td>4 <td>8 <td>1 <td>- <td>4 <td>- <td>- <td>- <td>93 <td>1 <td>-</tr>
<tr><th>3PL <td>263 <td>77 <td>5 <td>8 <td>1 <td>- <td>3 <td>1 <td>1 <td>1 <td>156 <td>1 <td>1</tr>
<tr><th>&nbsp; <th>2563 <th>856 <th>35 <th>117 <th>41 <th>5 <th>37 <th>35 <th>9 <th>22 <th>1551 <th>39 <th>4</tr>
</table>
<p>What is obvious from this is just how important, regardless of inflectional class, the <strong>3SG</strong> form is. The <strong>INF</strong> is also very important. We&rsquo;ve seen in a previous post that both cells are very good predictors of inflectional class (much better than <strong>1SG</strong>) but they are also just both very common. The <strong>1SG</strong>, despite being a bad predictor, is still important in terms of frequency.</p>
<p>The <strong>3PL</strong> is a distant fourth with one apparent deviation: it is very common in <strong>PA-10</strong> (i.e. the copula), more so than the <strong>INF</strong> or <strong>1SG</strong>. In fact, the proportion of <strong>3PL</strong> in this class is actually average; it&rsquo;s the <strong>INF</strong> and <strong>1SG</strong> that are unusually low (with much of the frequency drop taken up by the <strong>3SG</strong>).</p>
<p>As well as εἰμί, φημί (<strong>PA-9-ENC</strong>) is also disproportionately <strong>3SG</strong>.</p>
<p>Of course, given how common <strong>PA-1</strong> is, even the plurals there outnumber the most common cells in the other classes.</p>
<p>If the goal is just to identify the person/number, not the class (which is true in reception but not in learning), then a lot of those numbers collapse because of shared endings. Here are the counts focused just on the common endings (without accents):</p>
<table class="table">
<tr><td rowspan=2>INF <td>-ν <td>604
<tr> <td>-ναι <td>153
<tr><td rowspan=2>1SG <td>-ω <td>606
<tr> <td>-μι <td>163
<tr><td rowspan=3>2SG <td>-{ι}ς <td>217
<tr> <td>-ς <td>1
<tr> <td>(-)ει <td>93
<tr><td rowspan=3>3SG <td>-{ι} <td>1282
<tr> <td>-σι(ν) <td>49
<tr> <td>(-)εστι(ν) <td>927
<tr><td>1PL <td>-μεν <td>273
<tr><td>2PL <td>-τε <td>448
<tr><td rowspan=2>3PL <td>-σι(ν) <td>511
<tr> <td>-ασι(ν) <td>7
</table>
<p>This just emphasises even more (even though it was visible in the previous table) that there is only one <strong>2SG</strong> in -ς (without an iota, subscripted or otherwise): the παραδίδως in Luke 22.48.</p>
<p>The 7 <strong>3PL</strong>s in -ασι(ν) are:</p>
<ul>
<li>τιθέασι(ν) in Matt 5.15 </li>
<li>ἐπιτιθέασι(ν) in Matt 23.4 </li>
<li>περιτιθέασι(ν) in Mark 15.17 </li>
<li>φασί(ν) in Rom 3.8 </li>
<li>συνιᾶσι(ν) in 2Co 10.12 </li>
<li>εἰσίασι(ν) in Heb 9.6 </li>
<li>διδόασι(ν) in Rev 17.13 </li>
</ul>
<p>One <em>could</em> argue that these are subsumed by saying the <strong>3PL</strong> ends in -σι(ν) but given that, in the very same lexemes, -σι(ν) can also indicate the <strong>3SG</strong>, it is useful to call out the α, even though the root vowel alternation is enough to distinguish singular and plural.</p>
<p>That&rsquo;s it (for now) for counts of the present actives. In the next couple of posts, we&rsquo;ll turn to the middle forms.</p>
<h2><a href="http://jktauber.com/2017/11/03/four-types-but/">Four Types of But</a></h2>
<p>2017-11-03 (updated 2017-11-04) · James Tauber</p>
<p>In his talk on adversive conjunction in Gothic at the 29th UCLA Indo-European Conference, Jared Klein started with a wonderful example paragraph in English.</p>
<blockquote>
In order to finish the project, I don't need money <i>but</i><sub>2</sub> time. I would like to be done by the end of this year, <i>but</i><sub>3</sub> I don't think that is going to happen. Nobody is to blame for this <i>but</i><sub>1</sub> me, because I've wasted a lot of time on things that have proved to be irrelevant. <i>But</i><sub>4</sub> this is too depressing; let's talk about something else.
</blockquote>
<p>He went on to talk about the Gothic equivalents for each but I thought it was a great illustration of four distinct types of adversatives all using &ldquo;but&rdquo; in English.</p>
<p>Klein didn&rsquo;t necessarily use the following terms but the four could be described as:</p>
<ol>
<li>prepositional</li>
<li>phrasal</li>
<li>clausal</li>
<li>discourse</li>
</ol>
<h2><a href="http://jktauber.com/2017/11/02/tour-greek-morphology-part-19/">A Tour of Greek Morphology: Part 19</a></h2>
<p>2017-11-02 · James Tauber</p>
<p>Part nineteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>It&rsquo;s now time to do for the middle forms what we did for the actives in <a href="https://jktauber.com/2017/09/07/tour-greek-morphology-part-16/">part 16</a>, namely come up with the rules to help disambiguate inflectional classes. These were sketched out in theory in <a href="https://jktauber.com/2017/08/29/tour-greek-morphology-part-14/">part 14</a> but now it&rsquo;s time to actually write the rules and test them in code against the SBLGNT.</p>
<p>This is what my Python script does:</p>
<table class="table table-condensed table-bordered table-striped">
<tr>
<td>
<b>INF</b>:Xεσθαι or
<b>3SG</b>:Xεται or
<b>2PL</b>:Xεσθε
<td><i>is</i>
<td>
<b>PM-1</b> if lemma ends in ω or ομαι<br>
<b>PM-7</b> if lemma ends ημι
<tr>
<td>
<b>1SG</b>:Xομαι or
<b>1PL</b>:Xόμεθα or
<b>3PL</b>:Xονται
<td><i>is</i>
<td>
<b>PM-8</b> if lemma ends in δίδομαι<br>
<b>PM-1</b> if lemma ends in ω or otherwise ends in ομαι
<tr>
<td>
<b>1SG</b>:Xοῦμαι or
<b>3PL</b>:Xοῦνται
<td><i>is</i>
<td>
<b>PM-2</b> if lemma ends in έω or έομαι<br>
<b>PM-3</b> if lemma ends in όω or όομαι
<tr>
<td>
<b>1SG</b>:Xῶμαι or
<b>1PL</b>:Xώμεθα or
<b>3PL</b>:Xῶνται
<td><i>is</i>
<td>
<b>PM-5</b> if lemma ends in χράομαι<br>
<b>PM-4</b> if lemma otherwise ends in άομαι
<tr>
<td>
<b>2SG</b>:Xῇ
<td><i>is</i>
<td>
<b>PM-2</b> if lemma ends in έω or έομαι<br>
<b>PM-5</b> if lemma ends in άομαι
<tr>
<td>
<b>1PL</b>:Xύμεθα
<td><i>is</i>
<td>
<b>PM-2</b> if lemma ends in έω or έομαι<br>
<b>PM-3</b> if lemma ends in όω or όομαι (not needed in SBLGNT)<br>
<b>PM-5</b> otherwise (not needed in SBLGNT)
<tr>
<td>
<b>3SG</b>:Xεῖται or
<b>2PL</b>:Xεῖσθε
<td><i>is</i>
<td>
<b>PM-2</b> if lemma ends in έω or έομαι<br>
<b>PM-11</b> if lemma ends in εῖμαι
<tr>
<td>
<b>1PL</b>:Xείμεθα
<td><i>is</i>
<td>
<b>PM-11</b> if lemma is κεῖμαι<br>
<b>PM-11-COMPOUND</b> otherwise (not needed in SBLGNT)
<tr>
<td>
<b>INF</b>:Xεῖσθαι
<td><i>is</i>
<td>
<b>PM-2</b> if lemma ends in έω or έομαι<br>
<b>PM-11</b> if lemma is κεῖμαι (not needed in SBLGNT)<br>
<b>PM-11-COMPOUND</b> otherwise
<tr>
<td>
<b>INF</b>:Xῆσθαι
<td><i>is</i>
<td>
<b>PM-10-COMPOUND</b> if lemma is κάθημαι<br>
<b>PM-5</b> otherwise (not needed in SBLGNT)
</table>
<p>I decided to cover a bunch of ambiguities not specifically needed by the SBLGNT—not strictly necessary but it will help when the script is extended to run on a larger corpus.</p>
<p>Note the special-casing of δίδομαι, κεῖμαι, κάθημαι, and χράομαι. χράομαι is an example, like ζάω in <a href="https://jktauber.com/2017/09/07/tour-greek-morphology-part-16/">part 16</a>, that is misleadingly lemmatized with an alpha. More on that later!</p>
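Rules like these can be expressed data-driven, as ordered (possible endings, lemma test, resulting class) triples. A partial sketch covering only the first two rows of the table above, including the δίδομαι special case (an illustration, not the author's actual script):

```python
# Ordered rules: the first matching (endings, lemma test) pair wins,
# so the δίδομαι special case must precede the generic -ομαι rule.
RULES = [
    (("εσθαι", "εται", "εσθε"), lambda l: l.endswith(("ω", "ομαι")), "PM-1"),
    (("εσθαι", "εται", "εσθε"), lambda l: l.endswith("ημι"), "PM-7"),
    (("ομαι", "όμεθα", "ονται"), lambda l: l.endswith("δίδομαι"), "PM-8"),
    (("ομαι", "όμεθα", "ονται"), lambda l: l.endswith(("ω", "ομαι")), "PM-1"),
]

def disambiguate(form, lemma):
    """Return the first class whose ending and lemma conditions both match."""
    for endings, lemma_test, cls in RULES:
        if form.endswith(endings) and lemma_test(lemma):
            return cls
    return None
```

For example, ἔρχεται with lemma ἔρχομαι comes out as PM-1, while παραδίδονται with lemma παραδίδομαι is caught by the δίδομαι case and comes out as PM-8.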
<p>We now have an inflectional class for all 820 present middle infinitive or indicative forms in the MorphGNT SBLGNT.</p>
<p>You can download the entire output of my Python script <a href="https://gist.github.com/jtauber/accb8180f56fceee37f57a040faa4b8a">here</a>.</p>
<p>Are there multiple classes for a particular lexeme (like there was in the active)?</p>
<p>Two of the 167 lexemes show multiple classes:</p>
<ul>
<li>δύναμαι: <strong>PM-9</strong> normally but a <strong>2SG</strong>:δύνῃ that comes up as a <strong>PM-1</strong> (<strong>PM-9</strong> would predict a Xασαι)</li>
<li>κάθημαι: <strong>PM-10-COMPOUND</strong> normally but a <strong>2SG</strong>:κάθῃ that comes up as a <strong>PM-1</strong> (<strong>PM-10-COMPOUND</strong> would predict a Xησαι)</li>
</ul>
<p>If κάθῃ were καθῇ, we&rsquo;d have the possibility of reanalysis as a <strong>PM-5</strong> and it&rsquo;s still possible that&rsquo;s what&rsquo;s going on and the accentuation just doesn&rsquo;t reflect that.</p>
<p>δύνῃ for δύνασαι is somewhat less expected, and it should be noted that both forms appear in the SBLGNT, sometimes within the same author. That the <strong>PM-4</strong> <strong>2SG</strong> forms all show up with an uncontracted -ᾶσαι adds slightly more mystery.</p>
<p>For now we&rsquo;ll leave δύνῃ and κάθῃ as <strong>PM-1</strong> but we&rsquo;ll revisit them later.</p>
<p>In the next part, we&rsquo;ll look at counts for the present middles across the SBLGNT.</p>
<h2><a href="http://jktauber.com/2017/11/01/ucla-indo-european-conference/">Off to the UCLA Indo-European Conference</a></h2>
<p>2017-11-01 · James Tauber</p>
<p>Tomorrow I&rsquo;m off to Los Angeles for the <a href="http://www.pies.ucla.edu/IECprogram.html">Twenty-Ninth Annual UCLA Indo-European Conference</a>.</p>
<p>Indo-European studies are notoriously impenetrable, even for linguists, but a couple of months ago, I finally decided now was the time to attend this major conference (to the extent an IE conference <em>can</em> be &ldquo;major&rdquo;).</p>
<p>I&rsquo;m not great at conferences at the best of times, especially when I&rsquo;m not a speaker and/or don&rsquo;t know very many people, so this will be quite a stepping-out-of-the-comfort-zone for me.</p>
<p>But as an aspiring comparative philologist, I&rsquo;m sure it&rsquo;s going to be very rewarding for me.</p>
<h2><a href="http://jktauber.com/2017/10/27/tour-greek-morphology-part-18/">A Tour of Greek Morphology: Part 18</a></h2>
<p>2017-10-27 (updated 2017-10-30) · James Tauber</p>
<p>Part eighteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In <a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">Part 13</a> we summarised the present active endings and in <a href="https://jktauber.com/2017/09/05/tour-greek-morphology-part-15/">part 15</a> posed the question &ldquo;Do these paradigms cover all the forms in the Greek New Testament?&rdquo;</p>
<p>Now we&rsquo;re going to answer the same question for the middle endings summarised in <a href="https://jktauber.com/2017/08/29/tour-greek-morphology-part-14/">part 14</a>.</p>
<p>Again, I&rsquo;ve written a short Python program that reveals there are 16 forms in 23 instances that do NOT match.</p>
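The coverage check itself can be sketched as matching each form's ending against the ending every known class predicts for its person/number cell. A toy version with only the PM-1 endings (the real script uses the full tables from parts 13 and 14):

```python
# Toy paradigm data: only PM-1 is included here (my own sketch, not the
# author's actual program).
PARADIGMS = {
    "PM-1": {"INF": "εσθαι", "1SG": "ομαι", "2SG": "ῃ", "3SG": "εται",
             "1PL": "όμεθα", "2PL": "εσθε", "3PL": "ονται"},
}

def covered(form, cell):
    """Does any known class predict this form's ending for the given cell?"""
    return any(form.endswith(endings[cell]) for endings in PARADIGMS.values())
```

With only PM-1 known, a form like ἔρχεται is covered as a 3SG but κάθηται is not, which is exactly how forms of κάθημαι surface as non-matches.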
<p>Two of these forms are of κάθημαι: the <strong>1SG</strong> itself plus the <strong>3SG</strong> κάθηται. The <strong>3SG</strong> bears a resemblance to the <strong>PM-5</strong> <strong>3SG</strong> (differing only in accent) but this is not a circumflex verb. The existence of the η in the <strong>1SG</strong> rather than an ῶ indicates this is an athematic verb. It is in fact a compound verb κατά+ἧμαι.</p>
<p>We don&rsquo;t have a paradigm class for ἧμαι OR its compounds so let&rsquo;s add them now.</p>
<table class="table">
<tr><th>&nbsp; <th>PM-10 <th>PM-10-COMPOUND
<tr><th>INF <td><i>ἧσθαι</i> <td><i>Xῆσθαι</i>
<tr><th>1SG <td><i>ἧμαι</i> <td>Xημαι
<tr><th>2SG <td><i>ἧσαι</i> <td><i>Xησαι</i>
<tr><th>3SG <td><i>ἧται</i> <td>Xηται
<tr><th>1PL <td><i>ἥμεθα</i> <td><i>Xήμεθα</i>
<tr><th>2PL <td><i>ἧσθε</i> <td><i>Xησθε</i>
<tr><th>3PL <td><i>ἧνται</i> <td><i>Xηνται</i>
</table>
<p>(we don&rsquo;t actually need <strong>PM-10</strong> for the SBLGNT but I&rsquo;ve included it for completeness)</p>
<p>Next we have κεῖμαι and ITS compounds which account for 10 more forms. Here again we have an athematic verb with a vowel we haven&rsquo;t covered before.</p>
<table class="table">
<tr><th>&nbsp; <th>PM-11 <th>PM-11-COMPOUND
<tr><th>INF <td><i>Xεῖσθαι</i> <td><i>Xεῖσθαι</i>
<tr><th>1SG <td>Xεῖμαι <td><i>Xειμαι</i>
<tr><th>2SG <td><i>Xεῖσαι</i> <td><i>Xεισαι</i>
<tr><th>3SG <td><i>Xεῖται</i> <td>Xειται
<tr><th>1PL <td>Xείμεθα <td><i>Xείμεθα</i>
<tr><th>2PL <td><i>Xεῖσθε</i> <td><i>Xεισθε</i>
<tr><th>3PL <td><i>Xεῖνται</i> <td>Xεινται
</table>
<p>Note that the <strong>INF</strong> and <strong>1PL</strong> are identical between the two (so this will be an ambiguity we&rsquo;ll need to cover, although not for the SBLGNT).</p>
<p>Our next word is οἶμαι which only appears in the SBLGNT in the <strong>1SG</strong>. We won&rsquo;t reconstruct the entire paradigm (we may come back to it later) but will use <strong>PM-12</strong> to designate the οἶμαι form.</p>
<p>This leaves us with three forms, all <strong>2SG</strong>:</p>
<ul>
<li>καυχᾶσαι</li>
<li>ὀδυνᾶσαι</li>
<li>κατακαυχᾶσαι</li>
</ul>
<p>In all cases, this looks a lot like a <strong>PM-4</strong> that just hasn&rsquo;t dropped the sigma in -ᾶσαι to form -ᾷ. In fact, all the <strong>PM-4</strong>s in the SBLGNT seem to have this behaviour so we probably shouldn&rsquo;t treat it as a separate paradigm but rather an alternative realisation within the <strong>PM-4</strong> <strong>2SG</strong> cell (similar to Xῃ/Xει in the <strong>PM-1</strong>). We&rsquo;ll discuss in a later post why <strong>PM-4</strong> might exhibit this when other circumflex middle paradigms don&rsquo;t seem to.</p>
<p>But with this tweak and the additions of <strong>PM-10</strong>, <strong>PM-10-COMPOUND</strong>, <strong>PM-11</strong>, <strong>PM-11-COMPOUND</strong>, and <strong>PM-12</strong> we now have full coverage of the present middle indicatives and infinitives in the SBLGNT.</p>
<p>You may be wondering whether we could have just identified these paradigms way back when we first laid out the different present middle paradigms. We absolutely could have. But I think the way we&rsquo;ve discovered them demonstrates an important concept: that of rigorously testing a linguistic model against a corpus.</p>
<p>This whole blog series is, in fact, laying the groundwork for the rigorous description of Greek morphology that I have wanted to write for many years.</p>
<p>But coming back to the short term: we still have to explore the disambiguation of assigning inflectional classes to the middle forms, like we did for the actives in <a href="https://jktauber.com/2017/09/07/tour-greek-morphology-part-16/">part 16</a>. We&rsquo;ll do that in the next part.</p>
<h2><a href="http://jktauber.com/2017/09/07/tour-greek-morphology-part-16/">A Tour of Greek Morphology: Part 16</a></h2>
<p>2017-09-07 (updated 2017-10-15) · James Tauber</p>
<p>Part sixteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the <a href="https://jktauber.com/2017/09/05/tour-greek-morphology-part-15/">previous post</a> we went through and made sure we had all our active endings covered ready for counting. As pointed out (and in detail in <a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">Part 13</a>), though, we still had some ambiguities. If we want to assign just a single inflectional class to each form in the SBLGNT, we need some way of disambiguating. Fortunately, the lemma does this (even if it resorts to using fake forms like the uncontracted circumflex <strong>1SG</strong>s).</p>
<p>This allows us to write code that basically follows these rules:</p>
<table class="table table-condensed table-bordered table-striped">
<tr>
<td>
<b>1SG</b>:Xημι or
<b>3SG</b>:Xησι(ν)
<td><i>is</i>
<td>
<b>PA-7</b> if lemma ends in τίθημι or ίημι<br>
<b>PA-9</b> if lemma ends in ίστημι or φημι
<tr>
<td>
<b>1PL</b>:Xῶμεν or
<b>3PL</b>:Xῶσι(ν)
<td><i>is</i>
<td>
<b>PA-5</b> if lemma is ζάω<br>
<b>PA-4</b> otherwise
<tr>
<td>
<b>1PL</b>:Xοῦμεν or
<b>3PL</b>:Xοῦσι(ν)
<td><i>is</i>
<td>
<b>PA-2</b> if lemma ends in έω<br>
<b>PA-3</b> if lemma ends in όω
<tr>
<td>
<b>2PL</b>:Xετε
<td><i>is</i>
<td>
<b>PA-1</b> if lemma ends in ω<br>
<b>PA-7</b> if lemma ends in ημι
<tr>
<td>
<b>1PL</b>:Xομεν
<td><i>is</i>
<td>
<b>PA-1</b> if lemma ends in ω<br>
<b>PA-8</b> if lemma ends in ωμι
<tr>
<td>
<b>1SG</b>:Xῶ
<td><i>is</i>
<td>
<b>PA-2</b> if lemma ends in έω<br>
<b>PA-3</b> if lemma ends in όω<br>
<b>PA-5</b> if lemma is ζάω<br>
<b>PA-4</b> if lemma otherwise ends in άω
<tr>
<td>
<b>INF</b>:Xέναι
<td><i>is</i>
<td>
<b>PA-7</b> if lemma ends with ίημι<br>
<b>PA-11-COMPOUND</b> if lemma ends with ειμι
</table>
<p>Part 13 also mentioned the <strong>2SG</strong>:Xης ambiguity between <strong>PA-7</strong> and <strong>PA-9</strong> but that doesn&rsquo;t crop up in the SBLGNT: there are in fact no <strong>PA-7</strong> OR <strong>PA-9</strong> <strong>2SG</strong>s in the SBLGNT.</p>
<p>There ARE, however, three <strong>1PL</strong> forms which do still cause a problem with the rules above:</p>
<ul>
<li>ἀφίομεν</li>
<li>ἱστάνομεν</li>
<li>συνιστάνομεν</li>
</ul>
<p>Each of these matches <strong>1PL</strong>:Xομεν BUT the MorphGNT lemmas are ἀφίημι, ἵστημι, and συνίστημι respectively.</p>
<p>What is happening here is that new forms have developed belonging to a different inflectional class than the particular form chosen for the lemma. For example ἱστάνομεν is an ω verb but it&rsquo;s otherwise the same as the athematic ἵστημι. Arguably the MorphGNT lemmatization could be changed to ἱστάνω if you consider a difference in inflectional class to be a new lexeme. This is a topic I&rsquo;ll be covering in my talk at SBL 2017 in Boston in November. For now, in our Python code, we&rsquo;ll just special-case these as <strong>PA-1</strong> but we will come back to discussing this more. Note that we only caught this here because it was an ambiguous form so we were checking for particular lemma patterns.</p>
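As a sketch, the special-casing can simply be checked before the lemma-based rules run. Here is an illustrative implementation of just the <b>1PL</b>:Xομεν rule (my own illustration, not the author's actual script):

```python
# The three thematic 1PL forms whose MorphGNT lemma is athematic are
# caught before the normal lemma-based rules.
SPECIAL_PA1 = {"ἀφίομεν", "ἱστάνομεν", "συνιστάνομεν"}

def pa_class_for_1pl(form, lemma):
    """Disambiguate a 1PL form in -ομεν (only this one rule is sketched)."""
    if form in SPECIAL_PA1:
        return "PA-1"  # new thematic forms of athematically lemmatized verbs
    if lemma.endswith("ωμι"):
        return "PA-8"
    if lemma.endswith("ω"):
        return "PA-1"
    return None
```

So ἔχομεν (lemma ἔχω) falls through to the ordinary ω-lemma rule, while ἀφίομεν (lemma ἀφίημι) is caught by the special case.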
<p>We now have an inflectional class for all 5,314 present active infinitive or indicative forms in the MorphGNT SBLGNT.</p>
<p>The output of my Python script begins:</p>
<div class="codehilite"><pre><span></span>010120 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010123 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010202 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010206 εἶ 2SG PA-10 εἰμί PA-10
010213 μέλλει 3SG PA-1 μέλλω PA-1
010213 ζητεῖν INF PA-2 ζητέω PA-2
010218 εἰσί(ν) 3PL PA-10 εἰμί PA-10
010222 βασιλεύει 3SG PA-1 βασιλεύω PA-1
010303 ἐστί(ν) 3SG PA-10 εἰμί PA-10
010309 λέγειν INF PA-1 λέγω PA-1
010309 ἔχομεν 1PL PA-1/PA-8 ἔχω PA-1
</pre></div>
<p>The columns are:</p>
<ul>
<li>the book/chapter/verse reference</li>
<li>the normalized form</li>
<li>the morphosyntactic properties</li>
<li>the inflectional classes possible without disambiguation</li>
<li>the lemma</li>
<li>the disambiguated inflectional class</li>
</ul>
<p>You can download the entire thing <a href="https://gist.github.com/jtauber/510a1aa27e2d7e2ccb979fd152ee9e8a/f950582b7f03fec5bf09d155ead2b98734ab636e">here</a>.</p>
<p>We&rsquo;ll use this to do our counts in the next post.</p>
<p>One question comes to mind: are the disambiguated inflectional classes consistent for all the forms of a lexeme (beyond the three exceptions we already saw above)?</p>
<p>Well, looking at the full output of the script, we find there are a few more in the SBLGNT:</p>
<table class="table table-condensed table-bordered">
<tr><th rowspan=3>ὀμνύω <td rowspan=2><b>INF</b><td>ὀμνύναι <td><b>PA-6a</b>
<tr> <td>ὀμνύειν <td rowspan=2><b>PA-1</b>
<tr> <td colspan=2><i>all other forms</i>
<tr><th rowspan=4>δείκνυμι <td><b>INF</b><td>δεικνύειν <td rowspan=2><b>PA-1</b>
<tr> <td><b>2SG</b><td>δεικνύεις
<tr> <td><b>1SG</b><td>δείκνυμι <td rowspan=2><b>PA-6a</b>
<tr> <td><b>3SG</b><td>δείκνυσι(ν)
<tr><th rowspan=4>συνίστημι <td><b>1PL</b><td>συνιστάνομεν <td rowspan=2><b>PA-1</b>
<tr> <td><b>INF</b><td>συνιστάνειν
<tr> <td><b>1SG</b><td>συνίστημι <td rowspan=2><b>PA-9</b>
<tr> <td><b>3SG</b><td>συνίστησι(ν)
<tr><th rowspan=4>ἀφίημι <td><b>1PL</b><td>ἀφίομεν <td rowspan=2><b>PA-1</b>
<tr> <td><b>3PL</b><td>ἀφίουσι(ν)
<tr> <td><b>2SG</b><td>ἀφεῖς <td><b>PA-2</b>
<tr> <td colspan=2><i>all other forms</i> <td><b>PA-7</b>
<tr><th rowspan=4>συνίημι <td><b>INF</b><td>συνιέναι <td rowspan=2><b>PA-7</b>
<tr> <td><b>2PL</b><td>συνίετε
<tr> <td rowspan=2><b>3PL</b><td>συνίουσι(ν) <td><b>PA-1</b>
<tr> <td>συνιᾶσι(ν) <td><b>PA-9</b>
</table>
<p>In each case we have an originally athematic verb occasionally acting like it&rsquo;s thematic (and, in the case of ὀμνύω, even the lemma is written as if it were thematic). We WILL have more to say about this in a few posts but we&rsquo;ve now done enough that we can count how many times each inflectional class appears in the SBLGNT and how many different lexemes follow each inflectional class. We&rsquo;ll do that in the very next post.</p>
<p>There is still another thing worth checking: is the value of X in our paradigm patterns consistent across a lexeme too? Yes it is, accent aside, if you only compare within the same inflectional class. The X for the δείκνυμι cells in <strong>PA-6a</strong> is always δείκν, for example, but the <strong>PA-1</strong> cases have X = δεικνύ.</p>
<p><strong>UPDATE</strong>: I just discovered a mis-disambiguated παριστάνετε that needs to be special-cased as a <strong>PA-1</strong>.</p>
<h2><a href="http://jktauber.com/2017/09/25/pyuca-12-released-support-new-versions-unicode/">pyuca 1.2 Released with Support for New Versions of Unicode</a></h2>
<p>2017-09-25 · James Tauber</p>
<p>pyuca is my pure-Python implementation of the Unicode Collation Algorithm—a library I use almost every day to properly sort Greek (although the library is not Greek-specific). I was recently asked how to use pyuca with a more recent DUCET than 6.3.0. That led me to make a number of changes to the core code, so it now supports 8.0.0, 9.0.0, and 10.0.0, as long as you have the right Python version.</p>
<p>pyuca has always supported custom collation element tables, but when someone tried the DUCET from Unicode 8.0.0, the test suite failed.</p>
<p>At first I thought perhaps that was because the test suite is from 6.3.0 (or 5.2.0 if running Python 2.7) but when I got around to trying the 8.0.0 test suite on the 8.0.0 DUCET it too failed.</p>
<p>It turned out that the Unicode Consortium had made a few changes to which code points are considered CJK Unified Ideographs. This is hard-coded in pyuca because it&rsquo;s required for implementing the implicit weight calculations (weights for certain CJK ideographs are calculated programmatically rather than explicitly listed in the DUCET).</p>
<p>In 9.0.0 the collation element table format was slightly changed to add a new @implicitweights directive, so for things to work with 9.0.0, I had to implement that. Then in 10.0.0, more changes were made to which code points are considered CJK Unified Ideographs.</p>
<p>It didn&rsquo;t stop there, though. Because pyuca relies on Python&rsquo;s <code>unicodedata</code> library for getting information on character categories, certain versions of Python won&rsquo;t work with certain versions of Unicode.</p>
<p>So I added some logic (both to pyuca itself, and to the test suite) to use the appropriate collation code (with the right implicit weight calculations) and appropriate DUCET depending on what version of Python you are running.</p>
<p>Some of this dispatching-based-on-Python-version had already been written by Chris Beaven, Paul McLanahan, and Michal Čihař as part of their backporting of pyuca to 2.7 (after I&rsquo;d declared I&rsquo;d only support Python 3). So I just extended this with the following results:</p>
<ul>
<li>Python 2.7: test and use 5.2.0</li>
<li>Python 3.3: test 5.2.0, 6.3.0 and use 6.3.0 by default</li>
<li>Python 3.4: test 5.2.0, 6.3.0 and use 6.3.0 by default</li>
<li>Python 3.5: test 5.2.0, 6.3.0, 8.0.0 and use 8.0.0 by default</li>
<li>Python 3.6: test 5.2.0, 6.3.0, 8.0.0, 9.0.0 and use 9.0.0 by default</li>
<li>Python 3.7-dev: test 5.2.0, 6.3.0, 8.0.0, 9.0.0, 10.0.0 (so we&rsquo;re ready)</li>
</ul>
<p>pyuca 1.2 has now been released and is available on PyPI. The repository is at <a href="https://github.com/jtauber/pyuca">https://github.com/jtauber/pyuca</a>.</p>
<p><strong><a href="http://jktauber.com/2015/10/30/core-vocabulary-new-testament-greek/">The Core Vocabulary of New Testament Greek</a></strong> (James Tauber, 2015-10-30)</p>
<p>In a 2008 paper, Wilfred Major constructs what he calls the 50% and 80% vocab lists for Classical Greek. That is, the lemmata that account for 50% and 80% respectively of tokens in the Classical Greek corpus. In this post I provide the code for the equivalent for the Greek New Testament and talk about some of the results.</p>
<p>Major&rsquo;s paper is <a href="https://camws.org/cpl/cplonline/files/Majorcplonline.pdf">It’s Not the Size, It’s the Frequency: The Value of Using a Core Vocabulary in Beginning and Intermediate Greek</a> and as well as listing the 65 words in the &ldquo;50% List&rdquo; he lists the roughly 1,100 words in the &ldquo;80% List&rdquo; complete with glosses in both cases.</p>
<p>Major also discusses other issues near and dear to this blog such as the relevance of form frequency as well as lemma frequency. I&rsquo;ll respond to him on some of these topics in later blog posts.</p>
<p>Now, for many years I&rsquo;ve talked about the limitations of a purely frequency-based approach to vocab ordering but that doesn&rsquo;t mean producing such lists is useless, just that there are things we can do to improve on that approach. But I still thought it would be interesting to produce GNT 50% and 80% lists.</p>
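The computation itself is straightforward. Here is a sketch of the approach (not the actual gist; the function and variable names are made up): take lemmata in descending token frequency until the running coverage reaches the target proportion.

```python
from collections import Counter

def coverage_list(lemma_per_token, target=0.5):
    """Frequency-ranked lemmata covering `target` of the corpus tokens.

    `lemma_per_token` is a hypothetical list with one lemma per token
    in the corpus (so repeated lemmata carry the frequency information).
    """
    counts = Counter(lemma_per_token)
    total = sum(counts.values())
    covered, chosen = 0, []
    for lemma, n in counts.most_common():
        if covered >= target * total:
            break
        chosen.append(lemma)
        covered += n
    return chosen
```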
<p>The code is available <a href="https://gist.github.com/jtauber/d05bbe3ee9536bf59147">here</a>.</p>
<p>The 50% list consists of just 27 lemmata. The only verbs are γίνομαι, εἰμί, ἔχω, and λέγω. The only nouns are θεός, κύριος, and Ἰησοῦς.</p>
<p>The 80% list consists of 317 lemmata.</p>
<p>As expected, this is considerably smaller than Major&rsquo;s Classical Greek lists which are based on a considerably larger corpus.</p>
<p>It&rsquo;s easy to tweak the code to look at forms rather than lemmata. The 50% <em>forms</em> list for the GNT consists of 97 forms from 52 lemmata.</p>
<p>Interestingly, those 97 forms consist of 16 forms of the article, 15 forms of the (1st/2nd person) personal pronouns, and 6 forms of αὐτός. This suggests that even without arguments on morphological grounds, it&rsquo;s worth learning the full paradigms for the article, the personal pronouns and αὐτός really early on.</p>
<p>Unsurprisingly, λέγω gets a decent showing with 4 forms: εἶπεν, λέγει, λέγω and λέγων. I&rsquo;ve long thought it&rsquo;s worth learning those right away without needing to introduce full paradigms.</p>
<p>There&rsquo;s a lot more that could be explored even with this frequency-based approach. And lots more to say based on the other things Major talks about in his paper.</p>
<p>Finally, it should be stressed that very few full verses of the GNT would be readable with just the 80% list and probably none with the 50% list. I may do another post later on to confirm that.</p>
<p><strong>UPDATE</strong>: Now see <a href="http://jktauber.com/2015/11/16/actual-core-vocab-lists-greek-new-testament/">Actual Core Vocab Lists for Greek New Testament</a></p>
<p><strong><a href="http://jktauber.com/2017/08/16/tour-greek-morphology-part-12/">A Tour of Greek Morphology: Part 12</a></strong> (James Tauber, 2017-08-16)</p>
<p>Part twelve of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>There is one very important verb we haven&rsquo;t looked at the paradigm of yet: the copula.</p>
<p>For comparison, we&rsquo;ll put the present infinitive and indicative forms alongside the common endings of the <strong>μι</strong> verbs we saw in <a href="https://jktauber.com/2017/08/02/tour-greek-morphology-part-10/">part 10</a>.</p>
<table class="table">
<tr><th>INF <td>εἶναι <td>-ναι
<tr><th>1SG <td>εἰμί <td>-μι
<tr><th>2SG <td>εἶ <td>-ς
<tr><th>3SG <td>ἐστί(ν) <td>-σι(ν)
<tr><th>1PL <td>ἐσμέν <td>-μεν
<tr><th>2PL <td>ἐστέ <td>-τε
<tr><th>3PL <td>εἰσί(ν) <td>-ασι(ν)
</table>
<p>Notice:</p>
<ul>
<li>all but the <strong>INF</strong> and <strong>2SG</strong> are enclitic</li>
<li>in the <strong>INF</strong>, <strong>1SG</strong>, <strong>1PL</strong> and <strong>2PL</strong> we find the expected ending</li>
<li>the <strong>3SG</strong> and <strong>3PL</strong> are slightly different</li>
<li>the <strong>2SG</strong> is lacking the ending all together</li>
<li>with all the endings removed, we sometimes have ἐσ and sometimes εἰ</li>
</ul>
<p>Recall in <a href="https://jktauber.com/2017/07/23/tour-greek-morphology-part-9/">part 9</a> we said that &ldquo;it was not uncommon for Attic-Ionic to have σι for τι in other dialects&rdquo; (a type of lenition). Perhaps the <strong>3SG</strong> ending was originally τι(ν) and it just became σι(ν) in all the <strong>μι</strong> verbs except the copula.</p>
<p>And in <a href="https://jktauber.com/2017/08/03/tour-greek-morphology-part-11/">part 11</a> we questioned &ldquo;why the active <strong>2SG</strong> and <strong>3SG</strong> forms don’t end in σι and τι to mirror σαι and ται.&rdquo; Well, what if they originally did and some change masked this?</p>
<p>The <strong>3SG</strong> τι(ν) would be explained as an original τι with the occasional movable nu. The <strong>3SG</strong> σι(ν) would just come from τι(ν) via the tendency for τι to become σι in Attic-Ionic.</p>
<p>The <strong>2SG</strong> εἶ is perfectly explainable as coming from ἐσι with the intervocalic sigma dropping. In fact, we find ἐσσί in Homer, Pindar and other writings in older or more conservative dialects. If εἶ came from an older ἐσσί, that would not only suggest a -σι ending but a ἐσ stem. [<strong>EDIT</strong>: it&rsquo;s also possible, or even likely given the evidence of other Indo-European languages, that the first sigma was dropped much earlier in Proto-Indo-European and the instances of ἐσσί are actually a reintroduction of a double sigma by analogy with the <strong>3SG</strong>!]</p>
<p>Is it plausible that εἶναι came from ἐσ+ναι and εἰμί from ἐσ+μι? Absolutely! A sigma dropping and the preceding vowel lengthening would explain those forms. But why would we still find ἐσμέν rather than, say, εἰμέν? Well it turns out Homer and Herodotus <em>do</em> have εἰμέν. There is clearly tension between keeping the ἐσ and going to εἰ and different dialects went a different way even at the level of different cells in the paradigm.</p>
<p>In the <strong>3PL</strong>, we do find that Homer (as well as εἰσί) has ἔᾱσι, following the <strong>3PL</strong> ending of the other <strong>μι</strong> verbs, but much as the <strong>ω</strong> verb ending -ουσι comes from -οντι, we can explain εἰσί from ἐσ+ντι.</p>
<p>Further justification of earlier forms comes from comparison with other Indo-European languages but doing that would take us too far afield for this survey. For now, we&rsquo;ll just summarize what we have for this new paradigm.</p>
<p>We&rsquo;ll call this <strong>PA-10</strong> but because of the ἐσ/εἰ alternation, we can&rsquo;t really isolate distinguishers across the entire paradigm other than the full words themselves. </p>
<table class="table">
<tr><th>&nbsp; <th>PA-10 <td>&nbsp; <td>&nbsp;
<tr><th>INF <td>εἶναι <td>ἐσ+ναι <td><i>sigma-drop and compensatory lengthening</i>
<tr><th>1SG <td>εἰμί <td>ἐσ+μι <td><i>sigma-drop and compensatory lengthening</i>
<tr><th>2SG <td>εἶ <td>ἐσ+σι <td><i>sigma-drop (twice) and compensatory lengthening</i>
<tr><th>3SG <td>ἐστί(ν) <td>ἐσ+τι <td>
<tr><th>1PL <td>ἐσμέν <td>ἐσ+μεν <td>
<tr><th>2PL <td>ἐστέ <td>ἐσ+τε <td>
<tr><th>3PL <td>εἰσί(ν) <td>ἐσ+ντι <td><i>lenition of tau, sigma and nu drop with compensatory lengthening</i>
</table>
<p>As always, I stress this is a historical explanation, not an explanation of what was going on in the minds of native Greek speakers nor the best way to initially learn the forms of the copula.</p>
<p>The μι/σι/τι/ντι pattern is fascinating, though, with its parallel to the middle μαι/σαι/ται/νται.</p>
<p>There are still, of course, open questions, like the relationship between these endings and those of the <strong>ω</strong> verbs that differ (not least of which -μι vs -ω itself!) Or the fact that our other <strong>μι</strong> verbs seemed to use a different vowel in the singular than the plural and there&rsquo;s no sign of that in the copula. [<strong>EDIT</strong>: also as noted, ἐσσι as the original form is problematic; it was likely ἐσι in Proto-Greek.]</p>
<p>One earlier observation we can say a little bit more about now, though, is the alpha in the -ασι(ν) ending which previously seemed inexplicable. As we shall see later on, when a <strong>ν</strong> can&rsquo;t be pronounced in a particular context, it often became an <strong>α</strong> rather than just dropping out completely. Given we reconstruct an <strong>ν</strong> in the <strong>3PL</strong> ending, this <strong>ν</strong> becoming an <strong>α</strong> rather than dropping out entirely explains -ασι(ν) (with no compensatory lengthening). Because the <strong>μι</strong> verbs (unlike the <strong>ω</strong> verbs) have a <strong>3SG</strong> ending in σι(ν), keeping the <strong>α</strong> around was useful to discriminate between the singular and plural. In the case of the copula, though, the <strong>3SG</strong> retained the <strong>τ</strong> so there was less reason to keep the old <strong>ν</strong> (pronounced as <strong>α</strong>) around and it could just drop out entirely.</p>
<p>We&rsquo;ve now covered the major present infinitive and indicative paradigms. In the next few posts in this series we&rsquo;re going to step back a little and talk about the relationship between paradigms, the notion of lemmas and citation forms, some more about cell filling and class inference, and some statistics about the frequency of these different paradigms we&rsquo;ve looked at. Then we&rsquo;ll move beyond the present and look at a whole new set of paradigms!</p>
<p><strong><a href="http://jktauber.com/2017/09/05/tour-greek-morphology-part-15/">A Tour of Greek Morphology: Part 15</a></strong> (James Tauber, 2017-09-05)</p>
<p>Part fifteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the previous two posts in this series (<a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">part 13</a> and <a href="https://jktauber.com/2017/08/29/tour-greek-morphology-part-14/">part 14</a>) we summarized the paradigms we&rsquo;ve seen so far for the present infinitive and indicative both in the active and middle.</p>
<p>Do these paradigms cover all the forms in the Greek New Testament? Which paradigms are more common? Which are productive? We&rsquo;ll explore these questions in the next few posts.</p>
<p>Let&rsquo;s start with the active forms.</p>
<p>The first test is whether every present active infinitive and indicative verb in the MorphGNT SBLGNT matches with one of the patterns we&rsquo;ve discussed <em>given its morphosyntactic property set</em>. We want to test, for example, whether every verb tagged as <code>-PAN----</code> matches one of Xειν, Xεῖν, Xοῦν, Xᾶν, Xῆν, Xύναι, Xέναι, Xόναι, Xάναι, or εἶναι. Or whether every verb tagged as <code>2PAI-S--</code> matches one of Xεις, Xεῖς, Xοῖς, Xᾷς, Xῇς, Xυς, Xης, Xως, Xης, or εἶ.</p>
<p>Running a short Python script over the MorphGNT shows that there are 14 forms (69 instances in total) that do <em>not</em> match.</p>
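The test can be sketched like this (a hypothetical reduction of the real script, which runs over every row of the MorphGNT SBLGNT; only one tag's patterns are shown, and X is taken to be any non-empty theme):

```python
# X-distinguishers are checked as endings preceded by a non-empty theme;
# full-word "distinguishers" like εἶναι are checked as exact matches.
PATTERNS = {
    "-PAN----": (["εἶναι"],
                 ["ειν", "εῖν", "οῦν", "ᾶν", "ῆν",
                  "ύναι", "έναι", "όναι", "άναι"]),
}

def matches(tag, form):
    full_words, endings = PATTERNS[tag]
    if form in full_words:
        return True
    return any(len(form) > len(e) and form.endswith(e) for e in endings)
```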
<p>Three of these forms are φημί. The issue here is that φημί is enclitic in the indicative and so, even though it otherwise follows a <strong>PA-9</strong> paradigm, the accentuation doesn&rsquo;t match. If we want to capture the enclitic nature of φημί in its inflection class, we&rsquo;ll need to create a variant of <strong>PA-9</strong> that is enclitic.</p>
<table class="table">
<tr><th>&nbsp; <th>PA-9 <th>PA-9-ENCLITIC
<tr><th>INF <td>Xάναι <td><i>Xάναι</i>
<tr><th>1SG <td>Xημι <td>Xημί
<tr><th>2SG <td>Xης <td><i>Xής</i>
<tr><th>3SG <td>Xησι(ν) <td>Xησί(ν)
<tr><th>1PL <td>Xαμεν <td><i>Xαμέν</i>
<tr><th>2PL <td>Xατε <td><i>Xατέ</i>
<tr><th>3PL <td>Xᾶσι(ν) <td>Xασί(ν)
</table>
<p>The <strong>2SG</strong> appears more frequently as φῄς in Classical Greek but neither form appears in the SBLGNT so we&rsquo;ll put that issue aside for now.</p>
<p>Another eight of these forms are compounds of the copula and so have different accentuation and breathing (but are otherwise identical to <strong>PA-10</strong>).</p>
<table class="table">
<tr><th>&nbsp; <th>PA-10 <th>PA-10-COMPOUND
<tr><th>INF <td>εἶναι <td>Xεῖναι
<tr><th>1SG <td>εἰμί <td>Xειμι
<tr><th>2SG <td>εἶ <td>Xει
<tr><th>3SG <td>ἐστί(ν) <td>Xεστι(ν)
<tr><th>1PL <td>ἐσμέν <td>Xεσμεν
<tr><th>2PL <td>ἐστέ <td>Xεστε
<tr><th>3PL <td>εἰσί(ν) <td>Xεισι(ν)
</table>
<p>The only additional variation here is εἰσίασιν in Hebrews 9.6 but this is not, in fact, derived from εἰς + εἰμί but rather εἰς + εἶμι. Let&rsquo;s create a new paradigm for εἶμι, even though it doesn&rsquo;t appear in the SBLGNT, just so we can derive a paradigm for the compound case from it.</p>
<p>Here <strong>PA-11</strong> and <strong>PA-11-COMPOUND</strong> are shown alongside <strong>PA-10</strong> for comparison (note the italic forms don&rsquo;t appear in the SBLGNT):</p>
<table class="table">
<tr><th>&nbsp; <th>PA-10 <th>PA-11 <th>PA-11-COMPOUND
<tr><th>INF <td>εἶναι <td><i>ἰέναι</i> <td>Xιέναι
<tr><th>1SG <td>εἰμί <td><i>εἶμι</i> <td><i>Xειμι</i>
<tr><th>2SG <td>εἶ <td><i>εἶ</i> <td><i>Xει</i>
<tr><th>3SG <td>ἐστί(ν) <td><i>εἶσι(ν)</i> <td><i>Xεισι(ν)</i>
<tr><th>1PL <td>ἐσμέν <td><i>ἴμεν</i> <td><i>Xιμεν</i>
<tr><th>2PL <td>ἐστέ <td><i>ἴτε</i> <td><i>Xιτε</i>
<tr><th>3PL <td>εἰσί(ν) <td><i>ἴασι(ν)</i> <td>Xίασι(ν)
</table>
<p><strong>PA-11</strong> and <strong>PA-11-COMPOUND</strong> are very similar to <strong>PA-6a</strong> through <strong>PA-9</strong> except with ει/ι instead of υ/υ, η/ε, ω/ο, η/α. The <strong>INF</strong> being ιε is a little unexpected but outside the scope of the current discussion, as we really just want to capture the <strong>3PL</strong> of <strong>PA-11-COMPOUND</strong> for now.</p>
<p>Note that εἰσιέναι in Acts 3.3 is also from εἰς + εἶμι but this slipped us by because we have a Xέναι pattern already. Similarly, we have ἐξιέναι in Acts 20.7 and 27.43. With the addition of <strong>PA-11-COMPOUND</strong> we now have a slight ambiguity with <strong>PA-7</strong> (in the <strong>INF</strong>) and <strong>PA-10-COMPOUND</strong> (in the <strong>1SG</strong> and <strong>2SG</strong>). This isn&rsquo;t a problem at the moment but will come up again (as will other ambiguities) in the next post.</p>
<p>Adding these paradigm variants covers 12 of our originally non-matching forms. The remaining two are the impersonal χρή and ἔνι which represent fossilized phrases with the copula elided. For our stats we&rsquo;ll ignore them.</p>
<p>In the next post, we&rsquo;ll see if we can categorize the lexemes in the SBLGNT into inflection classes based on these paradigms and therefore be able to study how frequent they are from both a type and token perspective.</p>
<p><strong><a href="http://jktauber.com/2017/09/02/more-vocabulary-statistics/">More Vocabulary Statistics</a></strong> (James Tauber, 2017-09-02)</p>
<p>With a boost in numbers on <a href="http://vocab.oxlos.org">http://vocab.oxlos.org</a>, this post looks at some slightly more detailed statistics from the first activity.</p>
<p>Just 5 days ago there were <strong>82</strong> sign ups with <strong>52</strong> people having completed the first activity. Now there have been a total of <strong>116</strong> signups and <strong>79</strong> people have done at least the first activity (with <strong>44</strong> having done more than one). Thank you very much everyone!</p>
<p>In my <a href="https://jktauber.com/2017/08/29/some-initial-vocabulary-statistics/">last post</a> we looked at mean item difficulty (what proportion of people get an item correct) by frequency bucket.</p>
<p>We saw that the coarse frequency buckets had an okay correlation with item difficulty but not great. We&rsquo;ll explore that a little more in the near future but in this post I want to introduce another dimension: the ability of the person being asked the item.</p>
<p>I should note that in psychometrics (and in item response theory in particular, which we&rsquo;ll be getting to) the term &ldquo;ability&rdquo; is used in a specific sense of the measurement we&rsquo;re trying to take of the person (with no assumption of whether it&rsquo;s innate or even desirable). It&rsquo;s just the person-specific construct we&rsquo;re trying to measure.</p>
<p>As an initial proxy for this &ldquo;ability&rdquo; in the context of the first activity on the site, I&rsquo;ve used the total percentage of items in that activity answered correctly by a given person. This is just the raw percentage of items answered correctly, not quite the same as the estimate of NT vocabulary coverage shown on the site. This raw percentage is then used to group people into buckets (just in the context of the first activity for now).</p>
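As a sketch of that proxy (the function name, input shape, and five-way split are all illustrative):

```python
def ability_bucket(person_responses, n_buckets=5):
    """Bucket a person by raw proportion of items answered correctly.

    `person_responses` is a hypothetical list of 1/0 correctness values
    for one person on the first activity.
    """
    pct = sum(person_responses) / len(person_responses)
    # a perfect score falls in the top bucket, not one past it
    return min(int(pct * n_buckets), n_buckets - 1)
```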
<p>Now we can tabulate item frequency buckets vs person ability buckets with the following result:</p>
<div align="center">
<img src="https://jktauber.com/site_media/static/5_buckets.png" width="100%">
</div>
<p>First off, you can see we&rsquo;re still somewhat lacking in numbers of people of beginning-intermediate ability.</p>
<p>But importantly, you can see how mean item difficulty (the number in each cell) varies by ability bucket (the column). We&rsquo;ve already seen that mean item difficulty isn&rsquo;t a great predictor of item frequency bucket. Splitting out different abilities like we do above makes discrimination easier in some cases. But the important thing to note in the table above is that the mean item difficulty <em>within</em> a frequency bucket (row) is a good indicator of a person&rsquo;s overall ability bucket.</p>
<p>This is less the case in the bucket for the most frequent items (the row labeled <strong>1</strong>), which makes ability buckets 20% and above difficult to discriminate. Similarly, the less frequent item buckets aren&rsquo;t as good at discriminating between the lower ability buckets. This is what we would expect.</p>
<p>But overall, frequency buckets <strong>2</strong> through <strong>5</strong> (and especially <strong>3</strong> and <strong>4</strong>) do an excellent job of discriminating each of the ability buckets above 20%. <strong>5</strong> seems particularly well suited for each of the buckets at 40% ability and above and <strong>1</strong> only really between the 0–20% bucket and the rest.</p>
<p>I suspect it&rsquo;s going to be interesting to have more fine-grained item frequencies but even MORE interesting to put aside frequency all together and bucket them by overall difficulty. I&rsquo;ll do that in a subsequent post once I&rsquo;ve done the analysis. At some point I&rsquo;ll also look at individual items and their ability to discriminate ability.</p>
<p>For now, though, I did want to share a finer-grained bucketing of ability, with ten buckets instead of five:</p>
<div align="center">
<img src="https://jktauber.com/site_media/static/10_buckets.png" width="100%">
</div>
<p>The lack of people below the 50% ability mark makes this a little less useful and there are adjacent ability buckets that cease to be discriminating at this level of granularity.</p>
<p>But the important pattern is still there, assuming for now frequency is a proxy for difficulty: if an item is easy, it can&rsquo;t discriminate people of higher ability, although it may be great at discriminating those of lower ability; and if an item is hard, it can&rsquo;t discriminate people of lower ability, although it may be great at discriminating those of higher ability.</p>
<p><strong><a href="http://jktauber.com/2015/11/13/initial-thoughts-cost-learning-form/">Initial Thoughts on the Cost of Learning a Form</a></strong> (James Tauber, 2015-11-13)</p>
<p>Over the years, when generating vocab coverage stats or orderings for graded readers, I&rsquo;ve used either lemmas or inflected forms as the items being learnt.</p>
<p>The problem with using inflected forms is that it assumes knowing one form of a lexeme has nothing to do with knowing any other form of that lexeme. The problem with using lemmas is that it assumes knowing one form of a lexeme is enough to know all of them.</p>
<p>Of course, the path forward lies somewhere in between and one of the motivations for all my <em>Morphological Lexicon</em> work is to have the necessary data in machine-actionable form to take a much more intelligent approach to the relationship between knowing one form and knowing another.</p>
<p>This gets into some very deep areas of psycholinguistics and learnability but, for now, I&rsquo;m mostly just looking for a better measure of the &ldquo;cost&rdquo; or &ldquo;effort&rdquo; of learning a new form for the purposes of judging readability, etc. than just assuming all forms are equal or that learning a lemma gives you all the forms.</p>
<p>An initial improvement could be made by using <a href="http://jktauber.com/2015/11/03/distinguishers-morphology/">themes and distinguishers</a>. Consider λόγου, whose theme is λογ and distinguisher is ου. The theme identifies the lexeme (by definition it&rsquo;s the part of the word shared by all cells in a paradigm for a particular lexeme). The distinguisher both identifies some morphosyntactic properties (the fact it&rsquo;s a genitive singular, assuming we can tell it&rsquo;s a nominal) and gives some hints as to inflectional class (i.e. it reduces the possible distinguishers other cells in the paradigm can take).</p>
<p>So a simple way of modeling things is to say that, in order to understand λόγου, you need to know λογ and ου. Breaking apart the themes and distinguishers is an improvement over just looking at lexemes or forms. Using the theme takes care of suppletive stems too. (Although it does raise the question: does learning that two suppletive stems are the same lexeme cost effort or save it?)</p>
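A toy version of this costing makes the idea concrete (the weights and names here are entirely hypothetical): the first encounter with a theme or a distinguisher costs full effort, and anything already known is free.

```python
def learning_cost(forms, theme_cost=1.0, dist_cost=0.5):
    """Toy cost model over (theme, distinguisher) pairs, e.g. ("λογ", "ου").

    Each new theme and each new distinguisher is charged once; forms
    whose parts are already known cost nothing. The weights are made up.
    """
    known_themes, known_dists = set(), set()
    total = 0.0
    for theme, dist in forms:
        if theme not in known_themes:
            total += theme_cost
            known_themes.add(theme)
        if dist not in known_dists:
            total += dist_cost
            known_dists.add(dist)
    return total
```

The refinements discussed next (discounts for systematically related stems, for sandhi-derived distinguisher families, and so on) would all amount to charging less than the full cost in particular cases.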
<p>There are a few situations that need more consideration though. Firstly, stems that aren&rsquo;t truly suppletive but are systematically derived from one another (e.g. λαμβαν / λαβ). To a first approximation, you could just model this as full suppletion in terms of effort but a more refined approach would be to give a &ldquo;discount&rdquo; on the effort of learning λαμβαν if you already know λαβ or vice-versa. Even then, you&rsquo;d likely only want to provide that discount once learning the nu-infix pattern had been costed.</p>
<p>Secondly, consider families of distinguishers for the same properties that differ because of sandhi (either in that particular cell or in others, causing the theme to have less of the stem). For example here are the 28 distinguishers for dative singular nominals according to my current analysis: -ᾳ, -αντι, -ατι, -γι, -δι, -ει, -ειρι, -ενι, -εντι, -ῃ, -ι, -ιδι, -κι, -κτι, -νι, -ντι, -οϊ, -ονι, -οντι, -οτι, -ουντι, -πι, -ρι, -τι, -τῳ, -υϊ, -ῳ, -ωντι. The reason 28 are needed is because of sandhi in other cells such as the nominative singular. The only actual ending is -ι, so you really only need to know that one thing (plus perhaps that iota is subscripted after a long alpha, eta or omega). The distinguisher analysis is still useful (particularly for its role in hinting at inflectional class) but the cost should be massively discounted once you recognize the -ι pattern.</p>
<p>Thirdly, I haven&rsquo;t yet talked about costs and discounts for the actual sandhi rules. Should the -ους ending in the genitive singular (for stems ending in εσ or οσ) be discounted if you know both the genitive singular ending -ος and the εσ+ος → ους / οσ+ος → ους sandhi rules?</p>
<p>And finally, while I&rsquo;ve talked a couple of times here about the distinguisher hinting at the inflectional class, that information hasn&rsquo;t been incorporated into any costing or discounting in our discussions yet. It&rsquo;s worthy of a little more research into the psycholinguistics literature, but presumably seeing something like πίνακος primes you for recognizing πίναξ. It&rsquo;s also potentially useful for disambiguation: if you know the nominative plural ends in -ες, for example, then you know that -ος is a genitive singular not a nominative singular.</p>
<p>There&rsquo;s clearly lots more to explore but it reinforces what I keep saying: having data like the distinguisher analysis opens us up to explore this sort of thing and potentially incorporate it in new learning tools.</p>
<p>In this post, I&rsquo;ve just talked about morphology, but things can of course be extended (and <em>need</em> to be extended) to constructions beyond the word. That, of course, requires richer analysis beyond what I&rsquo;m doing with the <em>Morphological Lexicon</em> but that is something I eventually want to tackle as well.</p>
<p><strong><a href="http://jktauber.com/2017/08/29/some-initial-vocabulary-statistics/">Some Initial Vocabulary Statistics</a></strong> (James Tauber, 2017-08-29)</p>
<p>Here are some very preliminary statistics from the Greek Vocab site&rsquo;s first month.</p>
<p>So far <strong>82</strong> people have signed up to <a href="http://vocab.oxlos.org/">http://vocab.oxlos.org/</a> and <strong>52</strong> have completed at least the first activity, a common noun receptive vocabulary leveling test based on a test form developed (for English) by Paul Nation.</p>
<p>Recall from my <a href="https://jktauber.com/2017/07/29/new-site-vocabulary-experiments/">initial post</a> on the site, that vocabulary items in that activity are classified into one of five buckets based on how many times they occur in the Greek New Testament.</p>
<p>Here are the mean results (with standard error) for each bucket for the first activity (N=52):</p>
<table class="table">
<tr><th>bucket <th>occurrences <th>mean ± std err
<tr><td>1 <td>32 or more times <td>0.966 ± 0.008
<tr><td>2 <td>16 to 31 times <td>0.837 ± 0.028
<tr><td>3 <td>4 to 15 times <td>0.667 ± 0.041
<tr><td>4 <td>2 or 3 times <td>0.556 ± 0.049
<tr><td>5 <td>1 time <td>0.582 ± 0.047
</table>
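Numbers of this shape can be computed with a generic mean/standard-error helper like the following. To be clear, this is a sketch; exactly how the post's figures were aggregated (per-bucket means across people) isn't shown here and the details are my assumption.

```python
import math

def mean_and_stderr(scores):
    """Mean and standard error of the mean for a list of proportions.

    Uses the sample variance (n - 1 denominator); returns a zero error
    for a single observation. Aggregation details are assumptions.
    """
    n = len(scores)
    mean = sum(scores) / n
    if n < 2:
        return mean, 0.0
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)
```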
<p>The first four buckets get increasingly more difficult, as one would expect. But notice that buckets 4 and 5 are indistinguishable within the standard error of the two means.</p>
<p>Here are the results of the next three activities of the same type.</p>
<table class="table">
<tr><th>bucket <th>GNT Nouns 2 <th>GNT Nouns 3 <th>GNT Nouns 4
<tr><td>&nbsp; <td>N=30 <td>N=19 <td>N=15
<tr><td>1 <td>0.985 ± 0.004 <td>0.991 ± 0.005 <td>0.985 ± 0.007
<tr><td>2 <td>0.894 ± 0.020 <td>0.901 ± 0.021 <td>0.930 ± 0.018
<tr><td>3 <td>0.631 ± 0.046 <td>0.661 ± 0.039 <td>0.689 ± 0.051
<tr><td>4 <td>0.602 ± 0.060 <td>0.570 ± 0.067 <td>0.574 ± 0.059
<tr><td>5 <td>0.450 ± 0.048 <td>0.556 ± 0.064 <td>0.611 ± 0.050
</table>
<p><strong>GNT Nouns 2</strong> actually does successfully separate buckets 4 and 5 (apparently the hapax legomena in that test were harder) but it doesn&rsquo;t do a great job distinguishing buckets 3 and 4. <strong>GNT Nouns 3</strong> fails to distinguish buckets 4 and 5 and only barely separates 3 and 4. <strong>GNT Nouns 4</strong> likewise doesn&rsquo;t really distinguish buckets 4 and 5 and only barely separates 3 and 4.</p>
<p>It should be noted that the ability level of the average person doing an activity increases with each activity. This isn&rsquo;t clear from the data presented here but is from other data. This is likely because a person who has done reasonably well on one activity is more likely to continue to do more activities.</p>
<p>I <em>could</em> mitigate this problem by only including results for earlier activities from people who have completed all four. But before I do that, I&rsquo;d actually like to just see more people do all four activities.</p>
<p>Furthermore, the vast majority of people doing these activities are scoring above 50% and, in fact, no one scoring below 40% has attempted activities beyond the first. <strong>I NEED MORE BEGINNER-INTERMEDIATE LEVEL PEOPLE</strong> to do all four tests! They will better discriminate mid-to-hard difficulty items (more on that concept later).</p>
<p>But preliminary indications are that I haven&rsquo;t quite got the buckets right yet. Fortunately, I can re-run analyses with different bucketing even if the distribution of items chosen for the tests is based on the existing bucketing scheme.</p>
<p>I&rsquo;ll continue to blog more statistics over time. Some topics I&rsquo;d like to explore include inter-test reliability, G-theory, ANOVA, and IRT modeling.</p>
<p>Thank you to everyone who is contributing to this. Please spread the word!</p>
<p><a href="http://jktauber.com/2017/08/29/tour-greek-morphology-part-14/">A Tour of Greek Morphology: Part 14</a> (2017-08-29) by James Tauber</p>
<p>Part fourteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Now we summarize our middle distinguishers. As we did for <strong>PA-6a</strong>, we&rsquo;ll include the upsilon for <strong>PM-6a</strong>.</p>
<table class="table">
<tr><th>&nbsp; <th>PM-1 <th>PM-2 <th>PM-3 <th>PM-4 <th>PM-5 <th>PM-6a <th>PM-7 <th>PM-8 <th>PM-9
<tr><th>INF <td>Xεσθαι <td>Xεῖσθαι <td>Xοῦσθαι <td>Xᾶσθαι <td>Xῆσθαι <td>Xυσθαι <td>Xεσθαι <td>Xοσθαι <td>Xασθαι
<tr><th>1SG <td>Xομαι <td>Xοῦμαι <td>Xοῦμαι <td>Xῶμαι <td>Xῶμαι <td>Xυμαι <td>Xεμαι <td>Xομαι <td>Xαμαι
<tr><th>2SG <td>Xῃ or Xει <td>Xῇ or Xεῖ <td>Xοῖ <td>Xᾷ <td>Xῇ <td>Xυσαι <td>Xεσαι <td>Xοσαι <td>Xασαι
<tr><th>3SG <td>Xεται <td>Xεῖται <td>Xοῦται <td>Xᾶται <td>Xῆται <td>Xυται <td>Xεται <td>Xοται <td>Xαται
<tr><th>1PL <td>Xόμεθα <td>Xούμεθα <td>Xούμεθα <td>Xώμεθα <td>Xώμεθα <td>Xύμεθα <td>Xέμεθα <td>Xόμεθα <td>Xάμεθα
<tr><th>2PL <td>Xεσθε <td>Xεῖσθε <td>Xοῦσθε <td>Xᾶσθε <td>Xῆσθε <td>Xυσθε <td>Xεσθε <td>Xοσθε <td>Xασθε
<tr><th>3PL <td>Xονται <td>Xοῦνται <td>Xοῦνται <td>Xῶνται <td>Xῶνται <td>Xυνται <td>Xενται <td>Xονται <td>Xανται
</table>
<p>and if we capture the common elements in each row:</p>
<table class="table">
<tr><th>&nbsp; <th>PM-1 <th>PM-2 <th>PM-3 <th>PM-4 <th>PM-5 <th>PM-6a <th>PM-7 <th>PM-8 <th>PM-9
<tr><th>INF <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι <td>-σθαι
<tr><th>1SG <td>-μαι <td>-μαι <td>-μαι <td>-μαι <td>-μαι <td>-μαι <td>-μαι <td>-μαι <td>-μαι
<tr><th>2SG <td>-{ι} <td>-{ι} <td>-{ι} <td>-{ι} <td>-{ι} <td>-σαι <td>-σαι <td>-σαι <td>-σαι
<tr><th>3SG <td>-ται <td>-ται <td>-ται <td>-ται <td>-ται <td>-ται <td>-ται <td>-ται <td>-ται
<tr><th>1PL <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα <td>-μεθα
<tr><th>2PL <td>-σθε <td>-σθε <td>-σθε <td>-σθε <td>-σθε <td>-σθε <td>-σθε <td>-σθε <td>-σθε
<tr><th>3PL <td>-νται <td>-νται <td>-νται <td>-νται <td>-νται <td>-νται <td>-νται <td>-νται <td>-νται
</table>
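<p>Capturing the &ldquo;common elements in each row&rdquo; can be done mechanically as the longest common suffix of the row&rsquo;s distinguishers. A sketch (assuming precomposed Unicode and with the X placeholders stripped; <code>common_suffix</code> is my own helper, not code from any actual analysis):</p>

```python
def common_suffix(forms):
    """Longest common suffix of a list of strings."""
    suffix = forms[0]
    for form in forms[1:]:
        while not form.endswith(suffix):
            suffix = suffix[1:]  # drop leading characters until it matches
    return suffix

# a few rows of the PM distinguisher table above (X placeholders stripped)
rows = {
    "INF": ["εσθαι", "εῖσθαι", "οῦσθαι", "ᾶσθαι", "ῆσθαι", "υσθαι"],
    "1SG": ["ομαι", "οῦμαι", "ῶμαι", "υμαι"],
    "1PL": ["όμεθα", "ούμεθα", "ώμεθα", "ύμεθα"],
    "3PL": ["ονται", "οῦνται", "ῶνται", "υνται"],
}
for cell, forms in rows.items():
    print(cell, "-" + common_suffix(forms))  # e.g. INF -σθαι, 1SG -μαι
```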
<p>Notice that, unlike the active, there is no difference between the thematic and athematic endings, apart from the contraction in the <strong>2SG</strong> obscuring the historical σαι.</p>
<p>That does mean, however, that the <strong>INF</strong> is no longer completely predictive of the other forms; in fact, no cell is (the <strong>2SG</strong> comes close but fails because of the -ῇ ambiguity).</p>
<ul>
<li><strong>INF</strong>, <strong>3SG</strong>, and <strong>2PL</strong> can&rsquo;t distinguish within the set {<strong>PM-1</strong>, <strong>PM-7</strong>}</li>
<li><strong>1SG</strong>, <strong>1PL</strong>, and <strong>3PL</strong> can&rsquo;t distinguish within the set {<strong>PM-1</strong>, <strong>PM-8</strong>}, the set {<strong>PM-2</strong>, <strong>PM-3</strong>}, or the set {<strong>PM-4</strong>, <strong>PM-5</strong>}</li>
<li><strong>2SG</strong> (at least if ῇ) can&rsquo;t distinguish within the set {<strong>PM-2</strong>, <strong>PM-5</strong>}</li>
</ul>
<p>That means, even if you had the <strong>INF</strong>, <strong>3SG</strong>, AND <strong>2PL</strong> of a word, you might not be able to predict its other forms (but if you had a single one of those other forms, all the rest would be predictable). And if you had the <strong>1SG</strong>, <strong>1PL</strong>, and/or <strong>3PL</strong> of a word, you might not be able to predict its other forms (but again, if you had a single one of those other forms, all the rest would be predictable).</p>
<p>This mirrors the ambiguous categories we&rsquo;ve already seen.</p>
<table class="table table-bordered">
<tr><td><b>PM-</b>{<b>1</b>, <b>7</b>}<td>ε in <b>INF</b>, <b>3SG</b>, and <b>2PL</b>
<tr><td><b>PM-</b>{<b>1</b>, <b>8</b>}<td>ο in <b>1SG</b>, <b>1PL</b>, and <b>3PL</b>
<tr><td><b>PM-</b>{<b>2</b>, <b>3</b>}<td>οῦ in <b>1PL</b> and <b>3PL</b>
<tr><td><b>PM-</b>{<b>4</b>, <b>5</b>}<td>ῶ in <b>1PL</b> and <b>3PL</b>
</table>
<p>Plus:</p>
<table class="table table-bordered">
<tr><td><b>PM-</b>{<b>2</b>, <b>5</b>}<td>ῇ ending in <b>2SG</b>
</table>
<p>Also, without accentuation, <strong>PM-4</strong> and <strong>PM-9</strong> would be indistinguishable in <strong>INF</strong>, <strong>3SG</strong>, and <strong>2PL</strong>. And, similarly, <strong>PM-1</strong> and <strong>PM-2</strong> in <strong>2SG</strong>.</p>
<p>In the next part, we&rsquo;ll look at the MorphGNT to see whether the distinguishers here and in <a href="https://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">part 13</a> fully cover all present infinitive and indicative verbs in the SBLGNT. We&rsquo;ll also look at some frequency data. How (relatively) common are each of the paradigms we&rsquo;ve identified? Which seem to be productive and which not? We&rsquo;ll also briefly touch on words that change inflectional class (and hence paradigm) and what role ambiguous forms might play in this.</p>
<p><a href="http://jktauber.com/2017/08/27/greek-letter-frequencies/">Greek Letter Frequencies</a> (2017-08-27) by James Tauber</p>
<p>I recently saw a nice visualisation of English letter bigram frequencies and decided to replicate it with Greek New Testament data.</p>
<p>You can see the English original in <a href="http://allthingslinguistic.com/post/164611717478/datarep-letter-and-next-letter-frequencies-in">this post</a> on All Things Linguistic. That&rsquo;s not where I originally saw it, though. I think I saw a link on Twitter to a Reddit post.</p>
<p>I wrote a quick Python script to generate the same style of visualisation based on word types (not tokens) in the SBLGNT after stripping accents and folding to lowercase (but keeping the apostrophe used to mark elision). This is the result:</p>
<div align="center">
<img src="https://jktauber.com/site_media/static/greek-letter-frequencies.png" width="100%">
</div>
<p>The intensity of red in the left column indicates the relative frequency of that letter overall. Each row then indicates (via ordering and the intensity of blue) the relative frequencies of what letter follows that red letter. The superscript then indicates the single most likely letter to follow that sequence of two letters. So it shows all unigram frequencies, all bigram frequencies, and the most common trigram for each bigram.</p>
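<p>The counting step behind the visualisation can be sketched as follows. This is not the script that produced the image: the word list here is a made-up miniature and <code>letter_ngrams</code> is a hypothetical name.</p>

```python
from collections import Counter

def letter_ngrams(words):
    """Count unigram, bigram, and trigram frequencies over word types,
    and find the most likely letter to follow each bigram."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for word in words:
        unigrams.update(word)
        bigrams.update(word[i:i + 2] for i in range(len(word) - 1))
        trigrams.update(word[i:i + 3] for i in range(len(word) - 2))
    # for each bigram, the trigram continuing it with the highest count
    best = {}
    for trigram, count in trigrams.items():
        bigram = trigram[:2]
        if bigram not in best or count > trigrams[best[bigram]]:
            best[bigram] = trigram
    return unigrams, bigrams, {bg: tg[2] for bg, tg in best.items()}

# tiny illustrative list; the real script used all SBLGNT word types
uni, bi, nxt = letter_ngrams(["λογος", "λογου", "λογον", "θεος", "θεου"])
print(uni.most_common(2))
print(nxt["λο"])  # "γ": the superscript letter for the λο cell
```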
<p>I also used the same bigram and trigram data to generate pseudowords, much like the English original did. At the time, I only tweeted about this second part.</p>
<blockquote class="twitter-tweet" data-lang="en"><p lang="und" dir="ltr">Trigram-based generation of Greek-like words seems promising: ὀκρός θρωτοί δελθομοῦς ἐδωσῖνα ἐπιδάς εὑόν εἰπῆς ἐνησόφος πόδου δόξηλθον μετέ</p>&mdash; James Tauber (@jtauber) <a href="https://twitter.com/jtauber/status/894510737552486400">August 7, 2017</a></blockquote>
<p>Patrick Burns asked me for the pseudoword generation code so I extracted it, cleaned it up a bit, and posted it to a gist <a href="https://gist.github.com/jtauber/71c6ab6a7bfaf42cffe64d74b69e7a2a">here</a>.</p>
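<p>The idea can also be sketched independently of the gist. This is a simplified sketch, not the gist&rsquo;s code; the padding symbols and function names are my own choices.</p>

```python
import random
from collections import Counter

def train(words):
    """Collect trigram counts, padding each word with ^ (start) and $ (end)."""
    trigrams = Counter()
    for word in words:
        padded = "^^" + word + "$"
        trigrams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return trigrams

def generate(trigrams, rng, max_length=20):
    """Sample a pseudoword letter by letter, weighted by trigram counts."""
    word = "^^"
    while len(word) < max_length + 2:
        # continuations observed after the current two-letter context
        candidates = {t[2]: c for t, c in trigrams.items() if t[:2] == word[-2:]}
        letters = list(candidates)
        letter = rng.choices(letters, weights=[candidates[l] for l in letters])[0]
        if letter == "$":
            break
        word += letter
    return word[2:]

trigrams = train(["λογος", "λογου", "νομος", "νομου"])
print(generate(trigrams, random.Random(0)))
```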
<p>I never got around to posting my letter frequency visualisation, but Seumas Macdonald (not knowing I&rsquo;d already done the work) pointed me to the All Things Linguistic blog post and asked about the possibility of doing the same for Greek. It was enough of a nudge to get this blog post written.</p>
<p>Thanks Seumas and Patrick!</p>
<p><a href="http://jktauber.com/2017/08/26/tour-greek-morphology-part-13/">A Tour of Greek Morphology: Part 13</a> (2017-08-26) by James Tauber</p>
<p>Part thirteen of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Let&rsquo;s summarize all 10 active distinguisher paradigms we&rsquo;ve seen so far (this will probably only lay out properly if your browser is wide):</p>
<table class="table">
<tr><th>&nbsp; <th>PA-1 <th>PA-2 <th>PA-3 <th>PA-4 <th>PA-5 <th>PA-6 <th>PA-7 <th>PA-8 <th>PA-9 <th>PA-10
<tr><th>INF <td>Xειν <td>Xεῖν <td>Xοῦν <td>Xᾶν <td>Xῆν <td>Xναι <td>Xέναι <td>Xόναι <td>Xάναι <td>εἶναι
<tr><th>1SG <td>Xω <td>Xῶ <td>Xῶ <td>Xῶ <td>Xῶ <td>Xμι <td>Xημι <td>Xωμι <td>Xημι <td>εἰμί
<tr><th>2SG <td>Xεις <td>Xεῖς <td>Xοῖς <td>Xᾷς <td>Xῇς <td>Xς <td>Xης <td>Xως <td>Xης <td>εἶ
<tr><th>3SG <td>Xει <td>Xεῖ <td>Xοῖ <td>Xᾷ <td>Xῇ <td>Xσι(ν) <td>Xησι(ν) <td>Xωσι(ν) <td>Xησι(ν) <td>ἐστί(ν)
<tr><th>1PL <td>Xομεν <td>Xοῦμεν <td>Xοῦμεν <td>Xῶμεν <td>Xῶμεν <td>Xμεν <td>Xεμεν <td>Xομεν <td>Xαμεν <td>ἐσμέν
<tr><th>2PL <td>Xετε <td>Xεῖτε <td>Xοῦτε <td>Xᾶτε <td>Xῆτε <td>Xτε <td>Xετε <td>Xοτε <td>Xατε <td>ἐστέ
<tr><th>3PL <td>Xουσι(ν) <td>Xοῦσι(ν) <td>Xοῦσι(ν) <td>Xῶσι(ν) <td>Xῶσι(ν) <td>Xασι(ν) <td>Xέασι(ν) <td>Xόασι(ν) <td>Xᾶσι(ν) <td>εἰσί(ν)
</table>
<p>As we&rsquo;ve already noted, some cells have identical distinguishers (for example, the ῶ of <strong>PA-2</strong>, <strong>PA-3</strong>, <strong>PA-4</strong> and <strong>PA-5</strong>). More on that shortly.</p>
<p>But first note something about <strong>PA-6</strong>: it subsumes the next three paradigms and, in fact, in the case of <strong>2SG</strong> subsumes every paradigm except <strong>PA-10</strong>. In other words, a word form from another paradigm technically matches <strong>PA-6</strong> too. If you go back to <a href="https://jktauber.com/2017/08/02/tour-greek-morphology-part-10/">part 10</a>, you&rsquo;ll see that our exemplar for <strong>PA-6</strong> was δεικνύναι, δείκνυμι, and so on. The <em>only</em> reason <strong>PA-6</strong> doesn&rsquo;t have a vowel like <strong>PA-7</strong>, <strong>PA-8</strong>, and <strong>PA-9</strong> is that the vowel is always υ and hence it was dropped from the distinguisher analysis. But we have no reason at this stage not to suppose that the upsilon is an important part of the <strong>PA-6</strong> paradigm (it just doesn&rsquo;t distinguish cells <em>within</em> the paradigm). So I&rsquo;m going to tentatively put it back for the purposes of comparing <em>across</em> paradigms. I&rsquo;ll call this modified distinguisher paradigm <strong>PA-6a</strong>.</p>
<p>Repeating the paradigm of paradigms with this small modification:</p>
<table class="table">
<tr><th>&nbsp; <th>PA-1 <th>PA-2 <th>PA-3 <th>PA-4 <th>PA-5 <th>PA-6a <th>PA-7 <th>PA-8 <th>PA-9 <th>PA-10
<tr><th>INF <td>Xειν <td>Xεῖν <td>Xοῦν <td>Xᾶν <td>Xῆν <td>Xύναι <td>Xέναι <td>Xόναι <td>Xάναι <td>εἶναι
<tr><th>1SG <td>Xω <td>Xῶ <td>Xῶ <td>Xῶ <td>Xῶ <td>Xυμι <td>Xημι <td>Xωμι <td>Xημι <td>εἰμί
<tr><th>2SG <td>Xεις <td>Xεῖς <td>Xοῖς <td>Xᾷς <td>Xῇς <td>Xυς <td>Xης <td>Xως <td>Xης <td>εἶ
<tr><th>3SG <td>Xει <td>Xεῖ <td>Xοῖ <td>Xᾷ <td>Xῇ <td>Xυσι(ν) <td>Xησι(ν) <td>Xωσι(ν) <td>Xησι(ν) <td>ἐστί(ν)
<tr><th>1PL <td>Xομεν <td>Xοῦμεν <td>Xοῦμεν <td>Xῶμεν <td>Xῶμεν <td>Xυμεν <td>Xεμεν <td>Xομεν <td>Xαμεν <td>ἐσμέν
<tr><th>2PL <td>Xετε <td>Xεῖτε <td>Xοῦτε <td>Xᾶτε <td>Xῆτε <td>Xυτε <td>Xετε <td>Xοτε <td>Xατε <td>ἐστέ
<tr><th>3PL <td>Xουσι(ν) <td>Xοῦσι(ν) <td>Xοῦσι(ν) <td>Xῶσι(ν) <td>Xῶσι(ν) <td>Xύασι(ν) <td>Xέασι(ν) <td>Xόασι(ν) <td>Xᾶσι(ν) <td>εἰσί(ν)
</table>
<p>Now let&rsquo;s capture the common elements in the rows:</p>
<table class="table">
<tr><th>&nbsp; <th>PA-1 <th>PA-2 <th>PA-3 <th>PA-4 <th>PA-5 <th>PA-6a <th>PA-7 <th>PA-8 <th>PA-9 <th>PA-10
<tr><th>INF <td>-ν <td>-ν <td>-ν <td>-ν <td>-ν <td>-ναι <td>-ναι <td>-ναι <td>-ναι <td>-ναι
<tr><th>1SG <td>-ω <td>-ῶ <td>-ῶ <td>-ῶ <td>-ῶ <td>-μι <td>-μι <td>-μι <td>-μι <td>-μί
<tr><th>2SG <td>-{ι}ς <td>-{ι}ς <td>-{ι}ς <td>-{ι}ς <td>-{ι}ς <td>-ς <td>-ς <td>-ς <td>-ς <td>εἶ
<tr><th>3SG <td>-{ι} <td>-{ι} <td>-{ι} <td>-{ι} <td>-{ι} <td>-σι(ν) <td>-σι(ν) <td>-σι(ν) <td>-σι(ν) <td>ἐστί(ν)
<tr><th>1PL <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μεν <td>-μέν
<tr><th>2PL <td>-τε <td>-τε <td>-τε <td>-τε <td>-τε <td>-τε <td>-τε <td>-τε <td>-τε <td>-τέ
<tr><th>3PL <td>-σι(ν) <td>-σι(ν) <td>-σι(ν) <td>-σι(ν) <td>-σι(ν) <td>-ασι(ν) <td>-ασι(ν) <td>-ασι(ν) <td>-ᾶσι(ν) <td>-σί(ν)
</table>
<p>The <strong>INF</strong>, although coming in two variants, has the property that it gives us enough information to know <strong>every form of the word in the present indicative active</strong>.</p>
<p>No other slots in our paradigms do that.</p>
<ul>
<li>The <strong>1SG</strong> can&rsquo;t distinguish within the set {<strong>PA-2</strong>, <strong>PA-3</strong>, <strong>PA-4</strong>, <strong>PA-5</strong>} or within the set {<strong>PA-7</strong>, <strong>PA-9</strong>}</li>
<li>The <strong>2SG</strong> can&rsquo;t distinguish within the set {<strong>PA-7</strong>, <strong>PA-9</strong>}</li>
<li>The <strong>3SG</strong> can&rsquo;t distinguish within the set {<strong>PA-7</strong>, <strong>PA-9</strong>}</li>
<li>The <strong>1PL</strong> can&rsquo;t distinguish within the set {<strong>PA-2</strong>, <strong>PA-3</strong>}, the set {<strong>PA-1</strong>, <strong>PA-8</strong>}, or the set {<strong>PA-4</strong>, <strong>PA-5</strong>}</li>
<li>The <strong>2PL</strong> can&rsquo;t distinguish within the set {<strong>PA-1</strong>, <strong>PA-7</strong>}</li>
<li>The <strong>3PL</strong> can&rsquo;t distinguish within the set {<strong>PA-2</strong>, <strong>PA-3</strong>} or within the set {<strong>PA-4</strong>, <strong>PA-5</strong>}</li>
</ul>
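<p>These ambiguity sets can be computed mechanically by grouping paradigms whose distinguishers coincide in a given cell. A sketch over a subset of the table above (the unambiguous cells and <strong>PA-6a</strong>/<strong>PA-10</strong> are omitted for brevity, and <code>ambiguity_sets</code> is my own name):</p>

```python
from collections import defaultdict

# distinguishers from the table above, with the X placeholder omitted
paradigms = {
    "PA-1": {"1SG": "ω", "1PL": "ομεν", "2PL": "ετε"},
    "PA-2": {"1SG": "ῶ", "1PL": "οῦμεν", "2PL": "εῖτε"},
    "PA-3": {"1SG": "ῶ", "1PL": "οῦμεν", "2PL": "οῦτε"},
    "PA-4": {"1SG": "ῶ", "1PL": "ῶμεν", "2PL": "ᾶτε"},
    "PA-5": {"1SG": "ῶ", "1PL": "ῶμεν", "2PL": "ῆτε"},
    "PA-7": {"1SG": "ημι", "1PL": "εμεν", "2PL": "ετε"},
    "PA-8": {"1SG": "ωμι", "1PL": "ομεν", "2PL": "οτε"},
    "PA-9": {"1SG": "ημι", "1PL": "αμεν", "2PL": "ατε"},
}

def ambiguity_sets(paradigms, cell):
    """Group paradigm names whose distinguishers coincide in the given cell."""
    groups = defaultdict(list)
    for name, cells in paradigms.items():
        groups[cells[cell]].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]

print(ambiguity_sets(paradigms, "1PL"))
# [['PA-1', 'PA-8'], ['PA-2', 'PA-3'], ['PA-4', 'PA-5']]
```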
<p>Among other things, this is why the <strong>1SG</strong> isn&rsquo;t a great choice of <strong>lemma</strong> (or headword, or citation form) for a lexeme. It&rsquo;s the reason so many dictionaries and lemmatizations show the circumflex verbs uncontracted (e.g. ποιέω for ποιῶ) even though in many dialects, including the Koine, that&rsquo;s a nonsense word. Even then, most dictionaries don&rsquo;t distinguish <strong>PA-7</strong> from <strong>PA-9</strong> (τίθημι vs ἵστημι), although admittedly that&rsquo;s less important since those classes aren&rsquo;t productive.</p>
<p>In almost all respects, <strong>the present active infinitive is the perfect lemma for the present active forms of a verb</strong>. Some have argued against the infinitive as lemma because it doesn&rsquo;t form a clause by itself (although nor do verbs with obligatory complements). A close candidate is the <strong>3SG</strong>, the benefit of which is how common it is. Its main downside is just that it doesn&rsquo;t distinguish <strong>PA-7</strong> and <strong>PA-9</strong>. But one could hardly go wrong focusing on the <strong>INF</strong> and <strong>3SG</strong> as the forms to most associate with each present active verb.</p>
<p>It should be noted that even though the <strong>1SG</strong> is the worst <em>predictively</em>, it&rsquo;s completely <em>predictable</em> from any other form. Also, despite some ambiguity in the <strong>1PL</strong> and <strong>3PL</strong>, they can be predicted from one another. Similarly, all the singulars in the <strong>PA-7</strong> and <strong>PA-9</strong> words predict each other.</p>
<p>Another way of thinking about this is to group our paradigm classes by their shared properties:</p>
<table class="table table-bordered">
<tr><td colspan=3><b>PA-</b>{<b>1</b>, <b>2</b>, <b>3</b>, <b>4</b>, <b>5</b>}<td colspan=3><b>INF</b> ends in -ν, <b>1SG</b> in -ω/-ῶ<td>thematic or omega verbs
<tr><td>&nbsp;<td colspan=2><b>PA-</b>{<b>2</b>, <b>3</b>, <b>4</b>, <b>5</b>}<td>&nbsp;<td colspan=2>circumflex throughout endings<td>circumflex or contract verbs
<tr><td>&nbsp;<td>&nbsp;<td><b>PA-</b>{<b>2</b>, <b>3</b>}<td>&nbsp;<td>&nbsp;<td>οῦ in <b>1PL</b> and <b>3PL</b>
<tr><td>&nbsp;<td>&nbsp;<td><b>PA-</b>{<b>4</b>, <b>5</b>}<td>&nbsp;<td>&nbsp;<td>ῶ in <b>1PL</b> and <b>3PL</b>
<tr><td colspan=3><b>PA-</b>{<b>6a</b>, <b>7</b>, <b>8</b>, <b>9</b>, <b>10</b>}<td colspan=3><b>INF</b> ends in -ναι, <b>1SG</b> in -μι<td>athematic or μι verbs
<tr><td>&nbsp;<td colspan=2><b>PA-</b>{<b>6a</b>, <b>7</b>, <b>8</b>, <b>9</b>}<td>&nbsp;<td colspan=2><b>3SG</b> in -σι(ν), <b>3PL</b> in -ασι(ν)
<tr><td>&nbsp;<td>&nbsp;<td><b>PA-</b>{<b>7</b>, <b>9</b>}<td>&nbsp;<td>&nbsp;<td>η in singulars
</table>
<p>There are the other cross-cutting categories:</p>
<table class="table table-bordered">
<tr><td><b>PA-</b>{<b>1</b>, <b>8</b>}<td><b>1PL</b> ends with ομεν
<tr><td><b>PA-</b>{<b>1</b>, <b>7</b>}<td><b>2PL</b> ends with ετε
</table>
<p>If one ignores accentuation, one could conceivably also come up with cross-cutting categories such as <b>PA-</b>{<b>1</b>,<b>2</b>} which shares the ει in the <strong>INF</strong>, <strong>2SG</strong>, and <strong>3SG</strong>. Or <b>PA-</b>{<b>4</b>, <b>9</b>} which both have ατε in <strong>2PL</strong>. Or <b>PA-</b>{<b>1</b>, <b>2</b>, <b>3</b>} which all have ουσι(ν) in <strong>3PL</strong>. </p>
<p>Next we&rsquo;ll look at the middles.</p>
<p><a href="http://jktauber.com/2017/08/05/first-week-new-vocab-site/">First Week of New Vocab Site</a> (2017-08-05) by James Tauber</p>
<p>Last week I launched a site for Greek vocabulary. Here&rsquo;s how the first week has gone.</p>
<p>Over time <a href="http://vocab.oxlos.org/">http://vocab.oxlos.org/</a> will contain a variety of tools for learning and assessing Greek vocabulary. As mentioned in <a href="https://jktauber.com/2017/07/29/new-site-vocabulary-experiments/">my blog post</a> a week ago, I&rsquo;m starting with some experiments based on the work of Paul Nation.</p>
<p>I&rsquo;m delighted with the response so far and am very thankful to everyone who has participated. In the first week 58 people signed up, 37 people completed at least one full activity with 19 completing more than one and six people completing at least four activities.</p>
<p>Thanks to <a href="https://thepatrologist.com">Seumas Macdonald</a>, I expanded the initial New Testament vocabulary testing a couple of days ago to some Patristic vocabulary. I&rsquo;ll also be adding some classical Greek vocabulary soon.</p>
<p>As my previous post says, some of my initial research questions are:</p>
<ul>
<li>how reliable is a test like Nation&rsquo;s vocabulary level test at estimating one’s NT Greek vocabulary size?</li>
<li>how much is frequency a factor in how likely a student is to know a word?</li>
<li>what other factors contribute to likelihood a student knows a word?</li>
</ul>
<p>I do need to continue to gather data but so far the Nation-style test seems to be working well and individual frequency bands actually do seem very good indicators of overall vocabulary size. I&rsquo;ll publish results with analysis over time. I&rsquo;ll also continue to release new activities.</p>
<p>As well as expanding the vocabulary to broader corpora and other parts of speech besides nouns, I also want to explore the impact of English cognates and relatedness between lexemes due to derivation. I&rsquo;ll also be adding some additional activity types based on the work of other vocabulary acquisition researchers such as Schmitt and Meara. </p>
<p>Thanks again to everyone who has participated so far and please continue to do so (and share a link to the site with Greek students, particularly those at a less-advanced level).</p>
<p><a href="http://jktauber.com/2017/08/05/speaking-berlin/">Speaking in Berlin</a> (2017-08-05) by James Tauber</p>
<p>This afternoon I&rsquo;m heading off to Berlin for my first Society of Biblical Literature International Meeting, where I&rsquo;ll be speaking on adaptive reading environments for Biblical Greek.</p>
<p>I&rsquo;ve attended a number of SBL Annual Meetings in the US and spoken at two but this will be my first International Meeting. At the invitation of Professor Nicolai Winther-Nielsen, I&rsquo;ll be giving an update on the talk I gave at last year&rsquo;s Annual Meeting.</p>
<p>Here&rsquo;s my abstract:</p>
<blockquote>
<p><strong>The Route to Adaptive Learning of Greek</strong></p>
<p>One of the promises of machine-actionable linguistic data linked to biblical texts is the enablement of new types of language learning tools. At their simplest, such tools might involve adding the necessary scaffolding to enable students to read more text than they otherwise might by providing glosses for rarer words or help on idioms, irregular morphology, and unusual syntactic constructions. Such tools, however, are hardly novel and have long been manually produced in printed form. Equivalent electronic versions don&rsquo;t really take advantage of what&rsquo;s possible. In this paper I discuss an online reading environment for Ancient Greek, and the Greek New Testament in particular, that takes advantage of the availability of open, machine-actionable resources such as treebanks and morphological analyses for more automated and consistent generation of scaffolding but which goes a step further by being adaptive to an individual student&rsquo;s knowledge at a given point. Such knowledge need not be explicitly provided (although it can be: to align with a particular textbook, for example). It can also be built up implicitly from what the reader is requesting more information or help on: What words are they having trouble remembering the meaning of? What forms are they having trouble parsing? The model of student knowledge is then integrated with learning tools such as spaced-repetition flash cards and parsing drills with the results of these tools then feeding back into better adapting scaffolding for reading. The online reading environment will be open source and potentially applicable to a wide range of other language and texts provided the necessary linguistic data is available.</p>
</blockquote>
<p>Thank you to Professor Winther-Nielsen for inviting me.</p>
<p><a href="http://jktauber.com/2017/08/03/tour-greek-morphology-part-11/">A Tour of Greek Morphology: Part 11</a> (2017-08-03) by James Tauber</p>
<p>Part eleven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In <a href="https://jktauber.com/2017/08/02/tour-greek-morphology-part-10/">part 10</a>, we looked at some new active forms. Now it&rsquo;s time to look at the corresponding middle forms.</p>
<table class="table">
<tr><th>INF <td>δείκνυσθαι <td>τίθεσθαι <td>δίδοσθαι <td>ἵστασθαι
<tr><th>1SG <td>δείκνυμαι <td>τίθεμαι <td>δίδομαι <td>ἵσταμαι
<tr><th>2SG <td>δείκνυσαι <td>τίθεσαι <td>δίδοσαι <td>ἵστασαι
<tr><th>3SG <td>δείκνυται <td>τίθεται <td>δίδοται <td>ἵσταται
<tr><th>1PL <td>δεικνύμεθα <td>τιθέμεθα <td>διδόμεθα <td>ἱστάμεθα
<tr><th>2PL <td>δείκνυσθε <td>τίθεσθε <td>δίδοσθε <td>ἵστασθε
<tr><th>3PL <td>δείκνυνται <td>τίθενται <td>δίδονται <td>ἵστανται
</table>
<p>In the middle forms, there is no change in the vowel and so it doesn&rsquo;t need to be included in the distinguisher. In this sense, we really only have one distinguisher paradigm for all these forms in the middle.</p>
<p>However, if we were contrasting against the active forms as well, we could identify a <strong>PM-6</strong>, <strong>PM-7</strong>, <strong>PM-8</strong>, and <strong>PM-9</strong> paired up with <strong>PA-6</strong>, <strong>PA-7</strong>, <strong>PA-8</strong>, <strong>PA-9</strong>:</p>
<table class="table">
<tr><th>&nbsp; <th>PM-6 <th>PM-7 <th>PM-8 <th>PM-9
<tr><th>INF <td>Xσθαι <td>Xεσθαι <td>Xοσθαι <td>Xασθαι
<tr><th>1SG <td>Xμαι <td>Xεμαι <td>Xομαι <td>Xαμαι
<tr><th>2SG <td>Xσαι <td>Xεσαι <td>Xοσαι <td>Xασαι
<tr><th>3SG <td>Xται <td>Xεται <td>Xοται <td>Xαται
<tr><th>1PL <td>Xμεθα <td>Xέμεθα <td>Xόμεθα <td>Xάμεθα
<tr><th>2PL <td>Xσθε <td>Xεσθε <td>Xοσθε <td>Xασθε
<tr><th>3PL <td>Xνται <td>Xενται <td>Xονται <td>Xανται
</table>
<p>But the common endings for the <strong>μι</strong> verbs are very clear. Here they are alongside our previously reconstructed endings for the previous middle paradigms:</p>
<table class="table">
<tr><th>INF <td>-σθαι <td>ε σθαι
<tr><th>1SG <td>-μαι <td>ο μαι
<tr><th>2SG <td>-σαι <td>ε σαι > ῃ
<tr><th>3SG <td>-ται <td>ε ται
<tr><th>1PL <td>-μεθα <td>ο μεθα
<tr><th>2PL <td>-σθε <td>ε σθε
<tr><th>3PL <td>-νται <td>ο νται
</table>
<p>This not only provides clear support for the ε+σαι reconstruction of the ῃ <strong>MID 2SG</strong> form but also makes clear how the <strong>ω</strong> verbs (both barytone and circumflex) use the same endings as the <strong>μι</strong> verbs but with the ε/ο vowel (the so-called thematic vowel) attached to the stem before the ending. In the middle, this is the only difference (slightly obscured when ῃ is used in the <strong>2SG</strong>).</p>
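<p>That observation can be made concrete: given the shared endings, the thematic middle forms are just stem + thematic vowel + ending. A sketch (accents ignored, the <strong>2SG</strong> left uncontracted, and λυ used as an illustrative stem):</p>

```python
# shared middle endings (accents omitted)
ENDINGS = {
    "INF": "σθαι", "1SG": "μαι", "2SG": "σαι", "3SG": "ται",
    "1PL": "μεθα", "2PL": "σθε", "3PL": "νται",
}
O_CELLS = {"1SG", "1PL", "3PL"}  # cells taking the ο variant of the thematic vowel

def athematic_middle(stem):
    """μι-verb style: endings attach directly to the stem."""
    return {cell: stem + ending for cell, ending in ENDINGS.items()}

def thematic_middle(stem):
    """ω-verb style: ε/ο is inserted between stem and ending.
    Historically, the 2SG stem + ε + σαι then contracts to -ῃ."""
    return {cell: stem + ("ο" if cell in O_CELLS else "ε") + ending
            for cell, ending in ENDINGS.items()}

print(athematic_middle("δεικνυ")["3PL"])  # δεικνυνται
print(thematic_middle("λυ")["3PL"])       # λυονται
print(thematic_middle("λυ")["2SG"])       # λυεσαι (pre-contraction)
```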
<p>As mentioned in <a href="https://jktauber.com/2017/07/23/tour-greek-morphology-part-9/">part 9</a>, there are some tantalising patterns here: the αι in 5 out of 7 cells; the μ/σ/τ in the 1st/2nd/3rd person.</p>
<p>The appearance of μι in the <strong>ACT 1SG</strong> is particularly interesting because we now have a μι/μαι contrast in the <strong>1SG</strong> between active and middle which exactly mirrors the οντι/ονται contrast in the <strong>3PL</strong>.</p>
<p>One might well question why the active <strong>2SG</strong> and <strong>3SG</strong> forms don&rsquo;t end in σι and τι to mirror σαι and ται. Or why the active infinitive isn&rsquo;t σθι. Or why the <strong>1PL</strong> and <strong>2PL</strong> have only a vague relationship between the active and middle. And we still have the question of where the alpha in the <strong>ACT 3PL</strong> ασι(ν) ending comes from. We&rsquo;ll touch on some of these questions in the next post and we <em>will</em> reveal some more historical and dialectal patterns.</p>
<p>But it is again worth reiterating that <strong>the primary role of a distinguisher is not to be decomposable but merely to discriminate meaning</strong>. That there are patterns between the distinguishers at all is not a fundamental requirement of the role they play in conveying information. There may be historical reasons for the patterns (as we&rsquo;ve already seen) and learnability pressures that favour them (or even conspire to introduce them over time) but we should not <em>expect</em> them and therefore view their absence as any kind of defect or irregularity.</p>
http://jktauber.com/2017/08/02/tour-greek-morphology-part-10/A Tour of Greek Morphology: Part 102017-08-02T18:04:35Z2017-08-02T18:04:35ZJames Tauber
<p>Part ten of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In previous posts we&rsquo;ve explored five distinct active and middle paradigms in the present indicative and infinitive.</p>
<p>There are still a number of inflectional classes in the present we haven&rsquo;t covered yet and we&rsquo;ll introduce a few more active forms in this post.</p>
<table class="table">
<tr>
<th>INF
<td><i>δεικνύναι</i> &dagger;
<td>τιθέναι
<td>διδόναι
<td>-ιστάναι &dagger;
<tr>
<th>1SG
<td>δείκνυμι
<td>τίθημι
<td>δίδωμι
<td>-ίστημι
<tr>
<th>2SG
<td><i>δείκνυς</i> &dagger;
<td><i>τίθης</i>
<td>-δίδως
<td><i>ἵστης</i> &dagger;
<tr>
<th>3SG
<td>δείκνυσι(ν)
<td>τίθησι(ν)
<td>δίδωσι(ν)
<td>-ίστησι(ν)
<tr>
<th>1PL
<td><i>δείκνυμεν</i>
<td>-τίθεμεν
<td><i>δίδομεν</i>
<td><i>ἵσταμεν</i> &dagger;
<tr>
<th>2PL
<td><i>δείκνυτε</i>
<td><i>τίθετε</i>
<td><i>δίδοτε</i>
<td><i>ἵστατε</i> &dagger;
<tr>
<th>3PL
<td><i>δεικνύασι(ν)</i>
<td>τιθέασι(ν)
<td>διδόασι(ν)
<td><i>ἱστᾶσι(ν)</i>
</table>
<p>In the above table, <i>italics</i> indicate that the form does not appear in the NT but the cell is filled from elsewhere; a preceding hyphen indicates that the NT only contains the form with a preverb; and &dagger; indicates that the NT has another form from one of the inflectional classes we&rsquo;ve already seen (more on that later).</p>
<p>It is worth noting that very few verbs follow these paradigms, but those that do are very common. In a future post, we&rsquo;ll look at the frequencies in more detail.</p>
<p>Let&rsquo;s start with the distinguishers (removing the common elements in each column):</p>
<table class="table">
<tr>
<th>&nbsp;<th>PA-6<th>PA-7<th>PA-8<th>PA-9
<tr>
<th>INF
<td>Xναι
<td>Xέναι
<td>Xόναι
<td>Xάναι
<tr>
<th>1SG
<td>Xμι
<td>Xημι
<td>Xωμι
<td>Xημι
<tr>
<th>2SG
<td>Xς
<td>Xης
<td>Xως
<td>Xης
<tr>
<th>3SG
<td>Xσι(ν)
<td>Xησι(ν)
<td>Xωσι(ν)
<td>Xησι(ν)
<tr>
<th>1PL
<td>Xμεν
<td>Xεμεν
<td>Xομεν
<td>Xαμεν
<tr>
<th>2PL
<td>Xτε
<td>Xετε
<td>Xοτε
<td>Xατε
<tr>
<th>3PL
<td>Xασι(ν)
<td>Xέασι(ν)
<td>Xόασι(ν)
<td>Xᾶσι(ν)
</table>
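<p>&ldquo;Removing the common elements in each column&rdquo; amounts to replacing the longest common prefix of a column&rsquo;s forms with X. A sketch (accents stripped from the input, since accent placement moves around within a column; <code>strip_common_prefix</code> is my own helper):</p>

```python
def strip_common_prefix(forms):
    """Replace the longest common prefix of the forms with 'X'."""
    prefix = forms[0]
    for form in forms[1:]:
        while not form.startswith(prefix):
            prefix = prefix[:-1]  # shrink until every form starts with it
    return ["X" + form[len(prefix):] for form in forms], prefix

# the τίθημι column from above, unaccented
column = ["τιθεναι", "τιθημι", "τιθης", "τιθησι(ν)",
          "τιθεμεν", "τιθετε", "τιθεασι(ν)"]
distinguishers, stem = strip_common_prefix(column)
print(stem)            # τιθ
print(distinguishers)  # ['Xεναι', 'Xημι', 'Xης', 'Xησι(ν)', 'Xεμεν', 'Xετε', 'Xεασι(ν)']
```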
<p>At this point, the relationship between <strong>PA-6</strong> and each of <strong>PA-7</strong>, <strong>PA-8</strong>, <strong>PA-9</strong> seems to mirror that between <strong>PA-1</strong> and each of <strong>PA-2</strong>, <strong>PA-3</strong>, <strong>PA-4</strong> respectively. This is especially evident in the infinitive and plurals, where the vowel series (-, ε, ο, α) relates the members of (<strong>PA-6</strong>, <strong>PA-7</strong>, <strong>PA-8</strong>, <strong>PA-9</strong>) to one another just as it relates those of (<strong>PA-1</strong>, <strong>PA-2</strong>, <strong>PA-3</strong>, <strong>PA-4</strong>).</p>
<p>If we isolate just the common endings (recurring horizontally) and place them alongside the endings we reconstructed in <a href="https://jktauber.com/2017/07/23/tour-greek-morphology-part-9/">part 9</a>, we get:</p>
<table class="table">
<tr>
<th>INF
<td>-ναι
<td>ε εν
<tr>
<th>1SG
<td>-μι
<td>ω -
<tr>
<th>2SG
<td>-ς
<td>ε ις
<tr>
<th>3SG
<td>-σι(ν)
<td>ε ι
<tr>
<th>1PL
<td>-μεν
<td>ο μεν
<tr>
<th>2PL
<td>-τε
<td>ε τε
<tr>
<th>3PL
<td>-ασι(ν)
<td>ο ντι > ουσι(ν)
</table>
<p>Notice that:</p>
<ul>
<li>thematic vowels seem to be entirely missing</li>
<li>the <strong>3PL</strong> has an alpha, though</li>
<li>some endings seem identical except for the lack of thematic vowel (<strong>1PL</strong> and <strong>2PL</strong>)</li>
<li>some are close (<strong>2SG</strong> and <strong>3PL</strong>)</li>
<li>some are not so close (<strong>INF</strong> and <strong>3SG</strong>)</li>
<li>but now the <strong>3SG</strong> and <strong>3PL</strong> are almost identical to <em>each other</em> in these new paradigms</li>
<li>the <strong>1SG</strong> seems completely unrelated</li>
</ul>
<p>Because of the lack of thematic vowels (seen most strikingly in the <strong>1PL</strong> and <strong>2PL</strong> forms), these types of verbs are often called <strong>athematic</strong> verbs. Because of the completely different ending μι in the <strong>1SG</strong>, they are also often called <strong>μι</strong> verbs. They <em>could</em> be called <strong>ναι</strong> verbs, but I&rsquo;m not aware of anyone who does that. Those three things are the most obvious contrasts, though.</p>
<p>When we look back at the full forms, we also notice:</p>
<ul>
<li>the vowel preceding the endings is different in the singular and the plural</li>
<li>ἱστᾶσι(ν) is accented in a way that suggests a contraction, probably from αα, which makes sense given the other plural forms</li>
<li>έα and όα haven&rsquo;t contracted in the <strong>3PL</strong> (and note that if they had, they would be identical to the <strong>3SG</strong> in <strong>PA-7</strong> and <strong>PA-8</strong>)</li>
</ul>
<p>It is as if the stems are τιθη, διδω, and ἱστη in the singular and τιθε, διδο, and ἱστα in the infinitive and plural. This is noteworthy for at least three reasons.</p>
<p>Firstly, it&rsquo;s the first time we&rsquo;ve seen a contrast that only indicates number and not person.</p>
<p>Secondly, it&rsquo;s not (just) a different ending indicating the number but a change in the vowel.</p>
<p>And thirdly, it&rsquo;s redundant as the ending alone still conveys number.</p>
<p>On the surface, it appears that δεικνυ keeps its vowel the same, although its length is not yet clear.</p>
<p>It is important to note that, unlike the circumflex verbs <strong>PA-2</strong> through <strong>PA-5</strong> which, as we have shown, all have the same endings (as each other and as <strong>PA-1</strong>), <strong>PA-6</strong> through <strong>PA-9</strong> have a new set of common endings distinct from those of <strong>PA-1</strong> through <strong>PA-5</strong> (with some overlap). The paradigms cannot be explained merely as stems interacting differently with the <em>same</em> endings.</p>
<p>We will pick up this point again soon, but first (in the next post), we&rsquo;ll look at the middle forms of our new verbs.</p>
http://jktauber.com/2017/07/29/nt-book-similarity-jaccard-distance-lemma-sets/
NT Book Similarity by Jaccard Distance of Lemma Sets
James Tauber (published 2017-07-29T20:11:55Z, updated 2017-07-30T17:51:28Z)
<p>I was thinking about vocabulary differences between books of the New Testament and decided to see what happens when you do a hierarchical clustering analysis of NT books using the Jaccard distance of their lemma sets.</p>
<div class="alert alert-danger">
<b>UPDATE</b>: I'm now convinced much (although not all) of this is due to length effects. If you think about it, the Jaccard distance between a large set and a small set is going to be large just by virtue of the large set having more in it than the small set. This will naturally group the non-letters together, the short letters together, Romans and the Corinthian letters together and so on. So until I come up with a way to correct Jaccard distance for text length, I'd take this post with a grain of salt.
</div>
<p>This is some old-school stylometry but the results are still pretty interesting. For each book, I calculated the set of lemmas and then, for each pair of books, calculated the Jaccard coefficient (the size of the intersection of the two sets divided by the size of their union).</p>
<p>I then did a cluster analysis using Ward&rsquo;s criterion and rendered the results as a dendrogram:</p>
<div align="center">
<img src="https://jktauber.com/site_media/static/ward_jaccard_lemma.png" width="100%">
</div>
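<p>The distance computation just described can be sketched in a few lines of Python. The lemma sets below are made-up miniatures for illustration only; the real input was the per-book lemma sets of the NT.</p>

```python
# Sketch of the pairwise distance step, with tiny hypothetical "books".
def jaccard_distance(s1, s2):
    """1 - |intersection| / |union|: 0.0 for identical sets, 1.0 for disjoint."""
    return 1 - len(s1 & s2) / len(s1 | s2)

books = {  # hypothetical lemma sets, for illustration only
    "A": {"λόγος", "θεός", "κόσμος"},
    "B": {"λόγος", "θεός", "ἀγάπη"},
    "C": {"νόμος", "πίστις", "ἀγάπη"},
}

names = sorted(books)
# one distance per unordered pair of books
distances = {
    (a, b): jaccard_distance(books[a], books[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
```

<p>A condensed list of these distances is what gets handed to SciPy (<code>scipy.cluster.hierarchy.linkage</code> with <code>method="ward"</code>, then <code>dendrogram</code>) to produce a tree like the one pictured.</p>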
<p>Notice that the first split is between the letters and non-letters.</p>
<p>Within the non-letters, John&rsquo;s Gospel and Revelation cluster together as do Acts and the Synoptics. The Synoptics cluster with each other more than they do with Acts. Matthew and Mark cluster together more than they do with Luke.</p>
<p>The highest division in the letters is between:</p>
<ul>
<li>the non-pastoral Pauline epistles plus Hebrews, James and 1 Peter</li>
<li>the pastorals plus the rest of the general epistles (2 Peter, the Johannine epistles and Jude)</li>
</ul>
<p>That first division of letters further clusters into:</p>
<ul>
<li>Galatians, Ephesians, Philippians, Colossians, 1 Thessalonians, 2 Thessalonians </li>
<li>Romans, 1 Corinthians, 2 Corinthians, Hebrews, James and 1 Peter</li>
</ul>
<p>Ephesians and Colossians cluster together, the two epistles to the Thessalonians cluster together, and Galatians and Philippians cluster together.</p>
<p>Romans, 1 Corinthians, and 2 Corinthians cluster (although 1 Corinthians clusters closer to Romans than to 2 Corinthians). James and 1 Peter cluster. Hebrews is in the same overall group but clusters closer to the Romans/Corinthian subgroup.</p>
<p>The second division of letters clusters into:</p>
<ul>
<li>Philemon, 2 John, 3 John</li>
<li>Titus, 1 Timothy, 2 Timothy</li>
<li>Jude, 1 John, 2 Peter</li>
</ul>
<p>with the second and third clusters slightly closer to each other than to the first.</p>
<p>2 John and 3 John cluster much closer to each other than to Philemon. The epistles to Timothy cluster slightly closer together than they do to Titus. 1 John and 2 Peter cluster slightly closer together than they do with Jude.</p>
<p>I haven&rsquo;t thought about length effects here but they may influence the clustering of very short books together (and possibly very long books). A lot of the clustering does follow similar lengths so it&rsquo;s definitely worth thinking more about.</p>
<p>Of course, there&rsquo;s nothing new about this kind of analysis. As I said at the start, it&rsquo;s old school—the sort of thing I can imagine being published in a &ldquo;humanities computing&rdquo; journal in the 80s. But it&rsquo;s still interesting. And it might be even more interesting to apply to finer-grained text divisions and/or with properties other than lemmas.</p>
http://jktauber.com/2017/07/29/new-site-vocabulary-experiments/
New Site for Vocabulary Experiments
James Tauber (published 2017-07-29T03:00:26Z, updated 2017-07-29T03:24:20Z)
<p>I&rsquo;ve put together a new little site to host various activities to research vocabulary knowledge and acquisition in the context of Ancient and Biblical Greek.</p>
<p>The new site is at:</p>
<blockquote>
<p><a href="http://vocab.oxlos.org/">http://vocab.oxlos.org/</a></p>
</blockquote>
<p>While eventually there will be a range of activity types and some spaced repetition practice, there is just a single activity type at the moment, based on work by vocabulary acquisition expert Paul Nation in the 1980s and 1990s.</p>
<p>It is a <strong>receptive</strong> vocabulary test, which means it focuses on whether you can understand a word when you come across it in text rather than whether you can produce the word in the right context. Each step of the activity asks you to select a word that best matches a given gloss, taken over a list of word-gloss pairs with a range of different frequencies.</p>
<p>Nation&rsquo;s original tests (for English as a Foreign Language learners) used word lists split into frequency bands like the top 2000, top 3000, top 5000, and so on.</p>
<p>I took the common nouns in the Greek New Testament and similarly broke them into frequency bands. Rather than have identically-sized buckets, I went by frequency cutoffs:</p>
<ul>
<li>bucket 1 : 32 or more times</li>
<li>bucket 2 : 16 to 31 times</li>
<li>bucket 3 : 4 to 15 times</li>
<li>bucket 4 : 2 or 3 times</li>
<li>bucket 5 : 1 time</li>
</ul>
<p>(Whether these are appropriate buckets will be assessed as part of this work.)</p>
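<p>The cutoffs above translate directly into code. A minimal sketch (the function name is mine, and the counts are assumed to come from an NT lemma-frequency table):</p>

```python
# Assign a noun to one of the five frequency buckets described above.
def bucket(freq):
    if freq >= 32:
        return 1   # bucket 1: 32 or more occurrences
    elif freq >= 16:
        return 2   # bucket 2: 16 to 31
    elif freq >= 4:
        return 3   # bucket 3: 4 to 15
    elif freq >= 2:
        return 4   # bucket 4: 2 or 3
    else:
        return 5   # bucket 5: 1 occurrence
```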
<p>From each bucket, 36 word-gloss pairs were randomly chosen (the glosses coming from Dodson&rsquo;s public domain glosses of NT lexemes). Of those 36, only 18 are tested, with the 18 untested words used as distractors. This follows Nation&rsquo;s approach.</p>
<p>So each activity of this type involves 90 items. I&rsquo;ve so far generated two activities but it&rsquo;s easy for me to generate more over time. I&rsquo;ll also expand the items to other parts of speech and a larger Greek corpus (including Classical). As long as I have frequency information and glosses, I can easily generate activities.</p>
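<p>The per-bucket selection just described might be sketched as follows (<code>sample_bucket_items</code> is a hypothetical name, not the actual implementation):</p>

```python
import random

def sample_bucket_items(pairs, rng=random):
    """From one bucket's word-gloss pairs, pick 36 at random:
    18 to be tested, the other 18 reserved as distractors."""
    chosen = rng.sample(pairs, 36)
    return chosen[:18], chosen[18:]
```

<p>Run over the five buckets, this yields the 90 items (5 &times; 18 tested) of a single activity.</p>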
<p>I also have some other types of activities I&rsquo;d like to implement, based on the research literature. I&rsquo;d like to roll out a new activity once every couple of weeks or so.</p>
<p>There are some fairly basic, fundamental questions that I&rsquo;ll be able to start to answer once I get more people trying the initial activities:</p>
<ul>
<li>how reliable is a test like this at estimating one&rsquo;s NT Greek vocabulary size?</li>
<li>how much is frequency a factor in how likely a student is to know a word?</li>
<li>what other factors contribute to likelihood a student knows a word?</li>
</ul>
<p>Future activities will be able to explore some of this in more detail such as the impact of English cognates or relatedness between lexemes due to derivation, etc.</p>
<p>Ultimately this is all input into producing better learning tools. It will feed directly into the adaptive online reading environment I&rsquo;m currently working on.</p>
<p>Thank you to everyone who has tried the activities so far and PLEASE continue to do more activities as I roll them out and help spread the word. The more people of varying ability I get doing these activities, the richer and more insightful the data will be.</p>
<p>I&rsquo;ll share those insights on this blog as things progress.</p>
http://jktauber.com/2017/07/23/tour-greek-morphology-part-9/
A Tour of Greek Morphology: Part 9
James Tauber (published 2017-07-23T05:42:29Z)
<p>Part nine of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In <a href="https://jktauber.com/2017/07/17/tour-greek-morphology-part-8/">part 8</a> we saw, amongst other things, that the present active infinitive has a spurious diphthong ει from ε+ε whereas the present active second and third person singulars have an ει that is a true ε+ι diphthong.</p>
<p>This somewhat justifies our observation of the ις and ι pattern in the second and third person singulars across all the present actives we&rsquo;ve seen so far.</p>
<p>If we show the &ldquo;inert&rdquo; part of the endings separated from the vowel that interacts with a preceding stem vowel to form the circumflex verbs, we get something like this:</p>
<table class="table">
<tr><th>&nbsp;<th>active<th>middle
<tr><th>INF<td>ε ε ν<td>ε σθαι
<tr><th>1SG<td>ω -<td>ο μαι
<tr><th>2SG<td>ε ις<td>η ι (sometimes ε ι)
<tr><th>3SG<td>ε ι<td>ε ται
<tr><th>1PL<td>ο μεν<td>ο μεθα
<tr><th>2PL<td>ε τε<td>ε σθε
<tr><th>3PL<td>ου σι(ν)<td>ο νται
</table>
<p>You can see the predominance of initial ε and ο with three exceptions:</p>
<ul>
<li>the ω of the <strong>ACT 1SG</strong></li>
<li>the ου of the <strong>ACT 3PL</strong></li>
<li>the η of the <strong>MID 2SG</strong></li>
</ul>
<p>We now know to ask the question: is ου in <strong>ACT 3PL</strong> a spurious diphthong (from ο+ο) or a true diphthong (from ο+υ)? If υ works the same way as ι in our contraction rules, it must be a spurious diphthong.</p>
<p>There&rsquo;s additional evidence for this:</p>
<ul>
<li>In the Western Greek dialects (like Doric) we find -οντι</li>
<li>It was not uncommon for Attic-Ionic to have σι for τι in other dialects (we&rsquo;ll encounter more examples later)</li>
<li>Dentals like ν drop out in Attic-Ionic when followed by σ and this generally causes the preceding vowel to lengthen (what is called <strong>compensatory lengthening</strong>)</li>
</ul>
<p>So it seems our ουσι(ν) was originally from the -οντι preserved in Doric.</p>
<p>This introduces interesting parallels with the -ονται in the middle.</p>
<p>What about the ῃ in the <strong>MID 2SG</strong>? We don&rsquo;t need to go to another dialect to see traces of what&rsquo;s going on. In the NT we have the <strong>PM-4</strong> circumflex verb:</p>
<table class="table">
<tr><th>INF<td>
<tr><th>1SG<td>καυχῶμαι
<tr><th>2SG<td><b>καυχᾶσαι</b>
<tr><th>3SG<td>
<tr><th>1PL<td>καυχώμεθα
<tr><th>2PL<td>καυχᾶσθε
<tr><th>3PL<td>καυχῶνται
</table>
<p>with <strong>ᾶσαι</strong> for ᾷ. The ᾶσαι can be explained as the stem vowel α interacting with the ending εσαι. The ᾷ can be explained simply through the σ dropping out (and similarly the ῃ in the <strong>PM-1</strong> and <strong>PM-2</strong> and so on) plus our contraction rules.</p>
<p>Interestingly, later Greek restored the uncontracted ending and we find it again in Modern Greek.</p>
<p>And so we have the reconstructed endings:</p>
<table class="table">
<tr><th>&nbsp;<th>active<th>middle
<tr><th>INF<td>ε εν<td>ε σθαι
<tr><th>1SG<td>ω -<td>ο μαι
<tr><th>2SG<td>ε ις<td>ε σαι &gt; ῃ
<tr><th>3SG<td>ε ι<td>ε ται
<tr><th>1PL<td>ο μεν<td>ο μεθα
<tr><th>2PL<td>ε τε<td>ε σθε
<tr><th>3PL<td>ο ντι &gt; ουσι(ν)<td>ο νται
</table>
<p>There are some tantalising patterns here, especially in the middle: the αι in 5 out of 7 cells; the μ/σ/τ in the 1st/2nd/3rd person.</p>
<p>As usual I want to emphasize that the reconstructed forms in this table help explain things historically but should not necessarily be taken as an indication of a process that went on synchronically in the minds of native speakers. I&rsquo;m not aware of any evidence that native speakers would have, for example, thought of ουσι as being an underlying οντι, or ῃ as being an underlying εσαι.</p>
<p>We haven&rsquo;t yet explained what&rsquo;s going on with the <strong>ACT 1SG</strong> nor why ει would have been an alternative for ῃ in the <strong>MID 2SG</strong>.</p>
<p>But other than the <strong>ACT 1SG</strong>, all the endings start with either an ε or an ο. We&rsquo;ll talk more about this later (including why this vowel is called the <strong>thematic vowel</strong>) but note that which of the two vowels is used is completely predictable from what follows.</p>
<p>If the following segment is nasal (μ or ν), the vowel is ο. If the following segment is ε, ι, σ, or τ, the vowel is ε. Most descriptions consider the ε the default and the nasal context leading to ο being the exception. But we could also look for features that ε, ι, σ, and τ have that μ and ν don&rsquo;t (other than just being NON-nasal).</p>
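<p>Stated as code, the rule for the thematic vowel is a one-liner (the function name is mine):</p>

```python
# ο before a nasal (μ, ν); otherwise (before ε, ι, σ, τ) the default ε.
def thematic_vowel(following_segment):
    return "ο" if following_segment in {"μ", "ν"} else "ε"
```

<p>So <code>thematic_vowel("μ")</code> gives the ο of ο μεν and ο μεθα, while <code>thematic_vowel("τ")</code> gives the ε of ε τε and ε ται.</p>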
http://jktauber.com/2017/07/17/tour-greek-morphology-part-8/
A Tour of Greek Morphology: Part 8
James Tauber (published 2017-07-17T04:46:00Z, updated 2017-07-17T14:47:57Z)
<p>Part eight of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>So far, just for the active, we&rsquo;ve suggested the following contraction rules.</p>
<ul>
<li>έει &gt; εῖ </li>
<li>έω &gt; ῶ</li>
<li>έε &gt; εῖ</li>
<li>έο &gt; οῦ</li>
<li>έου &gt; οῦ</li>
<li>άω &gt; ῶ</li>
<li>άε &gt; ᾶ</li>
<li>άει &gt; ᾷ (in the indicative) and ᾶ (in the infinitive)</li>
<li>άο &gt; ῶ</li>
<li>άου &gt; ῶ</li>
<li>όω &gt; ῶ</li>
<li>όε &gt; οῦ</li>
<li>όει &gt; οῖ (in the indicative) and οῦ (in the infinitive)</li>
<li>όο &gt; οῦ</li>
<li>όου &gt; οῦ</li>
<li>ήω &gt; ῶ</li>
<li>ήε &gt; ῆ</li>
<li>ήει &gt; ῇ (in the indicative) and ῆ (in the infinitive)</li>
<li>ήο &gt; ῶ</li>
<li>ήου &gt; ῶ</li>
</ul>
<p>In this post I want to explain why these aren&rsquo;t just an arbitrary set of sound changes and that they are really quite systematic. We&rsquo;ll say a little bit about Greek orthography and build a model using some simple phonological features that explains the core contraction rules quite compactly.</p>
<p>Before I do that, though, I want to emphasize again that I&rsquo;m not suggesting these &ldquo;rules&rdquo; need to be learned by the language learner. They are historical explanations for the spelling of circumflex verb endings in certain dialects and I&rsquo;m discussing them to give people a flavour for linguistic description. <strong>The best way to learn the circumflex verbs is to produce and read them in context.</strong> It really doesn&rsquo;t take long to just intuitively know that ἀγαπᾷς is a second person singular or that ἀγαπᾶν is an infinitive. You don&rsquo;t need to know the contraction rules or how to model them with phonological features.</p>
<p>But if you&rsquo;re interested in WHY the forms are ἀγαπᾷς and ἀγαπᾶν (including why one has an iota subscript and the other doesn&rsquo;t) keep reading!</p>
<h2 id="orthography">Orthography</h2>
<p>You&rsquo;ve probably been told that ε and ο are always short vowels. As far as the LETTERS themselves go, in our standard Greek orthography, that is true. But a long ε and a long ο existed as sounds in Classical Greek and earlier. Different dialects wrote these differently. Some just wrote Ε and Ο regardless of whether they were long or short. This is similar to Α, Ι, or Υ, which could be used for both the short and long variants. The Ionians, however, used the digraphs ΕΙ and ΟΥ for the long-Ε and long-Ο respectively. At the time, this was NOT the same sound as the diphthongs ΕΙ and ΟΥ, despite being written the same. It is likely that the long ε and long ο were pronounced with the tongue a little higher up (hence closer to the way ι and υ were pronounced) to reduce any confusion with η and ω, which were pronounced with a lower tongue, closer to α. The digraphs ΕΙ and ΟΥ, when used for the long ε and long ο, are sometimes called &ldquo;spurious diphthongs&rdquo; because they weren&rsquo;t actually diphthongs at all; they were long monophthongs.</p>
<p>The Greeks started to standardize on the Ionian spelling and, in 403 BC, Athens officially adopted the Ionian spelling.</p>
<p>This purely orthographic convention explains why εε &gt; ει and οο &gt; ου. That doesn&rsquo;t mean ALL occurrences of ει are long ε or all occurrences of ου are long ο. ει and ου CAN be true diphthongs, but when they come from ε+ε or ο+ο respectively, they are just long monophthongs.</p>
<p>Now, as already mentioned, both short and long α were just written as α and so αα &gt; α is a similarly straightforward contraction (the result being a long α). If you have a circumflex or an iota subscript, the α must have been long.</p>
<h2 id="basic-contractions">Basic Contractions</h2>
<p>So the diagonals of this contraction table make sense:</p>
<table class="table">
<tr><th>&nbsp;<th>ε<th>ο<th>α
<tr><th>ε<td class="success">ει<td>ου<td>η
<tr><th>ο<td>ου<td class="success">ου<td>ω
<tr><th>α<td>α<td>ω<td class="success">α
</table>
<p>Now ε+ο and ο+ε both result in a long ο (written ου). The order doesn&rsquo;t matter. The ο wins out over the ε and the ε assimilates to ο, resulting in the equivalent of ο+ο.</p>
<p>Both α+ο and ο+α result in ω and again order doesn&rsquo;t matter. At the time of the spelling standardization, ω was effectively in between α and ο so this makes sense.</p>
<p>Note, however, that α+ε and ε+α don&rsquo;t behave the same way in our table above. α+ε results in α but ε+α results in η. We might expect both to be η given how α+ο and ο+α behaved. It seems that order matters in some cases but not others.</p>
<h2 id="phonological-features">Phonological Features</h2>
<p>One way we can model all this is by assigning each of the vowels binary features of <strong>low</strong>, <strong>back</strong>, and <strong>round</strong> and making generalisations about those categories.</p>
<div align="center">
<img src="https://jktauber.com/site_media/static/mid-low-vowels.png">
</div>
<p>In other words:</p>
<table class="table">
<tr><th>&nbsp;<th>low<th>back<th>round
<tr><th>ε<td><big>-</big><td><big>-</big><td><big>-</big>
<tr><th>ο<td><big>-</big><td><big>+</big><td><big>+</big>
<tr><th>η<td><big>+</big><td><big>-</big><td><big>-</big>
<tr><th>ω<td><big>+</big><td><big>+</big><td><big>+</big>
<tr><th>α<td><big>+</big><td><big>+</big><td><big>-</big>
</table>
<p>Note that not all combinations are possible and <strong>+round</strong> implies <strong>+back</strong>.</p>
<p>(We haven&rsquo;t included ι or υ here as they don&rsquo;t play a part in this analysis.)</p>
<p>Now all the ε, ο, α contractions can be explained in terms of assimilation of <strong>+low</strong> and <strong>+round</strong> and <em>partial</em> assimilation of <strong>+back</strong>, as follows:</p>
<ul>
<li>the output is <strong>+low</strong> if <em>either</em> input vowel is <strong>+low</strong></li>
<li>the output is <strong>+round</strong> if <em>either</em> input vowel is <strong>+round</strong></li>
<li>the output is <strong>+back</strong> if the <em>first</em> input vowel is <strong>+back</strong></li>
<li>the output is <strong>+back</strong> if it is <strong>+round</strong></li>
</ul>
<p>The rules also explain why any vowel + ω goes to ω. In fact, if you work them through, these simple rules explain all 23 contractions in our list at the top of the post (and more that haven&rsquo;t come into play yet) with just one additional rule:</p>
<ul>
<li>if you have more than two vowels, the contraction is left associative</li>
</ul>
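<p>As a sanity check, the feature rules can be implemented directly. The sketch below is mine, not from the sources cited; it ignores accents and the ι of true diphthongs (which simply carries over) and spells a long ε or long ο result with the spurious diphthongs ει and ου:</p>

```python
# Each vowel quality as a (low, back, round) triple of booleans.
FEATURES = {
    "ε": (False, False, False),
    "ο": (False, True, True),
    "η": (True, False, False),
    "ω": (True, True, True),
    "α": (True, True, False),
}
LETTER = {features: letter for letter, features in FEATURES.items()}
LONG_SPELLING = {"ε": "ει", "ο": "ου"}  # Ionian spelling of long ε and ο

def contract(v1, v2):
    """Contract two vowel qualities using the four rules above."""
    low1, back1, round1 = FEATURES[v1]
    low2, back2, round2 = FEATURES[v2]
    low = low1 or low2      # +low if either input is +low
    rnd = round1 or round2  # +round if either input is +round
    back = back1 or rnd     # +back if the first input is +back, or if +round
    letter = LETTER[(low, back, rnd)]
    return LONG_SPELLING.get(letter, letter)  # the result is always long

def contract_all(vowels):
    """Left-associative contraction for more than two vowels."""
    result = vowels[0]
    for v in vowels[1:]:
        # a spurious diphthong contracts as its underlying short quality
        result = contract({"ει": "ε", "ου": "ο"}.get(result, result), v)
    return result
```

<p>For example, <code>contract("ε", "ο")</code> and <code>contract("ο", "ε")</code> both give ου, <code>contract("α", "ε")</code> gives α while <code>contract("ε", "α")</code> gives η, and <code>contract_all(["ο", "ε", "ε"])</code> gives ου.</p>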
<p>There are likely other solutions with other features and rules but my analysis roughly follows that of Sommerstein in <em>The Sound Pattern of Ancient Greek</em>, that of Bubeník in <em>The Phonological Interpretation of Ancient Greek: A Pandialectal Analysis</em> (which also considers differences in things like the Doric dialect), and apparently that of Lejeune in <em>Phonétique historique du mycénien et du grec ancien</em> (on which Bubeník&rsquo;s is based). This style of analysis is typical of the early second half of the twentieth century so I&rsquo;m not claiming it&rsquo;s in any way state-of-the-art. But it demonstrates that the contraction rules are very systematic.</p>
<h2 id="the-difference-in-the-infinitive-vs-indicative">The Difference in the Infinitive vs Indicative</h2>
<p>There is one final thing we haven&rsquo;t explicitly addressed but which is fully explained by these simple rules on features: why is άει sometimes ᾷ and sometimes ᾶ (and likewise why is όει sometimes οῖ and sometimes οῦ)?</p>
<p>The answer is simply that if the ει is a spurious diphthong (i.e. actually just a long εε) then our simple rules will result in long ᾶ but if it&rsquo;s a true diphthong, the result is long α + ι which is written ᾷ. Similarly in the case of όει, a spurious diphthong will result in οῦ (from οεε &gt; οοε &gt; οο &gt; ου) but a true diphthong in οῖ (οει &gt; οοι &gt; οι).</p>
<p>What this tells us is that the ει in the ειν ending in the infinitive is a spurious diphthong but the ει in εις and ει in the second and third person singular actives are true diphthongs.</p>
http://jktauber.com/2017/07/16/man-walks-bar/
A Man Walks Into A Bar
James Tauber (published 2017-07-16T20:15:24Z, updated 2017-07-16T20:18:30Z)
<p>I&rsquo;ve thought for a while that &ldquo;A man walks into a bar&rdquo; jokes are a great example of how definiteness works in English. I mentioned this to Jonathan Robie in Cambridge and he seemed to like the example too so I thought I&rsquo;d share it more broadly.</p>
<p>Consider the standard joke form:</p>
<blockquote>
<p>A man walks into a bar. The bartender says X. The man says Y.</p>
</blockquote>
<p>Notice this has two indefinite articles and two definite articles. When do we use the indefinite article and when do we use the definite article?</p>
<p>In our sentence above, we&rsquo;ve neither been introduced to the man nor the bar before. And so we use the indefinite article.</p>
<p>We can&rsquo;t say &ldquo;* <strong>The</strong> man walks into a bar&rdquo; unless he&rsquo;s been introduced before. Likewise we can&rsquo;t say &ldquo;* <strong>the</strong> bar&rdquo; unless the bar&rsquo;s been introduced before. For example,</p>
<blockquote>
<p>Chris is one crazy guy! The man walks into a bar&hellip;</p>
</blockquote>
<p>is fine if we take the man to be Chris. Similarly,</p>
<blockquote>
<p>You know that bar on 52nd Street? A man walks into the bar&hellip;</p>
</blockquote>
<p>works if the bar in the joke is the one on 52nd Street.</p>
<p>If we were telling a second joke, we could use <strong>the</strong> to indicate the man (or the bar) was the same but notice we&rsquo;d have to use something like <strong>another</strong> and NOT <strong>a</strong> for introducing a second bar (or man):</p>
<blockquote>
<p>Later, the man walks into another bar&hellip;</p>
</blockquote>
<p>or</p>
<blockquote>
<p>Later, another man walks into the bar&hellip;</p>
</blockquote>
<p>Notice in our original joke, the third sentence starts &ldquo;<strong>The</strong> man&rdquo;. This makes sense because that man has already been introduced. We wouldn&rsquo;t say &ldquo;* A man walks into a bar. The bartender says X. <strong>A</strong> man says Y.&rdquo; Even if it were a different man, we&rsquo;d probably use something like &ldquo;<strong>Another</strong> man&rdquo;.</p>
<p>But notice we <em>did</em> use <strong>the</strong> with the bartender even though he or she has NOT been introduced yet. The reason is our <em>frame</em> for a bar is that it has a bartender. The existence of the bartender has effectively been set up by us having a bar and that&rsquo;s the bartender we want to reference so it&rsquo;s not a completely new reference. Saying &ldquo;* A man walks into a bar. <strong>A</strong> bartender says X&rdquo; would be odd. Notice also that even if the bartender is a man, the following &ldquo;The man says Y&rdquo; is unambiguous.</p>
<p>Even if there were more than one bartender (certainly possible, although not prototypical for the frame) we&rsquo;d have to say something like &ldquo;<strong>One of the</strong> bartenders says X&rdquo;. </p>
<p>This can be demonstrated with an example where we EXPECT multiple instances.</p>
<blockquote>
<p>A man walks into a classroom. <strong>One of the</strong> students says X.</p>
</blockquote>
<p>In this case, it would be odd to say &ldquo;* <strong>A</strong> student says X&rdquo; and even odder to say &ldquo;* <strong>the</strong> student says X&rdquo;. We want definiteness (because the classroom frame has already established the likelihood of a <em>group</em> of students and that&rsquo;s the group we want to reference a member of) but because it&rsquo;s a group, we need to say &ldquo;one of&rdquo; to call out an individual.</p>
<p>&ldquo;One of the&rdquo; calls out an indefinite member of a definite group.</p>
http://jktauber.com/2017/07/14/tour-greek-morphology-part-7/
A Tour of Greek Morphology: Part 7
James Tauber (published 2017-07-14T05:47:25Z)
<p>Part seven of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>κλῶμεν in 1Co 10.16 is clearly <strong>ACT 1PL</strong> but we can&rsquo;t tell from just that if it&rsquo;s a <strong>PA-4</strong> or <strong>PA-5</strong>. In authors like Galen and Hippocrates we find the <strong>MID 3SG</strong> κλᾶται which we&rsquo;ve called <strong>PM-4</strong>, which strongly suggests it&rsquo;s a <strong>PA-4</strong> in the active.</p>
<p>If that&rsquo;s the case, we&rsquo;d expect an <strong>ACT 2SG</strong> of κλᾷς, an <strong>ACT 3SG</strong> of κλᾷ, and an <strong>ACT 3PL</strong> of κλῶσι(ν).</p>
<p>But in various authors we can find the respective forms κλάεις, κλάει, and κλάουσι.</p>
<p>This suggests that <strong>α</strong> plays the same role in <strong>PA-4</strong> and <strong>PM-4</strong> as <strong>ε</strong> did in <strong>PA-2</strong> and <strong>PM-2</strong>.</p>
<p>For this to work,</p>
<ul>
<li>άω &gt; ῶ</li>
<li>άε &gt; ᾶ</li>
<li>άει &gt; ᾷ (in the indicative) and ᾶ (in the infinitive)</li>
<li>άο &gt; ῶ</li>
<li>άου &gt; ῶ</li>
</ul>
<p>We&rsquo;ll discuss the άει issue in the next post.</p>
<p>What about <strong>PA-3</strong> and <strong>PM-3</strong>? We&rsquo;re basically trying to solve for <strong>x</strong> given:</p>
<ul>
<li>xω &gt; ω</li>
<li>xε &gt; ου</li>
<li>xει &gt; οι (in the indicative) and ου (in the infinitive)</li>
<li>xο &gt; ου</li>
<li>xου &gt; ου</li>
</ul>
<p>It&rsquo;s difficult to find examples in the present verb forms of other dialects and texts, but even in the New Testament it&rsquo;s not difficult to find cases where οε and οο are alternatively spelled ου (e.g. ἀγαθοεργ- in 1 Tim and ἀγαθουργ- in Acts). This makes <strong>ο</strong> a possible candidate for <strong>x</strong> and note, in particular, that the <strong>ACT 3SG</strong> forms have so far all been quite transparent about which vowel ends the stem.</p>
<p>So we appear to have:</p>
<ul>
<li>όω &gt; ῶ</li>
<li>όε &gt; οῦ</li>
<li>όει &gt; οῖ (in the indicative) and οῦ (in the infinitive)</li>
<li>όο &gt; οῦ</li>
<li>όου &gt; οῦ</li>
</ul>
<p>And although a proper argument will get us quite far afield (maybe one day), it turns out <strong>PA-5</strong> and <strong>PM-5</strong> can be explained by:</p>
<ul>
<li>ήω &gt; ῶ</li>
<li>ήε &gt; ῆ</li>
<li>ήει &gt; ῇ (in the indicative) and ῆ (in the infinitive)</li>
<li>ήο &gt; ῶ</li>
<li>ήου &gt; ῶ</li>
</ul>
<p>So, in summary, the circumflex verbs can be explained through a historical interaction (generally referred to as a contraction) between a vowel at the end of the original stem and the vowel at the start of what is added to it. </p>
<ul>
<li><strong>PA-2</strong> and <strong>PM-2</strong> come from a stem originally ending in <strong>έ</strong></li>
<li><strong>PA-3</strong> and <strong>PM-3</strong> come from a stem originally ending in <strong>ό</strong></li>
<li><strong>PA-4</strong> and <strong>PM-4</strong> come from a stem originally ending in <strong>ά</strong></li>
<li><strong>PA-5</strong> and <strong>PM-5</strong> come from a stem originally ending in <strong>ή</strong></li>
</ul>
<p>Often circumflex verbs are referred to as <strong>contract verbs</strong> but, while contraction is indeed the historical explanation for how the circumflex verbs got their forms, I like the name <strong>circumflex verbs</strong> because it describes an actual synchronic characteristic of the verb forms rather than an explanation of how they happened to get like that. It&rsquo;s interesting that ancient grammarians like Dionysius Thrax called them <strong>perispomenon</strong> verbs (the term for words with a circumflex on the last syllable) and called <strong>PA-1</strong>/<strong>PM-1</strong> verbs <strong>barytone</strong> verbs (the term for words with NO ACCENT on the last syllable).</p>
<p>In the next post, we&rsquo;ll explore why the contraction rules are not random but, in fact, are quite systematic. We&rsquo;ll also touch on why the contractions don&rsquo;t seem to work quite the same way in the infinitive.</p>
http://jktauber.com/2017/07/11/tour-greek-morphology-part-6/
A Tour of Greek Morphology: Part 6 (published 2017-07-11T22:55:12Z, updated 2017-07-12T00:26:04Z) by James Tauber
<p>Part six of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>Every form we&rsquo;ve seen of λύω so far starts with <strong>λυ</strong>, unchanged except for accent. Also, all the forms that start with <strong>λυ</strong> (or <strong>λύ</strong>) have been forms of λύω.</p>
<p>Every form we&rsquo;ve seen so far that&rsquo;s active first person plural ends with <strong>μεν</strong>. Also, all the forms that end with <strong>μεν</strong> have been active first person plural.</p>
<p>Put another way, the <strong>λύ</strong> in λύομεν has nothing to do with being active first person plural and the <strong>μεν</strong> in λύομεν has nothing to do with being a form of λύω (at least based on every paradigm we&rsquo;ve seen so far).</p>
<p>What about the <strong>ο</strong> in between them? It cannot (at least at the moment) be said to only depend on the fact we have a form of λύω nor can it be said to only depend on the fact we have an active first person plural form. The vowel seems to depend BOTH on the lexical item AND the morphosyntactic properties of voice, person, and number.</p>
<p>Similarly with ποιεῖτε. The initial <strong>ποι</strong> indicates and only indicates the lexical item. The final <strong>τε</strong> indicates and only indicates the active second person plural. The fact we have <strong>εῖ</strong> rather than <strong>ο</strong> (or <strong>ε</strong> or <strong>οῦ</strong> or any other vowel) is because of BOTH the lexical item and the morphosyntactic properties.</p>
<p>What is happening here becomes very clear when we look at some older texts or texts in more conservative dialects. For example, in Herodotus, written in the Ionic dialect, we don&rsquo;t find ποιεῖτε but instead ποιέετε. In fact, here&rsquo;s what we find:</p>
<table class="table">
<tr><th>ACT INF<td>ποιέειν
<tr><th>ACT 1SG<td>ποιέω
<tr><th>ACT 2SG<td>ποιέεις
<tr><th>ACT 3SG<td>ποιέει
<tr><th>ACT 1PL<td>
<tr><th>ACT 2PL<td>ποιέετε
<tr><th>ACT 3PL<td>
</table>
<p>There are a couple of things about this that are remarkable. Firstly, if we split off the common part (now <strong>ποιέ</strong> rather than ποι) then our distinguishers are all IDENTICAL to those of λύω. Secondly, this restores the accent placement to be properly recessive.</p>
<p>Our ποιῶ and ποιεῖτε are so accented (and not *ποίω or *ποίειτε) because the accent has remained on the same mora (relative to the start) as the older form.</p>
<p>The vowels are thus explained by noting that historically:</p>
<ul>
<li>έει &gt; εῖ </li>
<li>έω &gt; ῶ</li>
<li>έε &gt; εῖ</li>
</ul>
<p>Even without finding the necessary forms in Herodotus, we can infer (assuming the ποιέ is consistent and the distinguishers are those of λύω) the forms missing above and hence the following additional historical vowel changes:</p>
<ul>
<li>έο &gt; οῦ</li>
<li>έου &gt; οῦ</li>
</ul>
<p>Making the same assumption about the middle forms adds:</p>
<ul>
<li>έῃ &gt; ῇ</li>
</ul>
<p>All the <strong>PA-2</strong> and <strong>PM-2</strong> endings can now be explained by:</p>
<ul>
<li>the verb-specific common part (the <strong>stem</strong>) ending in <strong>ε</strong></li>
<li>the voice / person / number endings originally being identical to those of λύω</li>
<li>the six historical vowel changes listed (referred to as <strong>contractions</strong>)</li>
</ul>
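<p>The three-part explanation above can be put into a few lines of code (a toy illustration of my own, ignoring accent placement): an ε-final stem, the λύω endings, and the six contractions, tried longest-first so that έει is matched before έε and έου before έο.</p>

```python
# The six historical contractions listed above, as (stem-final vowel,
# ending-initial vowel(s), contracted result) triples, ordered so that
# the longer ending-initial sequences are tried first.

CONTRACTIONS = [
    ("ε", "ει", "ει"),
    ("ε", "ου", "ου"),
    ("ε", "ω", "ω"),
    ("ε", "ε", "ει"),
    ("ε", "ο", "ου"),
    ("ε", "ῃ", "ῃ"),
]

def contract(stem: str, ending: str) -> str:
    """Join stem and ending, applying a contraction if one fits."""
    for stem_vowel, ending_vowel, contracted in CONTRACTIONS:
        if stem.endswith(stem_vowel) and ending.startswith(ending_vowel):
            return stem[:-len(stem_vowel)] + contracted + ending[len(ending_vowel):]
    return stem + ending  # no contraction applies (e.g. λυ + ομεν)
```

<p>So <code>contract("ποιε", "ετε")</code> yields <code>ποιειτε</code> (accented: ποιεῖτε), while <code>contract("λυ", "ετε")</code> is just <code>λυετε</code>.</p>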
<p>In the tour&rsquo;s next post, we&rsquo;ll see if we can similarly explain the other forms we&rsquo;ve seen. Then, in a subsequent post, we&rsquo;ll come back to these vowel changes and see what&rsquo;s systematic about them.</p>
<p>I want to close by emphasizing that I am only trying to describe HOW the circumflex verbs came about, not suggest anything about how native speakers processed or generated the contracted forms. As an analogy: it might be <em>interesting</em> to learn why the English words <em>foot</em> and <em>feet</em> are spelled the way they are relative to how they are pronounced, but that explanation doesn&rsquo;t bear much, if any, relation to what&rsquo;s going on in the minds of native speakers, nor is it necessarily of any use to people learning English as a second language. I&rsquo;ll touch on that again in a few posts&rsquo; time, but you can also read my 2015 post <a href="https://jktauber.com/2015/11/19/dangers-reconstructing-too-much-morphophonology/">The Dangers of Reconstructing Too Much Morphophonology</a>.</p>
http://jktauber.com/2017/07/10/categories-reader-work/
Categories of Reader Work (2017-07-10T22:00:15Z) by James Tauber
<p>I sometimes get people expressing an interest in my Greek reader work or get asked about the status of my &ldquo;reader&rdquo; and I have to ask them to clarify which reader they mean. I thought I might do a quick post where I spell out various &ldquo;reader&rdquo; projects I have worked on and am working on.</p>
<p>My interest in tools for helping read Greek (especially, but by no means only, the New Testament) goes back at least thirteen or fourteen years. In a <a href="https://jktauber.com/2004/11/26/programmed-vocabulary-learning-travelling-salesman/">2004 post</a> copied over to this blog, I talk about algorithms for ordering vocabulary to accelerate verse coverage. It was around this time I was also working on what became <a href="http://quisition.com/">Quisition</a>, a flashcard site with spaced repetition.</p>
<p>In November 2005, I registered the domain <code>readjohn.com</code> with a view to building a site to help people learn Greek by reading through John&rsquo;s gospel. The reason for John was not only the simplicity of its Greek but also the fact that it&rsquo;s the one text I had the OpenText analysis for at the time. As proof that I had more than just the GNT in mind, I point out that I registered <code>readhomer.com</code> just two months later. I wasn&rsquo;t just thinking Greek either, as I registered <code>readdante.com</code> at the same time.</p>
<p>Vocabulary was just an initial part of the model of what it takes to be able to read a text. It happens to be the easiest to model because all it takes, to a first approximation, is a lemmatized text. But it illustrates the basic concept: if you model what is needed to read a text and you model what a student knows, you can:</p>
<ul>
<li>help order texts (including individual clauses or even phrases) in a way that&rsquo;s appropriate to the student&rsquo;s level</li>
<li>appropriately scaffold the texts with just enough information to fill in the gap in their understanding</li>
</ul>
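<p>As a toy sketch (my own, not the actual algorithm from the 2004 post), a greedy version of the vocabulary-ordering idea might look like this: at each step, teach the lemma that makes the most not-yet-readable verses fully readable, breaking ties by how many verses the lemma occurs in.</p>

```python
# Greedy vocabulary ordering to accelerate verse coverage: an
# illustration only, not the real implementation. A verse counts as
# "readable" once all of its lemmas are known.

def order_vocabulary(verses: dict[str, set[str]]) -> list[str]:
    """verses maps a verse id to the set of lemmas it contains."""
    all_lemmas: set[str] = set().union(*verses.values())
    known: set[str] = set()
    order: list[str] = []
    while known != all_lemmas:
        def gain(lemma: str) -> tuple[int, int]:
            # verses made readable if this lemma were taught next,
            # then raw verse frequency as a tie-breaker
            readable = sum(1 for ls in verses.values() if ls <= known | {lemma})
            frequency = sum(1 for ls in verses.values() if lemma in ls)
            return (readable, frequency)
        best = max(sorted(all_lemmas - known), key=gain)
        known.add(best)
        order.append(best)
    return order
```

<p>The sorting before <code>max</code> just makes remaining ties deterministic; a real implementation would work at the clause or phrase level and weigh much more than raw frequency.</p>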
<p>One thing I was experimenting with for scaffolding was inlining Greek that the student could understand (according to the ordering generated by my vocabulary algorithms) in larger text kept in English. So in the first lesson, the student might be given something like John 1.41 in this form:</p>
<blockquote>
<p>He first found his own brother Simon καὶ λέγει αὐτῷ, &ldquo;We have found the Messiah!&rdquo; </p>
</blockquote>
<p>The combination of vocabulary ordering algorithms (driven by clause-level analysis of John&rsquo;s gospel) with this sort of inlining I was calling a <strong>New Kind of Graded Reader</strong> and you can find a lot of posts from around March 2008 on this blog about it including <a href="https://jktauber.com/2008/02/10/new-kind-graded-reader/">this video</a>. I subsequently did <a href="https://jktauber.com/2010/03/28/my-bibletech-2010-talk/">a full-length talk at BibleTech 2010</a>. There&rsquo;s also <a href="https://jktauber.com/2010/04/25/inline-replacement-john-2/">a post with an extended example of the inlining approach</a>.</p>
<p>That initial category of reader work is still alive and by no means abandoned; it&rsquo;s just taking a long time to get the analysis broadened to take into account not just vocabulary but inflectional morphology, lexical relatedness, syntactic constructions, etc. In fact, a large part of my linguistic analysis work is motivated by the reader work (which was a big theme of <a href="https://jktauber.com/2015/05/06/my-bibletech-2015-talk/">my BibleTech 2015 talk</a>).</p>
<p>The <em>second</em>, somewhat independent (although still very much corpus-driven and using much of the same machine-actionable linguistic data) reader project was the semi-automated generation of more traditional print readers (the sort with rarer words glossed in footnotes and perhaps more obscure syntactic constructions or idioms commented on). You can read more about it in <a href="https://jktauber.com/2015/11/07/generating-readers/">this post</a>. One aim with the <strong>semi-automatic generation of printed readers</strong> was being able to customize them quite easily to a particular level. The scaffolding wouldn&rsquo;t necessarily be adaptive but it could be personalized.</p>
<p>Again this is still of great interest to me and motivates a lot of work on machine-actionable data. While I might experiment with approaches other than using TeX, I still want to do more in this area, most likely collaborating with people interested in particular texts (and able to help work on glosses and syntactic commentary).</p>
<p>A <em>third</em> category of work is a loose collection of various little prototypes over the years for ways of presenting information in a reader. This includes things like interlinears, colour-coded texts, various ways of showing dependency relations, etc. Brian Rosner and I consolidated these prototypes in a <strong>framework for generating static HTML files</strong> in <a href="https://github.com/jtauber/online-reader">https://github.com/jtauber/online-reader</a>. There are various online demos linked in the README. </p>
<p>That repo <em>did</em> initially include a dynamic reading environment written in Vue.js but that was broken out as the starting point for DeepReader (see below).</p>
<p>The <em>fourth</em> category of work (which goes back to my vision for readjohn.com, readhomer.com and readdante.com when I registered the domains) is an <strong>online adaptive reading environment with integrated learning tools</strong>. I talked about this at SBL 2016 in San Antonio, a Global Philology workshop in Leipzig in May, and I will be talking about it at SBL International 2017 in Berlin next month.</p>
<p>The idea is to integrate vocabulary and morphological drills with the reading environment so the text drives what to drill, the results of the drills help determine the text, the scaffolding needed, etc.</p>
<p>So the adaptive reading environment will model:</p>
<ul>
<li>what&rsquo;s needed to understand an upcoming passage</li>
<li>what the student has already seen</li>
<li>what the student has inquired about</li>
<li>what is at an optimal recall interval</li>
<li>what the student is good or not so good at understanding (based on explicit assessment including meta-cognitive questions)</li>
</ul>
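<p>Purely as an illustration (nothing from the real implementation), the kind of student model that list describes might be sketched like this: given an upcoming passage&rsquo;s lemmas, decide which are unknown and need scaffolding, and which known ones are at (or past) their recall interval and so should be drilled.</p>

```python
# A minimal student model: tracks when each lemma was last seen and a
# per-lemma recall interval that doubles on a successful drill. All
# names and numbers here are made up for illustration.

from dataclasses import dataclass, field

@dataclass
class StudentModel:
    # lemma -> timestamp of last exposure (seconds)
    seen: dict[str, float] = field(default_factory=dict)
    # lemma -> current recall interval (seconds)
    interval: dict[str, float] = field(default_factory=dict)

    def record(self, lemma: str, correct: bool, now: float) -> None:
        """Update the model after a drill: double the interval on success."""
        previous = self.interval.get(lemma, 60.0)
        self.interval[lemma] = previous * 2 if correct else 60.0
        self.seen[lemma] = now

    def plan(self, passage_lemmas: set[str], now: float) -> tuple[set[str], set[str]]:
        """Split a passage's lemmas into (unknown, due-for-review)."""
        unknown = {l for l in passage_lemmas if l not in self.seen}
        due = {l for l in passage_lemmas - unknown
               if now - self.seen[l] >= self.interval[l]}
        return unknown, due
```

<p>A real model would, of course, also cover morphology and constructions, and fold in the meta-cognitive assessment mentioned above rather than a bare correct/incorrect flag.</p>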
<p>This is what I&rsquo;m most actively working on at the moment. As with the other categories of readers, it relies heavily on linguistic resources so I&rsquo;m doing a lot in that area.</p>
<p>From an <em>implementation</em> point of view, this is a Vue.js-based application running in the browser, talking to a range of microservices on the backend. Much of the &ldquo;heavy lifting&rdquo; will be done by the microservices. The generic parts of the frontend application are being broken out by Brian and me as a framework called DeepReader, which could be used for all sorts of readers (even just Kindle-style EPUB readers). I&rsquo;ll have a lot more to say about DeepReader in the future, as well as about the specific application of it to building an adaptive reading environment for Greek.</p>
<p>So there are really four distinct categories of reader projects that I&rsquo;ve been working on, on and off, for the last thirteen or fourteen years:</p>
<ul>
<li>a &ldquo;New Kind of Graded Reader&rdquo;</li>
<li>semi-automatic generation of printed readers</li>
<li>framework for generating static HTML files</li>
<li>online adaptive reading environment with integrated learning tools</li>
</ul>
<p>They are all related in that they build on the same linguistic data (which is where most of the effort actually goes).</p>
<p>Hopefully all that provides a little bit of a high-level guide to all the reading stuff talked about on this blog, on Twitter, and which is implemented in various repositories on GitHub.</p>
<p>I should stress none of the code is specific to the New Testament or even to Greek. I&rsquo;d be happy to collaborate with anyone on producing the necessary linguistic data for other texts and other languages.</p>
http://jktauber.com/2017/07/06/tour-greek-morphology-part-5/
A Tour of Greek Morphology: Part 5 (2017-07-06T16:26:55Z) by James Tauber
<p>Part five of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In <a href="https://jktauber.com/2017/07/02/tour-greek-morphology-part-4/">part four</a>, we introduced the <strong>circumflex verbs</strong> in the present active. Now we&rsquo;re going to look at their middle forms.</p>
<p>Here they are alongside the middle of λύω:</p>
<table class="table">
<tr><th>INF<td>λύεσθαι<td>ποιεῖσθαι<td>δηλοῦσθαι<td>τιμᾶσθαι<td>χρῆσθαι
<tr><th>1SG<td>λύομαι<td>ποιοῦμαι<td>δηλοῦμαι<td>τιμῶμαι<td>χρῶμαι
<tr><th>2SG<td>λύῃ or λύει<td>ποιῇ or ποιεῖ<td>δηλοῖ<td>τιμᾷ<td>χρῇ
<tr><th>3SG<td>λύεται<td>ποιεῖται<td>δηλοῦται<td>τιμᾶται<td>χρῆται
<tr><th>1PL<td>λυόμεθα<td>ποιούμεθα<td>δηλούμεθα<td>τιμώμεθα<td>χρώμεθα
<tr><th>2PL<td>λύεσθε<td>ποιεῖσθε<td>δηλοῦσθε<td>τιμᾶσθε<td>χρῆσθε
<tr><th>3PL<td>λύονται<td>ποιοῦνται<td>δηλοῦνται<td>τιμῶνται<td>χρῶνται
</table>
<p>As you can see, the circumflex pervades except in the <strong>1PL</strong> where the law of limitation prohibits it. This is also the one place the λύω accent is on the distinguisher.</p>
<p>Note also that, as was the case with the active, the forms in each row essentially have the same endings just with a vowel change.</p>
<p>Here are the common elements of each row of the distinguisher in both the active and middle:</p>
<table class="table">
<tr><th>&nbsp;<th>active<th>middle
<tr><th>INF<td>-ν<td>-σθαι
<tr><th>1SG<td>-<td>-μαι
<tr><th>2SG<td>-{ι}ς<td>-{ι}
<tr><th>3SG<td>-{ι}<td>-ται
<tr><th>1PL<td>-μεν<td>-μεθα
<tr><th>2PL<td>-τε<td>-σθε
<tr><th>3PL<td>-σι(ν)<td>-νται
</table>
<p>The iota in the <strong>2SG</strong> active and middle and the <strong>3SG</strong> active is questionable because we&rsquo;re splitting a diphthong but we&rsquo;ll return to that in another post.</p>
<p>The vowels prior to this common element seem to change as follows:</p>
<ul>
<li>if the distinguisher has a monophthong ε in λύω,<br>it will have ει, ου, α, η in the other paradigms</li>
<li>if the distinguisher has a monophthong ο in λύω,<br>it will have ου, ου, ω, ω in the other paradigms</li>
</ul>
<p>This applies to the active too (although the diphthongs there are found in more cells of the λύω paradigm).</p>
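<p>A quick sketch (my own illustration) of these correspondences: given a PM-1 distinguisher, swap its initial vowel to get the unaccented <strong>PM-2</strong> through <strong>PM-5</strong> counterparts. The diphthong cells (like the <strong>2SG</strong>) are out of scope for this naive swap.</p>

```python
# The vowel correspondences just listed: the λύω distinguisher vowel
# maps to a different vowel in each of PM-2..PM-5 (accents omitted).

VOWEL_MAP = {
    # λύω vowel -> (PM-2, PM-3, PM-4, PM-5)
    "ε": ("ει", "ου", "α", "η"),
    "ο": ("ου", "ου", "ω", "ω"),
}

def middle_distinguishers(pm1: str) -> tuple[str, str, str, str]:
    """Swap the initial λύω vowel according to VOWEL_MAP."""
    # NB: the 2SG diphthong cells (Xῃ / Xει) are not handled here.
    for vowel, replacements in VOWEL_MAP.items():
        if pm1.startswith(vowel):
            return tuple(r + pm1[len(vowel):] for r in replacements)
    raise ValueError(f"{pm1} does not start with a mapped vowel")
```

<p>So <code>middle_distinguishers("εσθε")</code> gives the unaccented forms behind Xεῖσθε, Xοῦσθε, Xᾶσθε, and Xῆσθε in the table below.</p>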
<p>We&rsquo;ll explore this more in the next post.</p>
<p>Before we end this one, though, let&rsquo;s label the paradigms for our present middle distinguishers:</p>
<table class="table">
<tr><th>&nbsp;<th>PM-1<th>PM-2<th>PM-3<th>PM-4<th>PM-5
<tr><th>INF<td>Xεσθαι<td>Xεῖσθαι<td>Xοῦσθαι<td>Xᾶσθαι<td>Xῆσθαι
<tr><th>1SG<td>Xομαι<td class="info">Xοῦμαι<td class="info">Xοῦμαι<td class="warning">Xῶμαι<td class="warning">Xῶμαι
<tr><th>2SG<td>Xῃ or Xει<td>Xῇ or Xεῖ<td>Xοῖ<td>Xᾷ<td>Xῇ
<tr><th>3SG<td>Xεται<td>Xεῖται<td>Xοῦται<td>Xᾶται<td>Xῆται
<tr><th>1PL<td>Xόμεθα<td class="info">Xούμεθα<td class="info">Xούμεθα<td class="warning">Xώμεθα<td class="warning">Xώμεθα
<tr><th>2PL<td>Xεσθε<td>Xεῖσθε<td>Xοῦσθε<td>Xᾶσθε<td>Xῆσθε
<tr><th>3PL<td>Xονται<td class="info">Xοῦνται<td class="info">Xοῦνται<td class="warning">Xῶνται<td class="warning">Xῶνται
</table>
<p>Notice that the <strong>1SG</strong>, <strong>1PL</strong>, and <strong>3PL</strong> distinguishers are identical for <strong>PM-2</strong> vs <strong>PM-3</strong> and for <strong>PM-4</strong> vs <strong>PM-5</strong>. This was similar to what we saw in the active case (although there, the <strong>1SG</strong> was even less helpful in identifying the paradigm).</p>
<p>Notice also that these are exactly the rows where the distinguisher in λύω starts with an omicron.</p>
http://jktauber.com/2017/07/02/tour-greek-morphology-part-4/
A Tour of Greek Morphology: Part 4 (2017-07-02T20:09:53Z) by James Tauber
<p>Part four of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the <a href="https://jktauber.com/2017/06/29/tour-greek-morphology-part-3/">previous part</a> we saw that more than half of the verb lexemes in the NT appearing in the present indicative follow the exact pattern of λύω, i.e. <strong>PA-1</strong> in the active and <strong>PM-1</strong> in the middle. In the next few parts to this series, we&rsquo;re going to look at some of the verbs that do NOT.</p>
<p>Here&rsquo;s our first example, placed alongside λύω for comparison (a paradigm of paradigms again):</p>
<table class="table">
<tr><th>INF<td>λύειν<td>ποιεῖν
<tr><th>1SG<td>λύω<td>ποιῶ
<tr><th>2SG<td>λύεις<td>ποιεῖς
<tr><th>3SG<td>λύει<td>ποιεῖ
<tr><th>1PL<td>λύομεν<td>ποιοῦμεν
<tr><th>2PL<td>λύετε<td>ποιεῖτε
<tr><th>3PL<td>λύουσι(ν)<td>ποιοῦσι(ν)
</table>
<p>Look closely at each pair on a row and notice a few things:</p>
<ul>
<li>in the infinitive, in all singulars, and in the third plural, the distinguishers are identical EXCEPT for accent</li>
<li>in the first and second plurals, the only other difference is ου vs ο and ει vs ε</li>
<li>whereas λύω never has the accent on the distinguisher, the seven forms of ποιῶ above ALWAYS do and it is always a circumflex</li>
<li>the accent is not strictly recessive the way it is in λύω and <strong>PA-1</strong> verbs in general</li>
</ul>
<p>We are going to call this new pattern <strong>PA-2</strong>.</p>
<p>There are many other verbs that follow the <strong>PA-2</strong> pattern and yet others that are quite similar but with small differences.</p>
<p>Here are some examples placed side-by-side with λύω and ποιῶ:</p>
<table class="table">
<tr><th>INF<td>λύειν<td>ποιεῖν<td>δηλοῦν<td>τιμᾶν<td>ζῆν
<tr><th>1SG<td>λύω<td>ποιῶ<td>δηλῶ<td>τιμῶ<td>ζῶ
<tr><th>2SG<td>λύεις<td>ποιεῖς<td>δηλοῖς<td>τιμᾷς<td>ζῇς
<tr><th>3SG<td>λύει<td>ποιεῖ<td>δηλοῖ<td>τιμᾷ<td>ζῇ
<tr><th>1PL<td>λύομεν<td>ποιοῦμεν<td>δηλοῦμεν<td>τιμῶμεν<td>ζῶμεν
<tr><th>2PL<td>λύετε<td>ποιεῖτε<td>δηλοῦτε<td>τιμᾶτε<td>ζῆτε
<tr><th>3PL<td>λύουσι(ν)<td>ποιοῦσι(ν)<td>δηλοῦσι(ν)<td>τιμῶσι(ν)<td>ζῶσι(ν)
</table>
<p>It will be clearer to see the similarities and differences by just showing the distinguishers.</p>
<table class="table">
<tr><th>&nbsp;<th>PA-1<th>PA-2<th>PA-3<th>PA-4<th>PA-5
<tr><th>INF<td>Xειν<td>Xεῖν<td>Xοῦν<td>Xᾶν<td>Xῆν
<tr><th>1SG<td>Xω<td class="success">Xῶ<td class="success">Xῶ<td class="success">Xῶ<td class="success">Xῶ
<tr><th>2SG<td>Xεις<td>Xεῖς<td>Xοῖς<td>Xᾷς<td>Xῇς
<tr><th>3SG<td>Xει<td>Xεῖ<td>Xοῖ<td>Xᾷ<td>Xῇ
<tr><th>1PL<td>Xομεν<td class="info">Xοῦμεν<td class="info">Xοῦμεν<td class="warning">Xῶμεν<td class="warning">Xῶμεν
<tr><th>2PL<td>Xετε<td>Xεῖτε<td>Xοῦτε<td>Xᾶτε<td>Xῆτε
<tr><th>3PL<td>Xουσι(ν)<td class="info">Xοῦσι(ν)<td class="info">Xοῦσι(ν)<td class="warning">Xῶσι(ν)<td class="warning">Xῶσι(ν)
</table>
<p>I&rsquo;ve given each of these patterns a label: <strong>PA-3</strong>, <strong>PA-4</strong>, <strong>PA-5</strong>.</p>
<p>(I&rsquo;ve also shaded some of the cells but I don&rsquo;t think that will come through if you&rsquo;re just reading this via the email subscription.)</p>
<p>All four of the new patterns have circumflex accents on the distinguisher in every cell. For this reason we will call these verbs <strong>circumflex</strong> verbs. </p>
<p>Notice that in <strong>1SG</strong>, the distinguisher is identical across all the circumflex verbs (-ῶ). What that means is, given just the <strong>1SG</strong> form of a circumflex verb, you can&rsquo;t tell exactly which of the patterns will be followed overall. Xῶ could be <strong>PA-2</strong>, <strong>PA-3</strong>, <strong>PA-4</strong> OR <strong>PA-5</strong>. You CAN tell, however, that it&rsquo;s not a <strong>PA-1</strong> verb (because of the circumflex).</p>
<p>In contrast to <strong>1SG</strong>, if you know ANY of the <strong>INF</strong>, the <strong>2SG</strong>, the <strong>3SG</strong>, or the <strong>2PL</strong>, you can tell exactly which pattern is being followed.</p>
<p>That leaves the interesting case of the <strong>1PL</strong> and <strong>3PL</strong>. An ου in either cell distinguisher means we have a <strong>PA-2</strong> or <strong>PA-3</strong> but don&rsquo;t know which. An ω in either cell distinguisher means we have a <strong>PA-4</strong> or <strong>PA-5</strong> but don&rsquo;t know which.</p>
<p>Put another way: presented with a <strong>1PL</strong> ending in -οῦμεν, we can tell (at least given what we&rsquo;ve seen up until this point) what the <strong>1SG</strong> and <strong>3PL</strong> must be but we&rsquo;re left with two possibilities for all the other cells. The moment we know just one of those OTHER cells, though, we can tell what every cell must be.</p>
<p>We&rsquo;ll continue to explore these new patterns (and their corresponding middle patterns) over the next few posts.</p>
http://jktauber.com/2017/06/30/collapsible-treedown/
Collapsible Treedown (2017-06-30T05:57:18Z) by James Tauber
<p>Jonathan Robie&rsquo;s Treedown format is a really nice way of conveying basic syntactic structure in real texts. I recently experimented a little with some code for collapsing and expanding of the structure.</p>
<p>You can read about <a href="http://jonathanrobie.biblicalhumanities.org/blog/2017/05/12/lowfat-treebanks-visualizing/">Treedown in more detail</a> but the idea is to convey structure in a plain text format that still conveys meaning. The name &ldquo;treedown&rdquo; is a nod to &ldquo;markdown&rdquo; and the philosophy is very similar—convey information visually but in a way that&rsquo;s easy to transmit and edit in plain text.</p>
<p>One of the things that appeals to me about Treedown is how easily it can be used to just initially sketch out high level argument structure without getting into the weeds. But even if the analysis does go a little deeper, you want to be able to pull back and see the high-level structure without getting too much in the way of just reading. So to this end, I hacked together a bit of HTML, CSS and JS to demonstrate some UI to support this &ldquo;collapsibility&rdquo;. </p>
<p><img src="https://d26dzxoao6i3hh.cloudfront.net/items/3f3U3x2i3g2P0z2Y3B0W/Screen%20Recording%202017-06-28%20at%2009.21%20AM.gif?v=be0d1b9e" width="100%"></p>
<p>This is just plain Treedown (or one proposal for it—it&rsquo;s still a work in progress) but with some lightweight interactivity that lets the reader determine how much structure they want to see. Square brackets around the Treedown label indicate a further analysis that can be expanded.</p>
<p>I made a variant that lets you get a &ldquo;preview&rdquo; of the next level of structure when you hover over it, using labelled brackets:</p>
<p><img src="https://d26dzxoao6i3hh.cloudfront.net/items/2h020Y3H110j0Z1I0O0x/Screen%20Recording%202017-06-28%20at%2009.22%20AM.gif?v=174d3d11" width="100%"></p>
<p>I then thought that perhaps this preview might be better conveyed just with colour, where each Treedown label gets its own colour. Here&rsquo;s what that might look like:</p>
<p><img src="https://d26dzxoao6i3hh.cloudfront.net/items/1T460x1y3P360N0r3t1t/Screen%20Recording%202017-06-28%20at%2009.23%20AM.gif?v=3d39e68c" width="100%"></p>
<p>These are all just quick prototypes, but let me know what you think.</p>
http://jktauber.com/2017/06/29/tour-greek-morphology-part-3/
A Tour of Greek Morphology: Part 3 (2017-06-29T02:23:03Z) by James Tauber
<p>Part three of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the first two parts (<a href="https://jktauber.com/2017/06/23/tour-greek-morphology-part-1/">part one</a> and <a href="https://jktauber.com/2017/06/25/tour-greek-morphology-part-2/">part two</a>), we looked at the present indicative forms of λύω.</p>
<p>I want to now add the infinitives, λύειν (for the active) and λύεσθαι (for the middle).</p>
<p>So we now have:</p>
<table class="table">
<tr><th>&nbsp;<th>active<th>middle
<tr><th>INF<td>λύειν<td>λύεσθαι
<tr><th>1SG<td>λύω<td>λύομαι
<tr><th>2SG<td>λύεις<td>λύῃ or λύει
<tr><th>3SG<td>λύει<td>λύεται
<tr><th>1PL<td>λύομεν<td>λυόμεθα
<tr><th>2PL<td>λύετε<td>λύεσθε
<tr><th>3PL<td>λύουσι(ν)<td>λύονται
</table>
<p>Adding the infinitives does make certain commonalities jump out even more: all the &lsquo;ει&rsquo; in the active and both the &lsquo;αι&rsquo; and &lsquo;(σ)θ&rsquo; in the middle. </p>
<p>But one of the big questions to address next is: does any of this have anything to do with the present indicative (and infinitive) forms of any <em>other</em> words besides λύω?</p>
<p>Fortunately (otherwise it might not have been the best of starting places) it <strong>does</strong>. In the MorphGNT, there are 645 distinct lexemes appearing in the present indicative and 383 of them follow <em>exactly</em> the same pattern as λύω above including the accentuation.</p>
<p>In the present active indicative, there are 10 verbs that exhibit all six cells in the paradigm: θέλω, ἀκούω, λέγω, μένω, λαμβάνω, γινώσκω, πιστεύω, μέλλω, ἔχω, βλέπω (note that λύω is not, in fact, among them).</p>
<p>In the middle, there are no words filling all six cells in the MorphGNT but there are five verbs that fill five of the cells: βούλομαι, λογίζομαι, ἔρχομαι, ἐργάζομαι, προσεύχομαι.</p>
<p>But allowing for the missing cells, 271 lexemes follow this active pattern in the present indicative and 160 lexemes follow this middle pattern (with overlap in the case of lexemes that have both active and middle forms):</p>
<table class="table">
<tr><th>&nbsp;<th>active<th>middle
<tr><th>INF<td>Xειν<td>Xεσθαι
<tr><th>1SG<td>Xω<td>Xομαι
<tr><th>2SG<td>Xεις<td>Xῃ or Xει
<tr><th>3SG<td>Xει<td>Xεται
<tr><th>1PL<td>Xομεν<td>Xόμεθα
<tr><th>2PL<td>Xετε<td>Xεσθε
<tr><th>3PL<td>Xουσι(ν)<td>Xονται
</table>
<p>The accent is recessive in every case, so it will be an acute on the right-most syllable of X in every case but Xόμεθα, where the law of limitation means the accent can&rsquo;t go back as far as X. I could skip accents altogether, but they&rsquo;ll turn out to be very important in the next few posts, so it&rsquo;s actually helpful to include them in this template where they fall on the distinguisher (the part other than the X that varies from cell to cell). And note that if the distinguisher doesn&rsquo;t have an accent in the template, it&rsquo;s because it doesn&rsquo;t have the accent in the full form.</p>
<p>I&rsquo;m going to call the active and middle pattern above <strong>PA-1</strong> and <strong>PM-1</strong> respectively.</p>
<p>We must avoid the temptation to talk of stems at this point. Even though X above does correspond to what&rsquo;s normally thought of as the stem, we will encounter many paradigm templates (including in the next few posts in this series) where that is not the case and it&rsquo;s better to be precise and avoid confusion from the start.</p>
http://jktauber.com/2017/06/25/tour-greek-morphology-part-2/
A Tour of Greek Morphology: Part 2 (2017-06-25T21:50:52Z) by James Tauber
<p>Part two of a tour through Greek inflectional morphology to help get students thinking more systematically about the word forms they see (and maybe teach a bit of general linguistics along the way).</p>
<p>In the <a href="https://jktauber.com/2017/06/23/tour-greek-morphology-part-1/">first part</a> we took an initial look at the present active indicative paradigm for λύω, repeated below for easy reference:</p>
<ul>
<li>λύω</li>
<li>λύεις</li>
<li>λύει</li>
<li>λύομεν</li>
<li>λύετε</li>
<li>λύουσι(ν)</li>
</ul>
<p>There are a number of morphosyntactic properties we could alter to see the effect on the paradigm, but in this post, we&rsquo;ll look at the middle voice:</p>
<ul>
<li>λύομαι</li>
<li>λύῃ or λύει</li>
<li>λύεται</li>
<li>λυόμεθα</li>
<li>λύεσθε</li>
<li>λύονται</li>
</ul>
<p>So again, we&rsquo;re showing, side-by-side, the various number-person forms for λύω, keeping the tense, aspect, voice, and mood constant. In this way we can see, by comparing the paradigms (a paradigm of paradigms!), how the active/middle alternation is realized in Greek (at least for the present indicative of λύω!).</p>
<p>A few things may immediately jump out at you:</p>
<ul>
<li>the forms continue to all start with λυ</li>
<li>the υ is always followed by a vowel (and mostly ε or ο)</li>
<li>the second person singular has two possible forms</li>
<li>three of the forms end in -αι</li>
<li>both the first person forms have a μ and both the third person forms have a τ</li>
<li>the first and second plural both have a θ and there seems to be more of a link between the active and middle forms (<strong>ομε</strong>ν/<strong>ομε</strong>θα, <strong>ε</strong>τ<strong>ε</strong>/<strong>ε</strong>σθ<strong>ε</strong>)</li>
</ul>
<p>We have to be careful not to make too much of some of these yet. Many a bad linguistic analysis has come from noticing patterns in a small number of instances without seeing if the same pattern applies more broadly! We need more data. But these initial observations are at least things to keep in the backs of our minds as we explore more forms. Some of them will prove particularly interesting later on.</p>
<p>For now I just want to explore the two second person singular forms, λύῃ and λύει. You&rsquo;ll notice one of these forms is identical to the third singular active form. Isn&rsquo;t this potentially confusing?</p>
<p>Yes, but there are two things to note here: one, it should generally be clear from the context, regardless of the ending, whether a third person active or second person middle is intended. Ambiguities in morphology like this are far more likely in cases where <em>multiple</em> morphosyntactic properties vary at once (in this case both person AND voice) and where the larger context is likely to make clear which alternative is meant. It&rsquo;s worth also noting, for example, that -ει can also end a dative noun (and in fact does in over 300 cases in the NT).</p>
<p>Two, the -ῃ forms are much more common in the NT than the -ει forms; in fact, there&rsquo;s only one second person -ει form in the SBLGNT text: βούλει, where <em>lexically</em> the word must be middle anyway, so even the context isn&rsquo;t needed to disambiguate.</p>
<p>As to why two forms developed in the first place, we&rsquo;ll have to wait a bit to discuss that.</p>
<h2><a href="http://jktauber.com/2017/06/25/another-european-trip/">Another European Trip</a> (2017-06-25, James Tauber)</h2>
<p>I was here last month but I&rsquo;m back again for a series of conferences and then my graduation.</p>
<p>Last week I attended the inaugural <strong>Language, Data and Knowledge</strong> conference in Galway, Ireland, including the OntoLex Model Workshop which preceded it. The LDK conference was a nice intersection of linguistics and linked data, very much in the spirit of the work described on this website. I got to meet a few people I&rsquo;ve known of for a while as well as some new people I hope to stay in touch with and potentially collaborate with. The conference will be biennial, with the next one in Leipzig. I definitely plan to submit something for that one!</p>
<p>Then I attended <strong>VueConf</strong> in Wrocław, Poland. Vue.js is the JavaScript framework I&rsquo;m using for my online reading environment work and the timing turned out perfectly for me to attend. I gave a lightning talk on the DeepReader project (which I&rsquo;ll also blog about here soon).</p>
<p>I&rsquo;m currently in Leipzig just to visit some people at the Humboldt Chair of Digital Humanities again.</p>
<p>Then I&rsquo;m heading to Cambridge for the <strong>Tyndale House Workshop in Greek Prepositions</strong>. Looking forward to seeing a lot of my friends there and having some good discussions, not just about the topic at hand but more broadly as well.</p>
<p>Then I&rsquo;m heading to Lampeter, Wales for my graduation on July 7th. Three years ago, I decided that it might be useful for me to have a qualification in Classical Greek as well as in linguistics and so I started pursuing a postgraduate diploma at the University of Wales Trinity Saint David. Two days ago, I found out I&rsquo;m being awarded the diploma <em>with Distinction</em> which was my unspoken hope despite occasionally doing poorly at my unseen translations.</p>
<h2><a href="http://jktauber.com/2017/06/23/tour-greek-morphology-part-1/">A Tour of Greek Morphology: Part 1</a> (2017-06-23, James Tauber)</h2>
<p>This is the first post in a (likely long) series exploring the inflectional morphology of Greek. My goal is to work through various aspects of Greek morphology to help students think more systematically about the subject.</p>
<p>I ultimately hope to cover everything that a beginner-intermediate grammar might but in a much more exploratory fashion. I&rsquo;ll occasionally touch on morphological theory but I mostly want to point out phenomena in the language that students have already seen but perhaps have not thought about in any depth.</p>
<p>We&rsquo;ll start with a paradigm familiar to all students of New Testament Greek:</p>
<ul>
<li>λύω</li>
<li>λύεις</li>
<li>λύει</li>
<li>λύομεν</li>
<li>λύετε</li>
<li>λύουσι(ν)</li>
</ul>
<p>At its most basic, a <strong>paradigm</strong> is just a display of related forms next to one another for comparison. The idea is to get a sense of how form and meaning relate by showing contrastive examples.</p>
<p>In most cases, there&rsquo;s something held constant across all the cells. In the list above, all the forms are present active indicative forms of the word λύω. What distinguishes them from the point of view of their <strong>morphosyntactic properties</strong> is the person and number.</p>
<p>Respectively the list above is:</p>
<ul>
<li>the first person singular (present active indicative form of the word λύω)</li>
<li>the second person singular</li>
<li>the third person singular</li>
<li>the first person plural</li>
<li>the second person plural</li>
<li>the third person plural</li>
</ul>
<p>It may not be the case that the <em>forms</em> all have something in common, although in this case you can see they all start with λύ. It may be tempting to make the simple analysis that λύ itself means &ldquo;the present active indicative form of the word λύω&rdquo; and, say, εις means &ldquo;the second person singular&rdquo;. But as we shall see, that&rsquo;s not the most helpful analysis in general.</p>
<p>It&rsquo;s worth thinking about other possibilities we could draw from just this tiny example (even though many theories will be ruled out once we look at other data): perhaps λ indicates indicative; perhaps εις indicates not only second person singular but present active too; perhaps εις is only used if the word starts with a λ.</p>
<p>About all we can say at this stage is the way you discriminate between, say, a second person singular and a third person singular, in the case of the present active indicative of λύω, is the εις vs ει. And that particular example, in the absence of seeing the other cells, may even lead one to conclude you get from the third singular to the second singular by adding a sigma.</p>
<p>The point is there&rsquo;s a LOT we can&rsquo;t tell yet. What we CAN tell, within the set of forms with the properties held constant, is how to discriminate across forms with the morphosyntactic properties that vary. In other words, IF we have a present active indicative of λύω, how do we tell the person and number?</p>
<p>There is one very important property of Greek morphology that we can see just in the paradigm so far: there is no <em>consistent</em> way person is discriminated for a given number, nor number for a given person. In other words, the relationship between the forms λύω and λύομεν seems completely unrelated to that between λύεις and λύετε. And the relationship between λύω and λύεις seems completely unrelated to that between λύομεν and λύετε even though they differ in meaning in only one property. Or put another way, we can&rsquo;t just tell the person OR number, only the person AND number. We will talk more about this in future posts.</p>
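<p>The fused (person-AND-number) nature of these endings can be demonstrated by factoring out the shared material mechanically. Here is a minimal Python sketch, with the paradigm hard-coded from the list above (the property keys are my own labelling):</p>

```python
import os

# The present active indicative paradigm of λύω, keyed by the varying
# morphosyntactic properties (person, number).
PARADIGM = {
    (1, "SG"): "λύω",
    (2, "SG"): "λύεις",
    (3, "SG"): "λύει",
    (1, "PL"): "λύομεν",
    (2, "PL"): "λύετε",
    (3, "PL"): "λύουσι(ν)",
}

# commonprefix works character-wise on any strings, not just paths.
stem = os.path.commonprefix(list(PARADIGM.values()))   # "λύ"

# What remains of each cell once the constant part is stripped.
distinguishers = {cell: form[len(stem):] for cell, form in PARADIGM.items()}

# Note there is no single "plural" piece that can be factored out:
# 1SG/1PL is ω/ομεν while 2SG/2PL is εις/ετε.
print(distinguishers[(1, "SG")], distinguishers[(1, "PL")])
print(distinguishers[(2, "SG")], distinguishers[(2, "PL")])
```

<p>Each distinguisher encodes person and number jointly; no sub-slice of it can be pinned to just one of the two properties, which is exactly the point made above.</p>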
<p>Finally, you may be wondering &ldquo;why is λύω used so often?&rdquo;. There are multiple reasons for this choice. Firstly, as we shall see later, λύω has completely regular stem formation. Secondly, the υ is robust in the face of whatever sounds follow it. Some Classical Greek textbooks will use παύω for the same reasons.</p>
<h2><a href="http://jktauber.com/2017/05/31/modelling-derivational-morphology/">Modelling Derivational Morphology</a> (2017-05-31, James Tauber)</h2>
<p>While most of my focus has been on inflectional morphology, I&rsquo;ve done a little bit of work on modelling derivational morphology and it&rsquo;s been a desideratum for my reader and learning algorithm work dating back to at least the original 2008 &ldquo;New Kind of Graded Reader&rdquo; presentations.</p>
<p>In the 90s I was even in conversation with Harold Greenlee about putting his work online. There are numerous problems with this kind of work, though. The first is just mistakes and dubious connections. John Lee&rsquo;s 2013 paper <em>Etymological Follies: Three Recent Lexicons of the New Testament</em> gives numerous examples. Lee is always worth listening to when it comes to lexicons!</p>
<p>There&rsquo;s another major issue which is that expressing etymology (or even just cognate groupings) doesn&rsquo;t really tell you what I actually care about which is how easy is the meaning of a lexical item to learn based on other cognate lexical items you&rsquo;ve learned. I&rsquo;ve previously talked about <a href="https://jktauber.com/2015/11/13/initial-thoughts-cost-learning-form/">modelling the cost of learning a new form</a> in the context of inflectional morphology but I&rsquo;m also interested (as mentioned in various &ldquo;New Kind of Graded Reader&rdquo; presentations) in the derivational equivalent between lexemes. There&rsquo;s some interesting theoretical work in this area going back to at least Jackendoff&rsquo;s 1975 paper <em>Morphological and Semantic Regularities in the Lexicon</em>. This was picked up in Bochner&rsquo;s 1993 book <em>Simplicity in Generative Morphology</em> which was a huge influence on me in thinking about morphology as paradigmatic relationships <em>between</em> words rather than morpheme-based approaches. </p>
<p>So for my purposes, at least, I want to model how easy it is to work out the meaning of a word from known cognates potentially given similar analogical pairs of cognates. What I&rsquo;d ultimately like to develop is some sort of weighting between pairs that represents how transparent the connection in meaning is from their cognate forms.</p>
<p>Take for example the pair</p>
<blockquote>
<p>Ἰταλία:Ἰταλικός</p>
</blockquote>
<p>If that pair is known, then something like</p>
<blockquote>
<p>Γαλατία:Γαλατικός</p>
</blockquote>
<p>is much easier to understand. So if you understand Ἰταλία, Ἰταλικός, and Γαλατία, you can almost certainly take a stab at guessing the meaning of Γαλατικός. I care about that because a big part of my research is modelling how &ldquo;easy&rdquo; a passage might be for a student to read.</p>
<p>The analogy might be abstracted as</p>
<blockquote>
<p>-ια:-ικος::place:person-from-that-place</p>
</blockquote>
<p>but it also applies to things like</p>
<blockquote>
<p>Πόντος:Ποντικός</p>
</blockquote>
<p>which is -ος:-ικος so first/second declension doesn&rsquo;t matter. </p>
<p>Given a new place, you could probably easily construct a plausible denominal adjective for someone from that place with -ικος. A Greek speaker unfamiliar with the philosophical school would still immediately recognize Στοϊκός as suggesting &ldquo;someone from the στοά&rdquo;, although we might want to score the transparency of that lower than those based on geographical proper nouns.</p>
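<p>One way to make the analogy operational is as a simple suffix-rewrite sketch. This is a toy, not a real morphological analyser — in particular it ignores accent shift, which is why it happens to produce the right accent for Γαλατικός but would not for, say, Ποντικός:</p>

```python
# Known analogical patterns, as (noun suffix, adjective suffix) pairs.
# Illustrative only; a real system would induce these from attested
# cognate pairs like Ἰταλία:Ἰταλικός.
SUFFIX_ANALOGIES = [
    ("ία", "ικός"),   # Ἰταλία → Ἰταλικός
    ("ος", "ικός"),   # Πόντος → Ποντικός (accent shift not handled!)
]

def denominal_adjectives(noun):
    """Propose -ικος denominal adjectives for a noun by suffix analogy."""
    return [noun[: -len(old)] + new
            for old, new in SUFFIX_ANALOGIES
            if noun.endswith(old)]

print(denominal_adjectives("Γαλατία"))   # ['Γαλατικός']
```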
<p>But now consider</p>
<blockquote>
<p>κοινωνία:κοινωνικός</p>
</blockquote>
<p>or</p>
<blockquote>
<p>εἰρήνη:εἰρηνικός</p>
</blockquote>
<p>The meaning of the <em>root</em> clearly transfers to the lexical items in each pair but the relationship between the items in each pair is a little less transparent. It&rsquo;s still there if you think about it but it almost certainly needs to be weighted less. κοινωνία and εἰρήνη are not physical places. The -ικος derivative is still in some sense about something coming from somewhere but rather than a person from a place, it seems to be a state coming from another state (metaphorical place). </p>
<p>Then you get something like</p>
<blockquote>
<p>ὄνος:ὀνικός</p>
</blockquote>
<p>If you think really really hard about it you can see how ὀνικός (in the sense of millstone) might have come from ὄνος (donkey). But this is at best a potentially useful mnemonic for learners rather than a productive derivation. It should be weighted even lower (no pun intended). And then where might</p>
<blockquote>
<p>κέραμος:κεραμικός</p>
</blockquote>
<p>fit in this weighting? (and to what extent do English cognates help too in cases such as this?)</p>
<p>I&rsquo;m not yet sure how best to produce weightings for this kind of lexical relatedness. My guess is a first pass could be achieved by crowdsourcing on <a href="http://oxlos.org">oxlos</a>. Ultimately, some of the weighting could be calculated via regression based on vocabulary quizzes (although I worry about confounding factors unless the students are beginners). Even just doing the crowdsourcing would be interesting to see how much agreement there was in the &ldquo;obvious relatedness&rdquo; ordering of pairs like Πόντος:Ποντικός &gt; στοά:Στοϊκός &gt; κοινωνία:κοινωνικός &gt; ὄνος:ὀνικός.</p>
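<p>As a sketch of what that crowdsourced first pass might look like: each annotator rates the transparency of a pair, the mean gives a provisional weight, and sorting recovers an ordering to compare with the intuited one. All the numbers below are invented for illustration.</p>

```python
from statistics import mean

# Hypothetical annotator ratings of semantic transparency
# (0 = opaque, 1 = fully transparent). Invented numbers.
ratings = {
    "Πόντος:Ποντικός":     [0.9, 1.0, 0.8],
    "στοά:Στοϊκός":        [0.7, 0.6, 0.8],
    "κοινωνία:κοινωνικός": [0.5, 0.4, 0.6],
    "ὄνος:ὀνικός":         [0.1, 0.2, 0.1],
}

# Mean rating as a first-pass transparency weight per pair.
weights = {pair: mean(rs) for pair, rs in ratings.items()}
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)
```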
<p>Finally, it occurs to me this gives a potential measure of &ldquo;false friendship&rdquo; amongst cognates as a mismatch between the obviousness of relatedness in form vs in meaning.</p>
<p>I have some old work at <a href="https://github.com/morphgnt/morphological-lexicon/tree/master/projects/derivational_morphology">https://github.com/morphgnt/morphological-lexicon/tree/master/projects/derivational_morphology</a> which I probably need to clean up at some point for all this.</p>
<p>As is often the case, this blog post was triggered by Jonathan Robie asking me something and me realising I&rsquo;d never written up my thoughts on the topic despite having thought about it on and off for a decade :-)</p>
<h2><a href="http://jktauber.com/2017/05/24/comparing-analyses-herodotus/">Comparing Analyses from Herodotus</a> (2017-05-24, James Tauber)</h2>
<p>An analysis I did of a couple of chapters of Herodotus looks like it might be an interesting example to use for various treebanking approaches—both in terms of how things are structured as well as how they are visualised.</p>
<p>As the last assignment for my Postgraduate Diploma in Ancient Greek, I had to write a brief commentary on Herodotus 2.35–36, which catalogs (with hasty generalisations galore) differences between Egypt and the rest of the world. The catalog consists of a series of statements of the form “Egyptians do THIS whereas everyone else does THAT” or “[In Egypt] the men do THIS and the women do THAT [as opposed to the other way around like everywhere else]”.</p>
<p>In his commentary, Lloyd notes that this sort of catalog could be quite monotonous but that Herodotus avoids this through “skilful stylistic variation”. My commentary spent a decent proportion of its short word count digging deeper into this variation.</p>
<p>Quite coincidentally, Greg Crane sent me some examples of student treebanking recently in the context of how to compare analyses and they happened to be of Herodotus 2.35. They differ from each other and from my own way of thinking about the sentences. Note that these aren’t difficult or ambiguous sentences, though! The syntax is easy, I just don’t think most analysis conventions and visualisation tools do a great job of capturing what’s going on.</p>
<p>In my assignment, I started off presenting a canonical example of the construction and it’s that example that I want to show here. The original sentence is</p>
<blockquote>
<p>τὰ ἄχθεα οἱ μὲν ἄνδρες ἐπὶ τῶν κεφαλέων φορέουσι, αἱ δὲ γυναῖκες ἐπὶ τῶν ὤμων.</p>
</blockquote>
<p>But I started off considering these sentences:</p>
<blockquote>
<p><strong>οἱ ἄνδρες</strong> τὰ ἄχθεα ἐπὶ τῶν <strong>κεφαλέων</strong> φορέουσι</p>
<p><strong>αἱ γυναῖκες</strong> τὰ ἄχθεα ἐπὶ τῶν <strong>ὤμων</strong> φορέουσι</p>
</blockquote>
<p>The verb (in the present, as always in these comparisons), direct object, and prepositional phrase construction are identical. What is being contrasted (shown in bold) is how the particular location (the complement in the prepositional phrase) varies with the subject. </p>
<p>Herodotus sets up the contrast with μέν and δέ postpositives.</p>
<blockquote>
<p>οἱ <strong>μὲν</strong> ἄνδρες τὰ ἄχθεα ἐπὶ τῶν κεφαλέων φορέουσι</p>
<p>αἱ <strong>δὲ</strong> γυναῖκες τὰ ἄχθεα ἐπὶ τῶν ὤμων φορέουσι</p>
</blockquote>
<p>He then alters the “constants” in the comparison, topicalising the direct object and eliding repetition of the verb. This results in:</p>
<blockquote>
<p>τὰ ἄχθεα<br>
&nbsp;&nbsp;&nbsp;&nbsp;μὲν<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>οἱ … ἄνδρες</b><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ἐπὶ τῶν <b>κεφαλέων</b> φορέουσι<br>
&nbsp;&nbsp;&nbsp;&nbsp;δὲ<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>αἱ … γυναῖκες</b><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ἐπὶ τῶν <b>ὤμων</b> [φορέουσι]</p>
</blockquote>
<p>The above was an indented structure I manually constructed for my commentary. It’s not machine actionable and is missing a lot but I think it does a decent job of capturing some of what&rsquo;s going on. It makes clear:</p>
<ul>
<li>the topicalisation of τὰ ἄχθεα</li>
<li>the μέν and δέ construction as a whole</li>
<li>the elision of the verb</li>
</ul>
<p>It is these three properties that I think make this a particularly interesting example.</p>
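<p>To make something like that indented display machine-actionable, one option is a nested structure in which the topic is factored out, μέν and δέ tag their conjuncts symmetrically, and the elided verb carries an explicit co-reference to its antecedent. A sketch (the field names are my own invention, not an existing treebank schema):</p>

```python
# Field names here are invented for illustration, not an existing schema.
sentence = {
    "topic": "τὰ ἄχθεα",               # topicalised direct object
    "coordination": [
        {
            "particle": "μέν",
            "subject": "οἱ … ἄνδρες",   # discontinuous: μέν intervenes
            "pp": "ἐπὶ τῶν κεφαλέων",
            "verb": {"id": "v1", "form": "φορέουσι"},
        },
        {
            "particle": "δέ",
            "subject": "αἱ … γυναῖκες",
            "pp": "ἐπὶ τῶν ὤμων",
            "verb": {"ellipsis_of": "v1"},  # elided, co-referenced
        },
    ],
}

# Because the two conjuncts are structurally parallel, queries such as
# "find the contrasted prepositional phrases" fall out directly:
contrast = [c["pp"] for c in sentence["coordination"]]
print(contrast)
```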
<p>Here’s the first student treebank analysis:</p>
<p><img src="https://jktauber.com/site_media/static/herodotus1.png" width="100%"></p>
<p>The student supplies the elided verb (although it’s not co-referenced in any way) but not the elided direct object. There’s no indication of the topicalisation.</p>
<p>It doesn’t quite seem right to me to say the two clauses are conjoined by δέ with the μέν hanging off the verb. I think of the μέν and δέ as equal partners in this construction and as tagging the two things being compared.</p>
<p>Here’s the second student treebank analysis:</p>
<p><img src="https://jktauber.com/site_media/static/herodotus2.png" width="100%"></p>
<p>This analysis seems a lot more confused. The coordination is shown as being done with the μέν this time, with the δέ dangling. The prepositional phrases are shown as governed by the subjects rather than the verb.</p>
<p>To be clear, I’m not trying to critique the students so much as raise questions for analysis conventions and visualisation, especially for reading environments and querying.</p>
<p>Again, this (and the other sentences in Herodotus 2.35–36) aren’t difficult. I doubt either student had any trouble understanding the sentence. I just think it wasn’t clear how to adequately model their understanding of the structure. </p>
<p>I think elision and conjunction are the biggest issues in most analyses like this and good structures and visualisation for handling those will go a long way to making treebanks more consistent and more useful.</p>
<p>Using this sentence from Herodotus as an example, what are better ways of making sure analyses both enable useful queries and can be visualised in more perspicuous ways?</p>
<p><strong>UPDATE</strong>: perhaps &ldquo;coordination&rdquo; would be better than conjunction as one of the &ldquo;biggest issues&rdquo; and I think &ldquo;theticals&rdquo; (HT: Jonathan Robie) could be added to that list to make the triad: elision, coordination, and theticals.</p>
<p><strong>UPDATE 2</strong>: I also need to stop saying elision when I mean ellipsis! I&rsquo;m spending too much time with morphophonology and not enough time with syntax :-)</p>
<h2><a href="http://jktauber.com/2017/05/03/headed-germany-next-week/">Headed to Germany Next Week</a> (2017-05-03, James Tauber)</h2>
<p>Next week I&rsquo;m headed to Germany for a whirlwind trip to Göttingen, Heidelberg, and Leipzig to share and discuss ideas with other scholars.</p>
<p>I&rsquo;ll be speaking at a Global Philology workshop in Göttingen, attending a Digital Classics conference in Heidelberg (where I&rsquo;ll also have to sit the final exam for my Postgraduate Diploma in Greek if I can find someone to invigilate), and then spending a few days in Leipzig meeting with the team at the Humboldt Chair of Digital Humanities at Universität Leipzig.</p>
<p>I&rsquo;m very excited to now be working more closely with the digital classics community and meeting many people whose names I&rsquo;ve known for a while.</p>
<p>I&rsquo;m also thrilled to visit Leipzig again after more than ten years and get my fill of musical history there. I&rsquo;m also hoping for a bit of a physics history fill too given the importance of both Göttingen and Leipzig in the history of quantum mechanics.</p>
<h2><a href="http://jktauber.com/2017/04/21/handling-morphological-ambiguity/">Handling Morphological Ambiguity</a> (2017-04-21, James Tauber)</h2>
<p>On my <a href="https://jktauber.com/now/">now</a> page, I currently list &ldquo;finalising an improved set of morphology tags to use&rdquo; under Medium Term. As I find myself sometimes having to clarify the motivation for and state of this, I thought I&rsquo;d share what I just wrote in the <a href="http://biblicalhumanities.org">Biblical Humanities</a> Slack.</p>
<p>Firstly, some background on previous notes&hellip;</p>
<p>Back in 2014, I wrote down some notes <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-a-New-Tagging-Scheme">Proposal for a New Tagging Scheme</a> after discussions with Mike Aubrey. In 2015, after some discussions with Emma Ehrhardt, wrote down <a href="https://github.com/morphgnt/sblgnt/wiki/Handling-Ambiguity">Handling Ambiguity</a>. Then in February 2017, after discussion on the Biblical Humanities Slack, I put forward a concrete <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-Gender-Tagging">Proposal for Gender Tagging</a>.</p>
<p>Here&rsquo;s a slightly cleaned up version of what I wrote in Slack&hellip;</p>
<p>All I&rsquo;ve done is propose a way of representing certain single-feature ambiguities (especially gender but also nom/acc in neuter). I have not proposed anything for multi-feature ambiguities nor have I actually DONE any work that uses these proposals.</p>
<p>Multi-feature ambiguities at the morphology level (1S vs 3P, GS vs AP, etc) are rarely ambiguous at the syntactic or semantic level for very good reason: the syntactic/semantic-level disambiguation is what allows one to tolerate the ambiguity at the morphology level (one reason that, as a cognitive scientist, I quite like discriminative models of morphology).</p>
<p>But if I continue with my goal to produce a purely morphology analysis, without &ldquo;downward&rdquo; disambiguation, then I want to be able to provide a way of representing form over function AND representing ambiguity.</p>
<p>I want to stress again that I think nom vs acc in neuter, or gender in genitive plurals, is a DIFFERENT kind of ambiguity than 1S vs 3P or GS vs AP. For these multi-feature ambiguities (or what my wiki page calls extended syncretism, although I&rsquo;m not sure I really like that term) it may come down to just providing a disjunction of codes, e.g. GSF∨APF.</p>
<p>Also just in terms of motivation: clearly a morphological analysis that ignores downward disambiguation from syntax or semantics is unhelpful (and potentially even misleading) for exegesis and so a lot of use cases wouldn’t want to do it. HOWEVER, my goal is three fold:</p>
<p>(1) I want to have a way to model the output of automated morphological analysis systems prior to either automated or human downward disambiguation;<br />
(2) as someone studying how morphology works from a cognitive point of view, I care about modelling how ambiguity is resolved at different levels and so want a model that can handle that;<br />
(3) because a student is quite likely to be confronted with this ambiguity, it needs to be in my learning models. I want to be able to search for cases where 1S vs 3P ambiguity or GSF vs APF ambiguity or NSN vs ASN ambiguity is resolved by syntax or semantics so they can be illustrated to the student. I want to know, for a given passage, whether such ambiguity exists so learning can be appropriately scaffolded. And note that, for me, this extends to ambiguity resolved by just accentuation as well (which is another potentially useful thing to model for various applications).</p>
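<p>The distinction between the two kinds of ambiguity might be modelled by treating a purely morphological tag as a <em>set</em> of full parse codes: a single-feature ambiguity is generated by underspecifying one slot, while an extended syncretism like GSF∨APF is stated directly as a disjunction. A sketch (the representation and function names are my own illustration, not an existing tagging scheme):</p>

```python
from itertools import product

def expand(case, number, gender):
    """Expand a possibly-underspecified (case, number, gender) bundle —
    each slot a string or a set of strings — into full parse codes."""
    slots = [s if isinstance(s, set) else {s} for s in (case, number, gender)]
    return {c + n + g for c, n, g in product(*slots)}

# Single-feature ambiguity: neuter nominative/accusative, one slot open.
neuter_na = expand({"N", "A"}, "S", "N")

# Multi-feature ambiguity ("extended syncretism"): stated directly as a
# disjunction of otherwise unrelated full codes, e.g. GSF∨APF.
gsf_or_apf = {"GSF", "APF"}

def is_ambiguous(tag):
    """A tag with more than one surviving code is still ambiguous."""
    return len(tag) > 1

print(sorted(neuter_na), is_ambiguous(gsf_or_apf))
```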
<p>In conclusion, I want to again state I&rsquo;m not at all against a functional, full-disambiguated parse code existing. I have NEVER proposed REPLACING the existing tagging schemes. I just want to add a new column useful for the reasons I&rsquo;ve listed above in (1) – (3) and produce new resources that perhaps ONLY use that purely morphological parse code.</p>
<p>Finally I want to note there&rsquo;s an important difference between what we put in our data and how we present it to users. People should not assume that when I&rsquo;m describing codes to use in data that I&rsquo;m suggesting that&rsquo;s what end-users should see.</p>
<p><strong>UPDATE</strong>: one topic I didn&rsquo;t discuss here is ambiguity in endings that is resolved by knowledge of the stems or principal parts. For example, without a lexicon, there are ambiguities between imperfect and aorist that are easily resolved with additional lexical-level information.</p>
<h2><a href="http://jktauber.com/2017/04/18/initial-reboot-oxlos/">An Initial Reboot of Oxlos</a> (2017-04-18, James Tauber)</h2>
<p>In a recent post, <a href="https://jktauber.com/2017/04/10/update-lxx-progress/">Update on LXX Progress</a>, I talked about the possibility of putting together a crowd-sourcing tool to help share the load of clarifying some parse code errors in the CATSS LXX morphological analysis. Last Friday, Patrick Altman and I spent an evening of hacking and built the tool.</p>
<p>Back at BibleTech 2010, I gave a talk about Django, Pinax, and some early ideas for a platform built on them to do collaborative corpus linguistics. Patrick Altman was my main co-developer on some early prototypes and I ended up hiring him to work with me at Eldarion.</p>
<p>The original project was called <strong>oxlos</strong> after the betacode transcription of the Greek word for &ldquo;crowd&rdquo;, a nod to &ldquo;crowd-sourcing&rdquo;. Work didn&rsquo;t continue much past those original prototypes in 2010, and Pinax has come a long way since, so when we decided to work on oxlos again, it made sense to start from scratch. From the initial commit to launching the site took about six hours.</p>
<p>At the moment there is one collective task available—clarifying which of a set of parse codes is valid for a given verb form in the LXX—but as the need for others arises, it will be straightforward to add them (and please contact me if you have similar tasks you&rsquo;d like added to the site).</p>
<p>If you&rsquo;re a Django developer, you&rsquo;re welcome to contribute. The code is open source under an MIT license and available at <a href="https://github.com/jtauber/oxlos2">https://github.com/jtauber/oxlos2</a>. We have lots we can potentially add beyond merely different kinds of tasks.</p>
<p>If your Greek morphology is reasonably strong, I invite you to sign up at</p>
<blockquote>
<p><a href="http://oxlos.org/">http://oxlos.org/</a></p>
</blockquote>
<p>and help out with the LXX verb parsing task.</p>
<p>It&rsquo;s probably not that relevant anymore, but you can watch the original 2010 talk below. I&rsquo;d skip past the Django / Pinax intro and go straight to about 37:00 where I start to discuss the collective intelligence platform.</p>
<iframe src="https://player.vimeo.com/video/10515200" width="640" height="363" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<h2><a href="http://jktauber.com/2017/04/17/analysing-verbs-nestle-1904/">Analysing the Verbs in Nestle 1904</a> (2017-04-17, James Tauber)</h2>
<p>The last couple of weeks, I&rsquo;ve been working on getting my <code>greek-inflexion</code> code working on Ulrik Sandborg-Petersen&rsquo;s analysis of the Nestle 1904. The first pass of this is now done.</p>
<p>The motivation for doing this work was (a) to expand the verb stem database and stemming rules; (b) to be able to annotate the Nestle 1904 with additional morphological information for my adaptive reader and some similar work Jonathan Robie is doing.</p>
<p>My usual first step when dealing with a new text is to automatically generate as many new entries in the lexicon / stem-database as I can (see the first step in <a href="https://jktauber.com/2017/04/10/update-lxx-progress/">Update on LXX Progress</a>).</p>
<p>In some cases, this is just a new stem for an existing verb because of a new form of an already known verb. But sometimes it&rsquo;s an entirely new verb.</p>
<p>I thought the Nestle 1904 would be considerably easier than the LXX because the text is so similar but there were numerous challenges that arose.</p>
<p>It became clear very quickly that there were considerable differences in lemma choice between the Nestle 1904 and the MorphGNT SBLGNT. This didn&rsquo;t completely surprise me: I&rsquo;ve spent quite a bit of time cataloging lemma choice differences between lexical resources and there are considerable differences even between BDAG and Danker&rsquo;s Concise Lexicon.</p>
<p>But even these aside, there were 7,743 out of 28,352 verbs mismatching after my code had already done its best to automatically fill in missing lexical entries and stems.</p>
<p>A. The normalisation column in Nestle 1904 doesn&rsquo;t normalise capitalisation, clitic accentuation, or moveable nu, all of which <code>greek-inflexion</code> assumes have been done.</p>
<p>Capitalisation alone accounted for 1042 mismatches. Clitic accentuation alone accounted for 1008 mismatches. Moveable nu alone accounted for 4153 mismatches.</p>
<p>B. Nestle 1904 systematically avoids assimilation of συν and ἐν preverbs.</p>
<p>Taken alone, these accounted for 91 mismatches. Mapping prior to analysis by <code>greek-inflexion</code> is somewhat of a hack that I&rsquo;ll address in later passes.</p>
<p>C. There were 8 spelling differences in the endings which required an update to stemming.yaml:</p>
<ul>
<li>κατασκηνοῖν (PAN) in Matt 13:32</li>
<li>κατασκηνοῖν (PAN) in Mark 4:32</li>
<li>ἀποδεκατοῖν (PAN) in Heb 7:5</li>
<li>φυσιοῦσθε (PMS-2P) in 1Cor 4:6</li>
<li>εἴχαμεν (IAI.1P) in 2John 1:5</li>
<li>εἶχαν (IAI.3P) in Mark 8:7</li>
<li>εἶχαν (IAI.3P) in Rev 9:8</li>
<li>παρεῖχαν (IAI.3P) in Acts 28:2</li>
</ul>
<p>D. The different parse code scheme (Robinson&rsquo;s vs CCAT) had to be mapped over.</p>
<p>This should have been straightforward but voice in the formal morphology field sometimes seemed to be messed up (which I corrected as part of G. below).</p>
<p>E. There were 182 differences (type not token) in lemma choice, mostly active vs middle forms.</p>
<p>See <a href="https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-lemma-differences-txt">https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-lemma-differences-txt</a> for the full list.</p>
<p>F. There were a small handful of per-form lemma corrections I made:</p>
<ul>
<li>ἐπεστείλαμεν (AAI.1P): ἀποστέλλω → ἐπιστέλλω</li>
<li>ἀγαθουργῶν (PAP.NSM): ἀγαθοεργέω → ἀγαθουργέω</li>
<li>συνειδυίης (XAP.GSF): συνοράω → σύνοιδα</li>
<li>γαμίσκονται (PMI.3P): γαμίζω → γαμίσκω</li>
</ul>
<p>G. Finally, I made 69 (type not token) parse code changes.</p>
<p>See <a href="https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-parse-txt">https://gist.github.com/jtauber/28ddfeee3175903026dade4ab965ac6c#file-parse-txt</a> for the list.</p>
<p>With all this, the <code>greek-inflexion</code> code (on a branch not yet pushed at the time of writing) can correctly generate all the verbs in the Nestle 1904 morphology.</p>
<p>There are definitely improvements I need to make in a second pass and at least a small number of corrections that I think need to be made to the Nestle 1904 analysis.</p>
<p>But it&rsquo;s now possible for me to produce an initial verb stem annotation for the Nestle 1904 and I&rsquo;m a step closer to a morphological lexicon with broader coverage.</p>
<p><strong>UPDATE</strong>: I&rsquo;ve added some more parse corrections but not yet updated the gist.</p>
<h2><a href="http://jktauber.com/2017/04/10/update-lxx-progress/">Update on LXX Progress</a> (2017-04-10, James Tauber)</h2>
<p>As mentioned in previous posts, I&rsquo;ve been working through the LXX, initially making sure my <code>greek-inflexion</code> library can generate the same analysis of verbs as the CATSS LXX Morphology and adding to the verb stem database accordingly. This is a preliminary to being able to run the code on alternative LXX editions such as Swete and provide a freely available morphologically-tagged LXX.</p>
<p>The general process has been, one book at a time:</p>
<ul>
<li>programmatically expand the stem database with missing stems where the analysis given by CATSS fits what <code>greek-inflexion</code> stemming rules expect</li>
<li>where the analysis from CATSS doesn&rsquo;t fit what <code>greek-inflexion</code> expects, evaluate if it&rsquo;s<ul>
<li>a parse error in the CATSS (at this stage by far the most common problem, but also the most time consuming to identify and fix)</li>
<li>a missing stemming rule (very rare at this stage)</li>
<li>some temporary limitation of <code>greek-inflexion</code> (it could be smarter about some accentuation, for example)</li>
</ul>
</li>
</ul>
<p>Working a few hours a week, it took about a month to do 1 Kings (i.e. 1 Samuel), in part because it had close to 100 parsing errors in the CATSS, many of them quite inexplicable (like getting the voice wrong when the ending should make that very easy to determine).</p>
<p>The work up until this point covers about 35% of the LXX, but I decided for the rest to go broad rather than book-by-book.</p>
<p>In other words, I&rsquo;ve expanded the stem database (per step one above) for the entire LXX in one go and will now work through the problem cases.</p>
<p>What is very encouraging is that expanding the verbs attempted from 35% to 100% only led to 731 analysis mismatches in 1,875 locations. Given the LXX has just over 100,000 verbs, that&rsquo;s less than a 2% error rate.</p>
<p>Let me be clear, however, what I&rsquo;m claiming. I&rsquo;m NOT saying I can morphologically tag verbs with 98% accuracy. I&rsquo;m merely saying that 98% of the CATSS LXX morphological analysis can be explained by the rules and data in <code>greek-inflexion</code>. The other 2% is likely to MOSTLY be errors in the CATSS analysis with a few errors in my stem database, stemming rules, or accentuation rules.</p>
<p>At the rate I worked through 1 Kings, going through the rest of the mismatches might take the rest of the year, but I think I can speed things up by batching similar kinds of mismatches together. For example, there are 586 forms where <code>greek-inflexion</code> didn&rsquo;t generate the form in the CATSS analysis with the morphosyntactic properties given but was able to generate the form with different morphosyntactic properties. In almost all cases that corresponds to a mistake in the CATSS analysis. It&rsquo;s the most time consuming part to deal with but batching them up together (especially dealing with the same mismatch across all remaining books at once) should speed things up.</p>
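<p>The batching idea can be sketched by keying each mismatch on the pair of disagreeing analyses, so one judgement covers every location with the same disagreement (the record layout here is invented for illustration):</p>

```python
from collections import defaultdict

# Sketch of the batching idea: group mismatches by the kind of
# disagreement. The record layout (book, verse, form, CATSS parse,
# generated parse) is invented for illustration.
mismatches = [
    ("Gen", "1:1", "ἐποίησεν", "AAI.3S", "AMI.3S"),
    ("Exod", "2:3", "ἐποίησεν", "AAI.3S", "AMI.3S"),
    ("Lev", "4:2", "ποιήσῃ", "AAS.3S", "AAI.3S"),
]

batches = defaultdict(list)
for book, verse, form, catss, generated in mismatches:
    batches[(catss, generated)].append((book, verse, form))

for (catss, generated), locations in sorted(batches.items()):
    print(f"{catss} vs {generated}: {len(locations)} location(s)")
```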
<p>It may also lend itself to crowd-sourcing. I could probably pretty easily whip up a little website that shows people the form and asks them to choose between the CATSS analysis and the <code>greek-inflexion</code> analysis (not telling them which is which).</p>
<p>It may be worth me spending a few hours setting that up!</p>
<h2><a href="http://jktauber.com/2017/02/15/new-morphgnt-releases-and-accentuation-analysis/">New MorphGNT Releases and Accentuation Analysis</a> (2017-02-15, James Tauber)</h2>
<p>Over the last few weeks, I&rsquo;ve made a number of new releases of the MorphGNT SBLGNT analysis fixing some accentuation issues mostly in the normalization column. This came out of ongoing work on modelling accentuation (and, in particular, rules around clitics).</p>
<p>Back in 2015, I talked about <a href="http://jktauber.com/2015/11/27/annotating-normalization-column-morphgnt-part-1/">Annotating the Normalization Column in MorphGNT</a>. This post could almost be considered Part 2.</p>
<p>I recently went back to that work and made a fresh start on a new repo, <a href="https://github.com/jtauber/gnt-accentuation">gnt-accentuation</a>, intended to explain the accentuation of each word in the GNT (and eventually other Greek texts). There are two parts to that: explaining why the normalized form is accented the way it is, and then explaining why the word-in-context might be accented differently (clitics, etc). The repo is eventually going to do both but I started with the latter.</p>
<p>My goal with that repo is to be part of the larger vision of an &ldquo;executable grammar&rdquo; I&rsquo;ve talked about for years where rules about, say, enclitics, are formally written up in a way that can be tested against the data. This means:</p>
<ul>
<li>students reading a rule can immediately jump to real examples (or exceptions)</li>
<li>students confused by something in a text can immediately jump to rules explaining it</li>
<li>the correctness of the rules can be tested</li>
<li>errors in the text can be found</li>
</ul>
<p>It is the fourth point that meant my recent work uncovered some accentuation issues in the SBLGNT text, normalization, and lemmatization. Some of these have been corrected in a series of new releases of the MorphGNT: 6.08, 6.09, and 6.10. See <a href="https://github.com/morphgnt/sblgnt/releases">https://github.com/morphgnt/sblgnt/releases</a> for the specifics. The reason for so many releases was that I wanted to get corrections out as soon as I made them, but then I kept finding more issues!</p>
<p>There are some issues in the text itself which need to be resolved. See the GitHub issue <a href="https://github.com/morphgnt/sblgnt/issues/52">https://github.com/morphgnt/sblgnt/issues/52</a> for details. I&rsquo;d very much appreciate people&rsquo;s input.</p>
<p>In the meantime, stay tuned for more progress on <code>gnt-accentuation</code>.</p>
<h2><a href="http://jktauber.com/2016/12/04/diacritic-stacking-skolar-pe-fixed/">Diacritic Stacking in Skolar PE Fixed</a> (2016-12-04, James Tauber)</h2>
<p>Back in <a href="http://jktauber.com/2016/01/28/polytonic-greek-unicode-is-still-not-perfect/">Polytonic Greek Unicode Still Isn’t Perfect</a> and <a href="http://jktauber.com/2016/02/09/updated-solution-polytonic-greek-unicodes-problems/">An Updated Solution to Polytonic Greek Unicode’s Problems</a> I talked about problems with stacking vowel length and other diacritics. At least in terms of the font used on this site, the problems are now solved.</p>
<p>After discussions on the Unicode mailing list, it was clear that the solution to better handling of complex diacritic stacking in polytonic Greek was NOT more precomposed forms but better support in fonts, etc. So I reached out to David Březina, the creator of the Skolar typeface, used on this site, to see if the issues could be addressed.</p>
<p>I&rsquo;m delighted to say that Březina&rsquo;s foundry <a href="https://www.rosettatype.com">Rosetta Type</a> has released new versions of Skolar PE that address all the issues I had.</p>
<p>I&rsquo;ve now switched over this site to use the new version, which does mean those old posts complaining about the issues will read a little funny as they won&rsquo;t actually show examples of the problems they purport to.</p>
<p>Thank you, David, for listening to my input and making my favourite Greek typeface even better!</p>
<p><strong>UPDATE (2017-01-06)</strong>: turns out I also needed to add <code>font-feature-settings: "ccmp";</code> for it to work on Safari.</p>
<h2><a href="http://jktauber.com/2017/01/02/first-pass-morphgnt-verb-coverage-and-lxx-beginnin/">First Pass of MorphGNT Verb Coverage and LXX Beginnings</a> (2017-01-02, James Tauber)</h2>
<p>In <a href="http://jktauber.com/2016/12/02/greek-inflexion-and-update-morphological-lexicon/">greek-inflexion and an Update on the Morphological Lexicon</a> I said that all the verbs in the MorphGNT SBLGNT analysis should be done by the end of the year. I hit that goal and made a decent start on the Septuagint.</p>
<p>As mentioned in that previous post, by May 2016 I could generate every single verb form in:</p>
<ul>
<li>Louise Pratt’s intermediate grammar</li>
<li>Helma Dik’s Greek verb handouts </li>
<li>Andrew Keller &amp; Stephanie Russell’s beginner-intermediate text book</li>
</ul>
<p>On December 8th, I&rsquo;d actually finished coverage of <strong>all the verbs in the MorphGNT SBLGNT</strong> (with a little bit of help from Nathan Smith).</p>
<p>The stem database is available at <a href="https://github.com/jtauber/greek-inflexion/blob/morphgnt/morphgnt_lexicon.yaml">https://github.com/jtauber/greek-inflexion/blob/morphgnt/morphgnt_lexicon.yaml</a>. I should emphasize, though, this is just a first pass and there&rsquo;s more work to do but the coverage is now there.</p>
<p>I immediately started work on applying the <code>greek-inflexion</code> code and stemming rules to the CATSS analysis of the LXX. By the end of 2016, I&rsquo;d built a stem database and updated the stemming rules to cover the Pentateuch, 1 Maccabees, Jonah, Nahum, and Ezra-Nehemiah. Work on the rest of the CATSS analysis will continue over the next few months.</p>
<p>I decided to start a new stem database from scratch for the LXX (although I recently wrote a script to compare stem databases for inconsistencies). My primary reason for this was to see if I ended up with the same analysis for a verb stem as a way of catching potential errors in my original MorphGNT analysis. The classical Greek exemplars listed above, the MorphGNT SBLGNT and the LXX analysis all share the same stemming rules, though. </p>
<p>My reasons for doing the stem analysis on the CATSS morphological analysis were threefold:</p>
<ul>
<li>expand coverage of the stem database to more parts for existing verbs as well as new verbs</li>
<li>provide broader tests for the stemming rules</li>
<li>prepare for a morphological analysis of the Swete text of the LXX/OG.</li>
</ul>
<p>A fourth benefit quickly emerged, though: I found errors in the CATSS analysis.</p>
<p>I&rsquo;ve been maintaining patch files which, after a review pass, I&rsquo;ll contribute back to CCAT (if they are interested). Fun fact: it was contributing corrections back to the CCAT&rsquo;s GNT analysis which started me on the path to MorphGNT 24 years ago!</p>
<p>The patches are available at <a href="https://github.com/jtauber/greek-inflexion/tree/lxx/lxxmorph">https://github.com/jtauber/greek-inflexion/tree/lxx/lxxmorph</a>. They need to be reviewed as they all pretty much assume the text is correct (including accentuation, which was a major reason for the corrections I made) and I&rsquo;ve redone the analysis without considering context. <strong>An easy way to contribute would be to help review these patch files.</strong></p>
<p>All this work on <code>greek-inflexion</code> has led to some improvements to the underlying <code>inflexion</code> library as well as numerous corrections to <code>greek-accentuation</code>.</p>
<p>Work on the LXX coverage will continue as well as expansion to other texts (both Hellenistic and Classical).</p>
<p>Also in an early stage is better modeling of stem formation and endings.</p>
<p>Finally, the fruits of all this will soon be applied to the online Greek reader I talked about at SBL 2016, with a goal to release a prototype for the Johannine gospel and epistles in a couple of months.</p>
<h2><a href="http://jktauber.com/2016/01/28/polytonic-greek-unicode-is-still-not-perfect/">Polytonic Greek Unicode Still Isn’t Perfect</a> (2016-01-28, James Tauber)</h2>
<p>Whether we&rsquo;re talking about fonts, programming languages, keyboard entry or even the command-line, support for polytonic Greek has greatly improved even in the last 10 years much less the 23 years since I&rsquo;ve been doing computational analysis of Greek texts.</p>
<p><strong>UPDATE (2016-12-04): The Skolar examples in this post will no longer make sense as the issues have now been fixed. See <a href="http://jktauber.com/2016/12/04/diacritic-stacking-skolar-pe-fixed/">Diacritic Stacking in Skolar PE Fixed</a>.</strong></p>
<p>With configurable input sources in OS X, it&rsquo;s easy to type polytonic Greek and the default fonts support all the Unicode codepoints for polytonic Greek. I can now just type Greek (rather than a transliteration or BetaCode) in data files or forum posts or emails or tweets or GitHub issues. There are still <em>some</em> display issues with using polytonic Greek in fixed-width fonts but that&rsquo;s improving. Last year I talked about the bug I reported that got <a href="http://jktauber.com/2015/11/02/atom-editor-11-fixes-polytonic-greek-bug/">fixed in the Atom editor</a>.</p>
<p>Python has long supported Unicode and Python 3 made it even easier to deal with text processing of Unicode files. It doesn&rsquo;t sort polytonic Greek correctly out of the box, but I wrote <a href="https://github.com/jtauber/pyuca">pyuca</a> to solve that problem!</p>
<p>The situation seemed almost perfect until I started doing a lot more work that required me to track vowel length and, in particular, use a macron (ˉ) to distinguish long α, ι, and υ from short. It&rsquo;s okay when the macron is the only diacritic on a vowel: the problems start when a vowel has both an acute and a macron. (There is no need for a macron and a circumflex as the circumflex already implies the vowel is long. Same with an iota subscript.)</p>
<h3 id="problem-1-no-precomposed-character-code-points">Problem 1: No precomposed character code points</h3>
<p>ᾱ can be written as the decomposed <code>U+03B1 U+0304</code> or the precomposed <code>U+1FB1</code>:</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; len(&#39;ᾱ&#39;)
1
&gt;&gt;&gt; [hex(ord(ch)) for ch in &#39;ᾱ&#39;]
[&#39;0x1fb1&#39;]
&gt;&gt;&gt; [unicodedata.name(ch) for ch in &#39;ᾱ&#39;]
[&#39;GREEK SMALL LETTER ALPHA WITH MACRON&#39;]
&gt;&gt;&gt; unicodedata.decomposition(&#39;ᾱ&#39;)
&#39;03B1 0304&#39;
</pre></div>
<p>ά can be written as the decomposed <code>U+03B1 U+0301</code> or the precomposed <code>U+03AC</code> (assuming normalization to a tonos which the Greek Polytonic Input Source on OS X does):</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; len(&#39;ά&#39;)
1
&gt;&gt;&gt; [hex(ord(ch)) for ch in &#39;ά&#39;]
[&#39;0x3ac&#39;]
&gt;&gt;&gt; [unicodedata.name(ch) for ch in &#39;ά&#39;]
[&#39;GREEK SMALL LETTER ALPHA WITH TONOS&#39;]
&gt;&gt;&gt; unicodedata.decomposition(&#39;ά&#39;)
&#39;03B1 0301&#39;
</pre></div>
<p>But there&rsquo;s no precomposed character <code>ᾱ́</code>:</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; len(&#39;ᾱ́&#39;)
2
&gt;&gt;&gt; [hex(ord(ch)) for ch in &#39;ᾱ́&#39;]
[&#39;0x1fb1&#39;, &#39;0x301&#39;]
&gt;&gt;&gt; [hex(ord(ch)) for ch in unicodedata.normalize(&#39;NFC&#39;, &#39;ᾱ́&#39;)]
[&#39;0x1fb1&#39;, &#39;0x301&#39;]
</pre></div>
<p>As you can see, even Python 3 views <code>ᾱ́</code> as two characters. This also screws up font metrics in many text editors and browser text areas (like the one I&rsquo;m writing this post in).</p>
<h3 id="problem-2-many-fonts-with-otherwise-excellent-polytonic-greek-support-dont-display-it-properly">Problem 2: Many fonts with otherwise excellent polytonic Greek support don&rsquo;t display it properly</h3>
<p>The Skolar PE font I use on this site can&rsquo;t properly display <code>ᾱ́</code>. It displays it as ᾱ́. Ironically this is one time the fixed width fonts do a better job!</p>
<h3 id="problem-3-you-cant-normalize-an-alternative-ordering-of-diacritics">Problem 3: You can&rsquo;t normalize an alternative ordering of diacritics</h3>
<p>If you already have a <code>GREEK SMALL LETTER ALPHA WITH TONOS</code> and you add a <code>COMBINING MACRON</code> you end up (at least in the fonts I&rsquo;ve tried) with something that even visually looks different from the <code>GREEK SMALL LETTER ALPHA WITH MACRON</code> followed by <code>COMBINING ACUTE ACCENT</code>:</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; &quot;\u03ac\u0304&quot;
&#39;ά̄&#39;
</pre></div>
<p>(Notice that <code>ά̄</code> != <code>ᾱ́</code> and oddly, Skolar PE does a better job of the former than the latter: ά̄ vs ᾱ́)</p>
<p>And to make matters worse, you can&rsquo;t normalize one to the other:</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; [hex(ord(ch)) for ch in unicodedata.normalize(&#39;NFC&#39;, &#39;\u03ac\u0304&#39;)]
[&#39;0x3ac&#39;, &#39;0x304&#39;]
</pre></div>
<p>You have to combine the components in the correct order with the macron FIRST:</p>
<div class="codehilite"><pre><span></span>&gt;&gt;&gt; [hex(ord(ch)) for ch in unicodedata.normalize(&#39;NFC&#39;, &#39;\u03b1\u0304\u0301&#39;)]
[&#39;0x1fb1&#39;, &#39;0x301&#39;]
&gt;&gt;&gt; [hex(ord(ch)) for ch in unicodedata.normalize(&#39;NFC&#39;, &#39;\u03b1\u0301\u0304&#39;)]
[&#39;0x3ac&#39;, &#39;0x304&#39;]
</pre></div>
<p>This is not a bug: technically <code>ά̄</code> and <code>ᾱ́</code> are distinct graphemes, but it&rsquo;s still an annoyance because it requires any code that adds diacritics to know the correct order in which to add them.</p>
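<p>One way to cope in code that adds diacritics is to decompose first, append the new accent after any existing marks, and recompose. The helper below is a sketch of mine, not part of <code>greek-accentuation</code>:</p>

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def add_acute(vowel: str) -> str:
    """Sketch: add an acute to a vowel that may already carry a macron,
    in the order NFC expects (macron first, accent last)."""
    # NFD puts any existing macron right after the base character;
    # appending the acute last gives the required macron-first order.
    decomposed = unicodedata.normalize("NFD", vowel)
    return unicodedata.normalize("NFC", decomposed + ACUTE)

print([hex(ord(ch)) for ch in add_acute("\u1fb1")])  # ['0x1fb1', '0x301']
```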
<h3 id="problem-4-no-support-in-the-greek-polytonic-input-source">Problem 4: No support in the Greek Polytonic Input Source</h3>
<p>The Greek Polytonic Input Source supports typing a digraph (diacritic then base) to produce precomposed characters but you can&rsquo;t use a trigraph to enter <code>ᾱ́</code>. In fact, every time I&rsquo;ve needed to type <code>ᾱ́</code> in this post, I&rsquo;ve needed to copy paste it from an earlier usage (and manually minted one via Python the first time).</p>
<h3 id="problem-5-my-existing-syllabification-heuristics-didnt-work">Problem 5: My existing syllabification heuristics didn&rsquo;t work</h3>
<p>I recently had to tweak the syllabification heuristics in my <a href="https://github.com/jtauber/greek-accentuation">greek-accentuation</a> Python library to correctly syllabify words like <code>φῡ́ω</code>. Prior to 0.9.4, it put a syllable division between the macron and the acute!</p>
<p>This would not have happened if Unicode (and hence Python) treated <code>ῡ́</code> as a single character.</p>
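<p>A workaround is to iterate over grapheme-like clusters rather than code points, grouping each base character with its trailing combining marks. This sketch (mine, and much simpler than full Unicode segmentation) is enough to keep a macron and acute together for syllabification:</p>

```python
import unicodedata

def clusters(text: str):
    """Group each base character with its trailing combining marks."""
    out = []
    for ch in text:
        if unicodedata.combining(ch) and out:
            out[-1] += ch  # attach mark to the preceding base
        else:
            out.append(ch)
    return out

word = "φ\u1fe1\u0301ω"  # φῡ́ω with a combining acute on ῡ
print(clusters(word))     # three units, not four code points
```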
<h3 id="problem-6-theres-also-breathing">Problem 6: There&rsquo;s also breathing</h3>
<p>I thought I was all set after fixing Problem 5 but then I hit the imperfect of ἵστημι which starts in most cases with <code>ῑ́̔</code>/<code>ῑ̔́</code> (yes, that should be a rough breathing and acute with a macron.) I&rsquo;m in the process of working around this problem in <code>greek-accentuation</code> now.</p>
<h2 id="the-solution">The Solution</h2>
<p>The root cause of all this is just that Unicode-based code can&rsquo;t treat <code>ῑ́̔</code> or <code>ῡ́</code> or <code>ᾱ́</code> as single characters because Unicode doesn&rsquo;t have a codepoint for the precomposed characters. I imagine it&rsquo;s a long road to get the Unicode Consortium to &ldquo;fix&rdquo; this, if it&rsquo;s even possible. And even if some future version of Unicode fixed it, I&rsquo;d have to wait for Python and OS X to catch up before the problem really went away. For now I&rsquo;ll just have to continue to work around the problem in code like my <code>greek-accentuation</code> library. That still doesn&rsquo;t solve the problem with the Skolar PE fonts but I might be able to raise that issue with the font foundry.</p>
<p>It&rsquo;s possible there are additional workarounds or tricks I&rsquo;m not aware of. If there are, please let me know.</p>
<p><strong>CORRECTION</strong>: Thanks to Tom Gewecke for pointing out an earlier misstatement about the Polytonic Greek Input Source on OS X producing combining characters. It does not. It supports digraphs to produce precomposed characters.</p>
<p><strong>CORRECTION</strong>: Thanks to Martin J. Dürst for pointing out that <code>ά̄</code> and <code>ᾱ́</code> are distinct graphemes and so the fact they aren&rsquo;t normalized to each other isn&rsquo;t a problem with Unicode as such.</p>
<p><strong>UPDATE</strong>: I remarked at the end of Problem 1 about font metrics in editors / text areas but really I should make that a separate problem. Related (and perhaps yet another problem) is selecting characters with multiple diacritics.</p>
<h2 id="updated-solution">Updated Solution</h2>
<p>Now see my later post: <a href="http://jktauber.com/2016/02/09/updated-solution-polytonic-greek-unicodes-problems/">An Updated Solution to Polytonic Greek Unicode’s Problems</a>.</p>
<h2><a href="http://jktauber.com/2016/02/09/updated-solution-polytonic-greek-unicodes-problems/">An Updated Solution to Polytonic Greek Unicode’s Problems</a> (2016-02-09, James Tauber)</h2>
<p>In <a href="http://jktauber.com/2016/01/28/polytonic-greek-unicode-is-still-not-perfect/">Polytonic Greek Unicode Still Isn’t Perfect</a>, I enumerated various challenges that still exist with using Polytonic Greek when vowel length needs to be marked. I now have a better appreciation of what solutions are actually realistic.</p>
<p>After discussions with people on the Unicode mailing list, it&rsquo;s clear the solution is NOT to add more precomposed character code points to Unicode (or rather, such a solution will never be adopted by Unicode). Rather, the solution likely lies in the tools just understanding grapheme clusters. For more background, see <a href="http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries">Grapheme Cluster Boundaries</a> in the Unicode Standard Annex on Unicode Text Segmentation.</p>
<p>Perl 6 already has support for this: a layer above code points representing what are considered single graphemes even if made up of multiple code points. See, for example, Jonathan Worthington&rsquo;s <a href="http://jnthn.net/papers/2015-spw-nfg.pdf">slides on Normal Form Grapheme</a>.</p>
<p>So my plan is to at the very least implement a similar approach for Python 3 (unless someone else already has). That will still mean the problem has to separately be solved by:</p>
<ul>
<li>font foundries</li>
<li>text editor developers</li>
<li>keyboard / input source software developers</li>
<li>operating system developers</li>
</ul>
<p>I&rsquo;ll try to engage with each of these groups and will keep people posted on my progress.</p>
<p>Thanks to Ken Whistler for making clear that the path forward is not in more precomposed characters but in working with system vendors and font foundries.</p>
<p>Thanks to Markus Scherer and Elizabeth Mattijsen for their pointers to TR29 and the Perl 6 work.</p>
<p><strong>UPDATE (2016-12-04)</strong>: Now see <a href="http://jktauber.com/2016/12/04/diacritic-stacking-skolar-pe-fixed/">Diacritic Stacking in Skolar PE Fixed</a>.</p>
<h2><a href="http://jktauber.com/2016/12/02/greek-inflexion-and-update-morphological-lexicon/">greek-inflexion and an Update on the Morphological Lexicon</a> (2016-12-02, James Tauber)</h2>
<p>Exactly seven months ago, I <a href="http://jktauber.com/2016/05/01/inflexion-code-morphological-generation-parsing/">released</a> a generic library, <code>inflexion</code>, and said I&rsquo;d soon follow it up with the Greek-specific stuff. While I did open-source the latter on GitHub as <code>greek-inflexion</code> shortly thereafter, I didn&rsquo;t want to announce it here until it was further along. I&rsquo;m happy to say it now is.</p>
<p>If you recall, I said back in May that &ldquo;it can currently generate every single verb form in Louise Pratt’s intermediate grammar, on Helma Dik’s Greek verb handouts and in Andrew Keller &amp; Stephanie Russell’s beginner-intermediate text book&rdquo;. It now also has much better tooling for parsing new verb forms and guessing the stem of a given form. It also has the start of noun and adjective support.</p>
<p>On a separate <code>morphgnt</code> branch, it now has tooling for testing verb form generation against the MorphGNT/SBLGNT text. The stem database currently covers the gospel and epistles of John, Galatians, and Mark. I expect to have complete MorphGNT/SBLGNT verb coverage by the end of the year.</p>
<p>The repo is at <a href="https://github.com/jtauber/greek-inflexion">https://github.com/jtauber/greek-inflexion</a>. Note that it&rsquo;s not pip-installable at the moment and that hasn&rsquo;t been a priority as it&rsquo;s not a library.</p>
<p>As mentioned in my May post, most of the value (and effort) is not so much in the code but in the data. The stemming rules and, in particular, the stem database form the core of the Morphological Lexicon I&rsquo;ve been working on for a few years.</p>
<p>The best discussion of the Morphological Lexicon can be found in my <a href="https://www.academia.edu/18816954/A_Morphological_Lexicon_of_New_Testament_Greek">SBL 2015 Slides</a> although the vision can be found way back in <a href="http://jktauber.com/2004/12/09/morphgnt-v504-and-beyond/">this blog post</a> from 2004 where I say:</p>
<blockquote>
<p>the idea is that surface forms, lexical forms, spelling variations, roots, stems, suppletion, morphophonological rules, etc. will all be catalogued with relationships between them expressed as a directed labelled graph.</p>
</blockquote>
<p>So good progress is being made (and it&rsquo;s all available openly as work progresses) and the initial stem and morphophonological rule databases should be completed in the next month.</p>
<p>Alongside that I&rsquo;m also looking at better representing relationships between stems and also relationships between the stemming rules.</p>
<p>Ultimately, as discussed in my SBL 2015 talk and elsewhere, my goals are to:</p>
<ul>
<li>freely provide, in a machine-actionable way, all of the morphological information normally found in a Greek lexicon</li>
<li>facilitate tagging of new Greek texts</li>
<li>provide the underlying information to drive a new generation of adaptive Greek readers (the topic of my 2016 SBL talk)</li>
<li>contribute a comprehensive analysis of Ancient Greek of interest to general morphologists</li>
<li>experiment with the notion of an &ldquo;executable grammar&rdquo; where all paradigms, rules and assertions are tested automatically against a corpus and, with it, replace the existing plethora of books on paradigms and principal parts.</li>
</ul>
<p>Particular thanks to Jonathan Robie, who continues to provide the inspiration and encouragement for a lot of this work.</p>
<h2><a href="http://jktauber.com/2016/11/26/more-diagramming-greek-accent-placement/">More on Diagramming Greek Accent Placement</a> (2016-11-26, James Tauber)</h2>
<p>I&rsquo;ve put together slides and a voice-over to further explain Greek accent placement from a moraic point-of-view.</p>
<p>After posting <a href="http://jktauber.com/2016/11/07/diagramming-greek-accent-placement/">Diagramming Greek Accent Placement</a>, a couple of people asked me to unpack the second diagram, so I put together a series of slides with a view to perhaps doing a voice-over to accompany them.</p>
<p>I put the slides up at <a href="https://www.academia.edu/29725241/Basic_Greek_Accentuation">https://www.academia.edu/29725241/Basic_Greek_Accentuation</a> and immediately got a suggestion to do a voice-over.</p>
<p>Here&rsquo;s the resultant video:</p>
<iframe src="https://player.vimeo.com/video/191687615" width="640" height="480" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p><a href="https://vimeo.com/191687615">Basic Greek Accentuation</a> from <a href="https://vimeo.com/user3466366">James Tauber</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
<h2><a href="http://jktauber.com/2016/11/26/greek-accentuation-104-released/">greek-accentuation 1.0.4 Released</a> (2016-11-26, James Tauber)</h2>
<p>Three weeks ago I fixed a few bugs in <code>greek-accentuation</code> and ended up doing three releases (although I only blogged about two at the time). I&rsquo;ve now done a fourth bug fix release: 1.0.4.</p>
<p>1.0.3 was the bug fix mentioned in <a href="http://jktauber.com/2016/11/07/diagramming-greek-accent-placement/">Diagramming Greek Accent Placement</a> where paroxytone wasn&rsquo;t being given as a possible accentuation when the penult was long and the length of the ultima was unknown (e.g. an unmarked alpha).</p>
<p>To this, 1.0.4 adds two new fixes:</p>
<ul>
<li><code>syllabify.is_diphthong</code> now works with uppercase letters (fixes a syllabification bug when a capitalized word begins with a diphthong)</li>
<li><code>syllabify.add_necessary_breathing</code> now returns an NFKC-normalized form (improving rebreath/debreath roundtripping)</li>
</ul>
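<p>The gist of the case-insensitivity fix can be sketched like this (a hypothetical re-implementation for illustration; the actual function in the library differs):</p>

```python
# Hypothetical sketch of the 1.0.4 fix: lowercase the pair before the
# lookup so capitalized words are handled. Not the library's actual code.
DIPHTHONGS = {"αι", "ει", "οι", "υι", "αυ", "ευ", "ου", "ηυ"}

def is_diphthong(pair):
    # Lowercasing first means Ου and ΟΥ are recognized alongside ου.
    return pair.lower() in DIPHTHONGS

print(is_diphthong("ΟΥ"), is_diphthong("ου"), is_diphthong("εο"))
# → True True False
```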
<p>You can <code>pip install greek-accentuation==1.0.4</code>. The repo is at <a href="https://github.com/jtauber/greek-accentuation">https://github.com/jtauber/greek-accentuation</a>.</p>
<h2><a href="http://jktauber.com/2016/11/07/diagramming-greek-accent-placement/">Diagramming Greek Accent Placement</a></h2>
<p>James Tauber · 2016-11-07</p>
<p>Cleaning up code as part of another bug fix to <code>greek-accentuation</code> led me to update an old diagram I&rsquo;d done showing the Greek accentuation possibilities in terms of morae.</p>
<p>Back in 2014 I came up with the following diagram to try to explain that the &ldquo;law of limitation&rdquo; was fairly easy to understand in terms of morae. Once you understand the acute and circumflex accents in terms of morae, it&rsquo;s clear that the accent can just go on one of the final three morae but that if the penult is long and the ultima short, the next-to-last mora is skipped over.</p>
<p><img src="http://jktauber.com/site_media/static/mora_accent.jpg"></p>
<p>In trying to fix a bug in <code>greek-accentuation</code>, I was stepping through all the possibilities again (with the additional complexity that the code there sometimes can&rsquo;t tell if a vowel is long or short). I realised it might be clearer to put the four combinations of penult/ultima length in a 2-by-2 matrix.</p>
<p>I added a bit more information on the resulting accents and came up with this:</p>
<p><img src="http://jktauber.com/site_media/static/greek_accentuation_possibilities.png" width=800></p>
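<p>The skip rule described above can also be captured in a tiny (hypothetical, purely illustrative) function, with mora positions counted from the end of the word:</p>

```python
# Sketch of the rule stated above: the accent may fall on one of the final
# three morae, except that a long penult before a short ultima causes the
# next-to-last mora to be skipped. Position 1 is the final mora.
def allowed_mora_positions(penult_long, ultima_long):
    positions = {1, 2, 3}
    if penult_long and not ultima_long:
        positions.discard(2)  # the next-to-last mora is skipped over
    return sorted(positions)

print(allowed_mora_positions(penult_long=True, ultima_long=False))
# → [1, 3]
print(allowed_mora_positions(penult_long=False, ultima_long=False))
# → [1, 2, 3]
```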
<p>Let me know what you think. Do other people find this a helpful way to conceptualise things visually?</p>
<h2><a href="http://jktauber.com/2016/11/04/greek-accentuation-102-released-and-how-persistent/">greek-accentuation 1.0.2 Released (and How Persistent Accentuation Works)</a></h2>
<p>James Tauber · 2016-11-04</p>
<p>Hot on the heels of the 1.0.1 bug fix, I&rsquo;ve released 1.0.2 with another fix, this time in the persistent accent placement. So I thought I&rsquo;d explain how persistent accent placement is implemented and what the bug was.</p>
<p><code>greek_accentuation.accentuation</code> has a function <code>persistent</code> used for placing accents that are persistent, that is, accents that stay in place through different inflections as far as the basic accentuation rules allow.</p>
<p>The <code>persistent</code> function takes both the unaccented word to be accented and a lemma or base form that <em>is</em> accented.</p>
<p>The first step is determining which syllable of the base form carries the accent and what type of accent it is. Note that the position of the accent is counted by syllable from the left, not the right. The code syllabifies both the word-to-be-accented and the base form, then works out which three (or fewer) syllable placements the basic accentuation rules allow on the word-to-be-accented. This is provided by another function, <code>possible_accentuations</code>.</p>
<p>The first thing tried is the exact syllable position and accent type of the base form: if that combination is among the possible accentuations for the word-to-be-accented, we&rsquo;re done. If not, we try changing the accent type from acute to circumflex while keeping the same position. If that&rsquo;s still not allowed, we try placing an acute on each successively later syllable until we reach an accentuation allowed by the basic rules.</p>
<p>However, this algorithm hit a problem with accenting Ἰουδαιων using the base Ἰουδαῖος.</p>
<p>The first thing it tries is Ἰουδαῖων, which of course is not permitted, so it immediately jumps to an acute on the next position: Ἰουδαιών. However, this is incorrect. The bug was that only a change from acute to circumflex was attempted before trying later positions. In this case, the correct thing to do was to try an acute in the same position as the original circumflex.</p>
<p>This was an easy addition and results in the correct answer: Ἰουδαίων.</p>
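<p>The resulting fallback order can be sketched as follows (an illustrative re-implementation, not the library&rsquo;s actual code; <code>possible</code> stands in for the output of <code>possible_accentuations</code>):</p>

```python
# Sketch of the fixed fallback order for persistent accent placement.
# `possible` is a set of (syllable_position, accent_type) pairs allowed
# by the basic rules, with positions counted from the left.
def place_persistent(base_pos, base_type, possible):
    # 1. Try the base form's exact position and accent type.
    if (base_pos, base_type) in possible:
        return base_pos, base_type
    # 2. Try the other accent type in the same position. Before 1.0.2
    #    only acute -> circumflex was tried here; also trying
    #    circumflex -> acute is what fixed Ἰουδαιων.
    other = "acute" if base_type == "circumflex" else "circumflex"
    if (base_pos, other) in possible:
        return base_pos, other
    # 3. Fall back to an acute on each successively later syllable.
    for pos in range(base_pos + 1, base_pos + 4):
        if (pos, "acute") in possible:
            return pos, "acute"
    return None

# Ἰουδαιων from Ἰουδαῖος: a circumflex on syllable 3 is disallowed
# (long ultima), but an acute in the same position gives Ἰουδαίων.
print(place_persistent(3, "circumflex", {(3, "acute"), (4, "acute")}))
# → (3, 'acute')
```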
<p>You can <code>pip install greek-accentuation==1.0.2</code>. The repo is at <a href="https://github.com/jtauber/greek-accentuation">https://github.com/jtauber/greek-accentuation</a>.</p>
<h2><a href="http://jktauber.com/2016/11/03/greek-accentuation-101-released/">greek-accentuation 1.0.1 Released</a></h2>
<p>James Tauber · 2016-11-03</p>
<p>A minor bug fix release that fixes a problem with <code>add_necessary_breathing</code>.</p>
<p>My library for accenting Greek, which includes a function for adding missing breathing, was throwing an exception when given a word beginning with an uppercase vowel, e.g. Ιησους.</p>
<p>The bug has now been fixed.</p>
<p>You can <code>pip install greek-accentuation==1.0.1</code>. The repo is at <a href="https://github.com/jtauber/greek-accentuation">https://github.com/jtauber/greek-accentuation</a>.</p>
<h2><a href="http://jktauber.com/2016/09/11/thoughts-voice/">Thoughts on Voice</a></h2>
<p>James Tauber · 2016-09-11</p>
<p>Occasionally I get into conversations about the Greek middle (or voice in general) but I&rsquo;ve never written down my thoughts on the topic. Here&rsquo;s an attempt to summarize my current thinking, although there&rsquo;s nothing particularly novel about it.</p>
<p>Imagine a transitivity spectrum of high object-affectedness at one end and high subject-affectedness at the other end.</p>
<p>When describing an event, there may be some freedom in where on the spectrum to go but for different choices, there&rsquo;s an ordering of where they would be placed relatively on the spectrum. For example, consider:</p>
<ul>
<li>I broke the vase</li>
<li>The vase broke</li>
<li>The vase was broken by me</li>
</ul>
<p>These three descriptions of the same event would be placed, relatively, from left to right on the spectrum.</p>
<p>Now consider each of the following pairs. If being used to describe the same event, the first of the pair would be placed on the spectrum (again, relatively) to the left of the second of the pair:</p>
<ul>
<li>take / choose (choosing might just be a mental decision but taking involves action)</li>
<li>destroy / perish</li>
<li>resolve / deliberate (resolve is a more active step beyond merely deliberating)</li>
<li>stop / cease</li>
<li>honor / value (you might value something but honoring it is taking action in response to that value)</li>
<li>show / appear (you can just appear but you can also actively show someone)</li>
</ul>
<p>Now in the imperfective, Greek offers two sets of endings that can (and I stress <em>can</em>) be used to capture the distinction between more to the left and more to the right on the spectrum. In the perfective, Greek offers three sets of endings.</p>
<p><strong>However</strong>, where the line is drawn between these two or three segments of the spectrum to map them to the different endings is somewhat arbitrary between different words and it isn&rsquo;t always directly comparable between different tense-aspect forms either. A single set of endings might cover a pretty large part of the spectrum. There is also no &ldquo;requirement&rdquo; that a single lexeme use all ending sets available, either. Instead, voice is available as a potential way of conveying the kinds of distinctions in the pairs above and in the three-way distinction in the vase example.</p>
<p>Where distinctions don&rsquo;t need to be made, it should not surprise us to find only &ldquo;middle&rdquo; forms in use, especially in cases of lower object-affectedness (as with mental verbs). This does mean that in the imperfective there is no separate passive form, but passivization is less useful (and hence less likely) in these cases. Nor should it surprise us if some mental verbs use active forms.</p>
<p>It should also not surprise us to find, say, the future using the middle where the present uses the active. If the imperfectives only need a two-way distinction, the perfectives can also make just a two-way distinction even if choosing to use the two middle-passive forms to do so.</p>
<p>And if only a one-way distinction is required, there is nothing odd about a lexical item choosing to use a particular one of any of the three available voice endings (although we would expect broad tendencies to be based on object-affectedness).</p>
<p>The &ldquo;active&rdquo; is often described as unmarked with the &ldquo;middle&rdquo; marked for subject-affectedness but I think it&rsquo;s actually helpful to think less about markedness and more about this transitivity spectrum of relative object-affectedness vs subject-affectedness. One can then think of voice as a largely <em>lexically</em>-determined tool for making <em>relative</em> contrasts on this spectrum.</p>
<p>This way of thinking means that the names of voices should probably not be so absolute but should somehow be expressed in purely relative terms. The use of &ldquo;middle&rdquo; for the middle of the three isn&rsquo;t bad, but &ldquo;active&rdquo; and &ldquo;passive&rdquo; are highly misleading, although they are &ldquo;more active&rdquo; and &ldquo;more passive&rdquo; than the &ldquo;middle&rdquo; when contrasting directly <em>within the same lexeme</em>.</p>
<h2><a href="http://jktauber.com/2016/07/27/greek-accentuation-100-released/">greek-accentuation 1.0.0 Released</a></h2>
<p>James Tauber · 2016-07-27</p>
<p><code>greek-accentuation</code> has finally hit 1.0.0 with a couple more functions and a module layout change.</p>
<p>The library (which I&rsquo;ve previously written about <a href="http://jktauber.com/2015/11/20/greek-accentuation-library/">here</a>) has been sitting on 0.9.9 for a while and I&rsquo;ve been using it successfully in my inflectional morphology work for 18 months. There were, however, a couple of functions that lived in the inflectional morphology repos that really belonged in <code>greek-accentuation</code>. They have now been moved there.</p>
<p>There is <code>syllabify.debreath</code> which removes smooth breathing and replaces rough breathing with an <code>h</code>. And there is <code>syllabify.rebreath</code> which reverses this.</p>
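<p>To sketch the idea (a simplified, hypothetical re-implementation, not the library code, and only handling a word whose first letter is a single vowel):</p>

```python
import unicodedata

ROUGH = "\u0314"   # combining reversed comma above (rough breathing)
SMOOTH = "\u0313"  # combining comma above (smooth breathing)

def debreath(word):
    """Drop smooth breathing; turn rough breathing into a leading 'h'."""
    d = unicodedata.normalize("NFD", word)
    prefix = "h" if ROUGH in d else ""
    d = d.replace(ROUGH, "").replace(SMOOTH, "")
    return prefix + unicodedata.normalize("NFC", d)

def rebreath(word):
    """Reverse debreath, assuming the first letter is a single vowel."""
    if word.startswith("h"):
        d = unicodedata.normalize("NFD", word[1:])
        mark = ROUGH
    else:
        d = unicodedata.normalize("NFD", word)
        mark = SMOOTH
    return unicodedata.normalize("NFC", d[0] + mark + d[1:])

print(debreath("ἡμέρα"))  # → hημέρα
print(rebreath(debreath("ἡμέρα")))  # → ἡμέρα
```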
<p>The other big change is that there are no longer three top-level modules; everything is now enclosed in a <code>greek_accentuation</code> package, so instead of <code>from syllabify import *</code> you say <code>from greek_accentuation.syllabify import *</code>.</p>
<p>You can <code>pip install greek-accentuation==1.0.0</code>. The repo is at <a href="https://github.com/jtauber/greek-accentuation">https://github.com/jtauber/greek-accentuation</a>.</p>
<p><code>greek-accentuation</code> is made available under an MIT license.</p>
<p>Thanks to Kyle Johnson of the wonderful <a href="http://cltk.org">Classical Language Toolkit</a> project for encouraging me to finally do the 1.0.0 release.</p>
<h2><a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">Modelling Stems and Principal Part Lists</a></h2>
<p>James Tauber · 2016-06-17</p>
<p>This is part 0 of a series of blog posts about modelling stems and principal part lists, particularly for Attic Greek but hopefully more generally applicable. This is largely writing up work already done but I’m doing cleanup as I go along as well.</p>
<p>A core part of the handling of verbs in the <em>Morphological Lexicon</em> is the set of terminations and sandhi rules that can generate paradigms attested in grammars like Louise Pratt’s <em>The Essentials of Greek Grammar</em>. Another core part is the stem information for a broader range of verbs usually conveyed in works like Pratt’s in the form of lists of principal parts.</p>
<p>A rough outline of (future) posts is:</p>
<ul>
<li><a href="http://jktauber.com/2016/06/18/sources-principal-part-lists/">the sources of principal part lists for this work</a></li>
<li><a href="http://jktauber.com/2016/06/18/lemmas-pratt-principal-parts/">lemmas in the Pratt principal parts</a></li>
<li><a href="http://jktauber.com/2016/06/21/merging-morwood-and-pratt-lemmas/">merging the Morwood and Pratt lemmas</a></li>
<li><a href="http://jktauber.com/2016/06/22/merging-dcc-lemmas/">merging the DCC lemmas</a></li>
<li><a href="http://jktauber.com/2016/06/26/formatting-principal-parts/">formatting of principal parts</a></li>
<li><a href="http://jktauber.com/2016/07/16/parsing-dcc-principal-parts/">parsing the DCC principal parts</a></li>
<li><a href="http://jktauber.com/2016/07/24/more-parsing-dcc-principal-parts/">more parsing the DCC principal parts</a></li>
<li>how to model a merge of the lists</li>
<li>inferring stems from principal parts</li>
<li>stems, terminations and sandhi</li>
<li>relationships between stems</li>
<li>???</li>
</ul>
<h2><a href="http://jktauber.com/2016/07/24/more-parsing-dcc-principal-parts/">More Parsing of the DCC Principal Parts</a></h2>
<p>James Tauber · 2016-07-24</p>
<p>This is part 7 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and looks in even more detail at the format of the principal parts list in the DCC verbs.</p>
<p>In the <a href="http://jktauber.com/2016/07/16/parsing-dcc-principal-parts/">previous blog post</a>, I used regular expressions to match DCC principal parts.</p>
<p>In moving from merely matching patterns to actually extracting parts correctly, I encountered further ambiguities.</p>
<p>Recall that previously, I just did matches like</p>
<div class="codehilite"><pre><span></span>{grk}, {grk}, {grk}, {grk}, {grk}
</pre></div>
<p>where <code>{grk}</code> matched any Greek word.</p>
<p>This weekend, I expanded that to patterns more like</p>
<div class="codehilite"><pre><span></span>{present}, {future}, {aorist}, {perfect_active}, {aorist_passive}
{present}, {future}, {perfect_active}, {perfect_middle}, {aorist_passive}
{present}, {future}, {aorist}, {perfect_middle}, {aorist_passive}
</pre></div>
<p>which actually took into account the endings of the Greek words (for example, <code>{perfect_middle}</code> only matches Greek words ending in <code>μαι</code>).</p>
<p>Note that the one pattern from the previous blog post becomes three patterns. These more precise patterns, however, enable easier extraction of the actual parts with their morphosyntactic properties.</p>
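<p>As an illustration (a sketch, not the project&rsquo;s actual code), ending-aware patterns might look like this, with the ending constraints on the final three parts chosen for this example:</p>

```python
import re

# Sketch of ending-aware part patterns: each named group constrains the
# ending where that helps disambiguate the part. Illustrative only.
GRK = r"-?[\u0370-\u03FF\u1F00-\u1FFF]+"
PARTS = {
    "present": rf"(?P<present>{GRK})",
    "future": rf"(?P<future>{GRK})",
    "aorist": rf"(?P<aorist>{GRK})",
    "perfect_active": rf"(?P<perfect_active>{GRK}κα)",
    "perfect_middle": rf"(?P<perfect_middle>{GRK}μαι)",
    "aorist_passive": rf"(?P<aorist_passive>{GRK}ην)",
}

def match_parts(entry, part_names):
    """Match a comma-separated entry against the named part patterns."""
    pattern = ", ".join(PARTS[name] for name in part_names)
    m = re.fullmatch(pattern, entry)
    return m.groupdict() if m else None

entry = "βουλεύω, βουλεύσω, ἐβούλευσα, βεβούλευκα, βεβούλευμαι, ἐβουλεύθην"
parts = match_parts(entry, ["present", "future", "aorist",
                            "perfect_active", "perfect_middle",
                            "aorist_passive"])
print(parts["perfect_middle"])  # → βεβούλευμαι
```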
<p>They also reveal some more inconsistencies. For example, 2nd aorists are not, it turns out, always explicitly marked.</p>
<p>Also, the four-part pattern</p>
<div class="codehilite"><pre><span></span>{grk}, {grk}, {grk}, {grk}
</pre></div>
<p>actually could be any of</p>
<div class="codehilite"><pre><span></span>{present}, {future}, {aorist}, {perfect_active}
{present}, {future}, {aorist}, {perfect_middle}
{present}, {future}, {aorist}, {aorist_passive}
{present}, {future}, {perfect_middle}, {aorist_passive}
{present}, {future}, {aorist_passive}, {perfect_middle}
</pre></div>
<p>The last pattern is necessitated by</p>
<div class="codehilite"><pre><span></span>δύναμαι, δυνήσομαι, ἐδυνήθην, δεδύνημαι
</pre></div>
<p>which is, presumably, an error with <code>ἐδυνήθην</code> and <code>δεδύνημαι</code> transposed.</p>
<p>Besides errors like this, there is at least one ambiguity where the endings aren&rsquo;t enough to disambiguate.</p>
<div class="codehilite"><pre><span></span>χαίρω, χαιρήσω, κεχάρηκα, κεχάρημαι, ἐχάρην
</pre></div>
<p>is ambiguous because <code>κα</code> is a possible aorist ending. The ambiguity can obviously be resolved by looking at the entire form but, given that some parts are annotated elsewhere to avoid possible misreading, it might be better to write the above as</p>
<div class="codehilite"><pre><span></span>χαίρω, χαιρήσω, pf. κεχάρηκα, κεχάρημαι, ἐχάρην
</pre></div>
<p>to make perfectly clear that the aorist form has been skipped over.</p>
<p>Again, my point is not to nitpick the DCC principal parts list, but rather to make explicit the assumptions that principal parts in this format make.</p>
<p>In determining what part a particular form is, the following needs to be considered:</p>
<ul>
<li>explicit annotation (e.g. <code>pf.</code> for perfects)</li>
<li>ending (a <code>μαι</code> ending on a form other than the first two parts indicates the perfect middle)</li>
<li>position in the list (both absolute and relative to other forms whose part is worked out from other considerations)</li>
</ul>
<p>The main upshot of all this is that I&rsquo;ve now converted the DCC principal parts to a YAML format that I&rsquo;ll shortly merge with the parts from Pratt and Morwood.</p>
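<p>For illustration, an entry in such a YAML format might look like the following (the shape and field names here are hypothetical; the actual schema may differ):</p>

```yaml
# Hypothetical shape only; field names are illustrative.
βουλεύω:
  present: βουλεύω
  future: βουλεύσω
  aorist: ἐβούλευσα
  perfect_active: βεβούλευκα
  perfect_middle: βεβούλευμαι
  aorist_passive: ἐβουλεύθην
```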
<h2><a href="http://jktauber.com/2016/07/16/parsing-dcc-principal-parts/">Parsing the DCC Principal Parts</a></h2>
<p>James Tauber · 2016-07-16</p>
<p>This is part 6 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and looks more precisely at the format of the principal parts list in the DCC verbs.</p>
<p>We&rsquo;ve already discussed that the DCC principal parts are presented slightly differently from the Pratt or Morwood lists inasmuch as the latter two are in tabular form whereas the DCC list just has a string of comma-separated parts.</p>
<p>In <a href="http://jktauber.com/2016/06/26/formatting-principal-parts/">Formatting of Principal Parts</a> we touched on many of the properties of the DCC format but in the spirit of precise modeling, what I&rsquo;ve done below is actually write a set of regular expressions that match and enable parsing of every entry in the DCC list.</p>
<p>In the regex patterns below, I&rsquo;ve used <code>{grk}</code> for Greek words, optionally preceded by a hyphen. In my code this expands to the regex <code>(-?[\u0370-\u03FF\u1F00-\u1FFF]+)</code>. I also have <code>{grk2}</code> which just allows an optional second Greek word separated with &ldquo;or&rdquo; or &ldquo;and&rdquo;. <code>{grk2}</code> hence expands to <code>({grk}( (or|and) {grk})?)</code>. And finally, in a couple of examples, I have <code>{gloss}</code> for glosses consisting of English words including a comma. This expands to <code>([a-z, ]+)</code>.</p>
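<p>The placeholder expansion can be sketched like this (the regexes are those given above; <code>expand</code> itself and the way the templates are stored are illustrative):</p>

```python
import re

# Expand the {grk}, {grk2} and {gloss} placeholders described above into
# full regular expressions. Illustrative helper, not the actual code.
grk = r"(-?[\u0370-\u03FF\u1F00-\u1FFF]+)"
grk2 = rf"({grk}( (or|and) {grk})?)"
gloss = r"([a-z, ]+)"

def expand(template):
    # {grk2} must be replaced before {grk} is (harmlessly) re-scanned.
    return (template.replace("{grk2}", grk2)
                    .replace("{grk}", grk)
                    .replace("{gloss}", gloss))

pattern = expand(r"{grk}, fut\. {grk2}, 2 aor\. {grk}, {grk}")
entry = "ἔρχομαι, fut. εἶμι or ἐλεύσομαι, 2 aor. ἦλθον, ἐλήλυθα"
print(bool(re.fullmatch(pattern, entry)))  # → True
```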
<p>The simplest of cases just have a comma-separated list of Greek words. There may be only 1–5 parts rather than the full six, although in those cases the only gaps are in the final parts.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk2}, {grk}, {grk}, {grk}, {grk2}&quot;
&quot;{grk}, {grk}, {grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}&quot;
&quot;{grk}&quot;
</pre></div>
<p>As mentioned in the previous blog posts, when the third part is a 2nd aorist, that&rsquo;s made explicit. Again, sometimes the 5th and 6th, or 4th, 5th and 6th parts are omitted.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, 2 aor\. {grk2}, {grk2}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, 2 aor\. {grk}, {grk}&quot;
&quot;{grk}, {grk}, 2 aor\. {grk}&quot;
</pre></div>
<p>One pattern skips the second part but this is clear because of the explicit labeling of the third part.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, 2 aor\. {grk}, {grk}, {grk}&quot;
</pre></div>
<p>However, in one case, &ldquo;ἔρχομαι, fut. εἶμι or ἐλεύσομαι, 2 aor. ἦλθον, ἐλήλυθα&rdquo;, the second part is explicitly labeled <code>fut.</code> because it is suppletive, even though it is unambiguously the second part by position.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, fut\. {grk2}, 2 aor\. {grk}, {grk}&quot;
</pre></div>
<p>Sometimes both a 1st and 2nd aorist are given as separate parts. In the sigmatic case, &ldquo;ἁμαρτάνω, ἁμαρτήσομαι, ἡμάρτησα, 2 aor. ἥμαρτον, ἡμάρτηκα, ἡμάρτημαι, ἡμαρτήθην&rdquo;, the 1st aorist is not explicitly labeled and so the 2nd aorist is actually in the fourth position, the fourth part in the fifth position and so on.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, {grk}, 2 aor\. {grk}, {grk}, {grk}, {grk}&quot;
</pre></div>
<p>However, sometimes the 1st aorist in this case is labeled because it is not sigmatic and so at a glance could be confused for a perfect.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, 1 aor\. {grk}, 2 aor\. {grk}, {grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, 1 aor\. {grk}, 2 aor\. {grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, 1 aor\. {grk}&quot;
</pre></div>
<p>One example of this (matching the first line above) is &ldquo;φέρω, οἴσω, 1 aor. ἤνεγκα, 2 aor. ἤνεγκον, ἐνήνοχα, ἐνήνεγμαι, ἠνέχθην&rdquo;.</p>
<p>In one case, &ldquo;μιμνήσκω, -μνήσω, -έμνησα, pf. μέμνημαι, ἐμνήσθην&rdquo;, the fourth part is skipped and the fifth is labeled <code>pf.</code>. It would probably be clearer if this were labeled <code>pf. mid.</code> or similar.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, {grk}, pf\. {grk}, {grk}&quot;
</pre></div>
<p>In another case, &ldquo;ἥκω, ἥξω, pf. ἧκα&rdquo;, the perfect active is labeled explicitly because there&rsquo;s no third part and the kappa in the imperfective stem makes the perfect form perhaps harder to identify.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, pf\. {grk}&quot;
</pre></div>
<p>Sometimes an explicit imperfect is given. This is usually at the end, after the usual parts are given.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, impf\. {grk}&quot;
&quot;{grk}, {grk}, {grk}, {grk}, impf\. {grk}&quot;
&quot;{grk}, {grk2}, 2 aor\. {grk}, {grk}, impf\. {grk}&quot;
&quot;{grk}, {grk}, 2 aor\. {grk}, {grk2}, {grk}, impf\. {grk}&quot;
</pre></div>
<p>In one case, &ldquo;οἴομαι or οἶμαι, οἰήσομαι, impf. ᾤμην, aor. ᾠήθην&rdquo;, (perhaps inconsistently) the imperfect is given before the aorist.</p>
<div class="codehilite"><pre><span></span>&quot;{grk2}, {grk}, impf\. {grk}, aor\. {grk}&quot;
</pre></div>
<p>In one case, &ldquo;ἀκούω, ἀκούσομαι, ἤκουσα, ἀκήκοα, plup. ἠκηκόη or ἀκηκόη, ἠκούσθην&rdquo;, where there is no fifth part, two forms of the pluperfect are given instead.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, {grk}, {grk}, plup\. {grk2}, {grk}&quot;
</pre></div>
<p>In another case, however, &ldquo;καθίστημι, καταστήσω, κατέστησα, κατέστην, καθέστηκα, plupf. καθειστήκη, κατεστάθην&rdquo;, this turns out to be a little tricky because it has both a 1st and root aorist but that fact is not made explicit.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, {grk}, {grk}, {grk}, {grk}, plupf\. {grk}, {grk}&quot;
</pre></div>
<p>Also note the inconsistent use of &ldquo;plup.&rdquo; vs &ldquo;plupf.&rdquo;.</p>
<p>There are four cases that just provide various non-standard parts such as imperfects, infinitives or participles (in one case, three participle parts).</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, impf\. {grk2}, infin\. {grk}&quot;
&quot;{grk}, {grk}, impf\. {grk}, infin\. {grk}&quot;
&quot;{grk}, ptc\. {grk}&quot;
&quot;{grk}, infin\. {grk}, ptc\. {grk}, {grk}, {grk}&quot;
</pre></div>
<p>In the case of εἶδον, the first part actually <em>is</em> the suppletive 2nd aorist of another part.</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, 2 aor\. of {grk}, act\. infin\. {grk}, mid\.infin\. {grk}&quot;
</pre></div>
<p>For our purposes this may end up getting treated differently.</p>
<p>There are five other cases where there is additional annotation:</p>
<div class="codehilite"><pre><span></span>&quot;{grk}, infin\. {grk}, imper\. {grk}, plupf\. used as impf\. {grk}&quot;
&quot;{grk}, {grk}, {grk}, {grk}, {grk} \(but usu\. {grk} instead\), {grk}&quot;
&quot;{grk}, {grk}, {grk}, {grk}, {grk} \(but commonly {grk} instead\), {grk}&quot;
&quot;{grk} \(usually mid\. {grk}\), {grk}, {grk}, {grk}, {grk}&quot;
&quot;{grk}, {grk}, {grk}, 2 aor\. mid\. {grk}, pf\. {grk} \(“I have utterly destroyed”\) or {grk} \(“I am undone”\)&quot;
</pre></div>
<p>And finally there are five cases that are clearly typos, where the crucial comma delimiter has been omitted or accidentally replaced with a period.</p>
<div class="codehilite"><pre><span></span>&quot;{grk} {grk} {gloss}, {grk} {gloss}, 2 aor\. {grk} {gloss}, {grk} {gloss}, plup\. {grk} {gloss}, {grk} {gloss}&quot;
&quot;{grk}, {grk}, {grk}, {grk}\. {grk}, {grk}&quot;
&quot;{grk} {grk}&quot;
&quot;{grk} {grk}, 2 aor\. {grk}&quot;
&quot;{grk} {grk}, {grk}, {grk}, {grk}, {grk}&quot;
</pre></div>
<p>These correspond to:</p>
<div class="codehilite"><pre><span></span>ἵστημι στήσω will set, ἔστησα set, caused to stand, 2 aor. ἔστην stood, ἕστηκα stand, plup. εἱστήκη stood, ἐστάθην stood
τυγχάνω, τεύξομαι, ἔτυχον, τετύχηκα. τέτυγμαι, ἐτύχθην
προσήκω προσήξω
ἕπομαι ἕψομαι, 2 aor. ἑσπόμην
βουλεύω βουλεύσω, ἐβούλευσα, βεβούλευκα, βεβούλευμαι, ἐβουλεύθην
</pre></div>
<p>These cases should probably just be fixed upstream.</p>
<p>Now, admittedly, it probably would have been quicker for me to just manually convert the 149 strings into some completely unambiguous format rather than write regular expressions that match them all, handling typos and idiosyncrasies. But the approach highlights both specific issues with the DCC list (which, admittedly, are quite minor; I don&rsquo;t want to detract from the wonderful resource the DCC Core List is) and the value of precise modeling like this in identifying inconsistencies and potential ambiguities in the way this sort of information is presented.</p>
<p>While it&rsquo;s outside the scope of this blog series, I&rsquo;ve been exploring similar tests on entire lexicon entries for a while. This pretty quickly exposes inconsistencies. Even in cases where a markup language such as XML is used, unless it&rsquo;s very fine-grained markup (like the Cambridge Lexicon is/was using), lots of inconsistencies and ambiguities can creep in.</p>
<p>All of this comes back to what I talked about in my 2015 SBL and BibleTech talks under the heading of <a href="http://jktauber.com/2015/11/11/technical-aspects-openness/">Technical Aspects of Openness</a> and what&rsquo;s involved in making linguistic data truly machine-actionable.</p>
<h2><a href="http://jktauber.com/2016/06/26/formatting-principal-parts/">Formatting of Principal Parts</a></h2>
<p>James Tauber · 2016-06-26</p>
<p>This is part 5 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and covers the format of the principal parts themselves in the Pratt, Morwood and DCC verb lists.</p>
<p>Now that we&rsquo;ve looked at how the various lemmas interrelate, let&rsquo;s turn our attention to the formatting of the individual parts. Here I just describe the various idiosyncrasies. In subsequent posts, I&rsquo;ll discuss how to bring together (the relevant parts of) this information in a single, machine-actionable format.</p>
<h2 id="pratt">Pratt</h2>
<ul>
<li>unattested form cells contain an em dash <code>—</code></li>
<li>forms only found with a prefix but listed under the base verb are prefixed <code>-</code> (often still with breathing but sometimes inconsistently not)</li>
<li>alternative forms separated by <code>/</code><ul>
<li>active vs middle (this will be an important distinction in later posts)</li>
<li>different augment handling</li>
<li>stem alternatives</li>
<li>other spelling differences</li>
</ul>
</li>
<li>some single-letter spelling differences are just indicated with parenthetical letter (could be expanded to just use <code>/</code> as above)</li>
<li>aorists sometimes indicate the root in parentheses where it might not be predictable from the part (particularly useful later for inferring unaugmented stems, etc)</li>
<li>(rarely) section number with paradigm is referenced</li>
<li>(rarely) part-specific gloss is included</li>
<li>forms taken from another synonymous verb indicated by <code>*</code> (although not all suppletion indicated this way)</li>
</ul>
<h2 id="morwood">Morwood</h2>
<ul>
<li>includes seventh part for future passive</li>
<li>vowel lengths indicated</li>
<li>pre-contracted forms (especially in future) are shown in parentheses</li>
<li>rare forms are in <em>italics</em></li>
<li>forms only found with a prefix but listed under the base verb are prefixed <code>-</code> (not normally with breathing but one or two inconsistencies)</li>
<li>alternative forms separated by <code>,</code> (or on new line, see below)</li>
<li>imperfect form sometimes listed under aorist column (marked <code>impf.</code>)</li>
<li>specifically transitive or intransitive forms sometimes marked <code>(tr.)</code> or <code>(intr.)</code></li>
<li>because alternative lemmas get their own line, corresponding forms can be lined up</li>
<li>(rarely) page number references</li>
<li>(rarely) part-specific glosses</li>
<li>poetic spelling variants sometimes indicated</li>
</ul>
<h2 id="dcc-greek-core-list">DCC Greek Core List</h2>
<ul>
<li>unlike Pratt and Morwood, the parts are just a comma-separated list</li>
<li>missing forms are not indicated as such so sometimes fewer than six forms are listed; if there are gaps, the next form is sometimes annotated with which part it is (and sometimes it’s annotated even when it doesn’t need to be)</li>
<li>second aorists are annotated with <code>2 aor.</code></li>
<li>where there is a first and second aorist, they can both be given as separate, comma-separated parts (with first annotated as <code>1 aor.</code>)</li>
<li>non-standard parts are sometimes given (e.g. <code>impf.</code>, <code>infin.</code>, <code>ptc.</code>)</li>
<li>forms only found with a prefix but listed under the base verb are prefixed <code>-</code> (not normally with breathing)</li>
<li>occasionally further annotated in parentheses, e.g.:<ul>
<li><code>πειράω (usually mid. πειράομαι)</code></li>
<li><code>προστέθειμαι (but commonly προσκεῖμαι instead)</code></li>
<li><code>τέθειμαι (but usu. κεῖμαι instead)</code></li>
</ul>
</li>
<li>in a couple of cases forms are glossed (although inconsistently presented):<ul>
<li><code>pf. ἀπολώλεκα (“I have utterly destroyed”) or ἀπόλωλα (“I am undone”)</code></li>
<li><code>ἵστημι στήσω will set, ἔστησα set, caused to stand, 2 aor. ἔστην stood, ἕστηκα stand, plup. εἱστήκη stood, ἐστάθην stood</code> (note missing comma between first two parts)</li>
</ul>
</li>
<li>alternative forms just listed separated by <code>or</code></li>
</ul>
<h1><a href="http://jktauber.com/2016/06/22/merging-dcc-lemmas/">Merging the DCC Lemmas</a></h1>
<p>James Tauber, 2016-06-22</p>
<p>This is part 4 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and covers the Dickinson College Commentaries (DCC) Greek Core lemmas and issues in merging them with the existing merge of Pratt and Morwood.</p>
<p>It was relatively straightforward to merge in the lemmas from the DCC.</p>
<p>Of the 149 verb entries in the DCC, 111 matched exactly with an existing Pratt or Morwood lemma (after dropping vowel length from the latter, as the DCC doesn&rsquo;t include it).</p>
<p>The remaining 38 cases were simple and fell into one of nine categories:</p>
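<p>The lemma matching itself is mostly string equality once vowel length is stripped. A minimal sketch in Python (illustrative only, not the actual merge code):</p>

```python
import unicodedata

def strip_length(lemma):
    # Decompose, drop combining macron (U+0304) and breve (U+0306),
    # then recompose, so that Morwood's length-marked lemmas compare
    # equal to the length-free DCC and Pratt spellings.
    decomposed = unicodedata.normalize("NFD", lemma)
    stripped = "".join(ch for ch in decomposed if ch not in "\u0304\u0306")
    return unicodedata.normalize("NFC", stripped)

print(strip_length("ἀνᾱλίσκω"))  # ἀναλίσκω
print(strip_length("μῑ́γνῡμι"))   # μίγνυμι
```

<p>Matching a DCC lemma against the merged list then reduces to comparing it with <code>strip_length</code> applied to each candidate.</p>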
<h2 id="1-multiple-spellings-3">1. multiple spellings [3]</h2>
<p>There is only one case of multiple spellings in the DCC verbs (the first one below). In the other two cases, DCC only gives one of the spellings given by Pratt or Morwood.</p>
<ul>
<li>οἴομαι/οἶμαι is given as &ldquo;οἴομαι or οἶμαι&rdquo;</li>
<li>only σκοπέω of σκέπτομαι/σκοπέω is given</li>
<li>only μίγνυμι of μείγνυμι/μίγνυμι is given</li>
</ul>
<h2 id="2-difference-in-voice-4">2. difference in voice [4]</h2>
<ul>
<li>ἀποκρίνω is given, not ἀποκρίνομαι</li>
<li>πορεύω is given, not πορεύομαι</li>
<li>φοβέω is given, not φοβέομαι</li>
<li>πειράω is given as &ldquo;πειράω (usually mid. πειράομαι)&rdquo;</li>
</ul>
<h2 id="3-compounds-of-existing-base-also-in-dcc-17">3. compounds of existing base (also in DCC) [17]</h2>
<p>DCC focuses more on useful vocabulary than on useful principal parts in its choice of which verbs to include. In this sense it&rsquo;s the opposite of Morwood. As a result, it includes compounds where Pratt or Morwood would only include the base. In all the cases below, the DCC also includes the base (but those bases are all among the 111 lemmas matching exactly).</p>
<ul>
<li>ἀναιρέω, ἀφαιρέω</li>
<li>ὑπάρχω</li>
<li>συμβαίνω</li>
<li>ἀποδίδωμι, παραδίδωμι</li>
<li>πάρειμι</li>
<li>προσήκω</li>
<li>παρέχω</li>
<li>ἀποθνῄσκω</li>
<li>ἀφίημι</li>
<li>καθίστημι, προστίθημι</li>
<li>καταλαμβάνω, ὑπολαμβάνω</li>
<li>διαφέρω, συμφέρω</li>
</ul>
<p>ἀποδίδωμι here is somewhat debatable as we already have ἀποδίδομαι in Pratt and Morwood but only under πωλέω.</p>
<h2 id="4-compounds-of-existing-base-not-in-dcc-2">4. compounds of existing base (not in DCC) [2]</h2>
<p>In two cases, DCC has a compound whose base is already in Pratt or Morwood but not in DCC itself.</p>
<ul>
<li>ἀποκτείνω</li>
<li>ἀπαλλάσσω</li>
</ul>
<h2 id="5-compounds-where-other-compound-but-not-base-existed-1">5. compounds where other compound but not base existed [1]</h2>
<p>In one case, DCC has a compound whose base is not in Pratt, Morwood or DCC but another compound of the same base is.</p>
<ul>
<li>κατασκευάζω (no σκευάζω but παρασκευάζω existed)</li>
</ul>
<h2 id="6-compounds-with-no-base-existing-1">6. compounds with no base existing [1]</h2>
<p>And in one case, DCC has a compound where neither the base nor any other compound of that base is in Pratt, Morwood or DCC.</p>
<ul>
<li>κατηγορέω</li>
</ul>
<h2 id="7-vs-3">7. σσ vs ττ [3]</h2>
<p>DCC favours σσ over ττ (whereas Pratt and Morwood use the latter, although Morwood does have ἀλλάσσω alongside ἀλλάττω).</p>
<ul>
<li>πράσσω</li>
<li>τάσσω</li>
<li>φυλάσσω</li>
</ul>
<h2 id="8-words-appearing-under-different-entry-due-to-suppletion-3">8. words appearing under different entry due to suppletion [3]</h2>
<ul>
<li>δέδοικα (Pratt has under δείδω)</li>
<li>εἶδον (Pratt and Morwood have under ὁράω)</li>
<li>εἶμι (Pratt and Morwood have under ἔρχομαι)</li>
</ul>
<h2 id="9-completely-new-words-3">9. completely new words [4]</h2>
<p>These are unique to DCC.</p>
<ul>
<li>ἔρομαι</li>
<li>λαλέω</li>
<li>πολεμέω</li>
<li>ἔοικα</li>
</ul>
<h1><a href="http://jktauber.com/2016/06/21/merging-morwood-and-pratt-lemmas/">Merging the Morwood and Pratt Lemmas</a></h1>
<p>James Tauber, 2016-06-21</p>
<p>This is part 3 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and covers the Morwood lemmas and issues in merging them with Pratt&rsquo;s.</p>
<p>Like Pratt, Morwood conflates the lemma with the first principal part and similarly calls the relevant column “present”.</p>
<p>One of the first differences one notices is that Morwood’s principal parts list indicates vowel length. This is useful in many cases for the accentuation stage of my form generating code. That Morwood indicates length and Pratt doesn’t has at least two implications: (1) it means that any matching between the lists will have to strip length (not a big deal); (2) it raises the question of whether forms in Pratt but not Morwood should somehow be tagged as underspecified for length (perhaps to be later inferred from accentuation or looked up manually in other sources).</p>
<p>Like Pratt, Morwood indicates where a base form is used but a particular compound is more common. As we saw previously, Pratt does this by saying <code>αἰνέω {ἐπαινέω}</code>. Morwood, in turn, says <code>αἰνέω (ἐπ-)</code>. Each is fairly easily derivable from the other and whatever our own internal format will be, we should be able to reconstruct both the Pratt and Morwood display. However Morwood will sometimes include more than one preverb. For example <code>στέλλω (ἀπο-, ἐπι-)</code>. In this case Pratt just gives <code>στέλλω</code>.</p>
<p>Sometimes a single preverb will have alternative spellings (depending on assimilation) which Morwood indicates like <code>πίπλημι (ἐμ-/ἐν-)</code>.</p>
<p>One somewhat unusual feature of Morwood is it will group synonyms such as βιόω and ζάω, or πωλέω and ἀποδίδομαι. It still puts them on separate lines, though, which enables other parts to be correlated.</p>
<p>A similar approach is taken to spelling variations. In Morwood, these are:</p>
<ul>
<li>ἀλλάσσω and ἀλλάττω</li>
<li>ἁρμόττω and ἁρμόζω</li>
<li>κλαίω and κλᾱ́ω (the latter of which Morwood annotates with <code>(in prose)</code>)</li>
<li>αὐξάνω and αὔξω</li>
<li>μείγνῡμι and μῑ́γνῡμι</li>
<li>οἶμαι and οἴομαι</li>
</ul>
<p>each expressed as a pair of lines.</p>
<p>There are only two other things to note about Morwood’s first column: (1) where he groups βιόω and ζάω, the latter is inexplicably put in square brackets; (2) <em>italics</em> is occasionally used to indicate a form that is rare or non-attested. This is more often seen in parts other than the first but it does occur in the first part in Morwood’s second list in two cases: βλώσκω and δαρθάνω (κατα).</p>
<h2 id="matching-up-pratt-and-morwood">Matching up Pratt and Morwood</h2>
<p>There are 73 entries identical in lemma between Pratt and Morwood’s first list. There are 27 entries identical in lemma between Pratt and Morwood’s second list.</p>
<p>There are 14 entries where Morwood simply adds vowel length but otherwise the lemmas are the same (10 in first list, 4 in second).</p>
<p>In three cases the lemmas are in fact the same but the common compound is just formatted differently:</p>
<ul>
<li>αἰνέω {ἐπαινέω} vs αἰνέω (ἐπ-)</li>
<li>θνῄσκω {ἀποθνῄσκω} vs θνῄσκω (ἀπο-)</li>
<li>κτείνω {ἀποκτείνω} vs κτείνω (ἀπο-)</li>
</ul>
<p>Similarly in two cases, Pratt just adds the preverb analysis:</p>
<ul>
<li>[ἀνα]λίσκω vs ἀνᾱλίσκω</li>
<li>[ἀφ]ικνέομαι vs ἀφικνέομαι</li>
</ul>
<p>(although note ἀνᾱλίσκω also adds vowel length)</p>
<p>In one case, Pratt gives the common compound on the base entry but Morwood doesn&rsquo;t:</p>
<ul>
<li>ἵημι {ἀφίημι} vs ῑ̔́ημι</li>
</ul>
<p>(and Morwood adds vowel length)</p>
<p>In five cases, Pratt gives a compound with preverb analysis but Morwood has base (showing common preverb):</p>
<ul>
<li>[ἀν]οίγνυμι/[ἀν]οίγω vs οἴγνῡμι (ἀν-)</li>
<li>[ἀπ]όλλυμι vs ὄλλῡμι (ἀπ-)</li>
<li>[καθ]εύδω vs εὕδω (καθ-)</li>
<li>[κατα]δαρθάνω vs δαρθάνω (κατα)</li>
<li>[δια]φθείρω vs φθείρω (δια-)</li>
</ul>
<p>(although note Pratt also has φθείρω as separate entry; Morwood adds vowel length for οἴγνῡμι (ἀν-) and ὄλλῡμι (ἀπ-); Morwood doesn’t have the alternative ἀνοίγω for ἀνοίγνῡμι)</p>
<p>In three cases, Pratt gives the base (as does Morwood) but Morwood adds a common preverb:</p>
<ul>
<li>μιμνῄσκω vs μιμνῄσκω (ἀνα-)</li>
<li>πίμπλημι vs πίμπλημι (ἐμ-/ἐν-)</li>
<li>στέλλω vs στέλλω (ἀπο-/ἐπι-)</li>
</ul>
<p>φθείρω vs φθείρω (δια-) would be included here but Pratt separately has [δια]φθείρω.</p>
<p>Also, Pratt has an unmatched [ἀπο]κρίνομαι but Pratt and Morwood have a separate κρίνω and κρῑ́νω respectively.</p>
<p>In two cases, Pratt gives middle form but Morwood gives active form:</p>
<ul>
<li>μαίνομαι vs μαίνω</li>
<li>ψεύδομαι vs ψεύδω</li>
</ul>
<p>And in two cases, Morwood gives an indefinite form where Pratt gives 1st singular:</p>
<ul>
<li>δέω (2) vs δεῖ</li>
<li>μέλω vs μέλει</li>
</ul>
<p>There are 105 lemmas unique to Pratt (although this includes [δια]λέγομαι and [συλ]λέγω, which could be mapped to λέγω). Most of these entries appear to be regular and so, given Morwood’s focus on irregular verbs, it is not surprising there are omissions.</p>
<p>Morwood’s first list adds three new lemmas: ἀποδίδομαι (grouped under πωλέω with which it&rsquo;s suppletive in 3rd part), βιόω and χρή.</p>
<p>Morwood’s second list adds 42 new lemmas: ἄγνῡμι, αἰδέομαι, ἀλείφω, ἅλλομαι, ἁρμόττω / ἁρμόζω, βλώσκω, ἐξετάζω, ζεύγνῡμι, ζέω, καθαίρω, καλύπτω, κείρω, κεράννῡμι, κερδαίνω, κηρῡ́ττω, κρεμάννῡμι, νέω, ὄζω, ὀνίνημι, ὀρύττω, ὀσφραίνομαι, ὀφλισκάνω, παίω, περαίνω, πέρδομαι, πετάννῡμι (ἀνα-), πέτομαι, πήγνῡμι, πίμπρημι (ἐμ-/ἐν-), πνέω, σβέννῡμι, σκάπτω, σπάω, σπείρω, σπένδω, σφάλλω, τελέω, τήκω, ὑφαίνω, φείδομαι, χρῑ́ω, ὠθέω.</p>
<h2 id="concluding-thoughts">Concluding Thoughts</h2>
<p>Inclusion of vowel length and differences in how common compounds are shown are easy to handle in any model merging these two lists. If bases and compounds get individual entries containing their parts but are otherwise linked via additional properties, we get around those issues too.</p>
<p>However there remain four open issues to deal with:</p>
<ul>
<li>whether spelling differences that don&rsquo;t span all parts should get separate entries</li>
<li>how to handle one list giving form in active but another in middle</li>
<li>how to handle one list giving indefinite its own entry, the other putting it under the first person singular</li>
<li>situations where one list uses forms from one lexeme for some of the parts of another</li>
</ul>
<h1><a href="http://jktauber.com/2016/06/18/lemmas-pratt-principal-parts/">Lemmas in the Pratt Principal Parts</a></h1>
<p>James Tauber, 2016-06-18</p>
<p>This is part 2 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and covers the complexities in the notion of a lemma identifying lexical entries, specifically in the Pratt principal parts.</p>
<p>Before we get to the other principal parts beyond the first, there is a lot to be discussed just about the first part and its use as a lemma, identifying the lexical entry to which all the parts belong. In this post, we’ll start just looking at the presentation of lemmas in the Pratt list and in the next post move on to the other sources and the problems of merging multiple lists that may differ in choice of lemma for the same lexical entry.</p>
<p>The canonical lemma / first principal part is the present active (or middle) indicative first person singular of the verb but there are at least eight ways in which the first column in the Pratt principal parts table differs from this ideal.</p>
<h2 id="1-contract-verbs">1. Contract verbs</h2>
<p>The present active indicative first person singular of a contract verbs like ἀγαπάω is, of course, not ἀγαπάω but ἀγαπῶ. The pre-contract version is often used (and is indeed used by Pratt) in lemmas and the first principal part so the stem vowel is explicit (as it’s necessary for generating other forms).</p>
<h2 id="2-base-verbs-with-a-more-common-compound">2. Base Verbs With a More Common Compound</h2>
<p>Where a base verb gets its own entry but there is a more common compound, Pratt includes the latter in braces:</p>
<ul>
<li>αἰνέω {ἐπαινέω}</li>
<li>ἀπατάω {ἐξαπατάω}</li>
<li>θνῄσκω {ἀποθνῄσκω}</li>
<li>ἵημι {ἀφίημι}</li>
<li>κτείνω {ἀποκτείνω}</li>
</ul>
<p>Note that the other parts in this case are still given just for the base verb, even if that means they are not attested in Greek texts.</p>
<h2 id="3-compound-verbs">3. Compound Verbs</h2>
<p>In some cases only one compound verb gets an entry, but the preverb is indicated in square brackets:</p>
<ul>
<li>[ἀνα]λίσκω</li>
<li>[ἀν]οίγνυμι/[ἀν]οίγω</li>
<li>[ἀπ]αντάω</li>
<li>[ἀπο]κρίνομαι</li>
<li>[ἀπ]όλλυμι</li>
<li>[ἀπο]λογέομαι</li>
<li>[ἀφ]ικνέομαι</li>
<li>[δια]λέγομαι</li>
<li>[δια]νοέομαι</li>
<li>[δια]φθείρω</li>
<li>[δι]ηγέομαι</li>
<li>[ἐκ]πλήττω</li>
<li>[ἐπι]θυμέω</li>
<li>[ἐπι]μελ(έ)ομαι</li>
<li>[ἐπι]τηδεύω</li>
<li>[ἐπι]χειρέω</li>
<li>[καθ]εύδω</li>
<li>[καθ]ίζω</li>
<li>[κατα]δαρθάνω</li>
<li>[παρα]σκευάζω</li>
<li>[συλ]λέγω</li>
<li>[ὑπ]οπτεύω</li>
</ul>
<p>It seems that compound verbs whose base also serves other compounds don’t get their own entries at all in Pratt; the base verb is to be consulted in that case. This is one example where bringing in metadata from Major’s list is potentially useful, in making sure common compound verbs can easily be looked up under their base verb form.</p>
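<p>Decomposing the square-bracket notation is a one-liner; a sketch of a hypothetical helper (not Pratt's or my actual code):</p>

```python
import re

def split_preverb(entry):
    # "[ἀνα]λίσκω" -> ("ἀνα", "λίσκω"); entries without a
    # bracketed preverb come back as (None, entry).
    m = re.fullmatch(r"\[([^\]]+)\](.+)", entry)
    return (m.group(1), m.group(2)) if m else (None, entry)

print(split_preverb("[κατα]δαρθάνω"))  # ('κατα', 'δαρθάνω')
print(split_preverb("ἔρχομαι"))        # (None, 'ἔρχομαι')
```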
<h2 id="4-multiple-present-stems-conjoined-with-slashes">4. Multiple Present Stems Conjoined with Slashes</h2>
<p>In these cases there are multiple alternative present (or more properly imperfective) stems conjoined with a slash.</p>
<ul>
<li>[ἀν]οίγνυμι/[ἀν]οίγω</li>
<li>αὔξω/αὐξάνω</li>
<li>καίω/κάω</li>
<li>κλάω/κλαίω</li>
<li>μείγνυμι/μίγνυμι</li>
<li>οἴομαι/οἶμαι</li>
<li>σκέπτομαι/σκοπέω</li>
</ul>
<p>While these could arguably be treated as separate lemmas (and hence lexical entries) there are two arguments against doing this: (1) the two forms given are really just alternative spellings; (2) the lexical entries converge in other parts.</p>
<h2 id="5-homographs-that-differ-in-other-parts">5. Homographs That Differ In Other Parts</h2>
<p>δέω has two senses that, while identical in form in the first part, differ in other parts.</p>
<h2 id="6-spelling-differences-with-optional-letter-in-parentheses">6. Spelling Differences with Optional Letter in Parentheses</h2>
<p>There are two cases where an optional epsilon is given in parentheses:</p>
<ul>
<li>[ἐπι]μελ(έ)ομαι</li>
<li>οἰκτ(ε)ίρω</li>
</ul>
<p>In some cases the spelling alternative continues into other parts.</p>
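<p>Expanding the parenthesized optional letter into its two concrete spellings could be sketched as below. Note this naive expansion doesn't handle the accent shift in cases like [ἐπι]μελ(έ)ομαι, where dropping the ε also moves the accent, so it is only an illustration:</p>

```python
import re

def expand_optional(entry):
    # "οἰκτ(ε)ίρω" -> ["οἰκτείρω", "οἰκτίρω"]; entries without
    # parentheses come back as a single-element list.
    m = re.fullmatch(r"(.*)\((.+?)\)(.*)", entry)
    if not m:
        return [entry]
    before, optional, after = m.groups()
    return [before + optional + after, before + after]

print(expand_optional("οἰκτ(ε)ίρω"))  # ['οἰκτείρω', 'οἰκτίρω']
```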
<h2 id="7-lexemes-where-other-lexemes-are-merged-in-for-other-parts">7. Lexemes Where Other Lexemes are Merged In for Other Parts</h2>
<p>These aren’t marked in the lemma itself but I’ve included them here as they represent a particular choice of lemma to group parts under. The actual parts from other lexemes are indicated by an asterisk in Pratt. Note that this is not the same as suppletion although arguably there is a fine line worth exploring in more detail at some point.</p>
<ul>
<li>ἔρχομαι</li>
<li>ἐρωτάω</li>
<li>ἐσθίω</li>
<li>λέγω</li>
<li>πωλέω</li>
<li>ὠνέομαι</li>
</ul>
<h2 id="8-lexemes-without-an-imperfective-stem">8. Lexemes Without An Imperfective Stem</h2>
<p>Some words like οἶδα have a lemma which is from a part other than the first. While in some cases when this happens, the lexeme has been merged with another (see 7), this category covers the case where it hasn’t been.</p>
<h2 id="concluding-thoughts">Concluding Thoughts</h2>
<p>We’ll see further issues when we look at the other lists and how to merge them but for now let’s discuss possible solutions to the issues seen already.</p>
<p>It is important to note that the information in the first column of the Pratt principal parts table (headed “present”) in the book is serving a number of distinct purposes:</p>
<ul>
<li>providing an identifier for the entire row (what could properly be called the “lemma”)</li>
<li>providing the first principal part (and hence the present / imperfective stem)</li>
<li>providing additional information about the lexeme such as its preverb / base</li>
</ul>
<p>By separating these out we have a much clearer way forward. The lemma proper can really be any unique identifier and it can be treated completely opaquely. The first principal part (or parts when there is more than one under a single lemma) can be a separate field. Finally, information such as preverb / base decomposition can be expressed in yet further separate fields. This keeps the first principal part free of extra characters and the lemma opaque.</p>
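<p>The separation proposed above might be sketched as a record with distinct fields (the names and structure here are purely illustrative, not a committed schema):</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LexicalEntry:
    lemma_id: str                  # opaque unique identifier for the row
    first_parts: List[str]         # one or more first principal parts
    preverb: Optional[str] = None  # preverb/base decomposition, if compound
    base: Optional[str] = None
    common_compounds: List[str] = field(default_factory=list)

# The Pratt entry "αἰνέω {ἐπαινέω}" then decomposes cleanly:
entry = LexicalEntry(
    lemma_id="αἰνέω",
    first_parts=["αἰνέω"],
    common_compounds=["ἐπαινέω"],
)
print(entry.first_parts)  # ['αἰνέω']
```

<p>The first principal part stays free of extra characters, and display formats like Pratt's braces or Morwood's parenthesized preverbs become rendering decisions rather than data.</p>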
<h1><a href="http://jktauber.com/2016/06/18/sources-principal-part-lists/">Sources of Principal Part Lists</a></h1>
<p>James Tauber, 2016-06-18</p>
<p>This is part 1 of a series of blog posts about <a href="http://jktauber.com/2016/06/17/modelling-stems-and-principal-part-lists/">modelling stems and principal part lists</a> and covers the three sources of Attic Greek principal parts used to expand and test the <em>Morphological Lexicon</em>.</p>
<p>Because Louise Pratt’s <em>The Essentials of Greek Grammar</em> was the basis for testing a lot of paradigms, it made sense to use it as the starting point for Attic Greek principal parts as well. Pratt lists the principal parts (the standard six, i.e. not separating out the so-called “future passive”) for 247 verbs. She gives no reason for her particular choice of verbs other than their being &ldquo;common Attic Verbs&rdquo;.</p>
<p>The second source is James Morwood’s <em>Oxford Grammar of Classical Greek</em>. Morwood has two lists, one of &ldquo;Top 101 irregular verbs&rdquo; and one of (81) &ldquo;More principal parts&rdquo;. The title of the first list suggests common verbs are omitted if regular. I have included both lists (although I can treat them separately). Morwood includes a seventh part for the “future passive” (when and why this is useful is worthy of a separate blog post).</p>
<p>For my third source I used Chris Francese’s principal parts in the wonderful <a href="http://dcc.dickinson.edu/greek-core-list">DCC Greek Core Vocabulary list</a>. The DCC core vocabulary consists of 500 common words of which 151 are verbs. </p>
<p>All three lists included the occasional form outside the usual six or seven principal parts and a future post in this blog series will address the modelling of that.</p>
<p>The DCC principal parts were in electronic form and so were relatively easy to deal with (although I’ll discuss specifics in a later post). I did not have the Pratt and Morwood lists in electronic form, so I manually keyed them in over the course of a few weeks (mostly in Vienna earlier this year).</p>
<p>I have also referred at times to Wilfred Major’s 80% list (discussed <a href="http://jktauber.com/2015/10/30/core-vocabulary-new-testament-greek/">elsewhere</a> on this blog) but, as it doesn’t contain principal parts, it was more of a reference for lemma choice and additional metadata than an input for testing part generation itself.</p>
<p>Of course many other lists could be included but these three are sufficient to establish most of the modelling issues and ensure the code works correctly. Data from other lists can be incorporated later relatively easily.</p>
<h1><a href="http://jktauber.com/2016/05/19/pyuca-published-journal-open-source-software/">pyuca Published in The Journal of Open Source Software</a></h1>
<p>James Tauber, 2016-05-19</p>
<p>A research career requires publication in peer-reviewed journals but what if some of your scholarly output is in the form of software? The Journal of Open Source Software attempts to solve that by essentially wrapping peer-reviewed software packages up as lightweight papers. My pyuca library was just accepted for publication by the journal.</p>
<p><a href="https://github.com/jtauber/pyuca">pyuca</a> is a Python implementation of the Unicode Collation Algorithm and is a vital part of most of my Greek work because it lets me properly sort Greek words. It&rsquo;s not limited to Greek, though, and the library is potentially useful for anyone doing text processing using Python on natural languages other than English.</p>
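<p>pyuca's documented usage is simply a key function: <code>from pyuca import Collator</code>, then <code>sorted(words, key=Collator().sort_key)</code>. To see why default codepoint order fails for polytonic Greek, here is a stdlib-only illustration with a crude base-letter key (a rough stand-in for the full UCA weighting pyuca provides, not a replacement for it):</p>

```python
import unicodedata

def base_key(word):
    # Drop combining marks after decomposition so accents and
    # breathings don't dominate the comparison.
    return "".join(
        ch for ch in unicodedata.normalize("NFD", word)
        if not unicodedata.combining(ch)
    )

words = ["ὥρα", "ἀγάπη", "λόγος", "ἄνθρωπος"]
print(sorted(words))                # codepoint order puts λόγος first: wrong
print(sorted(words, key=base_key))  # ἀγάπη, ἄνθρωπος, λόγος, ὥρα
```

<p>Unaccented λ (U+03BB) sorts below every precomposed accented vowel in the U+1F00 block, which is why naive sorting scrambles a polytonic word list.</p>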
<p>pyuca has always been citable in an ad-hoc fashion, but thanks to publication in <a href="http://joss.theoj.org/about">The Journal of Open Source Software</a>, it can now be cited as a peer-reviewed journal article.</p>
<p>The submission process was straightforward. I dug up an <a href="https://en.wikipedia.org/wiki/ORCID">ORCID</a> (a persistent identifier for researchers) I&rsquo;d acquired a while ago but never used and set up my GitHub repo on <a href="https://zenodo.org">Zenodo</a> so a <a href="https://en.wikipedia.org/wiki/Digital_object_identifier">Digital Object Identifier</a> (DOI) gets minted for each release.</p>
<p>I then added a specially-formatted <a href="https://github.com/jtauber/pyuca/blob/master/paper.md">paper.md</a> file to the repo (including my ORCID, abstract about the software and any references) and submitted the repo for consideration.</p>
<p>JOSS reviews are done openly using GitHub issues. A reviewer stepped up and gave some excellent feedback on the usage example in my README and on adding contributor guidelines. Once I&rsquo;d addressed that feedback, the paper was accepted by the reviewer and the editor-in-chief and a new DOI was minted for the paper itself.</p>
<p>I also got a notification from ORCID that <a href="http://crossref.org">Crossref</a> had found a new work to be added to my ORCID record.</p>
<p>Of course, I could at some point write an article <em>about</em> pyuca but an article about software is not the same as the software itself (they would likely have quite different audiences) and so citing an article about particular software is not the same as citing the software itself. Thanks to JOSS, the distinction can be maintained while still keeping within a framework of peer-reviewed journal articles.</p>
<p>I&rsquo;m particularly excited that JOSS accepted software with a digital humanities application rather than their typical scientific computing applications.</p>
<p>So if you publish a work that made use of pyuca, you can now cite it as:</p>
<p>Tauber, J. K. (2016). pyuca: a Python implementation of the Unicode Collation Algorithm. The Journal of Open Source Software. DOI: 10.21105/joss.00021</p>
<h1><a href="http://jktauber.com/2016/05/04/varros-four-parts-speech-latin/">Varro’s Four Parts of Speech for Latin</a></h1>
<p>James Tauber, 2016-05-04</p>
<p>In my post <a href="http://jktauber.com/2015/11/05/morphological-parts-speech-greek/">Morphological Parts of Speech in Greek</a> last year, I presented a model of five or six parts of speech based purely on what they inflect for. I just found out Varro suggested similar for Latin over two thousand years ago.</p>
<p>In his article <cite>Dionysius Thrax vs Marcus Varro</cite> in Historiographia Linguistica 17:1-2 (1990), Daniel Taylor argues for the greater significance of Varro over Thrax in the history of Greco-Roman linguistics.</p>
<p>I actually started reading the article for comparisons made with Theodosius but his description of Varro&rsquo;s parts of speech caught my eye. After introducing Thrax&rsquo;s list of eight parts of speech for Greek (noun, verb, participle, article, pronoun, preposition, adverb, and conjunction) which has dominated since, he describes Varro&rsquo;s for Latin:</p>
<blockquote>
<p>His definitions are exclusively grammatical, and there are but four parts of speech: one with case, one with tense, one with both, one with neither.</p>
</blockquote>
<p>This results in a similar division to the first table in my <a href="http://jktauber.com/2015/11/05/morphological-parts-speech-greek/">earlier blog post</a>, although it conflates infinitives and finite verbs (which Thrax does as well).</p>
<p>It&rsquo;s certainly appealing as an initial taxonomy of parts of speech, for Greek as well as Latin.</p>
<h1><a href="http://jktauber.com/2016/05/01/inflexion-code-morphological-generation-parsing/">Inflexion: Generic Code for Morphological Generation and Parsing</a></h1>
<p>James Tauber, 2016-05-01</p>
<p>Over the last few years, I&rsquo;ve worked on a number of iterations of code that can generate Ancient Greek verb forms. I&rsquo;ve now broken out the Greek-specific pieces and released a generic library called <strong>inflexion</strong>.</p>
<p>There&rsquo;s nothing particularly innovative about the approach from a computational morphology point of view: it just uses a stem database combined with a list of endings including sandhi rules. I talked a bit about the endings / sandhi rules in <a href="http://jktauber.com/2015/11/22/morphological-lexicon-new-testament-greek-slides/">my SBL talk last year</a>.</p>
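<p>The approach can be illustrated with a toy stem database, ending list, and one sandhi rule (purely illustrative; not the inflexion library's actual API or data, and accentuation is ignored here since the real system handles it separately):</p>

```python
# Toy stem database: (lemma, tense/voice) -> stem.
STEMS = {
    ("λύω", "PA"): "λυ",      # present active stem
    ("γράφω", "FA"): "γραφ",  # future active stem
}

# Endings keyed by tense/voice, then person/number.
ENDINGS = {
    "PA": {"1SG": "ω", "2SG": "εις", "3SG": "ει"},
    "FA": {"1SG": "σω", "2SG": "σεις", "3SG": "σει"},
}

def apply_sandhi(stem, ending):
    # One toy sandhi rule: labial stop + σ contracts to ψ,
    # e.g. γραφ + σω -> γραψω (unaccented).
    if stem[-1] in "πβφ" and ending.startswith("σ"):
        return stem[:-1] + "ψ" + ending[1:]
    return stem + ending

def generate(lemma, tv, person_number):
    return apply_sandhi(STEMS[(lemma, tv)], ENDINGS[tv][person_number])

print(generate("λύω", "PA", "3SG"))    # λυει
print(generate("γράφω", "FA", "1SG"))  # γραψω
```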
<p>It takes a very practical approach, though, and, with a suitable stem database, ending / sandhi rules and accentuation code (all of which I&rsquo;m releasing separately shortly) it can currently generate every single verb form in Louise Pratt&rsquo;s intermediate grammar, on Helma Dik&rsquo;s Greek verb handouts and in Andrew Keller &amp; Stephanie Russell&rsquo;s beginner-intermediate text book.</p>
<p>There&rsquo;s some support for parsing forms if the stem is known and I&rsquo;ll soon be working on support for when the necessary stem is not yet in the database. There&rsquo;s not yet any notion of stems being related and that will be a big part of future work which might be more interesting from a computational morphology point of view.</p>
<p>In a way, the real power (or &ldquo;knowledge&rdquo;) is in the pieces not included in this library itself but I wanted to break out the generic code partly in case other people wanted to use it for other inflected languages but mostly just to keep my own code more modular.</p>
<p>The GitHub repo is <a href="https://github.com/jtauber/inflexion">https://github.com/jtauber/inflexion</a> and example-based <a href="https://github.com/jtauber/inflexion/blob/master/docs.rst">documentation</a> is available.</p>
<p>Stay tuned for new releases of the inflexion library but also the stem database, ending / sandhi rules and accentuation code that are specific to Greek.</p>
<h1><a href="http://jktauber.com/2016/02/19/17th-international-morphology-meeting/">17th International Morphology Meeting</a></h1>
<p>James Tauber, 2016-02-19</p>
<p>I&rsquo;m currently in Vienna for the <a href="https://www.wu.ac.at/en/home/imm17/">International Morphology Meeting</a>.</p>
<p>It&rsquo;s been quite an adventure to get here, which you can read about <a href="https://thoughtstreams.io/jtauber/lost-passport-adventure-2016/">elsewhere</a>.</p>
<p>If four days full of morphology weren&rsquo;t enough, there are workshops specifically on computational methods and discriminative approaches, both of which are obviously of huge interest to me.</p>
<p>I&rsquo;m also hoping to catch up with Jim Blevins who is a sort of undergraduate version of a Doktorvater to me.</p>
<p>I&rsquo;m sure in the coming months you&rsquo;ll see a lot on this blog the seeds of which will have been sown at this conference.</p>
<p>(and yes, that was a legitimate use of the future perfect)</p>
<h1><a href="http://jktauber.com/2016/01/18/greek-utils-01-released/">greek-utils 0.1 Released</a></h1>
<p>James Tauber, 2016-01-18</p>
<p>While I write and release a lot of Python code for working with Ancient Greek, it tends to be either throwaway code for data wrangling or fairly specialized code for things like accentuation or inflectional morphology.</p>
<p>I decided there needed to be a place to put lightweight utilities that can be used by a range of different projects. This is the motivation for <code>greek-utils</code>.</p>
<p>The initial 0.1 release of <code>greek-utils</code> just provides the following features:</p>
<ul>
<li>Convert BetaCode to Unicode</li>
<li>Turn an iterable into a generator over trigrams</li>
<li>A Trie data structure</li>
<li>MorphGNT BCV string to human-readable verse reference</li>
</ul>
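<p>The trigram utility, for instance, can be sketched in a few lines (an illustrative reimplementation, not the library's actual code):</p>

```python
from itertools import tee

def trigrams(iterable):
    # Turn any iterable into a generator over consecutive triples
    # by running three offset iterators in parallel.
    a, b, c = tee(iterable, 3)
    next(b, None)
    next(c, None)
    next(c, None)
    return zip(a, b, c)

print(list(trigrams("ΛΟΓΟΣ")))
# [('Λ', 'Ο', 'Γ'), ('Ο', 'Γ', 'Ο'), ('Γ', 'Ο', 'Σ')]
```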
<p><code>greek-utils</code> is pip installable and the repo is at</p>
<blockquote>
<p><a href="https://github.com/jtauber/greek-utils">https://github.com/jtauber/greek-utils</a></p>
</blockquote>
<p>Full documentation is included there.</p>
<p>I&rsquo;ll be moving a lot more out of gists and individual project repos over the coming months.</p>
<h1><a href="http://jktauber.com/2016/01/17/direct-speech-capitalization-first-preceding-head/">Direct Speech Capitalization and the First Preceding Head</a></h1>
<p>James Tauber, 2016-01-17</p>
<p>As part of my explicit annotation of the normalization column in MorphGNT, I started down the rabbit hole of capitalization conventions which led to an interesting experiment with direct speech and the GBI syntax trees.</p>
<p>Back in <a href="http://jktauber.com/2015/11/27/annotating-normalization-column-morphgnt-part-1/">Annotating the Normalization Column in MorphGNT: Part 1</a>, I talked about wanting to catalogue the reasons why a word in the text differs from the normalized form, and annotate the text on a per-case basis. One difference mentioned was capitalization.</p>
<p>In Greek texts printed nowadays, there are three reasons why a word might start with an uppercase letter:</p>
<ul>
<li>it&rsquo;s a proper noun</li>
<li>it&rsquo;s the start of a paragraph</li>
<li>it&rsquo;s the start of direct speech</li>
</ul>
<p>So I obviously want to be able to say explicitly, in each case, which it is (of course, it could be more than one or even all three).</p>
<p>The heuristic for the proper nouns is easy if you actually have tagged the proper nouns or lemmatized the text (although there are some inconsistencies as I&rsquo;ve already mentioned which need to get cleaned up in MorphGNT).</p>
<p>The start-of-paragraph heuristic should be straightforward as the electronic SBLGNT text has paragraphs indicated, but there are some oddities I&rsquo;m looking at (including 30 cases where a word after a paragraph break is not capitalized, some of which are inconsistencies in the SBLGNT itself).</p>
<p>The direct speech is most interesting. I started by assuming that, if the lemma isn&rsquo;t capitalized and the word isn&rsquo;t at the start of a paragraph, it must be the start of direct speech. There are 2,225 cases of this in the SBLGNT text underlying the MorphGNT.</p>
<p>Then I implemented a little heuristic where I traversed up the heads from the start of the direct speech (using the dependency version of the GBI Syntax Trees) until hitting a word that preceded the direct speech. Let&rsquo;s call that the <strong>first preceding head</strong>.</p>
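<p>The traversal itself is straightforward. Here is a minimal sketch over a toy dependency analysis (the example sentence, token ids and <code>heads</code> mapping are all invented for illustration; token ids are positions in the text and the root maps to <code>None</code>):</p>

```python
def first_preceding_head(heads, start):
    """Walk up the chain of heads from the token at position `start`
    (the first word of the direct speech) until reaching a token that
    precedes it in the text; return None if there is no such head."""
    node = start
    while node is not None:
        node = heads.get(node)
        if node is not None and node < start:
            return node
    return None

# toy analysis of:  he answered saying Blessed are you
# positions:        0  1        2      3       4   5
# the direct speech starts at position 3 ("Blessed", headed by "are")
heads = {0: 1, 1: None, 2: 1, 3: 4, 4: 2, 5: 4}
print(first_preceding_head(heads, 3))  # 3 -> 4 -> 2; position 2 ("saying") precedes 3
```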
<p>My hypothesis was that the first preceding head would be some verb of communication (saying, writing, etc). In theory one might also expect a complementizer but the GBI Syntax Trees don&rsquo;t treat complementizers as heads so they don&rsquo;t come up in practice.</p>
<p>In 1,641 instances, the first preceding head was a form of λέγω. In much rarer instances (no lexeme with more than 64 instances) there were other verbs like γράφω, ἀποκρίνομαι, φημί, ἐπερωτάω, or κράζω.</p>
<p>In some cases the first preceding head was clearly not a verb of communication (and often not a verb at all). Going through the first half of Matthew so far, here are the explanations I&rsquo;ve discovered:</p>
<ul>
<li>in Matt 6.31, three instances of direct speech are disjoined and the GBI Trees model disjunction in such a way the second and third instance are linked to the first rather than the actual verb of communication, λέγοντες</li>
<li>in Matt 8.9, the verb of communication is elided in the second and third cases so the GBI Tree attaches the direct speech elsewhere</li>
<li>Matt 9.13 has &ldquo;μάθετε τί ἐστιν&rdquo; and Matt 12.7 has &ldquo;εἰ δὲ ἐγνώκειτε τί ἐστιν&rdquo; and the GBI Trees end up hanging the direct speech (or &ldquo;meaning&rdquo;) off τί</li>
</ul>
<p>There were 118 cases in the entire text where there was no first preceding head. Going through the first half of Matthew again, the majority of these are cases where there is no direct speech but a word has been capitalized without an actual paragraph break. However, there are a couple of other interesting scenarios:</p>
<ul>
<li>in Matt 11.21, we might expect ἤρξατο ὀνειδίζειν to be linked to the direct speech with a participle of saying but none is provided</li>
<li>similarly in Matt 13.33, there is direct speech but no participle linking to ἐλάλησεν</li>
</ul>
<p>My plan is to go through the rest of the text and describe all the scenarios, but as this is somewhat of an unexpected rabbit hole, it might take me a while.</p>
<p>If anyone is interested in a raw dump of the data with my explanations (covered above) so far, see <a href="https://gist.github.com/jtauber/39d85cff34c71a2df169">https://gist.github.com/jtauber/39d85cff34c71a2df169</a>.</p>
<h2><a href="http://jktauber.com/2016/01/16/morphgnt-607-released/">MorphGNT 6.07 Released</a></h2>
<p>2016-01-16, James Tauber</p>
<p>The latest release of MorphGNT (with a corresponding release of the Python library py-sblgnt) fixes some lemmatization issues along with a couple of accent and part-of-speech changes.</p>
<ul>
<li>use acute at end of sentence in Luke 10.38</li>
<li>use ἄγω as lemma of ἄγε per issue #39</li>
<li>use ἱερός lemma in all situations per issue #36</li>
<li>fix accent in συνίημι lemma in Acts 28.26 per issue #37</li>
<li>fixed θαρσέω lemmas where forms use ρσ as well per issue #38</li>
<li>fixed προώρισε(ν) lemma in Acts 4.28 per issue #40</li>
<li>elaborated on part of speech and parsing codes in README</li>
<li>corrected lemmatization of ἤρχοντο in John 4.30 per issue #41</li>
<li>changed μακράν to adverb when lemma is μακράν per issue #33</li>
<li>changed lemma for ἔδει to δέω per issue #24</li>
</ul>
<p>Thanks Scott Fleischmann, Ulrik Sandborg-Petersen and Emma Ehrhardt.</p>
<p>MorphGNT is available at <a href="https://github.com/morphgnt/sblgnt">https://github.com/morphgnt/sblgnt</a> and all issues should be filed there.</p>
<p><a href="https://github.com/morphgnt/py-sblgnt">py-sblgnt</a> 0.5 is now available on PyPI for those wanting to access MorphGNT via a pip-installable Python API.</p>
<h2><a href="http://jktauber.com/2016/01/13/gouin-language-learning/">Gouin on Language Learning</a></h2>
<p>2016-01-13, James Tauber</p>
<p>I recently found out about François Gouin, a sort of proto-Charles Berlitz who wrote (in French) a book called <em>The art of teaching and studying languages</em>, published in 1880 and then translated and published in English in 1892.</p>
<p>I&rsquo;ve only skimmed the book so far but it looks like it contains some real gems relating to the teaching of Greek.</p>
<p>Gouin was a classics professor who attempted to learn German initially using the grammar-translation method used for Latin and Greek. The beginning of the book recounts what an utter failure it was and it&rsquo;s quite an amusing read with section headings such as &ldquo;An attempt at conversation—Disgust and fatigue—Reading and translation, their worthlessness demonstrated&rdquo;.</p>
<p>After observing three-year-olds playing with language, the light bulb went on and he developed his Series Method, described in the bulk of the rest of the book.</p>
<p>He ends the book discussing implications of his findings for the teaching of Greek and Latin. Again, I haven&rsquo;t read in detail but I did enjoy his scathing remarks about the uselessness of dictionaries for learning a language and his bafflement at the fact students can spend 12 years learning Latin at school and still know nowhere near what someone learning German for six months under his method would know.</p>
<p>If you&rsquo;re interested in the history of second language teaching with particular reference to Latin and Greek, the book might be worth checking out. It&rsquo;s available at <a href="https://archive.org/details/artofteachingstu00gouirich">Internet Archive</a>.</p>
<h2><a href="http://jktauber.com/2016/01/06/linguistic-society-americas-90th-annual-meeting/">Off to the Linguistic Society of America’s 90th Annual Meeting</a></h2>
<p>2016-01-06, James Tauber</p>
<p>I&rsquo;m heading off to the LSA&rsquo;s annual meeting for the first time.</p>
<p>This morning my twitter timeline was filled with classicists heading off to the SCS annual meeting (okay, maybe not filled, but there were three or four). I must follow more classicists than linguists because I didn&rsquo;t see anyone tweeting about heading off to Washington DC for the LSA annual meeting.</p>
<p>The fact they are on at the same time on different sides of the country doesn&rsquo;t exactly help cross-disciplinary collaboration and for a brief moment I wondered which to go to. It was actually an easy choice. I&rsquo;m far more of a linguist than a classicist, even though most of my linguistics for the last twenty-two years has been Ancient Greek related. A quick look at the programmes of each conference reassured me I&rsquo;d made the right decision.</p>
<p>I don&rsquo;t yet know if anyone I personally know will be there, which always makes conferences awkward for me. I&rsquo;m also sitting an exam being proctored at a local university on Monday which I need to spend a decent amount of time studying for.</p>
<p>That exam is actually the main reason I haven&rsquo;t blogged much since SBL. That will hopefully change next week when I&rsquo;m done!</p>
<h2><a href="http://jktauber.com/2015/12/15/functional-dependency-morphgnt-table/">Functional Dependency in the MorphGNT Table</a></h2>
<p>2015-12-15, James Tauber</p>
<p>Often it&rsquo;s useful to see whether certain columns in a table can be entirely determined by others. For example, can you unambiguously get the lemma from just the form? (The answer is no, so a more useful question is which forms are ambiguous as to lemma.) Does knowing the part-of-speech help? Here we provide some code and give some examples.</p>
<p>At the end I provide the script used.</p>
<p>Run in the same directory as the MorphGNT SBLGNT, it runs like this:</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py <span class="m">6</span> 7
45
</pre></div>
<p>What this is telling us is that there are 45 times where the value of column 6 (the normalized form) gives us <em>multiple</em> possible values for column 7 (the lemma). In relational database terms we say that column 7 is not <strong>functionally dependent</strong> on, or not <strong>functionally determined</strong> by, column 6 because of those 45 cases.</p>
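<p>The core of such a check is just a grouping of rows. A minimal sketch of the idea (this is not the actual <code>dep.py</code>, which is given at the end of the post; the toy rows here are invented):</p>

```python
from collections import defaultdict

def violations(rows, determinant_cols, dependent_cols):
    """Return the determinant values that map to more than one dependent
    value, i.e. the cases preventing a functional dependency."""
    mapping = defaultdict(set)
    for row in rows:
        key = tuple(row[i] for i in determinant_cols)
        mapping[key].add(tuple(row[i] for i in dependent_cols))
    return {k: v for k, v in mapping.items() if len(v) > 1}

# toy (normalized form, lemma) rows
rows = [
    ("καλῶν", "καλός"),
    ("καλῶν", "καλέω"),
    ("λόγος", "λόγος"),
]
print(violations(rows, [0], [1]))  # καλῶν maps to both καλός and καλέω
```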
<p>If you run:</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py -v <span class="m">6</span> 7
</pre></div>
<p>it will actually list all 45, starting with something like:</p>
<div class="codehilite"><pre>ἄμωμον {&#39;ἄμωμος&#39;, &#39;ἄμωμον&#39;}
ἴδε {&#39;ἴδε&#39;, &#39;ὁράω&#39;}
ὑποταγῇ {&#39;ὑποταγή&#39;, &#39;ὑποτάσσω&#39;}
καλῶν {&#39;καλός&#39;, &#39;καλέω&#39;}
Ἰουδαίας {&#39;Ἰουδαῖος&#39;, &#39;Ἰουδαία&#39;}
...
</pre></div>
<p>You can also give more than one column for either the determinant or dependent.</p>
<p>For example, does knowing the form AND part-of-speech determine the lemma?</p>
<p>Turns out there are only 8 exceptions in the current MorphGNT/SBLGNT:</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py -v 6,2 7
Ἅννα N- <span class="o">{</span><span class="s1">&#39;Ἅννα&#39;</span>, <span class="s1">&#39;Ἅννας&#39;</span><span class="o">}</span>
ἀνώτερον A- <span class="o">{</span><span class="s1">&#39;ἀνώτερος&#39;</span>, <span class="s1">&#39;ἀνώτερον&#39;</span><span class="o">}</span>
ἀλάβαστρον N- <span class="o">{</span><span class="s1">&#39;ἀλάβαστρος&#39;</span>, <span class="s1">&#39;ἀλάβαστρον&#39;</span><span class="o">}</span>
χρυσᾶ A- <span class="o">{</span><span class="s1">&#39;χρύσεος&#39;</span>, <span class="s1">&#39;χρυσοῦς&#39;</span><span class="o">}</span>
μακράν A- <span class="o">{</span><span class="s1">&#39;μακράν&#39;</span>, <span class="s1">&#39;μακρός&#39;</span><span class="o">}</span>
ὕστερον A- <span class="o">{</span><span class="s1">&#39;ὕστερον&#39;</span>, <span class="s1">&#39;ὕστερος&#39;</span><span class="o">}</span>
ταχύ A- <span class="o">{</span><span class="s1">&#39;ταχύ&#39;</span>, <span class="s1">&#39;ταχύς&#39;</span><span class="o">}</span>
ἤρχοντο V- <span class="o">{</span><span class="s1">&#39;ἄρχω&#39;</span>, <span class="s1">&#39;ἔρχομαι&#39;</span><span class="o">}</span>
8
</pre></div>
<p>There are other things that can be explored with this. How many lemmas have more than one part-of-speech in the MorphGNT/SBLGNT?</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py <span class="m">7</span> 2
70
</pre></div>
<p>How many forms have more than one parse analysis extant in the text, even if you know the lemma and part-of-speech:</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py 6,7,2 3
903
</pre></div>
<p>Given a lemma, part-of-speech and parse analysis, how many cases are there where multiple alternative forms are seen:</p>
<div class="codehilite"><pre><span class="nv">$ </span>./dep.py 7,2,3 6
132
</pre></div>
<p>Looking at these with the <code>-v</code> option, you can see some are unavoidable:</p>
<div class="codehilite"><pre>ὁράω V- 1AAI-P-- {&#39;εἴδομεν&#39;, &#39;εἴδαμεν&#39;}
κλείς N- ----APF- {&#39;κλεῖς&#39;, &#39;κλεῖδας&#39;}
</pre></div>
<p>whereas others are likely corrections that need to be made to the lemmatization:</p>
<div class="codehilite"><pre>τις RI ----GSM- {&#39;τινος&#39;, &#39;τινός&#39;}
</pre></div>
<p>The most recent set of corrections to MorphGNT/SBLGNT (which will be in release 6.07) stem from this sort of analysis.</p>
<p>There are still more to discuss and resolve, however. See <a href="https://github.com/morphgnt/sblgnt/issues/32">https://github.com/morphgnt/sblgnt/issues/32</a> and other issues on GitHub for details and to help in the discussion.</p>
<h2 id="the-script">The script</h2>
<script src="https://gist.github.com/jtauber/ab691a5552d97a8c40c2.js"></script>
<h2><a href="http://jktauber.com/2015/12/15/new-numbering-system-greek-new-testament-lexemes/">A (Not So) New Numbering System for Greek New Testament Lexemes</a></h2>
<p>2015-12-15, James Tauber</p>
<p>Ten years ago, when Ulrik Sandborg-Petersen and I started collaborating, we came up with a way of referencing lexemes that would satisfy both the lumpers and splitters. At the time we wrote a paper that we circulated to a small audience but now it&rsquo;s finally up on Academia.edu.</p>
<p>The 2006 unpublished paper is entitled <a href="https://www.academia.edu/19660777/A_New_Numbering_System_for_Greek_New_Testament_Lexemes_2006_">A New Numbering System for Greek New Testament Lexemes</a>.</p>
<p>Here&rsquo;s the abstract:</p>
<blockquote>
<p>Numbering systems (such as Strong’s) are a popular way to reference the lexemes of the Greek New Testament corpus but a straight enumeration is not without problems, particularly when there is disagreement about whether two forms are the same lexeme or not. We present a way of referencing lexemes that allows competing viewpoints to be represented simultaneously. Existing numbering systems can be mapped into this new system without any loss of granularity and new analyses can be expressed without violating the integrity of existing references into the system.</p>
</blockquote>
<h2><a href="http://jktauber.com/2015/11/27/annotating-normalization-column-morphgnt-part-1/">Annotating the Normalization Column in MorphGNT: Part 1</a></h2>
<p>2015-11-27, James Tauber</p>
<p>Since the Series-6 release, MorphGNT has had a column that normalizes the word forms in the text for contextual things like accent changes, elision, movable nu and capitalization. I thought it would be useful to provide an annotation of exactly what normalization had been done for each word in the text and why.</p>
<p>I wrote a short Python script that runs some heuristics on each case where the &ldquo;word&rdquo; column and &ldquo;norm&rdquo; column differ to determine the nature of the in-context change.</p>
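<p>The heuristics can be approximated by comparing the two strings under successive normalizations. A rough sketch of the approach (simplified relative to the actual script; the categories and their order here are illustrative only):</p>

```python
import unicodedata

ACCENTS = {"\u0301", "\u0300", "\u0342"}  # acute, grave, circumflex

def strip_accents(s):
    """Remove accent marks (but not breathings) after NFD decomposition."""
    return "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if ch not in ACCENTS)

def classify(word, norm):
    """Guess why a surface form differs from its normalized form."""
    if word == norm:
        return None
    if word.lower() == norm.lower():
        return "capitalization"
    if strip_accents(word) == strip_accents(norm):
        return "accent"
    if norm.endswith("(ν)") and word in (norm[:-3], norm[:-3] + "ν"):
        return "movable nu"
    return "other"

print(classify("Λόγος", "λόγος"))    # capitalization
print(classify("καὶ", "καί"))        # accent
print(classify("ἐστίν", "ἐστί(ν)"))  # movable nu
```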
<p>In this post, I&rsquo;ll just report on some statistics. In later posts, I&rsquo;ll dive into further details that rely on actually looking at the surrounding context (rather than just the difference in one row).</p>
<p>There are 47,630 times where the word and norm columns differ.</p>
<p><strong>38,523</strong> times there is a <strong>change of accent</strong> (clitics, oxytones taking graves, etc). </p>
<p><strong>3,721</strong> times there is a <strong>change in capitalization</strong>. </p>
<p><strong>1,221</strong> times there is <strong>elision</strong>: 984 times a straight dropping of a final vowel, 237 times an additional aspiration of the preceding consonant.</p>
<p><strong>5,223</strong> times there is a <strong>movable nu</strong>. Note that both the presence and absence of nu are normalized to <code>(ν)</code>, so this covers all cases where a nu <em>could</em> be dropped as well as the 142 times when it actually is.</p>
<p><strong>226</strong> times there is a <strong>movable sigma</strong> (20 times where it&rsquo;s actually dropped). This doesn&rsquo;t count ἐξ (another 234 times). There are also 825 times οὐκ appears and 105 times οὐχ appears.</p>
<p>In addition to the 47,630 cases above, there are also 32 other instances of two types of discrepancy that need to be resolved. One is ἑλπίδι with a rough breathing in Romans. The other is the cases where Χριστός appears with a lowercase χ. I&rsquo;m not sure what the solution to the former is but the latter might just involve having two distinct lemmata for Χριστός vs χριστός.</p>
<p>All these statistics might seem of trivial interest but they are side effects of a more important task of both verifying the normalization and, as will be covered in subsequent posts, testing context-sensitive accentuation rules.</p>
<h2><a href="http://jktauber.com/2015/11/23/back-more-sustainable-blogging-pace/">Back to a More Sustainable Blogging Pace</a></h2>
<p>2015-11-23, James Tauber</p>
<p>Well, I did it! I blogged a post for every day in the four weeks leading up to my talk at SBL. It was a fantastic motivator but I can&rsquo;t sustain the pace.</p>
<p>I&rsquo;ll try to at least blog once a week with a substantial post at least once a month but we&rsquo;ll see.</p>
<p>There&rsquo;ll hopefully be a lot of ongoing progress to report but I&rsquo;ll also try to occasionally step back and write some more well-thought-out pieces, particularly on general linguistics. For thoughts-in-progress, I&rsquo;ll likely use <a href="https://thoughtstreams.io/">ThoughtStreams</a>.</p>
<p>I&rsquo;m really hoping to collaborate with others on all the work I&rsquo;ve been talking about over the last four weeks and in my SBL talk, so if you&rsquo;re interested, email me at <strong>jtauber@jtauber.com</strong>.</p>
<p>And because blogging won&rsquo;t be as regular, please subscribe to get email updates if you haven&rsquo;t already. Just fill out your email address in the form to the right (if you&rsquo;re on the site).</p>
<h2><a href="http://jktauber.com/2015/11/22/morphological-lexicon-new-testament-greek-slides/">A Morphological Lexicon of New Testament Greek: My SBL 2015 Slides</a></h2>
<p>2015-11-22, James Tauber</p>
<p>This morning I gave my talk at SBL 2015 on my <em>Morphological Lexicon</em> project.</p>
<p>I&rsquo;ve put the slides up <a href="https://www.academia.edu/18816954/A_Morphological_Lexicon_of_New_Testament_Greek">here</a>.</p>
<h2><a href="http://jktauber.com/2015/11/21/analyzing-verbal-morphology-part-1/">Analyzing Verbal Morphology: Part 1</a></h2>
<p>2015-11-21, James Tauber</p>
<p>In anticipation of my SBL talk tomorrow, here&rsquo;s an update on my verbal analysis.</p>
<p>As I mentioned in <a href="http://jktauber.com/2015/11/12/analyzing-nominal-morphology-part-1/">Analyzing Nominal Morphology: Part 1</a>, I started off with nominal morphology, but the last couple of years have been more focused on the verb (until a couple of months ago when I switched back to the noun).</p>
<p>My current modeling approach is actually my third attempt at verbs. Perhaps in a later post I&rsquo;ll describe the earlier approaches and why I backed out and started from scratch twice. I&rsquo;m happy with the path I&rsquo;m following now, though.</p>
<p>Unlike the approach I took later with nouns, my verb analysis didn&rsquo;t focus on theme/distinguisher but on stem/suffix with sandhi rules. One reason for this is that one of my immediate goals was stem generation.</p>
<p>Prior to running on all the MorphGNT verbs, I started with Helma Dik&rsquo;s <em>Nifty Greek Handouts</em> and the verb paradigms in Louise Pratt&rsquo;s <em>The Essentials of Greek Grammar</em>. Coverage is now those plus all the MorphGNT verbs except for imperatives, subjunctives and optatives.</p>
<p>The code and data is currently available at <a href="https://github.com/jtauber/greek-inflection">https://github.com/jtauber/greek-inflection</a> although I may move at least the GNT-specific data to be in the <code>morphological-lexicon</code> repo soon.</p>
<p>The basic approach is to have an &ldquo;endings&rdquo; database and a &ldquo;stems&rdquo; database. The &ldquo;endings&rdquo; database looks like:</p>
<div class="codehilite"><pre>PAI.1S:
- &quot;|&gt;ω&lt;ω|&quot;
- &quot;|ε&gt;ῶ&lt;ω|&quot;
- &quot;|ο&gt;ῶ&lt;ω|&quot;
- &quot;|α&gt;ῶ&lt;ω|&quot;
- &quot;|ο!&gt;ω&lt;_1|μι&quot;
- &quot;|ε!&gt;η&lt;_1|μι&quot;
- &quot;|υ!&gt;υ&lt;_1|μι&quot;
- &quot;|α!&gt;η&lt;_1|μι&quot;
- &quot;|ει!&gt;ει&lt;_1|μι&quot;
AAI.1S:
- &quot;|&gt;&lt;|α&quot;
- &quot;|%&gt;ο&lt;T_1|ν&quot;
- &quot;|α^&gt;η&lt;_1|ν&quot;
- &quot;|ε^&gt;η&lt;_1|ν&quot;
- &quot;|ο^&gt;ω&lt;_1|ν&quot;
- &quot;|α!&gt;η&lt;_1|ν&quot;
</pre></div>
<p>where endings and sandhi are expressed. You can see various stem diacritics like <code>!</code> for athematic, <code>^</code> for root aorists and <code>%</code> for second aorists. <code>T_1</code> represents a thematic vowel and <code>_1</code> a particular ablaut pattern.</p>
<p>Alongside this is a larger stem database:</p>
<div class="codehilite"><pre>ἀγαπάω:
stems:
1-: ἀγαπα
1+: ἠγαπα
2-: ἀγαπησ
3-: ἀγαπησ
3+: ἠγαπησ
4-: ἠγαπηκ
5-: ἠγαπη
7-: ἀγαπηθησ
ἀναλαμβάνω:
compound: ἀνά++λαμβάνω
stems:
1-: ἀναλαμβαν
3-: ἀναλαβ%
3+: ἀνελαβ%
6-: ἀναλημφθ
6+: ἀνελημφθ
</pre></div>
<p>Stems are keyed by a principal-part-like scheme where <code>-</code> / <code>+</code> refer to unaugmented and augmented stems respectively. The <code>7-</code> stem is the future passive.</p>
<p>The stem database can also do overrides for individual paradigm cells, show preverbs, mark enclitics and more.</p>
<p>All this gets tested against the Dik and Pratt examples and the verb forms in the MorphGNT in two ways:</p>
<ul>
<li>given a lemma and features, is the correct form generated?</li>
<li>given a form, lemma and features, is the correct stem identified?</li>
</ul>
<p>Once the imperatives, subjunctives and optatives are done, I&rsquo;ll work on stem relationships, essentially treating the stems as another paradigm. I may also at some point generate distinguishers for each verb form (within a particular aspect/tense-voice form).</p>
<p>Further work will involve using it to actually analyze new texts, particularly handling the case where the stem is not yet in the stem database.</p>
<h2><a href="http://jktauber.com/2015/11/20/greek-accentuation-library/">Greek Accentuation Library</a></h2>
<p>2015-11-20, James Tauber</p>
<p>I knew that a necessary component of a comprehensive morphological analyzer for Ancient Greek was going to be a library for handling accentuation, so back in January 2014, I started the <code>greek-accentuation</code> Python library.</p>
<p>It consists of three modules:</p>
<ul>
<li>characters</li>
<li>syllabify</li>
<li>accentuation</li>
</ul>
<p>The <strong>characters</strong> module provides basic analysis and manipulation of Greek characters in terms of their Unicode diacritics as if decomposed. So you can use it to add, remove or test for breathing, accents, iota subscript or length diacritics.</p>
<div class="codehilite"><pre>&gt;&gt;&gt; base(&#39;ᾳ&#39;)
&#39;α&#39;
&gt;&gt;&gt; iota_subscript(&#39;ᾳ&#39;) == IOTA_SUBSCRIPT
True
&gt;&gt;&gt; add_diacritic(&#39;α&#39;, IOTA_SUBSCRIPT)
&#39;ᾳ&#39;
</pre></div>
<p>The <strong>syllabify</strong> module provides basic analysis and manipulation of Greek syllables. It can syllabify words, give you the onset, nucleus, coda, rime or body of a syllable, judge syllable length or give you the accentuation class of a word.</p>
<div class="codehilite"><pre>&gt;&gt;&gt; syllabify(&#39;γυναικός&#39;)
[&#39;γυ&#39;, &#39;ναι&#39;, &#39;κός&#39;]
&gt;&gt;&gt; penult(&#39;οἰκία&#39;)
&#39;κί&#39;
&gt;&gt;&gt; paroxytone(&#39;λόγος&#39;)
True
</pre></div>
<p>The <strong>accentuation</strong> module uses the other two modules to accentuate Ancient Greek words. As well as listing <code>possible_accentuations</code> for a given unaccented word, it can produce <code>recessive</code> and (given another form with an accent) <code>persistent</code> accentuations.</p>
<p>The library is open source under an MIT license. You can get the package on PyPI and the source repo is <a href="https://github.com/jtauber/greek-accentuation">https://github.com/jtauber/greek-accentuation</a>.</p>
<h2><a href="http://jktauber.com/2015/11/19/dangers-reconstructing-too-much-morphophonology/">The Dangers of Reconstructing Too Much Morphophonology</a></h2>
<p>2015-11-19, James Tauber</p>
<p>What is the genitive singular ending for 2nd declension nouns?</p>
<p>The beginner student probably thinks the ending is ου.</p>
<p>Those that are told the <em>stem</em> ends in ο might be tempted to conclude the actual ending is υ. At least one popular introductory text teaches this but it&rsquo;s incorrect.</p>
<p>Those more familiar with the sandhi rules might conclude the ου <em>could</em> come from ο+σο or ε+σο via οο. Those who know some Homer might posit ο+ιο, but ου is also found in Homer (especially in the pronouns), which might seem confusing.</p>
<p>Those who study proto-Indo-European might know of <em>*osyo</em> becoming <em>*ohyo</em> in Proto-Greek then <em>*oyyo</em>.</p>
<p>How should this be modeled synchronically? I think there&rsquo;s too much of a tendency in morphophonology to adopt an &ldquo;ontogeny recapitulates phylogeny&rdquo; approach and assume that speakers are storing a <em>historical</em> underlying form and then replaying millennia of sound changes.</p>
<p>The problem here is there&rsquo;s no way a Koine speaker would have reconstructed <em>*osyo</em> during acquisition. In my <a href="http://jktauber.com/2015/11/12/analyzing-nominal-morphology-part-1/">stem+ending annotations</a> I tentatively used ο+ιο but I&rsquo;m reconsidering that. There is no evidence I can think of that would have helped a native Koine speaker choose between ο+ιο, ο+σο or ο+ο as underlying. </p>
<p>And given that there is a class of 1st declension masculine nouns whose genitive singular ends in ου despite the α stem ending (which could not result in ου unless the α was actually dropped), it may actually be best to view the speakers&rsquo; knowledge as the ending just being &ldquo;ου&rdquo;— the naïve view we wrote off at the start.</p>
<p>At the very least, we need to be very careful when saying &ldquo;the stem is X, the ending is Y&rdquo; as to whether we are trying to explain the form historically or the speakers&rsquo; synchronic knowledge.</p>
<h2><a href="http://jktauber.com/2015/11/18/full-citation-forms-and-inflectional-classes/">Full Citation Forms and Inflectional Classes</a></h2>
<p>2015-11-18, James Tauber</p>
<p>Back in July and August 2014, I started looking at patterns in the full citation forms of nouns in Danker&rsquo;s Concise Lexicon. My goal was partly to explore, in a systematic way, the relationship between inflectional classes and the information expressed in the common pattern of <code>{nominative form}, {genitive ending}, {article}</code>. I also wanted to put together a kind of automated test to catch typos and inconsistencies in the lexicon.</p>
<p>I started drafting a paper with my findings as I went along and I intend to get back to it at some point but I wanted to mention this little project here, point to the code and mention a couple of things coming out of it so far.</p>
<p>The code is available at <a href="https://github.com/morphgnt/morphological-lexicon/tree/master/projects/citation_forms">https://github.com/morphgnt/morphological-lexicon/tree/master/projects/citation_forms</a>.</p>
<p>In particular, the file <code>citation_form_data.py</code> contains the rules (still needing some work outside the basic <code>{nominative form}, {genitive ending}, {article}</code> pattern) for what a full citation form can look like.</p>
<p>Each row in this file contains a tuple of:</p>
<ul>
<li>a tuple of regexes matching the full citation form, Mounce&rsquo;s category and Dobson&rsquo;s part-of-speech/gender (the last mostly to catch errors in that file)</li>
<li>a tentative new label for the inflectional class</li>
<li>a (potentially empty) list of child rules</li>
</ul>
<p>For example:</p>
<div class="codehilite"><pre>((r&quot;α, ας, ἡ$&quot;, r&quot;^n-1a$&quot;, r&quot;^N:F$&quot;), &quot;1.1/a1/F&quot;, []),
</pre></div>
<p>These rules are organized in a hierarchy starting with the most general rules and containing, as children, more specific subsets. The inflectional class labels like <code>1.1/a1/F</code> are intended to reflect this hierarchy. For example, here are the ancestors of the above rule:</p>
<div class="codehilite"><pre>((r&quot;^(\w+), (\w+), (\w+)$&quot;, r&quot;^n-&quot;, r&quot;^N&quot;), &quot;&quot;, [
((r&quot;[αη]ς, {art}$&quot;, r&quot;^n-1&quot;, r&quot;^N:.$&quot;), &quot;1&quot;, [
((r&quot;ας, ἡ$&quot;, r&quot;^n-1&quot;, r&quot;^N:F$&quot;), &quot;1.1/F&quot;, [
((r&quot; ας, ἡ$&quot;, r&quot;^n-1&quot;, r&quot;^N:F$&quot;), &quot;1.1/F&quot;, [
((r&quot;α, ας, ἡ$&quot;, r&quot;^n-1[ah]$&quot;, r&quot;^N:F$&quot;), &quot;1.1/a/F&quot;, [
</pre></div>
<p>The first line is the most general rule for any noun whose citation form in Danker has three parts. The next level (given the class <code>1</code>) covers those whose citation form ends with either ας or ης followed by an article. This is further subdivided (class <code>1.1/F</code>) into citation forms ending with ας and a feminine singular article, then into citation forms with no other letters before the ας given as the genitive ending, and then (class <code>1.1/a/F</code>) into those whose nominative form ends with α. Because this still covers both the Mounce categories n-1a and n-1h, it is refined one step further into the first line we saw, with the inflectional class <code>1.1/a1/F</code>.</p>
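<p>A sketch of how such a rule hierarchy can be walked (simplified: the <code>{art}</code> placeholder is hard-coded to ἡ, only the chain of rules quoted above is included, and the function name is mine; the real rules live in <code>citation_form_data.py</code>):</p>

```python
import re

# Each rule: ((citation regex, Mounce regex, POS regex), label, children),
# mirroring the rows shown above. Children are more specific subsets.
RULES = [
    ((r"^(\w+), (\w+), (\w+)$", r"^n-", r"^N"), "", [
        ((r"[αη]ς, ἡ$", r"^n-1", r"^N:.$"), "1", [
            ((r"ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
                ((r" ας, ἡ$", r"^n-1", r"^N:F$"), "1.1/F", [
                    ((r"α, ας, ἡ$", r"^n-1[ah]$", r"^N:F$"), "1.1/a/F", [
                        ((r"α, ας, ἡ$", r"^n-1a$", r"^N:F$"), "1.1/a1/F", []),
                    ]),
                ]),
            ]),
        ]),
    ]),
]

def classify(citation, mounce, pos, rules=RULES, label=None):
    """Return the most specific inflectional-class label that matches."""
    for (cit_re, mounce_re, pos_re), rule_label, children in rules:
        if (re.search(cit_re, citation)
                and re.search(mounce_re, mounce)
                and re.search(pos_re, pos)):
            # a matching child is always more specific than this rule
            return classify(citation, mounce, pos, children,
                            rule_label or label)
    return label
```

<p>So <code>classify("χώρα, ας, ἡ", "n-1a", "N:F")</code> descends all the way to <code>1.1/a1/F</code>, while an entry that only matches the outer rules keeps the more general label.</p>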
<p>From these rules, certain inconsistencies show up. For example, &ldquo;γῆ, γῆς, ἡ&rdquo; is the only &ldquo;η, ης, ἡ&rdquo; entry that gives the full genitive form rather than just the genitive ending. Five of the six masculine words with genitive in &ldquo;τος&rdquo; give &ldquo;τος&rdquo; with the preceding vowel as the genitive ending, but the other one gives the full genitive form. Thirty-four feminine words with genitive in &ldquo;τος&rdquo; give just the preceding vowel, but one gives the preceding consonant + vowel.</p>
<p>For a lexicon whose editors want consistency in their citation forms, this kind of thing is useful to be able to check programmatically.</p>
<p>Lots more to say when I get around to finishing the paper but I wanted to at least share the code and (in-progress) rules. For the tie-in to inflectional class modeling, I&rsquo;ll soon integrate this work with my recent work on <a href="http://jktauber.com/2015/11/12/analyzing-nominal-morphology-part-1/">Analyzing Nominal Morphology</a> but I&rsquo;ll also use the &ldquo;automatic consistency checking&rdquo; aspect of the work to ensure better consistency in the <em>Morphological Lexicon</em>.</p>
<h2><a href="http://jktauber.com/2015/11/17/modern-greek-text-speech-biblical-greek/">Modern Greek Text to Speech for Biblical Greek</a></h2>
<p><em>James Tauber · published 2015-11-17, updated 2015-11-18</em></p>
<p>Text-to-speech is pretty good these days but a lot of people don&rsquo;t realize that operating systems like OS X have support for languages other than English, including Modern Greek. So I thought I&rsquo;d experiment with using it to read the Greek New Testament.</p>
<p>On OS X, if you go to System Preferences &gt; Dictation and Speech, then select &ldquo;Customize&hellip;&rdquo; under System Voice, you can download or upgrade your Greek voices. There are male and female voices to try: Nikos and Melina respectively.</p>
<p>There are two ways I know of that you can then get those voices to read Greek for you. </p>
<p>The first way is, with Nikos or Melina selected as the System Voice, you select any Greek text in another app (such as TextEdit), right click and select Speech &gt; Start Speaking. This will honour the speed setting in System Preferences &gt; Dictation and Speech. Slowing down the speech drops quality dramatically, though.</p>
<p>The second way is on the command line with <code>say</code>. I can&rsquo;t work out if <code>say</code> supports slowing down the reading (it doesn&rsquo;t honour the speed setting in System Preferences) but it does support outputting the result to an AIFF file.</p>
<p>Note that you can&rsquo;t feed it polytonic Greek, so you need to strip breathings and convert the accents. I did that to produce a text like this:</p>
<blockquote>
<p>Ήν δέ άνθρωπος εκ τών Φαρισαίων, Νικόδημος όνομα αυτώ, άρχων τών Ιουδαίων· ούτος ήλθεν πρός αυτόν νυκτός καί είπεν αυτώ· Ραββί, οίδαμεν ότι από θεού ελήλυθας διδάσκαλος· ουδείς γάρ δύναται ταύτα τά σημεία ποιείν ά σύ ποιείς, εάν μή ή ο θεός μετ’ αυτού. απεκρίθη Ιησούς καί είπεν αυτώ· Αμήν αμήν λέγω σοι, εάν μή τις γεννηθή άνωθεν, ου δύναται ιδείν τήν βασιλείαν τού θεού.</p>
</blockquote>
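<p>The post doesn&rsquo;t show how the stripping was done, but a minimal sketch with Python&rsquo;s <code>unicodedata</code> might look like this (assumption: dropping breathings and iota subscripts, and mapping grave and circumflex to the monotonic acute, covers the conversion needed here):</p>

```python
import unicodedata

DROP = {"\u0313", "\u0314", "\u0345"}   # smooth/rough breathing, iota subscript
TO_ACUTE = {"\u0300": "\u0301",         # grave (varia) -> acute (tonos)
            "\u0342": "\u0301"}         # circumflex (perispomeni) -> acute

def monotonic(text):
    """Convert polytonic Greek to (approximate) monotonic Greek."""
    out = []
    for ch in unicodedata.normalize("NFD", text):  # split off combining marks
        if ch in DROP:
            continue
        out.append(TO_ACUTE.get(ch, ch))
    return unicodedata.normalize("NFC", "".join(out))
```

<p>For example, <code>monotonic("Ἦν δὲ ἄνθρωπος")</code> gives <code>"Ήν δέ άνθρωπος"</code>, matching the text above.</p>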
<p>I then used</p>
<div class="codehilite"><pre>say -v Nikos -f john_3_1.txt -o john_3_1
</pre></div>
<p>to produce the following <a href="http://jktauber.com/site_media/static/john_3_1.aiff">AIFF file</a>.</p>
<p>A pretty decent reading of the Greek New Testament with Modern Greek pronunciation.</p>
<p>The only oddity is that the ου in the last clause is spelled out. Not sure how to fix that.</p>
<p>What excites me about this is less the generation of long audio files of entire passages, but more how it could be used in conjunction with an intelligent tutor to pronounce individual words and phrases that the student is currently studying.</p>
<h2><a href="http://jktauber.com/2015/11/16/actual-core-vocab-lists-greek-new-testament/">Actual Core Vocab Lists for Greek New Testament</a></h2>
<p><em>James Tauber · published 2015-11-16, updated 2015-11-17</em></p>
<p>Back in <a href="http://jktauber.com/2015/10/30/core-vocabulary-new-testament-greek/">The Core Vocabulary of New Testament Greek</a> I talked about Wilfred Major&rsquo;s 2008 paper on core vocabulary lists for Classical Greek and provided code for producing the same for the Greek New Testament along with some discussion of the results. I didn&rsquo;t actually include the full results, however.</p>
<p>Prompted by Paul-Nitz&rsquo;s <a href="http://www.ibiblio.org/bgreek/forum/viewtopic.php?f=15&amp;t=3418&amp;p=22864#p22821">request</a> on the B-Greek forum, I put together <a href="https://github.com/jtauber/core-gnt-vocab">https://github.com/jtauber/core-gnt-vocab</a> which includes not only the code but actually generated lists (currently 50% and 80% lemma lists).</p>
<p>I&rsquo;ve included as a starting point glosses from Dodson but I&rsquo;d love people to file issues (or even better, pull requests) if they have improvements they&rsquo;d like to see.</p>
<p>I&rsquo;m also interested in whether people think certain lexemes should be split the way Major does (e.g. suppletive verbs).</p>
<p>You can get the raw lists at:</p>
<ul>
<li><a href="https://raw.githubusercontent.com/jtauber/core-gnt-vocab/master/lemma_50.txt">50% List</a></li>
<li><a href="https://raw.githubusercontent.com/jtauber/core-gnt-vocab/master/lemma_80.txt">80% List</a></li>
</ul>
<h2><a href="http://jktauber.com/2015/11/15/first-prototype-new-online-reader/">First Prototype of New Online Reader</a></h2>
<p><em>James Tauber · published 2015-11-15, updated 2015-11-17</em></p>
<p>Over in the lab section of this site, I&rsquo;ve added a little prototype Patrick Altman and I built last night.</p>
<p>At the moment it just shows the first paragraph of John 3 but if you click on a word it gives the lemmatization and parsing from MorphGNT, the gloss from Dodson and links to the head and child dependencies based on the GBI Syntax trees.</p>
<p>You can try it out at <a href="http://jktauber.com/labs/reader.html">http://jktauber.com/labs/reader.html</a>.</p>
<p>The source code is available at <a href="https://github.com/morphgnt/reader-demo">https://github.com/morphgnt/reader-demo</a>.</p>
<p>Besides the obvious extension to the rest of the GNT text, I&rsquo;ll soon bring in information from the <em>Morphological Lexicon</em> to help readers understand <em>why</em> the form is what it is.</p>
<p>Longer term, I&rsquo;d like to add user accounts so authenticated users can bookmark passages, words and forms. Giving users the ability to mark which words they do or don&rsquo;t understand means that the site can then produce custom quizzes, recommend what to read next, etc.</p>
<p>This is starting to get to the real heart of learning tools driven by better linguistic databases.</p>
<p>If you&rsquo;re a Django and/or React developer who would like to help with this, let me know. If you teach intermediate students and have feedback on what would make this more useful, I&rsquo;d also love to hear from you.</p>
<h2><a href="http://jktauber.com/2015/11/14/analyzing-nominal-morphology-part-2/">Analyzing Nominal Morphology: Part 2</a></h2>
<p><em>James Tauber · published 2015-11-14, updated 2015-11-16</em></p>
<p>In <a href="http://jktauber.com/2015/11/12/analyzing-nominal-morphology-part-1/">Analyzing Nominal Morphology: Part 1</a>, I talked about putting together a list of nominal distinguishers and verifying it on the MorphGNT, generating a per-lexeme theme + distinguisher analysis. Here, I&rsquo;ll outline some further steps I&rsquo;ve taken. </p>
<p>As well as producing a YAML file with entries for each lexeme, I also now generate a (space-delimited) tabular form that looks like this:</p>
<div class="codehilite"><pre>ἀβαρής a-4a -- M n-3d(2aA) ἀβαρ AS ἀβαρῆ ἀβαρ ῆ εσ+α
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NS ἀγαθοποιῶν ἀγαθοποι ῶν ουντ+
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι NP ἀγαθοποιοῦντες ἀγαθοποι οῦντες ουντ+ες
ἀγαθοποιέω verb PA M n=3c(5b-OU) ἀγαθοποι AP ἀγαθοποιοῦντας ἀγαθοποι οῦντας ουντ+ας
ἀγαθοποιέω verb PA F n-1c ἀγαθοποιουσ NP ἀγαθοποιοῦσαι ἀγαθοποιοῦσ αι α+ι
ἀγαθοποιΐα n-1a -- F n-1a ἀγαθοποιϊ DS ἀγαθοποιΐᾳ ἀγαθοποιΐ ᾳ α+ι
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
</pre></div>
<p>The columns are:</p>
<ul>
<li>lemma</li>
<li>Mounce category (or <code>verb</code> for participles) for overall lexeme</li>
<li>aspect / voice (for participles)</li>
<li>gender</li>
<li>Mounce category used for particular sub-paradigm (different from overall lexeme for adjectives or participles)</li>
<li>lexeme-level theme</li>
<li>case / number</li>
<li>form</li>
<li>form-specific theme</li>
<li>form-specific distinguisher</li>
<li>stem ending and suffix</li>
</ul>
<p>What&rsquo;s helpful about this format is that you can use <code>awk</code>, <code>grep</code>, <code>sort</code>, <code>wc</code> and other Unix tools to extract information very quickly. (I may soon put it in SQL and expose a web interface too.) So you can see all the times a particular distinguisher is used, or all the times it&rsquo;s used for a particular case / number. Or what all the sandhi rules are.</p>
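<p>The same sort of query is just as easy in a few lines of Python. A sketch (a handful of the rows above are inlined; against the real file you&rsquo;d read the lines from disk):</p>

```python
from collections import Counter

# Columns: lemma, category, aspect/voice, gender, sub-paradigm category,
# theme, case/number, form, form-specific theme, distinguisher, stem+suffix
ROWS = """\
ἄβυσσος n-2b -- F n-2b ἀβυσσ GS ἀβύσσου ἀβύσσ ου ο+ιο
ἄβυσσος n-2b -- F n-2b ἀβυσσ AS ἄβυσσον ἄβυσσ ον ο+ν
ἀγαθοποιός a-3a -- M n-2a ἀγαθοποι GP ἀγαθοποιῶν ἀγαθοποι ῶν +ων
ἀγαθός a-1a(2a) -- M n-2a ἀγαθ NS ἀγαθός ἀγαθ ός ο+ς
""".splitlines()

# roughly `awk '{print $7, $10}' | sort | uniq -c`:
counts = Counter()
for row in ROWS:
    cols = row.split()
    counts[(cols[6], cols[9])] += 1   # (case/number, distinguisher)
```

<p>From there it&rsquo;s one more line to see, say, every distinguisher used for a given case/number.</p>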
<p>I&rsquo;ve already written a Python script that generates a list of paradigms based on this (keyed off Mounce category for now, until I&rsquo;ve finalized my own, which will actually be defined <em>by</em> these paradigms).</p>
<p>The paradigms look like:</p>
<div class="codehilite"><pre>n-3b(1) M (10):
NS: ξ {κ+ς}
GS: κος {κ+ος}
DS: κι {κ+ι}
AS: κα {κ+α}
NP: κες {κ+ες}
GP: κων {κ+ων}
AP: κας {κ+ας}
</pre></div>
<p>There&rsquo;s actually a feedback loop where inconsistencies and errors spotted in this paradigm output inform corrections to the underlying distinguisher rules.</p>
<p>The code and data are available at <a href="https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers">https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers</a>.</p>
<h2><a href="http://jktauber.com/2015/11/12/analyzing-nominal-morphology-part-1/">Analyzing Nominal Morphology: Part 1</a></h2>
<p><em>James Tauber · published 2015-11-12, updated 2015-11-14</em></p>
<p>While much of my work going back 10 years or more was on the nominals, the last few years I&rsquo;ve been focused on verbal morphology. I decided that for my SBL paper, however, I&rsquo;d revisit some of my noun work and ended up exploring some ideas afresh.</p>
<p>By <strong>nominals</strong> I mean nouns, adjectives, determiners, pronouns, proforms, participles. Basically anything marked for case (see <a href="http://jktauber.com/2015/11/05/morphological-parts-speech-greek/">Morphological Parts of Speech in Greek</a>).</p>
<p>I wanted to, at the very least, generate <a href="http://jktauber.com/2015/11/03/distinguishers-morphology/">themes and distinguishers</a> for the nominals. But once you have that, you have a nice setup to explore stems, endings and sandhi. This is a nice interface into some of the general (i.e. not language-specific) morphology I was doing for my PhD. Finally, it enables me to get back to my long-running goal of laying out a system of inflectional classes that improves on Funk, Mounce and others.</p>
<p>You can see the work in progress at <a href="https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers">https://github.com/morphgnt/morphological-lexicon/tree/master/projects/nominal_distinguishers</a>.</p>
<p>The first phase involved enumerating the possible distinguishers for each combination of case/number/gender. This was done incrementally, running a Python script that (a) showed me forms that weren&rsquo;t covered by the existing list and (b) showed me lexemes that had more than one theme. In some cases, multiple themes were legitimate suppletion, but in other cases it meant I hadn&rsquo;t gotten the theme/distinguisher split right. Because I had them in electronic form, I also used Mounce&rsquo;s inflectional classes as a hint to disambiguate distinguishers.</p>
<p>So the first phase involved creating a file that looked something like this (just a very small subset of what is currently an 851-line file):</p>
<div class="codehilite"><pre>NSM:
- ας n-1d α+ς
- ης n-1f η+ς
- ος n-2a ο+ς
- ψ n-3a\(1\) π+ς
- ψ n-3a\(2\) β+ς
- ξ n-3b\(1\) κ+ς
- ξ n-3b\(2\) γ+ς
- ξ n-3b\(3\) χ+ς
- ους n=3c\(2-OD\) οδ+ς
- ς n-3c\(1\) τ+ς
- ς n-3c\(2\) δ+ς
- ς n-3c\(3\) θ+ς
</pre></div>
<p>You&rsquo;ll notice I annotated each distinguisher with the underlying stem ending and inflectional ending. You can see I needed to use Mounce&rsquo;s codes (for now) to disambiguate distinguishers like ψ, ξ and ς. You&rsquo;ll also notice I had to invent my own temporary extensions to Mounce in the case of οδ+ς → ους because there are deliberately no sandhi rules built into my scripts (more on that later).</p>
<p>My initial script takes the above file, runs across all forms in the MorphGNT SBLGNT and produces entries like the following:</p>
<div class="codehilite"><pre>ἀγαλλίασις:
forms:
F:
theme(s): ἀγαλλιασ
NS: ἀγαλλίασις ἀγαλλίασ|ις ϳ+ς
GS: ἀγαλλιάσεως ἀγαλλιάσ|εως ϳ+ος
DS: ἀγαλλιάσει ἀγαλλιάσ|ει ϳ+ι
</pre></div>
<p>In some (not necessarily immediately) following posts, I&rsquo;ll talk more about additional outputs and other scripts in the pipeline.</p>
<p>This mini-project is a great example of where having a deterministic verification process on manually tweaked rules works well (over, say, trying to automate the generation of the rules entirely).</p>
<h2><a href="http://jktauber.com/2015/11/11/technical-aspects-openness/">Technical Aspects of Openness</a></h2>
<p><em>James Tauber · published 2015-11-11, updated 2015-11-14</em></p>
<p>In my <a href="http://jktauber.com/2015/11/10/why-i-use-cc-sa-licenses/">previous post</a>, I talked about the legal / licensing aspects of open linguistic data but there are technical aspects in order for linguistic data to be open too.</p>
<p>To illustrate, consider an out-of-copyright, printed lexicon. From a <em>licensing</em> point of view, it&rsquo;s open—it can be redistributed with or without modifications, etc. But that doesn&rsquo;t make it particularly usable for computational work.</p>
<p>A while ago I came across something Greg Crane had written where he talked about things being <strong>machine-actionable</strong>. I like this a lot more than &ldquo;machine-readable&rdquo; because it isn&rsquo;t just about being able to &ldquo;read&rdquo; the work but about actually being able to do interesting things with it.</p>
<p>There are various facets of this so I thought I&rsquo;d try to enumerate some of them.</p>
<ul>
<li><strong>correctable</strong> — can I make corrections if I find mistakes?</li>
<li><strong>verifiable</strong> — can I write code to check for errors?</li>
<li><strong>reproducible</strong> — can I reproduce the results others have found?</li>
<li><strong>extensible</strong> — can I extend it with my own data or data from other sources?</li>
<li><strong>queryable</strong> — can I search, filter, or sort the data to get subsets of interest?</li>
<li><strong>reusable</strong> — can I use the same data for multiple applications?</li>
<li><strong>repurposable</strong> — can I use the data for purposes not conceived of initially?</li>
<li><strong>adaptable</strong> — can I produce different variants of the data applicable to different users?</li>
</ul>
<p>My <a href="http://jktauber.com/2015/05/06/my-bibletech-2015-talk/">BibleTech 2015</a> talk touched on a number of these.</p>
<p>I should note that it&rsquo;s entirely possible to have works that are proprietary from a licensing point of view but completely open technically. I may be able to purchase a database that I can&rsquo;t redistribute but which is in a clean, consistent format I can write software to process. It has the disadvantage that I can&rsquo;t make corrections available to others or redistribute derivative works, but it&rsquo;s better than a closed-license work that&rsquo;s also closed with regard to facets discussed in this post.</p>
<h2><a href="http://jktauber.com/2015/11/10/why-i-use-cc-sa-licenses/">Why I Use CC-BY-SA Licenses</a></h2>
<p><em>James Tauber · 2015-11-10</em></p>
<p>I don&rsquo;t think I&rsquo;ve ever articulated why I favour a Creative Commons CC-BY-SA license on all my New Testament Greek data.</p>
<p>I don&rsquo;t mean why do open scholarship in general, but why my specific choice of Attribution-ShareAlike?</p>
<p>I avoid NoDerivs (<strong>ND</strong>) because I <em>want</em> people to build on my work, make corrections, add new analyses.</p>
<p>I use ShareAlike (<strong>SA</strong>), though, because I want to be able to incorporate corrections and new analyses back and want to avoid private forks of projects. Note that when it comes to software, I generally favour MIT/BSD-style licenses that aren&rsquo;t viral. But when it comes to data and analyses, I want the openness to be viral.</p>
<p>Perhaps more controversially, I avoid NonCommercial (<strong>NC</strong>). My reason is simple: I don&rsquo;t want someone who wants to use my work in a commercial package to have to waste time reinventing the wheel and redoing everything just so they can use it. Duplication of effort doesn&rsquo;t help anyone. Because of the ShareAlike, a commercial project can&rsquo;t make private forks. I don&rsquo;t care if someone is making money as long as improvements they make to my work are shared back.</p>
<p>Creative Commons doesn&rsquo;t have a license that requires ShareAlike but not Attribution but, even if it did, I&rsquo;d use Attribution (<strong>BY</strong>). Particularly in scholarship, I think it&rsquo;s important to give credit where credit is due. Plus, having a chain of who did the work is useful for providing corrections upstream.</p>
<p>My arguments for using ShareAlike and Attribution are why I don&rsquo;t like just putting things in the &ldquo;public domain&rdquo; / under a CC0 license. (Incidentally, I put &ldquo;public domain&rdquo; in quotes because it&rsquo;s an ill-defined concept, which is why the CC0 license was developed in the first place. Even if you&rsquo;re not persuaded by my arguments for BY-SA, at least use CC0 rather than saying &ldquo;public domain&rdquo;).</p>
<p>Finally, I&rsquo;d be remiss if I didn&rsquo;t acknowledge the great work of the <a href="http://creativecommons.org">Creative Commons</a> organization in making all this possible.</p>
<h2><a href="http://jktauber.com/2015/11/09/mean-log-frequency-dependency-paths/">Mean Log Frequency of Dependency Paths</a></h2>
<p><em>James Tauber · published 2015-11-09, updated 2015-11-10</em></p>
<p>Adding another potential readability metric, let&rsquo;s look at the mean log frequency of dependency paths.</p>
<p>So far we&rsquo;ve looked at the <a href="http://jktauber.com/2015/10/27/mean-log-frequency-lexemes/">mean log frequency of lexemes</a>, the <a href="http://jktauber.com/2015/11/04/mean-log-frequency-forms/">mean log frequency of forms</a>, and, after calculating <a href="http://jktauber.com/2015/10/28/dependency-paths/">dependency paths</a> or &ldquo;swords&rdquo;, the <a href="http://jktauber.com/2015/10/29/mean-dependency-depth/">mean dependency depth</a>.</p>
<p>What we haven&rsquo;t looked at is the mean log frequency of those dependency paths—a rough proxy for a target having common (rather than merely shallow) syntactic structures.</p>
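<p>The core computation is simple to sketch (the scaling behind the four-digit scores below isn&rsquo;t given, so this shows only the unscaled metric; the items can equally be lexemes, forms, or dependency paths):</p>

```python
import math
from collections import Counter

def mean_log_frequency(target_items, corpus_items):
    """Mean log corpus frequency of the items in a target passage."""
    counts = Counter(corpus_items)
    return sum(math.log(counts[i]) for i in target_items) / len(target_items)
```

<p>A target whose every item occurs once in the corpus scores 0; more common items push the score up.</p>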
<p>By this measure, the top five (i.e. lowest scoring) books are:</p>
<div class="codehilite"><pre>4832 1 Corinthians
4929 3 John
4935 1 John
4938 John
5027 James
</pre></div>
<p>and the top 10 chapters are:</p>
<div class="codehilite"><pre>4183 1 Corinthians 13
4362 1 Corinthians 9
4386 1 Corinthians 14
4485 Romans 14
4486 John 16
4550 1 John 3
4558 2 Corinthians 11
4564 1 Corinthians 6
4566 1 Corinthians 7
4576 John 7
</pre></div>
<p>It is interesting just how much 1 Corinthians features here. The book (and the chapters featured above) does poorly in terms of mean log frequency of lexemes.</p>
<p>If 1 Corinthians is actually <em>syntactically</em> easy to read, I wonder if that&rsquo;s an argument for having some readings which, because of vocab, need to be heavily footnoted with glosses but which are still worth reading early because of the syntax.</p>
<h2><a href="http://jktauber.com/2015/11/08/half-way-point/">At the Half Way Point</a></h2>
<p><em>James Tauber · published 2015-11-08, updated 2015-11-10</em></p>
<p>Exactly two weeks ago I said I&rsquo;d be blogging every day until my talk at SBL. Well, that&rsquo;s two weeks away so I&rsquo;m at the half way point. I think the blogging has gone well.</p>
<p>Many of the posts have been things I&rsquo;ve had drafts of for a while. Others have been ideas that haven&rsquo;t taken long to get down in a post. Attempting to blog every day means I haven&rsquo;t really worked on posts that represent multiple days, much less weeks or months, of work.</p>
<p>In the next two weeks I do hope to talk about a few longer-running projects but, that said, I do enjoy getting down an idea or concept that&rsquo;s just a short post but which has been on my mind for years.</p>
<p>Thanks to the people who have so far engaged with my posts via email and elsewhere. My interactions with you are a huge motivation for me doing this.</p>
<h2><a href="http://jktauber.com/2015/11/07/generating-readers/">Generating Readers</a></h2>
<p><em>James Tauber · published 2015-11-07, updated 2015-11-08</em></p>
<p>Back in April 2014, Brian Renshaw posted a <a href="http://www.brianrenshaw.com/blog/2014/4/18/a-good-friday-greek-reader-john-18-19">Good Friday Greek Reader</a>. It was presumably manually produced but I knew such things could be generated automatically and so went about building a system to do so.</p>
<p>You can see a sample PDF at <a href="https://github.com/jtauber/greek-reader/blob/master/example/reader.pdf">https://github.com/jtauber/greek-reader/blob/master/example/reader.pdf</a> which roughly looks like what Brian produced.</p>
<p>From a code point of view, it&rsquo;s a fairly simple Python 3 script that generates LaTeX that is then typeset using XeTeX. There is also an experimental backend using SILE. The code is open source under an MIT license and is available at <a href="https://github.com/jtauber/greek-reader">https://github.com/jtauber/greek-reader</a>. It assumes you&rsquo;re comfortable with those tools and editing text files to tweak things, but my hope is eventually a website could be built around this.</p>
<p>To produce a reader like this, whether manually or automatically, you need:</p>
<ol>
<li>a text</li>
<li>lemmatization</li>
<li>frequency counts</li>
<li>glosses</li>
<li>full citation forms / headwords (e.g. λαμπάς, άδος, ἡ) for nominals</li>
<li>parsing (e.g. AAI 3S) for verbs</li>
</ol>
<p>MorphGNT gave me 1, 2, 3 and 6. 4 came from Dodson (although you can override both globally and per verse) and 5 came from Danker&rsquo;s Concise Lexicon.</p>
<p>What&rsquo;s nice about doing this programmatically, besides the fact that you can make corrections upstream and have them applied to all the generated readers, is that you can <strong>make this adaptive</strong>. In the example, I chose which words to annotate based on frequency, but it could just as easily be based on other criteria, such as what a particular student has learnt up to this point or what has been covered so far in a particular textbook.</p>
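<p>The selection logic is simple enough to sketch (the function name and threshold are illustrative, not from the actual script):</p>

```python
from collections import Counter

def words_to_annotate(verse_lemmas, corpus_lemmas, threshold=50,
                      known=frozenset()):
    """Annotate words that are rare in the corpus; `known` could instead
    hold what a student has already learnt or a textbook has covered."""
    freq = Counter(corpus_lemmas)
    return [lemma for lemma in verse_lemmas
            if freq[lemma] < threshold and lemma not in known]
```

<p>Swapping the frequency test for a lookup against a student model or a textbook&rsquo;s vocabulary list is what makes the reader adaptive.</p>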
<p>One major feature I want to add, though, is richer annotation both morphologically AND syntactically so it becomes possible to generate something more akin to Zerwick and Grosvenor&rsquo;s <em>A Grammatical Analysis of the Greek New Testament</em>.</p>
<p>One major motivation for my continuing work on a <em>Morphological Lexicon</em> is being able to provide more focused, helpful annotations for readers indicating not just a lemma but a principal part or some additional information that helps the student understand the form.</p>
<p>For the syntax, I&rsquo;d like to eventually develop a catalog of constructions so that, much like forms are only annotated if they are less frequent (or otherwise unknown to the student), particular syntactic constructions in a text can be called out based on similar criteria. Some of this is possible with existing syntactic analyses; the trick is knowing which annotations to include and which are already obvious. (I have some ideas for how to crowdsource difficult constructions, but more on that later).</p>
<p>The <strong>greek-reader</strong> project is a great example of a pretty simple tool that can do a lot because it builds on rich data. As we get better and better data, we can build better and better tools.</p>
<h2><a href="http://jktauber.com/2015/11/06/inline-annotation-sandhi/">Inline Annotation of Sandhi</a></h2>
<p><em>James Tauber · 2015-11-06</em></p>
<p>In many Greek morphology projects, I&rsquo;ve wanted a way of conveying the surface form of an inflected word while also conveying the underlying components prior to the application of the sandhi rule. A couple of years ago, I came up with a simple representation for inline annotation.</p>
<p>Say you want to convey the fact that φιλοῦμεν comes from φιλε + ομεν by application of the rule that ε + ο → ου. In the representation I&rsquo;ve been using you&rsquo;d write <code>φιλ|ε&gt;ου&lt;ο|μεν</code>.</p>
<p>This enables you to see the stem and affix easily but also the result of sandhi.</p>
<p>So what <code>A|B&gt;C&lt;D|E</code> means is there is a sandhi rule that B + D → C and that rule has been applied in AB + DE to form ACE.</p>
<p>Using Stump&rsquo;s terminology introduced in a <a href="http://jktauber.com/2015/11/03/distinguishers-morphology/">previous post</a>:</p>
<ul>
<li>A / φιλ is the <strong>theme</strong></li>
<li>CE / ουμεν is the <strong>distinguisher</strong></li>
<li>AB / φιλε is the <strong>stem</strong></li>
<li>DE / ομεν is the <strong>affix</strong></li>
</ul>
<p>It also means that you can search for <code>|B&gt;C&lt;D|</code> to find where that particular sandhi rule has been applied.</p>
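<p>Parsing the notation back into its parts is a one-regex job. A sketch (the helper name is mine, not from any of the projects):</p>

```python
import re

ANNOTATION = re.compile(r"^(.*)\|(.*)>(.*)<(.*)\|(.*)$")

def parse_annotation(annotated):
    """Unpack A|B>C<D|E: stem AB + affix DE surface as ACE
    via the sandhi rule B + D -> C."""
    a, b, c, d, e = ANNOTATION.match(annotated).groups()
    return {
        "theme": a,
        "stem": a + b,
        "affix": d + e,
        "distinguisher": c + e,
        "surface": a + c + e,
        "rule": (b, d, c),   # B + D -> C
    }
```

<p>So <code>parse_annotation("φιλ|ε&gt;ου&lt;ο|μεν")</code> recovers the theme φιλ, stem φιλε, affix ομεν, distinguisher ουμεν and the rule ε + ο → ου.</p>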
<h2><a href="http://jktauber.com/2015/11/05/morphological-parts-speech-greek/">Morphological Parts of Speech in Greek</a></h2>
<p><em>James Tauber · 2015-11-05</em></p>
<p>The parts of speech in a particular language can be drawn up on the basis of syntactic properties, morphological properties, and/or (perhaps most problematically) semantic properties.</p>
<p>What if we just want to classify lexemes in the MorphGNT based on what morphosyntactic and morphosemantic features they have?</p>
<p>Minimally, we might get something like this:</p>
<table class="table">
<tr><th>case<th>person<th>aspect<th>
<tr><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>-</big><td><i>conjunctions, adverbs, interjections, prepositions, particles, indeclinable nouns and adjectives</i>
<tr><td align="center"><big>+</big><td align="center"><big>-</big><td align="center"><big>-</big><td><i>nouns, adjectives, pronouns, articles</i>
<tr><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>+</big><td><i>infinitives</i>
<tr><td align="center"><big>+</big><td align="center"><big>-</big><td align="center"><big>+</big><td><i>participles</i>
<tr><td align="center"><big>-</big><td align="center"><big>+</big><td align="center"><big>+</big><td><i>finite verbs</i>
</table>
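<p>Read as a lookup table, the classification is just this (a direct transcription of the table above; the function name is mine):</p>

```python
# keyed on (has case, has person, has aspect)
POS = {
    (False, False, False): "conjunctions, adverbs, interjections, prepositions, "
                           "particles, indeclinable nouns and adjectives",
    (True,  False, False): "nouns, adjectives, pronouns, articles",
    (False, False, True):  "infinitives",
    (True,  False, True):  "participles",
    (False, True,  True):  "finite verbs",
}

def morphological_pos(has_case, has_person, has_aspect):
    return POS.get((has_case, has_person, has_aspect))
```

<p>For example, something marked for case and aspect but not person is a participle.</p>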
<p>We could consider voice, but it co-occurs with aspect, so its value is predictable.</p>
<p>Mood only appears in finite verbs, which means it&rsquo;s also predictable (arguably co-occurrent with person, but see below).</p>
<p>Number is predictable as it co-occurs with case or person.</p>
<p>As things stand above, gender is also predictable (it co-occurs with case).</p>
<p>However, let&rsquo;s consider the distinction between the 1st/2nd person pronouns on the one hand and the proforms on the other.</p>
<p>(There are strong arguments beyond just morphology for distinguishing the (1st/2nd person) personal pronouns and proforms. See Bhat&rsquo;s book <em>Pronouns</em> for cross-linguistic arguments for the distinction.)</p>
<p>The 1st/2nd person pronouns, unlike the proforms, don&rsquo;t inflect for gender. So let&rsquo;s add gender to the mix:</p>
<table class="table">
<tr><th>case<th>person<th>gender<th>aspect<th>
<tr><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>-</big><td><i>conjunctions, adverbs, interjections, prepositions, particles, indeclinable nouns and adjectives</i>
<tr><td align="center"><big>+</big><td align="center"><big>?</big><td align="center"><big>-</big><td align="center"><big>-</big><td><i>1st/2nd person personal pronouns</i>
<tr><td align="center"><big>+</big><td align="center"><big>-</big><td align="center"><big>+</big><td align="center"><big>-</big><td><i>nouns, adjectives, proforms, articles</i>
<tr><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>-</big><td align="center"><big>+</big><td><i>infinitives</i>
<tr><td align="center"><big>+</big><td align="center"><big>-</big><td align="center"><big>+</big><td align="center"><big>+</big><td><i>participles</i>
<tr><td align="center"><big>-</big><td align="center"><big>+</big><td align="center"><big>-</big><td align="center"><big>+</big><td><i>finite verbs</i>
</table>
<p>The <big>?</big> under person for the personal pronouns is because they don&rsquo;t really <em>inflect</em> for person. Person is lexical in the personal pronouns.</p>
<p>Interestingly, though, if we <em>do</em> give it a <big>+</big> then we don&rsquo;t need gender to distinguish the category.</p>
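<p>As a toy illustration, the classification in the table above (taking the <big>+</big> option for person on the personal pronouns) amounts to a simple lookup on feature presence. The feature flags and category labels here are mine, just for the sketch:</p>

```python
# Map (case, person, gender, aspect) presence to the classes in the table
# above; person is given "+" for the personal pronouns, which (as noted)
# lets us keep gender "-" for them while still distinguishing the category.
CATEGORIES = {
    (False, False, False, False): "conjunctions, adverbs, interjections, "
                                  "prepositions, particles, indeclinables",
    (True,  True,  False, False): "1st/2nd person personal pronouns",
    (True,  False, True,  False): "nouns, adjectives, proforms, articles",
    (False, False, False, True):  "infinitives",
    (True,  False, True,  True):  "participles",
    (False, True,  False, True):  "finite verbs",
}

def classify(case, person, gender, aspect):
    """Return the word class for a given bundle of feature presences."""
    return CATEGORIES[(case, person, gender, aspect)]
```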
<p>You may wonder about <em>degree</em>. I&rsquo;m currently inclined to think that degree is better modeled derivationally than inflectionally, although that&rsquo;s worthy of a separate post.</p>
http://jktauber.com/2015/11/04/mean-log-frequency-forms/Mean Log Frequency of Forms2015-11-04T15:47:15Z2015-11-04T15:47:15ZJames Tauber
<p>In <a href="http://jktauber.com/2015/10/27/mean-log-frequency-lexemes/">a previous post</a>, we looked at which chapters had the highest mean log frequency of lexemes. The code provided there was applicable to other items, though, so let&rsquo;s now take a look at mean log frequency of <strong>forms</strong>.</p>
<p>The code change is a simple change to one line.</p>
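<p>Concretely, here is a sketch of the extraction, assuming MorphGNT&rsquo;s whitespace-separated columns (reference, part of speech, parse code, text, word, normalized form, lemma) and chapter targets:</p>

```python
def target_item(line, items="lexemes"):
    """Extract a (chapter, item) pair from one MorphGNT line."""
    cols = line.split()
    chapter = cols[0][:4]  # reference is BBCCVV: book, chapter, verse
    # switching the item from cols[6] (the lemma) to cols[5] (the
    # normalized form) is the one-line change
    return chapter, cols[6] if items == "lexemes" else cols[5]
```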
<p>The top 10 are:</p>
<div class="codehilite"><pre>6277 2304 449
6373 2305 429
6500 2302 585
6558 0403 657
6562 2303 467
6596 1001 401
6600 0408 905
6617 2301 207
6640 0702 287
6646 2720 406
</pre></div>
<p>In other words:</p>
<ul>
<li>1 John 4 (also 1st for lexemes)</li>
<li>1 John 5 (also 2nd for lexemes)</li>
<li>1 John 2 (8th for lexemes)</li>
<li>John 3 (9th for lexemes)</li>
<li>1 John 3 (7th for lexemes)</li>
<li>Ephesians 1 (11th for lexemes)</li>
<li>John 8 (6th for lexemes)</li>
<li>1 John 1 (4th for lexemes)</li>
<li>1 Corinthians 2 (32nd for lexemes)</li>
<li>Revelation 20 (14th for lexemes)</li>
</ul>
<p>Generally, form frequency will track pretty closely with lexeme frequency because a form being common makes the lexeme common. This makes 1 Corinthians 2 interesting.</p>
<p>Frequent words and forms obviously don&rsquo;t necessarily mean shallow syntax, though. 1 John 4, 5, and 2 are the 36th, 67th, and 38th respectively by mean dependency depth. There are no chapters in the top ten of both mean log form frequency AND mean dependency depth.</p>
<p>So we now have mean log frequencies for lexemes and forms as well as mean dependency depth. In future posts, I&rsquo;ll add parse codes and the actual dependency path to the mix and then we can look at combining all five metrics. I&rsquo;ll also look at paragraphs rather than chapters as targets.</p>
http://jktauber.com/2015/11/03/distinguishers-morphology/Distinguishers in Morphology2015-11-03T23:39:59Z2015-11-03T23:38:17ZJames Tauber
<p>A few years ago, I was introduced by Greg Stump to the notion of <strong>distinguishers</strong> in morphological description. The analysis of inflected forms in terms of theme + distinguisher is a very helpful concept and one I make extensive use of in my ongoing work on New Testament Greek morphology.</p>
<p>Take a word like φιλοῦμεν. The underlying stem is φιλε and the suffix is ομεν. The sandhi rule ε + ο → ου has been applied.</p>
<p>So in the surface form of the word, the φιλ is <em>part</em> but not <em>all</em> of the stem. It&rsquo;s the part that will likely (unless there is suppletion) be common with other cells in the paradigm. Similarly, οῦμεν is not the suffix, but it is the part indicating &ldquo;first person plural&rdquo; (as well as indicating that the stem likely ends in ε or ο).</p>
<p>Stump calls φιλ the <strong>theme</strong> and οῦμεν the <strong>distinguisher</strong>. The <strong>theme</strong> is what the cells in a paradigm have in common, the <strong>distinguisher</strong> is what distinguishes them from one another.</p>
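<p>Given a full paradigm, a naive way to recover the theme and distinguishers is to take the longest common prefix of the cells. This is only a sketch: it assumes no suppletion and consistently composed Unicode.</p>

```python
from os.path import commonprefix  # works element-wise on any strings
import unicodedata

def theme_and_distinguishers(cells):
    """Split paradigm cells into a shared theme and per-cell distinguishers."""
    # normalize to NFC so the prefix comparison doesn't split a base letter
    # from its combining diacritics
    forms = {k: unicodedata.normalize("NFC", v) for k, v in cells.items()}
    theme = commonprefix(list(forms.values()))
    return theme, {k: v[len(theme):] for k, v in forms.items()}
```

<p>For the present active of φιλέω this yields the theme φιλ and distinguishers like οῦμεν for the first person plural.</p>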
<p>SPOILER ALERT: I&rsquo;m working on a full theme/distinguisher and stem/suffix analysis of every inflected form in the Greek New Testament as part of my <em>Morphological Lexicon of New Testament Greek</em>.</p>
http://jktauber.com/2015/11/02/atom-editor-11-fixes-polytonic-greek-bug/Atom Editor 1.1 Fixes Polytonic Greek Bug2015-11-03T10:52:20Z2015-11-02T17:00:00ZJames Tauber
<p>Release 1.1 of GitHub&rsquo;s Atom Editor fixes a problem I had with using it for polytonic Greek.</p>
<p>I was an early adopter of <a href="https://atom.io">Atom Editor</a> despite some initial rough edges. I now use it for all my development, including Greek-related stuff talked about on this blog—not just code but data files as well.</p>
<p>Most of the rough edges got sorted out early on and certainly before the 1.0 release but there was one problem, highly relevant to this blog, that persisted.</p>
<p>Basically, Atom was miscalculating the width of characters formed from Unicode combining characters which made it quite difficult to work with text files containing polytonic Greek.</p>
<p>You can see the problem in this screenshot:</p>
<p><img width="100%" src="http://jktauber.com/site_media/media/images/2015/11/03/before.png"></p>
<p>Notice that the existence of diacritics on the alpha at the end of some of the lines actually changes the width of preceding characters, even though a fixed-width font is being used. As well as just looking weird, it made files difficult to use as the cursor position didn&rsquo;t correspond visually to where typing would occur.</p>
<p>I filed a <a href="https://github.com/atom/atom/issues/5975">bug report</a> back in March and was disappointed a fix didn&rsquo;t make the Atom 1.0 release. But once I found out what was involved in fixing it (it didn&rsquo;t just affect polytonic Greek but a lot of non-ASCII use cases) I was impressed. If you want the raw details, see <a href="https://github.com/atom/atom/pull/6083">here</a> and <a href="https://github.com/atom/atom/pull/8811">here</a>.</p>
<p>A couple of weeks ago Atom 1.1 came out and it includes all that work that (amongst other things) fixes the bug I filed.</p>
<p>Now it works perfectly:</p>
<p><img width="100%" src="http://jktauber.com/site_media/media/images/2015/11/03/after.png"></p>
http://jktauber.com/2015/11/01/renaming-non-indicative-tense-forms/Renaming Non-Indicative Tense-Forms2015-11-03T01:51:56Z2015-11-01T19:00:00ZJames Tauber
<p>I think it&rsquo;s confusing that we name the non-indicative tense-forms with the same terms as indicative tense-forms. For example &ldquo;present indicative&rdquo; and &ldquo;present infinitive&rdquo;. The word &ldquo;present&rdquo; doesn&rsquo;t mean the same thing in both cases.</p>
<p>When there is a past/non-past alternation in Greek (e.g. imperfect/present or pluperfect/perfect), only one of the pair is possible in non-indicatives.</p>
<p>The reason for this is simple: only the indicative mood makes a past/non-past distinction. In other cases, only aspect is conveyed.</p>
<p>But this is undermined when we then choose, for the non-indicative &ldquo;aspect only&rdquo; forms, the same terms that, in the indicative mood, specifically convey a non-past tense.</p>
<p>It would be far better to use a term with the non-indicatives that conveys <em>only</em> the aspect.</p>
<p>&ldquo;Imperfective&rdquo; and &ldquo;perfective&rdquo; are obvious choices instead of &ldquo;present&rdquo; and &ldquo;aorist&rdquo; respectively (although it&rsquo;s not clear what we&rsquo;d use for the perfect or future non-indicatives).</p>
<p>The same issue arises in discussion of &ldquo;systems&rdquo; and &ldquo;stems&rdquo;. Rather than the &ldquo;present system&rdquo; or the &ldquo;present stem&rdquo; should we instead talk about the &ldquo;imperfective system&rdquo; and &ldquo;imperfective stem&rdquo; in Greek?</p>
<p>If we use &ldquo;perfective stem&rdquo; rather than &ldquo;aorist stem&rdquo; we avoid the asymmetry of talking about an augmented/un-augmented aorist stem but not (or at least not without some awkwardness) an augmented/un-augmented present stem. (One might be forgiven for thinking Greek involves a morphological process of <em>removing</em> an augment if some descriptions of the aorist/perfective system are to be believed.)</p>
<p>Of course even in the above, there is the confusing use of terminology for what to call the bundle of aspect and tense.</p>
<p>Sometimes the bundles themselves are called &ldquo;tenses&rdquo; and the tense axis (as opposed to aspect) is referred to as &ldquo;time&rdquo;.</p>
<p>Sometimes the bundles are called &ldquo;tense-forms&rdquo;, which I think is better but still slightly confusing as that should really be &ldquo;tense-aspect-forms&rdquo; or, perhaps, &ldquo;aspect-tense-forms&rdquo;.</p>
<p>As an aside: the use of &ldquo;form&rdquo; is interesting as it places the bundling squarely in the realm of form, not meaning. In other words, even though the realization involves cumulative exponence (to adopt the terminology of Matthews), the meaning is just the union of the tense and aspect.</p>
<p>All of this plays into morphological tagging as well. I&rsquo;ve suggested for the <a href="https://github.com/morphgnt/sblgnt/wiki/Proposal-for-a-New-Tagging-Scheme">rethink of the parse codes in MorphGNT 7</a> that tense and aspect be split into two features.</p>
http://jktauber.com/2015/10/31/experimental-rest-api-morphgnt/An Experimental REST API to MorphGNT2015-11-02T23:44:09Z2015-10-31T19:00:00ZJames Tauber
<p>Back in July, I thought I&rsquo;d prototype a REST API for MorphGNT with resources for books, paragraphs, sentences, verses and words.</p>
<p>The prototype is available on <a href="http://api.morphgnt.org/">http://api.morphgnt.org/</a> and the underlying code <a href="https://github.com/morphgnt/morphgnt-api">here</a>.</p>
<p>The API exposes in JSON not only the normal MorphGNT data but also the paragraphs from the SBLGNT proper, the sentence divisions from the GBI syntax analysis AND the dependency relationships discussed in <a href="http://jktauber.com/2015/07/02/converting-gbi-syntax-trees-dependency-analysis/">Converting the GBI Syntax Trees to a Dependency Analysis</a>. So for now, at least, it&rsquo;s the only place you can get all that info.</p>
<p>The prototype is currently served up using Django hitting a PostgreSQL database but it would be possible to just generate the roughly 150,000 JSON files once and serve them up from a CDN.</p>
<p>There&rsquo;s only one thing using the API that I know of at the moment and that&rsquo;s the <a href="http://jktauber.com/labs/morphgnt-api-reader.html">lab on this site</a>. It doesn&rsquo;t make use of a lot of the rich word-level information but it does demo how you can navigate through paragraphs of the GNT purely using the links in a book&rsquo;s <code>first_paragraph</code> or a paragraph&rsquo;s <code>prev</code> and <code>next</code> fields.</p>
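<p>That navigation can be sketched in a few lines. The fetch function is injectable so the traversal can be exercised without the prototype server (which may, of course, change or go away):</p>

```python
import json
from urllib.request import urlopen

BASE = "http://api.morphgnt.org"  # prototype host; subject to change

def http_fetch(path):
    """Fetch a JSON resource from the prototype API."""
    with urlopen(BASE + path) as response:
        return json.load(response)

def paragraphs(book_path, fetch=http_fetch):
    """Yield a book's paragraphs by following first_paragraph/next links."""
    para_id = fetch(book_path)["first_paragraph"]
    while para_id is not None:
        paragraph = fetch(para_id)
        yield paragraph
        para_id = paragraph.get("next")
```

<p>Passing a stub fetch (e.g. a dict&rsquo;s <code>get</code> method over canned responses) makes the traversal testable offline.</p>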
<p>Note that the <code>/v0/</code> prefix is used in URLs because there is no commitment to keep this API. It is subject to rapid change at the moment.</p>
<p>The URI patterns are:</p>
<div class="codehilite"><pre>/v0/root.json
/v0/book/{osis_id}.json
/v0/paragraph/{paragraph_id}.json
/v0/sentence/{sentence_id}.json
/v0/verse/{verse_id}.json
/v0/word/{word_id}.json
</pre></div>
<p>A word (currently) looks something like this:</p>
<div class="codehilite"><pre>{
@id: &quot;/v0/word/64001001005.json&quot;,
@type: &quot;word&quot;,
verse_id: &quot;/v0/verse/640101.json&quot;,
sentence_id: &quot;/v0/sentence/640001.json&quot;,
paragraph_id: &quot;/v0/paragraph/64001.json&quot;,
crit_text: &quot;λόγος,&quot;,
text: &quot;λόγος,&quot;,
word: &quot;λόγος&quot;,
norm: &quot;λόγος&quot;,
lemma: &quot;λόγος&quot;,
pos: &quot;N&quot;,
case: &quot;N&quot;,
number: &quot;S&quot;,
gender: &quot;M&quot;,
dep_type: &quot;S&quot;,
head: &quot;/v0/word/64001001002.json&quot;
}
</pre></div>
<p>A verse (currently) looks something like this:</p>
<div class="codehilite"><pre>{
@id: &quot;/v0/verse/640101.json&quot;,
@type: &quot;verse&quot;,
prev: null,
next: &quot;/v0/verse/640102.json&quot;,
book: &quot;/v0/book/John.json&quot;,
words: [...]
}
</pre></div>
<p>where <code>words</code> is a list of objects like the word above.</p>
<p>A paragraph and sentence are very similar to a verse (with an <code>@id</code>, <code>@type</code>,
<code>prev</code>, <code>next</code>, <code>book</code> and <code>words</code> list).</p>
<p>A book (currently) looks something like this:</p>
<div class="codehilite"><pre>{
&quot;@id&quot;: &quot;/v0/book/1Cor.json&quot;,
&quot;@type&quot;: &quot;book&quot;,
&quot;name&quot;: &quot;1 Corinthians&quot;,
&quot;root&quot;: &quot;/v0/root.json&quot;,
&quot;first_paragraph&quot;: &quot;/v0/paragraph/67001.json&quot;,
&quot;first_verse&quot;: &quot;/v0/verse/670101.json&quot;,
&quot;first_sentence&quot;: &quot;/v0/sentence/670001.json&quot;
}
</pre></div>
<p>Feedback is greatly appreciated to make this more useful. I&rsquo;d particularly like to work with some front-end developers to do some more complex demos built on the API.</p>
http://jktauber.com/2015/10/29/mean-dependency-depth/Mean Dependency Depth2015-10-29T09:14:07Z2015-10-29T09:14:07ZJames Tauber
<p>With dependency paths calculated for the Greek New Testament, we can use mean dependency depth as a proxy for syntactic complexity.</p>
<p>In <a href="http://jktauber.com/2015/10/27/mean-log-frequency-lexemes/">Mean Log Frequency of Lexemes</a> I mentioned that, as well as mean log word frequency, reading comprehension measures such as the Lexile® framework use average sentence length. Now that we have <a href="http://jktauber.com/2015/10/28/dependency-paths/">Dependency Paths</a> calculated, we can explore potentially more useful proxies for syntactic complexity.</p>
<p>As an initial experiment, we&rsquo;ll simply take the mean dependency depth of each target where our targets are chapters and by &ldquo;dependency depth&rdquo; I simply mean the number of labels in the dependency path. In other words <code>np-O-CL-CL</code> will count as 4 and we&rsquo;ll just average across all the words in each chapter.</p>
<p>An initial run reveals one interesting problem. Luke 3 is given a considerably higher score than anything else because of the analysis of the genealogy (A the son of B the son of C&hellip;and so on leads to very long paths). Reading that genealogy is arguably not that taxing syntactically, which highlights one flaw in the dependency depth approach (or, perhaps, in the analysis chosen for the genealogy).</p>
<p>This aside, let&rsquo;s look at what this measure identifies as the easiest chapters:</p>
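<p>In code, the depth of a word is just the number of hyphen-separated labels in its path, and a chapter&rsquo;s score is the mean over its words (a minimal sketch):</p>

```python
def depth(path):
    """Number of labels in a dependency path: "np-O-CL-CL" has depth 4."""
    return len(path.split("-"))

def mean_depth(paths):
    """Mean dependency depth over a collection of word-level paths."""
    return sum(depth(p) for p in paths) / len(paths)
```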
<div class="codehilite"><pre>2685 67009
2715 67006
2746 66014
2831 67014
2840 66013
2840 69005
2841 67007
2869 66007
2888 67016
2892 69003
</pre></div>
<p>Interestingly, the top 10 chapters for lowest mean dependency depth are all in Romans, 1 Corinthians and Galatians.</p>
<p>If we average, instead, across entire books, the top ten are:</p>
<ul>
<li>3 John</li>
<li>1 Corinthians</li>
<li>1 John</li>
<li>James</li>
<li>Galatians</li>
<li>John</li>
<li>Romans</li>
<li>Matthew</li>
<li>Mark</li>
<li>2 John</li>
</ul>
<p>which is perhaps a little less surprising.</p>
<p>The <em>hardest</em> chapters, Luke 3 aside, are the first chapters of Ephesians, 2 Timothy and Colossians, which probably isn&rsquo;t much of a surprise either. The hardest books overall are Ephesians and Colossians.</p>
<p>The code is available <a href="https://gist.github.com/jtauber/16631ec63e6657f9a423">here</a> (tweak line 13 to get book-level stats).</p>
<p>Note, this all may be quite sensitive to the choice of analysis. It would be an interesting exercise to see, for example, what the PROIEL dependency analysis yields.</p>
<p>In future posts, we&rsquo;ll try a few more measures and then try to bring them together to see how chapters (or books, or authors) compare across multiple criteria.</p>
http://jktauber.com/2015/10/28/dependency-paths/Dependency Paths2015-10-28T04:11:21Z2015-10-28T04:07:16ZJames Tauber
<p>For numerous corpus linguistics applications, it&rsquo;s useful to have a word-level indication of syntax. A presentation by Vanessa and Robert Gorman gave me the idea of using dependency paths for this purpose so I&rsquo;ve now calculated them for the GNT based on the GBI syntax trees.</p>
<p>The presentation by the Gormans was entitled <a href="http://sites.tufts.edu/perseusupdates/events/dcne/greek-historiography-through-dependency-syntax-treebanking/">Greek Historiography Through Dependency Syntax Treebanking</a> and they refer to the dependency paths as &ldquo;syntactic words&rdquo; or &ldquo;swords&rdquo; for short.</p>
<p>While their particular interest is authorship, the Gormans make an excellent point about the value of these dependency paths:</p>
<blockquote>
<p>The chief advantage of recasting dependencies as syntax words is that they are immediately valuable: with trivial modifications such texts can be put into standard text-processing software to produce type-token ratios, word frequency histograms, etc., providing detailed syntactic information about individual authors.</p>
</blockquote>
<p>I&rsquo;ve previously written about <a href="http://jktauber.com/2015/07/02/converting-gbi-syntax-trees-dependency-analysis/">Converting the GBI Syntax Trees to a Dependency Analysis</a> so it&rsquo;s just a small step to producing dependency paths.</p>
<p>So if we take the output for the first part of John 3.16 from this dependency conversion:</p>
<div class="codehilite"><pre>64003016001 Οὕτως 64003016003 ADV
64003016002 γὰρ 64003016003 conj
64003016003 ἠγάπησεν None CL
64003016004 ὁ 64003016005 det
64003016005 θεὸς 64003016003 S
64003016006 τὸν 64003016007 det
64003016007 κόσμον 64003016003 O
64003016008 ὥστε 64003016013 conj
64003016009 τὸν 64003016010 det
64003016010 υἱὸν 64003016013 O
64003016011 τὸν 64003016012 det
64003016012 μονογενῆ 64003016010 np
64003016013 ἔδωκεν, 64003016003 CL
</pre></div>
<p>we can easily build up the dependency paths / swords:</p>
<div class="codehilite"><pre>64003016001 Οὕτως ADV-CL
64003016002 γὰρ conj-CL
64003016003 ἠγάπησεν CL
64003016004 ὁ det-S-CL
64003016005 θεὸς S-CL
64003016006 τὸν det-O-CL
64003016007 κόσμον O-CL
64003016008 ὥστε conj-CL-CL
64003016009 τὸν det-O-CL-CL
64003016010 υἱὸν O-CL-CL
64003016011 τὸν det-np-O-CL-CL
64003016012 μονογενῆ np-O-CL-CL
64003016013 ἔδωκεν, CL-CL
</pre></div>
<p>So it will tell you that μονογενῆ is qualifying the object of a subordinate clause (at least according to the GBI analysis). We&rsquo;ve thrown away the noun it&rsquo;s modifying (υἱὸν) and the verb in the subordinate clause it&rsquo;s the object of (ἔδωκεν) and the verb in the main clause (ἠγάπησεν), but <code>np-O-CL-CL</code> is a decent label for its syntactic role as qualifying the object of a subordinate clause.</p>
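<p>The construction itself is just a walk up the head links. Here is a sketch over (word_id, head_id, label) triples like those in the conversion output above:</p>

```python
def dependency_paths(rows):
    """Map each word id to its dependency path ("sword")."""
    by_id = {word_id: (head, label) for word_id, head, label in rows}

    def labels(word_id):
        head, label = by_id[word_id]
        # a word's path is its own label followed by its head's path
        return [label] + (labels(head) if head is not None else [])

    return {word_id: "-".join(labels(word_id)) for word_id in by_id}
```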
<p>The code I used is available <a href="https://gist.github.com/jtauber/676c7030d9b56f3e6acf">here</a>.</p>
http://jktauber.com/2015/10/25/blogging-every-day-between-now-sbl-annual-meeting/Blogging Every Day Between Now and SBL Annual Meeting2015-10-27T23:43:11Z2015-10-25T11:51:46ZJames Tauber
<p>It&rsquo;s exactly four weeks until I&rsquo;m presenting at the SBL Annual Meeting in Atlanta. As I have a long backlog of posts I&rsquo;ve wanted to do for a while, I thought I might try to blog every day between now and my talk on November 22nd.</p>
<p>As well as motivating me to finish up some posts and also get some other ideas down in writing, I also hope the blogging will get people more interested in what I&rsquo;m going to be talking about at the SBL meeting and lay a foundation for some conversations I hope to have with people while there.</p>
http://jktauber.com/2015/10/27/mean-log-frequency-lexemes/Mean Log Frequency of Lexemes2015-10-27T23:40:52Z2015-10-27T23:40:52ZJames Tauber
<p>One component of many readability measures on texts is the mean log word frequency. Here I do a basic calculation across chapters in the Greek New Testament (with code provided).</p>
<p>Usually, the mean log word frequency is used in conjunction with something like the log mean sentence length (for example in the Lexile® framework). The latter is used as a proxy for syntactic complexity but, having a syntactic analysis, I think we can do better and I&rsquo;ll explore that in a future post.</p>
<p>For now, though, I wanted to get a per-chapter measure just based on mean log frequency of lexemes.</p>
<p>The code is available <a href="https://gist.github.com/jtauber/8e9156b34f452ea4cd89">here</a>. It&rsquo;s easy to adjust the targets (by default chapters, specified on line 14) and the items (by default lexemes, specified on line 15).</p>
<p>The result of running the script is something like this:</p>
<div class="codehilite"><pre>6153 0101 436
5757 0102 457
5471 0103 331
5487 0104 428
5437 0105 821
5532 0106 648
</pre></div>
<p>where the first column is -1000 times the mean log frequency (so the higher, the harder to read), the second column is the book and chapter number and the third column is just the number of word tokens in that chapter.</p>
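<p>A sketch of that computation (whether the linked script uses natural logs is an assumption here; the base only shifts the scale):</p>

```python
import math
from collections import Counter, defaultdict

def mean_log_frequencies(pairs):
    """pairs: (target, item) tuples over the whole corpus.

    Returns target -> -1000 times the mean log frequency of its items,
    so higher scores mean rarer vocabulary (harder to read).
    """
    pairs = list(pairs)
    freq = Counter(item for _, item in pairs)
    logs = defaultdict(list)
    for target, item in pairs:
        logs[target].append(math.log(freq[item]))
    return {t: round(-1000 * sum(v) / len(v)) for t, v in logs.items()}
```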
<p>If we sort this output, we should get a list of the easiest chapters to read (at least by the measure of mean log lexeme frequency):</p>
<div class="codehilite"><pre>4704 2304 449
4746 2305 429
4926 0417 498
4949 2301 207
4973 0414 577
5025 0408 905
5036 2303 467
5044 2302 585
5080 0403 657
5090 2710 291
</pre></div>
<p>It is perhaps not surprising that the easiest chapters are from 1 John and John&rsquo;s gospel (with Rev 10 coming in at number 10).</p>
<p>It will be interesting to see if we get similar results once we factor in some measure of syntactic complexity.</p>
<p>Incidentally, the most difficult chapter to read based on mean log lexeme frequency is 2 Peter 2, although 1 Timothy and Titus feature quite a bit in the most difficult ten chapters as well.</p>
http://jktauber.com/2015/10/26/updated-vocabulary-coverage-statistics/Updated Vocabulary Coverage Statistics2015-10-26T11:37:46Z2015-10-26T11:37:46ZJames Tauber
<p>In various mailing list posts, blog posts and talks, I&rsquo;ve shown vocabulary coverage statistics. It&rsquo;s time to update the code to use more recent data and republish the results here.</p>
<p>The vocabulary coverage tables have a number of different parameters:</p>
<ul>
<li>what are the items being learnt: lexemes or forms or something else?</li>
<li>what are the targets: verses or sentences or something else?</li>
<li>what ordering is being used: item frequency or something else?</li>
</ul>
<p>and, of course, what text and lemmatization is being used.</p>
<p>Most of my published stats before were based on the UBS3 version of MorphGNT. Here I&rsquo;m going to use the latest MorphGNT based on the SBLGNT (MorphGNT 6.06) and I&rsquo;m going to explore not just verses but (in followup posts) clauses and sentences from the GBI Syntax Trees and paragraphs from the SBLGNT.</p>
<p>I also want to start incorporating the information from my morphological lexicon into the item/target modeling and ordering algorithms.</p>
<p>But first let&rsquo;s just update the basic stats.</p>
<h2 id="verses-lexemes-with-frequency-ordering">Verses-Lexemes with Frequency Ordering</h2>
<p>A target-item file for verses-lexemes can be achieved with:</p>
<div class="codehilite"><pre>awk &#39;{print $1,$7}&#39; sblgnt/*-morphgnt.txt
</pre></div>
<p>if we then feed that to <a href="https://github.com/jtauber/graded-reader/blob/cf9f59ca3695d4d832208ef402373a8e08f57da0/code/vocab-coverage.py">vocab-coverage.py</a> we get the following result:</p>
<div class="codehilite"><pre> ANY 50.00% 75.00% 90.00% 95.00% 100.00%
------------------------------------------------------------------
100 99.91% 91.07% 24.36% 2.13% 0.64% 0.48%
200 99.92% 96.83% 51.80% 9.75% 3.43% 2.54%
500 99.97% 99.13% 82.23% 36.57% 17.81% 13.81%
1000 99.99% 99.71% 93.60% 62.57% 37.28% 29.99%
2000 100.00% 99.92% 98.41% 84.95% 65.38% 56.43%
5000 100.00% 100.00% 100.00% 99.51% 96.44% 94.58%
ALL 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
</pre></div>
<p>What this table is saying is that if you learn, say, the 200 most frequent lexemes, you&rsquo;ll be able to read 95% of the lexemes in 3.43% of verses.</p>
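<p>The computation behind each cell can be sketched like this (a simplification of vocab-coverage.py; frequency ties are broken arbitrarily here):</p>

```python
from collections import Counter, defaultdict

def coverage(pairs, n, threshold):
    """Fraction of targets at least `threshold` covered after learning
    the `n` most frequent items, given (target, item) pairs."""
    freq = Counter(item for _, item in pairs)
    known = {item for item, _ in freq.most_common(n)}
    targets = defaultdict(list)
    for target, item in pairs:
        targets[target].append(item)
    covered = [
        sum(item in known for item in items) / len(items)
        for items in targets.values()
    ]
    return sum(c >= threshold for c in covered) / len(covered)
```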
<h2 id="verses-forms-with-frequency-ordering">Verses-Forms with Frequency Ordering</h2>
<p>A target-item file for verses-forms can be achieved with:</p>
<div class="codehilite"><pre>awk &#39;{print $1,$6}&#39; sblgnt/*-morphgnt.txt
</pre></div>
<p>if we then feed that to <code>vocab-coverage.py</code> but with 10000 added as an item count, we get the following result:</p>
<div class="codehilite"><pre> ANY 50.00% 75.00% 90.00% 95.00% 100.00%
------------------------------------------------------------------
100 99.82% 57.63% 1.10% 0.04% 0.01% 0.01%
200 99.86% 78.86% 6.51% 0.34% 0.05% 0.05%
500 99.91% 92.85% 26.95% 2.23% 0.59% 0.52%
1000 99.94% 96.95% 51.23% 7.75% 2.31% 1.74%
2000 99.96% 98.65% 72.52% 21.74% 7.86% 5.80%
5000 99.97% 99.74% 90.97% 52.13% 28.52% 21.61%
10000 100.00% 99.94% 98.31% 78.28% 55.19% 45.28%
ALL 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
</pre></div>
<p>What this table is saying is that if you learn, say, the 500 most frequent forms, you&rsquo;ll be able to read 75% of the forms in 26.95% of verses.</p>
<p>Various talks, including those at BibleTech in 2010 and 2015, explain a ton of caveats around these numbers, but I wanted to at least refresh them (and the code) with the latest data.</p>
http://jktauber.com/2015/07/15/speaking-sbl-annual-meeting-atlanta/Speaking At The SBL Annual Meeting in Atlanta2015-07-15T13:23:42Z2015-07-15T13:23:42ZJames Tauber
<p>I&rsquo;ve just finished up registration for the SBL Annual Meeting. Here&rsquo;s the paper I&rsquo;ll be presenting. </p>
<h2 id="a-morphological-lexicon-of-new-testament-greek">A Morphological Lexicon of New Testament Greek</h2>
<p>Morphological analyses such as analytical lexicons have typically involved indicating lemma, part-of-speech, morphosyntactic and morphosemantic information (such as case, number, person, gender, tense, voice, mood and degree). Much progress has been made in recent years making analyses of this sort freely available in digital formats, but the kind of information they contain has not advanced significantly for decades. This paper will provide an overview of the work of the MorphGNT project to develop an electronic Morphological Lexicon of New Testament Greek that adds inflectional classes, roots and stems, stem formation and morphophonological processes, principal parts, and derivational morphology. Beyond serving as a database of linguistic information, the goal of the morphological lexicon is to provide an &ldquo;executable grammar&rdquo; so particular grammar points discussed in beginner grammars, intermediate grammars or advanced reference grammars can be tested against a corpus in a way that makes completely transparent where the &ldquo;rules&rdquo; are followed and where they fall down. It also provides useful data for pedagogical tools such as intelligent tutoring systems, which typically require better modeling of latent traits in order to determine what a student actually knows and what items best test that knowledge. All data for the Morphological Lexicon of New Testament Greek is available under a Creative Commons license, and all code used for both the generation and verification of the morphological lexicon is open source.</p>
http://jktauber.com/2015/07/13/types-disagreement-syntactic-analyses/Types of Disagreement in Syntactic Analyses2015-07-13T18:20:50Z2015-07-13T18:16:34ZJames Tauber
<p>As helpful as the GBI Syntax Trees are, I have disagreements with them. Randall and Andi are receptive to feedback but there are very different <em>types</em> of disagreement that can arise in syntactic analysis so I thought I&rsquo;d start to note down what they are.</p>
<p>Some things aren&rsquo;t disagreements, just corrections. Some are differences of interpretation of the Greek. Some are differences in overall approach.</p>
<p>Here&rsquo;s a first attempt at a more refined categorization of types. I&rsquo;ll call the person/group who did the initial (published) analysis A1 and the person/group who has the change/disagreement A2.</p>
<ul>
<li><strong>I</strong>. <strong>correction</strong>—A1 actually agrees with A2 but simply made a mistake and can uncontroversially update their analysis accordingly</li>
<li><strong>II</strong>. <strong>ambiguity</strong>—both A1&rsquo;s and A2&rsquo;s analyses are possible in the eyes of the other but, based on other factors, A1 and A2 disagree on which analysis to go with. Perhaps this could be further refined into:<ul>
<li><strong>IIA</strong>. cases where A1 and A2 each think their own analysis is the <em>more</em> likely one; versus</li>
<li><strong>IIB</strong>. cases where A1 and A2 each think their own analysis is the <em>only</em> likely one.</li>
</ul>
</li>
<li><strong>III</strong>. <strong>terminology/framework</strong>—A1 and A2 agree on structure and relationship up to a certain isomorphism but not in the specifics. This could be further split into:<ul>
<li><strong>IIIA</strong>. cases where A1 and A2&rsquo;s analyses are structurally identical but just different in labels</li>
<li><strong>IIIB</strong>. cases where A1 and A2&rsquo;s analyses differ in structure even though they are derivable from one another</li>
</ul>
</li>
<li><strong>IV</strong>. <strong>irreconcilable</strong>—A1 and A2 disagree on the way the language actually works and the analyses can&rsquo;t easily be mapped to one another.</li>
</ul>
<p>I think many of my disagreements with the GBI Trees so far are of <strong>type IIIB</strong> which means it is likely possible for me to programmatically generate an alternative analysis with my preferred structure. Indeed, converting to a dependency analysis is a simple example of this but even different choices of head within the constituent structure (which is a major source of systemic disagreement) are easy to make.</p>
<p>The great thing about <strong>type III</strong> in general is that even if you disagree with A1, you can still use the analysis to explore the syntactic phenomenon you want (you just have to map your queries to their labels and their conventions).</p>
<p>I should also note that an important aspect to dealing with this is proper documentation of conventions followed.</p>
<p>With these thoughts down, I&rsquo;m now interested in other work that has already been done in this area.</p>
http://jktauber.com/2015/07/02/converting-gbi-syntax-trees-dependency-analysis/Converting the GBI Syntax Trees to a Dependency Analysis2015-07-03T20:10:55Z2015-07-02T08:22:01ZJames Tauber
<p>With one child on each branch identified as the head, a constituent analysis can be converted to a dependency analysis. Fortunately, the GBI syntax trees have an explicit indication of the head, so I went ahead and converted them to a dependency format.</p>
<p>Non-leaf nodes in the GBI syntax trees have a <code>Head</code> attribute which indicates the index of the child considered the head.</p>
<p>So the algorithm is fairly straightforward. For each leaf-node:</p>
<ul>
<li>walk up the tree until you find a node whose <code>Head</code> attribute is NOT the index of the child we just came from</li>
<li>follow the <code>Head</code> attributes back down the tree until you hit another leaf-node</li>
<li>that second leaf-node is the head of the leaf-node you started on</li>
<li>the &ldquo;type&rdquo; of the dependency is the <code>Cat</code> of the second-to-last node you visited walking up in step 1.</li>
</ul>
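The steps above can be sketched in Python. This is a minimal illustration using an invented <code>Node</code> class, not the actual GBI XML format; attribute names here (<code>children</code>, <code>head</code>, <code>cat</code>, <code>parent</code>) are assumptions for the sketch:

```python
# A non-leaf Node carries a head index into its children and a Cat label;
# leaves carry the word itself. The constructor wires up parent links.
class Node:
    def __init__(self, cat, children=None, head=0, word=None):
        self.cat = cat
        self.word = word
        self.head = head
        self.children = children or []
        self.parent = None
        for child in self.children:
            child.parent = self

def find_head(leaf):
    """Return (head_leaf, dependency_type); head_leaf is None for the root word."""
    node, came_from = leaf.parent, leaf
    # step 1: walk up while the node's head child is the one we came from
    while node is not None and node.children[node.head] is came_from:
        node, came_from = node.parent, node
    if node is None:
        # we fell off the top: this leaf heads the whole tree
        return None, came_from.cat
    # the dependency type is the Cat of the second-to-last node visited
    dep_type = came_from.cat
    # step 2: follow head indices back down to a leaf
    target = node.children[node.head]
    while target.children:
        target = target.children[target.head]
    return target, dep_type
```

On a toy clause like ὁ θεὸς ἠγάπησεν (clause headed by the verb, noun phrase headed by the noun), this reproduces the pattern in the output below: the determiner depends on the noun, the noun depends on the verb with the phrase's label, and the verb has no head.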
<p>The only catch is that the source data this script uses omits a <code>Head</code> altogether in three types of cases. The original GBI analysis treated the <code>Head</code> as being <code>"1"</code> in these cases, so I special-case that in the code. I don&rsquo;t necessarily agree with the choice but it&rsquo;s easy to change (see below).</p>
<p>I&rsquo;ve put the code in a gist: <a href="https://gist.github.com/jtauber/c02d0928811b7ed21c9a">https://gist.github.com/jtauber/c02d0928811b7ed21c9a</a></p>
<p>The result (on the first part of John 3.16) is:</p>
<div class="codehilite"><pre>64003016001 Οὕτως 64003016003 ADV
64003016002 γὰρ 64003016003 conj
64003016003 ἠγάπησεν None CL
64003016004 ὁ 64003016005 det
64003016005 θεὸς 64003016003 S
64003016006 τὸν 64003016007 det
64003016007 κόσμον 64003016003 O
64003016008 ὥστε 64003016013 conj
64003016009 τὸν 64003016010 det
64003016010 υἱὸν 64003016013 O
64003016011 τὸν 64003016012 det
64003016012 μονογενῆ 64003016010 np
64003016013 ἔδωκεν, 64003016003 CL
</pre></div>
<p>The <a href="http://jktauber.com/labs/dependency-highlighting.html">dependency relationship color highlighting</a> experiment on this site shows a possible way of conveying this dependency information in a text (in this case, 2 John).</p>
<p>As mentioned, I don&rsquo;t necessarily always agree with the GBI choice of head; however, it&rsquo;s fairly straightforward to alter the code to override the choice of head in certain contexts.</p>
<p>For example, if you consider the complementizer the head, you can just add code that takes <code>Head="0"</code> where <code>Rule="that-VP"</code> and so on. Similarly with prepositions, determiners, etc.</p>
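For illustration, one way such overrides might be expressed is a small table keyed on the node's <code>Rule</code>. Only <code>that-VP</code> is mentioned above; the helper name and the table itself are hypothetical, not part of the GBI format or the gist:

```python
# Hypothetical override table: map a node's Rule attribute to the child
# index to treat as head instead of the annotated Head attribute.
HEAD_OVERRIDES = {
    "that-VP": 0,  # treat the complementizer as the head
}

def effective_head(rule, annotated_head):
    """Return the head index to use for a node with this Rule."""
    return HEAD_OVERRIDES.get(rule, annotated_head)
```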
<p>Finally note that it&rsquo;s not quite possible to reconstruct the original tree from the dependency data because the algorithm effectively eliminates information on some intermediate nodes. Some may consider this an advantage.</p>
http://jktauber.com/2014/02/01/version-10-pyuca-released/Version 1.0 of pyuca released2015-07-03T09:05:13Z2014-02-01T12:00:00ZJames Tauber
<p>pyuca is my pure Python implementation of the Unicode Collation Algorithm (for sorting, amongst other things, Greek).</p>
<p>I've just released version 1.0 for Python 3.3 and above, and it passes 100% of the UCA conformance tests.</p>
<p>I implemented enough back in 2006 to be able to sort Ancient Greek and released it on PyPI in 2012.</p>
<p>Since then, with input from others, I've made various improvements but in October last year I decided to start testing against the comprehensive UCA conformance tests provided by the Unicode Consortium. The last couple of days I've had an intense sprint where I got 100% of the tests passing and also 100% code coverage.</p>
<p>I also made the decision to ditch Python 2 support as part of my encouragement to get people to move to Python 3.</p>
<p>The repo is available at <a href="https://github.com/jtauber/pyuca/">https://github.com/jtauber/pyuca/</a> but you can most easily get pyuca with</p>
<div class="codehilite"><pre>pip install pyuca
</pre></div>
<p>and then use it as follows:</p>
<div class="codehilite"><pre><span class="kn">from</span> <span class="nn">pyuca</span> <span class="kn">import</span> <span class="n">Collator</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">Collator</span><span class="p">(</span><span class="s">&quot;allkeys.txt&quot;</span><span class="p">)</span>
<span class="n">sorted_words</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">words</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">c</span><span class="o">.</span><span class="n">sort_key</span><span class="p">)</span>
</pre></div>
<p><strong>UPDATE (2015-05-13)</strong>: <a href="http://jktauber.com/2015/05/13/pyuca-supports-python-2-again/">Python 2 support is back in 1.1</a></p>
http://jktauber.com/2015/05/13/pyuca-supports-python-2-again/pyuca supports Python 2 again2015-07-03T09:03:23Z2015-05-13T12:00:00ZJames Tauber
<p>Thanks to Chris Beaven, Paul McLanahan and Michal Čihař, Python 2 support is back in pyuca 1.1.</p>
<p>There was a small amount of complaining about me dropping Python 2 support for the big release of pyuca last year.</p>
<p>I didn't have the time or motivation to bring it back, though.</p>
<p>Fortunately, other people did and thanks to Chris, Paul and Michal, pyuca 1.1 supports Python 2 <em>and</em> 3.</p>
<p>The repo is at <a href="https://github.com/jtauber/pyuca">https://github.com/jtauber/pyuca</a> and you can get pyuca from PyPI with <code>pip install pyuca</code>.</p>
http://jktauber.com/2015/05/06/my-bibletech-2015-talk/My BibleTech 2015 Talk2015-06-29T08:23:45Z2015-05-06T08:17:54ZJames Tauber
<p>BibleTech talks were not recorded but I turned on my iPhone's Voice Memo recording and later sync'd the audio with my slides to make this video.</p>
<iframe src="https://player.vimeo.com/video/127114639" width="500" height="375" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p>The abstract:</p>
<blockquote>
<p>In an update on the ongoing work he has spoken about in previous Bible Tech conferences, James will talk about recent developments in open source learning software and the MorphGNT linguistic database, and how the two work together to provide tools for improving the learning of New Testament Greek.</p>
</blockquote>
http://jktauber.com/2010/03/28/my-bibletech-2010-talk/My BibleTech 2010 Talk2015-06-28T07:55:08Z2010-03-28T00:45:00ZJames Tauber
<p>Yesterday I gave a talk on the graded reader ideas at BibleTech.</p>
<p>Here is a video of my talk.</p>
<iframe src="https://player.vimeo.com/video/10489590" width="500" height="283" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p>The abstract:</p>
<blockquote>
<p>We will discuss a new approach to language learning based on texts, with a special focus on learning Greek from the New Testament.</p>
<p>We will be covering how various linguistic analyses of a text such as the Greek New Testament can help determine the order in which vocabulary and grammar is introduced and how each new word or grammatical concept can be shown in the context of the text.</p>
<p>Lastly, we will also discuss various algorithms that have been implemented as well as open source Python code for producing this new kind of graded reader.</p>
</blockquote>
http://jktauber.com/2010/04/25/inline-replacement-john-2/Inline Replacement for John 22015-06-28T07:34:03Z2010-04-25T00:48:00ZJames Tauber
<p>A post to the graded-reader mailing list from April 25, 2010.</p>
<p>This afternoon and evening, I updated and open sourced my code for doing inline replacement and did a rough literal translation of John 2, marked up with the PROIEL clause (and in some cases phrase) boundaries.</p>
<p>I then just ran a next-best ordering based on forms only, with the targets that are PRED or multi-word SUB (adding the latter works quite well) </p>
<p>I've included the complete results below. The main outstanding issue is that it doesn't yet properly handle discontinuous clauses (the parenthetical in 2.9) or clauses that span verses (2.9,2.10; 2.14,2.15,2.16; 2.24,2.25).</p>
<p>All the code (and my annotated translation) is available on GitHub.</p>
<p>James </p>
<hr />
<p>[343427] John 2.2<br />
<strong>ὁ Ἰησοῦς</strong> and his disciples were invited to the wedding </p>
<p>[343464] John 2.4<br />
<strong>ὁ Ἰησοῦς</strong> says to her , what (concern is that) to me and you , woman ? My hour is not yet come </p>
<p>[343517] John 2.7<br />
<strong>ὁ Ἰησοῦς</strong> says to them : fill the water-jars with water and they filled them up to the top </p>
<p>[343607] John 2.11<br />
This beginning of signs <strong>ὁ Ἰησοῦς</strong> did in Cana of Galilee and revealed his glory and his disciples believed in him </p>
<p>[343665] John 2.13<br />
and near was the passover of the Jews and <strong>ὁ Ἰησοῦς</strong> went up to Jerusalem </p>
<p>[343841] John 2.22<br />
so when he was raised from the dead , his disciples remembered that he was saying this and they believed the Scripture and the word which <strong>ὁ Ἰησοῦς</strong> said </p>
<p>[343430] John 2.2<br />
ὁ Ἰησοῦς and <strong>οἱ μαθηταὶ αὐτοῦ</strong> were invited to the wedding </p>
<p>[343623] John 2.11<br />
This beginning of signs ὁ Ἰησοῦς did in Cana of Galilee and revealed his glory and <strong>οἱ μαθηταὶ αὐτοῦ</strong> believed in him </p>
<p>[343642] John 2.12<br />
After this , he and his mother and his brothers and <strong>οἱ μαθηταὶ αὐτοῦ</strong> went down into Capernaum and there they remained not many days </p>
<p>[343736] John 2.17<br />
<strong>οἱ μαθηταὶ αὐτοῦ</strong> remembered that it has been written : the zeal for your house will devour me </p>
<p>[343825] John 2.22<br />
so when he was raised from the dead , <strong>οἱ μαθηταὶ αὐτοῦ</strong> remembered that he was saying this and they believed the Scripture and the word which ὁ Ἰησοῦς said </p>
<p>[343753] John 2.18<br />
so <strong>οἱ Ἰουδαῖοι</strong> answered and said to him : what sign are you showing us that you do these things ? </p>
<p>[343788] John 2.20<br />
so <strong>οἱ Ἰουδαῖοι</strong> said : this temple was built in forty-six years and you will raise it in three days ? </p>
<p>[343549] John 2.9,2.10<br />
as <strong>ὁ ἀρχιτρίκλινος</strong> tasted the water having become wine and didn't know from where it came ( but the servants who drew the water knew ) the head-steward calls the groom and says to him : all men first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343574] John 2.9,2.10<br />
as ὁ ἀρχιτρίκλινος tasted the water having become wine and didn't know from where it came ( but the servants who drew the water knew ) <strong>ὁ ἀρχιτρίκλινος</strong> calls the groom and says to him : all men first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343514] John 2.7<br />
<strong>λέγει αὐτοῖς ὁ Ἰησοῦς</strong> : fill the water-jars with water and they filled them up to the top </p>
<p>[343531] John 2.8<br />
<strong>καὶ λέγει αὐτοῖς</strong> , draw now and carry it to the head-steward and they brought it </p>
<p>[343428] John 2.2<br />
<strong>καὶ ὁ Ἰησοῦς καὶ οἱ μαθηταὶ αὐτοῦ</strong> were invited to the wedding </p>
<p>[343481] John 2.5<br />
<strong>ἡ μήτηρ αὐτοῦ</strong> says to the servants : do whatever he tells you to </p>
<p>[343634] John 2.12<br />
After this , he and <strong>ἡ μήτηρ αὐτοῦ</strong> and his brothers and οἱ μαθηταὶ αὐτοῦ went down into Capernaum and there they remained not many days </p>
<p>[343418] John 2.1<br />
And on the third day , a wedding was happening in Cana of Galilee and <strong>ἡ μήτηρ τοῦ Ἰησοῦ</strong> was there </p>
<p>[343451] John 2.3<br />
There was no wine because the wedding wine had been finished off . Then <strong>ἡ μήτηρ τοῦ Ἰησοῦ</strong> says to him : there is no wine </p>
<p>[343576] John 2.9,2.10<br />
as ὁ ἀρχιτρίκλινος tasted the water having become wine and didn't know from where it came ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom <strong>λέγει αὐτῷ</strong> all men first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343755] John 2.18<br />
so οἱ Ἰουδαῖοι answered and <strong>εἶπαν αὐτῷ</strong> : what sign are you showing us that you do these things ? </p>
<p>[343785] John 2.20<br />
<strong>εἶπαν οὖν οἱ Ἰουδαῖοι</strong> : this temple was built in forty-six years and you will raise it in three days ? </p>
<p>[343750] John 2.18<br />
<strong>ἀπεκρίθησαν οὖν οἱ Ἰουδαῖοι</strong> and εἶπαν αὐτῷ : what sign are you showing us that you do these things ? </p>
<p>[343754] John 2.18<br />
<strong>ἀπεκρίθησαν οὖν οἱ Ἰουδαῖοι καὶ εἶπαν αὐτῷ</strong> : what sign are you showing us that you do these things ? </p>
<p>[343770] John 2.19<br />
Jesus answered and <strong>εἶπεν αὐτοῖς</strong> : destroy this temple and in three days I will raise it </p>
<p>[343767] John 2.19<br />
<strong>ἀπεκρίθη Ἰησοῦς</strong> and εἶπεν αὐτοῖς : destroy this temple and in three days I will raise it </p>
<p>[343769] John 2.19<br />
<strong>ἀπεκρίθη Ἰησοῦς καὶ εἶπεν αὐτοῖς</strong> : destroy this temple and in three days I will raise it </p>
<p>[343872] John 2.24,2.25<br />
<strong>αὐτὸς Ἰησοῦς</strong> did not entrust himself to them because he knows everyone and because he had no need that anyone should testify about man for he knew what was in man </p>
<p>[343638] John 2.12<br />
After this , he and ἡ μήτηρ αὐτοῦ and <strong>οἱ ἀδελφοὶ αὐτοῦ</strong> and οἱ μαθηταὶ αὐτοῦ went down into Capernaum and there they remained not many days </p>
<p>[343632] John 2.12<br />
After this , <strong>αὐτὸς καὶ ἡ μήτηρ αὐτοῦ καὶ οἱ ἀδελφοὶ αὐτοῦ καὶ οἱ μαθηταὶ αὐτοῦ</strong> went down into Capernaum and there they remained not many days </p>
<p>[343444] John 2.3<br />
There was no wine because <strong>ὁ οἶνος τοῦ γάμου</strong> had been finished off . Then ἡ μήτηρ τοῦ Ἰησοῦ says to him : there is no wine </p>
<p>[343442] John 2.3<br />
There was no wine because <strong>συνετελέσθη ὁ οἶνος τοῦ γάμου</strong> . Then ἡ μήτηρ τοῦ Ἰησοῦ says to him : there is no wine </p>
<p>[343461] John 2.4<br />
<strong>λέγει αὐτῇ ὁ Ἰησοῦς</strong> , what (concern is that) to me and you , woman ? My hour is not yet come </p>
<p>[343765] John 2.18<br />
ἀπεκρίθησαν οὖν οἱ Ἰουδαῖοι καὶ εἶπαν αὐτῷ : what sign are you showing us that <strong>ταῦτα ποιεῖς</strong> ? </p>
<p>[343459] John 2.3<br />
There was no wine because συνετελέσθη ὁ οἶνος τοῦ γάμου . Then ἡ μήτηρ τοῦ Ἰησοῦ says to him : <strong>οἶνος οὐκ ἔστιν</strong> </p>
<p>[343740] John 2.17<br />
οἱ μαθηταὶ αὐτοῦ remembered that <strong>γεγραμμένον ἐστίν</strong> : the zeal for your house will devour me </p>
<p>[343734] John 2.17<br />
<strong>ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι γεγραμμένον ἐστίν</strong> : the zeal for your house will devour me </p>
<p>[343476] John 2.4<br />
λέγει αὐτῇ ὁ Ἰησοῦς , what (concern is that) to me and you , woman ? <strong>ἡ ὥρα μου</strong> is not yet come </p>
<p>[343543] John 2.8<br />
καὶ λέγει αὐτοῖς , draw now and carry it to the head-steward <strong>οἱ δὲ ἤνεγκαν</strong> </p>
<p>[343829] John 2.22<br />
so when he was raised from the dead , οἱ μαθηταὶ αὐτοῦ remembered that <strong>τοῦτο ἔλεγεν</strong> and they believed the Scripture and the word which ὁ Ἰησοῦς said </p>
<p>[343479] John 2.5<br />
<strong>λέγει ἡ μήτηρ αὐτοῦ τοῖς διακόνοις</strong> : do whatever he tells you to </p>
<p>[343439] John 2.3<br />
<strong>καὶ οἶνον οὐκ εἶχον ὅτι συνετελέσθη ὁ οἶνος τοῦ γάμου</strong> . Then ἡ μήτηρ τοῦ Ἰησοῦ says to him : οἶνος οὐκ ἔστιν </p>
<p>[343416] John 2.1<br />
And on the third day , a wedding was happening in Cana of Galilee and <strong>ἦν ἡ μήτηρ τοῦ Ἰησοῦ ἐκεῖ</strong> </p>
<p>[343580] John 2.10<br />
λέγει αὐτῷ <strong>πᾶς ἄνθρωπος</strong> first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343534] John 2.8<br />
καὶ λέγει αὐτοῖς , <strong>ἀντλήσατε νῦν</strong> and carry it to the head-steward οἱ δὲ ἤνεγκαν </p>
<p>[343537] John 2.8<br />
καὶ λέγει αὐτοῖς , ἀντλήσατε νῦν and <strong>φέρετε τῷ ἀρχιτρικλίνῳ</strong> οἱ δὲ ἤνεγκαν </p>
<p>[343536] John 2.8<br />
καὶ λέγει αὐτοῖς , <strong>ἀντλήσατε νῦν καὶ φέρετε τῷ ἀρχιτρικλίνῳ</strong> οἱ δὲ ἤνεγκαν </p>
<p>[343619] John 2.11<br />
This beginning of signs ὁ Ἰησοῦς did in Cana of Galilee and revealed his glory and <strong>ἐπίστευσαν εἰς αὐτὸν οἱ μαθηταὶ αὐτοῦ</strong> </p>
<p>[343557] John 2.9,2.10<br />
as ὁ ἀρχιτρίκλινος tasted the water having become wine and <strong>οὐκ ᾔδει πόθεν ἐστίν</strong> ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343547] John 2.9,2.10<br />
as <strong>ἐγεύσατο ὁ ἀρχιτρίκλινος τὸ ὕδωρ οἶνον γεγενημένον</strong> and οὐκ ᾔδει πόθεν ἐστίν ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343555] John 2.9,2.10<br />
as <strong>ἐγεύσατο ὁ ἀρχιτρίκλινος τὸ ὕδωρ οἶνον γεγενημένον καὶ οὐκ ᾔδει πόθεν ἐστίν</strong> ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343796] John 2.20<br />
εἶπαν οὖν οἱ Ἰουδαῖοι : <strong>ὁ ναὸς οὗτος</strong> was built in forty-six years and you will raise it in three days ? </p>
<p>[343661] John 2.13<br />
and near was the passover of the Jews and <strong>ἀνέβη εἰς Ἱεροσόλυμα ὁ Ἰησοῦς</strong> </p>
<p>[343656] John 2.13<br />
and near was <strong>τὸ πάσχα τῶν Ἰουδαίων</strong> and ἀνέβη εἰς Ἱεροσόλυμα ὁ Ἰησοῦς </p>
<p>[343654] John 2.13<br />
<strong>Καὶ ἐγγὺς ἦν τὸ πάσχα τῶν Ἰουδαίων</strong> and ἀνέβη εἰς Ἱεροσόλυμα ὁ Ἰησοῦς </p>
<p>[343660] John 2.13<br />
<strong>Καὶ ἐγγὺς ἦν τὸ πάσχα τῶν Ἰουδαίων καὶ ἀνέβη εἰς Ἱεροσόλυμα ὁ Ἰησοῦς</strong> </p>
<p>[343718] John 2.14,2.15,2.16<br />
he found, sitting in the temple , the ones selling oxen and sheep and doves , and the coin-dealers and, having made a whip out of ropes , he threw out of the temple all the sheep and the oxen and he threw out the coins of the money-changers and he overturned the tables and <strong>τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν</strong> take these things from here . don't make my father's house a market-place </p>
<p>[343711] John 2.14,2.15,2.16<br />
he found, sitting in the temple , the ones selling oxen and sheep and doves , and the coin-dealers and, having made a whip out of ropes , he threw out of the temple all the sheep and the oxen and he threw out the coins of the money-changers and <strong>τὰς τραπέζας ἀνέστρεψεν</strong> and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν take these things from here . don't make my father's house a market-place </p>
<p>[343423] John 2.2<br />
<strong>ἐκλήθη δὲ καὶ ὁ Ἰησοῦς καὶ οἱ μαθηταὶ αὐτοῦ εἰς τὸν γάμον</strong> </p>
<p>[343720] John 2.16<br />
and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν <strong>ἄρατε ταῦτα ἐντεῦθεν</strong> . don't make my father's house a market-place </p>
<p>[343570] John 2.9,2.10<br />
as ἐγεύσατο ὁ ἀρχιτρίκλινος τὸ ὕδωρ οἶνον γεγενημένον καὶ οὐκ ᾔδει πόθεν ἐστίν ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343575] John 2.9,2.10<br />
as ἐγεύσατο ὁ ἀρχιτρίκλινος τὸ ὕδωρ οἶνον γεγενημένον καὶ οὐκ ᾔδει πόθεν ἐστίν ( but the servants who drew the water knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343563] John 2.9,2.10<br />
as ἐγεύσατο ὁ ἀρχιτρίκλινος τὸ ὕδωρ οἶνον γεγενημένον καὶ οὐκ ᾔδει πόθεν ἐστίν ( <strong>οἱ διάκονοι οἱ ἠντληκότες τὸ ὕδωρ</strong> knew ) ὁ ἀρχιτρίκλινος calls the groom λέγει αὐτῷ πᾶς ἄνθρωπος first put out the good wine and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[343474] John 2.4<br />
λέγει αὐτῇ ὁ Ἰησοῦς , what (concern is that) to me and you , woman ? <strong>οὔπω ἥκει ἡ ὥρα μου</strong> </p>
<p>[398696] John 2.4<br />
λέγει αὐτῇ ὁ Ἰησοῦς , <strong>τί ἐμοὶ καὶ σοί</strong> , woman ? οὔπω ἥκει ἡ ὥρα μου </p>
<p>[343819] John 2.22<br />
so when <strong>ἠγέρθη ἐκ νεκρῶν</strong> , οἱ μαθηταὶ αὐτοῦ remembered that τοῦτο ἔλεγεν and they believed the Scripture and the word which ὁ Ἰησοῦς said </p>
<p>[343823] John 2.22<br />
<strong>ὅτε οὖν ἠγέρθη ἐκ νεκρῶν ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι τοῦτο ἔλεγεν</strong> and they believed the Scripture and the word which ὁ Ἰησοῦς said </p>
<p>[343845] John 2.23<br />
when <strong>δὲ ἦν ἐν τοῖς Ἱεροσολύμοις ἐν τῷ πάσχα ἐν τῇ ἑορτῇ</strong> , many believed in his name , seeing his signs which he was doing </p>
<p>[343832] John 2.22<br />
ὅτε οὖν ἠγέρθη ἐκ νεκρῶν ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι τοῦτο ἔλεγεν and <strong>ἐπίστευσαν τῇ γραφῇ καὶ τῷ λόγῳ ὃν εἶπεν ὁ Ἰησοῦς</strong> </p>
<p>[343831] John 2.22<br />
<strong>ὅτε οὖν ἠγέρθη ἐκ νεκρῶν ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι τοῦτο ἔλεγεν καὶ ἐπίστευσαν τῇ γραφῇ καὶ τῷ λόγῳ ὃν εἶπεν ὁ Ἰησοῦς</strong> </p>
<p>[343449] John 2.3<br />
καὶ οἶνον οὐκ εἶχον ὅτι συνετελέσθη ὁ οἶνος τοῦ γάμου . <strong>εἶτα λέγει ἡ μήτηρ τοῦ Ἰησοῦ πρὸς αὐτόν</strong> : οἶνος οὐκ ἔστιν </p>
<p>[343782] John 2.19<br />
ἀπεκρίθη Ἰησοῦς καὶ εἶπεν αὐτοῖς : destroy this temple and <strong>ἐν τρισὶν ἡμέραις ἐγερῶ αὐτόν</strong> </p>
<p>[343804] John 2.20<br />
εἶπαν οὖν οἱ Ἰουδαῖοι : ὁ ναὸς οὗτος was built in forty-six years and <strong>σὺ ἐν τρισὶν ἡμέραις ἐγερεῖς αὐτόν</strong> ? </p>
<p>[343773] John 2.19<br />
ἀπεκρίθη Ἰησοῦς καὶ εἶπεν αὐτοῖς : <strong>λύσατε τὸν ναὸν τοῦτον</strong> and ἐν τρισὶν ἡμέραις ἐγερῶ αὐτόν </p>
<p>[343778] John 2.19<br />
ἀπεκρίθη Ἰησοῦς καὶ εἶπεν αὐτοῖς : <strong>λύσατε τὸν ναὸν τοῦτον καὶ ἐν τρισὶν ἡμέραις ἐγερῶ αὐτόν</strong> </p>
<p>[343585] John 2.10<br />
λέγει αὐτῷ <strong>πᾶς ἄνθρωπος πρῶτον τὸν καλὸν οἶνον τίθησιν</strong> and when they are drunk , the inferior . you have kept the good wine until now </p>
<p>[398697] John 2.10<br />
λέγει αὐτῷ πᾶς ἄνθρωπος πρῶτον τὸν καλὸν οἶνον τίθησιν and <strong>ὅταν μεθυσθῶσιν τὸν ἐλάσσω</strong> . you have kept the good wine until now </p>
<p>[343587] John 2.10<br />
λέγει αὐτῷ <strong>πᾶς ἄνθρωπος πρῶτον τὸν καλὸν οἶνον τίθησιν καὶ ὅταν μεθυσθῶσιν τὸν ἐλάσσω</strong> . you have kept the good wine until now </p>
<p>[343594] John 2.10<br />
λέγει αὐτῷ πᾶς ἄνθρωπος πρῶτον τὸν καλὸν οἶνον τίθησιν καὶ ὅταν μεθυσθῶσιν τὸν ἐλάσσω . <strong>σὺ τετήρηκας τὸν καλὸν οἶνον ἕως ἄρτι</strong> </p>
<p>[343743] John 2.17<br />
ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι γεγραμμένον ἐστίν : <strong>ὁ ζῆλος τοῦ οἴκου σου</strong> will devour me </p>
<p>[343747] John 2.17<br />
ἐμνήσθησαν οἱ μαθηταὶ αὐτοῦ ὅτι γεγραμμένον ἐστίν : <strong>ὁ ζῆλος τοῦ οἴκου σου καταφάγεταί με</strong> </p>
<p>[343628] John 2.12<br />
<strong>Μετὰ τοῦτο κατέβη εἰς Καφαρναοὺμ αὐτὸς καὶ ἡ μήτηρ αὐτοῦ καὶ οἱ ἀδελφοὶ αὐτοῦ καὶ οἱ μαθηταὶ αὐτοῦ</strong> and there they remained not many days </p>
<p>[343647] John 2.12<br />
Μετὰ τοῦτο κατέβη εἰς Καφαρναοὺμ αὐτὸς καὶ ἡ μήτηρ αὐτοῦ καὶ οἱ ἀδελφοὶ αὐτοῦ καὶ οἱ μαθηταὶ αὐτοῦ and <strong>ἐκεῖ ἔμειναν οὐ πολλὰς ἡμέρας</strong> </p>
<p>[343645] John 2.12<br />
<strong>Μετὰ τοῦτο κατέβη εἰς Καφαρναοὺμ αὐτὸς καὶ ἡ μήτηρ αὐτοῦ καὶ οἱ ἀδελφοὶ αὐτοῦ καὶ οἱ μαθηταὶ αὐτοῦ καὶ ἐκεῖ ἔμειναν οὐ πολλὰς ἡμέρας</strong> </p>
<p>[343890] John 2.24,2.25<br />
αὐτὸς Ἰησοῦς did not entrust himself to them because he knows everyone and because he had no need that <strong>τις μαρτυρήσῃ περὶ τοῦ ἀνθρώπου</strong> for he knew what was in man </p>
<p>[343887] John 2.24,2.25<br />
αὐτὸς Ἰησοῦς did not entrust himself to them because he knows everyone and because <strong>οὐ χρείαν εἶχεν ἵνα τις μαρτυρήσῃ περὶ τοῦ ἀνθρώπου</strong> for he knew what was in man </p>
<p>[343794] John 2.20<br />
εἶπαν οὖν οἱ Ἰουδαῖοι : <strong>τεσσεράκοντα καὶ ἓξ ἔτεσιν οἰκοδομήθη ὁ ναὸς οὗτος</strong> and σὺ ἐν τρισὶν ἡμέραις ἐγερεῖς αὐτόν ? </p>
<p>[343799] John 2.20<br />
εἶπαν οὖν οἱ Ἰουδαῖοι : <strong>τεσσεράκοντα καὶ ἓξ ἔτεσιν οἰκοδομήθη ὁ ναὸς οὗτος καὶ σὺ ἐν τρισὶν ἡμέραις ἐγερεῖς αὐτόν</strong> ? </p>
<p>[343613] John 2.11<br />
This beginning of signs ὁ Ἰησοῦς did in Cana of Galilee and <strong>ἐφανέρωσεν τὴν δόξαν αὐτοῦ</strong> and ἐπίστευσαν εἰς αὐτὸν οἱ μαθηταὶ αὐτοῦ </p>
<p>[343705] John 2.14,2.15,2.16<br />
he found, sitting in the temple , the ones selling oxen and sheep and doves , and the coin-dealers and, having made a whip out of ropes , he threw out of the temple all the sheep and the oxen and <strong>τῶν κολλυβιστῶν ἐξέχεεν τὸ κέρμα</strong> and τὰς τραπέζας ἀνέστρεψεν and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν ἄρατε ταῦτα ἐντεῦθεν . don't make my father's house a market-place </p>
<p>[343809] John 2.21<br />
<strong>ἐκεῖνος δὲ ἔλεγεν περὶ τοῦ ναοῦ τοῦ σώματος αὐτοῦ</strong> </p>
<p>[343897] John 2.25<br />
and because οὐ χρείαν εἶχεν ἵνα τις μαρτυρήσῃ περὶ τοῦ ἀνθρώπου <strong>αὐτὸς γὰρ ἐγίνωσκεν τί ἦν ἐν τῷ ἀνθρώπῳ</strong> </p>
<p>[343760] John 2.18<br />
ἀπεκρίθησαν οὖν οἱ Ἰουδαῖοι καὶ εἶπαν αὐτῷ : <strong>τί σημεῖον δεικνύεις ἡμῖν ὅτι ταῦτα ποιεῖς</strong> ? </p>
<p>[343519] John 2.7<br />
λέγει αὐτοῖς ὁ Ἰησοῦς : <strong>γεμίσατε τὰς ὑδρίας ὕδατος</strong> and they filled them up to the top </p>
<p>[343525] John 2.7<br />
λέγει αὐτοῖς ὁ Ἰησοῦς : γεμίσατε τὰς ὑδρίας ὕδατος <strong>καὶ ἐγέμισαν αὐτὰς ἕως ἄνω</strong> </p>
<p>[343874] John 2.24,2.25<br />
αὐτὸς Ἰησοῦς did not entrust himself to them because he knows everyone and because οὐ χρείαν εἶχεν ἵνα τις μαρτυρήσῃ περὶ τοῦ ἀνθρώπου αὐτὸς γὰρ ἐγίνωσκεν τί ἦν ἐν τῷ ἀνθρώπῳ </p>
<p>[343409] John 2.1<br />
<strong>Καὶ τῇ ἡμέρᾳ τῇ τρίτῃ γάμος ἐγένετο ἐν Κανὰ τῆς Γαλιλαίας</strong> and ἦν ἡ μήτηρ τοῦ Ἰησοῦ ἐκεῖ </p>
<p>[343415] John 2.1<br />
<strong>Καὶ τῇ ἡμέρᾳ τῇ τρίτῃ γάμος ἐγένετο ἐν Κανὰ τῆς Γαλιλαίας καὶ ἦν ἡ μήτηρ τοῦ Ἰησοῦ ἐκεῖ</strong> </p>
<p>[343602] John 2.11<br />
<strong>ταύτην ἐποίησεν ἀρχὴν τῶν σημείων ὁ Ἰησοῦς ἐν Κανὰ τῆς Γαλιλαίας</strong> and ἐφανέρωσεν τὴν δόξαν αὐτοῦ and ἐπίστευσαν εἰς αὐτὸν οἱ μαθηταὶ αὐτοῦ </p>
<p>[343612] John 2.11<br />
<strong>ταύτην ἐποίησεν ἀρχὴν τῶν σημείων ὁ Ἰησοῦς ἐν Κανὰ τῆς Γαλιλαίας καὶ ἐφανέρωσεν τὴν δόξαν αὐτοῦ καὶ ἐπίστευσαν εἰς αὐτὸν οἱ μαθηταὶ αὐτοῦ</strong> </p>
<p>[343725] John 2.16<br />
and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν ἄρατε ταῦτα ἐντεῦθεν . <strong>μὴ ποιεῖτε τὸν οἶκον τοῦ πατρός μου οἶκον ἐμπορίου</strong> </p>
<p>[343492] John 2.5<br />
λέγει ἡ μήτηρ αὐτοῦ τοῖς διακόνοις : <strong>ὅ τι ἂν λέγῃ ὑμῖν ποιήσατε</strong> </p>
<p>[343668] John 2.14,2.15,2.16<br />
<strong>καὶ εὗρεν ἐν τῷ ἱερῷ τοὺς πωλοῦντας βόας καὶ πρόβατα καὶ περιστερὰς καὶ τοὺς κερματιστὰς καθημένους</strong> and, having made a whip out of ropes , he threw out of the temple all the sheep and the oxen and τῶν κολλυβιστῶν ἐξέχεεν τὸ κέρμα and τὰς τραπέζας ἀνέστρεψεν and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν ἄρατε ταῦτα ἐντεῦθεν . μὴ ποιεῖτε τὸν οἶκον τοῦ πατρός μου οἶκον ἐμπορίου </p>
<p>[343690] John 2.14,2.15,2.16<br />
καὶ εὗρεν ἐν τῷ ἱερῷ τοὺς πωλοῦντας βόας καὶ πρόβατα καὶ περιστερὰς καὶ τοὺς κερματιστὰς καθημένους and, <strong>ποιήσας φραγέλλιον ἐκ σχοινίων πάντας ἐξέβαλεν ἐκ τοῦ ἱεροῦ τά τε πρόβατα καὶ τοὺς βόας</strong> and τῶν κολλυβιστῶν ἐξέχεεν τὸ κέρμα and τὰς τραπέζας ἀνέστρεψεν and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν ἄρατε ταῦτα ἐντεῦθεν . μὴ ποιεῖτε τὸν οἶκον τοῦ πατρός μου οἶκον ἐμπορίου </p>
<p>[343684] John 2.14,2.15,2.16<br />
καὶ εὗρεν ἐν τῷ ἱερῷ τοὺς πωλοῦντας βόας καὶ πρόβατα καὶ περιστερὰς καὶ τοὺς κερματιστὰς καθημένους and, ποιήσας φραγέλλιον ἐκ σχοινίων πάντας ἐξέβαλεν ἐκ τοῦ ἱεροῦ τά τε πρόβατα καὶ τοὺς βόας and τῶν κολλυβιστῶν ἐξέχεεν τὸ κέρμα and τὰς τραπέζας ἀνέστρεψεν and τοῖς τὰς περιστερὰς πωλοῦσιν εἶπεν ἄρατε ταῦτα ἐντεῦθεν . μὴ ποιεῖτε τὸν οἶκον τοῦ πατρός μου οἶκον ἐμπορίου </p>
<p>[343498] John 2.6<br />
there were there, standing according to the purification (rites) of the Jews , <strong>λίθιναι ὑδρίαι ἓξ χωροῦσαι ἀνὰ μετρητὰς δύο ἢ τρεῖς</strong> </p>
<p>[343494] John 2.6<br />
<strong>ἦσαν δὲ ἐκεῖ λίθιναι ὑδρίαι ἓξ κατὰ τὸν καθαρισμὸν τῶν Ἰουδαίων κείμεναι χωροῦσαι ἀνὰ μετρητὰς δύο ἢ τρεῖς</strong> </p>
<p>[343857] John 2.23<br />
<strong>Ὡς δὲ ἦν ἐν τοῖς Ἱεροσολύμοις ἐν τῷ πάσχα ἐν τῇ ἑορτῇ πολλοὶ ἐπίστευσαν εἰς τὸ ὄνομα αὐτοῦ θεωροῦντες αὐτοῦ τὰ σημεῖα ἃ ἐποίει</strong> </p>
http://jktauber.com/2010/04/14/all-subtrees-not-just-clauses/All Subtrees Not Just Clauses2015-06-28T07:29:08Z2010-04-14T00:47:00ZJames Tauber
<p>A post to the graded-reader mailing list from April 14, 2010.</p>
<p>I just ran a quick experiment where I treated the targets to learn not just as the clauses but any subtree in the dependency tree that has more than one word. </p>
<p>This results in 8209 targets in John's gospel instead of 3206. </p>
<p>Obviously it means learning common noun phrases and prepositional phrases first. </p>
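<p>The subtree enumeration described above can be sketched as follows. The input format here is an assumption (a list of (position, head) pairs with head 0 for the root), not the actual dependency-tree serialization used in the experiment:</p>

```python
# Enumerate every multi-word subtree of a dependency analysis.
# A sentence is assumed to be a list of (position, head) pairs,
# with head 0 marking the root.

def subtrees(sentence):
    """Yield each subtree with more than one word, as a tuple of
    word positions in surface order."""
    children = {}
    for pos, head in sentence:
        children.setdefault(head, []).append(pos)

    def descend(pos):
        collected = [pos]
        for child in children.get(pos, []):
            collected.extend(descend(child))
        return sorted(collected)

    for pos, _ in sentence:
        tree = descend(pos)
        if len(tree) > 1:
            yield tuple(tree)

# e.g. for λέγει αὐτῷ ὁ Ἰησοῦς with the verb as root:
print(sorted(subtrees([(1, 0), (2, 1), (3, 4), (4, 1)])))
# → [(1, 2, 3, 4), (3, 4)]
```

<p>Running this over every sentence, rather than keeping only the subtrees rooted on a "pred", is what grows the target count from 3206 to 8209.</p>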
<p>In particular, these are the first things learnt when using the next-best algorithm: </p>
<div class="codehilite"><pre>ὁ Ἰησοῦς
ἐν αὐτῷ
τοῦ θεοῦ
ἐκ θεοῦ
λέγει αὐτῷ
λέγει αὐτῷ Ἰησοῦς
λέγει αὐτῷ ὁ Ἰησοῦς
καὶ λέγει αὐτῷ
εἰς αὐτόν
πρὸς αὐτόν
τὸν πατέρα
πρὸς τὸν πατέρα
τὸν πατέρα μου
καὶ τὸν πατέρα μου
ἐν αὐτοῖς
λέγει αὐτοῖς
καὶ λέγει αὐτοῖς
λέγει αὐτοῖς ὁ Ἰησοῦς
εἶπεν αὐτῷ
καὶ εἶπεν ὁ Ἰησοῦς
</pre></div>
<p>Compare this with the first things learnt when the targets are clauses only (i.e. only subtrees rooted on "pred"): </p>
<div class="codehilite"><pre>εἶπεν
εἶπεν αὐτῷ
ἀπεκρίθη Ἰησοῦς
ἀπεκρίθη αὐτῷ Ἰησοῦς
ἀπεκρίθη Ἰησοῦς αὐτῷ
λέγει
λέγει αὐτῷ
λέγει αὐτῷ Ἰησοῦς
λέγει αὐτῷ ὁ Ἰησοῦς
εἶπεν αὐτῷ ὁ Ἰησοῦς
ἀπεκρίθη ὁ Ἰησοῦς
λέγει αὐτοῖς
λέγει αὐτοῖς Ἰησοῦς
λέγει αὐτοῖς ὁ Ἰησοῦς
εἶπεν αὐτοῖς
ἀπεκρίθη αὐτοῖς
ἀπεκρίθη αὐτοῖς Ἰησοῦς
ἀπεκρίθη αὐτοῖς ὁ Ἰησοῦς
εἶπεν αὐτοῖς Ἰησοῦς
εἶπεν αὐτοῖς ὁ Ἰησοῦς
</pre></div>
<p>(note, these are just based on surface form in text with no reference to any other linguistic information) </p>
<p>While it's kind of nice seeing the noun phrases emerge in the first list, I worry about learning prepositional phrases in isolation from their verb. Thoughts? Of course, when combined with inline replacement into English, the verb <em>will</em> be shown, albeit in English. </p>
<p>I also realise now, the former list should include one-word subtrees if the word is a "pred". </p>
<p>James </p>
<h2><a href="http://jktauber.com/2010/04/12/initial-code-based-proiel-dependency-analysis/">Initial Code Based on PROIEL Dependency Analysis</a> (James Tauber, 2010-04-12)</h2>
<p>A post to the graded-reader mailing list from April 12, 2010.</p>
<p>Until this weekend, all the GNT graded reader work I'd done had used clause boundaries from OpenText.org.</p>
<p>With the availability of the PROIEL dependency tree analysis, I thought I'd give that a go.</p>
<p>I've uploaded to github code for extracting the clauses in John's Gospel and generating a very basic reading programme from that.</p>
<p>Clauses were extracted by looking at any 'pred' arc and linearizing all nodes from that point down. If there were embedded preds then clauses corresponding to both inner and outer preds were generated.</p>
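<p>A minimal sketch of that extraction, under an assumed token format of (position, head, relation, form) rather than the actual PROIEL serialization:</p>

```python
# For every node whose relation is 'pred', linearize that node and all
# of its descendants in surface order. An embedded pred yields its own
# (inner) clause in addition to the outer one.

def clauses(tokens):
    children = {}
    forms = {}
    for pos, head, rel, form in tokens:
        children.setdefault(head, []).append(pos)
        forms[pos] = form

    def descend(pos):
        collected = [pos]
        for child in children.get(pos, []):
            collected.extend(descend(child))
        return collected

    for pos, head, rel, form in tokens:
        if rel == "pred":
            yield " ".join(forms[p] for p in sorted(descend(pos)))

# hypothetical relation labels for illustration:
tokens = [
    (1, 0, "pred", "λέγει"),
    (2, 1, "obl", "αὐτῷ"),
    (3, 4, "aux", "ὁ"),
    (4, 1, "sub", "Ἰησοῦς"),
]
print(list(clauses(tokens)))  # → ['λέγει αὐτῷ ὁ Ἰησοῦς']
```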
<p>Note that the current code is just based on forms, with no use made of syntactic or morphological information. I also can't do inline replacement into an English context because I don't have an English text mapped to the PROIEL analysis.</p>
<p>However, my initial impression is that the PROIEL analysis will be preferable to work with moving forward.</p>
<p>James</p>
<hr />
<p>Then Patrick Narkinsky asked:</p>
<blockquote>
<p>Could you clarify in what ways you see the PROIEL data being superior to the opentext data? One obvious one that leaps to mind is that OpenText seems to be a dead project...</p>
</blockquote>
<hr />
<p>It's actively maintained, is redistributable under a CC license, is based on a freely redistributable text and is a less idiosyncratic analysis.</p>
<p>Admittedly, I haven't spent THAT much time with it but it seems that it will be easier to extract the kind of syntactic information I'm interested in from it.</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/04/01/next-best-algorithm/">The "Next-Best" Algorithm</a> (James Tauber, 2008-04-01)</h2>
<p>A post to the graded-reader mailing list from April 1, 2008.</p>
<p>In the last few posts, I've mentioned a simple algorithm I've used (one of a number) for ordering items.</p>
<h3>The Input</h3>
<p>This algorithm, like all the ordering algorithms I've tried takes as an input, a list of target-item pairs. For example,</p>
<div class="codehilite"><pre>T1 I1
T1 I3
T1 I7
T2 I2
T2 I7
T3 I4
...
</pre></div>
<p>means that to read T1, you need to know I1, I3, I7; to read T2, you need to know I2, I7 and so on.</p>
<p>The targets and items can be anything. For the various stats I've posted here I've used verses for the targets and either lemmas or inflected forms for the items. In the sample reader online, I use clauses as the targets and a combination of lemmas, inflected forms and a little bit of morphology (not much yet). If you want to model the fact that students can't read a target until they've learnt some syntactic point or even some cultural point, that can be modeled by including an appropriate item for this.</p>
<p>I make this point to emphasize that the ordering algorithm is independent of what we choose as targets and what items we include as prerequisites to being able to comprehend those targets.</p>
<h3>The Output</h3>
<p>What this (and my other algorithms) output is what I sometimes in comments and elsewhere refer to as a "learning programme". (yes, I tend to use that spelling when referring to any ordered list to be followed that isn't a computer program)</p>
<p>Such a programme looks like this:</p>
<div class="codehilite"><pre>learn I2
learn I5
learn I7
know T2
learn I1
learn I3
know T1
</pre></div>
<p>Note that this algorithm will sometimes (as it does in the example above) prematurely mention an item that could be delayed (in this case I5) so the optimize-order code I've mentioned previously and uploaded to Google Code is useful as a post-processing step.</p>
<h3>The Algorithm Itself</h3>
<p>The algorithm is very simple and follows an iterative process. At each step, each item not yet learnt is assigned a score. The item with the highest score is then learnt and the process repeats (with the scores being recalculated each time on the remaining items).</p>
<p>The score favours items that are the only remaining unlearnt item (or one of only a few remaining unlearnt items) in a lot of different targets.</p>
<p>At each step, each unlearnt item receives, for each target the item is a prerequisite for, an additional score of 1 / 2^num_unlearnt_items_in_target.</p>
<p>In other words, for each target the item is the only unlearnt item in, the score goes up by 1/2, for each target the item is one of two unlearnt items in, the score goes up by 1/4, for each target the item is one of three unlearnt items in, the score goes up by 1/8 and so on.</p>
<p>I haven't done much experimentation to see if this exponential decay is optimal but it seems to give good results.</p>
<p>Because this algorithm is iterative and picks a single item at each step rather than exploring multiple ordering possibilities, I'm tentatively calling this algorithm the "next-best" algorithm.</p>
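<p>The description above can be sketched like this (a minimal illustration, not the checked-in next-best.py; ties between equally scored items are broken arbitrarily):</p>

```python
# The "next-best" algorithm: given (target, item) pairs, repeatedly
# learn the highest-scoring unlearnt item, where an item receives
# 1 / 2^n for every target in which it is one of n unlearnt items.

def next_best(pairs):
    targets = {}
    for target, item in pairs:
        targets.setdefault(target, set()).add(item)

    programme = []
    unlearnt = set().union(*targets.values())
    announced = set()
    while unlearnt:
        scores = dict.fromkeys(unlearnt, 0.0)
        for needed in targets.values():
            remaining = needed & unlearnt
            for item in remaining:
                scores[item] += 1 / 2 ** len(remaining)
        best = max(scores, key=scores.get)
        programme.append(("learn", best))
        unlearnt.discard(best)
        # announce any target whose items are now all learnt
        for target, needed in targets.items():
            if target not in announced and not needed & unlearnt:
                announced.add(target)
                programme.append(("know", target))
    return programme

# the example input from above:
pairs = [("T1", "I1"), ("T1", "I3"), ("T1", "I7"),
         ("T2", "I2"), ("T2", "I7"), ("T3", "I4")]
for step in next_best(pairs):
    print(*step)  # first steps: learn I4, know T3, learn I7, ...
```

<p>Because the choice at each step is greedy, an item can still be learnt well before any target needing it is completed, which is why the optimize-order post-processing mentioned above remains useful.</p>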
<p>I've checked in the code as <a href="http://code.google.com/p/graded-reader/source/browse/trunk/code/next-best.py">http://code.google.com/p/graded-reader/source/browse/trunk/code/next-best.py</a></p>
<p>It is important to note that this algorithm currently considers all items equally easy (or difficult!) to learn and assumes they are independent. However, it would be relatively easy to augment the algorithm with difficulty weightings and I plan to do that soon.</p>
<p>Another feature that I'm considering is being able to "pin down" certain items as not being available until a particular point. You may, for example, want to delay the introduction of participles but otherwise have the algorithm come up with its own ordering.</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/29/vocab-coverage-table-better-ordering/">Vocab Coverage Table for a Better Ordering</a> (James Tauber, 2008-03-29)</h2>
<p>A post to the graded-reader mailing list from March 29, 2008.</p>
<p>I thought I'd calculate the vocabulary coverage table assuming the ordering generated for the post "just how much can frequency ordering be improved on?". To do this, I modified vocab-coverage.py to load in an arbitrary learning programme instead of assuming a frequency ordering. The code is now checked in as <code>vocab-coverage-arbitrary.py</code>.</p>
<p>Here's the original frequency ordering of forms in the Greek NT (using counts rather than percentages in the cells):</p>
<div class="codehilite"><pre> 0% 50% 75% 90% 95% 100%
100 7928 4585 88 1 0 0
200 7931 6291 515 26 4 4
500 7935 7388 2149 182 46 39
1000 7937 7700 4085 631 184 141
2000 7938 7838 5765 1736 628 456
5000 7939 7920 7232 4161 2275 1711
8000 7939 7935 7684 5691 3784 3004
12000 7941 7939 7879 6858 5149 4310
16000 7941 7941 7937 7777 7060 6549
20000 7941 7941 7941 7941 7941 7941
</pre></div>
<p>And here's the table with the ordering produced in the "just how much can frequency ordering be improved on?" post:</p>
<div class="codehilite"><pre> 0% 50% 75% 90% 95% 100%
100 7896 1762 78 *37* *36* *36*
200 7927 4590 339 *81* *71* *70*
500 7933 6781 1572 *315* *225* *213*
1000 7935 7455 3155 *802* *526* *491*
2000 7936 7739 4872 *1820* *1242* *1144*
5000 7939 7869 6400 3592 *3246* *3244*
8000 7939 7908 7156 5071 *4745* *4742*
12000 7939 7924 7501 6501 *6463* *6463*
16000 7940 7933 7791 7646 *7645* *7645*
20000 7941 7941 7941 7941 7941 7941
</pre></div>
<p>I've marked with asterisks those instances where the number is better than the frequency ordering.</p>
<p>Note that because the ordering algorithm was highly biased towards reading entire verses, it is actually worse at the 75% coverage level and below. Even at 90% it's only better for the first 2000 items.</p>
<p>But for the 100% familiarity level, you can see just how much better even the simple algorithm I used (which I will explain shortly) is than frequency ordering. For 200 forms, you get 70 verses instead of 4!</p>
<p>I'll repeat the caveats I mentioned in the other post, though: items are considered independent and equally easy to learn, there's no consideration of morphology, syntax, idiom and this is using verses as targets. We'll fix all that over time.</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/29/ordering-ultimately-targets-not-items/">Ordering is Ultimately of Targets not Items</a> (James Tauber, 2008-03-29)</h2>
<p>A post to the graded-reader mailing list from March 29, 2008.</p>
<p>[this is based on a blog post from August 2005 but with the terminology changed]</p>
<p>Say you have written a program which lists an order in which to learn items along with an indication, every so often, of what new target has been reached. Running on the Greek lexemes of 1John, you might get something starting like this:</p>
<div class="codehilite"><pre>learn μαρτυρέω
learn θεός
learn ἐν
learn εἰμί
learn ὁ
learn τρεῖς
learn ὅτι
know 230507
</pre></div>
<p>This gives seven items to learn and then a target that has been reached (230507 = 1John 5.7). The problem is that two of those items are unnecessary. You only need to learn μαρτυρέω, εἰμί, ὁ, τρεῖς and ὅτι to be able to read 1John 5.7.</p>
<p>The problem is that the program is ordering items first and only then establishing at each point what goals (if any) have been achieved.</p>
<p>What you really want to do is not display an item until it is needed. So back in 2005, I wrote some code that optimizes the ordering of items by delaying any that are not yet needed.</p>
<p>I've now made that code more generic and will check it in shortly.</p>
<p>It can be used as a post-processor on ordering from any source, even a manually crafted list of items. It will optimize the ordering of items for the same ordering of targets.</p>
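<p>The delaying step can be sketched like this (a simplification of the real optimize-order.py: the input is assumed to be a programme of ('learn', item) and ('know', target) steps plus a mapping from each target to the items it requires):</p>

```python
# Post-process a learning programme so that each item is introduced
# only at the first target that actually needs it. The ordering of
# targets is preserved; only items move later.

def optimize(programme, targets):
    deferred, optimized = [], []
    for kind, x in programme:
        if kind == "learn":
            deferred.append(x)          # hold the item back for now
        else:
            # flush, in original order, just the items this target needs
            for item in [i for i in deferred if i in targets[x]]:
                optimized.append(("learn", item))
                deferred.remove(item)
            optimized.append(("know", x))
    return optimized

# the 1John 5.7 example from above:
targets = {"230507": {"μαρτυρέω", "εἰμί", "ὁ", "τρεῖς", "ὅτι"}}
programme = [("learn", w) for w in
             ["μαρτυρέω", "θεός", "ἐν", "εἰμί", "ὁ", "τρεῖς", "ὅτι"]]
programme.append(("know", "230507"))
for step in optimize(programme, targets):
    print(*step)  # θεός and ἐν are no longer learnt before 230507
```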
<p>Because the algorithm for doing such an optimization is nearly identical to what's necessary to calculate the "area under the curve" that I described in my video (and will write more about soon) my new code also outputs a score.</p>
<p>I'll be checking it in shortly.</p>
<p>James</p>
<hr />
<p>It's available at:</p>
<p><a href="http://code.google.com/p/graded-reader/source/browse/trunk/code/optimize-order.py">http://code.google.com/p/graded-reader/source/browse/trunk/code/optimize-order.py</a></p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/26/if-only-they-knew-one-rare-word/">If Only They Knew That One Rare Word...</a> (James Tauber, 2008-03-26)</h2>
<p>A post to the graded-reader mailing list from March 26, 2008.</p>
<p>I'm going to talk in more detail about alternatives to frequency order in a different thread but I wanted to share the results of a quite striking little test I did.</p>
<p>In my last post, I showed the vocab/coverage table applied to fully inflected forms in the Greek NT rather than lexemes. You may have noticed that the 100% coverage column and even the 95% coverage column said 0.0% verses for the 100 most frequent forms.</p>
<p>If you did, you might then have wondered: is this just a rounding error? The answer is no. Even if you knew the 100 most frequent inflected forms in the GNT, there is not a single verse you would know all the forms in (of course assuming you couldn't guess).</p>
<p>I wanted to test if this was because of just one outlier. So I modified (added 4 extra lines) the code that produced the table to instead output a list of the top ten targets (i.e. verses) whose <em>second least</em> frequent item (i.e. form) is most frequent overall.</p>
<p>Here are the results:</p>
<div class="codehilite"><pre>032030 2 [1, 2, 1077]
030146 35 [1, 35, 524]
041135 46 [2, 46, 14597]
130528 66 [5, 19, 38, 45, 49, 59, 65, 66, 235]
071623 66 [5, 19, 38, 45, 59, 66, 235]
070323 68 [3, 3, 29, 65, 68, 131]
020940 72 [8, 18, 22, 22, 44, 49, 49, 72, 102]
012425 78 [36, 78, 2846]
060211 96 [8, 14, 18, 22, 79, 96, 4276]
130519 98 [7, 17, 98, 14731]
</pre></div>
<p>What this listing is showing is that, for example, target 032030 (Luke 20.30) consists of the 1st, 2nd and 1077th most frequent forms; target 030146 (Luke 1.46) consists of the 1st, 35th and 524th most frequent forms. So if the rarest word wasn't needed, they would jump from needing the top 1077 forms to just the top 2 and from needing the top 524 forms to the top 35.</p>
<p>Now you may argue that many of these are bad examples because the verse doesn't make sense in isolation (a good reason to be more careful about what to use as targets) or that the one rare word is actually the one carrying most of the semantic weight.</p>
<p>But this little test demonstrates that sometimes a single rare item can massively delay reading an otherwise quite readable target unit.</p>
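<p>The test itself is simple to sketch (the input format here is an assumption: a mapping from target id to the list of frequency ranks of its items):</p>

```python
# Rank each target by its *second least* frequent item: a small value
# here combined with a large maximum rank flags a target held back by
# a single rare item.

def if_only(targets, n=10):
    def second_rarest(ranks):
        return sorted(ranks)[-2]
    return sorted(targets.items(), key=lambda kv: second_rarest(kv[1]))[:n]

# the top three rows of the table above:
targets = {
    "032030": [1, 2, 1077],
    "030146": [1, 35, 524],
    "041135": [2, 46, 14597],
}
for target, ranks in if_only(targets):
    print(target, sorted(ranks)[-2], sorted(ranks))
```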
<p>By the way, here's the same listing based on <em>lexemes</em> rather than fully inflected forms:</p>
<div class="codehilite"><pre>032030 2 [1, 2, 346]
030146 9 [2, 9, 509]
011615 9 [3, 4, 5, 7, 8, 9, 9, 33]
032448 13 [4, 13, 415]
090124 14 [1, 2, 6, 7, 14, 267]
021337 16 [4, 5, 9, 9, 12, 16, 588]
040620 17 [1, 3, 5, 7, 8, 9, 17, 180]
041135 19 [1, 19, 4752]
040426 19 [1, 1, 3, 4, 7, 8, 9, 19, 56]
031934 24 [1, 1, 3, 5, 9, 15, 23, 24, 311]
</pre></div>
<p>I'll check in the code that produces this shortly.</p>
<p>James</p>
<hr />
<p>It's now available at</p>
<p><a href="http://code.google.com/p/graded-reader/source/browse/trunk/code/if-only.py">http://code.google.com/p/graded-reader/source/browse/trunk/code/if-only.py</a></p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/26/just-how-much-can-frequency-ordering-be-improved/">Just How Much Can Frequency Ordering Be Improved On?</a> (James Tauber, 2008-03-26)</h2>
<p>A post to the graded-reader mailing list from March 26, 2008.</p>
<p>Here's a quick demonstration. Recall that in my previous post, I pointed out that learning the top 100 inflected forms gives you 0 (zero, nada) target verses in the GNT. I showed that, for example, target 130528 (1 Thessalonians 5.28) gets excluded because of one form that is #235 while the other eight forms appear in the top 66.</p>
<p>Well, what if those 9 forms were learnt first? That is:</p>
<p>Χριστοῦ, κυρίου, Ἰησοῦ, ὑμῶν, μετά, τοῦ, χάρις, ἡ, ἡμῶν</p>
<p>Not only could 130528 be read but also 071623</p>
<p>Now if the reader learnt πάντων (just one more form) they could read three more verses: 140318, 191325 and 272221</p>
<p>Now introduce these six forms:</p>
<p>καί, ὑμῖν, ἀπό, εἰρήνη, πατρός, θεοῦ</p>
<p>and suddenly <em>seven</em> more verses are readable: 140102, 070103, 100102, 110102, 090103, 180103, 080102</p>
<p>This was just with one algorithm I'm experimenting with (which I'll explain and provide code for soon) and there are likely others that do better.</p>
<p>So instead of 100 forms giving 0 verses, we now have just 16 forms giving us 12 entire verses from an actual corpus.</p>
<p>The usual caveats apply: items are considered independent and equally easy to learn, there's no consideration of morphology, syntax, idiom<br />
and this is using verses as targets. We'll fix all that over time.</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/25/gnt-verse-coverage-frequency-ordering/">GNT Verse Coverage with Frequency Ordering</a> (James Tauber, 2008-03-25)</h2>
<p>A post to the graded-reader mailing list from March 25, 2008.</p>
<p>[if you'll indulge me, I'm trying to get all my thoughts and previous writing on these topics in one place and this list is a good place to do it]</p>
<p>[this is based on a post to b-greek[1] and my blog[2]. I hope the table comes out! ]</p>
<p>It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words. I even produced such a table for the GNT back in 1996—see New Testament Vocabulary Count Statistics[3].</p>
<p>But these sort of numbers are highly misleading because they don't tell you what proportion of sentences (or as a rough proxy in the GNT case: verses) you could read, only what proportion of words.</p>
<p>Reading theorists have suggested that you need to know 95% of the vocabulary of a sentence to comprehend it. So a more interesting set of statistics would be how many verses one can understand 95% of the vocabulary of, given knowledge of a certain number of words. Of course, there's a lot more to reading comprehension than knowing the vocab. But it was enough for me to decide to write some code yesterday afternoon to run against my MorphGNT database.</p>
<p>To first of all give you a flavour in the specific before moving to the final numbers, consider John 3.16, which is, from a vocabulary point of view, a very easy verse to read.</p>
<p>To be able to read 50% of it, you only need to know the top 28 lexemes in the GNT. To read 75% you only need the top 85 (up to κόσμος). With the top 204 lexemes you can read 90% of the verse, and going only a little further, to 236 (αἰώνιος), gives you 95%. The only word you would not have come across learning the top 236 words would be μονογενής, but even that is in the top 1,200.</p>
<p>This example does highlight some of the shortcomings of this sort of analysis. There's no consideration of necessary knowledge of morphology, syntax, idioms, etc. Nor for the fact that the meaning of something like μονογενής is fairly easy to guess from knowledge of more common words. But I still think it's much more useful than the pure word coverage statistics I linked to above.</p>
<p>So let's actually run the numbers on the complete GNT. If you know the top N words, how many verses could you understand 50% of, 75%, 90% or 95% of...</p>
<div class="codehilite"><pre>vocab / coverage any 50% 75% 90% 95% 100%
100 99.9% 91.3% 24.4% 2.1% 0.6% 0.4%
200 99.9% 96.9% 51.8% 9.8% 3.4% 2.5%
500 99.9% 99.1% 82.3% 36.5% 18.0% 13.9%
1,000 100.0% 99.7% 93.6% 62.3% 37.3% 30.1%
1,500 100.0% 99.8% 97.2% 76.3% 53.5% 44.8%
2,000 100.0% 99.9% 98.4% 85.1% 65.5% 56.5%
3,000 100.0% 100.0% 99.4% 93.6% 81.0% 74.1%
4,000 100.0% 100.0% 99.7% 97.4% 90.0% 85.5%
5,000 100.0% 100.0% 100.0% 99.4% 96.5% 94.5%
all 100.0% 100.0% 100.0% 100.0% 100.0% 100.0%
</pre></div>
<p>What this means is that, purely from a vocabulary point of view, if you knew the top 1000 lexemes, then 37.3% of verses in the GNT would be 95% familiar to you.</p>
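<p>The computation behind the table can be sketched as follows (the input format is an assumption: each verse is reduced to the list of frequency ranks of its lexemes, with 1 being the most frequent):</p>

```python
# For each vocabulary size N and coverage threshold t, compute the
# proportion of verses in which at least a fraction t of the words
# have frequency rank <= N.

def coverage_table(verses, sizes, thresholds=(0.5, 0.75, 0.9, 0.95, 1.0)):
    table = {}
    for n in sizes:
        row = []
        for t in thresholds:
            readable = sum(
                1 for ranks in verses
                if sum(r <= n for r in ranks) / len(ranks) >= t
            )
            row.append(readable / len(verses))
        table[n] = row
    return table

# three toy "verses", each given as the ranks of its words:
verses = [[1, 2, 3, 4], [1, 2, 300, 4000], [5, 600, 7000, 8000]]
print(coverage_table(verses, sizes=[10, 500]))
```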
<p>Note that this uses:</p>
<ol>
<li>verses as the reading target</li>
<li>lexemes as the individual items to be learnt</li>
<li>frequency of lexemes as the ordering</li>
</ol>
<p>It is possible to alter any of these variables and in subsequent posts I will do this.</p>
<p>James</p>
<p>[1] http://lists.ibiblio.org/pipermail/b-greek/2007-November/044685.html<br />
[2] http://jtauber.com/blog/2007/11/04/gnt_verse_coverage_statistics/<br />
[3] (via Internet Archive's Wayback Machine) http://web.archive.org/web/19961104033056/www.entmp.org/HGrk/grammar/lexicon/NTcount.shtml</p>
<hr />
<p>I've checked in my Python code as: http://code.google.com/p/graded-reader/source/browse/trunk/code/vocab-coverage.py</p>
<p>If you're not comfortable running it yourself, I can run it on any data you provide.</p>
<p>(if you send data, I suggest you do it off-list and be careful because a "reply" will go to the entire mailing list)</p>
<p>Remember that, as I said in my post, there's no consideration of necessary knowledge of morphology, syntax, idioms, etc. Over time, we can incorporate that, but for now the results are limited to the somewhat naïve assumptions that:</p>
<ol>
<li>comprehension is only at the level of the target (the verse in my example data)</li>
<li>learning the items (lexemes in the example table I gave) is all that matters to comprehending the target</li>
<li>all items are equally easy to learn</li>
<li>there is no dependency between items</li>
</ol>
<p>and, of course, the table assumes a frequency ordering of items. Soon I'll be starting a separate thread on alternative orderings.</p>
<p>But all that said, the numbers produced are far more useful than misleading notions like "the top 10 words account for 37% of the text".</p>
<p>Incidentally, here is the table when applied to <em>forms</em> in the Greek NT rather than lexemes:</p>
<div class="codehilite"><pre> 0% 50% 75% 90% 95% 100%
100 99.8% 57.7% 1.1% 0.0% 0.0% 0.0%
200 99.8% 79.2% 6.4% 0.3% 0.0% 0.0%
500 99.9% 93.0% 27.0% 2.2% 0.5% 0.4%
1,000 99.9% 96.9% 51.4% 7.9% 2.3% 1.7%
2,000 99.9% 98.7% 72.5% 21.8% 7.9% 5.7%
5,000 99.9% 99.7% 91.0% 52.3% 28.6% 21.5%
8,000 99.9% 99.9% 96.7% 71.6% 47.6% 37.8%
12,000 100.0% 99.9% 99.2% 86.3% 64.8% 54.2%
16,000 100.0% 100.0% 99.9% 97.9% 88.9% 82.4%
20,000 100.0% 100.0% 100.0% 100.0% 100.0% 100.0%
</pre></div>
<p>The fact that it takes 1,000 forms just to get 2.3% of verses at 95% coverage is indicative of the fact that frequency alone is not the way<br />
to go. Soon, I'll also produce similar tables using clauses (in the OpenText.org sense), rather than verses, as the target.</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/23/throttle-and-delay/">Throttle and Delay</a> (James Tauber, 2008-03-23)</h2>
<p>A post to the graded-reader mailing list from March 23, 2008.</p>
<p>When you look at example-reader.html[1] you see that as well as the normal verse pairs, there are pairs marked REVIEW.</p>
<p>This is another idea I'm experimenting with that is independent of other ordering and display choices.</p>
<p>Basically, when a particular clause such as καὶ εἶπεν is introduced, I never repeat more than 3 instances of it. Instead I store up any additional instances to show later as reminders.</p>
<p>This "throttle-and-delay" technique is a separate part of the overall pipeline that produces the text.</p>
<p>The ordering algorithm, before the throttle-and-delay produces something like this:</p>
<div class="codehilite"><pre>NT.John.18_c108
NT.John.20_c122
NT.John.11_c131
NT.John.9_c174
NT.John.3_c117
NT.John.12_c121
NT.John.12_c178
NT.John.6_c97
NT.John.7_c53
NT.John.13_c95
NT.John.11_c161
NT.John.21_c114
NT.John.3_c50
NT.John.9_c25
NT.John.3_c12
NT.John.4_c71
NT.John.13_c27
NT.John.1_c206
NT.John.3_c46
NT.John.3_c4
</pre></div>
<p>and then the penultimate step is taking this and turning it into the following. I'll explain later what the various parts of the "learn" lines are (I'm adding to them all the time), but for now the thing to note is that know_S means "show this new clause they now know", know_A means "they know this clause at this point but don't show it yet" and know_R means "show the previously introduced clause that was delayed due to throttling"</p>
<div class="codehilite"><pre>learn καί|καί|C-|---|-----|-
learn εἶπε(ν)|λέγω|V-|AAI|3-S--|-ε(ν):sa3S
know_S NT.John.3_c117
know_S NT.John.6_c97
know_S NT.John.7_c53
know_A NT.John.9_c174
know_A NT.John.11_c131
know_A NT.John.11_c161
know_A NT.John.12_c121
know_A NT.John.12_c178
know_A NT.John.13_c95
know_A NT.John.18_c108
know_A NT.John.20_c122
know_A NT.John.21_c114
learn αὐτῷ|αὐτός|RP|---|-DSM-|-
know_S NT.John.1_c198
know_S NT.John.1_c206
know_S NT.John.3_c4
know_A NT.John.3_c12
know_A NT.John.3_c46
know_A NT.John.3_c50
know_A NT.John.4_c71
know_A NT.John.5_c54
know_A NT.John.9_c25
know_A NT.John.13_c27
know_A NT.John.14_c101
know_A NT.John.18_c142
know_A NT.John.20_c132
learn αὐτοῖς|αὐτός|RP|---|-DPM-|-
know_S NT.John.2_c64
know_S NT.John.6_c113
know_S NT.John.6_c174
know_A NT.John.7_c78
know_A NT.John.8_c24
know_A NT.John.8_c54
know_A NT.John.9_c147
know_A NT.John.13_c54
know_A NT.John.16_c78
know_R NT.John.4_c71
know_R NT.John.3_c46
know_R NT.John.5_c54
</pre></div>
<p>This is actually the input to the final stage that produces the HTML.</p>
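<p>The throttling itself can be sketched as follows (a simplification: in the real pipeline the delayed know_R steps are interleaved after later clauses are introduced, whereas here they are simply appended at the end):</p>

```python
# For each clause, show at most `limit` instances immediately (know_S),
# mark the remainder as known-but-unshown (know_A), and re-emit the
# delayed ones later as reviews (know_R).

def throttle(instances_by_clause, limit=3):
    steps, delayed = [], []
    for clause, refs in instances_by_clause:
        steps.append(("learn", clause))
        for i, ref in enumerate(refs):
            steps.append(("know_S" if i < limit else "know_A", ref))
            if i >= limit:
                delayed.append(ref)
    # delayed instances resurface later as reviews
    for ref in delayed:
        steps.append(("know_R", ref))
    return steps

data = [("καὶ εἶπεν", ["c117", "c97", "c53", "c174", "c131"])]
for step in throttle(data):
    print(*step)
```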
<p>James</p>
<p>[1] linked from http://groups.google.com/group/graded-reader/files</p>
<h2><a href="http://jktauber.com/2008/03/23/embedding-target-language-english/">Embedding the Target Language in English</a> (James Tauber, 2008-03-23)</h2>
<p>A post to the graded-reader mailing list from March 23, 2008.</p>
<p>[this will be a bit of an experiment as to whether I can cut and paste formatted Greek and have it pass through Google Groups. I apologize in advance if it doesn't work]</p>
<p>One aspect of the reader that seems to have received a lot of interest is the embedding of the target language (in my case Greek) in English.</p>
<p>It is important to note that this is entirely independent of the 95% of the code and data, which has to do with choosing the order in which to learn things.</p>
<p>I wanted to explain a little about how it's produced and what the variables are that could be tweaked or changed all together.</p>
<p>First of all, consider the very first block of text introduced:</p>
<div class="codehilite"><pre>John 3.26:
So they came to John and said to him, “Rabbi, the one who was with you on the other side of the Jordan River, about whom you testified – see, he is baptizing, and everyone is flocking to him!”
John 3.27:
John replied καὶ εἶπεν, “No one can receive anything unless it has been given to him from heaven.
</pre></div>
<p>For those of you who don't know Greek, καὶ εἶπεν means "and (he) said".</p>
<p>This was generated because the ordering component of the software said that the first thing to be introduced is clause <code>NT.John.3_c117</code>. That's a clause reference from OpenText.org's clause analysis of the New Testament. Part of my database is a listing of all the clauses, as identified by OpenText.org along with this unique identifier and what chapter/verse the clause comes from:</p>
<div class="codehilite"><pre>NT.John.3_c117|3.27|καὶ εἶπεν,
</pre></div>
<p>So my code knows that the clause to show is from John 3.27. I decided to always include the previous verse for context as well. So I retrieve John 3.26 and John 3.27 from a database containing the NET translation but annotated with the OpenText.org clause boundaries:</p>
<div class="codehilite"><pre>3.26 [c108 So they came to John ] [c109 and said to him, ] “Rabbi, the one who was with you on the other side of the Jordan River, [c112 about whom you testified – ] [c113 see, ] [c114 he is baptizing, ] [c115 and everyone is flocking to him!” ]
3.27 [c116 John replied ] [c117 and said, ] “No one can receive anything unless it has been given to him from heaven.
</pre></div>
<p>Notice that I haven't annotated everything yet. It's a slow and laborious process so I tend to just mark clauses as they are needed.</p>
<p>In some cases, I slightly alter the NET translation so there is something to annotate. This becomes challenging when NET has altered clause order, and even more so when the Greek breaks apart words from the one clause that have to be together in the English. I still want to do more work in this area. The key thing to note is that I never use the actual translation of the clause when introducing it; rather, I use everything <em>except</em> the translation of that clause, and the problem might be easier if thought about in those terms (rather than, as my annotation above does, focusing on which English text corresponds to which Greek clause).</p>
<p>But this annotated NET is then used to produce what you see in the example-reader.html extract shown at the start. If other clauses were known at this point, they would be replaced by the Greek as well. Any clauses already known are shown at normal weight and the new clause being introduced is shown in bold. Hence later on in example-reader.html (at step 13):</p>
<div class="codehilite"><pre>John 4.49:
The official said to him, “Sir, come down before my child dies.”
John 4.50:
λέγει αὐτῷ ὁ Ἰησοῦς, “Go home; your son will live.” The man believed the word ὃν εἶπεν αὐτῷ ὁ Ἰησοῦς and set off for home.
</pre></div>
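<p>The replacement step itself can be sketched as follows, using the annotation format shown above (the bolding of the newly introduced clause is omitted for brevity):</p>

```python
import re

# Swap each clause annotated as "[cNNN English text ]" for its Greek
# text when the clause is known; otherwise keep the English.

def replace_known(annotated, greek_by_clause, known):
    def swap(match):
        ref, english = match.group(1), match.group(2)
        return greek_by_clause[ref] if ref in known else english
    return re.sub(r"\[(c\d+) (.*?) \]", swap, annotated)

verse = "[c116 John replied ] [c117 and said, ] “No one can receive..."
greek = {"c117": "καὶ εἶπεν,"}
print(replace_known(verse, greek, known={"c117"}))
# → John replied καὶ εἶπεν, “No one can receive...
```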
<p>So, to summarize: the input to this part of the process is:</p>
<ol>
<li>what clause to introduce (by reference number)</li>
<li>what verse this clause is in</li>
<li>what other clauses are already known (by reference numbers)</li>
<li>what the English text of the verse (from 2) and the one before are, annotated by clause references that can be replaced by Greek if known</li>
</ol>
<p>The variables to this particular step are:</p>
<ol>
<li>the unit of text being introduced (in this example, a clause)</li>
<li>the unit of text to show (in this example, the verse containing the clause and the verse before it)</li>
</ol>
<p>There is no reason why the unit of text being introduced in Greek could not be smaller (a phrase or even a word) and the unit of text being shown in English larger (a paragraph, for example).</p>
<p>Note that the clauses I am currently dealing with include embedded clauses, such as relative clauses, and so in the John 4.50 example we have the relative clause ὃν εἶπεν αὐτῷ ὁ Ἰησοῦς ("that Jesus said to him"), even though it might have been better to wait until the containing noun phrase was readable (which would, of course, have required knowledge of phrase boundaries).</p>
<p>James</p>
<h2><a href="http://jktauber.com/2008/03/23/welcome-and-some-files/">Welcome (and some files)</a> (James Tauber, 2008-03-23)</h2>
<p>A post to the graded-reader mailing list from March 23, 2008.</p>
<p>Welcome to the graded-reader mailing list.</p>
<p>I've been getting a lot of queries in response to my presentation so I thought I'd start a mailing list so we can all discuss questions and issues together.</p>
<p>I also plan to make available the code that I'm using to produce the graded reader. Because it's closely tied to the particular text and linguistic data I'm currently dealing with, it will take some time to make generic but I plan to release stuff incrementally based on your feedback.</p>
<p>I want to spend some time going through my current approach and explaining the different components and the ideas behind them. For the most part, these ideas can be used independently of one another so if you don't like one aspect of what I've done, you can still make use of other aspects. Also I'm still improving things in lots of different ways and, of course, I look forward to a lot of new ideas coming from this list.</p>
<p>Because the video presentation actually doesn't show much in terms of results, I've uploaded two files that will give you a flavour of the current state of my work.</p>
<p>You can get to these files at <a href="http://groups.google.com/group/graded-reader">http://groups.google.com/group/graded-reader</a></p>
<p><code>example-reader.html</code> shows the first 50 word forms output by the current version of my software when run on the Greek text of John's gospel.</p>
<p><code>greek_2.pdf</code> shows lesson 2 of an informal course I'm running for a couple of friends which uses the graded reader approach.</p>
<p>You'll notice (1) there is a lot of extra information in the lesson given to students; (2) the order in which words are presented is different.</p>
<p>There are three reasons for the difference in order:</p>
<ol>
<li>the ordering in lesson 2 was hand tweaked from what the software originally produced</li>
<li>the lesson 2 ordering was produced by an earlier version of the ordering algorithm than the one used for example-reader.html</li>
<li>example-reader.html used slightly more linguistic information (in particular, it knew about some verb endings) in the generation of ordering</li>
</ol>
<p>Note that the goal is to eventually not do any tweaking, but rather to capture in both the software and input data the criteria that motivated the manual reordering in the first place.</p>
<p>I'll send separate posts discussing different aspects of what goes in to producing the automated output.</p>
<p>James</p>
http://jktauber.com/2008/03/22/graded-reader-discussion-and-code/Graded Reader Discussion and Code2015-06-24T07:59:39Z2008-03-22T19:12:24ZJames Tauber
<p>Owing to the amount of interest I received about <a href="/2008/02/10/new-kind-graded-reader/">A New Kind of Graded Reader</a>... </p>
<p>I have started a mailing list at </p>
<p><a href="http://groups.google.com/group/graded-reader">http://groups.google.com/group/graded-reader</a></p>
<p>and also I plan to make my code available at</p>
<p><a href="http://code.google.com/p/graded-reader/">http://code.google.com/p/graded-reader/</a></p>
<p>If you're interested in the idea applied to any language (not just NT Greek) please join us.</p>
<p><strong>UPDATE</strong>: The code has moved to GitHub: <a href="https://github.com/jtauber/graded-reader">https://github.com/jtauber/graded-reader</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2008/02/10/new-kind-graded-reader/A New Kind of Graded Reader2015-06-24T07:54:07Z2008-02-10T14:27:53ZJames Tauber
<p>Back in 2004, I talked about <a href="/2004/11/26/programmed-vocabulary-learning-travelling-salesman/">algorithms for optimal vocabulary ordering</a>.</p>
<p>Then in 2006, I talked about using this and other techniques in <a href="http://jtauber.com/blog/2006/05/05/teaching_new_testament_greek/">teaching New Testament Greek</a> (which I've resumed doing with this method, btw).</p>
<p>Earlier this year at <a href="/2008/01/14/bibletech-2008/">BibleTech:2008</a> I briefly touched on my graded reader approach. It generated a lot of interest so I decided to record a separate presentation at home this weekend, explaining some of the ideas behind the graded reader.</p>
<p>After multiple failed attempts to upload it to Google Video, it's now on YouTube and embedded below. Sound was recorded and mixed in Logic Pro and then synchronized with a presentation in Keynote and output as Quicktime.</p>
<p>Running time is just shy of 9 minutes.</p>
<iframe width="420" height="315" src="https://www.youtube.com/embed/ErmPyu19dgc" frameborder="0" allowfullscreen></iframe>
<p><strong>UPDATE 2008-03-22</strong>: Now see <a href="/2008/03/22/graded-reader-discussion-and-code/">Graded Reader Discussion and Code</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2011/01/18/rebasing-morphgnt-sblgnt/Rebasing MorphGNT off SBLGNT2015-06-24T07:47:19Z2011-01-18T10:41:25ZJames Tauber
<p>The last three months, I've been working on rebasing the MorphGNT database off the SBLGNT text rather than the UBS3.</p>
<p>While I have had permission to work with the CCAT database for over a decade, the fact that the UBS3 text can be extracted from it has always been problematic. The existence of the SBLGNT solves the problem of having a critical text with clear licensing and so, in October 2010, I started the process of moving the MorphGNT analysis to the SBLGNT text.</p>
<p>This task is mostly done and the work-in-progress is available on GitHub at <a href="https://github.com/morphgnt/sblgnt">https://github.com/morphgnt/sblgnt</a>.</p>
<p>It was a three step process, done one book at a time.</p>
<ul>
<li>A Python script was used to do a first-pass alignment. The script allowed for differences in punctuation, accentuation, capitalization and movable-nu.</li>
<li>Any differences were then manually inspected and corrected. In 90% of cases it was a simple re-ordering of words but in the other 10%, a fresh analysis had to be made. These analyses were then checked against various sources such as BDAG, Perseus and the Lexham Reverse Interlinear.</li>
<li>Finally, I wrote another Python script that checked various heuristics.</li>
</ul>
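<p>The normalisation behind the first-pass alignment can be sketched roughly like this (my own hypothetical reconstruction, not the actual script; note that simply stripping every word-final ν is an over-simplification of the movable-nu handling, but it is symmetric across both texts):</p>

```python
import re
import unicodedata

def normalize(token):
    """Reduce a Greek token to a comparison key that ignores
    punctuation, accents/breathings, capitalization and movable nu."""
    # NFD decomposition so accents become separate combining characters
    decomposed = unicodedata.normalize("NFD", token)
    # drop combining marks and punctuation
    stripped = "".join(
        ch for ch in decomposed
        if not unicodedata.combining(ch)
        and not unicodedata.category(ch).startswith("P")
    )
    lowered = stripped.lower()
    # crude movable-nu handling: ignore a word-final nu entirely
    return re.sub(r"ν$", "", lowered)

def first_pass_align(old_tokens, new_tokens):
    """Pair off tokens position by position; return the indices
    where the two texts differ beyond the normalised key."""
    return [
        i for i, (a, b) in enumerate(zip(old_tokens, new_tokens))
        if normalize(a) != normalize(b)
    ]
```

Only the positions returned would then need the manual inspection described in the second step.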
<p>I'm in the process of making a batch of corrections based on the third step and then I'll formally release what will be called MorphGNT 6.0 (although possibly as a beta such as 6.0b1).</p>
<p>The next step (which I've started in parallel) will merge in the Robinson analysis and parse codes on the road to a completely new set of parse codes for MorphGNT 7.0.</p>
<hr />
<p><em>originally published on morphgnt.org</em></p>
http://jktauber.com/2008/01/14/bibletech-2008/BibleTech 20082015-06-24T07:40:18Z2008-01-14T00:45:18ZJames Tauber
<p>I don't think I've mentioned it here before but next week, I'm one of the keynote speakers at the <a href="http://www.bibletechconference.com/">BibleTech 2008</a> conference in Seattle. </p>
<p>While I've given talks a number of times about my Greek linguistics research, this will be the first time that I'll get to talk about how I've used technology in that research.</p>
<p>I plan to give a history of the MorphGNT project and the various sub-projects I've worked on over the last fifteen years, covering the evolution of data models, text encoding, tool sets and more. I then want to talk about the opportunities that lie ahead and where I hope the work will go in the future, particularly given my collaboration with Ulrik Sandborg-Petersen. </p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2007/11/04/gnt-verse-coverage-statistics/GNT Verse Coverage Statistics2015-06-24T07:01:22Z2007-11-04T13:07:53ZJames Tauber
<p>It is fairly common, in the context of learning vocabulary for a particular corpus like the Greek New Testament, to talk about what proportion of the text one could read if one learnt the top N words.</p>
<p>I even produced such a table for the GNT back in 1996—see <a href="http://web.archive.org/web/19961104033056/www.entmp.org/HGrk/grammar/lexicon/NTcount.shtml">New Testament Vocabulary Count Statistics</a> (via Internet Archive's Wayback Machine).</p>
<p>But these sorts of numbers are highly misleading because they don't tell you what proportion of sentences (or as a rough proxy in the GNT case: verses) you could read, only what proportion of words.</p>
<p>Reading theorists have suggested that you need to know 95% of the vocabulary of a sentence to comprehend it. So a more interesting set of statistics would be how many verses one could understand 95% of the vocabulary of, given knowledge of a certain number of words. Of course, there's a lot more to reading comprehension than knowing the vocab. But it was enough for me to decide to write some code yesterday afternoon to run against my MorphGNT database.</p>
<p>To first of all give you a flavour in the specific before moving to the final numbers, consider John 3.16, which is, from a vocabulary point of view, a very easy verse to read.</p>
<p>To be able to read 50% of it, you only need to know the top 28 lexemes in the GNT. To read 75% you only need the top 85 (up to κόσμος). With the top 204 lexemes, you can read 90% of the verse, and only a few more: up to 236 (αἰώνιος) gives you 95%. The only word you would not have come across learning the top 236 words would be μονογενής but even that is in the top 1,200.</p>
<p>This example does highlight some of the shortcomings of this sort of analysis. There's no consideration of necessary knowledge of morphology, syntax, idioms, etc. Nor for the fact that the meaning of something like μονογενής is fairly easy to guess from knowledge of more common words. But I still think it's much more useful than the pure word coverage statistics I linked to earlier.</p>
<p>So let's actually run the numbers on the complete GNT. If you know the top N words, how many verses could you understand 50% of, 75%, 90% or 95% of...</p>
<table class="table table-condensed">
<tr><th>vocab / coverage <th> any <th> 50% <th>75% <th> 90% <th>95% <th>100% </tr>
<tr><th>100 <td>99.9% <td>91.3% <td>24.4% <td>2.1% <td>0.6% <td>0.4% </tr>
<tr><th>200 <td>99.9% <td>96.9% <td>51.8% <td>9.8% <td>3.4% <td>2.5% </tr>
<tr><th>500 <td>99.9% <td>99.1% <td>82.3% <td>36.5% <td>18.0% <td>13.9% </tr>
<tr><th>1,000 <td>100.0% <td>99.7% <td>93.6% <td>62.3% <td>37.3% <td>30.1% </tr>
<tr><th>1,500 <td>100.0% <td>99.8% <td>97.2% <td>76.3% <td>53.5% <td>44.8% </tr>
<tr><th>2,000 <td>100.0% <td>99.9% <td>98.4% <td>85.1% <td>65.5% <td>56.5% </tr>
<tr><th>3,000 <td>100.0% <td>100.0% <td>99.4% <td>93.6% <td>81.0% <td>74.1% </tr>
<tr><th>4,000 <td>100.0% <td>100.0% <td>99.7% <td>97.4% <td>90.0% <td>85.5% </tr>
<tr><th>5,000 <td>100.0% <td>100.0% <td>100.0% <td>99.4% <td>96.5% <td>94.5% </tr>
<tr><th>all <td>100.0% <td>100.0% <td>100.0% <td>100.0% <td>100.0% <td>100.0% </tr>
</table>
<p>What this means is that, <strong>purely from a vocabulary point of view</strong>, if you knew the top 1,000 lexemes, then 37.3% of verses in the GNT would be 95% familiar to you.</p>
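<p>The computation behind the table can be sketched as follows (my own reconstruction, not the script I actually ran; the data structures are assumed):</p>

```python
from collections import Counter

def coverage_table(verses, vocab_sizes, thresholds):
    """For each vocabulary size N (the top-N lexemes by corpus
    frequency), report the fraction of verses whose tokens are at
    least `threshold` covered by those N lexemes.

    `verses` is a list of verses, each a list of lexemes."""
    # rank lexemes by how often they occur across the whole corpus
    freq = Counter(lex for verse in verses for lex in verse)
    ranked = [lex for lex, _ in freq.most_common()]

    table = {}
    for n in vocab_sizes:
        known = set(ranked[:n])
        row = {}
        for t in thresholds:
            hits = sum(
                1 for verse in verses
                if sum(lex in known for lex in verse) / len(verse) >= t
            )
            row[t] = hits / len(verses)
        table[n] = row
    return table
```

Running something like this over the lemmatised verses produces the percentages above, one row per vocabulary size and one column per coverage threshold.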
<p>I should emphasise that learning vocabulary in frequency order isn't necessarily the fastest way to get this proportion of readable verses up. I blogged about this fact three years ago, see <a href="/2004/11/26/programmed-vocabulary-learning-travelling-salesman/">Programmed Vocabulary Learning as a Travelling Salesman Problem</a>, for example.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2006/03/12/announcing-morphgntorg/Announcing MorphGNT.org2015-06-24T06:30:41Z2006-03-12T14:31:55ZJames Tauber
<p>I've <a href="/2006/01/01/file-system-archaeology-morphgnt/">hinted before</a> about Ulrik Petersen and I collaborating on Greek New Testament linguistic endeavours.</p>
<p>I'm now delighted to announce the website that will be the home of our collaborative work:</p>
<blockquote>
<p><a href="http://morphgnt.org">http://morphgnt.org</a></p>
</blockquote>
<p>I've transferred my MorphGNT files over there and Ulrik has done the same with his Tischendorf 8th and Strong's Dictionary.</p>
<p>We've been working on a bunch of other stuff for the last few months which will eventually find its way on to that site too.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2006/02/13/bug-fix-python-unicode-collation-algorithm/Bug Fix to Python Unicode Collation Algorithm2015-06-24T06:29:59Z2006-02-13T04:56:31ZJames Tauber
<p>See <a href="/2006/01/27/python-unicode-collation-algorithm/">Python Unicode Collation Algorithm</a> for background.</p>
<p>This version fixes a major bug that prevented the collation algorithm from working properly with any expansions:</p>
<blockquote>
<p><a href="http://jtauber.com/2006/02/13/pyuca.py">http://jtauber.com/2006/02/13/pyuca.py</a></p>
</blockquote>
<p><strong>UPDATE (2012-06-21)</strong>: Now see <a href="https://github.com/jtauber/pyuca">https://github.com/jtauber/pyuca</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2006/01/27/python-unicode-collation-algorithm/Python Unicode Collation Algorithm2015-06-24T06:28:07Z2006-01-27T01:41:45ZJames Tauber
<p>My preliminary attempt at a Python implementation of the Unicode Collation Algorithm (UCA) is done and available at:</p>
<blockquote>
<p><a href="http://jtauber.com/2006/01/27/pyuca.py">http://jtauber.com/2006/01/27/pyuca.py</a> (old version—see UPDATE below)</p>
</blockquote>
<p>This only implements the simple parts of the algorithm but I have successfully tested it using the Default Unicode Collation Element Table (DUCET) to collate Ancient Greek correctly.</p>
<p>The core of the algorithm, which is what I have implemented, basically just involves multi-level comparison. For example, <em>café</em> comes before <em>caff</em> because at the primary level, the accent is ignored and the first word is treated as if it were <em>cafe</em>. The secondary level (which considers accents) only applies then to words that are equivalent at the primary level.</p>
<p>The UCA (and my code) also support contraction and expansion. Contraction is where multiple letters are treated as a single unit—in Spanish, <em>ch</em> is treated as a letter coming between <em>c</em> and <em>d</em> so that, for example, words beginning <em>ch</em> should sort after all other words beginnings with <em>c</em>. Expansion is where a single letter is treated as though it were multiple letters—in German, <em>ä</em> is sorted as if it were <em>ae</em>, i.e. after <em>ad</em> but before <em>af</em>.</p>
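<p>The multi-level idea can be illustrated with a toy two-level sort key (a deliberately simplified stand-in for the real DUCET-driven algorithm; the weights here are just code points, not real collation weights):</p>

```python
import unicodedata

def toy_sort_key(word):
    """Two-level key: the primary level is the base letters with
    accents stripped; the secondary level records the accents.
    The secondary level only matters when primary levels tie."""
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            secondary.append(ord(ch))  # accent weight
        else:
            primary.append(ord(ch))    # base-letter weight
            secondary.append(0)        # "no accent" sorts first
    return (primary, secondary)

words = ["caff", "café", "cafe"]
print(sorted(words, key=toy_sort_key))
# → ['cafe', 'café', 'caff']
# café and cafe tie at the primary level (both look like "cafe"),
# so café sorts before caff; the secondary level then puts the
# unaccented cafe before café
```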
<p>Here is how to use the <strong>pyuca</strong> module:</p>
<div class="codehilite"><pre><span class="kn">from</span> <span class="nn">pyuca</span> <span class="kn">import</span> <span class="n">Collator</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">Collator</span><span class="p">(</span><span class="s">&quot;allkeys.txt&quot;</span><span class="p">)</span>
<span class="n">sorted_words</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">words</span><span class="p">,</span> <span class="n">key</span><span class="o">=</span><span class="n">c</span><span class="o">.</span><span class="n">sort_key</span><span class="p">)</span>
</pre></div>
<p>allkeys.txt (1 MB) is available at</p>
<blockquote>
<p><a href="http://www.unicode.org/Public/UCA/latest/allkeys.txt">http://www.unicode.org/Public/UCA/latest/allkeys.txt</a></p>
</blockquote>
<p>but you can always subset this for just the characters you are dealing with (and you will need to do this if any language-specific tailoring is needed).</p>
<p><strong>UPDATE (2006-02-13)</strong>: Now see <a href="/2006/02/13/bug-fix-python-unicode-collation-algorithm/">bug fix</a></p>
<p><strong>UPDATE (2012-06-21)</strong>: Now see <a href="https://github.com/jtauber/pyuca">https://github.com/jtauber/pyuca</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2006/01/01/file-system-archaeology-morphgnt/File System Archaeology for MorphGNT2015-06-24T06:26:30Z2006-01-01T05:55:32ZJames Tauber
<p>Some of you will be aware of <a href="http://ulrikp.org">Ulrik Petersen</a>'s <a href="http://ulrikp.org/Tischendorf">work</a> on augmenting Tischendorf's 8th edition with morphological tags and lemmata, based on work by Clint Yale and Maurice Robinson. Ulrik is also the developer of <a href="http://emdros.org/">Emdros</a>, an open-source text database engine for annotated text.</p>
<p>The overlap of Ulrik's interests and work with my own on MorphGNT is very exciting and so we've started talking about how we might be able to collaborate on some things together.</p>
<p>To help facilitate this, I've spent much of this long weekend so far going through the last 12 years of work on MorphGNT and putting things into Subversion. Because my work on MorphGNT has always been in fits and starts and has spanned approximately five different desktop machines over the 12 years, it's required a fair bit of "file system archaeology".</p>
<p>The archaeology analogy seems apt because I'm essentially piecing together a history based on what "layer" I'm finding the files in - e.g. a file on a backup of my website in 2002 probably dates later than those found in the tarballs from when I moved from one machine to another in 1997.</p>
<p>There's also an analogy with textual criticism as in some cases I have to look at two files and judge whether a change from A to B or B to A is more likely.</p>
<p>It's been a lot of fun, especially uncovering little scripts I wrote back in the nineties to do various analyses.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2006/01/28/dynamic-interlinears-javascript-and-css/Dynamic Interlinears with Javascript and CSS2015-06-24T06:16:46Z2006-01-28T22:03:46ZJames Tauber
<p>After the continuation of a permathread on the b-greek mailing list about the pros and cons of interlinears, I built some quick demonstrations of how CSS and Javascript could be used for dynamic interlinear glosses that would not be possible on the printed page.</p>
<ul>
<li><a href="http://jtauber.com/2006/interlinear-demo/plain.html">Plain</a> — show static glosses</li>
<li><a href="http://jtauber.com/2006/interlinear-demo/hover.html">Hover</a> — show glosses when a word is hovered over</li>
<li><a href="http://jtauber.com/2006/interlinear-demo/toggle.html">Toggle</a> — toggle showing a gloss when a word is clicked</li>
<li><a href="http://jtauber.com/2006/interlinear-demo/frequency.html">Frequency</a> — filter appearance of gloss by frequency</li>
</ul>
<p>They might be interesting as little Javascript tutorials too.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2004/11/26/programmed-vocabulary-learning-travelling-salesman/Programmed Vocabulary Learning as a Travelling Salesman Problem2015-06-24T05:04:29Z2004-11-26T05:20:17ZJames Tauber
<p>For a while I've been interested in how you could select the order in which vocabulary is learnt in order to maximise one's ability to read a particular corpus of sentences. Or more generally, imagine you have a set of things you want to learn and each item has prerequisites drawn from a large set with items sharing a lot of common prerequisites.</p>
<p>As an abstract example, imagine you want to be able to read the "sentences":</p>
<div class="codehilite"><pre>{&quot;a b&quot;, &quot;b a&quot;, &quot;h a b&quot;, &quot;d a b e c&quot;, &quot;d a g f&quot;}
</pre></div>
<p>where we assume you must first learn each "word". Further assuming that all sentences are equally valuable to learn, how would you order the learning of words to maximise what you know at any given point in time?</p>
<p>One approach would be to learn the prerequisites in order of their frequency. So you might learn in an order like</p>
<div class="codehilite"><pre>&lt;a, b, d, c, e, f, g, h&gt;
</pre></div>
<p>However, had we put h before d, we could have had an overall learning programme that, although equal in length by the end, enabled the learner, at the half-way mark, to understand three sentences instead of just two.</p>
<p>To investigate this further, I needed a way to score a particular learning programme and decided that one reasonable way to do so would be to sum, across each step, the fraction of the overall set of sentences understandable at that point.</p>
<p>I then needed an algorithm that would find the ordering that would maximise this score. </p>
<p>After the quick realisation that the number of possible learning programmes was factorial in the number of words, it dawned on me that this was essentially a travelling salesman problem.</p>
<p>So my sister Jenni and I wrote a Python script that implements a simulated annealing approach to the TSP. We then applied it to the above contrived example. Sure enough, it found a solution that was better than a straight prerequisite frequency ordering.</p>
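<p>A minimal sketch of the scoring function and annealing loop on the contrived example (not the original script Jenni and I wrote; a reconstruction of the approach, using the parameter values mentioned below):</p>

```python
import math
import random

SENTENCES = [s.split() for s in
             ["a b", "b a", "h a b", "d a b e c", "d a g f"]]

def score(ordering, sentences=SENTENCES):
    """Sum, across each learning step, the fraction of sentences
    understandable once the words up to that step are known."""
    known, total = set(), 0.0
    for word in ordering:
        known.add(word)
        total += sum(set(s) <= known for s in sentences) / len(sentences)
    return total

def anneal(ordering, t=1.0, t_final=0.001, alpha=0.9, iters=50, seed=0):
    """Simulated annealing over orderings: propose a random swap,
    always accept improvements, and accept worse orderings with
    probability exp(delta / t), cooling t by alpha each round."""
    rng = random.Random(seed)
    current = list(ordering)
    best, best_score = list(current), score(current)
    while t > t_final:
        for _ in range(iters):
            i, j = rng.sample(range(len(current)), 2)
            candidate = list(current)
            candidate[i], candidate[j] = candidate[j], candidate[i]
            delta = score(candidate) - score(current)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current = candidate
                if score(current) > best_score:
                    best, best_score = list(current), score(current)
        t *= alpha
    return best, best_score
```

On this toy example, the straight frequency ordering &lt;a, b, d, c, e, f, g, h&gt; scores 4.2, and orderings that learn h before d score better, consistent with the observation above.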
<p>I then decided to try applying it to a small extract of the Greek New Testament (which, of course, I have in electronic form, already stemmed). So I ran it on the first chapter of John's Gospel. 198 words and 51 verses. A straight frequency ordering on this text achieves a score of 48 so that was the score to beat.</p>
<p>My first attempt didn't even come close to that. What a disappointment! Jenni and I wondered if it was just the initial parameters to the annealing model. So we increased the number of iterations at a given temperature to 50 and lowered the final temperature to 0.001 (keeping the initial temperature at 1 and the alpha at 0.9).</p>
<p>Success!! It found a solution that scored 82.94. The first verse readable (after 27 words) was John 1.34. John 1.20 was then readable after just 2 more words and John 1.4 after another 7.</p>
<p>I decided to try different parameters. With 100 iterations per temp, a final temp of 0.0001 and a few hours, it achieved a score of 91.59 (and was still increasing at the time). This time the first verse readable was John 1.24, after only 8 words; then John 1.4 after another 9; John 1.10 after 4; and both John 1.1 and John 1.6 after another 4 and John 1.2 just 1 word after that.</p>
<p>Overall a very promising approach. I doubt it's anything new but it was fun discovering the approach ourselves rather than just reading about it in some textbook. The example I tested it on was vocabulary learning, but it could apply to anything that can similarly be modelled as items to learn with prerequisites drawn from a large, shared set.</p>
<p>The next step (besides more optimised code and even more long-running parameters) would be to try to work out how to model layered prerequisites — i.e. where prerequisites themselves have prerequisites — to any number of levels. I haven't thought yet how (or even whether) that boils down (no pun intended) to a simulated annealing solution to the TSP.</p>
<p><strong>UPDATE (2005-08-03)</strong>: Now see <a href="/2005/08/03/using-simulated-annealing-order-goal-prerequisites/">Using Simulated Annealing to Order Goal Prerequisites</a>.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/01/19/datr-morphgnt-rdf-and-python/DATR, MorphGNT, RDF and Python2015-06-24T05:01:46Z2005-01-19T01:00:00ZJames Tauber
<p>I've been revisiting <a href="http://www.datr.org/">DATR</a>, the lexical knowledge representation language, as a possible format for the next generation of MorphGNT. I was previously considering developing my own RDF/graph-based format but I suddenly remembered DATR from my student days and it makes a lot more sense to use it rather than try to build my own.</p>
<p>Looking at DATR material, I haven't seen anything more recent than 1998 so I'm not sure if it's still the state-of-the-art. It's a natural fit for some kind of RDFization, something I'm sure I'll eventually end up doing if someone hasn't already.</p>
<p>Of course, I'll have to write Python code to manipulate DATR. Again, unless some already exists. But I'm almost hoping not as I love implementing specs, especially using test-driven development.</p>
<p><strong>UPDATE 2005-04-19</strong>: Now see <a href="/2005/04/19/datr-python/">DATR in Python</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/01/27/betacode-unicode-python/BetaCode to Unicode in Python2015-06-24T04:52:22Z2005-01-27T03:35:36ZJames Tauber
<p>BetaCode is a common ASCII transcription for Polytonic Greek. I've been dealing with it for around twelve years. (As an aside, back in 1994, I designed a METAFONT for Polytonic Greek that enabled one to use BetaCode in TeX—I typeset my self-published <em>Index to the Greek New Testament</em> with it).</p>
<p>For the last six years, my preference has been to use Unicode, so I wrote a program (initially in Java but then in Python) that used a <em>Trie</em> to represent the multiple BetaCode characters that can map to a single pre-composed Unicode character.</p>
<p>I've had a version available on this site since 2002, but I've now updated it to what I've been using for my most recent work. You can download it at <a href="http://jtauber.com/2004/11/beta2unicode.py">http://jtauber.com/2004/11/beta2unicode.py</a></p>
<p>At some stage I'll better factor out the conversion pairs so the code is useful for other conversions. The Trie code might be useful for other contexts too.</p>
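<p>The trie-based longest-match idea can be sketched like this (a toy reimplementation with only a handful of mappings for illustration; the real conversion table is far larger, and the final-sigma handling here is deliberately crude):</p>

```python
# a tiny subset of the BetaCode mapping, for illustration only
BETA_MAP = {
    "a": "α", "g": "γ", "l": "λ", "n": "ν", "o": "ο", "s": "σ",
    "a)": "ἀ", "a)/": "ἄ", "o/": "ό", "w=": "ῶ",
}

def build_trie(mapping):
    """Nest the keys character by character; a None key marks the
    Unicode value for the BetaCode sequence ending at that node."""
    trie = {}
    for key, value in mapping.items():
        node = trie
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value
    return trie

def beta_to_unicode(text, trie):
    """Greedy longest-match conversion, so 'a)/' wins over 'a)',
    which in turn wins over 'a'."""
    out, i = [], 0
    while i < len(text):
        node, j = trie, i
        match, match_end = None, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match, match_end = node[None], j
        if match is None:       # unmapped character: pass through
            out.append(text[i])
            i += 1
        else:
            out.append(match)
            i = match_end
    # crude final-sigma fix: a sigma at word end becomes ς
    result = "".join(out).replace("σ ", "ς ")
    if result.endswith("σ"):
        result = result[:-1] + "ς"
    return result
```

For example, <code>beta_to_unicode("lo/gos", build_trie(BETA_MAP))</code> consumes <code>o/</code> as a single unit thanks to the longest-match walk and yields λόγος.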
<p>(Also see Ricoblog's <a href="http://www.supakoo.com/rick/ricoblog/PermaLink,guid,c13cfcd6-92de-4f5d-8256-400e45c5e25d.aspx">Converting Greek Beta Code into Normalized Unicode</a>.)</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/04/19/datr-python/DATR in Python2015-06-24T04:50:45Z2005-04-19T03:40:01ZJames Tauber
<p>I <a href="/2005/01/19/datr-morphgnt-rdf-and-python/">previously</a> talked about wanting to implement the lexicon language DATR in Python. Well, I just received an email from Henrik Weber saying that (apparently inspired by my post) he has gone and done an implementation at <a href="http://pydatr.sourceforge.net/">http://pydatr.sourceforge.net/</a></p>
<p>Well done Henrik! I'm looking forward to trying it out and maybe contributing.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/06/10/morphgnt-update/MorphGNT Update2015-06-24T04:49:36Z2005-06-10T03:44:00ZJames Tauber
<p>A couple of months ago, I <a href="/2005/04/19/current-morphgnt-work/">talked about</a> the current process I'm going through to identify errors in my morphologically parsed Greek New Testament, MorphGNT. By the end of April, I was down to 400 mismatches I needed to check. At the time, I thought I'd be able to finish going through them by the time I left to go to Europe on holiday.</p>
<p>Unfortunately, I haven't actually worked on it at all in the last month. I'm leaving tomorrow but still have 350 mismatches to check (an estimated 14 hours work).</p>
<p>Hopefully I'll get it done some time during July and then I'll be able to release another version of MorphGNT.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/07/16/morphgnt-506-released/MorphGNT 5.06 Released2015-06-24T04:48:40Z2005-07-16T03:47:29ZJames Tauber
<p>Well, it's been about a hundred hours work over the last six months, but I'm pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license.</p>
<p>Besides some corrections to the text (mostly rho-breathing) and a couple of parsing code changes, this release has a huge number of corrections to the lemmata—160 lemma changes in 465 places. See <a href="/2005/04/19/current-morphgnt-work/">this blog entry</a> for how potential errors for this round of corrections were discovered.</p>
<p>You can download the new file at:</p>
<ul>
<li><a href="http://jtauber.com/2005/morphgnt/ccat-tauber-morphgnt-v5_06.zip">http://jtauber.com/2005/morphgnt/ccat-tauber-morphgnt-v5_06.zip</a></li>
</ul>
<hr />
<p><em>originally published on jtauber.com</em></p>
http://jktauber.com/2005/07/04/morphgnt-roadmap/MorphGNT Roadmap2015-06-24T04:47:24Z2005-07-04T03:45:40ZJames Tauber
<p>This month I should be doing another release of my morphologically-parsed Greek New Testament. This will be release 5.06.</p>
<p>I thought I'd outline my future plans (as they currently stand).</p>
<p>At some point, I'll start doing 6.xx releases. This will involve a format change that includes some more information. I'll probably continue the 5-series releases for people used to the format. The 5-series data is just a subset of the 6-series data so it's always possible (and easy) for me to generate a 5 from a 6.</p>
<p>From Series-7, MorphGNT's format will likely change dramatically to adopt a graph structure rather than a simple tabular structure. This will enable much greater extensibility and annotation.</p>
<p>Series-7 will be the last that is based on the CCAT database. From Series-8 onwards, the data will hopefully be entirely the result of my own parsing work.</p>
<p>First things first, though—getting 5.06 out. I'm down to 299 mismatches to resolve.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/07/16/parts-speech-and-number-accents/">Parts of Speech and Number of Accents</a> (James Tauber, 2005-07-16)</p>
<p>I thought I'd write a quick Python script to check how many accents were on each of the lemmata in [MorphGNT] 5.06.</p>
<p>Here are the counts by part of speech and number of accents on lemma:</p>
<div class="codehilite"><pre>|     |       0 |       1 |   2 |
+-----+---------+---------+-----+
| A   |       - |    9159 |   - |
| C   |     924 |   17361 |   - |
| D   |    1592 |    4606 |   - |
| I   |       - |      17 |   - |
| N   |      30 |   28271 |   1 |
| P   |    5433 |    5488 |   - |
| RA  |   19862 |       4 |   - |
| RD  |       - |    1744 |   - |
| RI  |       - |    1165 |   - |
| RP  |       - |   11584 |   - |
| RR  |       - |    1677 |   - |
| V   |       8 |   28101 |   1 |
| X   |     147 |     844 |   - |
</pre></div>
<p>Some of the low numbers are definitely errors in the database. Now to investigate...</p>
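<p>For the curious, a script along those lines can be sketched as follows. It counts the combining accent marks (acute, grave, circumflex) in each NFD-decomposed lemma; the input format here, an iterable of part-of-speech/lemma pairs, is an assumption rather than the actual MorphGNT column layout:</p>

```python
import unicodedata
from collections import Counter

# combining acute, grave, and circumflex (perispomeni)
ACCENTS = {"\u0301", "\u0300", "\u0342"}

def accent_count(lemma):
    """Count accent marks on a lemma after canonical decomposition,
    so precomposed characters like ό are counted correctly."""
    return sum(1 for ch in unicodedata.normalize("NFD", lemma)
               if ch in ACCENTS)

def tally(rows):
    """Tally (part_of_speech, accent_count) pairs over the lemmata."""
    return Counter((pos, accent_count(lemma)) for pos, lemma in rows)
```

<p>Decomposing first is the important step: breathings are combining marks too, but they aren't in the accent set, so ὁ correctly counts as zero accents.</p>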
<p><strong>UPDATE (2005-07-16)</strong>: both 2-accent cases were mistakes. The 30 0-accent nouns and 5 of the 0-accent verbs were foreign loan words that intentionally weren't accented but 3 of the 0-accent verbs were mistakes. The 4 accented articles were the result of crasis with the following noun and the word should probably be analyzed as a noun rather than an article. I guess there'll be a 5.07 release soon. NOTE: I haven't looked at the particles, adverbs, conjunctions or prepositions yet.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/11/07/morphgnt-508-released/">MorphGNT 5.08 Released</a> (James Tauber, 2005-11-07)</p>
<p>I'm pleased to announce the release of a new version of [MorphGNT], the morphologically parsed Greek New Testament database made available under a Creative Commons license. </p>
<p>I haven't put together the change log yet but will shortly.</p>
<p><strong>UPDATE (2005-11-08)</strong>: Change log is now available on [MorphGNT] page.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/08/03/ordering-goals-rather-prerequisites/">Ordering Goals Rather Than Prerequisites</a> (James Tauber, 2005-08-03)</p>
<p>The outcome of my <a href="/2005/08/03/using-simulated-annealing-order-goal-prerequisites/">simulated annealing program</a> is a list of prerequisites to learn along with an indication, every so often, of what new goal has been reached.</p>
<p>Running on the Greek lexemes of 1John, you might get something starting like this:</p>
<div class="codehilite"><pre>learn μαρτυρέω
learn θεός
learn ἐν
learn εἰμί
learn ὁ
learn τρεῖς
learn ὅτι
know 230507
</pre></div>
<p>This gives seven prerequisites to learn and then a goal that has been reached (230507 = 1John 5.7). The problem is that two of those words are unnecessary. You only need to learn μαρτυρέω, εἰμί, ὁ, τρεῖς and ὅτι to be able to read 1John 5.7.</p>
<p>The problem is that the program is ordering prerequisites first and only then establishing at each point what goals (if any) have been achieved.</p>
<p>I can see two solutions:</p>
<ul>
<li>write a post-processor that walks through and, at each goal, takes any "unused" prerequisites and postpones them to after that goal.</li>
<li>change the program to order goals rather than prerequisites and work out the latter from the former</li>
</ul>
<p>The second is probably considerably more work but probably ultimately preferred. </p>
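<p>The first option (the post-processor) could be sketched like this; the event format and names are hypothetical:</p>

```python
def postpone_unused(events, goal_prereqs):
    """Walk a list of ("learn", word) / ("know", goal) events and, at
    each goal, emit only the pending prerequisites that goal actually
    needs; the rest stay pending until a later goal needs them (or
    until the end). goal_prereqs maps each goal to its required words."""
    result, pending = [], []
    for kind, value in events:
        if kind == "learn":
            pending.append(value)
        else:
            needed = goal_prereqs[value]
            result.extend(("learn", w) for w in pending if w in needed)
            result.append(("know", value))
            pending = [w for w in pending if w not in needed]
    result.extend(("learn", w) for w in pending)
    return result
```

<p>On the 1John example above, this would move θεός and ἐν to after the "know 230507" line rather than before it.</p>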
<p><strong>UPDATE</strong>: I'm almost embarrassed to report that not only was changing over to ordering goals not as hard to do as I thought, but the particular way I did it performs 200 times faster than my previous prerequisite ordering script. New script is at <a href="http://jtauber.com/2005/08/sa_goal_ordering.py">http://jtauber.com/2005/08/sa_goal_ordering.py</a></p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/08/03/using-simulated-annealing-order-goal-prerequisites/">Using Simulated Annealing to Order Goal Prerequisites</a> (James Tauber, 2005-08-03)</p>
<p>Back in November, I wrote about <a href="/2004/11/26/programmed-vocabulary-learning-travelling-salesman/">programmed vocabulary learning as a travelling salesman problem</a>.</p>
<p>I'm pleased to say I've finally cleaned up my Python code and made an initial version available at:</p>
<p><a href="http://jtauber.com/2005/08/sa_prereq_ordering.py">http://jtauber.com/2005/08/sa_prereq_ordering.py</a></p>
<p><strong>UPDATE (2005-08-04)</strong>: You probably don't want to use the above script. See <a href="/2005/08/03/ordering-goals-rather-prerequisites/">Ordering Goals Rather Than Prerequisites</a> for why, along with a much improved script.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/08/31/morphgnt-507-released/">MorphGNT 5.07 Released</a> (James Tauber, 2005-08-31)</p>
<p>I'm pleased to announce the release of a new version of MorphGNT, the morphologically parsed Greek New Testament database made available under a Creative Commons license. </p>
<p>See the [MorphGNT] page for a list of changes (47 changes in 940 places).</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/08/30/upcoming-new-morphgnt/">Upcoming new MorphGNT</a> (James Tauber, 2005-08-30)</p>
<p>I'm just about to release [MorphGNT] 5.07 and, shortly after that, a major new release I'll designate 6.07.</p>
<p>I've decided not to reset the minor release number on a new major release to emphasise the fact that 5.07 and 6.07 are identical in the data they have in common; the 6-series just adds some extra data.</p>
<p>I haven't yet decided just how much extra data will make it in the 6-series releases, but one new addition will be a column containing the surface form / inflected form / reflex (take your pick of terminology) of each word taken in isolation.</p>
<p>What do I mean by "taken in isolation"? Well, a word like μετά could appear in the text as μετά, μεθ', μετ', or μετὰ depending on the text after it. This new column normalises that to μετά. This happens to also be the lemma, so it might not be clear what the extra value is in this case. So consider the text in Matthew 1.20, which reads:</p>
<blockquote>
<p>παραλαβεῖν Μαρίαν τὴν γυναῖκά σου</p>
</blockquote>
<p>Note that τὴν has a grave accent and γυναῖκά has two accents. If you were to ask someone what the accusative singular feminine article is, they'd say τήν not τὴν. Similarly, if you asked someone what the accusative of γυνή is, they'd say γυναῖκα not γυναῖκά. The reason for the differing accentuation in the text is the context: a final-syllable acute becomes grave unless clause-final, and enclitics like σου throw their accent back onto the end of the previous word.</p>
<p>Sometimes you want to treat the variations these cause as distinct, sometimes you don't. By including the extra column, users of MorphGNT will have the best of both worlds.</p>
<p>Here is a list of possible differences between the existing text column and the new column:</p>
<ul>
<li>existing text may exhibit elision (e.g. μετ' versus μετά)</li>
<li>existing text may exhibit movable ς or ν</li>
<li>final-acute may become grave</li>
<li>enclitics may lose an accent</li>
<li>word preceding an enclitic may gain an extra accent</li>
<li>the οὐ / οὐκ / οὐχ alternation</li>
</ul>
<p>The new column normalises all these differences.</p>
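<p>Two of those differences can be normalised mechanically; here's a rough sketch. The others (elision, movable letters, the οὐ/οὐκ/οὐχ alternation) need a lookup table or surrounding context, so they're left out:</p>

```python
import unicodedata

ACUTE, GRAVE, CIRCUMFLEX = "\u0301", "\u0300", "\u0342"

def normalize_isolated(word):
    """Sketch: normalise a surface form toward its 'in isolation' shape.
    Handles final grave -> acute, and drops the extra accent that an
    enclitic throws onto the previous word (e.g. γυναῖκά -> γυναῖκα)."""
    chars = list(unicodedata.normalize("NFD", word))
    # a word cited in isolation keeps an acute where the text has a grave
    chars = [ACUTE if c == GRAVE else c for c in chars]
    # a word bearing two accents drops the second, enclitic-induced one
    accents = [i for i, c in enumerate(chars) if c in (ACUTE, CIRCUMFLEX)]
    if len(accents) > 1:
        del chars[accents[-1]]
    return unicodedata.normalize("NFC", "".join(chars))
```

<p>Working in NFD is what makes this a two-line transformation: accents become separate combining characters that can be swapped or deleted individually.</p>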
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2005/04/19/current-morphgnt-work/">Current MorphGNT Work</a> (James Tauber, 2005-04-19)</p>
<p>For the last few months, I've been making corrections to [MorphGNT] by attempting to merge an English translation (NASB) marked with Strong's numbers with my database. Although it's a tedious process, it's revealing numerous errors.</p>
<p>When James Strong compiled his concordance, he assigned a number to every lemma in the underlying Greek text of the King James Version. Other translations are often made available annotated with these Strong's numbers. <a href="http://www.zhubert.com">Zack Hubert</a> provided me with an electronic text of the NASB translation with Strong's numbers which I converted to something looking like this:</p>
<div class="codehilite"><pre>010101 record 976
010101 genealogy 1078
010101 Jesus 2424
010101 Messiah 5547
010101 son 5207
010101 son 5207
010101 Abraham 11
</pre></div>
<p>The first column is the book, chapter and verse, the second column is the English word as it appears in the NASB translation and the third column is the Strong's number. Note that not all words are included.</p>
<p>I then found an electronic text of Strong's lexicon and stripped out the formatting and the definitions to just get a list of Strong's numbers with a transliteration of the Greek lemma:</p>
<div class="codehilite"><pre>1 a
2 Aaron
3 Abaddon
4 abares
5 Abba
6 Abel
7 Abia
8 Abiathar
9 Abilene
10 Abioud
</pre></div>
<p>Finally I took my [MorphGNT] database and extracted the lemmata:</p>
<div class="codehilite"><pre>010101 βίβλος
010101 γένεσις
010101 Ἰησοῦς
010101 Χριστός
010101 υἱός
010101 Δαυίδ
010101 υἱός
010101 Ἀβραάμ
</pre></div>
<p>I then wrote a Python program that attempts to merge the first and third files on the basis of the second. Note that the transliterations in Strong's lexicon don't have accents and there is ambiguity too (both epsilon and eta go to 'e'). That's a fairly straightforward part of the join, however, because it can be automated by the script.</p>
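<p>The automatable part of the join might look something like this. The transliteration table is a rough sketch (Strong's own transliterations differ in details, which is partly why manual exception handling is still needed); note that it deliberately collapses ε/η to 'e' and ο/ω to 'o', mirroring the ambiguity just mentioned:</p>

```python
import unicodedata

# rough Greek -> Latin transliteration; ε/η both -> 'e', ο/ω both -> 'o'
TRANSLIT = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "e", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "ph", "χ": "ch", "ψ": "ps",
    "ω": "o",
}

def strongs_key(lemma):
    """Strip accents/breathings and transliterate, approximating the
    unaccented form a lemma takes in the Strong's lexicon file."""
    stripped = "".join(
        ch for ch in unicodedata.normalize("NFD", lemma.lower())
        if not unicodedata.combining(ch)
    )
    return "".join(TRANSLIT.get(ch, ch) for ch in stripped)
```

<p>Joining on this key links the MorphGNT lemma file to the Strong's number file; anything that doesn't match lands in the exception list described below.</p>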
<p>The real challenge comes because:</p>
<ul>
<li>NASB versification isn't the same as the MorphGNT Greek text</li>
<li>the text underlying the NASB is not the same critical text as that of MorphGNT</li>
<li>there are errors in each of the files</li>
<li>there are spelling differences</li>
<li>there are differences in the granularity of the lemmata</li>
</ul>
<p>So my program simply indicates whenever it had trouble performing a match and I have to either:</p>
<ul>
<li>correct my MorphGNT lemma</li>
<li>correct (or merely change to my lemma conventions) the Strong's lexicon file</li>
<li>correct the NASB-Strong file</li>
<li>change the verse numbering in the NASB-Strong file</li>
<li>comment out a particular word that appears in the text underlying the NASB but not the MorphGNT text</li>
</ul>
<p>There were initially thousands of exceptions that each required one of these actions. After a number of months, I now have one thousand left. It takes me about 4 hours to make 100 corrections so I still have a little way to go.</p>
<p>When I'm done, I'll release a new version of [MorphGNT] with the lemma errors that this task revealed corrected.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/14/morphgnt-v505-available/">MorphGNT v5.05 Available</a> (James Tauber, 2004-12-14)</p>
<p>Various corrections.</p>
<ul>
<li>Corrected occurrence of ἐμβάλλω for lemma instead of ἐμβλέπω or ἐμβαίνω (thanks to Ted Blakley via Zack Hubert)</li>
<li>Denormalized variant spellings of Ναζαρά</li>
<li>Corrected parse codes of κἀκεῖνος, θρόνοι</li>
<li>Added comparative parse code for σπουδαιοτέρως</li>
<li>Changed lemmata for ἀκριβέστερον, περισσότερον, τολμηρότερον</li>
<li>Changed lemmata for οὕτως, εἵνεκεν, ἑλπίς</li>
<li>Corrected lemma for ζώνην and ζώνη</li>
</ul>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/14/best-use-morphgnt-so-far/">Best Use of MorphGNT So Far</a> (James Tauber, 2004-12-14)</p>
<p>Zack Hubert has taken my [MorphGNT] and built a <a href="http://zhubert.com">GNT Browser</a> that blew me away!</p>
<p>It displays the text in the browser; hover over a word and the lemma and parsing are shown in a pop-up; click on the word and you get a graph of word occurrence by book with the ability to list all occurrences.</p>
<p>I've toyed with web interfaces to the MorphGNT for years but nothing even remotely as slick as this.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/09/morphgnt-v504-and-beyond/">MorphGNT v5.04 and Beyond</a> (James Tauber, 2004-12-09)</p>
<p>I've released a new version of my [MorphGNT].</p>
<p>Details of the changes are on the [MorphGNT] page but they all stem from a simple query performed via a Python script: in cases where there is no parse-code (i.e. the word is essentially uninflected), is the text form the same as the lexical form (other than accentuation)?</p>
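<p>That query can be sketched as a small Python check; the all-dashes convention for an empty parse code and the exact accent set are assumptions here:</p>

```python
import unicodedata

def strip_accents(word):
    """Remove accent marks (acute, grave, circumflex) but keep
    breathings and other marks, for accent-insensitive comparison."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = [c for c in decomposed if c not in ("\u0301", "\u0300", "\u0342")]
    return unicodedata.normalize("NFC", "".join(kept))

def suspicious(parse_code, text_form, lexical_form):
    """Flag essentially uninflected words (assumed here to be marked by
    an empty or all-dashes parse code) whose text form differs from the
    lexical form by more than accentuation."""
    uninflected = not parse_code.strip("-")
    return uninflected and strip_accents(text_form) != strip_accents(lexical_form)
```
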
<p>In some cases this rule means that new lexical forms need to be provided to allow for spelling variation, rather than the lexical form normalising spelling. This is an editorial decision I've made that makes more sense in the larger picture of where I'm going with the MorphGNT.</p>
<p>The corrections I'm making to the CCAT database are really just a side-effect of my efforts to build an original database of New Testament Greek morphology. I'll say more about it as it develops but the idea is that surface forms, lexical forms, spelling variations, roots, stems, suppletion, morpho-phonological rules, etc. will all be catalogued with relationships between them expressed as a directed labelled graph.</p>
<p>Eventually, the MorphGNT will reference into this graph rather than merely give the lemma. There'll be a partial ordering of nodes in the graph (expressed by a subset of arc types) and so references will be to the node that is as general as can explain the specific surface form.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/07/morphgnt-v503-available/">MorphGNT v5.03 available</a> (James Tauber, 2004-12-07)</p>
<p>More corrections now and more coming soon.</p>
<p>Version 5.03 contains a major correction to the lemma PRO; a correction to MYRA; some spelling distinctions ENEKEN/ENEKA, BETHSAIDA(N), GOLGOTHA(N); and case corrections in proper names GERASENOS, STEFANOS, FOROS, TREIS, TABERNE, DIABLOS.</p>
<p>See [MorphGNT].</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/05/morphgnt-v502-available/">MorphGNT v5.02 Available</a> (James Tauber, 2004-12-05)</p>
<p>Some breathing corrections on rho-initial words.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/11/21/morphgnt-v501-available/">MorphGNT v5.01 Available</a> (James Tauber, 2004-11-21)</p>
<p>Found an accent and breathing problem in both the text and lemma for ABEL, ANNA and ANNAS which is now corrected.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/11/14/morphgnt-v500-available/">MorphGNT v5.00 Available</a> (James Tauber, 2004-11-14)</p>
<p>At wildly varying intensities over the last ten years, I've worked on correcting the UPenn CCAT Morphological Parsed Greek New Testament as a side-effect of larger linguistic analyses I've undertaken.</p>
<p>The last big burst of activity was in 2002 when I resumed work on my own morphological analysis (starting with the nouns). </p>
<p>The last couple of weekends, I've been working on preparing a new release of the corrected MorphGNT file, the first in probably seven or so years.</p>
<p>Prompted by a post to the b-greek mailing list, I've now made that release. MorphGNT v5.00 is now available at [MorphGNT].</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/05/04/bible-and-semantic-web/">The Bible and the Semantic Web</a> (James Tauber, 2004-05-04)</p>
<p>For many years I've been thinking about the application of Semantic Web technology to studying (and presenting the results of the study of) the Bible. However, I never really thought about the application of Bible study (and the tools and techniques developed for it) to the Semantic Web.</p>
<p>Then I came across this <a href="http://leobard.twoday.net/stories/209611/">great blog entry</a>, discussing the latter.</p>
<p>On the former, there is a wonderful site <a href="http://www.semanticbible.com/">SemanticBible</a> that I hope I can contribute to in some way.</p>
<p>I also really need to get back to my morphological analysis. I haven't thought about it for a while, but I need to come up with URIs for each lemma and word form. I could even grandfather in Strong's numbers and G/K numbers.</p>
<hr />
<p><em>originally published on jtauber.com</em></p>
<p><a href="http://jktauber.com/2004/12/14/thoughts-gnt-net-parallel-glossing-project/">Thoughts on GNT-NET Parallel Glossing Project</a> (James Tauber, 2004-12-14)</p>
<p>Zack Hubert <a href="http://zhubert.com/node/view/20">mentions</a> that I'm thinking about using the <a href="http://bible.org/">NET Bible</a> for a collaborative parallel glossing project.</p>
<p>Here is how it might work:</p>
<p>The user is presented with the Greek text and the NET text.</p>
<p>Consider Luke 1.1. The Greek reads:</p>
<blockquote>
<p>Ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων,</p>
</blockquote>
<p>The NET reads</p>
<blockquote>
<p>Now many have undertaken to compile an account of the things that have been fulfilled among us,</p>
</blockquote>
<p>It should be possible to select any number of words in the Greek and any number of words from the NET and assert that they correspond (or link) to one another. There is no need to link between the entire verse of Greek and the entire verse of the NET because that link has already been made automatically.</p>
<p>Say the user selects Ἐπειδήπερ. They should then be shown the part-of-speech and parse information for the word (in this case C) as well as the lexical form, ἐπειδήπερ. The user should also be shown all previous glosses for ἐπειδήπερ in other contexts.</p>
<p>The user is then instructed to select the word or words that directly translate ἐπειδήπερ. In this case, the user selects <em>Now</em> and submits.</p>
<p>The user need not progress in order. Say the next thing they select is the word πραγμάτων. As before, they are shown the part-of-speech and parse information (N-GPN) and the lexical form, πρᾶγμα. Again the user is shown previous glosses. These glosses should include those specifically for πραγμάτων as well as other forms of πρᾶγμα, perhaps displayed differently.</p>
<p>The user then selects <em>things</em> and submits.</p>
<p>It should be possible to select multiple Greek words and link them to just one word from NET. It should also be possible to select one Greek word and link it to multiple words in the NET. Many-to-many links should also be possible. For example, a user could select περὶ τῶν πεπληροφορημένων ἐν ἡμῖν πραγμάτων and <em>of the things that have been fulfilled among us</em> and submit that linkage.</p>
<p>It is also possible that some words won’t link to anything.</p>
<p>Many-to-many linkages should be encouraged where the particular sense of a word is entirely determined by its use in a sequence (such as an idiom).</p>
<p>Users should be discouraged from doing many-to-many linkages where the sequence isn't a grammatical unit such as a phrase. For example, a user shouldn't submit a link between περὶ τῶν and <em>of the</em>. This clearly can't be enforced.</p>
<p>Users should be required to log in before they can submit linkages. Each linkage will be stored with the email address of the person that made the linkage.</p>
<p>While users may be encouraged to work on particular verses, they should be free to go to whatever verses interest them. Duplicate effort is not a problem and provides redundancy. The data can be checked later for inconsistencies.</p>
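<p>A minimal sketch of what the linkage store might look like; all the names and the exact structure here are hypothetical, not a design commitment:</p>

```python
from collections import defaultdict

class GlossStore:
    """Each submission links a tuple of Greek words to a tuple of NET
    words within a verse, recorded with the submitter's email so that
    duplicate or conflicting linkages can be audited later."""
    def __init__(self):
        self.links = []
        self.by_lemma = defaultdict(list)

    def submit(self, verse, greek, english, lemmas, email):
        """greek/english: tuples of surface words (one-to-one, one-to-many,
        or many-to-many); lemmas: lexical forms parallel to `greek`."""
        self.links.append((verse, greek, english, email))
        for lemma in lemmas:
            self.by_lemma[lemma].append(english)

    def previous_glosses(self, lemma):
        """All previously submitted glosses for any form of a lemma,
        as described above for ἐπειδήπερ and πρᾶγμα."""
        return self.by_lemma[lemma]
```

<p>Indexing glosses by lemma rather than surface form is what lets the interface show glosses of πραγμάτων alongside glosses of other forms of πρᾶγμα.</p>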
<hr />
<p><em>originally published on jtauber.com</em></p>