Yes, morphology doesn't match phonology perfectly. καινος and κενος have a Levenshtein distance of 2 but are phonologically identical (in imperial/modern pronunciation); ἑνος, in turn, forms a 'minimal pair' with both of them phonologically but is 3 edits from καινος. To get the result Jonathan originally set out for, one would first have to convert the words into a phonetic representation (e.g. IPA). Should accents be ignored, or represented and allowed for in some way?
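To make the edit counts above concrete, here is a minimal sketch of a plain Levenshtein distance over Unicode code points. It is a textbook dynamic-programming version written for illustration, not the code from the notebook or from any particular library; the function name `levenshtein` is my own.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance over code points (insert, delete, substitute)."""
    # Normalise to NFC so that a precomposed letter such as ἑ counts as one symbol.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or match)
        previous = current
    return previous[-1]


print(levenshtein("καινος", "κενος"))  # 2
print(levenshtein("ἑνος", "καινος"))   # 3
```

Both numbers are purely orthographic: the function knows nothing about which spellings sound alike.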

Furthermore, if the goal is learning to write, showing the different ways to write the same sounds is another task not covered by the 'pairs of words that differ by a single sound.'

This has been a stimulating topic for me. Now I'm looking into how the Python Levenshtein module handles diacritics.
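One thing worth checking on the diacritics question is the Unicode normalisation form of the input. The sketch below uses only the standard library `unicodedata` module and makes no claim about what the Levenshtein module does internally; it just shows that the same Greek letter can be one code point or several, so any distance counted over code points will depend on the form.

```python
import unicodedata

# U+1FA7 is ᾧ: omega with rough breathing, circumflex and iota subscript.
word = "\u1fa7"

nfc = unicodedata.normalize("NFC", word)  # precomposed form
nfd = unicodedata.normalize("NFD", word)  # base letter + combining marks

print(len(nfc))  # 1
print(len(nfd))  # 4
print([unicodedata.name(c) for c in nfd])
```

In the decomposed form each breathing or accent is a separate symbol, so a word pair differing only in diacritics could come out further apart than in the precomposed form.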

A question arose for me: Are there any minimal pairs (in terms of Greek phonology) which have an edit distance of two or greater? Presumably a diphthong swapped with a single vowel would fit into this category (one edit and one additional = edit distance of 2). I feel certain that English would have many of these, but I'm not sure if Greek would.

You can run the notebook and modify it so that it picks the pairs with distance exactly 2, or at most two, if you wish.
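For anyone who wants to try that outside the notebook, a rough sketch of such a filter follows. It is not the notebook's code: `distance`, `pairs_at_distance` and the parameters `k` and `exact` are names I made up, and the all-pairs loop is only practical for small word lists.

```python
from itertools import combinations


def distance(a: str, b: str) -> int:
    """Plain Levenshtein distance over code points."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        new_row = [i]
        for j, cb in enumerate(b, start=1):
            new_row.append(min(row[j] + 1,                # deletion
                               new_row[j - 1] + 1,        # insertion
                               row[j - 1] + (ca != cb)))  # substitution
        row = new_row
    return row[-1]


def pairs_at_distance(words, k=2, exact=True):
    """Return word pairs whose distance is exactly k (or at most k)."""
    found = []
    for a, b in combinations(sorted(set(words)), 2):
        # The length difference is a lower bound on the distance, so skip early.
        if abs(len(a) - len(b)) > k:
            continue
        d = distance(a, b)
        keep = (d == k) if exact else (d <= k)
        if keep:
            found.append((a, b))
    return found


words = ["καινος", "κενος", "ενος"]
print(pairs_at_distance(words, k=2, exact=True))
print(pairs_at_distance(words, k=2, exact=False))
```

With `exact=True` only the καινος/κενος pair is returned; with `exact=False` the distance-1 pair ενος/κενος is included as well.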

The first pair differs by more than one sound ('bara'/'bore'/'bora' vs. 'bo' -- I think it's still a consistent list, just not phonological). The second pair don't seem to be real, inflected words at all, and the real words they correspond to differ by zero.

You could `s/[άέό]ω/ω/` and add all the inflected forms of a single verb (perhaps λύω!) to get more results. Or perhaps skipping the lemmatization step would give better results, with real words as they appear in the texts?
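In Python the same substitution could look roughly like this. It is a sketch only: the helper name `contract` is hypothetical, and I normalise to NFC on the assumption that the data may mix the tonos and oxia encodings of the accented vowels.

```python
import re
import unicodedata

# NFC so that tonos and oxia variants of ά, έ, ό compare equal.
CONTRACT = re.compile(unicodedata.normalize("NFC", "[άέό]ω"))


def contract(lemma: str) -> str:
    """Replace an accented α/ε/ο followed by ω with plain ω (first match only)."""
    return CONTRACT.sub("ω", unicodedata.normalize("NFC", lemma), count=1)


print(contract("ποιέω"))  # ποιω
print(contract("λύω"))    # unchanged: ύ is not in the character class
```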

I moved the minimal pairs from the BHSA repo to a new one: lingo.
I generated sets for Greek and Hebrew lexemes and occurrences (4 sets).
The Hebrew occurrence set is done with the words in a semi-phonetic representation that I produced a while ago.

More than a billion comparisons have been made; this is what it took:
1133622067 comparisons of 74889 items resulting in 59381 minimal pairs during 11m 49s on my MacBook Pro.

Here is the code.
The links to the results are in that notebook, but you can also go to their parent directory.

I have applied refinements (stripping punctuation, stress marks) and I have streamlined the code.
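For readers who want to reproduce that kind of clean-up, one possible approach is sketched below. It is an assumption on my part, not the code in the notebook: it decomposes the text and drops every combining mark, which removes breathings and iota subscripts along with the accents, and the name `strip_marks` is made up.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks and punctuation, keeping the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        c for c in decomposed
        if unicodedata.category(c) != "Mn"               # accents, breathings, etc.
        and not unicodedata.category(c).startswith("P")  # punctuation
    ]
    return unicodedata.normalize("NFC", "".join(kept))


print(strip_marks("λόγος,"))  # λογος
```

Hebrew vowel points are combining marks too, so pointed Hebrew text would need a narrower filter than this one.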

This list is hugely useful. I was just starting to put together my own list for teaching students pronunciation by the old-fashioned method of looking through the lexicon. This is so much faster.

One limitation, though, is that not all of the pronounceable consonant combinations are represented in this list. The following consonant combinations are lacking: βδ, γδ (though this is only given as a poetic form, so easy to ignore for Koine purposes), γλ, γξ, δμ, δν, θλ, θν, σβ, σθ, σφ, τμ, χθ, χλ, χν

Most of these are fairly rare and there may not be a way to represent all of them with strictly minimal pairs, but a full representation of the Greek pronunciation system certainly requires them as well.
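A coverage check like the one described above can be automated. The sketch below assumes an accent-free word list; the three sample words are only there for illustration and are not taken from the generated sets, and `missing_clusters` is a made-up name.

```python
# The consonant combinations listed in the post above.
CLUSTERS = ["βδ", "γδ", "γλ", "γξ", "δμ", "δν", "θλ", "θν",
            "σβ", "σθ", "σφ", "τμ", "χθ", "χλ", "χν"]


def missing_clusters(words, clusters=CLUSTERS):
    """Return the clusters that occur in none of the given words."""
    return [c for c in clusters if not any(c in w for w in words)]


sample = ["χθες", "γλωσσα", "λογος"]  # illustrative words only
print(missing_clusters(sample))
```

Running it against the words that actually appear in the minimal-pair output would show which combinations still need a hand-picked example.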

I am not sure that the above will yield phonological pairs rather than merely graphological ones.

On Greek, do you have ει set to ι? If not, why not?
Do you have αι set to ε? If not, why not?
Do you have ω set to ο? If not, why not?
Do you have οι set to υ? If not, why not?
In English, do you pronounce "knight" as [k-nix-t]? If not, why not?
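To show what those equivalences would do to the comparison, here is a small sketch that rewrites spellings before comparing words. It covers only the four mappings asked about above and assumes accent-free input; it is nowhere near a full transcription of Koine pronunciation, and the names `RULES`, `to_sounds` and `homophones` are hypothetical.

```python
from collections import defaultdict

# Only the four equivalences from the questions above; digraphs come first.
RULES = [("ει", "ι"), ("αι", "ε"), ("οι", "υ"), ("ω", "ο")]


def to_sounds(word: str) -> str:
    """Rewrite a spelling into a rough sound-based key."""
    for spelling, sound in RULES:
        word = word.replace(spelling, sound)
    return word


def homophones(words):
    """Group words that collapse to the same sound-based key."""
    groups = defaultdict(list)
    for w in words:
        groups[to_sounds(w)].append(w)
    return [group for group in groups.values() if len(group) > 1]


print(to_sounds("καινος"))                       # κενος
print(homophones(["καινος", "κενος", "ενος"]))   # [['καινος', 'κενος']]
```

Minimal pairs would then be computed on the keys rather than on the spellings, so καινος and κενος count as the same item instead of as a pair two edits apart.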

On Hebrew there are several ways to proceed, but why not just use an oriental Hebrew pronunciation (which includes actual ʕayin and ħet), recognised around the world? ʕattah 'now' versus 'attah 'you, s.'
One advantage of fluency in a modern dialect is that fluency in modern Hebrew activates the morphology in the student and transfers to the biblical dialect 100%. No other language in the world does that.

* Moderators : I made a mistake and didn't put my real name as my username. Sorry for this. Could you change to "João Luís"? *

Mr. (and teacher) Buth, I didn't quite follow. If I learn modern Hebrew, will it be useful for my studies in biblical Hebrew? "No other language in the world does that." Does that mean that learning modern Greek is not as useful for Koine Greek?

The point was severalfold:
For Hebrew the modern oriental pronunciation is recommended.
Yes, modern Hebrew helps with biblical fluency and 100% of modern morphology transfers to biblical morphology. The fact that no other language does this is anecdotal. When a person has internalized "I saw" and "I will see" in modern Hebrew, those exact forms will be used in biblical Hebrew.

As for Greek, the question is more complicated because of the morphological changes and general distance. I am in favor of someone also learning/speaking modern Greek, but it will not help a person directly in the same manner as Hebrew.

Points made by my beginning Hebrew professor (Al Groves) in seminary, except he was starting with biblical Hebrew and let us know that it was a good starting point for modern. Just to clarify, by "modern oriental pronunciation" you mean Sephardic (Israeli)?

Partially. Sort of.

Most Europeans and North Americans use a 5-vowel "Sefardi" pronunciation, but without ʕayin and ħet. The true oriental pronunciation includes those two consonants, a voiced pharyngeal fricative and a voiceless pharyngeal fricative, for those wanting the description. [PS: no, the voiceless pharyngeal fricative is not like German Bach, and is not a velar fricative or uvular fricative. One may also call ʕayin and ħet a voiced retracted-tongue-root constriction and a voiceless retracted-tongue-root constriction.]