Create a random word generator for a new language?

Posted 18 January 2012 - 04:59 PM

So, I'm creating a new language. I have all the characters written out, I just need to start forming the words. I need a program that will create random words, using English letters, but they can't be too random. For example, I don't use the letter c, if two vowels are together they have to be ii, ey, ei, aa, uu, or oo, and the words can't be too long. I also want some letters to pop up more frequently than others (optional). I don't want any repeating words either. Eventually, I would also like the program to match the new words with English words (that I will provide). In the end, I imagine a large list of 1000 words for my new language and their English translations. I hope this isn't too demanding, but I don't know where to start and I don't feel like making up 1000 words on my own, nor do I have the time. If someone could give me some tips on how I can start, or if someone's willing to start it for me, that'd be great. Thanks!

Replies To: Create a random word generator for a new language?

Re: Create a random word generator for a new language?

Posted 18 January 2012 - 06:03 PM

I'd take those rules and generate the words combinatorially: start with the first "part" of the word and select from your available options (if there's no restriction on the first letter, just grab one randomly). From there, move on, making each random selection with your restrictions in mind.
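
A rough sketch of that letter-by-letter approach in JavaScript might look like the following. The alphabet (English minus "c"), the choice to treat "y" as a vowel, and all function names are assumptions made for illustration; only the list of allowed vowel pairs comes from the original post.

```javascript
// Assumed alphabet: English letters without "c", as the original post requests.
const LETTERS = "abdefghijklmnopqrstuvwxyz".split("");
// Assumption: "y" counts as a vowel, since "ey" is listed as a vowel pair.
const VOWELS = new Set(["a", "e", "i", "o", "u", "y"]);
// The only vowel pairs the original post allows.
const ALLOWED_VOWEL_PAIRS = new Set(["ii", "ey", "ei", "aa", "uu", "oo"]);

// True if `next` may directly follow `prev` under the vowel-pair rule.
function canFollow(prev, next) {
  if (prev === undefined) return true; // first letter: no restriction
  if (VOWELS.has(prev) && VOWELS.has(next)) {
    return ALLOWED_VOWEL_PAIRS.has(prev + next);
  }
  return true; // any pair involving a consonant is allowed in this sketch
}

// Pick one element uniformly at random.
function pick(items, rng = Math.random) {
  return items[Math.floor(rng() * items.length)];
}

// Build a word one letter at a time, filtering the options at each step.
function randomWord(length, rng = Math.random) {
  const word = [];
  while (word.length < length) {
    const prev = word[word.length - 1];
    const options = LETTERS.filter((ch) => canFollow(prev, ch));
    word.push(pick(options, rng));
  }
  return word.join("");
}

console.log(randomWord(5));
```

This only enforces the stated letter rules; it does nothing to keep consonant runs pronounceable.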

Re: Create a random word generator for a new language?

Posted 18 January 2012 - 07:12 PM

I would have two sequences: one filled with vowel phonemes, the other with both consonant and vowel phonemes (you can repeat certain phonemes if you want them to show up more often). First you pick a random vowel phoneme. Then you pick a random phoneme from the second sequence and add it to the front or the back (randomly decide between the two, or add none at all), then possibly add another to the other side. This ensures that each syllable has a vowel sound and that some have consonants; this *should* give more pronounceable syllables.

Randomly generate the number of syllables you want for a word, then generate that many random syllables. The effect *should* be a more or less pronounceable word.
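
In JavaScript, that procedure could be sketched roughly like this. The phoneme lists, the 50% chance of filling the second side, and the function names are all made up for illustration. As a simplification, the second list here holds only consonants; vowel phonemes could be added to it as described above.

```javascript
// Hypothetical phoneme lists. Repeating an entry makes it come up more often.
const NUCLEI = ["a", "a", "e", "e", "i", "o", "u"];
const MARGINS = ["t", "t", "n", "n", "s", "k", "l", "r", "m"];

// Pick one element uniformly at random (duplicates act as weights).
function choose(items, rng = Math.random) {
  return items[Math.floor(rng() * items.length)];
}

// One syllable: a vowel, optionally with a phoneme on one or both sides.
function randomSyllable(rng = Math.random) {
  let syllable = choose(NUCLEI, rng);
  // 0 = add nothing, 1 = add to the front, 2 = add to the back.
  const side = Math.floor(rng() * 3);
  if (side === 1) syllable = choose(MARGINS, rng) + syllable;
  if (side === 2) syllable = syllable + choose(MARGINS, rng);
  // If one side was filled, possibly fill the other side as well.
  if (side !== 0 && rng() < 0.5) {
    syllable = side === 1
      ? syllable + choose(MARGINS, rng)
      : choose(MARGINS, rng) + syllable;
  }
  return syllable;
}

// A word is a random number of syllables, from 1 up to maxSyllables.
function randomWordFromSyllables(maxSyllables = 3, rng = Math.random) {
  const count = 1 + Math.floor(rng() * maxSyllables);
  let word = "";
  for (let i = 0; i < count; i++) word += randomSyllable(rng);
  return word;
}

console.log(randomWordFromSyllables());
```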

I found this which has a complete list of phonemes and huge list of syllables.

edit:
also, have you thought about composing your words from syllables instead? that way every single word would be pronounceable for sure.

Re: Create a random word generator for a new language?

Posted 18 January 2012 - 08:27 PM

ishkabible, on 18 January 2012 - 07:12 PM, said:

also, have you thought about composing your words from syllables instead? that way every single word would be pronounceable for sure.

Yeah, actually that's how I made the first few words of my language, so that sounds like a great idea. What language do you recommend using?

Re: Create a random word generator for a new language?

Posted 19 January 2012 - 08:58 PM

With JavaScript, are there any shortcuts or easy ways of randomizing the results? I'm also having trouble figuring out how I can place the letters where they would fit best. Sorry, I don't have a ton of experience with JavaScript. An example would be great, and I should be able to figure out the rest from that.

And I need to know how to make sure none of the syllables (and eventually words) repeat themselves throughout the final product.
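
One common way to handle both points in JavaScript is `Math.random()` for the random choices and a `Set` for rejecting repeats. The sketch below assumes a generator function is passed in; `toyWord` is only a placeholder, and the attempt limit is an arbitrary safeguard.

```javascript
// Collect `count` distinct results from `makeWord` (any function returning a string).
function collectUnique(makeWord, count, maxAttempts = count * 100) {
  const seen = new Set();
  for (let attempts = 0; seen.size < count; attempts++) {
    if (attempts >= maxAttempts) {
      throw new Error("Generator cannot produce enough distinct words");
    }
    seen.add(makeWord()); // a Set silently ignores values it already holds
  }
  return [...seen];
}

// Placeholder generator, for illustration only: three letters from a tiny alphabet.
function toyWord(rng = Math.random) {
  const letters = "aeiklmnt";
  let word = "";
  for (let i = 0; i < 3; i++) {
    word += letters[Math.floor(rng() * letters.length)];
  }
  return word;
}

const lexicon = collectUnique(() => toyWord(), 50);
console.log(lexicon.length); // 50 distinct words
```

The attempt limit matters because a generator with too few possible outputs would otherwise loop forever.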

Re: Create a random word generator for a new language?

Posted 20 January 2012 - 10:19 AM

This is not such a trivial problem, actually, for any phonology worth discussing. The random-concatenation procedure will probably come up with some legitimate words, but they will not reflect any particular phonology, and most of them will likely be unpronounceable.

I think you'll have to write a little "grammar" to do this, but it's going to be more difficult than you think, assuming you want to generate all and only the legitimate sequences in the language under a certain length. (where the length will actually be syllables, not characters)
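
To give a feel for what such a "grammar" could look like, here is a deliberately tiny JavaScript sketch. The rule is Word = one or more Syllables, Syllable = optional onset + nucleus + optional coda. The inventories are invented toy data, not a real phonology, and the function names are hypothetical.

```javascript
// Toy grammar (assumed data): the empty string stands for "no onset" / "no coda".
const GRAMMAR = {
  onsets: ["", "t", "k", "st"],
  nuclei: ["a", "i"],
  codas: ["", "n"],
};

// Every syllable the grammar licenses.
function allSyllables(g) {
  const out = [];
  for (const onset of g.onsets) {
    for (const nucleus of g.nuclei) {
      for (const coda of g.codas) out.push(onset + nucleus + coda);
    }
  }
  return out;
}

// All and only the words of 1..maxSyllables syllables.
function allWords(g, maxSyllables) {
  const syllables = allSyllables(g);
  const words = new Set();
  let current = [""]; // words built so far with n - 1 syllables
  for (let n = 1; n <= maxSyllables; n++) {
    const next = [];
    for (const prefix of current) {
      for (const s of syllables) next.push(prefix + s);
    }
    next.forEach((w) => words.add(w));
    current = next;
  }
  return [...words];
}

console.log(allWords(GRAMMAR, 2).length); // 272 for this toy grammar
```

Even this toy shows how quickly the word count grows: 16 syllables already give 272 words of up to two syllables.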

In English, you'd have to think about things in roughly this order:

A syllable can be as simple as V (the "i" in hi·lar·i·ous) and as complex as CCCVCCCCC (strengths, counting letters). A complex syllable is not built by simple concatenation: *strenthgs would not be a legitimate word in any language I can imagine.

Clearly you don't want to do the combinatorics on this, and random generation will produce a lot of useless dross.

We notice without too much more thought that some symbols combine readily where others do not, and that the big difficulties will be presented by the consonants. 'q' appears almost universally with a following 'u', 't' is often followed by 'h', 'b' is sometimes followed by 'l', and 'z' cannot appear between two 'k's. So we'd like to break down consonants into certain types. Immediately we have a snag in English, because different sounds can be represented by the same symbol. (It's less of a problem that different symbols can represent the same sound.) I hope your alphabet is more phonetic than English! (Take a look at Finnish for this: they keep things nice and simple at that level. The syntax is a bear, though.)
Although they are simpler, there are of course a few issues with vowels - do you count "ou" in "hilarious" as one V or two? Again, a more phonetic alphabet would save us a lot of headaches, but I'll suggest just eliminating complex vowel combinations instead.
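
Breaking consonants into types could be sketched in JavaScript along these lines. The class assignments and the list of permitted onset patterns are assumptions invented for the example (loosely English-like), and they presume a one-sound-per-symbol alphabet.

```javascript
// Assumed consonant classes for a one-sound-per-symbol alphabet.
const CLASS_OF = {
  p: "stop", t: "stop", k: "stop", b: "stop", d: "stop", g: "stop",
  s: "fricative", f: "fricative",
  l: "liquid", r: "liquid",
  m: "nasal", n: "nasal",
};

// Assumed rule set: class sequences that may begin a syllable.
const ONSET_PATTERNS = new Set([
  "stop",
  "fricative",
  "liquid",
  "nasal",
  "stop liquid",
  "fricative stop",
  "fricative liquid",
  "fricative stop liquid",
]);

// True if the consonant cluster matches one of the permitted class patterns.
function isLegalOnset(cluster) {
  const classes = [...cluster].map((ch) => CLASS_OF[ch]);
  if (classes.includes(undefined)) return false; // symbol with no class
  return ONSET_PATTERNS.has(classes.join(" "));
}

console.log(isLegalOnset("str")); // true
console.log(isLegalOnset("lb"));  // false
```

Working with classes keeps the rule list short: one pattern such as "stop liquid" covers bl, br, tr, kl and so on.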

So as it turns out, the English orthography presents a lot of problems, but that's not all of the problems. If we simplify the orthography to something more reasonable, we still have to deal with the fact that a consonant cluster can be as complex as [r][n][t][s]. This is a possible ending for a syllable, since we could have a word ending in it. ("How many burgers did they need?" "One rare, two medium, and three burnts.") However, if it's in medial position, you'll find that the syllable breaks at the t-s boundary. (Nonce forms like "torntsistic" naturally break tornt-sistic and not *tornts-istic.) This means your generator will have to be aware of where in the word you are. The sequence -ntsp- cannot appear in a word-final position, but it's fine when the syllable break splits it into -Vnt-spV-.
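
A generator can be made position-aware by keeping separate coda lists for medial and word-final syllables. The JavaScript below is only a sketch: the inventories are invented, loosely echoing the -nt/-nts example, and the names are hypothetical.

```javascript
// Assumed inventories; "" means the slot is left empty.
const ONSET_OPTIONS = ["", "s", "t", "sp", "st"];
const NUCLEUS_OPTIONS = ["a", "o", "i"];
const MEDIAL_CODAS = ["", "n", "nt"];       // allowed before another syllable
const FINAL_CODAS = ["", "n", "nt", "nts"]; // allowed only at the end of the word

// Pick one element uniformly at random.
function draw(items, rng = Math.random) {
  return items[Math.floor(rng() * items.length)];
}

// Returns the word as a list of syllables so the breaks stay visible.
function buildWord(syllableCount, rng = Math.random) {
  const parts = [];
  for (let i = 0; i < syllableCount; i++) {
    const isLast = i === syllableCount - 1;
    const codas = isLast ? FINAL_CODAS : MEDIAL_CODAS;
    parts.push(
      draw(ONSET_OPTIONS, rng) + draw(NUCLEUS_OPTIONS, rng) + draw(codas, rng)
    );
  }
  return parts;
}

console.log(buildWord(3).join("-"));
```

With these lists, a medial "nts" can only arise as coda "nt" followed by onset "s", which matches the tornt-sistic style of break.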

By now we're starting to get a little nervous, so we look at the literature, and we find that syllable structure of English has been the subject of intense study for the last fifty years or so, and before, so we're right to be nervous. This is actually hard stuff.

So, maybe just randomly generating characters (and manually pruning out the impossible ones) is the easiest way, after all - but that's a boring answer! You can probably solve this problem for your language if you do a few things to make your life easier. First, make sure your alphabet is a one-to-one mapping of sounds to symbols. Second, disallow complex vowel combinations. (you can make rules for strong and weak vowels based on meter later, those won't affect this problem, I think) Third, start out with monosyllabic words. If you want polysyllables, that's a lot more difficult.

Lastly, if you haven't done any work with formal grammars, it might be good to take a look at them before you dive into this.

Re: Create a random word generator for a new language?

Posted 20 January 2012 - 12:28 PM

You might find my letter combinations snippets in Haskell or Python useful for this. They currently generate random words based on the English alphabet, but you could easily modify them to include phonemes and your own alphabet.

Re: Create a random word generator for a new language?

Posted 21 January 2012 - 11:50 PM

Never mind. I made a list of all possible consonant clusters that can appear before and after the vowel sound in each syllable, because every syllable has exactly one vowel sound in it, with consonants before and/or after it (or with none at all). After that, my plan is to take all possible consonant clusters before the vowel (CBV), consonant clusters after the vowel (CAV), and vowel sounds (V), and combine them using the patterns "CBV + V + CAV", "CBV + V", and "V + CAV". Sound like a good plan? Also, I still need to know how to make sure my different syllables won't repeat throughout the final document. Thanks again! You guys have been a great help.
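
One possible way to get syllables that never repeat under this plan is to build every combination once and then shuffle the list, rather than drawing at random and checking for duplicates. The JavaScript below is a sketch: the three short lists are placeholders for the real CBV, V, and CAV lists, and the function names are invented.

```javascript
// Placeholder inventories; the real lists would go here.
const CBV = ["t", "k", "st"];
const V = ["a", "ei", "oo"];
const CAV = ["n", "l"];

// Every syllable from the three patterns, collected in a Set to drop duplicates.
function allTemplateSyllables(cbv, v, cav) {
  const out = new Set();
  for (const vowel of v) {
    for (const before of cbv) {
      out.add(before + vowel); // CBV + V
      for (const after of cav) {
        out.add(before + vowel + after); // CBV + V + CAV
      }
    }
    for (const after of cav) {
      out.add(vowel + after); // V + CAV
    }
  }
  return [...out];
}

// Fisher-Yates shuffle: returns a copy in random order, each item exactly once.
function shuffled(items, rng = Math.random) {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

const syllables = shuffled(allTemplateSyllables(CBV, V, CAV));
console.log(syllables.length); // 33 for these placeholder lists
```

Taking syllables from the front of the shuffled list gives random picks with no repeats. A bare V pattern could be added the same way if vowel-only syllables are wanted.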