Have you ever dreamed of computers being taught to write a book, compose music, or create a painting? Consider a situation where a computer can read all the works of Edgar Allan Poe in a few moments, build a model of his style, thus resurrecting him, and start producing more poems on behalf of the author. For a human, the time needed to perform the same operation is practically infinite. Why not resuscitate, by the same means, Bach or Vivaldi to write another masterpiece like "The Four Seasons"? Music consists of just seven note names, and a computer can, by the same means, analyze the whole body of Bach's works and how the notes and pauses correspond to each other, to say nothing of simple drum and bass, which are much easier to replicate.

The bit of news in a computer magazine that encouraged me to write this article and code was about "The Silver Eggheads" by Fritz Leiber, in which the fiction genre is ruled by megalithic publishing houses whose word mills churn out stories by the pound. It also refers to P. M. Parker and his Icon Group International, which produces tons of articles in different genres. So far, searching Google for "wordmills" produces only about 60 matches, generally pointing to the F. Leiber work.

My first thought was to use statistics for the generation of texts, which I have tried to implement here. The same concept can, by all means, be applied to the statistical analysis of music, using notes and the pauses between them instead of words and punctuation marks (MIDI files are good for that purpose), or of paintings, analyzing image features (color and texture) and their correspondence to each other in the raw image data.

E.g., given the text "I read books. I read papers. I like cats, dogs and birds." (a period is treated as a word of its own), the words read and like follow I, cats, dogs, and birds follow like, etc.

Now, count how many times each [next word] follows a [previous word]. E.g., take the word I and count how many times read and like happen to follow it; there are two occurrences of read and one for like.

I
read 2
like 1

Now, divide 2 and 1 by the total number of words found following the word I (sum = 2 + 1 = 3). Thus, we obtain the probability that read might follow I, 2 times out of 3 (2/3 = 0.666666), and that like might follow I, 1 time out of 3 (1/3 = 0.333333). That is, the conditional probability of seeing the word read if the previous word is I is P(read|I) = 0.666666, and the conditional probability of seeing the word like if the previous word is I is P(like|I) = 0.333333.

I
read 0.666666
like 0.333333

Doing the same for each of the 10 distinct words found, we get the complete first-order table.
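As a minimal sketch of this counting-and-normalizing step (a standalone illustration, not the project's actual classes, using the toy text above), consider:

using System;
using System.Collections.Generic;
using System.Linq;

class FirstOrderDemo
{
    static void Main()
    {
        // Toy corpus from the walkthrough above; '.' is treated as a word of its own.
        string[] words = { "I", "read", "books", ".",
                           "I", "read", "papers", ".",
                           "I", "like", "cats", "dogs", "and", "birds", "." };

        // Count how many times each [next word] follows each [previous word].
        var counts = new Dictionary<string, Dictionary<string, int>>();
        for (int i = 0; i + 1 < words.Length; i++)
        {
            if (!counts.ContainsKey(words[i]))
                counts[words[i]] = new Dictionary<string, int>();
            var next = counts[words[i]];
            next[words[i + 1]] = next.TryGetValue(words[i + 1], out int c) ? c + 1 : 1;
        }

        // Normalize the counts into conditional probabilities P(next|prev).
        foreach (var prev in counts)
        {
            double total = prev.Value.Values.Sum();
            Console.WriteLine(prev.Key);
            foreach (var nw in prev.Value)
                Console.WriteLine("  {0} {1:F6}", nw.Key, nw.Value / total);
        }
    }
}

Running it prints, for each previous word, the list of next words with their conditional probabilities, matching the small tables shown above.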

This is the so-called first-order model, where you keep a history of one previous word. The second-, third-, and higher-order models keep a history of 2, 3, or more previous words, so you have the probability of the next word given a combination of two or more preceding words. Those models provide better results in terms of simulating the text; however, you need to keep track of combinations of words. E.g., I read may be followed by books or papers with equal probability:

I read
books 0.5
papers 0.5
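One simple way to extend the sketch above to higher orders (an implementation assumption on my part, not code from the project) is to use the concatenation of the previous N words as the dictionary key:

using System.Collections.Generic;
using System.Linq;

static class NGramKey
{
    // For an N-th order model, the history of the word at position pos is
    // the N words preceding it; joining them with a space yields keys
    // such as "I read" for the second-order model.
    public static string MakeKey(List<string> words, int pos, int order)
    {
        return string.Join(" ", words.Skip(pos - order).Take(order));
    }
}

With such keys, the same counting and normalization as in the first-order case applies unchanged; only the dictionary grows much faster, since it now stores combinations of words.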

You may conduct an experiment with English letters instead of words: you will see that the second- and third-order models produce more intelligible word output than the first-order model.

C# and its garbage collector are great for fast coding. For a quick start, click the Learn button and select one or several raw text files. The GUI will start processing the text and gathering the P(NextWord|PrevWord) statistics. You can Stop the process if it takes too long. From the File menu, you can save the extracted dictionary and the extracted text. Click the Simulate button to create any number of sentences, each consisting of at most a maximal number of words. The process simply starts with a random word beginning with a capital letter; then, based on the conditional probability P(NextWord|PrevWord), it selects the next word, and this repeats until a '.' is met, finishing the sentence. Then P(NextWord|'.') is evaluated to choose the next word, and so on.
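The generation loop can be sketched as follows (a minimal, self-contained illustration of the random walk just described; the class and method names are mine, not the project's):

using System;
using System.Collections.Generic;
using System.Text;

static class SimulatorSketch
{
    static readonly Random Rand = new Random();

    // Roulette-wheel draw: pick the next word according to P(NextWord|PrevWord).
    // nextProbs maps each candidate next word to its probability (values sum to 1).
    static string PickNext(Dictionary<string, double> nextProbs)
    {
        double r = Rand.NextDouble(), cumulative = 0.0;
        string last = null;
        foreach (var pair in nextProbs)
        {
            last = pair.Key;
            cumulative += pair.Value;
            if (r < cumulative)
                return pair.Key;
        }
        return last; // fallback for floating-point rounding in the running sum
    }

    // Generate one sentence: start from a seed word and follow the chain
    // until a '.' is drawn or the word limit is reached.
    public static string GenerateSentence(
        Dictionary<string, Dictionary<string, double>> model,
        string seed, int maxWords)
    {
        var sentence = new StringBuilder(seed);
        string prev = seed;
        for (int i = 1; i < maxWords && model.ContainsKey(prev); i++)
        {
            string next = PickNext(model[prev]);
            sentence.Append(next == "." ? "." : " " + next);
            if (next == ".")
                break;
            prev = next;
        }
        return sentence.ToString();
    }
}

A draw from P(NextWord|'.'), as described above, would then supply the seed for the following sentence.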

There are two classes in the project:

WordExtractor

TextStatistics

WordExtractor is used to process text files, extract a dictionary, and organize the text as a sequence of words for the calculation of conditional probabilities. Its public properties and methods are mostly self-explanatory.
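As a rough sketch of what this extraction step does (the regular expression and the class shape here are illustrative assumptions, not the actual WordExtractor code):

using System.Collections.Generic;
using System.Text.RegularExpressions;

// A minimal stand-in for the extraction step: split raw text into a
// sequence of words, treating sentence-ending punctuation as words of
// their own so that P(NextWord|'.') can be computed later.
static class TokenizerSketch
{
    public static List<string> Extract(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"[A-Za-z']+|[.!?]"))
            words.Add(m.Value);
        return words;
    }
}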

Once you have processed all the files with the WordExtractor.Extract() method, you may pass the object to the TextStatistics.Calculate() function. It will process the extracted text and calculate the conditional probabilities into a Dictionary<String, List<WordFreqPair>>. The Key is the PrevWord, and the Value is a list of WordFreqPair objects, each exposing the next word Word and its probability Prob.
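The shape of that structure, and a possible counts-to-probabilities conversion, might look roughly like this (only the Word and Prob members and the dictionary type are taken from the article; the rest is an assumed sketch):

using System.Collections.Generic;
using System.Linq;

public class WordFreqPair
{
    public string Word { get; set; } // the NextWord
    public double Prob { get; set; } // P(NextWord|PrevWord)
}

public static class TextStatisticsSketch
{
    // Convert raw follow-counts into the structure described above:
    // Key = PrevWord, Value = list of (NextWord, probability) pairs.
    public static Dictionary<string, List<WordFreqPair>> Calculate(
        Dictionary<string, Dictionary<string, int>> counts)
    {
        var result = new Dictionary<string, List<WordFreqPair>>();
        foreach (var prev in counts)
        {
            double total = prev.Value.Values.Sum();
            result[prev.Key] = prev.Value
                .Select(n => new WordFreqPair { Word = n.Key, Prob = n.Value / total })
                .ToList();
        }
        return result;
    }
}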

Now for some examples of text written by a computer in different areas. I used e-texts provided by Project Gutenberg, a very nice collection. Though the first-order model is expected to give worse results, as was shown for the English-letters simulation, it nevertheless produces some nice pert and pungent sentences. The second-, third-, and higher-order models are expected to provide far better, nearly author-like results. You may also try to mix authors and get Poe-Darwin results.

Fertility of Great Britain, extremely few species might travel still favourable variations.

I must either useful to meet with the whole classes of the naked Turkish dog.

If green woodpeckers which on the cause of sterile females do not occur in order to make for long as specifically different great geographical changes of all the good, and at hybrids may have been retrograde; and those species of character to its own food might become widely extended the number of profitable.

The limbs, though in the same time to diverge in groups subordinate importance of 1854-5 destroyed, according to render their modified.

The distinction between the incapacity of a glance appear to spring, stamens, are of Bohemia and which are upland goose.

Wollaston suspects, with a conclusion in the incipient species are differently coloured in the nature.

We have not while two divisions of the first crosses with hair-claspers of our presumption is scarcely a limited sense be given.

Just as to adjust the other means of reference the places A and the first place.

This difficulty prevailing here.

But calculation indicate position to the less exact.

We express the following statements in motion of its centre as Euclidean continuum has spared himself with respect to the diagram this time rate permanently slower than on himself with rule and it has the one of their universe only to a change in their measurements.

This theorem of these two masses constituting the train.

This file is at rest relatively to him simultaneously or of the Lorentz transformation for the reference-body, translation because if we can now to formulate the velocity v and tables.

The earth by Adams in a solution of the theory of times greater distance I proceed outwards from this connection with the equation x1, the space co-ordinates.

Who would certainly revealed by the direction indicated in general law-of-field of experience can be obtained by the comparison of values of plane.

Lorentz transformation In point: An observer perceives the clock at least.

48: Events which we see the theory of course depend either the straight lines of the most elegantly confirmed by the most elegantly confirmed for measuring-rods.

Comments and Discussions

That will not take long, depending on the country where you're submitting the thesis.
One Russian 'scientific' journal of 'mediocre' quality, which appeared on the official list of journals accepted for publication toward a PhD degree, accepted and published a paper that some students had generated by PC; after it passed the referees and was published, the journal was finally eliminated from that list.

Better detection with pulldown content over.
Revolution, designed to mpga now.
Better detection changed several errors.
This General Public License which is permitted to view a work under the executable work are designed to these things.
You must be guided by James Hacker.
You are not derived from the recipient automatically.

... and other nice laughs I got from your application after "learning" the readme and copyright files of VLC. Thank you!

By all means. Take assembly instructions as words, as used to generate rubbish code in VX developer domains; you would be able to develop some great permutation engine, the poly virus that will never get detected.
Or, for a more benign purpose, use assembly instructions in genetic algorithms to solve some optimization tasks.

Obviously, there is no 'Markov' in the word 'wordmills' alone. If Markov methods had been applied, they would have been discovered by the 'wordmills' search.
It just points out that the term 'wordmills' is not very widespread.

No, I did not see that one before. I was talking from my past experience with a random word/name generator I programmed, which does exactly what your code does, except that I gathered the statistics for individual letters as opposed to whole words.
Provide a list > this populates the statistics.
Use the statistics to determine what state will come next > this is the Markov chain part.
But you are more than welcome to call it whatever you want; I was just letting you know that most people know of these things as Markov chains.

I spent 3-4 hours playing with this code and have added implementations of the 2nd- to 4th-order models. 2nd-order constructed sentences sound like bad Asian-English translations, 3rd-order ones sound good, and with the 4th order it is getting hard to tell whether a sentence is generated or original.

I'd been looking for something similar to this previously, and this article is a great base to work from. The author states that 1st-order implementations are going to give you "yoda style sentences" and that this is just an example of what can be done.

Here are a couple of sentences from A Midsummer Night's Dream with the 4th-order model:

Sound, music! Come, my queen, in silence sad, Trip we after the night's shade. We the globe can compass soon, Swifter than the wandering moon.

Pyramus Sweet Moon, I thank thee for thy sunny beams; I thank thee, Moon, for shining now so bright; For, by thy gracious, golden, glittering gleams, I trust to take of truest Thisby sight.

The more I hate, the more he hateth me.

Turns them to shapes and gives to airy nothing A local habitation and a name.

And then the whole quire hold their hips and laugh, And waxen in their mirth and neeze and swear A merrier hour was never wasted there.

What, a play toward!

Do not fret yourself too much in the action, mounsieur; and, good mounsieur, but to help Cavalery Cobweb to scratch.

Nor none, in my mind, now you give her o'er?

Upon the next live creature that it sees.

Following --her womb then rich with my young squire -- Would imitate, and sail upon the land, To fetch me trifles, and return again, As from a voyage, rich with merchandise.

O, in thought our black sea foresees the dolorous curtains,
which in power inspires the storm of agility,
weaker than a servant unable to survive the autumn.
O, the mind will reveal its deadly siege
as the eons of tears give the waves their vulnerable meditation.

It took a while to learn, and after a few tries it came up with this. It almost makes sense (but then again, Shakespeare doesn't make too much sense to start off with):

MERCUTIO Come, what o the medicine power to him coming.

VALERIA, you spirits, good lord?

Can heaven see bearing off, have been true.

Curbing his faults that you were.

TITUS ANDRONICUS Patience, I dare not made to vent clamour from me boy?

Banditti Enter AGRIPPA Second Witch He hath faults of darkness tell me excuse that it in your adversary?

MENENIUS I'll do extend These letters I had not this?

EUPHRONIUS DOMITIUS ENOBARBUS When Duncan is come, I writ and stand in giving liberty; For these outrages, if thou do beseech you!

is the plain.

BURGUNDY Royal Lear, With something doth, began to renounce, more like the rock not how she is yond tall as me tell thee, Irons of fortune, wherefore base Indian, Which fronted mine honour Demuring upon the people following and flies an example made worms are done, cold shortly as thin helm thy love?

This topic interests me a lot. I once made a program that generates music, not based on information about other works, but based on musical rules that I gave it. So, it is a random walk within a framework of musical rules. Here are some samples: http://www.randomusic.com/samples/

While it might be quite some time before a program can express new ideas to write poetry or literature, there could be some real applications based on this kind of statistics.
If you could somehow fingerprint the statistical descriptors of Rembrandt paintings, you could not only identify fakes, but also create new paintings based on existing images such as a digital photo.
The same could also be true for music, starting from nature noises.
But the spark of genius is to create those meaningful descriptors.