At 45', I had to stop for a moment, as there is such an amusing elementary mistake on the slides which the eminent Dr. Dr. uses... And, as 1/2 != 3/5, Dr. Dr. Dembski gets a wrong result - which is, as he says - "typical for these search-for-a-search situations". I couldn't agree more.

Is there such a thing as an open-source Javascript webapp and GUI framework with RAD visual designer (doing for Javascript what Delphi does for Object Pascal or Visual Studio does for C#)?

My sole foray into Javascript so far has been a simple DOM thing to implement Weasel and display results in an HTML form. Now there looks to be a plethora of libraries, but little of the information is oriented along the lines of my question.

I've played with Netbeans and Eclipse, but you might be interested in Processing (processing.org). Check it out.

Avida-ED is pretty close to going to a web application. Currently in alpha test, the new version presents the familiar user interface in your browser. This is made possible by applying the Emscripten compiler to turn the Avida core into a Javascript library. It should be generally available sometime in mid-June.

Diane's MacBook Pro that she does the UI development on had issues. When we were at the Apple Store in Tampa for repairs to it, we brought up Avida-ED on an iPad and an iPod there. Probably in Safari, which is not a good fit now because Safari refuses to allow saving blob data to local storage. Fixing that has been on the Safari developers' issue list for a long time now.

I am wondering if there is an analytical solution to the Weasel algorithm. I am thinking of a probability distribution of the number of trials necessary to achieve a target sequence, with the mean and variance depending on the following parameters:

1) length of the alphabet

2) length of the target sequence

3) mutation rate

4) population size

Have you seen a site where this is done? On evoinfo.org Dembski proposes a solution for what he calls "partitioned search". But I am looking for a solution for what he calls "proximity reward search", which apparently is the Weasel algorithm. To be clear, I am not interested in a numerical solution but in an analytical one, giving the mean and variance as functions of the above-mentioned parameters.

Further up the thread, I pointed out the difference between partitioned search and "weasel".

"Locking" or "latching" is the same as removing the term that allows for correct bases to mutate to incorrect ones. What remains is an expectation that the number of correct bases can only monotonically increase.

If you have the analytical form you like for partitioned search, then modify to add in the additional element I note for "weasel" and any other adjustments. I left off that project before fully working up the population component for probabilities.

I've had some issues with hosted images going stale. I need to look up some of my graphs in this thread and restore them.

And somewhere, sometime, I know I did a numerical scan of parameter space to show the likely range of parameters for Dawkins' original runs given his reported generation times for results. I'm not finding where I shared that, though.

This is a linear law with respect to L if the other parameters are held constant. However, this does not match a Monte Carlo simulation, which shows that E is exponential with respect to L. So there must be some error in your calculation or I don't understand you well.

DiEb: Thanks for the link, I will look into this. Seems to be complicated however...

Well, there are three different approaches:

1) Simulation

2) Numerical Modelling

3) Analytical Modelling

Simulation is the easiest way, but doesn't provide much insight. Unless there are some clever simplifications, estimations or approximations, the analytical way just gets too complicated for me...

I've preferred to model the weasel as a Markov chain: given the mutation rate, the alphabet, the size of the population and the length of the target phrase, \mu and \sigma can be calculated in a straightforward way - and one gets some nice pictures:

The next configuration of a Markov chain only depends on the current configuration and on neighboring sites, right? But the probability that a nucleotide sequence reaches the target depends on all sites; that is, all sites must be correct. For example, the third site of the sequence AAT is not correct if the target is AAA, even though the first and second sites are correct. So whether the target is reached or not depends on all three sites. But maybe I don't understand exactly what a Markov chain is. Can you explain?

Can you also explain what is on the abscissa and the ordinate in your graphic? And what do the lines mean? Some constant parameters?

Edit: I'll look this over again. I derived it with Monte Carlo ground-truthing, so I'm not sure where we are diverging in expectation.

I think the issue is that the expectation I derived in the equation is the expected number of correct bases in one possibly-mutated offspring. I have rechecked the equation with a new Monte Carlo analysis, and it checks out.

Note that by the point we are mutating a string with 2 correct bases, we *expect* fewer than 2 correct bases afterward.

Partitioned search mutation expectation is always greater than or equal to the starting correct number of bases.

When we start talking about the expected number of generations to increment the best organism in the population to having another correct base, yes, that ends up in an exponentially increasing series with increasing number of correct bases. But the mutation expectation per offspring is a simpler calculation than that.
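The per-offspring expectation discussed above can be sketched as follows. This is not the equation from the earlier post (which is not reproduced in this thread) but a reconstruction under a stated assumption: each letter mutates with probability mu, and a mutated letter is drawn uniformly from the whole alphabet, so it can land on the same letter again. The function names and the parameter values (28 letters, 27 symbols, mu = 0.05) are illustrative only.

```python
def expected_correct_weasel(k, length, alphabet, mu):
    """Expected correct letters in ONE offspring of a parent with k correct letters.

    Assumption: a mutated letter is redrawn uniformly from all `alphabet` symbols.
    """
    stay = 1.0 - mu + mu / alphabet   # a correct letter is still correct afterwards
    gain = mu / alphabet              # an incorrect letter becomes correct
    return k * stay + (length - k) * gain


def expected_correct_partitioned(k, length, alphabet, mu):
    """Same expectation when correct letters are latched and can never mutate."""
    return k + (length - k) * mu / alphabet


if __name__ == "__main__":
    L, K, MU = 28, 27, 0.05
    for k in range(4):
        print(k,
              round(expected_correct_weasel(k, L, K, MU), 4),
              round(expected_correct_partitioned(k, L, K, MU), 4))
```

Under this assumption the weasel expectation is k(1 - mu) + L*mu/K, which is linear in L and drops below k as soon as k > L/K, while the latched version never falls below k.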

Sorry Wesley, but I don't have a clue what you are talking about...

I started to read Utiger's paper, he explains it quite well. I mean what we need is an equation for the mean number of generations necessary to achieve a target. For instance, for Dawkins' weasel sentence this number is around 60 or so for a population size of 100 and a mutation rate of 0.05 as explained on Wiki.
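A minimal Monte Carlo sketch for estimating that number of generations. It assumes one common reading of the Weasel algorithm: the parent is replaced by the best of its offspring every generation (the parent itself is not kept), and a mutated letter is redrawn uniformly from all 27 symbols. The names `weasel_generations`, `mutate` and `score` are made up for this illustration; other variants of the algorithm will give somewhat different averages.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "      # 26 letters plus space = 27 symbols
TARGET = "METHINKS IT IS LIKE A WEASEL"      # 28 characters


def score(candidate, target=TARGET):
    """Number of positions at which the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent, mu, rng):
    """Redraw each letter with probability mu (it may come out unchanged)."""
    return "".join(rng.choice(ALPHABET) if rng.random() < mu else c
                   for c in parent)


def weasel_generations(pop_size=100, mu=0.05, rng=None, max_generations=100_000):
    """Run one weasel search and return the number of generations used."""
    if rng is None:
        rng = random.Random()
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET and generations < max_generations:
        offspring = [mutate(parent, mu, rng) for _ in range(pop_size)]
        parent = max(offspring, key=score)   # best offspring replaces the parent
        generations += 1
    return generations


if __name__ == "__main__":
    runs = [weasel_generations(rng=random.Random(seed)) for seed in range(5)]
    print("generations per run:", runs)
    print("mean over runs:", sum(runs) / len(runs))
```

Averaging over many more runs than the five shown here is needed before the mean can be compared with an analytical value.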

Utiger found a distribution like that of throwing dice:

P(v) = q^v-1 p^v

where v is the number of generations and p = 1-q is the probability that the die shows the correct number. When several nucleotides and a population size greater than one are involved, p and q become matrices with the same dimension as the length of the sequence. The mean is calculated in the same manner as for dice. This way, Utiger found that the mean follows a logarithmic law with respect to the sequence length if the population size is greater than one; otherwise it is exponential. He checks this with Monte Carlo simulations, and the analytical and numerical results fit perfectly.

The graph answers the question "What is the expected number of queries for Dawkins's original weasel?"

That is, the size of the alphabet (27) and the length of the target (28) are fixed.

On the x-axis, there is the mutation rate \mu, on the y-axis, the number of queries, i.e., the size of the population times the number of expected generations.

The coloured lines show the relation between \mu and the number of queries for certain population sizes. The dot marks the minimal expected number of queries for a given population size - the black text lists the relevant information:

1) size of population (e.g., 2 for the red line)

2) most efficient rate of mutation (e.g., 0.000 049 for the red line)

3) number of expected queries for this combination of size and rate (e.g., 21,213)

The black line connects the minimal points and allows for extrapolation - though this is quite difficult in this log-log diagram.

When I calculated the values nearly eight years ago, I concluded that the most efficient size of population would be 9 with a mutation rate of 0.00901: this would result in 1576 queries on average (or some 175 generations).

If I understand you correctly, a pop. size of 9 and a mut. rate of 0.00901 yield a minimal average number of queries?

There is an optimal mut. rate. But there is no optimal pop. size yielding a minimal number of trials. So the higher the pop. size, the lower the number of trials.

Maybe it would be more useful if you only had the expected number of trials on the y-axis. Why take the number of queries?

Frankly, I'm not sure what you mean by "trial"...

The general idea is that one wishes to reduce the costs of a simulation: you can define the idea of costs in various ways - maybe there is a restriction to the size of a generation and / or the number of generations - but it is standard to define the number of calls to the oracle / evaluations of the fitness function as the cost of a program. That's why I displayed the number of queries, i.e., the number of individuals created for which the fitness function has to be evaluated.

The optimal mutation rate depends on the size of the population - there is no overall optimum for all sizes!

The number of trials is the number of tries. When you roll dice, for instance, a trial is one throw of the die. Or for the Weasel algorithm, it's the number of generations.

I calculated the probability distribution for the Weasel algorithm with a mutation rate of 0.05 and a population size of 100 according to Utiger's paper:

P(v)=|H.F^(v-2).A|

where v is the number of trials (or generations), H and F are matrices, A is a vector and |.| is the 1-norm. This yields

As you can see, the numerical blue points and the analytical red curve fit perfectly. The mean calculated analytically is 79.19 and the standard deviation is 24.64. So the number of queries is 100*79.19 = 7919 by your definition. On your graphic above, however, the vertical line at 5e-02 = 0.05 intersects the green curve passing through the point 100... at about 2e+05 = 2*10^5. This is the number of queries, if I understand you correctly. So the two results disagree...
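For anyone who wants to cross-check such numbers without Utiger's matrices, here is a hedged sketch of the Markov-chain approach mentioned earlier in the thread. It does not construct H, F and A; it uses the number of correct letters in the parent as the state and iterates the chain directly. Assumptions: the start string is random, the best offspring replaces the parent each generation, and a mutated letter is redrawn uniformly from the whole alphabet. All function names are hypothetical, and the output depends on these modelling choices.

```python
from math import comb, sqrt


def binom_pmf(n, p):
    """Binomial(n, p) probabilities for 0..n successes."""
    return [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]


def offspring_pmf(k, length, alphabet, mu):
    """Distribution of correct letters in one offspring of a parent with k correct."""
    keep = binom_pmf(k, 1 - mu + mu / alphabet)    # correct letters that stay correct
    gain = binom_pmf(length - k, mu / alphabet)    # wrong letters that become correct
    pmf = [0.0] * (length + 1)
    for i, a in enumerate(keep):
        for j, b in enumerate(gain):
            pmf[i + j] += a * b
    return pmf


def best_of_pmf(pmf, pop_size):
    """Distribution of the maximum of pop_size independent draws from pmf."""
    out, cdf, prev_pow = [], 0.0, 0.0
    for p in pmf:
        cdf = min(1.0, cdf + p)
        cur_pow = cdf ** pop_size
        out.append(cur_pow - prev_pow)
        prev_pow = cur_pow
    return out


def generation_distribution(length=28, alphabet=27, mu=0.05, pop_size=100,
                            tol=1e-12, max_generations=20_000):
    """Return dist, where dist[v] = P(target is first reached in generation v)."""
    rows = [best_of_pmf(offspring_pmf(k, length, alphabet, mu), pop_size)
            for k in range(length)]
    state = binom_pmf(length, 1 / alphabet)        # random initial string
    dist = [state[length]]                         # already on target at generation 0
    state[length] = 0.0
    while sum(state) > tol and len(dist) <= max_generations:
        nxt = [0.0] * (length + 1)
        for k in range(length):
            if state[k]:
                for j, t in enumerate(rows[k]):
                    nxt[j] += state[k] * t
        dist.append(nxt[length])                   # mass absorbed in this generation
        nxt[length] = 0.0
        state = nxt
    return dist


def mean_and_sd(dist):
    """Mean and standard deviation of the generation count."""
    total = sum(dist)
    mean = sum(v * p for v, p in enumerate(dist)) / total
    var = sum((v - mean) ** 2 * p for v, p in enumerate(dist)) / total
    return mean, sqrt(var)


if __name__ == "__main__":
    m, s = mean_and_sd(generation_distribution())
    print("mean generations:", round(m, 2), "sd:", round(s, 2))
    print("mean queries:", round(100 * m))
```

The state can be reduced to a single count because, by symmetry, the distribution of an offspring's score depends only on how many letters of the parent are correct, not on which ones.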

where v is the number of generations and p = 1-q is the probability that the dice got the correct number.

Sorry for this error. The probability distribution for throwing a die is of course

P(v) = q^(v-1) p

with p=1/6.
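For reference, the mean and variance of this geometric distribution follow from the standard series sums. This is textbook material, not something taken from Utiger's paper:

```latex
\begin{align}
P(v) &= q^{v-1} p, \qquad v = 1, 2, \dots, \qquad q = 1 - p \\
E[v] &= \sum_{v=1}^{\infty} v \, q^{v-1} p = \frac{p}{(1-q)^2} = \frac{1}{p} \\
\operatorname{Var}[v] &= \frac{q}{p^2}
\end{align}
```

With p = 1/6 this gives a mean of 6 throws and a variance of 30 (standard deviation about 5.48).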

You should definitely read Utiger's article, even though his conclusion is that natural selection does not work and you guys believe the exact opposite... But he argues it with honesty and clarity. Furthermore, he does not draw any conclusions in favor of intelligent design or any other creation beliefs. He just says that there is disagreement between the empirical data, that is, the 13 million years since the split between the Pan and Homo genera occurred, and the data furnished by the Weasel algorithm, that is, billions of years...

If you really want to get a grip on the power and limitations of statistics, I suggest Nate Silver's "The Signal and The Noise". My favorite quote: "Economists have predicted nine of the last six recessions". Berlinski famously conflated the mathematical meaning of the word limit with the limit of actions available in the physical world. Personal incredulity unfortunately is a limit even for mathematicians. If his work was truly useful, he could see if the strategy worked in poker. I predict the results would see him lose. I have yet to see any mathematician change the outcome of a die rolled in the past.

I haven't gone through all of Utiger's math, but I can speak some to the biology interactions.

Utiger attributes all change to natural selection. His paper is devoid of references to drift, and his only mention of Kimura is an offhand reference to mathematics. Thus, you can readily dismiss any conclusions he makes about plausibility of genetic change; he is not dealing with the full model of how genetics changes.

Great - I will try to find out where my error lies...

DiEb, I don't think it is your error. The plot WebHopper has is for population size 100, and should have been run for population size 9 to speak to your numbers.

Nevertheless, I'll look into it over the next week. Heck, I liked the picture, but something seems to be off.

I ran a few simulations: WebHopper seems to be correct! Now I have to go over eight-year-old code - from when I was young and pretty ;-)

ARRRRGH! I don't know how I have managed it, but I uploaded two wrong pictures! Those belong to a blogpost from Oct 3, 2009 on Dembski's "Random Search" which had a target length of 100, and an alphabet of size 2 only!

The day before, I wrote about Dawkins's weasel - here are the correct pictures for Dawkins:

The number of queries:

The number of generations:

i) for a population size of 100 and a mutation rate of 0.044, I get an expected number of generations of 78.47

ii) I still think that the number of individuals created best reflects the cost of the algorithm - at least mathematically: the onerous task is to create/mutate a child, using random numbers - in this sense, 10 generations of 10 children are as expensive as one generation of 100 children.