Probabilistic choice


Given a mapping between items and their required probability of occurrence, generate a million items randomly subject to the given probabilities and compare the target probability of occurrence versus the generated values.

The probabilities should sum to one. Because floating-point arithmetic is involved, the computed total is subject to rounding error.
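Because of that rounding, a check on the input is usually written with a tolerance rather than an exact comparison. The following is a minimal Python sketch, assuming the eight items and probabilities implied by the sample outputs on this page (1/5 through 1/11, with heth taking the remainder). The helper name `probabilities_ok` and the tolerance value are illustrative choices, not part of the task.

```python
import math

# Probabilities as implied by the sample outputs on this page (an assumption):
# aleph..zayin are 1/5 .. 1/11, and heth takes whatever is left over.
names = ["aleph", "beth", "gimel", "daleth", "he", "waw", "zayin"]
probs = {name: 1.0 / k for name, k in zip(names, range(5, 12))}
probs["heth"] = 1.0 - sum(probs.values())


def probabilities_ok(p, tol=1e-9):
    """Return True if the values of p sum to 1 within tol (hypothetical helper)."""
    return math.isclose(math.fsum(p.values()), 1.0, abs_tol=tol)


print(probabilities_ok(probs))
```

`math.fsum` is used so that the check itself adds as little rounding error as possible.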

#include <stdlib.h>

/* pick a random index from 0 to n-1, according to probabilities listed
   in p[], which is assumed to have a sum of 1. The values in the
   probability list matter only up to the point where the sum goes over 1 */
int rand_idx(double *p, int n)
{
    double s = rand() / (RAND_MAX + 1.0);
    int i;
    for (i = 0; i < n - 1 && (s -= p[i]) >= 0; i++);
    return i;
}

Works by first converting the provided probability distribution function (PDF) into a cumulative distribution function (CDF), so that it can simply scan through the CDF list and return the current item as soon as the CDF at that point is greater than the random number generated. The code could be made more concise by skipping this step and instead walking the whole PDF for each random number, but this version is both faster and more readable.
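The PDF-to-CDF idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the entry's own code: `make_sampler` and its `rng` parameter are hypothetical names, and the lookup uses a binary search (`bisect`) where the description above uses a linear scan.

```python
import bisect
import itertools
import random


def make_sampler(pdf, rng=random.random):
    """Build a sampler from an {item: probability} mapping (hypothetical helper)."""
    items = list(pdf)
    # Running totals turn the PDF into a CDF: cdf[i] = p[0] + ... + p[i].
    cdf = list(itertools.accumulate(pdf[item] for item in items))

    def sample():
        r = rng()  # uniform in [0, 1)
        # Index of the first CDF entry strictly greater than r.
        i = bisect.bisect_right(cdf, r)
        # Guard against the last CDF entry rounding to slightly below 1.
        return items[min(i, len(items) - 1)]

    return sample


sample = make_sampler({"aleph": 0.2, "beth": 0.3, "gimel": 0.5})
print(sample())
```

The CDF is built once, so each draw costs one random number and one lookup.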

It uses the built-in function (frequencies) to count the number of occurrences of each distinct name. Note that while we actually generate a sequence of num-trials random samples, the sequence is lazily generated and lazily consumed. This means that the program scales to an arbitrarily large num-trials with no ill effects, because elements are thrown away once they have been processed.
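The same lazy generate-and-count pattern can be sketched in Python, where a generator plays the role of the lazy sequence and `collections.Counter` plays a role similar to (frequencies). The function name `lazy_samples`, the three-item table, and the seed are assumptions made for illustration only.

```python
import random
from collections import Counter


def lazy_samples(items, weights, num_trials, rng=None):
    """Yield num_trials weighted random items one at a time (hypothetical helper)."""
    rng = rng or random.Random()
    for _ in range(num_trials):
        # random.Random.choices returns a list; take its single element.
        yield rng.choices(items, weights=weights)[0]


items = ["aleph", "beth", "gimel"]
weights = [0.5, 0.3, 0.2]  # illustrative values, not the task's table

# Counter pulls one sample at a time, so the full sequence is never stored.
counts = Counter(lazy_samples(items, weights, 10_000, random.Random(42)))
for name in items:
    print(name, counts[name])
```

Memory use depends on the number of distinct names, not on the number of trials.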

Expected number of aleph was 200000.0 and actually got 199300
Expected number of beth was 166666.66666666672 and actually got 166291
Expected number of gimel was 142857.1428571429 and actually got 143297
Expected number of daleth was 125000.0 and actually got 125032
Expected number of he was 111111.11111111111 and actually got 111540
Expected number of waw was 100000.0 and actually got 100062
Expected number of zayin was 90909.09090909091 and actually got 90719
Expected number of heth was 63455.98845598846 and actually got 63759

IF ABS(SUM-1)>1E-6 THEN
    PRINT("Probabilities don't sum to 1")
ELSE
    FOR TRIAL=1 TO 1E6 DO
        R=RND(1)
        P=0
        FOR I%=0 TO UBOUND(PROB,1) DO
            P+=PROB[I%]
            IF R<P THEN
                CNT[I%]+=1
                EXIT
            END IF
        END FOR
    END FOR
    PRINT("Item actual theoretical")
    PRINT("---------------------------------")
    FOR I%=0 TO UBOUND(ITEM$,1) DO
        WRITE("\ \ #.###### #.######";ITEM$[I%],CNT[I%]/1E6,PROB[I%])
    END FOR
END IF
END PROGRAM

Sample times: 1000000
aleph should be 0.200000 is 0.200492 | Deviatation 0.245%
beth should be 0.166667 is 0.166855 | Deviatation 0.113%
gimel should be 0.142857 is 0.143169 | Deviatation 0.218%
daleth should be 0.125000 is 0.124923 | Deviatation -0.062%
he should be 0.111111 is 0.110511 | Deviatation -0.543%
waw should be 0.100000 is 0.099963 | Deviatation -0.037%
zayin should be 0.090909 is 0.090647 | Deviatation -0.289%
heth should be 0.063456 is 0.063440 | Deviatation -0.025%

For i=1 To #times
  d=Random(#MAXLONG)/#MAXLONG   ; Get a random number
  e=0.0
  For j=0 To ArraySize(Mapps())
    e+Mapps(j)\prob             ; Get span for current item
    If d<=e                     ; Check if it is within this span?
      Mapps(j)\Amount+1         ; If so, count it.
      Break
    EndIf
  Next j
Next i

Sample times: 1000000
aleph should be 0.2000000000 is 0.1995520000 | Deviatation -0.225%
beth should be 0.1666666667 is 0.1673270000 | Deviatation 0.395%
gimel should be 0.1428571429 is 0.1432040000 | Deviatation 0.242%
daleth should be 0.1250000000 is 0.1251850000 | Deviatation 0.148%
he should be 0.1111111111 is 0.1109550000 | Deviatation -0.141%
waw should be 0.1000000000 is 0.0999220000 | Deviatation -0.078%
zayin should be 0.0909090909 is 0.0902240000 | Deviatation -0.759%
heth should be 0.0634559885 is 0.0636310000 | Deviatation 0.275%
Press ENTER to exit

probabalistic-choice/exact uses exact fractions, common denominators and the like.
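To illustrate what working with exact fractions buys, here is a small Python sketch. It is not a translation of the Racket function: the variable names are hypothetical, and the probability table is the one implied by the sample outputs on this page (1/5 through 1/11, remainder to heth). `math.lcm` with several arguments needs Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

names = ["aleph", "beth", "gimel", "daleth", "he", "waw", "zayin"]
exact = {name: Fraction(1, k) for name, k in zip(names, range(5, 12))}
exact["heth"] = 1 - sum(exact.values())  # exact remainder, no rounding

# Least common multiple of all denominators.
common = lcm(*(p.denominator for p in exact.values()))
# Each probability becomes a whole number of slots out of `common`.
slots = {name: int(p * common) for name, p in exact.items()}

print(exact["heth"], common, sum(slots.values()))
# prints: 1759/27720 27720 27720
```

With integer slot counts, a uniform random integer between 1 and 27720 selects an item with no floating-point error, which is the range the Seed7 entry on this page draws from.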

The test submodule is used for unit tests, and is not run when this code is loaded
as a module. Either run the program in DrRacket or run `raco test prob-choice.rkt`

#lang racket
;;; returns a probabalistic choice from the sequence choices
;;; choices generates two values -- the chosen value and a
;;; probability (weight) of the choice.
;;;
;;; Note that a hash where keys are choices and values are probabilities
;;; is such a sequence.
;;;
;;; if the total probability < 1 then choice could return #f
;;; if the total probability > 1 then some choices may be impossible
(define (probabalistic-choice choices)
  (let-values
      (((_ choice) ;; the fold provides two values, we only need the second
                   ;; the first will always be a negative number showing that
                   ;; I've run out of random steam
        (for/fold
            ((rnd (random))
             (choice #f))
            (((v p) choices)
             #:break (<= rnd 0))
          (values (- rnd p) v))))
    choice))

This algorithm consists of a concise two-line tail-recursive loop (def weighted). The rest of the code is for API robustness, testing and display. weightedProb is for the task as stated (0 < p < 1), and weightedFreq is the equivalent based on integer frequencies (f >= 0).
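The integer-frequency variant can be sketched in Python as below. This is an assumption-laden illustration rather than the Scala code: `weighted_freq` is a hypothetical name borrowed from the description above, and it uses a plain loop instead of tail recursion because Python does not optimize tail calls.

```python
import random


def weighted_freq(freqs, rng=random):
    """Pick a key from {item: non-negative integer frequency} (hypothetical sketch)."""
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("at least one frequency must be positive")
    n = rng.randrange(total)  # uniform integer in [0, total)
    for item, f in freqs.items():
        if n < f:
            return item
        n -= f  # skip past this item's share and keep walking
    raise AssertionError("unreachable: n < total always lands in some item")


print(weighted_freq({"aleph": 3, "beth": 0, "gimel": 1}))
```

An item with frequency 0 is never returned, since `n < 0` can never hold for a non-negative `n`.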

const func letter: randomLetter is func
  result
    var letter: resultLetter is aleph;
  local
    var integer: number is 0;
  begin
    number := rand(1, 27720);
    while number > table[resultLetter] do
      number -:= table[resultLetter];
      incr(resultLetter);
    end while;
  end func;

The stochasm library function used here constructs a weighted non-deterministic choice among
a set of functions. The pseudo-random number generator is a 64-bit Mersenne Twister
implemented by the run-time system.