
Arguments

string

character vector containing the strings to be analyzed (can contain multiple strings).

alphabet

numeric, the number of possible symbols (not all of which need to appear in string). Must be one of c(2, 4, 5, 6, 9) (can also be NULL or contain multiple values for acss()). Default is 9.

prior

numeric, the prior probability that the underlying process is random.

span

size of substrings to be created from string.

Details

The algorithmic complexity is computed using the coding theorem method: for a given alphabet size (number of different symbols in a string), either all possible Turing machines (TMs) or a large random sample of them, with a given number of states (e.g., 5) and a number of symbols corresponding to the alphabet size, were simulated until they either reached a halting state or failed to halt.
The outputs of the TMs at their halting states produce a distribution of strings known as the algorithmic probability of the strings. The algorithmic coding theorem (Levin, 1974) establishes the connection between the complexity of a string, K(s), and its algorithmic probability, D(s), as:

K(s) = -log2(D(s))
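
As a worked illustration of the coding theorem relation, the sketch below converts algorithmic probabilities into complexities in base R. The probability values are made up for illustration and are not taken from the package's database; the sign convention assumed is K(s) = -log2(D(s)), so that more probable strings receive lower complexity.

```r
# Hypothetical algorithmic probabilities D(s); illustrative values only,
# not taken from the package's database.
d <- c(simple = 2^-10, complex = 2^-30)

# Coding theorem relation (assumed form): K(s) = -log2(D(s))
k <- -log2(d)

# A string that is 2^20 times less probable is 20 bits more complex.
k
```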

This package accesses a database containing data on 4.5 million strings of length 1 to 12 simulated on TMs with 2, 4, 5, 6, and 9 symbols.

"local_complexity"

A list with elements corresponding to the strings. Each list element contains a named vector of algorithmic complexities (K) of all substrings of length span in the corresponding string.
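
To make "all substrings of length span" concrete, here is a minimal base R sketch. It assumes the substrings are overlapping windows that advance one character at a time; the helper substrings_of_span() is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): all overlapping substrings
# of length `span` taken from a single string `s`.
substrings_of_span <- function(s, span) {
  n <- nchar(s)
  if (span > n) return(character(0))  # string too short for even one window
  starts <- seq_len(n - span + 1)     # first character of each window
  substring(s, starts, starts + span - 1)
}

# A string of length 9 with span = 5 yields 9 - 5 + 1 = 5 substrings.
windows <- substrings_of_span("010011010", span = 5)
windows
```

Under this assumption, each returned substring would then be looked up separately to obtain its complexity.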

"likelihood_d"

A named vector with the likelihoods for string given a deterministic process.

"likelihood_ratio"

A named vector with the likelihood ratios (or Bayes factors) for string given a random rather than deterministic process.

"prob_random"

A named vector with the posterior probabilities of a random process given the strings and the provided prior probability of a random process (default is 0.5, which corresponds to a prior of 1 - 0.5 = 0.5 for a deterministic process).
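
The relation between a likelihood ratio, the prior, and the posterior probability can be sketched with Bayes' rule in base R. This is an illustration of the standard formula, not the package's internal code; posterior_random() is a hypothetical helper and the likelihood ratio values are made up.

```r
# Hypothetical helper (not part of the package): posterior probability of a
# random process, given a likelihood ratio
#   lr = P(string | random) / P(string | deterministic)
# and the prior probability of a random process.
posterior_random <- function(lr, prior = 0.5) {
  (lr * prior) / (lr * prior + (1 - prior))
}

# With equal priors, a likelihood ratio of 3 gives 3 / (3 + 1) = 0.75.
p_equal <- posterior_random(lr = 3)

# A likelihood ratio of 1 carries no evidence, so the posterior equals the prior.
p_uninformative <- posterior_random(lr = 1, prior = 0.3)
```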

Note

The first time one of the functions described here is used in a session, a relatively large dataset is loaded into memory, which can take a considerable amount of time (> 10 seconds).