Working journal

Having finally run something in ProvingGround involving the main learning method, here are some observations.

Simple Andrews-Curtis run

Starting with an initial distribution, evolution took place with the basic Andrews-Curtis moves. I ran two loops with two steps each, and then resumed and ran 3 more loops. The raw results are at first run results.

Conclusions

In terms of the highest weights, the method worked well even with such a short run. For example, the presentation $\langle a, b; \bar{a}, b\rangle$ was one of those with high weight.

In general, the ordering was in line with the expected behaviour.

However, virtually all theorems were remembered.

This stems from two weaknesses:

too high a weight for theorems relative to proofs, leading to memos being favoured over deduction.

not enough methods of deduction, specifically the exclusion of multiplication-inversion moves.

Database

Commits

Hub

The hub maintains all workers and communication with the outside. Its messages include the following (and a few similar ones).

```scala
case object ActorList // returns actor ids and whether they are paused.

case class QueryState(id: String, pause: Boolean = true)

case class QueryParams(id: String) // look up parameters at hub.

case class UpdateParams(id: String, p: P)

case class UpdateState(id: String, x: X)

case class PauseWorker(id: String)

case class ResumeWorker(id: String)

case class StopWorker(id: String)

case class UpdateGoals(id: String, goals: List[X => Boolean])

case class ActorLoops(id: String, loops: Int)
```

For simplicity, assume that pausing on success is independent of actor, and pausing on query is part of the query string.
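
As a rough sketch of how the hub could dispatch these messages (assuming Akka actors; the bookkeeping maps and their names are placeholders, not the actual ProvingGround code, and `P` is whatever parameter type the messages use elsewhere):

```scala
import akka.actor.{Actor, ActorRef}

// Hypothetical hub actor: keeps track of workers, their paused state and
// their parameters, and answers a few of the messages defined above.
class Hub extends Actor {
  var workers: Map[String, ActorRef] = Map.empty // worker id -> worker actor
  var paused: Map[String, Boolean]   = Map.empty // worker id -> paused?
  var params: Map[String, P]         = Map.empty // worker id -> parameters

  def receive: Receive = {
    case ActorList =>
      // reply with actor ids and whether each is paused
      sender() ! workers.keys.toList.map(id => (id, paused.getOrElse(id, false)))
    case QueryParams(id) =>
      sender() ! params.get(id)
    case UpdateParams(id, p) =>
      params += (id -> p)
    case PauseWorker(id) =>
      paused += (id -> true)
      workers.get(id).foreach(_ ! PauseWorker(id))
    case ResumeWorker(id) =>
      paused += (id -> false)
      workers.get(id).foreach(_ ! ResumeWorker(id))
  }
}
```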

The problem

One way to evolve a system is in the genetic algorithm style, where we consider the fitness of the final elements and prune based on this. This, however, has only trivial credit assignment.

At the other extreme is pure back-propagation. This has nice credit assignment, but, done directly, the support never changes.

Solution

Firstly, there must be an identity term in the evolution, so that credit can go to objects just for persisting.
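
Schematically (with notation introduced here, not from the source), the evolution with such a term is a convex combination

$$T(\mu) \;=\; c_0\,\mu \;+\; \sum_i c_i\,F_i(\mu), \qquad c_0 + \sum_i c_i = 1,$$

where the summand $c_0\,\mu$ is the persistence term: back-propagating through it sends credit to every element already present in the distribution.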

Note that while the value of a big sum does not depend on terms that are zero, specifically $\mu(v)f$ with $\mu(v) = 0$, the gradient still depends on such terms. In particular, we can get a flow back that adds weight to $v$.
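
As a minimal illustration (notation mine), take the big sum

$$G(\mu) \;=\; \sum_{v} \mu(v)\, f(v).$$

Dropping a term with $\mu(v) = 0$ leaves the value of $G$ unchanged, but $\partial G/\partial \mu(v) = f(v)$ regardless, so gradient flow can move weight onto $v$ even though $v$ contributed nothing to the value.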

We decouple the big sum component, so that when we apply it to an argument which is a distribution, we do not just take a set of functions depending on that distribution; we can instead take a specified support.

We take a big sum over all generated elements, to allow some of them to acquire weight, while computing the gradient.

We should periodically prune by weight, but only after flowing for long enough to allow for picking up weight.
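
A minimal standalone sketch of such pruning, with a hypothetical weighted-element distribution (not the actual ProvingGround classes):

```scala
// Hypothetical finite distribution as a list of weighted elements.
case class Weighted[A](elem: A, weight: Double)

case class FiniteDistribution[A](pmf: List[Weighted[A]]) {
  // Drop entries at or below the cutoff and renormalise the remainder.
  def prune(cutoff: Double): FiniteDistribution[A] = {
    val kept  = pmf.filter(_.weight > cutoff)
    val total = kept.map(_.weight).sum
    if (total == 0.0) FiniteDistribution(Nil)
    else FiniteDistribution(kept.map(w => Weighted(w.elem, w.weight / total)))
  }
}
```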

Code

We consider differentiable functions that also depend on a subset of V, by taking linear combinations that include big sums ranging over the union of the support of a distribution with the given subset.
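
As a sketch of such a big sum, with `Map[V, Double]` standing in for a finite distribution and all names illustrative, the sum can range over the support of the argument together with the specified subset:

```scala
// "Big sum" over the union of the argument's support and a specified subset of V.
// Zero-weight elements contribute nothing to the value, but carrying them as
// formal terms is what lets the gradient flow back to them.
def bigSum[V](spec: Set[V], term: V => Map[V, Double])(d: Map[V, Double]): Map[V, Double] = {
  val domain = d.keySet union spec
  domain.toList.foldLeft(Map.empty[V, Double]) { (acc, v) =>
    val w = d.getOrElse(v, 0.0)
    term(v).foldLeft(acc) { case (m, (u, x)) =>
      m.updated(u, m.getOrElse(u, 0.0) + w * x)
    }
  }
}
```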

For the forward propagation, we just take the empty subset of V.

Once we have the image, we take its (essential) support as the given subset of V. We recompute the differentiable function. Note that the value is unchanged.

We use the gradient of the new function to back-propagate.

This gives a single loop, which we repeat to evolve the system.
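
Putting these steps together, one loop could look like the following sketch; the `SupportedDiffble` trait, the `Map[V, Double]` representation and the parameter names are assumptions for illustration, not the actual code:

```scala
// Differentiable function built for a given a priori support: forward value
// plus a gradient that pulls a cotangent at the image back to the argument.
trait SupportedDiffble[V] {
  def apply(d: Map[V, Double]): Map[V, Double]
  def grad(d: Map[V, Double])(cotangent: Map[V, Double]): Map[V, Double]
}

def learnStep[V](
    mk: Set[V] => SupportedDiffble[V],          // build the function for a given subset of V
    feedback: Map[V, Double] => Map[V, Double], // cotangent at the image, e.g. from goals
    epsilon: Double
)(d: Map[V, Double]): Map[V, Double] = {
  val forward  = mk(Set.empty[V])        // forward pass with the empty subset
  val image    = forward(d)
  val enlarged = mk(image.keySet)        // recompute with the image's support; the value is unchanged
  val gradient = enlarged.grad(d)(feedback(image))
  // gradient step on the initial distribution
  (d.keySet union gradient.keySet).toList.map { v =>
    v -> (d.getOrElse(v, 0.0) + epsilon * gradient.getOrElse(v, 0.0))
  }.toMap
}
```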

Purge after looping for enough time.

Using the identity

The gradient of the identity does not depend on any form of a priori support.

Hence the gradient of a persistence term, i.e., the identity multiplied by a weight, attributes weight to everything in the final distribution.

The best way to build differentiable functions for the typical learning system in function-finder is to use combinators. At present there is an ad hoc version of this; the better way is to use linear structures systematically.
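
As a sketch of what such combinators could look like (hypothetical names, not the existing implementation), each differentiable function carries a forward map and a gradient pullback, and composition is the chain rule; the types are assumed to carry linear structure:

```scala
// Differentiable function as forward map plus gradient pullback; composition
// applies the chain rule. A and B are assumed to carry linear structure.
trait Diffble[A, B] { self =>
  def apply(a: A): B
  def grad(a: A)(cotangent: B): A

  def andThen[C](that: Diffble[B, C]): Diffble[A, C] = new Diffble[A, C] {
    def apply(a: A): C = that(self(a))
    def grad(a: A)(cotangent: C): A =
      self.grad(a)(that.grad(self(a))(cotangent)) // chain rule: pull back through both stages
  }
}
```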