Reinforcement learning covers a family of algorithms with the purpose of maximize a cumulative reward that an agent can obtain from an environment.
It seems like training crows to collect cigarette butts in exchange for peanuts, or paraphrasing an old say, the carrot and stick metaphor for cold algorithms instead of living donkeys.

The agent and environment have not been emphasized vainly, they represent more concretely a vacuum cleaner
sweeping your flat, an A/B testing engine for commerce or a driveless car in a crossroad. If you have heard about latest advances in the field, you would have came across of Deepmind’s AlphaZero, by which it is possible, with an affordable set of hardware, to build from scratch the best chess player in the world in just 4 hours.

“Difficulties strengthen the mind, as labor does the body.”
― Lucius Annaeus Seneca

Researches don’t lack of challenges in this field. The most important one comes from the intrinsic nature of RL learning process, which rely solely on the evaluations of its actions. Improvements are driven by just one signal, the reward.

Rewards are often very sparse.

They come after hundred or thousands of steps, exponentially increasing the combination of actions the agent must explore for finding a barely better sequence among of them. If that does not seem arduous, consider also the non-stationary nature of some environments, particularly common in dynamic scenarios where the learning phase resemble pursuing a moving target.

Non-stationary environments

Non-stationary means that the return of an action, performed in the precisely exact conditions of a past experience, might be different from what expected. This is particularly intuitive in the case of multi-agent scenario (MARL), in which the agent plays with one or more other agents that are learning too.

What distinguish RL from other optimization methods

While RL helps on creating agents that can autonomously take decisions, other algorithms attain this goal too, but with different working principles.

Supervised learning could be easily distinguished because it is trained with correct samples instead of vague rewards, making it simpler for loss functions to converge into an useful solution.

Mathematical optimization differs from RL in a more subtle way. Likewise RL, the simplex algorithm find solutions by iterating on optimization loops, but it works only on perfect information problems.
When considering what it exactly means, let’s look at the knapsack or the traveling salesman problems.

All the necessary informations for elaborating the optimal solution are readily there.

In other words, there is no exploration.
Conversely, an RL agent is like a probe on an heavenly body, where the assumptions on the environment are nearly absent. The agent needs to figure out autonomously the good and the bad actions only by the feedback from environment, the so called model free learning approach.

Genetic Programming is an evolutionary optimization method that share most of the characteristics of RL, it is iterative, suitable for imperfect information systems due to its stochastic explorative nature. Even the terminology is somehow related. For example, what is the objective function, in GP is named fitness function, just a polyseme.
What differentiate it from RL is it’s evolutionary method, I briefly explained in this post.

The International Conference on Machine Learning ICML took place this year in Europe,
in the beautiful city of Stockholm from 10th to 15th of July.
This is one of the two premiere conferences (within NIPS) on Artificial Intelligence research, and the numbers indicate the magnitude of the event: 612 accepted papers out of 2473 submissions, 9 tutorial and 67 workshop sessions on the latest advances in all disciplines of machine learning. One of the most intriguing workshop was about machine intelligence capable of writing software code for complex procedural behavior.

Goal programming attempts to find solutions which possibly satisfy, otherwise violates minimally, a set of goals. It has been enjoyed in innumerable domains such as engineering, financing or resource allocation. Solutions may include optimal strategies to maximize, for example, a sale’s profit or, on the other hand, to minimize the cost of a purchase under an acceptable threshold.

An optimized plan could be blended as a program defined as an abstract syntax tree (AST):

What is this exhilarating noise come out of my mouth when I talk? Not surely because that precise sequence of sounds, pops and squeezes are particularly melodic, but thanks to that palace of sophistications erected in favor of language, we can talk and afford a wide range of expressions. Since I began erratically to explore natural language processing I have been wondering how it comes out so natural for us, while it is extremely complicated from a computational perspective. What has caught my curiosity is the nature of language and its fundamental aspects that might have shaped the rudimentary ‘Me Tarzan, you Jane’, the sentence that paraphrases the earliest and the simplest level of language.

The difficulty of studying the evolution of language is that in its early forms the available evidences are sparse. Spoken languages don’t leave fossils. Moreover, all existing languages, including the far remote tribal ones, are already sophisticated. Contemporary ones have a lot of words, refined grammar structures and can express almost everything with a remarkable richness of details. Even in written human records collected so far, dating 5.000 years ago or so, things look almost the same like they are now.
Linguists have studied how communication change over time and inferred how it could appear us when the first rudimental steps toward a language were adopted in the first place.
What are the basic and fundamental aspects and principles of language that whether they would be taken away, the whole towering edifice of language would immediately collapse like a stack of cards? I would introduce them by a simple composition, which could not be taken as an example of eloquence, but nobody would find it difficult to understand:

I supermarket enter basket bring pick fresh fruit

I go cashier pay cashier basket bring bag quit

As might be noticed, there are no grammatical elements (prepositions, conjunctions, adverbs, plurals, tenses, relative clauses, complement clauses) that glue and hold sentences together, nor any abstract term. Nonetheless, the proto-sentence remains comprehensible due to very few natural principles that arrange those words together. Those principles crystallized into our brain million of years before language was even conceived by our ancestors. The evolution wired those principles in our cortex for facilitating communication.
The first lines of distinction in early languages came from the concrete world, such as actions and things and how to refer to them in space, the pointing words. The second principle refers to the sequentiality of events and and as one can correctly imagine this affect the ordering of words. The third is more about the economy of communication, by contextualizing meanings and references in the sentence.

Pointing words

Pointing words assist for referring or locating something in space. They are This, that, here, there and their reference depends on where the actors are. What is this for me could be that for you, due to the relative position of object and subject. Those referencing words are not simply compelling because children use them as an accompaniment to the pointing gesture, reinforcing the intimate link between physical world and mental representation in premature brains. Pointing words, oppositely to other grammatical terms, are not originated by anything else than pointing words. They are root and core concepts.

Things, actions

The sample text should help to inform that early languages were restricted to simple words, the ones involving only concrete entities in the here and now. Things and action distinction is also a part of what is social intelligence and the world representation which is common in other primates and this conceptual distinction was already there. Even metaphors, that count a large belonging among words of our dictionary, turns out of have concrete origins, they were evolved from elements of physical environment.

Order of words

Another basic principle of any language relies on a single strategy: the ordering of words.
What belongs together in reality appears close also in the language and follows the same sequentiality. It is natural to describe an action as central word between two participants. Between the actor and the patient (whom the action is performed) the order is the ordinary mapping from reality to language. Consider for example the
Caesar’s Principle: I came, I saw, I conquered (veni vidi vici). This saying was conferred to Julius Caesar after a victory. The order of words is clearly not accidental, it reflects the sequence of actions in the real world.

Context

The third principle is concerned with repetition. What is already stated or it is not particularly important does not need to be iterated again. What could be understood and inferred from the context may be omitted in the sentence. This follow the principle of least effort, which is also applicable in language. Whether I would have written the story like this:

I supermarket enter I bring basket I pick fruit I quit

the redundancy of the subject would be truly annoying, in any language. Have been invented several ways to keeping track of participants in the conversation, take by example pronouns.