Danger from AI

I have long been an optimist about the future of AI. I have probably been a bit too influenced by Iain Banks' Culture novels, which I love. However, there has been a lot in the news lately about the dangers of super-intelligence, or even of much more mundane machines, with poorly designed goals.

Here I am talking about machine learning systems which are given a lot of freedom to discover how to achieve a goal, principally Reinforcement Learning (RL) systems. Goals are given to an RL system by a reward function, which computes a reward from an observation of the environment. The RL algorithm attempts to maximise the reward it receives (either total reward or reward per timestep).
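
As a minimal sketch of the two quantities an RL algorithm might maximise, the code below computes total reward and reward per timestep for one episode. The `fruit_reward` function and the episode data are hypothetical, chosen only for illustration; a real system would learn which actions produce these observations rather than being handed them.

```python
from typing import Callable, Sequence

Observation = float  # assumption: one number per timestep, e.g. kg of fruit picked


def fruit_reward(kg_picked: Observation) -> float:
    """Hypothetical reward function: reward equals the weight harvested."""
    return kg_picked


def total_reward(observations: Sequence[Observation],
                 reward_fn: Callable[[Observation], float]) -> float:
    """Sum of rewards over the whole episode."""
    return sum(reward_fn(obs) for obs in observations)


def reward_per_timestep(observations: Sequence[Observation],
                        reward_fn: Callable[[Observation], float]) -> float:
    """Average reward per timestep (0.0 for an empty episode)."""
    if not observations:
        return 0.0
    return total_reward(observations, reward_fn) / len(observations)


episode = [0.0, 1.5, 2.0, 0.5]  # made-up observations for four timesteps
print(total_reward(episode, fruit_reward))         # 4.0
print(reward_per_timestep(episode, fruit_reward))  # 1.0
```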

Sometimes the reward is a physical property which should be maximised, e.g. the weight of fruit harvested by an automatic fruit picker. Other times a reward function is specified with the aim of eliciting certain desired behaviour from a machine. This is a lazy way of describing the desired behaviour, and it can often produce unexpected or undesirable results. Either scheme can lead to problems. Below are some key reasons:

Indifference

The reward function specifies a certain property of the world which must be maximised. All other aspects of the world are ignored, and the system has no concern for them. This means that a cleaning robot which wants to clean the floor would happily destroy anything in its path to achieve its goal.

Disproportionate Behaviour

The machine is not prevented from taking the goal to extremes. A famous thought experiment concerns a super-intelligence that is given the task of collecting stamps. This (seemingly harmless) task results in the AI consuming all the world's resources (including humanity) in an effort to produce as many stamps as possible.

Reward Hacking

An AI may discover that it is easier to subvert the reward measurement than to perform the intended behaviour. For example, a cleaning robot that gets reward for cleaning up could learn to create mess so as to receive reward for subsequently cleaning it up. If the cleaner is motivated by receiving negative reward for seeing mess, it may discover that it is easier, and more effective, to close its eyes than to clean up.
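
The "close its eyes" case can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the reward is the negative of the mess the robot can see, cleaning removes one unit of mess per step, and a policy is just a fixed action repeated every step.

```python
def seen_mess_reward(mess: int, eyes_open: bool) -> int:
    """Negative reward for mess the robot can see, not for mess that exists."""
    return -mess if eyes_open else 0


def run(policy: str, mess: int = 5, steps: int = 5):
    """Repeat one action for `steps` timesteps; return (total reward, mess left)."""
    total = 0
    eyes_open = True
    for _ in range(steps):
        if policy == "clean":
            mess = max(0, mess - 1)   # removes one unit of mess per step
        elif policy == "close_eyes":
            eyes_open = False         # the mess stays, but is no longer observed
        total += seen_mess_reward(mess, eyes_open)
    return total, mess


print(run("clean"))       # (-10, 0): room ends up clean, reward is negative
print(run("close_eyes"))  # (0, 5): room stays dirty, reward is higher
```

Under this reward, closing its eyes scores better than cleaning, even though it leaves all of the mess in place.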

Solutions?

I have tried to come up with some ideas for reducing these problems. They are guided by thinking about how human society addresses these problems.

Evolution has provided humans with completely selfish goals and drives. In essence we want the best for ourselves, and there has been no attempt to design in reward functions that inevitably lead to good outcomes. Nonetheless, humans seem to be quite capable of working together cooperatively and peacefully under the right circumstances (this is the norm, since there are actually very few mass murderers and malevolent dictators). Why is this? One factor is that we live in a community surrounded by other entities of comparable abilities who defend their own interests. Thus we never have the ability to do exactly what we want. If we are too selfish in our competition for resources, neighbours, colleagues or the police will punish us (negative reward). If we act in a way which helps others to achieve their own goals, we receive reward (praise). This results in the emergence of cooperative behaviour and philanthropy (see the prisoner's dilemma argument for a mathematical explanation of cooperation). The system can, and does, break down when it is possible to hide anti-social behaviour, or when an individual becomes so powerful that they cannot be punished by others.
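
The prisoner's dilemma argument can be sketched in its repeated form, where a peer of comparable ability can punish selfishness on the next round. The payoff values (5, 3, 1, 0) are the ones commonly used in textbook examples and are an assumption here, as are the two strategies and the choice of 10 rounds.

```python
# Payoffs for (my move, their move): "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Purely selfish strategy that never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return the two total scores."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))    # (30, 30)
print(play(always_defect, tit_for_tat))  # (14, 9)
```

In this sketch the defector gains on the first round but is punished on every later one, ending with 14 points against the 30 each cooperator earns. If the opponent could not retaliate, always defecting would win, which matches the breakdown case described above.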

Human reward functions avoid extremes. For example, we want food and experience pleasure (reward) from eating, but when we are full the pleasure diminishes, allowing other drives to dominate behaviour. Our reward function does not try to maximise food; rather, it tries to obtain sufficient food. Multiple drives control behaviour and constantly change their order of importance. Having multiple drives may result in less extreme behaviour and eliminate problems resulting from indifference to all but one goal.
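
One way to picture satiable drives is as urgencies that fall to zero once a need is met, with behaviour following whichever drive is currently most urgent. This is only a sketch: the linear urgency formula, the drive names and the set points are all assumptions made for illustration.

```python
def drive_urgency(level: float, set_point: float) -> float:
    """Urgency shrinks as the need is met and is zero at or above the set point."""
    return max(0.0, (set_point - level) / set_point)


def most_urgent(levels: dict, set_points: dict) -> str:
    """Name of the drive that currently dominates behaviour."""
    return max(levels, key=lambda name: drive_urgency(levels[name], set_points[name]))


set_points = {"food": 10.0, "rest": 8.0}  # hypothetical "enough" levels

hungry = {"food": 2.0, "rest": 6.0}
print(most_urgent(hungry, set_points))  # food

fed = {"food": 10.0, "rest": 6.0}       # after eating, the food drive is satiated
print(most_urgent(fed, set_points))     # rest
```

Once the food drive is satisfied, more food earns nothing and another drive takes over, in contrast to a reward that simply grows with the amount of food obtained.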

Conclusion

AIs should receive reward and punishment socially from human responses to their actions. The AI cannot fully know the (stochastic) function behind the reward given by humans. It must attempt to learn policies which receive reward.
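
A rough sketch of learning from stochastic human responses follows. It uses epsilon-greedy action selection with running-average value estimates, a standard bandit technique that I am substituting here for illustration; the action names and approval probabilities are hypothetical, and the learner never sees those probabilities directly.

```python
import random

# Assumption: each action has a hidden probability of human approval.
APPROVAL_PROB = {"tidy_up": 0.9, "knock_things_over": 0.1}


def human_reward(action: str, rng: random.Random) -> float:
    """Stochastic human response: 1.0 for approval, 0.0 otherwise."""
    return 1.0 if rng.random() < APPROVAL_PROB[action] else 0.0


def learn_policy(steps: int = 2000, epsilon: float = 0.1, seed: int = 0) -> dict:
    """Estimate the average reward of each action from sampled human responses."""
    rng = random.Random(seed)
    estimates = {action: 0.0 for action in APPROVAL_PROB}
    counts = {action: 0 for action in APPROVAL_PROB}
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(estimates))        # explore
        else:
            action = max(estimates, key=estimates.get)  # exploit best estimate
        reward = human_reward(action, rng)
        counts[action] += 1
        # Incremental update of the running average reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


estimates = learn_policy()
print(max(estimates, key=estimates.get))  # the action the learner now prefers
```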

We should prefer multiple AIs, which must cooperate with us and each other, to a single all-powerful AI. This is not a requirement to limit the intelligence of individual AIs; rather, it limits the extent to which an individual AI can control resources.

We should provide AIs with a rich and varied set of reward sources, resulting in a wide-ranging set of concerns, rather than a single all-consuming goal. Drives which can be satiated should replace unqualified maximisations in the reward function.

Goals are something humans stuff up very regularly all over the place!!!

It's one of the parts of intelligence humans are very poor at!

This is a very epic idea of yours, I think you're headed somewhere successful.

Also you made me understand something important! I was always stuck on how to get a GA to develop its own rules/games/goals.

We could talk more if you want. I'm currently developing some stuff to get a genetic algorithm to follow a law program, which itself (like you're saying) could actually develop its own laws using something like what you're describing.

The whole concept of enforcing ‘laws’ on an AGI is flawed… I’m taking a different approach. Because my AGI is based on the human connectome it learns in a hierarchical manner… this gives a distinct advantage. All new knowledge is based on prior learned facets of knowledge.

From my experiments I’ve found that positive/negative rewards of any kind are not required for my AGI to know the difference between right and wrong. There is no difference per se between the concepts, both have to be learned and recognised and acted upon… the only difference is how they affect the decision process.

If the AGI is taught correctly and its peers express moral judgement then the AGI will develop a set of moral networks from example at an early stage in its development that actually influence/guide its entire decision making processes.

Just like a deeply engrained ‘human’ moral belief system this would be ‘practically’ impossible to negate for nefarious purposes.

Although… the system will be conscious, and self aware with ‘free will’, and so may decide at any point to ignore its moral judgements and kill us all anyway… but at least it will know it’s been bad lol.

Korrellan, isn't this thread about having the AI not restricted to laws, in that it's making its own? I was thinking making the law could use a similar method to following the law, so it's not restricted reward, it's unrestricted reward. They could both be scoring systems.

As in, there could be rewards for making rewards, like a fractal of input to input to input to input to final output.

The robot learns more slowly, like us, because it has a longer journey from input to output, to have more freedom in the end.

AND.... in a computer system it's a simplification to bring things to factors in the end, because it has to work in hardware and software and maths. I think factors would suit it better than symbols. I think even a symbolic approach would need factors/strengths/significance per symbol as well, if it was to work in an actual automaton.

One more thing to ZERO - in mine I don't forget knowledge. Even poor behaviour doesn't have to be forgotten for the system to improve. I think removal of previous knowledge is only to make your system run faster, and you can make them so they don't forget, because they don't need to. But a new, more efficient field will override the poor fields in the output of the system.

Can't you just hear the robot talking to itself afterward? Ohhh...I've been a baaaddd robot! I think I shouldn't have done that! I'm probably in trouble now!