Winograd test and deep learning

(Maybe this Q should be moved to general discussions?)

I've got a question which is about two things: deep learning and the Winograd schema.

For anyone who doesn't know: the Winograd schema is a suggested replacement for the Turing test. If someone supplies some software/system and says, "this is intelligent", how do we actually know whether that's the case? That's what these tests aim to answer. The Winograd test does it by giving the system loads of questions in this style:
"The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?"
So there's a situation description involving two entities, then a reference to one of those two entities using a pronoun (they, it, he, ...), and the question is: who/what is that pronoun referring to? Answering requires common sense and general knowledge, and an actual understanding of the situation and the interactions involved.
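Just to make the shape of these questions concrete, here's a minimal sketch of how one might be represented as data (the field names are my own invention, not any official challenge format):

```python
# A toy representation of a Winograd schema question; field names are
# invented for illustration, not from the actual challenge.

from dataclasses import dataclass

@dataclass
class WinogradQuestion:
    sentence: str        # situation description containing the pronoun
    pronoun: str         # the ambiguous pronoun to resolve
    candidates: tuple    # the two entities it could refer to
    answer: str          # the correct referent

q = WinogradQuestion(
    sentence=("The city councilmen refused the demonstrators a permit "
              "because they feared violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answer="the city councilmen",
)

def score(guess: str, question: WinogradQuestion) -> bool:
    """Return True if the guessed referent matches the answer."""
    return guess == question.answer

print(score("the city councilmen", q))  # True
print(score("the demonstrators", q))    # False
```

The point being: the scoring is trivial; everything hard lives in producing the guess.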

So, my question. I don't know that much about deep learning, but from what I do know, it seems to be mainly about giving the system the end goal(s), examples of them, and access to the necessary context/information/system; the deep learning system then goes to work to figure out how to achieve the end goal within the given situation.

Would deep learning systems be able to solve Winograd problems? Bear in mind the textual descriptions in the Winograd questions can be about anything, so they potentially cover all human general knowledge of life and everything.

If I'm basically right about how deep learning operates, clearly it's no good for creativity. It's not going to come up with creative new ideas; that's simply not how it works. You give it the end answer, and it works out a way of achieving that end answer. It might come up with a creative route to the specified goal, but it's not going to produce new, interesting end results. The Winograd answers aren't exactly in the category of creative new ideas, but on the other hand the end goals are as numerous as the questions. It seems to me the answers to the Winograd questions sit somewhere between creative new answers and specified end goals. They're neither; they're in between those two things.

By deep learning, I'm talking about the method of AI which is famous for learning to play Atari video games after being given some screenshots of desirable end results. I think most AI that's up and running now, and proving to be of any use, is based on deep learning?

And just to state it clearly, my main question is: could deep learning solve Winograd schema problems or not, do you think?

For what it's worth, in case someone's interested, from the deep learning Wikipedia page:

Research psychologist Gary Marcus noted:

"Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing causal relationships (...) have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used. The most powerful A.I. systems, like Watson (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of Bayesian inference to deductive reasoning."[163]

Deep learning completely lacks any general knowledge of things and their interactions, any logic, which is exactly what's required to answer Winograd questions. So no, deep learning alone wouldn't have a chance of solving a Winograd problem, it would seem.

Do people not even know about the Winograd schema thing? There's a competition in the next few weeks with a $25k prize at an AI conference: The Winograd Schema Challenge at AAAI-18: Announcement | AAAI 2018 Conference. It's been run before, a couple of years ago I think, and the highest score, if memory serves, was something like 58%, which is pretty abysmal. 50% is rock bottom: choosing answers randomly, coin-flip style, would get you about 50%, because there are only two possible answers per question (although I think some have three). The human score (an average of many people answering many Winograd Qs) was about 91%, if memory serves, which is slightly surprising; I'd have expected something around 99%. It'll be interesting to see how well the best entry fares this time.
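The "random guessing gets you about 50%" claim can be sanity-checked with a quick simulation (a toy sketch; the question count and seed are arbitrary):

```python
# Simulate an answerer that picks one of two candidates at random on
# many two-choice questions. Each guess is correct with probability 1/2.

import random

random.seed(0)
n_questions = 100_000
correct = sum(random.random() < 0.5 for _ in range(n_questions))
print(f"random baseline: {correct / n_questions:.1%}")  # ~50%
```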

Does the Winograd schema test thing have an end answer that deep learning can work towards though? The "schema questions" to me don't seem to have anything to "latch onto" to learn (based on my limited exposure and understanding of them).

No, that’s exactly it. The end answers are too varied, numerous and unknown. That’s why I said “The Winograd answers aren't exactly in the category of creative new ideas, but on the other hand the end goals are as many as the questions. It seems to me the answers to the Winograd questions are somewhere between creative new answers and specified end goals. They're neither, they're in between those two things.”

Deep learning is given an end goal (or, more usually, a bunch of separate end goals or answers), then it goes to work on loads of varying data which somehow leads to one of those end goals, and learns techniques for getting to the given end answers/goals. Then it knows how to do it quickly in the moment. E.g. speech recognition, voice to text: there’s a whole lot of auditory versions of people saying “dog”, all of which lead to the word “dog”. You (or a deep learning system) learn to identify the characteristics of sounds which are intended to lead to the word “dog”, so you’re flexible: high voice, low voice, distorted voice, etc.
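The "many varied examples, one label" idea above can be sketched with a deliberately tiny stand-in for a learned classifier. This is not deep learning, just a nearest-centroid toy over invented 2-D "audio feature" vectors, but it shows the shape of the training setup being described:

```python
# Toy "varied inputs -> one label" learner: average each label's
# training vectors into a centroid, then classify new vectors by the
# nearest centroid. All numbers are invented for illustration.

training = [
    ((1.0, 0.9), "dog"), ((1.2, 1.1), "dog"), ((0.8, 1.0), "dog"),
    ((3.0, 2.9), "cat"), ((3.2, 3.1), "cat"), ((2.8, 3.0), "cat"),
]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = {
    label: centroid([vec for vec, lab in training if lab == label])
    for label in {"dog", "cat"}
}

def classify(vector):
    """Assign the label whose training centroid is closest."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(centroids, key=lambda lab: dist2(vector, centroids[lab]))

print(classify((1.1, 1.0)))  # "dog" - near the dog examples
print(classify((3.1, 3.0)))  # "cat"
```

A high voice and a low voice saying "dog" become two nearby points; the learner's job is finding the features that make them nearby.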

Humans do solve Winograd-like problems all the time when they resolve pronouns. Take my use of “they” about ten words ago: it’s obvious, without thinking, that I mean “humans”, not “Winograd-like problems”. It’s one of those things we do effortlessly, without conscious thought, but it requires massive amounts of knowledge pre-existing in our heads: not just dictionary-style factual knowledge, but also operations/functions/relations (*meaning*) and more nuanced aspects like cultural and social norms (see John’s birthday example below).

To do it, we obviously have an underlying logic, a schema, relational general knowledge. So even though you’ve probably never come across the exact sentence “Humans do solve winograd like problems all the time when they resolve pronouns” before, you effortlessly resolve who/what “they” refers to, because you know that resolving is a verb related to solving, that humans can perform the operation of solving, and that problems are things which can be solved.

In order to solve Winograds reliably, you’re going to need general knowledge covering whatever the textual description is about, and because you don’t know in advance what topics will come up, you need a full, human-like spread of general knowledge. Critically, I think, that general knowledge has to include how things interact: operations, functions, and the social norms behind them! This is why I reckon the Winograd problems are good. E.g. (this is a Winograd I’ve just made up):

It’s John’s birthday today and John wanted a new set of golf clubs. Peter has given John some new golf clubs for John’s birthday present. He is really happy.

In that, it’s obvious that by “he” I mean John. But “Peter” wouldn’t be logically, factually incorrect; giving is enjoyable. It’s just more likely that I mean John: that’s the normal reading of that sentence. If I wanted to convey that Peter was happy, I’d have had to make that clear in some way. So that’s, I don’t know, a kind of social norm. Blimey, good luck writing code which learns and stores that across the board.

That’s exactly why these Winograd tests are possibly an excellent replacement for Turing tests, and are good tests of intelligence. Although only of a base level of intelligence, right? It doesn’t test human intelligence very well.

To solve the problem with a computer, across any and all topics, reliably, is massively hard. The more I think about it, the better the test seems; it’s an excellent replacement for the Turing test.

I’ve just started doing a psychology course (which is part of the reason I’m asking about this, BTW), and in cognitive psychology we’ve just covered schemas, the psychology version.
It’s a theory of memory and learning in cog psych. Piaget (the child development, metaphors guy) was into schemas, as are a bunch of others.

I suppose, in theory, deep learning could be applied to Winograd problems, but the data set would be humongous: all the words a human has read and heard up to the age of, say, 20. Actually no, that wouldn’t even be the data set; that would be the end goals! (Apart from the incorrect/illogical stuff you hear/read.) It’s not the way to do it. Something like schemas, an underlying relational logic, is.

So far as deep learning and Winograd schemas go, it’s almost like the data set and the end goals are one and the same.

Interestingly, looking at the rules of the competition on the Winograd page I linked to: to compete, you turn up with a laptop, any commercially available one, which will run the application that solves the problems. And minimal internet access is allowed (I think that might be quite pivotal: exactly how much internet access can you have, and to what?).

Basically you’d have to store most standard general knowledge in some relational way (i.e. not too different from what the brain does) on a laptop, and of course work out a way of representing and collecting it. I don’t know what kind of system IBM’s Watson (mentioned in that quote about deep learning above) runs on, but I bet it wouldn’t run on a laptop. It appears Watson may be capable of solving Winograd schemas, but it couldn’t win the competition because it wouldn’t run on a laptop.

Watson won Jeopardy. I still don't really understand that game (I'm not from the US), but from what I do know, it suggests that Watson could maybe solve Winograd problems.

I think we're out of sync timezone-wise so I apologise for very delayed responses.

After I got home today I did some reading on this subject and it's actually quite fascinating (mind-numbing as well). The requirement that any solution needs to run on a commercially available laptop ups the ante considerably, even noting that you can, of course, use the GPU.

Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM.

> The requirement that any solution needs to run on a commercially available laptop ups the ante considerably even noting that you can, of course, use the GPU

Indeed. The internet access, what's allowed and what isn't, seems really critical to me. I don't think processing power is the main issue; it's storage. You could use exceptional amounts of processing power, and working memory, to set up your memory/processing network; you'd do the work upfront. But then how big is the end result of that work? Small enough to fit on a laptop? Also, for any questions on topics your upfront work doesn't cover, it'd be internet access and processing carried out at test time. Plus, if you include that kind of contingency plan (for questions on topics you haven't covered, and given the full spread of general knowledge, that's a given really, isn't it?), then you need the code for processing and generating the network, and that processing's working storage (which I imagine would need to be big), irrespective of processing power, would require a lot of space. The way I see it: once the characteristics, the particular salient aspects, have been worked out, not too much space would be required for a particular concept, but to get to that point, a lot of space would be required, because of the workings, as it were.

You'd start with a dictionary, including definitions, and use that as a basis. Then, in addition, some supply of ludicrous amounts of writing. I don't know, I think such things exist, or you could just use the internet/Google; the problem with that is there's a lot of mess/rubbish/illogicalness. The processing would be the same basic logic applied throughout, object-oriented style; that is, simple, uniform rules applicable and applied at all levels. Pattern finding, and patterns of patterns, etc. Frequencies, relations, comparisons. You'd use the dictionary as the basis, looking to generate some sort of schema, not too dissimilar to the one pictured above, out of the dictionary, of the dictionary. But that's basically a graph, and graphs aren't enough. I don't know the details of category theory, but from what I understand it's like graph theory souped up. In particular, there are types of relations; in graph theory the only relation is a simple dumb link. Anyway, you'd do loads of processing upfront, not using a laptop, to generate a kind of schema based on pattern finding and a base set of definitions. You're kind of looking to give a dictionary (and then some extra stuff, probably) life, or at least actual operational logic. The brain's receptors are the way they are because of its experience; it's an interplay between outside stimuli and current mental state (and biological state).
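The "types of relations, not just dumb links" point might be sketched like this: storing knowledge as labelled (subject, relation, object) triples rather than plain edges, so that different relation types can be queried differently. The facts and relation names here are my own toy inventions, not any real knowledge base:

```python
# Typed-relation triples vs plain graph edges: indexing by relation
# type lets "what can X do" and "what is X a kind of" be distinct
# queries. Facts and relation names are invented for illustration.

from collections import defaultdict

facts = [
    ("human", "can_do", "solve"),
    ("problem", "can_undergo", "solve"),
    ("resolve", "kind_of", "solve"),
    ("golf clubs", "kind_of", "gift"),
]

by_relation = defaultdict(list)
for subj, rel, obj in facts:
    by_relation[rel].append((subj, obj))

def who_can_do(action):
    """Entities recorded as able to perform the action."""
    return [s for s, o in by_relation["can_do"] if o == action]

print(who_can_do("solve"))  # ['human']
```

In a plain graph all four facts would just be edges; the relation labels are what carry the "humans solve, problems get solved" asymmetry used in the pronoun example earlier.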

An interesting question, given that the Winograd problems aren't useful for measuring human intelligence: unless there's something wrong/unusual, or they're very young, pretty much any human who understands language (and only unusual or ill humans don't understand language) can solve Winograd schemas effortlessly, so the test only measures a basic level of competence of intelligence. So, what would a higher level of artificial intelligence be? And at what point does "artificial" get dropped? At some point, that's going to be a clearly stupid name.

> Yeah, not quite a laptop

Right, yup, that's funny.

It would be interesting to know if Watson can, or at least could, solve Winograds. The great thing about that deep learning work, and the fact it learnt to play some Atari video games, is that no one programmed it to play those, or any, games (unlike IBM's chess computer which beat Kasparov). It learnt. I suppose there is an analogy between lots of different versions of audio data which all lead to the word "dog", and lots of interactions which lead to the ball being in the goal (or whatever winning the game is). Here's the paper about deep learning learning to play Atari games: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
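For what it's worth, the core learning rule inside the DQN paper linked above is the Q-learning update; here it is in its plain tabular form on a made-up toy problem (the states, actions, and numbers are invented for illustration; the paper replaces the table with a deep network):

```python
# One tabular Q-learning step on a toy two-action problem:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max(Q(s',.)) - Q(s,a))

alpha, gamma = 0.5, 0.9          # learning rate, discount factor
Q = {("s0", "a"): 0.0, ("s0", "b"): 0.0}

def update(state, action, reward, next_values):
    """Nudge Q(state, action) toward reward + discounted best next value."""
    target = reward + gamma * max(next_values)
    Q[(state, action)] += alpha * (target - Q[(state, action)])

# pretend action "a" in s0 yielded reward 1 and led to a terminal state
update("s0", "a", reward=1.0, next_values=[0.0])
print(Q[("s0", "a")])  # 0.5
```

No one tells it which action is good in advance; repeated updates from experienced rewards are what "it learnt" amounts to.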

Missed out the word test there, I meant:
So, what would a higher level of artificial intelligence TEST be?

Essay writing? That's the whole point of essays; I'm talking about the kind you do for university. You paraphrase at a fairly high level (as in general meaning, not sentence level) in order to demonstrate understanding. But then judging an essay isn't so quick/easy.