Football Predictions 2.0: Why use ID3 to create checking samples?

Football Predictions uses a neural network, so a checking sample set is needed to evaluate the network configuration. The checking samples could be set aside from the problem’s sample set, with the network learning from the rest. But this approach is not good, because the network would not learn from the full sample set and may produce large errors.

The ID3 method rests on an explicit entropy and information gain argument, so it can be trusted to create a checking sample set. The neural network only refers to the checking samples and does not learn from them, so the error on the checking sample set itself is not important. Because the ID3 method is basically correct, it provides the basis for trusting the neural network configuration that has the smallest error. In the tennis problem, the results of the neural network and ID3 are exactly the same. This indicates that the neural network is set up properly; it is not a measure of accuracy.
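To make the entropy and information gain argument concrete, here is a minimal Python sketch. It assumes that "the tennis problem" refers to the classic 14-row play-tennis table from the decision tree literature; the table, the column order, and the function names are illustrative assumptions, not taken from Football Predictions itself.

```python
from collections import Counter
from math import log2

# Assumed data: the classic 14-row "play tennis" table.
# Columns: outlook, temperature, humidity, wind, play (class label).
ROWS = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Label entropy minus the weighted label entropy after splitting on one column."""
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Rank the attributes by information gain, largest first.
gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
ranking = sorted(gains.items(), key=lambda item: item[1], reverse=True)
for name, gain in ranking:
    print(f"{name:12s} IG = {gain:.3f}")
```

On this assumed table, outlook ranks first with an information gain of roughly 0.25 bits, and humidity comes second with roughly 0.15 bits. ID3 would split on outlook and then repeat the same calculation inside each branch.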

Why use Football Predictions instead of using ID3 directly?

ID3 performs induction on the sample set to find the rule. In theory, the rule is only accurate when the sample set is complete or infinite (like proving a property of a sequence of numbers by induction in mathematics). In practice, the samples are finite and often incomplete. Therefore, a decisive attribute is only “almost” decisive, and the remaining uncertainty must be attributed to other attributes. When ID3 selects the attribute with the greatest information gain (IG), it does not reflect how much larger that IG is than the IG of the second most important attribute. In other words, ID3 does not represent the probability that an event belongs to a class.

The neural network provides a mapping over all variables, covering the fuzzy gaps in the ID3 rule.

Because it does not express probability, ID3 has to give up and report “No Data” in some cases. We know that neural networks can recognize images even when the images are deformed, which means they can handle incomplete information. Football Predictions therefore returns a result for every event.
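The “No Data” case can be illustrated with a small Python sketch. The nested-dict tree below is a hypothetical representation of the tree that ID3 is commonly shown to build for the classic play-tennis table; the `TREE` structure and the `classify` function are assumptions made for illustration and do not describe the internals of Football Predictions.

```python
# Hypothetical nested-dict form of an ID3 tree for the tennis problem.
# Inner nodes name the attribute to test; leaves are class labels.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "wind",
            "branches": {"strong": "no", "weak": "yes"},
        },
    },
}


def classify(tree, sample):
    """Walk the tree; return "No Data" when no branch matches the sample."""
    node = tree
    while isinstance(node, dict):
        value = sample.get(node["attribute"])
        if value not in node["branches"]:
            # The rule has no branch for this value (or the value is missing),
            # so a pure rule lookup cannot answer.
            return "No Data"
        node = node["branches"][value]
    return node


print(classify(TREE, {"outlook": "sunny", "humidity": "high"}))  # follows a known path
print(classify(TREE, {"outlook": "foggy", "humidity": "high"}))  # unseen attribute value
print(classify(TREE, {"outlook": "rain"}))                       # wind is missing
```

The first call follows an existing path and returns a class. The other two fall off the tree and return “No Data”, because a rule lookup has nothing to fall back on. A trained network, by contrast, computes an output for any input vector, which is the behavior the paragraph above relies on.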