Exercise

Understanding the tree plot

In the previous exercise you made a fancy plot of the tree that you've learned on the training set. Have another look at the close-up of a node:

Remember how Vincent told you that a tree is learned by separating the training set step by step? In an ideal world, the separations lead to subsets that each contain a single class. In reality, however, each subset will usually contain both positive and negative training observations. In this node, 76% of the training instances are positive and 24% are negative. The majority class is thus positive, or 1, which is signaled by the number 1 on top. The 36% tells you what percentage of the entire training set passes through this particular node. On each tree level, these percentages sum to 100%, provided you also count any leaves that ended on a higher level. Finally, Pclass = 1,2 specifies the feature test on which this node will be split next. If the test comes out positive, the left branch is taken; if it comes out negative, the right branch is taken.
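To make the three numbers in a node concrete, here is a minimal sketch in Python of how they could be computed from the labels that reach the node. The function name `summarize_node` and the counts (171 positive and 54 negative observations in the node, 625 observations in the training set) are hypothetical. They were chosen only to reproduce the 76%, 24% and 36% quoted above and are not taken from the actual data.

```python
from collections import Counter


def summarize_node(node_labels, n_train):
    """Return (majority_class, class_shares, coverage) for one tree node.

    node_labels: class labels of the training observations in this node.
    n_train: total number of observations in the training set.
    """
    counts = Counter(node_labels)
    n_node = len(node_labels)
    # Share of each class among the observations in this node
    shares = {cls: count / n_node for cls, count in counts.items()}
    # The class shown on top of the node is the most frequent one
    majority = max(counts, key=counts.get)
    # Fraction of the whole training set that passes through this node
    coverage = n_node / n_train
    return majority, shares, coverage


# Hypothetical counts, picked only to match the percentages in the text
node_labels = [1] * 171 + [0] * 54
majority, shares, coverage = summarize_node(node_labels, n_train=625)
print(majority, round(shares[1], 2), round(shares[0], 2), round(coverage, 2))
```

With these assumed counts, the majority class is 1, the class shares are 0.76 and 0.24, and the coverage is 0.36, which matches the node described above.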

Now that you can interpret the tree, can you tell which of the following statements is correct?

Instructions

Possible Answers

The majority class of the root node is positive, denoting survival.

The feature test that follows when Sex is not female is based on a categorical variable.

The tree will predict female passengers in class 3 to not survive, although it's close.

The leftmost leaf is very impure, as the vast majority of the training instances in this leaf are positive.