9.
Decision Trees
• A decision tree is a simple but
powerful form of multiple variable
analysis. It displays a tree-like
graph of decisions and their
possible consequences.
• Recursive partitioning: at each
step, we identify a question that
we use to partition the data.
Advantages:
• Data-driven: Makes no prior
assumptions; selects significant predictors
based on the greatest information gain.
• Flexible: Little data pre-processing needed.
Handles both numeric and categorical data.
• Easy to interpret and explain to others.
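
A minimal sketch of how "greatest information gain" can pick the question for a split, in base R. The helper names (entropy, info_gain) and the toy vectors are assumptions made for illustration; they are not the Titanic data and not the internals of any tree package.

```r
# Shannon entropy (in bits) of a vector of class labels.
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # guard against 0 * log2(0)
  -sum(p * log2(p))
}

# Information gain from partitioning `labels` by a yes/no question (`answer`):
# parent entropy minus the size-weighted entropy of the two partitions.
info_gain <- function(labels, answer) {
  parts   <- split(labels, answer)
  weights <- sapply(parts, length) / length(labels)
  entropy(labels) - sum(weights * sapply(parts, entropy))
}

# Hypothetical toy data: 8 passengers, two candidate questions.
survived <- c("yes", "yes", "yes", "no", "no", "no", "no", "yes")
is_mr    <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)
is_first <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)

gain_mr    <- info_gain(survived, is_mr)     # 1: the question separates the classes perfectly
gain_first <- info_gain(survived, is_first)  # 0: both partitions stay 50/50
```

On this toy data the tree would ask "Is the passenger a Mr?" first, then repeat the same search inside each partition.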

14.
Mr or Noble -> Side -> Port or Starboard:
40% chance of surviving, 60% chance of dying
Mr or Noble -> Side -> Unknown:
16% chance of surviving, 84% chance of dying
Not a Mr or Noble -> 1st or 2nd Class:
98% chance of surviving, 2% chance of dying
Not a Mr or Noble -> 3rd Class -> Pay ≤ $23.25:
61% chance of surviving, 39% chance of dying
Not a Mr or Noble -> 3rd Class -> Pay > $23.25:
14% chance of surviving, 86% chance of dying

Conditional Tree: ctree
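
One way to read the leaves listed above is as a chain of if/else rules. The sketch below hard-codes those five leaves in base R; it does not call ctree. The function name, argument names and codings are hypothetical, and it assumes the lower 3rd-class leaf covers fares of at most $23.25.

```r
# Survival probability read off the five leaves of the tree.
# mr_or_noble: TRUE/FALSE; side: "Port", "Starboard" or "Unknown";
# pclass: 1, 2 or 3; fare: ticket price in dollars.
leaf_survival <- function(mr_or_noble, side, pclass, fare) {
  if (mr_or_noble) {
    if (side %in% c("Port", "Starboard")) 0.40 else 0.16
  } else if (pclass %in% c(1, 2)) {
    0.98
  } else if (fare <= 23.25) {   # assumed threshold direction
    0.61
  } else {
    0.14
  }
}

leaf_survival(TRUE,  "Unknown", 3, 8.05)   # 0.16
leaf_survival(FALSE, "Unknown", 3, 7.25)   # 0.61
```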

15.
Logistic Regression
Least squares linear
regression
Predicted probabilities can
be greater than 1 or less
than 0 if used for
classification!
LOGISTIC REGRESSION
• Used for a binary
qualitative response.
• Using the logit ensures all
predicted probabilities lie
between 0 and 1.
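
Why the logit keeps probabilities in range can be seen from the standard form of the model. The symbols below (p(X), the coefficients β, and k predictors) are notation introduced here for illustration, not taken from the slides.

```latex
\log\frac{p(X)}{1 - p(X)} = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
\quad\Longleftrightarrow\quad
p(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}
```

Because the exponential term is always positive, p(X) stays strictly between 0 and 1 for any values of the predictors.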
Why use Logistic
Regression?
Allows us to establish a
relationship between a binary
outcome variable and a group
of predictor variables. Can be
used as:
• CLASSIFICATION METHOD:
Classifies binary response (E.g.
Yes/No, Pass/Fail,
Survived/Perished)
• REGRESSION METHOD:
Calculates probability (0.0 to
1.0) of the response.
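
A small sketch of the contrast between the two models, using base R only. The data frame `toy` is made up for illustration (it is not the Titanic data), and the 0.5 cut-off used for classification is an assumption.

```r
# Hypothetical toy data: one predictor, binary 0/1 outcome with some overlap.
x <- -10:10
y <- as.integer(x > 0)
y[x == -1] <- 1L   # two "exceptions" so the classes are not perfectly separated
y[x == 2]  <- 0L
toy <- data.frame(x = x, y = y)

# Least squares linear regression on the 0/1 outcome.
linear_fit  <- lm(y ~ x, data = toy)
linear_pred <- predict(linear_fit, newdata = toy)

# Logistic regression on the same data.
logistic_fit  <- glm(y ~ x, data = toy, family = binomial)
logistic_pred <- predict(logistic_fit, newdata = toy, type = "response")

range(linear_pred)    # extends below 0 and above 1
range(logistic_pred)  # stays inside (0, 1)

# Used as a classifier: threshold the predicted probability.
predicted_class <- ifelse(logistic_pred > 0.5, 1L, 0L)
```

The same fitted model serves both purposes: `logistic_pred` is the regression output (a probability) and `predicted_class` is the classification.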

22.
"Any data relating to one's location on the ship could
prove helpful to survival predictions…"

23.
First class adult males had
lower chances of survival
summary(Titanic.glm3)
Those in upper decks (1st class) had more
timely, accurate information and a shorter
journey to the lifeboats… Yet why did 1st
Class Males have lower survival rates?
Possible explanation:
• 1st Class Males were expected to be
"gentlemen" and perish with the ship.
"No woman shall be left aboard this ship
because Ben Guggenheim was a coward."
• 1st Class Male Survivors were
condemned by society:
> Bruce Ismay – had to resign as
Chairman of White Star Line.
> William Carter – divorced by wife.

26.
Random Forests
Advantages:
• Easy to use: can be used quite efficiently
with default parameters.
• Ideal for people without a deep
background in statistics.
• Produces fairly strong predictions with
only a small amount of coding.
• Ensemble: a group of actors who
perform together.
• Random Forests are an example of
an ENSEMBLE METHOD: they combine
multiple models to produce one result.
• Unlike single decision trees, which
can suffer from high variance or
high bias, Random Forests use
random sampling and
averaging to find a natural
balance between the two
extremes.
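
The "random sampling and averaging" idea can be sketched in base R without any package. This is a deliberately simplified stand-in, not the randomForest algorithm: each "tree" is a single-split stump, and all names (toy, fit_stump, predict_stump) and the simulated data are assumptions made for illustration.

```r
set.seed(42)

# Hypothetical toy data: two numeric predictors, binary outcome.
n   <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- as.integer(toy$x1 + toy$x2 > 1)

# Fit a one-split tree ("stump") on one predictor: choose the cut point
# (and direction) with the lowest training misclassification rate.
fit_stump <- function(data, feature) {
  best <- list(feature = feature, cut = NA, err = Inf, flip = FALSE)
  for (cut in unique(data[[feature]])) {
    err <- mean(as.integer(data[[feature]] > cut) != data$y)
    if (err < best$err) {
      best <- list(feature = feature, cut = cut, err = err, flip = FALSE)
    }
    if (1 - err < best$err) {
      best <- list(feature = feature, cut = cut, err = 1 - err, flip = TRUE)
    }
  }
  best
}

predict_stump <- function(stump, data) {
  pred <- as.integer(data[[stump$feature]] > stump$cut)
  if (stump$flip) 1L - pred else pred
}

# Random sampling: each stump sees a bootstrap sample of the rows
# and one randomly chosen predictor.
n_trees <- 101
forest <- lapply(seq_len(n_trees), function(i) {
  rows    <- sample(nrow(toy), replace = TRUE)
  feature <- sample(c("x1", "x2"), 1)
  fit_stump(toy[rows, ], feature)
})

# Averaging: majority vote across all stumps.
votes       <- sapply(forest, predict_stump, data = toy)  # n x n_trees matrix
forest_pred <- as.integer(rowMeans(votes) > 0.5)

single_accuracy <- mean(predict_stump(forest[[1]], toy) == toy$y)
forest_accuracy <- mean(forest_pred == toy$y)
```

Comparing `single_accuracy` with `forest_accuracy` shows what the vote adds on this toy data; a real forest grows full trees rather than stumps.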

30.
randomForest package vs. party package
randomForest package
• randomForest(…) function
• mtry, the number of features
randomly selected at each split,
defaults to floor(sqrt(p)) for
classification, where p is the
number of predictors.
• randomForest is
computationally faster.
• Popular in applied research
party package
• cforest(…) function
• mtry is set to 5 by default,
for technical reasons
• Resulting forests are unbiased even
when the predictor variables are of
different types.
• Ensembles of conditional inference
trees have not yet been extensively
tested, so this routine is meant for
the expert user only and its current
state is rather experimental.
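
A short sketch of the mtry difference, using the built-in iris data as a stand-in (an assumption; the slides use Titanic data). The helper rf_default_mtry is hypothetical. The package calls are guarded so the block still runs when randomForest or party is not installed.

```r
# Default mtry described above for randomForest (classification): floor(sqrt(p)).
rf_default_mtry <- function(p) floor(sqrt(p))

p          <- ncol(iris) - 1        # 4 predictors, Species is the response
rf_mtry    <- rf_default_mtry(p)    # 2
party_mtry <- 5                     # party's fixed default, per the slide

if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(Species ~ ., data = iris, ntree = 100)
  print(rf$mtry)                    # mtry actually used by the fit
}

if (requireNamespace("party", quietly = TRUE)) {
  # iris has only 4 predictors, fewer than party's default of 5,
  # so mtry is set explicitly here.
  cf <- party::cforest(
    Species ~ ., data = iris,
    controls = party::cforest_unbiased(ntree = 100, mtry = rf_mtry)
  )
}
```

Setting mtry explicitly in both packages keeps the two forests comparable when predictions are compared side by side.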