Once the attribute is set, move down to each child and repeat the process.

If the data is continuous, treat each point as a candidate cut point. Use the projections of the points onto the x direction and the y direction as attributes.

In the continuous case, pick random x, y values as split values and, depending on whether a point's value is greater or smaller, send it to the left or right child respectively.

Use the highest information gain, as before, to decide which attribute to split on first.
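A minimal sketch of choosing the first split by information gain on continuous data. It assumes entropy over class labels as the impurity measure and tries every observed value of every attribute as a cut point; the names `entropy`, `information_gain`, and `best_split` are made up for illustration, and the "greater goes left" convention follows the note above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one attribute at `threshold`."""
    # Convention from the notes: greater values go left, the rest go right.
    left = [y for v, y in zip(values, labels) if v > threshold]
    right = [y for v, y in zip(values, labels) if v <= threshold]
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder

def best_split(points, labels):
    """Return (gain, attribute index, threshold) with the highest gain."""
    best = None
    for attr in range(len(points[0])):
        values = [p[attr] for p in points]   # projection onto this axis
        for threshold in values:             # each point is a cut point
            gain = information_gain(values, labels, threshold)
            if best is None or gain > best[0]:
                best = (gain, attr, threshold)
    return best

# Toy data: the classes are separable along x (attribute 0) but not along y.
points = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
labels = ["a", "a", "b", "b"]
gain, attr, threshold = best_split(points, labels)
```

On this toy data the search picks attribute 0 (the x projection) with a cut at 2.0, since that split separates the two classes completely.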

But if we had 20,000-dimensional vectors, we would go for a random forest.

If you want to do unsupervised learning and figure out clusters using a random forest, you can try fitting a Gaussian for each attribute and each candidate split: fit a Gaussian (mean, variance) to each side, and select the split with the highest information gain.
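One possible reading of the Gaussian idea, as a sketch for a single attribute. It assumes "information" means the drop in differential entropy of a 1-D Gaussian, 0.5 * ln(2 * pi * e * variance), weighted by the size of each side; this choice, the variance floor, the minimum of two points per side, and the function names are all assumptions rather than something the notes specify.

```python
import math
from statistics import pvariance

def gaussian_entropy(values):
    """Differential entropy of a 1-D Gaussian fitted to `values`."""
    var = max(pvariance(values), 1e-9)  # floor avoids log(0) on constant data
    return 0.5 * math.log(2 * math.pi * math.e * var)

def gaussian_gain(values, threshold):
    """Entropy drop from fitting one Gaussian per side of `threshold`."""
    left = [v for v in values if v > threshold]
    right = [v for v in values if v <= threshold]
    if len(left) < 2 or len(right) < 2:
        return float("-inf")            # too few points to fit a variance
    n = len(values)
    weighted = (len(left) / n) * gaussian_entropy(left) \
             + (len(right) / n) * gaussian_entropy(right)
    return gaussian_entropy(values) - weighted

def best_gaussian_split(values):
    """Return (gain, threshold) over all observed values as cut points."""
    return max((gaussian_gain(values, t), t) for t in values)

# Two well-separated 1-D clusters, around 1 and around 5.
values = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
gain, threshold = best_gaussian_split(values)
```

Here the selected cut is at 1.1, which places each cluster on its own side. For several attributes, the same search would be repeated per attribute and the overall best gain kept.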

To avoid overfitting, prune the tree:

Until pruning is harmful (validation accuracy drops):

For each subtree:

Remove it and replace it with its majority class.

Evaluate the resulting tree on the validation set.

Remove the subtree whose removal leads to the largest accuracy on the validation set.
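The pruning loop above can be sketched as follows. The `Node` layout is hypothetical: internal nodes store `attr`, `threshold`, and the `majority` class of the training points that reached them, while leaves store a `label`. Ties are pruned (accuracy that stays the same is not counted as harmful), which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    attr: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None      # child for values > threshold
    right: Optional["Node"] = None     # child for values <= threshold
    majority: Optional[str] = None     # majority training class at this node
    label: Optional[str] = None        # set only on leaves

def predict(node, x):
    while node.label is None:
        node = node.left if x[node.attr] > node.threshold else node.right
    return node.label

def accuracy(root, X, y):
    return sum(predict(root, x) == t for x, t in zip(X, y)) / len(y)

def internal_nodes(node):
    if node.label is None:
        yield node
        yield from internal_nodes(node.left)
        yield from internal_nodes(node.right)

def prune(root, X_val, y_val):
    current = accuracy(root, X_val, y_val)
    while root.label is None:
        best_acc, best_node = -1.0, None
        for node in list(internal_nodes(root)):
            node.label = node.majority           # temporarily make it a leaf
            acc = accuracy(root, X_val, y_val)
            node.label = None                    # undo the trial
            if acc > best_acc:
                best_acc, best_node = acc, node
        if best_acc < current:                   # pruning is now harmful
            break
        best_node.label = best_node.majority     # commit the best removal
        best_node.left = best_node.right = None
        current = best_acc
    return root

# Toy tree: the subtree under x > 5 overfits by splitting again at 8.
subtree = Node(attr=0, threshold=8.0, left=Node(label="b"),
               right=Node(label="a"), majority="b")
root = Node(attr=0, threshold=5.0, left=subtree,
            right=Node(label="a"), majority="a")
X_val, y_val = [(1.0,), (6.0,), (7.0,), (9.0,)], ["a", "b", "b", "b"]
before = accuracy(root, X_val, y_val)
prune(root, X_val, y_val)
after = accuracy(root, X_val, y_val)
```

In this toy case the inner subtree is replaced by its majority class "b", and the loop stops when collapsing the root would lower validation accuracy.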

For continuous features, sort the values and use either all available values or the means of consecutive values as candidate cut points.
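A short sketch of generating those candidates for one feature. The function name and the `use_midpoints` flag are made up for illustration, and duplicate values are dropped before taking means, which is an assumption.

```python
def candidate_thresholds(values, use_midpoints=True):
    """Candidate cut points for one continuous feature."""
    distinct = sorted(set(values))       # sort and drop duplicates
    if not use_midpoints:
        return distinct                  # option 1: all available values
    # Option 2: mean of each pair of consecutive sorted values.
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

midpoints = candidate_thresholds([3, 1, 2, 2])
all_values = candidate_thresholds([3, 1, 2, 2], use_midpoints=False)
```

Midpoints give one fewer candidate than there are distinct values, and none of them coincides with a data point, so no point ever sits exactly on a cut.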