Compare randomized search and grid search for optimizing hyperparameters of a
random forest.
All parameters that influence the learning are searched simultaneously
(except for the number of estimators, which poses a time / quality tradeoff).

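The randomized search and the grid search cover the same parameter ranges,
although the grid search evaluates only a few fixed values within each range.
The resulting parameter settings are quite similar, while the run time for
randomized search is drastically lower because it fits far fewer candidates.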
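
The run-time gap follows largely from the number of candidates each search
has to fit. As a rough sketch using only the standard library, the grid from
the script can be enumerated and compared with the 20 sampled candidates. The
names grid_candidates, grid_fits and random_fits are illustrative, and the
fit counts ignore the final refit on the full data.

```python
from itertools import product

# Grid copied from the example script.
param_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}
n_iter_search = 20  # candidates drawn by the randomized search
cv_folds = 5        # cv=5 in both searches

# Every combination of the listed values is one grid-search candidate.
grid_candidates = list(product(*param_grid.values()))
n_grid = len(grid_candidates)

# Each candidate is fitted once per cross-validation fold.
grid_fits = n_grid * cv_folds
random_fits = n_iter_search * cv_folds

print("grid candidates:", n_grid)
print("grid fits:", grid_fits)
print("randomized fits:", random_fits)
```

Under these settings the grid search fits several times as many models as the
randomized search, which is consistent with the timing difference described
above.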

The performance is slightly worse for the randomized search, though this
is most likely a noise effect and would not carry over to a held-out test set.

Note that in practice, one would not search over this many different parameters
simultaneously using grid search, but pick only the ones deemed most important.
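
To illustrate how restricting the grid shrinks the search, the sketch below
keeps two parameters and counts the remaining combinations. The choice of
max_features and min_samples_split as the "important" ones is purely an
assumption for illustration, and grid_size is a hypothetical helper, not a
scikit-learn function.

```python
from math import prod

# Full grid copied from the example script.
full_grid = {
    "max_depth": [3, None],
    "max_features": [1, 3, 10],
    "min_samples_split": [2, 3, 10],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

# Assumption: these two parameters are the ones deemed most important.
important = ["max_features", "min_samples_split"]
reduced_grid = {name: full_grid[name] for name in important}


def grid_size(grid):
    """Number of candidates in a full grid: product of the list lengths."""
    return prod(len(values) for values in grid.values())


full_size = grid_size(full_grid)
reduced_size = grid_size(reduced_grid)
print("full grid:", full_size, "candidates")
print("reduced grid:", reduced_size, "candidates")
```

The reduced dictionary could be passed as param_grid in place of the full
one; the parameters left out would then keep the classifier's defaults.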

print(__doc__)

import numpy as np

from time import time
from scipy.stats import randint as sp_randint

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

# get some data
digits = load_digits()
X, y = digits.data, digits.target

# build a classifier
clf = RandomForestClassifier(n_estimators=20)


# Utility function to report best scores
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        # several candidates can share a rank when their scores tie
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")


# specify parameters and distributions to sample from
# (sp_randint(low, high) draws integers from low to high - 1)
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run randomized search
n_iter_search = 20
random_search = RandomizedSearchCV(clf, param_distributions=param_dist,
                                   n_iter=n_iter_search, cv=5)

start = time()
random_search.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidate"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.cv_results_)

# use a full grid over all parameters
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# run grid search
grid_search = GridSearchCV(clf, param_grid=param_grid, cv=5)
start = time()
grid_search.fit(X, y)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grid_search.cv_results_['params'])))
report(grid_search.cv_results_)
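
To make the sampling step more concrete, here is a rough standard-library
sketch of how candidates might be drawn from a space like param_dist. It is
not scikit-learn's implementation: draw_candidate is a hypothetical helper,
lists are sampled uniformly, and range(low, high) stands in for
sp_randint(low, high), whose upper bound is likewise exclusive.

```python
import random

# Stand-in for param_dist: range(low, high) plays the role of
# sp_randint(low, high), with the upper bound excluded.
param_space = {
    "max_depth": [3, None],
    "max_features": range(1, 11),
    "min_samples_split": range(2, 11),
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}


def draw_candidate(space, rng):
    """Draw one parameter setting, each value chosen uniformly at random."""
    return {name: rng.choice(list(values)) for name, values in space.items()}


rng = random.Random(0)  # fixed seed so the draws are repeatable
n_iter_search = 20
candidates = [draw_candidate(param_space, rng) for _ in range(n_iter_search)]

# Show a few of the sampled settings.
for candidate in candidates[:3]:
    print(candidate)
```

Each draw is independent, so the number of fitted candidates is fixed by
n_iter_search no matter how many parameters or values the space contains.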