1.4.5 Numpy Arrays: Pandas `read_csv` reads a CSV file and stores the data as a DataFrame. Query one column with `df['name_of_col']`, or more than one column with `df[['col_1','col_2']]`. A common operation is to split the DataFrame into features and target, then convert each to a numpy array for efficient calculation: `numpy.array(df)`.
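A minimal sketch of the split-then-convert step. The DataFrame below is made up and stands in for whatever `pd.read_csv` would return; the column names `x1`, `x2`, and `y` are assumptions, not names from the course data.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the result of pd.read_csv(...)
df = pd.DataFrame({
    "x1": [0.1, 0.4, 0.8, 0.9],
    "x2": [0.7, 0.2, 0.5, 0.3],
    "y": [0, 1, 1, 0],
})

# Double brackets select several columns and keep a 2-D table of features
X = np.array(df[["x1", "x2"]])

# Single brackets select one column, giving a 1-D array of targets
y = np.array(df["y"])

print(X.shape, y.shape)
```

Here `X` is a 2-D array with one row per sample and `y` is a 1-D array, which is the layout sklearn's `fit(X, y)` expects.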

1.4.6 Training models in sklearn: this course will cover important classification algorithms, including Logistic Regression, Neural Networks, Decision Trees, and Support Vector Machines. Fitting a model is easy in sklearn: `classifier.fit(X,y)`. An important exercise is playing with decision boundaries. A decision tree seems to fit "boxy" data well, because it draws vertical and horizontal boundaries. But be careful: if the data is "circular", NN and SVM can fit better.
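A rough sketch of the `fit` pattern on "circular" data, assuming synthetic points (generated here, not the course's dataset) labelled 1 when they fall inside a circle. It fits a decision tree and an RBF-kernel SVM with the same two calls; the scores printed are training accuracy only, so they say nothing yet about generalization.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic "circular" data: label 1 inside a circle, 0 outside (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# Every sklearn classifier follows the same pattern: construct, then fit(X, y)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

svm = SVC(kernel="rbf")
svm.fit(X, y)

# predict returns one class label per row of X
tree_pred = tree.predict(X)
svm_pred = svm.predict(X)

# score returns the fraction of correct predictions (here on the training data)
print("tree training accuracy:", tree.score(X, y))
print("svm training accuracy:", svm.score(X, y))
```

The tree can only approximate the circle with axis-aligned steps, while the RBF kernel can draw a curved boundary; plotting the two boundaries is a good way to see the difference.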

1.5.1 A regression model returns a numeric value; a classification model returns a state (a class label). Testing reveals how well the model is doing. It's possible to make a model with a frontier or fitted line so curvy that it fits the training data perfectly but doesn't generalize well. A good evaluation procedure must therefore show whether the model can generalize. Split the data into train_data and test_data, train the model with train_data, and test it with test_data. `from sklearn.model_selection import train_test_split`

`X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)` If you try the code above on a scikit-learn version older than 0.18, you will get `ImportError: No module named model_selection`. Previously, `train_test_split` was in `sklearn.cross_validation`.
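The train/test workflow can be sketched end to end as follows. The data is synthetic, and the `random_state` values are assumptions added only to make the split repeatable; the decision tree is just one possible choice of model.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration: 100 points, label 1 when the first feature is positive
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# Hold out 25% of the rows for testing; the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train on the training split only
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

# Evaluate on the held-out split to estimate how well the model generalizes
test_accuracy = clf.score(X_test, y_test)
print(X_train.shape, X_test.shape, test_accuracy)
```

With 100 rows and `test_size=0.25`, 75 rows go to training and 25 to testing. Comparing `clf.score(X_train, y_train)` with `test_accuracy` is a quick check for overfitting: a large gap suggests the model memorized the training data.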