Learning

Describe the difference between "supervised" and "unsupervised"
learning.

Answer:

Supervised learning uses the true labels (the "truth") of the training
data during training. Unsupervised learning has no access to the truth
(unless we deliberately hide it in order to test an algorithm), so it
cannot use this information.

k-means clustering

Is k-means clustering an unsupervised or a supervised learning strategy?

Answer:

Unsupervised

What does the choice of \(k\) represent?

Answer:

The number of clusters.

How does the choice of initial clusters affect the outcome?

Answer:

The resulting clusters may differ from one choice to another. Initial
cluster centers placed near outliers may produce small clusters around
those outliers (which is usually a bad thing).
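
The effect of the initial centers can be seen in a minimal sketch of k-means on one-dimensional data. The function name `kmeans_1d`, the data set, and the two starting configurations are all made up for illustration; this is not the Homework 8 procedure, only the usual assign-then-update loop under those assumptions.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Assign-then-update k-means loop on 1-D data (illustrative sketch)."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # nothing moved, so the clustering has converged
        centers = new_centers
    return centers, clusters

# Made-up data: two tight groups plus one outlier at 30.0.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 30.0]

# k = 2 in both runs; only the initial centers differ.
centers_a, clusters_a = kmeans_1d(data, centers=[1.0, 12.0])
centers_b, clusters_b = kmeans_1d(data, centers=[1.0, 30.0])

print(clusters_a)  # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 30.0]]
print(clusters_b)  # [[1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [30.0]]
```

Starting a center on the outlier leaves the outlier in a cluster of its own, while the other start keeps the two groups separate.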

Note: also be able to perform k-means clustering on some data as in
Homework 8.

Suppose we make our classification engine more cautious; that is, it
is less likely overall to predict any category. Does precision go up
or down or remain unchanged? Does recall go up or down or remain
unchanged?

Answer:

Precision typically goes up because there are fewer false positives
(\(fp\)). Recall typically goes down because there are more false
negatives (\(fn\)).
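
One way to picture a "more cautious" engine is a score threshold: the engine claims "signal" only when its score is high enough. The sketch below assumes that setup. The scores, the thresholds, and the name `evaluate` are hypothetical and chosen only to show the trade-off.

```python
# Hypothetical classifier scores (higher = more confident the item is "signal")
# paired with each item's true category. Made-up data for illustration.
scored = [
    (0.9, "signal"), (0.8, "signal"), (0.7, "noise"),
    (0.6, "signal"), (0.4, "noise"), (0.3, "noise"),
]

def evaluate(threshold):
    """Predict "signal" only when score >= threshold; return (precision, recall).

    Assumes at least one item clears the threshold, otherwise precision
    is undefined (division by zero).
    """
    tp = sum(1 for s, truth in scored if s >= threshold and truth == "signal")
    fp = sum(1 for s, truth in scored if s >= threshold and truth == "noise")
    fn = sum(1 for s, truth in scored if s < threshold and truth == "signal")
    return tp / (tp + fp), tp / (tp + fn)

relaxed = evaluate(0.5)    # claims "signal" for four items
cautious = evaluate(0.75)  # claims "signal" for only two items

print(relaxed)   # precision 0.75, recall 1.0
print(cautious)  # precision 1.0, recall about 0.67
```

Raising the threshold removes the one false positive but also turns a true positive into a false negative.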

What are the precision and recall for the following scenario:

The true categories for some data points are:

{noise, noise, signal, noise, signal}

The predicted categories for the data are (same ordering of the data
points):

{noise, signal, signal, signal, noise}

Consider "signal" to be a "positive" claim and "noise" to be a
"negative" claim.

Answer:

tp = 1, fp = 2, fn = 1

precision is 1/3

recall is 1/2
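
The counts above can be checked with a short sketch. The function name `precision_recall` is an assumption for illustration; the data is the scenario from the question, and `Fraction` is used so the results print as 1/3 and 1/2 rather than decimals.

```python
from fractions import Fraction

def precision_recall(truth, predicted, positive="signal"):
    """Return (tp, fp, fn, precision, recall) for one positive category."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Precision: of the positive claims, how many were right?
    precision = Fraction(tp, tp + fp) if tp + fp else None
    # Recall: of the truly positive items, how many were found?
    recall = Fraction(tp, tp + fn) if tp + fn else None
    return tp, fp, fn, precision, recall

truth = ["noise", "noise", "signal", "noise", "signal"]
predicted = ["noise", "signal", "signal", "signal", "noise"]

tp, fp, fn, precision, recall = precision_recall(truth, predicted)
print(tp, fp, fn)         # 1 2 1
print(precision, recall)  # 1/3 1/2
```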

What does "10-fold cross validation" mean?

Answer:

Split the input data into 10 equal parts (folds). Then repeat the
following 10 times and average the results: train the learning
algorithm on 9 of the folds (90% of the data) and test it on the
remaining fold (10%). Each iteration holds out a different fold, so
every data point is used for testing exactly once.
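
The splitting scheme can be sketched as follows. The names `k_fold_splits`, `cross_validate`, and `train_and_score` are hypothetical: `train_and_score` stands for whatever routine trains a learner on one list and returns its score on another. In practice the data is usually shuffled first; that step is omitted here.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]  # every k-th item, starting at offset `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def cross_validate(data, train_and_score, k=10):
    """Average the score of train_and_score(train, test) over the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / k

splits = list(k_fold_splits(100, k=10))
print(len(splits))                           # 10 iterations
print(len(splits[0][0]), len(splits[0][1]))  # 90 training items, 10 test items
```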

k-nearest neighbor

What does k-nearest neighbor allow us to do with a new, unknown data
point?

Answer:

Determine its category (class, label, tag, etc.).

Is k-nearest neighbor an unsupervised or a supervised learning strategy?

Answer:

Supervised

What does the choice of \(k\) represent?

Answer:

The number of neighbors that get a "vote" during the classification
stage.

What problem may a very small value of \(k\) cause?

Answer:

Noise has too great an impact. The category is taken from only the
very nearest neighbors, without considering the "larger" picture.

What problem may a very large value of \(k\) cause?

Answer:

The more common category will be chosen more often than it should be,
because distant neighbors also get a vote.

Is there one value for \(k\) that works best for nearly all data sets?
If so, what is it?

Answer:

There is not one best value; you need to experiment with different
values to find the best for your dataset.

Give one benefit of k-nearest neighbor learning.

Answer:

It is a very simple algorithm and can work quite well in some cases.

Give one drawback of k-nearest neighbor learning.

Answer:

Classification is slow because a straightforward implementation
compares the new data point against every item in the database (though
other techniques can be used to reduce this). It also requires one to
retain all the training examples in the database.
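
The answers in this section can be made concrete with a brute-force sketch of k-nearest neighbor classification. The function name `knn_classify`, the two-dimensional points, and the labels "a" and "b" are made up for illustration, and Euclidean distance is an assumed choice of distance measure.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest training examples.

    training: list of ((x, y), label) pairs, all of which must be retained.
    """
    # Brute force: measure the distance from the query to every stored example.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up data: three "a" points, one noisy "b" point sitting among them,
# and five "b" points far away.
training = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((2, 2), "b"),  # noise
    ((7, 8), "b"), ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"), ((9, 9), "b"),
]
query = (2.1, 2.1)

print(knn_classify(training, query, k=1))              # "b": the noisy point decides
print(knn_classify(training, query, k=3))              # "a": nearby points outvote the noise
print(knn_classify(training, query, k=len(training)))  # "b": the more common category wins
```

The three calls show a very small \(k\) following noise, a moderate \(k\) recovering the local category, and a very large \(k\) defaulting to the more common category.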

Note: also be able to perform k-nearest neighbor classification on
some data as in Homework 8.

Philosophy

The "Chinese room" argument

What is the essential goal of "strong AI"?

Answer:

Create a true "mind," i.e., an intelligent thinking machine. It may
even be conscious.

What is the most critical assumption in the Chinese room argument?

Answer:

That the person in the room does not understand Chinese.

If you believe the Chinese room argument, can you also (reasonably)
believe that passing the Turing test gives proof that a machine
possesses a mind (i.e., can be said to truly understand things)?