Final Exam Sample Questions

The final for 8751 will be comprehensive and out of 300 points. The first two pages will contain 5 definitions, each worth 12 points, followed by 8 pages, each with one question, to give you plenty of room to write. Exam questions will be drawn from material related to your presentations, from material presented after the midterms, and some questions each from the material covered in midterms 1 and 2. Below I give one question for each of the nine presentations made in class. Three of these questions will be repeated exactly on the final.

There will be questions covering material from
the first two midterms (sample questions can be found at these links):

Additional sample questions for the remaining material covered:
Sample questions from class lecture:
1. Briefly define the following terms:
Linear programming
Slack variable
Margin (of a SVM decision surface)
Support vector
Domain theory
Bagging
Boosting
Stacking
Market Basket
Itemset
The Apriori Properties
2. Explain the fundamental difference between the Bagging and AdaBoost
ensemble learning methods. How do these notions relate to the concept of
generating a good ensemble? What are the advantages and disadvantages of
each method?
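To make the bagging half of this question concrete, here is a minimal sketch of bagging (bootstrap replicates plus majority vote). The base learner, the 1-D toy data, and all names are illustrative choices of mine, not from the course material:

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) examples with replacement: one bootstrap replicate.
    return [rng.choice(data) for _ in data]

def one_nn_predict(train, x):
    # A simple unstable base learner: 1-nearest-neighbor on 1-D inputs.
    return min(train, key=lambda ex: abs(ex[0] - x))[1]

def bagging_predict(data, x, n_models=25, seed=0):
    # Train each base learner on its own bootstrap replicate,
    # then combine predictions by majority vote.
    rng = random.Random(seed)
    votes = [one_nn_predict(bootstrap_sample(data, rng), x)
             for _ in range(n_models)]
    return max(set(votes), key=votes.count)

data = [(0.0, 'neg'), (0.5, 'neg'), (1.0, 'neg'),
        (4.0, 'pos'), (4.5, 'pos'), (5.0, 'pos')]
print(bagging_predict(data, 4.2))
```

The point of the sketch is that each model sees a slightly different resampling of the data, so the vote averages out the variance of the unstable base learner; boosting, by contrast, reweights examples based on previous errors.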
3. How is a problem phrased as a linear program in a support vector machine?
Give an example. What are slack variables used for and how are they
represented in the linear program?
4. Explain the concept of a Kernel function in Support Vector Machines.
Why are kernels so useful? What properties should a kernel have to be
used in an SVM?
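As a study aid for the kernel question, this small sketch checks numerically that the homogeneous degree-2 polynomial kernel computes an inner product in an explicit feature space without ever constructing that space; the toy vectors are my own:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    # Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2.
    return dot(x, z) ** 2

def phi(x):
    # Explicit feature map for 2-D input whose inner product
    # reproduces the kernel: (x1^2, x2^2, sqrt(2)*x1*x2).
    return [x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1]]

x, z = [1.0, 2.0], [3.0, 0.5]
print(poly_kernel(x, z))    # (1*3 + 2*0.5)^2 = 16.0
print(dot(phi(x), phi(z)))  # same value, via the explicit features
```

The usefulness in an SVM is exactly this: the optimization only needs inner products, so the kernel gives the effect of the higher-dimensional map phi at the cost of a dot product in the original space.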
5. How does the Apriori algorithm learn an association rule (give the
algorithm)? Give two examples of ways to speed up this algorithm.
Show an example of how the algorithm works.
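For working through the Apriori question, here is a minimal frequent-itemset sketch of the level-wise search (rule generation from the frequent itemsets is omitted); the grocery transactions are a made-up example:

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise search using the Apriori property: every subset of a
    # frequent itemset is frequent, so size-k candidates are built and
    # pruned using the frequent (k-1)-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    freq = {}
    level = {s for s in items
             if sum(s <= t for t in transactions) >= min_support}
    k = 1
    while level:
        for s in level:
            freq[s] = sum(s <= t for t in transactions)
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune any candidate with an infrequent (k-1)-subset, then
        # keep only candidates meeting the support threshold.
        level = {c for c in candidates
                 if all(frozenset(sub) in freq
                        for sub in combinations(c, k - 1))
                 and sum(c <= t for t in transactions) >= min_support}
    return freq

transactions = [{'milk', 'bread'}, {'milk', 'bread', 'eggs'},
                {'bread', 'eggs'}, {'milk', 'eggs'}]
freq = apriori(transactions, min_support=2)
print(sorted((sorted(s), n) for s, n in freq.items()))
```

With this data, all singletons and all pairs are frequent, but the triple {milk, bread, eggs} occurs only once and is discarded, which is the subset-pruning behavior the exam question asks you to trace.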
Questions from student presentations (the questions regarding student
presentations will be limited to this set):
1. In the paper on Netflix Prize prediction, Singular Value Decomposition
was used to perform a feature transformation. What does SVD do and why
is it so useful in cases such as these?
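A small numerical sketch of the idea behind this question: truncating the SVD to the top singular values gives a low-rank approximation that captures the dominant structure of a ratings-style matrix. The toy matrix and the choice of numpy are mine, not from the Netflix Prize paper:

```python
import numpy as np

# Toy "user x movie" rating matrix with an obvious block pattern.
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the top-k singular values: the best rank-k approximation
# in the least-squares sense, which smooths out noise and exposes
# latent "taste" factors.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_k, 2))
```

The residual of the rank-k truncation equals the discarded singular values, which is why a few factors can summarize most of the matrix; that compression of users and movies into a shared low-dimensional space is what makes SVD useful for prediction tasks like this.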
2. Give the DIET algorithm for performing feature weighting in nearest-
neighbor algorithms. What are the advantages and disadvantages of this
algorithm?
3. The Osmot system is a search engine that allows researchers to gather
data about users' online behavior. In the paper presented in class,
these researchers attempted to infer behavior and feedback from the
searching done. Explain how they proposed to do this and indicate any
potential problems you see with this approach.
4. SVMTool proposes to learn a Part-Of-Speech tagger from a dictionary
and samples of statements in the target language. Give five examples of
the types of features SVMTool considers and a sample of each such feature.
5. Explain how wrapper and filter algorithms work for variable elimination.
How do Stracuzzi and Utgoff propose to use random samples of sets of
variables to set key parameters for selecting a good sample of variables?
6. How do Semi-Supervised Support Vector Machines propose to make use of
unlabeled data? How might this lead to better generalization?
7. What is a "hard" learning problem for skewing? How does skewing make it
possible to effectively learn a decision tree for such a problem?
8. Why does Caruana use multiple metrics in evaluating supervised learning
algorithms? Define four of the metrics Caruana used in his work and
explain why these metrics might be of interest.
9. What is meant by saliency in Optimal Brain Damage? How is it defined?
What are the potential problems with this approach?