Python script

Description

Experience Level: Intermediate

Please take a few minutes to read the task in detail.

The task requires implementing the Perceptron algorithm in Python WITHOUT using any built-in classification algorithm from a library. However, the numpy and scipy libraries may be used for data handling, for example scipy.sparse or numpy.array.

Please download the attached CA1data.zip. In that archive, you will find four files: train.positive, train.negative, test.positive, and test.negative.
These files contain the positive and negative train/test reviews that will be used in this task. Each line in each file represents one review as a set of features. Both unigram and bigram features are used to represent a review (a bigram is written as its two words concatenated using two underscores). A review is represented as a bag of features. Moreover, each feature is counted only once, giving a boolean-valued feature representation (i.e. a set of features for each review).
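
As an illustration of the representation described above, here is a minimal sketch of how the reviews could be loaded. It assumes that the features on each line are separated by whitespace, which the description does not state explicitly. The function names (parse_review, load_reviews, build_vocab, to_matrix) are made up for this sketch and are not part of the task files.

```python
import numpy as np


def parse_review(line):
    """Turn one line into a set of features.

    Assumption: features are whitespace-separated. Using a set means each
    feature counts once, no matter how often it appears on the line.
    """
    return set(line.split())


def load_reviews(path):
    """Read one review per non-empty line of the file at `path`."""
    with open(path, encoding="utf-8") as f:
        return [parse_review(line) for line in f if line.strip()]


def build_vocab(reviews):
    """Map every feature seen in `reviews` to a column index."""
    vocab = {}
    for feats in reviews:
        for feat in sorted(feats):
            vocab.setdefault(feat, len(vocab))
    return vocab


def to_matrix(reviews, vocab):
    """Build a 0/1 matrix with one row per review and one column per feature.

    Features that are not in `vocab` (e.g. seen only in the test files)
    are ignored.
    """
    X = np.zeros((len(reviews), len(vocab)), dtype=np.float64)
    for i, feats in enumerate(reviews):
        for feat in feats:
            j = vocab.get(feat)
            if j is not None:
                X[i, j] = 1.0
    return X
```

The vocabulary would normally be built from the train files only and then reused for the test files. A dense array is used here to keep the sketch short; for a large vocabulary, a scipy.sparse matrix would likely be the better choice.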

The Task

1. Write a program to load the train/test instances (positive/negative) from the train/test files.
2. Implement a binary Perceptron classifier and measure the classification accuracy on the test instances. Classification accuracy is defined as the number of correctly classified test instances, expressed as a percentage of the total number of test instances.
3. Plot the train error rate and test error rate against the number of iterations. According to your plot, what would be the ideal number of iterations to terminate the training?
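
To make tasks 2 and 3 more concrete, here is a minimal sketch of a binary Perceptron that records the train and test error rate after every iteration. It uses the textbook mistake-driven update (add y * x to the weights whenever an instance is misclassified) with labels +1 and -1. This is an assumption: the variant required for the task may differ, for example in whether it uses a bias term or shuffles the instances. All names and the default n_iters=10 are chosen for this sketch only.

```python
import numpy as np


def predict(X, w, b):
    """Return +1/-1 predictions; a score of exactly 0 is treated as -1."""
    return np.where(X @ w + b > 0, 1, -1)


def error_rate(X, y, w, b):
    """Fraction of instances in X whose prediction differs from y."""
    return float(np.mean(predict(X, w, b) != y))


def accuracy_percent(X, y, w, b):
    """Correctly classified instances as a percentage of all instances."""
    return 100.0 * (1.0 - error_rate(X, y, w, b))


def train_perceptron(X_train, y_train, X_test, y_test, n_iters=10, seed=0):
    """Train for n_iters passes over the training data.

    Returns the weights, the bias, and the train/test error rate measured
    after each pass. Shuffling the visiting order is an assumption of this
    sketch, not a stated requirement.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X_train.shape[1])
    b = 0.0
    train_errors, test_errors = [], []
    for _ in range(n_iters):
        for i in rng.permutation(len(y_train)):
            # Update only on a mistake (or a score of exactly zero).
            if y_train[i] * (X_train[i] @ w + b) <= 0:
                w += y_train[i] * X_train[i]
                b += y_train[i]
        train_errors.append(error_rate(X_train, y_train, w, b))
        test_errors.append(error_rate(X_test, y_test, w, b))
    return w, b, train_errors, test_errors
```

The two returned lists hold one value per iteration, so they are the curves that task 3 asks to plot. A reasonable stopping point is usually where the test error stops decreasing, even if the train error keeps falling.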

* Please check the attached algorithm.png for the Perceptron algorithm that must be used.