This project is based upon two datasets of the academic performance of Portuguese students in two different classes: Math and Portuguese. Initially, I show the simplicity of predicting student performance using linear regression. Later, I show that it is still possible, yet more difficult, to predict the final grade without Period 1 and Period 2 grades, but what we learn from those predictions provides much deeper insight. I ask deeper questions about the mathematical structure of student performance and potential indicators that can be used for early support and intervention.

Our model does a great job at predicting student success; however, there are deeper questions that this model doesn’t address. In particular, it doesn’t demonstrate how we can pick which students are most likely to fail classes at an early age when they lack the best predictors in this model.

As we’ve seen, the best predictors of success are current grades within the course (G1 and G2), age, quality of family relationships, and absences.

Current grades, however, only become available once a problem already exists.

Let’s see if we can determine which factors are more useful for preventing student failure and promoting academic success.

Let’s start by looking at all the variables within a linear model, but remove our strongest indicators, G1 and G2, which overshadow other potential factors.

%%R
# Fit a linear model of the final grade (G3) on every variable except G1 and G2.
fit <- lm(G3 ~ . - G1 - G2, data = student.mat)

Our predictions stop at 15, but actual scores rise to 20. Without G1 and G2, our model is unable to make predictions that are any higher.
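One way to check a ceiling like this is to compare the range of the fitted values with the range of the actual grades. The sketch below is only an illustration: it uses a small synthetic data frame (`toy`, with made-up values and just three of the real column names) rather than `student.mat`, so the numbers it prints say nothing about the real students.

```r
# Toy stand-in for student.mat: synthetic values, NOT the real data.
set.seed(1)
n <- 200
toy <- data.frame(
  failures = sample(0:3, n, replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05)),
  absences = rpois(n, 5),
  goout    = sample(1:5, n, replace = TRUE)
)

# Synthetic final grade on the 0-20 scale, only loosely tied to the predictors.
toy$G3 <- pmin(20, pmax(0, round(12 - 2 * toy$failures - 0.1 * toy$absences +
                                  rnorm(n, sd = 3.5))))

# Fit a linear model on the weak predictors only (the toy data has no G1/G2).
fit_toy <- lm(G3 ~ ., data = toy)

# Compare the spread of the fitted values with the spread of the actual grades.
ranges <- rbind(actual = range(toy$G3), predicted = range(fitted(fit_toy)))
colnames(ranges) <- c("min", "max")
print(ranges)
```

With the real data, the same comparison would use `fit` and `student.mat$G3` in place of `fit_toy` and `toy$G3`.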

A score of 15 marks a clear dividing line where the "potential" futures merge into current academic success. This line is important because it can help us determine what deeper differences successful students have from their peers, and it also allows us to create a definition of a "successful" student that we can use.

For this section, it becomes clear that two models will need to be analyzed: one for grades below 15 and another for grades above 15.

So far, the data has shown that it should be broken into three parts in order to analyze deeper predictors of future success.

Students who drop

1. The first part isolates students who drop a course. Their final outcome is 0 even though they should have a higher predicted outcome. These students have predicted scores below 10.

Students who finish

2. Between 0 and 15, one set of predictors (one model) will be used to predict student outcomes.
3. Between 15 and 20, a different set of predictors (a different model) will be used.
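The three-way split described above can be sketched as a small labelling function. `label_group` and its `cutoff` argument are hypothetical names introduced here for illustration, and putting a grade of exactly 15 in the upper group is an assumption, since the text does not say which side of the line it falls on.

```r
# Hypothetical helper: assign each final grade to one of the three groups.
# A final grade of 0 is treated as a dropped course, following the text.
# Assumption: a grade of exactly `cutoff` belongs to the upper group.
label_group <- function(g3, cutoff = 15) {
  ifelse(g3 == 0, "dropped",
         ifelse(g3 < cutoff, "below_cutoff", "at_or_above_cutoff"))
}

# Made-up grades, used only to show the labelling.
g3 <- c(0, 0, 7, 10, 14, 15, 18, 20)
groups <- label_group(g3)
print(table(groups))

# Split the grades by group; a data frame can be split the same way.
by_group <- split(g3, groups)
print(by_group)
```

Each of the two "finisher" subsets could then be passed to its own `lm` call, which is one way to get the two separate models mentioned above.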

Half the people who have a G1 grade of 0 drop the course while all students with a G2 grade of 0 drop the course. As can be expected, the G2 curve is steeper since more students drop the course as the first two bad grades come in.
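Drop rates like these can be computed by taking, for each value of a period grade, the share of students whose final grade is 0. The sketch below assumes that a final grade of 0 means the course was dropped, as the text does. `drop_rate_by_grade` is a hypothetical helper, and the ten rows of data are invented purely to show the calculation.

```r
# Hypothetical helper: share of students with a final grade of 0 ("dropped")
# at each distinct value of an earlier period grade.
drop_rate_by_grade <- function(period_grade, g3) {
  tapply(g3 == 0, period_grade, mean)
}

# Tiny made-up example, NOT the real data.
toy <- data.frame(
  G2 = c(0, 0, 5, 5, 5, 8, 8, 12, 12, 15),
  G3 = c(0, 0, 0, 6, 5, 0, 9, 12, 13, 15)
)

rates <- drop_rate_by_grade(toy$G2, toy$G3)
print(rates)
```

Calling the same function with `student.mat$G1` or `student.mat$G2` and `student.mat$G3` would give the rates for the real students.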

The most influential predictor for dropping a course seems to be the initial grades in the course. The school one attends also plays a factor.
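One way to probe a claim like this is a logistic regression of a "dropped" indicator on the first-period grade and the school. This is not necessarily the method used for the analysis above; it is a minimal sketch on synthetic data, where the drop probabilities are invented and only the column names `G1` and `school` and the school codes `GP` and `MS` are borrowed from the dataset.

```r
# Synthetic data, NOT the real students: the drop probabilities are invented.
set.seed(42)
n <- 300
toy <- data.frame(
  G1     = sample(3:19, n, replace = TRUE),
  school = factor(sample(c("GP", "MS"), n, replace = TRUE))
)

# Made-up drop probability: higher for low G1, slightly higher for one school.
p_drop <- plogis(1.5 - 0.35 * toy$G1 + 0.5 * (toy$school == "MS"))
toy$dropped <- rbinom(n, size = 1, prob = p_drop)

# Logistic regression: log-odds of dropping as a function of G1 and school.
drop_fit <- glm(dropped ~ G1 + school, data = toy, family = binomial)
print(coef(summary(drop_fit)))
```

A negative coefficient on `G1` means lower initial grades go with higher odds of dropping. With the real data, `dropped` would be built as `as.integer(student.mat$G3 == 0)`, again assuming that a final grade of 0 marks a dropped course.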

The first part of this project shows how simple it can be to build an accurate linear model to predict student success. However accurate that model may be, it lacks the ability to provide insight that would allow one to intervene before a student reaches failure.

However, one can still build a fairly accurate model with weaker predictors… to a point. The holistic profile of a student works quite well at predicting a student’s failure or success until a grade of about 15. The model’s profile then breaks down and can no longer predict success beyond that point, requiring a separate model (or rather profile) to accurately predict a student’s performance. This change in models suggests a dividing line between student profiles that reinforce themselves within the same model, so that students continue to succeed or continue to fail. Within these Portuguese classes, the natural point of division lies at a grade of 15.

Successful students tend to have parents who live together, a history of success, and a desire to continue on to higher education. The best weak predictors for students include school, past failures, access to school supplies, absences, and frequency of going out. Students are more likely to drop a course if they’ve had bad initial grades in that course.

In the end, this analysis cannot propose solutions. What it offers instead is insight into the structure of student performance, how to create student profiles based on that structure, and, importantly, where to draw the line of success that we should push students to cross.