Automated Essay Scoring Updates

Today, September 23rd, 2015, we are rolling out the most significant change to our Automated Essay Scoring (AES) system in its history. It includes many improvements, summarized below:

Enhanced usage of grammar features in our predictive models

Ensemble methods for increased accuracy and reduced variability

Elimination of length bias

Additional predictors

More uniform distribution of scores

Incorporation of other machine learning and NLP techniques
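The ensemble idea mentioned above can be sketched as averaging the predictions of several independently trained scorers. This is a minimal, hypothetical illustration (the scorers and scale here are invented, not PaperRater's actual models); averaging reduces the variance of any single model's prediction, which is what "reduced variability" refers to:

```python
def ensemble_score(models, essay):
    """Average the scores from several independent scoring models.

    Each model is assumed to map an essay's text to a 0-100 score.
    Averaging dampens the idiosyncratic errors of any single model.
    """
    scores = [model(essay) for model in models]
    return sum(scores) / len(scores)

# Hypothetical scorers standing in for real trained models
models = [lambda e: 78.0, lambda e: 84.0, lambda e: 81.0]
print(ensemble_score(models, "sample essay text"))  # 81.0
```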

Although these changes represent an improvement in our AES technology, we recognize that classrooms as well as individuals may track changes in a score on a thesis or other written work over time, and that these changes could disrupt that process. To mitigate this issue and ease the transition, we are blending the scores from our previous AES model with scores generated using our new scoring models. As always, we welcome any feedback on the new scoring system.
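One common way to blend two models' scores during a transition is a simple weighted average. The sketch below is only an illustration of that general technique, with a hypothetical `new_weight` parameter; it is not PaperRater's actual blending formula or weights:

```python
def blended_score(old_score, new_score, new_weight=0.5):
    """Blend a legacy model's score with a new model's score.

    new_weight is a hypothetical knob: 0.0 keeps the legacy score,
    1.0 uses only the new model. Raising it gradually over time
    phases the legacy model out without a sudden jump in grades.
    """
    return (1.0 - new_weight) * old_score + new_weight * new_score

# Example: legacy model says 82, new model says 90, equal weighting
print(blended_score(82, 90))  # 86.0
```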

We hope to continue with another round of major enhancements to the automated grader in the summer of 2016 when it will likely be less disruptive to most users of our service.

Comments:

The improved models do not "clump" to the center of the scoring distribution as they did in the past. This made it very difficult to get a solid A. So, "No", the intention is not to give higher grades, but "Yes", this is a side-effect of the improvements.

Good question. The reason is that only a few categories are shown to the user with a percentile, and the scoring algorithm uses many additional features. For example, users are not shown a percentile ranking in terms of grammar, but clearly grammar has a major impact on their grade.

The accuracy of our service has increased with our recent release, but please understand that it's still common for the score to be off by 10 points. Most writers will see an increase in their score after the recent updates, but some will see a decrease. We reduced the length bias, which means that papers no longer get a boost simply for being longer. It could be that your previous submissions benefited from this but now they do not. Feel free to visit our contact page and send us the specific paper in question.

I got a moderately good grade on my paper, but after I fixed the errors my grade actually went down. The improvements I made introduced no new errors; in fact, the report showed no errors at all. How is this possible?

Generally, making the suggested improvements will cause your predicted grade to be higher, but this is not always the case. The prediction is based on many factors, only a fraction of which are shown in the feedback sections.

Three edits later my C turns into an A, and that's ignoring what it says about transitional words and phrases. Most of them are far too stuffy for a fantasy novel xD I think the grading is still off a bit, as I'm guessing it must be giving me a higher grade for vocabulary. I consider words like muted, ambush, forward, stifled, dwindled, sheer, nostalgia, wisps, wafted, and wizened to be average. :P

I would like to let you know of some flaws that I found in your vocabulary words section. I'm not sure what the measure of a vocabulary word is, but I'm assuming it means having a good grasp of the English vocabulary and using a more advanced vocabulary. Paper Rater noted my use of appreciated, diverse, academic, and disheartened as vocabulary words, but ignored unorthodox, intricacies, ingenuity, and arduous. Why is that? Personally I think that the latter set of words demonstrates a firmer grasp of the English vocabulary.

We are working on developing a list for public consumption right now. The "PaperRater 500" as we call it will contain 500 words that every high school and college student would benefit from knowing. Please contact us at support@PaperRater.com in a couple of weeks and we will gladly send it to you.

An academic paper must differ substantially from a short story or a novel; for instance, lines of dialogue will be shorter and crafted to suit the character and the occasion. How does this affect the rating?

I'll chime in here as well. I have been using the Paper Rater (PR) for a few years; got in on the ground floor. I have noticed the recent changes resulting in lower grades: I was consistently getting A's and am now getting B's. The biggest struggle I'm having with the new grading is its tendency to over-emphasize "big words" over concise and readable language. I have been writing for many years, and using more complex words (exchanging commonly used words for fancy words I dig out of a thesaurus) is not always a good thing. It might impress a professor, but the final product is not always better. The verdict is still out on the changes... I will continue to use the objective tools, but the subjective feedback has me in a quandary.

Thanks for the feedback, Kent! We hope to continue making improvements to our tools. The automated grading system is simpler than what you would find in a high-stakes testing service, so you are correct in your critique. With that said, our internal tests do confirm that it gets things close to the mark most of the time. We welcome any additional feedback!

It means that our automated scoring system (Grendel) felt that your paper was good enough to receive an A rather than another score (B, C, D, F). However, it also means that there is still some room for improvement, so keep at it! :)

The Paper Rater is by far very good compared to others. My score is a B, unfortunately; however, I figured out how to turn my B into an A! I am still trying to understand the trick: why are my essays not an A right away?

You could be right that the Paper Rater is too lenient in its grading, because from your comment it appears you confuse its (a possessive pronoun/adjective) and it's (a contraction of "it is" or "it has"). Given this, I don't think your essays could get close to 98 or 99, as you are likely to trip over other commonly confused words. Anyway, this is my observation.

You are absolutely right! There's a reason a Gaussian distribution is called the "normal distribution". But when modeling any process, you want the distribution of predictions to match the distribution you are modeling; in data-science terms, the distribution of your predictions should match the distribution of your training labels. In our previous predictive models this was not the case.
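A simple sanity check for the "clumping" problem described earlier is to compare the spread of a model's predictions against the spread of its training labels. The sketch below (hypothetical data and a made-up helper, not PaperRater's actual code) shows how a clumped model reveals itself with a standard-deviation ratio well below 1.0:

```python
import statistics

def clumping_ratio(labels, predictions):
    """Ratio of prediction spread to label spread.

    A well-calibrated model's predictions should have roughly the
    same standard deviation as the training labels (ratio near 1.0);
    a ratio well below 1.0 means predictions clump toward the mean.
    """
    return statistics.stdev(predictions) / statistics.stdev(labels)

labels = [60, 70, 75, 80, 85, 90, 95]    # hypothetical true scores
clumped = [76, 77, 78, 79, 80, 81, 82]   # model hugging the mean
print(round(clumping_ratio(labels, clumped), 2))
```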

When I type my essays in Word, I use spell check. That, however, doesn't catch all of the grammar and spelling errors the Paper Rater does. Also, it doesn't give me a rough idea of how I'm doing or what I can improve on. Thank you so much, Paper Rater.