Second, I implemented a caching mechanism for passing information between runs of data.R.
The script queries only papers released since the last run, so adding new papers is faster and requires fewer HTTP requests.
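
The idea can be sketched as follows. This is a minimal illustration rather than the actual contents of data.R: the cache file location, the helper names `read_last_run` and `select_new_papers`, the column names, and the toy paper IDs are all assumptions made for the example.

```r
# Hypothetical cache location; the real script may store its state elsewhere.
cache_file <- file.path(tempdir(), "last_run.rds")

# Return the date recorded by the previous run, or NA on the first run.
read_last_run <- function(path) {
  if (file.exists(path)) readRDS(path) else as.Date(NA)
}

# Keep only papers released after the previous run.
# `papers` is assumed to be a data frame with a Date column `released`.
select_new_papers <- function(papers, last_run) {
  if (is.na(last_run)) return(papers)
  papers[papers$released > last_run, , drop = FALSE]
}

# Toy index standing in for the list of working papers (made-up values).
papers <- data.frame(
  paper = c("w0001", "w0002", "w0003"),
  released = as.Date(c("2021-01-04", "2021-02-01", "2021-03-01")),
  stringsAsFactors = FALSE
)

# Pretend an earlier run finished on 2021-01-15.
saveRDS(as.Date("2021-01-15"), cache_file)

# Only these rows would need to be fetched over HTTP.
new_papers <- select_new_papers(papers, read_last_run(cache_file))

# Record the latest release date seen, ready for the next run.
saveRDS(max(papers$released), cache_file)
```

Storing a single date keeps the cache small, at the cost of assuming that papers never appear with a release date earlier than the last run.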

Third, I added working paper titles to the information collected.
This allows me, for example, to use tf-idf scores to characterise research areas: