For the IRIS-HEP organization we need to collect publications and update our webpage regularly. I also have to do this for our group webpage, my CV, etc., and it's tedious to copy and paste the same information everywhere. The http://inspirehep.net website lets you export BibTeX, LaTeX, and plain text for individual papers or for a set of papers matching a search, but that's still not very convenient for web use, where Markdown is common. The IRIS-HEP webpage is also built with Jekyll, which can parse YAML files to generate publication pages. So I wanted a tool that could ingest a bunch of paper identifiers and output YAML.

The new INSPIRE beta has a more modern API, so I wanted to try that out. Here's a repo with what I came up with while at CERN:

```python
import requests


def summarize_record(recid):
    """Fetch an INSPIRE literature record and distill it into a small dict."""
    url = 'https://labs.inspirehep.net/api/literature/' + str(recid)
    max_authors = 5
    r = requests.get(url)
    data = r.json()['metadata']
    mini_dict = {'recid': recid}
    mini_dict.update({'title': data['titles'][0]['title']})
    if len(data['authors']) > max_authors:
        # truncate long author lists
        # mini_dict.update({'authors': [a['full_name'] for a in data['authors'][:max_authors]] + ['et. al.']})
        mini_dict.update({'authors': "; ".join([a['full_name'] for a in data['authors'][:max_authors]] + ['et. al.'])})
    else:
        mini_dict.update({'authors': [a['full_name'] for a in data['authors']]})
    if 'collaborations' in data:
        mini_dict.update({'collaboration': data['collaborations'][0]['value']})
    mini_dict.update({'arxiv_eprint': data['arxiv_eprints'][0]['value']})
    mini_dict.update({'url': 'https://arxiv.org/abs/' + data['arxiv_eprints'][0]['value']})
    mini_dict.update({'creation_date': data['legacy_creation_date']})
    if 'publication_info' in data:
        mini_dict.update({'journal_title': data['publication_info'][0]['journal_title']})
        mini_dict.update({'journal_volume': data['publication_info'][0]['journal_volume']})
        mini_dict.update({'page_start': data['publication_info'][0]['page_start']})
        mini_dict.update({'journal_year': data['publication_info'][0]['year']})
    if 'dois' in data:
        mini_dict.update({'doi': data['dois'][0]['value']})
    return mini_dict
```
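To get from these summaries to something Jekyll can consume, the dicts can be dumped to a data file with PyYAML. Here's a minimal sketch of that last step; the record below is made-up example data (not real INSPIRE output), and the `publications.yml` filename is just a placeholder for wherever your site keeps its data files:

```python
import yaml

# Hypothetical summaries, shaped like the output of summarize_record
# (example data only, not fetched from the API)
records = [
    {
        'recid': 123456,
        'title': 'An Example Paper',
        'authors': ['Doe, Jane', 'Roe, Richard'],
        'arxiv_eprint': '1234.5678',
        'url': 'https://arxiv.org/abs/1234.5678',
    },
]

# Write one YAML document that Jekyll can read from a _data/ file
with open('publications.yml', 'w') as f:
    yaml.safe_dump(records, f, default_flow_style=False)
```

With the list in `_data/`, a Jekyll template can then loop over the entries to render a publications page.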