Tag: arxiv

Figure Search Engine
After a very busy couple of months, including a move to Silicon Valley (!), I'm pleased to say that plot2txt now offers a few more API methods, intended to help with search as much as with data mining. The new methods let the user automatically create a searchable collection of figures and tables, identified and …

figure meta data
I'm entering the final stages of building a figure search engine, a convenient wrapper around the new API method discussed below. It's also a chance to properly release data mined directly from arXiv figures, and to take advantage of the Lambda + S3 processing pipeline I developed when I first moved the p2t algorithms to the cloud. Attached is …

arxiv mining
Some time ago I launched a small project mining data from arXiv; you can read about it in other blog posts. Specifically, I modeled about 500k figures as Gaussian mixture models in order to create features, so that figures might ultimately be represented as graphs for comparison. More ordinary methods might suffice too …
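The excerpt doesn't show how the mixtures were fit. As a minimal sketch, assuming a standard expectation-maximization (EM) fit of a 2-D Gaussian mixture to the (x, y) coordinates of dark "ink" pixels, the core loop might look like this. The function names, the grayscale threshold of 128, the farthest-point initialization and the small covariance regularizer `reg` are all illustrative assumptions, not plot2txt's actual code.

```python
import math
import random

# Hypothetical sketch of fitting figure pixels with a Gaussian mixture;
# not plot2txt's implementation.


def dark_pixel_coords(image, threshold=128):
    """Return (x, y) coordinates of pixels darker than `threshold`.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    """
    return [(float(x), float(y))
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if value < threshold]


def _log_gauss(x, y, mean, cov):
    """Log-density of a 2-D Gaussian with full covariance at (x, y)."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    det = sxx * syy - sxy * sxy
    dx, dy = x - mean[0], y - mean[1]
    # Mahalanobis distance via the closed-form inverse of a 2x2 matrix
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha - math.log(2 * math.pi) - 0.5 * math.log(det)


def _weighted_moments(points, r, total, reg):
    """Weighted mean and regularized covariance of points under weights r."""
    mx = sum(ri * x for ri, (x, _) in zip(r, points)) / total
    my = sum(ri * y for ri, (_, y) in zip(r, points)) / total
    sxx = sum(ri * (x - mx) ** 2 for ri, (x, _) in zip(r, points)) / total + reg
    syy = sum(ri * (y - my) ** 2 for ri, (_, y) in zip(r, points)) / total + reg
    sxy = sum(ri * (x - mx) * (y - my) for ri, (x, y) in zip(r, points)) / total
    return [mx, my], [[sxx, sxy], [sxy, syy]]


def fit_gmm(points, k, iters=50, reg=1e-3, seed=0):
    """Fit a k-component 2-D Gaussian mixture to (x, y) points with EM."""
    n = len(points)
    rng = random.Random(seed)
    # Farthest-point initialization: spread the initial means across the data
    means = [list(rng.choice(points))]
    while len(means) < k:
        far = max(points, key=lambda p: min((p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2
                                             for m in means))
        means.append(list(far))
    # Start every component from the global covariance of the data
    _, global_cov = _weighted_moments(points, [1.0] * n, float(n), reg)
    covs = [[row[:] for row in global_cov] for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibilities, normalized with log-sum-exp for stability
        resp = []
        for x, y in points:
            logs = [math.log(w) + _log_gauss(x, y, m, c)
                    for w, m, c in zip(weights, means, covs)]
            top = max(logs)
            p = [math.exp(v - top) for v in logs]
            s = sum(p)
            resp.append([v / s for v in p])
        # M-step: re-estimate the weight, mean and covariance of each component
        for j in range(k):
            r = [row[j] for row in resp]
            nj = sum(r) + 1e-12  # guard against an empty component
            weights[j] = nj / n
            means[j], covs[j] = _weighted_moments(points, r, nj, reg)
    return weights, means, covs
```

In practice a library routine such as scikit-learn's `GaussianMixture` would do the same job much faster; the pure-Python version only makes the E-step and M-step explicit.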

Figure Search Engine IV
My bandwidth is a little limited (literally and figuratively), but you can now find at least 500k of the aforementioned arXiv figures here. Unfortunately, you still need to know the URL of the article you're interested in; assuming its figures have been extracted, you should then be able to retrieve their mixture models. You will …

Figure Search Engine II
I previously described modeling figure pixels with Gaussian mixtures. A few months ago, I applied the same procedure to over 100k PDF documents from arXiv, which yielded close to 1M figures. For each figure, the process outputs a CSV spreadsheet of model parameters and an image showing the location of …
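The excerpt doesn't give the spreadsheet's columns, so the layout below (one row per mixture component, holding its weight, its mean and the three free entries of its symmetric 2x2 covariance) is a hypothetical schema. The sketch uses only Python's standard `csv` module and round-trips the parameters through text; `mixture_to_csv` and `csv_to_mixture` are illustrative names.

```python
import csv
import io

# Hypothetical column layout: one row per mixture component
FIELDS = ["component", "weight", "mean_x", "mean_y", "cov_xx", "cov_xy", "cov_yy"]


def mixture_to_csv(weights, means, covs):
    """Serialize a 2-D Gaussian mixture to CSV text, one row per component."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for i, (w, (mx, my), cov) in enumerate(zip(weights, means, covs)):
        writer.writerow({
            "component": i,
            "weight": w,
            "mean_x": mx,
            "mean_y": my,
            "cov_xx": cov[0][0],
            "cov_xy": cov[0][1],  # symmetric: cov[1][0] is not stored
            "cov_yy": cov[1][1],
        })
    return buf.getvalue()


def csv_to_mixture(text):
    """Parse CSV text produced by mixture_to_csv back into weights, means, covs."""
    weights, means, covs = [], [], []
    for row in csv.DictReader(io.StringIO(text)):
        weights.append(float(row["weight"]))
        means.append([float(row["mean_x"]), float(row["mean_y"])])
        cxy = float(row["cov_xy"])
        covs.append([[float(row["cov_xx"]), cxy], [cxy, float(row["cov_yy"])]])
    return weights, means, covs
```

Storing only three covariance entries avoids writing the duplicated off-diagonal term, and Python's float-to-text conversion round-trips exactly, so a parsed file reproduces the original parameters.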