Category: AWS Lambda

After a very busy couple of months, including a move to Silicon Valley (!), I’m pleased to say that plot2txt now offers a few more API methods, intended to help with search as much as data mining. The new methods allow the user to automatically create a searchable collection of figures and tables, identified and… Continue Reading Figure Search Engine

I’m entering the final stages of building a figure search engine, a nice wrapper for the new API method discussed below. It’s also a chance to properly release data mined directly from arxiv figures, and to take advantage of the lambda + S3 processing pipeline I developed when I first pushed the p2t algorithms to the cloud. Attached is… Continue Reading figure meta data

As mentioned, after doing some experiments for the KDnuggets article, I bundled some of the existing API methods into a new one, which extracts from a page the figures that have x/y scaling information. The JSON output is well suited to elasticsearch or your favorite flavor of NoSQL, e.g., an extract of the response: [{"input":"tmp/quant-ph0002044-9.png",… Continue Reading new api method for data image search

Some time ago I launched a little project, mining data from arxiv; you can read about it in other blog posts. Specifically, I modeled about 500k figures as Gaussian mixture models in order to create some features, so that figures might ultimately be represented as graphs for comparison. More ordinary methods might suffice too… Continue Reading arxiv mining

We’re edging closer to officially releasing available API methods, including a core OCR method (text-lines) that allows for text extraction in the presence of extraneous objects like embedded images and so forth. Image up/download times combined with computation cost at the backend amount to several seconds, which isn’t too bad. Using curl to POST data,… Continue Reading Mining text from document pages

At the time of writing, AWS API Gateway doesn’t support gzip requests, so I’ve been handling this in the lambda function itself and on the client side. Obviously compression makes a dramatic difference w.r.t. performance, just ask the guys at Pied Piper 🙂 Another curious absence is support for multipart form data; attached a screen grab from… Continue Reading API Gateway Perf
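
For readers who want the gist of the gzip workaround, here is a minimal sketch rather than the actual plot2txt code. It assumes the client gzips a JSON payload and base64-encodes it so it can travel as a plain text body, and that the lambda function reverses those steps. The helper names (`compress_payload`, `decompress_payload`), the `handler` function, and the `event["body"]` shape are all hypothetical.

```python
import base64
import gzip
import json


def compress_payload(payload: dict) -> str:
    """Client side: JSON-encode, gzip, then base64 so the body travels as text."""
    raw = json.dumps(payload).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decompress_payload(body: str) -> dict:
    """Inverse of compress_payload: base64-decode, gunzip, parse JSON."""
    raw = gzip.decompress(base64.b64decode(body))
    return json.loads(raw.decode("utf-8"))


def handler(event, context):
    """Hypothetical lambda entry point; the event shape is an assumption."""
    request = decompress_payload(event["body"])
    # Placeholder for the real work: just echo back which keys arrived.
    result = {"received_keys": sorted(request)}
    return {"statusCode": 200, "body": compress_payload(result)}
```

Base64 adds roughly a third on top of the compressed size, so this only pays off when the payload compresses well; a client would call `compress_payload` before the POST and `decompress_payload` on the response body.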

We attended the AWS San Francisco Summit in April and were really impressed with the breadth and depth of the presentations. A standout was this talk from Adam Boeglin on optimization. As we prepare to finally commercialize RESTful APIs, it’s clear a few lambda functions need a little tuning; this talk has some great… Continue Reading EC2 Optimization