
Rally.org is all about inspiring leaders and inciting action to foster social change. That’s why we’re excited to host the PDF Liberation Hackathon’s West Coast event at RallyPad in San Francisco, CA, Friday through Sunday, Jan. 17-19.

This innovative 48-hour event, which will also take place in Washington, D.C., New York City and Chicago, will bring developers and researchers together to come up with new ways to answer old questions. Hackathon participants will try out different approaches to extracting large data sets from PDF documents. PDFs, in their current format, restrict the way that open source models can analyze data.

“[O]pen source models can bring much needed transparency to scientific research, finance, education and other fields plagued by biased, self-serving analytics. Models often need large volumes of data, and if the model is to be run on an ongoing basis, regular data updates are required.

Unfortunately, many data sets are not ready to be loaded into your analytical tool of choice; they arrive in an unstructured form and must be organized into a consistent set of rows and columns. This cleaning process can be quite costly. Since open source modeling efforts are usually low dollar operations, the costs of data cleaning may prove to be prohibitive. Hence no open model – distortion and bias continue their reign.”