PyF is a Python framework for writing highly scalable data processing and data mining applications, and more. PyF is Free software, distributed under the terms of the MIT license.

To achieve this scalability, PyF is based on flow programming: instead of processing "a certain quantity of data", we process a "flow" of data, so that at any point we only ever have one object in memory, no matter how much data is processed in total. That’s right: mining your huge customer database and generating reports with PyF will not bring your servers to their knees.

To achieve this, we use plain Python generators (no need for alternative interpreters such as Stackless Python):

Each unit of the processing chain takes a generator as input and yields values as soon as they are processed. We could even handle a never-ending flow of input data and keep processing it, yielding each result one after the other!
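
The principle can be illustrated with plain Python generators. This is a minimal sketch, not PyF's own API: the unit names `read_numbers`, `keep_even` and `square` are hypothetical. Each unit consumes an iterable and yields results one at a time, so even an endless source can be processed incrementally, and only the values actually pulled are ever computed.

```python
import itertools
from typing import Iterable, Iterator


def read_numbers() -> Iterator[int]:
    """Hypothetical source unit: an endless flow of integers (stands in for database rows)."""
    yield from itertools.count(1)


def keep_even(values: Iterable[int]) -> Iterator[int]:
    """Hypothetical filter unit: passes along only even values, one at a time."""
    for value in values:
        if value % 2 == 0:
            yield value


def square(values: Iterable[int]) -> Iterator[int]:
    """Hypothetical transform unit: yields each result as soon as it is computed."""
    for value in values:
        yield value * value


# Chain the units output to input; nothing runs until values are pulled.
flow = square(keep_even(read_numbers()))

# The source never ends, so we only pull the first five results.
first_five = list(itertools.islice(flow, 5))
print(first_five)  # [4, 16, 36, 64, 100]
```

Because every unit holds at most the current value, memory use does not grow with the length of the input flow.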

At the lowest level, PyF provides a small set of core functions that help you write flow-based applications.

At the middle level, you can run your processes from within your own application, using a wide range of plugins (or writing your own).

At the highest level, you will find a full-blown web application that lets you design your processing chain (we call it a tube) graphically, by dragging and dropping processing units (we call them components) and chaining them output to input. PyF ships with several generic components that already handle many kinds of processing and reporting, and writing your own is straightforward if needed (we will gladly help in any case). There is even a built-in scheduler, so you can specify when your processes should launch automatically!
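
Chaining components output to input amounts to composing generator functions. The sketch below does not reflect how PyF actually defines or stores tubes; `make_tube` and the three components are hypothetical names chosen for illustration, assuming each component is a function that takes an input flow and returns an output flow.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# Assumed shape of a component: it turns an input flow into an output flow.
Component = Callable[[Iterable], Iterator]


def make_tube(*components: Component) -> Component:
    """Hypothetical helper: chain components so each output feeds the next input."""
    def tube(flow: Iterable) -> Iterator:
        return iter(reduce(lambda upstream, component: component(upstream), components, flow))
    return tube


def strip_blanks(lines: Iterable[str]) -> Iterator[str]:
    """Drop empty lines and surrounding whitespace."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def parse_amount(lines: Iterable[str]) -> Iterator[tuple]:
    """Turn 'name,amount' lines into (name, float) records."""
    for line in lines:
        name, amount = line.split(",")
        yield name, float(amount)


def large_orders(records: Iterable[tuple], threshold: float = 100.0) -> Iterator[str]:
    """Report the names of records whose amount reaches the threshold."""
    for name, amount in records:
        if amount >= threshold:
            yield name


report = make_tube(strip_blanks, parse_amount, large_orders)
rows = ["alice,250", "", "bob,40", "carol,120.5"]
result = list(report(rows))
print(result)  # ['alice', 'carol']
```

Since the tube itself is just another flow-to-flow function, a whole tube could in principle be reused as a single component inside a larger one.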