Flocc (functional language on compute clusters)

Flocc (functional language on compute clusters) is a high-level language for data-parallel Big Data programming on clusters. Its compiler showcases a new technique that automatically optimizes how Big Data collections are stored across a cluster, and that works for distributed arrays, maps, and lists. This makes it much more flexible than existing approaches such as HPF and MapReduce, which do not optimize their distributed data layouts and typically support only one collection type. The compiler considers alternative distributed-memory implementations of a program's high-level data-parallel operators (encoded as higher-order functions), and uses a type system and type inference algorithm to automatically derive distributed data layout information for those operators. It then generates MPI programs in C++ from the candidate plans and uses a performance-feedback-based search to look for optimal cluster implementations of the input program.
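
The idea of choosing among alternative operator implementations based on their data layouts can be illustrated with a toy sketch. This is not Flocc's compiler or its actual algorithm. The operator names, the implementation names, the layout labels ("block", "hash"), and all cost numbers below are hypothetical, and a fixed cost table stands in for the performance feedback that would come from running generated programs. The search is a plain exhaustive enumeration over one implementation choice per operator.

```python
from itertools import product
from typing import NamedTuple, Optional


class Impl(NamedTuple):
    """One hypothetical distributed implementation of an operator."""
    name: str
    needs: Optional[str]  # required input layout; None means any layout is accepted
    gives: Optional[str]  # output layout; None means the input layout is preserved
    cost: float           # made-up cost, standing in for a measured runtime


# Hypothetical candidate implementations per high-level operator.
CANDIDATES = {
    "map": [Impl("map_local", None, None, 1.0)],
    "groupReduce": [
        Impl("reduce_shuffle", None, "hash", 6.0),  # shuffles data internally
        Impl("reduce_local", "hash", "hash", 1.0),  # needs hash-partitioned input
    ],
}

REDISTRIBUTE_COST = 4.0  # assumed cost of changing a collection's layout


def plan_cost(plan, start_layout):
    """Sum implementation costs plus a redistribution for each layout mismatch."""
    layout, total = start_layout, 0.0
    for impl in plan:
        if impl.needs is not None and impl.needs != layout:
            total += REDISTRIBUTE_COST
            layout = impl.needs
        total += impl.cost
        if impl.gives is not None:
            layout = impl.gives
    return total


def best_plan(program, start_layout):
    """Enumerate every combination of implementations and keep the cheapest."""
    plans = product(*(CANDIDATES[op] for op in program))
    return min(plans, key=lambda p: plan_cost(p, start_layout))


if __name__ == "__main__":
    program = ["map", "groupReduce", "groupReduce"]
    plan = best_plan(program, "block")
    print([impl.name for impl in plan], plan_cost(plan, "block"))
```

In this toy model, redistributing the data once so that both reductions can run locally is cheaper than shuffling inside each reduction, which is the kind of trade-off a layout-aware plan search is meant to expose. A realistic search would replace the fixed cost table with timings of generated code and would prune the plan space rather than enumerate it.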