Raul Castro Fernandez

Postdoctoral Associate. MIT, CSAIL

Today we generate more data than we know how to comprehend. To benefit from the
value hidden in data, our capacity to explore and understand it must match our
capacity to generate it. My research focuses on problems geared towards
bridging this gap. I like designing and building systems to solve practical
problems; my research interests lie at the intersection of databases, systems
and distributed systems.
At MIT I work with professors Sam Madden and Mike
Stonebraker. Before MIT, I completed my PhD at Imperial College London with Peter Pietzuch.

DEBS

Projects

Data Discovery

Data is stored everywhere: in relational databases, files and hundreds of other data sources. These sources contain valuable information and insights that can benefit many aspects of modern data-driven organizations. However, as more data is produced, our ability to use it drops sharply: no single person in the organization knows about all the existing data sources, so many of them are lost in the crowd. One big challenge is discovering the data sources that are relevant to answering a particular question. We are building Aurum, a data discovery system that answers "discovery queries" over large volumes of data.

Topology-Aware Dataflows

The dataflow abstraction implemented in large-scale data processing engines has reduced the time required to run analytical queries over large volumes of data by exploiting data parallelism, while letting users write their algorithms in a high-level language. However, many applications require both data and task parallelism, such as complex physical and biological simulations that depend on linear algebra operations. These applications are typically expressed as SPMD programs that intertwine algorithm logic and topology information. The resulting code is hard to understand and error-prone. We are exploring a new abstraction to bring the benefits of dataflows to HPC-like programs.

Metadataflows

Some applications require both input data and
fine-tuning of a set of parameters (such as learning rate, smoothing factor and
optimizer type for machine learning applications) to produce results: we call
these applications exploratory queries. Users spend a long time orchestrating
the different parameters they want to try, which is time-consuming and
resource-inefficient: each instantiation becomes a separate dataflow that
executes in a dataflow system. Instead, we propose metadataflows, a new dataflow
abstraction that lets users represent exploratory queries succinctly.
Metadataflows expose characteristics that allow us to execute these
kinds of queries more efficiently, for example by sharing intermediate
results, avoiding redundant computation and using more sophisticated memory
management mechanisms.
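
To make the idea of sharing intermediate results concrete, here is a minimal Python sketch. It is an illustration only, not the metadataflow abstraction itself: the function names (smooth, score, run_exploratory_query) and the toy computations are assumptions made up for this example. It runs every combination of smoothing factor and learning rate, but computes the smoothing step once per smoothing factor and reuses it.

```python
from itertools import product


def smooth(data, factor):
    """Exponential smoothing; stands in for an expensive shared stage."""
    smoothed = []
    previous = data[0]
    for value in data:
        previous = factor * value + (1 - factor) * previous
        smoothed.append(previous)
    return smoothed


def score(smoothed, learning_rate):
    """Toy stand-in for a training step that depends on the learning rate."""
    return learning_rate * sum(smoothed) / len(smoothed)


def run_exploratory_query(data, smoothing_factors, learning_rates):
    """Run all parameter combinations, sharing the smoothing stage.

    Returns the per-configuration results and the number of times the
    shared stage actually ran.
    """
    shared = {}   # smoothing factor -> smoothed data (intermediate result)
    results = {}  # (smoothing factor, learning rate) -> score
    for factor, rate in product(smoothing_factors, learning_rates):
        if factor not in shared:
            shared[factor] = smooth(data, factor)
        results[(factor, rate)] = score(shared[factor], rate)
    return results, len(shared)
```

With two smoothing factors and three learning rates, the sketch evaluates six configurations but runs the smoothing stage only twice. Launching each configuration as an independent dataflow would run it six times.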

Stateful Data-Parallel Processing

Large-scale data processing systems depend on stateless dataflows to extract data parallelism and execute programs with fault tolerance. Many applications that require explicit access to state cannot be executed efficiently in such systems. Stateful data-parallel processing makes it possible to execute stateful programs efficiently while keeping the data parallelism and fault tolerance properties of traditional dataflow systems. In addition, with state as part of the application, we can translate imperative programs into stateful dataflow graphs that execute on a stateful data-parallel processing system.
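
As a rough illustration of how explicit state can coexist with data parallelism and fault tolerance, the Python sketch below partitions an operator's state by key and recovers it from a checkpoint. The class and method names (PartitionedCounter, checkpoint, restore) are hypothetical and chosen for this example; they do not describe the interface of any particular system.

```python
import copy
import zlib


class PartitionedCounter:
    """Counting operator whose state is split across partitions by key."""

    def __init__(self, num_partitions):
        # One independent state dictionary per partition, so partitions
        # could be processed in parallel by different workers.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # Deterministic hash so a key always maps to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def process(self, key, value=1):
        """Update the state for one input record and return the new count."""
        state = self._partition_for(key)
        state[key] = state.get(key, 0) + value
        return state[key]

    def checkpoint(self):
        """Take a snapshot of all partition state."""
        return copy.deepcopy(self.partitions)

    def restore(self, snapshot):
        """Roll the operator back to a previous snapshot after a failure."""
        self.partitions = copy.deepcopy(snapshot)

    def totals(self):
        """Merge all partitions into a single view of the state."""
        merged = {}
        for state in self.partitions:
            merged.update(state)
        return merged
```

Because each key lives in exactly one partition, updates to different partitions do not conflict, and restoring a snapshot returns the operator to the state it had when the checkpoint was taken.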

Mountain View, CA, USA

UC3M: Researcher at FP7 Project

Madrid, Spain

Others

contxt.in

contxt helps you discuss news with the people
you care about. Information overload means that we do not have time to process
the seemingly infinite streams of news we receive every day. contxt helps to
tame this overload by curating news from the many different data sources that
interest you (Facebook, Twitter, LinkedIn, feeds, etc.) and offering them as
concise summaries. You can then start private conversations about the pieces
of news you find most interesting with the people you choose. By curating
multiple sources of news and trusting your friends, contxt helps you stay up to
date without effort.

Ecana

I co-founded this company (Ecana Sistemas de Informacion SL) to help
wineries improve their production processes. Ecana acquired data from
sensors deployed in vineyards, weather stations and human-provided
knowledge. We then transformed that data into information that is useful to
people and visualised it in dashboards. The
goal was to keep winemakers up to date on what was going on in their
winery and alert them to important events, such as a rising probability
of frost or disease.