My primary interest is in large-scale statistical natural
language transformation models and their applications. This
includes paraphrasing, shallow semantic parsing, text-to-text
generation, entailment recognition, and machine translation. I
work on grammar acquisition, as well as on decoding approaches and
algorithms. I'm also curious about efficient processing of vast
amounts of data, particularly randomized and approximate
algorithms, probabilistic data structures, and online methods. I'm
quite convinced that semi-supervised learning is a pretty good
idea.

I did internships with the Google Translate team in Mountain
View twice (summers of 2010 and 2011) and with the Microsoft
Research NLP group (summer of 2012). At Google I worked with
Ashish Venugopal, David Talbot, and Jakob Uszkoreit. My project
at MSR was a collaboration with Chris Quirk and Bill Dolan.

My legal name, as per my passport, is Jurij Ganitkevic. It's
the result of an unfortunate transliteration accident and I much
prefer the old spelling of my name that you see above. I continue
to use it in publications, and generally wherever I can get away
with it.

Projects

I'm the main developer of PPDB, a large-scale collection
of automatically induced paraphrases.

I'm also involved in the Joshua decoder, an open-source
statistical machine translation and paraphrasing system developed
at JHU and written in Java. We're trying to make it easily
accessible. Have a go.

I am one of the main contributors to Thrax, a sub-project of
Joshua developed by Jonny Weese. Thrax is a fast, Hadoop-based
grammar extractor for synchronous context-free grammars, used for
both translation and paraphrasing. It supports Hiero-style
grammars as well as grammars with rich syntactic labels. It also
contains modules for extracting distributional context signatures
over large corpora such as Annotated Gigaword. It's open-source as
well, so come and lend a hand.

Finally, I occasionally do some work on the cdec decoder, another
open-source statistical machine translation system. This one is
written by CMU's Chris Dyer.