July 30 - 31, 2011, Edinburgh, UK

Joshua is an open-source MT system developed at Johns Hopkins University. It uses a hierarchical phrase-based translation model.
Below are step-by-step instructions. The list may look long at first glance, but it should make building a machine translation system and all its components straightforward, and it should make the process of tuning, testing, and evaluating the system transparent.

These instructions are adapted from Chris Callison-Burch's Joshua guide. Further instructions and documentation for Thrax, the translation model extractor, can be found on its GitHub wiki.

At this point you need to set some environment variables:
export SRILM=/path/to/srilm
export JAVA_HOME=/Library/Java/Home  # on OS X; the path differs on other OSes
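
Before moving on, it can help to confirm that both variables point at existing directories. The following is a minimal sketch, not part of the Joshua distribution; the helper name check_dir_var is hypothetical and introduced here only for illustration.

```shell
# Hypothetical helper (not part of Joshua): succeed only if the named
# variable is set and its value is an existing directory.
check_dir_var() {
    name=$1
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to '$value', which is not a directory" >&2
        return 1
    fi
    echo "$name=$value"
}

# Check the two variables set above and report any that need fixing.
for var in SRILM JAVA_HOME; do
    check_dir_var "$var" || echo "Please fix $var before building." >&2
done
```

If either check fails, correct the corresponding export line and run the check again.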

Get the Joshua 1.3 tarball. You can install it with:
tar xzf joshua.tar.gz
cd joshua
ant
If ant returns successfully, the decoder is ready to use. To build translation models from the training data, however, we recommend using Thrax. (If you're also following Chris Callison-Burch's guide, Thrax replaces step 5.)
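
Whether ant "returns successfully" is determined by its exit status, which is zero on success. As a rough sketch, a small wrapper can make that status visible for each step; the name run_step is hypothetical and not something Joshua provides.

```shell
# Hypothetical helper: run a command and report success or failure
# based on its exit status, passing that status back to the caller.
run_step() {
    desc=$1
    shift
    if "$@"; then
        echo "OK: $desc"
    else
        status=$?
        echo "FAILED (exit $status): $desc" >&2
        return "$status"
    fi
}

# Example use with the commands above (left commented out because it
# needs the tarball in the current directory):
#   run_step "unpack Joshua" tar xzf joshua.tar.gz &&
#       cd joshua &&
#       run_step "build Joshua" ant
```

Chaining the steps with && stops at the first failure, so a broken unpack never reaches the build.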
To install Thrax:

Download and unpack the Hadoop tarball (or get access to a Hadoop cluster):
wget http://apache.cs.utah.edu//hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
tar -xzf hadoop-0.20.2.tar.gz
If you don't have a cluster, some basic Hadoop setup for standalone mode is here.

Download and unpack the Amazon Web Services SDK. This is a compilation requirement for Thrax, even though you don't necessarily have to use Amazon's cloud services to run it.
wget http://ds60ft5bv5jal.cloudfront.net/aws-java-sdk-1.1.3.zip
unzip aws-java-sdk-1.1.3.zip
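
If you end up re-running these steps, the downloads can be skipped when the archive is already in the current directory. This is a minimal sketch under that assumption; fetch_once is a hypothetical helper name, and it relies only on plain wget as used above.

```shell
# Hypothetical helper: download a URL with wget unless a file with the
# same name already exists in the current directory.
fetch_once() {
    url=$1
    # Strip everything up to the last slash to get the file name.
    file=${url##*/}
    if [ -f "$file" ]; then
        echo "$file already present, skipping download"
    else
        wget "$url"
    fi
}

# Example use with the two archives from the steps above (commented out
# so that defining the helper does not trigger any network access):
#   fetch_once http://apache.cs.utah.edu//hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
#   fetch_once http://ds60ft5bv5jal.cloudfront.net/aws-java-sdk-1.1.3.zip
```

Note that the check only looks at the file name, so a partially downloaded archive should be deleted before retrying.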