Torch vs TensorFlow vs Theano

For an ongoing project at CCRi, we wanted to determine whether remaining with Torch (used for Phase I of the project, which runs on GPUs) or switching to TensorFlow or Theano made the most sense for Phase II. We ultimately found that TensorFlow’s combination of performance and usability made it the best choice as we move into Phase II.

As always with such tests, newer versions of any components used can make these results increasingly dated, but it was interesting to compare the current state of the art of the three frameworks.

All benchmarks were run on boxes using a single Pascal Titan X graphics card with CUDA 8 and version 5.1 of cuDNN, NVIDIA’s CUDA Deep Neural Network library.

fp16 vs fp32 summary: fp16 uses less RAM, but is slower per sample. The only case I can see for using fp16 on a Pascal Titan X is if the model plus a single batch would otherwise be too large to fit in GPU memory.

Existing benchmarks summary: unless you’re Nervana, if you use cuDNN everything is basically the same.

For CNN layers, there are the soumith convnet benchmarks, which are well documented and kept up to date. Unfortunately, they don’t include Theano in the more recent runs. I am not entirely sure what the fp16 benchmark mentioned there is measuring, exactly; my guess (since it was run on an older Titan X) is that fp16 is simulated and hacked in rather than using native fp16 support.

For RNN layers (including LSTMs), there are the glample rnn benchmarks, which as of this writing date back to May 2016 and use TensorFlow 0.8, while 0.11 is currently available. We reran these tests on more recent software; the results are below.

Model:

A single LSTM layer:

nn.SeqLSTM for Torch (updated version as of Nov 18, 2016)

tf.nn.rnn_cell.LSTMCell for TensorFlow 0.11 (see the sketch after this list)

scan from Theano 0.8.2 (note: this version is built against cuDNN 5 rather than cuDNN 5.1, so there may be some mismatch, but it still ran quite quickly)

GPU:

1 Pascal Titan X

Drivers:

CUDA 8 with cuDNN 5.1
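
To make concrete what the numbers below measure, here is a rough sketch of how the TensorFlow side of the benchmark might be set up. This is an illustrative reconstruction, not the exact harness we ran; the shapes, iteration count, and dummy loss are assumptions matching the first row of the table.

```python
import time
import numpy as np
import tensorflow as tf

# Assumed shapes corresponding to the first row of the table below.
seq_len, batch_size, hidden_size, input_size = 30, 32, 128, 128

# A single LSTM layer, run over the sequence with dynamic_rnn (TensorFlow 0.11 API).
inputs = tf.placeholder(tf.float32, [batch_size, seq_len, input_size])
cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# A dummy loss so the backward pass has something to differentiate.
loss = tf.reduce_sum(outputs)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

data = np.random.randn(batch_size, seq_len, input_size).astype(np.float32)
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # pre-1.0 initializer
    sess.run(outputs, {inputs: data})        # warm-up

    start = time.time()
    for _ in range(100):
        sess.run(outputs, {inputs: data})
    fwd = 100 * batch_size / (time.time() - start)

    start = time.time()
    for _ in range(100):
        sess.run(train_op, {inputs: data})
    fwd_bwd = 100 * batch_size / (time.time() - start)

    print("forward: %.0f samples/sec, forward+backward: %.0f samples/sec" % (fwd, fwd_bwd))
```

The Torch and Theano versions follow the same pattern, with nn.SeqLSTM and scan doing the work, respectively.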

Summary:

Generally, for just the forward pass, Torch > Theano > TensorFlow.

For forward + backward, it seems that Theano > Torch > TensorFlow. Torch and Theano are generally about the same here, except for smaller batch sizes with larger numbers of hidden units, where Theano crushes Torch and TensorFlow.

As the batch size and hidden layer size grow, the differences between the frameworks shrink. This is not surprising, as more of the work is handed off to CUDA kernels (via cuDNN), which are the same across the board.

| framework | sequence length | batch size | hidden layer size | forward samples / sec | forward + backward samples / sec |
|---|---|---|---|---|---|
| Torch | 30 | 32 | 128 | 22110 | 4849 |
| TensorFlow | 30 | 32 | 128 | 2778 | 1410 |
| Theano | 30 | 32 | 128 | 15462 | 5440 |
| Torch | 30 | 32 | 512 | 6722 | 1582 |
| TensorFlow | 30 | 32 | 512 | 2155 | 1285 |
| Theano | 30 | 32 | 512 | 7127 | 1874 |
| Torch | 30 | 32 | 1024 | 3618 | 864 |
| TensorFlow | 30 | 32 | 1024 | 1790 | 888 |
| Theano | 30 | 32 | 1024 | 4421 | 1143 |
| Torch | 30 | 128 | 128 | 74897 | 15131 |
| TensorFlow | 30 | 128 | 128 | 8656 | 5411 |
| Theano | 30 | 128 | 128 | 53953 | 14491 |
| Torch | 30 | 128 | 512 | 27781 | 7335 |
| TensorFlow | 30 | 128 | 512 | 6421 | 4238 |
| Theano | 30 | 128 | 512 | 23037 | 6514 |
| Torch | 30 | 128 | 1024 | 10524 | 3090 |
| TensorFlow | 30 | 128 | 1024 | 4753 | 2702 |
| Theano | 30 | 128 | 1024 | 9679 | 2751 |
| Torch | 60 | 32 | 128 | 11126 | 2364 |
| TensorFlow | 60 | 32 | 128 | 1353 | 879 |
| Theano | 60 | 32 | 128 | 5538 | 3092 |
| Torch | 60 | 32 | 512 | 3344 | 785 |
| TensorFlow | 60 | 32 | 512 | 1272 | 811 |
| Theano | 60 | 32 | 512 | 3951 | 1060 |
| Torch | 60 | 32 | 1024 | 1810 | 428 |
| TensorFlow | 60 | 32 | 1024 | 1009 | 467 |
| Theano | 60 | 32 | 1024 | 2339 | 613 |
| Torch | 60 | 128 | 128 | 37693 | 7575 |
| TensorFlow | 60 | 128 | 128 | 5278 | 3328 |
| Theano | 60 | 128 | 128 | 31076 | 8702 |
| Torch | 60 | 128 | 512 | 13966 | 3676 |
| TensorFlow | 60 | 128 | 512 | 4057 | 2691 |
| Theano | 60 | 128 | 512 | 12505 | 3649 |
| Torch | 60 | 128 | 1024 | 5248 | 1543 |
| TensorFlow | 60 | 128 | 1024 | 2695 | 1423 |
| Theano | 60 | 128 | 1024 | 4366 | 1409 |

Fluffy Metrics

Usability

Because these are developer tools, we reviewed usability in terms of the Python interfaces for TensorFlow and Theano and the Lua interface for Torch.

Writing Code

Generally, the ease or difficulty of using Torch versus TensorFlow comes down to the choice of language. Nearly everyone has Python experience these days, whereas Lua experience is rarer, and the lack of many basic functions in the Lua language further raises the barrier to entry for new users picking up and coding in the environment. Theano requires a paradigm shift in how you think about writing the code, which makes it more verbose and complicated in general.
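
As a toy illustration of that paradigm shift (not code from the project; the shapes are made up), Theano has you declare symbolic variables, build an expression graph, and compile it into a function before any data flows through it:

```python
import numpy as np
import theano
import theano.tensor as T

# Step 1: declare symbolic inputs and parameters (no data involved yet).
x = T.matrix('x')
W = theano.shared(np.random.randn(128, 64).astype(theano.config.floatX), name='W')
y = T.tanh(T.dot(x, W))

# Step 2: compile the expression graph into a callable function.
forward = theano.function(inputs=[x], outputs=y)

# Step 3: only now does actual data flow through the compiled graph.
out = forward(np.random.randn(32, 128).astype(theano.config.floatX))
print(out.shape)  # (32, 64)
```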

The neural network libraries built on top of Torch (nn, rnn, …) and on top of TensorFlow/Theano (Keras), however, seem roughly equivalent in structure, so we expect them to present a similar barrier to entry for new users constructing their own models.
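
For example, a hypothetical toy Keras model like the following is written the same way regardless of which backend does the work underneath (the layer sizes here are placeholders for illustration):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

# The same definition runs on either backend; ~/.keras/keras.json (or the
# KERAS_BACKEND environment variable) decides whether Theano or TensorFlow
# executes it.
model = Sequential()
model.add(LSTM(128, input_shape=(30, 64)))   # 30 timesteps, 64 features
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()
```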

Reading Code

With the exception of raw Theano, both the raw frameworks and the neural network libraries built on top of them are relatively straightforward to read and follow, with small syntactic differences here and there, plus the occasional confusing bit that the user can take on good faith is there for a reason (“Why does collectgarbage() get called twice in a row here?”). Of course, if you only ever interact with Theano through Keras, then it doesn’t really matter how different raw Theano is.

Debugging

TensorFlow: Lots of tools. You can fetch whichever elements of the graph you want from a session run and inspect their values, and TensorBoard provides visualization and organization. (See the short example after this list.)

Torch: Debugging can be done using standard debug tools. Breakpoints can be set in your own code and in library code, and variables can be inspected at each trigger.

Theano: My experience with this is not recent, but Theano has historically been known to be a pain to debug.
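
As a small example of the TensorFlow side (assuming the pre-1.0 summary API; the tensor names and log directory are made up), you can pull any intermediate tensor out of a session run and write summaries for TensorBoard at the same time:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 4], name='x')
hidden = tf.nn.relu(tf.matmul(x, tf.Variable(tf.random_normal([4, 8]))), name='hidden')
logits = tf.matmul(hidden, tf.Variable(tf.random_normal([8, 2])), name='logits')

# Record a histogram of the hidden activations for TensorBoard (TF 0.x summary API).
tf.histogram_summary('hidden_activations', hidden)
summaries = tf.merge_all_summaries()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    writer = tf.train.SummaryWriter('/tmp/tf_logs', sess.graph)

    # Fetch whichever graph elements you want to inspect in a single run call.
    h_val, l_val, summ = sess.run([hidden, logits, summaries],
                                  {x: [[1.0, 2.0, 3.0, 4.0]]})
    writer.add_summary(summ, 0)
    print(h_val, l_val)
```

Running `tensorboard --logdir /tmp/tf_logs` then shows the graph and the recorded summaries.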

Moving to Production

My understanding is that any of these could be run in a Docker container, which probably makes for the easiest deployment. Aside from that, one of the biggest difficulties with Torch is that the maintainers don’t actually cut releases of any of their code, so your dependencies amount to “whatever copy of Torch I have now.”

2 Responses

“We ultimately found that TensorFlow’s combination of performance and usability made it the best choice as we move into Phase II.”

Seems like a surprising choice based on this data showing TensorFlow is slower than Torch or Theano, and significantly so in some cases. Was it primarily the community support / tooling / “debugging” aspects that pushed you toward TensorFlow?

Tim Emerick

In the end, the decision came down to three things:
– The tooling and support, as you mention.
– The existence and frequency of regular release versions.
– The note above that the difference between frameworks for forward + backward is far less significant at larger batch and hidden layer sizes; this is really key. Our models tend to live in that space, so the speed gap at smaller batch and/or hidden layer sizes is less of a concern for training in our case.

The above comments apply to model training. For model use after training, there is still a significant speed concern even at larger layer and batch sizes. If this isn’t improved and becomes a significant bottleneck, we may need to revise our approach slightly: for example, building and training our models in TensorFlow so that we have access to all the tooling and the other TensorBoard metrics (which are more useful during training than during usage), then serializing the model and reloading it as a Theano model (Keras largely supports switching backends between training and usage). This would significantly speed up our forward pass operations, but would also increase deployment challenges.
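
A rough sketch of what that train-in-TensorFlow, serve-in-Theano workflow could look like with Keras (the architecture, shapes, and file name are placeholders; in practice the two halves would run in separate environments with different KERAS_BACKEND settings):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model():
    # The same architecture is rebuilt identically under either backend.
    model = Sequential()
    model.add(LSTM(512, input_shape=(30, 64)))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy')
    return model

# Training environment (KERAS_BACKEND=tensorflow): fit the model with the
# TensorFlow tooling available, then serialize the learned weights to HDF5.
model = build_model()
# model.fit(X_train, y_train, ...)
model.save_weights('lstm_weights.h5')

# Serving environment (KERAS_BACKEND=theano): rebuild the architecture and load
# the saved weights, so the faster Theano forward pass handles inference.
serving_model = build_model()
serving_model.load_weights('lstm_weights.h5')
# predictions = serving_model.predict(X_new)
```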