Quantifying the Energy Efficiency of Object Recognition and Optical Flow

THIS REPORT HAS BEEN WITHDRAWN

Michael Anderson, Forrest Iandola and Kurt Keutzer

In this report, we analyze the computational and performance aspects of current state-of-the-art object recognition and optical flow algorithms. First, we identify important algorithms for object recognition and optical flow, then we perform a pattern decomposition to identify key computations. We include profiles of the runtime and energy efficiency (GFLOPS/W) for our implementation of these applications on a commercial architecture. Finally, we include an analysis of memory-bandwidth boundedness for optical flow to identify opportunities for communication-avoiding algorithms.

Our results were measured on an Intel i7-4770K (Haswell) reference platform. A five-layer convolutional neural network used for object classification achieves 0.70 GFLOPS/W, which is 21% of the theoretical compute bound for this Haswell processor. On the Horn-Schunck, Lucas-Kanade, and Brox optical flow methods our implementations achieve 0.0338, 0.0103, and 0.0203 GFLOPS/W respectively. Our implementation achieves 7.9% of the theoretical bandwidth bound, assuming no cross-iteration memory optimization, for Horn-Schunk optical flow using the Jacobi solver, and 9.7% of the bandwidth bound for the conjugate-gradient solver. To improve performance, we will focus first on increasing bandwidth utilization, then on doing cross-iteration memory optimizations such as blocking and tiling the Jacobi solver and employing communication-avoiding linear solvers.

We also compare the runtime-accuracy tradeoffs for each optical flow method. We find that each method has distinct advantages over the other methods in terms of the runtime-accuracy tradeoff, so we will continue to develop and support all three methods in the future.