(1) We improved the soft attention model by introducing convolutional
operations inside the LSTM cell and the attention map generation process to capture the spatial layout.
(2) We built a hierarchical two-layer LSTM model for action recognition. (3) We tested our model on
three widely used datasets, the UCF sports dataset, the Olympic dataset and the HMDB51 dataset,
and obtained improved results over other published work.
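
To make contribution (1) concrete, the following is a minimal NumPy sketch of a single time step in which both the attention map and the LSTM gates are computed by convolutions, so that the hidden state keeps its spatial layout. It is an illustration under stated assumptions and not the exact formulation used in this work: the function names, the `params` dictionary, the 3x3 kernels, the gate ordering, the absence of bias terms, and the choice to compute attention from the concatenated input and previous hidden state are all hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix channels at kernel offset (i, j) for every spatial location.
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_softmax(score):
    """Normalize a (1, H, W) score map so that it sums to 1 over all locations."""
    e = np.exp(score - score.max())
    return e / e.sum()

def conv_attention_lstm_step(x, h, c, params):
    """One step of a convolutional LSTM with a convolutional soft attention map.
    x: (C_in, H, W) input features; h, c: (C_hid, H, W) previous states.
    Assumed layout: params['w_att'] is (1, C_in + C_hid, k, k),
    params['w_gates'] is (4 * C_hid, C_in + C_hid, k, k)."""
    # Attention scores come from a convolution, so each score sees a local neighborhood.
    att = spatial_softmax(conv2d_same(np.concatenate([x, h], axis=0), params['w_att']))
    x_att = x * att  # the (1, H, W) map is broadcast over input channels
    # All four gates are produced by one convolution and then split (assumed order: i, f, o, g).
    z = conv2d_same(np.concatenate([x_att, h], axis=0), params['w_gates'])
    i, f, o, g = np.split(z, 4, axis=0)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new, att

# Toy usage with random weights (sizes are arbitrary).
rng = np.random.default_rng(0)
C_in, C_hid, H, W, k = 3, 4, 5, 5, 3
params = {
    'w_att': 0.1 * rng.standard_normal((1, C_in + C_hid, k, k)),
    'w_gates': 0.1 * rng.standard_normal((4 * C_hid, C_in + C_hid, k, k)),
}
x = rng.standard_normal((C_in, H, W))
h0 = np.zeros((C_hid, H, W))
c0 = np.zeros((C_hid, H, W))
h_new, c_new, att = conv_attention_lstm_step(x, h0, c0, params)
```

Because every operation is a convolution or an element-wise product, `h_new`, `c_new` and `att` all retain the H x W grid of the input, which is the property a fully connected LSTM with vectorized features would lose.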