Paper summary by isarandi

* Presents an architecture dubbed ResNeXt
* They use modules built of:
  * a 1x1 conv
  * a 3x3 grouped conv that keeps the depth (number of channels) constant. It works like an ordinary conv, except that it is not fully connected along the depth axis: each output channel is connected only to the input channels in its own group
  * a 1x1 conv
  * plus a skip connection from the module input
* Advantages:
  * Fewer parameters, since channels are fully connected only within each group
  * Allows more feature channels in exchange for more aggressive grouping
  * Better performance at a constant number of parameters
* Questions/Disadvantages:
  * Instead of keeping the number of parameters constant, why not aim for constant memory consumption? More feature channels require more RAM for the activations, even if the connections are sparser and there are therefore fewer parameters.
  * Only a modest improvement over ResNet
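One rough way to see the parameter trade-off in the lists above is to count the convolution weights of a single module. The plain-Python sketch below does this under stated assumptions: the sizes (256 channels in and out, a 64-wide bottleneck with a dense 3x3 conv versus a 128-wide bottleneck whose 3x3 conv uses 32 groups) are chosen for illustration, biases and batch-norm parameters are ignored, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution; biases are ignored."""
    # Each output channel sees only the c_in / groups input channels of its group.
    assert c_in % groups == 0 and c_out % groups == 0
    return k * k * (c_in // groups) * c_out


def bottleneck_params(channels, width, groups=1):
    """1x1 reduce -> 3x3 (optionally grouped) conv -> 1x1 expand.

    The skip connection is an identity here, so it adds no weights.
    """
    return (conv_params(channels, width, 1)
            + conv_params(width, width, 3, groups)
            + conv_params(width, channels, 1))


def inner_activations(width):
    """Activations per spatial position inside the module (after the first 1x1 and after the 3x3)."""
    return 2 * width


# Illustrative configurations (assumed sizes):
resnet_like = bottleneck_params(256, 64)               # dense 3x3, 64 inner channels
resnext_like = bottleneck_params(256, 128, groups=32)  # 32 groups of 4 channels each
dense_wide = bottleneck_params(256, 128)               # same width, no grouping

print(resnet_like, resnext_like, dense_wide)           # 69632 70144 212992
print(inner_activations(64), inner_activations(128))   # 128 256
```

Under these assumptions the grouped module doubles the inner width at roughly the same weight count (about 70k), whereas the same width without grouping costs about three times as many weights. The doubled inner activations are the memory cost raised in the question above.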

First published: 2016/11/16

Abstract: We present a simple, highly modularized network architecture for image
classification. Our network is constructed by repeating a building block that
aggregates a set of transformations with the same topology. Our simple design
results in a homogeneous, multi-branch architecture that has only a few
hyper-parameters to set. This strategy exposes a new dimension, which we call
"cardinality" (the size of the set of transformations), as an essential factor
in addition to the dimensions of depth and width. On the ImageNet-1K dataset,
we empirically show that even under the restricted condition of maintaining
complexity, increasing cardinality is able to improve classification accuracy.
Moreover, increasing cardinality is more effective than going deeper or wider
when we increase the capacity. Our models, named ResNeXt, are the foundations
of our entry to the ILSVRC 2016 classification task in which we secured 2nd
place. We further investigate ResNeXt on an ImageNet-5K set and the COCO
detection set, also showing better results than its ResNet counterpart. The
code and models are publicly available online.
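The abstract's "set of transformations with the same topology" and the summary's grouped conv describe the same computation. A minimal plain-Python sketch of that equivalence follows, under simplifying assumptions: each conv is treated as a per-position linear map (spatial extent, batch norm and the skip connection are left out), and the sizes and helper names (`aggregated`, `grouped_form`, `rand_matrix`) are hypothetical. Summing C small reduce -> transform -> expand branches gives the same output as one wide reduce layer, one block-diagonal (grouped) transform and one wide expand layer.

```python
import random


def matvec(W, x):
    """Dense matrix-vector product on plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0, a) for a in v]


def aggregated(x, branches):
    """Multi-branch form: sum of C branches, each reduce -> transform -> expand."""
    out = [0] * len(x)
    for reduce_w, transform_w, expand_w in branches:
        y = matvec(expand_w, relu(matvec(transform_w, relu(matvec(reduce_w, x)))))
        out = [o + yi for o, yi in zip(out, y)]
    return out


def grouped_form(x, branches):
    """Single-path form: wide reduce, grouped (block-diagonal) transform, wide expand."""
    d = len(branches[0][1])      # width of each branch
    width = d * len(branches)    # total inner channels = cardinality * d
    # Stack the reduce matrices: group g owns inner channels g*d .. g*d+d-1.
    reduce_all = [row for reduce_w, _, _ in branches for row in reduce_w]
    # Block-diagonal transform: channels connect only within their own group.
    transform_all = [[0] * width for _ in range(width)]
    for g, (_, transform_w, _) in enumerate(branches):
        for i in range(d):
            for j in range(d):
                transform_all[g * d + i][g * d + j] = transform_w[i][j]
    # Concatenate the expand matrices column-wise so one matvec sums the branches.
    expand_all = [[v for _, _, expand_w in branches for v in expand_w[r]]
                  for r in range(len(x))]
    h = relu(matvec(reduce_all, x))
    h = relu(matvec(transform_all, h))
    return matvec(expand_all, h)


def rand_matrix(rows, cols):
    # Small integer weights keep the arithmetic exact.
    return [[random.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
C, d, channels = 4, 2, 8  # hypothetical cardinality, branch width, module I/O channels
branches = [(rand_matrix(d, channels), rand_matrix(d, d), rand_matrix(channels, d))
            for _ in range(C)]
x = [random.randint(-3, 3) for _ in range(channels)]
print(aggregated(x, branches) == grouped_form(x, branches))  # True
```

Integer weights make the two forms comparable with `==` instead of a floating-point tolerance. The single-path form is the one that maps onto a standard grouped-convolution layer.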
