In that case, the vector-matrix multiplication will be executed using a BLAS
kernel (if ETL is configured correctly), and the assignment, the sigmoid and the
addition will be automatically vectorized to use either AVX or SSE, depending
on the machine.
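
To make the operations mentioned above concrete, here is a minimal scalar sketch of what such an expression computes: a vector-matrix product, a bias addition, and an element-wise sigmoid. This is plain C++ and not ETL code. The function name `sigmoid_affine`, the row-major layout and the use of `std::vector` are assumptions made only for illustration. ETL would dispatch the product to BLAS and vectorize the rest instead of running loops like these.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, NOT part of ETL: computes
// y = sigmoid(b + x * W) with plain scalar loops.
// x has n elements, W is an n x m matrix stored row-major, b has m elements.
std::vector<float> sigmoid_affine(const std::vector<float>& x,
                                  const std::vector<float>& W,
                                  const std::vector<float>& b) {
    const std::size_t n = x.size();
    const std::size_t m = b.size();

    // Start from the bias, so the addition is folded into the accumulation
    std::vector<float> y(b);

    // Vector-matrix product: y[j] += sum over i of x[i] * W[i][j]
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            y[j] += x[i] * W[i * m + j];
        }
    }

    // Element-wise logistic sigmoid
    for (std::size_t j = 0; j < m; ++j) {
        y[j] = 1.0f / (1.0f + std::exp(-y[j]));
    }

    return y;
}
```

The inner loops are the part a BLAS kernel would replace, and the final loop is the kind of element-wise work that lends itself to SSE or AVX vectorization.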

This will automatically be computed either with NVIDIA cuDNN (if available) or
with optimized SSE/AVX kernels.

For more information, you can take a look at the Reference on the wiki.

Next version

For the next version, I'll focus on several things:

Improve matrix-matrix multiplication kernels when BLAS is not available. There
is a lot of room for improvement here.

Complete support for symmetric matrices (currently experimental)

Maybe some new adapters such as Hermitian matrices

GPU improvements for some operations that can be done entirely on GPU

New convolution performance improvements

Perhaps more complete parallel support for some implementations

Drop support for some compilers in order to use full C++14

Download ETL

You can download ETL on GitHub. If you
are only interested in the 1.0 version, you can look at the
Releases page or clone the tag
1.0. There are several branches:

master is the eternal development branch; it may not always be stable

stable is a branch that always points to the last tag; no development happens here

For future releases, there will always be tags pointing to the corresponding
commits. I'm not following the git flow way; I'd rather try to have a more
linear history with one eternal development branch than a useless
develop branch or a load of other branches for releases.

Don't hesitate to comment on this post if you have any comments or questions
about this library. You can also open an Issue on GitHub if you have a problem
using the library, or propose a Pull Request if you have a contribution you'd
like to make.