Using Accelerate and simd

Learn how to use sophisticated Signal and Image Processing techniques to bring higher performance to your apps while lowering battery consumption. See compelling use cases for the Accelerate framework with interactive demos. Explore using simd—a valuable addition that effortlessly brings vector programming to your apps.

WWDC 2017

WWDC 2016

WWDC 2015

WWDC 2014

Good morning.
How is everybody?
Good.
My name is Matthew Badin, and
I'm an engineer at Apple.
Welcome to the Using Accelerate
and simd session.

So my colleague Luke Chang and I
are pretty excited today.
We get to talk to you about all
the great APIs that are
available in Accelerate and its
associated frameworks, and we're
going to start by taking a
high-level overview of
Accelerate and some of the
high-performance libraries that
are contained inside it.
We're then going to take a deep
dive into a few of libraries,
and we're going to start by
taking a look at vDSP.

We have two examples for you.
The first one, we're going to
show you how to extract signal
from noise, and then we're going
to show you how you can remove
certain types of artifacts from
an image.
We're then going to take a look
at simd, and we're going to show
you how you can use quaternions
to represent rotations in three
dimensions.

I'm then going to pass off the
presentation to my colleague,
Luke Cheng, who's going to show
you some of the interesting
things that you can do with
vImage.

So with that, let's get started.
So you're probably asking
yourself exactly this.

What on Earth is Accelerate?
So the primary purpose of
Accelerate is to provide
thousands of low-level math
primitives, and we provide these
primitives across all of Apple's
platforms.
So this includes not only iOS
and macOS but watchOS and tvOS
as well.
Most of these primitives are
hand-tuned to the
microarchitecture of the
processor.
So this means we get excellent
performance.
And this performance translates
directly into energy savings.

So if you're an app developer
and you use the Accelerate
framework, not only will your
application run faster, but
you'll also use less battery
life.

This means your users are going
to have an overall better
experience.

Now, because we provide so many
primitives, we find it useful to
organize or group them into
domain-specific libraries.
So, for instance, we group all
the signal processing primitives
under vDSP.
So, for instance, this would be
your FFTs or your DFTs and your
DCTs, your fast Fourier
transforms, your discrete cosine
transforms.

VImage contains the image
processing primitives.
If you're doing color space
conversions, this is the library
for you.
VForce contains vector versions
of the transcendental functions.
So for instance sine and cosine.
We also have support for dense
linear algebra as well as sparse
linear algebra, and we have a
specialized library for neural
networks called BNNS.
It stands for Basic Neural
Network Subroutines.

Not strictly part of the
Accelerate framework but very
closely related, and we find
these libraries very useful,
include libraries like simd,
which is a vector programming
aid for the CPU, and
Compression, which contains
several different lossless data
compression algorithms.
So let's take a look at our
first library.

We're going to take a look at
vDSP.
VDSP is a state-of-the-art
signal processing library, and
it contains a wide range of
signal processing primitives.

These range from basic
arithmetic operations on arrays
such as add and subtract as well
as more complicated operations
like convolution and FFTs.
If you're a successful app
developer and maybe in the past
you have avoided FFTs, I want to
show you how we make that easy
with Accelerate.
With just a few lines of code,
you can make that work.

And I have an example.
I'm going to show you how to
extract signal from noise.

So what we have here is an audio
signal.
This is the base signal.

We haven't added any noise to it
yet, and you'll notice I have
two sliders at the bottom.

In the lower left I have a
slider that lets me add noise,
and you can see that.

And I also have a second slider
that let's me remove this noise
called Threshold.

You'll also notice I have a
toggle switch in the lower
right.

This lets me look at this under
a different domain.
So currently we're looking at
this signal under the time
domain.
We're going to do some analysis
and see this signal in the
frequency domain.
You can see all the spikes on
the left.
Those are the frequency
components of the signal.

You'll also notice the blue bar.
The blue bar is the threshold
slider, so you can see I can
move it.
So what I'm going to do now is
add some more noise to the
signal.
Actually, let's add quite a bit
of noise.

So you can see the signal that
I'm still interested in is
represented by the spikes on the
left or another way of looking
at it is the spikes with the
tallest height, and what we've
added, the type of noise we're
adding to this, is background
noise.

So now I'm going to remove this
noise, and the way I'm going to
do this is I'm going to move
this threshold slightly higher.
As I do this, what's happening
is, behind the scenes, is we're
identifying any frequency
component less than this
threshold and removing it.

We're saying effectively if
there is any spike with a height
less than this blue bar, we're
going to pretend it's noise and
remove it.
So if I keep sliding this
higher,
eventually I've removed all the
noise.

And if I go back to the time
domain, you can see that I
removed the noise.

And in case you don't believe
me, we can remove the threshold,
and this is what this looks like
with all the noise still added.
So let me show you how we do
that.

At a high level what we're going
to do is first perform an
analysis on a signal.

That's what the switch let me
do.
We're then going to identify the
frequency components that
represent the noise and remove
them.

After we have done that, we're
going to reconstruct the audio
signal.

So let me show you some code.
What we're going to use here is
a discrete cosine transform or a
DCT.
You can see here the DCT
CreateSetup.

This context is going to
describe the type of work we're
going to do as well as give us
space to perform this work.
In this case, we're going to use
a type 2 DCT.

We're then going to pass this
context to an execute function.
This will actually perform the
work.
Once we've performed the
analysis, we want to remove the
noise.
This right here is where the
magic happens.

This routine is going to
identify any frequency component
less than that threshold and
make it zero.
It's going to zero it out.
After we've done that, we want
to reconstruct the audio signal.
Again, we're going to need to
use CreateSetup to create
context.
In this case, we're going to use
a type 3 DCT to reconstruct the
signal.
We're then going to pass this
context DCT execute, and this
will actually perform the work
of reconstructing the audio
signal.

So what we've shown you is an
example of how you can remove
certain types of noise from an
audio signal using vDSP.
I want to show you an example of
how we can remove certain types
of noise from an image also
using vDSP.
In this case, we're going to
restore an old newspaper
photograph.
So what we've done is we've
taken this image, and we've
applied a two-tone screen to it.
So this could represent an old
newspaper photograph.
And what we're going to try to
do is remove this screen.

We're going to try to remove the
artifacts that you see.
Currently, we're in off, so
we're not doing anything.
What we're going to do is
actually take a sample of this
screen, and we're going to
create a mask from that sample
and then apply that to the image
to try to remove it.
So let me show you the first
attempt.

So what we're doing is we are
identifying a frequency
component at a certain
threshold, and any frequency
component higher than that
threshold we're going to remove.

And you see, if we set the
threshold too low, not only do
we remove the artifacts, but we
also remove quite a bit of the
image as well.
If we set it too high, you can
see that we didn't do much of
anything.
Medium seems about right.

Medium seems too correctly
identify the artifacts in the
image without damaging too much
of the image.
So let me show you how we do
that.

At a high level what we're going
to do is we're going to perform
an FFT on the image and the
sample.
We're then going to create a
mask from that sample and apply
it to the image.
Once we've done that, we're
going to reconstruct the
original image.
So let me walk you through some
code.

We're going to do an FFT.
This means it needs to be a
power of 2.

That's why you see the log2
call.
The 1024 x 1024 is the size of
the image.
We're then going to pass it to
fft2d zrop.

This is quite a mouthful.
The important part here is the
op stands for out of place.

So we're going to have to create
some temporary space just for
the result.

We're going to store in this
complex structure, and
effectively this says we're
going to store the complex
number in two arrays with the
real component in one array and
the imaginary component in the
second.
We also need to specify a
direction.
In this case we're going to do a
forward FFT.

Now the artifact removal is a
little bit more advanced, so I'm
only going to go through this at
a very high level.
I recommend you download the
sample application that's
available online right now, and
each one of these routines is
also documented under vDSP, and
we have excellent online
documentation.
A high level what we're going to
do is we're going to identify
the magnitude of the frequency
components, in this case with
the sample.
We're then going to identify the
components we want to remove.

We're then going to create a
mask from that.
And once we've done that, we're
going to apply that mask to the
image.
Effectively what we're doing is
we're multiplying by 0 the
components we want to remove and
by 1 the components we want to
keep.
We're going to use zrop again to
reconstruct the image.

Because this is an FFT, we get
to reuse the context.
In this case, one key detail is
we're going to store the image
in two arrays because this is a
complex structure.

So the even pixels are going to
be in the real array, and the
odd pixels are going to be in
the imaginary.
And again we're going to specify
a direction.

In this case, we're going to use
an inverse FFT.
So now I want to shift gears for
a moment.
Previously we showed you two
examples, and we had two working
examples for vDSP, and then we
worked backwards, and we showed
you how we constructed these
examples.
We worked backwards and showed
you the moving pieces.

For this next library, what I
want to do is start at the basic
components.

I want to build up to a result.
So we're going to take a look at
simd, and we're going to start
at the basic low-level
primitives, and we're going to
build up to rotations.

In this case, rotations of 3D
objects.
At a high level, simd is an
abstraction for the vector
processing unit.
So what this lets you do is
declare vector and matrix
objects, and you get to perform
operations on these objects, and
this will map directly to the
vector hardware of the
processor.

So let me show you a coding
example.
So what we're going to do here
is we're going to take two
arrays, and we're going to
average the components.

So we're going to iteratively go
through each of the scaler
components, add them together,
and divide by 2.
This is going to be really slow.
Instead, you can declare these
arrays as simd float4 vector
types.
So then we can perform basic
arithmetic operations on these
objects.
So not only can you express the
computation more naturally, but
this will also run about as fast
as it can.

And this will work across all of
Apple's platforms.
Simd has a tremendous amount of
functionality.
In addition to vector and matrix
objects and allowing you to
perform arithmetic operations on
these objects.
It has extensions.

So for instance, dot products
and clamp.
It also has support for the
transcendental functions, so for
instance sine and cosine as well
as quaternions.

Quaternions are very useful for
representing rotations in three
dimensions, and I actually want
to talk a little bit more about
those.
So I'm going to walk you through
a coding example.
There's quite a bit to unpack
here.

So we're going to start on the
right.
We have a unit sphere.

That's what this gray sphere is.
And you'll notice this red dot.
And that's actually the tip of
this vector.
We've declared a simd float3
vector.

We've set the x and y components
to 0 and the z component to 1.
So it's coming out at us.

And that red dot is represented
by the tip of that vector.
We're now going to perform a
rotation on this vector using a
quaternion.
Technically, we're rotating the
entire scene, but for purposes
of illustration, we're going to
say we're rotating this vector.

When using a quaternion for
rotations, you need to specify
an axis and angle of rotation.

Or to put it another way, what
are you rotating about and by
how much.

So we're going to rotate about
the x axis, and we're going to
rotate pi over 3 radians up.

You apply the rotation by
calling the simd act function.
This applies the action of the
quaternion on that vector and
returns a rotated vector.
So let's take a look at that
now.
So normally you're not
interested in rotating along a
single axis.
You usually want to rotate along
multiple axes, and if you're
already familiar with rotation
matrices, this is going to seem
very natural.

Like rotation matrices, you can
combine the rotations using
multiplication, and also like
rotation matrices, the
multiplication is not
commutative.

So this means if you change the
order of the operands, you will
change the order of the
rotations.
So effectively what we're going
to do here is rotate pi over 3
radians up, and then pi over 3
radians to the right.
And we're going to combine that
into a single rotation.
Some of the more interesting
things you can actually do with
quaternions and simd is
interpolation, and we support
two types of interpolation.

The first is Slerp.
It stands for Spherical Linear
Interpolation, and there are
actually two variants of it.
We have simd slerp, which will
find the shortest arc between
these two points, in this case
between the blue and the green,
and we have simd slerp longest,
which will find the longest arc.
So you'll actually see it go
behind the unit sphere.

The second variant is Spline.
Spline is really more useful if
you have more than two
rotations.
So, for instance, here we're
going to interpolate between an
array of rotations, and there's
quite a bit of boilerplate code
here, so I want you to focus
just on the Spline call.
Effectively what we're doing is
just iterating through all the
individual rotations and
applying Spline.
So the thing you have to specify
with Spline is not only the two
rotations that you wish to
interpolate between but also the
previous and the next as well.
And this is what this looks
like.

So if you're a game developer,
you're probably not interested
in rotating individual vectors.

You're probably interested in
rotating objects.
So we have that for you.

We have a cube.
It's represented by multiple
vectors, and it's going to go
through a series of eight
rotations.
On the left, we're going to
trace those rotations using
Slerp, and on the right we're
going to use Spline.

So let me show you what Slerp
looks like.
And you can see because it's a
linear interpolation, every
single time it changes direction
you get these sharp corners.

Whereas if we look at Spline --
Because it's aware of the
previous and next rotation as
well, you end up with these
rounded corners.
So let's see that again.

So now I went through all those
topics pretty quickly, so as a
quick recap, we started by
taking a look at vDSP, and we
showed you two examples.
The first was how to extract a
signal from noise, and the
second was how to remove certain
types of artifacts from an
image.
We then took a look at simd, and
I showed you how you can
represent rotations in three
dimensions using quaternions.
I'm now going to pass off the
presentation to my colleague,
Luke Chang, who's going to show
you some of the interesting
things you can do with vImage.
Thank you, Matthew.

Hello everyone.
My name is Luke Chang.
I'm an engineer in Vector and
Numerics group.
Today, I'm going to talk about
vImage, what vImage offers, and
how easy it is to use vImage in
your apps.
With just a few lines of code,
you can create engaging video
effects in your app.
Let's get right to it.

VImage is our image processing
library.
It has several components.

The first component is the
conversion function.
Conversion function help you
move image between different
image formats.
Different image formats have
different advantages.
For example, RGV format matches
the pixels on your display, so
it's best for the display.
On the other hand, we have YCbCr
image, which is similar to how
human perceive the image.
Human eyes recognize brightness,
which is the luminance channel.

Also, the color, which is the
chrominance channel.
Also, camera uses YCbCr format
to capture images.
So converging function helps you
easily move images between these
formats.
Next, we have geometry function.
Geometry function changes the
size or orientation of the
image.
We have vImage scale that can do
enlarge or shrink the image.
We use Lanczos algorithm, so
we'll have high-quality output
after the operation.
We also have vImage rotate that
can rotate image clockwise or
counterclockwise.
Next, convolution function.
The most notable effect of
convolution function is the blur
effect.
You see blur effect all the
time, in UI, in photography.
If you want to phase something
into the background, you can
apply the blur function, blur
effect.
Next, transform function.

Transform function is basically
a matrix multiply.
It lets you operate on the data
channel of each pixel.
Let's say you want to increase
red or increase green, you can
do that with transform
functions.
Morphology.

Morphology changes the size or
shape of the objects in the
image, not the image itself.

We have vImage erode, vImage
dilate to make the object
smaller and bigger.

If you're feeling adventurous,
you can actually provide a
custom shape of kernels to these
functions, and vImage erode and
dilate will make the object
smaller or bigger according to
the kernel you provided.
Those are the five things in
vImage.

Now, I want to show you a demo
app that we wrote based on
vImage and show you what kind of
effect you can get from vImage.
What I have here is a lab that
captures images using the back
camera and put the image onto
the screen.
And we're doing in real-time.

This is a live stream, so you
can see the drinking bird doing
its motion next to the roses.

All right, the first effect I
want to show you is the color
saturation effect.

You can see this effect in a lot
of photo editing software.
So I want to bring out the
color.
What I can do is move this
slider to the right to make the
red really red and green really
green.
And on the other side, I have
white roses.
I feel that the color of white
roses is not that interesting to
me.
I would like to direct the focus
of my audience to the
composition and the contrast of
this image.
I can slide to the left to
desaturate the image to the
point that this becomes a black
and white image.

So now color is no more
distraction in this image, and
the viewer can focus on the
composition and contrast.
Okay. So, how do we do this?
There are several steps we need
to take.
First, of course, we have to get
the image from the camera, and
then we want to use vImage to
process, to apply the effect, so
we have to prepare the input and
output buffer for vImage.
Then we actually calling vImage
functions to apply these
effects, and we display the
output to the screen.
Let me jump ahead and talk about
how do we apply effects using
vImage functions.
The effect I show you is a color
saturation effect, and here is
the formula to do color
saturation.

Basically, we want to remove the
bias from the pixel, and using
multiplication to apply the
saturation effects, and then we
put the bias back to the pixel.
VImage has exactly the function
to do this operation, which is
vImage matrix multiply.
VImage matrix multiply takes the
preBias, in this case minus 128,
to remove the bias, and because
the saturation is [inaudible]
point and the image is an
integer, we want to convert this
saturation value first into
fixed point format.
We chose Q12 as the fixed point
format, hence a divisor of
0x100.
And then we have the postBias
128 times the divisor, just to
put the bias back to the pixel.
And the matrix itself is really,
really simple.

All we want to do is just doing
a scaling of CbCr channel.
So the matrix itself is just a
scaler, want to multiply CbCr
channel with this scaler.
We have all the information, so
let's call vImageMatrixMultiply,
and with just one line of code,
one function call, you can
achieve the saturation effect.
Now let me come back to other
steps that we need to take.

We need to take the image from
the camera.
How do we do that?
We need to write a delegate
method, and what camera gives us
is a CV image buffer.

So we get the buffer.
We have to make sure this buffer
is accessible to CPU.

That's where vImage lives.
After we apply the effects,
whatever effects that may be,
and we have to unlock the base
address of this pixel buffer so
that the camera can reuse this
piece of memory.
The second step, we have to
prepare the vImage input and
output buffer.
We already have this image in CV
image buffer.

All we need to do is just get
the information such as height
and width, and then we can
package this into a vImage
buffer object so it can be
consumed by vImage library.

We do this for luminance and
chrominance channel.
Now we need to prepare an output
buffer.
Remember, we don't have a piece
of memory allocated to the
output image yet, so we need to
do that, and vImage has a
convenience function, vImage
buffer in it, to do just that.
Given this height, width, and
bits per pixel, vImage buffer in
it will allocate a memory that's
large enough to hold this image
and then also create a vImage
buffer object so it can be
consumed by the vImage library.
Last step is put this process
image to the screen.
Like I said before, RGB is
really the best format for
display, so let's use the
conversion function to convert
YCbCr image into a RGB image.

And then because the UI expect
the cgImage object, we have to
create one.

There is a convenience function
in vImage.
vImageCreatesCGImage FromBuffer
that helps you to create a
cgImage based on the buffer you
already have in vImage.

One thing to note is that we're
not actually copying the large
data buffer in the image from
one place to another.
We're simply creating a cgImage
object that adds a container to
this image buffer.
So we are only filling in the
information that cgImage needs,
create a cgImage objects,
instead of copying data around.
Once we have that, we can send
the cgImage object to the image
view, and it will be displayed
on the screen.

So it's that simple.
Four steps and you can create
your own effects, we show you
the saturation effects.
Now, there are other effects we
can do with vImage.

We can do a rotation, like I
said before, rotate the image
clockwise, counterclockwise.

We can do blur or phase
something into the background.
And you feel, if you feel like
you want to add some retro
feeling to your images, you can
do dithering for black and white
images, and color quantization
for color images.
Let me show you how they look
like in the app.
So, again, I have the slider
here to control the rotation.

I can do rotate counterclockwise
or rotate clockwise.
Now I want to try the blurring
effect.
Let me click on the one here.
And I can apply more blurring or
slide to the left, bring the
roses back to the foreground.
For black and white, I can use
dithering.
Now this black and white image,
the gray scale is represented
now by the density of the dots,
that's the dithering effect.
And we use the accents and
dithering algorithm for this,
and I will show you how to do it
later, and for color
quantization, we have lookup
table.
And I can move this slider to
increase the quantization level.
As I move the slider to the
right, fewer and fewer color is
available in this image.
That's sort of creative, that's
how your computer screen looks
like in the '90's or in the
'80's.
All right.

So let me show you how we do
that.
For rotation effect, you can
call vImageRotate and given the
angle of rotation, it will do
counterclockwise or clockwise to
better align your images.
For the blur effect, we use
TentConvolve.

The blur effect is controlled by
the size of the kernel.
The larger the kernel, the more
blur you'll get.
Dither effect is basically
converting an 8-bit image into a
1-bit image.
At the same time, you can
specify a dithering algorithm.

In this case, we used the
Atkinson dithering algorithm.
Color quantization, we used the
quantization level to create a
lookup table for the RGB
channel, and we call
vImageTableLookUp to apply this
table lookup to RGB channels to
limit the number of colors on
the screen.
Those are the four additional
effects I wanted to show you,
and I think now is a good time
to move onto the next topic.
LINPACK Benchmark.

We talked about the
functionality of Accelerate.
We talk about how easy it is to
use in Accelerate in your apps.
We haven't talked about how fast
Accelerate really is.

And LINPACK Benchmark is a
perfect tool to do that.
What is LINPACK Benchmark?
Basically, it's trying to
measure how fast you can solve a
linear system on your machine.

There are actually three
different LINPACK Benchmarks.
The first one is solving a
100-by-100 linear system.
The second one is solving a
1000-by-1000.

The last one, which is the most
interesting one, that's the one
we're going to use today, is no
holds barred.
You can solve as large a system
you want to fully utilize every
last bit of the computational
power on your machine.
Let's see the performance of
iPhone X using Accelerate.
The performance is measured in
gigaflops, for double precisions
catch up with iPhone 5S, iPhone
6, iPhone 6S, iPhone 7, iPhone X
comes in around 28.7 gigaflops.

That's double precision.
Let's look at single precision.
Again, we run out of space, we
have to shrink to try to make it
come closer.
IPhone X comes in at 68
gigaflops.
Now you might be thinking, this
is not that surprising.

However, it improves over time,
so the performance improves over
time as well, but in fact,
that's only half of the story.
When there is a micro
architecture change to have
additional computational power
into a machine, you need the
matching [inaudible] to fully
utilize this additional
computational power.
And that is where we come in.

Remember, this is the same
LINPACK Benchmark executable
running on all five generations
of iPhones.
They all got the best
performance using Accelerate
without changing a bit.
The same is also true for your
apps.

If you use Accelerate in your
apps, you'll get the best
performance automatically on all
the architectures we support.
Moreover, Accelerate supports
across platforms.

Accelerate works on macOS, iOS,
tvOS, watchOS.
So let's say if tomorrow Apple
comes out with a new
architecture or new platform,
you don't have to worry about
it.
All you need to do, the most
you'll ever need to do is just
to rebuild your apps, link
against Accelerate.
You will automatically get the
best performance on the latest
release platform or
architecture.

Just as a summary, we talk about
Accelerate supports a wide
variety of functionalities.

We constantly look at this
feature request, evaluate them,
and then put into builds.

Actually, some of our best
features come from feature
requests.

Accelerate is easy to use.
Most of the time it's just one
function call and the job is
done.
It's fast and energy efficient,
so your app is more responsive
and the battery life is longer.
Accelerate is portable across
platforms and architectures.

You get the best performance on
all the platforms we support and
all the architectures we
support, and the best part is,
you don't have to change your
code.

For more information, you can go
to our online documentation at
developer.apple.com, and all our
demo apps, simple code, and
session material will be
available online.

We have a lab session tomorrow
afternoon at 2.
I look forward to seeing you
guys there.
If you have any questions or you
want to learn more about
Accelerate, I'd love to see you
there.
That's all for our presentation
today.
Thank you all for coming.
Have a great day.

Looking for something specific? Enter a topic above and jump straight to the good stuff.