Face and Face’s Landmarks Detection Using Vision Framework in iOS 11

Paweł Chmiel

Some time ago, I made a tutorial on how to use face detection in AVFoundation. This time, I would like to show you how to do this using Apple’s new Vision framework, presented at WWDC 2017.

You may be thinking, “Why should I use something new if the old stuff is working quite nicely?” Well, now we are able to do much more: we can detect a face’s landmarks! Let’s do it!

How do we start?

First of all, we need to set up camera support by configuring an AVCaptureSession and a preview layer, and by implementing the captureOutput delegate method.

Let me skip the beginning parts and focus on the captureOutput method. The project is available on our GitHub account and a link to it will be at the bottom of this page.

In this method, we need to convert the sampleBuffer into a CIImage object. It is important to provide the right orientation here, because face detection is very sensitive to it, and a rotated image may produce no results at all.
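As a rough sketch of that conversion (not the code from the project): orientedImage is a hypothetical helper name, and the orientation to pass depends on your camera and device setup, so treat any concrete value as an assumption to verify.

```swift
import AVFoundation
import CoreImage
import ImageIO

/// Hypothetical helper: wraps a pixel buffer in a CIImage with the given orientation applied.
func orientedImage(from pixelBuffer: CVPixelBuffer,
                   orientation: CGImagePropertyOrientation) -> CIImage {
    return CIImage(cvPixelBuffer: pixelBuffer).oriented(orientation)
}

/// Hypothetical helper: same as above, starting from the sample buffer the delegate receives.
/// Returns nil when the sample buffer carries no image data.
func orientedImage(from sampleBuffer: CMSampleBuffer,
                   orientation: CGImagePropertyOrientation) -> CIImage? {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return nil
    }
    return orientedImage(from: pixelBuffer, orientation: orientation)
}
```

Inside captureOutput this could be called as orientedImage(from: sampleBuffer, orientation: .leftMirrored), where .leftMirrored is only a guess for a front camera held in portrait.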

Face detection

To be able to detect specific landmarks of our face, we first of all need to detect the whole face.

Using the Vision framework for this is really easy.

We need to create two objects, one for the face rectangle request and one as a handler of that request.

let faceDetection = VNDetectFaceRectanglesRequest()
let faceDetectionRequest = VNSequenceRequestHandler()

How can we detect the face?

Simply call perform on the request handler and check the results of the request.

try? faceDetectionRequest.perform([faceDetection], on: image)
let results = faceDetection.results as? [VNFaceObservation]

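After perform returns, the request’s results hold an array of VNFaceObservation objects. Besides what it inherits, such as boundingBox, this class declares only one property of its own in iOS 11: landmarks, of VNFaceLandmarks2D type. After a plain face rectangle request it is still empty.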
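The detection step could be wrapped up roughly like this. It is only a sketch: detectFaces is a hypothetical helper name, and errors are simply logged instead of being silenced with try?.

```swift
import Vision
import CoreImage

/// Hypothetical helper: runs a face rectangle request and returns the observations.
func detectFaces(in image: CIImage,
                 using handler: VNSequenceRequestHandler) -> [VNFaceObservation] {
    let request = VNDetectFaceRectanglesRequest()
    do {
        try handler.perform([request], on: image)
    } catch {
        print("Face detection failed: \(error)")
        return []
    }
    // The observations are stored on the request, not returned by perform.
    return request.results as? [VNFaceObservation] ?? []
}
```

Each returned observation carries a boundingBox in normalized coordinates, which is what we need later to place the landmarks.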

Landmarks detection

Once we have our face detected, we are able to start looking for landmarks. The full list is quite long: we can detect the face contour, eyes, eyebrows, nose, inner and outer lips, and a few more.

If we want to detect any of these, we need to create a new request and a new request handler object dedicated to landmarks detection.

let faceLandmarks = VNDetectFaceLandmarksRequest()
let faceLandmarksDetectionRequest = VNSequenceRequestHandler()

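Another really important thing here is setting the inputFaceObservations property of faceLandmarks to the observations returned by the face rectangle request. This tells the landmarks request which faces to analyze, so it can go beyond the whole-face rectangle for exactly the faces we already found.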

The usage of VNDetectFaceLandmarksRequest is exactly the same as in the previous case.

Just call:

try? faceLandmarksDetectionRequest.perform([faceLandmarks], on: image)
if let landmarksResults = faceLandmarks.results as? [VNFaceObservation] {
    for observation in landmarksResults {
        ...

and we can iterate through the landmarksResults.
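Putting the landmark steps together, a minimal sketch might look like the following. The helper names detectLandmarks and drawableRegions are hypothetical, and the choice of regions is just an example.

```swift
import Vision
import CoreImage

/// Hypothetical helper: runs landmark detection for faces that were already detected.
func detectLandmarks(for faces: [VNFaceObservation],
                     in image: CIImage,
                     using handler: VNSequenceRequestHandler) -> [VNFaceObservation] {
    let request = VNDetectFaceLandmarksRequest()
    // Hand over the faces found by the face rectangle request.
    request.inputFaceObservations = faces
    do {
        try handler.perform([request], on: image)
    } catch {
        print("Landmark detection failed: \(error)")
        return []
    }
    return request.results as? [VNFaceObservation] ?? []
}

/// Hypothetical helper: collects a few regions of one face, skipping any that are missing.
func drawableRegions(of observation: VNFaceObservation) -> [VNFaceLandmarkRegion2D] {
    guard let landmarks = observation.landmarks else { return [] }
    return [landmarks.faceContour, landmarks.leftEye, landmarks.rightEye,
            landmarks.nose, landmarks.outerLips].compactMap { $0 }
}
```

Every landmark property is optional, so the sketch filters out regions Vision could not find rather than force-unwrapping them.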

Let’s assume we want to draw faceContour using UIBezierPath, for example.

Each VNFaceLandmarkRegion2D object, which is the type of every landmark, contains a C array of points and a pointCount.

The points property is of type UnsafePointer<vector_float2>, so we need to convert it before it’s used. I’ve prepared a method for the conversion which creates an array of tuples consisting of two CGFloat values.
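A conversion along those lines might look like this. It is a sketch rather than the method from the project: convertPoints is a hypothetical name, and the pointer is typed as SIMD2<Float>, which recent Swift toolchains treat as the same type as vector_float2.

```swift
import Foundation

/// Hypothetical helper: copies `count` points from a C array into an array of CGFloat tuples.
func convertPoints(_ points: UnsafePointer<SIMD2<Float>>,
                   count: Int) -> [(x: CGFloat, y: CGFloat)] {
    // Wrap the raw pointer so we can iterate it safely as a collection.
    let buffer = UnsafeBufferPointer(start: points, count: count)
    return buffer.map { (x: CGFloat($0.x), y: CGFloat($0.y)) }
}
```

With a landmark region this would be called as convertPoints(region.points, count: region.pointCount). If your SDK version exposes normalizedPoints as an array of CGPoint instead, this step can be skipped.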

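The next thing we need to do is perform some calculations, because the point values are normalized: they lie between 0.0 and 1.0, relative to the bounding box of the face.

For this I’m using the observation’s boundingBox property, which is the bounding box of the detected face. Each point is multiplied by the size of the box and then offset by the box’s origin.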
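The calculation can be sketched as two small functions. Both names are hypothetical, and the sketch assumes the normalized boundingBox is first scaled to the size of the view we draw in.

```swift
import Foundation

/// Scales a normalized rectangle (components in 0...1) up to a target size.
func scaled(_ normalized: CGRect, to size: CGSize) -> CGRect {
    return CGRect(x: normalized.origin.x * size.width,
                  y: normalized.origin.y * size.height,
                  width: normalized.size.width * size.width,
                  height: normalized.size.height * size.height)
}

/// Maps landmark points, normalized to the face box, into the coordinate space of that box.
func denormalized(_ points: [(x: CGFloat, y: CGFloat)],
                  in faceBox: CGRect) -> [CGPoint] {
    return points.map { point in
        CGPoint(x: point.x * faceBox.size.width + faceBox.origin.x,
                y: point.y * faceBox.size.height + faceBox.origin.y)
    }
}
```

For example, a normalized box of (0.25, 0.25, 0.5, 0.5) in a 200 by 400 view becomes (50, 100, 100, 200), and the landmark point (0.5, 0.5) lands at (100, 200).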

At the end, just draw a UIBezierPath built from the provided converted points.
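Building the path is a matter of moving to the first point and adding a line to each of the others. A minimal sketch, with makePath as a hypothetical helper name:

```swift
import UIKit

/// Hypothetical helper: connects the given points with straight line segments.
func makePath(through points: [CGPoint], closed: Bool = false) -> UIBezierPath {
    let path = UIBezierPath()
    guard let first = points.first else { return path }
    path.move(to: first)
    for point in points.dropFirst() {
        path.addLine(to: point)
    }
    // Closed shapes such as eyes or lips can be joined back to the start.
    if closed {
        path.close()
    }
    return path
}
```

The result can then be assigned to a CAShapeLayer, for example shapeLayer.path = makePath(through: points).cgPath, assuming shapeLayer is the layer added on top of the camera preview.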

It’s worth remembering that the Vision framework uses a flipped coordinate system, which means we need to do the same with our drawing layer. Otherwise our face contour will be upside-down and on the wrong side.

To do this, just call:

shapeLayer.setAffineTransform(CGAffineTransform(scaleX: -1, y: -1))

And that’s all. Now the detected face landmarks should be drawn over your face in the live camera preview.

Conclusions

Vision is a nice framework which is also really easy to implement, but only if you remember these small details.

I see many improvements in comparison to the old face detection tool. So, if you need really good precision or more details detected on the face, I recommend giving it a chance and playing with it. Once you’re a little more familiar with it, implementing it in a working project shouldn’t be a big problem.