iOS7 Day-by-Day :: Day 16 :: Decoding QR Codes with AVFoundation

Written by Sam Davies

This post is part of a daily series of posts introducing the most exciting new parts of iOS7 for developers – #iOS7DayByDay. To see the posts you’ve missed check out the introduction page, but have a read through the rest of this post first!

Introduction

Yesterday we looked at some of the new filters available in CoreImage, and discovered that in iOS7 we now have the ability to generate QR codes. Well, given that we can create them, you might imagine that it would be helpful to be able to decode them as well, and you aren’t about to be disappointed. In the 17th installment of DbD we’re going to take a look at how to use some new features in the AVFoundation framework to decode (amongst other things) QR codes.

AVFoundation pipeline

AVFoundation is a large framework which facilitates the creation, editing, display and capture of multimedia. This post isn’t meant to be an introduction to AVFoundation, but we’ll cover the basics of getting a live feed from the camera to appear on the screen, since it’s this feed we’ll use to extract QR codes. In order to use AVFoundation we need to import the framework:

@import AVFoundation;

When capturing media, we use the AVCaptureSession class as the core of our pipeline. We then need to add inputs and outputs to complete the session. We’ll set this up in the viewDidLoad method of our view controller. Firstly, create a session:

AVCaptureSession *session = [[AVCaptureSession alloc] init];

We need to add the main camera as an input to this session. An input is an AVCaptureDeviceInput object, which is created from an AVCaptureDevice object:
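A minimal sketch of that setup (error handling kept to a bare minimum) looks something like this:

// Grab the default video capture device (the rear camera on devices with multiple cameras)
AVCaptureDevice *device = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];

// Wrap the device in a capture device input and attach it to the session
NSError *error = nil;
AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:device error:&error];
if (input) {
    [session addInput:input];
} else {
    NSLog(@"Couldn't create the capture device input: %@", error);
}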

Here we get a reference to the default video input device, which will be the rear camera on devices with multiple cameras. Then we create an AVCaptureDeviceInput object using the device, and then add it to the session.

In order to get the video to appear on the screen we need to create an AVCaptureVideoPreviewLayer. This is a CALayer subclass which, when added to a session, will display the current video output of the session. Given that we have an ivar called _previewLayer of type AVCaptureVideoPreviewLayer:
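A sketch of the wiring-up, assuming we want the preview to fill the view controller’s view:

// Create a preview layer attached to the session, and make it fill the view
_previewLayer = [AVCaptureVideoPreviewLayer layerWithSession:session];
_previewLayer.videoGravity = AVLayerVideoGravityResizeAspectFill;
_previewLayer.frame = self.view.bounds;
[self.view.layer addSublayer:_previewLayer];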

The videoGravity property is used to specify how the video should appear within the bounds of the layer. Since the aspect-ratio of the video is not equal to that of the screen, we want to chop off the edges of the video so that it appears to fill the entire screen, hence the use of AVLayerVideoGravityResizeAspectFill. We add this layer as a sublayer of the view’s layer.

Now this is set up, all that remains is to start the session:

// Start the AVSession running
[session startRunning];

If you run the app up now (on a device) then you’ll be able to see the camera’s output on the screen – magic.

Capturing metadata

Everything we’ve done so far has been possible since iOS5, but in this section we’re going to do some stuff which has only been possible since iOS7.

An AVCaptureSession can have AVCaptureOutput objects attached to it, forming the end points of the AV pipeline. The AVCaptureOutput subclass we’re interested in here is AVCaptureMetadataOutput, which detects any metadata in the video content and outputs it. The output of this class isn’t in the form of images or video, but metadata objects which have been extracted from the video feed itself. Setting this up is as follows:

AVCaptureMetadataOutput *output = [[AVCaptureMetadataOutput alloc] init];
// Have to add the output before setting metadata types
[session addOutput:output];
// What different things can we register to recognise?
NSLog(@"%@", [output availableMetadataObjectTypes]);

Here we’ve created a metadata output object and added it as an output to the session. Then we’ve used the availableMetadataObjectTypes method to log a list of the different metadata types we can register to be informed about.

It’s important to note that we have to add our metadata output object to the session before attempting this, since the available types depend on the input device. The logged list includes AVMetadataObjectTypeQRCode, so let’s register to detect QR codes:
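A sketch of that registration, assuming our view controller conforms to AVCaptureMetadataOutputObjectsDelegate and is happy to receive callbacks on the main queue:

// Register interest in QR codes only, and set ourselves as the delegate
[output setMetadataObjectTypes:@[AVMetadataObjectTypeQRCode]];
[output setMetadataObjectsDelegate:self queue:dispatch_get_main_queue()];

Whenever a registered metadata type is detected in the video feed, the delegate method fires. A minimal implementation might look like this:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputMetadataObjects:(NSArray *)metadataObjects
       fromConnection:(AVCaptureConnection *)connection
{
    for (AVMetadataObject *metadata in metadataObjects) {
        if ([metadata.type isEqualToString:AVMetadataObjectTypeQRCode]) {
            AVMetadataMachineReadableCodeObject *code =
                (AVMetadataMachineReadableCodeObject *)metadata;
            // Display the decoded string in the label at the bottom of the screen
            _decodedMessage.text = code.stringValue;
        }
    }
}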

The metadataObjects array consists of AVMetadataObject objects, which we inspect to find their type. Since we’ve only registered to be notified of QR codes we’ll only be getting objects of type AVMetadataObjectTypeQRCode. The AVMetadataMachineReadableCodeObject type has a stringValue property which contains the decoded value of whatever metadata object has been detected. Here we’re pushing this string to be displayed in the _decodedMessage label, which was created in viewDidLoad.

Running the app up now and pointing it at a QR code will cause the decoded string to appear at the bottom of the screen:

Drawing the code outline

In addition to providing the decoded text the metadata objects also contain a bounding box and the locations of the corners of the detected QR code. Our scanner app would be a lot more intuitive if we displayed the location of the detected code.

In order to do this we create a UIView subclass which, when provided with a sequence of points, will connect the dots. This will become clear as we build it:
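The sample project’s class isn’t reproduced here, but a sketch of such a view might look like the following (the ShapeView name, the corners property and the green stroke colour are all illustrative choices):

// A view which draws a closed path connecting a set of corner points
@interface ShapeView : UIView
// NSValue-boxed CGPoints, in the view's own coordinate system
@property (nonatomic, strong) NSArray *corners;
@end

@implementation ShapeView

- (instancetype)initWithFrame:(CGRect)frame
{
    self = [super initWithFrame:frame];
    if (self) {
        // Transparent background so the video preview shows through
        self.backgroundColor = [UIColor clearColor];
    }
    return self;
}

- (void)setCorners:(NSArray *)corners
{
    _corners = corners;
    // Redraw whenever we're given a new set of points
    [self setNeedsDisplay];
}

- (void)drawRect:(CGRect)rect
{
    if (self.corners.count < 2) {
        return;
    }

    CGContextRef context = UIGraphicsGetCurrentContext();
    CGContextSetStrokeColorWithColor(context, [UIColor greenColor].CGColor);
    CGContextSetLineWidth(context, 2.0);

    // Connect the dots: start at the first corner, join the rest,
    // then close the path back to where we started
    CGPoint firstPoint = [self.corners[0] CGPointValue];
    CGContextMoveToPoint(context, firstPoint.x, firstPoint.y);
    for (NSUInteger i = 1; i < self.corners.count; i++) {
        CGPoint point = [self.corners[i] CGPointValue];
        CGContextAddLineToPoint(context, point.x, point.y);
    }
    CGContextClosePath(context);
    CGContextStrokePath(context);
}

@end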

AVFoundation uses a different coordinate system to that used by UIKit when rendering on the screen, so the first step is to use the transformedMetadataObjectForMetadataObject: method on AVCaptureVideoPreviewLayer to translate the detected metadata object from AVFoundation’s coordinate system into that of our preview layer.

Next we set the frame of our shape overlay to be the same as the bounding box of the detected code, and update its visibility.
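Inside the delegate method, those two steps might look something like this (the _qrCodeOverlay ivar, holding our shape view, is an assumed name):

// Translate the detected code from AVFoundation's coordinate space into
// the coordinate space of the video preview layer
AVMetadataMachineReadableCodeObject *transformed =
    (AVMetadataMachineReadableCodeObject *)[_previewLayer transformedMetadataObjectForMetadataObject:metadata];

// Position the overlay over the detected code and make it visible
_qrCodeOverlay.frame = transformed.bounds;
_qrCodeOverlay.hidden = NO;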

We now need to set the corners property on the shape view so that the overlay is positioned correctly, but before we do that we need to change coordinate systems again.

The corners property on AVMetadataMachineReadableCodeObject is an NSArray of dictionary objects, each of which has X and Y keys. Since we translated the coordinate systems, the values associated with the corners refer to the video preview layer – but we want them to be in terms of our shape overlay. Therefore we use the following utility method:
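The exact implementation isn’t shown here, but a version along these lines does the job (the method name is an assumption):

- (NSArray *)translatePoints:(NSArray *)pointDictionaries
                    fromView:(UIView *)fromView
                      toView:(UIView *)toView
{
    NSMutableArray *translatedPoints = [NSMutableArray new];

    // Each corner is a dictionary with "X" and "Y" keys
    for (NSDictionary *pointDictionary in pointDictionaries) {
        CGPoint point = CGPointMake([pointDictionary[@"X"] floatValue],
                                    [pointDictionary[@"Y"] floatValue]);
        // Convert from one view's coordinate system to the other's
        CGPoint translated = [fromView convertPoint:point toView:toView];
        [translatedPoints addObject:[NSValue valueWithCGPoint:translated]];
    }

    return [translatedPoints copy];
}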

Here we use convertPoint:toView: from UIView to change coordinate systems, and return an NSArray containing NSValue-boxed CGPoint objects instead of NSDictionary objects. We can then pass this to the corners property of our shape view.
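For example (again with the assumed _qrCodeOverlay ivar, and relying on the preview layer filling self.view):

// Convert the corner dictionaries and hand the points to the overlay to draw
_qrCodeOverlay.corners = [self translatePoints:transformed.corners
                                      fromView:self.view
                                        toView:_qrCodeOverlay];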

If you run the app up now you’ll see the bounding box of the code highlighted as well as the decoded message:

The final bits of code in the example app cause the decoded message and bounding box to disappear after a certain amount of time. This prevents the box from staying on the screen when there are no QR codes present.

// Start the timer which will hide the overlay
[self startOverlayHideTimer];
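The timer implementation isn’t reproduced here, but something along these lines would work (the 2-second delay, the _overlayHideTimer ivar and the hideOverlay method are all assumptions):

- (void)startOverlayHideTimer
{
    // Restart the countdown each time a code is detected
    [_overlayHideTimer invalidate];
    _overlayHideTimer = [NSTimer scheduledTimerWithTimeInterval:2.0
                                                         target:self
                                                       selector:@selector(hideOverlay)
                                                       userInfo:nil
                                                        repeats:NO];
}

- (void)hideOverlay
{
    // Clear the decoded message and hide the bounding box overlay
    _decodedMessage.text = @"";
    _qrCodeOverlay.hidden = YES;
}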

Conclusion

AVFoundation is a very complex and powerful framework, and in iOS7 it just got better. Detecting barcodes in a live video feed used to be quite a difficult task on mobile devices, but with the introduction of these new metadata output types it is now really simple and efficient. Whether or not we should be using QR codes is a different question… but at least it’s easy if we want to =)