If you compile this with your C++11 compiler (clang++ -std=c++11 -stdlib=libc++ code.cpp), you might get an error such as

error: allocating an object of abstract class type 'Animal'

But we never allocate an instance of type Animal, right? This is a good time to deepen our understanding of generic programming.

Generic programming revolves around four main terms: concept, model, refinement, and type constraint.

Concept

A concept is the core design. A model is what conforms to the concept: the implementation of the concept. A refinement is like inheriting a concept and adding more to it. And a type constraint specifies what a type is, and is not, expected to do.

In C++, we can have a pure virtual base class, where the base class provides no implementation for any member function. In Swift, this idea is implemented with a protocol.
Swift also has something C++ programmers are not used to: the use of a protocol as a type constraint. In C++, the compiler implicitly adds the type constraints for us.

The Equatable protocol is an explicit type constraint. But since we’re using the term protocol with Equatable, it can be a source of confusion.
The point to remember when working with Swift is that a protocol is used both for a concept and for a type constraint, whereas in C++ we have no explicit way to express a type constraint.

Let’s explore this theory with an example of a generic stack implementation:

What could be the core concept of a stack? If we take a look at it, we might say a stack is a data structure where you push items and later pop them back. So, the concept of a stack can be written as

You get the idea. A concept is just the bare minimum abstraction of an idea that has to be modeled by some other implementation for use.

Now, coming back to the original problem. What was wrong with our Animal example at the top?

The problem is with both implementations. In the Swift implementation, we’ve designed our Animal as a type constraint, and in our C++ implementation, as our dear clang compiler was trying to inform us, Animal is actually a concept. Hence both usages are invalid.

That’s all for today. We shall explore the other Generic Programming terminologies later someday.
Have a nice day!

Writing top-level generic functions is easier. It’s more like this: you write a normal function, and later you realize that this function is not restricted to any type; rather, it is an abstract algorithm usable by many types, so you upgrade it to a generic one. For example,

bool isEqual(int x, int y) {
    return x == y;
}

can be easily abstracted out to a generic form such as

template <typename T>
bool isEqual(T x, T y) {
    return x == y;
}

This generic form works as long as T supports the == operation, which brings us to a term you will often find with generic programming: type constraints.

So, the above example works fine with all types that support the == operation, such as:

Here, our isEqual() sees that the types are const char *, and the compiler knows how to compare two pointers. But the result might or might not be what we expect. If you were expecting a string comparison, you have to tell the compiler how to do that.

For our specific case, this can be achieved by providing an overloaded implementation that takes const char *.

Now, where should we actually use generic types in the real world? To address this question, you have to switch your mindset from run-time polymorphism to compile-time polymorphism. Let’s consider an example where you have designed your types in the classical object-oriented fashion, that is, using inheritance.

Here, we have a base Shape class that provides the public interface and leaves the actual implementation details, in this case the calculation of the area, to the subclasses. Internally, we call the Area() function on Shape and let the run-time do the actual look-up for the implementation by inspecting the type information.

In other words, we are delegating part of the implementation of our Shape class to subclasses. Another way of achieving the same result is by not depending on subclasses for overrides, but delegating the implementation to some external classes.

I don’t know whether we’ve achieved any real improvement over the earlier implementation. But this shows that the implementation can be pulled out of the core class while keeping the visible interface of Shape intact.

If we can achieve this, we can go a step further and provide the implementation at compile time. In other words, we can simply make the Shape class generic, and let the compiler plug in the implementation details at compile time.

Now, notice that we are not using an std::vector anymore; that is because we don’t have a single Shape type anymore. And if you find yourself thinking in this direction, you need to get out of the run-time polymorphic mindset. When using compile-time polymorphism, keep in mind that you’re no longer in control of creating the concrete types; you’ve passed that authority on to the compiler.

So, Shape<TriangleImpl> and Shape<CircleImpl> are entirely different types with no common ancestor or anything.

When designing with compile-time polymorphism, keep in mind that your generic type should be a leaf node, a final class, in the type system. You should not have to subclass your generic type; instead, you plug in a delegate class that provides the implementation you would otherwise provide by subclassing.

To summarize, here’s a quick list of good and bad of generic type system:

Pros
1. Robust: no run-time exceptions.
2. Efficient: no run-time lookups.

Cons
1. Not a good candidate for a base class.
2. Hard to read and debug.

Now, moving on to generics with Swift. Swift provides the same good old C++ way of writing generic code, plus an extra type-constraint system. Where in C++ the compiler does the constraint checking only when you actually compile the code, Swift makes the constraint system explicit: you have to provide the constraint information with a protocol.

UILabel is one of the smartest views in the UIKit family. It knows a lot about itself. If you constrain it to a certain width, a UILabel can calculate its own height. Another smart thing about UILabel is that it saves you machine cycles by not redrawing its content unless something actually changes. And if you use NSAttributedString, you can draw more sophisticated text content.

Given so many good things about UILabel, it is very tempting to subclass it to render custom text content. If you’re working on a game or an app, say a chat messenger, you may want a view where you render your text with some decoration around it, like a speech bubble with an optional image. And if you’ve ever worked on such a task, you know how painful it actually is!

A UILabel is the perfect fit whenever you need to render some text on screen. I recently had to undertake such a job of subclassing UILabel, and it was surprisingly hard to find a good source of information on how to do it properly. A naive subclass might work for the text data you have right now, but there are so many screen sizes and edge cases possible. I don’t want you to discover this knowledge the hard way like I did, so I’ll share what I’ve learned from my adventure.

Even if you’re not directly using UILabel but rendering your content on an OpenGL canvas using the FreeType library or something similar, knowing how UILabel works should be really helpful in designing your own view. But you can take my word for it: rendering text is one of the most challenging things in the UI problem domain. You have to render all sorts of glyphs, in all sorts of writing directions, with kerning, ligatures, and other text attributes. If you’re one of those adventure seekers, you should definitely try building your own text rendering system. Just for inspiration, these are from the days when I was building one:

Someday I should probably also share my experiences with building a UILabel-like view from scratch, I mean on top of FreeType. But today we just build on the work UILabel already does for us. Let’s begin by creating a new project.

We simply create a new project with a single view and add a RoundedRectLabel class. Next, we write our basic code to render the label on screen.

The textRectForBounds(_:limitedToNumberOfLines:) method provides us an opportunity to update the drawing area of the text within the UILabel. And drawTextInRect() is where you do your custom drawing within the view.

If you notice, this is a little different from your usual UIView subclassing, where you typically draw your content in drawRect(). Another thing to keep in mind is that you typically don’t modify intrinsicContentSize() either. Overriding these two methods will do whatever you want with your custom UILabel.

For the sake of a deeper understanding, let’s print out the order of execution.

The interesting thing to notice here is that whatever is calculated by super.textRectForBounds(_:limitedToNumberOfLines:) is also passed on to intrinsicContentSize(). Another interesting thing is that the methods are called multiple times: first with some values estimated by UIKit, and a second time after the values have been evaluated accurately enough.

Not implementing intrinsicContentSize() also means that you don’t need to call invalidateIntrinsicContentSize() yourself. Since the default drawing mode of UILabel is UIViewContentMode.Redraw, the content is redrawn automatically whenever it updates. To confirm this, update the text after a while and you’ll see the drawing methods being invoked again.

As you can see, we have the required padding from all edges as we wanted, but the text is truncated at the end. This is because the UILabel is still trying to draw the text within the original frame it had calculated. So the next task is to provide the new drawing frame to the UILabel. This is done within textRectForBounds(_:limitedToNumberOfLines:). The important question is: how do we calculate the frame size we need to draw the entire text content?

NSString has a handy extension that does this calculation. The method we’re particularly interested in is boundingRectWithSize(_:options:attributes:context:).

We just need to provide this method our estimated size of the drawing area and the other rendering attributes, and it returns the actual frame we need to render the text. This brings us to part II of the problem: how do we calculate the estimated size of the drawing area?

But UILabel passes the same original width to both textRectForBounds(_:limitedToNumberOfLines:) and drawTextInRect(). So suppose the original width of the UILabel is 100 pts and we want a padding of 10 pts on all edges. If the NSString API calculates the required height for width = 80 as, let’s say, 200, the size passed down to drawTextInRect() is 100×200, where we again shrink the size down to 80×180. To compensate for this second clipping, we must calculate the height for a width reduced by the padding twice over.

Next, remember that boundingRectWithSize(...) will try to wrap our text in as little space as possible, because that is the default behavior. So we need to explicitly add the extra top and bottom padding to the calculated size.

At first this might look bad, because test3 and test4 are not correct. But this is not the compiler’s fault. The standard min and max use the overloaded comparison operators, and those are not the right tool for our vector type. We can test this with

This looks better in the sense that min and max are now calculated per component. But something is still wrong with our clamp tests, as they still print the same old values. It turns out the min and max within the clamp function still use the standard generic functions rather than our specialized versions.

The only way to make this work is to provide a specialized clamp function for Vector2.

Note that the specialized clamp implementation for Vector2 is exactly the same as the generic one. And this is not good: if we want the compiler to automatically pick the right version, we have to reimplement the entire chain down to every function. So it comes down to either using the generic functions all the way or reimplementing the whole chain.

To compare, here’s a C++ version of the same functionality that works great with a single clamp generic implementation.

In conclusion, generic programming in Swift is great and definitely a step forward from Objective-C, but somebody coming from C++ may be a little disappointed. The bright side is that the Swift language is rapidly evolving, and maybe this will improve in a future version.

I think we have covered most of the core concurrency concepts. With our current knowledge we are equipped to tackle most real-world concurrency problems. But this doesn’t mean we’ve covered everything the thread libraries have to offer, and by thread libraries I mean the C++ standard thread library and libdispatch. For example, we’re yet to see std::future, std::promise, std::packaged_task, dispatch_barrier, and dispatch_source in action.

Today let’s focus on the lowest level of these libraries: atomics. Atomics are the lowest level at which we can work with a thread library. Atomic operations provide the most basic guarantee about the ordering of operations. An operation is atomic if it is guaranteed never to be observed by any thread in an indeterminate state.

To ensure an operation is atomic, internally the runtime might avoid switching threads while the operation is underway, or it might simply use a lock, or whatever other trick the platform has to offer. As a user of the library, you just get the guarantee that atomic operations are indivisible.

To facilitate atomicity, the C++ standard library offers a number of atomic types. They all offer an is_lock_free() member function to test whether the operations are really implemented without locks. The only exception is std::atomic_flag, which is always lock-free.

Atomic Types

C++ provides a lot of atomic types; almost every fundamental type has an atomic equivalent. (By fundamental types I mean bool and the integer types; no floating point, as you’ll see why in a moment.) You can use the named typedefs directly or the std::atomic<> template specializations. Here’s an example:

std::atomic_bool b1;
std::atomic<bool> b2;

The same pattern applies to all other fundamental types. Although you can mostly use them like ordinary fundamental types, a few operations are not allowed. The first striking constraint is that copy construction and copy assignment are disallowed. This code won’t compile:

Then how do we use these atomic types? This brings us to the load() and store() operations. All atomic types except atomic_flag offer load(), store(), exchange(), compare_exchange_weak(), and compare_exchange_strong(). Let’s take a look at what they do.

load and store

If you’ve ever done a bit of assembly, you must be familiar with load and store operations. Load retrieves the data while store saves the data.

An exchange is basically a 3-step operation: first it loads the data, second it computes the new value, and third it stores the new value back. And this should explain why floating-point types are left out of the atomic fundamental types: floating-point values are not deterministic under comparison.

And since we’re dealing with the lowest level of concurrency operations, sometimes you want greater control over the execution. Maybe you need a stronger guarantee that the 3-step exchange was indeed completed before the running thread ran out of time.

For such fine-grained control, the C++ standard library offers two more exchange operations: compare_exchange_weak() and compare_exchange_strong().

compare_exchange_weak() returns false if the exchange wasn’t successful. This can happen even when the values compare equal, for instance if the running thread was kicked out by the scheduler before it could finish all the steps. This is called a spurious failure.

So if you want the exchange to eventually succeed, you need to put the operation in a loop, so that whenever it fails you keep trying until it succeeds.

while (!i.compare_exchange_weak(expected, desired)) {
}

Or you can simply use compare_exchange_strong(), which is guaranteed to eliminate all spurious failures.

bool success = i.compare_exchange_strong(expected, desired);

Both of these functions also return false whenever the expected value is not the same as the stored value, that is, whenever the comparison fails; in that case expected is updated to the actual stored value. For example:

Finally, if you want your custom type to work as an atomic type, you can do that, provided your custom type doesn’t do anything fancy. In practical terms, your type should work with memcpy() and memcmp(): plain C-style data, no virtual table lookups. Here’s a trivial example:

There’s a big part dealing with memory ordering that we’ve simply skipped for now, but we shall come back to it later. Let’s now focus on the most important atomic type: atomic_flag.

atomic_flag

Forget whatever has been said about atomic types so far; none of it applies to std::atomic_flag. std::atomic_flag is different. You could say std::atomic_flag is the core of the threading library, like the atom of the universe. Let’s start exploring it.

Let’s consider a scenario. On your social network you get a lot of LOL-speak that you just can’t parse. So you decide to write a program to convert that text to either full uppercase or full lowercase.

Using std::atomic_flag, you can set the flag as soon as one of the threads selects a routine and clear it only after the entire modification is done. So with std::atomic_flag you’re effectively selecting one thread at random and blocking all the rest.

This sounds almost like what std::mutex does, right? In fact, using std::atomic_flag you can implement your own mutex.

And since you’ve come this far, you can even reap the benefits of std::lock_guard for locking and unlocking the mutex for you. Remember, std::lock_guard is based on the RAII principle, so you get an exception-safety guarantee that no matter what the rest of the code does (short of deadlocking), your mutex will get unlocked.

Big deal, right? We just reinvented something the C++ standard library already provides, and I guarantee its implementation of mutex is better. So what extra do we get out of working at such a low level?

With std::atomic_flag you must have noticed we used std::memory_order_seq_cst and std::memory_order_release. What are they? They specify the memory ordering of operations. Let’s take the red pill and follow the memory-ordering rabbit hole.

Memory ordering

This is probably the weirdest topic you’ll encounter as a programmer, as it will cast doubt on your mental model of how instructions execute at the lower level.

First, let’s take a look at all the memory orderings available. They are classified by 3 major classes of operations:

Store: seq_cst, release and relaxed

Load: seq_cst, acquire, consume and relaxed

Exchange: seq_cst, acq_rel, acquire, release, consume, relaxed

If we think in terms of different memory models available, we can classify these operations as:

Default: seq_cst

Unordered: relaxed

Lock based: acquire, consume, release and acq_rel

You’re already familiar with sequentially consistent (seq_cst) ordering, for this is what you’ve assumed all your life. You see a piece of code and you follow through the lines top to bottom, because that’s how execution works, right? Let’s see.

Let’s say there’s a new trend that every awesome software company is following. They have placed a coffee machine and a fedora hat machine in the entrance lobby, and they require every employee to have a fedora hat on their head or a cup of coffee in their hand to enter the office. They certainly prefer the employee getting both items. According to a survey, this allegedly increases the hip level of the employee and brings more energy and productivity into the office. Say each of these items increments the employee’s hip level by 1.

So company A tries to implement this with the default memory model. They noticed that some employees prefer wearing the hat first and then grabbing the coffee, while others grab the coffee first and then wear the hat. So they post two security guards: one waits until an employee wears a hat and then checks whether the employee also has a coffee; the second does the complete opposite, first waiting for the employee to hold a coffee and then checking whether the employee also has a hat on. After both guards have reported back, the door decides whether to grant entry.

If you follow the code, you’ll notice there is no way an employee can be denied entry. No matter how long an employee takes to get a coffee or a hat, as soon as they do one thing the observing guard will check for the other item; if they have it, good, otherwise whenever they get the other item the second guard will activate, and this time the employee will definitely pass the test, as they already have the first item.

Company B came up with the idea that they don’t really need two security guards. Instead, they instruct their employees to wear a hat and then get coffee, so security only has to test for the coffee, because the employee must already have the hat by then. But this procedure can fail; some employees can be denied entry. This is because the operations aren’t sequentially consistent anymore.

When an employee is instructed to GetThings(), the employee sees it as 2 tasks to complete in a relaxed manner, that is, in whatever order seems convenient. The employee has no idea that any other thread is monitoring their activities; they only care that by the time they exit GetThings(), both tasks have been executed. So if the employee feels like getting the coffee first and then the hat, nobody stops them. Meanwhile, the guard is under the false impression that whenever an employee has a coffee in their hand, they must also have a hat on their head. So every once in a while the guard encounters an employee who has a coffee but no hat yet, and instead of waiting for the hat, the guard immediately runs them through the door, which denies them entry. And it’s all due to misunderstood relaxed memory ordering.

Company C learns the lessons from both companies A and B, and wants to get the best of both worlds. So it adopts the lock based memory model.

Here all the tasks are still relaxed, except for the check on the coffee. The test on the coffee is set as the synchronization point. hasCoffee serves as a token that both the employee and the guard agree on. The employee is free to do whatever they wish in whatever order, as long as they perform the store on hasCoffee at exactly the point they’re expected to. It serves as a kind of checkpoint: whenever an employee gets a coffee, it means they have completed all prior tasks in whatever order seemed fit; nobody cares. So whenever the guard sees an employee with a coffee in their hand, it is guaranteed that all tasks before it have completed, and the check for the hat can now succeed.

A consume-based variant still uses the lock-based memory model, but instead of having the guard run an acquire-based loop, it uses a consume-based loop. This means that instead of constantly monitoring employees, the guard can take a nap once in a while, and whenever they wake up they just assume the employee has a coffee in hand; if not, they go back to sleep. This approach works well when you have somewhat predictable data on how long an average employee takes to GetThings().

Atomics and Objective-C

Coming over to Objective-C, atomicity is simplified: all properties are atomic by default. This is good news because if you’re using multiple threads to update the same property, you will always have a valid value for that property. But this doesn’t mean that the entire object will be valid.

Safari for iOS has a very interesting perspective effect built in; you can see it when you have multiple tabs open. How about building something like that? Those views are clearly UIViews, right? You can see the web content rendered, and they have close buttons. The content even updates periodically without launching the app, probably using background services.

Let’s start by searching Apple’s documentation. As far as I was able to search, I only got as far as Adding Perspective to Your Animations in Apple’s Core Animation Programming Guide.

If you just scroll down the ‘Advanced Animation Tricks’ page, at the bottom you’ll find around 10 lines and a small code snippet explaining how to add perspective to your CALayer object.

This is a good start. It’s not great, as it skips a lot of details, and for a reason: to add perspective properly, you have to be familiar with the linear algebra behind it, and we shall get to that in the next session. But for now, let’s get started.

So, create a new single view project and add a new Swift file to it. Let’s call it PerspectiveView.swift.

class PerspectiveView: UIView {
}

In the storyboard, add a UIView and set its class to PerspectiveView. Next, we need a container view to hold any subviews; we’ll apply the perspective to this container view, and hopefully it gets applied to all the contained subviews. Let’s call this container view contentView.

I’m just centering all the subviews. This is just me being lazy; you should probably add proper constraints to keep the image view’s aspect ratio correct.

Now coming to the interesting part. Applying perspective.

We just call calculatePerspectiveTransform() to do all the calculation and return a CATransform3D object that we can apply to the CALayer of our contentView. As for the calculation, let’s simply copy-paste the code from the Core Animation Programming Guide.

That’s it. Give it a run. That is not my dog BTW, just photobombing my photo out of nowhere.

If for now we ignore the distortion of the image, which is a constraints issue, there are some other questions to be answered first.

What is sublayerTransform?

Why is eyePosition 10?

What’s up with m34?

And most importantly, where’s the perspective?

Turns out, the snippet provided by the Core Animation Programming Guide just adds the perspective. But, in order to see it in action you still have to modify the transform a bit more. Let’s add a little translation to the transform.

Wow! Now we have some perspective. We can now either just tinker with the magic numbers until we get the desired effect, or we can get a deeper understanding of things and have a better control over things.

I prefer the latter. So let’s begin by answering some of the questions from above.

What is sublayerTransform?

If you look at the interface of CALayer, you’ll see there are two CATransform3D types.

When you modify transform, the layer’s own content gets modified. Whereas, when you modify the sublayerTransform, the sublayers get modified, while the receiver’s layer remains untouched.

If you replace sublayerTransform with transform

contentView?.layer.transform = calculatePerspectiveTransform()

You would get something like

See what I mean? Our contentView, which had a yellow background color, also got modified. Let’s undo that code change; we need the contentView unmodified for things like reading touches and gestures.

What is eyePosition?

For the code above, it’s enough to understand that eyePosition controls the degree of perspective: the larger the value, the weaker the effect, and the smaller the value, the stronger the effect. We shall look at the math behind this in the next session on linear algebra, but for now you can experiment with values like 5, 50, 500, 5000, 50000 and observe the changes yourself.

What’s up with m34?

There are mostly two kinds of projections in computer graphics: the orthogonal projection and the perspective projection. Orthogonal is the default projection used by any 2D system, like UIKit, whereas the perspective projection is used by 3D systems, like any 3D game. The main difference is that in an orthogonal projection the distance from the viewer is not accounted for; in other words, the z axis is ignored.

A CATransform3D transform is a 4×4 matrix. It works in a homogeneous coordinate system, which we will dive into in a later session. For now, it’s enough to understand that the purpose of this matrix is to convert your points from a 3D space to the screen space, which is 2D.

The m34 entry, the value at the 3rd row, 4th column of the matrix, is the biggest hint as to whether a projection matrix is orthogonal or perspective. An orthogonal projection matrix typically has m34 equal to 0, while a perspective matrix has some negative value here, typically -1. If, in the above code, you simply set m34 to 0, you’ll notice that all the perspective effect is gone!

Of course, the view is scaled, because this isn’t a perfect orthogonal matrix, we’re still performing the rotation and the translation on the matrix. But the main perspective effect is gone.

With the basic questions answered, let’s now build a proper projection system. I’ll be using GLKit, because it has everything we need. Since, as of this writing, GLKit isn’t available in Swift, we’ll do this work in Objective-C and return the CATransform3D object to our Swift class, where we can simply apply it to our contentView.

First we need a Camera class. Think of a camera: you control two things. First is the lens setting, like the field of view, focus, and aperture, the kind of things you usually set once before you begin filming. The second is the camera motion, like the direction you point, rotating the camera, and so forth.

1. Field of view: Controls the area you want to capture. The wider the angle, the more area you capture. But at the cost of greater distortion. The fish eye lens has a very wide fov.

2. Aspect ratio: Controls the aspect ratio of the image captured. A value of 1 means the captured image will be distorted to fit within a square. You typically want this to be the actual ratio you wish to capture.

3. nearZ, farZ: The clip planes. Anything farther than farZ or nearer than nearZ doesn’t get included in the final image. You don’t want farZ to be too far, as that brings in more floating-point error. If you set farZ to 1,000,000 and you have two objects placed at z 10 and 12, the one at 12 might overlap the one at 10, even though it is farther down the z axis. Typically, values of 0.1 and 100 are good enough for nearZ and farZ respectively.

4. center: This is where the camera is placed at. Default value is origin.

5. up: Tells which side is considered up. The default value is (0,1,0), that is, up is along the y axis.

6. eye: This is the direction you’re pointing at. Actually the final direction is calculated using the up and center values as well.

Since the Camera class controls two independent things, we can have a separate matrix for each. The role of a matrix is to transform points from one coordinate system to another; a matrix simply defines a coordinate system. So if you have a 4×4 matrix and you multiply it with a 4D vector, you get a 4D vector in the transformed coordinate space.

This class is pretty straightforward. It just describes the transformations to be applied to the object. The only thing that could be misunderstood is rotation: rotation doesn’t describe the angle, it describes the axis. For the angle we have another property, and the final rotation is calculated from both rotation and angle.

Why do we need the Camera object? Because the final image is calculated after taking the camera into account. Think of shooting a leaping frog: the final captured motion depends on both the frog’s leap and the motion of your camera.

With that setup, let’s see what we can build with a little experimentation on the fov, the camera’s eyeZ and the transform angle.

Now, for your own experimentation, you can observe how distorted the image gets as you update the fov, and how updating the camera’s eyeZ zooms the content in and out. You can also experiment with different transform angles and axes.

There’s a whole lot I’ve skipped in the implementation of the Camera and the Transform classes, mostly about why things work the way they do. In particular, the insides of

GLKMatrix4MakeLookAt()

GLKMatrix4MakePerspective()

GLKMatrix4MakeTranslation()

GLKMatrix4Rotate()

GLKMatrix4Multiply()

and much more

These are some of the most important functions, where the real linear algebra magic happens. But that is lower-level stuff and deserves a rant of its own. In the follow-up I’ll dive deeper into these functions and much more of the linear algebra.

We’ve explored so much with our concurrency experiments, and yet there’s one fundamental topic we haven’t touched so far: deadlocks. Whenever we think of concurrency, we can’t help thinking of locks, for protecting shared resources, managing synchronization and what not. But, as Uncle Ben said, with great power comes great responsibility; similarly, with great locks come great deadlocks.

Deadlocks with multiple locks

In the classical concurrency theory, deadlocks happen when more than one thread needs mutual exclusivity over more than one shared resource. There’s a classic Dining Philosophers problem that demonstrates this deadlock situation.

Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when he has both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it’s not being used by another philosopher. After he finishes eating, he needs to put down both forks so they become available to others. A philosopher can grab the fork on his right or the one on his left as they become available, but can’t start eating before getting both of them.

There are many problems hidden inside this one, but let’s focus on the situation where every philosopher gets super hungry, has acquired the fork on their left side, and is waiting for someone to release a fork before they can begin eating. Since no philosopher is willing to release their fork, we have a deadlock.

Let’s reduce this problem to just two philosophers dining. Here’s a simple implementation

As you can see, each philosopher acquires a fork and then just waits indefinitely. One way to break this deadlock is the std::lock() function, which locks all the given mutexes in an all-or-nothing fashion: using a deadlock-avoidance algorithm, it never ends up holding one of them while blocking forever on another.

First, the philosopher tries to lock both the forks. If successful, then they proceed further to eat, otherwise they just wait.

Notice two things here. First, we’re not using any explicit loops to lock, test and release mutexes; all of that is handled by std::lock(). Second, we’re passing std::adopt_lock as the second argument to std::lock_guard. This tells std::lock_guard not to lock the mutex at construction, as we’ve already locked it with std::lock(). The only role of std::lock_guard here is to guarantee an unlock when Eat() goes out of scope.

Deadlocks happen more often when more than one mutex is involved, but that doesn’t mean deadlocks can’t happen with a single mutex. You must be thinking: how careless would one have to be to deadlock code that uses just a single mutex? Picture this scenario:

Homer Simpson works in Sector 7-G of a high-security Nuclear Power Plant. To avoid unproductive water-cooler chit-chat, their boss Mr. Burns has placed the water cooler inside a highly protected cabin, guarded by a passcode and allowing only a single employee in at a time. Also, to keep track of how much water each employee consumes, Mr. Burns has equipped the water cooler with a mechanism that requires every employee to swipe their card in order to use it.

Homer is thirsty, so he decides to get some water. He enters the passcode and is inside the protected chamber. Once there, he realizes he’s left his card at his desk. Being lazy, he decides to call his friend Lenny to fetch his card. Lenny goes to Homer’s office, grabs his card and goes straight to the water cooler chamber, only to find it locked from the inside.

Now, inside, Homer is waiting for Lenny to bring his card, while outside, Lenny is waiting for the door to be unlocked. And we have a deadlock situation.

The moral of the story: calling user code after acquiring a lock should be handled with the utmost care. For example, in the above code, calling SwipeCard() after acquiring a lock is a big clue that there is a design error in this code.

Moving over to GCD, we don’t have to worry about such nitty-gritty details, as we don’t deal with mutexes and locks directly. But deadlocks can still happen, because deadlocks are design errors more than anything else.

The job sounds simple enough: we have a serial queue and two tasks that need to be executed sequentially. Each task has a taskId associated with it, just so that we can keep track of what’s getting executed. We use dispatch_sync because we want our tasks to be executed one after the other.

Here’s the output, before it gets deadlocked:

Enter task: 0

So, why isn’t task 1 getting executed? Well, let’s understand how dispatch_sync works. A dispatch_sync call blocks the caller until the submitted task has finished, and a serial queue won’t start that task until the current one is done. Consider the following situation:

Say there is a hair saloon with a very strict first come, first served policy. A mother and a daughter visit the saloon. The mother steps in first, so she gets client id 0, while the daughter gets client id 1. According to the rules, they have to serve the mother first, because she has the lowest client id. But the mother insists that they first serve her daughter. If they start serving the daughter they’re breaking the rule, as they would be serving client 1 before client 0. Hence, the deadlock.

And this is exactly why task 1 never gets started: the serial dispatch_queue is waiting for task 0 to finish, but task 0 is insisting that the queue finish task 1 first.

There are two ways to resolve this deadlock. First is using a concurrent queue instead of a serial queue.

The reason this works is that a concurrent queue is allowed to work on more than one submitted task at a time. First, task 0 is submitted. Task 0 then submits task 1 and demands that it be executed synchronously, in other words finished before task 0 finishes. Since this is a concurrent queue, it distributes its time over all the queued tasks.

In terms of our mother-daughter-saloon example above: let’s say the saloon has a big clock that ticks every minute. For that minute they focus on a single client, and at the next tick they switch to another client, irrespective of what state that client is in.

So at the first tick they focus on the mother, who just sits there, not letting them touch her hair before her daughter is done. At the next tick, they switch to the daughter and start serving her. This repeats until the daughter is fully done; then the mother allows them to work on her hair.

This works, but it has a problem. Remember, we wanted task 0 to be fully completed before task 1; that’s probably why we thought of using a serial queue and synchronous tasks in the first place. This solution actually breaks that. What we actually need is a serial queue with tasks submitted asynchronously.

Picture the scenario: you’re on the phone with your friend, when suddenly he says ‘Hey! Hold on. I think I’m getting another call’ and before you can say anything, he puts you on hold. Now you have no idea whether to hang up or wait; your friend could be back in 3 seconds or 3 hours. Wouldn’t it be great if your phone service provided a mechanism where, whenever you’re put on hold, you can freely disconnect, and whenever your friend disconnects the other call, you immediately get a ring? Let’s talk about managing synchronization between threads.

On a broader level, the above is the classical producer-consumer problem. Both the consumer and the producer are synchronized over shared data. What we actually need is a way to communicate between the producer and the consumer: the producer needs to signal the consumer that the data is ready for consumption.

So far in all our concurrency experiments, whenever we have some shared data, we explicitly block one or more threads while one thread gets to play with the resource. The shared data here seems to be a boolean flag of some sort, but we can do better with std::condition_variable.

In the above code, both the consumer and the producer execute simultaneously on concurrent threads. The consumer waits for the producer to finish whatever it was working on. But instead of continuously spinning on a flag to check whether the producer is done, the consumer sleeps until it is notified.

The first thing you’ll notice in this code is that we start the producer code first and then the consumer waits. This is because if we started the consumer first, the queue would have nothing in it to wait for.

dispatch_group is actually meant to synchronize a set of tasks in a queue. For example, say you’re loading the next level in your game, and you need to download a lot of images and other data files from a remote server before finishing the loading. What you’re doing is synchronizing a lot of async tasks and waiting for them all to finish before doing something else.

For this experiment, a dispatch_semaphore is a better fit. dispatch_semaphore is based on the classic semaphore concept you must’ve learned during your college days.

If you look closely, we haven’t even used any dispatch_queue here. This is because dispatch_semaphore is more about synchronization than about threading.

dispatch_semaphore is basically a count-based blocking system. We create a dispatch_semaphore object by telling it how many times it can be decremented before it blocks. Then each wait decrements the count and each signal increments it. If the count goes below 0, the thread blocks until somebody issues a signal.

Here’s another example illustrating the use of semaphores between two queues.

Again, note that we’re starting the producer first, and because we initialize the dispatch_semaphore with 0, the consumer is going to block the main thread as soon as it is invoked. Had we started the consumer first, it would never even have given the producer code a chance to run.

Here’s the output of above code:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

This just scratches the surface of inter-thread communication. Most problems with concurrency somehow deal with inter-thread communication, and in later sessions we shall experiment more with it. The idea for this article came from this discussion on the problem of loading game data in a parallel thread and the solution I worked on.

Love them or hate them, you just cannot ignore singletons. Let’s try experimenting with singletons and concurrency.

Singletons are objects of which you want only a single instance over your program’s lifecycle. Usually, you want to create singleton objects lazily. Here’s a typical way to create a singleton in a single-threaded application.

This might or might not work, depending on how the threads are switched on your machine. To see what the problem with this code is, let’s deliberately sleep the thread while creating the singleton instance.

There are two problems here. First, we have two instances of the Library object. Second, because our Dealloc routine depends on the guarantee that at any given moment there is a single Library instance, the other Library object never gets released. This is a classic example of a memory leak.

We can solve this issue with two techniques. The first is C++11’s static initialization.

Let’s modify our Library class: instead of creating the singleton on the heap, we create a static instance of the Library object.

This works because C++11 guarantees that static variables are initialized thread-safely. Another benefit of this approach is that we don’t have to implement any Library::Dealloc() function, as static data gets deallocated automatically by the runtime. If this satisfies your requirements, it’s the best way out.

But this won’t be the best fit for every case. Say you want the Library singleton to live only in heap memory. Also, if you’re using Objective-C this won’t work, as you’re not allowed to allocate custom objects on the stack. This brings us to our second method: using the C++ thread library.

Let’s switch back to our original implementation, with the Library object in heap memory. The first thing we can try is using a mutex.

The only problem with this code is that Library::SharedInstance() is going to be called from plenty of places, and it unnecessarily locks and unlocks the mutex on each call. What we actually need is the mutex only for the case where the single instance has not yet been created; at all other times we can simply pass the pointer back. This brings us to the infamous double-checked locking design.

This code looks great, right? It takes the mutex only when creating the shared instance; every other time it just returns the pointer. But it has a serious bug. The code would work almost all the time, failing only when you’ve long forgotten about it. It’s so hard to debug that you probably can’t even reproduce the issue in a debug environment, and you might not see the bug again for a million clock cycles. Sounds like a programmer’s nightmare, right?

Suppose you’re running this code with two threads, ‘bookSearchThread’ and ‘libSizeThread’. bookSearchThread comes in and sees that shared_ is NULL. It acquires the lock and starts allocating and initializing the instance. Halfway through, it has allocated space on the heap, so shared_ isn’t NULL anymore, but it hasn’t finished initializing the instance, just the allocation. Something like this in Objective-C:

Library *shared = [[Library alloc] init];

Suddenly a thread switch happens and libSizeThread comes in. It sees that shared_ isn’t NULL and says: OK, good, somebody must have already created this instance, let’s use it straight away. This results in undefined behavior, as libSizeThread is now using an object that isn’t actually fully initialized.

This is such a common problem that almost every thread library provides a way to make it work. In C++11, we can use std::call_once.

These helpers from the standard library guarantee that the object’s allocation and initialization complete before the flag is set; every subsequent call just returns immediately. So, no more race conditions for us.

Continuing last week’s experiment on protecting shared data in a concurrent system, let’s look at how we can make it work using C++ and threads.

Here’s a brief recap: we’re building an Airport scenario with a single status board screen. The status board is simultaneously edited by multiple airliners, and it can be read by any passenger at any given moment for the status of any flight, based on the flight number.

The good thing about C++ is that much of the code looks almost identical to the single-threaded code, thanks to the way the thread library has been designed. At first glance you won’t even be able to tell where the threading code is. Here we’re invoking two operations on two threads at each iteration of the loop.

This makes use of std::future to invoke the StatusReader function on an async thread. The get() call at the end blocks until the result is ready. We shall talk about futures and promises in some future experiment; for now it suffices to understand that a call to std::async() can run the StatusReader() function on a concurrent thread and magically return the result to the caller thread via get().

The other is simply spawning a std::thread and calling detach on it, which we’re already familiar with from our previous experiment.

Here we just start the StatusWriter function on a new thread every time the loop iterates.

Obviously, this code isn’t thread-safe. We’re reading and writing the same forward list from more than one thread, and this code is bound to crash a lot. So, as the next step, let’s make it thread-safe using a mutex.

A mutex is the basic object used to provide mutual exclusivity over a portion of code. In C++, a mutex is used to lock a shared resource.

First of all, we start with a std::mutex object. This object guarantees that the code between its lock and unlock runs on a single thread at a time. We can use the lock() and unlock() member functions on the std::mutex object like:

If you’ve used C++ long enough, you’ll know that this code is prone to all sorts of errors. In many cases foo() can exit before unlock() gets executed. This could be due to an early return you’ve added to the function, or, in a more likely scenario, your code does something illegal and foo() throws an exception. Even if foo() is very robust, it could call some other function which isn’t.

If you’re an old-time C++ user, you’ll know that to solve this class of problem we often use the RAII idiom. Good for you: the standard thread library already provides this in the form of std::lock_guard, which is designed to work with any lock-like object.

This is due to the fact that we’ve explicitly added thread delays for debugging, and some of our async threads are executing detached and might not be finished before main exits. I don’t think this would be a problem in any real code.

So there, we have basic thread-safe code. It’s not the best implementation possible, and there are many things to look out for. For example, we have to take care not to pass out any references or pointers to the shared data, otherwise the code might become unsafe again. Always remember that at the low level all threads share the same address space, and whenever we have a shared resource without a lock, we have a race condition. That is a design error on the programmer’s part more than anything else.