Manipulating data in structure-of-arrays format can be unwieldy for some, but this post talks about making things easier for yourself using some simple templating to replace the manual side of iterating through the arrays.

It has become more obvious to people involved in optimisation that the x86 architecture is a difficult platform to understand at the core. This is partially because of the multitude of different CPUs out there that support the instruction set, each with its own timings, but also because of this latest breed of extraordinarily out-of-order CPUs. Knowing what's actually going to happen in an i7 has become a near-impossible task.

Read Robert Graham's post "x86 is a high-level language" and try to see why it's so very difficult to grok the flow of data in these chips, and also how it's very difficult to guess which algorithm will perform best without doing a lot of real-world tests.

Nice read on why grep is quick. Some simple tricks, some awesome algorithm usage, and generally the kind of thing that you might want to keep in your head in case you come across a search problem that resembles grep in any way.

A really interesting article with real-world data on the potential cache-miss issues of large, power-of-two-aligned structures. This is another reason to beware of keeping data in large structures rather than separate arrays. It doesn't happen much by accident, but it's the kind of thing that gets done on purpose because there were reasons on GPUs to keep things aligned, and the cargo cult runs deep with this kind of knowledge.
How Misaligning Data Can Increase Performance 12x by Reducing Cache Misses

I was skeptical at first, but the author appears to have tested his efforts on real hardware, which of course is a core tenet of DOD. Also, this is not a post about a new invention, but a set of results from tests where the author replaces a hash table with alternatives. It's interesting to look at the different timings, but remember to test your own code and not just follow blindly, as you may have overhead somewhere else that makes the slowest option in these tests suddenly the fastest.

Swap data for energy, and the demand-oriented approach to fulfilment changes the function used to determine fitness. With energy, the demand over time was well known, but ignored by thousands of people installing expensive hardware.

As reference material for the book, a github project has been started to show the development of a game in both the Object-Oriented and Data-Oriented approaches.

Expect slow updates right now, as it only has one developer and they are in full-time employment at a startup, so spare time is scarce. However, if you wish to follow along, the project is hosted here on github for all to see.

In addition to the parallel game development, submissions from other developers would be appreciated, specifically any demo code that provides ways to build timings for the performance-oriented points of the book. For example, any code that could be used to directly show the impact of bad pipelining, bad cache alignment, or even the effects of write combining. The only rule will be that it has to be simple, and able to run on many platforms. Single-platform statistics aren't much use unless they are targeting currently trending hardware like ARM-based CPUs.

Update: the book is now in live public beta. Check the right bar for the link.
Time to see if there is an audience for the book.
If Richard Fabian gets more than 100 people commenting on his post, then he has promised to upload and publish his work-in-progress book on Data-Oriented design to this very website.
If this happens, we will be replacing the signup link with the link to the full book which will be updated as it progresses towards a printable version.

A lovely example of how continually looking at the data from one step to the next resulted in a drastic reduction in space usage while still maintaining a data-parallel solution to the problem of mesh decompression. (link)

Even people with big brains have reservations about components, so to finish off the section on component-oriented design in the upcoming data-oriented design book, I'd really like some negative experiences so they can be solved in a troubleshooting-style section.

I've noticed that there are some points where people get things a bit wrong in implementing components, and it causes things to tie together badly. Once they've started coupling things together, they don't see any benefit to components, and it becomes a bad example and part of their opinion of components. This negative experience spreads by word of mouth, and that's just as bad as gossip.

There are probably some guidelines or guiding principles that can be distilled from these negative experiences that might help when trying to ensure people don't get lost. Troubleshooting is a good match as there are likely a number of similar negative experiences that can all be solved in a similar way.

For example, in my experience the thing that trips people up most is expecting components to talk to each other for some reason, as if they were objects that can message each other in some two-directional way. But in practice, I've found that's not necessary.

So I think we need some examples of cases where it might seem like components don't work, or are inefficient, and follow them with how you diagnose what's wrong with those assumptions.

If you've had any bad experiences with components, they're probably going to be more beneficial than positive ones, as helping people out of messes is a lot more useful than just announcing how cool something is.

But this time there's an interesting statement that needs to be investigated. In the section titled "Why intrusive lists are better", there is an argument:

When traversing objects stored on an intrusive linked list, it only takes one pointer indirection to get to the object, compared to two pointer indirections for std::list. This causes less memory-cache thrashing so your program runs faster — particularly on modern processors which have huge delays for memory stalls.

Right

Think about it a bit more and you realise that this must in fact be false. Where are the pointers to the elements? They are in the middle of the structures being traversed. They're not somewhere all together, huddled up on cachelines, they're split apart by at least the size of the object they are linking. This means you save one indirection, but potentially at the cost of many loads from separate areas of memory which will cane your memory bandwidth.

Apart from that minor snafu, the article is sound, but when you come across a false statement like that it does make you question the credibility of the rest of the article. Foremost in your mind when you do any performance-related work must be profiling. If the author had profiled, he might have learned that this was the case and been able to improve on the design further, or at least realised that there was a trade-off and, with that knowledge, been better armed.

Boolean parameters are usually used to control code flow inside a function from outside. With no further information, it should be a simple case of deduction that this is an unnecessary waste of time in almost all cases. If the code flow is meant to be controlled from outside, then why not introduce two different functions? If there are multiple boolean switches of code flow, then it's probably true that the callee does too much.

If you put your data into classes, then you're limited to the basic structures available, namely fields fixed at compile time. Runtime changes to the structure of a class are very difficult to achieve in C++ without invoking arcane and hard-to-debug techniques. With blobs, or a simpler access pattern such as a free function to get a variable, you can reimagine the data structures in new ways.
http://simblob.blogspot.co.uk/2012/07/playing-with-dot-operator.html

Sometimes, reinterpretation can bring old ideas back to be fully realised. The MVC pattern is one of the good design patterns because it promoted separation of state from interpretation and action. MOVE is maybe a clearer interpretation of what MVC aims to provide.
http://cirw.in/blog/time-to-move-on

If you're trying to reduce the amount of energy used getting data to your CPUs, then maybe this idea will work out better for efficient data movement.

Whether you're talking about high-performance computers, enterprise servers, or mobile devices, the two biggest impediments to application performance in computing today are the memory wall and the power wall. Venray Technology is aiming to knock down those walls with a unique approach that puts CPU cores and DRAM on the same die. The company has been in semi-stealth mode since its inception seven years ago, but is now trying to get the word out about its technology as it searches for a commercial buyer.