Is it guaranteed that random access within a 16KB chunk will not cause a cache miss?

My knowledge of CPU architecture was pretty outdated, so I did some quick research on caches. Here is what I learned:

* The cache loads data in units of cache lines; a cache line is 64 bytes in L1-L3 (on recent Intel CPUs)
* Sequential and strided access patterns trigger the hardware prefetcher, so we can avoid cache misses even when accessing more than one cache line

And here is what I don't quite understand:

* What's the point of a 16KB chunk? Does it mean there would be no cache misses if we randomly access within a chunk? Or would there still be (16KB / 64 = 256) cache misses before the whole chunk is loaded into the L1 cache?

Does it mean there would be no cache misses if we randomly access within a chunk?


No, we are not aiming for random access without cache misses. We are aiming to minimize cache misses and to process the data sequentially. This way the prefetcher will load our next chunk of data, so cache misses will be rare.
No one promised zero cache misses for random access, AFAIK.

Unity Technologies

As Lurking-Ninja said, it's not about guaranteeing zero cache misses. That is impossible to achieve for a program as a whole. It is about massively reducing misses, and massively reducing the bandwidth wasted by loading memory into the cache that is then never consumed.

A 16KB chunk means that if you have, for example, 4 components on an entity, you get roughly 62 cache lines per component per chunk, laid out linearly. That's a huge improvement over otherwise jumping randomly between cache lines.

The exact 16KB chunk size was determined empirically on Megacity. Ultimately we do want it to be dynamic, at least by introducing the concept of tiny & large chunks. We have definitely noticed that at both extremes (huge numbers of entities, or small numbers of entities of one archetype) efficiency could be improved.

As Lurking-Ninja said, it's not about guaranteeing zero cache misses. That is impossible to achieve for a program as a whole.


Thanks for the clarification. There are still quite a few points about the cache that I don't understand.

Just take the random access within a chunk from the title as an example: if an IJobChunk randomly accesses the data in the chunk 100k times (an iterative algorithm on a graph, maybe), is it mostly right that we would only have 256 cache misses during the job's execution?

massively reducing the amount of bandwidth caused by loading memory into cache that is then not consumed.


I have another question about cache prefetching (strided prefetch) here.

Assume we store 100K elements (each 128 bytes) in an array. Is it right that if we loop over the array and access an int member of each element, the cache will basically not miss, since we are accessing addresses with a fixed stride?

