Saturday, August 28, 2010

Cache Coherence Gyaan



Total Geek talk, noobs feel free to close the tab.

Finally, I realized that it is good to share technical knowledge, because the more you share, the more you know.

Basics: Cache is a mid-level memory between the CPU and the RAM. The simple funda behind cache is that it is more expensive but much faster than your RAM (in terms of access time). So during processing, some of the data from RAM, the data which is expected to be accessed most often, is brought into the cache, and the system becomes more efficient.
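To put a rough number on "more efficient", here is a small sketch of the usual average access time calculation. The function name and all the numbers are my own illustrative assumptions, not measurements from any real machine.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cost of one memory access, in the same unit as the inputs."""
    # Every access pays the cache lookup; only misses pay the trip to RAM.
    return hit_time + miss_rate * miss_penalty

# Assumed, illustrative numbers in CPU cycles (not measured values).
with_cache = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
without_cache = 100  # every access goes all the way to RAM

print(with_cache, without_cache)
```

With these made-up numbers, an access costs about 6 cycles on average instead of 100, which is the whole point of keeping frequently used data close to the CPU.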

Now, all is good as long as there is only one core, but if there are multiple cores sharing the same memory space, it creates a bit of a mess. Each core has its own cache, so one needs to know who is changing the memory contents. Say there is a variable which is used by both cores, and both of them have cached it. Now, if core1 updates the value in its cache, or even in the memory, core2 needs to know that the value of this variable has changed, and it should refresh its copy before using it. This is called the "Cache Coherency" problem. The problem statement is simple: "Figure out if something has been messed up, and if yes, what?"
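The stale-copy scenario above can be played out with a toy model. This is only a sketch: the `Core` class and the dictionary standing in for memory are hypothetical, and real caches work on lines, not named variables.

```python
# Toy model: two cores with private caches and NO coherence mechanism.
memory = {"x": 0}  # shared main memory


class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # private cache: variable name -> cached value

    def read(self, var):
        # On a miss, fetch from memory; on a hit, trust the cached copy.
        if var not in self.cache:
            self.cache[var] = memory[var]
        return self.cache[var]

    def write(self, var, value):
        # Update our own cache and the memory, but tell nobody else.
        self.cache[var] = value
        memory[var] = value


core1, core2 = Core("core1"), Core("core2")
core1.read("x")          # both cores cache x = 0
core2.read("x")
core1.write("x", 42)     # core1 changes x
stale = core2.read("x")  # core2 hits in its own cache and sees the old value

print(stale, memory["x"])  # prints: 0 42
```

Even though the memory already holds 42, core2 keeps reading 0, because nothing told it that its cached copy is out of date.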

Before we delve into the details, we need some more basics. The cache is tiny compared to the RAM (just as a spec, my laptop has 4 GB of RAM and 3 MB of cache), so data keeps moving in and out of it. Even with a single cache, there are two ways to decide when a changed value gets written to the memory: a write-through cache or a write-back cache. In a write-through cache, every write to the cache is written to the memory right away. In a write-back cache, changed contents stay only in the cache, and they are written to the memory only when they have to be swapped out to make room for some other memory contents.
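The difference between the two policies shows up when the same location is written many times. Below is a minimal sketch with hypothetical class names, where a dictionary stands in for the RAM and a counter tracks how often the memory is actually written.

```python
class WriteThroughCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value  # memory is updated on every write
        self.memory_writes += 1


class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses changed in cache but not yet in memory
        self.memory_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)  # only remember that the line was modified

    def evict(self, addr):
        # Memory is written only when a modified line leaves the cache.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.memory_writes += 1
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


wt_mem, wb_mem = {}, {}
wt, wb = WriteThroughCache(wt_mem), WriteBackCache(wb_mem)
for i in range(10):
    wt.write(0x10, i)
    wb.write(0x10, i)
wb.evict(0x10)

print(wt.memory_writes, wb.memory_writes)  # prints: 10 1
```

Both memories end up holding the same final value, but the write-back cache touched the memory once instead of ten times. The price is that, until the eviction, the memory holds an old value.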

Now, there are two ways to handle the cache coherency problem:

  • The Software way
  • The Hardware way
In the software way, it is the responsibility of the compiler, and sometimes the programmer (if he/she is programming in assembly language), to take care of cache coherency. The simplest way of doing so is to flush the cache, or mark it as invalid, as soon as you write a shared/global variable. This is very inefficient, and there are many more efficient ways to do it, but they generally depend on the hardware too.
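Here is a rough sketch of that simplest "flush everything on a shared write" approach. The `Core` class and the `write_shared` helper are hypothetical; on a real machine the flush would be a cache-maintenance instruction emitted by the compiler or the programmer, not a Python method.

```python
memory = {"x": 0, "y": 7}  # shared main memory


class Core:
    def __init__(self):
        self.cache = {}
        self.dirty = set()  # variables modified in cache but not in memory
        self.misses = 0

    def read(self, var):
        if var not in self.cache:
            self.misses += 1
            self.cache[var] = memory[var]
        return self.cache[var]

    def write(self, var, value):
        self.cache[var] = value
        self.dirty.add(var)

    def flush(self):
        # Write modified values back, then throw away the WHOLE cache.
        for var in self.dirty:
            memory[var] = self.cache[var]
        self.dirty.clear()
        self.cache.clear()


def write_shared(writer, others, var, value):
    """Software coherence, the blunt way: flush every cache on a shared write."""
    writer.write(var, value)
    writer.flush()
    for core in others:
        core.flush()


core1, core2 = Core(), Core()
core1.read("x")
core2.read("x")
core2.read("y")                       # core2 has had 2 misses so far
write_shared(core1, [core2], "x", 42)
seen_x = core2.read("x")              # fresh value, fetched from memory
seen_y = core2.read("y")              # y never changed, but is fetched again

print(seen_x, seen_y, core2.misses)   # prints: 42 7 4
```

core2 now sees the correct value of x, but it also had to reload y, which nobody touched. That extra miss is exactly the inefficiency of this approach.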

Now, the problem with cache coherency through software means is, firstly, that it is inefficient: one might end up refreshing the whole cache even when only a few memory locations have changed. Secondly, and most importantly, code which depends on software cache coherency cannot be easily ported from one machine to another, as the other machine might have a different hardware configuration.


In the hardware way, there is special space reserved in the cache for cache coherence. The cache is in general divided into cache lines. Say a cache line consists of 64 bytes of data (real line sizes are powers of two); then whenever data is swapped out of the cache into the memory, or swapped into the cache, it is moved in multiples of this size. One can imagine a cache as a 2D memory consisting of some number of lines. For each cache line there is an entry in the special space reserved for cache coherence, generally called the "cache directory". Each entry holds a few bits (say 3) which tell the state of the data in that cache line.
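A small sketch of how a byte address maps to a cache line, and how a per-line state could be recorded. Both the 64-byte line size and the four state names are assumptions for illustration (the states are borrowed from the well-known MESI protocol, which this post does not otherwise cover); the actual states and bit count depend on the protocol the hardware uses.

```python
from enum import IntEnum

LINE_SIZE = 64  # bytes per cache line; an assumed, typical value


class LineState(IntEnum):
    # Assumed example states; four states fit in 2 bits.
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3


def split_address(addr):
    """Return (line_number, offset_within_line) for a byte address."""
    return addr // LINE_SIZE, addr % LINE_SIZE


directory = {}  # hypothetical directory: line number -> state of that line

line, offset = split_address(1000)
directory[line] = LineState.SHARED

print(line, offset, directory[line].name)  # prints: 15 40 SHARED
```

Note that the state is kept per line, not per variable, so two unrelated variables that happen to sit in the same line share one directory entry.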

The issues with hardware coherency are, first, that it generates a lot of inter-cache traffic: whenever a cache write occurs, that cache needs to send info about it to all the other caches so that they can update or invalidate their copies. Secondly, the amount of cache memory which is usable for storing data decreases.
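To see where the traffic comes from, here is a toy sketch in which every write sends a message to each of the other caches. It assumes a write-invalidate scheme over a shared bus, with write-through to memory to keep it short; the `Bus` and `Cache` classes are hypothetical, and real protocols avoid some of these messages.

```python
class Bus:
    def __init__(self):
        self.caches = []
        self.messages = 0  # number of invalidation messages sent so far

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                self.messages += 1
                cache.lines.pop(addr, None)  # drop the stale copy, if any


class Cache:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Tell every other cache before changing the value.
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.memory[addr] = value  # write-through, to keep the sketch short


memory = {0: 0}
bus = Bus()
caches = [Cache(bus, memory) for _ in range(4)]
for c in caches:
    c.read(0)                # all four caches hold a copy of address 0

for i in range(1, 6):
    caches[0].write(0, i)    # 5 writes, each notifying 3 other caches

latest = caches[1].read(0)   # copy was invalidated, so this refetches
print(bus.messages, latest)  # prints: 15 5
```

The other caches always read the latest value, but five writes cost fifteen messages. With more cores and more shared writes, this count grows quickly.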



So which one is better?

Well, it depends on the situation. If the program has few shared variables, then the hardware method is preferable, as the amount of inter-cache traffic is manageable. Whereas if there are many shared variables and they are updated by different threads quite often, then the software method is preferred, as in this case the inter-cache traffic would be too much, and it is fine to flush the whole cache at once instead of updating lines one by one as the hardware method does.


Now, how about a system in which we can dynamically switch between software coherency and hardware coherency? This is a paper which talks about such a system, and it also inspired me to write this blog.
