
Cache Memory Tutorial


COSC243 - Lecture 10

Stallings "Computer Organisation and Architecture" Ch4.3 pg117-136


What is it?

Fast but expensive memory that holds temporary copies of data from main memory.



How big is it?

Small - much smaller than main memory, because fast cache memory is expensive per byte.


Where is it?

Close to the CPU - between the main memory and the CPU. The CPU will first try to retrieve data from the cache. If it is not in the cache, the data is first loaded from main memory into the cache, and then taken by the CPU from the cache. The chance of the CPU finding a particular piece of data in the cache is called its "hit rate".

[Figure: Transfer between CPU, cache, and main memory]
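
As a rough sketch of the lookup order and hit rate just described, here is a small Python example (the names main_memory, cache and read are invented for illustration; a real cache transfers whole blocks, not single cells):

main_memory = {addr: addr * 2 for addr in range(64)}   # pretend contents of main memory
cache = {}                                              # address -> value copies held in the cache
hits = misses = 0

def read(addr):
    """Return the value at addr, always going through the cache first."""
    global hits, misses
    if addr in cache:                # cache hit: the CPU takes the data from the cache
        hits += 1
    else:                            # cache miss: load from main memory into the cache first
        misses += 1
        cache[addr] = main_memory[addr]
    return cache[addr]

for a in [1, 2, 1, 3, 2, 1]:         # repeated addresses are found in the cache
    read(a)

print("hit rate =", hits / (hits + misses))   # 3 hits out of 6 accesses = 0.5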

What do we store in it?

Blocks of main memory map into lines of cache memory. Each block/line consists of a number of words/cells - usually quite small (optimally between 2 and 8). At any one time the cache will contain a subset of the main memory.

[Figure: Memory to Cache Mapping]



How is this mapping done? How do we address it? (This is the hard part.) We have a few choices - a short code sketch of the address partitioning for each scheme follows this list:

  1. Direct Mapping
    Each block of main memory can only be placed in one particular cache line. As there are a lot more blocks of main memory than lines of cache, many blocks of main memory map to the same line of cache. One of the disadvantages of this type of mapping is that it can lead to "thrashing": replacing a block of memory that is immediately needed again. The main memory address is split into a tag, a line number, and a word number.

    [Figure: Direct Main Memory to Cache Mapping]

    Let us take an example. Say we have 64 bytes of main memory, and each word/cell is only one byte. Then we need lg 64 = 6 bits to address each cell. Now let us say that each block is composed of only four cells (thus we have a total of 64 cells / (4 cells/block) = 16 blocks). Now let us pretend that our cache memory is only 16 bytes. That gives us 16 bytes / (4 cells/line) = 4 lines (in this example the number of cells per line happens to equal the number of lines). The 6-bit address therefore splits into a 2-bit word number (which of the 4 cells in a line), a 2-bit line number (which of the 4 lines), and a 2-bit tag (distinguishing the 16 / 4 = 4 blocks that share each line). We can now think of our cache memory as a grid of line numbers and cell numbers.

    [Figure: Direct Main Memory to Cache Mapping]



    [Figure: Partitioning of Main Memory Address]

  2. Associative Mapping
    Unlike direct mapping, any block from main memory can appear in any cache line. Thus we don't index the line at all (only the cell, using the least significant bits of the main memory address). The remainder of the address is the tag, and to find which line of cache we are dealing with, the tag stored on each line has to be compared with the tag part of the address we want. This is done simultaneously for all lines (otherwise it would take quite a while, as you can imagine) - but the circuitry required is complex and expensive.

    [Figure: Associative Main Memory to Cache Mapping]

    [Figure: Partitioning of Main Memory Address]



  3. Set Associative Mapping
    The lines of cache are grouped into indexed sets, each of only 2 or 4 lines. In this way we have direct mapping to the set, but associative mapping to the line within the set. It is a good compromise between the two.

    [Figure: Set Associative Main Memory to Cache Mapping]

    [Figure: Partitioning of Main Memory Address]
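
To make the three address partitions concrete, here is a minimal Python sketch using the numbers from the direct-mapping example above (64 cells of main memory, 4 cells per block/line, 4 cache lines). The two-line set size for set associative mapping and the function names are assumptions made purely for illustration:

# Address partitioning for the worked example: 6-bit addresses, 4 cells per
# block/line, 4 cache lines.  Function names are illustrative only.

WORD_BITS = 2          # 4 cells per block/line  -> 2 bits for the word number
LINE_BITS = 2          # 4 cache lines           -> 2 bits for the line number
SET_BITS  = 1          # 2 sets of 2 lines each  -> 1 bit  (assumed 2-way set associative)

def direct(addr):
    """tag | line | word  (2 + 2 + 2 bits)"""
    word = addr & 0b11
    line = (addr >> WORD_BITS) & 0b11
    tag  = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

def associative(addr):
    """tag | word  (4 + 2 bits) - the line is not encoded in the address at all"""
    word = addr & 0b11
    tag  = addr >> WORD_BITS
    return tag, word

def set_associative(addr):
    """tag | set | word  (3 + 1 + 2 bits), assuming two sets of two lines"""
    word   = addr & 0b11
    set_no = (addr >> WORD_BITS) & 0b1
    tag    = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

addr = 0b101101                  # address 45 in the 64-byte main memory
print(direct(addr))              # (2, 3, 1)
print(associative(addr))         # (11, 1)
print(set_associative(addr))     # (5, 1, 1)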



While a program is running, how does the computer choose which cache line to overwrite when the next block of memory has to be brought in (after, of course, discovering that the required memory is not already in the cache)?

Of course, this is not an issue for Direct Mapping as there is only one possible line in the cache where any particular block of memory can go. The following mostly applies to Associative, but also Set Associative (although the choice here will only be one of 2 or 4 lines):

  1. Least Recently Used (LRU)
    Not good if executing a loop that is slightly larger than the cache. Works easily for two-way set associative - use an extra bit (per line) that is set when the line was most recently used, and replace the line with this bit set to 0 (see the sketch after this list).

  2. First In First Out (FIFO)
    Easy to implement, but again not good if the program loop is slightly larger than the cache.

  3. Least Frequently Used (LFU)
    The problem is the cost of keeping and comparing a usage count for every line.

  4. Random
    Actually quite effective - don't spend time deciding which one to throw out. Works quite well in a mix/variety of programmes. Burroughs (now Unisys) came up with the idea.
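
As a rough sketch of the use-bit scheme mentioned under LRU above, here is a small Python example for one two-way set (the class and method names are invented for illustration):

# Minimal sketch of LRU for one set of a two-way set associative cache, using a
# single "most recently used" bit per set, as described above.

class TwoWaySet:
    def __init__(self):
        self.tags = [None, None]   # tags of the two lines in this set
        self.mru  = 0              # index of the most recently used line (the "use bit")

    def access(self, tag):
        """Return True on a hit; on a miss, replace the least recently used line."""
        if tag in self.tags:                 # hit: just update the use bit
            self.mru = self.tags.index(tag)
            return True
        victim = 1 - self.mru                # the line whose use bit is 0
        self.tags[victim] = tag              # replace it with the new block
        self.mru = victim
        return False

s = TwoWaySet()
for t in [7, 3, 7, 5, 3]:
    print(t, "hit" if s.access(t) else "miss")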



The problem with writing over cache

This is easy if we are replacing a line that hasn't been modified - but this isn't always easy to determine, especially as I/O devices may access main memory directly and several processors may have their own caches. So we need to maintain consistency between main memory and all of the caches. There are two approaches (a short sketch contrasting them follows this list):

  1. Write Through
    All write operations to the cache are also written to main memory...

    ...but then all other processor caches must monitor traffic to maintain consistency... this generates substantial memory traffic and may cause a bottleneck.

  2. Write Back
    Updates are made only to the cache. Each line of cache has an associated update ("dirty") bit, so that when we want to overwrite a line, the old contents are written back to main memory only if this bit is set.

    ...this means that parts of main memory may no longer be valid if their value has been changed in the cache, so any module accessing memory must first check the cache to see whether its memory cell is there... this again leads to complex circuitry and possible bottlenecks.
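
Here is a minimal Python sketch contrasting the two policies (the class and its fields are invented for illustration, and each "line" holds just one word to keep it short):

# Minimal sketch contrasting write-through and write-back for a single cache
# entry.  Everything here (names, single-word "lines") is illustrative only.

class Cache:
    def __init__(self, policy):
        self.policy = policy          # "through" or "back"
        self.addr = None              # address currently held
        self.value = None
        self.dirty = False            # write-back update ("dirty") bit

    def write(self, memory, addr, value):
        self.addr, self.value = addr, value
        if self.policy == "through":
            memory[addr] = value      # write-through: main memory updated immediately
        else:
            self.dirty = True         # write-back: only the dirty bit records the change

    def evict(self, memory):
        # On replacement, write-back must copy a dirty line out to main memory.
        if self.policy == "back" and self.dirty:
            memory[self.addr] = self.value
            self.dirty = False

memory = {0: 0}
wt, wb = Cache("through"), Cache("back")
wt.write(memory, 0, 11); print(memory[0])   # 11 - memory already up to date
wb.write(memory, 0, 22); print(memory[0])   # still 11 - memory is stale until eviction
wb.evict(memory);        print(memory[0])   # 22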



Maintaining Cache Coherency with Multiple Processors

  1. Bus Watching with Write Through
    Each processor's cache watches the bus for other processors writing through to main memory - if a write is detected for one of the addresses it holds in its own cache, it marks its own copy of that data invalid (see the sketch after this list).

  2. Hardware Transparency
    Additional hardware updates all caches when a write through occurs.

  3. Non-cacheable Memory
    Any memory which more than one processor may access (identified with chip-select logic or high-order address bits) is deemed non-cacheable and must always be accessed directly from main memory.
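
As a rough sketch of the bus watching idea, here is a small Python example in which every cache sees the write-through traffic of the others and invalidates its own copies (the shared bus is modelled as a simple list of subscribers; all names are invented for illustration):

# Minimal sketch of bus watching with write through: each cache snoops the
# writes that other caches send to main memory and invalidates its own copy.

class SnoopingCache:
    def __init__(self, name, bus):
        self.name, self.lines = name, {}     # address -> value copies held locally
        self.bus = bus
        bus.append(self)                     # subscribe to bus traffic

    def read(self, memory, addr):
        if addr not in self.lines:           # miss: load from main memory
            self.lines[addr] = memory[addr]
        return self.lines[addr]

    def write(self, memory, addr, value):
        self.lines[addr] = value
        memory[addr] = value                 # write through to main memory
        for other in self.bus:               # the other caches see the bus write...
            if other is not self:
                other.lines.pop(addr, None)  # ...and invalidate their own copy

bus, memory = [], {0: 5}
a, b = SnoopingCache("A", bus), SnoopingCache("B", bus)
print(b.read(memory, 0))     # 5  - B now holds a copy
a.write(memory, 0, 9)        # A's write invalidates B's copy
print(b.read(memory, 0))     # 9  - B misses and reloads the new value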



Two Level Caches



Unified or Split Caches?

  1. Unified Cache
  2. Split Cache
