COSC243 - Lecture 10
Stallings "Computer Organisation and Architecture" Ch4.3 pg117-136
Cache is fast but expensive temporary memory.
It sits close to the CPU, between main memory and the CPU. The CPU first tries to retrieve data from the cache; if the data is not there, it is first loaded from main memory into the cache and then taken by the CPU from the cache. The chance of the CPU finding a particular piece of data in the cache is called its "hit rate".
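As a rough illustration of why the hit rate matters, here is a minimal C sketch of the average access time when the cache is always checked first and main memory is only consulted on a miss. The timings are invented for illustration, not taken from the lecture.

    #include <stdio.h>

    int main(void)
    {
        /* Assumed, illustrative timings - not figures from the lecture. */
        double cache_time  = 10.0;   /* ns to access the cache              */
        double memory_time = 100.0;  /* ns to access main memory            */
        double hit_rate    = 0.95;   /* fraction of accesses found in cache */

        /* Every access checks the cache; a miss also pays the memory cost. */
        double average = cache_time + (1.0 - hit_rate) * memory_time;

        printf("average access time = %.1f ns\n", average);
        return 0;
    }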

Blocks of main memory map onto lines of cache memory. Each block/line consists of a number of words/cells - usually quite small (optimally between 2 and 8). At any one time the cache contains only a subset of main memory.

In direct mapping, each block of main memory can be placed in only one particular cache line. As there are a lot more blocks of main memory than lines of cache, many blocks of main memory map to the same line. One disadvantage of this type of mapping is that it can lead to "thrashing": repeatedly replacing a block of memory that is immediately needed again. The main memory address is split into a tag, a line number, and a word number.

Let us take an example. Say we have 64 bytes of main memory, and each word/cell is one byte. Then we need log2(64) = 6 bits to address each cell. Now let us say that each block is composed of four cells (so we have a total of 64 cells / (4 cells/block) = 16 blocks). Now let us pretend that our cache memory is only 16 bytes. That gives us 16 bytes / (4 cells/line) = 4 lines (the number of cells per line is always the same as the number of cells per block). We can now think of our cache memory as a grid of line numbers and cell numbers, and the 6-bit address splits into a 2-bit tag, a 2-bit line number, and a 2-bit word number.
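Using the numbers from this example, here is a minimal C sketch of how a direct-mapped cache would split an address into tag, line and word. The constants simply restate the example; the sample address is arbitrary.

    #include <stdio.h>

    /* Parameters from the worked example: 64-byte main memory,
     * 4-byte blocks/lines, 16-byte cache => 4 lines.            */
    #define WORD_BITS 2   /* 4 words per line */
    #define LINE_BITS 2   /* 4 lines of cache */

    int main(void)
    {
        unsigned address = 0x2B;                        /* 101011 in binary (arbitrary) */

        unsigned word = address & 0x3;                  /* lowest 2 bits: word in line  */
        unsigned line = (address >> WORD_BITS) & 0x3;   /* next 2 bits: cache line      */
        unsigned tag  = address >> (WORD_BITS + LINE_BITS); /* remaining 2 bits: tag    */

        printf("address %u -> tag %u, line %u, word %u\n", address, tag, line, word);
        return 0;
    }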


In (fully) associative mapping, unlike direct mapping, any block from main memory can appear in any cache line. Thus we don't index the line at all: the least significant bits of the address still select the cell within the line, and the whole remainder of the address is the tag. To find out which line of cache holds a block, the tag stored with every line has to be compared against the tag part of the address until the correct line is found. This is done for all lines simultaneously (otherwise it would take quite a while, as you can imagine) - but the circuitry required is complex and expensive.
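In software an associative lookup can only be simulated by scanning every line in turn; real hardware does all of the tag comparisons at once. A minimal sketch along those lines, reusing the 4-line, 4-cell-per-line cache from the earlier example (the struct and field names are invented for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_LINES 4
    #define WORD_BITS 2            /* 4 words per line */

    struct line {
        bool     valid;
        unsigned tag;              /* whole address minus the word bits */
    };

    /* Return the line holding the block, or -1 on a miss.  Hardware compares
     * every tag simultaneously; this loop is only the software stand-in.     */
    int associative_lookup(struct line cache[], unsigned address)
    {
        unsigned tag = address >> WORD_BITS;
        for (int i = 0; i < NUM_LINES; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;
        return -1;
    }

    int main(void)
    {
        struct line cache[NUM_LINES] = {0};
        cache[2].valid = true;
        cache[2].tag   = 0x2B >> WORD_BITS;    /* pretend the block containing 0x2B is loaded */

        printf("lookup 0x2B -> line %d\n", associative_lookup(cache, 0x2B));
        printf("lookup 0x00 -> line %d\n", associative_lookup(cache, 0x00));
        return 0;
    }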


In set associative mapping, the lines of cache are grouped into indexed sets of typically 2 or 4 lines each. We then have direct mapping to the set, but associative mapping to the line within the set. It is a good compromise between the two.
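A minimal sketch of a two-way set associative lookup, again assuming the 16-byte cache from the earlier example, now treated as 2 sets of 2 lines each (the names are illustrative only):

    #include <stdbool.h>
    #include <stdio.h>

    #define WORD_BITS 2   /* 4 words per line          */
    #define SET_BITS  1   /* 4 lines / 2 ways = 2 sets */
    #define NUM_SETS  2
    #define WAYS      2

    struct line {
        bool     valid;
        unsigned tag;
    };

    /* Direct mapping chooses the set; an associative search chooses the line within it. */
    int set_assoc_lookup(struct line cache[NUM_SETS][WAYS], unsigned address)
    {
        unsigned set = (address >> WORD_BITS) & (NUM_SETS - 1);
        unsigned tag = address >> (WORD_BITS + SET_BITS);

        for (int way = 0; way < WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return way;             /* hit: which way within the set */
        return -1;                      /* miss */
    }

    int main(void)
    {
        struct line cache[NUM_SETS][WAYS] = {{{0}}};
        cache[0][1].valid = true;
        cache[0][1].tag   = 0x28 >> (WORD_BITS + SET_BITS);   /* block holding address 0x28 */

        printf("lookup 0x28 -> way %d\n", set_assoc_lookup(cache, 0x28));
        printf("lookup 0x2C -> way %d\n", set_assoc_lookup(cache, 0x2C));
        return 0;
    }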


When a new block must be loaded, an existing line has to be chosen for replacement. This is not an issue for direct mapping, as there is only one possible cache line where any particular block of memory can go. The following algorithms therefore apply mainly to associative mapping, but also to set associative (although the choice there is only between 2 or 4 lines):
Least recently used (LRU): replace the line that has gone longest without being referenced. Not good if executing a loop that is slightly larger than the cache. It is easy to implement for two-way set associative - use an extra bit per line that is set when the line was most recently used, and replace the line whose bit is 0 (see the sketch after this list of policies).
First in, first out (FIFO): replace the line that has been in the cache longest. Easy to implement, but again not good if a program loop is slightly larger than the cache.
Least frequently used (LFU): replace the line with the fewest references. The problem is the cost of calculating it - a reference count must be maintained for every line.
Random: pick a line at random. Actually quite effective - no time is spent deciding which one to throw out, and it works quite well across a mix/variety of programs. Burroughs (now Unisys) came up with the idea.
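For the two-way set associative LRU scheme mentioned above, a single "use" bit per line is enough to pick a victim. A minimal sketch, with invented names:

    #include <stdio.h>

    #define WAYS 2

    /* One "use" bit per line in a two-way set.  When a line is referenced its
     * bit is set and the other line's bit is cleared, so the line whose bit is
     * 0 is the least recently used and is the one to replace.                 */
    struct set {
        int use_bit[WAYS];
    };

    void touch(struct set *s, int way)           /* record a reference to 'way'    */
    {
        s->use_bit[way]     = 1;
        s->use_bit[1 - way] = 0;
    }

    int victim(const struct set *s)              /* which way to replace on a miss */
    {
        return s->use_bit[0] == 0 ? 0 : 1;
    }

    int main(void)
    {
        struct set s = {{0, 0}};

        touch(&s, 0);                            /* way 0 most recently used */
        printf("replace way %d\n", victim(&s));  /* -> 1                     */

        touch(&s, 1);                            /* now way 1 most recently used */
        printf("replace way %d\n", victim(&s));  /* -> 0                     */
        return 0;
    }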
Before a cache line is overwritten, we must consider whether main memory needs to be updated. This is easy if we are replacing a line that hasn't been modified - but keeping everything consistent isn't always easy, especially as many I/O devices may access memory directly, and several processors may each have their own cache. So we need to maintain validity between main memory and all the caches.
Write through: all write operations to cache are also written to main memory...
...but then every other processor's cache must monitor the memory traffic to maintain consistency... this generates substantial memory traffic and may cause a bottleneck.
Write back: updates are only made to the cache. Each line of cache has an associated update ("dirty") bit, so that when a line is about to be overwritten, its former contents are written to main memory only if this bit is set.
...this means that parts of main memory may no longer be valid if their value has been changed in the cache, so other modules must access memory through the cache to check whether the cell they want is held there... this again leads to complex circuitry and possible bottlenecks.
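A minimal sketch of write back with a dirty bit per line; the function and field names are invented for illustration, and a plain array stands in for main memory:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    #define LINE_SIZE 4

    static unsigned char memory[64];          /* stand-in for main memory       */

    struct line {
        bool          valid;
        bool          dirty;                  /* set when the line is modified  */
        unsigned      tag;                    /* block number held in this line */
        unsigned char data[LINE_SIZE];
    };

    /* Write back: a store only touches the cache and sets the dirty bit. */
    void cache_write(struct line *l, unsigned offset, unsigned char value)
    {
        l->data[offset] = value;
        l->dirty = true;
    }

    /* On replacement, the old contents go to memory only if the line is dirty. */
    void replace(struct line *l, unsigned new_tag)
    {
        if (l->valid && l->dirty)
            memcpy(&memory[l->tag * LINE_SIZE], l->data, LINE_SIZE);

        l->tag   = new_tag;
        l->valid = true;
        l->dirty = false;
        memcpy(l->data, &memory[new_tag * LINE_SIZE], LINE_SIZE);
    }

    int main(void)
    {
        struct line l = {true, false, 3, {0}}; /* line currently holding block 3 */

        cache_write(&l, 1, 0xAA);              /* modify the cached block        */
        replace(&l, 7);                        /* eviction writes block 3 back   */

        printf("memory[13] = 0x%X\n", memory[3 * LINE_SIZE + 1]);  /* -> 0xAA */
        return 0;
    }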
Bus watching with write through: each processor's cache watches the bus for other bus masters writing through to main memory - if a write is detected to one of the addresses held in its own cache, it renders its own copy of that data invalid.
Hardware transparency: additional hardware updates all the caches whenever a write through occurs.
Noncacheable memory: any memory that more than one processor may access (identified with chip-select logic or the high address bits) is deemed noncacheable and must always be accessed directly from main memory.