# Chapter 2: Memory Hierarchy Design (Part 3)

- Introduction
- Caches
- Main Memory (Section 2.2)
- Virtual Memory (Section 2.4, Appendix B.4, B.5)

### Memory Technologies

Dynamic Random Access Memory (DRAM)
- Optimized for density, not speed
- One transistor per cell
- Multiplexed address pins
  - Row Address Strobe (RAS)
  - Column Address Strobe (CAS)
- Cycle time > access time
- Destructive reads
- Must refresh every few ms
  - Access every row
- Sold as dual inline memory modules (DIMMs)
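To make the multiplexed address pins concrete, here is a minimal Python sketch assuming a hypothetical array of 2^14 rows by 2^10 columns (these sizes and the names `split_address` and `pin_sequence` are illustrative, not from the slides). The row half of the address is driven onto the shared pins with RAS, then the column half with CAS, so the chip needs 14 address pins rather than 24.

```python
# Sketch of DRAM address multiplexing: the same pins carry the row address
# (latched by RAS) and then the column address (latched by CAS).
# Assumed geometry (hypothetical): 2**14 rows x 2**10 columns.
ROW_BITS = 14
COL_BITS = 10

def split_address(addr: int) -> tuple[int, int]:
    """Split a flat cell address into (row, column) halves."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range")
    row = addr >> COL_BITS               # high bits: sent first, with RAS
    col = addr & ((1 << COL_BITS) - 1)   # low bits: sent second, with CAS
    return row, col

def pin_sequence(addr: int) -> list[tuple[str, int]]:
    """Return the two values driven onto the shared address pins, in order."""
    row, col = split_address(addr)
    return [("RAS", row), ("CAS", col)]

# The pins only need to be max(ROW_BITS, COL_BITS) = 14 wide, not 24.
print(pin_sequence(0x12345))  # [('RAS', 72), ('CAS', 837)]
```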

### Memory Technologies, cont.

Static Random Access Memory (SRAM)
- Optimized for speed, then density
- Typically 6 transistors per cell
- Separate address pins
- Static ⇒ no refresh
- Greater power dissipation than DRAM
- Access time = cycle time

*Figure: DRAM organization hierarchy (DIMM, rank, bank, array, row buffer).*

### DRAM Organization

- Rank: the set of chips needed to respond to a single request
  - Assume a 64-bit data bus
  - For 8-bit-wide (×8) DRAM chips, need 8 chips in a rank
  - For 4-bit-wide (×4) DRAM chips, need 16 chips in a rank
  - Can have multiple ranks per DIMM
- Bank: a chip is divided into multiple independent banks for pipelined access
- Array: a bank consists of many arrays, 1 array per bit of output, for **parallel** access
- Row buffer: a "cache" that preserves the last row read from a bank
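The rank arithmetic above is the bus width divided by each chip's data width. A minimal sketch, assuming the 64-bit data bus from the slide (the function name `chips_per_rank` is hypothetical):

```python
def chips_per_rank(bus_width_bits: int, chip_width_bits: int) -> int:
    """Number of DRAM chips that together supply one full-width transfer."""
    if bus_width_bits % chip_width_bits:
        raise ValueError("chip width must divide the bus width")
    return bus_width_bits // chip_width_bits

for width in (8, 4):
    # 64-bit bus: x8 chips -> 8 per rank, x4 chips -> 16 per rank
    print(f"x{width} chips: {chips_per_rank(64, width)} per rank")
```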

### DRAM Organization, cont.

See Figure 1.5 in *The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It* by Bruce Jacob (Synthesis Lectures on Computer Architecture, Morgan & Claypool; series editor: Mark Hill). Downloadable from U of I accounts.

### Internals of a DRAM Array

See Figure 1.6 of the synthesis lecture.

Steps to access a bit:

1. Pre-charge the bit lines
2. Activate row: turn on the word line for the row, which brings the data to the sense amps
3. Column read: send a subset of the data (columns)
4. (Restore data)
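The access steps can be modeled as a toy bank with a row buffer. This is a sketch under simplifying assumptions: an open-page policy (the row stays in the sense amps until a different row is needed), a bank that starts precharged, and the restore step folded into activation/precharge rather than issued as its own command. The class and command names are illustrative.

```python
from typing import Optional

class Bank:
    """Toy model of one DRAM bank and its row buffer (open-page policy assumed)."""

    def __init__(self) -> None:
        self.open_row: Optional[int] = None  # None: bit lines precharged, no row open

    def access(self, row: int, col: int) -> list[str]:
        """Return the command sequence needed to read (row, col)."""
        cmds = []
        if self.open_row != row:
            if self.open_row is not None:
                cmds.append("PRECHARGE")            # close the open row (restore not modeled)
            cmds.append(f"ACTIVATE row {row}")      # word line on; row -> sense amps
            self.open_row = row
        cmds.append(f"READ col {col}")              # send a subset of the columns
        return cmds

b = Bank()
print(b.access(5, 3))  # ['ACTIVATE row 5', 'READ col 3']
print(b.access(5, 7))  # ['READ col 7']  (row buffer hit)
print(b.access(9, 0))  # ['PRECHARGE', 'ACTIVATE row 9', 'READ col 0']
```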

### DRAM Optimizations – Page Mode

Unoptimized DRAM
- First read the entire row, then select a column from the row
- Stores the entire row in a buffer

Page mode
- Row buffer acts like an SRAM
- By changing the column address, random bits can be accessed within a row
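A rough latency comparison shows why page mode helps when several reads fall in the same row. This sketch assumes 13 ns row activation and 13 ns column access (roughly the DDR3-era RAS and CAS times in the access-time table) and ignores precharge and data-transfer time; the function names are hypothetical.

```python
T_RAS_NS = 13  # assumed row activation time
T_CAS_NS = 13  # assumed column access time

def latency_without_page_mode(n_reads: int) -> int:
    # Every read re-opens the row: activate, then select the column.
    return n_reads * (T_RAS_NS + T_CAS_NS)

def latency_with_page_mode(n_reads: int) -> int:
    # Open the row once; later reads only change the column address.
    return T_RAS_NS + n_reads * T_CAS_NS

for n in (1, 4, 8):
    print(n, latency_without_page_mode(n), latency_with_page_mode(n))
# 1 26 26
# 4 104 65
# 8 208 117
```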

### DRAM Optimizations – Synchronous DRAM

Previously, DRAM had an asynchronous interface
- Each transfer involves handshaking with the controller

Synchronous DRAM (SDRAM)
- Clock added to the interface
- Register to hold the number of bytes requested
- Send multiple bytes per request

Double Data Rate (DDR)
- Send data on both the rising and falling edges of the clock
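The DDR relationship reduces to simple arithmetic: two transfers per I/O clock cycle and, on a 64-bit DIMM, 8 bytes per transfer. A minimal sketch; `ddr_rates` is a hypothetical helper and the 64-bit DIMM width is an assumption:

```python
BUS_BYTES = 8  # assumed 64-bit DIMM data bus

def ddr_rates(io_clock_mhz: int) -> tuple[int, int]:
    """Return (million transfers/s, MB/s per DIMM) for a DDR interface."""
    transfers = 2 * io_clock_mhz          # data on rising and falling clock edges
    return transfers, transfers * BUS_BYTES

print(ddr_rates(200))   # (400, 3200): DDR400 chips on a PC3200 DIMM
print(ddr_rates(1333))  # (2666, 21328): DDR4-2666, sold as PC21300
```

The DIMM name uses a rounded version of the bandwidth, which is why 21,328 MB/s becomes PC21300.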

### Simple Main Memory

Consider a memory with these parameters:
- 1 cycle to send the address
- 6 cycles to access each word
- 1 cycle to send the word back to the CPU/cache

What's the miss penalty for a 4-word block?

$(1 + 6 + 1)\ \text{cycles/word} \times 4\ \text{words} = 32\ \text{cycles}$
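The calculation above, written as a small function so the parameters can be varied (the constant and function names are illustrative):

```python
ADDR_CYCLES = 1    # send the address
ACCESS_CYCLES = 6  # access one word
XFER_CYCLES = 1    # send the word back to the CPU/cache

def simple_miss_penalty(block_words: int) -> int:
    # Each word pays the full address + access + transfer cost, one after another.
    return (ADDR_CYCLES + ACCESS_CYCLES + XFER_CYCLES) * block_words

print(simple_miss_penalty(4))  # 32
```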

How can we speed this up?

### Wider Main Memory

Make the memory wider
- Read out 2 (or more) words in parallel

Memory parameters:
- 1 cycle to send the address
- 6 cycles to access each *doubleword*
- 1 cycle to send the doubleword back to the CPU/cache

Miss penalty for a 4-word block:

$(1 + 6 + 1)\ \text{cycles/doubleword} \times 2\ \text{doublewords} = 16\ \text{cycles}$

Cost
- Wider bus
- Larger expansion size
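A sketch generalizing the calculation to a memory that is `width_words` words wide, assuming the same 1/6/1 cycle costs apply to each wide access (an assumption for widths beyond the two-word case on the slide); `wide_miss_penalty` is a hypothetical name.

```python
import math

def wide_miss_penalty(block_words: int, width_words: int,
                      addr: int = 1, access: int = 6, xfer: int = 1) -> int:
    # A width_words-wide memory and bus move the block in fewer round trips.
    trips = math.ceil(block_words / width_words)
    return (addr + access + xfer) * trips

for w in (1, 2, 4):
    print(w, wide_miss_penalty(4, w))
# 1 32
# 2 16
# 4 8
```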

### Interleaved Memory

- Organize memory in banks
- Subsequent words map to different banks
  - Word A in bank (A mod M), where M is the number of banks
  - Within a bank, word A in location (A div M)

*Figure: word address, split into the location within a bank (A div M) and the bank number (A mod M).*
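A sketch of the word-interleaved mapping, assuming M = 4 banks (the bank count and function name are illustrative). Consecutive word addresses cycle through the banks, so the words of one cache block land in different banks and their accesses can overlap.

```python
M = 4  # assumed number of banks

def bank_and_offset(word_addr: int, banks: int = M) -> tuple[int, int]:
    """Return (bank, location within bank) for a word address."""
    return word_addr % banks, word_addr // banks

for a in range(8):
    bank, loc = bank_and_offset(a)
    print(f"word {a}: bank {bank}, location {loc}")
```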

| Production year | Chip size | DRAM type | RAS time (ns) | CAS time (ns) | Best-case total, no precharge (ns) | Total, precharge needed (ns) |
|-----------------|-----------|-----------|---------------|---------------|------------------------------------|------------------------------|
| 2000            | 256M bit  | DDR1      | 21            | 21            | 42                                 | 63                           |
| 2002            | 512M bit  | DDR1      | 15            | 15            | 30                                 | 45                           |
| 2004            | 1G bit    | DDR2      | 15            | 15            | 30                                 | 45                           |
| 2006            | 2G bit    | DDR2      | 10            | 10            | 20                                 | 30                           |
| 2010            | 4G bit    | DDR3      | 13            | 13            | 26                                 | 39                           |
| 2016            | 8G bit    | DDR4      | 13            | 13            | 26                                 | 39                           |

Figure 2.4 Capacity and access times for DDR SDRAMs by year of production. Access time assumes a new row must be opened; if a precharge is required, the access time is longer. As the number of banks has increased, the ability to hide the precharge time has also increased.

| Standard | I/O clock rate (MHz) | M transfers/s | DRAM name | MiB/s/DIMM | DIMM name |
|----------|----------------------|---------------|-----------|------------|-----------|
| DDR1     | 133                  | 266           | DDR266    | 2128       | PC2100    |
| DDR1     | 150                  | 300           | DDR300    | 2400       | PC2400    |
| DDR1     | 200                  | 400           | DDR400    | 3200       | PC3200    |
| DDR2     | 266                  | 533           | DDR2-533  | 4264       | PC4300    |
| DDR2     | 333                  | 667           | DDR2-667  | 5336       | PC5300    |
| DDR2     | 400                  | 800           | DDR2-800  | 6400       | PC6400    |
| DDR3     | 533                  | 1066          | DDR3-1066 | 8528       | PC8500    |
| DDR3     | 666                  | 1333          | DDR3-1333 | 10,664     | PC10700   |
| DDR3     | 800                  | 1600          | DDR3-1600 | 12,800     | PC12800   |
| DDR4     | 1333                 | 2666          | DDR4-2666 | 21,300     | PC21300   |

Figure 2.5 Clock rates, bandwidth, and names of DDR DRAMs and DIMMs in 2016. Note the numerical relationship between the columns. The third column is twice the second, and the fourth uses the number from the third column in the name of the DRAM chip. The fifth column is eight times the third column, and a rounded version of this number is used in the name of the DIMM. DDR4 saw significant first use in 2016.







### Virtual Memory

- Original motivation
  - Avoid overlays
  - Use main memory as a cache for disk
- Current motivation
  - Relocation
  - Protection
  - Sharing
  - Fast startup
- Engineered differently than CPU caches
  - Miss access time on the order of 1,000,000 clock cycles
  - Miss access time >> miss transfer time

### Protection

Goal: one process should not be able to interfere with the execution of another.

Process model
- Privileged kernel
- Independent user processes

#### Primitives vs. Policy

- Architecture provides the primitives
- Operating system implements the policy
- Problems arise when hardware implements policy

### Protection Primitives

User vs. kernel
- At least one privileged mode
- Usually implemented as mode bit(s)

How do we switch to kernel mode?
- Change mode and continue execution at a *predetermined* location

Hardware to compare mode bits to access rights
- Access certain resources only in kernel mode
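A toy model of the two primitives above: a trap that sets the mode bit and jumps to a predetermined address, and a hardware check of the mode bit against a resource's access rights. The `TRAP_VECTOR` address, the class names, and the single-bit USER/KERNEL encoding are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    USER = 0
    KERNEL = 1

TRAP_VECTOR = 0x8000_0000  # hypothetical predetermined handler address

class CPU:
    def __init__(self) -> None:
        self.mode = Mode.USER
        self.pc = 0
        self.saved_pc = 0

    def trap(self) -> None:
        """Switch to kernel mode and continue at the predetermined location."""
        self.saved_pc = self.pc
        self.mode = Mode.KERNEL
        self.pc = TRAP_VECTOR  # user code cannot choose the kernel entry point

    def return_from_trap(self) -> None:
        """Privileged: drop back to user mode and resume the interrupted code."""
        if self.mode is not Mode.KERNEL:
            raise PermissionError("return-from-trap is privileged")
        self.mode = Mode.USER
        self.pc = self.saved_pc

    def may_access(self, kernel_only: bool) -> bool:
        """Hardware check: compare the mode bit with the resource's access rights."""
        return self.mode is Mode.KERNEL or not kernel_only

cpu = CPU()
cpu.pc = 0x400
print(cpu.may_access(kernel_only=True))  # False in user mode
cpu.trap()
print(cpu.may_access(kernel_only=True))  # True after the trap
```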

### Protection Primitives, cont.

Base and bounds
- Privileged registers
- $\text{Base} \leq \text{Address} \leq \text{Bounds}$
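A minimal sketch of the base-and-bounds check, assuming the bounds register holds the highest legal address (as in the inequality above) and that writes to the registers are rejected outside kernel mode; the class and method names are illustrative.

```python
class BaseBounds:
    """Toy base-and-bounds protection unit."""

    def __init__(self) -> None:
        self.base = 0
        self.bounds = 0

    def load(self, kernel_mode: bool, base: int, bounds: int) -> None:
        # Base and bounds are privileged registers: only kernel mode may write them.
        if not kernel_mode:
            raise PermissionError("base/bounds registers are privileged")
        self.base, self.bounds = base, bounds

    def check(self, address: int) -> bool:
        # A user access is legal only if Base <= Address <= Bounds.
        return self.base <= address <= self.bounds

bb = BaseBounds()
bb.load(kernel_mode=True, base=0x1000, bounds=0x1FFF)
print(bb.check(0x1800), bb.check(0x2000))  # True False
```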

Page-level protection
- Protection bits in the page table entry
- Cache them in the TLB
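A toy model of page-level protection with the bits cached in a TLB. Assumptions (not from the slides): 4 KiB pages, a single-level dictionary standing in for the page table, read/write/user bits, and `PermissionError` standing in for the protection fault; page faults are not modeled.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size (4 KiB)

@dataclass(frozen=True)
class PTE:
    frame: int       # physical page number
    readable: bool
    writable: bool
    user: bool       # may be accessed in user mode

# Hypothetical single-level page table: virtual page number -> PTE.
page_table = {
    0: PTE(frame=7, readable=True, writable=False, user=True),  # read-only user page
    1: PTE(frame=3, readable=True, writable=True, user=False),  # kernel-only page
}
tlb: dict[int, PTE] = {}  # caches PTEs, protection bits included

def translate(vaddr: int, write: bool, user_mode: bool) -> int:
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    pte = tlb.get(vpn)
    if pte is None:            # TLB miss: fetch the entry from the page table
        pte = page_table[vpn]  # a KeyError here would be a page fault (not modeled)
        tlb[vpn] = pte
    # The check uses the cached bits, so a TLB hit needs no page-table access.
    if user_mode and not pte.user:
        raise PermissionError("kernel-only page")
    if write and not pte.writable:
        raise PermissionError("page is read-only")
    if not write and not pte.readable:
        raise PermissionError("page is not readable")
    return pte.frame * PAGE_SIZE + offset

print(translate(0x10, write=False, user_mode=True))  # 28688 (frame 7, offset 16)
```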