# Chapter 2: Memory Hierarchy Design (Part 3)

Introduction

Caches

Main Memory (Section 2.2)

Virtual Memory (Section 2.4, Appendix B.4, B.5)

# **Memory Technologies**

Dynamic Random Access Memory (DRAM)

Optimized for density, not speed

One-transistor cells

Multiplexed address pins

Row Address Strobe (RAS)

Column Address Strobe (CAS)

Cycle time > access time

Destructive reads

Must refresh every few ms

Refresh must access every row

Sold as dual inline memory modules (DIMMs)
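If every row has to be refreshed once per retention period, the controller must refresh a row roughly every retention period divided by the number of rows. Below is a minimal sketch of that arithmetic. The 64 ms period and 8192 rows (or row groups) are assumed values typical of DDR-generation parts and do not come from the slide. The function name `refresh_interval_us` is hypothetical.

```python
def refresh_interval_us(retention_ms: float, rows: int) -> float:
    """Spacing between row refreshes if each row is refreshed once per retention period.

    Assumption: refresh is spread evenly over the period rather than done in one burst.
    """
    return retention_ms * 1000 / rows


print(refresh_interval_us(64, 8192))  # 7.8125 (microseconds)
```

Spreading refresh out this way keeps each row's blackout short, at the price of a refresh operation every few microseconds.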


## Memory Technologies, cont.

Static Random Access Memory (SRAM)

Optimized for speed, then density

Typically 6 transistors per cell

Separate (non-multiplexed) address pins

Static ⇒ no refresh

Greater power dissipation than DRAM

Access time = cycle time

# **DRAM Organization**

Hierarchy: DIMM → Rank → Bank → Array → Row buffer

## DRAM Organization, cont.

Rank: the set of chips needed to respond to a single request

Assume a 64-bit data bus

For x8 (8-bit-wide) DRAM chips, need 8 chips in a rank

For x4 (4-bit-wide) DRAM chips, need 16 chips in a rank

Can have multiple ranks per DIMM

Bank: A chip is divided into multiple independent banks for pipelined access

Array: A bank consists of many arrays, 1 array per bit of output, for parallel access

Row buffer: A "cache" that preserves the last row read from a bank
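The rank arithmetic above is just the data-bus width divided by the output width of one chip. Here is a minimal Python sketch that assumes the 64-bit bus from the slide. The function name `chips_per_rank` is hypothetical.

```python
def chips_per_rank(bus_width_bits: int = 64, chip_width_bits: int = 8) -> int:
    """Number of DRAM chips that must answer together to fill the data bus."""
    if chip_width_bits <= 0 or bus_width_bits % chip_width_bits != 0:
        raise ValueError("bus width must be a positive multiple of the chip width")
    return bus_width_bits // chip_width_bits


for width in (4, 8, 16):
    print(f"x{width} chips on a 64-bit bus: {chips_per_rank(64, width)} per rank")
```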

## DRAM Organization, cont.

See Figure 1.5 in *The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It*, by Bruce Jacob

Synthesis Lectures on Computer Architecture, Morgan & Claypool

Series editor: Mark Hill

Downloadable from U of I accounts

https://www.morganclaypool.com/doi/pdfplus/10.2200/S00201ED1V01Y200907CAC007

#### Internals of a DRAM Array

See Figure 1.6 of the synthesis lecture

Steps to access a bit

Pre-charge bit lines

Activate row: turn on the word line for the row, which brings the row's data to the sense amps

Column read: send a subset of the row's data (the selected columns)

(Restore data)
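The four steps can be traced in a toy model of one bank. In this sketch the row is a list of bits, and zeroing the row after activation stands in for the charge lost during a destructive read. PRECHARGE, ACTIVATE and READ mirror DRAM command names. In real chips the restore is done by the sense amplifiers and is not issued as a separate command. The function `read_bit` and its log format are hypothetical.

```python
from typing import List


def read_bit(bank: List[List[int]], row: int, col: int, log: List[str]) -> int:
    """Read one bit from a toy DRAM bank, following the access steps above."""
    # 1. Pre-charge the bit lines to a reference voltage.
    log.append("PRECHARGE")
    # 2. Activate: the word line connects every cell in the row to its bit line,
    #    and the sense amplifiers latch the whole row.
    log.append(f"ACTIVATE row={row}")
    sense_amps = list(bank[row])
    # Reads are destructive: the cells lose their charge (modeled here as 0s).
    bank[row] = [0] * len(sense_amps)
    # 3. Column read: only the selected column leaves the chip.
    log.append(f"READ col={col}")
    bit = sense_amps[col]
    # 4. Restore: the sense amplifiers write the row back into the cells.
    log.append(f"RESTORE row={row}")
    bank[row] = list(sense_amps)
    return bit


bank = [[0, 1, 1, 0], [1, 0, 0, 1]]
log: List[str] = []
print(read_bit(bank, row=1, col=3, log=log))  # 1
print(log)      # ['PRECHARGE', 'ACTIVATE row=1', 'READ col=3', 'RESTORE row=1']
print(bank[1])  # [1, 0, 0, 1], restored after the destructive read
```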

# DRAM Optimizations - Page Mode

Unoptimized DRAM

First read entire row

Then select column from row

Stores entire row in a buffer

Page Mode

Row buffer acts like an SRAM

By changing only the column address, arbitrary bits within the open row can be accessed without re-reading the row.
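The benefit of page mode shows up when consecutive accesses fall in the same row. This sketch counts cycles for a trace of (row, column) reads to one bank. It compares a controller that does a full precharge/activate/read for every access with one that keeps the row buffer open. The timing constants are hypothetical round numbers chosen for illustration, not datasheet values.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical timings in memory-clock cycles, chosen for illustration only.
T_PRECHARGE = 3  # close the previously open row
T_ACTIVATE = 3   # read a new row into the row buffer
T_COLUMN = 3     # select a column from the row buffer


def trace_cycles(accesses: Iterable[Tuple[int, int]], page_mode: bool) -> int:
    """Total cycles to service a sequence of (row, column) reads to one bank."""
    open_row: Optional[int] = None
    total = 0
    for row, _col in accesses:
        if page_mode and row == open_row:
            # Row-buffer hit: the row is already latched, just change the column.
            total += T_COLUMN
        else:
            # Full cycle: precharge, activate the row, then read the column.
            total += T_PRECHARGE + T_ACTIVATE + T_COLUMN
            open_row = row
    return total


trace = [(5, 0), (5, 1), (5, 2), (5, 3), (9, 0)]
print(trace_cycles(trace, page_mode=False))  # 45
print(trace_cycles(trace, page_mode=True))   # 27
```

The saving depends entirely on locality: a trace that never revisits the open row gets no benefit.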


## DRAM Optimizations - Synchronous DRAM

Previously, DRAM had an asynchronous interface

Each transfer involved handshaking with the controller

Synchronous DRAM (SDRAM)

Clock added to interface

Register to hold number of bytes requested

Send multiple bytes per request

Double Data Rate (DDR)

Send data on both the rising and falling edges of the clock
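A burst of transfers moves several bus-widths of data per request. DDR halves the clock cycles that burst needs. This sketch computes both quantities. The 8-byte (64-bit) bus matches the DIMM width used earlier. The burst length of 8 and the function name `burst_timing` are assumptions for illustration.

```python
import math
from typing import Tuple


def burst_timing(burst_length: int, bus_bytes: int = 8, ddr: bool = True) -> Tuple[int, int]:
    """(clock cycles, bytes delivered) for one burst of `burst_length` transfers."""
    transfers_per_clock = 2 if ddr else 1  # DDR transfers on both clock edges
    cycles = math.ceil(burst_length / transfers_per_clock)
    return cycles, burst_length * bus_bytes


print(burst_timing(8, ddr=False))  # (8, 64)
print(burst_timing(8, ddr=True))   # (4, 64)
```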

#### Simple Main Memory

Consider a memory with these parameters:

1 cycle to send address

6 cycles to access each word

1 cycle to send word back to CPU/Cache

What's the miss penalty for a 4-word block?

$(1 \text{ cycle} + 6 \text{ cycles} + 1 \text{ cycle}) \times 4 \text{ words} = 32 \text{ cycles}$

How can we speed this up?
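The calculation above can be written as a small function using the slide's parameters as defaults. Each word pays for its own address, access and transfer. The function name is hypothetical.

```python
def simple_miss_penalty(block_words: int, addr_cycles: int = 1,
                        access_cycles: int = 6, transfer_cycles: int = 1) -> int:
    """Miss penalty when every word needs its own address, access, and transfer."""
    cycles_per_word = addr_cycles + access_cycles + transfer_cycles
    return cycles_per_word * block_words


print(simple_miss_penalty(4))  # 32
```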


# Wider Main Memory

Make the memory wider

Read out 2 (or more) words in parallel

Memory parameters:

1 cycle to send address

6 cycles to access each doubleword

1 cycle to send doubleword back to CPU/Cache

Miss penalty for a 4-word block:

$(1 \text{ cycle} + 6 \text{ cycles} + 1 \text{ cycle}) \times 2 \text{ doublewords} = 16 \text{ cycles}$

Cost

Wider bus

Larger expansion size
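Generalizing the doubleword example, a memory that delivers `width_words` words per access needs proportionally fewer accesses per block. The sketch keeps the slide's assumption that address, access and transfer times do not grow with width. That is an idealization; the cost list above notes the price paid in bus width and expansion size. The function name is hypothetical.

```python
import math


def wide_miss_penalty(block_words: int, width_words: int, addr_cycles: int = 1,
                      access_cycles: int = 6, transfer_cycles: int = 1) -> int:
    """Miss penalty when memory and bus deliver `width_words` words per access."""
    accesses = math.ceil(block_words / width_words)
    return (addr_cycles + access_cycles + transfer_cycles) * accesses


for width in (1, 2, 4):
    print(f"{width}-word-wide memory: {wide_miss_penalty(4, width)} cycles")
# 1 -> 32 cycles, 2 -> 16 cycles, 4 -> 8 cycles
```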

Organize memory in M banks

Consecutive words map to different banks

Word A is in bank (A mod M)

Within a bank, word A is in location (A div M)

Word address → Bank (A mod M), Word in Bank (A div M)
How many banks to include?
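The bank mapping above is a modulo and an integer division. The miss-penalty function goes beyond the slide. It reuses the 1/6/1-cycle parameters from the earlier examples and assumes one address is broadcast to all banks, the banks access in parallel, and the words then return one per cycle over a one-word bus. All function names are hypothetical.

```python
def bank_of(word_addr: int, num_banks: int) -> int:
    """Bank holding word A: A mod M."""
    return word_addr % num_banks


def index_in_bank(word_addr: int, num_banks: int) -> int:
    """Location of word A inside its bank: A div M."""
    return word_addr // num_banks


def interleaved_miss_penalty(block_words: int, num_banks: int, addr_cycles: int = 1,
                             access_cycles: int = 6, transfer_cycles: int = 1) -> int:
    """Assumes one address broadcast, all banks accessing in parallel,
    then one word transferred per cycle on a one-word bus."""
    if block_words > num_banks:
        raise ValueError("sketch assumes each word of the block is in a different bank")
    return addr_cycles + access_cycles + block_words * transfer_cycles


M = 4
for a in range(8):
    print(f"word {a}: bank {bank_of(a, M)}, location {index_in_bank(a, M)}")
print(interleaved_miss_penalty(4, M))  # 11
```

Under these assumptions, 4 banks fetch a 4-word block in 11 cycles on a one-word bus, compared with 32 cycles for the simple memory.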




| Standard | I/O clock rate (MHz) | M transfers/s | DRAM name | MB/s/DIMM | DIMM name |
|----------|----------------------|---------------|-----------|-----------|-----------|
| DDR      | 133                  | 266           | DDR266    | 2128      | PC2100    |
| DDR      | 150                  | 300           | DDR300    | 2400      | PC2400    |
| DDR      | 200                  | 400           | DDR400    | 3200      | PC3200    |
| DDR2     | 266                  | 533           | DDR2-533  | 4264      | PC4300    |
| DDR2     | 333                  | 667           | DDR2-667  | 5336      | PC5300    |
| DDR2     | 400                  | 800           | DDR2-800  | 6400      | PC6400    |
| DDR3     | 533                  | 1066          | DDR3-1066 | 8528      | PC8500    |
| DDR3     | 666                  | 1333          | DDR3-1333 | 10,664    | PC10700   |
| DDR3     | 800                  | 1600          | DDR3-1600 | 12,800    | PC12800   |
| DDR4     | 1333                 | 2666          | DDR4-2666 | 21,300    | PC21300   |

Figure 2.5 Clock rates, bandwidth, and names of DDR DRAMs and DIMMs in 2016. Note the numerical relationships between the columns: the third column is twice the second, and the fourth uses the number from the third column in the name of the DRAM chip. The fifth column is eight times the third column, and a rounded version of this number is used in the name of the DIMM. DDR4 saw significant first use in 2016.
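The column relationships in the table can be checked with a few lines of arithmetic. Transfers per second are twice the I/O clock, and DIMM bandwidth is eight bytes (the 64-bit bus) per transfer. The function name `ddr_rates` is hypothetical.

```python
from typing import Tuple


def ddr_rates(io_clock_mhz: int) -> Tuple[int, int]:
    """(million transfers/s, MB/s per DIMM) for a DDR interface."""
    transfers = 2 * io_clock_mhz   # data moves on both clock edges
    mb_per_s = 8 * transfers       # a DIMM has a 64-bit (8-byte) data bus
    return transfers, mb_per_s


for clock in (200, 400, 800):
    print(clock, ddr_rates(clock))
# 200 (400, 3200)   -> DDR400 / PC3200
# 400 (800, 6400)   -> DDR2-800 / PC6400
# 800 (1600, 12800) -> DDR3-1600 / PC12800
```

Rows such as DDR2-533 list a rounded clock. The actual I/O clock is 266.67 MHz, which is why 2 × 266 comes out one short of 533.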




# Other Technologies

Graphics Data RAMs (GDDR)

Wider (32 bits), higher clock, connect directly to GPUs (soldered to board vs. DIMMs)

Die-stacked DRAMs / 3D / High Bandwidth Memory (HBM)

Nonvolatile memory (later)

Flash

Phase change

Reliability: Parity, ECC, chipkill
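Parity is the simplest of the reliability schemes listed. One extra bit per word records whether the count of 1 bits is odd, so any single flipped bit is detected. This is a minimal even-parity sketch with hypothetical function names. It only detects odd numbers of flips and corrects nothing. ECC and chipkill need more check bits and are not attempted here.

```python
def parity_bit(word: int) -> int:
    """Even parity: 1 if `word` has an odd number of 1 bits, so data + parity is even."""
    if word < 0:
        raise ValueError("word must be non-negative")
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word still matches the parity bit stored alongside it."""
    return parity_bit(word) == stored_parity


data = 0b1011_0010
p = parity_bit(data)                      # four 1s -> parity bit 0
print(parity_ok(data, p))                 # True
print(parity_ok(data ^ 0b0000_1000, p))   # False: single-bit error detected
print(parity_ok(data ^ 0b0001_1000, p))   # True: double-bit error goes unnoticed
```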




# Virtual Memory

User operates in a virtual address space; the mapping between virtual space and main memory is determined at runtime

Original motivation

Avoid overlays

Use main memory as a cache for disk

Current motivation

Relocation

Protection

Sharing

Fast startup

Engineered differently than CPU caches

Miss access time: O(1,000,000) clock cycles

Miss access time >> miss transfer time


# Virtual Memory, cont.

Blocks, called pages, are 512 to 16K bytes

Page placement

Fully associative, to avoid expensive misses

Page identification

Address translation: virtual to physical address

Indirection through one or two page tables

Translation cached in a translation buffer (TLB)

Page replacement

Approximate LRU

Write strategy

Write-back (with page dirty bit)
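Translation splits the virtual address into a virtual page number and an unchanged page offset, then looks up the page number. The TLB is checked first and the page table is used on a miss. This is a minimal sketch assuming 4 KiB pages (within the 512 B to 16 KB range above) and a single-level page table held in a dict. A dict also reflects fully associative placement: any virtual page can map to any physical frame. The unbounded dict used as a TLB is a simplification, since real TLBs are small and have their own replacement policy. `translate` and `PageFault` are hypothetical names.

```python
from typing import Dict

PAGE_SIZE = 4096                           # assumed 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits


class PageFault(Exception):
    """Raised when the page table has no mapping (the OS would fetch from disk)."""


def translate(vaddr: int, page_table: Dict[int, int], tlb: Dict[int, int]) -> int:
    vpn = vaddr >> OFFSET_BITS            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)      # unchanged by translation
    if vpn in tlb:                        # TLB hit: no page-table access
        ppn = tlb[vpn]
    else:
        if vpn not in page_table:
            raise PageFault(hex(vaddr))
        ppn = page_table[vpn]             # single-level page-table lookup
        tlb[vpn] = ppn                    # cache the translation
    return (ppn << OFFSET_BITS) | offset


page_table = {0x12: 0x7A}
tlb: Dict[int, int] = {}
print(hex(translate(0x12345, page_table, tlb)))  # 0x7a345 (TLB miss, then filled)
print(0x12 in tlb)                               # True
```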




## **Protection**

#### Goal:

One process should not be able to interfere with the execution of another

Process model

Privileged kernel

Independent user processes

Primitives vs. Policy

Architecture provides the primitives

Operating system implements the policy

Problems arise when hardware implements policy

## **Protection Primitives**

User vs. Kernel


At least one privileged mode

Usually implemented as mode bit(s)

How do we switch to kernel mode?

Change mode and continue execution at a predetermined location

Hardware compares mode bits against access rights

Access certain resources only in kernel mode
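The mode-bit primitives can be modeled in a few lines. A trap changes the mode and jumps to a fixed location in one step, and privileged operations check the mode bit first. This is a toy sketch. `ToyCPU`, `TRAP_VECTOR`, `require_kernel` and the resource names are hypothetical and do not correspond to any particular ISA.

```python
KERNEL, USER = 0, 1
TRAP_VECTOR = 0x8000_0000   # hypothetical predetermined handler address


class PrivilegeViolation(Exception):
    pass


class ToyCPU:
    """Toy model of a mode bit plus a trap to a predetermined location."""

    def __init__(self) -> None:
        self.mode = USER
        self.pc = 0x1000
        self.saved_pc = 0

    def trap(self) -> None:
        # Mode change and jump happen together, so user code cannot choose
        # which kernel code runs with privilege.
        self.saved_pc = self.pc
        self.mode = KERNEL
        self.pc = TRAP_VECTOR

    def require_kernel(self, what: str) -> None:
        # Hardware check of the mode bit against the resource's access rights.
        if self.mode != KERNEL:
            raise PrivilegeViolation(what)

    def return_from_trap(self) -> None:
        self.require_kernel("return_from_trap")  # returning is itself privileged
        self.mode = USER
        self.pc = self.saved_pc


cpu = ToyCPU()
try:
    cpu.require_kernel("page table base register")
except PrivilegeViolation as e:
    print("user mode blocked:", e)
cpu.trap()
cpu.require_kernel("page table base register")   # allowed now
print(hex(cpu.pc))                                # 0x80000000
cpu.return_from_trap()
print(cpu.mode == USER, hex(cpu.pc))              # True 0x1000
```

This arrangement keeps the policy out of hardware. The hardware only enforces the mode check and the fixed entry point, while the kernel code at `TRAP_VECTOR` decides what the request is allowed to do.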


## Protection Primitives, cont.

Base and Bounds

Privileged registers

Base ≤ Address ≤ Bounds

Page-level protection

Protection bits in page table entry

Cache them in the TLB
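Both primitives reduce to simple checks on every access. This sketch shows a base-and-bounds comparison and a page-level check against protection bits in a page table entry. The `PTE` fields (`readable`, `writable`, `user`) are hypothetical; real page-table formats differ by architecture. In hardware the page check would normally use the copy of the bits cached in the TLB.

```python
from dataclasses import dataclass


def within_bounds(addr: int, base: int, bounds: int) -> bool:
    """Base-and-bounds check: legal iff Base <= Address <= Bounds."""
    return base <= addr <= bounds


@dataclass(frozen=True)
class PTE:
    """Page table entry with hypothetical protection bits."""
    ppn: int
    readable: bool
    writable: bool
    user: bool  # may user-mode code touch this page?


def page_access_ok(pte: PTE, is_write: bool, user_mode: bool) -> bool:
    """True if the access type is permitted by the entry's protection bits."""
    if user_mode and not pte.user:
        return False                       # kernel-only page
    return pte.writable if is_write else pte.readable


print(within_bounds(0x1800, base=0x1000, bounds=0x1FFF))  # True
print(within_bounds(0x2000, base=0x1000, bounds=0x1FFF))  # False
code_page = PTE(ppn=0x42, readable=True, writable=False, user=True)
print(page_access_ok(code_page, is_write=False, user_mode=True))  # True
print(page_access_ok(code_page, is_write=True, user_mode=True))   # False
```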
