# TESLA V100 GPU

Xudong Shao, Houxiang Ji, Hao Gao

#### The history of GPU architecture



2017 Volta architecture

Figure 1.5: The Scaling of NVIDIA GTX Products for Desktop Utilizations

Reference & Credit: Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym, NVIDIA Tesla: A Unified Graphics And Computing Architecture

# Components of GPU

- host interface
- vertex work
- pixel fragment work
- compute work

The number of TPCs (texture/processor clusters) determines performance.

Unification starts with the Tesla architecture.



Figure 1. Tesla unified graphics and computing GPU architecture. TPC: texture/processor cluster; SM: streaming multiprocessor; SP: streaming processor; Tex: texture, ROP: raster operation processor.

Reference & Credit: Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym, NVIDIA Tesla: A Unified Graphics And Computing Architecture





Reference & Credit: Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym, NVIDIA Tesla: A Unified Graphics And Computing Architecture

- Geometry controller
- SMC: streaming multiprocessor controller
- Texture unit

[Figure: block diagram of one GV100 SM. An L1 instruction cache feeds four processing blocks. Each block contains an L0 instruction cache, a warp scheduler (32 thread/clk), a dispatch unit (32 thread/clk), a register file (16,384 x 32-bit), 8 FP64 cores, 16 INT cores, 16 FP32 cores, 2 tensor cores, 8 LD/ST units, and an SFU.]

Figure 5. Volta GV100 Streaming Multiprocessor (SM)

[Figure: close-up of one SM processing block: L0 instruction cache, warp scheduler (32 thread/clk), dispatch unit (32 thread/clk), register file (16,384 x 32-bit), 8 FP64 cores, 16 INT cores, 16 FP32 cores, 2 tensor cores, 8 LD/ST units, and an SFU.]

Reference & Credit: Nvidia Tesla V100 GPU Architecture, The World's Most Advanced Data Center GPU. NVIDIA Corporation, 2017


- FP64 cores
- FP32 cores
- INT32 cores
- LD/ST
- Register File
- SFU: special function unit (sin, cos, etc.)
- Cache, memory, tensor core (introduced later)
- Warp Scheduler

| Tesla Product                   | Tesla K40            | Tesla M40           | Tesla P100          | Tesla V100                  |
|---------------------------------|----------------------|---------------------|---------------------|-----------------------------|
| GPU                             | GK180 (Kepler)       | GM200 (Maxwell)     | GP100 (Pascal)      | GV100 (Volta)               |
| SMs                             | 15                   | 24                  | 56                  | 80                          |
| TPCs                            | 15                   | 24                  | 28                  | 40                          |
| FP32 Cores / SM                 | 192                  | 128                 | 64                  | 64                          |
| FP32 Cores / GPU                | 2880                 | 3072                | 3584                | 5120                        |
| FP64 Cores / SM                 | 64                   | 4                   | 32                  | 32                          |
| FP64 Cores / GPU                | 960                  | 96                  | 1792                | 2560                        |
| Tensor Cores / SM               | NA                   | NA                  | NA                  | 8                           |
| Tensor Cores / GPU              | NA                   | NA                  | NA                  | 640                         |
| GPU Boost Clock                 | 810/875 MHz          | 1114 MHz            | 1480 MHz            | 1530 MHz                    |
| Peak FP32 TFLOPS <sup>1</sup>   | 5                    | 6.8                 | 10.6                | 15.7                        |
| Peak FP64 TFLOPS <sup>1</sup>   | 1.7                  | 0.21                | 5.3                 | 7.8                         |
| Peak Tensor TFLOPS <sup>1</sup> | NA                   | NA                  | NA                  | 125                         |
| Texture Units                   | 240                  | 192                 | 224                 | 320                         |
| Memory Interface                | 384-bit GDDR5        | 384-bit GDDR5       | 4096-bit HBM2       | 4096-bit HBM2               |
| Memory Size                     | Up to 12 GB          | Up to 24 GB         | 16 GB               | 16 GB                       |
| L2 Cache Size                   | 1536 KB              | 3072 KB             | 4096 KB             | 6144 KB                     |
| Shared Memory Size / SM         | 16 KB/32 KB/48 KB    | 96 KB               | 64 KB               | Configurable up to 96 KB    |
| Register File Size / SM         | 256 KB               | 256 KB              | 256 KB              | 256 KB                      |
| Register File Size / GPU        | 3840 KB              | 6144 KB             | 14336 KB            | 20480 KB                    |
| TDP                             | 235 Watts            | 250 Watts           | 300 Watts           | 300 Watts                   |
| Transistors                     | 7.1 billion          | 8 billion           | 15.3 billion        | 21.1 billion                |
| GPU Die Size                    | 551 mm²              | 601 mm²             | 610 mm²             | 815 mm²                     |
| Manufacturing Process           | 28 nm                | 28 nm               | 16 nm FinFET+       | 12 nm FFN                   |

Reference & Credit: Nvidia Tesla V100 GPU Architecture, The World's Most Advanced Data Center GPU. NVIDIA Corporation, 2017
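The peak FP32 and FP64 rows follow from the core counts and boost clock in the same table. A minimal sketch of that arithmetic, assuming every core retires one fused multiply-add (two floating-point operations) per clock; `peak_tflops` is an illustrative helper, not an NVIDIA API:

```cpp
#include <cassert>

// Peak throughput in TFLOPS for `cores` units running at `boost_clock_mhz`.
// Assumption: one fused multiply-add (2 floating-point operations) per core per clock.
double peak_tflops(int cores, double boost_clock_mhz) {
    assert(cores >= 0 && boost_clock_mhz >= 0.0);
    const double flops_per_fma = 2.0;
    return cores * flops_per_fma * boost_clock_mhz * 1e6 / 1e12;
}
```

With the Tesla V100 column, `peak_tflops(5120, 1530.0)` gives about 15.7 and `peak_tflops(2560, 1530.0)` about 7.8, matching the table.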

# SM multithreading

- single-instruction, multiple-thread (SIMT)
- thread block
- warp (32 threads)
- active mask
- each thread has its own instruction address and register state
- the scheduler selects a ready warp and issues its next instruction
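How an active mask serializes a divergent branch can be sketched on the host. This is a simplified model of the per-warp scheme listed above (one instruction stream, lanes switched on and off by a mask), not of Volta's independent thread scheduling. The function `run_divergent_branch` and the even/odd branch are made up for illustration:

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;

// Executes `x = (x is even) ? x / 2 : 3 * x + 1` for every lane of one warp.
// Both sides of the branch run one after the other; the active mask decides
// which lanes commit results on each pass.
// Returns the number of serialized passes the warp needed (1 or 2).
int run_divergent_branch(std::array<int, kWarpSize>& x) {
    std::uint32_t taken = 0;  // bit i set: lane i takes the "then" path
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (x[lane] % 2 == 0) taken |= (std::uint32_t{1} << lane);
    }

    const std::uint32_t masks[2] = {taken, static_cast<std::uint32_t>(~taken)};
    int passes = 0;
    for (int path = 0; path < 2; ++path) {
        if (masks[path] == 0) continue;  // no active lane: the path is skipped
        ++passes;
        for (int lane = 0; lane < kWarpSize; ++lane) {
            const bool active = ((masks[path] >> lane) & 1u) != 0;
            if (!active) continue;       // lane is masked off for this pass
            x[lane] = (path == 0) ? x[lane] / 2 : 3 * x[lane] + 1;
        }
    }
    return passes;
}
```

A warp whose lanes all agree on the condition needs one pass; a warp with mixed even and odd values needs two, which is the cost of divergence in this model.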

Independent thread scheduling for the Volta architecture



Figure 4. Single-instruction, multiple-thread (SIMT) warp scheduling.

Reference & Credit: Erik Lindholm, John Nickolls, Stuart Oberman, John Montrym, NVIDIA Tesla: A Unified Graphics And Computing Architecture

# Independent thread scheduling for the Volta architecture



Each thread has its own program counter and call stack.

Volta (bottom) independent thread scheduling architecture block diagram compared to Pascal and earlier architectures (top). Volta maintains per-thread scheduling resources such as program counter (PC) and call stack (S), while earlier architectures maintained these resources per warp.

#### Figure 21. Volta Warp with Per-Thread Program Counter and Call Stack

Reference & Credit: Nvidia Tesla V100 GPU Architecture, The World's Most Advanced Data Center GPU. NVIDIA Corporation, 2017

#### **GPU Memory Hierarchy**



Figure 3.1: Memory hierarchy of the Volta V100 GPU (GV100).

Reference & Credit: Jia, Z., Maggioni, M., Staiger, B., & Scarpazza, D. P. (2018). Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv preprint arXiv:1804.06826.

# Where can we get information?

- Published by Nvidia: official but limited

[1] Nvidia Tesla V100 GPU Architecture, The World's Most Advanced Data Center GPU. NVIDIA Corporation, 2017.

[2] Pascal GP100 Whitepaper. NVIDIA Corporation, 2016.

[3] Lindholm, E., Nickolls, J., Oberman, S., & Montrym, J. (2008). NVIDIA Tesla: A unified graphics and computing architecture. IEEE Micro, 28(2).

[4] CUDA C Programming Guide, NVIDIA Corporation, 2018.

[5] CUDA C Best Practices Guide, NVIDIA Corporation, 2018.

- Microbenchmarking

[6] X. Mei and X. Chu, "Dissecting GPU memory hierarchy through microbenchmarking," IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 1, pp. 72–86, Jan 2017.

[7] Jia, Z., Maggioni, M., Staiger, B., & Scarpazza, D. P. (2018). Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. arXiv preprint arXiv:1804.06826.

# Registers

- Virtual Registers

Two levels of assembly: PTX and SASS. Difference?

Sample PTX and SASS for vector addition

The intermediate language (PTX) uses virtual registers. Why?

- Size of register files
  - In GV100, the register file is 256 KB/SM × 80 SMs = 20480 KB.
  - In comparison, the L2 cache is only 6144 KB.
  - Why so many registers? To avoid register spilling.
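The 256 KB/SM figure can be rebuilt from the SM diagram: four processing blocks, each with a 16,384 x 32-bit register file. The sketch below checks that arithmetic at compile time. The 2048 resident threads per SM used in the last check is an assumption taken from the CUDA documentation for compute capability 7.0, not from these slides:

```cpp
constexpr long kRegistersPerBlock = 16384;  // 32-bit registers per processing block
constexpr long kBytesPerRegister  = 4;
constexpr long kBlocksPerSm       = 4;      // processing blocks per SM
constexpr long kSmsPerGpu         = 80;     // SMs in a Tesla V100

constexpr long registers_per_sm() { return kRegistersPerBlock * kBlocksPerSm; }

constexpr long register_file_kb_per_sm() {
    return registers_per_sm() * kBytesPerRegister / 1024;
}

constexpr long register_file_kb_per_gpu() {
    return register_file_kb_per_sm() * kSmsPerGpu;
}

// Registers available to each thread when `resident_threads` share one SM.
constexpr long registers_per_thread(long resident_threads) {
    return registers_per_sm() / resident_threads;
}

static_assert(register_file_kb_per_sm() == 256, "256 KB per SM");
static_assert(register_file_kb_per_gpu() == 20480, "20480 KB per GPU");
static_assert(registers_per_thread(2048) == 32, "assumed 2048 resident threads per SM");
```

Under that occupancy assumption, even this large register file leaves only 32 registers per thread, which is one way to see why spilling is a concern.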

## Registers

- The register file is divided into 2 banks, each 64 bits wide. This can be measured with the microbenchmark `FFMA R6, R97, R99, RX`, where RX is the source register being varied.
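A minimal model of what that microbenchmark probes. Two assumptions, both following the reading of the results in [7] rather than any official specification: register Rn maps to bank n mod 2, and an FFMA conflicts only when all three source registers fall in the same bank. The helper names are illustrative:

```cpp
// Assumed mapping: register Rn lives in bank n % 2.
constexpr int bank_of(int reg) { return reg % 2; }

// Assumed rule: a conflict arises when all three source operands share a bank.
constexpr bool ffma_bank_conflict(int src_a, int src_b, int src_c) {
    return bank_of(src_a) == bank_of(src_b) && bank_of(src_b) == bank_of(src_c);
}

// R97 and R99 are both in bank 1, so the outcome depends only on RX.
static_assert(ffma_bank_conflict(97, 99, 101), "odd RX: all sources in bank 1");
static_assert(!ffma_bank_conflict(97, 99, 100), "even RX: sources span both banks");
```

In this model, sweeping RX over consecutive registers alternates between conflict and no conflict, which is the pattern a two-bank register file would produce.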



## Caches

- Data cache structure
  - L1 cache on each SM
  - L2 cache shared among all SMs
- Latency
  - L1 cache hit: 28 cycles
  - L2 cache hit: 193 cycles
  - L2 cache miss with TLB hit: 375 cycles
  - L2 cache miss with TLB miss: 1029 cycles
- L1 cache

The Volta architecture combines the L1 data cache and shared memory. (What is the difference between L1 cache and shared memory?)
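The four latencies above can be combined into an expected load latency once hit rates are known. A sketch under stated assumptions: the latencies are treated as end-to-end costs of each outcome, and the hit rates are hypothetical inputs, since the slides report none. `expected_load_latency` is an illustrative name:

```cpp
#include <cassert>

// Load latencies in cycles, taken from the list above.
struct LoadLatency {
    double l1_hit = 28;
    double l2_hit = 193;
    double l2_miss_tlb_hit = 375;
    double l2_miss_tlb_miss = 1029;
};

// Expected latency of one load. Each rate is a probability in [0, 1]:
// l2_hit_rate applies to loads that missed L1, tlb_hit_rate to loads that missed L2.
double expected_load_latency(double l1_hit_rate, double l2_hit_rate,
                             double tlb_hit_rate, LoadLatency lat = LoadLatency{}) {
    assert(l1_hit_rate >= 0 && l1_hit_rate <= 1);
    assert(l2_hit_rate >= 0 && l2_hit_rate <= 1);
    assert(tlb_hit_rate >= 0 && tlb_hit_rate <= 1);
    const double l2_miss = tlb_hit_rate * lat.l2_miss_tlb_hit +
                           (1 - tlb_hit_rate) * lat.l2_miss_tlb_miss;
    const double l1_miss = l2_hit_rate * lat.l2_hit + (1 - l2_hit_rate) * l2_miss;
    return l1_hit_rate * lat.l1_hit + (1 - l1_hit_rate) * l1_miss;
}
```

For example, a load stream that hits L1 half the time and always hits L2 otherwise averages 110.5 cycles in this model.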

## Caches

- L1 Cache (continued)

Replacement policy: Not simply LRU.

The same four cache lines from four cache sets have the lowest preservation priority.

- L2 Cache

Total size 6144 KB; 16-way set-associative; cache line size 64 B.

- TLBs

L1 data cache is indexed by virtual addresses;

L2 data cache is indexed by physical addresses

Two levels of TLB:

L1 TLB: 2 MB page entries, 32 MB of coverage.

L2 TLB: about 8192 MB of coverage.
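Two derived quantities follow from the numbers on this slide: the set count of the L2 cache and the entry count of the L1 TLB. The sketch assumes the L2 behaves as one monolithic set-associative cache and that TLB coverage equals entries times page size; neither the internal slicing of the L2 nor the L2 TLB page size is given here, so they are left out:

```cpp
constexpr long kKB = 1024;
constexpr long kMB = 1024 * 1024;

// Sets in a set-associative cache: capacity / (ways * line size).
constexpr long cache_sets(long size_bytes, long ways, long line_bytes) {
    return size_bytes / (ways * line_bytes);
}

// TLB entries implied by a given coverage and page size.
constexpr long tlb_entries(long coverage_bytes, long page_bytes) {
    return coverage_bytes / page_bytes;
}

static_assert(cache_sets(6144 * kKB, 16, 64) == 6144, "L2: 6144 sets of 16 lines");
static_assert(tlb_entries(32 * kMB, 2 * kMB) == 16, "L1 TLB: 16 entries");
```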

# **Shared Memory**

- Shared within a thread block
- Specified explicitly by the programmer

Kernel skeleton:

    __global__ void kernel(...)
    {
        __shared__ float shared_memory[1024];
        // load global memory into shared memory
        __syncthreads();  // wait until every thread in the block has finished loading
        // actual computation
    }

- Configurable, up to 96 KB per SM
- Divided into banks; simultaneous accesses to different addresses in the same bank are serialized
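Bank conflicts can be estimated on the host for a strided access pattern. The sketch assumes 32 banks of 4-byte words with consecutive words in consecutive banks (the layout described in the CUDA C Programming Guide), and that lanes reading the same word do not conflict. `bank_conflict_degree` is an illustrative helper:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

constexpr int kWarpSize = 32;
constexpr int kBanks = 32;  // assumption: 32 banks, each one 4-byte word wide

// Worst-case number of distinct words that land in one bank when lane i
// reads the 4-byte word at index i * stride_words. 1 means conflict-free.
int bank_conflict_degree(int stride_words) {
    assert(stride_words >= 0);
    std::array<std::set<int>, kBanks> words_per_bank;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int word = lane * stride_words;
        words_per_bank[word % kBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Stride 1 is conflict-free, stride 2 gives a 2-way conflict, and stride 32 puts all 32 lanes in one bank. Stride 33 is conflict-free again, which is why padding a 32-wide shared array by one column is a common remedy.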

# **Constant Memory**

- Resides on device memory but cached in the constant cache
- Cache hit -> throughput of the constant cache; cache miss -> throughput of device memory
- Constant memory supports broadcasting:
  - all threads in a warp access the same location -> served simultaneously
  - diverging addresses -> serialized
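The broadcast-versus-serialized behavior can be modeled by counting distinct addresses in a warp. This follows the rule stated in the CUDA C Programming Guide, that a request is split into as many separate requests as there are different addresses; the function name is illustrative:

```cpp
#include <array>
#include <set>

constexpr int kWarpSize = 32;

// Number of separate requests a warp-wide constant load is split into:
// one per distinct address among the 32 lanes.
int constant_requests(const std::array<long, kWarpSize>& addresses) {
    const std::set<long> distinct(addresses.begin(), addresses.end());
    return static_cast<int>(distinct.size());
}
```

A warp in which every lane reads the same coefficient costs one request (broadcast), while a warp indexing the constant array by lane ID costs 32.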



# **Global Memory**

- Memory Coalescing:

Memory accesses from the same warp are coalesced into fewer memory-block accesses when they fall in the same block and meet the alignment criteria.

- HBM2 Memory

  - 2.5D design
  - better bandwidth, but slower
  - energy efficient
  - smaller form factor



Figure 9. Cross-section Illustrating GP100 adjacent HBM2 stacks
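Returning to memory coalescing: the effect of stride and alignment can be estimated by counting how many memory blocks one warp touches. The block size is left as a parameter because the slides do not state it; 128 bytes in the usage note is only an example value, and `blocks_touched` is an illustrative helper:

```cpp
#include <cassert>
#include <set>

constexpr int kWarpSize = 32;

// Number of distinct memory blocks touched when lane i loads the 4-byte word
// at byte address base_bytes + i * stride_bytes.
int blocks_touched(long base_bytes, long stride_bytes, long block_bytes) {
    assert(base_bytes >= 0 && stride_bytes >= 0 && block_bytes >= 4);
    // Keep everything 4-byte aligned so a word never straddles two blocks.
    assert(base_bytes % 4 == 0 && stride_bytes % 4 == 0 && block_bytes % 4 == 0);
    std::set<long> blocks;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        blocks.insert((base_bytes + lane * stride_bytes) / block_bytes);
    }
    return static_cast<int>(blocks.size());
}
```

With 128-byte blocks, an aligned unit-stride access (`blocks_touched(0, 4, 128)`) touches 1 block, the same access shifted by one word touches 2, and a 128-byte stride touches 32.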

# What is a Tensor Core?

#### 4x4x4 Warp Matrix Multiply and Accumulate (WMMA)



$$D = AB + C$$

#### **Tensor** Core

#### **Mixed-precision** Operation



$$D = AB + C$$
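A host-side sketch of the 4x4x4 multiply-accumulate, useful mainly for counting the work in one operation. On the hardware, A and B are FP16 and the accumulation can be FP32; plain `float` stands in for both here, so the sketch does not reproduce FP16 rounding. `Mat4`, `MmaResult`, and `mma_4x4x4` are illustrative names, not part of the CUDA WMMA API:

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>;

struct MmaResult {
    Mat4 d;         // D = A * B + C
    int fma_count;  // fused multiply-adds performed
};

// Computes D = A * B + C for 4x4 matrices, one multiply-add per (i, j, k).
MmaResult mma_4x4x4(const Mat4& a, const Mat4& b, const Mat4& c) {
    MmaResult r{c, 0};  // start the accumulators from C
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            for (int k = 0; k < 4; ++k) {
                r.d[i][j] += a[i][k] * b[k][j];
                ++r.fma_count;
            }
        }
    }
    return r;
}
```

The three nested loops of length 4 give 64 multiply-adds per operation.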



#### Power of Tensor Core

- 640 Tensor Cores on V100
- 64 FP FMA per core per cycle
- 125 Tensor TFLOPS for DL
- 12x throughput over Pascal
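These figures are mutually consistent, as a quick compile-time check shows. The sketch takes the 1530 MHz boost clock and the 10.6 TFLOPS P100 figure from the specification table earlier in the deck, and assumes the 12x claim compares V100 tensor throughput with P100 peak FP32:

```cpp
// Peak tensor throughput in TFLOPS; each FMA counts as 2 floating-point operations.
constexpr double tensor_peak_tflops(int tensor_cores, int fma_per_core_per_clk,
                                    double clock_mhz) {
    return tensor_cores * fma_per_core_per_clk * 2.0 * clock_mhz * 1e6 / 1e12;
}

constexpr double kV100TensorTflops = tensor_peak_tflops(640, 64, 1530.0);
constexpr double kP100Fp32Tflops = 10.6;  // P100 peak FP32, from the table
constexpr double kSpeedup = kV100TensorTflops / kP100Fp32Tflops;

static_assert(kV100TensorTflops > 125.0 && kV100TensorTflops < 126.0, "about 125 TFLOPS");
static_assert(kSpeedup > 11.5 && kSpeedup < 12.0, "roughly the quoted 12x");
```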



# Multi-Process Service (MPS)



## Independent Thread Scheduling

