T2 Single Processor Core (SPC)

The spc module dissected here comes from OpenSPARC T2 version 1.3 (OpenSPARCT2.1.3.tar.bz2). Its sources are in the subdirectory $DV_ROOT/design/sys/iop/spc, where $DV_ROOT is the directory into which you extracted OpenSPARCT2.1.3.tar.bz2.

This module can be synthesized on its own as a single core, or used as a building block for a multi-core design.

The spc module is used by both the single-core FPGA T2 design t2.v ($DV_ROOT/design/fpga/rtl/t2.v) and the multi-core design cpu.v ($DV_ROOT/design/sys/iop/cpu/rtl/cpu.v).

Each processor core has eight hardware threads (HMT), or strands. These eight strands are divided into two groups of four strands each.
Only one strand can run at a given time within a group. Thus an spc has at most two strands running concurrently at any given time.

The strands within a group share one integer execution pipeline.
That is, a single processor core (spc) has two integer execution pipelines.

An spc also has one floating-point execution pipeline ($DV_ROOT/design/sys/iop/spc/fgu/rtl/fgu.v) and one memory access pipeline ($DV_ROOT/design/sys/iop/spc/mmu/rtl/mmu.v).

Because each group of four strands shares one integer execution pipeline, at any given time the two active strands within the spc can either both be executing integer operations, or one can be executing an integer operation while the other executes a memory or floating-point operation.
Alternatively, if neither strand is executing an integer operation, one of them can be executing a floating-point operation while the other executes a memory operation.
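
The sharing rule described above can be written down as a small combinational check. The sketch below is illustrative only and is not taken from the OpenSPARC sources: the module name issue_conflict_sketch, the two-bit operation-class encoding, and the port names are all assumptions made for this example. It flags a conflict when the active strands of both groups want the floating-point pipeline, or both want the memory pipeline, in the same cycle. Two integer operations never conflict, because each group has its own integer pipeline.

```verilog
// Hypothetical sketch; names and encoding are assumptions, not OpenSPARC RTL.
module issue_conflict_sketch (
  input  wire [1:0] tg0_op,   // operation class of the active strand in group 0
  input  wire [1:0] tg1_op,   // operation class of the active strand in group 1
  output wire       conflict  // 1 when both groups need the same shared pipeline
);
  // Assumed encoding of the operation classes.
  localparam OP_INT  = 2'd0;  // integer: one pipeline per group, never shared
  localparam OP_FP   = 2'd1;  // floating point: one pipeline per core
  localparam OP_MEM  = 2'd2;  // memory: one pipeline per core
  localparam OP_NONE = 2'd3;  // group is idle this cycle

  wire both_fp  = (tg0_op == OP_FP)  && (tg1_op == OP_FP);
  wire both_mem = (tg0_op == OP_MEM) && (tg1_op == OP_MEM);

  assign conflict = both_fp | both_mem;
endmodule
```

With this encoding, the combinations integer/integer, integer/floating-point, integer/memory, and floating-point/memory all leave conflict at 0, which matches the cases listed above.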

The block diagram below depicts the relationships between the modules within a single core. The green boxes, EXU0 and EXU1, are the two groups, hosting four strands each.

TLU: Trap Logic Unit, updates the machine state and handles exceptions and interrupts.

IFU: Instruction Fetch Unit

$DV_ROOT/design/sys/iop/spc/ifu/              : Instruction Fetch related files.
$DV_ROOT/design/sys/iop/spc/dec/rtl/dec.v : Instruction decoder
$DV_ROOT/design/sys/iop/spc/exu/rtl/exu.v : Instruction Execution

Many of the module signals carry a number in their name instead of being indexed. For example, the ifu_ftu module ($DV_ROOT/design/sys/iop/spc/ifu/rtl/ifu_ftu.v) has the wires ftp_thr0_trprdpc_sel_bf through ftp_thr7_trprdpc_sel_bf, one per strand.
Each of these signals is itself a vector, for example:
wire [2:0] ftp_thr0_trprdpc_sel_bf;

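Verilog-2001 supports multi-dimensional arrays, but older versions of Verilog did not, so these signals are not declared as an indexed array the way we would do it in higher-level languages like C and Java. In Verilog-2001 the eight 3-bit wires could be declared as one array (Verilog-2001 still does not allow arrays as module ports, so this applies to internal wires only):
wire [2:0] FtpThrTrprdpcSelBf [0:7];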

That said, signals such as
wire ftp_thr0_sel_br_bf
wire ftp_thr7_sel_br_bf
are single-bit, so even in older Verilog they could have been declared as a single one-dimensional vector.

We should take up a project to clean up these variables.
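
As one possible shape for such a cleanup, the sketch below collects the eight per-strand select wires into a Verilog-2001 net array and indexes it by thread id. It is a hypothetical wrapper, not code from the OpenSPARC sources: the module name thr_sel_array_sketch, the tid input, and the sel_for_tid output are assumptions made for illustration. Only the ftp_thrN_trprdpc_sel_bf names come from ifu_ftu.v. The ports stay as individual vectors because Verilog-2001 does not allow arrays as module ports.

```verilog
// Hypothetical sketch; the wrapper, tid and sel_for_tid are assumptions.
module thr_sel_array_sketch (
  input  wire [2:0] ftp_thr0_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr1_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr2_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr3_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr4_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr5_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr6_trprdpc_sel_bf,
  input  wire [2:0] ftp_thr7_trprdpc_sel_bf,
  input  wire [2:0] tid,          // strand number, 0 to 7
  output wire [2:0] sel_for_tid   // select value of the chosen strand
);
  // One array of eight 3-bit nets replaces eight separately named wires.
  wire [2:0] thr_trprdpc_sel_bf [0:7];

  assign thr_trprdpc_sel_bf[0] = ftp_thr0_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[1] = ftp_thr1_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[2] = ftp_thr2_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[3] = ftp_thr3_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[4] = ftp_thr4_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[5] = ftp_thr5_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[6] = ftp_thr6_trprdpc_sel_bf;
  assign thr_trprdpc_sel_bf[7] = ftp_thr7_trprdpc_sel_bf;

  // Indexed access instead of an eight-way case over signal names.
  assign sel_for_tid = thr_trprdpc_sel_bf[tid];
endmodule
```

Inside a module, the array form lets per-strand logic be written once in a generate loop rather than copied eight times, which is the main payoff of the cleanup.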

Page 52 of OpenSPARC Internals:
Each OpenSPARC T2 physical core has a 16-Kbyte, 8-way associative instruction cache (32-byte lines), 8-Kbyte, 4-way associative data cache (16-byte lines), 64-entry fully-associative instruction TLB, and 128-entry fully associative data TLB that are shared by the eight strands. The eight OpenSPARC T2 physical cores are connected through a crossbar to an on-chip unified 4-Mbyte, 16-way associative L2 cache (64-byte lines).

EXU0, EXU1: The two execution units, each hosting four hardware threads (HMT), or strands. Each executes one strand at a time. That strand can be executing an integer operation, a floating-point operation, or a memory operation. But since the floating-point and memory units are shared by both thread groups, only one of the eight strands can be using each of them at a time.


TODO: Verify ALU logic trace.

ALU (Arithmetic and Logic Unit)

The ALU is part of the EXU. It handles all integer arithmetic operations other than multiply and divide, which are handled by the FGU.

The design principle applied consistently across the OpenSPARC Verilog RTL source code is that the files at the top levels define the inputs and outputs of an architecture block and connect the blocks below it. As you "peel the onion", you end up at the core building blocks, the logic cells, which are isolated in the files of the libs directory. The advantage of such a "Lego blocks" design is that the architecture and its technology implementation (synthesis) are kept separate. When moving from one synthesis technology to another, one only needs to optimize the cell library for the target silicon technology, without touching the architecture.

As an example, let us trace the Adder unit source code.

At line 435 of $DV_ROOT/design/sys/iop/spc/exu/rtl/exu.v, the module exu_edp_dp is instantiated as edp.
This module is defined in $DV_ROOT/design/sys/iop/spc/exu/rtl/exu_edp_dp.v.
At line 1634 of that file, the macro (module) exu_edp_dp_cla_macro__width_64 is instantiated as i_as_cla.

The module exu_edp_dp_cla_macro__width_64 is defined in the same file, at line 5935. It instantiates the cla RTL module at a width of 64 bits.

The cla module is defined at line 1936 of $DV_ROOT/libs/cl/cl_rtl_ext.v. Its body is a single assign statement:

assign {cout,out[SIZE-1:0]} = ({1'b0,in0[SIZE-1:0]} + {1'b0,in1[SIZE-1:0]} + {{SIZE{1'b0}},cin});
Thus, after traversing three modules, we reach the cla cell in the libs directory.
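
To see what that assign statement computes, it can be dropped into a small stand-alone module and exercised with a testbench. The sketch below is an assumption-laden illustration, not the library source: the module names cla_sketch and cla_sketch_tb, the ANSI-style port list, and the default SIZE of 64 are made up for this example. Only the assign expression and the port names it uses (in0, in1, cin, out, cout) come from the text above. Both operands are zero-extended by one bit so that the carry out of the top bit lands in cout.

```verilog
// Hypothetical stand-alone copy of the adder expression; not the library file.
module cla_sketch #(parameter SIZE = 64) (
  input  wire [SIZE-1:0] in0,
  input  wire [SIZE-1:0] in1,
  input  wire            cin,
  output wire [SIZE-1:0] out,
  output wire            cout
);
  // SIZE+1 bit addition: the extra top bit becomes the carry out.
  assign {cout,out[SIZE-1:0]} = ({1'b0,in0[SIZE-1:0]} + {1'b0,in1[SIZE-1:0]} + {{SIZE{1'b0}},cin});
endmodule

// Minimal self-checking testbench for the sketch above.
module cla_sketch_tb;
  reg  [63:0] a, b;
  reg         cin;
  wire [63:0] sum;
  wire        cout;

  cla_sketch #(.SIZE(64)) dut (.in0(a), .in1(b), .cin(cin), .out(sum), .cout(cout));

  initial begin
    // All ones plus carry in: the sum wraps to zero and cout is set.
    a = 64'hFFFF_FFFF_FFFF_FFFF; b = 64'd0; cin = 1'b1;
    #1;
    if (sum !== 64'd0 || cout !== 1'b1) $display("FAIL: wrap-around case");
    else                                $display("PASS: wrap-around case");

    // Small operands: no carry out.
    a = 64'd2; b = 64'd3; cin = 1'b0;
    #1;
    if (sum !== 64'd5 || cout !== 1'b0) $display("FAIL: small operands");
    else                                $display("PASS: small operands");

    $finish;
  end
endmodule
```

Because the behavior is expressed as a plain + operator, the choice of adder structure is left to the synthesis tool or to whichever cell library replaces this module.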

For FPGA synthesis, however, these cells are replaced by the FPGA vendor's optimized library.

FGU: Floating-point and Graphics Unit.


MMU: Memory Management Unit and Hardware Table Walk (HWTW).
LSU: Load Store Unit, bridges the L1 and L2 cache.

Gasket: arbitrates between the cores and the cache crossbar switch (ccx).