VLIW Data Path

The data path: how storage and functional units are organized inside the processor to hold and move data.

CISC and DSP
-Memory-to-memory operations and complex addressing modes
-Accumulator: a dedicated target register of the ALU (sketch below)
-Special-purpose storage forces the compiler to make binding choices and optimizations too early
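
As a point of reference (my sketch, not from the slides), a C model of an accumulator-style multiply-accumulate loop: every product is folded into a single accumulator variable, the way a DSP whose MAC unit always targets one accumulator register operates. The function name and types are illustrative.

#include <stdint.h>

/* Illustrative only: accumulator-style MAC loop.
 * All results funnel through the single accumulator `acc`,
 * which is what ties the compiler's hands early on a real DSP. */
int64_t mac_accumulate(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;                  /* the accumulator */
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];  /* 16x16 -> 32-bit product, wide accumulate */
    return acc;
}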

RISC and VLIW
-Register-to-register operations: large register files
-Decoupling of scheduling and register allocation

Processor: 16×16-bit, 32×32-bit (e.g., ARM7, ARM9E, ARMv5TE)

VLIW machine: e.g., one with eight independent 32-bit datapaths

In the embedded world, characteristics of the application domain are also very important:
Simple integer and compare operations
Carry, overflow, and other flags
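
A hedged C sketch (mine, not from the slides) of what the carry and overflow flags mean, recomputed in software; a hardware ALU produces both as a free side effect of the add.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative only: recomputing the carry and overflow flags
 * that a hardware ALU sets for free on a 32-bit add. */
void add_with_flags(uint32_t a, uint32_t b,
                    uint32_t *sum, bool *carry, bool *overflow)
{
    *sum      = a + b;                             /* wraps modulo 2^32 */
    *carry    = (*sum < a);                        /* unsigned carry-out */
    *overflow = ((~(a ^ b) & (a ^ *sum)) >> 31);   /* operands agree in sign, result does not */
}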

Fixed Point Multiplication
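
A minimal sketch (my example, assuming Q15 format) of what a fixed-point multiply does: the 16×16 product lands in Q30, is rounded, and is shifted back to Q15, which is the operation a DSP multiplier performs in one cycle.

#include <stdint.h>

/* Illustrative Q15 multiply: product is Q30, round, shift back to Q15.
 * The -1 * -1 corner case overflows; real DSPs saturate it. */
static inline int16_t q15_mul(int16_t x, int16_t y)
{
    int32_t p = (int32_t)x * y;   /* Q30 product */
    p += 1 << 14;                 /* round to nearest */
    return (int16_t)(p >> 15);    /* back to Q15 */
}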

Clusters with a partial interconnect promise better scalability.
The bypass network of a clustered VLIW is also partitioned.

Index Register Files
(register-file sections: illegal, outputs, locals, static)
-The compiler can explicitly allocate a variable-sized section of the register file
-Used in procedure call and return for stack management (sketch after this list)
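
A hypothetical software model (names and sizes are mine) of how an indexed register file supports call and return: a frame index into the physical register array moves on call and is restored on return, so each procedure gets a variable-sized window of registers without spilling to memory.

#include <stdint.h>

#define NUM_PHYS_REGS 128

static int32_t regfile[NUM_PHYS_REGS];
static int     frame_base = 0;        /* index register managed like a stack pointer */

int  frame_alloc(int nregs)           /* on procedure call: claim nregs registers */
{
    int old_base = frame_base;
    frame_base += nregs;              /* a real machine would trap or spill on overflow */
    return old_base;                  /* caller's base, saved for the return */
}

void frame_free(int old_base)         /* on procedure return */
{
    frame_base = old_base;
}

int32_t reg_read(int r)               /* register r of the current frame */
{
    return regfile[frame_base + r];
}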

A VLIW instruction consists of primitive instructions that the compiler has packed together.
During each instruction cycle, one VLIW word is fetched from the cache and decoded.
Its primitive instructions are then issued to the functional units and executed in parallel.
Since the primitive instructions come from the same VLIW word, they are guaranteed to be independent.
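
To make the fetch/issue step concrete, here is an illustrative C model of a VLIW word; the eight-slot width follows the eight-datapath example above, and dispatch_to_unit is a made-up stand-in for the functional-unit interface, not a real API.

#include <stdint.h>

#define SLOTS 8                       /* one slot per independent datapath */

typedef struct {
    uint32_t op[SLOTS];               /* one primitive instruction per slot */
} vliw_word_t;

static void dispatch_to_unit(int unit, uint32_t op)   /* stand-in for FU issue */
{
    (void)unit; (void)op;
}

void issue(const vliw_word_t *w)
{
    /* All slots came from one word, so the compiler has already guaranteed
       independence; hardware can start them all in the same cycle. */
    for (int s = 0; s < SLOTS; s++)
        dispatch_to_unit(s, w->op[s]);
}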

VLIW is a load/store architecture
– mostly reuses RISC memory-addressing concepts
– special addressing modes (sketch after this list)
– registers: data and address
– memory can be banked for power efficiency
– X-Y memory
– GPUs: constant memory, shader memory, shared memory, local memory
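
As an example of a special addressing mode (my sketch, not from the slides): modulo (circular) addressing, where the address register wraps inside a power-of-two buffer; DSP address generators perform this wrap in hardware at no cost, which is what makes delay lines and circular buffers cheap.

#include <stdint.h>

#define BUF_LEN 64u                          /* assumed power of two */

static int16_t  delay_line[BUF_LEN];
static unsigned head;                        /* address register */

void push_sample(int16_t s)
{
    delay_line[head] = s;
    head = (head + 1u) & (BUF_LEN - 1u);     /* modulo post-increment */
}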