Intel iWarp VLIW

Intel iWarp is a late-1980s VLIW computer architecture that couples a 32-bit RISC processor core with long instruction words and strong on-chip communication support, so that processors can be meshed into parallel computers with a peak of more than 20 GFLOPS.

Intel iWarp
iWarp © Intel and CMU 1995

iWarp was jointly developed by Intel and Carnegie Mellon University (CMU) in the late 1980s, with support from DARPA (DoD). It was based on the earlier CMU Warp, a programmable systolic array developed at Carnegie Mellon and produced by General Electric.

With iWarp processors, nodes could be closely coupled into larger systems for general-purpose and parallel computing, aimed at signal processing with potential military uses. Intel shipped the first iWarp computers (64 cells) to CMU in 1991 and sold iWarp computers until the mid-1990s, after iWarp was integrated into the Supercomputing Systems Division at Intel.

Intel iWarp processors are not to be confused with the 21st-century iWARP networking technology.

Processor   Clock    Cache   Width   Units
iWarp       20 MHz   1 KB    3-way   3

Instruction set

The Intel iWarp architecture uses a 32-bit RISC processor core with a 96-bit long instruction word (LIW) decoder for parallel processing. Instructions are either 32 or 96 bits long.

The long instruction word of iWarp processors is 96 bits wide; the data width is 32 bits. iWarp can issue three instructions per clock, combined into a single LIW, to its three instruction units.

iWarp supports 8- to 32-bit integers and 32- and 64-bit IEEE 754 floating point.

Functional units

Each Intel iWarp processor (component) contains a Computation Agent and a Communication Agent, which are independent of each other.

Functional units in the Computation Agent are:

iWarp uses a 128-word Register File (RF) with 118 general-purpose and ten special registers.

Cache and memory

Intel iWarp has an on-chip program store consisting of a 1 KB (256-word) instruction cache and a 2 KB ROM for program functions (the ROM is not really a cache).

A single-processor iWarp cell (node) supports up to 64 MB of physical memory over a 64-bit data bus and a 24-bit (23-bit?) address bus, with a RAM data rate of up to 160 MB/s. Memory was implemented in 20 MHz SRAMs on the processor board (node).
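
The memory figures can be cross-checked with some simple arithmetic. The sketch below is only illustrative and rests on two assumptions that the text does not confirm: that the bus completes one full 64-bit transfer per 20 MHz clock, and that the address bus selects 64-bit words rather than bytes. The function name `addressable_mib` is made up for this example.

```python
# Figures quoted in the text.
BUS_WIDTH_BITS = 64
CLOCK_HZ = 20_000_000

# Assumption: one full-width bus transfer per clock cycle.
bytes_per_transfer = BUS_WIDTH_BITS // 8
peak_mb_per_s = bytes_per_transfer * CLOCK_HZ / 1_000_000


def addressable_mib(address_bits: int, word_bytes: int = bytes_per_transfer) -> int:
    """Memory reachable with `address_bits` of address.

    Assumption: each address selects one 64-bit word, not one byte.
    """
    return (2 ** address_bits) * word_bytes // (1024 * 1024)


print(peak_mb_per_s)        # 160.0
print(addressable_mib(23))  # 64
print(addressable_mib(24))  # 128
```

Under these assumptions the 160 MB/s rate follows directly from the bus width and clock, and a 23-bit word address would correspond to the 64 MB limit while a 24-bit one would reach 128 MB. This does not settle which address width is correct; it only shows what each would imply.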

Integration and peripherals

iWarp integrates communications onto the processor in the Communication Agent:

Communications from iWarp computers to the outside are conducted via the Sun interface board, a VME board that connects a Sun workstation for control and I/O.

Devices and I/O are implemented on the local memory bus using memory-mapped I/O.

Physical

Intel iWarp has a clock speed of 20 MHz. The die (14 mm²) has 600,000 FETs, is manufactured in a 0.9 µm CMOS process, and is packaged in a 271-pin PGA. The manufacturing process used for iWarp at Intel was the same as for the Intel i386 microprocessor.

Chips were expected to be military-specified, as the i386 process was approved for the NASA space station Freedom and was fairly radiation-hard. iWarp processor cards measure 9×11″, usually with four processors and 6 MB of SRAM memory. Memory could be extended with a (planned?) daughter board that held extra RAM and connected to the processor card.

Processor boards are mounted vertically in cardcages, each holding 16 boards for up to 64 processors and connected to a backplane. Up to four cardcages are in turn mounted in an all-metal container with up to 5 kW power draw. The containers were similar to those of the Intel iPSC but black in color. The largest iWarp systems designed are four-container systems with up to 1024 processors, limited by clock distribution.
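
The packaging hierarchy multiplies out to the processor counts quoted above. The following sketch just restates that arithmetic; the constant and function names are invented for illustration, and it assumes every board is fully populated with four processors, which the text describes only as the usual case.

```python
# Packaging figures taken from the text; names are illustrative only.
PROCESSORS_PER_BOARD = 4      # "usually four processors" per card
BOARDS_PER_CARDCAGE = 16
CARDCAGES_PER_CONTAINER = 4
MAX_CONTAINERS = 4


def processor_count(containers: int = 1,
                    cardcages: int = CARDCAGES_PER_CONTAINER,
                    boards: int = BOARDS_PER_CARDCAGE) -> int:
    """Processors in a system, assuming fully populated boards."""
    return containers * cardcages * boards * PROCESSORS_PER_BOARD


print(processor_count(containers=1, cardcages=1))  # one cardcage: 64
print(processor_count(containers=1))               # one container: 256
print(processor_count(containers=MAX_CONTAINERS))  # largest system: 1024
```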

Future iWarp developments foresaw iWarp 1.5 processors with up to 80 MHz clock speed in a 3-layer CMOS process, better FP functionality, and bigger caches and buses. A planned iWarp 2 was to improve communications (unified), node architecture, locally shared memory and 3D packaging.

Used in

Intel iWarp processors were used in node computers as part of larger parallel multi-processor systems of the early 1990s, often in academia, research and (possibly) the military. The strengths of iWarp at the time were its high-speed static memory and its high-performance, low-latency communication, useful for research and for applications like real-time vision. Intel sold iWarp computers privately until the mid-1990s, after the merger with the Intel supercomputing division.

iWarp torus
Torus and hexagonal © 1988 Intel

Multiple iWarp configurations were available from Intel:

The main selling point of Intel iWarp was the possibility of combining processors (single nodes) into larger multi-processor systems, arranged as general-purpose or special-purpose arrays depending on the use case.
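
To make the torus arrangement concrete, here is a minimal sketch of neighbor addressing in a two-dimensional torus, where the edges of the array wrap around. It is a generic illustration, not iWarp system software: the function name `torus_neighbors`, the coordinate scheme, and the assumption of four neighbors per cell on an 8×8 array (64 cells) are all choices made for this example.

```python
def torus_neighbors(x: int, y: int, width: int, height: int) -> dict:
    """Return the four neighbors of cell (x, y) in a width x height torus.

    The modulo operation makes the array wrap around at its edges,
    which is what distinguishes a torus from a plain mesh.
    """
    return {
        "west": ((x - 1) % width, y),
        "east": ((x + 1) % width, y),
        "north": (x, (y - 1) % height),
        "south": (x, (y + 1) % height),
    }


# Corner cell of a hypothetical 8x8 array: its west and north
# neighbors wrap around to the opposite edges.
print(torus_neighbors(0, 0, 8, 8))
```

In a plain mesh the corner cell would have only two neighbors; with wraparound every cell has the same number of links, which keeps communication patterns uniform across the array.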

Intel iWarp computers used a narrow set of software and development components: RTS (Run Time System) with a small operating system kernel, Pathlib for systolic communication, C and assembler, Fortran 77, and Apply.

Benchmarks

Processor              Configuration          Speed     MIPS/FLOPS       SPEC92 int/fp
Intel iWarp            Node (single)          20 MHz    20/20
Intel iWarp            QCB (4)                20 MHz    80/80
Intel iWarp            CCA (64)               20 MHz    1,280/1,280
Intel iWarp            Cabinet (256)          20 MHz    5,120/5,120
Intel iWarp            Multi-Cabinet (1024)   20 MHz    20,480/20,480
Comparisons
Apollo PRISM DN10000                          18 MHz    22
Sun MAJC-5200                                 500 MHz   13,000/6,160
Philips TM-1100                               133 MHz   5,000
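
The iWarp rows of the table scale linearly with the number of processors. As a rough check, the sketch below multiplies the single-node figure by each configuration's processor count. It assumes the values are peak MIPS/MFLOPS and that no performance is lost to communication, which real workloads would not achieve; the variable names are illustrative.

```python
NODE_PEAK = 20  # MIPS and MFLOPS per node, from the first table row

# Processor counts per configuration, as listed in the table.
configurations = {
    "Node": 1,
    "QCB": 4,
    "CCA": 64,
    "Cabinet": 256,
    "Multi-Cabinet": 1024,
}

# Ideal linear scaling: peak = nodes x per-node peak.
peak_mflops = {name: nodes * NODE_PEAK for name, nodes in configurations.items()}

for name, mflops in peak_mflops.items():
    print(f"{name:14s} {mflops:>6,d} MFLOPS ({mflops / 1000:.2f} GFLOPS)")
```

The largest configuration comes out at 20,480 MFLOPS, or about 20.5 GFLOPS, which is where the "more than 20 GFLOPS" figure for iWarp comes from.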
