Intel iWarp VLIW
Intel iWarp is a late-1980s VLIW computer architecture that coupled a 32-bit RISC processor core with long instructions and powerful communication support to enable meshed computers for parallel computing at more than 20 GFLOPS.
iWarp was jointly developed by Intel and Carnegie Mellon University, supported by DARPA (DoD) in the late 1980s. It was based on the earlier CMU Warp, a programmable systolic array developed at Carnegie Mellon and produced by General Electric.
With iWarp processors, nodes could be closely coupled into larger systems for general-purpose and parallel computing, aimed at signal processing with potential military uses. Intel shipped the first iWarp computers (64 cells) to CMU in 1991 and sold iWarp computers until the mid-1990s, after the project was integrated into Intel's Supercomputing Systems Division.
Intel iWarp processors are not to be confused with the 21st century Intel iWarp networking technologies.
Processor | Clock | Cache | Width | Units |
---|---|---|---|---|
iWarp | 20 MHz | 1 KB | 3-way | 3 |
Instruction set
The Intel iWarp architecture uses a 32-bit RISC processor core with a 96-bit long instruction word (LIW) decoder for parallel processing. Instructions are either 32 or 96 bits long.
The long instruction width of iWarp processors is 96 bits; the data width is 32 bits. iWarp can issue three instructions per clock, integrated into a single LIW, to its three instruction units.
iWarp supports 8- to 32-bit integers and 32- to 64-bit IEEE 754 floating point.
Functional units
Each Intel iWarp processor (component) contains a Computation Agent and a Communication Agent, which are independent of each other.
Functional units in the Computation Agent are:
- Integer/logical unit (ALU): Arithmetical, logical and bit operations
- Floating point adder (FPADD)
- Floating point multiplier (FPMUL): Multiplication, division, remainder, square root
- Internal data storage and interconnect: Register file and agent access
- Memory units: Interface to off-chip memory and on-chip cache
iWarp uses a 128-word Register File (RF) with 118 general-purpose and ten special registers.
Cache and memory
Intel iWarp has an on-chip program store of 1 KB (256 words) instruction cache and 2 KB ROM for program functions (not really a cache).
Up to 64 MB of physical memory is supported on a single-processor iWarp cell (node), with a 64-bit data bus and a 24-bit (23-bit?) address bus, at up to 160 MB/s RAM data rate. Memory was implemented in 20 MHz SRAMs on the processor board (node).
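The memory figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a statement about the actual bus protocol: it assumes one full-width transfer per 20 MHz clock and assumes that each address selects one 64-bit word. The function names are hypothetical and only serve the illustration.

```python
# Back-of-envelope check of the memory figures quoted in the text.
BUS_WIDTH_BITS = 64          # data bus width, from the text
BUS_CLOCK_HZ = 20_000_000    # 20 MHz SRAM clock, from the text


def peak_bandwidth_mb_s(width_bits: int, clock_hz: int) -> float:
    """Peak rate in MB/s (1 MB = 10**6 bytes), assuming one transfer per clock."""
    return width_bits // 8 * clock_hz / 1_000_000


def addressable_mb(address_bits: int, word_bytes: int) -> float:
    """Reachable memory in MB (1 MB = 2**20 bytes), assuming word addressing."""
    return (2 ** address_bits) * word_bytes / (1024 * 1024)


print(peak_bandwidth_mb_s(BUS_WIDTH_BITS, BUS_CLOCK_HZ))  # 160.0
print(addressable_mb(23, 8))  # 64.0
print(addressable_mb(24, 8))  # 128.0
```

Under these assumptions the 160 MB/s figure follows directly from the bus width and clock, and a 23-bit word address would reach exactly 64 MB. This does not settle which address width the chip actually used; it only shows which combination is arithmetically consistent with the 64 MB limit.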
Integration and peripherals
iWarp integrates communications onto the processor in the Communication Agent:
- Four input and four output ports (bidirectional pathways): 8-bit buses that link P2P to other iWarp processors
- Multiplexed logical on physical buses: Control up to 20 pathway buses
- Pathway unit: routing for 1D/2D and schemes
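To illustrate how four bidirectional pathways per cell map onto a 2D arrangement, here is a minimal sketch that lists the four neighbours of a cell in an 8x8 torus, the layout this article gives for a typical 64-processor system. The coordinate numbering, the port names (north, south, west, east) and the function name are assumptions made for the example; they are not taken from iWarp documentation.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (row, column) position of a cell in the array


def torus_neighbours(row: int, col: int, rows: int = 8, cols: int = 8) -> Dict[str, Cell]:
    """Return the four neighbours of a cell in a rows x cols 2D torus.

    Edges wrap around, so every cell has exactly four neighbours,
    one per (hypothetically named) port.
    """
    return {
        "north": ((row - 1) % rows, col),
        "south": ((row + 1) % rows, col),
        "west": (row, (col - 1) % cols),
        "east": (row, (col + 1) % cols),
    }


# The corner cell (0, 0) wraps around to row 7 and column 7.
print(torus_neighbours(0, 0))
# {'north': (7, 0), 'south': (1, 0), 'west': (0, 7), 'east': (0, 1)}
```

The wrap-around is what distinguishes a torus from a plain mesh: no cell sits on a boundary, so all four ports of every cell are in use.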
Communications from iWarp computers to the outside are conducted via the Sun interface board, a VME board that connects a Sun workstation for control and I/O.
Devices and I/O are implemented on the local memory bus using memory-mapped I/O.
Physical
Intel iWarp has a clock speed of 20 MHz. The die (14 mm²) has 600,000 FETs, is manufactured in a 0.9 µm CMOS process, and is packaged in a 271-pin PGA. The manufacturing process used for iWarp at Intel was the same as for the Intel i386 microprocessor.
Chips were expected to be military-specified, as the i386 process was approved for the NASA space station (Freedom) and was fairly radiation-hard.
iWarp processor cards are 9×11″, usually with four processors and 6 MB of memory in SRAMs.
Memory could be extended with a (planned?) daughter board that could hold extra RAM and connect to the processor card.
Processor boards are mounted vertically in cardcages with 16 boards for up to 64 processors, connected to a backplane.
Up to four cardcages are in turn mounted into the all-metal container, with up to 5 kW power draw.
Containers were similar to Intel iPSC but black in color.
The largest iWarp systems designed are four-container systems with up to 1024 processors, limited by clock distribution.
Future iWarp developments foresaw iWarp 1.5 processors with up to 80 MHz clock speed in a 3-layer CMOS process, better FP functionality, and bigger caches and buses. A planned iWarp 2 was to improve communications (unified), node architecture, locally shared memory and 3D packaging.
Used in
Intel iWarp processors were used in node computers as part of larger, parallel multi-processor systems of the early 1990s, often used in academia, research and (possibly) the military.
Strengths of iWarp at the time were high-speed static memory and high-performance, low-latency communication, useful for research and applications like real-time vision.
Intel sold iWarp computers privately until the mid-90s, after the merger with the Intel supercomputing division.
Multiple iWarp configurations were available from Intel:
- Quad Cell Board (QCB) with four iWarp processors
- Card Cage Assembly (CCA) with 16 QCBs = 64 iWarp processors (8x8 torus), a typical iWarp system
- System Cabinet with four CCAs = 256 nodes
- Multi-Cabinet with four System Cabinets = 1024 nodes
- Variations of these in other form-factors and interfaces
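The node counts in this hierarchy, and the aggregate peak figures in the Benchmarks table, follow from simple multiplication. The sketch below assumes 16 QCBs per card cage (as in the Physical section: 16 boards for up to 64 processors) and reads the per-node figure of 20 in the Benchmarks table as MFLOPS, which matches the "> 20 GFLOPS" claim for the largest system. The constant and function names are hypothetical.

```python
CELLS_PER_QCB = 4          # four iWarp processors per Quad Cell Board
QCBS_PER_CCA = 16          # assumption: 16 boards per card cage (Physical section)
CCAS_PER_CABINET = 4       # four card cages per System Cabinet
CABINETS_PER_SYSTEM = 4    # four cabinets in the largest system
PEAK_MFLOPS_PER_CELL = 20  # assumption: per-node table entry read as MFLOPS


def configurations() -> dict:
    """Return the number of cells in each configuration level."""
    qcb = CELLS_PER_QCB
    cca = qcb * QCBS_PER_CCA
    cabinet = cca * CCAS_PER_CABINET
    multi = cabinet * CABINETS_PER_SYSTEM
    return {
        "QCB": qcb,
        "CCA": cca,
        "System Cabinet": cabinet,
        "Multi-Cabinet": multi,
    }


for name, cells in configurations().items():
    print(f"{name}: {cells} cells, {cells * PEAK_MFLOPS_PER_CELL} MFLOPS peak")
# QCB: 4 cells, 80 MFLOPS peak
# CCA: 64 cells, 1280 MFLOPS peak
# System Cabinet: 256 cells, 5120 MFLOPS peak
# Multi-Cabinet: 1024 cells, 20480 MFLOPS peak
```

These are theoretical peaks obtained by linear scaling; they say nothing about sustained performance on real workloads.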
The main selling point of Intel iWarp was the possibility of combining processors, single nodes, into larger multi-processor systems in general-purpose or special-purpose arrays, depending on use cases.
Intel iWarp computers used a narrow set of software and development components: RTS (Run Time System) with a small operating system kernel, Pathlib for systolic communication, C and assembler, Fortran 77, Apply.
Benchmarks
Processor | Configuration | Speed | MIPS/MFLOPS | SPEC92 int/fp |
---|---|---|---|---|
Intel iWarp | Node (single) | 20 MHz | 20/20 | |
Intel iWarp | QCB (4) | 20 MHz | 80/80 | |
Intel iWarp | CCA (64) | 20 MHz | 1,280/1,280 | |
Intel iWarp | Cabinet (256) | 20 MHz | 5,120/5,120 | |
Intel iWarp | Multi-Cabinet (1024) | 20 MHz | 20,480/20,480 | |
Comparisons | | | | |
Apollo PRISM | DN10000 | 18 MHz | 22 | |
Sun MAJC-5200 | | 500 MHz | 13,000/6,160 | |
Philips TM-1100 | | 133 MHz | 5,000 | |
Documentation
Product documentation
- iWarp: An Integrated Solution to High-Speed Parallel Computing, Carnegie Mellon University and Intel Corporation (1988)
- Summary of iWARP Forum, Bradley C. Kuszmaul (1989) [this is great reading]
Articles
- iWarp Project, CMU School of Computer Science (1998)
- Papers related to the CMU iWarp project, CMU School of Computer Science (1995!)
- iWarp Architecture Overview (LONG), Jim Sutton (1991, comp.parallel) google groups
Papers
- 100 MOP LIW Microprocessor for Multicomputers, Intel Corporation (HOT CHIPS 1990) archive.org
- iWARP - Anatomy of a Parallel Computing System, MIT Press (1998) archive.org
No easy reading material, because it requires previous knowledge of instruction-level parallelism [...] contains a wealth of interesting information [...] the world of parallel computers