Convex Exemplar SPP1000, SPP1200 & SPP1600
|CPU|2-16 (CD), 2-8 (XA)|
|Caches|512 KB-2 MB L1|
|RAM|4 GB (CD), 2 GB (XA)|
|Expansion|16 SBus (CD), 8 SBus (XA)|
|Bandwidth|CPU/Mem 1 GB/s, I/O 250 MB/s, XBAR 1.25 GB/s, SCI 2.4 GB/s|
|Interconnect|SCI/CTI links (XA)|
The Convex Exemplar SPP1x00 are scalable 32-bit mainframe/technical computing systems with either PA-7100 (SPP1000) or PA-7200 (SPP1200 and SPP1600) processors. Previous Convex designs used custom Convex processors; with the SPP line, Convex switched to a third-party processor, the HP PA-RISC. This probably coincided with the close collaboration between Convex and HP starting in the early 1990s, which resulted in the jointly marketed HP/Convex Exemplar SPP2000 (the direct 64-bit successor of the SPP1x00s with a slightly modified architecture) and the takeover of Convex by HP in 1995, which resulted in the HP-branded V-Class servers (the 64-bit non-clusterable HP 9000/V2200 and V2250 and the up to four-way clusterable HP 9000/V2500 and V2600).
The 32-bit Convex SPP1x00 systems come in three distinct system configurations: the CD compact systems, the XA eXtended Architecture hypernodes and the XA clusters:
- SPP1000/CD, SPP1200/CD, SPP1600/CD: single compact systems — the documentation is not entirely clear; these could either be special systems with up to sixteen processors or — more likely — two SPP XA hypernodes coupled together and sold as a single, non-clusterable system.
- SPP1000/XA, SPP1200/XA, SPP1600/XA Hypernode: a single Hypernode/eXtended Architecture (XA) SPP system with up to eight processors and provisions for linking to other systems via SCI.
- SPP1000/XA, SPP1200/XA, SPP1600/XA Cluster: a multiple-Hypernode eXtended Architecture (XA) SPP system, with up to sixteen SPP1x00/XA Hypernodes coupled via SCI/CTI interconnection rings; these XA clusters can have up to 128 processors in their maximum configuration. The resulting interconnected Exemplars are then ccNUMA computers, i.e. cache-coherent Non-Uniform Memory Access machines (for a detailed explanation cf. for example the ccNUMA section on the Wikipedia Non-Uniform Memory Access page).
The internal Exemplar architecture is based on a 5x5 crossbar, with the central switching component (the crossbar) connecting the resources to each other by forming matrix connections between the devices’ input and output ports (5x5 because the crossbar has five ports for processors, memory and I/O).
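As a rough illustration of the matrix-connection idea — a hypothetical sketch, not Convex’s implementation — a nonblocking 5x5 crossbar can be modeled as any set of simultaneous one-to-one connections between input and output ports:

```python
# Hypothetical sketch of a 5x5 nonblocking crossbar: any set of
# input->output connections can be switched simultaneously, as long
# as no input port and no output port is used twice.
class Crossbar:
    def __init__(self, ports=5):
        self.ports = ports

    def connect(self, pairs):
        """pairs: list of (input_port, output_port) tuples to switch at once."""
        ins = [i for i, _ in pairs]
        outs = [o for _, o in pairs]
        if len(set(ins)) != len(ins) or len(set(outs)) != len(outs):
            raise ValueError("port conflict: crossbar ports are exclusive")
        return {i: o for i, o in pairs}

# Ports 0-3: functional units (CPU agents/memory), port 4: system I/O.
xbar = Crossbar()
# Four memory transfers plus an I/O transfer can proceed in parallel:
routes = xbar.connect([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
```

Trying to connect the same output port twice would raise the conflict error — that exclusivity, with all non-conflicting routes switched in parallel, is what makes the crossbar nonblocking.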
The nodes and clusters are controlled and booted via a separate workstation connected to them, frequently an IBM RS/6000 computer running AIX, which handled the Exemplar’s console and control I/O (in the case of a cluster, only one node had a control workstation). Apparently HP 9000/715 workstations were also used as teststations.
Introduced: 1994 (SPP1000), 1995 (SPP1200), 1996 (SPP1600) for $145,000-$750,000 (SPP1000/CD), $550,000-$8 million (SPP1000/XA), $160,000 (two-CPU SPP1200/CD) and $586,000 (eight-CPU SPP1200/XA).
- SPP1000/CD: 2-16 PA-7100 100 MHz with 1/1 MB off-chip I/D L1 cache each
- SPP1000/XA Hypernode: 2-8 PA-7100 100 MHz with 1/1 MB off-chip I/D L1 cache each
- SPP1000/XA Cluster: 8-128 PA-7100 100 MHz with 1/1 MB off-chip I/D L1 cache each
- SPP1200/CD: 2-16 PA-7200 120 MHz with 256/256 KB off-chip I/D L1 cache each
- SPP1200/XA Hypernode: 2-8 PA-7200 120 MHz with 256/256 KB off-chip I/D L1 cache each
- SPP1200/XA Cluster: 8-128 PA-7200 120 MHz with 256/256 KB off-chip I/D L1 cache each
- SPP1600/CD: 2-16 PA-7200 120 MHz with 1/1 MB off-chip I/D L1 cache each
- SPP1600/XA Hypernode: 2-8 PA-7200 120 MHz with 1/1 MB off-chip I/D L1 cache each
- SPP1600/XA Cluster: 8-128 PA-7200 120 MHz with 1/1 MB off-chip I/D L1 cache each
It is not quite clear how the CD models relate to the XA models — the XA clusters consist of several 2-8 processor hypernodes, while the CD models were shipped with up to 16 processors. Either the CDs are different machines from the XA hypernodes, or they are simply two XA hypernodes coupled together, without any additional SCI/CTI expansion possibilities.
The chipset is entirely Convex’s own design and centers around the Convex five-port crossbar, later improved for the SPP2000 with eight ports and used in HP’s V-Class.
- The 5x5 nonblocking crossbar, with five crossbar ports, is the central part of the system. It connects to four functional units (memory, SCI links and processors) and with the fifth port to the local system I/O. Each of the four functional units contains a memory controller, an SCI controller and an agent for two processors. Memory and processors use different data links to the crossbar — memory access always goes over the crossbar, even from a processor to the memory in the same functional unit. Each crossbar port has a data rate of 250 MB/s, giving the crossbar a combined peak bandwidth of 1.25 GB/s. The crossbar is implemented in gallium arsenide (GaAs) gate arrays (250K transistors), quite a rarity, since GaAs was very expensive and difficult to handle.
- Four CPU Agents attach to the crossbar and provide access for the processors to the memory via the crossbar over a 250 MB/s crossbar port shared with the memory controller (see below).
- Four Convex Coherent Memory Controllers (CCMCs) each attach one four-way interleaved memory board to the crossbar. The CCMCs additionally handle cache coherency and interface to Convex’s SCI (CTI) link for inter-hypernode connection. [It is not quite clear whether the CCMCs share the whole 250 MB/s port/data connections with the CPU agents on the same functional unit, or whether CCMC and CPU agent attach to separate lines of the crossbar port —Ed.] The CTI interface — or the complete CCMC — was apparently also implemented in GaAs chips.
- The Exemplar I/O (Input/Output) Subsystem connects to the fifth 250 MB/s crossbar port and attaches the I/O subsystem controllers to the crossbar and thus to memory and processors.
» View a system-level ASCII illustration of the crossbar architecture.
- Total crossbar bandwidth 1.25 GB/s (five 250 MB/s ports)
- CPU/Memory bandwidth 1.0 GB/s (four 250 MB/s ports shared with memory)
- I/O bandwidth 250 MB/s (one crossbar port)
- SPP1000: Four SBus I/O buses for expansion slots
- SPP1200/SPP1600: Eight SBus I/O buses for expansion slots
- Attachment to SCI rings: interconnection via four one-dimensional rings with an aggregate bandwidth of 2.4 GB/s (each ring has a data rate of 600 MB/s, with a clock of 150 MHz [both edges] and a width of 16 bits)
- SCSI-2 storage I/O bus
- Two to eight memory boards per node
- Memory is up to eight-way interleaved per node
- XA single nodes: up to 2 GB of memory (512 MB per memory board)
- CD nodes: up to 4 GB of memory
- XA single nodes: 8 SBus slots
- CD nodes: 16 SBus slots
- (This is apparently really the same SBus as the one used by Sun in their earlier workstations — IEEE 1496)
- 20 internal SCSI drives
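The bandwidth figures quoted above are internally consistent; a quick arithmetic check (no assumptions beyond the numbers in the text):

```python
# Crossbar: five ports at 250 MB/s each.
port_bw = 250                 # MB/s per crossbar port
xbar_bw = 5 * port_bw         # total crossbar bandwidth
cpu_mem_bw = 4 * port_bw      # the four functional-unit ports
io_bw = 1 * port_bw           # the single I/O port

assert xbar_bw == 1250        # 1.25 GB/s
assert cpu_mem_bw == 1000     # 1.0 GB/s
assert io_bw == 250           # 250 MB/s

# CTI/SCI rings: 16 bits wide, 150 MHz clock, data on both edges.
ring_bw = (16 // 8) * 150 * 2  # bytes/transfer * MHz * DDR = 600 MB/s
sci_bw = 4 * ring_bw           # four one-dimensional rings
assert ring_bw == 600
assert sci_bw == 2400          # 2.4 GB/s
```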
Multiple SPP1x00/XA systems can be connected together to form a single large system.
- Up to sixteen SPP1000/SPP1200/SPP1600 (XA models) can be clustered together to form a system with up to
- 128 processors
- 32 GB of RAM
- 64 SBus slots
- 320 SCSI drives
- Clustered SPP Exemplars are ccNUMA computers.
- Multiple systems (nodes) are connected via four CTI rings: each uni-directional ring attaches to the same CCMC memory controller on different nodes (all nodes attach with their first CCMC to the first ring, with their second CCMC to the second ring, and so on).
- The four rings are implementations of the IEEE Standard 1596-1992 (SCI), called by Convex CTI — Convex Toroidal Interconnect.
- Each ring is unidirectional and has a bandwidth of 600 MB/s (16-bit differential, 300 MHz effective clock)
- Complete CTI bandwidth is thus 2.4 GB/s.
- Each node’s main memory is globally accessible from other nodes on the CTI network (that is, local memory is globally shared).
- Memory access to global memory goes from the processor through the local crossbar to the local functional unit whose memory controller is associated with the remote memory — it attaches to the same CTI ring the remote memory’s CCMC attaches to. (The paper A Comparative Evaluation of Hierarchical Network Architecture of the HP-Convex Exemplar in the References has a detailed discussion of the CTI ring topology, memory access and performance.)
- SCSI depending on installed controller
- Console/control connections for the control workstation (teststation)
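The ring assignment described above (every node’s i-th CCMC attaches to ring i) means a remote access always exits through the local CCMC that shares a ring with the remote memory’s CCMC. A hypothetical sketch of that routing (function and hop names are illustrative, not from Convex documentation):

```python
# Hypothetical sketch of CTI routing: with four rings, the i-th CCMC of
# every node attaches to ring i, so a remote memory access leaves through
# the local CCMC with the same index as the remote CCMC holding the data.
NUM_RINGS = 4

def route_remote_access(local_node, remote_node, remote_ccmc):
    """Return the hops a load takes to memory on another hypernode."""
    assert 0 <= remote_ccmc < NUM_RINGS
    ring = remote_ccmc        # CCMC index equals ring index on every node
    local_ccmc = remote_ccmc  # exit through the same-indexed local CCMC
    return [
        f"node{local_node}:crossbar",
        f"node{local_node}:CCMC{local_ccmc}",
        f"ring{ring}",
        f"node{remote_node}:CCMC{remote_ccmc}",
    ]

# A processor on node 0 reading memory behind CCMC 2 of node 5 crosses the
# local crossbar, exits via local CCMC 2, travels ring 2 and arrives at
# node 5's CCMC 2:
path = route_remote_access(0, 5, 2)
```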
- SPP1200/CD Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation)
- SPP1200/XA Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation) [did not find appropriate URLs for these two Convex data sheets —Ed.]
- A Comparative Evaluation of Hierarchical Network Architecture of the HP-Convex Exemplar (Postscript) Robert Castaneda, et al. (1997: in Proceedings of IEEE International Conference on Computer Design (ICCD’97) [there is a mirrored PDF version from citeseer (accessed August 2008)]
- Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP-1000 (Postscript) Gheith A. Abandah and Edward S. Davidson (January 1996: University of Michigan. Accessed August 2008)
- An Empirical Evaluation of the Convex SPP-1000 Hierarchical Shared Memory System (PDF) Thomas Sterling, et al. (1995: Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques. Citeseer mirror accessed April 2009)
- Exemplar 1200 Architecture presentation (FTP, Postscript) Beth Richardson? (N.d.: NCSA. Google archive accessed August 2008)
- Convex SPP-UX, a heavily modified Mach-based operating system, which looks similar to HP-UX but is a completely different design. The later HP V-Class machines are able to run stock HP-UX (which was modified specially for the V-Class architecture).
Compare these with other results on the Benchmarks page.
- SPP1200/XA: 71×112×178 cm
- SPP1200/XA: 404 kg maximum weight
- The CD models are smaller, tower-like systems:
- SPP1200/CD: 159 kg maximum weight
- SPP1200/CD: 46×99×89 cm
- The cabinets are air-cooled
- XA systems: up to two cabinets can be mechanically stacked