Convex Exemplar SPP1000, SPP1200 & SPP1600
|CPU||2-16 (CD) / 2-8 (XA)|
|Caches||512 KB-2 MB L1|
|RAM||4 GB (CD) / 2 GB (XA)|
|Expansion||16 SBus (CD) / 8 SBus (XA)|
|Bandwidth||CPU/Mem 1 GB/s, I/O 250 MB/s, XBAR 1.25 GB/s, SCI 2.4 GB/s|
|Interconnect||SCI/CTI links (XA)|
The Convex Exemplar SPP1000, SPP1200 and SPP1600 are scalable 32-bit mainframe computing systems with either PA-7100 or PA-7200 processors. Previous Convex designs used custom Convex processors; with the SPP series, Convex switched to HP PA-RISC designs. At the same time, Convex and HP began collaborating more closely in the early 1990s, which resulted in the joint HP/Convex Exemplar SPP2000 and the takeover of Convex by HP in 1994. This development peaked in the HP-branded V-Class servers based on a similar architecture: the 64-bit, non-clusterable HP 9000/V2200 and V2250 and the up to four-way clusterable HP 9000/V2500 and V2600.
The 32-bit Convex SPP1x00 systems consist of three distinct system types, the CD compact systems, the XA eXtended Architecture hypernodes and the XA clusters:
- SPP1000/CD, SPP1200/CD, SPP1600/CD: single compact systems, either special systems with up to sixteen processors, or two SPP XA hypernodes coupled together and sold as a single, non-clusterable system.
- SPP1000/XA, SPP1200/XA, SPP1600/XA Hypernode: a single XA hypernode with up to eight processors and provisions for linking to other systems via SCI.
- SPP1000/XA, SPP1200/XA, SPP1600/XA Cluster: up to sixteen XA hypernodes coupled via SCI/CTI interconnection rings; these XA clusters can have up to 128 processors in their maximum configuration. The resulting interconnected Exemplars are ccNUMA computers.
The internal Exemplar architecture is based on a 5x5 crossbar: the central switching component, the crossbar, connects the resources to each other by forming matrix connections between the devices’ input and output ports. It is a 5x5 crossbar because it has five ports for processors, memory and I/O.
The nodes and clusters are controlled and booted via a separate workstation connected to them, frequently an IBM RS/6000 computer running AIX, which interfaced with the Exemplar’s console and control I/O; in the case of a cluster, only one node had a control workstation. HP 9000/715 workstations were apparently also used as teststations.
- SPP1000/CD: 2-16 PA-7100 100 MHz with 2 MB off-chip L1 cache each
- SPP1000/XA Hypernode: 2-8 PA-7100 100 MHz with 2 MB off-chip L1 cache each
- SPP1000/XA Cluster: 8-128 PA-7100 100 MHz with 2 MB off-chip L1 cache each
- SPP1200/CD: 2-16 PA-7200 120 MHz with 512 KB off-chip L1 cache each
- SPP1200/XA Hypernode: 2-8 PA-7200 120 MHz with 512 KB off-chip L1 cache each
- SPP1200/XA Cluster: 8-128 PA-7200 120 MHz with 512 KB off-chip L1 cache each
- SPP1600/CD: 2-16 PA-7200 120 MHz with 1 MB off-chip L1 cache each
- SPP1600/XA Hypernode: 2-8 PA-7200 120 MHz with 1 MB off-chip L1 cache each
- SPP1600/XA Cluster: 8-128 PA-7200 120 MHz with 1 MB off-chip L1 cache each
It is not quite clear how the CD models relate to the XA models: the XA clusters consist of several 2-8 processor hypernodes, while the CD models were shipped with up to sixteen processors. Either the CDs are different machines from the XA hypernodes, or they are simply two XA hypernodes coupled together, without any additional SCI/CTI expansion possibilities.
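Whatever the hardware relationship, the published limits line up neatly: every CD figure is exactly double the corresponding XA hypernode figure, which is consistent with the coupled-hypernode hypothesis. A minimal sketch of this doubling pattern, using only the numbers quoted in this article:

```python
# Published per-model limits from the specs above (source figures, not measured).
xa_hypernode = {"cpus": 8, "ram_gb": 2, "sbus_slots": 8}
cd_system = {"cpus": 16, "ram_gb": 4, "sbus_slots": 16}

# Each CD limit is exactly twice the XA hypernode limit, which is what one
# would expect if a CD were simply two XA hypernodes coupled together.
doubled = {key: 2 * value for key, value in xa_hypernode.items()}
print(doubled == cd_system)  # True
```

This proves nothing about the actual hardware, of course, but it shows the CD spec sheet contains no capability a pair of XA hypernodes could not supply.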
The chipset is entirely Convex’s own design and centers on the Convex five-port crossbar, later improved to eight ports in the SPP2000 and used in HP’s V-Class.
- The 5x5 nonblocking crossbar, with its five ports, is the central part of the system. It connects to four functional units (memory, SCI links and processors) and, through the fifth port, to the local system I/O. Each functional unit contains a memory controller, an SCI controller and an agent for two processors. Memory and processors use different data links to the crossbar; memory access always goes over the crossbar, even from a processor to the memory in the same functional unit. Each crossbar port has a data rate of 250 MB/s, giving the crossbar a combined peak bandwidth of 1.25 GB/s. The crossbar is implemented in gallium arsenide (GaAs) gate arrays with 250K transistors, a rarity at the time, very expensive and difficult to handle.
- Four CPU agents attach to the crossbar and give the processors access to memory via the crossbar, over a 250 MB/s crossbar port shared with the memory controller.
- Four Convex Coherent Memory Controllers (CCMCs) each attach one four-way interleaved memory board to the crossbar. The CCMCs additionally handle cache coherency and interface to Convex’s SCI (CTI) links for inter-hypernode connections. The CTI interface, or possibly the complete CCMC, was apparently also a GaAs chip.
- The Exemplar I/O subsystem connects to the fifth 250 MB/s crossbar port, attaching the I/O subsystem controllers to the crossbar and thus to memory and processors.
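The bandwidth figures quoted here follow directly from the port count and the per-port rate; a quick arithmetic check (all constants are the article's figures, not measurements):

```python
# Peak-bandwidth arithmetic for the five-port Exemplar crossbar,
# using the per-port rate quoted in the text.
PORT_BW_MB_S = 250    # each crossbar port runs at 250 MB/s
TOTAL_PORTS = 5       # four functional units plus one I/O port
CPU_MEM_PORTS = 4     # ports shared by the CPU agents and memory controllers

total_bw = TOTAL_PORTS * PORT_BW_MB_S      # aggregate crossbar bandwidth
cpu_mem_bw = CPU_MEM_PORTS * PORT_BW_MB_S  # CPU/memory bandwidth
io_bw = 1 * PORT_BW_MB_S                   # single I/O port

print(total_bw, cpu_mem_bw, io_bw)  # 1250 1000 250 (MB/s)
```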
- Total crossbar bandwidth: 1.25 GB/s (five 250 MB/s ports)
- CPU/memory bandwidth: 1.0 GB/s (four 250 MB/s ports, shared between processors and memory)
- I/O bandwidth: 250 MB/s (one crossbar port)
- SPP1000: four SBus I/O buses for expansion slots
- SPP1200/SPP1600: eight SBus I/O buses for expansion slots
- Attachment to SCI rings: interconnection via four unidirectional rings with a combined bandwidth of 2.4 GB/s
- SCSI-2 storage I/O bus
- Two to eight memory boards per node
- Memory is up to eight-way interleaved per node
- XA single nodes up to 2 GB of memory (512 MB per memory board)
- CD nodes up to 4 GB of memory
- XA single nodes 8 SBus slots
- CD nodes 16 SBus slots
- This is apparently the same SBus as used by Sun in their SPARC workstations
- 20 internal SCSI drives
Multiple SPP1x00/XA systems can be connected together to form a single large system.
- Up to sixteen SPP1000/SPP1200/SPP1600 XA models can be clustered together to form a system with up to:
  - 128 processors
  - 32 GB of RAM
  - 64 SBus slots
  - 320 SCSI drives
- Clustered SPP Exemplars are ccNUMA computers.
- Multiple systems (nodes) are connected via four CTI rings: each uni-directional ring attaches to the same CCMC memory controller on different nodes.
- The four rings are implementations of the IEEE Standard 1596-1992 SCI, called by Convex CTI — Convex Toroidal Interconnect.
- Each ring is unidirectional and has a bandwidth of 600 MB/s (16-bit differential links, 300 MHz clock)
- Complete CTI bandwidth is thus 2.4 GB/s.
- Each node’s main memory is globally accessible from other nodes on the CTI network: local memory is globally shared.
- Memory access to global memory goes from the processor through the local crossbar to the local functional unit whose memory controller is associated with the remote memory; that CCMC then forwards the request over its CTI ring to the remote node
- SCSI depending on installed controller
- Console/control connections for the control workstation, the teststation
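The CTI figures above can be cross-checked from the link parameters given in the text, and the cluster maxima follow from the per-hypernode limits; a small sketch (all constants are the article's figures):

```python
# CTI (Convex Toroidal Interconnect) bandwidth, derived from the link
# parameters in the text: 16-bit-wide unidirectional rings at 300 MHz.
RING_WIDTH_BYTES = 16 // 8   # 16-bit links move 2 bytes per clock
RING_CLOCK_MHZ = 300
RINGS = 4                    # four rings per cluster

ring_bw = RING_WIDTH_BYTES * RING_CLOCK_MHZ  # 600 MB/s per ring
cti_bw = RINGS * ring_bw                     # 2400 MB/s = 2.4 GB/s

# Cluster maxima: sixteen hypernodes of 8 CPUs, 2 GB RAM, 20 drives each.
NODES = 16
print(ring_bw, cti_bw, NODES * 8, NODES * 2, NODES * 20)  # 600 2400 128 32 320
```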
- Exemplar 1200 Architecture presentation (FTP, Postscript) Beth Richardson? (N.d.: NCSA. Google archive accessed August 2008)
- Convex SPP-UX, a heavily modified Mach-based operating system, which looks similar to HP-UX but is a completely different design. The later HP V-Class machines are able to run stock HP-UX, which was specially modified for the V-Class architecture.
- SPP1200/CD Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation) (URL gone)
- SPP1200/XA Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation) (URL gone)
- A Comparative Evaluation of Hierarchical Network Architecture of the HP-Convex Exemplar (Postscript) Robert Castaneda, et al. (1997: in Proceedings of IEEE International Conference on Computer Design (ICCD’97) [there is a mirrored PDF version from citeseer (accessed August 2008)]
- Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP-1000 (Postscript) Gheith A. Abandah and Edward S. Davidson (January 1996: University of Michigan. Accessed August 2008)
- An Empirical Evaluation of the Convex SPP-1000 Hierarchical Shared Memory System (PDF) Thomas Sterling, et al. (1995: Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques. Citeseer mirror accessed April 2009)