Convex Exemplar SPP1000, SPP1200, SPP1600
| Quick Facts | |
|---|---|
| Introduced | 1994-1996 |
| Period | Maturity (III) |
| Series | Mainframe |
| CPU | 4-128 PA-7100 or PA-7200 (32-bit), 100-120 MHz |
| Caches | 512 KB-2 MB L1 |
| RAM | 4 GB (CD), 2 GB (XA) |
| Design | Crossbar |
| Drives | 20 SCSI |
| Expansion | 16 SBus (CD), 8 SBus (XA) |
| I/O | SCSI, Console, SCI/CTI links (XA) |
Convex Exemplar SPP1000, SPP1200 and SPP1600 are scalable mainframes with 32-bit PA-RISC HP PA-7100 and PA-7200 processors, released by Convex from 1994 to 1996. Previous Convex designs used custom Convex processors. With the new SPP mainframes, Convex switched to HP PA-RISC processors, first 32-bit and later 64-bit, drawing on its partnership with HP and PRO.
In the early 1990s, Convex and HP started a close collaboration, which began with jointly marketed cluster-computing solutions based on HP 9000 in 1992. Soon after, HP licensed HP-UX Unix to Convex in 1993, then HP became a value-added reseller (VAR) for Convex before acquiring the company outright in 1995 and integrating it as Exemplar division.
Convex SPP Exemplar systems were used for compute-heavy workloads such as computational fluid dynamics (CFD), structural analysis, decision support, molecular mechanics and petroleum exploration.
Convex SPP1000, SPP1200 and SPP1600 were available as CD (Compact Design) systems, XA (eXtended Architecture) hypernodes and XA clusters:
- SPP1000/CD, SPP1200/CD, SPP1600/CD: Stand-alone compact systems with up to sixteen processors in one or two cabinets.
- SPP1000/XA, SPP1200/XA, SPP1600/XA: Single XA hypernode with up to sixteen processors and provisions for linking other systems via SCI.
- SPP1000/XA, SPP1200/XA, SPP1600/XA: Cluster of XA hypernodes coupled via SCI/CTI rings with up to 128 processors, ccNUMA.
Convex Exemplar architecture is based on a 5x5 crossbar, a central internal switching component that connects resources to each other by forming matrix connections between input and output ports. It is called 5x5 because the crossbar has five ports for processors, memory and I/O.
Nodes and clusters are controlled by separate workstations, often IBM RS/6000 with AIX. HP 9000 715 workstations were also used as so-called teststations.
- SPP1000/CD were introduced in 1994 for $145,000-$750,000
- SPP1000/XA were introduced in 1994 for $550,000-$8 million
- SPP1200/CD were introduced in 1995 for $160,000 and up
- SPP1200/XA were introduced in 1995 for $586,000 (8-CPU) and up
- SPP1600 were introduced in 1996
This was followed by SPP2000 from the HP Exemplar division. Development of the HP Convex Exemplar architecture peaked with HP 9000 V-Class servers: non-clusterable HP 9000 V2200 and V2250 and up to four-way clusterable HP 9000 V2500 and V2600.
System
Processors
| System | CPUs | Processor | Speed | L1 cache |
|---|---|---|---|---|
| Convex SPP1000/CD Compact | 2-8 | PA-7100 PA-RISC 32-bit | 100 MHz | 512 KB off-chip |
| Convex SPP1000/XA Hypernode | 4-8 | PA-7100 PA-RISC 32-bit | 100 MHz | 512 KB off-chip |
| Convex SPP1000/XA Cluster | 4-128 | PA-7100 PA-RISC 32-bit | 100 MHz | 2 MB off-chip |
| Convex SPP1200/CD Compact | 4-16 | PA-7200 PA-RISC 32-bit | 120 MHz | 2 MB off-chip |
| Convex SPP1200/XA Hypernode | 8-16 | PA-7200 PA-RISC 32-bit | 120 MHz | 2 MB off-chip |
| Convex SPP1200/XA Cluster | 8-128 | PA-7200 PA-RISC 32-bit | 120 MHz | 512 KB off-chip |
| Convex SPP1600/CD Compact | 4-16 | PA-7200 PA-RISC 32-bit | 120 MHz | 2 MB off-chip |
| Convex SPP1600/XA Hypernode | 8-16 | PA-7200 PA-RISC 32-bit | 120 MHz | 2 MB off-chip |
| Convex SPP1600/XA Cluster | 8-128 | PA-7200 PA-RISC 32-bit | 120 MHz | 2 MB off-chip |
Chipset
Exemplar chipset is based on a custom Convex design with Convex five-port crossbar, later improved on the SPP2000 with eight ports and used in HP V-Class.
- 5x5 nonblocking crossbar, with five crossbar ports, is the central part of the system. It connects to four functional units (memory, SCI links and processor) and, with the fifth port, to the local system I/O.
  - The four functional units each contain a memory controller, an SCI controller and an agent for two processors.
  - Memory and processor use different data links to the crossbar: memory access always goes over the crossbar, even from a processor to the memory in the same functional unit.
  - The crossbar is implemented in gallium arsenide (GaAs) gate arrays with 250K transistors, a rarity, very expensive and difficult to handle.
- Four CPU Agents attach to the crossbar and provide access for the processors to the memory via the crossbar port shared with the memory controller.
- Four Convex Coherent Memory Controllers (CCMCs) each attach one four-way interleaved memory board to the crossbar. The CCMCs additionally handle cache coherency and interface to the Convex SCI (CTI) link for inter-hypernode connection. The CTI interface and/or the CCMC were apparently also GaAs chips.
- The Exemplar I/O subsystem connects to the fifth crossbar port and links the I/O subsystem controllers to memory and processors through the crossbar.
           _____           _______
CPU1----\_|Agent|         |       |
         _|  1  |=\       | Cross |
CPU2----/ |_____| |       |  bar  |
                  |250MB/s|       |
                  ========| 1.25  |
 ______   ______  |       | GB/s  |
|Memory|_|CCMC1 | |
|Board1| |______|=/
|______|    |
            |
           SCI
          Ring 1
    .       .     .           .
    .       .     .           .
    .       .     .           .
           _____
CPU7----\_|Agent|         |       |
         _|  4  |=\       | Cross |
CPU8----/ |_____| |       |  bar  |
                  |250MB/s|       |
                  ========|       |
 ______   ______  |       |       |
|Memory|_|CCMC4 | |       |_______|
|Board4| |______|=/           |
|______|    |              ___|___    _____
            |             |I/O    |  |      4-8
           SCI            |Control|==|----- SBus
          Ring 4          |_______|  |_____ I/O buses
Convex SPP1000/1200/1600 Crossbar System Architecture
System buses
- Total crossbar bandwidth 1.25 GB/s, five 250 MB/s ports
- CPU/Memory bandwidth 1.0 GB/s, four 250 MB/s ports shared with memory
- I/O bandwidth 250 MB/s with one crossbar port
- SPP1000 Four SBus I/O buses for expansion slots
- SPP1200/SPP1600 Eight SBus I/O buses for expansion slots
- Attachments to SCI rings: interconnection via four one-dimensional rings with a combined bandwidth of 2.4 GB/s
- SCSI-2 storage I/O bus
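The crossbar figures above follow from simple per-port arithmetic. The following is a minimal sketch of that calculation, assuming every port runs at the quoted 250 MB/s and that the source counts 1 GB/s as 1000 MB/s; the constant and function names are illustrative and not taken from any Convex documentation.

```python
# Crossbar figures as quoted in the list above; all names are illustrative.
PORT_MB_S = 250          # bandwidth of a single crossbar port
TOTAL_PORTS = 5          # 5x5 crossbar
CPU_MEMORY_PORTS = 4     # one port per functional unit (CPU agent + memory)
IO_PORTS = TOTAL_PORTS - CPU_MEMORY_PORTS


def aggregate_mb_s(ports: int, per_port_mb_s: int = PORT_MB_S) -> int:
    """Aggregate bandwidth of `ports` crossbar ports in MB/s."""
    return ports * per_port_mb_s


total_mb_s = aggregate_mb_s(TOTAL_PORTS)
cpu_memory_mb_s = aggregate_mb_s(CPU_MEMORY_PORTS)
io_mb_s = aggregate_mb_s(IO_PORTS)

for label, mb_s in [
    ("total", total_mb_s),
    ("CPU/memory", cpu_memory_mb_s),
    ("I/O", io_mb_s),
]:
    # Assumption: decimal units, 1 GB/s = 1000 MB/s.
    print(f"{label}: {mb_s} MB/s = {mb_s / 1000:.2f} GB/s")
```

The four CPU/memory ports and the single I/O port together account for the full 1.25 GB/s.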
Expansion
Memory
- DRAM, memory is up to eight-way interleaved per node
- Two to eight memory boards per node
- XA single nodes up to 2 GB of memory (512 MB per memory board)
- CD nodes up to 4 GB of memory
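The memory limits can be related to the number of memory boards. This is a small sketch under one assumption: the 512 MB board size is stated only for XA nodes, and the code assumes CD nodes use boards of the same size. The function name is hypothetical.

```python
BOARD_MB = 512  # stated for XA nodes; assumed here for CD nodes as well


def node_memory_gb(boards: int, board_mb: int = BOARD_MB) -> float:
    """Memory of one node in GB for a given number of memory boards."""
    if not 2 <= boards <= 8:
        raise ValueError("a node holds two to eight memory boards")
    return boards * board_mb / 1024


print(node_memory_gb(4))  # four boards, the 2 GB XA maximum
print(node_memory_gb(8))  # eight boards, the 4 GB CD maximum
```

Under that assumption, the 2 GB XA limit corresponds to four boards and the 4 GB CD limit to eight.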
Expansion cards
- XA single nodes 8 SBus slots
- CD nodes 16 SBus slots
- Apparently the same SBus that Sun used in its SPARC workstations
Storage
- 20 internal SCSI drives
I/O ports
- SCSI depending on installed controller
- Console/control connections for the control workstation, the teststation
Clustering
Multiple SPP1x00/XA systems can be connected to form a single large system.
- Up to sixteen SPP1000/SPP1200/SPP1600 XA models can be clustered together to form a system with up to
  - 128 processors
  - 32 GB of RAM
  - 64 SBus slots
  - 320 SCSI drives
- Clustered SPP Exemplar are ccNUMA computers.
- Multiple systems (nodes) are connected via four CTI rings: each uni-directional ring attaches to the same CCMC memory controller on different nodes.
- The four rings are implementations of the IEEE Standard 1596-1992 SCI, called by Convex CTI — Convex Toroidal Interconnect.
- Each ring is only unidirectional and has a bandwidth of 600 MB/s, 16-bit differential, 300 MHz clock
- Complete CTI bandwidth is thus 2.4 GB/s.
- Each node’s main memory is globally accessible from other nodes on the CTI network: local memory is globally shared.
- Memory access to global memory goes from the processor through the local crossbar to the local functional unit whose memory controller is associated with the remote memory
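The cluster figures and the memory access route described above can be sketched as follows. This is a simplified model, not Convex's implementation. It assumes eight processors per hypernode (inferred from 128 processors across sixteen nodes), one 16-bit transfer per ring clock, and that functional unit *k* always uses CTI ring *k*. All class and function names are hypothetical.

```python
from dataclasses import dataclass

MAX_HYPERNODES = 16   # up to sixteen XA systems per cluster
CTI_RINGS = 4         # one ring per functional unit / CCMC


@dataclass(frozen=True)
class XAHypernode:
    processors: int = 8     # assumption: 128 processors / 16 hypernodes
    memory_gb: int = 2      # XA single-node maximum
    scsi_drives: int = 20   # internal drives per node


def cluster_totals(nodes: int, node: XAHypernode = XAHypernode()) -> dict:
    """Sum per-node resources over a cluster of identical hypernodes."""
    if not 1 <= nodes <= MAX_HYPERNODES:
        raise ValueError(f"cluster size must be 1..{MAX_HYPERNODES}")
    return {
        "processors": nodes * node.processors,
        "memory_gb": nodes * node.memory_gb,
        "scsi_drives": nodes * node.scsi_drives,
    }


def ring_bandwidth_mb_s(width_bits: int = 16, clock_mhz: int = 300) -> float:
    """Bandwidth of one ring, assuming one transfer per clock cycle."""
    return width_bits / 8 * clock_mhz


def access_path(local_node: int, target_node: int, unit: int) -> list:
    """Hops for an access to the memory of functional unit `unit`."""
    if not 1 <= unit <= CTI_RINGS:
        raise ValueError(f"functional unit must be 1..{CTI_RINGS}")
    path = [
        "CPU agent",
        f"crossbar (node {local_node})",
        f"CCMC{unit} (node {local_node})",
    ]
    if target_node != local_node:
        # Remote memory: leave through the ring tied to the same CCMC.
        path += [f"CTI ring {unit}", f"CCMC{unit} (node {target_node})"]
    path.append(f"memory board {unit} (node {target_node})")
    return path


print(cluster_totals(16))
print(ring_bandwidth_mb_s(), CTI_RINGS * ring_bandwidth_mb_s())
print(" -> ".join(access_path(local_node=0, target_node=3, unit=2)))
```

With these assumptions a full cluster comes to 128 processors, 32 GB of RAM and 320 SCSI drives, and the four 600 MB/s rings add up to 2.4 GB/s. Local accesses take the same first three hops as remote ones, which reflects the point that memory access always crosses the crossbar.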
Operating systems
Convex SPP Exemplar with PA-RISC processors exclusively run SPP-UX, a scalable Unix based on Mach, developed by Convex for SPP1000 and SPP2000 mainframes with up to 512 processors, released between 1993 and 1999. SPP-UX implemented a distributed architecture that emulated HP-UX for developers but was very different below the userland.
Performance
Convex SPP Exemplar were impressive but expensive scalar RISC servers; the second generation was faster than contemporary systems built on UltraSPARC, MIPS and Intel processors. With PA-RISC Exemplar technology, Convex competed with supercomputers on floating-point workloads (MFLOPS), continuing a long Convex tradition.
| System | Processor | SPEC95 rate int | SPEC95 rate fp | Linpack TPP | Linpack Rmax |
|---|---|---|---|---|---|
| SPP1000 | 8 PA-7100 100 MHz; 16 PA-7100 100 MHz; 32 PA-7100 100 MHz; 64 PA-7100 100 MHz | | | 751, 965 | 1.01, 3.30, 6.19 |
| SPP1200 | 8 PA-7200 120 MHz; 16 PA-7200 120 MHz; 24 PA-7200 120 MHz; 32 PA-7200 120 MHz | | | 656 | 1.02, 2.03, 2.83, 3.96 |
| SPP1600 | 8 PA-7200 120 MHz; 16 PA-7200 120 MHz; 32 PA-7200 120 MHz | 290, 541, 996 | 383, 744, 1444 | 934 | 1.45, 2.84, 5.45 |
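The SPP1600 Linpack Rmax figures in the table above (1.45, 2.84 and 5.45) give a rough idea of how well the system scaled. The sketch below assumes the three values correspond, in order, to the 8-, 16- and 32-processor configurations, and measures speedup relative to the 8-processor result; the names are illustrative.

```python
# SPP1600 Linpack Rmax values from the table above. Assumption: the values
# map in order to the 8-, 16- and 32-processor configurations.
SPP1600_RMAX = {8: 1.45, 16: 2.84, 32: 5.45}


def scaling(results: dict) -> dict:
    """Return {cpus: (speedup, efficiency)} relative to the smallest system."""
    base_cpus = min(results)
    base_rmax = results[base_cpus]
    table = {}
    for cpus, rmax in sorted(results.items()):
        speedup = rmax / base_rmax      # measured gain over the base system
        ideal = cpus / base_cpus        # gain under perfect linear scaling
        table[cpus] = (speedup, speedup / ideal)
    return table


for cpus, (speedup, efficiency) in scaling(SPP1600_RMAX).items():
    print(f"{cpus:2d} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.1%}")
```

Under that reading, doubling from 8 to 16 processors retains roughly 98% of the ideal speedup, and going to 32 processors retains roughly 94%.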
| System | Processor | SPEC95 rate int | SPEC95 rate fp | Linpack TPP | Linpack Rmax |
|---|---|---|---|---|---|
| Cray T90 T932 | 32 Cray ECL 450 MHz | | | 29360 | 61.80 |
| HP 9000 V2500 | 32 PA-8500 440 MHz; 16 PA-8500 440 MHz | 7481 | 8217 | | 31.59, 17.47 |
| AlphaServer HPC320; AlphaServer HPC160 | 32 Alpha 21264 500 MHz; 16 Alpha 21264 500 MHz | 7264, 3837 | 11779, 6246 | | |
| Cray SV1 | 24 Cray CMOS 300 MHz | | | 10420 | 38.31 |
| Convex SPP2000 | 64 PA-8000 180 MHz; 16 PA-8000 180 MHz | 1307 | 6140, 1413 | 4609 | 27.56, 7.78 |
| AlphaServer 8400 | 32 Alpha 21164 625 MHz; 8 Alpha 21164 625 MHz | 4504, 1279 | 4527, 1212 | 3608 | 17.96 |
| Sun Starfire | 32 Sun UltraSPARC-II 333 | 3480 | 3021 | 5187 | 17.91 |
| SGI Origin 2000 | 16 R12000 300 MHz | 2560 | 4224 | 3970 | 8.71 |
| HP V2250 | 16 PA-8200 240 MHz | 2209 | 2471 | 5935 | 10.65 |
| HP V2200 | 16 PA-8200 200 MHz | 1865 | 2312 | 4832 | 9.20 |
| Sun Enterprise 6k | 16 Sun UltraSPARC 250 | 1437 | 1965 | 3493 | 7.21 |
| AlphaServer ES40 | 4 Alpha 21264 667 MHz | 1390 | 2686 | 3804 | 4.11 |
| HP 9000 T600 | 12 PA-8000 180 MHz | 1192 | 1151 | | |
| DG AViiON AV 20000 | 16 Pentium Pro 200 MHz | 1007 | | | |
| Siemens RM600 720 | 24 R4400 250 MHz | 921 | | | |
| HP 9000 K580 | 6 PA-8200 240 MHz | 902 | 849 | | |
| IBM RS/6000 SP | 4 POWER3 375 MHz | 845 | 1739 | 3700 | 4.64 |
| HP Visualize C3600 | 1 PA-8600 552 MHz | 379 | 576 | | |
| Cray C90 | 1 Cray 238 MHz | | | 902 | 2.92 |
| HP 9000 D380 | 2 PA-8000 180 MHz | 210 | 221 | | |
Dimensions
| System | Height | Width | Depth | Weight |
|---|---|---|---|---|
| SPP1200/XA | 71cm | 112cm | 178cm | 404kg |
| SPP1200/CD | 46cm | 99cm | 89cm | 159kg |
Documentation
Most documentation is only available at archive.org and other archives, with most official sources, articles and journals having disappeared in the 2000s.
Manuals
- SPP1200/CD Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation) (URL gone)
- SPP1200/XA Scalable Computing System, Convex Data Sheet (1995: Convex Computer Corporation) (URL gone)
Articles
- A Comparative Evaluation of Hierarchical Network Architecture of the HP-Convex Exemplar (Postscript) Robert Castaneda, et al. 1997 ICCD [citeseer PDF mirror, accessed August 2008]
- Characterizing Shared Memory and Communication Performance: A Case Study of the Convex SPP-1000 (Postscript) Gheith A. Abandah and Edward S. Davidson (January 1996: University of Michigan. Accessed August 2008)
- An Empirical Evaluation of the Convex SPP-1000 Hierarchical Shared Memory System (PDF) Thomas Sterling, et al. 1995
- Exemplar SPP-1200 Architecture Overview presentation, NCSA/University of Illinois 1996 archive.org
- Overview of Modern HPC Architectures presentation, NCSA/University of Illinois 1996 archive.org
- Convex Division Data Sheets, Hewlett-Packard Company (1996) archive.org
- Exemplar SPP1600 Technical Summary, Hewlett-Packard Company 1996 archive.org
- An Overview of the HP/Convex Exemplar Hardware archive.org, Hewlett-Packard Company (1997: mirror accessed 2025)
- SPP-UX: An HP-UX Binary Compatible, Microkernel-Based Operating System archive.org, Hewlett-Packard Company (1997: mirror accessed 2025)
- A Highly Scalable System Utilizing up to 128 PA-RISC Processors, Convex Computer Corporation (n.d.: mirror accessed 2025) archive.org
- Performance Evaluation of the Convex SPP Series, NCSA/University of Illinois 1996 archive.org
