PA-RISC Processors
Introduction
The PA-RISC processors are RISC processors from HP, started in the early 1980s as a replacement for different platforms used in HP computers and developed until the early 2000s. Three major revisions of the PA-RISC architecture were developed:
- 32-bit PA-RISC 1.0, implemented in several early processors and used in the very first PA-RISC servers;
- 32-bit PA-RISC 1.1, used in the large range of PA-7x00 processors, and HP 9000 servers and workstations from the late-1980s and 1990s;
- 64-bit PA-RISC 2.0, which extended the 32-bit PA-RISC 1.1 to 64-bit width in the PA-8x00 processors and featured a redesign of most parts of the architecture, used in the late-1990s and 2000s in the last PA-RISC computers.
Almost all HP Unix systems from the mid-1980s until the early 2000s were based on PA-RISC; other HP product lines (such as the HP 3000 systems) and a few external integrators (OEMs) used PA-RISC processors as well.
There are roughly five main classes of actual PA-RISC processor designs — two PA-RISC 1.0, two PA-RISC 1.1 and one PA-RISC 2.0, with individual processors mostly being iterations of these basic designs.
- TS-1, the first PA-RISC processor, PA-RISC 1.0 32-bit, implemented in TTL.
- NS-1, NS-2 and PCX, the PA-RISC 1.0 32-bit successors. NS-2 tweaked the NS-1 design (both implemented in NMOS) and PCX implemented the NS-2 design in CMOS.
- PA-7000 and PA-7100, the first PA-RISC 1.1 processors, and the later PA-7100LC and PA-7300LC, integrated low-cost PA-RISC 1.1 processors, all 32-bit. The former two have VSC bus system interfaces, with the PA-7000 being the more-integrated descendant of the earlier PCX and the PA-7100 adding superscalarity and integrating the FPU. The two LC processors integrate additional processing logic, direct GSC system bus attachments and on-die memory controllers. The PA-7300LC extended the original PA-7100LC design with true on-chip cache and modified memory controller and bus interfaces.
- PA-7200, a high-performance PA-RISC 1.1 32-bit processor, a rather large redesign and the first PA-RISC processor with a Runway bus interface.
- PA-8000 and PA-8200, the first PA-RISC 2.0 64-bit processors, were very similar. The subsequent 64-bit processors were all iterations of the basic PA-8000 core.
PA-8500, PA-8600 and PA-8700 are direct evolutions of the PA-8000 with large on-chip caches. The PA-8600 and PA-8700 are slight modifications of the PA-8500 with different cache layouts and process technologies.
PA-8800 and PA-8900 implemented dual PA-8700 cores on a single die with large off-die but on-chip caches.
Several third-party vendors designed and produced PA-RISC processors under license, including the general-purpose CPUs from Hitachi (PA/50 and HARP) and various microcontrollers from Winbond and Oki.
The following sections discuss the various PA-RISC processors in detail. A separate page describes the PA-RISC architecture.
Overview table
| CPU | ISA | Clock max | FETs | L1 Cache max | L2 Cache max | Bus | Superscalar | SMP | Units |
|---|---|---|---|---|---|---|---|---|---|
| TS-1 | PA 1.0 32-bit | 8MHz | ? | 128KB I/D off-chip | | Custom | 1-way | No | 1 Integer, external FPU |
| NS-1 | PA 1.0 32-bit | 30MHz | 144k | 128KB off-chip | | SMB | 1-way | No | 1 Integer, external FPU |
| NS-2 | PA 1.0 32-bit | 27.5MHz | 183k | 1MB I/D off-chip | | SMB | 1-way | No | 1 Integer, external FPU |
| PCX | PA 1.0 32-bit | 50MHz | 196k | 1MB I/D off-chip | | SMB | 1-way | Yes | 1 Integer, external FPU |
| PA-7000 | PA 1.1a 32-bit | 66MHz | 577k | 256KB I, 256KB D off-chip | | PBus/VSC | 1-way | No | 1 Integer, external FPU |
| PA-7100/PA-7150 | PA 1.1b 32-bit | 125MHz | 850k | 1MB I, 2MB D off-chip | | PBus/VSC | 2-way | Yes | 1 Integer, 1 Floating Point |
| PA-7100LC | PA 1.1c 32-bit | 100MHz | 900k | 1KB I on-chip | 2MB off-chip | GSC | 2-way | No | 2 Integer, 1 Floating Point, MAX-1 |
| PA-7200 | PA 1.1d 32-bit | 140MHz | 1.3M | 2KB on-chip | 1MB I, 2MB D off-chip | Runway | 2-way | Yes | 2 Integer, 1 Floating Point |
| PA-7300LC | PA 1.1e 32-bit | 180MHz | 9.2M | 64KB I, 64KB D on-chip | 8MB off-chip | GSC | 2-way | No | 2 Integer, 1 Floating Point, MAX-1 |
| PA-8000 | PA 2.0 64-bit | 180MHz | 4.5M | 1MB I, 1MB D off-chip | | Runway | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8200 | PA 2.0 64-bit | 300MHz | 4.5M | 2MB I, 2MB D off-chip | | Runway | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8500 | PA 2.0 64-bit | 440MHz | 140M | 512KB I, 1MB D on-chip | | Runway | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8600 | PA 2.0 64-bit | 550MHz | 140M | 512KB I, 1MB D on-chip | | Runway | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8700 | PA 2.0 64-bit | 875MHz | 186M | 768KB I, 1.5MB D on-chip | | Runway | 4-way | Yes | 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8800 2-core | PA 2.0 64-bit | 1GHz | 300M | 2× 768KB I, 768KB D on-chip | 32MB off-chip | Itanium 2 | 2× 4-way | Yes | 2× 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| PA-8900 2-core | PA 2.0 64-bit | 1.1GHz | 317M | 2× 768KB I, 768KB D on-chip | 64MB off-chip | Itanium 2 | 2× 4-way | Yes | 2× 4 Integer, 4 Floating Point, 2 Load/Store, MAX-2 |
| Hitachi PA/50 | PA 1.1 32-bit | 60MHz | 1.28M | 8KB I, 4KB D on-chip | | ? | 1-way? | No? | 1 Integer, 1 Floating Point |
| Hitachi HARP-1 | PA 1.1 32-bit | 150MHz | 2.8M | 8KB I, 16KB D on-chip | 512KB I, 512KB D off-chip | ? | 2-way | No? | 2 Integer, 1 Floating Point (Vector) |
- ISA: Instruction set architecture — version of the PA-RISC architecture and its width, i.e. integer register width and maximum addressable memory (32-bit or 64-bit)
- FETs: Number of transistors
- L1/L2 Caches: Maximum amount of Level 1 and Level 2 cache memories — on-chip is integrated onto the CPU die while off-chip cache is implemented with separate chips (most PA-RISC processors supported larger off-chip caches than were implemented in actual products)
- Bus: Type of bus the processor attaches to on the main board (note that this is in two cases the main I/O bus [GSC on the LC processors] and on the others the processor/memory bus)
- SMP: Capability of the CPU to work in multi-processor configuration
- Units: Number of functional processing units, for integer and floating point arithmetic, and load/store operations. Also notes if the MAX multimedia extensions are available.
Early PA-RISC
The first PA-RISC processors, designed and used in the mid to late 1980s in the HP 9000/800 servers (and HP 3000 MPE/iX systems), are very poorly documented. Their exact nomenclature is not clear: one group of sources refers to them as TS-1, NS-1 and NS-2, while others call apparently the same processors PN-5, PN-7 and PN-10. These early CPUs were still mostly chipsets: multiple separate chips and components formed the central processing unit, in contrast to the mostly single-chip post-PA-7000 implementations. The chips were based first on TTL, then NMOS-III and finally CMOS26B. An interesting aspect of these CPUs is their huge TLB sizes, from 2048 up to 16384 entries, while their successors and competitors typically had sizes in the low to mid hundreds.
TS-1
Used in: 840
Introduced in: 1986
The TS-1 was the very first PA-RISC production processor and implemented version 1.0 of PA-RISC on six boards (each 8.4×11.3″) of TTL.
Details:
- PA-RISC version 1.0 32-bit
- Three-stage pipeline
- The CPU consists of six separate boards:
- I-unit: the Instruction Unit
- Register File Board, contains general and control registers
- E-unit: the Execution Unit
- TLB, the translation lookaside buffer with 4096 entries for 2KB pages
- Cache controller with split instruction and data caches — 64KB for each I and D
- FPC, the floating-point coprocessor, handles FP operations parallel to the CPU/ALU (the ADD/MUL/DIV chip was taken over from the HP 9000/550 FOCUS system)
- 4096-entry TLB off-chip, direct-mapped
- Off-chip L1 cache of 128KB (I/D) direct-mapped/one-way associative
- Physical address space of 27-bit (128MB main memory could be addressed)
- 8MHz clock speed
- Six (some sources say five) printed circuit boards, implemented in FAST TTL and (25ns and 35ns) SRAMs/PALs, each with about 150 ICs
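The address-space and TLB figures in the list above follow from plain powers of two. A minimal sketch, assuming that every TLB entry maps exactly one page; the helper names `addressable_bytes` and `tlb_reach_bytes` are hypothetical and only serve this illustration:

```python
MB = 1024 * 1024


def addressable_bytes(address_bits: int) -> int:
    """Bytes addressable with a physical address of the given width."""
    return 1 << address_bits


def tlb_reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered by a TLB, assuming each entry maps one page."""
    return entries * page_bytes


# TS-1 figures from the list above: 27-bit physical addresses,
# 4096 TLB entries and 2KB pages.
print(addressable_bytes(27) // MB)            # 128
print(tlb_reach_bytes(4096, 2 * 1024) // MB)  # 8
```

The same function reproduces the 512MB quoted for the 29-bit physical address space of the later PA-RISC 1.0 processors.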
NS-1
Used in: 825, 835, 850
Introduced in: 1987
The first implementation of PA-RISC in an NMOS fabrication process followed shortly after the original TTL-based TS-1 and was called NS-1. The NS-1 processor is integrated on one circuit board (two on the 825 server) with the CPU as a single NMOS-III chip supplemented by external support chips.
Details:
- PA-RISC version 1.0 32-bit
- Three-stage pipeline
- CPU is a single chip, with eight support VLSI chips
- SIU (system interface unit), attaches the CPU to the SMB main bus
- two CCUs (cache controller units CCU0 and CCU1), attach to separate external cache chips
- TCU (TLB controller unit), attaches to the external TLB chips
- MIU (math interface unit), controls three third-party floating point (FP) chips (ADD, MUL and DIV)
- 2048 to 4096-entry TLB off-chip
- Off-chip L1 cache of 16KB (HP 9000/825) to 128KB (others), unified
- Physical address space of 29-bit (512MB main memory could be addressed)
- CPU attaches via System Main Bus (SMB) to memory and I/O (controllers); SMB is a synchronous, pipelined bus with 64-bit wide address and data transfers
- 25-30MHz clock speed
- One circuit board (two boards on HP 9000/825), 144,000 FETs, implemented in NMOS-III packaged in a 272-pin ceramic PGA package
NS-2
Used in: 822, 832, 845, 855, 860
Introduced in: 1989-1990
The final NMOS PA-RISC processor was the NS-2, a tweaked follow-on to the NS-1 with a deeper pipeline (five stages instead of three), new TLB and cache controllers and significantly larger caches and TLB. The NS-2 design was simplified over its NS-1 predecessor. The processor is implemented on one circuit board with the CPU as a single NMOS-III chip and seven other VLSI chips. The bus structure connecting these chips was updated and simplified, with the CPU having private connections to the cache and TLB controllers (for which the NS-1 CPU had to use the shared cache bus).
Details:
- PA-RISC version 1.0 32-bit
- CPU is a single chip with seven VLSI support chips
- SIU (system interface unit), attaches the CPU to the SMB main bus
- two CCUs (cache controller units, split into instruction and data — ICCU and DCCU), attach to separate external cache chips
- TCU (TLB controller unit), attaches to the external TLB chips
- FPC (floating point controller), controls two third-party floating point (FP) chips (ADD, MULTI)
- Five-stage pipeline
- 16384-entry TLB off-chip
- Off-chip L1 cache up to 1024KB, split into I/D
- Physical address space of 29-bit (512MB main memory could be addressed)
- CPU attaches via System Main Bus (SMB) to memory and I/O (controllers); SMB is a synchronous, pipelined bus with 64-bit wide address and data transfers
- 27.5MHz clock speed (or maximum of 30MHz?), power dissipation of 26W
- One circuit board; CPU implemented in 1.5µ NMOS-III with 183,000 FETs on a 14.0×14.0 mm2 die, packaged in a 408-pin PGA
PCX (CMOS26B)
Used in: 842, 852, 865, 870
Introduced in: 1990?
The last PA-RISC 1.0 design, the PCX (also called CMOS26B), was also the first PA-RISC processor fabricated in a CMOS process. It implemented the NS-1/NS-2 NMOS design in CMOS and moved several of the processor functions previously supplied on external VLSI chips onto a single CPU chip. The PCX was still supplemented by external support chips, including three CMUX (cache multiplexer: one instruction, two data; equivalent to the earlier CCUs), SPI (SMB to processor interface; SMB is the system main bus), FPC (floating point coprocessor) and two FP chips (MUL/DIV and ADD/SUB) [it is not completely clear whether the latter two or the latter three chips are third-party].
Details:
- PA-RISC version 1.0 32-bit
- First multi-processor-capable PA-RISC CPU (up to four-way SMP)
- Direct predecessor of the PA-7000 (PCXS) processor which integrated most processor logic minus the FPU onto a single die/chip
- External FPU (apparently ECL logic)
- 8192-entry TLB off-chip
- Off-chip L1 cache up to 1024KB, split into I/D (apparently asymmetrical 1:2 I/D)
- Physical address space of 29-bit (512MB main memory could be addressed)
- CPU attaches via System Main Bus (SMB) to memory and I/O (controllers); SMB is a synchronous, pipelined bus with 64-bit wide address and data transfers
- 50MHz clock speed
- One circuit board, 196,000 FETs, 1.0µ (micron), implemented in three-level CMOS (CMOS26B)
- CPU is a single chip, needs seven other (VLSI) support chips for memory/bus interfaces and I/O
Some sources also mention a CS-1 processor. The name would point to a CMOS design, but the performance figures and charts do not really match up with the CMOS26B/PCX described here.
References
- Wayne E. Holt (ed.), Beyond RISC! An Essential Guide to Hewlett-Packard Precision Architecture (January 1988: Software Research Northwest Inc.)
- Hardware Design of the First HP Precision Architecture Computers (PDF) David A. Fotland et al (March 1987: Hewlett-Packard Journal)
- HP 3000 Series 950 and HP 9000 Model 850S Family CE Handbook (PDF) Hewlett-Packard Company (October 1990. Accessed January 2008 at hpmuseum.net)
- HP 9000 Series 800 Model 825S Hardware Technical Data (PDF) Hewlett-Packard Company (September 1988. Accessed January 2008 at hpmuseum.net)
- HP 3000/925 and HP 9000/825/835 Computer Systems CE Handbook (PDF) Hewlett-Packard Company (May 1988. Accessed January 2008 at hpmuseum.net)
- New midrange members of the Hewlett-Packard Precision Architecture Computer Family Thomas O. Meyer et al (June 1989: Hewlett Packard Journal. Accessed January 2008 at findarticles.com)
- HP 9000 Series 800 Model 822S/832S Technical Data (PDF) Hewlett-Packard Company (1989. Accessed January 2008 at hpmuseum.net)
- A 30 MIPS VLSI CPU, Brian D. Boschma et al (ISSCC 89: February 1989)
PA-7000 (PCX-S) (Cheetah)
Used in
- 705, 710, 720, 730, 750
- F10, F20, F30, G30, G40, H20, H30, H40, I30, I40
- Mitsubishi ME/R7200, ME/S7200, ME/R7300, ME/S7300, ME/R7500, ME/S7500
Time of introduction
1991
Overview
The PA-7000 was the first PA-RISC version 1.1 processor and was first used in the new 700 series workstations and later in some of the Nova servers. The PA-7000 is a multi-chip implementation:
- Central CPU with ALU, TLB and the I/D cache controllers
- Viper Memory and I/O Controller (MIOC)
- External FPU
- PBus/VSC interface, buffer chips for data/addresses between VSC and PBus
Details
- PA-RISC version 1.1a 32-bit
- Needs external FPU (commonly used was a coprocessor developed by HP and Texas Instruments)
- Five-stage pipeline
- 96/96 I/D TLB
- 4/4 I/D BTLB
- 32-bit bus to I cache, 64-bit bus to D cache
- PBus 32-bit from processor to the Memory and I/O Controller (MIOC)
- Off-chip caches up to 256KB/256KB I/D
- Up to 66MHz frequency with 5.0V core voltage
- 14.2×14.2 mm2 die, 577,000 FETs, 1.0µ (micron), 2-layer CMOS (CMOS26B) in 408-pin CPGA
- External FPU fabbed in 13.0×13.0 mm2 die, 640,000 FETs, 0.8µ (micron), TI EPIC-2 in 207-pin CPGA
References
- Various
- Évolution des gammes de processeurs MIPS, DEC Alpha, PowerPC, SPARC, x86 et PA-RISC (PDF) André Seznec and Thierry Lafage (INRIA: June 1997)
- Midrange PA-RISC Workstations with Price/Performance Leadership (.pdf) pp. 6-11 Andrew J. DeBaets and Kathleen M. Wheeler (August 1992: Hewlett-Packard Journal)
- VLSI Circuits for Low-End and Midrange PA-RISC Computers (.pdf) pp. 12-22 Craig A. Gleason (August 1992: Hewlett-Packard Journal)
PA-7100/PA-7150 (PCX-T) (Thunderbird)
Used in
- 715, 725, 735, 755
- 742i, 745i, 747i
- G50, G60, G70, H50, H60, H70, I50, I60, I70
- T500, T520
- Convex SPP1000/CD, SPP1000/XA
- Hitachi 3050RX 220, 230, 310S, 320, 330, 430, 440, 9000V V735/125, VT500
- Stratus Continuum 610S, 610, 615S, 615, 620, 625, 1220, 1225, 1245
Time of introduction
Early 1992 (PA-7150: 1994)
Overview
The PA-7100 was the first PA-RISC CPU to integrate the ALU and FPU on a single die, saving board space and lowering production cost. The design of the basic and integer units is close to the PA-7000, modified to scale to higher frequencies; the (previously external) FPU was a new design, taking about one third of the transistor count. The link between the PA-7100 and its instruction cache was doubled in width compared to the PA-7000, which enables the CPU to fetch multiple consecutive instructions and simultaneously dispatch them to independent integer and floating point units. The PA-7100 is a superscalar processor that is able to issue two separate instructions at a time.
SMP systems can be built with two alternative strategies: either two PA-7100s attach via a shared PBus to one Memory and I/O Controller (Viper) to which the system bus and memory separately attach; or each PA-7100 is attached to its own MIOC, which in turn is attached to a shared memory and I/O bus with the other PA-7100/MIOCs.
The PA-7150 is a PA-7100 with tweaks to the core and cache subsystem to allow clock frequencies up to 125MHz.
The PA-7100 was hardware developed on an HP 9000/I-Class server.
Details
- PA-RISC version 1.1b 32-bit
- Two functional units: 1 integer ALU, 1 Floating Point unit
- 2-way superscalar
- SMP-capable
- CPU, FPU, MMU and cache controller on one chip, memory and I/O controller (Viper MIOC) off-chip
- Five-stage pipeline
- Pipelined store technique that reduces the penalty of stores to the data cache
- Stall-on-use mechanism for parallel processing of instruction streams and cache misses
- 3-instruction queue
- Hardware TLB miss handler
- Hardware static branch support
- I/D cache bypass (7150)
- Off-chip L1 caches up to 1MB I and 2MB D realized in asynchronous standard SRAMs
- I/D caches are both 64-bit per access, direct mapped, parity protected and cycled at CPU clock
- Caches are attached directly to the CPU
- Caches are software accessible
- Caches are virtually indexed and physically tagged to minimize latency
- 120-entry fully associative TLB
- 16-entry BTLB with programmable page sizes up to 64MB
- CPU attaches via PBus to the Viper memory and I/O controller (MIOC)
- PBus is a 32-bit multiplexed address/data bus and probably runs at 1.0, 0.67 or 0.50 of processor speed
- Two different multiprocessing connection strategies supported (shared MIOC or dedicated MIOCs)
- MP cache coherency support
- Up to 100MHz frequency (PA-7100) with 5.0V core voltage
- Up to 125MHz frequency (PA-7150) with 5.0V core voltage
- 14.0×14.0 mm2 die, 850,000 FETs, 0.8µ (micron), 3-layer metal CMOS (CMOS26B process) packaged in a 504-pin ceramic PGA package
- Power dissipation of 30W at 100MHz
References
- Various
- A 200 MFLOP HP PA-RISC Processor (.pdf) W. Jaffe, B. Miller, J. Yetter (1992: Hewlett Packard. Proceedings of IEEE Hot Chips IV)
- Multiprocessor Features in a PA-RISC Processor Interface Chip (.pdf) T. Alexander et al (1992: Hewlett Packard. Proceedings of IEEE Hot Chips IV)
- Évolution des gammes de processeurs MIPS, DEC Alpha, PowerPC, SPARC, x86 et PA-RISC (PDF) André Seznec and Thierry Lafage (INRIA: June 1997)
PA-7100LC (PCX-L) (Hummingbird)
Used in
- 712, 715, 725
- 743i, 748i
- D200, D210, D300, D310
- E25, E35, E45, E55
- Hitachi 3050RX 225, 235, 255, 535, e9000V V715, V715Tiny, VE25, VE35, VE45, VE55
- SAIC Galaxy 1100
Time of introduction
1994
Overview
The PA-7100LC was primarily designed as a single-chip solution for application in low cost systems while still delivering the performance of 1991 high-end workstations and servers. The CPU core design was leveraged from the PA-7100 and integrated with several of its off-chip support components on a single die. The PA-7100LC integrates the CPU, FPU, MIOC (memory and I/O controller) and a first-level cache on a single VLSI chip and has a direct attachment to the GSC main bus. Both CPU and FPU support the PA-RISC 1.1 Edition 3 ISA.
Details
- PA-RISC version 1.1c 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point unit (restrictions on pairing integer operations are noted at the end of this list)
- 2-way superscalar
- Not SMP-capable
- Five-stage pipeline
- DRAM memory & cache controller (MIOC) integrated on die, thus direct interface from the CPU to memory and cache
- 1KB on-chip L1 instruction cache, direct mapped, 64-bit per access, prefetch from off-chip I cache
- 8KB-2MB off-chip unified I/D L1 cache, direct mapped, hashed address, virtual index, 480-600MB/s bandwidth
- The 1KB on-chip I cache is not really considered a true cache, thus the off-chip cache in fact is the system’s real L1 cache
- 32-Byte cache line size
- Support for bi-endian load-store operations
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Floating Point load-store to I/O space
- 64-entry unified I/D TLB, fully associative, 4KB page size
- 8-entry BTLB, page sizes from 512KB to 64MB
- 64-bit wide load/store operations
- I and D cache bypassing
- Stall on use D cache miss policy
- Don’t fill on miss cache hint
- Hardware TLB miss handler support
- Hardware static branch prediction
- GSC bus interface
- 64-bit ECC interface to the main memory
- Instruction line prefetch from main memory
- Up to 100MHz clock
- 14.2×14.2 mm2 die, 900,000 FETs, 0.75µ (micron), 3-layer aluminium process packaged in a 432-pin PGA
- Only one of the two integer ALUs is able to handle loads, stores and shifts; these operations can only be paired with simple math operations, like integer addition or multiplication. Both units can handle branch operations.
References
- PA7100LC ERS (External Reference Specification) (.pdf) Hewlett-Packard Company (1999)
- The PA 7100LC Microprocessor: A Case Study of IC Design Decisions in a Competitive Environment Mick Bass et al (April 1995: Hewlett-Packard Journal. Accessed May 2009)
- Design methodologies for the PA 7100LC microprocessor (.pdf) Mick Bass et al (April 1995: Hewlett-Packard Journal. Accessed May 2009)
PA-7200 (PCX-T') (Thunderbird')
Used in
- C100, C110
- D250, D260, D350, D360
- J200, J210
- K100, K200, K210, K220, K400, K410, K420
- Convex SPP1200/CD, SPP1200/XA, SPP1600/CD, SPP1600/XA
- Hitachi 9000V VQ200, VQ210, VR100, VR200, VR400
Time of introduction
Early 1995
Overview
The PA-7200 completely revised the PA-7100 processor core, leveraging only the FPU. Being a two-way superscalar processor, the PA-7200 can dispatch and execute two separate instructions at a time to its functional units. In contrast to the PA-7100 it has two separate integer ALUs and thus can execute two ALU integer operations simultaneously. Other changes include a redesigned cache architecture (while retaining the general cache layout with large off-chip L1 caches at CPU clock speed) and use of the Runway processor bus, carried on to later PA-8x00 processors. The PA-7200 was targeted at high-performance general-purpose applications, and also at specialized applications with large working sets which could take advantage of the high-bandwidth bus interface.
Details
- PA-RISC version 1.1d 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point
- 2-way superscalar
- SMP-capable
- FPU, MMU, cache controller integrated on die, memory and I/O controller separate and off-chip
- Five-stage pipeline
- 2KB on-chip "assist" L1 cache, fully associative, holds 64 32-Byte cache lines
- Off-chip L1 caches up to 1MB I and 2MB D realized in asynchronous SRAMs with one cycle latency
- (The 2KB on-chip assist cache is not really considered a true cache, thus the off-chip cache is the system’s L1 cache.)
- Caches are 64-bit per access, direct mapped, parity protected and cycled at CPU speed
- Caches are virtually indexed and physically tagged to minimize latency
- 120-entry fully associative TLB
- 16-entry BTLB
- Hardware TLB miss support
- Six predecode bits
- Support for uncached memory pages
- Bi-endian support
- Runway system interface, 64-bit wide, 120MHz, 960MB/s peak bandwidth, bus clock ratios of 1.0, 0.75 and 0.67 of processor speed possible
- Glueless interface to the Runway system bus for up to four-way SMP (four CPUs on same Runway processor bus)
- Can have up to six bus-transactions in progress at once
- CPU interfaces to U2 I/O adapters and MMC/SMC memory controllers on the Runway bus
- Up to 140MHz frequency with 4.4V core and 3.3V I/O voltage
- 14.0×15.0 mm2 die, 1,300,000 FETs, 0.55µ (micron), 3-layer metal CMOS (CMOS14A process) packaged in a 540-pin ceramic PGA package
- Power dissipation of 29W at 140MHz
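The 960MB/s peak figure quoted for Runway in the list above is the product of bus width and bus clock. A minimal sketch, assuming one transfer per bus clock and 1MB = 10^6 bytes; the function name `peak_bandwidth_mb_s` is hypothetical:

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per bus clock."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz


# Runway as described above: 64-bit wide at 120MHz.
print(peak_bandwidth_mb_s(64, 120))  # 960.0
```

This is an upper bound only; arbitration and address cycles on a real bus reduce the sustained rate.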
References
- Design of the HP PA 7200 CPU (.pdf) Kenneth K. Chan et al (February 1996: Hewlett-Packard Journal)
- A Different Kind of RISC Dick Pountain (August 1994: BYTE Journal)
- Interview with David Fotland, September/October 2008
PA-7300LC (PCX-L2) (Velociraptor)
Used in
- 744, 745, 748
- A180, A180C
- B132L, B132L+, B160L, B180L+
- C132L, C160L
- D220, D230, D320, D330
- RDI PrecisionBook
- Hitachi 3050RX 255, 355E, 365
Time of introduction
Mid 1996
Overview
The PA-7300LC is the direct descendant of the PA-7100LC and likewise designed for low-cost systems. It is still a PA-RISC 1.1 32-bit processor in contrast to the new PA-RISC 2.0 64-bit PA-8000 introduced in the same timeframe. While the PA-7300LC is rather close to the original PA-7100LC design it has several significant enhancements:
- Large on-chip L1 caches, in contrast to the small "assist" caches of the 7100LC and 7200
- Integrated L2 controller in the MIOC
- Improved bus interface, a faster GSC
The then-current process technologies made it possible to include a large L1 cache on the CPU die, breaking a long-standing HP tradition of large off-chip L1 caches. The PA-7300LC was the final 32-bit PA-RISC version 1.1 CPU; later workstations and servers used 64-bit PA-RISC 2.0 processors.
Details
- PA-RISC version 1.1e 32-bit
- Three functional units: 2 integer ALUs, 1 Floating Point unit (restrictions on pairing integer operations are noted at the end of this list)
- 2-way superscalar
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications (not explicitly mentioned on the PA7300LC, but its documentation states support for MAX-1 instructions)
- 64KB/64KB I/D on-chip L1 caches, each two-way set associative, virtually indexed
- Cache line size of 32 Byte
- Caches have a 64-bit datapath to the execution units, 256-bit datapath to main memory
- Optional unified I/D L2 off-chip cache, up to 8192KB
- No hashing for both I and D caches
- L2 cache is write-through, direct mapped, physically indexed and physically tagged
- Instruction prefetch buffer moved from memory controller to L1 instruction cache, thus allowing prefetch hits without penalty
- On-chip MIOC memory controller
- 96-entry unified I/D TLB
- 8-entry BTLB
- 4-entry ILAB
- GSC system interface (implements GSC+ features), maximum clock frequency of 40MHz; actual systems run it at 33MHz (132MB/s), 36MHz (144MB/s) and up to 40MHz (160MB/s)
- Either 64-bit or 128-bit datapath from execution units to the memory
- Up to 180MHz frequency with 3.3V core voltage
- 15.3×17.0 mm2 die, 9,200,000 FETs, 0.5µ (micron), 4-layer metal CMOS (CMOS14C process) packaged in a 464-pin ceramic PGA package
- Only one of the two integer ALUs is able to handle loads, stores and shifts; these operations can only be paired with simple math operations, like integer addition or multiplication. Both units can handle branch operations.
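The L1 cache parameters in the list above (64KB, two-way set associative, 32-Byte lines) determine how many sets each cache has and how an address splits into tag, set index and line offset. The following is a textbook sketch, not a description of the actual PA-7300LC lookup logic: the source only states that the caches are virtually indexed, so the exact address bits used are an assumption, and `cache_fields` is a hypothetical helper name.

```python
def cache_fields(addr: int, cache_bytes: int, line_bytes: int, ways: int):
    """Split an address into (tag, set_index, offset) for a set-associative cache."""
    sets = cache_bytes // (line_bytes * ways)
    offset = addr % line_bytes                # byte within the cache line
    set_index = (addr // line_bytes) % sets   # which set the line maps to
    tag = addr // (line_bytes * sets)         # remaining upper address bits
    return tag, set_index, offset


# 64KB, two-way, 32-Byte lines gives 64KB / (32 * 2) = 1024 sets.
print(cache_fields(0x12345, 64 * 1024, 32, 2))  # (2, 282, 5)
```

Setting `ways=1` gives the direct-mapped case used by the off-chip caches of the earlier processors.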
References
- PA7300LC ERS (External Reference Specification) (PDF, 716KB) Hewlett-Packard Company (1996)
- The PA-7300LC: the first "System on a Chip" (archive.org mirror) Tom Meyer (1996: Presentation for Microprocessor Forum 1995)
- The PA 7300LC Microprocessor: A Highly Integrated System on a Chip (PDF, 50KB) Terry W. Blanchard and Paul G. Tobin (June 1997: Hewlett-Packard Journal)
PA-8000 (PCX-U) (Onyx)
Used in
- C160, C180
- D270, D280, D370, D380
- J280, J282
- K250, K260, K450, K460
- R380
- T600
- HP/Convex SPP2000 (S-Class/X-Class)
- Stratus Continuum 628, 1228
Time of introduction
January 1996
Overview
The PA-8000 is a four-way superscalar 64-bit processor with aggressive out-of-order (OoO) execution capabilities. It has four integer, four floating-point and dual load/store units, a large OoO dispatch window and, following a long HP tradition, no on-chip caches. The PA-8000 is the first chip to implement the 64-bit PA-RISC 2.0 architecture, which includes many extensions to support 64-bit computing. All integer registers and functional units (ALU, shift/merge) were widened to 64-bit to support native 64-bit integer computing. The flat address space was also extended from 32- to 64-bit; however, PA-RISC 2.0 processors support only a physical address space of 40-bit/1TB (PA-8000 to PA-8600) to 44-bit/16TB (PA-8700 and up). Other extensions in the PA-8000 include fast TLB insert instructions, memory prefetch instructions, support for variable sized pages, branch prediction hinting and new FPMAC (Floating Point Multiply Accumulate) units. The instruction decode logic is not integrated with the functional units' pipeline logic, which allows the chip to partially decode instructions in advance of the actual execution (by the functional units).
A key feature of the PA-8000 and all other PA-RISC 2.0 processors is the IRB (Instruction Reorder Buffer), which enables the processor to perform its own instruction scheduling in hardware, independent of compiler or other software technologies. The IRB can store up to 28 computation and 28 load/store instructions; it tracks interdependencies between these instructions and allows execution as soon as they are ready. Branch prediction outcomes are tracked as well, and by rescheduling, the CPU can execute instructions past cache misses. The IRB plays the key part in the OoO execution capability of the chip.
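The scheduling idea behind the IRB can be illustrated with a toy model: instructions wait in one of two buffers (28 computation and 28 load/store entries, as stated above) and issue as soon as the instructions they depend on have produced their results. This is only a sketch under simplifying assumptions that are not from the source: a buffer entry is freed at issue, up to `issue_width` instructions issue per cycle regardless of type, and latencies are made-up numbers. The names `Instr` and `schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str           # "alu" (computation) or "mem" (load/store)
    deps: tuple = ()    # names of earlier instructions whose results are needed
    latency: int = 1    # cycles until the result is available (assumed)


def schedule(program, alu_slots=28, mem_slots=28, issue_width=4):
    """Return {name: issue cycle}; deps must name earlier instructions."""
    capacity = {"alu": alu_slots, "mem": mem_slots}
    waiting = {"alu": [], "mem": []}    # buffered, not yet issued
    pending = list(program)             # not yet buffered, program order
    order = {ins.name: i for i, ins in enumerate(program)}
    done_at, issued_at = {}, {}
    cycle = 0
    while pending or waiting["alu"] or waiting["mem"]:
        # Fill the buffers in program order while the matching one has room.
        while pending and len(waiting[pending[0].kind]) < capacity[pending[0].kind]:
            ins = pending.pop(0)
            waiting[ins.kind].append(ins)
        # An instruction is ready once all of its dependencies have completed.
        ready = [ins for queue in waiting.values() for ins in queue
                 if all(done_at.get(d, float("inf")) <= cycle for d in ins.deps)]
        ready.sort(key=lambda ins: order[ins.name])   # oldest first
        for ins in ready[:issue_width]:
            waiting[ins.kind].remove(ins)
            issued_at[ins.name] = cycle
            done_at[ins.name] = cycle + ins.latency
        cycle += 1
    return issued_at


program = [
    Instr("load_a", "mem", latency=3),
    Instr("add_b", "alu", deps=("load_a",)),
    Instr("mul_c", "alu"),
    Instr("store_d", "mem", deps=("mul_c",)),
]
print(schedule(program))
# {'load_a': 0, 'mul_c': 0, 'store_d': 1, 'add_b': 3}
```

In the example, `mul_c` and `store_d` issue ahead of the older `add_b`, which is still waiting for the slow load; this is the out-of-order behaviour the paragraph above describes.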
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- SMP-capable
- External memory and I/O controllers
- Two address adders
- 96-entry fully-associative dual-ported TLB
- TLB miss penalty of 61 cycles
- 32-entry BTAC (Branch Target Address Cache)
- 256-entry BHT (Branch History Table)
- Dynamic and static branch prediction modes
- Off-chip L1 caches up to 1MB I and 1MB D, realized in synchronous 6.7ns (150MHz) late-write 1Mb SRAMs, one cycle latency
- Caches are direct-mapped and dual-ported
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Each instruction includes five predecode bits
- Bi-endian support
- Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
- CPU interfaces to UTurn I/O adapters and MMC/SMC memory controllers on the Runway bus
- Up to 180MHz frequency with 3.3V core voltage
- 17.7×19.6 mm2 die, 4,500,000 FETs, 0.5µ (micron), 5-layer metal CMOS packaged in a 1,085-pin flip-chip LGA package
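The pairing of SRAM cycle time and clock frequency in the cache entry above (6.7ns with 150MHz) is a reciprocal relationship. A minimal sketch, with `cycle_ns_to_mhz` as a hypothetical helper name; the quoted 150MHz is the rounded nominal value:

```python
def cycle_ns_to_mhz(cycle_ns: float) -> float:
    """Convert a cycle time in nanoseconds into a frequency in MHz."""
    return 1000.0 / cycle_ns


print(round(cycle_ns_to_mhz(6.7)))  # 149
```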
References
- Advanced Performance features of the 64-bit PA-8000 (archive.org mirror) Doug Hunt (1995: IEEE CS Press CompCon 5). [Article reprint for cpus.hp.com]
- PA-8000 Combines Complexity and Speed (archive.org mirror) Linley Gwennap (1994: Microprocessor Report, Volume 8 Number 15). [Article reprint for vanished cpus.hp.com]
- Four-Way Superscalar PA-RISC Processors (PDF, 190KB) Anne P. Scott et al (August 1997: Hewlett-Packard Journal)
PA-8200 (PCX-U+) (Vulcan)
Used in
Time of introduction
May 1997
Overview
Shortly after the introduction of the PA-8000 the design team noted several aspects of this chip for improvement in the successor:
- Branch prediction
- TLB miss rates
- Cache sizes
The new chip should offer improved performace, compatibility with existing applications and short time to market,
with the whole design heavily leveraged from the existing PA-8000 foundation.
The availability of new 4Mb SRAMs with faster access times allowed for an increased CPU clock speed and bigger caches.
Smaller changes include an increase to the BHT and TLB as "high benefit, low risk" improvements.
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 120-entry fully-associative dual-ported TLB
- 32-entry BTAC (Branch Target Address Cache)
- 1024-entry BHT (Branch History Table)
- Dynamic and static branch prediction modes
- Off-chip L1 caches up to 2MB I and 2MB D, realized in synchronous 5ns (200MHz) late-write 4Mb SRAMs, one cycle latency
- Caches are direct-mapped and dual-ported
- 56-entry instruction queue/reorder buffer (IRB)
- Each instruction includes five predecode bits
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Bi-endian support
- Runway system/memory bus, 120MHz, 64-bit wide, featuring split transactions and glueless multiprocessing. Max. throughput of 960MB/s
- CPU interfaces to UTurn I/O adapters and MMC/SMC memory controllers on the Runway bus
- Up to 240MHz frequency with 3.3V core voltage
- 17.7×19.6 mm2 die, 4,500,000 FETs, 0.5µ (micron), 5-layer metal CMOS packaged in a 1,085-pin flip-chip LGA package
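Why a larger BHT counts as a "high benefit, low risk" change can be sketched with a toy model. The source does not describe the format of a BHT entry, so the sketch assumes textbook 2-bit saturating counters indexed by the low bits of the branch address; the class and function names are hypothetical.

```python
class BranchHistoryTable:
    """Toy BHT with 2-bit saturating counters (an assumption, not HP's design)."""

    def __init__(self, entries):
        self.entries = entries
        self.counters = [1] * entries  # range 0..3, start at weakly not-taken

    def _index(self, address):
        # PA-RISC instructions are 4 bytes wide, so drop the two low bits.
        return (address >> 2) % self.entries

    def predict(self, address):
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        i = self._index(address)
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)


def mispredictions(table, trace):
    """Count wrong predictions over a list of (address, taken) pairs."""
    misses = 0
    for address, taken in trace:
        if table.predict(address) != taken:
            misses += 1
        table.update(address, taken)
    return misses


# Two branches 1024 bytes apart: one always taken, one never taken.
trace = [(0x1000, True), (0x1400, False)] * 100
small = mispredictions(BranchHistoryTable(256), trace)
large = mispredictions(BranchHistoryTable(1024), trace)
print(small, large)
```

In the 256-entry table both branches share one counter and keep overwriting each other's history; in the 1024-entry table they map to different entries and are predicted correctly after a short warm-up.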
References
- Four-Way Superscalar PA-RISC Processors (PDF, 190KB)
- Anne P. Scott et al (August 1997: Hewlett-Packard Journal).
- HP Pumps Up PA-8x00 Family (archive.org mirror)
- Linley Gwennap (October 1996: Microprocessor Report, Volume 10 Number 14). [Article reprint for vanished cpu.hp.com]
PA-8500 (PCX-W) (Barracuda)
Used in
- A400-44 (rp2400), A500-44 (rp2450)
- B1000, B2000
- C360, C3000
- J5000, J7000
- L1000-36, L1000-44 (rp5400), L2000-36, L2000-44 (rp5450)
- N4000-36, N4000-44 (rp7400)
- V2500
- Stratus Continuum 419, 429, 616S, 616, 619, 629, 1219, 1229
Time of introduction
September 1998
Overview
The PA-8500 processor is a direct evolution of the PA-8000 and PA-8200 processors, taking over a very similar processing core. However, the PA-8500 implemented large on-die L1 caches, a first for PA-RISC processors and a break with the long-standing HP tradition of keeping the large L1 caches off-chip. (The two years older PA-7300LC also includes on-chip L1 caches, albeit much smaller ones.) There were no other significant changes to the processing core, besides small increases to the TLB and BHT.
The main challenge in the PA-8500 development was the large on-chip L1 caches, which had to fit onto the allocated die area and be able to keep up with the instruction reordering in the IRB. The data cache is composed of 0.5MB banks, each implemented with four 0.125MB arrays providing error correction. The instruction cache is implemented as one bank of 0.5MB four-way set-associative pipelined cache, providing 128 bits of instruction per cycle plus pre-decode bits.
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 160-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- On-chip L1 caches 0.5MB I and 1MB D, each 4-way set associative
- 32 or 64 Byte cache line size
- Supports up to 1 TB of physically addressable memory (40-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Bi-endian support
- Runway+/Runway DDR system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- CPU interfaces in most systems to the Astro memory and I/O controller (on very few configurations the PA-8500 attaches to the DEW Runway ports/converters of the Stretch chipset)
- Up to 440MHz frequency with 2.0V core voltage
- 21.3×22.0 mm2 die, 140,000,000 FETs, 0.25µ (micron), 5-layer metal CMOS packaged in a 544-pin LGA package
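The peak bus figures quoted for the Runway family follow from clock, width and transfers per clock. A minimal sketch of that arithmetic, assuming decimal megabytes (10^6 bytes) and a hypothetical helper name:

```python
def peak_bandwidth_mb_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * transfers_per_clock * width_bits // 8


# Original Runway bus of the PA-8000/PA-8200: 120MHz, 64-bit, one transfer per clock.
print(peak_bandwidth_mb_s(120, 64))      # 960

# Runway+/Runway DDR of the PA-8500: 125MHz, 64-bit, two transfers per clock.
print(peak_bandwidth_mb_s(125, 64, 2))   # 2000
```

These are upper bounds only; sustained throughput on a real bus is lower because of arbitration and address cycles.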
References
- HP Pumps Up PA-8x00 Family (archive.org mirror)
- Linley Gwennap (October 1996: Microprocessor Report, Volume 10 Number 14). [Article reprint for vanished cpu.hp.com]
- A 500 MHz 1.5 MByte Cache with On-Chip CPU (PDF, 141KB)
- Jonathan Lachman and J. Michael Hill (1997: ISSCC).
- PA-8500: The Continuing Evolution of the PA-8000 Family (archive.org mirror)
- Gregg Lesartre and Doug Hunt (1997: Proceedings of CompCon, IEEE CS Press). [Article reprint for vanished cpu.hp.com]
PA-8600 (PCX-W+) (Landshark)
Used in
- A400-5X (rp2400), A500-5X (rp2450)
- B2000 (some), B2600
- C3600
- J5600, J6000, J7600
- L1000-5X (rp5400), L2000-5X (rp5450)
- L1500-5X (rp5430), L3000-5X (rp5470)
- N4000-5X (rp7400)
- V2600
- Superdome
- Stratus Continuum 439, 449, 651-2, 652-2, 1251-2, 1252-2
Time of introduction
January 2000
Overview
The PA-8600 is a PA-8500 with minor modifications for a new manufacturing process in order to achieve higher clock speeds, which was the main aim of developing the PA-8600. One of the few changes to the original design is a quasi LRU replacement policy for the instruction cache.
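What a "quasi LRU" policy can look like is sketched below for a single 4-way set. The source does not say which approximation HP used, so this shows tree pseudo-LRU, one common scheme, purely as an assumption; the class name is hypothetical.

```python
class TreePLRUSet:
    """One 4-way cache set tracked with three tree bits instead of full LRU order."""

    def __init__(self):
        # bits[0]: root, bits[1]: chooses between ways 0/1, bits[2]: between ways 2/3.
        # A bit value of 0 means the next victim is on the left, 1 means on the right.
        self.bits = [0, 0, 0]

    def touch(self, way):
        """Record an access: point every node on the path away from this way."""
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 if way == 0 else 0
        else:
            self.bits[0] = 0
            self.bits[2] = 1 if way == 2 else 0

    def victim(self):
        """Follow the tree bits to the way that should be replaced next."""
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3


s = TreePLRUSet()
for way in (0, 1, 2, 3):
    s.touch(way)
print(s.victim())   # way 0, the same choice true LRU would make

s.touch(0)
print(s.victim())   # way 2, although true LRU would pick way 1
```

The second result shows the trade-off: three bits per set are much cheaper than tracking the full access order, at the price of occasionally evicting a line that is not the least recently used one.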
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 160-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- On-chip L1 caches 0.5MB I and 1MB D, each 4-way set associative
- 32 or 64 Byte cache line size
- Supports up to 1 TB of physically addressable memory (40-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Quasi LRU replacement policy for the instruction cache
- Bi-endian support
- Runway+/Runway DDR system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- CPU interfaces in smaller systems to the Astro memory and I/O controller, in larger/mainframe systems to the DEW Runway ports/converters of the Stretch chipset or to the Cell chipset (probably with converters, since Cell is also an Itanium chipset)
- Up to about 550MHz frequency with 2.0V core voltage
- 21.3×22.0 mm2 die, 140,000,000 FETs, 0.25µ (micron), 5-layer metal CMOS packaged in a 544-pin LGA package
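The cache parameters listed above determine how many sets each L1 cache has. The following sketch works this out for both supported line sizes; it assumes binary megabytes (2^20 bytes) and uses a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache."""
    lines = size_bytes // line_bytes
    return lines // ways


for name, size in (("I", MB // 2), ("D", 1 * MB)):
    for line in (32, 64):
        sets = cache_sets(size, 4, line)
        index_bits = sets.bit_length() - 1  # exact, since sets is a power of two here
        print(f"L1 {name}, {line}-byte lines: {sets} sets, {index_bits} index bits")
```

With 64-byte lines the 0.5MB instruction cache has 2048 sets and the 1MB data cache 4096 sets; halving the line size doubles both counts.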
PA-8700 (PCX-W2) (Piranha)
Used in
- A400-6X (rp2430), A500-6X, A500-7X (rp2470)
- C3650, C3700, C3750
- J6700
- L1500-6X, L1500-7X, L1500-8X (rp5430), L3000-6X, L3000-7X, L3000-8X (rp5470)
- N4000-6X, N4000-7X (rp7400)
- N4000-6X, N4000-7X, N4000-8X (rp7405, rp7410)
- Superdome
Time of introduction
August 2001
Overview
The PA-8700 is an enhanced PA-8500 core with several modifications. As with all PA-8x00 processors, the logic core is still very close to the original PA-8000 core from 1996. All subsequent PA-RISC processors from HP were based on this basic PA-RISC version 2.0 design, adding features and slight modifications. The PA-8700 significantly enlarged the on-chip L1 caches and TLB, while switching to a new manufacturing process helped increase the clock speed. The PA-8700 was at its time one of the largest available commercial processors and one of the first manufactured in an SOI (Silicon On Insulator) process. After the Intel-fabbed PA-8500 and PA-8600, the PA-8700 was produced in IBM’s fabs, HP having given up its own in the 1990s.
Details
- PA-RISC version 2.0 64-bit
- Ten functional units: 2 integer ALUs, 2 shift/merge units, 2 complete load/store pipelines, 2 Floating Point multiply/accumulate units, 2 Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 240-entry fully-associative dual-ported TLB
- 32-entry BTAC (branch target address cache)
- 2048-entry BHT (branch history table)
- Dynamic and static branch prediction modes
- 0.75MB I and 1.5MB D on-chip L1 caches, each 4-way set associative, implemented in independent 0.75MB banks.
- 32 or 64 Byte cache line size
- Data cache prefetching
- Quasi LRU replacement policy for both the instruction and data cache.
- Supports up to 16 TB of physically addressable memory (44-bit physical addresses)
- 56-entry instruction queue/reorder buffer (IRB)
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Bi-endian support
- Support for hardware lock-stepping, i.e. operating multiple chips in parallel to detect faults
- Runway+/Runway DDR system/memory bus, 125MHz, 64-bit, DDR (double data rate), about 2.0GB/s peak bandwidth
- CPU interfaces in smaller systems to the Astro memory and I/O controller, in larger/mainframe systems to the DEW Runway ports/converters of the Stretch chipset or to the Cell chipset (probably with converters, since Cell is also an Itanium chipset)
- Up to 750MHz (875MHz on the PA-8700+) frequency with 1.5V core voltage
- 16.0×19.0 mm2 die, 186,000,000 FETs, 0.18µ (micron), 7-layer Silicon-on-Insulator CMOS packaged in a 544-pin LGA package
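The relation between physical address width and addressable memory is a simple power of two. A short sketch, assuming binary terabytes (2^40 bytes) and a hypothetical helper name:

```python
TB = 1 << 40  # binary terabyte


def addressable_tb(address_bits):
    """Physically addressable memory in TB for a given address width."""
    return (1 << address_bits) // TB


print(addressable_tb(40))  # 1  (PA-8500 and PA-8600)
print(addressable_tb(44))  # 16 (PA-8700)
```

Each extra address bit doubles the reachable memory, so the four additional bits of the PA-8700 give a sixteenfold increase.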
References
- A 900MHz 2.25MByte Cache with On Chip CPU (PDF, 119KB)
- J. Michael Hill and Jonathan Lachman (2000: ISSCC).
PA-8800 (Mako)
Used in
- C8000
- L1500-9X (rp5430), L2000-9X (rp5450)
- N4000-9X (rp7405, rp7410)
- rp3410, rp3440
- rp4410, rp4440
- rp7420
- rp8400, rp8410, rp8420
- Superdome
Time of introduction
2004
Overview
The dual-core PA-8800 Mako consists of two separate PA-8700 cores on a single die with very large off-die L2 caches on the processor module. The clock speed was only increased slightly, while the processor bus interface was redesigned to use the HP/Intel Itanium/McKinley bus. Mako was supposed to breathe fresh life into the PA-RISC line, though it had strong internal competition from the Itanium line, based on HP development together with Intel, and was not marketed much. Most systems supporting PA-8800s use the HP zx1 chipset and could be hardware-upgraded to use Itanium 2/IA64 processors.
Details
- PA-RISC version 2.0 64-bit
- Twenty functional units: four integer ALUs, four shift/merge units, four complete load/store pipelines, four Floating Point multiply/accumulate units, four Floating Point divide/square root units
- 4-way superscalar
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 240-entry fully-associative dual-ported TLB per core
- 32-entry BTAC (branch target address cache) per core
- 2048-entry BHT (branch history table) per core
- Dynamic and static branch prediction modes
- 0.75MB I and 0.75MB D on-chip L1 caches per core
- No data passing between the cores’ L1 caches
- 32MB off-chip L2 cache, four-way associative, physically indexed and tagged
- L2 cache is shared between both CPU cores
- L2 cache controller is on-die
- L2 implemented in DDR-ESRAM, four 8MB chips, 300MHz clock, each 2.7GB/s bandwidth
- Total >10GB/s L2 cache bandwidth
- 1MB SRAM tags for L2 cache
- ECC for L2 data and tags
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Itanium 2/McKinley processor bus, 200MHz clock ("double-pumped"), 128-bit datapath, 6.4GB/s bandwidth, data ECC-protected, signals parity
- CPU interfaces to the Cell chipset or the zx1 chipset’s MIO
- Up to 1 GHz frequency with 1.5V core voltage
- 23.6×15.5 mm2 die, 300,000,000 FETs, 0.13µ (micron), 8-layer Silicon-on-Insulator CMOS (fabbed by IBM)
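The L2 and processor bus figures in the list above are consistent with each other, as a quick check shows. The sketch uses only numbers from the list and assumes decimal units (1 GB/s = 1000 MB/s); the variable names are illustrative.

```python
# Off-chip L2: four DDR-ESRAM chips of 8MB and 2.7GB/s each.
L2_CHIPS = 4
l2_capacity_mb = L2_CHIPS * 8
l2_bandwidth_gb_s = L2_CHIPS * 2.7

# Processor bus: 200MHz clock, two transfers per clock, 128-bit datapath.
bus_mb_s = 200 * 2 * 128 // 8
bus_gb_s = bus_mb_s / 1000

print(l2_capacity_mb)               # 32
print(round(l2_bandwidth_gb_s, 1))  # 10.8, matching the quoted ">10GB/s"
print(bus_gb_s)                     # 6.4
```

The L2 cache can therefore deliver data noticeably faster than the processor bus, which is the point of placing such a large cache on the processor module.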
References
- HP’s Mako Processor (PDF, 1.4MB)
- David J. C. Johnson (2001: Microprocessor Forum).
PA-8900
Used in
- rp3410, rp3440
- rp4410, rp4440
- rp7440, rp8440
- C8000
- L1500-9X (rp5430), L2000-9X (rp5450) (probably)
- N4000-9X (rp7405, rp7410) (probably)
- Superdome
Time of introduction
2005
Overview
The PA-8900 is a slightly tweaked PA-8800 processor with a doubled L2 cache and higher clock frequency, keeping the tradition of only small upgrades in the 64-bit processor generation. It is probably the last processor of the PA-RISC family. Future systems will be based on Itanium-family chips. After HP dropped its line of Itanium workstations, the PA-8900-powered C8000 workstation was one of the last HP-UX workstations.
Information on the PA-8900 is limited; it seems there was not much interest in releasing details on its architecture.
Details
- PA-RISC version 2.0 64-bit
- Twenty functional units: four integer ALUs, four shift/merge units, four complete load/store pipelines, four Floating Point multiply/accumulate units, four Floating Point divide/square root units
- Two address adders
- SMP-capable
- External memory and I/O controllers
- 240-entry fully-associative dual-ported TLB per core
- 32-entry BTAC (branch target address cache) per core
- 2048-entry BHT (branch history table) per core
- Dynamic and static branch prediction modes
- 4-way superscalar
- 0.75MB I and 0.75MB D on-chip L1 caches per core
- 64MB off-chip L2 cache, four-way associative, physically indexed and tagged
- ECC for L2 data and tags
- MAX-2 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- Itanium 2/McKinley processor bus, 200MHz clock ("double-pumped"), 128-bit datapath, 6.4GB/s bandwidth, data ECC-protected, signals parity
- CPU interfaces to the Cell chipset or the zx1 chipset’s MIO
- 44-bit physical addressing
- 64-bit virtual addressing
- 4GB maximum page size
- Up to 1.1 GHz frequency
- 23.6×15.5 mm2 die, 317,000,000 FETs, 0.13µ (micron), 8-layer Silicon-on-Insulator CMOS (apparently fabbed by IBM)
References
- Overview of the HP 9000 rp3410-2, rp3440-4, rp4410-4, and rp4440-8 Servers (PDF, 700KB)
- Hewlett-Packard (2005).
Hitachi PA/50
Used in
Time of introduction
About 1993
Overview
The PA/50 is a PA-RISC version 1.1 compatible processor designed and manufactured by Hitachi. Two designs were developed: M and L (L for low-cost). They were used as personal workstation processors and high-end embedded controllers. Hitachi integrated a set of features not implemented at that time in other PA-RISC processors, e.g., on-chip caches, data-prefetching, a power-saving mode and SDRAM support.
Details
- PA-RISC version 1.1 32-bit
- Built-in, pipelined FPU
- L1 I: 8KB, 2-way set-associative, 32-byte blocks
- L1 D: 4KB, 2-way set-associative, 32-byte blocks, copy-back
- L1 caches are on-chip
- Uncacheable memory (per page)
- TLB: I/D 32/64-entry, 2-way set, 4K-page, each +2 additional block entries
- BTLB (256KB-32MB)
- Seven 32-bit shadow registers for fast interrupts
- Data-prefetching
- Non-blocking cache
- Power-saving mode, reducing frequency to 1/8
- Support for SDRAM
- PA/50L: Up to 33MHz frequency with 3.3V core voltage
- PA/50M: Up to 60MHz frequency with 5.0V core voltage
- 11.5×12.0 mm2 die, 1,280,000 FETs, 0.6µ (micron), 3-layer metal CMOS packaged in a 160-pin plastic QFP package
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
Hitachi HARP-1
Used in
- Hitachi SR2201 supercomputer (HARP-1E)
- Probably others
Time of introduction
June 1994
Overview
The HARP-1 is a PA-RISC version 1.1 compatible CPU from Hitachi, apparently a larger and faster version of the above PA/50. Not much information is available on the processors.
Apparently the HARP-1E variant includes (pseudo) vector processing modifications/add-ons and was used in Hitachi vector/supercomputers.
It seems the L1 cache was increased to 16KB/16KB instruction/data.
Details
- PA-RISC version 1.1 32-bit
- Three functional units: two integer ALUs and one floating point unit (and two shift-merge units)
- Six-stage pipeline
- Built-in, pipelined FPU
- Built-in memory controller (Memory Interface Unit, MIU)
- 2-way superscalar
- L1 I cache: 8KB, 1-way set-associative, 32-byte blocks
- L1 D cache: 16KB, 2-way set-associative, 32-byte blocks, copy-back
- L1 caches are on-chip
- L2 I/D 512/512KB, off-chip
- TLB: I/D 128/128-entry, 1-way set
- (Some say a second level TLB was included)
- L2 Cache bus: 128-bit (ECC) data path to L2 caches
- Processor bus: 64-bit (parity) data path to main memory and I/O
- Up to 150MHz frequency with 3.3V core voltage, 17W power dissipation (at 120MHz)
- 16.2×16.5 mm2 die, 2,800,000 FETs, 0.5µ (micron) 3-layer aluminium + 1-layer tungsten BiCMOS, packaged in 595-pin PGA
References
- Chronology of Workstation Computers (1993) Ken Polsson (November 2007. Accessed November 2007)
- PROgress (PA-RISC) Newsletter - comp.sys.hp Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
- Basic Concept of Cooperative Timing-driven Design Automation Technology for High-speed RISC Processor HARP-1 (PDF) Hidekazu Terai et al (October 1999: Hitachi Ltd. Accessed January 2008)
- A 120-MHz BiCMOS Superscalar RISC Processor, Shigeya Tanaka et al (IEEE Journal of Solid-State Circuits, vol. 29, no. 4, April 1994)
Other processors
| CPU | ISA | Clock max | FETs | Cache | Bus | Superscalar | Units | Controllers on-chip |
|---|---|---|---|---|---|---|---|---|
| Winbond W89K | PA 1.1 32-bit | 33/66MHz | 1.1M | 2/2KB I/D on-chip L1 | Intel 486 | 1-way | 1 Integer | none? |
| Winbond W90210, W90215 | PA 1.1 32-bit | 33/66MHz | ? | 4/8KB I/D on-chip L1 | Intel 486 | 1-way | 1 Integer, MAX-1 | DRAM, DMA, PCI, I/O |
| Winbond W90220, W90221 | PA 1.1 32-bit | 150MHz | ? | 4/4KB I/D on-chip L1 | Intel 486 | 1-way | 1 Integer, 1 MAC (DSP), MAX-1 | DRAM, DMA, PCI, IDE, I/O, VGA (W90221), TV (W90221) |
| Oki OP32 | PA 1.1 32-bit | 33MHz | 1.1M | ? | ? | 1-way | 1 Integer | DRAM, DMA |
Winbond W89K
Time of introduction: Spring 1994
The Winbond W89K is an embedded 32-bit PA-RISC controller chip, pin-compatible with the then-popular Intel 80486DX. It could be used as a drop-in replacement in mid-1990s PCs together with Winbond BIOS replacement chips. The rationale was to allow hardware developers to reuse existing 486DX mainboards and components for a shorter product development process. The W89K is a level 0 PA-RISC 1.1 implementation: a 32-bit PA-RISC processor without virtual addressing.
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Five-stage pipeline
- One functional unit: one 32-bit integer ALU
- 2KB/2KB I/D on-chip L1 caches
- 80486 (Intel) bus interface
- 33MHz and 66MHz clock speeds were available, with the latter apparently achieved by clock-doubling as also used in Intel’s 80486DX/2 (the chip uses an internal clock-doubler on the external 33MHz bus)
- On-chip JTAG support
- 14.3×14.3 mm2 die, 1,100,000 FETs, 0.8µ (micron), 3-layer metal CMOS
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)
- Winbond, Varian sign deal for thin-film IC process
- Terho Uimonen (April 1994: Electronic News. Accessed January 2008 at findarticles.com)
- PA-RISC in a PC box (was: Re: HP's vision of a low-end 3000) - comp.sys.hp.mpe
- Stan Sieler (February 1996. Accessed December 2007)
Winbond W90210/215
Time of introduction: Fall 1997
Shortly after the W89K embedded controllers, Winbond introduced more sophisticated PA-RISC processors with the W90K line of embedded controllers. The W90210F was still a 32-bit PA-RISC 1.1 design but integrated many formerly external I/O components on the chip: DRAM and DMA controllers, a PCI bridge and various I/O ports. Like its predecessor, the W90210F was a level 0 PA-RISC 1.1 implementation without virtual addressing. It was apparently used in various Internet appliances: set-top boxes, TV sets, DVD players, PDAs, VoIP devices, and for industrial automation.
The W90215 is identical to the W90210 but did not include license rights for the embedded operating system (and was thus cheaper).
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Five-stage pipeline
- One functional unit: one 32-bit integer ALU
- L1 I cache: 4KB, direct mapped, 32-byte blocks, 256 entries
- L1 D cache: 8KB, 2-way set-associative, 32-byte blocks, 2×64 entries, write-back
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- 80486 (Intel) bus interface
- DRAM controller
- ROM/FLASH interface
- DMA controller (2-channel 8-bit)
- PCI bridge
- Two serial ports
- Parallel port
- 33MHz and 66MHz clock speeds (?)
- 208-pin PQFP package
References
- W90210F PA-RISC Embedded Controller (.pdf)
- Winbond Electronics Corp. (October 1997. Accessed January 2008)
Winbond W90220 and W90221
Time of introduction: Spring 1999
The W90220F is, like its predecessor W90210, a 32-bit PA-RISC 1.1 design without MMU that integrates many formerly external I/O components on the chip: DRAM and DMA controllers, PCI bridge, IDE channels, I/O ports and, on the W90221, a graphics/TV chip. It had the same target systems of set-top boxes and Internet appliances. The successor W90221 is apparently similar, with higher clock speed and an integrated (S)VGA and TV controller.
- PA-RISC version 1.1 (third edition) 32-bit
- Level 0 implementation (no virtual addressing): no MMU
- Six-stage pipeline
- Two functional units: one 32-bit integer ALU and one 32-bit multiply-accumulate (MAC) module (for DSP purposes, can be used as two 16-bit modules too)
- L1 I cache: 4KB, direct mapped, 32-byte blocks, 256 entries
- L1 D cache: 4KB, 4-way set-associative, write-back or write-through
- MAX-1 multimedia extensions (subword arithmetic) for multimedia applications, e.g., MPEG decoding
- 80486 (Intel) bus interface
- Hardware dynamic branch prediction
- 256-entry branch-target-buffer (i.e., BTAC)
- Memory controller (supports DRAM, EDO-DRAM and SRAM; W90221 additionally SDRAM)
- ROM/FLASH interface
- DMA controller (2-channel 8-bit)
- IDE I/O controller (four 16-bit channels)
- W90221: VGA and TV controller (W9971)
- PCI bridge
- Two serial ports
- Parallel port
- Serial ICE port
- Up to 150MHz clock speed at 3.3V/5V I/O and 3.3V core
- W90221: 133MHz clock speed with apparently 3.3V at both I/O and core
- 0.35µ (micron) single-poly-triple-metal CMOS
- 208-pin PQFP package
References
- W90220F PA-RISC Embedded Controller (.pdf)
- Winbond Electronics Corp. (March 1999. Accessed January 2008)
Oki OP32
The Oki Semiconductor OP32/50N was introduced in 1994 as an embedded controller, based on a 32-bit PA-RISC design with integrated DRAM and DMA controllers. The chip was targeted at laser printers, fax machines, X terminals and the telecom and automotive markets.
- PA-RISC version 1.1 32-bit
- 33MHz frequency
- 14.3×14.3 mm2 die, 1,100,000 FETs, 0.8µ (micron), 3-layer metal CMOS
References
- PROgress (PA-RISC) Newsletter - comp.sys.hp
- Candace Doyle (October 1993: Precision Risc Organization. Accessed December 2007)