OpenPA.net
PA-RISC information - since 1999

PA-RISC Architecture

PA-RISC is Hewlett Packard’s Reduced Instruction Set Computing (RISC) architecture developed in the 1980s and used until the mid-2000s in Unix and industrial HP computers. The computers covered on this site, the HP 9000, are based on the PA-RISC architecture and processors and used custom HP system designs.

PA-RISC Phases

PA-RISC hardware and computers were developed and marketed by HP in waves, driven by technological advances, market development and the ongoing computerization from the 1980s on. Four distinct phases can be seen in the PA-RISC maturity cycle, beginning with the Infancy (I) of the PA-RISC architecture in the late 1980s. The overall HP PA-RISC and HP 9000 story is covered in a separate PA-RISC history page, as are the PA-RISC release dates and prices on the PA-RISC timeline.

PA-RISC hardware period table
Period | Processors | Design | Chips
I Infancy: Early Architecture | TS-1, NS-1, NS-2, PCX | Early | SIU/SPI, CTB
II Growth: 32-bit 1990s | PA-7000, PA-7100 | ASP/Viper | Viper, ASP
III Maturity: The heydays | PA-7100LC, PA-7300LC | LASI | MIOC, LASI, Wax, Dino
III Maturity: The heydays | PA-7200, PA-8000, PA-8200 | U2/UTurn | MMC/SMC, U2, UTurn, LASI, Wax, Dino, Cujo
III Maturity: The heydays | PA-8500, PA-8600, PA-8700 | Astro | Astro, Elroy
III Maturity: The heydays | PA-8500, PA-8600, PA-8700 | Stretch | DEW, Prelude, IKE, Elroy
IV Decline: 64-bit to Itanium | PA-8700, PA-8800, PA-8900 | Cell | CC, XBC, SBA, Elroy
IV Decline: 64-bit to Itanium | PA-8800, PA-8900, Itanium 2 | zx1 | Pluto, Mercury

Infancy (I)

Early Precision Architecture of the late 80s. First versions of PA-RISC were released in the late 1980s as Precision Architecture with early implementations of processors and chipsets for the early HP 9000 800 series of computers. A few systems were released, but details on their exact architecture remain fuzzy. These designs were quickly superseded by new designs for both servers and workstations in the 1990s.

Systems sold in that period used PA-RISC processors such as TS-1, NS-1, NS-2 and PCX and were based on custom HP system designs. Chipsets used were the SIU/SPI main bus interfaces, which connected the processors to the SMB bus that links them to memory, I/O and devices. In most cases the system processing and I/O units are made up of a large number of individual chips or boards forming the central chipset with the CIO and HP-PB I/O buses.

Growth (II)

32-bit PA-RISC in the early 1990s. PA-RISC workstations and servers became popular with PA version 1.1 processors and new chipsets and system designs built on it.

Major innovations and developments took place from the late 1980s to the early 1990s to produce the PA-RISC 1.1 architecture and popular Unix systems based on it from the early 1990s on. They did not have much in common with the early PA-RISC 1.0 systems.

Along with the architecture, PA-RISC hardware designs matured throughout the early 1990s, with popular 32-bit PA-7000 and PA-7100 systems using the ASP chipset and Viper memory controller. They use the VSC CPU/memory bus, the GSC main system bus, and the SGC and EISA expansion buses, with servers using HP-PB I/O buses. All of these are provided by separate I/O adapters and bus bridges.

Maturity (III)

The PA-RISC heydays in the 1990s. Many innovations and improvements took place in the heydays of PA-RISC in the 1990s, with 32-bit low-cost LC processors, a shift to 64-bit PA-RISC 2.0 and quite advanced designs and I/O components.

From the mid-1990s on, the low-cost PA-7100LC and PA-7300LC systems use the highly integrated LASI chipset, which combines most functions and I/O on a single chip, and an on-CPU MIOC memory controller. These systems use GSC or GSC+ as the main bus and a variety of expansion buses via bus adapters, ranging from HSC/GSC and EISA to PCI and VME. EISA is provided by Wax, PCI by Dino.

PA-7200, 64-bit PA-8000 and some PA-8200 systems use the U2/UTurn I/O adapters, which attach two GSC/HSC buses to the main Runway bus, and MMC/SMC memory controllers. I/O is realized on the GSC bus with the LASI chipset and the Wax and Dino I/O adapters.

PA-RISC computers from the turn of the century used 64-bit PA-8500, PA-8600 and PA-8700 designs with a rope-based architecture, with Astro as the main system controller and separate Runway+/Runway DDR buses, and with I/O devices controlled by Elroy PCI bridges.

Midrange servers from that time are based on the same processors (PA-8500 to PA-8700) but use the sophisticated Stretch chipset, a rather complicated setup with a central system controller and links to separate processor and I/O controllers and PCI bridges. The main system bus is the Itanium bus, with converters for the processors’ Runway+/Runway DDR buses.

Decline (IV)

64-bit to Itanium in the 2000s. HP transitioned to a post-RISC phase in the 2000s, releasing the last PA-RISC 2.0 processors and introducing Itanium to its server and workstation lineup. System designs converged between PA-RISC and Itanium.

PA-RISC moved towards a server-only role in the early 2000s, with a variety of servers in the rp-range and the similar Superdome mainframe. The Superdome mainframes and similar servers are based on PA-8700 and PA-8800/PA-8900 processors and use the Cell chipset, similar to the Stretch, but more scalable. Systems are made up of cells, with their own central system/memory controller, I/O controller and PCI bridges.

The last PA-RISC systems before the mainstream advent of the Itanium VLIW architecture in the mid-2000s use PA-8800/PA-8900 processors, followed by several generations of Itanium systems. Both use the HP zx1 chipset, conceptually similar to Astro systems but with higher data rates and more options, based on Itanium 2/McKinley buses.


Precision Architecture RISC

PA-RISC is Hewlett Packard’s Reduced Instruction Set Computing (RISC) architecture from the 1980s and an offspring from active HP research and development undertakings from that time. The aim of the Precision Architecture was to replace 16-bit stack-based CPUs in HP 3000 servers and Motorola 680x0 CPUs in HP’s Unix systems with a common system architecture.

An earlier commercial design from HP from the early 1980s was the HP FOCUS architecture.

Overall PA-RISC was a rather conservative RISC design for that time:

Compared to other RISC architectures, the original PA-RISC was rather unspectacular: it had fewer features but always remained at competitive speeds, especially in floating point and multiprocessing. HP was the first to include multimedia extensions in commercially available microprocessors, MAX-1 in the PA-7100LC and the 64-bit MAX-2 in the PA-8000, which allowed vector operations on two or four 16-bit subwords in 32-bit or 64-bit integer registers.

PA-RISC 1.0

The original PA-RISC 1.0 architecture was 32-bit and included a single instruction/data bus. PA-RISC later moved to a Harvard-style architecture with separate instruction and data buses.

PA-RISC 1.0 has thirty-two 32-bit integer general purpose registers (GR0-GR31), seven shadow registers (SHR0-SHR6) for fast interrupts and thirty-two 64-bit floating-point registers for the FPU, which can also be used as 64×32-bit or 16×128-bit registers. The FPU is able to execute a floating-point instruction in parallel with the ALU.
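The three views of the floating-point register file come down to simple arithmetic on the same 256 bytes of storage. The following is a minimal sketch of that idea only; the flat buffer, the `view` helper and the ordering of halves within a register are assumptions for illustration and say nothing about how the hardware numbers or wires its registers.

```python
# Hypothetical model: the FP register file as one flat buffer of
# thirty-two 64-bit registers (32 * 8 = 256 bytes).
REGFILE_BYTES = 32 * 8


def view(regfile: bytes, width_bits: int) -> list:
    """Split the same storage into registers of the given width."""
    step = width_bits // 8
    return [regfile[i:i + step] for i in range(0, len(regfile), step)]


regfile = bytes(range(REGFILE_BYTES))  # dummy contents 0..255

as_64 = view(regfile, 64)    # thirty-two 64-bit registers
as_32 = view(regfile, 32)    # sixty-four 32-bit registers
as_128 = view(regfile, 128)  # sixteen 128-bit registers

print(len(as_64), len(as_32), len(as_128))  # 32 64 16
```

In this model two adjacent 32-bit views cover exactly one 64-bit register, and two adjacent 64-bit registers cover one 128-bit view.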

The original addressing was 48 bits wide; it was later expanded to 64 bits with the introduction of the PA-8000 line.

PA-RISC 1.1

The PA-RISC architecture was extended to version 1.1 with the PA-7000 processor in 1991. The major change in PA-RISC 1.1 was the inclusion of an MMU (memory management unit) that enabled PA-RISC computers to use virtual memory. From the second PA-RISC 1.1 processor, the PA-7100, onward, all processors implement superscalar instruction execution: the ability to execute multiple instructions simultaneously.

The 32-bit PA-RISC 1.1 processors are up to two-way superscalar, later 64-bit processors up to four-way. Other significant developments in PA-RISC 1.1 include the PA-7100LC and PA-7300LC processors (LC for low cost), which integrated the memory and I/O controller onto the processor die; the PA-7300LC additionally integrated the cache controller and first-level cache.

PA-RISC 2.0

In 1996 the 64-bit redesign of PA-RISC was introduced with the PA-RISC 2.0 PA-8000 processor. The architectural changes were rather intrusive but stayed compatible with 32-bit PA-RISC 1.1. On a side note, the PA-RISC 2.0 and the PA-8000 were introduced before the last 32-bit PA-RISC processor — the PA-7300LC — shipped.

Main changes and features of PA-RISC 2.0 include:

The later PA-8x00 processors of the 2000s did not introduce significant changes to the architecture or logic, besides higher integration of large L1 caches in the PA-8600 and dual-core PA-8800 and PA-8900. The processors after the PA-8000 were mostly redesigns and extensions of that processor core.

Post-PA-RISC

From the mid-1990s, on a parallel track to PA-RISC 2.0 development, HP joined Intel in developing the VLIW Itanium architecture, building on its own R&D project called EPIC. This resulted in the Intel/HP IA-64 architecture.

From the early 2000s HP sold two lines of Unix computers and servers in parallel: PA-RISC 2.0 and Itanium. These competing designs were apparent in the Integrity servers, with the rp servers (PA-RISC) and rx servers (Itanium).

These post-PA-RISC designs were not the success many had hoped for, and after the turn of the century HP switched to standard Intel x86 fare.

Pre-PA-RISC

The predecessor of PA-RISC in the early 1980s was the HP FOCUS architecture from the HP 9000 Series 500. FOCUS was a stack architecture, with 230 instructions both 32 bits and 16 bits wide, a segmented memory model, and no general purpose programmer-visible registers. There are thirty-nine 32-bit registers in the CPU hardware, thirty-one internal 32-bit general purpose registers, two 32-bit ALU registers, and others.


Floating Point Unit (FPU)

The Floating Point Unit is an assist processor logically added to a system to improve performance on floating-point operations. It can be on a separate chip (e.g., PA-7000) or integrated onto the central CPU die (all later PA-RISC CPUs). The FPU executes special floating-point instructions to perform arithmetic on its own set of independent registers (register file) and to move data between its own registers and the system’s lower memory hierarchy. The FPU execution stage is pipelined. All PA-RISC FPUs contain thirty-two 64-bit registers, which can also be used as sixty-four 32-bit registers or sixteen 128-bit registers.


Translation Lookaside Buffer (TLB)

The Translation Lookaside Buffer is a hardware structure that performs virtual-to-physical memory address translations. The TLB takes a virtual page number and returns the corresponding physical page number. The PA-7000 is the last PA-RISC processor to use separate I/D TLBs; all later PA 1.1 and 2.0 CPUs use a combined TLB structure.

Hitachi’s PA-RISC 1.1 derivatives also used split TLBs:

Most interestingly, the older PA-RISC 1.0 processors (pre-PA-7000) have huge TLBs (even by today’s standards):

The TLB memory on these earlier CPUs was implemented mostly off-chip/off-die via separate memory (SRAM) chips.
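The lookup described at the start of this section, virtual page number in and physical page number out, can be sketched in a few lines. This is a generic illustration and not a model of any particular PA-RISC TLB: the 4 KB page size, the `TLB` class, the dictionary used as a page table and the FIFO replacement policy are all assumptions made for the example.

```python
PAGE_SHIFT = 12               # assumed 4 KB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class TLB:
    """Hypothetical fully associative TLB with FIFO replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}     # virtual page number -> physical page number

    def translate(self, vaddr, page_table):
        """Return (physical address, hit flag) for a virtual address."""
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & (PAGE_SIZE - 1)
        hit = vpn in self.entries
        if not hit:
            # Miss: fetch the translation from the (hypothetical) page
            # table and insert it, evicting the oldest entry if full.
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpn] = page_table[vpn]
        return (self.entries[vpn] << PAGE_SHIFT) | offset, hit


page_table = {0x10: 0x80, 0x11: 0x2A}
tlb = TLB(capacity=2)
print(tlb.translate(0x10123, page_table))  # first access to page 0x10: miss
print(tlb.translate(0x10456, page_table))  # same page again: hit
```

Only the page number is translated; the offset within the page passes through unchanged, which is why a single entry serves every address on its page.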

Translation process

TLB miss handling implementations


Block Translation Lookaside Buffer (BTLB)

Similar to the TLB, the BTLB provides virtual-to-physical address translations. The BTLB, however, maps large address ranges rather than single pages as the TLB does. These large address ranges are block translations and are therefore stored in the Block Translation Lookaside Buffer. Block translations are useful for virtual address ranges that do not get paged in or out.

BTLBs were only implemented on 32-bit PA-RISC processors (PA-7x00). 64-bit PA-RISC instead implemented variable page sizes, so any TLB entry can map more than 4 KB.
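The difference between a block translation and a per-page translation can be sketched as a range check versus an exact match. The entry layout, the 4 KB base page size, the `translate` helper and the order in which the two structures are consulted are assumptions for illustration, not a description of the PA-7x00 hardware.

```python
PAGE_SHIFT = 12  # assumed 4 KB base pages


def translate(vaddr, btlb, tlb):
    """Translate via block entries first, then per-page entries.

    btlb: list of (first virtual page, page count, first physical page)
    tlb:  dict of virtual page number -> physical page number
    Returns the physical address, or None if neither structure maps it.
    """
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for base_vpn, n_pages, base_ppn in btlb:
        if base_vpn <= vpn < base_vpn + n_pages:
            # One block entry covers the whole contiguous range.
            return ((base_ppn + (vpn - base_vpn)) << PAGE_SHIFT) | offset
    if vpn in tlb:
        return (tlb[vpn] << PAGE_SHIFT) | offset
    return None


btlb = [(0x100, 0x400, 0x8000)]  # a single entry mapping 1024 pages (4 MB)
tlb = {0x900: 0x42}              # an ordinary single-page entry

print(hex(translate(0x2FF010, btlb, tlb)))  # 0x81ff010, via the block entry
print(hex(translate(0x900ABC, btlb, tlb)))  # 0x42abc, via the page entry
```

Covering the same 4 MB range with ordinary entries would take 1024 of them in this model, which is why block entries suit ranges that never get paged in or out.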


Superscalar execution

Overview

A superscalar processor implementation decodes, dispatches and executes multiple instructions per cycle if dependencies between the instructions permit. This is possible if the instruction stream contains independent instructions. Superscalarity can be gained from a decoupled floating point unit (FPU) which executes floating point operations independently of the integer ALU. More complicated variations allow for parallel load/store operations, integer calculations and so on, which need a more complex CPU design that analyzes the instructions/branches.

Every PA-RISC processor from the PA-7100 on implements superscalar execution. Instructions proceed together through the execution pipeline, which is called instruction bundling. Superscalar execution is functionally transparent to the software: the effects of any given instruction are the same whether it was executed as part of a bundle or alone. Bundling rules are applied at run-time by the hardware, so optimal performance is only gained when instructions are ordered in a way that lets the processor use its full superscalar potential.

Several kinds of restrictions are placed upon the instruction bundling in PA-RISC:

For bundling purposes instructions are divided into classes:

PA-RISC superscalar instruction classes
Class | Description
FLOP | Floating point operation
LDST | Loads and stores
ALU | Integer ALU
MM | Shifts, extracts, deposits
NUL | Might nullify successor
BV | Branch Vectored (BV) local, Branch (BE) external
BR | Other branches
FSYS | FTEST and FP status/exception
SYS | System control instructions

PA-7100 superscalar capabilities

The PA-7100 is two-way superscalar with one integer ALU and one FPU.

Allowed bundles

PA-7100 allowed instruction bundles
First instruction | Second instruction
ALU | FLOP
LDST | FLOP
FLOP | ALU/LDST/Branch
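The effect of instruction ordering on a two-way superscalar design can be illustrated with the PA-7100 bundle table above. The sketch below greedily pairs adjacent instructions by class and counts issue cycles. It is a deliberate simplification: the `count_issue_cycles` helper and the `BRANCH` class label are hypothetical, and data dependencies and any other bundling restrictions are ignored.

```python
# Allowed (first, second) class pairs, taken from the PA-7100 table.
ALLOWED = {
    "ALU": {"FLOP"},
    "LDST": {"FLOP"},
    "FLOP": {"ALU", "LDST", "BRANCH"},
}


def count_issue_cycles(stream):
    """Count issue cycles for a list of instruction classes.

    Two adjacent instructions share a cycle when the table allows the
    pair; otherwise the first instruction issues alone.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        pairs_with_next = (
            i + 1 < len(stream)
            and stream[i + 1] in ALLOWED.get(stream[i], ())
        )
        i += 2 if pairs_with_next else 1
        cycles += 1
    return cycles


grouped = ["ALU", "ALU", "FLOP", "FLOP"]
interleaved = ["ALU", "FLOP", "ALU", "FLOP"]
print(count_issue_cycles(grouped))      # 3
print(count_issue_cycles(interleaved))  # 2
```

Both streams contain the same four instructions, but only the interleaved ordering lets every integer instruction pair with a floating-point one.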

PA-7100LC/PA-7300LC superscalar capabilities

These are 2-way superscalar processor implementations with two integer ALUs and one FPU. Notably, only one of the two ALUs is capable of handling loads, stores and shifts.

Allowed bundles

PA-7100LC/PA-7300LC allowed instruction bundles
First instruction | Second instruction
FLOP | LDST/ALU/MM/NUL/BV/BR
LDST | FLOP/ALU/MM/NUL/BR
ALU | FLOP/LDST/ALU/MM/NUL/BR/FSYS
MM | FLOP/LDST/ALU/FSYS
NUL | FLOP
SYS | Never bundled

Besides these bundles, LDST + LDST bundles are also possible under certain circumstances. These are then called double word load/store.

Data dependencies

Several kinds of instructions cannot be bundled together because of inter-instruction data dependencies:

Control Flow

PA-7200 superscalar capabilities

This is a 2-way superscalar processor implementation. It has two integer ALUs and one FPU. Similar to the PA-7100LC, shift-merge and test condition units are not duplicated in the second ALU. To support the superscalar capabilities one additional write port and two additional read ports were added to the general registers (GR*).

Allowed bundles

PA-7200 allowed instruction bundles
First instruction | Second instruction
FLOP | LDST/ALU/MM/NUL/BV/BR
LDST | FLOP/ALU/MM/NUL/BR
ALU | FLOP/LDST/ALU/MM/NUL/BR/FSYS
MM | FLOP/LDST/ALU/FSYS
NUL | FLOP

PA-8x00 superscalar capabilities

To be described.


Multimedia Acceleration eXtensions (MAX-1 and MAX-2)

MAX-1 (32-bit)

MAX-1 are the original multimedia extensions from the 1990s, introduced with the HP PA-7100LC processor and later also used in the PA-7300LC. HP’s design aim was to enable contemporary workstations with these CPUs to provide real-time MPEG video decompression and playback at 30 frames per second without a special DSP (digital signal processing) chip, not an easy feat.

The HP design process for the PA-7100LC processor in the early 1990s included for the first time multimedia benchmarks for analyzing optimizations in the instruction set design.

The actual implementation used a small set of SIMD-MIMD instructions to facilitate the application of instructions on bundled subword data. Since these instructions use the same data paths and execution units within the processor as the regular instructions, the design team termed this intrinsic signal processing (ISP).

Sticking to conventional RISC principles, the design team decided against adding complex special-purpose instructions to the design but opted for the elegant use of the existing facilities in the CPU, which were slightly modified to understand new, packed subword data.

In 1994, the MAX-1 extensions made their way into the final PA-7100LC product and as such were the first SIMD instructions found in a general microprocessor. Less than 0.2 percent of the processor silicon area had to be used for MAX-1 additions and modifications, while allowing a very significant performance boost in affected applications.

As an example, the then high-end HP 9000 735/99 workstation with a 99 MHz processor and 512 KB cache achieved 18.7 FPS in MPEG decompression benchmarks, while the new entry-level 712 workstation at 60 MHz with 64 KB cache achieved 26 FPS, an impressive feat for 1990s information technology.

New MAX-1 multimedia instructions include: parallel add, parallel subtract, parallel shift left & add (i.e. multiply with integer), parallel shift right & add (i.e. division), parallel average.
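What two of these operations do to packed data can be sketched on a 32-bit word holding two 16-bit subwords. The helper names are hypothetical, and the overflow and rounding behavior (wrap-around at 16 bits, averages rounded down) are simplifying assumptions of this example, not statements about the exact semantics of the MAX-1 instructions.

```python
LANE_BITS = 16
LANE_MASK = 0xFFFF


def lanes(word, n=2):
    """Unpack n 16-bit subwords from a word, most significant first."""
    return [(word >> (LANE_BITS * (n - 1 - i))) & LANE_MASK for i in range(n)]


def pack(values):
    """Pack 16-bit subwords into one word, wrapping each to 16 bits."""
    word = 0
    for v in values:
        word = (word << LANE_BITS) | (v & LANE_MASK)
    return word


def parallel_shift_left_add(a, b, shift):
    """Per subword: (a << shift) + b (wrap-around assumed)."""
    return pack([(x << shift) + y for x, y in zip(lanes(a), lanes(b))])


def parallel_average(a, b):
    """Per subword: unsigned average, rounded down (assumed)."""
    return pack([(x + y) >> 1 for x, y in zip(lanes(a), lanes(b))])


x = pack([3, 100])
# Multiply both subwords by 5 in one operation: (x << 2) + x.
print(lanes(parallel_shift_left_add(x, x, 2)))                 # [15, 500]
print(lanes(parallel_average(pack([10, 20]), pack([20, 21]))))  # [15, 20]
```

Multiplication by a small integer constant decomposes into a short series of such shift-and-add steps, which is how a shift-and-add operation can stand in for a multiply.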

MAX-2 (64-bit)

With the introduction of the new 64-bit PA-RISC 2.0 architecture in 1996, HP unveiled a new set of multimedia-oriented instructions aimed at using the processor’s resources more effectively for subword data. The basic components of contemporary multimedia data were often represented as 8, 12 or 16-bit integers, for example audio samples and pixel color depths. Doing arithmetic with data of this length would waste a considerable amount of the processor’s execution capacity: a simple addition of 16-bit data would use only one quarter of the 64-bit wide integer unit’s datapath. To remedy this, MAX allows these subword data to be packed into larger words matching the processor’s natural word width (64-bit on PA-RISC 2.0 processors) and operated on with parallel instructions. An example would be four 16-bit additions performed by the 64-bit adder on four packed 16-bit subwords.
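The idea of performing four 16-bit additions with one 64-bit add can be emulated in software by preventing carries from crossing subword boundaries. The sketch below uses a common masking technique for this; it illustrates the principle only and is not a description of how the PA-8000 adder is built. The function name and the wrap-around overflow behavior are assumptions.

```python
HIGH = 0x8000_8000_8000_8000  # top bit of each 16-bit subword
LOW = 0x7FFF_7FFF_7FFF_7FFF   # lower 15 bits of each subword


def packed_add16(a, b):
    """Add four packed 16-bit subwords using a single wide addition.

    Each subword wraps around independently on overflow (assumed).
    """
    # Adding only the lower 15 bits cannot carry into the next subword.
    low_sum = (a & LOW) + (b & LOW)
    # Restore each top bit: a15 XOR b15 XOR the carry now held in bit 15.
    return low_sum ^ ((a ^ b) & HIGH)


a = 0x0001_FFFF_8000_1234
b = 0x0001_0001_8000_1111
print(hex(packed_add16(a, b)))  # 0x2000000002345
```

The second and third subwords overflow and wrap to zero without disturbing their neighbours, whereas a plain 64-bit addition would let those carries spill into the next subword.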

The basic functionality of the earlier 32-bit MAX-1 was carried over and four more instructions were added for MAX-2. Additionally, due to the wider integer registers (now 64-bit), more subwords can be processed per cycle, doubling the effective speed of these multimedia instructions. The MAX-2 multimedia instructions include (the last four are new in MAX-2): parallel add, parallel subtract, parallel shift left & add (i.e. multiply with integer), parallel shift right & add (i.e. division), parallel average, parallel shift right, parallel shift left, mix and permute.

MAX-2 debuted in 1996 with the PA-8000 processor and later featured on all subsequent PA-RISC 2.0 processors (PA-8x00). In contrast to contemporary multimedia extensions, MAX-2 required very little die space (0.1 percent on the PA-8000).

