A new feature, called hardware cache coherent I/O, was introduced into the HP PA-RISC architecture as part of the HP 9000 J/K-class program. This feature allows the I/O hardware to participate in the system-defined cache coherency scheme, thereby relieving the memory system and processors of unnecessary overhead and contributing to greater system performance. This paper reviews I/O data transfer, introduces the concept of cache coherent I/O from a hardware perspective, discusses the implications for HP-UX(*) software, illustrates some of the benefits realized by HP's networking products, and presents measured performance results.

I/O Data Transfer

To understand the impact of the HP 9000 J/K-class coherent I/O implementation, it is necessary to take a step back and get a high-level view of how data is transferred between I/O devices and main memory on HP-UX systems.

There are two basic models for data transfer: direct memory access (DMA) and programmed I/O (PIO). The difference between the two is that a DMA transfer takes place without assistance from the host processor, while PIO requires the host processor to move the data by reading and writing registers on the I/O device. DMA is typically used for devices like disks and LANs, which move large amounts of data and for which performance is important. PIO is typically used for low-cost devices for which performance is less important, like RS-232 ports. PIO is also used for some high-performance devices like graphics frame buffers if the programming model requires it.
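The difference between the two models can be sketched with a toy simulation. Everything here is hypothetical and for illustration only: `ToyDevice`, its methods, and the choice to count a DMA start as a single processor access are assumptions, not a description of any real HP device or driver interface.

```python
class ToyDevice:
    """Hypothetical device that holds received 32-bit words."""

    def __init__(self, words):
        self.words = list(words)
        self.next_word = 0
        self.processor_accesses = 0  # register accesses mastered by the processor

    def read_data_register(self):
        """PIO path: the processor reads one word from a device register."""
        self.processor_accesses += 1
        word = self.words[self.next_word]
        self.next_word += 1
        return word

    def start_dma(self, memory, start, count):
        """DMA path: one processor access starts the transfer (an assumption);
        the device then writes the words into memory by itself."""
        self.processor_accesses += 1
        memory[start:start + count] = self.words[:count]


def receive_pio(device, memory, start, count):
    # The processor moves every word itself.
    for i in range(count):
        memory[start + i] = device.read_data_register()


def receive_dma(device, memory, start, count):
    # The processor only starts the transfer; the device moves the data.
    device.start_dma(memory, start, count)
```

Both paths leave the same data in memory. In this model the PIO path costs one processor access per word, while the DMA path costs a fixed amount of processor work regardless of transfer size, which is why DMA suits devices that move large amounts of data.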



All data transfers move data either to main memory from an I/O device (inbound) or from main memory to an I/O device (outbound). These transfers require one or more transactions on each bus between the I/O device and main memory. Fig. 1 shows a typical PA-RISC system with a two-level bus hierarchy. PA-RISC processor-to-memory buses typically support transactions in sizes that are powers of 2 up to 32 bytes, that is, READ4, WRITE4, READ8, WRITE8, READ16, WRITE16, READ32, WRITE32, where the number refers to the number of bytes in the transaction. Each transaction has a master and a slave; the master initiates the transaction and the slave must respond. Write transactions move data from the master to the slave, and read transactions cause the slave to respond with data for the master. The processor is always the master for PIO transactions to the I/O device. An I/O device is always the master for a DMA transaction. For example, if a software device driver is reading (PIO) a 32-bit register on the fast/wide SCSI device shown in Fig. 1, it causes the processor to master a READ4 transaction to the device, which results in the I/O adapter mastering a READ4 transaction on the I/O bus, where the fast/wide SCSI device responds with the four bytes of data. If the Fibre Channel interface card is programmed to DMA transfer 4K bytes of data from memory to the disk, it will master 128 READ32 transactions to get the data from memory. The bridge forwards transactions in both directions as appropriate.
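The arithmetic behind the 4K-byte example (4096 / 32 = 128) can be checked with a short sketch. The function `split_dma` and its greedy largest-first splitting are illustrative assumptions; real hardware must also respect alignment and bus-specific rules that are not modeled here.

```python
SIZES = (32, 16, 8, 4)  # transaction sizes in bytes, largest first


def split_dma(nbytes, direction="outbound"):
    """Return the transaction names for a DMA transfer of nbytes.

    Assumes an aligned transfer whose length is a multiple of 4 bytes.
    An outbound transfer (memory to device) reads memory, so the device
    masters READ transactions; an inbound transfer uses WRITE transactions.
    """
    if nbytes < 0 or nbytes % 4:
        raise ValueError("length must be a non-negative multiple of 4")
    if direction not in ("outbound", "inbound"):
        raise ValueError("direction must be 'outbound' or 'inbound'")
    op = "READ" if direction == "outbound" else "WRITE"
    transactions = []
    remaining = nbytes
    for size in SIZES:
        # Use as many transactions of this size as fit, then move on.
        count, remaining = divmod(remaining, size)
        transactions.extend([f"{op}{size}"] * count)
    return transactions


# The 4K-byte outbound transfer from the text becomes 128 READ32 transactions.
fibre_channel_example = split_dma(4096)
```

A length that is not a multiple of 32 falls back to smaller transactions in this sketch; for example, a 44-byte inbound transfer yields WRITE32, WRITE8, and WRITE4.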

Because PIO transactions are not in memory address space and are therefore not a coherency concern, the rest of this article discusses DMA transactions only. The coherent I/O hardware has no impact on I/O software device drivers that interact with devices via PIO exclusively.

Hardware Implications

Cache memory is defined as a small, high-speed block of memory located close to the processor. On the HP PA 7200 and PA 8000 processors, a portion of the software virtual address (called the virtual index) is used as the cache lookup. Main memory is much larger and slower than cache. It is accessed using physical addresses, so a virtual-to-physical address translation must occur before issuing any request to memory. Entries in the PA 7200 and PA 8000 caches are stored in lines of 32 bytes. Since data that is referenced once by the processor is likely to be referenced again and nearby data is likely to be accessed as well, the line size is selected to optimize the frequency with which it is accessed while minimizing the overhead associated with obtaining the data from main memory. The cache contains the most recently accessed lines, thereby maximizing the rate at which processor-to-memory requests are intercepted, ultimately reducing latency.
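To make the 32-byte line granularity concrete, the following sketch maps byte addresses to cache lines. Only the line size comes from the text; the helper names are hypothetical, and the sketch does not model the virtual index or address translation, since the bit fields used by the PA 7200 and PA 8000 are not given here.

```python
LINE_SIZE = 32  # bytes per cache line, as stated for the PA 7200 and PA 8000


def line_base(addr):
    """Address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)


def line_offset(addr):
    """Byte position of addr within its cache line (0 to 31)."""
    return addr & (LINE_SIZE - 1)


def lines_touched(addr, length):
    """Number of distinct cache lines covered by length bytes starting at addr."""
    if length <= 0:
        return 0
    first = line_base(addr)
    last = line_base(addr + length - 1)
    return (last - first) // LINE_SIZE + 1
```

A 4-byte access at address 0x1234 falls entirely within the line based at 0x1220, so later accesses to neighboring bytes hit the same line. The same access at 0x123E straddles two lines, and an aligned 4096-byte buffer covers 128 lines.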

When a processor requests data (by doing loads), the line containing the data is copied from main memory into the cache. When a processor modifies data (by doing stores), the copy in the cache will become more up-to-date than the copy in memory. HP 9000 J/K-class products resolve this stale data problem by using the snoopy cache coherency scheme defined in the Runway bus protocol. Each processor monitors all Runway transactions to determine whether the virtual index requested matches a line currently stored in its cache. This is called "snooping the bus." A Runway processor must own a cache line exclusively or privately before it can complete a store. Once the store is complete, the cache line is considered dirty relative to the stale memory copy. To maximize Runway bus efficiency, processors are not required to write this dirty data back to memory immediately. Instead, the write-back operation occurs when the cache line location is required for use by the owning processor for another memory access. If, following the store but before the write-back, another processor issues a read of this cache line, the owning processor will snoop this read request and respond with a cache-to-cache copy of the updated cache line data. This data is then stored in the requesting processor's cache and main memory.
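The sequence described above (store, deferred write-back, snooped read, cache-to-cache copy) can be traced with a toy model. This is a simplified sketch and not the Runway protocol: the class names, the three state labels, the use of a plain address in place of the virtual index, and the treatment of a cache line as a single value are all assumptions made for illustration. Write-back on line replacement is not modeled.

```python
class Memory:
    def __init__(self):
        self.lines = {}

    def read(self, addr):
        return self.lines.get(addr, 0)

    def write(self, addr, value):
        self.lines[addr] = value


class Bus:
    """Hypothetical shared bus on which every cache snoops every read."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def read(self, requester, addr, exclusive=False):
        supplied = None
        for cache in self.caches:
            if cache is not requester:
                value = cache.snoop(addr, exclusive)
                if value is not None:
                    supplied = value
        if supplied is not None:
            # Cache-to-cache copy: the data also updates main memory.
            self.memory.write(addr, supplied)
            return supplied
        return self.memory.read(addr)


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # addr -> [value, state]; state is "shared" or "dirty"
        bus.caches.append(self)

    def load(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.bus.read(self, addr), "shared"]
        return self.lines[addr][0]

    def store(self, addr, value):
        line = self.lines.get(addr)
        if line is None or line[1] != "dirty":
            # Obtain exclusive ownership before completing the store.
            self.bus.read(self, addr, exclusive=True)
        self.lines[addr] = [value, "dirty"]  # memory is now stale

    def snoop(self, addr, exclusive):
        """Respond to another cache's read; return data only if this copy is dirty."""
        line = self.lines.get(addr)
        if line is None:
            return None
        value, state = line
        if exclusive:
            del self.lines[addr]  # give up the copy to the new owner
        else:
            line[1] = "shared"
        return value if state == "dirty" else None
```

In this model, after one cache stores to a line, main memory keeps the old value until another cache loads the same line. At that point the owning cache supplies the updated data, and both the requesting cache and main memory receive it.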

...
