Massive Parallelism for Mission-Critical Applications

Advanced Explicitly Parallel Instruction Computing (EPIC) Architecture

Steve Undy, Intel Corporation

The Intel® Itanium® 9500 processor series, codenamed Poulson, is the latest Intel Itanium processor in a long line of groundbreaking designs. Optimized for Explicitly Parallel Instruction Computing (EPIC) principles, the advanced EPIC architecture of the Intel Itanium processor 9500 series can best be summarized as exploiting parallelism at every level: pipeline, core, thread, memory, and instruction. End users can now extract more of the inherent parallelism in their code than ever before to deliver a new level of performance, while benefiting from mainframe-class RAS features to deliver an always-on experience in their mission-critical enterprise.

Adopting an Advanced EPIC Architecture

The Intel Itanium processor 9500 series represents a near clean-sheet redesign of the Intel Itanium cores to support an unprecedented amount of instruction-level parallelism in its main execution pipeline. It can execute up to 12 instructions each cycle in 4 instruction bundles. It has 2 memory execution units, 2 general-purpose integer units, 2 ALU units, 2 floating-point units, 3 branch units, and 1 NOP unit. The Intel Itanium bundle template determines which units are candidates for executing each instruction. The hardware algorithm that disperses incoming instructions into the 12 execution unit pipelines is simple, deterministic, and efficient, which lets compilers control execution resources exactly. To support 12-wide issue, the register files have 12 read and 12 write ports.

Figure 1. Intel Itanium processor 9500 series core floorplan. Labeled blocks include the mid-level instruction and data caches, the first-level instruction and data caches, branch prediction and branch control, the integer and floating-point register files, integer and floating-point execution, instruction queues and buffers, interface logic, and pipeline control.
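As a rough illustration of what a simple, deterministic dispersal rule can look like, the sketch below assigns bundles to execution units in program order and stops at the first bundle that does not fit. The unit counts come from the text above. Everything else is an assumption made for illustration: the one-letter unit-class tags (a hypothetical encoding, not the Itanium template encoding), the function name `bundles_issued`, and the stop-at-first-miss rule. It is a toy model, not the hardware algorithm.

```python
# Unit counts from the text: 2 memory, 2 integer, 2 ALU,
# 2 floating-point, 3 branch, 1 NOP (12 in total).
UNITS = {"M": 2, "I": 2, "A": 2, "F": 2, "B": 3, "N": 1}
MAX_BUNDLES = 4  # 12 instructions per cycle, 3 instructions per bundle


def bundles_issued(bundles):
    """Return how many leading bundles fit into one cycle.

    Each bundle is a 3-letter string; each letter is a hypothetical tag
    naming the unit class that instruction needs. Bundles are taken in
    order, and dispersal stops at the first bundle that cannot be fully
    served by the units still free (an assumed rule for this toy model).
    """
    free = dict(UNITS)
    issued = 0
    for bundle in bundles[:MAX_BUNDLES]:
        # Count how many units of each class this bundle needs.
        need = {}
        for tag in bundle:
            need[tag] = need.get(tag, 0) + 1
        # Stop if any class is oversubscribed.
        if any(free.get(tag, 0) < count for tag, count in need.items()):
            break
        for tag, count in need.items():
            free[tag] -= count
        issued += 1
    return issued
```

Under these assumptions, `bundles_issued(["MAI", "MFB", "ABB", "INF"])` uses all 12 units in a single cycle, while `bundles_issued(["MII", "MII"])` issues only the first bundle because the second would need two more integer units. Because the rule depends only on the order and tags of the bundles, a compiler could predict the outcome ahead of time, which is the property the text attributes to the real dispersal logic.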
The new microarchitecture features an 11-stage pipeline and architectural extensions. The refreshed microarchitecture also allowed a focus on power efficiency. The power-aware design of the Intel Itanium processor 9500 series was essential to being able to double core count and operating frequency while simultaneously reducing maximum package power, achieving a factor-of-three power efficiency advantage over the previous Itanium processor design.

New-Instructions Architectural Extensions

The Intel Itanium processor 9500 series adds a set of new instructions that extends the Itanium architecture:

- Integer multiply instructions and a count-leading-zero instruction.
- An instruction that gives the OS better control of thread behavior.
- New and extended instructions that provide more detailed data access hints, along with a new user-controlled register file to control those hints. This gives compilers much finer-grained control of data cache and TLB policies.
- An instruction for multi-line software prefetches.

All of these new instructions are motivated by the desire to increase performance, both single-thread and multi-thread.

Memory Parallelism

The Intel Itanium processor 9500 series also focuses on increasing memory parallelism by addressing throughput and queuing in the memory subsystem. The core has additional queuing for pending memory operations, tuned for throughput.
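To make the count-leading-zero operation mentioned above concrete, here is a minimal sketch of what such an operation computes on a 64-bit value. It models only the arithmetic result. The function name, the `width` parameter, and the choice to return the full width for an input of zero are assumptions for illustration and are not taken from the Itanium instruction definition.

```python
def count_leading_zeros(value, width=64):
    """Count zero bits above the most significant set bit of `value`.

    `value` is treated as an unsigned integer of `width` bits. Returning
    `width` for an input of 0 is an assumption of this sketch.
    """
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the given width")
    # int.bit_length() is the position of the highest set bit plus one.
    return width - value.bit_length()


def floor_log2(value, width=64):
    """Typical use: derive floor(log2(value)) from the leading-zero count."""
    if value <= 0:
        raise ValueError("floor_log2 needs a positive value")
    return width - 1 - count_leading_zeros(value, width)
```

Operations like `floor_log2` come up in normalization, hashing, and allocator size-class code. Without a dedicated instruction, the same result takes a loop or a sequence of shifts and compares.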

Intel Develops Explicitly Parallel Instruction Computing (EPIC)

The white paper focuses on the Explicitly Parallel Instruction Computing (EPIC) feature in the Intel® Itanium® processor 9500 series. EPIC principles exploit parallelism on all levels—pipeline, core, thread, memory, and instructions—delivering superior performance while benefiting from the mainframe-class reliability, availability, and serviceability features.

EPIC represents a paradigm shift in the development of instruction set architectures. Instead of placing the main burden of extracting parallelism and performance on the underlying computing hardware, a synergy is developed between the software ecosystem and the hardware implementation. This allows compilers, which have full access to the program source code, and the processors, which have full access to run-time information as a program executes, to be optimized for what each does best. In order to do this, the instruction set provides a rich set of features for software to optimally control the low-level hardware resources. This most notably includes the ability for compilers to specify, schedule, and exploit the many forms of parallelism inherent in user programs.
