High-performance computers (HPCs) tackle problems critical to national security and public welfare, such as aerodynamics simulation, computational virology, and weather modeling. HPCs are expensive to acquire, so it is crucial to understand in advance how critical problems will perform and to ensure the HPC design will meet requirements. Normally this would be a task for simulation, but at the speed and scale of modern HPCs, standard simulation techniques are too slow to predict performance effectively.
Peraton Labs is developing a new conceptual simulation method that can efficiently estimate the execution time, energy consumption, and utilization of a program running on a prospective HPC. The method is flexible, applying to a wide range of problems and a wide variety of future HPC designs. Our conceptual simulation method can address key questions:
- Is greater network bandwidth a better value than investing in faster CPUs or more memory?
- Does increasing the number of processing nodes improve performance, or does the cost of synchronization outweigh the advantage?
- How can energy consumption be reduced while meeting required performance levels?
Here’s how the new method works. First, test problems are run repeatedly on existing testbeds. For each run, data is recorded on the problem (size, complexity, etc.), the testbed (number of nodes, amount of memory, CPU types, etc.), and program performance (execution time, power draw, output size).
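As a concrete illustration, each run might be captured as a record like the sketch below. The field names and sample values are illustrative assumptions, not the project's actual schema:

```python
# A minimal sketch of a per-run benchmark record; every field name and
# value here is an illustrative assumption, not an actual data schema.
from dataclasses import dataclass

@dataclass
class RunRecord:
    # Problem characteristics
    input_size: int           # e.g., matrix rows or grid cells
    complexity_class: str     # e.g., "n log n"
    # Testbed characteristics
    num_nodes: int
    ram_gb_per_node: float
    cpu_model: str
    # Measured performance
    exec_time_s: float
    avg_power_w: float
    output_size_bytes: int

# Two hypothetical runs of the same test problem at different sizes
runs = [
    RunRecord(10_000, "n log n", 4, 64.0, "Xeon-8380", 12.3, 950.0, 4_096_000),
    RunRecord(20_000, "n log n", 4, 64.0, "Xeon-8380", 26.1, 980.0, 8_192_000),
]
```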
Second, the data is combined to build a parametric problem execution model (PPEM), which pairs computation and communication components to characterize the performance of the test program. The computation component aggregates information on computation time and energy use as a function of CPU speed, available RAM, RAM speed, number of distributed nodes, input size, and other aspects of the problem. The communication component aggregates information on data flows between computation nodes, flow patterns (one-to-one, scatter, gather, broadcast, multicast, etc.), and the relative timing of flows. The PPEM links the two components to predict the problem's execution time and energy use.
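To make the computation component concrete, the sketch below fits measured execution times to a simple parametric form using least squares. The functional form, time ≈ a·n·log(n)/p + b with n the input size and p the node count, and all the sample data are illustrative assumptions; an actual PPEM would use richer features and far more runs:

```python
# Hedged sketch: fit a toy parametric model of computation time from
# testbed measurements. The form t ≈ a*n*log(n)/p + b is an assumption.
import numpy as np

# (input_size n, node_count p, measured_time_s) from hypothetical runs
samples = [
    (10_000, 4, 12.3),
    (20_000, 4, 26.1),
    (20_000, 8, 14.0),
    (40_000, 8, 29.5),
]

# Design matrix: one column for n*log(n)/p, one for a constant offset b
X = np.array([[n * np.log(n) / p, 1.0] for n, p, _ in samples])
y = np.array([t for _, _, t in samples])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_time_s(n: int, p: int) -> float:
    """Predicted execution time for input size n on p nodes."""
    return coeffs[0] * n * np.log(n) / p + coeffs[1]

# Extrapolate to a configuration that was never measured
print(f"{predict_time_s(80_000, 16):.1f} s")
```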
In the final step, the problem is conceptually executed on a model of a proposed HPC system. Unlike a standard simulation, which is too slow at this scale, conceptual execution applies the fast PPEM. The HPC model, describing the hardware configuration (number of nodes, CPUs, cores, and switches, along with their speeds, latencies, and energy costs both in use and when idle), is virtually instantiated in a simulator.
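A hypothetical sketch of such a hardware description, before it is instantiated in the simulator (the fields and numbers are assumptions, not an actual model format):

```python
# Illustrative-only description of a candidate HPC configuration.
from dataclasses import dataclass

@dataclass
class HPCModel:
    num_nodes: int
    cores_per_node: int
    cpu_ghz: float
    ram_gb_per_node: float
    link_bandwidth_gbps: float   # interconnect/switch bandwidth
    link_latency_us: float       # per-hop latency
    node_power_active_w: float   # energy cost in use
    node_power_idle_w: float     # energy cost when idle

candidate = HPCModel(
    num_nodes=512, cores_per_node=64, cpu_ghz=2.4,
    ram_gb_per_node=256.0, link_bandwidth_gbps=200.0,
    link_latency_us=1.2, node_power_active_w=450.0,
    node_power_idle_w=90.0,
)
```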
Conceptual execution consists of iteratively applying the PPEM, consulting it as a fast oracle, to specific tasks and transferring the prediction results within the simulation. Conceptual execution proceeds through the problem's execution graph to produce estimates of the time, energy, and utilization needed to execute the given problem on the specified HPC system. Real-world phenomena, such as congestion, corruption, and retransmission, are included. By varying the HPC model, our conceptual simulation method can provide critically needed insights to optimize HPC systems.
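The iteration can be pictured as a walk over the execution graph in dependency order, as in the sketch below. Python's standard graphlib handles the ordering; the predict_time_s and predict_energy_j helpers stand in for the PPEM and are purely illustrative, and effects such as congestion and retransmission are omitted for brevity:

```python
# Hedged sketch of a conceptual-execution loop over an execution graph.
from graphlib import TopologicalSorter

# Toy task graph: each task maps to the set of tasks it depends on
graph = {"load": set(), "compute_a": {"load"}, "compute_b": {"load"},
         "gather": {"compute_a", "compute_b"}}

def predict_time_s(task: str) -> float:
    # Stand-in for the PPEM's per-task time prediction (assumed values)
    return {"load": 2.0, "compute_a": 10.0, "compute_b": 12.0, "gather": 1.5}[task]

def predict_energy_j(task: str) -> float:
    # Stand-in for the PPEM's energy prediction: time x assumed 450 W draw
    return predict_time_s(task) * 450.0

finish = {}          # earliest finish time of each task
total_energy = 0.0
for task in TopologicalSorter(graph).static_order():
    start = max((finish[d] for d in graph[task]), default=0.0)
    finish[task] = start + predict_time_s(task)
    total_energy += predict_energy_j(task)

print(f"makespan: {max(finish.values()):.1f} s, energy: {total_energy:.0f} J")
```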
For more information, contact [email protected].