SEG High Performance Computing special session

Hess trials GPUs, Barcelona SC on ‘million core’ computers, Stanford—’industry is running scared!’

An audience of over 150 attended the SEG Special Session on high performance computing in Houston last month. Chairman Masoud Nikravesh (CITRIS) noted that '15 years of exponential growth has ended' as microprocessor clock speeds reach their limits. The challenge today is how to take advantage of multi-core architectures, both inside the microprocessor and on the graphics card.

Scott Morton reported on Hess’ seismic imaging effort—which has strong backing from John Hess himself. Hess has investigated co-processors from PeakStream and Nvidia along with more ‘esoteric’ hardware such as digital signal processors, FPGAs and the IBM Cell BE. Early FPGA tests with SRC Computers showed a 10x speedup for wave equation migration (WEM), albeit at a 10x system cost. FPGAs proved hard to program and hard to tune for performance.

Hess has now moved to Nvidia's CUDA, where a program running on the CPU host spawns a multitude of threads on the GPU. CUDA is easy to learn and write but harder to optimize, and memory management remains a big headache. Hess reports a 24x WEM speedup over a CPU. For reverse time migration (RTM), a single GPU is equivalent to 20 Intel Harpertown cores. As CPU and CUDA codes 'diverge,' one solution may be OpenCL. Hess' system is now moving up to 1,200 GPUs, a large percentage of its compute power, housed in Tesla boxes connected over PCI Express to dual Harpertown CPU hosts.
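The host/device pattern Morton described can be shown in a minimal CUDA sketch. This is not Hess code; the kernel, array size and gain factor below are invented for illustration. The CPU allocates memory on the card, copies the data across PCI Express, launches a grid of thousands of threads (one per sample) and copies the result back.

#include <cuda_runtime.h>
#include <cstdio>

// Each GPU thread scales one sample of a trace - a stand-in for real work.
__global__ void scale_traces(float *d_data, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_data[i] *= gain;
}

int main()
{
    const int n = 1 << 20;                       // 1M samples (hypothetical)
    const size_t bytes = n * sizeof(float);

    float *h_data = (float*)malloc(bytes);       // host-side buffer
    for (int i = 0; i < n; i++) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc((void**)&d_data, bytes);          // allocate on the GPU
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);  // host -> GPU

    int threads = 256;                           // threads per block
    int blocks = (n + threads - 1) / threads;    // enough blocks to cover n
    scale_traces<<<blocks, threads>>>(d_data, n, 2.0f);

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);  // GPU -> host
    printf("first sample after scaling: %f\n", h_data[0]);

    cudaFree(d_data);
    free(h_data);
    return 0;
}

The explicit copies to and from the card are where the 'memory management headache' lives: getting data onto the GPU and keeping it there is usually harder than writing the kernel itself.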

José Cela outlined an evaluation of 3D RTM on hardware accelerators performed at the Barcelona Supercomputing Center. Echoing Nikravesh's introductory remarks on the growing core count, Cela wondered how we will program a million-core computer, a machine that should arrive within the next five years. The key is to reduce power consumption without impacting performance. IBM's Blue Gene system was built for energy saving from the ground up, but this meant a limited amount of memory per core, which suits some applications but not seismic processing. Accelerators (GPU, Cell) break the memory linkage, but memory movement then has to be done explicitly in code, increasing program complexity by '1 to 2 orders of magnitude.' Cela also noted the standard-less programming landscape, which may converge on OpenCL in the future. This is a critical issue 'because code outlives hardware.' In the Q&A, Robert Clapp (Stanford) remarked, 'We have a working FPGA RTM demonstrator—don't write them off yet!' Clapp later presented work on stream programming with FPGAs, concluding that 'comparing vanilla implementations does not give an accurate measure of different architectures.'
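For context on what these accelerators are actually asked to do, the computational heart of RTM is a finite-difference wave-equation stencil swept repeatedly over a 3D grid. The sketch below is illustrative only, not the Barcelona or Hess code: it uses a second-order stencil, a constant velocity model, invented grid sizes and no absorbing boundaries. A production kernel would use higher-order stencils and careful shared-memory tiling, which is the kind of hand-coded memory movement Cela was referring to.

#include <cuda_runtime.h>
#include <cstdio>

// One leapfrog time step of the constant-density acoustic wave equation,
// 2nd order in time and space. Production RTM kernels use higher-order
// stencils, absorbing boundaries and shared-memory tiling.
__global__ void wave_step(const float *p_prev, const float *p_cur, float *p_next,
                          const float *vel2, int nx, int ny, int nz,
                          float dt2_over_h2)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    int iz = blockIdx.z * blockDim.z + threadIdx.z;
    if (ix < 1 || iy < 1 || iz < 1 || ix >= nx - 1 || iy >= ny - 1 || iz >= nz - 1)
        return;                                   // skip the outer shell

    long sx = 1, sy = nx, sz = (long)nx * ny;     // strides along x, y, z
    long i = (long)iz * sz + iy * sy + ix;

    float lap = p_cur[i+sx] + p_cur[i-sx]         // discrete 3D Laplacian
              + p_cur[i+sy] + p_cur[i-sy]
              + p_cur[i+sz] + p_cur[i-sz]
              - 6.0f * p_cur[i];

    // p(t+dt) = 2 p(t) - p(t-dt) + v^2 dt^2/h^2 * laplacian(p(t))
    p_next[i] = 2.0f * p_cur[i] - p_prev[i] + vel2[i] * dt2_over_h2 * lap;
}

int main()
{
    const int nx = 128, ny = 128, nz = 128;       // hypothetical cube
    const long n = (long)nx * ny * nz;
    const size_t bytes = n * sizeof(float);
    const float dt2_over_h2 = 0.05f;              // v^2*dt^2/h^2 = 0.2 < 1/3: stable

    float *p_prev, *p_cur, *p_next, *vel2;
    cudaMalloc((void**)&p_prev, bytes);  cudaMemset(p_prev, 0, bytes);
    cudaMalloc((void**)&p_cur,  bytes);  cudaMemset(p_cur,  0, bytes);
    cudaMalloc((void**)&p_next, bytes);
    cudaMalloc((void**)&vel2,   bytes);

    // Constant velocity model for the sketch; a real code would copy a
    // velocity cube up from the host instead.
    float *h_vel2 = (float*)malloc(bytes);
    for (long i = 0; i < n; i++) h_vel2[i] = 4.0f;
    cudaMemcpy(vel2, h_vel2, bytes, cudaMemcpyHostToDevice);

    // Crude point source at the grid centre, injected once.
    float src = 1.0f;
    long centre = ((long)(nz / 2) * ny + ny / 2) * nx + nx / 2;
    cudaMemcpy(p_cur + centre, &src, sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(16, 8, 4);
    dim3 grid(nx / 16, ny / 8, nz / 4);
    for (int step = 0; step < 100; step++) {
        wave_step<<<grid, block>>>(p_prev, p_cur, p_next, vel2, nx, ny, nz, dt2_over_h2);
        // Rotate the three time levels for the next step.
        float *tmp = p_prev; p_prev = p_cur; p_cur = p_next; p_next = tmp;
    }
    cudaDeviceSynchronize();
    printf("propagated 100 time steps on a %d^3 grid\n", nx);

    cudaFree(p_prev); cudaFree(p_cur); cudaFree(p_next); cudaFree(vel2);
    free(h_vel2);
    return 0;
}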

Ryan Schneider of Acceleware (an Nvidia partner) was 'sceptical that FPGAs will have much impact in HPC,' particularly because of the relatively low level of R&D compared with the Cell and GPU. Schneider claimed that a 100x improvement over the CPU should be possible with GPUs. Nvidia's 'Fermi' architecture is coming soon, with 3.5 billion transistors and a promise of 1400/770 GFlops (single/double precision). Schneider concluded by suggesting that we 'look out a few years—how are your algorithms going to deal with hundreds of cores? The cost of computing is tending to zero while the cost of coding is rising steadily.' This touched a nerve with the audience. Clapp remarked that 'Industry is running scared—we are moving to massively parallel with no software to run on it!' Another curious issue is 'code taint,' what happens when a computer guy optimizes a program, after which the scientist refuses to touch it!

John Shalf (Berkeley Lab NERSC) questioned exaflop projections for 2020, noting the implied 100 MW power requirement. For Shalf, the issue is 'how to get 1000x without locating next to a nuclear power plant!' Performance improvement in the next decade will be harder to achieve and harder to program for. We need a 100x energy efficiency improvement over the 'mainstream COTS' approach. Among his suggestions: self-optimizing hardware and software, 'co-tuning,' and leveraging low-power embedded COTS technology such as that used in the iPhone or MP3 players.

Finally, Bill Menger (ConocoPhillips) announced the creation of the Society of HPC Professionals. More (but not much more!) from hpcsociety.org.

© Oil IT Journal - all rights reserved.