Tera takes on vector and massively parallel marketplace (December 1996)

Tera Computer's new Multithreaded Architecture is set to challenge the dominance of vector and massively parallel computing in the high performance arena.

A new name that popped up at the SEG in Denver was Tera Computer, based in Seattle, which describes itself as a high performance computer company in the development stage. From a modest stand, Tera was testing the E&P market's interest in its Tera Multithreaded Architecture (MTA) high performance, general-purpose computer, due out next year, which is said to go beyond massive parallelism in its ease of programming and scalability.

The company said that its MTA systems represent a significant breakthrough, offering substantial improvements over both parallel vector processors and massively parallel systems. MTA systems are claimed to be the first true shared memory systems that are architecturally scalable: the programmer is freed completely from data layout concerns, irrespective of system size. Tera's high performance multithreaded processors provide scalable latency tolerance, and an extremely high bandwidth interconnection network lets each processor access arbitrary locations in global shared memory at 2.8 gigabytes per second. Tera is scathing about massively parallel and vector processors.
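The "true shared memory" claim means that programs address one flat memory image rather than data explicitly distributed across processors. As a rough illustration only (this is ordinary POSIX threads standing in for the hardware thread streams an MTA processor would schedule, not Tera's own programming environment; the array name and sizes are invented), the sketch below shows threads reading arbitrary elements of a single global array with no per-processor data layout:

```c
/* Illustration of a flat shared-memory access pattern: every thread reads
 * arbitrary locations of one global array, with no explicit data layout or
 * message passing. Ordinary POSIX threads; not Tera's programming model. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N_ELEMENTS (1 << 20)
#define N_THREADS  8

static double shared_field[N_ELEMENTS];  /* single flat address space */
static double partial_sum[N_THREADS];    /* one slot per thread, no races */

static void *gather(void *arg)
{
    long t = (long)arg;
    unsigned seed = (unsigned)t + 1;
    double sum = 0.0;
    /* Arbitrary (here random) global accesses: no notion of which
     * processor "owns" which part of the array. */
    for (long i = 0; i < N_ELEMENTS / N_THREADS; i++)
        sum += shared_field[rand_r(&seed) % N_ELEMENTS];
    partial_sum[t] = sum;
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    for (long i = 0; i < N_ELEMENTS; i++)
        shared_field[i] = 1.0;
    for (long t = 0; t < N_THREADS; t++)
        pthread_create(&tid[t], NULL, gather, (void *)t);
    double total = 0.0;
    for (long t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += partial_sum[t];
    }
    printf("total = %.0f\n", total);  /* equals N_ELEMENTS */
    return 0;
}
```

On a cluster or massively parallel machine the same access pattern would force the programmer to decide which node holds which part of the array and to move data explicitly; the shared memory model removes that step.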

cost effective

It says massively parallel systems and workstation networks depend on massive locality for good performance: applications must be partitioned to minimize communication while balancing the workload across the processors, a task that is often difficult or impractical. An MTA system, on the other hand, can accommodate intensive communication and synchronization while balancing the processor workload automatically, according to Tera; a sketch of this idea appears below.

The company argues that vector processors are true shared memory systems, but rely on long vectors and massive vectorization for good performance as systems increase in size; executing scalar code, parallel or not, is seldom cost effective on these machines. In contrast, MTA systems optimize both vector and scalar code, exploiting parallelism while retaining the programming ease of true shared memory. The customer's investment in vector parallel software is thereby preserved, and new applications and approaches that are better suited to scalar computing become attractive, says Tera, noting that MTA systems represent a new paradigm for high performance computing - scalable shared memory.

MTA systems are constructed from resource modules. Each resource module measures approximately 5 by 7 by 32 inches and contains up to six resources: a computational processor (CP), an I/O processor (IOP) nominally connected to an I/O device via 32- or 64-bit HIPPI, and either two or four memory units. Each resource is individually connected to a separate routing node in the system's 3D toroidal interconnection network.
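The automatic workload balancing Tera describes can be pictured as dynamic self-scheduling: instead of a fixed partition of work per processor, threads pull the next piece of work from a shared counter as they become free. The sketch below is illustrative only - plain C with POSIX threads and C11 atomics rather than anything Tera ships, and the task costs are invented:

```c
/* Illustration of dynamic self-scheduling over a shared work counter,
 * the kind of automatic load balancing claimed for the MTA.
 * Not Tera's software; task sizes are invented. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N_TASKS   10000
#define N_THREADS 8

static atomic_long next_task;            /* shared counter: no static partition */
static long work_done[N_THREADS];        /* tasks completed per thread */

static void do_task(long id)
{
    /* Irregular work: some tasks are much heavier than others, which
     * defeats a fixed block partition but not dynamic scheduling. */
    volatile long s = 0;
    long cost = (id % 97) + 1;
    for (long i = 0; i < cost * 1000; i++)
        s += i;
}

static void *worker(void *arg)
{
    long t = (long)arg;
    for (;;) {
        long id = atomic_fetch_add(&next_task, 1);   /* grab next task */
        if (id >= N_TASKS)
            break;
        do_task(id);
        work_done[t]++;
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[N_THREADS];
    for (long t = 0; t < N_THREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        printf("thread %ld completed %ld tasks\n", t, work_done[t]);
    }
    return 0;
}
```

The point of the sketch is that no thread is ever idle while work remains, however unevenly the costs are spread - the property Tera says the MTA provides in hardware without programmer intervention.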

bandwidth

This connection is capable of supporting data transfers to and from memory at the full processor rate in both directions, as are all the connections between the network routing nodes themselves. The 3D torus topology used in MTA systems has eight or sixteen routing nodes per resource module, with the resources sparsely distributed among the nodes. In other words, there are several routing nodes per computational processor rather than the several processors per routing node that many systems employ. As a result, the bisection bandwidth of the network scales linearly with the number of processors.

Just as MTA system bandwidth scales with the number of processors, so too does its latency tolerance. The current implementation can tolerate average memory latency of up to 500 cycles, representing a comfortable margin; future versions of the architecture will be able to extend this limit without changing the programming model as seen by either the compilers or the users.
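One way to see what tolerating 500 cycles of memory latency requires is Little's law: the number of memory references that must be in flight equals the latency multiplied by the rate at which references are issued. The back-of-the-envelope calculation below uses only the 500-cycle figure from the article; the issue rate and the number of outstanding references per hardware stream are illustrative assumptions, not Tera specifications:

```c
/* Back-of-the-envelope latency tolerance via Little's law:
 * references in flight = memory latency (cycles) x issue rate (refs/cycle).
 * Only the 500-cycle latency comes from the article; the other figures
 * are assumptions for illustration. */
#include <stdio.h>

int main(void)
{
    double latency_cycles  = 500.0;  /* average memory latency to tolerate */
    double refs_per_cycle  = 1.0;    /* assumed: one memory reference issued per cycle */
    double refs_per_stream = 8.0;    /* assumed outstanding references per hardware stream */

    double in_flight      = latency_cycles * refs_per_cycle;   /* Little's law */
    double streams_needed = in_flight / refs_per_stream;

    printf("references in flight needed: %.0f\n", in_flight);      /* 500 */
    printf("hardware streams needed:     %.0f\n", streams_needed); /* ~63 */
    return 0;
}
```

Under these assumptions a processor needs on the order of 500 outstanding memory references, spread across a few dozen hardware thread streams, to keep busy while waiting on memory - which is why the architecture leans on a large supply of threads rather than on caches and locality.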


© Oil IT Journal - all rights reserved.