Nvidia Tesla C1060 GPGPU - Double precision
Feb 19th, 2009 by admin
The Nvidia Tesla C1060 board is a 240-core GPGPU which supports massively parallel programming through a threading model. Each of the cores runs at about 1.3GHz and can carry out multiple operations per clock cycle, giving a theoretical peak performance of close to 1 teraflop in single precision. Memory bandwidth is about 100GB/s, and the 30,000 concurrent threads help to hide memory latency. The board uses about 160W, and the cost of a “personal supercomputer” with 4 of these boards is less than £10,000. This is a low-cost, exceedingly high-performance supercomputer.
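The peak figure is just cores times clock times operations per cycle, so it is easy to sanity-check. The sketch below is my own arithmetic rather than anything from Nvidia: the 240 cores come from the text, while the 3 flops per cycle per core (a multiply-add plus a multiply) and the two clock values are assumptions, and `peak_gflops` is a made-up helper name.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS, assuming every core retires
    flops_per_cycle floating-point operations on every clock."""
    return cores * clock_ghz * flops_per_cycle

# Assumption: 3 single-precision flops per core per cycle.
# Clock values are illustrative; quoted figures for Tesla-class
# boards vary between roughly 1.3 and 1.5 GHz.
for clock_ghz in (1.3, 1.5):
    gflops = peak_gflops(cores=240, clock_ghz=clock_ghz, flops_per_cycle=3)
    print(f"{clock_ghz} GHz -> about {gflops:.0f} GFLOPS single precision")
```

Either clock lands in the neighbourhood of 1 teraflop, which is all the headline number really claims. Real code rarely gets near it.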
We are now in the second generation of these boards. The CUDA development language and environment (CUDA actually refers to the whole architecture) is maturing, and there is a good community around it. Nvidia also continues to actively support OpenCL. This second-generation board has more onboard memory and, most importantly for a number of applications, now includes full 64-bit double-precision capability. Sounds good, almost perfect, but what does this mean in reality?
Well, double precision is certainly supported. Each of the 30 multiprocessors contains one DP unit alongside its 8 SP cores, which means that only 30 DP units are available to you if you are doing DP-only calculations. By my calculations that makes this card perhaps up to 8 times faster than an equivalent 4-core CPU (taking into account the CPU's faster clock speed), but benchmarks on DGEMM show much less.
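To make the unit counting concrete, here is a rough sketch. The 240 cores, 8 SP cores per multiprocessor and one DP unit per multiprocessor come from the text. Everything in the second half is an assumption for illustration only: a 1.3 GHz GPU clock with 2 DP flops per unit per cycle, and a hypothetical 3.0 GHz 4-core CPU doing 4 DP flops per core per cycle.

```python
def dp_units(sp_cores, sp_per_multiprocessor=8, dp_per_multiprocessor=1):
    """Count DP units from the SP core count and the per-multiprocessor mix."""
    multiprocessors = sp_cores // sp_per_multiprocessor
    return multiprocessors * dp_per_multiprocessor

gpu_dp_units = dp_units(240)
cpu_cores = 4
unit_ratio = gpu_dp_units / cpu_cores      # raw unit count, clocks ignored

# Hypothetical per-cycle throughput and clocks (assumptions, not measurements).
gpu_dp_gflops = gpu_dp_units * 1.3 * 2     # units * GHz * flops per cycle
cpu_dp_gflops = cpu_cores * 3.0 * 4        # cores * GHz * flops per cycle
peak_ratio = gpu_dp_gflops / cpu_dp_gflops

print("DP units on the card:", gpu_dp_units)
print("unit-count ratio vs 4 cores:", unit_ratio)
print("peak DP ratio under the assumed clocks:", round(peak_ratio, 2))
```

The raw unit count gives the optimistic end of the range. Once you assume a CPU core can retire several DP operations per cycle at a higher clock, the gap narrows a lot, which would fit with the DGEMM benchmarks coming in well below the headline estimate.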
The calculations are strictly 64-bit. You may be aware that on the x86 architecture these calculations are usually carried out in 80 bits and then rounded down to 64, which means that your numbers will almost certainly be different if you run on this card (despite its adherence to the IEEE 754 standard).
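A small example shows why the width of the intermediate results matters. Python cannot do 80-bit arithmetic out of the box, so this sketch uses exact `Fraction` arithmetic as a stand-in for "wider intermediates, rounded once at the end". It is not a bit-exact emulation of x87 behaviour, and the three input values are picked purely to make the effect visible.

```python
from fractions import Fraction

values = [1e16, 1.0, -1e16]

# Strict 64-bit: every intermediate result is rounded to a double.
strict = 0.0
for v in values:
    strict += v

# Stand-in for wider intermediates: accumulate exactly,
# then round to a double once at the very end.
exact = Fraction(0)
for v in values:
    exact += Fraction(v)
wide = float(exact)

print("rounded at every step:  ", strict)   # the 1.0 is lost
print("rounded once at the end:", wide)     # the 1.0 survives
```

Both answers come from IEEE-style arithmetic on the same inputs. They differ only in where the rounding happens, which is exactly the kind of discrepancy you can see when moving a calculation from one platform to another.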
For many applications this won’t matter, but for some it does (finance especially). Of course, you can still get a massive speed-up from the CUDA architecture if you can work out clever ways of exploiting the DP/SP mix, introducing calculation tolerances and a degree of pragmatism about what a floating-point number actually represents. And hopefully the Nvidia team is building more DP units into future versions of the card.
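One common way of exploiting a DP/SP mix is to keep the bulk data in single precision and reserve double precision for the accumulator, then compare against a reference with an explicit tolerance. The sketch below emulates that on the CPU in plain Python. It is not CUDA code, `to_sp` is a helper I made up that rounds through a 32-bit float via `struct`, and the data set and the `1e-6` tolerance are arbitrary choices.

```python
import math
import struct

def to_sp(x):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_all_sp(values):
    """SP inputs and an SP accumulator: rounded to 32 bits at every step."""
    total = 0.0
    for v in values:
        total = to_sp(total + to_sp(v))
    return total

def sum_mixed(values):
    """SP inputs, but the running total is kept in double precision."""
    total = 0.0
    for v in values:
        total += to_sp(v)
    return total

values = [0.1] * 100_000
reference = sum(values)            # all-DP result used as the yardstick
all_sp = sum_all_sp(values)
mixed = sum_mixed(values)

tolerance = 1e-6                   # arbitrary relative tolerance
print("all SP within tolerance:", math.isclose(all_sp, reference, rel_tol=tolerance))
print("mixed within tolerance: ", math.isclose(mixed, reference, rel_tol=tolerance))
```

The point is the pragmatism: you decide up front what "close enough" means, and then spend the scarce DP units only where they buy you that tolerance.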
Watch this space to find out about CUDA developments and the ways in which Nvidia Tesla continues to differentiate itself from multi-core and on-board GPU technology from Intel and AMD.














