Apparently, NVIDIA’s next core, the GT300, set to introduce DirectX 11 support to NVIDIA’s GPU lineup, will feature no fewer than 512 (!) processing cores arranged into sixteen 32-core clusters. In comparison, NVIDIA’s current single-core champion, the GTX 280/285, built on the GT200(b) core, features 240 stream processors in ten 24-core clusters. To put this into perspective: on the 65nm process, the GTX 280’s core was a gargantuan 1.4 billion transistors spread over a 576mm2 die. Here it is compared to a dual-core Intel Penryn (what the current crop of Core 2 Duos are based on):
It’s like comparing any of us to a professional porn star. Image courtesy of AnandTech.
The 55nm GT200b revision found in the GTX 285 shrank this down to 470mm2. The GT300 will reportedly move to a 40nm process for further size reduction and power savings while allowing for higher clock speeds. But it’s not just the number of processors that differs: the GT200 and every previous NVIDIA GPU using the unified shader architecture since the G80 (e.g. the 8800 GTX) used SIMD (Single-Instruction Multiple-Data) units, while the processors found in the GT300 will be MIMD (Multiple-Instruction Multiple-Data).
What this might mean is that while each cluster in the previous architecture (really a group of stream multiprocessors, since each cluster is further divided into SMs with their own caches) could only operate on a single instruction at a time, the stream multiprocessors in the GT300 will be much more versatile, able to work on different instructions from their caches at the same time, asynchronously. The GT200 and its predecessors already had fairly fine granularity, with instructions issued per cluster of stream processors, but the GT300 will take it one step further and achieve granularity on a per-processor basis: potentially, every processor on the GT300 could be executing a different instruction, if the situation called for it. Of course, I’m just pulling this out of my ass based on a few lines from an online rumor report.
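To see why this matters, here’s a toy CUDA kernel (names hypothetical, just a sketch) showing the kind of code that hurts on SIMD hardware. On a GT200-style SIMD multiprocessor, threads in the same group that take different branches get serialized: the hardware runs one side with the other side’s lanes masked off, then swaps. MIMD processors, as rumored for the GT300, could in principle run both paths at once.

```cuda
// Toy kernel: threads pick different work based on their index.
// On SIMD hardware, the "if" and "else" sides below are executed
// one after the other within a processor group, with the inactive
// lanes sitting idle each time. On a MIMD design, each core could
// fetch and execute its own path concurrently.
__global__ void divergent(float *out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid % 2 == 0)
        out[tid] = sinf((float)tid);   // even threads do one computation
    else
        out[tid] = cosf((float)tid);   // odd threads do another
}
```

The more your threads disagree about which branch to take, the bigger the SIMD penalty, which is exactly the kind of workload mixing (graphics plus physics plus whatever else) that a MIMD design would handle more gracefully.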
What this means for your gaming is better load balancing between the different computations needed to render bleeding-edge graphics, such as pixel and vertex shading. With the advent of on-GPU physics through NVIDIA’s PhysX, this becomes even more important (bouncing boob physics!): the GPU has to divide its attention between rendering graphics and calculating physics, to deliver all the jiggling and flopping around you could desire at the fastest framerates.
What this might mean for general-purpose computing on GPUs and for CUDA or OpenCL applications is finer control over how the GPU issues and executes threads. Currently, calling a CUDA-enabled function to run on the GPU means issuing it in blocks of threads, which the GPU then manages on its own, giving the programmer only abstract thread IDs and block-level synchronization instructions to work with. Since all the clusters across the GT300 will be identical, the programmer might now be able to work within a cluster, perhaps ordering different processors in the cluster to execute different functions, maybe through something like a processor ID, while the GPU still maintains control over which cluster a batch of instructions gets issued to.
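For reference, this is what the current CUDA model looks like in practice (a minimal sketch, with hypothetical names; the rumored per-processor control has no API today, so it isn’t shown). The programmer launches a grid of identical thread blocks and gets only abstract IDs and a block-level barrier; which multiprocessor runs which block is entirely the GPU scheduler’s business.

```cuda
// The existing CUDA abstraction: identical blocks of threads,
// identified only by blockIdx/threadIdx, scheduled by the GPU.
__global__ void scale(float *data, float factor, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n)
        data[tid] *= factor;
    __syncthreads();  // the only sync the programmer gets: a barrier
                      // across threads in the SAME block, nothing more
}

int main()
{
    const int n = 1024;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    // 4 blocks of 256 threads; where each block runs is up to the GPU.
    scale<<<4, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The speculation above amounts to poking a hole in this abstraction: letting the programmer address individual processors within a cluster rather than treating every block as interchangeable.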
Since this is all rumor, though, who knows what final product NVIDIA has prepared; it is also rumored to appear in Q4 2009. This particular tidbit about MIMD processors on the GT300 comes from TechConnect Magazine.