



NVIDIA SLI Interview


A Few Words from Chris Daniel


by Josh Walrath

     Recently I was given the opportunity to submit a handful of questions to the product marketing manager of NVIDIA's SLI Program.  Chris Daniel was kind enough to get these in-depth answers back to me in a very timely manner.  My questions are in regular type, and Chris' are in blue and italics.

For all the ladies reading this (all three of you), this is Chris.

The AGP 3.0 specification supported two AGP slots. Why did NVIDIA never pursue dual AGP video solutions in terms of both video cards and motherboard chipsets?

AGP did not provide the bandwidth to do this effectively. The high-bandwidth bus architecture offered by PCI Express enabled NVIDIA to launch its SLI technology and reach scaling of up to 2X. 

Can you quickly take us over the advantages and disadvantages of SLI as compared to a supertiling solution?

NVIDIA has been working on SLI technology for many years.  In fact, NVIDIA has been designing scalability logic into their GPUs since GeForce 3.  However, an effective multi-GPU technology is non-trivial, requiring extensive research & development in both hardware and software. NVIDIA designed SLI technology with the flexibility to support multiple rendering algorithms, offering optimal performance for different types of applications.  Our two primary algorithms are dynamic split frame rendering (SFR) and alternate frame rendering (AFR), but there are others.

Supertiling is another algorithm that may be used to scale performance. The problem with supertiling is that it isn’t able to scale geometry like AFR can. Supertiling also has a problem with over-fetching textures (as did scanline interleaving) because of the number of textures that cross the tiling boundaries.  This means that each GPU would have to fetch the same texture for neighboring tiles.  In SFR, we have one edge where textures can cross boundaries.  In supertiling, you have many, many more edges and the problem is multiplied linearly with the number of edges. These double-fetched textures can eat up valuable bandwidth.
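To put the "one edge versus many edges" point in numbers, here is a minimal sketch that counts the seams between regions owned by different GPUs. It assumes two GPUs, a checkerboard assignment of tiles for supertiling, and a hypothetical 8x8 tile grid; none of those details come from the interview, and the sketch counts seams only, not their length or the textures that actually cross them.

```python
def sfr_seams() -> int:
    """A frame split between two GPUs has a single seam between the regions."""
    return 1


def supertile_seams(cols: int, rows: int) -> int:
    """Count tile edges shared by different GPUs in a checkerboard layout.

    Assumption: tiles alternate between two GPUs like a checkerboard, so every
    pair of horizontally or vertically adjacent tiles belongs to different
    GPUs and every interior tile edge is a seam.
    """
    vertical_seams = rows * (cols - 1)    # seams between left/right neighbours
    horizontal_seams = cols * (rows - 1)  # seams between top/bottom neighbours
    return vertical_seams + horizontal_seams


if __name__ == "__main__":
    # Hypothetical grid size, chosen only for illustration.
    print("SFR seams:", sfr_seams())
    print("Supertiling seams (8x8 grid):", supertile_seams(8, 8))
```

Under these assumptions, any texture that straddles a seam may be fetched by both GPUs, which is why the seam count is a rough proxy for the duplicated fetches described above.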

When developing SLI technology, NVIDIA investigated a number of different algorithm options. In fact, NVIDIA looked at supporting supertiling as well; however, it simply wasn’t the optimal choice for delivering the highest scaling performance.

Geometry scaling is a buzzword that has been making the rounds lately. Can you quickly explain what geometry scaling entails and how SLI addresses it? In comparison, how would a supertiling architecture address this issue?

NVIDIA’s AFR algorithm scales geometry processing since the graphics cards work in parallel, each processing the entire geometry for alternate frames. AFR also scales pixel shading and fill rate. This is the reason why NVIDIA SLI has been able to achieve 2X scaling on the latest graphics-intensive applications such as 3DMark05 and Unreal Engine 3. Supertiling, in its current implementation, divides one frame of geometry into separate tiles.  Unlike AFR, where dedicated GPU geometry processing occurs for each frame, supertiling is unable to send ONLY the relevant geometry to each of the pipes.  With supertiling, each GPU is forced to do all of the geometry transform calculations, resulting in no geometry scaling.  The result is a less-efficient architecture whereby each GPU is forced to spend extra cycles performing the entire geometry processing.
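The geometry argument can be illustrated with a simple work-counting model. This is a sketch under stated assumptions, not a description of any driver: it treats geometry cost as proportional to the number of vertices transformed, assumes the frame count divides evenly among the GPUs, and uses hypothetical function names and workload figures.

```python
def geometry_work_per_gpu(vertices_per_frame: int, frames: int,
                          gpus: int, mode: str) -> int:
    """Return how many vertices one GPU transforms over `frames` frames."""
    if mode == "afr":
        # Each GPU renders every `gpus`-th frame and transforms the full
        # geometry of only those frames (assumes frames % gpus == 0).
        return vertices_per_frame * (frames // gpus)
    if mode == "supertiling":
        # Every GPU transforms all of the geometry of every frame, because
        # the geometry cannot be sorted per tile before it is transformed.
        return vertices_per_frame * frames
    raise ValueError(f"unknown mode: {mode}")


def geometry_scaling(vertices_per_frame: int, frames: int,
                     gpus: int, mode: str) -> float:
    """Geometry speedup relative to a single GPU doing all the work."""
    single_gpu_work = vertices_per_frame * frames
    return single_gpu_work / geometry_work_per_gpu(
        vertices_per_frame, frames, gpus, mode)


if __name__ == "__main__":
    # Hypothetical workload: one million vertices per frame, 60 frames.
    for mode in ("afr", "supertiling"):
        print(mode, geometry_scaling(1_000_000, 60, 2, mode))
```

In this model, two GPUs in AFR each do half of the total geometry work, while in supertiling each GPU does all of it, which matches the "no geometry scaling" point made in the answer.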

While AFR can easily balance the loads (since each frame is alternated) on two video cards, how much of a driver overhead exists when performing SFR and figuring out which card should render which portion of the frame?

None. NVIDIA's patented dynamic SFR algorithm is free from driver overhead.
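The interview does not describe how the dynamic split is chosen. Purely as an illustration of what "dynamic" load balancing in SFR could look like, the sketch below moves the split line based on how long each GPU took on the previous frame. The function name, the `gain` parameter, the clamping limits, and the feedback rule itself are all assumptions; this is not NVIDIA's patented algorithm.

```python
def rebalance_split(split: float, time_top: float, time_bottom: float,
                    gain: float = 0.5) -> float:
    """Return a new split fraction for the next frame.

    `split` is the fraction of the frame height given to the GPU that renders
    the top region; the other GPU renders the rest. `time_top` and
    `time_bottom` are the render times each GPU measured on the last frame.
    """
    # Estimate how much of the screen each GPU covers per unit of time.
    rate_top = split / time_top
    rate_bottom = (1.0 - split) / time_bottom
    # The split that would have equalised the two render times.
    ideal = rate_top / (rate_top + rate_bottom)
    # Move only part of the way there to avoid oscillating frame to frame.
    new_split = split + gain * (ideal - split)
    # Keep both GPUs busy: never hand either one less than 10% of the frame.
    return min(0.9, max(0.1, new_split))


if __name__ == "__main__":
    # Hypothetical timings: the top half took twice as long as the bottom.
    print(rebalance_split(0.5, time_top=20.0, time_bottom=10.0))
```

When the top region is the more expensive one, as with a detailed skyline over a plain floor, a rule like this shrinks the top GPU's share on the following frame.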

What advantages does the over-the-top connector in SLI bring to the architecture?

The SLI connector enables inter-GPU communication of up to 1GB/s, improving performance by consuming no bandwidth over the PCI Express bus. The connector also provides perfect, digital compositing of pixel data that is asynchronous with the rendering rate, eliminating the need for a separate compositing chip or separate video mixing box/card.
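For a sense of scale, the following back-of-the-envelope calculation estimates how many frames per second of finished pixel data a 1GB/s link could carry. The resolution, the 32-bit color depth, reading 1GB as 10^9 bytes, the link running at its peak rate, and the assumption that only half of each frame crosses the link are all illustrative choices, not figures from the interview.

```python
def frames_per_second_over_link(width: int, height: int, bytes_per_pixel: int,
                                fraction_transferred: float,
                                link_bytes_per_second: float) -> float:
    """Upper bound on frame rate if the link carries only final pixel data."""
    bytes_per_frame = width * height * bytes_per_pixel * fraction_transferred
    return link_bytes_per_second / bytes_per_frame


if __name__ == "__main__":
    # Assumed values: 1600x1200, 4 bytes per pixel, half of each frame sent
    # from the second GPU to the first, and a 1e9 bytes/s link.
    fps = frames_per_second_over_link(1600, 1200, 4, 0.5, 1e9)
    print(f"{fps:.1f} frames per second")
```

Under these assumptions the link could move the composited pixels for roughly 260 frames per second, and none of that traffic would touch the PCI Express bus.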






Copyright 1999-2005 PenStar Systems, LLC.