May 15, 2007

Post HD 2900 Launch Thoughts - Josh

Where exactly does one start?  Well, first off I am excited to eventually get my hands on one (or two) of these cards.  That may be in a couple of weeks, but it will eventually happen (allocation was tight for launch).  I think it is a good competitor for NVIDIA for the price point, and it certainly gives the 8800 GTS a run for its money.  But where exactly did AMD go wrong, and where did they go right?

I think the first obvious problems are heat production, power consumption, and noise.  The second set of issues is that a 700 million + transistor chip running at 740 MHz is only competitive with the 8800 GTS, which is a 680 million transistor chip with portions of it disabled and the primary core running at 500 MHz.  Admittedly, the 96 scalar units on the GTS are running at 1.2 GHz, but the potential of the HD 2900 XT should eclipse the 8800 GTS.  Now, to give the AMD part a fair shake, NVIDIA did offload the display portion of the chip into the NVIO1, so when considering those numbers the overall transistor counts of both solutions are very close together.  So why is the HD 2900 XT lagging so far behind the 8800 GTX in most applications, especially when we consider that massive 106 GB/sec of memory bandwidth?

The easy and most obvious answer is that NVIDIA has created a more efficient architecture that is able to do more work per clock.  There are fewer scalar units, but they are clocked nearly twice as fast.  These 128 scalar units are arranged in 8 SIMD units for the 8800 GTX, while ATI puts all 320 of their units in 4 SIMD units.  This gives NVIDIA finer granularity, and with their thread dispatch processor they can seemingly keep these units a lot busier than ATI can.  ATI clusters their scalar units into groups of five, and only one unit in each group can do more complex functions than MADD.  NVIDIA on the other hand can do more complex instructions with each scalar unit, even though there are two and a half times the number of units in the HD 2900 XT running at well over half the speed of the NVIDIA units.  In pure GFLOPS the HD 2900 XT has the 8800 GTX buried, but that is obviously not the case in real-world performance.
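To put a rough number on the "pure GFLOPS" comparison, here is a small Python sketch.  It uses the unit counts and clocks quoted in these posts (320 units at 742 MHz, 128 units at 1.35 GHz) and assumes every unit retires one MADD (2 floating point operations) per clock.  That assumption is mine and ignores any other instructions the chips can issue, so treat the output as a ballpark only.

```python
# Peak MADD throughput: units x clock (GHz) x 2 flops per multiply-add.
# Unit counts and clocks are the figures quoted in these posts; counting
# only one MADD per unit per clock is a simplifying assumption.
FLOPS_PER_MADD = 2


def peak_gflops(scalar_units, clock_ghz):
    """Theoretical peak in GFLOPS if every unit does one MADD per clock."""
    return scalar_units * clock_ghz * FLOPS_PER_MADD


hd2900xt = peak_gflops(320, 0.742)
gtx8800 = peak_gflops(128, 1.35)

print(f"HD 2900 XT: {hd2900xt:.1f} GFLOPS")
print(f"8800 GTX:   {gtx8800:.1f} GFLOPS")
print(f"ratio:      {hd2900xt / gtx8800:.2f}x")
```

On paper that is a sizable lead for the HD 2900 XT, which is exactly why the real-world results are so puzzling.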

AMD's R600 can be considered a VLIW (very long instruction word) architecture.  As such there are only a few things that AMD can do internally to accelerate this process.  The R600 features a command processor and a setup engine to get the work going, and to schedule the pixel/vertex/geometry instructions into threads that the multiple SIMD units can handle.  The "Ultra Threaded Dispatch Processor" then sends these threads to the scalar groups.  The primary problem here is that it is very hard to keep all 320 of these scalar processors busy all the time.  In worst case scenarios only 64 of the scalar processors (one per group of five) could be in use.  Contrast this with the G80, where the granularity is so fine that nearly all of the 128 scalar units are busy all the time.  In certain situations many of the threads being sent to the scalar units are identical, so the chip is doing redundant work at times.  But at least those SPs are being kept busy all the time.  In an "all MADD" environment the HD 2900 XT has a huge performance advantage over the 8800 GTX, but no application that I know of is composed of 100% MADD instructions.  This is admittedly a gross simplification, and there are a lot of things going on at the chip level that we simply do not know, or I simply do not have time to cover here.
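The utilization problem can be sketched with a toy model in Python.  It treats the R600 as 64 groups of five lanes and asks how many scalar operations get issued if the compiler manages to fill a given number of lanes per group, then compares that to a G80 with all 128 units busy.  The model, the function name, and the "ops per nanosecond" unit are my own simplifications and not anything AMD or NVIDIA has published.

```python
# Toy model of VLIW lane packing on the R600 (illustrative only).
GROUPS = 64          # 320 scalar units / 5 units per group
LANES = 5            # scalar units per group
R600_CLOCK = 0.742   # GHz, shader units run at core speed
G80_UNITS = 128
G80_CLOCK = 1.35     # GHz, scalar unit clock on the 8800 GTX


def r600_ops_per_ns(packed):
    """Scalar ops issued per nanosecond if each group fills `packed` lanes."""
    return GROUPS * packed * R600_CLOCK


# Assumes every G80 unit is busy, which the post says is nearly the case.
g80_ops_per_ns = G80_UNITS * G80_CLOCK

for packed in range(1, LANES + 1):
    busy = GROUPS * packed
    rate = r600_ops_per_ns(packed)
    print(f"{packed} lanes filled: {busy:3d} units busy, "
          f"{rate:6.1f} ops/ns, {rate / g80_ops_per_ns:.2f}x G80")

# Average lanes per group the R600 must fill to match a fully busy G80.
break_even = g80_ops_per_ns / (GROUPS * R600_CLOCK)
print(f"break-even packing: {break_even:.2f} of {LANES} lanes")
```

In this simplified picture the R600 has to fill well over half of its lanes on average just to draw level, which puts the burden squarely on the compiler.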

Because of the way the R600 is set up, the realtime compiler in the driver has to do a lot of work to make sure the workload is efficiently managed.  At first one would think because of this the driver would be heavily CPU dependent, and to a certain degree it is.  AMD has offloaded much of this work from the CPU to the GPU in DX10.  Still, the quality of the compiler will directly affect the overall performance of the chip.  AMD is working quite hard to improve the compiler, and they are hoping to see the performance from their cards steadily improve over the next few months.

Another area where I think AMD has a slightly skewed design philosophy is their render backends and texture units.  The R600 features 16 of each, set up in groups of four.  Now, these are not the ROPS and texture units of old; they handle a lot more work per cycle than those in the previous X800 and X1x00 chips.  AMD obviously thought that these would be more than sufficient given the clockspeed of the chip.  I have a nagging suspicion that these could be a major bottleneck in most current applications.  For quite some time the engineers at ATI have had a bias that leans towards more arithmetic functions vs. traditional pixel and texture operations.  We see this really start to take hold with the X1900 and X1600 chips, which had far more math units than traditional ROPS and texture units.  While shader effects are becoming more and more common, I still believe that NVIDIA has a better mix of these units in their lineup.  The 8800 GTX has 24 ROPS and 32 texture units.  While the functionality of each architecture cannot be 100% directly compared, the overall amount of work per clock definitely favors NVIDIA, even with the significantly slower core clock speed that NVIDIA's units are running at.
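A naive way to see the imbalance is to multiply unit counts by core clock.  The Python sketch below does that for ROPS and texture units.  The 575 MHz core clock for the 8800 GTX is not mentioned in this post and is an assumption on my part, and the model pretends each unit does the same work per clock on both chips, which, as noted above, is not strictly true.

```python
# Naive peak rates: units x core clock. Real ROPs and texture units differ
# in capability per clock, so this only compares raw counts and clocks.
# ASSUMPTION: the 575 MHz 8800 GTX core clock is not stated in this post.
CARDS = {
    "HD 2900 XT": {"rops": 16, "texture_units": 16, "core_mhz": 742},
    "8800 GTX": {"rops": 24, "texture_units": 32, "core_mhz": 575},
}


def giga_per_second(units, core_mhz):
    """Units x MHz gives millions per second; divide by 1000 for billions."""
    return units * core_mhz / 1000.0


rates = {
    name: {
        "pixels": giga_per_second(card["rops"], card["core_mhz"]),
        "texels": giga_per_second(card["texture_units"], card["core_mhz"]),
    }
    for name, card in CARDS.items()
}

for name, r in rates.items():
    print(f"{name}: {r['pixels']:.1f} Gpixels/s, {r['texels']:.1f} Gtexels/s")
```

Even with the higher clock, the R600 comes out behind on both counts in this crude comparison, with the larger gap on the texture side.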

In the 8800 series NVIDIA took the huge step of going full custom with designing their stream processors.  This means that standard cells are thrown out the window, and the units are essentially hand laid out.  The electrical properties of these units are finely tuned, and as such can run at very high speeds without eating a lot of juice.  On the 8800 GTX 128 SPs are running at 1.35 GHz, and the majority of them are kept busy most of the time.  Because a very large portion of the G80 is custom, the chip overall consumes less power than the R600, which has very few custom parts by comparison, even though those custom parts run at high speeds.  The basic design decisions NVIDIA made years ago have paid off greatly in this generation of products.  Moving towards more custom parts in the GPU is a trend we will see continue, and one that is definitely needed.  Hopefully in the R700 series from AMD we will see more custom designs.

Even though there are a lot of negatives in the HD 2900 XT, I still like what AMD has done.  It is a feature packed part, and it is at least competitive overall in its price range.  I like the idea of more programmable AA, better video decode, and the GPGPU features.  What I don't like is the heat, power, and noise.  Once I get my hands on one I will be better able to refine my opinions, and hopefully give a good enough answer to help those that are on the fence with their buying decisions.  Still, it is good that AMD finally was able to come to market with a card and give NVIDIA some competition in the DX10 space.

May 14, 2007

AMD Releases the HD 2000 Series - Josh

AMD is announcing the HD 2000 series of products.  These include the HD 2400, HD 2600, and HD 2900 series.  Only the HD 2900 XT will be available today for hard launch, as the other two families of products based on the 65 nm RV610 and RV630 chips will be showing up around the first week of July.

Today's launch will showcase AMD's new high end offering in the 3D world.  The HD 2900 XT is based on the R600 chip which is fabricated on TSMC's 80HS process (speed enhanced 80 nm).  This chip features a host of new, as well as improved technologies.  The first major point is that the chip features 64 five-wide shader groups (spread across 4 SIMD units) comprising a total of 320 scalar units.  This is a unified shader design, so these units can handle pixel, geometry, and vertex data.  The chip itself is made of 700 million + transistors running at a blistering 742 MHz.  It does not feature multiple clock domains like the NVIDIA G8x series, so the shader units run at core speed.  It also features 16 improved ROPS which increase the performance of HDR + AA as compared to the earlier X1800 and X1900 models.  These also feature an improved AA unit which allows upwards of 24X AA.  16 texture units are also present, and these are also improved from the previous versions.  The jump in texture filtering quality is not as large over the older X1800 and X1900 models, but it certainly puts it on a level with NVIDIA's latest 8x00 series.

The HD 2900 XT is also the first consumer 3D graphics chip with a 512 bit memory interface.  The GDDR-3 memory runs at 825 MHz (1.65 GHz effective) for a mind-numbing bandwidth of 106 GB/sec.  Internally it sports a 2 x 512 bit ring bus, giving a kilobit of bi-directional communications with 4 ringstops and 1 PCI-E memory controller.  Internal data bandwidth is simply tremendous with this chip.  Currently the 2900 XT features 512 MB of memory.  We were looking forward to a full 1 GB of GDDR-4, but due to decisions internally the retail market will most likely not see a card sporting that specification.  The primary reason for that is price/performance.  To achieve the price AMD was looking to hit, it could not afford a full GB of fast GDDR-4.  AMD did consider making the higher end HD 2900 XTX which would feature GDDR-4 memory, but when looking at overall performance of this high end card against the 8800 GTX and Ultra, AMD felt that most consumers would shy away from a $600 HD 2900 XTX.  This does lead us to the sweetest point of this release, and that is the price of the HD 2900 XT.  AMD is offering this product at a mouth-watering $399 US.  Performance primarily looks to be somewhere between the 8800 GTS and 8800 GTX, and at times it proves faster than the 8800 GTX.  Overall I believe consumers will view this as a very good deal, especially when there is talk of the price of the board going down to $350 in pretty short order.
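For those wondering where the 106 GB/sec figure comes from, it is simply the bus width times the effective memory data rate.  A quick Python sketch of that arithmetic follows (the function name is mine, and "GB" here means 10^9 bytes):

```python
def bandwidth_gb_s(bus_bits, effective_ghz):
    """Peak memory bandwidth: bytes per transfer x billions of transfers/s."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * effective_ghz


# 512 bit interface, 825 MHz GDDR-3 at double data rate = 1.65 GHz effective
hd2900xt_bw = bandwidth_gb_s(512, 1.65)
print(f"HD 2900 XT: {hd2900xt_bw:.1f} GB/sec")
```

That works out to just under the quoted 106 GB/sec, which is the rounded figure.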

ATI/AMD has produced a very good part, but unfortunately for them the NVIDIA GeForce 8800 series has been out for six months, and NVIDIA's top end product is still faster overall.  The HD 2900 XT is definitely not perfect, as it eats power (upwards of 215 watts), runs hot (80C + when running games), and has a cooling fan that can spin up to annoying sound levels.  On the positive side it features improved Avivo functionality (scores near perfect in HQ DVD Standard and HD benchmarks) and has HDMI output with full audio content (but will not replace your soundcard).  It certainly does make for a fascinating product at the $400 US level.  Overall I consider it a very exciting product, and one that should sell well for AMD in the near future.

The RV610 and RV630 are both 65 nm parts that are expected around the first of July.  These are based on the same architecture, but cut down a bit.  The RV610 has a total of 40 scalar units while the RV630 has 120 units.  They also feature cut down ring busses, fewer ROPS and texture units, and 64/128 bit memory busses.  Indications around the industry though are pointing to the RV630 based HD 2600 XT as being potentially the fastest midrange card for this generation.  Time will tell, and of course AMD needs to get those chips out in volume before the back to school season for integration into product lines.

Is history repeating itself?  It seems like a fair question.  If we remember back to the X1800 days, ATI was supposed to bring that product out in Spring of 2005, but it slipped to October.  This product was then replaced by the much improved X1900 series just a few months later.  Now we have the HD 2900 XT at the top end, based on the R600 chip which is itself very late.  Undoubtedly AMD wants to even up performance as soon as possible, and the whispers of the 65 nm R650 chip arriving in August are fresh on the wind.  So it looks like the HD 2900 XT may be following in the footsteps of the X1800 XT a little more closely than we would have liked, but at least the card is highly competitive from a price perspective with NVIDIA's latest products.  So for right now, a $399 HD 2900 XT is a very fascinating product for consumers.

Here is a list of a few reviews:

Tech Report

Guru of 3D

Bjorn3D

Driver Heaven

May 11, 2007

AMD Strutting Tech - Josh

This week in the Bay Area AMD held a private gathering of select journalists and other guests to show off a few nice tidbits.  For the first time, AMD showed working Barcelona chips in both 1S and 2S configurations to the public, and had them running actual applications.  Apparently the 2S (which has a total of 8 cores) was able to encode a 1080p stream in near realtime, which apparently is quite a feat.  Not being an encoding expert myself, I do realize that often a DVD rip/encode of a 480p disc can take upwards of 4 to 8 hours on a single core Athlon 64 machine running at 2.2 GHz, depending on how it is encoded (or at least one of my friends tells me).

In terms of product release, we are expecting AMD to officially announce these new cores in the middle of Summer.  Product will be available for servers and some workstations by the end of Summer, along with the potential Agena FX (to be named Phenom FX) as the high end gaming/enthusiast platform.  Native quad core desktops will not be reaching the public until December though, which is a bit disappointing.  Kuma, the dual core variant, will likely be available around that time as well.  While AMD's new Barcelona based chips are almost ready for mass production, it will be a while before AMD can ramp up production and at the same time slow down production on 65 nm Athlon 64 X2s.

The second interesting piece of tech they showed off was a 45 nm wafer made in Dresden.  This wafer contained dies composed of SRAM and logic, but it was not made clear what exactly these chips were for.  Previously all 45 nm wafers that AMD showed off were actually produced by IBM in their East Fishkill Fab.  So, apparently Dresden now has the ability to make 45 nm test wafers.  AMD has also stated that they will start 45 nm production at the beginning of 2008, and have actual product available in the middle of 2008.  This means that AMD is working hard to close the nm gap between themselves and Intel.  Previously AMD was around 14 months behind Intel with the jump to 65 nm production, now hopefully they can get that figure down to 6 months or so.  This is one area where AMD absolutely needs to compete, as Intel finally has a core architecture that is competitive with what AMD can put out.

It would have been nice to see AMD be able to offer their Barcelona, Agena FX, Agena, and Kuma products by the end of this summer, as it would have given them a nice window of opportunity to compete with Intel and potentially stop the marketshare loss.  Perhaps more importantly they would have been able to increase their margins and start digging themselves out of the financial hole they are finding themselves in.  Many are expecting Q2 results from AMD to be not as bad as the previous quarter, as Q1 suffered largely because of AMD stuffing the channel with processors during Q4, which led to far lower demand in Q1.  There is also the ATI portion finally shipping out good quantities of R600 silicon to its partners.  Unfortunately, the cash cow of ATI graphics, the RV6x0 series, will not be available for some time yet.  That will come in as Q3 revenue, but from what we are hearing the 65 nm RV6x0 series has already had great interest from OEMs and SIs.  These products could be the short term saving grace for AMD, but we won't know that until we see actual product tested and for sale.

Apparently AMD really got down to the nitty gritty of the combination of GPUs and CPUs, and what they are starting to say really made me think that I could have hit the nail on the head with this article.

May 9, 2007

Optimism on AMD's Barcelona and Overall Situation - Josh

Last year I wrote an article about what I thought was exuberant optimism pertaining to Intel's upcoming Conroe.  It turns out I was wrong and the Core 2 Duo was a tremendous performer.  Now I am tempted myself to start talking optimistically about AMD's upcoming Barcelona based processors!  The news that is slowly starting to leak out is very positive about what we can expect.  In the past few years AMD has rarely talked about the performance of their upcoming chips, but they are starting to talk more and more about Barcelona.  This is rather uncharacteristic of AMD, but it is something that they probably should have been doing a while ago.  Of course, management probably took the safe route and decided not to hype it up, especially if it came out to be a dog in terms of performance and upper limit clockspeed.

Now that AMD is sitting on B0 samples of their native quad core processors, the news leaking out is that AMD is just plain giddy about not only the performance, but the expanded clockspeed envelope that they are supposedly seeing.  Leaked AMD roadmaps have shown the first Barcelona based cores coming out at 2.3 to 2.5 GHz.  Now we are hearing that AMD could be considering products clocked upwards of 2.9 GHz.  With reported increases in performance of up to 40% over similarly clocked Intel quad core processors in certain applications, we are starting to see that AMD might be very competitive in overall performance if they are able to release Barcelona based processors up to 3 GHz, and in fact perhaps surpass Intel at the high end since the highest clocked Core 2 Quad from them is 2.93 GHz (QX6800).

The issue that AMD is going to run into is that Intel has a very healthy 45 nm process and a processor nearly ready for production on that node.  Penryn and its stablemates are optimized versions of the Core 2 Duo, and as such show around a 5% to 10% increase in overall performance (sometimes much greater due to SSE4 optimizations).  These processors will also have an expanded clock envelope, and we should see parts going to 3.4 GHz in fairly short order.  Intel is supposedly looking at a late Q4 release of these new processors, with wide availability in Q1 2008.  AMD has a 6 month window of opportunity to get their Barcelona based quad cores out in force, and perhaps more importantly they need to get their dual core Kuma based cores out to the OEM and retail space.  If they are able to do this then it could potentially stop their marketshare and mindshare slide, improve their overall margins, and increase the amount of shipping product.

Some time ago I wrote about how AMD was likely to expand its contract relationship with Chartered, and when considering the sizes of the native quad core processors, it is looking like I was correct.  AMD is very likely to send the current Athlon 64 X2 (Rev. G) production to Chartered, and concentrate wholly on getting Barcelona, Agena FX, Agena, and Kuma processors out the door in short order.  AMD will want to produce these larger and more complex parts at their Dresden Fabs to ensure quality control.  Chartered will then help support the channel with the cheaper X2 parts clocking anywhere from 1.9 GHz to 2.5 GHz.  I really doubt Chartered's products will reach 3.0 GHz reliably, not so much because they couldn't tweak their process, but rather because AMD will not spec their 65 nm X2 lineup to go that high.

AMD is in a budget crunch as well, and to improve their cash situation they are slowing the conversion of Fab 30 to Fab 38.  This means that Fab 30/38 (which is the same building) will not be going to 300 mm wafers as soon as expected, and AMD will be tailing off 90 nm production from that Fab.  At this time it is unknown if AMD will transition Fab 30/38 to 65 nm this year, but if they do it may likely still utilize 200 mm wafers.  At the same time AMD is trying to sell off their 200 mm gear to improve their cash position.  More details about AMD's transitioning of Fab 38 will be given in the next quarterly call.  The extra $2.2 billion that AMD raised through the selling of senior notes will certainly help out their situation, but that is merely more long term debt that AMD eventually has to cover.

This summer could see a big turnaround for AMD as they are looking to be on the verge of releasing their new processor architecture.  If they can successfully transition their high end desktop and server lineup to these cores, they could stop their marketshare slide and really improve their financial situation.  On the ATI side they are close to releasing their next generation architecture, and by the end of summer it looks like the graphics portion of AMD could start making some serious money.

The next four weeks should be very interesting with news and reviews coming from AMD.

May 1, 2007

Super Talent T800UX2GC4 Memory Review

Let's start May off right, shall we?  Today I have posted my review of the Super Talent T800UX2GC4 DDR-2 memory kit.  This stuff features some of the lowest timings of DDR-2 800 DIMMS on the market, and they are able to be bought at rock bottom prices.  See how these puppies perform at stock and overclocked!

Super Talent has been around for 20+ years now, but until the past two years it was a company that catered mainly to OEMs and SBs.  Super Talent obviously saw the success that Corsair and OCZ experienced in the retail and enthusiast market, and it had all the tools needed to jump into the fray.  Marketing was a problem for Super Talent though, and to rectify the situation they hired Joe James of Tyan and Corsair fame.  Joe has certainly worked hard to put the name Super Talent out there, and it just so happens that the marketing efforts were not all sound and fury.  Super Talent actually had some good products to push forward into this very competitive market.

You can read the entire article here.

You can also sign up for a contest at NVNews which is offering these exact DIMMS to 3 lucky winners.  If you want really nice DIMMS that will perform well at stock speeds as well as overclocked, then don't delay and click here.

April 18, 2007

Post 8600 Release Thoughts - Josh

I am in line to eventually get a few of these cards for SLI testing, so at that time I will be able to make some definitive conclusions from hands-on testing.  Until that point though, I am left reading reviews like the rest.  I have come to some interesting conclusions so far after reading a handful of reviews from different members of the press.  If I were to make one pretty certain comment, it would be "DX10 is an expensive jump."

So what exactly does that mean?  From my point of view it means that the number of transistors needed to ensure not only DX10 functionality, but also solid performance, is set pretty high considering the process technology these chips are made from.  The G84 chip is comprised of 289 million transistors, which is in line with the previous generation high end chip, the G71 (7900 GTX).  The good news here is that NVIDIA used TSMC's 80 nm process, so they were able to shrink the die size as compared to the 90 nm products.  The G84 is approximately 169 square mm, while the older G71 is 196 square mm.  As a comparison though, the G73 (7600 GT) was a mere 127 square mm.  So, the G84 fits nicely between those two products.  Now we compare that to the G80, which is a monster at 681 million transistors taking up an area of approximately 420 square mm!  Things aren't looking so bad for the G84 in that context.
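To make the die size comparison a bit more concrete, here is a short Python sketch that works out transistor density and relative die area from the figures above.  Only the G84 and G80 are included, since those are the two chips with both a transistor count and an area quoted here.  The dictionary layout and function name are just for illustration.

```python
# Transistor counts (millions) and die areas (square mm) as quoted above.
CHIPS = {
    "G84": {"transistors_m": 289, "area_mm2": 169},
    "G80": {"transistors_m": 681, "area_mm2": 420},
}


def density(name):
    """Millions of transistors per square mm."""
    chip = CHIPS[name]
    return chip["transistors_m"] / chip["area_mm2"]


for name in CHIPS:
    print(f"{name}: {density(name):.2f} M transistors per square mm")

area_ratio = CHIPS["G80"]["area_mm2"] / CHIPS["G84"]["area_mm2"]
transistor_ratio = CHIPS["G80"]["transistors_m"] / CHIPS["G84"]["transistors_m"]
print(f"G80 vs G84: {area_ratio:.2f}x the area, "
      f"{transistor_ratio:.2f}x the transistors")
```

So the G80 is roughly two and a half times the die for a little under two and a half times the transistors, which shows how much silicon the midrange part is saving.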

So it seems it was a supreme balancing act that NVIDIA had to accomplish to have a successful release of its 8600 series.  Due to their decision to go with an 80 nm process they did not have the transistor budget that a 65 nm process would have allowed.  We also have to throw in board level costs, which is why NVIDIA went with a 128 bit memory bus running very fast GDDR-3 memory.  I am pretty sure that when cheap GDDR-4 hits the market in quantity, we will see the G84 chips paired with that memory tech as GDDR-3 is phased out, which will further reduce costs on these boards.

So was NVIDIA successful?  Apparently the 8600 GTS performs pretty much on par with the X1950 Pro, and slightly slower overall than the X1950 XT.  Where the 8600 seems to take a hit is with anti-aliasing.  While 32 GB/sec of bandwidth is impressive from a historical standpoint, it is still quite a bit lower than the 47 GB/sec that an X1950 Pro delivers.  Where the 8600 really shines though is in shader heavy content where its very efficient architecture oftentimes overshadows the bulkier X1950 series.  Another aspect is the PureVideo 2 engine, which eventually (hopefully) will be brought 100% online and nicely accelerate HD content across Windows XP and Vista platforms.
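The bandwidth gap is easier to appreciate if you turn the numbers around and ask what memory speed a 128 bit bus would need.  The Python sketch below does that using only the figures quoted in this post.  The function name is mine, and "GT/s" (billions of transfers per second) is the same thing as the "GHz effective" figure usually printed on spec sheets.

```python
def required_data_rate(bus_bits, target_gb_s):
    """Effective data rate (GT/s) a bus needs to reach a given GB/sec."""
    bytes_per_transfer = bus_bits / 8
    return target_gb_s / bytes_per_transfer


BUS_BITS = 128        # 8600 GTS memory bus width
GTS_BW = 32.0         # GB/sec quoted for the 8600 GTS
X1950_PRO_BW = 47.0   # GB/sec quoted for the X1950 Pro

implied = required_data_rate(BUS_BITS, GTS_BW)
to_match = required_data_rate(BUS_BITS, X1950_PRO_BW)

print(f"implied 8600 GTS data rate:   {implied:.2f} GT/s")
print(f"needed to match X1950 Pro:    {to_match:.2f} GT/s")
print(f"8600 GTS share of X1950 Pro:  {GTS_BW / X1950_PRO_BW:.0%}")
```

In other words the narrow bus would need memory running almost half again as fast to catch up, which is why faster memory alone is unlikely to close the anti-aliasing gap any time soon.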

As the spiritual successor to the 6600 and 7600 chips, the 8600 series will start out their lives in the $149 to $229 price range.  Over time, once economies of scale are reached, we will see retail prices start to go down.  Eventually these boards will also see sub $150 levels for the high end options, but that won't be for another 9 to 10 months or so.  As such, NVIDIA has built in cost saving features (such as the 128 bit memory bus) to allow these products to hit those price points.  Hopefully at some point we will see 512 MB cards with GDDR-4 make their way into the lineup, as quite a few current and upcoming titles really choke on 256 MB of memory.

NVIDIA still has a lot of work to do on their Vista drivers, but it seems the hardware supporting DX10 is very capable.  I feel NVIDIA reached a good balance of price/performance with DX10 functionality.  Considering the complexity of full DX10 functionality, it probably was not an easy task.  While there are many out there that are probably unimpressed with the performance of the 8600 GTS and GT, I feel that the overall performance is about where it should be expected at this point in time.  Yes, users will pay a premium for DX10 functionality when comparing this product to the less expensive X1950 Pro, but it is nice that the choice is there.  The 8600 is not a slam dunk for NVIDIA as previous midrange cards were, but it certainly is not a dog and its performance and features may make it a compelling product for a variety of users.  Once prices for the high end 8600 GTS hit the $199 average, I think we will see a greater adoption of these products.  Now we only have to see what ATI/AMD brings to the table in the next month.

Auzentech Adopts the X-Fi

I have had the X-Meridian for testing for some time now, and I have been exceptionally impressed with its quality and functionality.  Auzentech certainly has some skill when it comes to designing soundcards, and now it seems they are putting that skill to the test.  Auzentech will be the first and only company outside of Creative to adopt the X-Fi chip for a product of their own.  The Auzentech X-Fi Prelude could very well be the premier X-Fi based card on the market, as it adopts not only the full X-Fi featureset (up to EAX 5.0 support) but it adds in Dolby Digital Live and DTS Connect support.  These last two features are not present in ANY Creative standalone card.

Auzentech is set to introduce this card in a May/June timeframe, and it certainly sounds like it will be a very impressive board.  One area of great interest is how driver functionality will be implemented.  Auzentech does not have its own dedicated software/driver branch, but rather relies on C-Media to provide drivers for their CMI-8xxx series of cards.  Creative will likely have to handle those duties, and I am curious how DDL and DTS encoding will be handled.  The X-Fi chip has enough DSP horsepower to easily encode those technologies, but will the drivers utilize the DSP for that functionality, or will it utilize a more software based solution and use the host CPU?  There are currently no answers for that at the moment, but the very fact that Creative is allowing a 3rd party sound card manufacturer to utilize their X-Fi chip is a very exciting development.  I personally cannot wait to see what Auzentech adds to this card.

 

 


 

Copyright 1999-2007 PenStar Systems, LLC.