Exuberant Optimism?
Intel and AMD after Conroe

by Josh Walrath

Recently we have been seeing a wash of interviews, articles, and opinions that all point in one direction: Intel will recover the performance crown with its Conroe/Merom/Woodcrest series of chips. Whether they come from analysts, editors, or freelance writers, many of them agree wholeheartedly that the next generation of Intel chips will be the second coming, and the death of AMD and its market share gains. While I agree that this next generation of Intel chips is going to be good, I think most people fail to see the entire picture, as well as the ecosystem that AMD has slowly been putting into place with its products.

The 4 Issue Core

The one aspect of the Conroe series that people cannot get enough of is the 4 issue core. Many people confuse this with either quad core or massively multi-threaded designs. This is not the case. AMD’s Athlon series, from the original Athlon to the current Athlon 64 X2, are all three issue cores. Strictly speaking, it is not quite true that AMD’s design is a 3 issue core, as it is actually a 9 issue core. AMD has broken up the functionality of the core into three areas: integer, floating point, and SIMD. Each of these three separate units is three issue, but for now we will address only the 3 issue integer unit.

AMD did some extensive modeling and found that three was a pretty optimal number. In the first generation of Athlons, AMD was seeing a utilization of about 0.5 issues per cycle, where the theoretical maximum is 3. However, we don’t live in a perfect data world, and not all data and instructions are the same. Currently AMD says that the latest Athlon 64 cores achieve a utilization of around 0.97 issues per cycle. This is a significant increase in efficiency over the first generation of Athlons, but still far from fully utilizing the three issue core. AMD has worked very hard to increase this efficiency by addressing multiple areas.

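
As a quick aside on the utilization figures quoted above, the arithmetic can be sketched in a few lines of Python. The two issue rates and the issue width of 3 come from the text; the function name `utilization` is purely illustrative, and the figures are treated here as sustained averages, which is an assumption.

```python
# Figures quoted in the article; the percentages are simple arithmetic on them.
ISSUE_WIDTH = 3  # integer issue width of the Athlon core, per the article


def utilization(issues_per_cycle: float, width: int = ISSUE_WIDTH) -> float:
    """Fraction of the theoretical issue width that is actually used."""
    if width <= 0:
        raise ValueError("width must be positive")
    return issues_per_cycle / width


first_athlon = utilization(0.5)   # first generation Athlon
athlon_64 = utilization(0.97)     # latest Athlon 64 cores

print(f"First Athlon: {first_athlon:.1%} of peak")
print(f"Athlon 64:    {athlon_64:.1%} of peak")
print(f"Improvement:  {athlon_64 / first_athlon:.2f}x")
```

In other words, the quoted numbers put the first Athlon at roughly a sixth of its peak issue rate and the Athlon 64 at roughly a third, which is why there is still headroom left in a three issue design.
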
The first is the branch predictor. On the first Athlon it was fairly simple compared to other units at the time (namely the K6 branch predictor, which was actually quite amazing). AMD has improved this functionality throughout the years, and now nobody is talking about it at all. It is simply no longer an issue. AMD has improved the branch predictor in the Athlon 64 to such a degree that it is no longer even a selling point on a bullet sheet. It simply cannot get much better than it already is.

The second area is caches. AMD’s L1 and L2 caches, while looking quite similar throughout the years, have been improved upon with every core. There was in fact a rather large (comparatively speaking) increase in L1/L2 performance with the change from the Winchester to the Venice core. Again, not much was said about this, and AMD didn’t have to trumpet that it was there. Users saw it as part of a 5% to 10% increase in per clock performance over the older core. Not much was done to the internal workings, but the cache structures have been improved upon without much fanfare throughout the years.

The third area addressed is the integrated memory controller. This is probably one of the largest factors behind the greater utilization of the three issue architecture. Not all data and instructions can be held in the caches of the Athlon 64, so to get that data the CPU needs to go to main memory. With the memory controller on die and running at full CPU speed, memory fetch and write commands are executed very quickly. Because the CPU doesn’t have to wait for these requests to go to an external memory controller running at chipset speed, some operations see about a third of the latency of a traditional FSB memory controller (such as that of the Athlon XP or the Pentium 4). This reduced delay between the CPU requesting data from main memory and that data being delivered keeps the 3 issue core from standing idle for hundreds of cycles.

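
To illustrate why memory latency matters so much for issue utilization, here is a rough back-of-the-envelope model in Python. It uses the textbook approximation that every main-memory access stalls the core for the full latency, which ignores out-of-order execution and prefetching. Every number in it is an assumption chosen for illustration, apart from the 3X latency ratio taken from the text, and `effective_ipc` is a hypothetical helper name.

```python
def effective_ipc(base_cpi: float, misses_per_instr: float,
                  miss_penalty_cycles: float) -> float:
    """Instructions per cycle under a simple, non-overlapping stall model."""
    cpi = base_cpi + misses_per_instr * miss_penalty_cycles
    return 1.0 / cpi


# All values below are illustrative assumptions, not measurements.
BASE_CPI = 1.0            # cycles per instruction when every access hits cache
MISSES_PER_INSTR = 0.005  # 5 main-memory accesses per 1000 instructions
FSB_PENALTY = 300         # cycles to reach memory via an external controller
ON_DIE_PENALTY = FSB_PENALTY / 3  # the "3X lower latency" case from the text

fsb_ipc = effective_ipc(BASE_CPI, MISSES_PER_INSTR, FSB_PENALTY)
on_die_ipc = effective_ipc(BASE_CPI, MISSES_PER_INSTR, ON_DIE_PENALTY)

print(f"External controller: {fsb_ipc:.2f} IPC")
print(f"On-die controller:   {on_die_ipc:.2f} IPC")
```

Under these assumed numbers, cutting the miss penalty to a third raises sustained throughput from 0.40 to about 0.67 instructions per cycle, even though nothing about the execution units has changed.
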
The final improvement comes from the ccHT links. The Cache Coherency HyperTransport links provide a huge amount of low latency bandwidth between multiple CPUs and the system. This is one of AMD’s greatest advantages over Intel’s FSB architecture, as it provides for “glueless” multiprocessing. AMD continues to use the advanced MOESI protocol to keep data and instructions coherent across the caches of multiple CPUs in one system. This protocol allows a processor to share data it has modified in its cache with the other processors without first having to write it back to main memory. This is more efficient than Intel’s MESI, which requires modified data to be written back to main memory before it can be shared among multiple processors. The ability of the Opteron to quickly snoop the other data caches for the correct data, as well as transmit data, makes any multiprocessing system more efficient. HyperTransport is not only fast, but also full duplex, so reads and writes can proceed simultaneously.

These “four issues” that AMD has addressed with the Athlon 64 architecture have increased the overall efficiency of its processors, in both single and multiple core operation. The internal crossbar of the dual core Athlon 64s also helps to improve overall communication between the cores and allows intelligent allocation of main memory resources as well as the HyperTransport links. The entire design of the Athlon 64 should still be considered next generation, even compared to Intel’s upcoming products. Taken as a whole, the Athlon 64 core design and infrastructure are amazingly well thought out, not to mention fast and flexible.

From 10,000 feet the core of the latest Athlon 64s looks very similar to that of the first Athlon, but in fact the two are very different. So saying that the Athlon 64 is merely the old Athlon core with 64-bit support bolted on is false. The changes between the cores, the base technology, and the implementations are far different from the first generation to this latest one.
That is almost like comparing a four pixel pipeline GeForce 256 to the four pixel shader GeForce 7300 GS: while both utilize a four pixel pipeline, the technology behind these functional units has changed considerably throughout the years.
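
To make the MOESI versus MESI point above a little more concrete, the following Python sketch models what happens to a cache line when another CPU reads it. It is a simplified textbook view of the two protocols, not a description of any specific AMD or Intel implementation; real chips add optimizations that this toy model leaves out. The names `State` and `snoop_read` are hypothetical.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    M = "Modified"
    O = "Owned"      # exists only in MOESI
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def snoop_read(protocol: str, holder: State) -> Tuple[State, bool]:
    """Another CPU reads a line that this cache holds in state `holder`.

    Returns (holder's next state, whether main memory must be written).
    """
    if protocol not in ("MESI", "MOESI"):
        raise ValueError(f"unknown protocol: {protocol}")
    if holder is State.O and protocol == "MESI":
        raise ValueError("MESI has no Owned state")
    if holder is State.M:
        if protocol == "MOESI":
            # Keep the dirty line, become its owner, supply it cache to cache.
            return State.O, False
        # Textbook MESI: the dirty line goes back to memory before sharing.
        return State.S, True
    if holder is State.O:
        return State.O, False  # the owner keeps supplying the dirty data
    if holder is State.E:
        return State.S, False  # clean line, memory is already up to date
    return holder, False       # Shared stays Shared, Invalid stays Invalid


for proto in ("MESI", "MOESI"):
    next_state, writeback = snoop_read(proto, State.M)
    print(f"{proto}: Modified -> {next_state.value}, "
          f"memory write-back: {writeback}")
```

The only difference between the two runs is the Modified case: in this model MESI pays for a trip to main memory, while MOESI moves the line to the Owned state and hands the data directly to the requesting CPU.
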
Copyright 1999-2005 PenStar Systems, LLC.