I don't normally like posting speculation and rumors, but I like the SemiAccurate website and forums, and this rumor has popped up on a number of occasions: that the CPU would turn out to be clocked higher than 1.6GHz, closer to 2GHz. It's a great article overall.
As you can see there is a lot on this chip: 8 AMD Jaguar cores organized into two blocks of four, each with a 2MB 16-way L2 cache shared among all associated CPUs. As you might expect the two L2s are coherent but not directly addressable by CPUs in the other group of four. Each Jaguar core has 32K of 2-way L1 I$ and 32K of 8-way L1 D$ as well. AMD’s Jaguar only goes up to four cores, so Microsoft had to come up with a mechanism to ensure coherency both between the two clusters of four CPUs and with the rest of the system, including the GPUs. This is where a lot of what Microsoft added to the system came in.
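To put the cache figures above in perspective, here is a rough sketch that works out how many sets each cache has from its size and associativity. The 64-byte line size is an assumption (the article does not give one), and the function and variable names are made up for illustration.

```python
# Sketch: number of sets per cache from the sizes and associativities quoted above.
# ASSUMPTION: 64-byte cache lines; the article does not state the line size.
LINE_BYTES = 64
KIB = 1024


def cache_sets(size_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Return sets = capacity / (associativity * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a whole number of sets")
    return sets


# (size in bytes, ways) as quoted in the article
caches = {
    "L2 (per 4-core block)": (2 * KIB * KIB, 16),
    "L1 I$ (per core)": (32 * KIB, 2),
    "L1 D$ (per core)": (32 * KIB, 8),
}

for name, (size, ways) in caches.items():
    print(f"{name}: {cache_sets(size, ways)} sets of {ways} ways")
```

With a different line size the set counts scale accordingly, so treat the numbers as illustrative only.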
The most notable change here was that the data paths between the CPUs and the North Bridge/system fabric were massively beefed up. The blue arrows above are for coherent memory accesses, yellow for non-coherent traffic, and all the major blocks are coherent with each other. If you think about the sheer volume of coherency data that needs to go between the two CPU blocks, Microsoft probably had to beef up the L2 to NB links to almost match that of the L1 to L2 links. While specifics were not given out, SemiAccurate was told it was “significantly wider” along with beefed up buffers and deeper queues. Don’t discount this as a minor change; it is both critical to the system performance and a very complex thing to do. It will be interesting to see how Sony did their variant if they ever give a talk on the PS4 architecture.
Main Memory Speeds, Feeds, and Coherency:
The CPUs connect to four 64b wide 2GB DDR3-2133 channels for a grand total of 68GB/sec bandwidth. Do note that this number exactly matches the width of a single on-die memory block. One interesting thing to note is that the speed of the CPU MMU’s coherent link to the DRAM controller is only 30GB/sec, something that strongly suggests that Microsoft stuck with Jaguar’s half-clock speed NB. If the NB to DRAM controller link is 256b/32B wide, that would mean it runs at about 938MHz, or 1.88GHz if it is 128b/16B wide.
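The arithmetic behind those numbers can be sketched in a few lines. This assumes 1GB = 10^9 bytes and one transfer per clock on the NB to DRAM controller link, neither of which the article spells out; the function names are just for illustration.

```python
# Sketch of the bandwidth and clock arithmetic quoted above.
# ASSUMPTIONS: 1 GB = 1e9 bytes, and the NB link moves one full-width
# transfer per clock. Neither is stated in the article.

def ddr_bandwidth_gbps(channels: int, width_bits: int, mega_transfers: int) -> float:
    """Peak DRAM bandwidth in GB/sec: channels * bytes per transfer * transfers/sec."""
    bytes_per_transfer = width_bits // 8
    return channels * bytes_per_transfer * mega_transfers * 1e6 / 1e9


def link_clock_mhz(bandwidth_gbps: float, width_bytes: int) -> float:
    """Clock in MHz needed to move bandwidth_gbps over a link width_bytes wide."""
    return bandwidth_gbps * 1e9 / width_bytes / 1e6


print(f"DDR3-2133, 4 x 64b: {ddr_bandwidth_gbps(4, 64, 2133):.1f} GB/sec")
print(f"30 GB/sec over 32B: {link_clock_mhz(30, 32):.1f} MHz")
print(f"30 GB/sec over 16B: {link_clock_mhz(30, 16):.1f} MHz")
```

The first line comes out at roughly 68GB/sec and the other two at 937.5MHz and 1875MHz, matching the rounded figures in the article.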
SemiAccurate would be very surprised if it was 128b wide; wires are cheap, power-saving areas are not. Why is this important? Unless Microsoft’s XBox One architects are masochists who enjoy doing needless and annoying work, they would not have reinvented the wheel and put an arbitrarily clockable asynchronous interface between the NB and the CPU cores/L2s. Added complexity, lowered performance, and a die penalty for absolutely no useful upside is not a good architectural decision. With a ~938MHz NB running at half the core clock, that means the XBox One’s 8 Jaguar cores are clocked at ~1.9GHz, something that wasn’t announced at Hot Chips. Now you know.
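The inference here is simple enough to write down. This is only a sketch of the article's reasoning, assuming the NB runs at exactly half the core clock and the link does one transfer per clock; the function name and the divider parameter are hypothetical.

```python
# Sketch of the article's core-clock inference.
# ASSUMPTIONS: the NB runs at exactly half the core clock (nb_divider=2)
# and the NB link moves one full-width transfer per clock.

def core_clock_ghz(link_gbps: float, link_width_bytes: int, nb_divider: int = 2) -> float:
    """Core clock implied by the NB link bandwidth and width."""
    nb_clock_ghz = link_gbps / link_width_bytes  # GB/sec divided by bytes per clock
    return nb_clock_ghz * nb_divider


for width_bytes in (32, 16):
    clock = core_clock_ghz(30, width_bytes)
    print(f"{width_bytes * 8}b link -> cores at {clock:.3f} GHz")
```

Under the same assumptions, the 128b case would put the cores at 3.75GHz, which is one more reason the 256b reading looks like the plausible one.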
The CPU NB also has coherent links to the GPU MMU and I/O MMU, something you would expect on any system that takes GPU compute work seriously. AMD has their HSA/hUMA architecture coming with Kaveri in short order, but XBO is based on a design ~1+ generations older, so no advanced AMD CPU/GPU coherency here. Luckily Microsoft is on the ball here and put in their own mechanism, which they unfortunately would not go into detail on. What SemiAccurate has heard about it says they did a pretty impressive job, but until it is fully disclosed we can’t comment with authority. Let’s just leave things at, “From what we can tell it looks good”.
Another thing to notice is a rather odd direct and coherent path between the AV In block in the GPU/accelerator area and the Audio DMA unit in the I/O area. The Audio DMA unit also has a direct link to the Audio Out/Resize/Compositing block; both of these links are one way. Since one is an inbound unit and the other is an outbound unit that kind of makes sense, and if there is a need to go the other way the two can talk via the CPU MMU. While this may not make much sense on the surface, much of the XBO’s audio functionality is devoted to processing the Kinect’s data stream, so high bandwidth and low latency are kind of necessary. More on this later. S|A
Note: This is the first part and only covers part of the system. More to come, including the GPU, embedded memory, and audio systems.