This was a server/professional release, but the GP100 GPU at the core of the reveal is likely to be the top-tier GPU coming to consumers too. Building a 600mm^2 GPU is hard, and you want to sell as many as possible to make back the R&D. Also, the CUDA core design process means building completely different designs for consumers would split the architecture awkwardly, and so far each generational advance has generally arrived as a single wave. [More readable (technical) reporting of the facts]
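Why a ~600mm^2 die is hard to build, and why disabling a few blocks (the 56-of-60 point made further down) helps so much, can be sketched with the textbook Poisson yield model, Y = exp(-A × D0). This is a toy illustration, not NVIDIA's data: `DEFECT_DENSITY` is a hypothetical value, and the harvesting function assumes the die splits evenly into identical blocks that pick up defects independently.

```python
import math

# Hypothetical defect density in defects per mm^2 (0.1 per cm^2); real
# foundry figures are not public, so treat this purely as an assumption.
DEFECT_DENSITY = 0.001


def poisson_yield(area_mm2: float, d0: float = DEFECT_DENSITY) -> float:
    """Fraction of dies with zero defects under a Poisson model: exp(-A * D0)."""
    return math.exp(-area_mm2 * d0)


def harvested_yield(area_mm2: float, blocks: int, needed: int,
                    d0: float = DEFECT_DENSITY) -> float:
    """Fraction of dies with at least `needed` defect-free blocks.

    Assumes the whole die area is split evenly across `blocks` identical
    blocks and ignores shared logic (memory controllers, etc.).
    """
    p_ok = poisson_yield(area_mm2 / blocks, d0)  # one block is defect-free
    return sum(
        math.comb(blocks, k) * p_ok**k * (1 - p_ok) ** (blocks - k)
        for k in range(needed, blocks + 1)
    )


if __name__ == "__main__":
    for area in (300, 600):
        print(f"{area} mm^2, every block perfect: {poisson_yield(area):.1%}")
    print(f"600 mm^2, >=56 of 60 blocks good: {harvested_yield(600, 60, 56):.1%}")
```

With these made-up numbers, doubling the die area cuts the perfect-die yield noticeably, while accepting any die with 56 of 60 good blocks recovers almost all of them.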
That 600mm^2 is the same die area the outgoing GTX 980 Ti uses, only this is on a shrunk process: each transistor is much smaller, so you get far more of them on every chip, and they should hopefully produce much less heat to do the same work. The total power figures seem to back up that it may be 80% faster for the same power draw, and high-end GPUs are generally limited by power use. The RAM is upgraded to the latest and greatest HBM2 technology, and FP64 is back (after all those nVidia Maxwell designs that basically ignored 64-bit floating-point performance to focus purely on FP32, i.e. gaming, computational power). An interesting aside: the new chip not only has a lot of FP64 units, but its FP32 units can also do 16-bit FP maths at two computations per tick. Some calculations in graphics can be done at 16-bit without it all going to Hell, so this gives game engines a strong incentive to try it out, because you literally can double the calculations you do per second "for free".
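To see why some 16-bit maths is safe and some is not, here is a small sketch using Python's standard `struct` module, whose `'e'` format code packs IEEE 754 half-precision floats. The helper names and test values are illustrative choices, not anything from a game engine: an 8-bit colour channel survives FP16 exactly, while large values and long running sums do not.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision ('e') and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to IEEE 754 single precision ('f') and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def colour_survives_fp16() -> bool:
    """True if every 8-bit colour value, normalised to [0, 1], maps back to
    the same 0..255 integer after an FP16 round trip."""
    return all(round(to_fp16(v / 255) * 255) == v for v in range(256))


def accumulate(step: float, n: int, rounder) -> float:
    """Add `step` n times, rounding the running total to the target
    precision after every addition (an approximation of FP16/FP32 hardware)."""
    total = 0.0
    for _ in range(n):
        total = rounder(total + step)
    return total


if __name__ == "__main__":
    print("8-bit colour exact in FP16:", colour_survives_fp16())
    print("2049 in FP16:", to_fp16(2049.0), "in FP32:", to_fp32(2049.0))
    print("0.1 added 10,000 times, FP16:", accumulate(0.1, 10_000, to_fp16))
    print("0.1 added 10,000 times, FP32:", accumulate(0.1, 10_000, to_fp32))
```

`struct` only models the storage format, not the GPU's packed two-per-tick arithmetic, so this shows where FP16 precision runs out rather than how fast it is.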
And this isn't even the fastest version of the GP100 design: only 56 of its 60 blocks (SMs) are enabled. This is done so a small defect in production doesn't force them to throw away the chip; they can disable the block with the defect, which increases yields enough to make it commercially viable. As time goes on, yields improve and they can start shipping fully enabled versions of the design. Here are the texture units, raster (ROP) units, memory bandwidth, and floating-point (the maths that happens when rendering a scene) performance, approximately:
| | GP100 (no GTX model yet) | GM200 (Titan X / GTX 980 Ti) | GM204 (GTX 980 / 970) [VR spec] | GK110B (Titan Black / GTX 780 Ti) |
|---|---|---|---|---|
| TEX units | 240 | 192 / 176 | 128 / 104 | 240 |
| ROP units | ?? | 96 | 64 / 56 | 48 |
| Memory bandwidth | 720 GB/s | 336 GB/s | 224 / 196 GB/s | 336 GB/s |
| FP16 GFLOPS | 21,200 | 6,144 / 5,632 | 4,612 / 3,494 | 5,120 / 5,045 |
| FP32 GFLOPS | 10,600 | 6,144 / 5,632 | 4,612 / 3,494 | 5,120 / 5,045 |
| FP64 GFLOPS | 5,300 | 192\*\* / 176\*\* | 144\*\* / 109\*\* | 1,707 / 210\* |
| Power demand | 300 W | 250 W | 165 W / 145 W | 250 W |
\* Mainstream consumer Kepler models had FP64 throughput cut to a small fraction of the FP32 rate (1/24 on the GTX 780 Ti, versus 1/3 on the Titan Black). \*\* Maxwell removed most FP64 units from the design to fit the most FP32 units into the die area. That trade-off happened because the 20nm die shrink never arrived, so GPUs missed a process jump, which is why we're going straight from 28nm designs to a 16nm design.
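The FLOPS rows follow from a simple rule of thumb: peak GFLOPS = shader cores × 2 (a fused multiply-add counts as two operations) × clock, with the FP16 and FP64 rows scaled by each architecture's rate (FP16 at 2× and FP64 at 1/2 on GP100; FP64 at 1/3 on Titan Black, 1/24 on the GTX 780 Ti, 1/32 on Maxwell). The core counts and clocks below are published spec-sheet figures that are not in the table, so treat them as assumptions; the table appears to use base clocks for the older cards and the boost clock for GP100.

```python
# Core counts and clocks are spec-sheet figures assumed here; they are not
# part of the table above.


def peak_gflops(cores: int, clock_mhz: float, ops_per_clock: int = 2) -> float:
    """Peak GFLOPS = cores x ops per clock (2 for a fused multiply-add) x clock."""
    return cores * ops_per_clock * clock_mhz / 1000


# (name, FP32 cores, clock in MHz, FP64 rate vs FP32, FP16 rate vs FP32).
# Older cards run FP16 maths through the FP32 units at the same rate.
GPUS = [
    ("Tesla P100 (GP100)", 3584, 1480, 1 / 2, 2),
    ("GTX Titan X (GM200)", 3072, 1000, 1 / 32, 1),
    ("GTX 980 (GM204)", 2048, 1126, 1 / 32, 1),
    ("GTX Titan Black (GK110B)", 2880, 889, 1 / 3, 1),
    ("GTX 780 Ti (GK110B)", 2880, 875, 1 / 24, 1),
]


def table_rows() -> dict:
    """Map each GPU name to its (FP16, FP32, FP64) peak GFLOPS."""
    rows = {}
    for name, cores, clock, fp64_ratio, fp16_ratio in GPUS:
        fp32 = peak_gflops(cores, clock)
        rows[name] = (fp16_ratio * fp32, fp32, fp64_ratio * fp32)
    return rows


if __name__ == "__main__":
    for name, (fp16, fp32, fp64) in table_rows().items():
        print(f"{name:26s} FP16 {fp16:8.0f}  FP32 {fp32:8.0f}  FP64 {fp64:7.0f}")
```

These are theoretical peaks; real games rarely keep every unit busy every clock.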
AMD will respond with its Polaris design, also expected to arrive at broadly this scale, so it'll be interesting to see how those chips run for gaming, how AMD prices them, how it builds smaller mass-market designs for the cheaper cards, and whether anything cool is being added to take advantage of that very high memory bandwidth.
So if you were thinking of buying a fancy GPU in the coming months, this is what you'll be looking at getting hold of when the new 16nm designs arrive for consumers. Even if you're after a not-so-fancy new PC GPU, these big numbers suggest the mainstream models will also get radically faster (although HBM2 RAM is expensive, so they may not also get a big jump in memory bandwidth to go with the faster processors).
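The memory bandwidth figures come from bus width × per-pin data rate ÷ 8. The sketch below uses spec-sheet bus widths and pin speeds that are assumptions (not from the table; the ~1.4 Gbit/s HBM2 rate especially is approximate). It shows HBM2 getting its bandwidth from a very wide, relatively slow bus rather than from faster pins, which is where GDDR5 cards get theirs.

```python
def bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s = bus width (bits) x per-pin rate (Gbit/s) / 8."""
    return bus_width_bits * gbit_per_pin / 8


# Assumed spec-sheet values: (bus width in bits, per-pin rate in Gbit/s).
MEMORY = {
    "Tesla P100 (HBM2)": (4096, 1.4),   # very wide bus, slow pins
    "GTX Titan X (GDDR5)": (384, 7.0),  # narrower bus, much faster pins
    "GTX 980 (GDDR5)": (256, 7.0),
}

if __name__ == "__main__":
    for name, (width, rate) in MEMORY.items():
        print(f"{name:20s} {bandwidth_gb_s(width, rate):6.1f} GB/s")
```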