Nvidia has surprise-announced its new Vera Rubin architecture (no relation to the recently unveiled telescope, though both honor astronomer Vera Rubin) at the Consumer Electronics Show in Las Vegas. The new platform, set to reach customers later this year, is advertised as offering a ten-fold reduction in inference costs and a four-fold reduction in the number of GPUs needed to train certain models, compared with Nvidia's Blackwell architecture. The usual suspect for the improved performance is the GPU itself. Indeed, the new Rubin GPU boasts 50 petaflops (quadrillion floating-point operations per second) of 4-bit computation, compared with 10 petaflops on Blackwell, at least for transformer-based inference workloads such as large language models. Do you think that focusing on the GPU misses the bigger picture?
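
To put those headline numbers side by side, here is a back-of-the-envelope sketch in Python, using only the figures quoted above (the variable names are illustrative, not Nvidia's). The raw per-GPU FP4 throughput ratio works out to 5x, which on its own does not account for the advertised ten-fold inference-cost reduction, so the remainder would have to come from somewhere other than per-chip FLOPS, such as memory, interconnect, or rack-level changes.

```python
# Back-of-the-envelope comparison of the headline claims quoted above.
# Figures are taken from the announcement as reported; names are illustrative.
rubin_fp4_petaflops = 50       # claimed 4-bit throughput per Rubin GPU
blackwell_fp4_petaflops = 10   # stated Blackwell figure for the same workload

raw_gpu_speedup = rubin_fp4_petaflops / blackwell_fp4_petaflops   # 5x
claimed_inference_cost_reduction = 10   # "ten-fold reduction in inference costs"
claimed_training_gpu_reduction = 4      # "four-fold reduction" in GPUs to train certain models

print(f"Raw FP4 throughput ratio: {raw_gpu_speedup:.0f}x")
print(f"Claimed inference cost reduction: {claimed_inference_cost_reduction}x")
print(f"Gap left unexplained by per-GPU FLOPS alone: "
      f"{claimed_inference_cost_reduction / raw_gpu_speedup:.0f}x")
```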

