I recently discussed NVIDIA’s acquisition of Arm and its partnership with VMware. Both announcements came in the lead-up to GTC 2020, which has been happening this week. I’ve watched a few sessions concerning artificial intelligence (AI), but it wasn’t until today that I saw something intriguing. Charlie Boyle, Vice President and General Manager, DGX Systems, presented a session titled “AI Infrastructure Trends for 2021.” It made one very interesting point: customers are moving toward implementing AI applications.
While I knew he is in charge of DGX, NVIDIA’s heavy-duty server line aimed at AI, I’d hoped the general title meant he would present broader corporate information. That didn’t really happen. For instance, I thought there might be a mention of Arm, but there was nothing. Still, two pieces of information pointed to more real applications rather than sandboxes or training.
That leads me to a quick soapbox tangent. I’ve heard too many hardware folks, focused on the data center and training, contrast training with inference. Well, what do they think they’re training? It’s something called an inference engine. Training is inferencing; it’s just that the inferences change as more data is ingested. The proper contrast is between training and runtime: live applications.
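A minimal sketch makes the point. It assumes a toy one-variable linear model (y = w·x + b) fitted by gradient descent rather than a real deep network; the data and hyperparameters are invented purely for illustration. The same `infer` function is called inside the training loop and at runtime. Training simply adds a weight update after each round of inferences, while at runtime the weights are frozen.

```python
def infer(params, x):
    """Forward pass of a toy linear 'inference engine': y = w * x + b.
    The same call is used during training and at runtime."""
    w, b = params
    return w * x + b


def train(data, epochs=1000, lr=0.05):
    """Training = repeated inference plus a weight update (gradient descent on MSE)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = infer((w, b), x) - y  # inference inside the training loop
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        # The inferences change because the parameters change.
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


if __name__ == "__main__":
    # Hypothetical training data generated from y = 3x + 1.
    data = [(x, 3 * x + 1) for x in range(5)]
    params = train(data)
    # Runtime: parameters are frozen; only inference runs, one request at a time.
    print(round(infer(params, 10.0), 2))  # prints 31.0
```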
Why that matters, even as a tangent, is that Mr. Boyle’s message made clear that NVIDIA is finally adding a real focus on runtime inferencing. While much of his technical content focused on heavy training, he repeatedly mentioned runtime. That indicates that many of NVIDIA’s main customers are moving applications out of training and into heavy usage. That’s the market beginning to mature.
Related to that was the one area in which he made clear that clients are already in runtime: natural language processing (NLP). I’ve pointed to NLP moving from “nice to have” to “must have,” and this mention shows that movement. NLP is becoming standard in the industry and has moved almost exclusively to deep learning. That means large B2C companies need serious data center support for their NLP interface systems.
The one more technical message that I found very interesting concerns a change in GPU management. The standard has been that a GPU is dedicated to a single application, a single network. Charlie Boyle spent some time talking about a new feature called Multi-Instance GPU (MIG), which lets NVIDIA partition a single A100 into as many as seven separate instances. While he spent far more time on how that can help developers, he did briefly mention runtime. Let me expand upon that.
Runtime deep learning doesn’t change the network’s weights, and it doesn’t take the compute cycles required in training (cycles that drive both the power consumption and the latency of the inference engine). The engines are lighter, but they must handle many individual transactions instead of the massive data batches used for training.
MIG means more than that a company can run multiple AI-driven applications on the same server. What it really means is that, for instance, an automated call system or chat application can have multiple instances running in smaller resource footprints. The effect is that end-user latency improves because no single instance is overburdened, and OPEX is better controlled because the GPU is used more efficiently. The benefits to the scalability of runtime inference are clear, and MIG is a very nice enhancement to NVIDIA’s DGX line.
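The routing side of that idea can be sketched in a few lines. This is a hypothetical illustration, not NVIDIA software: the instance names are made up, and it assumes each MIG slice is served by its own worker that a front end can route requests to (in practice a worker process is typically pinned to one slice, for example through `CUDA_VISIBLE_DEVICES`). A least-loaded dispatcher sends each individual request to the slice with the fewest in-flight requests.

```python
# Hypothetical labels for seven MIG slices of one A100. A real deployment would
# pin each worker process to one slice (e.g. via CUDA_VISIBLE_DEVICES).
INSTANCES = [f"mig-{i}" for i in range(7)]


class LeastLoadedDispatcher:
    """Sends each new request to the instance with the fewest in-flight requests."""

    def __init__(self, instances):
        # In-flight request count per instance, in a fixed iteration order.
        self.pending = {name: 0 for name in instances}

    def dispatch(self):
        # min() returns the first instance with the lowest count,
        # so ties go to the earliest instance in the list.
        name = min(self.pending, key=self.pending.get)
        self.pending[name] += 1
        return name

    def complete(self, name):
        # Called when an instance finishes a request.
        self.pending[name] -= 1


if __name__ == "__main__":
    # 21 individual chat or call requests arrive at once.
    single = LeastLoadedDispatcher(["whole-gpu"])
    split = LeastLoadedDispatcher(INSTANCES)
    for _ in range(21):
        single.dispatch()
        split.dispatch()
    print(max(single.pending.values()))  # prints 21
    print(max(split.pending.values()))   # prints 3
```

The sketch only compares queue depth. Each slice has a fraction of the A100’s compute, so this does not by itself prove a latency gain; what it shows is how splitting one GPU into several instances keeps any single queue from piling up.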
The points made in that presentation clearly indicate that the market is moving. If NVIDIA weren’t seeing demand, this wouldn’t be happening. While the technical folks’ message might still be a bit too focused on training, this presentation mentioned runtime more than I’ve previously heard. The industry application of AI is still relatively new, but this news shows clear movement toward support for real-world applications.