NVIDIA Corporation (NVDA) BofA Securities 2024 Global Technology Conference (Transcript)

NVIDIA Corporation (NASDAQ:NVDA) BofA Securities 2024 Global Technology Conference June 5, 2024 3:30 PM ET

Company Participants

Ian Buck - VP

Conference Call Participants

Vivek Arya - Bank of America Securities

Vivek Arya

Hope everyone enjoyed their lunch. Welcome back to this session. I'm Vivek Arya. I lead the semiconductor research coverage at Bank of America Securities. I'm really delighted and privileged to have Ian Buck, Vice President of NVIDIA's HPC and Hyperscale business. Ian has a PhD from Stanford. And when many of us were enjoying our spring break, Ian and his team were working on Brook, which is the precursor to CUDA, which I think is kind of the beating heart of every GPU that NVIDIA sells. So really delighted to have Ian with us.

What I thought I would do is lead off with some of my questions, but if there's anything that you feel is important to the discussion, please feel free to raise your hand. But a very warm welcome to you, Ian. Really delighted that you could be with us.

Ian Buck

Thank you. Look forward to your questions.

Question-and-Answer Session

Q - Vivek Arya

Okay. So Ian, maybe let's -- to start it off, let's talk about Computex and some of the top announcements that NVIDIA made. What do you find the most interesting and exciting as you look at growth prospects over the next few years?

Ian Buck

Yes. Computex is an important conference for NVIDIA and for AI now. The world's systems and data centers get their machines, their hardware, from this small island of Taiwan, and of course the chips as well. So it's a very important ecosystem for us. A year ago, we introduced MGX, the system standard for deploying and building GPU systems in a variety of shapes and sizes for different workloads.

MGX is now the standard for building servers: start with a CPU motherboard, then decide how many GPUs you need, what the configuration is, what the thermal profile is and where they need to fit, and what workloads they'll run. That has diversified the whole system and server ecosystem.

So it's been really fun to watch that explode and to see the number of companies that are able to take advantage of it. We talked, of course, about Blackwell, our next-generation GPU, and what it will do. We also talked about our roadmap: what we're doing today with the Hopper platform, our current architecture, and what we'll be deploying with Blackwell and our Blackwell platform, including upgrades to Blackwell in 2025.

And then we also publicly talked about, for the first time, what's after Blackwell: the Rubin platform, which will come with a new CPU and GPU. So a lot of interesting, exciting things from an infrastructure and hardware standpoint. On the software side, we're seeing the adoption of all sorts of different models, and we can talk more about that.

One way we're helping is by packaging up a lot of those models, whether it be Llama, Mistral, or Gemma, into containers to help enterprises adopt them. And they know they're getting the best performance, the best inference capabilities, in a nicely packaged container that they can then tailor and deploy anywhere.

These are what we call NIMs, NVIDIA inference microservices: containers that have those models. And we're educating enterprises and making them available to all of them. So it's been a very exciting Computex, and if you ever get to go, it's quite an experience.
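
To make that concrete, here is a minimal sketch of how an enterprise application might call one of these packaged models. NIM-style containers expose an OpenAI-compatible HTTP API; the host, port, and model name below are illustrative assumptions rather than details from this discussion.

```python
import requests

# Hypothetical local endpoint. NIM-style containers typically expose an
# OpenAI-compatible HTTP API, but the host, port, and model identifier
# below are illustrative assumptions, not product documentation.
NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",   # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize our Q2 support tickets in three bullets."}
    ],
    "max_tokens": 256,
    "temperature": 0.2,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```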

Vivek Arya

So let's start looking at this from the end market, right? You work very closely with all the hyperscalers. From the outside, when we look at this market, we see the accelerator market was over $40 billion last year, right? It could be over $100 billion this year. But help us bridge this to what the hyperscalers are doing, right? What are they doing with all the acceleration and all this hardware that they're putting in? Is it about making bigger and bigger models? Where are they in that journey of their large language model efforts, and how are they able to monetize them?

Ian Buck

Yes, we're still very much in the beginning of that AI growth cycle, really. It's odd to say, but AI, at least accelerated AI, has been around for approaching 10 years now, from the first AlexNet moment. But as the different hyperscalers evolve and figure out what their contributions and their value are, there are three obvious thrusts. One, of course, is infrastructure: providing infrastructure at scale for the world, for AI start-ups and the community in the cloud to go consume.

And you see all the major startups partnering or getting access to the technology, often not just one but multiple, or they switch and move around, figuring out who can help them scale and grow their capabilities, or who's bringing GPUs to market first and where they can get their GPUs. That's infrastructure.

And of course, infrastructure is hugely profitable. For every dollar a cloud provider spends on buying a GPU, they're going to make back $5 over four years. The second thing we're seeing is growth in token serving: just building and providing AI inference, whether it be a Llama, a Mistral, or a Gemma, and providing it to the community's users to serve tokens. Here, the economics are even better. For every $1 spent, there's $7 earned over that same time period, and growing.
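
To make the arithmetic behind those figures explicit, here is a small illustrative sketch using the round numbers quoted above; it is not a financial model, just the stated multiples spread over the four-year period.

```python
# Illustrative arithmetic only, using the round figures quoted above.
SPEND = 1.00
YEARS = 4

for label, revenue in [("infrastructure rental", 5.00), ("token serving", 7.00)]:
    per_year = revenue / YEARS      # simple average revenue per year
    multiple = revenue / SPEND      # gross return multiple over the period
    print(f"{label}: {multiple:.0f}x over {YEARS} years, ~${per_year:.2f}/yr per $1 spent")
```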

The third, of course, is building the next-generation models. Not everyone can do that at scale. Those models are getting very large and the infrastructure is getting huge. So we're seeing them build amazing next-generation capabilities and scale. And of course, that's not just putting up a physical building and putting things in it, but actually figuring out all the software, the algorithms, and the training at that scale, over that many billions and trillions of tokens, and all the software that has to go into that.

I can talk all day about the software for training as it goes from 10,000 to 20,000 to 50,000 and now 100,000 GPUs. And in the next click out, people are going to be talking about 1 million. So all three of those are happening at the same time: they're developing the next-gen models, serving those models to customers, and renting infrastructure. I guess the fourth would be deploying AI for themselves, Copilot being an example. You can see multiple services on Amazon are now being backed by AI agents or AI capabilities, some directly, some indirectly that you may not know about.

And of course, companies like Meta are deploying AI into their services, whether it be their news feed, recommenders, or elsewhere, which raises all the numbers across the board. They've been a great partner for NVIDIA on new models.

Vivek Arya

So you mentioned that AI, the traditional AI or CNNs, right, they have been around for a long time. We used to talk about like tens of millions of parameters, and here, we are knocking on the door of what, almost two trillion parameters. Do you see a peak in terms of when we kind of say, okay, this is it, the model sizes? Now we might even go backwards, that we might try to optimize the size of these models, right, have smaller or midsized models. Or we are not yet at that point?

Ian Buck

Yes. So the evolution of AI models is quite interesting, and they map to the workloads. Obviously, initially, it started with ImageNet and image recognition. What is this a picture of? It doesn't really tell you where something is, just what the picture is of; then we could put boxes around what it is.

And then we can identify every pixel, and they got more and more intelligent. When we got to language and LLMs, that was another quick upping of intelligence because language is different than just image.

CNNs were about understanding what's inside of the picture. You and I do that, but dogs and cats and even bugs also have to recognize from vision what things are. Language is uniquely a step above in intelligence. You have to understand what the person is saying, what they mean, the context, which goes right to overall human understanding and knowledge. Take it a click further and you get to generative AI.

Not only do you need to understand what was said and maybe be able to summarize it, but you actually have to synthesize, to create new things, whether it be an open chatbot conversation like you have in WhatsApp with Meta AI, or coding, generating code that works correctly and follows a certain style, or being able to generate a picture from text and do multimodal.

So it's a little cheesy to say, but it's understanding: what do we need? What are we saying? What is the context? Can the AI reproduce that and generate from it? I talk to the AI scientists.

They do the studies, and they don't see their models being overtrained yet. They can continue to take more and more tokens. The tokens, of course, are part of the limiter. You do have to have a massive data set in order to train a foundation model from scratch. Once you do that, though, and you build 100 billion, 400 billion, 1.8 trillion, 2 trillion parameters, that model becomes the foundation for a whole litany of other models.

You can take a Llama 70B and produce an 8B underneath it, depending on what level of accuracy or comprehension you want to provide, or what context length. You can then take that foundation model and fine-tune it and optimize it to generate something like Code Llama, so you basically have a coding copilot. That all starts from a foundation model. Each of these is not an individual effort. They take a foundation and they deploy it everywhere.
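
As a schematic of that pattern, here is a minimal PyTorch sketch of the general idea: freeze a pretrained base and train only a small task-specific module on top, the basic move behind turning one foundation model into many derivatives. The model, sizes, and data are toy placeholders, not how Llama or Code Llama is actually produced.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained foundation model (illustrative only).
class TinyFoundation(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, tokens, adapter=None):
        h = self.backbone(self.embed(tokens))
        if adapter is not None:            # small task-specific module on top
            h = h + adapter(h)
        return self.lm_head(h)

base = TinyFoundation()
for p in base.parameters():                # freeze the "foundation" weights
    p.requires_grad = False

adapter = nn.Sequential(nn.Linear(128, 16), nn.ReLU(), nn.Linear(16, 128))
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-3)   # train only the adapter

tokens = torch.randint(0, 1000, (8, 32))   # fake "domain" data
targets = torch.randint(0, 1000, (8, 32))
for _ in range(10):
    logits = base(tokens, adapter=adapter)
    loss = nn.functional.cross_entropy(logits.view(-1, 1000), targets.view(-1))
    opt.zero_grad()
    loss.backward()                        # gradients flow only into the adapter
    opt.step()
```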

Like Microsoft would do with GPT and Copilot, turning one giant foundation model into 100 different assets that activate a whole bunch of other products. That's the value of foundational models when they get built.

They build a large, capable one, and they fine-tune it and build smaller ones that can do certain tasks, and that creates the opportunity. In terms of where it's going next, they haven't seen the limit in terms of learning; there are things we're still learning. For perspective, the human brain is 100 trillion to 150 trillion connections, depending on how you count the neurons and connections in your head. We're at about two trillion now in AI.

Vivek Arya

So 50x more?

Ian Buck

At least. We haven't gone to reasoning yet. That would be the next step. How do you actually do reasoning, come up with conclusions and a logic chain? That's thinking.

Vivek Arya

But are there diminishing returns, do you think, at some point? Or the cost of training, can that get to a level where it puts an upper limit on how large these models can be?

Ian Buck

The cost of training is definitely a factor. Getting the infrastructure is a factor in how fast we can move the needle here, in addition to the science, the software, the algorithms, the complexity, and the resiliency. Doing things at this scale requires an end-to-end optimization. It's not just about the hardware.

Maybe a simple analogy: to turn your company into more revenue, you don't just bring in 10,000 or 50,000 employees. You have to build the company in order to grow and be more intelligent. In the same way, you can't just bring in 50,000 or 100,000 or 1 million more GPUs. You have to do the work and build the capability to keep all those GPUs working together, just as that company has to work together to build something even bigger.

That is the day-to-day life that I tend to lead, working with those biggest customers to figure out not just what scale they think they can achieve from an infrastructure standpoint but also the software and algorithms. Is there a limit? We haven't hit one yet. Certainly, 100,000 is happening now and 1 million is being talked about. We're walking up that curve right now.

Vivek Arya

All right. Do you find it interesting that some of the most frequently used and the largest models, one is developed by a start-up and one is developed by somebody who's not a hyperscaler, right? So where do you think kind of the biggest hyperscalers are in their journey? Are they still in early stages? Are they hoping to just kind of leverage the technology that's been built up? Or do you think they have to get things going also and that can provide growth over the next several years?

Ian Buck

Yes. I think with the lighthouse models, everyone recognizes the benefit of having a foundation model as an asset. It's something they can leverage for their business. Some of them make it public, some of them don't. That's a business decision, a strategic decision. But the innovation is still happening. That's the interesting thing.

There's so much change happening in AI design, in model design, and in how to train these things at scale. Students at Berkeley or professors at Stanford turn into startups, discover a new kind of attention mechanism, some modification of the transformer, or they do something totally different than transformers, like state-space models. The AI architecture, the model architecture, is constantly evolving.

Just last year, we started seeing an explosion of the mixture-of-experts style of model, which changed model architectures to allow them to scale to 1 trillion parameters. Previously, a model like GPT-3 would have one transformer-based neural network followed by another, followed by another, many layers of one after another, to build 175 billion parameters.

If you look at models like GPT-4 or others, they're a mixture of experts. They're on the order of 1 trillion parameters. And one of the ways they achieve that is that it's not one neural network stacked on top of another; they actually have multiple neural networks running across each layer. In fact, if you look at the 1.8 trillion-parameter GPT model, it has 16 different neural networks all trying to answer their part of the layer.

And then they confer and meet up and decide what the right answer is, and then they share it with the next 16, like this room: your group confers, the next group confers, and you hand it off down the row. That mixture of experts allows each neural network to have its own specialty, its own little perspective, to make the whole thing smarter. What's interesting is that not only do the models get bigger and smarter, it actually changed the way we do computing, because we used to have one neural network, one big matrix multiply followed by another, followed by another. Now we have lots of them, and they're communicating all the time.
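
To make that structure concrete, here is a toy mixture-of-experts layer in PyTorch: a router picks a couple of experts per token, each chosen expert answers for its share, and the weighted results are combined before being handed to the next layer. The sizes and routing details are illustrative, not the architecture of any particular model.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts."""
    def __init__(self, dim=256, num_experts=16, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)   # decides which experts answer
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x):                           # x: (tokens, dim)
        scores = self.router(x)                     # (tokens, num_experts)
        weights, chosen = scores.softmax(dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e         # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out                                  # handed to the next layer

tokens = torch.randn(32, 256)                       # 32 tokens, hidden size 256
layer = MoELayer()
print(layer(tokens).shape)                          # torch.Size([32, 256])
```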

Each one of you has to talk to everybody else, confer, and then share your knowledge with the next row. You see that in the system designs and in how the architecture is evolving. That's one of the reasons, in the Blackwell architecture, we did multi-node NVLink, or NVL72. We expanded how many GPUs you can connect with NVLink in one domain, up to 72, to still allow for that mixture of experts.

So everyone can be communicating with each other and not get blocked on I/O. This evolution in model architectures is constantly happening. You see start-ups figuring this out and taking advantage of it. They partner with a hyperscaler or with a cloud, and with help from NVIDIA they move the needle on the next phase of what AI looks like as a model, what it can do, and how it can be implemented in the architecture.

So when I say early stages, that is kind of what it feels like. These last two years have been an explosion of mixture of experts. We have new model architectures that are starting to show up. It's influencing how we deploy them, the software we write, the algorithms, all of that. And on top of it, that goes right into NVIDIA's roadmap, what we're building and how fast we can build. It's one of the reasons we're accelerating our roadmap: the world of AI is constantly evolving, changing, and upgrading.

Vivek Arya

Got it. Now I'm glad you brought that up in terms of the 1-year product cadence, because one aspect of this is that we are seeing these model sizes grow. I've seen one statistic that says they are doubling every six months or so. That argues that even a 1-year product cadence is actually not fast enough. But then the other practical side of it is that your customers have to live with this constant flux in their data centers. So how do you look at the puts and takes of this 1-year product cadence?

Ian Buck

So the overall performance improvement comes as a compounding of hardware, connectivity, algorithms, and model architecture. When we do optimizations, we look at it holistically. Obviously, we are still improving the performance of our Ampere generation GPUs. And we've improved the performance of our Hopper GPUs by 3x. We first introduced Hopper at the end of '22, running Llama and GPT inference.

From the end of '22 to today, I think we've improved Hopper's inference performance by 3x. So we're continuously making the infrastructure more efficient, faster and more usable. And that gives the customers who have to now buy at a faster clip, confidence that the infrastructure that they've invested in is going to continue to return on value and does so. The workloads might change. They may -- they'll take their initial Hopper and build the next GPT. They may take the next Blackwell and build the next GPT.

But that may be the infrastructure they use to continue to refine or create the derivative models, or to host and serve them. One of the interesting things is that our products used to be much more segmented into inference and training. You used a 100-class GPU, the big iron, for training, and the smaller PCIe products for inference, due to the cost or the size of the model. Today, the models and the infrastructure that are used for training at scale are also frequently used for inference, which I know is difficult for this community to digest when figuring out what's inference and what's training.

I'm sorry about that. But that is the benefit of that capacity. They can invest knowing they can use those GPUs for both inference and training and get continued value and performance throughout. So the increased pace is sort of natural. This market can certainly support the continued improvement, and the feedback cycle of working with NVIDIA allows us to invest, build new technologies, respond, and enable.

And then it becomes a job of managing transitions, execution, supply, and data centers to make sure that everyone has the GPUs they need. I certainly talk to startups. Some of them are on A100s still and they're enjoying them. They're looking forward to their H100s. Others that have their H100s are looking forward to their Blackwells.

And they're all getting the benefit of the performance and algorithms in the platform that we provide. One way to meet the demand is to continue to support and drive the whole ecosystem. That just creates more players and more invention and moves the ball forward, which is a rising tide for us and, I think, for the whole market.

Vivek Arya

There's always this question about what the killer apps driving generative AI are, right? Yes, we understand that a lot of hardware is being deployed. So what are the top use cases, right? You mentioned that customers deploying NVIDIA hardware are seeing 4x to 5x return on their investment, and obviously that's over a 4-year period, right? So what are the big use cases that you think are the most promising right now?

Ian Buck

The baseline is obviously building that foundation model, which gets shown off and enjoyed as a chatbot that you all can interact with. But then they go and incorporate those models into their products. Copilot is a good example: taking a GPT model and tailoring it so that it can help you create that PowerPoint, write that e-mail, or modify or create that Excel expression that's really hard to figure out.

I've certainly used that. Certainly, developers have. Microsoft, I think, has spoken publicly about how much their software developers' productivity has increased because they've made that model available internally via Copilot for their own software products. So AI has accelerated their entire product portfolio, like everything. And I don't know how you model or measure that benefit, in terms of headcount of software developers but also the rate at which they can roll out new technology and new products.

In some ways, generative AI is making all the old, boring products exciting again and prompting a reassessment of their value, their ASPs, and the revenue they can make on their existing installed base. That's just Microsoft. It's happening across all of those industries. The reason every company wants to deploy and benefit from AI or generative AI is that they see the opportunity to improve the productivity of their own existing products, installed base, and users, and, of course, to provide the additional value that generative AI content or a client or an agent can add as a feature, not just make the existing features better.

That's what we see in the enterprise. Another area in generative AI is content creation: the new companies, the new start-ups that are providing those key technologies and key enablers, which are going to be either consumed or purchased by the more established software ecosystem.

We are certainly seeing AI now work its way into finance, into health care, into telco markets, big adopters. Obviously, these are companies that see high benefits, have a lot of data, and are often technology-savvy and ready to adopt. But every industry will. The other area we're seeing, for AI in general, is recommender systems.

It's not as talked about or as sexy, but it's certainly a big part of inference: deploying AI to understand the content, present the right content to the right user, make sure the wrong content is not shown to the wrong user, and also make their platforms easier to use, the click-throughs higher, and the revenues, as a result, come faster. Those recommender systems are leveraging all the generative AI work that's been done specifically on the content to increase revenues.

Vivek Arya

Got it. I wanted to talk about AI inference and get your views on what NVIDIA's moat in AI inference is. Because if I say that inference is a workload where I'm really constraining the parameters, where I'm sometimes optimizing more for cost than performance, why isn't a custom ASIC the best product for AI inference, right? I know exactly what I need to infer, right? And I can customize it, and I don't need to make the same chip work for training also. So why isn't a custom ASIC the best product for AI inference?

Ian Buck

Yes, it's a good question and one that gets asked a lot. First off, often your best architecture for inference is the one you trained on. If you know how training works, it starts with a blank neural network, or maybe one that's been pretrained, a general foundation model, but you're going to train it to be a better call center agent or co-developer for a software program.

So you're starting with that, and you're training. Training starts with inference: you send the tokens through and you ask the AI to predict what it should do, you tell it if it's right or wrong, and then you send the errors back, why it got the answer wrong or why it got it right, and reinforce those neurons.
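
That relationship is easy to see in code. Below is a minimal PyTorch sketch with a toy model and fake data: the training step begins with exactly the same forward pass used at inference time, then adds the loss and backward pass that adjust the weights.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))   # toy batch

# Inference: just the forward pass.
with torch.no_grad():
    preds = model(x).argmax(dim=-1)

# Training: the same forward pass, followed by loss and backward pass.
logits = model(x)                               # forward pass, identical math to inference
loss = nn.functional.cross_entropy(logits, y)   # tell it if it's right or wrong
opt.zero_grad()
loss.backward()                                 # send the errors back through the network
opt.step()                                      # reinforce the weights
```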

But it always starts with that forward pass and it's a big part of training. So that builds a natural transition from training to inference. The second thing is that the models are always evolving and changing over time. Think about it. You're going to invest $1 billion, $5 billion, $10 billion in a data center infrastructure for inference.

That asset is going to last you four or five years. I think they're just now retiring some of those older Keplers and Voltas in the data center. The more that asset can run all of those models, the ones that are important today but also the ones that are going to be important, that show up tomorrow and after that and after that, the more you know you can make that investment and have that capability, that platform infrastructure that's going to continue to produce the revenues that we talked about. And hardware takes a long time to build, don't forget that. We're accelerating our roadmap, but that's only because we can have multiple hardware architectures working in flight in parallel, as well as trying to compress the cycle.

But it's hard to compress. The execution there is very difficult, and tapeouts to production are very long, longer than the innovation cycle of AI. So that's why programmability is important.

That's why having an architecture or platform that everybody is using, not just at your company but at every other company and across the academic and start-up ecosystems, means you know that as the models evolve, as the techniques and technologies evolve, that investment is going to continue to track with forward innovation, not just what we have now. Now of course, if you know you have one model, you know you're going to put it in one device, and you know where it's going to go, that may be the right answer.

And NVIDIA is not trying to win every single cycle of AI. If your doorbell needs an AI and you know exactly what to build, please. But at data center scale, the opportunity clearly requires that level of investment. They want to make sure that they're getting the full years of use and the full value out of it. And they see the benefit of NVIDIA, where we're the one AI company that's kind of working with every other AI company. They see the benefit of that investment getting the software, the algorithms, and the new models over time.

Vivek Arya

Practically, do the large customers have separate clusters for training and separate ones for inference? Or are they mixing and matching, reusing some for training and some for inference? Practically, how do they do it?

Ian Buck

It depends. Certainly, there are some geographic benefits and differences between training and inference. Most folks can do training anywhere on the globe. So we see big training clusters being put up. Usually, it's a function of where they can get the data center space and tap into the grid, having good access to power, and the economics there are very important.

But training doesn't need to be localized. If you've ever used a remote desktop that's halfway around the world, you can feel the lag and latency. Training is fine with that. But for inference, you kind of do need to be near the user. Some inference workloads might be fine.

Batch processing inference, fine. A longer chatbot session might be okay. But if you're doing gen AI search, asking your browser a question and wanting an answer back, you want that answer quickly. If it's too slow, your quality of service immediately plummets. So we often see that training clusters are put wherever they can logically get the power and capability.

Inference tends to be either in those same clusters, which they then divide up, or, just like the clouds provide regions, folks will put GPUs in every one of those regions, and they can then serve both training and inference there. I would just say the training part is a little bit more specialized, because the super big clusters can be wherever it makes the most sense for them to build and invest in the building and capability. But more and more, they are largely using the same infrastructure for training and inference. That again goes to the value of that investment.

They know they can be using it for training and flip it over to inference. If you saw when we launched our Blackwell GB200 NVL72, we talked a lot about inference, because those models are getting big. They've got to run that mixture-of-experts work through it, and that same infrastructure can be used for training as well. That's very important when we launch our new platforms. At the same time, we also make sure that they can take the same building blocks and vary the sizes and capabilities.

GB200, the NVL72, is designed for trillion-parameter inference. For more modest-sized 70B or 7B models, we have an NVL2, which is just two Grace Blackwells tied together; it fits nicely in a standard server design and can be deployed anywhere, including at the edge, the telco edge. A telco will often have a cage. The cage has 100 kilowatts, and you can't exceed that. So the metric is, what kind of GPUs or what kind of servers can I put there that make the most sense to serve as many models as possible at the edge? You'll do something different than you'll do for a big OpenAI data center or some such. And that's why we have both kinds of products going.
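
As a rough sketch of that sizing exercise, here is a small calculation for a fixed 100-kilowatt cage. The per-server wattages and overhead fraction are illustrative assumptions, not NVIDIA specifications.

```python
# Rough sizing of a fixed 100 kW telco cage. The per-server wattages and the
# overhead fraction below are illustrative assumptions, not product specs.
CAGE_BUDGET_KW = 100.0
OVERHEAD_FRACTION = 0.20          # assume ~20% for networking, cooling, losses

usable_kw = CAGE_BUDGET_KW * (1 - OVERHEAD_FRACTION)

for label, server_kw in [("2-GPU inference server", 2.5), ("8-GPU training server", 10.0)]:
    servers = int(usable_kw // server_kw)
    print(f"{label} (~{server_kw} kW each): {servers} fit in {usable_kw:.0f} kW usable")
```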

Vivek Arya

Got it. Since you have been so intimately involved with CUDA since its inception, right, how do you address the pushback that a lot of software abstraction is being done away from CUDA and that it will make CUDA obsolete at some point, so that it's not really a sustainable moat for NVIDIA? How do you address that pushback?

Ian Buck

Yes. I think moat is a complicated word, and what does it mean? What makes the platform useful is how many developers it has on it, how many users it has on it, and what the installed base is, so that people can get access to the next AI invention and make sure it's compatible with that architecture and what it can do.

These foundations, these new next-generation models that show up, they're not academic exercises. They are designed to the limits of what the platform can provide at the time they are trained. And many of the models that we are enjoying today actually started training about two years ago.

There's a lag, unfortunately, from when NVIDIA announces a new GPU to when the data center gets set up to when the models get trained. We're obviously explaining what we're building to try to shorten that process, but it directly influences the scale of what they can build: not just the number of GPUs but also, with whatever generation, how much we improve the performance on a per-GPU basis.

Blackwell is like 4x to 5x better at training per GPU than Hopper was, and better at inference as well, about 30x better on inference for trillion-parameter models. And so that sets the bar for how big a model they can build, and then they look at the architecture of NVLink and what they can build with it. So it is a symbiosis between what we're building and what they're inventing, and then we keep riding that wave. And that really helps them define the features of the next-generation AI model.

Vivek Arya

Got it. How is the outlook around Blackwell as we look at next year? First of all, do you think that because of the different -- the power requirements that are going up significantly, does that constrain the growth of Blackwell in any way? And what's sort of the lead time in engagements between when somebody wants to deploy, right, versus when they have to start a discussion with NVIDIA, i.e., how far is your visibility of growth into next year?

Ian Buck

Good question, actually. So there's one question about how far forward we work with not just our biggest customers, but all those AI visionaries that are building those foundation models, and then what that ramp looks like for Blackwell specifically. We stated recently in our earnings that Blackwell has now entered production builds. We started our production.

The samples will go out this quarter, and we're ramping for production output later this year. And that always looks like a hockey stick: you start small and you go up pretty quickly to the right. And the challenge, of course, is that with every new technology transition, the value is so high that there's always a challenge of supply and demand. We experienced that certainly with Hopper. And there'll be similar kinds of supply/demand constraints in the on-ramp of Blackwell, certainly at the end of this year and going into next year.

In terms of the horizon, though, that conversation on the Blackwell transition and ramp, what it is and what to build, starts two years in advance. The slide that was shown at Computex of our Hopper platform, Blackwell, and, for the first time, Rubin has been a conversation for quite some time with those big customers. So they know where we're going and on what time scales. It's really important for us to do that.

Data centers don't drop out of the sky. They're big construction projects. They need to understand what a Blackwell data center looks like and how it's going to differ from Hopper. And it will. The opportunity we saw with Blackwell was to transition to a denser form of computing, to put 72 GPUs in a single rack, which has not been taken to scale before.

We have experience with it; I also run the HPC and supercomputing side, so we've seen those kinds of scale, but those were one-off systems. Now we're democratizing and commoditizing that supercomputing technology to take it everywhere. Very challenging.

And of course, we've been talking to them about it for two years now, and not just the hyperscalers but also the supply chain: in Taiwan, for example, the people that are building the liquid cooling infrastructure, the power shelves, the whips, which are the cables that go down into the bus bars. The opportunity here is to help them get the maximum performance through a fixed-megawatt data center at the best possible cost. By doing 72 GPUs in a single rack, we needed to move to liquid cooling. It's a higher-density, higher-power rack, but the benefit is that we can do all 72 in one NVLink domain.

We connect them all up with copper instead of having to go to optics, which adds cost and adds power. And every time you add cost and power, you're just taking away from the number of GPUs you can put in your 10-, 50-, or 100-megawatt data center. So that is driving us toward reducing cost and increasing density.

So when you look at a Blackwell, you may say, well, it's really hot, but it's actually going to significantly improve the total throughput of a fixed-power data center. So there's a strong economic and technology driver to transition to denser, more power-efficient, next-generation cooling technologies beyond just air.

Water is a fantastic mover of heat. Your house is built with insulation that is nothing more than trapped air. Air is actually an insulator; it's not a good transferrer of heat, but water is excellent at it. If you've ever jumped into a 70-degree pool from 70-degree air, it feels really cold.

That's because water is sucking the heat right out of you. It's really good at moving heat around. And that efficiency goes right to more GPUs, more capabilities and denser, more capable AI systems.
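
The physics behind that is the simple relation Q = m_dot * c_p * delta_T: the heat a coolant stream carries away is its mass flow times its specific heat times its temperature rise. The sketch below applies it with illustrative numbers (the rack power and temperature rise are assumptions, not Blackwell specifications) to show why water handles rack-scale heat so much more easily than air.

```python
# Q = m_dot * c_p * delta_T: heat carried away by a coolant stream.
# Rack power and temperature rise below are illustrative assumptions.
RACK_KW = 120.0          # assumed heat load of a dense liquid-cooled rack, in kW
DELTA_T = 10.0           # assumed coolant temperature rise, in kelvin

CP_WATER = 4186.0        # specific heat of water, J/(kg*K)
CP_AIR = 1005.0          # specific heat of air, J/(kg*K)
RHO_AIR = 1.2            # density of air, kg/m^3, near room conditions

q_watts = RACK_KW * 1000
water_kg_s = q_watts / (CP_WATER * DELTA_T)   # ~2.9 kg/s, roughly 2.9 L/s of water
air_kg_s = q_watts / (CP_AIR * DELTA_T)       # ~12 kg/s of air
air_m3_s = air_kg_s / RHO_AIR                 # ~10 cubic meters of air per second

print(f"Water needed: {water_kg_s:.1f} kg/s (~{water_kg_s:.1f} L/s)")
print(f"Air needed:   {air_kg_s:.1f} kg/s (~{air_m3_s:.0f} m^3/s)")
```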

Vivek Arya

Got it. So customers who are deploying Blackwell, are they replacing the Hoppers or Amperes that were already in place? Or are they putting up new infrastructure? Like how should we think about kind of the replacement cycle of these?

Ian Buck

Yes, they can't build the data centers fast enough, so what they're doing is decommissioning or accelerating their CPU infrastructure. We're not in every data center; obviously, the vast majority of systems in the hyperscalers are CPU systems. So they want to make space, and they can only build so fast.

Vivek Arya

Taking out traditional servers.

Ian Buck

They can retire their old legacy systems that maybe they've just left alone, not upgraded. They can accelerate the decommissioning of the older CPU infrastructure. They can also accelerate those workloads themselves. So actually, we've had a lot more conversations with all the hyperscalers about old workloads that were on CPUs, that didn't really have anybody working on them and were in sort of sustaining mode.

They're going back in and saying, okay, actually, we should probably go accelerate this old database workload or this machine learning workload that we've left alone for so many years, because we can do what 1,000 servers were doing with just 10 GPU servers.

And that just freed up hundreds of racks and megawatts of power. So it's not just the new data centers that are being built; what they're doing is actually making space for more and more GPUs to come in. Of course, they're not retiring the Hoppers. They can't stop using Hopper, they can sell every Ampere, and in some cases they can sell some of the earlier-generation Volta systems or keep them around. What we're seeing is a combination of building new and retiring, deprecating, or accelerating their CPU infrastructure.

Vivek Arya

Got it. And lastly, InfiniBand versus Ethernet, right? Most of the clusters that NVIDIA has built so far have primarily used InfiniBand. What is the strategy behind the new Spectrum-X product, because there is a large incumbent out there? Just like NVIDIA is a large incumbent on the compute side, there is an incumbent on the switching side. So what would make customers adopt your product versus staying with the incumbent?

Ian Buck

Yes. So first, we support all different kinds of networking. Certainly, Amazon has their EFA networking, which we support and execute toward. Each of the hyperscalers has different flavors of their own Ethernet or networking, or some have made the decision to get the best possible, which is the InfiniBand platform. You see that with Microsoft, and they're matching our performance 1:1 in benchmarks like MLPerf, connecting 10,000 GPUs with InfiniBand. We have a 10,000-GPU cluster.

They have a 10,000-GPU cluster, they get the same score, and they're getting the best results on MLPerf. Ethernet is tricky in the sense that the standard Ethernet infrastructure is really important. It is a data-center-scale networking technology. It has a huge ecosystem of software capabilities for managing at scale. Ethernet's important.

Ethernet was originally designed for sort of that north-south use case. You have a server that wants to talk to the rest of the world, a CPU core that wants to talk to the rest of the world. That's what Ethernet did. You can talk across the whole data center, but it was for the traditional use cases. When you get to AI, it's a different kind of problem.

It's kind of a supercomputing problem. You have billions of dollars' worth of GPU infrastructure all trying to train a model like Llama 3. And now we're going to 100,000 GPUs all trying to train an even bigger model. So that east-west traffic is incredibly important. If one of these packets slows down, or one of these links gets lost or has a blip, the entire infrastructure slows down because it's waiting for the slowest guy.

And InfiniBand was designed to optimize that and make sure the performance was the maximum possible so everyone could talk to everybody else. That's the difference between designing for east-west versus north-south. In north-south, you don't care if you have a slightly slower connection than the person next to you; everybody is happy. But if your connection slowed everybody down, that would be a problem. And if you look at it from a data center standpoint, that's billions of dollars of wasted GPU, like billions, just idle.

The whole thing goes down. So that's what Spectrum-X is addressing: support for the standard Ethernet ecosystem, which many hyperscalers and clouds and everybody else is standardized on, but with the added technologies that support the east-west traffic, the adaptive routing, the congestion control techniques, all the things you need to make sure you have deterministic performance east-west so that the AI can progress and your GPUs stay utilized. It's a really hard problem. We've been accelerating our Spectrum-X roadmap as a result. We still have InfiniBand, which is obviously very important in supercomputing and for the ultimate performance.

But to provide that kind of Ethernet that can go and train giant models, it requires that technology to be embedded, integrated, and provided in an Ethernet ecosystem. So that's what Spectrum-X is.
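
A tiny sketch of that waiting-for-the-slowest-guy effect, with made-up numbers: in a synchronous training step, the gradient exchange completes at the pace of the slowest link, so a single degraded connection drags down the effective utilization of the entire cluster.

```python
import random

# Illustrative only: a synchronous gradient exchange finishes when the slowest
# link does, so one degraded link gates the whole step.
random.seed(0)
NUM_LINKS = 1024
NOMINAL_MS = 10.0                                  # assumed per-step comm time

healthy = [NOMINAL_MS * random.uniform(0.95, 1.05) for _ in range(NUM_LINKS)]
degraded = list(healthy)
degraded[42] *= 5                                  # one link hits congestion

for label, links in [("all links healthy", healthy), ("one congested link", degraded)]:
    step = max(links)                              # everyone waits for the slowest
    print(f"{label}: step time {step:.1f} ms, "
          f"effective utilization {NOMINAL_MS / step:.0%}")
```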

Vivek Arya

Do you see the attach rate of your Ethernet switch going up? Because I think NVIDIA has outlined several billion dollars, which includes the NICs as well, right? Even before Blackwell starts, right?

Ian Buck

There's a 100,000-GPU training project that's being put together right now, which will be Spectrum-X.

Vivek Arya

And then as Blackwell rolls out next year, do you see your attach rate of Ethernet going up?

Ian Buck

Yes, you'll see a mix of both Ethernet and InfiniBand.

Vivek Arya

Got it, okay. Terrific. With that, thank you so much, Ian. Really appreciate your insights. Thanks, everyone, for joining.
