NVIDIA Blackwell Whitepaper Story

In my earlier post, “Clock Canvas Story – The Clock of the Latest Chip is Complicated,” I used NVIDIA Blackwell as an example. Now that the Blackwell whitepaper is public, I want to look at it a little more deeply from the angles I understand best among its many features: system architecture, power, and clocking.

The GB202 shown above contains 24,576 CUDA cores and 192 Ray Tracing cores.
The previous-generation Hopper (H100 SXM5) had 16,896 CUDA cores in 132 SMs (Hopper has no Ray Tracing cores), so the core count has grown by almost 1.5 times. That is an overwhelming jump.

When people discuss NVIDIA’s competitiveness, they usually mention the CUDA platform, NVLink, excellent GPU and Tensor Cores, and power efficiency.
These strengths are certainly a big reason NVIDIA is where it is today.
However, I would like to look at it from a slightly different perspective:
NVIDIA is a company that can master exponentially increasing computing complexity.

Growing the core count by roughly 1.5 times with Blackwell, just two years after Hopper, is not a simple increase in numbers.
It is a completely different level from going from 200 cores to 300: it took a count of more than 15,000 cores to nearly 25,000.
This is not simple expansion. It is like planning not a few hundred but tens of thousands of buildings and designing the electricity, water, and road networks that connect them all, and NVIDIA is doing exactly that.
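One crude way to see why absolute scale matters is to count how many distinct pairs of cores could, in principle, need to exchange data. The sketch below is an illustration only, assuming an idealized all-to-all view: real GPUs do not wire every core to every other core, and the function name `core_pairs` is hypothetical. It compares growing from 200 to 300 cores with growing from 15,000 to 25,000 cores (round numbers).

```python
# Back-of-the-envelope illustration (assumption): count the distinct core pairs
# that could, in principle, need to talk to each other. Real GPUs do not wire
# every pair together; this only shows how fast the raw numbers grow.

def core_pairs(n: int) -> int:
    """Number of unordered pairs among n cores: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    for before, after in [(200, 300), (15_000, 25_000)]:
        added = core_pairs(after) - core_pairs(before)
        print(f"{before:,} -> {after:,} cores: "
              f"{core_pairs(before):,} -> {core_pairs(after):,} pairs "
              f"(+{added:,})")
```

Both steps are a similar ratio in core count, but the small chip stays in the tens of thousands of pairs while the large one lands in the hundreds of millions, which is one reason large chips tend to be organized hierarchically rather than connected flat.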

Consider that more than 20,000 cores are running at once.
How do you assign work to all these cores?
How do these cores access memory?
How do they share data?
How are clocks delivered to them?
How is power delivery organized?
How does the software recognize and control these cores?
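To make the clock question a little more concrete, here is a back-of-the-envelope sketch of an idealized balanced binary clock tree (H-tree style fan-out). This is an assumption for illustration, not how Blackwell actually distributes its clocks. It also treats each core as a single clock endpoint, a big simplification since real clock sinks are individual flip-flops. The function `binary_clock_tree` is hypothetical.

```python
# Idealized sketch: a balanced binary clock tree (H-tree style fan-out).
# Assumption for illustration only; real chips combine trees, meshes,
# and multiple clock domains, and clock sinks are flip-flops, not cores.

def binary_clock_tree(endpoints: int):
    """Return (levels, branch_points) for a full binary tree reaching `endpoints` sinks.

    levels        = ceil(log2(endpoints)), computed exactly with integer bit_length
    branch_points = internal nodes of a full binary tree with 2**levels leaves
                    (= 2**levels - 1; unused leaves are included)
    """
    if endpoints < 1:
        raise ValueError("need at least one endpoint")
    levels = (endpoints - 1).bit_length()  # exact ceil(log2(n)) for n >= 1
    return levels, 2 ** levels - 1


if __name__ == "__main__":
    for n in (300, 24_576):
        levels, branches = binary_clock_tree(n)
        print(f"{n:>6} endpoints: {levels} levels, {branches:,} branch points")
```

For 300 endpoints this gives 9 levels and 511 branch points; for 24,576 endpoints, 15 levels and 32,767 branch points, each of which has to be balanced so that the clock edge reaches every endpoint at nearly the same time.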

In fact, I think this system-level design is much harder than designing the cores themselves.
Many companies can implement algorithms, but very few can handle chip complexity at this scale.

In other words, I think the paradigm of semiconductor design has shifted from implementing algorithms to conquering complexity.
