Astera Labs has been a stock that bulls and bears fight over intensely: the bears are pessimistic about the reduced usage of its PCIe retimers in NVIDIA’s GB200 (as compared to DGX), while the bulls are optimistic that its newly launched PCIe switch product will significantly increase the company's content dollar. Today, I will explain in detail Astera Labs’ content opportunity in NVIDIA’s AI servers.
First, let me explain how PCIe retimers and switches are used in NVIDIA’s current DGX servers. A DGX server contains a Universal Base Board (UBB) carrying 8 GPGPUs, as well as a CPU board (also known as the head node) carrying 2 CPUs. According to my supply chain research, a standard DGX server is equipped with 8 PCIe Gen5 retimers on the UBB (one per GPGPU) and another 8 PCIe Gen5 retimers on the head node to match the 8 on the UBB. Some MGX customers may modify the board layout to shorten the data transmission distance between GPUs and CPUs, which lets them use only 4 retimers on the head node, but the standard DGX design is 8+8 retimers. In addition, a DGX server is equipped with two 144-lane PCIe Gen5 switches, which connect the CPUs, GPUs, and CX7 network cards. Specifically, each PCIe switch connects to one Intel or AMD CPU, using 16 x 2 = 32 lanes; two CX7 network cards, using 16 x 2 = 32 lanes; and four GPGPU cards, using 16 x 4 = 64 lanes, for a total of 128 lanes. NVIDIA leaves the remaining 144 – 128 = 16 lanes unassigned so that customers and system manufacturers can configure them flexibly (see figure below, using an AMD CPU DGX as an example):
Among these, NVIDIA uses Astera Labs’ PCIe Gen5 retimers, with a mass production price of $30–35 per retimer (depending on customer volume). The PCIe Gen5 switch used is Broadcom’s PEX89144, with a mass production price of $400–450 per switch.
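The DGX lane budget and retimer count can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and dictionary names are my own, the figures are the ones quoted in this post, and the per-server dollar range simply multiplies the 8+8 retimer count by the quoted $30–35 unit price.

```python
# Lane budget for one 144-lane PCIe Gen5 switch in a standard DGX server.
# All figures come from the text; the names are illustrative only.
DGX_SWITCH_LANES = 144
LANES_PER_X16_LINK = 16

# Number of x16 links each switch dedicates to each device type.
dgx_links_per_switch = {
    "CPU": 2,      # one Intel or AMD CPU, 16 x 2 lanes
    "CX7 NIC": 2,  # two CX7 network cards, x16 each
    "GPGPU": 4,    # four GPGPU cards, x16 each
}

dgx_used_lanes = sum(n * LANES_PER_X16_LINK for n in dgx_links_per_switch.values())
dgx_spare_lanes = DGX_SWITCH_LANES - dgx_used_lanes

# Retimer content per standard DGX server: 8 on the UBB + 8 on the head node.
RETIMERS_PER_DGX = 8 + 8
RETIMER_PRICE_RANGE_USD = (30, 35)  # quoted mass production price per retimer
dgx_retimer_content = tuple(RETIMERS_PER_DGX * p for p in RETIMER_PRICE_RANGE_USD)

print(f"lanes used per switch: {dgx_used_lanes}, spare: {dgx_spare_lanes}")
print(f"retimer content per DGX: ${dgx_retimer_content[0]}-${dgx_retimer_content[1]}")
```

Under these assumptions, the retimers work out to roughly $480–560 of Astera Labs content per standard DGX server, while the two PCIe switches are Broadcom content.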
Now that I’ve explained the DGX server, let’s take a look at the PCIe topology of NVIDIA’s GB200 compute tray:
There is a common misunderstanding about the diagram above: since Astera Labs announced at the OCP conference that its Scorpio PCIe Gen6 switch would be used in NVIDIA GB200, some investors mistakenly thought the blue ‘PCIe fanout switch’ shown in the left part of the diagram was the PCIe switch that Astera Labs talked about. In fact, this is just a PCIe Gen3 switch (with 16 uplinks connected to the Grace CPU and 18 downlinks connected to USB/BMC/Boot/Debug networks), used for managing miscellaneous and peripheral devices in the compute tray. It is supplied by the American analog chip company Diodes. The standard NVIDIA GB200 reference design actually does not include any PCIe Gen6 switch; only hyperscaler customers who want to adopt non-NVIDIA CX8 network cards and/or non-NVIDIA Grace CPUs need to add a PCIe Gen6 switch to their GB200 compute trays.
Astera Labs launched a 64-lane PCIe Gen6 switch this year, designed to connect CPUs, GPUs, NICs, and NVMe drives together in the compute tray. According to my supply chain research, each GB200 card requires two 64-lane PCIe switches from Astera Labs. Each switch connects to one CPU, using 17 lanes; one NIC, using 16 lanes; one GPGPU card, using 16 lanes; and two SSDs (i.e. NVMe), using 2 x 4 = 8 lanes, for a total of 57 lanes. The remaining 64 – 57 = 7 lanes are left idle, and customers can configure them as needed (see diagram below):
A GB200 compute tray contains two GB200 cards, so it requires 2 x 2 = 4 of these 64-lane PCIe switches. The standard GB200 compute tray doesn’t require PCIe retimers, because the CPU and GPU sit close together and are connected through C2C NVLink rather than PCIe. However, hyperscaler customers using their own FPGA-based custom NICs, with those NICs and the NVMe drives placed on an extended board in addition to the main board of the compute tray, would still need 4 PCIe retimers on the extended board (corresponding to the 4 NICs).
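The same kind of check works for the customized GB200 design. Again, this is a minimal sketch using only the figures quoted in this post; the names are hypothetical, and the retimer count applies only to the extended-board hyperscaler variant, not to the standard reference design.

```python
# Lane budget for one 64-lane PCIe Gen6 switch in a customized GB200 design.
# Figures are taken from the text; the names are illustrative only.
GB200_SWITCH_LANES = 64

gb200_lanes_per_switch = {
    "CPU": 17,           # one CPU, as quoted in the text
    "NIC": 16,           # one NIC, x16
    "GPGPU": 16,         # one GPGPU card, x16
    "NVMe SSDs": 2 * 4,  # two SSDs, x4 each
}

gb200_used_lanes = sum(gb200_lanes_per_switch.values())
gb200_idle_lanes = GB200_SWITCH_LANES - gb200_used_lanes

# Component counts per compute tray in the customized design.
SWITCHES_PER_GB200_CARD = 2
GB200_CARDS_PER_TRAY = 2
switches_per_tray = SWITCHES_PER_GB200_CARD * GB200_CARDS_PER_TRAY
RETIMERS_PER_TRAY = 4  # one per NIC on the extended board; zero in the standard design

print(f"lanes used per switch: {gb200_used_lanes}, idle: {gb200_idle_lanes}")
print(f"per tray: {switches_per_tray} switches, {RETIMERS_PER_TRAY} retimers")
```

Keeping the counts as named constants makes it easy to rerun the tally for a design that, for example, drops the extended board and therefore needs no retimers.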
So far, we have discussed the PCIe topology of NVIDIA’s DGX and GB200. In the following paragraphs, I will talk about the GB200 projects that Astera Labs has already secured and provide an estimate of its content dollar opportunity within the GB200.