AIME A4004 - Multi GPU HPC Rack Server
The AIME A4004 is a next-generation enterprise deep learning server based on the ASUS ESC4000A-E12. It can be configured with up to four of the most advanced deep learning accelerators and GPUs for the most demanding multi-GPU HPC tasks.
It packs EPYC Genoa CPU performance with up to 96 cores, the latest DDR5 memory, the fastest PCIe 5.0 bus speeds and up to 100 GBit network connectivity into a dense form factor of two height units.
It is built to run 24/7 for reliable high performance computing, whether in your in-house data center, at a co-location facility or as a hosted solution.
AIME A4004 - Deep Learning Server
If you are looking for a server specialized in maximum deep learning training and inference performance for the most demanding HPC workloads, the AIME A4004 multi-GPU 2U rack server is built to deliver.
The AIME A4004 is based on the ASUS ESC4000A-E12 barebone, which is powered by an AMD EPYC™ Genoa processor with up to 96 cores, resulting in 192 parallel CPU threads.
Its GPU-optimized design with high-airflow cooling allows the use of up to four high-end double-slot GPUs of the latest NVIDIA generation, such as the NVIDIA H100, A100 or RTX Ada GPU models.
Definable GPU Configuration
Choose the desired configuration among the most powerful NVIDIA GPUs for Deep Learning:
Up to 4x NVIDIA H100
The NVIDIA Hopper H100 belongs to the latest NVIDIA generation and is the most powerful NVIDIA processor with the highest compute density. The NVIDIA H100 is based on the GH100 processor in 5nm manufacturing with 18,432 CUDA cores, 576 fourth-generation Tensor cores and 80 GB HBM2 memory with the highest data transfer rates. A single NVIDIA H100 accelerator delivers 1.5 peta-TOPS fp16 performance. Four accelerators of this type add up to more than 3000 teraFLOPS fp32 performance. The NVIDIA H100 is currently the most efficient deep learning accelerator card available.
Up to 4x NVIDIA A100
The NVIDIA A100 is the flagship of the NVIDIA Ampere processor generation and the current successor to the legendary NVIDIA Tesla accelerator cards. The NVIDIA A100 is based on the GA100 processor in 7nm manufacturing with 8,192 CUDA cores, 512 third-generation Tensor cores and 40 or 80 GB HBM2 memory with the highest data transfer rates. A single NVIDIA A100 accelerator already breaks the peta-TOPS performance barrier. Four accelerators of this type add up to more than 1000 teraFLOPS fp32 performance.
Up to 4x NVIDIA L40S 48GB
The NVIDIA L40S is built on the latest NVIDIA GPU architecture: Ada Lovelace. It is the direct successor of the RTX A40 and the passively cooled version of the RTX 6000 Ada. The L40S combines 568 fourth-generation Tensor Cores and 18,176 next-gen CUDA® cores with 48 GB of GDDR6 graphics memory for unprecedented rendering, AI, graphics, and compute performance.
Up to 4x NVIDIA RTX 6000 Ada
The RTX™ 6000 Ada is built on the latest NVIDIA GPU architecture: Ada Lovelace. It is the direct successor of the RTX A6000 and the Quadro RTX 6000. The RTX 6000 Ada combines 568 fourth-generation Tensor Cores and 18,176 next-gen CUDA® cores with 48 GB of graphics memory for unprecedented rendering, AI, graphics, and compute performance.
Up to 4x NVIDIA RTX 5000 Ada
The RTX™ 5000 Ada is built on the latest NVIDIA GPU architecture: Ada Lovelace. It is the direct successor of the RTX A5000/A5500 and the Quadro RTX 6000. The RTX 5000 Ada combines 400 fourth-generation Tensor Cores and 12,800 next-gen CUDA® cores with 32 GB of graphics memory for convincing rendering, AI, graphics, and compute performance.
Up to 4x NVIDIA RTX A5000
With its 8,192 CUDA cores and 256 third-generation Tensor cores, the NVIDIA RTX A5000 is similarly powerful to an RTX 3090. However, with its 230 watts of power consumption and 24 GB of memory, it is a more efficient accelerator card, and the RTX A5000 is a very interesting option especially for inference tasks.
All NVIDIA GPUs are supported by NVIDIA's CUDA-X AI SDK, including cuDNN and TensorRT, which power nearly all popular deep learning frameworks.
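To compare the configurations listed above, it can help to total the GPU memory of a fully populated system. The following is a minimal sketch that uses only the per-card memory sizes stated in the descriptions; the dictionary and function names are illustrative assumptions, not part of any AIME or NVIDIA tooling.

```python
# Memory per card in GB, as stated in the GPU descriptions above.
GPU_MEMORY_GB = {
    "NVIDIA H100": 80,
    "NVIDIA A100": 80,  # the A100 is also described with 40 GB
    "NVIDIA L40S": 48,
    "NVIDIA RTX 6000 Ada": 48,
    "NVIDIA RTX 5000 Ada": 32,
    "NVIDIA RTX A5000": 24,
}


def total_gpu_memory_gb(model: str, count: int = 4) -> int:
    """Return the summed GPU memory for `count` cards of one model."""
    if not 1 <= count <= 4:
        raise ValueError("the A4004 holds 1 to 4 GPUs")
    return GPU_MEMORY_GB[model] * count


if __name__ == "__main__":
    for model in GPU_MEMORY_GB:
        print(f"4x {model}: {total_gpu_memory_gb(model)} GB")
```

Note that the summed figure is not a single memory pool: each GPU has its own memory, so a model only benefits from the total if the framework distributes it across the cards.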
EPYC Genoa CPU Performance
The next-gen AMD EPYC Genoa server CPU delivers up to 96 cores and 192 threads per CPU and supports the latest DDR5 memory standard.
The 128 available PCIe 5.0 lanes of the AMD EPYC CPU allow the highest interconnect and data transfer rates between the CPU and the GPUs and ensure that all GPUs are connected with full x16 PCIe 5.0 bandwidth.
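The lane budget and the per-GPU link bandwidth can be checked with a short calculation. This sketch assumes the standard PCIe 5.0 figures of 32 GT/s per lane with 128b/130b encoding and ignores protocol overhead, so real-world throughput will be somewhat lower; the function names are illustrative.

```python
PCIE5_GT_PER_S = 32.0            # raw transfer rate per lane for PCIe 5.0
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def link_bandwidth_gb_s(lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth of a PCIe 5.0 link in GB/s."""
    per_lane_gb_s = PCIE5_GT_PER_S * ENCODING_EFFICIENCY / 8
    return lanes * per_lane_gb_s


def lanes_remaining(total_lanes: int = 128, gpus: int = 4,
                    lanes_per_gpu: int = 16) -> int:
    """Lanes left for storage and networking after the GPUs are attached."""
    used = gpus * lanes_per_gpu
    if used > total_lanes:
        raise ValueError("GPU configuration exceeds the available lanes")
    return total_lanes - used


if __name__ == "__main__":
    print(f"x16 link: {link_bandwidth_gb_s(16):.1f} GB/s per direction")
    print(f"lanes left with 4 GPUs: {lanes_remaining()}")
```

With four GPUs at x16, half of the 128 lanes stay available for NVMe drives and network adapters.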
A large number of available CPU cores can improve performance dramatically when the CPU is used for preprocessing and delivering data to optimally feed the GPUs with workloads.
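One simple way to use the CPU threads for preprocessing is to divide them evenly among the GPUs and give each training process that many data-loading workers. The sketch below is a rule-of-thumb heuristic, not an AIME recommendation; the function name and the number of reserved threads are assumptions.

```python
def workers_per_gpu(cpu_threads: int, num_gpus: int, reserved: int = 2) -> int:
    """Split CPU threads evenly across GPUs, keeping a few for the OS.

    Always returns at least one worker per GPU.
    """
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    usable = max(cpu_threads - reserved, num_gpus)
    return usable // num_gpus


if __name__ == "__main__":
    # 96 cores with 192 threads feeding four GPUs.
    print(workers_per_gpu(cpu_threads=192, num_gpus=4))
```

The result could be used, for example, as the `num_workers` argument of a PyTorch `DataLoader`; the best value depends on how expensive the preprocessing is and should be measured.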
Up to 30 TB Internal High-Speed NVMe SSD Storage
Deep learning usually involves large amounts of data that have to be processed and stored. High throughput and fast access times to the data are essential for fast turnaround times.
The AIME A4004 can be configured with two exchangeable U.2 NVMe PCIe 4.0 triple level cell (TLC) SSDs with a capacity of up to 15 TB each, which adds up to a total capacity of 30 TB of fastest SSD storage.
Since each of the SSDs is directly connected to the CPU and the main memory via PCIe 4.0 lanes, they achieve consistently high rates of 6000 MB/s read and 4000 MB/s write.
As usual in the server sector, the SSDs have an MTBF of 2,000,000 hours and a 5-year manufacturer's warranty with 1 DWPD.
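The endurance and reliability figures above can be translated into more tangible numbers. This sketch converts the DWPD rating into the total write volume covered over the warranty period and turns the MTBF into an annualized failure rate, assuming a constant failure rate (exponential model). The 15.36 TB capacity is taken from the technical details; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8760


def rated_writes_tb(capacity_tb: float, dwpd: float, years: float) -> float:
    """Total data that may be written within the warranty period, in TB."""
    return capacity_tb * dwpd * 365 * years


def annualized_failure_rate(mtbf_hours: float) -> float:
    """Probability of failure within one year for a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


if __name__ == "__main__":
    writes = rated_writes_tb(capacity_tb=15.36, dwpd=1, years=5)
    afr = annualized_failure_rate(2_000_000)
    print(f"rated writes: {writes:.0f} TB")
    print(f"annualized failure rate: {afr:.2%}")
```

For a 15.36 TB drive, 1 DWPD over 5 years corresponds to roughly 28 PB of writes, and an MTBF of 2,000,000 hours to an annualized failure rate of about 0.44 % under this model.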
Optional RAID: Up to 60 TB High-Speed NVMe SSD Storage
Storing large datasets and training checkpoints often requires additional storage capacity. The A4004 offers the option to use its four additional drive bays in a reliable hardware RAID configuration: up to 80 TB of SATA HDD storage or 60 TB of fastest NVMe SSD storage with RAID levels 0 / 1 / 5 / 10 and 50.
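How much of the raw capacity remains usable depends on the RAID level. The following sketch applies the textbook capacity formulas for RAID 0, 1, 5 and 10 with identical drives; it treats RAID 1 as a mirror across all drives, leaves out RAID 50, and does not model any controller-specific behaviour. The function name is an assumption for illustration.

```python
def raid_usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for `drives` identical drives."""
    if level == "0":       # striping, no redundancy
        minimum, usable = 2, drives * drive_tb
    elif level == "1":     # mirroring across all drives
        minimum, usable = 2, drive_tb
    elif level == "5":     # striping with one drive's worth of parity
        minimum, usable = 3, (drives - 1) * drive_tb
    elif level == "10":    # striped mirror pairs
        if drives % 2:
            raise ValueError("RAID 10 needs an even number of drives")
        minimum, usable = 4, drives / 2 * drive_tb
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    return usable


if __name__ == "__main__":
    for level in ("0", "1", "5", "10"):
        tb = raid_usable_tb(level, drives=4, drive_tb=15.36)
        print(f"RAID {level}: {tb:.2f} TB usable")
```

With four 15.36 TB NVMe SSDs, the full capacity of about 60 TB is only reached with RAID 0; RAID 5 leaves about 46 TB and RAID 10 about 31 TB in exchange for redundancy.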
High Connectivity and Management Interface
With the available 100 Gbit/s QSFP28 option, the fastest connections to NAS resources and big data collections are achievable. The highest available LAN connectivity is also a must-have for data interchange in a distributed compute cluster.
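To get a feeling for what the different network options mean in practice, the sketch below estimates how long it takes to move a dataset at line rate. It ignores protocol overhead and assumes the storage on both ends can keep up, so actual times will be longer; the function name and the `efficiency` parameter are assumptions.

```python
def transfer_seconds(size_tb: float, link_gbit_s: float,
                     efficiency: float = 1.0) -> float:
    """Ideal transfer time in seconds for `size_tb` terabytes of data."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_bits = size_tb * 1e12 * 8
    return size_bits / (link_gbit_s * 1e9 * efficiency)


if __name__ == "__main__":
    # Link speeds offered as standard or optional network interfaces.
    for speed in (1, 10, 25, 100):
        seconds = transfer_seconds(size_tb=1, link_gbit_s=speed)
        print(f"{speed:>3} Gbit/s: {seconds / 60:.1f} min per TB")
```

At 100 Gbit/s, one terabyte takes about 80 seconds under these ideal conditions, compared with more than two hours over a 1 Gbit/s link.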
The AIME A4004 is completely manageable through ASMB9 (out-of-band) and ASUS Control Center (in-band), which makes it possible to integrate the AIME A4004 into larger server clusters.
Optimized for Multi GPU Server Applications
The AIME A4004 offers energy efficiency with 1+1 redundant Titanium power supplies, which enable long-term fail-safe operation.
Its thermal control technology, which cools the GPU and CPU tracks individually, provides more efficient power consumption in large-scale environments.
Everything is set up, configured and tuned by AIME for optimal multi-GPU performance.
The A4004 comes with a preinstalled Linux OS configured with the latest drivers and frameworks such as TensorFlow, Keras and PyTorch. After boot-up it is ready to start accelerating your deep learning applications right away.
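A quick way to confirm after boot-up that the installed GPUs are visible to a framework is a short check like the following. It is a minimal sketch that assumes PyTorch is available in the active Python environment and falls back to an empty list otherwise; the function name is illustrative and not part of the preinstalled software.

```python
from typing import List


def list_visible_gpus() -> List[str]:
    """Return the names of all CUDA devices that PyTorch can see."""
    try:
        import torch  # assumed to be installed; not checked by this document
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    gpus = list_visible_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for index, name in enumerate(gpus):
        print(f"  [{index}] {name}")
```

On a fully populated system the list should contain four entries; fewer entries may point to a driver or environment issue.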
Technical Details
Type | Rack Server 2U, 80cm depth |
CPU (configurable) |
EPYC Bergamo
EPYC 9754 (128 cores, 2.25 / 3.1 GHz)
EPYC 9734 (112 cores, 2.2 / 3.0 GHz)
EPYC Genoa
EPYC 9124 (16 cores, 3.0 / 3.7 GHz)
EPYC 9224 (24 cores, 2.5 / 3.7 GHz)
EPYC 9354 (32 cores, 3.25 / 3.8 GHz)
EPYC 9454 (48 cores, 2.75 / 3.8 GHz)
EPYC 9554 (64 cores, 3.1 / 3.75 GHz)
EPYC 9654 (96 cores, 2.4 / 3.7 GHz) |
RAM | 96 / 192 / 384 / 768 / 1024 / 1536 GB DDR5 ECC memory |
GPU Options |
1 to 4x NVIDIA H100 NVL 94GB or
1 to 4x NVIDIA H100 80GB or
1 to 4x NVIDIA A100 80GB or
1 to 4x NVIDIA L40S 48GB or
1 to 4x NVIDIA RTX 6000 Ada 48GB or
1 to 4x NVIDIA RTX A6000 48GB or
1 to 4x NVIDIA RTX A5000 24GB |
Cooling | CPU and GPUs are cooled individually with an air stream provided by 8 high performance fans (2 per GPU and 4 for CPU) with > 100,000 h MTBF |
Storage | Up to 2x 15.36 TB U.2 NVMe PCIe 4.0 SSD
Triple Level Cell (TLC) quality
6800 MB/s read, 4000 MB/s write
MTBF of 2,000,000 hours and 5 years manufacturer's warranty with 1 DWPD
Optional Hardware RAID:
Up to 4x HDD 20 TB SATA RAID 0/1/5/10 or
Up to 4x SSD 7.68 TB SATA RAID 0/1/5/10 or
Up to 4x SSD 3.84 TB NVMe RAID 0/1/5/10 or
Up to 4x SSD 7.68 TB NVMe RAID 0/1/5/10 or
Up to 4x SSD 15.36 TB NVMe RAID 0/1/5/10 |
Network |
2x 1 GBit LAN RJ45
1x IPMI LAN RJ45
optional additional:
2x 10 GBit LAN SFP+ or RJ45 or
2x 25 GBit LAN SFP28 or
1x 100 GBit LAN QSFP28 |
USB | 4x USB 3.2 Gen1 ports (front)
2x USB 3.2 Gen1 ports (back) |
PSU | 1+1 2600 Watt redundant power
80 PLUS Titanium certified (96% efficiency) |
Noise-Level | 90 dBA |
Dimensions (WxHxD) | 440mm x 88.9mm (2U) x 800mm
17.30" x 3.5" x 31.50" |
Operating Environment | Operating temperature: 10℃ ~ 35℃
Non-operating temperature: -40℃ ~ 70℃ |