Nvidia, Dell, and Qualcomm speed up AI results in latest benchmark tests

  • April 06, 2023

Nvidia’s graphic depiction of AI tasks.


In the latest benchmark test for artificial intelligence, Nvidia, Dell, Qualcomm and a gaggle of startups brought novel techniques to the task of slimming down the compute budget of answering questions while conserving power in the process.

That could help to stem the tide of rising compute demand for running ever-larger AI programs such as OpenAI’s ChatGPT and GPT-4.

On Tuesday, the latest benchmark test of how fast a neural network can be run to make predictions was presented by MLCommons, the consortium that runs the MLPerf tests. Leading vendors such as Nvidia, Dell, Qualcomm and Supermicro submitted computer systems with various configurations of chips to see which systems took the top marks. 

They competed to deliver the greatest number of queries answered per second (throughput), the shortest time to answer a single query (latency), or the least power consumed (energy efficiency).

A number of intriguing startups also took part, including Neural Magic, xFusion, cTuning, Nettrix, Neuchips, Moffett, and Krai.

Called “MLPerf Inference 3.0,” the test results emulate the computing operations that happen when a trained neural network is fed new data and has to produce conclusions as its output. 

The benchmark measures how fast a computer can produce an answer for a number of tasks, including ImageNet, where the challenge is for the neural network to attach one of several labels, such as “cat” or “dog,” describing the object in a photo. 
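The labeling step described above can be sketched in a few lines. This is not MLPerf or ImageNet code; the label list and scores are made up for illustration, and a real classifier would produce scores for a thousand ImageNet classes.

```python
# Minimal sketch: turning a classifier's raw scores into a single
# top-1 label, as in the ImageNet task described above.
def top_label(scores, labels):
    """Return the label with the highest score (the top-1 prediction)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]

labels = ["cat", "dog", "car"]          # hypothetical label set
scores = [0.1, 0.8, 0.1]                # hypothetical network outputs
print(top_label(scores, labels))        # dog
```

The benchmark then asks how many such predictions per second the system can sustain, and at what accuracy.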

The test results follow MLPerf Inference 2.1, reported in September.

MLCommons noted in a press release that the results submitted by multiple vendors show “significant gains in performance by over 60% in some benchmark tests.”

For the benchmarks, chip and system makers compete to see how well they can do on measures such as the number of photos processed in a single second, or how low they can get latency, the total round-trip time for a request to be sent to the computer and a prediction to be returned. 
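The two measures above can be illustrated with a toy harness. This is a simplified sketch, not the MLPerf LoadGen harness; `model` here is a stand-in function, and real submissions measure latency percentiles under strict query-arrival rules.

```python
import time

def model(query):
    """Placeholder for a neural-network prediction."""
    return query * 2

def measure(queries):
    """Return (throughput in queries/sec, worst single-query latency in sec)."""
    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        model(q)                                  # one round trip
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return len(latencies) / elapsed, max(latencies)

qps, worst_latency = measure(range(1000))
```

The distinction matters because a system can post high throughput by batching many queries together while still being slow to answer any single one.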

The reported results pertain to computer systems operating in data centers and the “edge,” a term that has come to encompass a variety of computer systems other than traditional data center machines. A spreadsheet lists the results for all the different segments of data center and edge. 

The results showed more organizations buying into the benchmark tests. MLCommons said that “a record-breaking 25 submitting organizations” submitted “over 6,700 performance results, and more than 2,400 performance and power efficiency measurements.” That is up from 5,300 performance measurements and 2,400 power measurements in September.

The submissions are grouped into two categories: “closed” and “open.” In the former category, the various submitters follow strict rules for how they run the AI software, allowing for the most direct comparison of systems on a level playing field. 

In the latter case, submitters may use unique software approaches that don’t conform to the standard benchmark rules, which makes room for novel innovations.

As is often the case, Nvidia, the dominant supplier of GPUs used to run AI, picked up many honors for performance on most of the tests. Nvidia’s system running on two Intel Xeon processors and eight of Nvidia’s “Hopper” GPU chips took top place in five of the six different benchmark tasks, including running the Google BERT language model, a precursor to ChatGPT. In one of the six tasks, it was a Dell system using an almost identical configuration of Intel and Nvidia chips that took highest place.

More on Nvidia’s results can be found in the company’s blog post.

Qualcomm was able to boost the throughput of queries for the BERT language program by three times over results in the prior 2.1 round, the company said. A system submitted by Qualcomm using two AMD EPYC server chips and 18 of Qualcomm’s “AI100” AI accelerator chips took the top score for the open division of the data center computers on the BERT task. Its achievement, a throughput of 53,024 queries to the BERT network per second, was only a little behind the top-place score by Nvidia in the closed division.  

New participants included Paris-based cTuning, a non-profit that is developing open-source tools for AI programmers to reproduce benchmark test results across different hardware platforms. 

CTuning took the top spot for the lowest latency, the shortest time from submission of a query to when the answer comes back, for four out of five tasks on the benchmark for edge computing, within the closed category.

Returning inference contender Neural Magic, a venture-backed startup co-founded by Nir Shavit, a scholar at MIT, once again brought to bear its special software that can find which “neural weights” of a neural network can be left unused so they are not processed by the computer chip, thus saving on computing demands.

The company’s DeepSparse software is able to run on the host processor alone — today an x86 chip from Intel or AMD, and, in future, ARM-based chips — without any assistance from Nvidia GPUs. 
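The sparsity idea described above can be sketched in miniature. This toy example is not Neural Magic’s DeepSparse engine, which is far more sophisticated; it only illustrates the principle that weights near zero can be pruned, so the arithmetic that would have used them is skipped entirely.

```python
# Toy sketch of weight sparsity: prune near-zero weights, then skip
# them during the dot product, saving multiply-accumulate work.
def prune(weights, threshold=0.05):
    """Zero out weights whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparse_dot(weights, inputs):
    """Dot product that skips zeroed weights entirely."""
    return sum(w * x for w, x in zip(weights, inputs) if w != 0.0)

w = prune([0.5, 0.01, -0.25, 0.002])    # hypothetical weights
print(sparse_dot(w, [1.0, 1.0, 1.0, 1.0]))  # 0.25
```

On CPUs, which lack the massive parallelism of GPUs, skipping work in this way is what makes the approach competitive.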

On the BERT language test in the open division for edge computing, Neural Magic’s DeepSparse software used two AMD EPYC server processors to yield 5,578 responses per second from the Google BERT neural network. That was only slightly behind the second-place showing by Supermicro’s computer in the closed division that consisted of two Xeon processors and one Nvidia Hopper GPU. 

The company argues that relying on common x86 chips instead of pricier GPUs will help to spread AI to more companies and institutions by lowering overall cost of running the programs. 

“You can get a lot more out of this consumer-grade hardware,” said Michael Goin, product engineering lead for Neural Magic, in an interview with ZDNET. “These are the same AMD chips that companies already use in their store or retail location to run sales, to run inventory, to run logistics.”

More on Neural Magic’s approach can be found in the company’s blog post.
