Microsoft Corp. on Thursday released a new version of its open-source DeepSpeed tool that it says will enable the creation of deep learning models with a trillion parameters, more than five times as many as in the world’s current largest model.
The company also sees the tool boosting the work of developers working on smaller projects.
DeepSpeed is a software library for performing artificial intelligence training. Originally announced this February, DeepSpeed has already gone through multiple iterations that increased the maximum size of the models it can train from more than 100 billion parameters to more than a trillion.
At a high level, parameters can be thought of as the insights that an AI learns from processing data. These insights are what enable AI models to improve their accuracy and speed over time. The more parameters a neural network has, the more proficiently it can process the data it ingests, producing higher-quality results.
The challenge that DeepSpeed was created to address is that developers can only equip their neural networks with as many parameters as their AI training infrastructure can handle. In other words, hardware limitations are an obstacle to building bigger and better models. DeepSpeed makes the AI training process more hardware-efficient so that developers can increase the sophistication of the AI software they build without having to buy more infrastructure.
Microsoft says the tool is capable of training a trillion-parameter language model using 100 of Nvidia Corp.’s previous-generation V100 graphics cards. Normally, the company claims, that task would take 4,000 of Nvidia’s current-generation A100 graphics cards 100 days to complete, and that is despite the A100 being as much as 20 times faster than the V100.
Even if the available hardware is reduced to just a single V100 chip, Microsoft says that DeepSpeed could still train a language model with up to 13 billion parameters. For comparison, the largest language model in the world has around 17 billion parameters and the largest neural network overall packs about 175 billion.
If these results hold up in real-world projects, DeepSpeed could be a major boon for AI projects. Researchers at groups such as OpenAI that are working to push the envelope on the size of neural networks could use it to reduce the hardware costs associated with their work. Startups and others pursuing practical day-to-day applications of AI, in turn, could harness Microsoft’s tool to build more sophisticated models than they otherwise could afford to with their limited infrastructure budgets.
DeepSpeed “democratizes multi-billion-parameter model training and opens the window for many deep learning practitioners to explore bigger and better models,” Microsoft executives Rangan Majumder and Junhua Wang wrote in a blog post.
These scalability improvements are made possible by several new technologies in the latest version of DeepSpeed. One is ZeRO-Offload, which increases how many parameters AI training servers can handle by making creative use of the memory attached to those servers’ central processing units. Another innovation, dubbed 3D parallelism, distributes work among the training servers in a way that increases hardware efficiency.
“3D parallelism adapts to the varying needs of workload requirements to power extremely large models with over a trillion parameters while achieving near-perfect memory-scaling and throughput-scaling efficiency,” Microsoft’s Majumder and Wang detailed. “In addition, its improved communication efficiency allows users to train multi-billion-parameter models 2–7x faster on regular clusters with limited network bandwidth.”
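In practice, developers switch these features on through DeepSpeed’s JSON configuration file rather than by changing model code. The following is a minimal sketch of what enabling ZeRO-Offload might look like; the key names follow DeepSpeed’s published configuration schema, and the batch size and precision settings are placeholder values, not recommendations from Microsoft’s announcement:

```json
{
  "train_batch_size": 8,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "cpu_offload": true
  }
}
```

Here `"stage": 2` selects the ZeRO optimization level that partitions optimizer states and gradients across devices, while `"cpu_offload": true` moves the optimizer state into CPU memory, which is what lets a single GPU train models far larger than its own memory would normally allow.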