It seems that the more revolutionary deep learning models become, the more massive they get. This summer’s hottest model for natural language processing, GPT-3, is a perfect example. To write with human-like accuracy and speed, the model required 175 billion parameters, 350 GB of memory, and $12 million to train (think of training as the “learning” phase). But beyond the cost alone, big AI models like this have a big energy problem.
UMass Amherst researchers have found that the computing power required to train a large AI model can produce over 600,000 pounds of CO2 emissions – five times the lifetime emissions of a typical car! And these models often consume even more energy once deployed in actual production contexts (also known as the inference phase). NVIDIA estimates that 80-90% of the cost of running a neural network model comes from inference rather than training.
To keep making progress in AI, popular opinion suggests we will have to accept a huge environmental trade-off. But this is not the case. Large models can be scaled down to run on an everyday workstation or server without sacrificing accuracy or speed. But first, let’s look at why machine learning models got so big in the first place.
Computing power is doubling every 3.4 months
A little over ten years ago, researchers at Stanford University discovered that the processors used to power the complex graphics of video games, called GPUs, could also be used for deep learning models. This discovery led to a race to build increasingly powerful dedicated hardware for deep learning applications. In turn, the models created by data scientists grew larger and larger. The logic was that larger models would lead to more accurate results, and more powerful hardware would let those models run faster.
Research from OpenAI shows just how widely this hypothesis has been adopted in the field. Between 2012 and 2018, the computing power used by deep learning models doubled every 3.4 months, meaning that over that six-year period, the computing power used for AI grew by 300,000x. As stated above, this power is used not only to train algorithms, but also to serve them in production settings. More recent MIT research suggests we may hit the upper limits of computing power sooner than we think.
Additionally, resource constraints have limited the use of deep learning algorithms to those who can afford them. When deep learning can be applied to everything from detecting cancer cells in medical imaging to stopping hate speech online, we can’t afford to limit access. And we cannot afford the environmental consequences of building models that are ever larger and more energy intensive.
The future is getting small
Fortunately, researchers have found a number of new ways to shrink deep learning models and reuse training datasets through smarter algorithms. This way, large models can run in production settings with less energy while still achieving the desired results for a given use case.
These techniques have the potential to democratize machine learning for organizations that don’t have millions of dollars to invest in training algorithms and bringing them to production. This is especially important for edge use cases, where large, specialized AI hardware is not physically practical. Think of small devices like cameras, car dashboards, smartphones, and so on.
Researchers are shrinking models by removing some of the unnecessary connections in neural networks (pruning), or by making some of their mathematical operations cheaper to compute (quantization). These smaller, faster models can run anywhere with accuracy and performance similar to their larger counterparts. This means we will no longer need to run at the peak of computing power, causing even more environmental damage. Making big models smaller and more efficient is the future of deep learning.
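The two ideas above can be sketched in a few lines. This is a minimal NumPy illustration, not any particular framework’s implementation: magnitude pruning zeroes out the smallest weights, and symmetric linear quantization maps float weights to 8-bit integers. The function names and the `fraction`/`bits` parameters are illustrative.

```python
import numpy as np

def prune(weights, fraction=0.5):
    """Magnitude pruning: zero out the smallest `fraction` of weights."""
    k = int(weights.size * fraction)
    threshold = np.sort(np.abs(weights), axis=None)[k]
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize(weights, bits=8):
    """Symmetric linear quantization: floats -> signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = np.abs(weights).max() / qmax  # one scale factor per tensor
    q = np.round(weights / scale).astype(np.int8)
    return q, scale                       # reconstruct floats with q * scale

w = np.array([0.01, -0.8, 0.4, -0.02, 0.9, 0.1])
print(prune(w))        # small-magnitude weights become exact zeros
q, scale = quantize(w)
print(q, q * scale)    # int8 codes and their approximate float reconstruction
```

Pruned weights compress well and can skip computation entirely, while int8 arithmetic is both smaller in memory (4x versus float32) and cheaper on most hardware; real deployments combine both, usually with a fine-tuning pass to recover any lost accuracy.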
Another major issue is the continual retraining of large models on new datasets for different use cases. A technique called transfer learning can help avoid this problem. Transfer learning uses pre-trained models as a starting point: the model’s knowledge can be “transferred” to a new task using a limited dataset, without having to retrain the original model from scratch. This is a crucial step in reducing the computing power, energy, and money required to train new models.
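The core of transfer learning can be sketched as follows. This is a toy NumPy example under stated assumptions: a random matrix stands in for pretrained weights (in practice these would be loaded from a real model), the frozen extractor is a single ReLU layer, and only a small logistic-regression head is trained on a limited dataset. All names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pretrained weights; in a real setting these come from a large
# model trained elsewhere. They are frozen: never updated below.
W_frozen = rng.normal(size=(8, 4))

def features(x):
    """Frozen feature extractor (a single ReLU layer)."""
    return np.maximum(W_frozen @ x, 0.0)

# Small labelled dataset for the new task: 4 input features, 32 examples.
X = rng.normal(size=(4, 32))
y = (X[0] > 0).astype(float)  # toy binary labels

def train_head(X, y, steps=200, lr=0.1):
    """Fit only a logistic-regression head on top of the frozen features."""
    w = np.zeros(8)
    losses = []
    for _ in range(steps):
        f = features(X)                     # frozen forward pass
        p = 1.0 / (1.0 + np.exp(-(w @ f)))  # head predictions
        losses.append(float(-np.mean(y * np.log(p + 1e-9)
                                     + (1 - y) * np.log(1 - p + 1e-9))))
        w -= lr * f @ (p - y) / y.size      # gradient step on the head only
    return w, losses

w_head, losses = train_head(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the gradient only flows into the small head, training touches a handful of parameters instead of billions, which is exactly where the compute and energy savings come from.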
The bottom line? Models can (and should) be scaled down whenever possible to use less computing power, and knowledge can be transferred and reused instead of starting the deep learning training process from scratch. Ultimately, finding ways to reduce model size and the associated computing power (without sacrificing performance or accuracy) will be the next big unlock for deep learning. This way, anyone will be able to run these applications in production at a lower cost, without a huge environmental trade-off. Anything is possible when you think small about big AI – even the next app to help stop the devastating effects of climate change.
Published March 16, 2021 – 18:02 UTC