Revolutionary Technique Enhances Speed of Neural Networks

Researchers at ETH Zurich have made a discovery that could dramatically improve the speed and efficiency of neural networks. By replacing standard feedforward layers with tree-structured layers that activate only a handful of neurons per input, they reduced the computation needed for inference by more than 99%. The technique, demonstrated on BERT, a transformer model used for language tasks, could also be applied to large language models like GPT-3, opening up new possibilities for faster, more efficient language processing.

Unleashing the Power of Fast Feedforward Layers

Transformers, the neural networks that drive large language models, consist of several types of layers, including attention and feedforward layers. The feedforward layers account for a large share of a model's parameters and are computationally demanding because every input must be multiplied against every neuron. The researchers propose replacing these traditional feedforward layers with "fast feedforward" layers (FFF), which use conditional matrix multiplication (CMM) instead of the dense matrix multiplication (DMM) used in conventional networks. Whereas DMM multiplies all inputs by all neurons in the layer, CMM selects only a handful of neurons for each input, resulting in a substantial reduction in computational load.

“By identifying the right neurons for each computation, FFF can significantly reduce the computational load, leading to faster and more efficient language models.”
ETH Zurich Researchers
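
To make the CMM-versus-DMM distinction concrete, here is a minimal NumPy sketch: a standard dense feedforward layer activates every hidden neuron, while a fast feedforward layer walks a balanced binary tree and evaluates only the neurons along one root-to-leaf path. The routing rule, shapes, and function names are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def dense_ff(x, W_in, W_out):
    """Standard feedforward layer (DMM): every hidden neuron sees every input."""
    h = np.maximum(W_in @ x, 0.0)             # activate ALL hidden neurons
    return W_out @ h

def fast_ff(x, node_w, leaf_W_in, leaf_W_out, depth):
    """Fast feedforward sketch (CMM): route the input down a balanced binary
    tree and evaluate only the neurons on the chosen path (illustrative only)."""
    node = 0                                   # root of the tree (heap indexing)
    for _ in range(depth):                     # one routing neuron per level
        go_right = float(node_w[node] @ x) > 0.0
        node = 2 * node + (2 if go_right else 1)
    leaf = node - (2 ** depth - 1)             # index of the selected leaf block
    h = np.maximum(leaf_W_in[leaf] @ x, 0.0)   # only this leaf's neurons fire
    return leaf_W_out[leaf] @ h
```

Per input, the dense layer touches all of its neurons, while the tree version touches one routing neuron per level plus a single leaf block, which is where the orders-of-magnitude reduction comes from.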

To validate their technique, the researchers developed FastBERT, a modified version of Google's BERT transformer model. They replaced the intermediate feedforward layers with fast feedforward layers whose neurons are organized into a balanced binary tree. Evaluated on a range of language understanding tasks, FastBERT performed comparably to BERT models of similar size and training procedure. Impressively, the best FastBERT model matched the performance of the original BERT model while engaging only 0.3% of its own feedforward neurons during inference.
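
As a rough usage sketch of the layer above with BERT-base-like dimensions (hidden width 768, tree depth 11, one neuron per leaf), the numbers below are assumptions chosen to illustrate the neuron ratio, not FastBERT's published configuration.

```python
import numpy as np  # fast_ff is the sketch from the previous snippet

d_model, depth = 768, 11
rng = np.random.default_rng(0)
node_w     = rng.standard_normal((2 ** depth - 1, d_model))   # 2,047 routing neurons
leaf_W_in  = rng.standard_normal((2 ** depth, 1, d_model))    # 2,048 single-neuron leaves
leaf_W_out = rng.standard_normal((2 ** depth, d_model, 1))

x = rng.standard_normal(d_model)
y = fast_ff(x, node_w, leaf_W_in, leaf_W_out, depth)

# Each token evaluates 11 routing neurons plus 1 leaf neuron: 12 of the
# layer's 4,095 neurons, roughly in line with the 0.3% figure quoted above.
```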

Accelerating Large Language Models

The potential for acceleration in large language models is immense. In GPT-3, for example, each transformer layer contains 49,152 neurons in its feedforward network. The researchers propose that a fast feedforward replacement using only 16 of those neurons per inference pass could maintain performance while engaging just 0.03% of GPT-3's feedforward neurons.

“If trainable, this network could be replaced with a fast feedforward network of maximum depth 15, which would contain 65,536 neurons but use only 16 for inference. This amounts to about 0.03% of GPT-3’s neurons.”
ETH Zurich Researchers
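
The arithmetic behind that estimate follows directly from the numbers quoted above: one neuron is evaluated per level of the tree, plus the leaf.

```python
# Reproducing the quoted GPT-3 estimate (all numbers taken from the text above).
gpt3_ff_neurons  = 49_152            # feedforward neurons per transformer layer
tree_depth       = 15                # maximum depth of the proposed fast feedforward net
neurons_per_pass = tree_depth + 1    # one routing neuron per level, plus the leaf

print(neurons_per_pass)                           # 16
print(100 * neurons_per_pass / gpt3_ff_neurons)   # ~0.0326 -> "about 0.03%"
```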

While dense matrix multiplication benefits from highly optimized implementations, the researchers highlight the lack of efficient implementations of conditional matrix multiplication. They stress the need for device programming interfaces to support conditional neural execution, which they estimate could yield a theoretical speedup of up to 341x at the scale of BERT-base models.

“With a theoretical speedup promise of 341x at the scale of BERT-base models, we hope that our work will inspire an effort to implement primitives for conditional neural execution as a part of device programming interfaces.”
ETH Zurich Researchers
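
At its core, what such a primitive would need to accelerate is a per-example gather of weight rows followed by many small matrix products, an access pattern that standard dense matrix routines do not serve well. The naive NumPy form below shows that pattern; the function name and shapes are illustrative assumptions, not an existing device API.

```python
import numpy as np

def batched_cmm(X, W, selected):
    """Naive conditional matrix multiplication over a batch.
    X: (batch, d_in) inputs; W: (n_neurons, d_in) weights;
    selected: (batch, k) indices of the neurons chosen for each example."""
    W_sel = W[selected]                        # (batch, k, d_in) gather of chosen rows
    return np.einsum('bkd,bd->bk', W_sel, X)   # per-example small matrix products
```

A fused, device-level implementation of this gather-and-multiply step is what the researchers hope would realize the theoretical 341x speedup in practice.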

This research is a significant step towards addressing the memory and compute bottlenecks of large language models. It paves the way for more efficient and powerful AI systems, ultimately enhancing their capabilities and impact.
