Chinese startup DeepSeek has gatecrashed the world of artificial intelligence with the launch of its high-performance DeepSeek-R1 reasoning model, which the company claims matches, and in some cases surpasses, the capabilities of OpenAI's models, despite being built at a fraction of the cost.
The development of R1 has captivated developers and shaken investors who have poured billions of dollars into U.S.-based AI firms in the belief that money and computing resources equate to more powerful models. DeepSeek shows that's not necessarily the case.
Having launched on January 20, DeepSeek-R1 has risen to become the top-trending model on the Hugging Face AI platform, with more than 189,000 downloads just nine days later. Developers are racing to test the model and understand its implications for the future of AI innovation, following a host of headlines that indicate superior performance to vastly more expensive competitors like OpenAI’s GPT-4o and Google’s Gemini LLMs.
As of January 27, DeepSeek’s consumer app soared to the number one spot in Apple’s App Store, displacing ChatGPT and sparking a major sell-off in U.S.-based AI stocks.
DeepSeek’s model could have profound implications for enterprise AI strategies. By making DeepSeek-R1 openly available and far cheaper to run, it provides a viable alternative to the costly proprietary models built by OpenAI, Google, and others, which were previously seen as best in class.
DeepSeek-R1 brings the promise of democratizing access to the most powerful, cutting-edge AI capabilities, giving smaller companies a leg up in what is quickly becoming an AI arms race.
What’s really exciting is not just DeepSeek-R1’s ability to perform complex tasks such as reasoning, math, and coding to such a high level, but also the way it does it. The company has pioneered the use of novel techniques including clever hardware optimizations, reinforcement learning, and model distillation.
In doing so, it has created an incredibly powerful model that doesn’t just deliver accurate and insightful results – it gets smarter over time, adapting and improving the quality of its outputs.
Smart Optimizations
When the U.S. government imposed restrictions on the export of sophisticated graphics processing units to China, it was assumed that this would throw an enormous spanner in the works of Chinese AI companies. However, DeepSeek has shown that it’s possible to compensate for a lack of advanced hardware by heavily customizing the software that manages how that hardware is used.
The company trained DeepSeek-R1 almost exclusively on Nvidia’s H800 GPUs rather than the H100 chips used by its U.S. competitors. The H800 was developed specifically for the Chinese market to comply with U.S. export controls, most notably by throttling the bandwidth the chips can use when communicating with one another.
To get around this, DeepSeek’s engineers came up with clever low-level code optimizations that vastly improved the H800’s memory efficiency, ensuring the model would not be held back by those bandwidth limitations. This innovation shows that it’s possible to sidestep the need for millions of dollars’ worth of advanced hardware by squeezing more performance out of lower-powered chips.
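DeepSeek has not published its low-level kernel code, so the details remain private, but the general idea of trading a little extra compute for less data movement can be illustrated with a toy example. The sketch below is purely illustrative and not DeepSeek's actual method: it compresses activations to 8-bit integers before they cross a bandwidth-constrained link, cutting transfer volume by roughly four times.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Compress a float32 tensor to int8 plus a single scale factor.

    Moving 1 byte per value instead of 4 cuts transfer volume by roughly 4x,
    at the cost of a little extra compute and a small precision loss.
    """
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor on the receiving side."""
    return q.astype(np.float32) * scale

# Hypothetical activations that would otherwise be moved across a slow link.
activations = np.random.randn(4096, 1024).astype(np.float32)
q, scale = quantize_int8(activations)

print(f"fp32 bytes: {activations.nbytes:,}")  # 16,777,216
print(f"int8 bytes: {q.nbytes:,}")            # 4,194,304
print(f"max abs error: {np.abs(dequantize_int8(q, scale) - activations).max():.4f}")
```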
Reinforcement Learning
DeepSeek made its first claims about DeepSeek-R1’s performance last November, prior to the model’s public release, publishing benchmark results that showed it surpassing OpenAI’s o1 reasoning model on several tests.
With the full release and accompanying academic paper, the company raised eyebrows with the revelation that it had not relied on conventional supervised fine-tuning (SFT) techniques, but instead adopted a new approach known as reinforcement learning (RL).
SFT is a process that involves training AI models on curated datasets of worked examples so they learn to perform step-by-step reasoning, also known as chain-of-thought. It has been seen as an essential technique for improving the reasoning abilities of LLMs, but DeepSeek’s results suggest that reinforcement learning can largely replace it.
Reinforcement learning enabled DeepSeek-R1 to improve its performance autonomously through a trial-and-error process, incentivized by rewards, reducing the need for pre-labeled training data. Although the paper doesn’t reveal everything about DeepSeek’s reinforcement learning process, it notes the use of an innovation known as Group Relative Policy Optimization (GRPO), which helps to stabilize training and boost accuracy over time.
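The paper does not spell out the entire training recipe, but the core idea of GRPO is simple enough to sketch: score a group of sampled responses to the same prompt and use each response's reward relative to the group average as its advantage, so no separate value network is needed. The reward values below are toy inputs for illustration only.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Core idea of GRPO: sample a group of responses to the same prompt,
    score them, and use each response's reward relative to the group mean
    (normalized by the group's standard deviation) as its advantage.
    No separate value network ("critic") is required, which makes RL training
    cheaper and simpler.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy example: six sampled answers to one prompt, scored by a rule-based reward
# (1.0 if the final answer is correct, 0.0 otherwise).
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
advantages = group_relative_advantages(rewards)
print(advantages)  # correct answers get a positive advantage, incorrect ones a negative advantage

# In the full algorithm these advantages weight a clipped policy-gradient
# objective (as in PPO), plus a KL penalty toward a reference model.
```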
Validated, Open-Source Data
DeepSeek has closely-guarded the training data used to develop DeepSeek-R1, but it is believed to have used a combination of synthetic and open-source data sources to enhance its reasoning abilities.
The GRPO algorithm was first described in DeepSeek’s April 2024 DeepSeekMath paper, which revealed that the model was trained on the Common Crawl dataset, an open repository of web crawl data that includes raw web pages, metadata, and text extracts. The Common Crawl Foundation has previously claimed that its data has been used to train more than 80% of the world’s LLMs.
Common Crawl data is especially useful for LLMs because of its transparency and traceability, the result of a partnership with the U.S. startup Constellation Network, which has created a customized blockchain for validating and securely accessing the data.
Constellation has helped to validate and secure 17 years’ worth of internet crawl data, spanning almost nine petabytes, through Metagraph, an application-specific blockchain network. This enables Common Crawl to provide a fully immutable copy of the last 17 years of internet history, addressing concerns about data provenance, privacy, and ethical sourcing. All of these are hallmarks of DeepSeek’s model, which suggests it relied on this dataset.
By using blockchain, Constellation provides cryptographic security that ensures the integrity of the Common Crawl data throughout the entire AI lifecycle, while providing a more ethical AI framework around data collection and citation.
Computational Efficiency
Another of DeepSeek’s innovations is model distillation, a process that transfers the knowledge of massive models with billions of parameters into more lightweight and efficient models.
The result is distilled models that are capable of almost matching the performance of their larger counterparts, while substantially reducing the computational resources needed to generate those results.
As an example, distilled models can be applied to specific tasks like mathematical problem solving and coding, leveraging the knowledge of much larger models but without any of the bloat that hogs computational resources. It’s essentially a balancing act that involves striking an equilibrium between efficiency and power.
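DeepSeek’s own distilled models were reportedly produced by fine-tuning smaller open models on outputs generated by R1, but the textbook form of distillation conveys the idea: a small “student” is trained to match the softened output distribution of a large “teacher” while still fitting the ground-truth labels. The sketch below shows that classic loss; the temperature, weighting, and tensor sizes are illustrative assumptions, not DeepSeek’s settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Classic knowledge distillation: the small student matches the large
    teacher's softened output distribution while also fitting the labels."""
    # Soft targets: KL divergence between softened student and teacher distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples with a 10-way output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # in practice, produced by the frozen large model
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```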
DeepSeek’s paper also describes how it emphasized stability and iterative refinement during the training process. By combining GRPO with self-evaluation mechanisms, the model can consistently produce accurate and reliable outputs by assessing its own responses, identifying any mistakes or inaccuracies, and refining its outputs based on what it learns.
This iterative improvement process is especially useful for complex tasks where precision is of paramount importance, such as in engineering, advanced analytics, and scientific research.
DeepSeek’s “Aha Moment”
In its paper, DeepSeek explained how it used reinforcement learning to incentivize its model to think independently by rewarding it for generating correct answers and showing the logical process it used to come up with those answers.
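The rewards described in the paper are rule-based rather than produced by a learned reward model: an accuracy reward for a verifiably correct final answer and a format reward for laying out the reasoning in the expected structure. A toy version might look like the following; the tag convention and reward weights are illustrative assumptions, not DeepSeek’s exact values.

```python
import re

def rule_based_reward(response: str, reference_answer: str) -> float:
    """Toy reward combining two signals: a format reward for exposing the
    chain-of-thought, and an accuracy reward for a correct final answer.
    Tag convention and weights are illustrative, not DeepSeek's values."""
    reward = 0.0
    # Format reward: the response shows its reasoning inside <think> tags.
    if re.search(r"<think>.*?</think>", response, flags=re.DOTALL):
        reward += 0.1
    # Accuracy reward: whatever remains after the reasoning block is the answer.
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    if answer == reference_answer.strip():
        reward += 1.0
    return reward

print(rule_based_reward("<think>2 squared is 4, plus 3 gives 7.</think>7", "7"))  # 1.1
print(rule_based_reward("I think it is 9.", "7"))                                 # 0.0
```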
It was thanks to this approach that DeepSeek was able to see how DeepSeek-R1 evolved by itself, devoting more processing time to increasingly challenging problems. This, the researchers say, demonstrates the model’s ability to prioritize tasks based on how difficult they are.
They termed this an “aha moment”: a key milestone at which DeepSeek-R1 used reinforcement learning to develop its own advanced reasoning processes, without the need for traditional SFT techniques.
A Blueprint For More Efficient AI
Perhaps the biggest advantage of DeepSeek-R1 is that, besides outperforming leading models like o1 and Llama 3, it can showcase its entire chain-of-thought. In other words, it provides transparency into how it arrived at its answers or conclusions. This is a key capability, and one that is especially useful because other models either don’t do this or will only do it under certain circumstances.
For instance, OpenAI masks its models’ chain-of-thought in order to protect its development secrets, while Llama 3 will only reveal its thought processes through some aggressive prompting. This transparency enables developers to quickly identify and fix any errors in the model’s output, enabling its accuracy to be improved over time.
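Assuming the <think>...</think> convention used by the open DeepSeek-R1 weights, a developer can split the reasoning trace from the final answer and inspect it when debugging, as in this minimal sketch.

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate the visible chain-of-thought from the final answer, assuming the
    model wraps its reasoning in <think>...</think> tags before answering."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()
    return reasoning, answer

raw = "<think>The user asked for 12 * 12. 12 * 12 = 144.</think>The answer is 144."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # inspect the trace to spot faulty reasoning steps
print(answer)     # "The answer is 144."
```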
Conclusion
The impressive performance of DeepSeek-R1 and the key innovations used in its development suggest a path toward more efficient AI models that reduce overall resource requirements without sacrificing performance.
In this way, DeepSeek has given us a blueprint for the development of powerful AI tools for developers and researchers who can only access limited computational resources, paving the way for more rapid innovation.