DeepSeek’s ‘aha moment’ creates a new way to build powerful AI with less money
Chinese AI lab DeepSeek adopted innovative techniques to develop an AI model trained with limited human intervention, producing an “aha moment” that could transform the cost of building killer applications on top of the technology.
The research paper published on the workings of DeepSeek’s R1 “reasoning” model reveals that the group, led by billionaire Liang Wenfeng’s hedge fund, achieved strong results by removing bottlenecks in AI development.
The paper shows how DeepSeek adopted a number of more efficient techniques to develop R1, which, like OpenAI’s rival o1 model, generates accurate answers by “thinking” step by step about its responses for longer than most large language models.
DeepSeek’s breakthroughs come from its use of “reinforcement learning” to reduce human involvement in producing responses to prompts.
The company also built smaller models with fewer parameters (the number of variables used to train an AI system and shape its output) that have powerful reasoning capabilities, by fine-tuning large models trained by competitors such as Meta and Alibaba.
Together, these developments sent shockwaves through Silicon Valley, as R1 outperforms some recently released models from OpenAI, Anthropic and Meta on certain tasks, at a fraction of the cost to develop.
OpenAI said on Tuesday that it had found evidence DeepSeek had drawn on its technology, using the outputs of its models to train its own LLMs at lower cost, a practice that is common among academics and less well-funded start-ups.
Despite the controversy, experts said DeepSeek demonstrated genuine innovation. AI researchers also praised its willingness to publish a detailed technical report showing how it built its reasoning model.
“I think this is just the tip of the iceberg in terms of the type of innovation we can expect in these models,” said Neil Lawrence, the DeepMind professor of machine learning at the University of Cambridge. “History shows that big companies struggle to innovate as they scale, and what we’ve seen from a lot of these big companies is a substitution of compute investment for intellectual hard work.”
Thumbs up lead to the ‘aha moment’
Large language models are built in two stages. The first, called “pre-training”, sees developers use huge data sets that help models predict the next word in a sentence. The second, called “post-training”, sees developers teach the model to follow instructions, such as solving maths problems or coding.
One way to get chatbots to generate more useful answers is called “reinforcement learning from human feedback” (RLHF), a technique pioneered to improve ChatGPT.
RLHF works by having human annotators rate the AI model’s answers to prompts and select the best responses. This step is often arduous, expensive and time-consuming, frequently requiring a small army of human data labellers.
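The annotation step described above can be sketched in a few lines. This is a minimal toy, not DeepSeek’s or OpenAI’s actual pipeline: the prompts, the stand-in “model” and the simulated annotator are all illustrative assumptions.

```python
import random

def model_answers(prompt):
    """Stand-in model: returns two candidate answers for a prompt."""
    return [f"{prompt} -> draft A", f"{prompt} -> draft B"]

def human_rank(candidates):
    """Simulated annotator picking the 'better' answer.

    Chosen at random here; a real RLHF pipeline would use paid
    human labellers, which is why the step is slow and expensive.
    """
    chosen = random.choice(candidates)
    rejected = next(c for c in candidates if c != chosen)
    return chosen, rejected

# Collect preference pairs, one per prompt.
preference_data = []
for prompt in ["Explain RL", "Solve 2+2"]:
    chosen, rejected = human_rank(model_answers(prompt))
    preference_data.append(
        {"prompt": prompt, "chosen": chosen, "rejected": rejected}
    )

# In a real system, these pairs would train a reward model that
# scores future answers.
print(len(preference_data))  # → 2
```

The output of this stage, the preference pairs, is exactly what DeepSeek’s approach tries to obtain without the humans.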
DeepSeek’s big innovation is to automate this final step, using a technique called reinforcement learning (RL), in which the AI model is rewarded for doing the right thing.
DeepSeek first developed a powerful text-prediction model called V3. It then used RL to “reward” the model, such as giving it a thumbs up for generating the right answer.
The Chinese company found that by repeating this process, the model was able to spontaneously solve problems without human supervision.
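The automated reward loop can be illustrated with a toy example. This is a sketch under heavy simplification: the “policy” is just a weighted choice over candidate answers to one arithmetic question, where a real system would update the parameters of an LLM. The key point it demonstrates is that the reward check is automatic, with no human in the loop.

```python
import random

QUESTION, CORRECT = "7 * 8", "56"
candidates = ["54", "56", "63"]
weights = {c: 1.0 for c in candidates}  # uniform policy to start

def reward(answer):
    """Automatic check (the 'thumbs up'): no human annotator needed."""
    return 1.0 if answer == CORRECT else 0.0

random.seed(0)
for _ in range(200):
    # Sample an answer from the current policy.
    answer = random.choices(
        candidates, weights=[weights[c] for c in candidates]
    )[0]
    # Reinforce: scale up the weight of rewarded answers only.
    weights[answer] *= 1.0 + 0.1 * reward(answer)

best = max(weights, key=weights.get)
print(best)  # → "56": the policy concentrates on the correct answer
```

Because wrong answers are never rewarded, their weights stay flat while the correct answer’s weight grows each time it is sampled, so the policy converges on the right behaviour without anyone labelling data.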
The technique was also used by Google DeepMind to build AlphaGo, the AI system that beat human players at the ancient board game Go and kickstarted the current boom in deep learning techniques.
DeepSeek said it discovered that the model had what the company called an “aha moment”, when it re-evaluated its answers and adjusted its processing time to tackle different questions.
“The ‘aha moment’ serves as a powerful reminder of the potential of [RL] to unlock new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future,” DeepSeek’s creators wrote in their research paper.
Lewis Tunstall, a researcher at Hugging Face, an AI research company, said: “It seems that the secret sauce to get this working is just having a very, very strong pre-trained model, and then having very, very good infrastructure to do this reinforcement learning process at large scale.”
Small models built using large ones
While OpenAI and Google invest billions of dollars in building large language models, DeepSeek has also built smaller models that can run on phones or web browsers by “distilling” the reasoning capabilities of larger ones.
DeepSeek used its R1 model to generate a relatively small data set of 800,000 examples, then fine-tuned models made by competitors, such as Alibaba’s Qwen and Meta’s Llama, using this AI-generated data.
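The distillation pipeline above has two steps: the strong “teacher” model generates worked answers, and a smaller “student” is fit to imitate them. The sketch below uses trivial stand-ins for the teacher and student, not R1 or Qwen/Llama, and replaces real gradient-based fine-tuning with memorisation, purely to show the shape of the data flow.

```python
def teacher(problem):
    """Stand-in for a large reasoning model producing a worked answer."""
    a, b = problem
    return f"think: add {a} and {b} step by step -> {a + b}"

# Step 1: use the teacher to generate a small synthetic training set,
# analogous to the 800,000 R1-generated examples.
problems = [(1, 2), (3, 4), (10, 5)]
dataset = [(p, teacher(p)) for p in problems]

# Step 2: "fine-tune" the student on the teacher's outputs. Here the
# student simply memorises them; a real pipeline would run supervised
# gradient updates on a smaller model's weights.
student = dict(dataset)

print(student[(3, 4)])  # → "think: add 3 and 4 step by step -> 7"
```

The economics follow from step 1: generating a modest data set with an existing strong model is far cheaper than pre-training a new one from scratch.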
DeepSeek found that these distilled models are particularly powerful on reasoning benchmarks, in some cases outperforming leading models such as Anthropic’s Claude. “It can basically solve most of the maths problems I worked on as an undergraduate,” Tunstall said.
This development could be a boon for app developers, who gain a cheap and efficient way to build products. Teaching an AI model to reason during “inference”, when the model is generating answers, is much more efficient than the pre-training process, which requires vast computing power, according to Lennart Heim, a researcher at Rand, a think-tank.
This new paradigm could allow rivals to build competitive models with far less computing power and money, he added. However, without money for chips, “they just can’t deploy them at scale”, Heim said.
DeepSeek did not say how much it spent building R1, but it claimed it trained its V3 model, on which R1 is based, for just $5.6mn.
That sum does not include other costs, such as the likely acquisition of thousands of graphics processing units for training its models, or salaries, experiments, training and deployment, Heim said.
And while DeepSeek was the first to use its particular techniques, other AI labs are expected to follow suit, and Hugging Face is already working to replicate R1.
US AI companies have also been working to distil the capabilities of their big, cutting-edge models into smaller, faster ones. Google launched Gemma last year, a lightweight model based on its Gemini.
“The recipe for intelligence is quite simple,” said Thomas Wolf, co-founder and chief science officer of Hugging Face, adding that DeepSeek’s technique was well understood by others in the field. “And so I expect that a lot of teams can redo this.”
Additional reporting by Cristina Criddle in San Francisco and Madhumita Murgia in London