
DeepSeek Dominates App Store, Chinese AI Sparks Earthquake in US Tech Industry

2025-01-27 10:56
Original Title: "DeepSeek Dominates the App Store, Triggering an Earthquake in the U.S. Tech Scene Thanks to Chinese AI"
Original Source: APPSO


Over the past week, the DeepSeek R1 model from China has stirred up the entire overseas AI community.


On one hand, it has achieved performance comparable to OpenAI's o1 at a lower training cost, showcasing China's advantage in engineering capabilities and innovation at scale. On the other hand, it also upholds the spirit of open source and is enthusiastic about sharing technical details.


Recently, a research team led by Jiayi Pan, a Ph.D. student at the University of California, Berkeley, successfully reproduced the key technology of DeepSeek R1-Zero — the "Eureka Moment" — at a very low cost (less than $30).


It's no wonder that Meta CEO Mark Zuckerberg, Turing Award winner Yann LeCun, and DeepMind CEO Demis Hassabis have all highly praised DeepSeek.


As the popularity of DeepSeek R1 continued to soar, the DeepSeek app ran into server congestion this afternoon under a surge of user traffic, and even went down briefly at one point.


OpenAI CEO Sam Altman rushed to preview the usage quota for o3-mini in a bid to win back the international headlines: ChatGPT Plus members will get 100 queries per day.


What many people do not know, however, is that before DeepSeek rose to fame, its parent company, High-Flyer Quant (幻方量化), was already one of the leading firms in China's quantitative hedge fund industry.


DeepSeek Model Shakes Silicon Valley, Its Value Still Rising


On December 26, 2024, DeepSeek officially released the DeepSeek-V3 large model.


This model performed exceptionally well across several benchmark tests, surpassing mainstream top models in the industry, especially in knowledge-based question answering, long-text processing, code generation, and mathematics. For example, on knowledge-oriented tasks such as MMLU and GPQA, DeepSeek-V3's performance approaches that of the international top-tier model Claude-3.5-Sonnet.



In terms of mathematical ability, it set new records on tests such as AIME 2024 and CNMO 2024, surpassing all known open-source and closed-source models. At the same time, its generation speed increased by 200% over the previous generation, reaching 60 TPS (tokens per second), which significantly improves the user experience.


According to the analysis by the independent evaluation website Artificial Analysis, DeepSeek-V3 has surpassed other open-source models in multiple key metrics and is on par in performance with the world's top closed-source models GPT-4o and Claude-3.5-Sonnet.


DeepSeek-V3's core technological advantages include:


1. Mixture of Experts (MoE) Architecture: DeepSeek-V3 has 671 billion total parameters, but only about 37 billion are activated for each token, greatly reducing computational cost while maintaining high performance (a minimal routing sketch follows this list).


2. Multi-Head Latent Attention (MLA): This architecture, already validated in DeepSeek-V2, enables efficient training and inference.


3. Auxiliary-Loss-Free Load Balancing Strategy: This strategy minimizes the performance degradation that balancing the load across experts would otherwise cause.


4. Multi-Token Prediction (MTP) Training Objective: This objective improves the model's overall performance.


5. Efficient Training Framework: Using the HAI-LLM framework, it supports 16-way Pipeline Parallelism (PP), 64-way Expert Parallelism (EP), and ZeRO-1 Data Parallelism (DP), and reduces training costs through various optimization techniques.
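To make the MoE idea in point 1 concrete, here is a minimal, illustrative sketch of top-k expert routing in plain Python/NumPy. It is not DeepSeek's implementation; the expert count, dimensions, and top-k value are placeholder choices made for readability, and the point is simply that only a few experts run for each token.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token, experts, gate_w, top_k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    token:   (d,) input vector
    experts: list of callables, each mapping (d,) -> (d,)
    gate_w:  (n_experts, d) router weights
    Only the top_k experts are evaluated, which is why a model with a huge
    total parameter count can activate only a small fraction per token.
    """
    scores = softmax(gate_w @ token)                 # router scores over all experts
    chosen = np.argsort(scores)[-top_k:]             # indices of the top-k experts
    weights = scores[chosen] / scores[chosen].sum()  # renormalized gate weights
    return sum(w * experts[i](token) for w, i in zip(weights, chosen))

# Toy usage: 8 placeholder experts, each a small random linear map.
d, n_experts = 16, 8
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(d, d)) / d: W @ x for _ in range(n_experts)]
gate_w = rng.normal(size=(n_experts, d))
out = moe_layer(rng.normal(size=d), experts, gate_w, top_k=2)
print(out.shape)  # (16,)
```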


Most importantly, DeepSeek-V3's training cost was only $5.58 million, far less than the $78 million reportedly spent training GPT-4. Its API pricing, meanwhile, continues the company's characteristically friendly approach.



Input tokens cost just 0.5 RMB per million (cache hit) or 2 RMB per million (cache miss), and output tokens cost just 8 RMB per million.
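As a quick illustration of how this tiered pricing adds up, the short calculation below estimates the cost of a hypothetical workload given a cache-hit rate; only the per-million-token prices come from the article, while the token counts and hit rate are made-up numbers.

```python
# Prices per million tokens (RMB), as quoted above.
PRICE_IN_HIT, PRICE_IN_MISS, PRICE_OUT = 0.5, 2.0, 8.0

def estimate_cost(input_tokens, output_tokens, cache_hit_rate):
    """Estimated cost in RMB for one workload (all usage numbers are hypothetical)."""
    hit_tokens = input_tokens * cache_hit_rate
    miss_tokens = input_tokens - hit_tokens
    total = (hit_tokens * PRICE_IN_HIT
             + miss_tokens * PRICE_IN_MISS
             + output_tokens * PRICE_OUT)
    return total / 1_000_000

# Example: 50M input tokens with a 60% cache-hit rate and 10M output tokens.
print(estimate_cost(50_000_000, 10_000_000, 0.6))  # 135.0 (RMB)
```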


The Financial Times described it as a "dark horse that has shocked the international tech industry," believing that its performance is on par with well-funded American competitors such as OpenAI. Chris McKay, founder of Maginative, further pointed out that the success of DeepSeek-V3 may redefine the established approach to AI model development.


The success of DeepSeek-V3 is also read as a direct response to U.S. export restrictions on computing power, with that external pressure arguably spurring innovation in China.


DeepSeek Founder Liang Wenfeng, the Low-key Genius from Zhejiang University


The rise of DeepSeek has left Silicon Valley restless, and the founder behind this disruptor of the global AI industry, Liang Wenfeng, fits the classic Chinese image of the prodigy: accomplished young and still innovating.


A good leader of an AI company needs to be both tech-savvy and business-savvy, visionary yet practical, daringly innovative yet disciplined in engineering. This kind of multidisciplinary talent is itself a scarce resource.


At the age of 17, Liang Wenfeng was admitted to Zhejiang University's Information and Electronic Engineering program. By 30, he had founded High-Flyer and was leading its team in exploring fully automated quantitative trading. Liang Wenfeng's story suggests that geniuses always seem to do the right thing at the right time.



· 2010: With the launch of CSI 300 stock index futures, quantitative investing saw its opening, and Liang Wenfeng's team seized it, rapidly growing its proprietary trading funds.


· 2015: Liang Wenfeng co-founded High-Flyer with fellow alumni, and the following year it launched its first AI model, putting trading positions generated by deep learning into live use.


· 2017: High-Flyer announced that its investment strategies were now fully AI-driven.


· 2018: Established AI as the company's main development direction.


· 2019: Assets under management surpassed 10 billion RMB, making it one of the top four quantitative hedge fund giants in China.


· 2021: High-Flyer became the first domestic quantitative hedge fund to surpass 100 billion RMB in assets under management.


It is easy to notice a company only once it succeeds and to forget the years it spent on the sidelines. And a quantitative trading firm pivoting to AI may look surprising, but it is actually a logical step: both are data-driven, technology-intensive businesses.


Jensen Huang only wanted to sell gaming graphics cards and make money from gamers, yet he ended up building the world's largest AI arsenal; High-Flyer's move into AI followed a strikingly similar path. That kind of evolution has far more vitality than the superficial way many industries are bolting large AI models onto their businesses today.


Through its years in quantitative investing, High-Flyer accumulated deep experience in data processing and algorithm optimization, along with a large stock of A100 chips, providing solid hardware support for AI model training. Starting in 2017, High-Flyer deployed AI computing power at scale, building high-performance computing clusters such as Fire-Flyer I and Fire-Flyer II to supply powerful compute for model training.



In 2023, High-Flyer formally established DeepSeek to focus on research and development of large AI models. DeepSeek inherited High-Flyer's accumulated technology, talent, and resources, and quickly made a name for itself in the AI field.


In an in-depth interview with 36Kr's Waves (暗涌), DeepSeek founder Liang Wenfeng also showed a distinctive strategic vision.


Unlike most Chinese companies, which chose to replicate the Llama architecture, DeepSeek started directly from the model architecture itself, setting its sights on nothing less than the grand goal of AGI.


Liang Wenfeng openly acknowledged the significant gap between Chinese AI today and the international state of the art: a combined gap in model structure, training dynamics, and data efficiency that means roughly four times the computing power is needed to achieve equivalent results.


▲ Image Source: CCTV News Screenshot


This head-on attitude toward challenges stems from Liang Wenfeng's years of experience at High-Flyer.


He emphasized that open source is not only about sharing technology but also a cultural expression, and a true moat lies in the team's continuous innovative ability. DeepSeek's unique organizational culture encourages bottom-up innovation, flattens hierarchies, and values the passion and creativity of talent.


The team is mainly composed of young people from top universities, adopting a natural division of labor model that allows employees to explore and collaborate autonomously. When recruiting, the focus is more on the employees' passion and curiosity rather than traditional experience and background.


Regarding the industry outlook, Liang Wenfeng believes that AI is currently in a period of technological innovation explosion rather than application explosion. He emphasized that China needs more original technological innovations, cannot always be in the imitation stage, and needs people to stand at the forefront of technology.


Even though companies like OpenAI are currently in a leading position, there are still opportunities for innovation.



Surging into Silicon Valley, DeepSeek Makes Waves in the Overseas AI Community


While opinions in the industry about DeepSeek vary, we have also collected some comments from industry insiders.


Jim Fan, who leads NVIDIA's GEAR Lab, praised DeepSeek-R1 highly.


He pointed out that this is a non-U.S. company carrying forward OpenAI's original open mission, making an impact by openly sharing core algorithms, learning curves, and more, while taking a subtle jab at OpenAI in the process.


DeepSeek-R1 not only open-sourced a series of models but also laid out its training recipe in full. It may be the first open-source project to demonstrate a major, continuously growing RL flywheel.


You can make an impact either through legendary projects like an internal "ASI" or "Strawberry," or simply by openly sharing raw algorithms and matplotlib learning curves.


Marc Andreessen, co-founder of the top-tier venture capital firm a16z, called DeepSeek R1 one of the most amazing and impressive breakthroughs he has ever seen and, as open source, a profoundly meaningful gift to the world.



Lu Jing, a former senior researcher at Tencent and AI postdoctoral researcher at Peking University, analyzed it from the perspective of technical accumulation. He pointed out that DeepSeek did not become popular overnight: it inherited many innovations from previous generations of its models, and the related architectures and algorithmic innovations had been validated through iteration, so the shock it delivered to the industry was all but inevitable.


Turing Award winner and Meta Chief AI Scientist Yann LeCun proposed a new perspective:


"For those who, after seeing DeepSeek's performance, feel that 'China is surpassing the U.S. in AI,' your interpretation is wrong. The correct interpretation should be that 'open-source models are surpassing proprietary models.'"



DeepMind CEO Demis Hassabis expressed a hint of concern in his evaluation:


"The achievements it (DeepSeek) has made are impressive. I believe we need to consider how to maintain the leading position of Western cutting-edge models. I think the West is still ahead, but it is certain that China has extremely strong engineering and scaling capabilities."


Microsoft CEO Satya Nadella said at the World Economic Forum in Davos, Switzerland, that DeepSeek has genuinely built an open-source model that not only performs well at inference-time computation but is also remarkably compute-efficient.


He emphasized that Microsoft must respond to China's breakthrough advances with the highest level of attention.


Meta CEO Mark Zuckerberg's evaluation goes even further, as he believes that the technology prowess and performance demonstrated by DeepSeek are impressive. He points out that the AI gap between the U.S. and China has become negligible, with China's full-on sprint making this competition even more intense.


The response from competitors may be the best recognition of DeepSeek. According to Meta employees' revelations on the anonymous workplace community TeamBlind, the emergence of DeepSeek-V3 and R1 has plunged Meta's generative AI team into panic.


Meta's engineers are racing against the clock to analyze DeepSeek's technology, trying to replicate any possible techniques from it.


The reason is that the training cost of DeepSeek-V3 is only $5.58 million, a figure even less than the annual salary of some Meta executives. With such a stark input-output ratio, Meta's management feels immense pressure when explaining its massive AI research and development budget.



The international mainstream media has also paid close attention to DeepSeek's rise.


The Financial Times pointed out that DeepSeek's success has overturned the conventional wisdom that AI development must rely on massive investment, proving that a precise technical roadmap can also deliver outstanding research results. More importantly, the DeepSeek team's open sharing of its innovations has made this research-driven company a formidable competitor.


The Economist stated that the rapid breakthrough in cost-effectiveness of Chinese AI technology has begun to shake America's technological advantage, which could affect the U.S.'s productivity growth and economic potential in the next decade.



The New York Times approached the topic from another perspective, noting that DeepSeek-V3 is on par with high-end chatbots from American companies in terms of performance but at a significantly lower cost.


This suggests that even under chip export controls, Chinese companies can compete through innovation and efficient resource utilization. Moreover, the U.S. government's chip restriction policies may backfire, actually driving China's innovative breakthroughs in open-source AI technology.


DeepSeek "Knocks on the Door," Claims to Be GPT-4


Amid a wave of praise, DeepSeek also faces some controversy.


Many external observers believe that DeepSeek may have used output data from models such as ChatGPT during the training process as training material, and through model distillation techniques, the "knowledge" in this data was transferred to DeepSeek's own model.


This practice is not uncommon in the AI field, but critics are concerned about whether DeepSeek used output data from OpenAI's models without fully disclosing it. The concern also seems to show up in how DeepSeek-V3 describes itself.


Earlier, some users found that when asked about its identity, the model would mistakenly identify itself as GPT-4.



High-quality data has always been a key factor in AI development, and even OpenAI has not escaped controversy over data acquisition: its large-scale scraping of the web has drawn numerous copyright lawsuits, and as of now the first-instance ruling in The New York Times' case against OpenAI has yet to be handed down.


Against this backdrop, DeepSeek has also drawn public criticism from Sam Altman and John Schulman.


"Copying something you know works is (relatively) easy. Doing something new, risky, and hard when you are not sure if it will work is very hard."



However, the DeepSeek team clearly stated in the R1 technical report that they did not use output data from OpenAI's model and claimed to have achieved high performance through reinforcement learning and a unique training strategy.


For example, they adopted a multi-stage training approach, including basic model training, reinforcement learning (RL) training, fine-tuning, etc. This multi-stage iterative training approach helps the model absorb different knowledge and capabilities at different stages.


Saving Money is Also a Technical Skill: Insights into the Technology Behind DeepSeek


The DeepSeek-R1 technical report mentions an interesting finding from the R1-Zero training process: an "aha moment." In the middle phase of training, DeepSeek-R1-Zero began to actively re-evaluate its initial problem-solving approach and to allocate more time to refining its strategy (for example, trying several different solutions to the same problem).


In other words, through the RL framework, AI may spontaneously develop human-like reasoning abilities, even surpassing the limitations of pre-set rules. This is also expected to provide guidance for developing more autonomous, adaptive AI models, such as dynamically adjusting strategies in complex decision-making (medical diagnosis, algorithm design).



Meanwhile, many industry experts are digging into DeepSeek's technical reports. Andrej Karpathy, a founding member of OpenAI, shared his thoughts after the release of DeepSeek V3:


DeepSeek (the Chinese AI company) is making it look easy today, openly releasing a frontier-grade language model (LLM) trained on an extremely low budget (2,048 GPUs for 2 months, about $6 million).


For reference, this level of capability typically requires a cluster of 16K GPUs to support, with most advanced systems today using around 100K GPUs. For example, Llama 3 (405B parameters) used 30.8 million GPU-hours, while DeepSeek-V3 appears to be a more powerful model, using only 2.8 million GPU-hours (approximately 1/11th the computation of Llama 3).


If this model also holds up in real-world testing (for example, LLM arena rankings are under way, and my quick tests went well), it will be a very impressive demonstration of research and engineering capability under resource constraints.


So, does this mean we no longer need large GPU clusters to train cutting-edge LLMs? Not quite, but it does indicate that you must ensure you are not wasting resources, as this case demonstrates that data and algorithm optimizations can still yield significant progress. Furthermore, this technical report is also very exciting and detailed, well worth a read.



Facing the controversy over DeepSeek V3 being questioned for its use of ChatGPT data, Karpathy stated that large language models fundamentally do not possess human-like self-awareness. Whether a model can correctly answer questions about its identity entirely depends on whether the development team has specifically constructed a self-awareness training set; if not trained deliberately, the model will answer based on the closest information in the training data.


Furthermore, the model identifying itself as ChatGPT is not the issue. Considering the prevalence of ChatGPT-related data on the internet, this type of answer actually reflects a natural "emergence of adjacent knowledge" phenomenon.


After reading the technical report of DeepSeek-R1, Jim Fan pointed out:


The most crucial point of this paper is: driven entirely by reinforcement learning, with no involvement of supervised fine-tuning (SFT), this approach is similar to AlphaZero—mastering the games of Go, Shogi, and chess from scratch through a "Cold Start," without needing to mimic human players' moves.


– Utilizing genuine rewards calculated based on hard-coded rules, rather than learning-based reward models that are easily "gamed" by reinforcement learning.


– The model's thinking time steadily increases as the training progresses, not as a pre-programmed feature but as a spontaneous behavior.


– Emergence of phenomena involving self-reflection and exploratory behavior.


– Adoption of GRPO instead of PPO: GRPO removes PPO's critic network and instead uses the average reward over a group of sampled responses as the baseline, a simple approach that reduces memory usage (a minimal sketch follows this list). It is worth noting that GRPO was invented by the DeepSeek team in February 2024, truly a powerful team.
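To make the GRPO point above concrete, here is a minimal sketch of the group-relative advantage computation together with a toy rule-based reward. It mirrors the general idea (rewards normalized within a group of sampled answers replace PPO's learned critic), but the reward rules, tags, and example completions are illustrative placeholders rather than DeepSeek's actual code.

```python
import re
import statistics

def rule_based_reward(completion, reference_answer):
    """Toy hard-coded reward: small format bonus plus exact-match accuracy (illustrative)."""
    format_ok = 1.0 if re.search(r"<think>.*</think>", completion, re.S) else 0.0
    boxed = re.search(r"\\boxed\{(.+?)\}", completion)
    correct = 1.0 if boxed and boxed.group(1).strip() == reference_answer else 0.0
    return 0.1 * format_ok + correct

def grpo_advantages(rewards):
    """Group-relative advantages: each sample's reward is normalized within its group,
    replacing PPO's critic network with simple statistics over the sampled group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid division by zero when rewards are identical
    return [(r - mean) / std for r in rewards]

# Toy group of 4 sampled completions for one math prompt whose answer is 7.
group = [
    "<think>3 + 4 = 7</think> \\boxed{7}",
    "<think>just guessing</think> \\boxed{9}",
    "\\boxed{7}",          # correct answer but missing the reasoning tags
    "no answer given",
]
rewards = [rule_based_reward(c, "7") for c in group]
print(rewards)             # [1.1, 0.1, 1.0, 0.0]
print(grpo_advantages(rewards))
```

In the full algorithm these advantages weight the policy-gradient update for each sampled response; the sketch stops at the advantage computation.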


Kimi happened to publish similar research results the same day, and Jim Fan found that the two companies' work had converged on the same ideas:


· Both abandoned complex tree search methods like MCTS in favor of simpler linearized thought trajectories, utilizing traditional autoregressive prediction instead


· Both avoided using a value function that requires additional model replicas, reducing computational resource requirements and improving training efficiency


· Both eschewed dense reward modeling, relying as much as possible on real outcomes as guidance to ensure training stability



However, there were significant differences between the two:


· DeepSeek employed an AlphaZero-style pure RL cold-start approach, while Kimi k1.5 opted for an AlphaGo-Master-style warm-up strategy using a lightweight SFT


· DeepSeek is open-sourced under the MIT license, whereas Kimi k1.5 posted stronger multimodal benchmark results and its paper went into far more detail on system design, covering RL infrastructure, hybrid clusters, code sandboxing, and parallelism strategies


However, in this rapidly iterating AI market, a leading edge often quickly dissipates. Other model companies will surely swiftly absorb DeepSeek's experience and improve upon it, perhaps catching up soon.


Initiator of the Big Model Price War


Many people know DeepSeek by its nickname, the "Pinduoduo of the AI world," but few know that the label actually dates back to the large-model price war it set off last year.


On May 6, 2024, DeepSeek released the DeepSeek-V2 open-source MoE model, achieving a dual breakthrough in performance and cost through innovative architectures such as MLA (Multi-Head Latent Attention Mechanism) and MoE (Mixture of Experts).


The inference cost has been reduced to only 1 RMB per million tokens, approximately one-seventh of Llama3 70B at the time and one-seventieth of GPT-4 Turbo. This technological breakthrough has enabled DeepSeek to provide highly cost-effective services without losing money, while also putting tremendous competitive pressure on other vendors.


The release of DeepSeek-V2 triggered a chain reaction, with ByteDance, Baidu, Alibaba, Tencent, and Zhipu AI quickly following suit and slashing the prices of their large-model products. The impact of this price war even crossed the Pacific and drew close attention in Silicon Valley.


As a result, DeepSeek was dubbed the "Pinduoduo of the AI industry."



In the face of outside doubts, DeepSeek founder Liang Wenfeng responded in an interview with Waves (暗涌):


"Snatching users is not our main goal. On the one hand, we reduced prices because our costs came down as we explored the structure of the next-generation model; on the other hand, we believe that whether it's an API or AI, it should be something inclusive, affordable for everyone."


In fact, the significance of this price war far exceeds the competition itself. The lower barrier to entry has allowed more companies and developers to access and apply cutting-edge AI, while also forcing the entire industry to rethink pricing strategies. It was during this period that DeepSeek began to enter the public eye and shine.


Paying a Thousand Gold for a Horse's Bones: Lei Jun Poaches an AI Prodigy


A few weeks ago, DeepSeek also saw a notable personnel change.


According to China Business Network, Lei Jun poached Luo Fuli with a reported ten-million-yuan annual salary and entrusted her with leading the large-model team at Xiaomi's AI Lab.


Luo Fuli joined High-Flyer, DeepSeek's parent company, in 2022, and her name appears in both the DeepSeek-V2 report and the latest R1 report.



Later, DeepSeek, which had previously focused on the B-side (business) market, also began building out its C-side (consumer) presence and launched a mobile app. As of this writing, DeepSeek's mobile app has climbed as high as second place among free apps on Apple's App Store, showing strong competitiveness.


These early successes made DeepSeek well known, while it quietly built toward something bigger. On the evening of January 20, the ultra-large-scale model DeepSeek R1, with 660B parameters, was officially released.


This model has shown outstanding performance in mathematical tasks, such as achieving a pass@1 score of 79.8% on AIME 2024, slightly surpassing OpenAI-o1; and scoring as high as 97.3% on MATH-500, on par with OpenAI-o1.


In programming tasks, for example, it received a 2029 Elo rating on Codeforces, surpassing 96.3% of human participants. In knowledge benchmark tests such as MMLU, MMLU-Pro, and GPQA Diamond, DeepSeek R1 scored 90.8%, 84.0%, and 71.5%, respectively, slightly below OpenAI-o1 but better than other closed-source models.


On the latest overall leaderboard of the large-model arena LM Arena, DeepSeek R1 ranks third, tied with o1.


· In areas such as "Hard Prompts," "Coding," and "Math," DeepSeek R1 holds the first position.


· In "Style Control," DeepSeek R1 is tied for first place with o1.


· In the test of "Hard Prompt with Style Control," DeepSeek R1 also ties for first place with o1.



On the open-source front, R1 is released under the MIT License, giving users maximum freedom of use and explicitly permitting model distillation, so its reasoning ability can be transferred to smaller models. For example, the distilled 32B and 70B models match o1-mini across multiple capabilities. This degree of openness even surpasses Meta, which had previously drawn criticism on that front.
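As a rough illustration of what such distillation involves, the sketch below builds a small fine-tuning dataset from a teacher model's reasoning traces; `teacher_generate` is a hypothetical stand-in for a call to the large model, and the tags, file name, and record format are arbitrary choices, not DeepSeek's pipeline.

```python
import json

def teacher_generate(question):
    """Hypothetical stand-in for querying the large reasoning model.
    In practice this would be an API call that returns a chain-of-thought answer."""
    return f"<think>Work through '{question}' step by step...</think> Final answer: 42"

def build_distillation_set(questions, path="distill_sft.jsonl"):
    """Collect teacher reasoning traces into a JSONL file usable by a standard SFT pipeline."""
    with open(path, "w", encoding="utf-8") as f:
        for q in questions:
            record = {"prompt": q, "completion": teacher_generate(q)}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path

# A smaller base model (e.g. a 32B or 70B checkpoint) can then be fine-tuned on this file.
print(build_distillation_set(["What is 6 * 7?"]))
```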


The emergence of DeepSeek R1 allows domestic users to freely access a model that rivals o1 for the first time, breaking down long-standing information barriers. The discussion frenzy it has stirred up on platforms like Xiaohongshu is reminiscent of the release of GPT-4.


Go Global, Avoid Insularity


Looking back at DeepSeek's development trajectory, its formula for success is plain to see: technical prowess lays the foundation, and brand recognition forms the moat.


In a conversation with LatePost (晚点), MiniMax CEO Yan Junjie shared in-depth thoughts on the AI industry and his company's strategic shift.


Yan Junjie believes that in the AI field, the speed of technological evolution is more critical than current achievements, and open source can accelerate this process through community feedback; furthermore, a strong technology brand is crucial for attracting talent and acquiring resources.


Take OpenAI as an example. Despite its later management turmoil, the image of innovation and the open-source spirit it established early on earned it the first wave of goodwill. Even though Claude has since caught up technically and is gradually eroding OpenAI's B-side customer base, OpenAI still holds a commanding lead among C-side users thanks to user path dependence.


In the AI field, the real competitive stage is always global. Going global, stepping out of insular domestic competition, and actively promoting themselves are undoubtedly sound strategies.



This wave of going global has already stirred ripples in the industry. Earlier movers such as Qwen and ModelBest (面壁智能), and more recently DeepSeek R1, Kimi k1.5, and Doubao 1.5 Pro, have already made quite a stir overseas.


While 2025 has been dubbed the Year of AI Agents, the Year of AI Glasses, and many other things, it will also be an important year for Chinese AI companies to embrace the global market, and going global will be an unavoidable keyword.


The open-source strategy is also a wise move, attracting a large number of tech bloggers and developers who have voluntarily become DeepSeek's unpaid evangelists. "Tech for good" should be more than a slogan, and from the slogan "AI for All" to genuine technological inclusiveness, DeepSeek has walked a purer path than OpenAI.


If OpenAI showed us the power of AI, then DeepSeek makes us believe:


This power will eventually benefit everyone.


Original Article Link


