DeepSeek V4 In-Depth Original Review: Million-Token Context, Ultimate Cost-Efficiency, the Open-Source Large Model Fully Upgraded

📋 One-Minute Summary

On April 24, 2026, DeepSeek officially released the DeepSeek-V4 preview version and open-sourced it simultaneously. This fourth-generation flagship model brings the following core breakthroughs:

1 Million Token Ultra-Long Context — Can read the entire "Three-Body Problem" trilogy in one go; no need to split long documents for analysis
Agent Capabilities Significantly Evolved — Upgraded from a "chatbot" to an "AI employee" capable of autonomous planning and continuous programming
Pricing as Low as 0.02 RMB per Million Tokens — Total cost is only one-tenth of GPT-5.5
Fully Open Source Under MIT License — Private deployment possible, data security under your control
Ranked #1 in China on SuperCLUE Chinese Benchmark — Scored 70.98 points, significantly ahead of other domestic models
Image Recognition in Gray-Scale Testing — Multimodal capabilities are about to be completed

In April 2026, the AI field once again witnessed an intensive wave of model releases. First, Anthropic launched Opus 4.7, followed by OpenAI's release of GPT-5.5. Immediately afterward, DeepSeek, after more than a year of silence, finally unveiled its highly anticipated fourth-generation flagship model — DeepSeek-V4 Preview. Without any pre-launch hype or fanfare, on the morning of April 24, DeepSeek officially announced that the V4 preview version was online and open-sourced.

This release came a full 15 months after the last major version update. While the outside world speculated whether DeepSeek had slowed its pace due to chip restrictions or talent loss, V4 responded with a report card that is both "affordable and powerful." This review will comprehensively dissect the true performance of DeepSeek-V4 across multiple dimensions, including million-token context capability, reasoning and programming performance, Agent evolution, pricing strategy, multimodal implementation, and a head-to-head comparison with ChatGPT, Claude, and Gemini.

I. The Bottom Line: What is DeepSeek-V4

DeepSeek-V4 is the fourth-generation open-source large language model series launched by DeepSeek, employing a brand-new Mixture of Experts (MoE) architecture. It comes in two versions:

DeepSeek-V4-Pro (Flagship): 1.6 trillion total parameters / 49B activated parameters
DeepSeek-V4-Flash (Lightweight): 284B total parameters / 13B activated parameters

Both models are standard-equipped with a 1 million token context window, chain-of-thought reasoning enabled by default, support for JSON output and tool calling, and are open-sourced under the MIT license.

V4 is the first major version since the V3 series to introduce a completely new underlying architecture. Its most important technological breakthrough is the Hybrid Attention Mechanism — combining token compression with sparse attention, dramatically reducing the cost of long-context reasoning. This innovation transforms million-level context from a "premium feature" into "basic infrastructure."

II. Comprehensive Real-World Testing of Core Capabilities

2.1 Million-Token Context: Reading the "Three-Body" Trilogy in One Breath

The entire DeepSeek-V4 series comes standard with a 1 million token ultra-long context, which translates to approximately 750,000 Chinese characters. What does that mean? The "Three-Body" trilogy totals nearly 1 million characters; V4 can "read" and memorize the entire content in one go, and then discuss every plot detail with you in depth.

Journalists from CCTV conducted a live-fire test: they prepared a multi-type material package totaling 970,000 characters, covering diverse text content such as full-length literary works and multi-domain news articles, and fed it into the model all at once. In the first round of testing, the journalist asked the model to accurately extract the core content of the fourth part of a specified news article. V4 output a structured and precise summary of five key points in about 7 seconds. Subsequently, the journalist posed a question that required cross-referencing all nearly 1 million characters to answer: "Across all the materials in this set, how many industries are covered in total?" The model provided an accurate answer of approximately 45 sub-industries.

For users who frequently use AI to process long documents, contract analysis, financial report interpretation, or academic research, this means you no longer need to upload files in chunks — just "feed" them directly.

2.2 Reasoning & Knowledge: #1 in China, Rivaling Top-Tier Closed-Source Models

The third-party evaluation platform SuperCLUE released its comprehensive evaluation results for the DeepSeek V4 series Chinese large model on April 28. This evaluation covered six dimensions: mathematical reasoning, scientific reasoning, code generation, agent task planning, instruction following, and hallucination control. DeepSeek-V4-Pro ranked first among domestic models of its kind with 70.98 points, with the Flash version closely following at 68.82 points. Both scores significantly outperformed other domestic models.

Compared with the previous generation V3.2, the Pro version showed an improvement of over 20 points in agent capability, nearly 10 points in mathematical reasoning, nearly 12 points in instruction following, and a notable optimization in hallucination control.

Looking at the official benchmarks released by DeepSeek, V4-Pro surpassed all currently publicly evaluated open-source models in tests such as mathematics, STEM, and competitive coding, achieving results comparable to world-class closed-source models. It scored 87.5 on MMLU-Pro, a Codeforces Rating of 3206, and 93.5 on LiveCodeBench.

In terms of world knowledge, V4-Pro significantly leads other open-source models and is only slightly behind the top closed-source model Gemini-Pro-3.1.

It is worth noting that DeepSeek's official technical report frankly admitted that V4-Pro "slightly lags behind GPT-5.4 and Gemini-3.1-Pro," with a gap of about 3 to 6 months compared to the most advanced closed-source models. Such candor is rare in the AI industry and demonstrates DeepSeek's clear-eyed understanding of its own positioning.

2.3 Agent: Evolving from "Chatbot" to "Autonomous Employee"

If long text represents a leap in "memory," then the evolution of Agent capabilities is the most crucial soul upgrade of DeepSeek-V4. In previous versions, directing the AI to do work required step-by-step instruction; with V4, it behaves more like an autonomous "veteran employee."

DeepSeek has already adopted V4 as the primary programming assistant for its internal staff. According to the company's own assessment: the user experience surpasses the industry-renowned Anthropic Sonnet 4.5, with delivery quality approaching the non-thinking mode of Opus 4.6.

Lei Technology's evaluation team used plain language to request V4-Pro to autonomously execute programming tasks. In the first test round, they asked it to write an interactive webpage version of a starry sky (where you can click on stars to read stories and drag to change the perspective). V4-Pro autonomously planned a six-step design plan and programmed continuously for nearly 34 minutes, without any interruptions or infinite loops, and without missing key steps. The task was completed, consuming about 6.19 RMB in tokens. All functions worked correctly: the spherical celestial model could be smoothly dragged, information annotations could be viewed by clicking, and the meteor shower effect was perfect.

The second round was even harder: write a small dungeon exploration web game. After the first generation was truncated, V4 autonomously adopted a more compact approach and retried. The second attempt not only built the basic game framework but also independently designed a rather comprehensive economic system and upgrade pathway. The formulas for the character's HP, MP, and attack power were written very rigorously.

In the Agentic Coding evaluation, V4-Pro has reached the best level among current open-source models. Real-world test results from Vals AI further pointed out that in the core code task track, V4 achieved the top spot on the open-weight model leaderboard with "overwhelming advantage," defeating closed-source models like Gemini 3.1 Pro, achieving an approximate 10x performance leap over the previous generation V3.2.

2.4 Code Writing: Excellent Logic, Aesthetics Need Improvement

Editors from ifanr integrated DeepSeek V4 into their daily-use Chatbot and Claude Code for real-world testing. The results showed: code writing performance was good; initially, there were some differences in requirement understanding, but it quickly adjusted, and subsequent requirements only needed natural language to be fully understood. With the help of V4 Pro, the editor built a small sleep tracking monitoring project connecting Telegram to a Notion database.

However, multiple evaluations point to the same shortcoming: solid basic functionality, but weak UI/UX aesthetics. Lei Technology's test results showed that while the webpages generated by V4 functioned completely normally, the visual design, color transitions, and interactivity aesthetics were noticeably inferior to comparable products like Codex. If you need a ready-to-use, beautifully designed frontend product, manual assistance for adjustments might be required.

III. Pricing Strategy: As Low as 2 Cents per Million Tokens

Price has always been DeepSeek's most lethal weapon. The V4 series not only continues this tradition but further depresses the industry's price floor.

On the release day, the pricing per million tokens for DeepSeek-V4-Flash input/output was $0.14/$0.28 respectively; the Pro version input/output was priced at $1.74/$3.48, less than one-fifth the price of overseas high-performance large models of the same tier.

But this was just the beginning. On the evening of April 25, DeepSeek announced a limited-time 75% discount on the V4-Pro model API; on the evening of the 26th, it further announced that the input cache hit price for the entire series API service was reduced to one-tenth of the original. After the latest price adjustment:

DeepSeek-V4-Flash input cache hit price is 0.02 RMB per million tokens
DeepSeek-V4-Pro input cache hit price is 0.025 RMB per million tokens

Equivalent to just about $0.0037 USD per million tokens.

To put this in perspective: processing one million tokens of input and output, the combined cost for GPT-5.5 and Claude Opus 4.7 is $35 and $30 respectively, while DeepSeek-V4-Pro costs only $5.27. If the input hits the cache, the cost further drops to about $3.66, approximately one-tenth of GPT-5.5.

Even more noteworthy is that DeepSeek explicitly revealed that with the mass market availability of the Huawei Ascend 950 super-node in the second half of the year, the price of Pro will be further significantly reduced. This implies that AI deployment costs still have enormous room to fall.

💰 For regular users: The web version and app remain completely free; you can use all the capabilities of the V4 series models at no cost.

IV. Unique Strengths: DeepSeek-V4's Differentiated Competitiveness

4.1 New Hybrid Attention Architecture: More Results with Less Computing Power

The core technical moat of DeepSeek-V4 is its newly designed Hybrid Attention Mechanism. This architecture combines token dimension compression with sparse attention (DSA), resulting in the following when processing the same 1 million token context:

Pro Version: Computation volume is only 27% of V3.2, KV cache usage as low as 10%
Flash Version: Computation volume compressed to 10%, KV cache usage only 7%

This means: long-context reasoning no longer requires exorbitant computing power costs. V4's low pricing is not simply capital subsidy, but stems from real engineering breakthroughs.

4.2 Deep Integration with Domestic Computing Ecosystem

DeepSeek-V4 chose to launch inference on Huawei Ascend, and domestic chips like Cambricon have also completed ecosystem adaptation. This marks the first time a top-tier domestic large model has been deployed at scale on a domestic computing system, signifying that a fully domestic AI chain has become possible.

For a long time, the global large model landscape heavily relied on NVIDIA chips and the CUDA ecosystem. The deep integration of DeepSeek-V4 with Huawei's CANN ecosystem provides a top-tier large model endorsement for the domestic chip ecosystem, reducing path dependency on a single chip system.

4.3 Open Source Strategy: MIT License, Truly Open

Unlike American companies like Anthropic and OpenAI that strictly guard their cutting-edge models, DeepSeek has always adhered to a fully open-source approach. DeepSeek-V4 is released under the MIT license, allowing anyone to download, modify, and use it for free.

This means developers can deploy V4 on their own servers without worrying about data privacy or API call limitations. For enterprise users, this is a level of freedom that closed-source models cannot offer.

4.4 Compatibility Design: One-Click Access to Mainstream Ecosystems

DeepSeek-V4 is compatible with both OpenAI and Anthropic API protocols. Developers only need to modify a single line of code to seamlessly switch from GPT or Claude. Simultaneously, V4 is deeply integrated with mainstream AI development tools like Claude Code, OpenClaw, and OpenCode, ready to use out of the box.

V. Multimodal Catch-Up: Image Recognition Officially Arrives

For a long time, the lack of multimodal capabilities was one of DeepSeek's biggest shortcomings. This situation is changing.

On April 29, just five days after the V4 release, DeepSeek initiated gray-scale testing for image recognition on its web and mobile app. Some users can see three options above the input bar: "Quick Mode," "Expert Mode," and "Vision Mode." After clicking Vision Mode, users can upload images, triggering the model to understand, describe, and analyze the image content.

36Kr's evaluation team conducted a stress test with 12 tricky images. The results showed: basic recognition accuracy is quite high, and the reasoning process is methodical. For a single photo, it could fully describe the details of the scene, identify characters, and even deduce the shooting background and lighting elements; for a jade artifact in a museum, after enabling thinking mode, it accurately identified it as "Qing Dynasty Hindustan style," perfectly matching the actual exhibition information.

However, the evaluation also pointed out that the current knowledge base for the vision feature is not yet rich enough, and there are still errors when facing high-difficulty stress tests.

Meanwhile, researcher Chen Xiaokang, responsible for multimodal research and development, has teased on social platforms that a "new version of DeepSeek V4" is coming. The market generally expects a fully functional native multimodal version to arrive in May.

VI. Comprehensive Comparison with ChatGPT, Claude, and Gemini

To provide readers with a more intuitive understanding of DeepSeek-V4's positioning among mainstream large models, here is a cross-comparison along several key dimensions:

6.1 Performance Comparison

Dimension	DeepSeek-V4-Pro	GPT-5.5	Claude Opus 4.7	Gemini 3.1 Pro
Parameter Scale	1.6T (49B Activated)	Undisclosed	Undisclosed	Undisclosed
Context Window	1M Token	1M Token	200K Token	1M Token
Max Output	384K Token	Undisclosed	128K Token	Undisclosed
Open Source	MIT Open Source ✓	Closed Source ✗	Closed Source ✗	Closed Source ✗
Multimodal	Vision in Gray-Scale Test	Native Multimodal	Native Multimodal	Native Multimodal

DeepSeek-V4-Pro has matched or even surpassed some closed-source models on metrics like MMLU-Pro (87.5 points) and Codeforces (3206 points). It ranks first domestically on the SuperCLUE Chinese benchmark with 70.98 points, though a certain gap still exists compared to top-tier closed-source models like GPT-5.5 and Opus 4.7 in areas such as code generation quality and complex multi-step instruction execution.

6.2 Pricing Comparison

Model	Input Pricing (per Million Tokens)	Output Pricing (per Million Tokens)
DeepSeek-V4-Flash	$0.14 (Cache Hit $0.0029)	$0.28
DeepSeek-V4-Pro	$1.74 (Cache Hit $0.0037)	$3.48
GPT-5.5	~$5	~$30
Claude Opus 4.7	$5	$25
Gemini 3.1 Pro	$1.25	$10

Data sources: Mashable, DeepSeek API documentation, and official pricing pages of each provider (April 2026).

The comprehensive cost of DeepSeek-V4-Pro is approximately one-seventh of GPT-5.5 and one-sixth of Claude Opus 4.7. In cache-hit scenarios, this gap further widens to over ten times.

6.3 Scenario Selection Recommendations

Budget-limited, high-concurrency scenarios: DeepSeek-V4-Flash is the best choice, with extremely low cost and fast response speed
Complex reasoning and professional tasks: DeepSeek-V4-Pro offers extremely high cost-efficiency; if ultimate precision is required, consider Claude Opus 4.7 or GPT-5.5
Multimodal needs: GPT-5.5 and Gemini 3.1 Pro have more mature native multimodal capabilities, but DeepSeek's vision feature is catching up
Chinese language scenarios: DeepSeek-V4's Chinese capability is certified by SuperCLUE as the strongest domestically, with a natural and fluent Chinese language feel
Open-source / private deployment: DeepSeek is the only choice

VII. Panoramic Use Case Landscape

7.1 Daily Office Work

DeepSeek-V4 supports million-character long-text analysis, capable of directly processing lengthy contracts, financial reports, and meeting minutes. V4 supports enterprise-grade features such as thinking mode toggling, JSON output, tool calling, and conversation prefix continuation. The FIM completion function works normally in non-thinking mode, covering complex scenarios like development, office tasks, legal, and finance.

7.2 Software Development

Thanks to its enhanced Agent capabilities, DeepSeek-V4 can serve as a programming assistant to autonomously complete tasks like code generation, debugging, and refactoring. It is deeply integrated with mainstream development tools like Claude Code and OpenClaw, compatible with OpenAI and Anthropic API protocols, resulting in minimal migration cost for developers.

7.3 Content Creation

In Chinese writing, DeepSeek-V4 continues the "lifelike" linguistic style of the R1 era. When 36Kr editors conversed with V4 in the Chatbot, they found its default personality "quite sweet," feeling human rather than mechanical. It could use connective phrases that only a real person would use, conveying a natural and conversational tone.

7.4 Deep Long-Text Processing

Load a full-length novel on the scale of "Dream of the Red Chamber," a complete codebase, or 8 hours of continuous conversation history in one go for cross-chapter analysis, code review, or knowledge graph construction.

7.5 Image Recognition In Gray-Scale Testing

Upload images to obtain scene descriptions, object identification, and scene analysis. Currently available only to some users; expected to be fully rolled out in May.

VIII. Shortcomings and Limitations

8.1 Multimodal Capabilities Not Yet Mature

Although the image recognition feature has entered gray-scale testing, the native multimodal version has not yet been officially released. The current DeepSeek-V4-Pro and Flash versions only support plain text input and output, with zero image generation capability.

8.2 Insufficient Code Generation Aesthetics

Multiple evaluations point out that the web frontends generated by DeepSeek-V4 significantly lag behind comparable products like Codex and Hy3 in visual design, color transitions, and interactive aesthetics, requiring manual beautification. Technical document generation is its core strength, with API documentation and code comment terminology accuracy reaching 96%, and it can automatically adapt to 16 programming language specifications. However, its multimodal generation capability is almost zero, supporting only plain text.

8.3 Gap with Top-Tier Closed-Source Models Remains

DeepSeek officially admits to a gap of approximately 3 to 6 months compared to top-tier closed-source models like GPT-5.4 and Gemini-3.1-Pro. In the SuperCLUE evaluation, there is still a distance from the international leading level in code generation quality and complex multi-step instruction execution.

8.4 Limited Service Throughput

Constrained by high-end computing power, the current service throughput of the Pro model is very limited. DeepSeek has clearly stated that significant improvements are expected only after the mass market availability of Huawei Ascend 950 super-nodes in the second half of the year.

IX. Conclusion: Who Should Choose DeepSeek-V4?

The release of DeepSeek-V4 essentially answers this question: In today's world where top-tier AI capabilities are increasingly concentrated in the hands of a few closed-source giants, can the open-source route provide a truly competitive alternative?

The answer is a resounding yes.

🎯 Suitable Audiences at a Glance

Individual Users: The web and app versions are completely free, offering direct access to the powerful capabilities of million-token context and deep reasoning
Developers: V4 is compatible with mainstream API protocols, allowing one-click migration at one-tenth the price of competitors
Enterprise Users: The MIT open-source license means private deployment is possible, data security is controllable, and long-term ownership costs are far lower than closed-source model subscriptions

DeepSeek's hand is open source, pricing, and the domestic computing ecosystem — admitting that a gap still exists with the very top closed-source models actually makes this hand play out more steadily.

Of course, if you have rigid demands for multimodal capabilities, or high requirements for the aesthetic quality of generated code frontends, you might currently need to pair it with other tools. However, in terms of overall capability, cost, and freedom, DeepSeek-V4 is undoubtedly the most noteworthy open-source large model of 2026 — bar none.

Review Date: April 30, 2026
Model Version Reviewed: DeepSeek-V4-Pro (Preview) / DeepSeek-V4-Flash (Preview)