Jonathan Ng

165 posts

@JnJonathanpro

Joined September 2014
31 Following, 7 Followers
DeepSeek
DeepSeek@deepseek_ai·
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n
DeepSeek tweet media
English
1.6K
7.7K
45K
9.6M
Max Weinbach
Max Weinbach@mweinbach·
Yea deepseek v4 flash/pro don't really perform that well compared to any of the major US models, even 1-2 revisions old. Looks like it's slightly behind Opus 4.5 in practice, and on par or slightly behind Kimi K2.6 Some good optimization techniques there, but overall, eh
English
71
13
425
66.1K
Bindu Reddy
Bindu Reddy@bindureddy·
YAY!!! - DEEPSEEK V4 IS OUT 🚀🚀🚀 Initial benchmarks numbers are ABSOLUTELY ASTOUNDING!! Opus 4.7 Max and GPT 5.5 level! Scrambling to verify their numbers
Bindu Reddy tweet media
English
73
104
1.3K
72.3K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing our biggest upgrades to the Deep Research API yet... including Deep Research Max (our SOTA system), MCP support, Native charts & infographics, planning mode, full tool support (including Google tools), full multi-modal input support, & real-time progress streaming!
Logan Kilpatrick tweet media
English
119
145
1.9K
123.1K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Excited to share that Google AI subscriptions (Pro and Ultra) now work with @GoogleAIStudio!! Come vibe code and use the playground with higher rate limits. Available right now!
English
237
204
3K
282.3K
Yifan Zhang
Yifan Zhang@yifan_zhang_·
Recursive self-improvement via coding agents is the top priority for all frontier labs.
English
43
58
991
69.6K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@LexnLin Chatgpt, make me the full GTA 6 game, make no mistakes
English
1
0
1
182
Leon Lin
Leon Lin@LexnLin·
gpt 5.4 (5.5) pro is insane didn't expect to get this in one shot
English
23
29
591
41.6K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@xiangxiang103 Ai slop, just look at the generated images for general, reasoning, math, code etc
English
0
0
0
41
雨哥向前冲
雨哥向前冲@xiangxiang103·
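Suspected leak of benchmarks from the DeepSeek V4 technical report! The benchmark data from the "DeepSeek-V4 Technical Report" shown in the image reveals the latest competitive landscape among today's top large AI models. Judging by this scorecard, which covers four dimensions (general capability, reasoning and math, code, and agents), DeepSeek-V4 shows very strong dominance, and the top tier of large models is being reshuffled.
🏆 Competitive landscape: how the current models are tiered
From the chart data, the five evaluated models fall into clear tiers:
👑 Leader: DeepSeek-V4 takes the top score (state of the art) on all 12 demanding tests, leading across every dimension.
🥈 Strongest challenger: Gemini 3.1 Pro Preview stays close behind the leader on several core metrics and beats GPT-5.3 on the vast majority of tests, making it the most competitive rival in this comparison.
🥉 Third tier: GPT-5.3 and Claude Opus 4.6 trade wins and still perform at a very high level, but show some fatigue in the competition at the very top.
📏 Baseline reference: GPT-4.1, representing an earlier generation of models, ranks last across the board in this comparison, which also shows directly how large the leap to the new generation is.
🔍 In-depth breakdown of the four core capabilities
🧠 1. General knowledge and academic subjects (General)
Key data: On the highly challenging MMLU-Pro test, DeepSeek-V4 (91.2) and Gemini 3.1 Pro (90.0) are the only two models to break the 90-point mark.
Industry insight: Expert-level, cross-disciplinary question answering is no longer difficult for leading models. GPT-5.3 (88.4) and Claude (86.7) lag slightly here, and the competition on knowledge density is closing in on the 90+ ceiling.
🧮 2. Math and complex reasoning (Reasoning & Math)
Key data: The top math competition benchmark AIME 2025 is extremely tight (DeepSeek-V4 96.4, Gemini 95.0, GPT-5.3 94.6).
Industry insight: Math is the field where AI is progressing fastest. Scores of 90+ mean these models already have an overwhelming advantage on difficult olympiad-level problems, and the gap between vendors often comes down to just a few problems.
💻 3. Programming and engineering (Code)
Key data: On Codeforces (a competitive programming platform), DeepSeek-V4 soars to 2767 points, opening a significant gap; but on SWE-bench Verified, which evaluates fixing real software engineering bugs, no model exceeds 60% (the highest is DeepSeek-V4 at 59.6%).
Industry insight: "Writing algorithm problems is easy; changing human code is hard." Models have reached competition level at pure logic generation, but still show clear weaknesses in understanding and modifying complex real-world commercial codebases.
🤖 4. Autonomous agent behavior (Agent)
Key data: On the WebArena test, which simulates web browsing and task execution, the highest score overall (DeepSeek-V4) is only 58.7, and GPT-4.1 is as low as 44.8.
Industry insight: This is the section with the lowest absolute scores in the whole table. It reflects a current industry pain point: large models are very strong at "solving test problems" and "writing essays", but when asked to operate a browser autonomously like a human and handle multi-step real-world tasks across applications, their success rate is still worrying.
💡 Key takeaway: This benchmark is not only a show of strength for DeepSeek-V4; it also indirectly confirms how competitive Gemini 3.1 Pro Preview is on the current technical path. More importantly, it points to the industry's next challenge: as models' knowledge and test-taking ability approach human limits, breaking through the real-world bottleneck of agentic tasks will be the key to deciding who dominates the next generation of AI.
This is only an interpretation of the data in the image; the real situation remains to be verified!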
雨哥向前冲 tweet media (2 images)
Chinese
79
37
237
123.4K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Excited to share that the Gemini API now has prepaid billing, rolled out to start for US customers!! We have been working hard across Google to enable this. It’s the default for new API users and existing users can opt in via a new billing account, all directly in AI Studio.
Google AI Studio@GoogleAIStudio

x.com/i/article/2044…

English
55
52
623
76.7K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Tab Tab Tab, our new prompt auto complete engine in @GoogleAIStudio's vibe coding experience. Now when you show up with your fuzzy ideas, you can rely on Gemini to fill in the blanks : )
Logan Kilpatrick tweet media
English
98
97
1.3K
63.2K
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
So, was the wait worth it?
English
950
75
4.7K
488.1K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@_philschmid Chatgpt allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature? Also, could you guys increase the context size? It would be useful to be able to voice chat with it without it forgetting its context when role playing
English
0
0
0
242
Philipp Schmid
Philipp Schmid@_philschmid·
We just launched Gemini 3.1 Flash Live! Our fastest, most natural real-time voice AI model for building Agents. - Scores 90.8% on ComplexFuncBench Audio for tool use. - 70 languages, Video streaming, Audio transcriptions, 128k context - Comes with Agent Skill for building live voice agents. - All generated audio is watermarked with SynthID.
Philipp Schmid tweet media
English
28
48
451
27.7K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@_philschmid Chatgpt allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature?
English
0
0
0
172
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@OfficialLoganK Chatgpt allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature?
English
1
0
1
53
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
Introducing Gemini 3.1 Flash Live, our new realtime model to build voice and vision agents!! We have spent more than a year improving the model + infra + experience, the results? A step function improvement in quality, reliability, and latency.
Logan Kilpatrick tweet media (2 images)
English
235
284
3.5K
314.7K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@OfficialLoganK Chatgpt allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature?
English
0
0
1
23
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@imhimgojou I'll be honest, it was nearly a 10/10 episode, but this left a bad taste. Still a 9/10 regardless
English
0
0
0
776
Jonathan Ng
Jonathan Ng@JnJonathanpro·
@ParasiticEager2 The first granite blast bridge scene had lots of weird CGI; nonetheless, it's a 9/10 episode
English
1
0
9
2.5K
Jaeger🍓
Jaeger🍓@ParasiticEager2·
sadly it may be the last time Gosso did storyboarding for jjk
Jaeger🍓 tweet media (4 images)
English
7
95
2.3K
32.6K
Jonathan Ng
Jonathan Ng@JnJonathanpro·
Deepseek V4 Confirmed
Jonathan Ng tweet media
English
0
0
0
121
Yashas
Yashas@YashasGunderia·
@bdsqlsz we're ready. autoresearch + deepseek v4 is gonna blow people's mind!
English
2
1
70
15.9K