Jonathan Ng

165 posts

@JnJonathanpro

Joined September 2014

31 Following · 7 Followers

Jonathan Ng @JnJonathanpro · 24 Apr

@deepseek_ai What a disappointment, V4 = Mid

DeepSeek @deepseek_ai · 24 Apr

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

1.6K

7.7K

45K

9.6M

Jonathan Ng @JnJonathanpro · 24 Apr

@mweinbach @stepbystepnomad Deepseek v4 is a shit hole, I was expecting Mythos levels of performance at the very least

Max Weinbach @mweinbach · 24 Apr

@stepbystepnomad model arch and model performance are different things

7.2K

Max Weinbach @mweinbach · 24 Apr

Yea deepseek v4 flash/pro don't really perform that well compared to any of the major US models, even 1-2 revisions old. Looks like it's slightly behind Opus 4.5 in practice, and on par or slightly behind Kimi K2.6. Some good optimization techniques there, but overall, eh

425

66.1K

Jonathan Ng @JnJonathanpro · 24 Apr

@MSaintjour @bindureddy Yes

103

Marc Saint-Jour @MSaintjour · 24 Apr

@bindureddy Celebrating foreign company distilling American technology?

2.4K

Bindu Reddy @bindureddy · 24 Apr

YAY!!! - DEEPSEEK V4 IS OUT 🚀🚀🚀 Initial benchmark numbers are ABSOLUTELY ASTOUNDING!! Opus 4.7 Max and GPT 5.5 level! Scrambling to verify their numbers

104

1.3K

72.3K

Jonathan Ng @JnJonathanpro · 22 Apr

@RichardGibbonsX @OfficialLoganK @GoogleAIStudio He doesn't care

Richard Gibbons @RichardGibbonsX · 21 Apr

@OfficialLoganK @GoogleAIStudio Google AI Ultra for Business doesn't qualify for aistudio.google.com?

162

Logan Kilpatrick @OfficialLoganK · 21 Apr

Introducing our biggest upgrades to the Deep Research API yet... including Deep Research Max (our SOTA system), MCP support, Native charts & infographics, planning mode, full tool support (including Google tools), full multi-modal input support, & real-time progress streaming!

119

145

1.9K

123.1K

Jonathan Ng @JnJonathanpro · 21 Apr

@OfficialLoganK @GoogleAIStudio I guess my Reddit post finally got to you

Logan Kilpatrick @OfficialLoganK · 21 Apr

Excited to share that Google AI subscriptions (Pro and Ultra) now work with @GoogleAIStudio!! Come vibe code and use the playground with higher rate limits. Available right now!

237

204

282.3K

Jonathan Ng @JnJonathanpro · 20 Apr

@yifan_zhang_ Deepseek V4 will have recursive self improvement!

728

Yifan Zhang @yifan_zhang_ · 20 Apr

Recursive self-improvement via coding agents is the top priority for all frontier labs.

991

69.6K

Jonathan Ng @JnJonathanpro · 20 Apr

@LexnLin ChatGPT, make me the full GTA 6 game, make no mistakes

182

Leon Lin @LexnLin · 19 Apr

gpt 5.4 (5.5) pro is insane didn't expect to get this in one shot

591

41.6K

Jonathan Ng @JnJonathanpro · 18 Apr

@xiangxiang103 AI slop, just look at the generated images for general, reasoning, math, code etc

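雨哥向前冲 @xiangxiang103 · 17 Apr

Suspected leak of benchmarks from the DeepSeek V4 technical report! The benchmark data in the pictured "DeepSeek-V4 Technical Report" reveals the latest competitive landscape among today's top large AI models. Judging from this scorecard, which covers four dimensions (general ability, reasoning and math, code, and agents), DeepSeek-V4 shows very strong dominance, and the top tier of large models is being reshuffled.
🏆 Competitive landscape: current tiers of large models. From the chart data, we can clearly see how the five evaluated models stratify:
👑 Leader: DeepSeek-V4 takes the top score (state of the art) on all 12 demanding tests, dominating across every dimension.
🥈 Strongest challenger: Gemini 3.1 Pro Preview stays close behind the leader on several core metrics and beats GPT-5.3 on the vast majority of tests, making it the most competitive rival in this comparison.
🥉 Third tier: GPT-5.3 and Claude Opus 4.6 trade wins with each other and still perform at a very high level, but show slight fatigue at the very top of the competition.
📏 Baseline reference: GPT-4.1, representing earlier models, ranks last across the board in this comparison, which also shows directly how large the leap to the new generation of models is.
🔍 In-depth breakdown of the four core capabilities
🧠 1. General knowledge and subject ability (General). Key data: on the highly challenging MMLU-Pro test, DeepSeek-V4 (91.2) and Gemini 3.1 Pro (90.0) are the only two models to break the 90-point mark. Industry insight: expert-level, cross-disciplinary question answering is no longer a hard problem for leading models. GPT-5.3 (88.4) and Claude (86.7) lag slightly here, and the competition on knowledge density is approaching the 90+ limit.
🧮 2. Math and complex reasoning (Reasoning & Math). Key data: the top math competition benchmark AIME 2025 is extremely tightly contested (DeepSeek-V4 96.4, Gemini 95.0, GPT-5.3 94.6). Industry insight: math is the field where AI is progressing fastest. Scores of 90+ mean these models already have an overwhelming advantage on difficult human olympiad problems, and the gap between vendors often comes down to just a few problems.
💻 3. Programming and engineering ability (Code). Key data: on Codeforces (an algorithm competition platform), DeepSeek-V4 soars to a rating of 2767, opening a significant gap; but on SWE-bench Verified, which evaluates fixing real software engineering bugs, no model breaks 60% (the highest is DeepSeek-V4 at 59.6%). Industry insight: "writing algorithm problems is easy, changing human code is hard". Models have reached competition-level performance in pure logic generation, but still have clear weaknesses in understanding and modifying complex real-world commercial codebases.
🤖 4. Autonomous agent action (Agent). Key data: on the WebArena test, which simulates web browsing and task execution, the highest score overall (DeepSeek-V4) is only 58.7, and GPT-4.1 is as low as 44.8. Industry insight: this is the section with the lowest absolute scores in the whole table. It reflects a current industry pain point: large models are very strong at solving test problems and writing articles, but when asked to operate a browser autonomously like a human and handle multi-step real-world tasks across applications, their success rate is still worrying.
💡 Key takeaway: this benchmark is not only a show of strength for DeepSeek-V4, it also indirectly confirms how competitive Gemini 3.1 Pro Preview is on the current technical path. More importantly, it points to the industry's next target: as models' knowledge and test-taking ability approach human limits, breaking through the real-world bottleneck of autonomous agent execution (agentic tasks) will decide who leads the next generation of AI. This is only a reading of the data in the image; the real situation remains to be verified!

Chinese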

237

123.4K

Jonathan Ng @JnJonathanpro · 16 Apr

@OfficialLoganK What about this?

102

Logan Kilpatrick @OfficialLoganK · 15 Apr

Excited to share that the Gemini API now has prepaid billing, rolled out to start for US customers!! We have been working hard across Google to enable this. It’s the default for new API users and existing users can opt in via a new billing account, all directly in AI Studio.

Google AI Studio @GoogleAIStudio

x.com/i/article/2044…

623

76.7K

Jonathan Ng @JnJonathanpro · 14 Apr

@OfficialLoganK @GoogleAIStudio Been 150 days since you introduced this mate

198

Logan Kilpatrick @OfficialLoganK · 13 Apr

Introducing Tab Tab Tab, our new prompt auto complete engine in @GoogleAIStudio's vibe coding experience. Now when you show up with your fuzzy ideas, you can rely on Gemini to fill in the blanks : )

1.3K

63.2K

Jonathan Ng @JnJonathanpro · 12 Apr

@OfficialLoganK @IamEmily2050 Wow this was such a lie :)

Logan Kilpatrick @OfficialLoganK · 19 Nov

@IamEmily2050 We are working on AI Studio in the Google AI Pro subscription :)

214

25.3K

Logan Kilpatrick @OfficialLoganK · 19 Nov

So, was the wait worth it?

950

4.7K

488.1K

Jonathan Ng @JnJonathanpro · 26 Mar

@_philschmid ChatGPT allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature? Also could you guys increase the context size? It would be useful to be able to voice chat with it without it forgetting its context when role playing

242

Philipp Schmid @_philschmid · 26 Mar

We just launched Gemini 3.1 Flash Live! Our fastest, most natural real-time voice AI model for building Agents.
- Scores 90.8% on ComplexFuncBench Audio for tool use.
- 70 languages, Video streaming, Audio transcriptions, 128k context
- Comes with Agent Skill for building live voice agents.
- All generated audio is watermarked with SynthID.

451

27.7K

Jonathan Ng @JnJonathanpro · 26 Mar

@_philschmid ChatGPT allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature?

172

Jonathan Ng @JnJonathanpro · 26 Mar

@OfficialLoganK Yes

Jonathan Ng @JnJonathanpro · 26 Mar

@OfficialLoganK ChatGPT allows you to read the voice chat history live while actively using it. When is Gemini gonna get this feature?

Logan Kilpatrick @OfficialLoganK · 26 Mar

Introducing Gemini 3.1 Flash Live, our new realtime model to build voice and vision agents!! We have spent more than a year improving the model + infra + experience, the results? A step function improvement in quality, reliability, and latency.