steeeve

2.2K posts


@megakilo

nerd is the new cool; agentic AI fanboy

SF Bay Area · Joined March 2009
1.2K Following · 192 Followers
阿绎 AYi@AYi_AInotes·
A few days ago Jensen Huang was still thumping his chest saying we've already achieved AGI, and then a single Google test pulled the rug out from under that claim 😂 On March 25, Google quietly released the ARC-AGI-3 benchmark. No wall-to-wall hype, no product-launch theatrics, just a test grid and a timer. The results 48 hours later left the whole AI world stunned. The much-anticipated GPT-5 scored only 0.26%. Claude, in the same tier, got 0.25%. Musk's Grok was even worse, handing in a blank sheet at exactly zero. More embarrassing still, a simple 4-layer CNN that one developer spent a few weeks building hit 12.58%, crushing all the hundred-billion-parameter flagship models. And it stings even more after that: unemployed people picked at random off the streets of San Francisco scored a perfect 100% on the same test. The aftermath was just as wild. One developer fully replicated the winning CNN's approach in a single weekend, and the organizers' $700K prize still hasn't been claimed. With the big models collectively faceplanting, analysts rushed out to argue the benchmark is unfair. DRAM-related stocks dropped in response, and "AI bubble" talk swept the industry again. Remember the number 0.26%. Next time someone tells you AGI has arrived, throw it at them 😁
BuBBliK@k1rallik

> Google publishes ARC-AGI-3 benchmark on March 25
> no hype, no product launch, just a grid and a timer
> 48 hours later:
> GPT-5: 0.26%, Claude: 0.25%, Grok: 0%
> a 4-layer CNN built in weeks: 12.58%
> Jensen Huang 2 days before: "I think we've achieved AGI"
> random unemployed people from SF streets: 100%
> one dev replicates the winning approach over a weekend
> analysts say the benchmark is unfair
> $700K prize still untouched
> DRAM stocks drop, AI bubble discourse returns

the number is 0.26%. Remember it next time someone says AGI is already here.

Tech Dev Notes@techdevnotes·
3 days and absolutely nothing shipped from xAI, which can only mean one thing
steeeve@megakilo·
@yoyonofukuoka I didn’t turn on auto translate but it’s just one click away. Thanks to Grok / X. Feel free to post anything, mate.
kouji 🇯🇵@yoyonofukuoka·
I heard that Japanese posts are being automatically translated into English. Is that true? If it is, I'd be absolutely thrilled.
觉醒的熊Bear@AwakeningBear01·
Most of the long posts on Twitter feel AI-written. They do have some value, but reading them is like eating a pre-made meal: the taste is fine, yet something about it is off in a way that's hard to pin down.
steeeve@megakilo·
@CuiMao I stopped using it after GPT image and nano banana came out last year. Locally-run models just don't understand the world (or language) well enough, and the controllability falls short. Have they gotten better now?
CuiMao@CuiMao·
How many of you have actually used comfyui yourselves?
steeeve@megakilo·
@PandaTalk8 The real essence is that tool use broke through the limitation of LLMs having only a mouth.
Mr Panda@PandaTalk8·
You have to admire these engineers. Not long ago everyone was talking about Prompts. Then some people started talking about Prompt Engineering, which sounded more professional. Later nobody talked about Prompt Engineering anymore; it became Context Engineering. Now nobody talks about Context Engineering either; it has evolved into Harness Engineering.
Elaina@Elaina43114880·
What model is “pteronura” on lmarena? I’ve picked it several times in a row and I think it’s pretty good at roleplay. 🤔🔍
AImaster@CeoSpaceY·
Great mathematician, AI scientist, Physics Olympiad, bachelor's in physics, PhD, postdoc in neuromedicine: Mr. Amodei. Leading Claude, the only product of the first-rate AI company Anthropic.
steeeve@megakilo·
@fkysly @Jason_Young1231 The person who joined came from Cursor, so there should be more on the way. The leading companies are all using AI to iterate on themselves now.
马天翼@fkysly·
Why hasn't Musk's Grok Code launched yet? He said "soon" a while back; surely that doesn't mean half a year from now? Isn't Musk supposed to be all about speed and that legendary breakneck pace?
steeeve@megakilo·
@GeminiApp please make a $100 tier with some Deep Think usage per day + a bit more Antigravity quota
steeeve@megakilo·
I was using @GeminiApp to follow the lawsuit between Anthropic and the Pentagon. Gemini is so good at creating tasks directly from conversations, with real agency. It scheduled a daily briefing for me and actively suggests follow-up tasks. @OfficialLoganK @joshwoodward ❤️
Aditya 🦀@ItsAditya_xyz·
Claude is down because a guy named Jian Yang from China is distilling it.
steeeve@megakilo·
@passluo My understanding is that Mythos is the series name, and Capybara is one version within that series (just like how the Opus series presumably has two internal codenames for 4.6 and 4.7).
𝙋𝙖𝙨𝙨𝙡𝙪𝙤
haiku / sonnet (a fourteen-line poem) / opus (a masterpiece): so mythos (myth) I can understand, they're all literary terms. But what on earth is this Capybara of yours?????
anita@anitakirkovska·
All of a sudden, OpenAI feels like Microsoft, and Claude like Apple.
面包🍞@himself65·
I really can't take it anymore; I uninstalled Warp... It renders things wrong half the time, and all those messy AI features, can any of them match Claude Code?
steeeve@megakilo·
@burkov Plus, you offered to let them read it for free before buying.
BURKOV@burkov·
Moron trusted a chatbot, bought my book thinking that it was "higher-level than some YouTube video," realized that it wasn't, punished the author. In the past, authors had the possibility to reply to reviews, but Amazon removed this possibility several years ago. Now morons can write whatever they want and no one can tell them they are morons.
steeeve@megakilo·
@_xjdr Nothing beats Pied Piper
xjdr@_xjdr·
this is very cool! a few things tho: KV cache compression is not new (MLA; sglang/vllm have fp8 cache, etc.). This only applies to inference, so the main model's HBM constraints remain the same. And this has nothing to do with Pied Piper.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
