steeeve

2.2K posts


@megakilo

nerd is the new cool; agentic AI fanboy

SF Bay Area · Joined March 2009
1.2K Following · 192 Followers
阿绎 AYi@AYi_AInotes·
A few days ago Jensen Huang was still thumping his chest saying we've already achieved AGI, and then a single Google test pulled the rug out from under that claim 😂 On March 25, Google quietly released the ARC-AGI-3 benchmark. No wall-to-wall hype, no product-launch theatrics, just a test grid and a timer. The results 48 hours later left the whole AI world stunned. The much-anticipated GPT-5 scored only 0.26%. Claude, in the same tier, got 0.25%. Musk's Grok was even worse, handing in a blank sheet at exactly zero. More embarrassing still, a simple 4-layer CNN that one developer spent a few weeks building hit 12.58%, crushing all the hundred-billion-parameter flagship models. And it stings even more after that: unemployed people picked at random off the streets of San Francisco scored a perfect 100% on the same test. The aftermath was just as wild. One developer fully replicated the winning CNN's approach in a single weekend, and the organizers' $700K prize still hasn't been claimed. With the big models collectively faceplanting, analysts rushed out to argue the benchmark is unfair. DRAM-related stocks dropped in response, and "AI bubble" talk swept the industry again. Remember the number 0.26%. Next time someone tells you AGI has arrived, throw it at them 😁
BuBBliK@k1rallik

> Google publishes ARC-AGI-3 benchmark on March 25
> no hype, no product launch, just a grid and a timer
> 48 hours later:
> GPT-5: 0.26%, Claude: 0.25%, Grok: 0%
> a 4-layer CNN built in weeks: 12.58%
> Jensen Huang 2 days before: "I think we've achieved AGI"
> random unemployed people from SF streets: 100%
> one dev replicates the winning approach over a weekend
> analysts say the benchmark is unfair
> $700K prize still untouched
> DRAM stocks drop, AI bubble discourse returns

the number is 0.26%. Remember it next time someone says AGI is already here.

Tech Dev Notes@techdevnotes·
3 days and absolutely nothing shipped from xAI, which can only mean one thing
steeeve@megakilo·
@yoyonofukuoka I didn’t turn on auto translate but it’s just one click away. Thanks to Grok / X. Feel free to post anything, mate.
kouji 🇯🇵@yoyonofukuoka·
I heard that Japanese posts are being automatically translated into English. Is that true? If it is, I'd be absolutely thrilled.
觉醒的熊Bear@AwakeningBear01·
Most of the long posts on Twitter feel AI-written. They do have some value, but reading them is like eating a pre-made meal: the taste is fine, yet something about it is off in a way that's hard to pin down.
steeeve@megakilo·
@CuiMao I stopped using it after GPT image and nano banana came out last year. Locally-run models just don't understand the world (or language) well enough, and the controllability falls short. Have they gotten better now?
CuiMao@CuiMao·
How many of you have actually used comfyui yourselves?
steeeve@megakilo·
@PandaTalk8 The real essence is that tool use broke through the limitation of LLMs having only a mouth.
Mr Panda@PandaTalk8·
You have to admire these engineers. Not long ago everyone was talking about Prompts. Then some people started talking about Prompt Engineering, which sounded more professional. Later nobody talked about Prompt Engineering anymore; it became Context Engineering. Now nobody talks about Context Engineering either; it has evolved into Harness Engineering.
Elaina@Elaina43114880·
What model is “pteronura” on lmarena? I’ve picked it several times in a row and I think it’s pretty good at roleplay. 🤔🔍
AImaster@CeoSpaceY·
Great mathematician, AI scientist, Physics Olympiad, bachelor's in physics, PhD, postdoc in neuromedicine: Mr. Amodei. Leading Claude, the only product of the first-rate AI company Anthropic.
steeeve@megakilo·
@fkysly @Jason_Young1231 The person who joined came from Cursor, so there should be more on the way. The leading companies are all using AI to iterate on themselves now.
马天翼@fkysly·
Why hasn't Musk's Grok Code launched yet? He said "soon" a while back; surely that doesn't mean half a year from now? Isn't Musk supposed to be all about speed and that legendary breakneck pace?
steeeve@megakilo·
@GeminiApp please make a $100 tier with some Deep Think usage per day + a bit more Antigravity quota
steeeve@megakilo·
I was using @GeminiApp to follow the lawsuit between Anthropic and the Pentagon. Gemini is so good at creating tasks directly from conversations, with real agency. It scheduled a daily briefing for me and actively suggests follow-up tasks. @OfficialLoganK @joshwoodward ❤️
Aditya 🦀@ItsAditya_xyz·
Claude is down because a guy named Jian Yang from China is distilling it.
steeeve@megakilo·
@passluo My understanding is that Mythos is the series name, and Capybara is one version within that series (just like how the Opus series presumably has two internal codenames for 4.6 and 4.7).
𝙋𝙖𝙨𝙨𝙡𝙪𝙤
haiku / sonnet (a fourteen-line poem) / opus (a masterpiece): so mythos (myth) I can understand, they're all literary terms. But what on earth is this Capybara of yours?????
anita@anitakirkovska·
All of a sudden, OpenAI feels like Microsoft, and Claude like Apple.
面包🍞@himself65·
I really can't take it anymore; I uninstalled Warp... It renders things wrong half the time, and all those messy AI features, can any of them match Claude Code?
steeeve@megakilo·
@burkov Plus, you offered to let them read it for free before buying.
BURKOV@burkov·
Moron trusted a chatbot, bought my book thinking that it was "higher-level than some YouTube video," realized that it wasn't, punished the author. In the past, authors had the possibility to reply to reviews, but Amazon removed this possibility several years ago. Now morons can write whatever they want and no one can tell them they are morons.
steeeve@megakilo·
@_xjdr Nothing beats Pied Piper
xjdr@_xjdr·
this is very cool! a few things tho: KV cache compression is not new (MLA; sglang/vllm have fp8 cache, etc.). This only applies to inference, so the main model's HBM constraints remain the same. And this has nothing to do with Pied Piper.
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
