guitarstring

12 posts

@guitarstring7

Pluck me

Joined December 2020
2.7K Following · 24 Followers
Jeremy Bernstein@jxbz·
I was really grateful to have the chance to speak at @Cohere_Labs and @ml_collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here... (1/6)
[media attached]
9 replies · 47 reposts · 562 likes · 55.4K views
guitarstring@guitarstring7·
@teortaxesTex I definitely think this is a big part of it. But you can't tell me that wit and humor have nothing to do with intelligence and are entirely about looking cool. I think linguistic ability is another axis here
1 reply · 0 reposts · 3 likes · 544 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Truth nuke moment: I think one of the biggest issues with race relations is that whites do not correct for how they're more sexually selected than Asians, and it's mostly not about looks. Wypipos are straight up better at looking cool, and this generalizes to «looking smart».
[media attached]
18 replies · 5 reposts · 183 likes · 20K views
guitarstring@guitarstring7·
@teortaxesTex what I don't quite understand is them not wanting more money. That doesn't make sense. Surely they could use money
1 reply · 0 reposts · 0 likes · 74 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
That's true. The issue is that people have not learned anything and regress to the mean quickly. DeepSeek has not made a major release in 3 months, does not hype itself, so we're back to «plucky little team, heroic effort, but…» mentality. It'll be the same shock all over again.
Karim Hummos@AiAnvil

@teortaxesTex R1's success had relatively little to do with its actual capabilities and a lot to do with media hype. Not sure v2 will be able to replicate that hype, even with very strong benchmarks and pricing.

4 replies · 1 repost · 28 likes · 4.1K views
guitarstring@guitarstring7·
@nanjiang_cs The gradient update's direction *does* change due to an affine transformation of the reward. What doesn't change is the theoretical policy gradient direction, which involves an expectation over potential trajectories. Is that correct?
1 reply · 0 reposts · 0 likes · 105 views
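The shift part of this claim can be checked directly: for a softmax policy, E[∇log π(a)] = 0, so adding a constant to every reward leaves the *expected* policy gradient unchanged, even though any single-sample estimate can reverse direction. A minimal numpy sketch with a made-up two-action bandit (all values are illustrative, not from the thread):

```python
import numpy as np

theta = np.array([0.3, -0.2])              # logits of a 2-action softmax policy
probs = np.exp(theta) / np.exp(theta).sum()

R = np.array([1.0, 2.0])                   # per-action rewards (made up)
c = -3.0                                   # constant added to every reward

def grad_log_pi(a):
    # gradient of log softmax(theta)[a] with respect to theta
    return np.eye(2)[a] - probs

# True policy gradient: E_{a ~ pi} [ grad log pi(a) * R(a) ]
exact = sum(probs[a] * grad_log_pi(a) * R[a] for a in range(2))
exact_shifted = sum(probs[a] * grad_log_pi(a) * (R[a] + c) for a in range(2))
print(np.allclose(exact, exact_shifted))   # True: the expectation is shift-invariant

# Single-sample REINFORCE estimate for action 0: the shift flips its direction
sample = grad_log_pi(0) * R[0]
sample_shifted = grad_log_pi(0) * (R[0] + c)
print(np.dot(sample, sample_shifted) < 0)  # True: the sampled update reversed
```

Note this covers only reward *shifts*; scaling by a positive constant preserves the expected direction too, but a negative scale would flip it.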
guitarstring@guitarstring7·
@ruima Do you mean 10,000 A100s? That's what I see in their Fire-Flyer paper
0 replies · 0 reposts · 0 likes · 116 views
Rui Ma@ruima·
The AI community in China—including DeepSeek’s competitors—widely believes the company has around 10,000 H100s and ~3,000 H800s, as disclosed in their research papers. That’s a substantial number, but sure, let’s take an unsourced tweet claiming “50,000 Hoppers” (oh sorry, not specifically H100s, just any Hopper) as the real figure and all cite it as gospel.*

For context, H100s were banned from sale to China in Q4 2022, and H800s in Q4 2023. Both were legally available before those restrictions took effect. While DeepSeek was technically founded in 2023, its parent company, High Flyer, was founded in 2015.

*Not that it even matters for the innovations they've published. And by publishing and raising their profile, I'm pretty sure they already know that they won't be getting their hands on any more NVDA chips. Also, for inference in China, they already announced they'll be partnering with Huawei and using Ascend chips.
3 replies · 3 reposts · 47 likes · 5.1K views
guitarstring@guitarstring7·
@teortaxesTex If v3 was trained with 2048 H800s for 2 months, is it so unreasonable that they might have access to 10x that number of cards? We also know they had 10,000 A100s in 2021 (though those might be shared with High-Flyer); it would be strange if they hadn't grown since then
1 reply · 0 reposts · 1 like · 433 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
> body language of a scared liar
> he keeps doubling down about "same order of magnitude cluster"

Dario you don't want to know what Whale will do to your business with the same order of magnitude cluster, stop jinxing it
Tsarathustra@tsarnick

Dario Amodei says 2026-2027 is the critical window in AI: if you're ahead then, the models start getting better than humans at everything, including AI design and using AI to make better AI, so export controls to prevent DeepSeek from keeping up with US companies are worth continuing

9 replies · 4 reposts · 136 likes · 10.4K views
Dimitris Papailiopoulos@DimitrisPapail·
The "deepseek distilled o1" is an intellectually vacuous discussion, precisely because what they reported in the R1 paper is a reproducible phenomenon! By now many experiments on non-deepseek models show that acc and inf-time compute increase as the result of outcome-based RL.
13 replies · 27 reposts · 433 likes · 57.7K views
guitarstring@guitarstring7·
@georgejrjrjr Rationalists are extremely verbal; the Chinese are not. If the indicator of brilliance most familiar to you is prolific verbal output, you might overlook some forms of talent
0 replies · 0 reposts · 3 likes · 49 views
George@georgejrjrjr·
The rat on my shoulder protests: "That's not charitable!"

Well, in arguing this with two Big Name Professional Doomer friends (who shall remain nameless) for ~18 months, I've personally made sure they are aware of many if not most of the big developments, and I've noticed they have persistently failed to update.

In 2023, one asserted, "China won't permit domestic LLMs because effective censorship is not really possible, jailbreaks exist, etc." I point to GLM-130B (2022). No update. I'm dumbfounded. It's right there!!! Proof that Chinese labs are in fact scaling (and innovating with) LLMs.

Then I created a chat, and from Nov 2023 to the present, I've been updating doomer policy shop leadership with whalesign. There was this recalcitrant belief that the US had an ample lead, so maybe a little US slowdown (and banning strong FOSS weights) means a global slowdown. No real concession of my point until R1, i.e., until it was proven beyond a shadow of a doubt that Chinese labs have caught up.

In normal people that's fine, but rationalists and their ilk are expected to extrapolate a bit when updating on clear trends. Nope.

Are the orgs they lead going to change tack *now*? Maybe. They've got bigger fish to fry than Llama while Meta flounders. But I have yet to see their public prescriptions change, and I'm not holding my breath.
[4 images attached]
2 replies · 0 reposts · 7 likes · 580 views
George@georgejrjrjr·
Meta getting blindsided by @deepseek_ai’s V3 and R1 is bizarre: Meta’s own code benchmark showed them getting bested by an older 🐋 model in January 2024.
[media attached]
1 reply · 2 reposts · 69 likes · 7.7K views
guitarstring@guitarstring7·
@francoisfleuret
> For any elected position, *everyone* who votes has to be a candidate.

What's the goal of this?
0 replies · 0 reposts · 0 likes · 90 views
François Fleuret@francoisfleuret·
Let me propose my political model. For any elected position, *everyone* who votes has to be a candidate. We build a balanced binary tree with everyone in a leaf. 1/2
5 replies · 0 reposts · 16 likes · 4.2K views