guitarstring

12 posts

@guitarstring7

Pluck me

Joined December 2020
2.7K Following · 24 Followers
Jeremy Bernstein@jxbz·
I was really grateful to have the chance to speak at @Cohere_Labs and @ml_collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here... (1/6)
[media attached]
9 replies · 47 reposts · 562 likes · 55.4K views
guitarstring@guitarstring7·
@teortaxesTex I definitely think this is a big part of it. But you can't tell me that wit and humor have nothing to do with intelligence and are entirely about looking cool. I think linguistic ability is another axis here
1 reply · 0 reposts · 3 likes · 544 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
Truth nuke moment: I think one of the biggest issues with race relations is that whites do not correct for how they're more sexually selected than Asians, and it's mostly not about looks. Wypipos are straight up better at looking cool, and this generalizes to «looking smart».
[media attached]
18 replies · 5 reposts · 183 likes · 20K views
guitarstring@guitarstring7·
@teortaxesTex what I don't quite understand is them not wanting more money. That doesn't make sense. Surely they could use money
1 reply · 0 reposts · 0 likes · 74 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
That's true. The issue is that people have not learned anything and regress to the mean quickly. DeepSeek has not made a major release in 3 months, does not hype itself, so we're back to «plucky little team, heroic effort, but…» mentality. It'll be the same shock all over again.
Karim Hummos@AiAnvil

@teortaxesTex R1's success had relatively little to do with its actual capabilities and a lot to do with media hype. Not sure v2 will be able to replicate that hype, even with very strong benchmarks and pricing.

4 replies · 1 repost · 28 likes · 4.1K views
guitarstring@guitarstring7·
@nanjiang_cs The gradient update's direction *does* change due to an affine transformation of the reward. What doesn't change is the theoretical policy gradient direction, which involves an expectation over potential trajectories. Is that correct?
1 reply · 0 reposts · 0 likes · 105 views
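The shift part of this claim can be checked directly: for a softmax policy, E[∇log π(a)] = 0, so adding a constant to every reward leaves the *expected* policy gradient unchanged, even though any single-sample estimate can reverse direction. A minimal numpy sketch with a made-up two-action bandit (all values are illustrative, not from the thread):

```python
import numpy as np

theta = np.array([0.3, -0.2])              # logits of a 2-action softmax policy
probs = np.exp(theta) / np.exp(theta).sum()

R = np.array([1.0, 2.0])                   # per-action rewards (made up)
c = -3.0                                   # constant added to every reward

def grad_log_pi(a):
    # gradient of log softmax(theta)[a] with respect to theta
    return np.eye(2)[a] - probs

# True policy gradient: E_{a ~ pi} [ grad log pi(a) * R(a) ]
exact = sum(probs[a] * grad_log_pi(a) * R[a] for a in range(2))
exact_shifted = sum(probs[a] * grad_log_pi(a) * (R[a] + c) for a in range(2))
print(np.allclose(exact, exact_shifted))   # True: the expectation is shift-invariant

# Single-sample REINFORCE estimate for action 0: the shift flips its direction
sample = grad_log_pi(0) * R[0]
sample_shifted = grad_log_pi(0) * (R[0] + c)
print(np.dot(sample, sample_shifted) < 0)  # True: the sampled update reversed
```

Note this covers only reward *shifts*; scaling by a positive constant preserves the expected direction too, but a negative scale would flip it.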
guitarstring@guitarstring7·
@ruima Do you mean 10,000 A100s? That's what I see in their Fire-Flyer paper
0 replies · 0 reposts · 0 likes · 116 views
Rui Ma@ruima·
The AI community in China—including DeepSeek’s competitors—widely believes the company has around 10,000 H100s and ~3,000 H800s, as disclosed in their research papers. That’s a substantial number, but sure, let’s take an unsourced tweet claiming “50,000 Hoppers” (oh sorry, not specifically H100s, just any Hopper) as the real figure and all cite it as gospel.*

For context, H100s were banned from sale to China in Q4 2022, and H800s in Q4 2023. Both were legally available before those restrictions took effect. While DeepSeek was technically founded in 2023, its parent company, High Flyer, was founded in 2015.

*Not that it even matters for the innovations they've published. And by publishing and raising their profile, I'm pretty sure they already know that they won't be getting their hands on any more NVDA chips. Also, for inference in China, they already announced they'll be partnering with Huawei and using Ascend chips.
3 replies · 3 reposts · 47 likes · 5.1K views
guitarstring@guitarstring7·
@teortaxesTex If v3 was trained with 2048 H800s for 2 months, is it so unreasonable that they might have access to 10x that number of cards? We also know they had 10,000 A100s in 2021 (though those might be shared with High-Flyer); it would be strange if they hadn't grown since then
1 reply · 0 reposts · 1 like · 433 views
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
> body language of a scared liar
> he keeps doubling down about "same order of magnitude cluster"

Dario you don't want to know what Whale will do to your business with the same order of magnitude cluster, stop jinxing it
Tsarathustra@tsarnick

Dario Amodei says 2026-2027 is the critical window in AI: if you're ahead then, the models start getting better than humans at everything, including AI design and using AI to make better AI, so export controls to prevent DeepSeek from keeping up with US companies are worth continuing

9 replies · 4 reposts · 136 likes · 10.4K views
Dimitris Papailiopoulos@DimitrisPapail·
The "deepseek distilled o1" is an intellectually vacuous discussion, precisely because what they reported in the R1 paper is a reproducible phenomenon! By now many experiments on non-deepseek models show that acc and inf-time compute increase as the result of outcome-based RL.
13 replies · 27 reposts · 433 likes · 57.7K views
guitarstring@guitarstring7·
@georgejrjrjr Rationalists are extremely verbal; the Chinese are not. If the indicator of brilliance most familiar to you is prolific verbal output, you might overlook some forms of talent
0 replies · 0 reposts · 3 likes · 49 views
George@georgejrjrjr·
The rat on my shoulder protests: "That's not charitable!"

Well, in arguing this with two Big Name Professional Doomer friends (who shall remain nameless) for ~18 months, I've personally made sure they are aware of many if not most of the big developments, and I've noticed they have persistently failed to update.

In 2023, one asserted, "China won't permit domestic LLMs because effective censorship is not really possible, jailbreaks exist, etc." I point to GLM-130B (2022). No update. I'm dumbfounded. It's right there!!! Proof that Chinese labs are in fact scaling (and innovating with) LLMs.

Then I created a chat, and from Nov 2023 to the present, I've been updating doomer policy shop leadership with whalesign. There was this recalcitrant belief that the US had an ample lead, so maybe a little US slowdown (and banning strong FOSS weights) means a global slowdown. No real concession of my point until R1, i.e., until it was proven beyond a shadow of a doubt that Chinese labs have caught up.

In normal people that's fine, but rationalists and their ilk are expected to extrapolate a bit when updating on clear trends. Nope.

Are the orgs they lead going to change tack *now*? Maybe. They've got bigger fish to fry than Llama while Meta flounders. But I have yet to see their public prescriptions change, and I'm not holding my breath.
[4 images attached]
2 replies · 0 reposts · 7 likes · 580 views
George@georgejrjrjr·
Meta getting blindsided by @deepseek_ai’s V3 and R1 is bizarre: Meta’s own code benchmark showed them getting bested by an older 🐋 model in January 2024.
[media attached]
1 reply · 2 reposts · 69 likes · 7.7K views
guitarstring@guitarstring7·
@francoisfleuret
> For any elected position, *everyone* who votes has to be a candidate.

What's the goal of this?
0 replies · 0 reposts · 0 likes · 90 views
François Fleuret@francoisfleuret·
Let me propose my political model. For any elected position, *everyone* who votes has to be a candidate. We build a balanced binary tree with everyone in a leaf. 1/2
5 replies · 0 reposts · 16 likes · 4.2K views