Mefaso

1.1K posts

@Mefaso

RL shitposts mainly

Tokyo-to, Japan · Joined November 2009

191 Following · 138 Followers

Pinned Tweet

Mefaso@Mefaso·23 Jul

>Find a posttraining paper
>Ask if it's proper RL or just weighted BC
>They don't understand
>Pull out Sutton&Barto and explain what is BC and what is proper RL
>"It's a good approach, sir"
>Read the paper
>It's BC

Mefaso@Mefaso·3d

@prajdabre Yeah and then they'll be paid 8 million jpy at the end of their career

Raj Dabre@prajdabre·5d

Perfectly normal phenomenon in Japan.

aditya@adxtyahq

42 years at american express genuinely how do people stay at the same company that long?

Mefaso@Mefaso·8 Apr

@LLMenjoyer @NinaDSchick 10T not 20T, unless I missed something? maxtext.readthedocs.io/en/latest/guid…

llm_enjoyer@LLMenjoyer·8 Apr

@NinaDSchick mythos wudn’t be the first 10T u fuking dummy gdm been benchmarking for 20T moe for a long time now in maxtext

Nina Schick@NinaDSchick·8 Apr

Claude Mythos. Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars. On the hardest coding test in the industry (SWE bench) it scores 94%. It found a security flaw in a system that had been running for 27 years, one that every human engineer and every automated check had missed. It found another bug that had survived five million test runs over 16 years. (It did so overnight.) It is so capable in cybersecurity that Anthropic will not release it to the public, instead it is launching Project Glasswing along with 100m in compute credits to help secure software. Only twelve partners currently have access: Amazon, Cisco, Apple, Google, Microsoft, NVIDIA, JPMorgan Chase, Crowdstrike, Palo Alto, AWS, The Linux Foundation, Broadcom. (I'm sure the Pentagon is on the line?) This is not a product launch: it is a controlled deployment of a system too powerful to distribute freely. Tell me this isn't (very expensive) AGI?

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

Mefaso@Mefaso·7 Apr

@willccbb terminated vs done doesn't matter, it should be a 4 tuple at most

will brown@willccbb·7 Apr

step() returning a 5-tuple is bad

T NATION by Biotest@T_Nation

Drop your most controversial gym opinion.
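
For context on the step() exchange above: Gymnasium-style environments return a 5-tuple `(obs, reward, terminated, truncated, info)`, while the older Gym convention was the 4-tuple `(obs, reward, done, info)`. Below is a minimal sketch of both shapes. `CountdownEnv` and `step4` are hypothetical names made up for illustration, and no real library is imported.

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Hypothetical toy environment with a Gymnasium-style 5-tuple step()."""

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.steps = 0

    def reset(self) -> int:
        self.state = self.start
        self.steps = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.state -= action
        self.steps += 1
        # terminated: the task itself reached an end state
        terminated = self.state <= 0
        # truncated: the episode was cut off by the step limit instead
        truncated = (not terminated) and self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, truncated, {}


def step4(env: CountdownEnv, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
    """Collapse the 5-tuple into the older 4-tuple (obs, reward, done, info)."""
    obs, reward, terminated, truncated, info = env.step(action)
    # Keep the distinction available to callers that need it (an assumption
    # of this sketch, not something the thread prescribes).
    info = dict(info, truncated=truncated)
    return obs, reward, terminated or truncated, info
```

The `truncated` flag matters mostly for value bootstrapping: a time-limit cutoff is not a true terminal state, which is the usual argument for keeping it somewhere, whether as a fifth element or inside `info`.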

Mefaso@Mefaso·5 Apr

@giffmana @chrisoffner3d Rare Lucas L. This obviously should be classified as a car/truck by a model used in a car, just like a person wearing a t-shirt with a car on it should still be classified as a person. Anything else is cope really.

Lucas Beyer (bl16)@giffmana·5 Apr

Honestly, this is actually correct given they just don't have a class for this. It's like people saying computer vision doesn't work because an imagenet model doesn't say "car" on a car picture (there's no car class). If they add this thing as a class, which they'll do after a few more memes, it'll work.

Mefaso@Mefaso·3 Apr

Getting ghosted by your date or your reviewer, I don't know what's worse

Mefaso@Mefaso·1 Apr

Peer review is external validation and thus should be avoided

Mefaso@Mefaso·31 Mar

@cloneofsimo Yup, kit aircraft are very popular because they're a lot cheaper than a preassembled plane, 40k ish for the one she got. Just needs a lot of time to assemble it

Simo Ryu@cloneofsimo·30 Mar

You are telling me you can just do things, like you can literally make an airplane from scratch?

Math Files@Math_files

x.com/i/article/2038…

Mefaso@Mefaso·30 Mar

@giffmana Heidi is still anime though

Lucas Beyer (bl16)@giffmana·30 Mar

Hey anime profile pic cracked cudamode hackers... Time to swap your profile pic to Heidi!

Lucas Beyer (bl16)@giffmana

Them: > We have to go to Japan for the blossom!! > I can't decide if I prefer snow or flowers!! The Swiss end of March:

Mefaso@Mefaso·28 Mar

@Dorialexander Haha, sounds like you found a more open department than I did

Alexander Doria@Dorialexander·28 Mar

Mefaso@Mefaso

@giffmana @lvwerra Definitely can't get a PhD by creating datasets

Mefaso@Mefaso·28 Mar

@srchvrs @giffmana @lvwerra I agree, I hope it gets better. Tbh I always found the distinction between both very arbitrary

Leo Boytsov@srchvrs·27 Mar

@Mefaso @giffmana @lvwerra Finally venues like TMLR, so I am cautiously optimistic. 🟦

Leandro von Werra@lvwerra·26 Mar

There is almost unanimous agreement in the thread that data is the driver of models getting stronger, not architecture. Conferences and socials are heavily biased to architecture research, while working on data is so high leverage. It's a shame not more people work on data!

Leandro von Werra@lvwerra

Which LLM would be better: - today's best architecture trained on 2023's best data - 2023's best architecture trained on today's best data

Mefaso@Mefaso·27 Mar

@srchvrs @giffmana @lvwerra Either really

Leo Boytsov@srchvrs·27 Mar

@Mefaso @giffmana @lvwerra ouch... But do you mean one new dataset or more like "how do these datasets help train better model or expose model shortcomings"?

Mefaso@Mefaso·27 Mar

@srchvrs @giffmana @lvwerra Might just be my department, but if your thesis is mainly a new dataset you'll be laughed out of your defense. Something something "it's engineering not research"

Leo Boytsov@srchvrs·27 Mar

@Mefaso @giffmana @lvwerra I wouldn't be so sure about it. Datasets + evaluations + 5-10% novelty will nearly certainly get you there. Quite a few people became famous by creating popular datasets. Of course, when everyone creates datasets... this stops becoming a competitive advantage...

Mefaso@Mefaso·27 Mar

@giffmana @lvwerra Definitely can't get a PhD by creating datasets

Lucas Beyer (bl16)@giffmana·26 Mar

@lvwerra yeah I think so, but academia over-indexes on "cleverness", so if you like working on something that's clearly useful but not considered "clever" (=data), most people just decide to work on that where it's valued (=industry) instead of fighting the uphill battle in academia.

Hilde Kuehne@HildeKuehne·26 Mar

@Mefaso @miniapeur It’s not so much about the absolute number 3. My experience (and maybe why we converge to 3) is that this is just a good number for people to grow into researchers. The first paper is pretty much fully supervised, the second is weakly and the last one should be unsupervised.

Mefaso@Mefaso·27 Mar

@HildeKuehne @miniapeur I never thought about it that way. Good explanation, thank you.

Mathieu@miniapeur·25 Mar

With the peer-review system broken and so many people generating AI slop papers, I just hope I can publish a few papers, make a name for myself, find a good job, and then not have to worry about all of this burning down. Please just give me a few years.

Mefaso@Mefaso·26 Mar

@HildeKuehne @miniapeur Right, AI slop will fail, but I feel like our system incentivizes publishing 4 very mediocre papers over 1 or 2 good papers. If you have one excellent paper that would be fine, but it seems the best strategy currently is writing safe, mediocre papers.

Hilde Kuehne@HildeKuehne·26 Mar

@Mefaso @miniapeur Yeah, but the question is, do they call you back after this first talk? If you have 3 AI-Slop NeurIPS papers, the answer is no. At some point, they realize that the SNR ratio is too low for this selection metric. Then they need something else and fall back to credible sources.

Mefaso@Mefaso·26 Mar

@HildeKuehne @miniapeur They are giving you interviews for papers though. Getting a foot in the door without top conference papers is really hard, regardless of the skills you got, especially if you're not at a famous school.

Hilde Kuehne@HildeKuehne·26 Mar

@miniapeur People never hired you bc of 3 papers you wrote. It was always just a dumb pseudo metric. Fun fact, they also didn’t hire you because of your findings, but bc of the skills you got while doing this. Just try to figure out how to do good research and people will respect that.

Mefaso@Mefaso·24 Mar

The next paper that dares claim a penalty's weight is a Lagrange multiplier gets a strong reject
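
For context on the penalty vs. Lagrange multiplier distinction, here is a short sketch in standard constrained-optimization notation. The symbols $f$, $g$, $c$, $\lambda$ and $\eta$ are illustrative and do not come from the post.

```latex
\begin{align*}
\text{Constrained problem:}\quad & \min_x f(x) \quad \text{s.t.}\quad g(x) \le 0 \\
\text{Fixed penalty:}\quad & \min_x \; f(x) + c \,\max\bigl(0,\, g(x)\bigr), \qquad c > 0 \text{ a hand-tuned hyperparameter} \\
\text{Lagrangian:}\quad & \max_{\lambda \ge 0} \, \min_x \; f(x) + \lambda\, g(x) \\
\text{Dual ascent on } \lambda\text{:}\quad & \lambda \leftarrow \max\bigl(0,\; \lambda + \eta\, g(x)\bigr)
\end{align*}
```

In the penalty form, $c$ is chosen once and never adapts. In the Lagrangian form, $\lambda$ is itself an optimization variable that grows while the constraint is violated and shrinks toward zero once it is satisfied.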

Mefaso@Mefaso·23 Mar

@tomekkorbak Please tell me it's not a coincidence that the illustration looks like a JoJo reference. Also very cool work

Tomek Korbak@tomekkorbak·22 Mar

A blog post accompanying our recent paper "Training agents to self-report misbehavior" is out, have a look if you haven't read the paper yet! alignment.openai.com/self-incrimina…

Mefaso@Mefaso·19 Mar

@EhudReiter They aren't useful, the community is writing disposable research

Ehud Reiter@EhudReiter·18 Mar

A student is writing up an experiment which includes a comparison of how well two LLMs do at a task. The community expects such comparisons, but why are they useful, since the LLMs being compared will be obsolete by the time people read this?
