Mefaso
@Mefaso

1.1K posts

RL shitposts mainly

Tokyo-to, Japan · Joined November 2009
191 Following · 138 Followers

Pinned Tweet
Mefaso @Mefaso
>Find a post-training paper
>Ask if it's proper RL or just weighted BC
>They don't understand
>Pull out Sutton&Barto and explain what is BC and what is proper RL
>"It's a good approach, sir"
>Read the paper
>It's BC
Mefaso @Mefaso
@prajdabre Yeah and then they'll be paid 8 million JPY at the end of their career
Mefaso @Mefaso
@LLMenjoyer @NinaDSchick 10T not 20T, unless I missed something? maxtext.readthedocs.io/en/latest/guid…
llm_enjoyer @LLMenjoyer
@NinaDSchick mythos wudn’t be the first 10T u fuking dummy gdm been benchmarking for 20T moe for a long time now in maxtext
Nina Schick @NinaDSchick
Claude Mythos. Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars. On the hardest coding test in the industry (SWE bench) it scores 94%. It found a security flaw in a system that had been running for 27 years, one that every human engineer and every automated check had missed. It found another bug that had survived five million test runs over 16 years. (It did so overnight.) It is so capable in cybersecurity that Anthropic will not release it to the public, instead it is launching Project Glasswing along with 100m in compute credits to help secure software. Only twelve partners currently have access: Amazon, Cisco, Apple, Google, Microsoft, NVIDIA, JPMorgan Chase, Crowdstrike, Palo Alto, AWS, The Linux Foundation, Broadcom. (I'm sure the Pentagon is on the line?) This is not a product launch: it is a controlled deployment of a system too powerful to distribute freely. Tell me this isn't (very expensive) AGI?
Quoting Anthropic @AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

Mefaso @Mefaso
@willccbb terminated vs done doesn't matter, it should be a 4-tuple at most
Mefaso @Mefaso
@giffmana @chrisoffner3d Rare Lucas L. This should obviously be classified as a car/truck by a model used in a car, just like a person wearing a T-shirt with a car on it should still be classified as a person. Anything else is cope, really.
Lucas Beyer (bl16) @giffmana
Honestly, this is actually correct given they just don't have a class for this. It's like people saying computer vision doesn't work because an ImageNet model doesn't say "car" on a car picture (there's no car class). If they add this thing as a class, which they'll do after a few more memes, it'll work.
Mefaso @Mefaso
Getting ghosted by your date or by your reviewer: I don't know which is worse
Mefaso @Mefaso
Peer review is external validation and thus should be avoided
Mefaso @Mefaso
@cloneofsimo Yup, kit aircraft are very popular because they're a lot cheaper than a preassembled plane: around 40k for the one she got. It just needs a lot of time to assemble.
Mefaso @Mefaso
@giffmana Heidi is still anime though
Mefaso @Mefaso
@Dorialexander Haha, sounds like you found a more open department than I did
Mefaso @Mefaso
@srchvrs @giffmana @lvwerra I agree, I hope it gets better. Tbh I always found the distinction between the two very arbitrary
Leo Boytsov @srchvrs
@Mefaso @giffmana @lvwerra ouch... But do you mean one new dataset or more like "how do these datasets help train better model or expose model shortcomings"?
Mefaso @Mefaso
@srchvrs @giffmana @lvwerra Might just be my department, but if your thesis is mainly a new dataset you'll be laughed out of your defense. Something something "it's engineering not research"
Leo Boytsov @srchvrs
@Mefaso @giffmana @lvwerra I wouldn't be so sure about it. Datasets + evaluations + 5-10% novelty will almost certainly get you there. Quite a few people became famous by creating popular datasets. Of course, when everyone creates datasets... this stops being a competitive advantage...
Lucas Beyer (bl16) @giffmana
@lvwerra yeah I think so, but academia over-indexes on "cleverness", so if you like working on something that's clearly useful but not considered "clever" (=data), most people just decide to work on that where it's valued (=industry) instead of fighting the uphill battle in academia.
Hilde Kuehne @HildeKuehne
@Mefaso @miniapeur It’s not so much about the absolute number 3. My experience (and maybe why we converge to 3) is that this is just a good number for people to grow into researchers. The first paper is pretty much fully supervised, the second is weakly supervised and the last one should be unsupervised.
Mathieu @miniapeur
With the peer-review system broken and so many people generating AI slop papers, I just hope I can publish a few papers, make a name for myself, find a good job, and then not have to worry about all of this burning down. Please just give me a few years.
Mefaso @Mefaso
@HildeKuehne @miniapeur Right, AI slop will fail, but I feel like our system incentivizes publishing 4 very mediocre papers over 1 or 2 good papers. If you have one excellent paper that would be fine, but it seems the best strategy currently is writing safe, mediocre papers
Hilde Kuehne @HildeKuehne
@Mefaso @miniapeur Yeah, but the question is, do they call you back after this first talk? If you have 3 AI-Slop NeurIPS papers, the answer is no. At some point, they realize that the SNR is too low for this selection metric. Then they need something else and fall back to credible sources.
Mefaso @Mefaso
@HildeKuehne @miniapeur They are giving you interviews for papers though. Getting a foot in the door without top conference papers is really hard, regardless of the skills you got, especially if you're not at a famous school.
Hilde Kuehne @HildeKuehne
@miniapeur People never hired you bc of 3 papers you wrote. It was always just a dumb pseudo metric. Fun fact, they also didn’t hire you because of your findings, but bc of the skills you got while doing this. Just try to figure out how to do good research and people will respect that.
Mefaso @Mefaso
The next paper that dares claim a penalty's weight is a Lagrange multiplier gets a strong reject
Mefaso @Mefaso
@tomekkorbak Please tell me it's not a coincidence that the illustration looks like a JoJo reference. Also very cool work
Mefaso @Mefaso
@EhudReiter They aren't useful, the community is writing disposable research
Ehud Reiter @EhudReiter
A student is writing up an experiment which includes a comparison of how well two LLMs do at a task. The community expects such comparisons, but why are they useful, since the LLMs being compared will be obsolete by the time people read this?