Mefaso
1.1K posts

Pinned Tweet

@prajdabre Yeah and then they'll be paid 8 million jpy at the end of their career

Perfectly normal phenomenon in Japan.
aditya@adxtyahq
42 years at american express genuinely how do people stay at the same company that long?

@LLMenjoyer @NinaDSchick 10T not 20T, unless I missed something?
maxtext.readthedocs.io/en/latest/guid…

@NinaDSchick mythos wudn’t be the first 10T u fuking dummy
gdm been benchmarking for 20T moe for a long time now in maxtext

Claude Mythos.
Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars.
On the hardest coding test in the industry (SWE bench) it scores 94%.
It found a security flaw in a system that had been running for 27 years, one that every human engineer and every automated check had missed. It found another bug that had survived five million test runs over 16 years. (It did so overnight.)
It is so capable in cybersecurity that Anthropic will not release it to the public, instead it is launching Project Glasswing along with 100m in compute credits to help secure software.
Only twelve partners currently have access: Amazon, Cisco, Apple, Google, Microsoft, NVIDIA, JPMorgan Chase, Crowdstrike, Palo Alto, AWS, The Linux Foundation, Broadcom. (I'm sure the Pentagon is on the line?)
This is not a product launch: it is a controlled deployment of a system too powerful to distribute freely.
Tell me this isn't (very expensive) AGI?
Anthropic@AnthropicAI
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

step() returning a 5-tuple is bad
T NATION by Biotest@T_Nation
Drop your most controversial gym opinion.

@giffmana @chrisoffner3d Rare Lucas L
This obviously should be classified as a car/truck by a model used in a car, just like a person with a t-shirt including a car should still be classified as person
Anything else is cope really

Honestly, this is actually correct given they just don't have a class for this. It's like people saying computer vision doesn't work because an imagenet model doesn't say "car" on a car picture (there's no car class)
If they add this thing as class, which they'll do after a few more memes, it'll work.

@cloneofsimo Yup, kit aircraft are very popular because they're a lot cheaper than a preassembled plane, 40k ish for the one she got.
Just needs a lot of time to assemble it

You are telling me you can just do things, like you can literally make an airplane from scratch?
Math Files@Math_files

Hey anime profile pic cracked cudamode hackers...
Time to swap your profile pic to Heidi!

Lucas Beyer (bl16)@giffmana
Them:
> We have to go to Japan for the blossom!!
> I can't decide if I prefer snow or flowers!!
The Swiss end of March:

@Dorialexander Haha, sounds like you found a more open department than I did

There is almost unanimous agreement in the thread that data is the driver of models getting stronger, not architecture.
Conferences and socials are heavily biased to architecture research, while working on data is so high leverage.
It's a shame not more people work on data!
Leandro von Werra@lvwerra
Which LLM would be better:
- today's best architecture trained on 2023's best data
- 2023's best architecture trained on today's best data

@lvwerra yeah I think so, but academia over-indexes on "cleverness", so if you like working on something that's clearly useful but not considered "clever" (=data), most people just decide to work on that where it's valued (=industry) instead of fighting the uphill battle in academia.

@Mefaso @miniapeur It’s not so much about the absolute number 3. My experience (and maybe why we converge to 3) is that this is just a good number for people to grow into researchers. The first paper is pretty much fully supervised, the second is weakly and the last one should be unsupervised.

@HildeKuehne @miniapeur I never thought about it that way. Good explanation, thank you.

@HildeKuehne @miniapeur Right, AI slop will fail, but I feel like our system incentivizes publishing 4 very mediocre papers over 1 or 2 good papers.
If you have one excellent paper that would be fine but it seems the best strategy currently is writing safe, mediocre papers

@Mefaso @miniapeur Yeah, but the question is, do they call you back after this first talk? If you have 3 AI-Slop NeurIPS papers, the answer is no. At some point, they realize that the SNR ratio is too low for this selection metric. Then they need something else and fall back to credible sources.

@HildeKuehne @miniapeur They are giving you interviews for papers though.
Getting a foot in the door without top conference papers is really hard, regardless of the skills you got, especially if you're not at a famous school.

@miniapeur People never hired you bc of 3 papers you wrote. It was always just a dumb pseudo metric. Fun fact, they also didn’t hire you because of your findings, but bc of the skills you got while doing this. Just try to figure out how to do good research and people will respect that.


@tomekkorbak Please tell me it's not a coincidence that the illustration looks like a JoJo reference.
Also very cool work

A blog post accompanying our recent paper "Training agents to self-report misbehavior" is out, have a look if you haven't read the paper yet!
alignment.openai.com/self-incrimina…


@EhudReiter They aren't useful, the community is writing disposable research



