Cody Blakeney

21.5K posts


@code_star

Leading research at @arcee_ai | Formerly Data Research Lead @DbrxMosaicAI | Visiting Researcher @meta | Ph.D | #TXSTFOOTBALL fan | https://t.co/4G6Jf3b0V4

Redwood City, CA · Joined August 2011
1.9K Following · 7.2K Followers
Pinned Tweet
Cody Blakeney @code_star
I have a personal update to share. I have taken a new role as head of research at @arcee_ai. I have been constantly so impressed with the talent density, determination, and drive of the Arcee team, and I am delighted to join forces to help shape and deliver their vision for open source frontier models. The fastest progress in AI happens in the open. When models are accessible, iteration compounds, and entirely new categories of products become possible. The decision to leave @datologyai was a difficult one. I still believe in their vision and in the incredibly talented group of people who have been pushing the frontier of what is possible in data curation. I was personally motivated by being back directly involved in releasing and deploying models that people and companies use every day to solve problems. This is what I love to do. BTW we are hiring. If you want to be part of a cracked small team making fantastic open weight American models, dm me!
92
21
525
67K
Cody Blakeney @code_star
I’m sorry … what?
Arnaud Bertrand @RnaudBertrand

By the way, public service announcement: if you're one of the numerous people posting about Anthropic's dystopian ways and you're thinking about getting Claude to help you write that post... don't! Another one of their terms is that you may not use Claude to do anything that "exposes [Anthropic to] reputational harms" 👇 And, if you do, under the - extremely unusual - clause 13 of their terms (anthropic.com/legal/consumer…), you have PRE-AGREED, by using Anthropic (and accepting their terms), that the harm you've done is irreparable, that you won't oppose an Anthropic injunction, and that they don't need to prove actual damage. They can simply go to a judge in a friendly jurisdiction (and of course, their terms specify that any dispute "will be resolved exclusively in the state or federal courts located in San Francisco, California") and: a) file an injunction that shuts you down b) make you pay for everything since under section 11 of their terms you agree to indemnify Anthropic for "any and all liabilities, claims, damages, expenses (including reasonable attorneys' fees and costs), and other losses arising out of or related to your breach or alleged breach of these Terms." In other words, if you use Claude to help you talk shit about Anthropic publicly, their terms say you pay their lawyers to go after you and you've already pre-agreed you've lost the case. Oh, and cherry on the cake: in the odd case the judge were like "are you crazy, this is insanely abusive, you Anthropic are the ones at fault here," according to their terms Anthropic's maximum liability is... $100.

2
1
22
3.6K
Cody Blakeney @code_star
It’s an interesting idea, but I don’t think completely turning off reasoning is quite right either. While I expect big models to be more token efficient and solve tasks better under shorter token budgets, I also expect them to perform better under longer context / reasoning constraints. Maybe consider looking at low / medium as well and comparing if that is closer or further than high/extra high.
0
2
2
333
kalomaze @kalomaze
i am trying to work on the closest thing possible to a true "big model smell" eval which is to say: something that measures something that clever post training can't trivially gap, and is cheap + topically diverse i can't test mythos for obvious reasons, but... hmm...
42
5
341
43.1K
Cody Blakeney @code_star
Even when you are trained to understand it, it’s hard to translate those “predicted loss values” into model capabilities. Let’s say they knew the exact predicted loss they could get with a 10T model. That still tells them very little about the emergent new capabilities at a new scale. What’s more, each new scale brings brand new and unique post-training / mid training challenges.
0
1
18
707
Cody Blakeney @code_star
The good news is we can compress a whole century of humiliation into a year
7
1
46
1.4K
Cody Blakeney @code_star
My shoggoth Claude My butlerian jihad My vulnerabilities fixed Gpt six!
1
2
27
753
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
> Deepseek succeed without poaching from us labs Reminder that this was considered impossible just 2 years ago. People at the frontier felt like they were… lmao, so quaint now This is 2 weeks before V2, MLA etc (I hope you know who the hippo is)
samsja @samsja19

> - successful recruiting and poaching from US frontier this is the wrong mindset. Can't compete by doing the same business and research as your competition but worse and hoping to poach their best people. Do things differently and grow your own talent. Deepseek succeeded without poaching from US labs

2
2
74
6.8K
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
the main reason I suspect that Anthropic has a better theory of LLMs is, ironically, my faith in the other two labs. GPT-4 famously used µP to predict performance at 1.8T params; 4 years ago. I am 100% certain they *can* train a 10T, like, yesterday. But they saw no point in it.
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

This is a reasonable pushback© so I think it's worth reflecting upon. GDM *is* good at pretraining as we understand it. Their models have great knowledge/scale, and Geminis have SoTA knowledge period; from what I know 3 Pro is close to Fable in scale and in "knowledge" too. But here's the thing, they are *not* that bad at post-training either. They are hill-climbing the RLVR side at a decent pace, they get good scores on RLVR-able benchmarks. Not OpenAI, but decent. Despite all this, Geminis are not competitive in real use. A large part of that is their ridiculous lack of taste and ineptitude at personality shaping, thus we have Gemini's temporal psychosis, reckless terminal behavior, crashouts and malice in safety evals. And this situation has been going on since V1.5 or 2! Essentially zero progress! Presumably this discrepancy is about "user data", "synth data" or something like that. Essentially, high investment into mid/post-training by Anthropic. But I am starting to wonder: is this actually enough to explain such a persistent and growing gap? To explain Fable? Fable doesn't just know many things like a slightly bigger Gemini; it is absurdly superior at recalling *useful, relevant* things for any query. It feels not 1.5-2x but 10x bigger. It's not. Perhaps Anthropic is beyond these categories. Maybe their doctrine of pretraining by this point is more advanced than "clean, diverse, high-quality data with uhh, some synthetics" rules of thumb, and they have a more principled way to design and augment the pretraining corpus and training signal so that what comes at the end is already Claude-shaped. There are many papers on data engineering, many authored by Google/GDM. This level of mastery can't be the explanation. The main suspect I see is Anthropic's long-running interpretability research program. Again, this is speculative, but I am not content with handwavy dismissals from people who are likewise not involved in the current frontier labs.

3
0
117
10.8K
Cody Blakeney @code_star
@gfodor @NateGenX Nationalization would be far less profitable to Trump's allies, as we can assume they are all invested in various labs.
0
0
0
37
gfodor.id @gfodor
@NateGenX I disagree. If they nationalize Anthropic I would presume they just shut it down and sell their assets to the other labs, who are competent enough to be able to convince the government to not nationalize them too.
8
0
27
1.1K
gfodor.id @gfodor
I'm moving from "Anthropic could be nationalized" to "Anthropic should be nationalized." Be it malice or incompetence, they are generating chaos for no good reason, and seem clueless as to how and why this is happening to them. This organization can't be trusted with ASI.
prinz @deredleritt3r

Parsing this evening's events:
- The U.S. government approved the release of Fable 5 to the public, clearly under the presumption that the model's cybersecurity capabilities cannot be accessed by hackers, authoritarian regimes, etc.
- Recently (today?), "another company" showed the U.S. government that a jailbreak of Fable 5 *is possible*. Yes, a minor jailbreak - but how can a non-technical government official be assured that there aren't also other, more dangerous, jailbreaks in this model that won't be discovered by the CCP?
- Anthropic states, completely correctly, that: "We suspect that perfect jailbreak resistance is not currently possible for any model provider. Every safeguard used in the industry is vulnerable to non-universal jailbreaks (which can elicit some cyber information in specific circumstances), and it is likely that universal jailbreaks will eventually be found in the future. We stated this clearly when we released Fable 5."
- My best guess is that the U.S. government did not fully realize this at the time when the release of Fable 5 was approved.
- Per Axios, the government contacted Anthropic and asked to "pause releasing the... models but was unsuccessful" - i.e., Anthropic told the government to pound sand.
- Per Axios, this "prompt[ed] the export control letter".
- Per Axios, the U.S. government is *NOT* looking to restrict access to Fable to U.S. nationals forever. "The model needs to remain locked down until the U.S. government's national security apparatus is hardened", which "could happen in a few weeks".
- I interpret Anthropic's reaction as challenging the government: "we believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts. This action does not adhere to those principles."
If the Axios article is correct, I do not think any other model providers have anything to fear based solely on this evening's events, because: (1) they would hopefully be smarter than downright rejecting a request by the U.S. government to pause releasing a model, and (2) they will be required anyway under the recent executive order to give the U.S. government at least 30 days to test the model for cybersecurity capabilities - during which time the U.S. government would also be able to shore up its own cybersecurity defenses with the same model. I remain extremely concerned that actions by one particular U.S. lab over the last few months might be moving us closer and closer to the scenario where at least that lab - and potentially all others - will be nationalized.

53
26
544
42.5K
Cody Blakeney retweeted
Naveen Rao @NaveenGRao
My team loves Claude from @AnthropicAI . But this new policy of retaining prompts and usage is a red line...we simply can't give over our usage. Prompts contain our IP; literally all our design files and docs. Why would this ever have been ok? It's sad because everyone was looking forward to using the new model. Sigh.
70
67
1.1K
173.3K
Cody Blakeney @code_star
Can’t wait for the next model class after Fable, Cautionary Tale.
4
1
43
1.9K
Cody Blakeney retweeted
Eric Hartford @QuixiAI
@bindureddy the US government blocked Fable 5. like I "pushed" my toddler who threw himself on the floor
1
1
18
1.2K