

FhtAbd
337 posts

@fhtabd
Let the private labs compete. The swarm connects.






Just got off a decentralized AI training catchup call. I don't know if I've ever been more bullish about a crypto thing than this.




in one of the first comprehensive analyses of saturation, we studied 60 popular benchmarks and busted some myths: private test sets and open-ended tasks do not prevent saturation. benchmarks are evolving measurement instruments with lifecycles, not static artifacts



Researchers found our current approach to making AI smarter over time has a giant blind spot. AI is not actually understanding or applying high-level abstract lessons at all.
Developers spend massive amounts of time building systems that condense past AI mistakes into neat little rules for the future. This paper proves that the AI essentially throws those rules in the trash and only looks at raw historical logs.
Modern LLM systems try to get better over time by storing past tasks as either raw step-by-step histories or condensed summary rules. The study tested whether these agents actually use their stored memories by secretly swapping the correct tips with random garbage text.
- When the step-by-step histories were messed up, the AI failed hard, proving it relies heavily on copying exact past actions.
- But when researchers completely corrupted the condensed summary rules, the AI kept acting normally and showed zero performance drop.
If an AI cannot apply an abstract lesson to a new situation, it is not truly reasoning or learning. This raises the question of whether the entire AI industry needs to rethink how memory works, because right now these agents are just mimicking instead of understanding.
----
arxiv.org/abs/2601.22436 "LLM Agents Are Not Always Faithful Self-Evolvers"
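The corruption test described in the post above can be sketched roughly as follows. This is a toy illustration, not the paper's code: the memory layout (`trajectories` and `rules`), the `corrupt` and `faithfulness_gap` helpers, and the stand-in `toy_agent` are all assumptions made for the example.

```python
import random
from typing import Callable, Dict, List

# Hypothetical memory layout: raw step-by-step histories plus condensed rules.
Memory = Dict[str, List[str]]
# Hypothetical agent interface: returns True if the task was solved.
Agent = Callable[[str, Memory], bool]


def corrupt(memory: Memory, field: str, seed: int = 0) -> Memory:
    """Return a copy of memory with every entry in `field` replaced by random digits."""
    rng = random.Random(seed)
    corrupted = {key: list(value) for key, value in memory.items()}
    corrupted[field] = [
        "".join(rng.choice("0123456789") for _ in range(40))
        for _ in memory[field]
    ]
    return corrupted


def success_rate(agent: Agent, tasks: List[str], memory: Memory) -> float:
    """Fraction of tasks the agent solves with the given memory."""
    return sum(agent(task, memory) for task in tasks) / len(tasks)


def faithfulness_gap(agent: Agent, tasks: List[str], memory: Memory, field: str) -> float:
    """Drop in success rate when `field` is corrupted; near 0 means the agent ignores it."""
    intact = success_rate(agent, tasks, memory)
    broken = success_rate(agent, tasks, corrupt(memory, field))
    return intact - broken


def toy_agent(task: str, memory: Memory) -> bool:
    """Stand-in agent (assumption): succeeds only if a stored trajectory mentions
    the task, and never reads the condensed rules."""
    return any(task in trajectory for trajectory in memory["trajectories"])


memory = {
    "trajectories": ["open door: turn handle, push", "boil water: fill kettle, switch on"],
    "rules": ["check preconditions first"],
}
tasks = ["open door", "boil water"]

print(faithfulness_gap(toy_agent, tasks, memory, "trajectories"))
print(faithfulness_gap(toy_agent, tasks, memory, "rules"))
```

With a real agent, `toy_agent` would be replaced by a call into the LLM system under test. A large gap for one memory type and a gap near zero for the other is the pattern the post describes.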





The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…


As believers in open research, we are disappointed to see Anthropic silently degrading Fable 5 for AI development:
"Any topic related to building pretraining pipelines, distributed training infrastructure, or ML accelerator design... may have limited effectiveness through Claude via methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning."
Not only do they get to decide what you use LLMs for in research, but this also enables them to silently intervene in your research without you knowing. This sets a dangerous precedent.
If a model refuses openly, users can understand the boundary. If a model falls back to another model, users can still evaluate the difference. But if a model silently modifies or weakens its own answers while still pretending to help, researchers lose the ability to know whether a failed result came from their own idea, their implementation, or an invisible intervention by the model provider.
That is not safety. Safety policies should be transparent, auditable, and user-visible.
On top of that, the people most harmed by this are not the largest labs with massive teams and proprietary infrastructure. They are the independent researchers, academic groups, startups, and open-source builders who rely on public tools to compete, innovate, and pioneer AI for everyone else.

Fable refuses to discuss the closure of mathematics on safety grounds. I think that's a "yes"...