mont
805 posts

@MontagueMont
swe @MSquared_io physics @imperialcollege, reverse engineering for artists is my passion, he/him, @ariannawith2ens



@NielsRogge not sure why they don't detail which mode of GPT-5 this is. We can presume thinking-high, but it's a weird thing to leave out of this paper (many others leave it out too). Equally, there's no detail on thinking depths for the other models.



Salesforce AI Research introduces MCP-Universe: the first benchmark to truly test LLM agents in real-world scenarios with live Model Context Protocol servers.



We will use Grok 3.5 (maybe we should call it 4), which has advanced reasoning, to rewrite the entire corpus of human knowledge, adding missing information and deleting errors. Then retrain on that. Far too much garbage in any foundation model trained on uncorrected data.


Obviously using chatbots makes you more dumb, everyone is skipping over the funniest part: the study authors are such dedicated LLM haters they set up traps to expose AI users lol



Amazon-backed AI model tried to blackmail engineers who threatened to shut it down, safety report revealed — HuffPost

This was quite a funny bug for the technical folks. Unique ids for tools in the world were encoded in 10 bits. Bits are packed to reduce storage costs. This means we could only ever have 1024 tools in the world 😅. Will have to make a chunked V2 for the long-term fix.
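
For anyone wondering where the 1024 comes from: a 10-bit field holds 2^10 = 1024 distinct values (0 to 1023). Below is a minimal sketch of that kind of bit packing, not the actual implementation. The names `ID_BITS`, `pack_ids` and `unpack_ids` are hypothetical, and packing into a single Python integer is an assumption made purely for illustration.

```python
ID_BITS = 10               # field width mentioned in the post
MAX_IDS = 1 << ID_BITS     # 2**10 = 1024 distinct ids, 0..1023


def pack_ids(ids):
    """Pack ids into one integer, ID_BITS bits per id (hypothetical layout)."""
    packed = 0
    for position, tool_id in enumerate(ids):
        if not 0 <= tool_id < MAX_IDS:
            # This is the limit described above: id 1024 needs an 11th bit.
            raise OverflowError(f"id {tool_id} does not fit in {ID_BITS} bits")
        packed |= tool_id << (position * ID_BITS)
    return packed


def unpack_ids(packed, count):
    """Recover `count` ids from an integer produced by pack_ids."""
    mask = MAX_IDS - 1     # 0b1111111111
    return [(packed >> (position * ID_BITS)) & mask for position in range(count)]
```

Round-tripping `[0, 5, 1023]` through `pack_ids` and `unpack_ids` returns the same list, while `pack_ids([1024])` raises `OverflowError`, which is the ceiling the post describes.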




















