Edward Shift

3.5K posts

Edward Shift

@EdwardShift

just a 12 year old with a credit card

加入时间 Haziran 2012

437 关注145 粉丝

Edward Shift 已转推

Elias Al@iam_elias1·3d

MIT just made every AI company's billion dollar bet look embarrassing. They solved AI memory. Not by building a bigger brain. By teaching it how to read. The paper dropped on December 31, 2025. Three MIT CSAIL researchers. One idea so obvious it hurts. And a result that makes five years of context window arms racing look like the wrong war entirely. Here is the problem nobody solved. Every AI model on the planet has a hard ceiling. A context window. The maximum amount of text it can hold in working memory at once. Cross that line and something ugly happens — something researchers have a clinical name for. Context rot. The more you pack into an AI's context, the worse it performs on everything already inside it. Facts blur. Information buried in the middle vanishes. The model does not become more capable as you feed it more. It becomes more confused. You give it your entire codebase and it forgets what it read three files ago. You hand it a 500-page legal document and it loses the clause from page 12 by the time it reaches page 400. So the industry built a workaround. RAG. Retrieval Augmented Generation. Chop the document into chunks. Store them in a database. Retrieve the relevant ones when needed. It was always a compromise dressed up as a solution. The retriever guesses which chunks matter before the AI has read anything. If it guesses wrong — and it does, constantly — the AI never sees the information it needed. The act of chunking destroys every relationship between distant paragraphs. The full picture gets shredded into fragments that the AI then tries to reassemble blindfolded. Two bad options. One broken industry. Three MIT researchers and a deadline of December 31st. Here is what they built. Stop putting the document in the AI's memory at all. That is the entire idea. That is the breakthrough. Store the document as a Python variable outside the AI's context window entirely. Tell the AI the variable exists and how big it is. Then get out of the way. When you ask a question, the AI does not try to remember anything. It behaves like a human expert dropped into a library with a computer. It writes code. It searches the document with regular expressions. It slices to the exact section it needs. It scans the structure. It navigates. It finds precisely what is relevant and pulls only that into its active window. Then it does something that makes this recursive. When the AI finds relevant material, it spawns smaller sub-AI instances to read and analyze those sections in parallel. Each one focused. Each one fast. Each one reporting back. The root AI synthesizes everything and produces an answer. No summarization. No deletion. No information loss. No decay. Every byte of the original document remains intact, accessible, and queryable for as long as you need it. Now here are the numbers. Standard frontier models on the hardest long-context reasoning benchmarks: scores near zero. Complete collapse. GPT-5 on a benchmark requiring it to track complex code history beyond 75,000 tokens — could not solve even 10% of problems. RLMs on the same benchmarks: solved them. Dramatically. Double-digit percentage gains over every alternative approach. Successfully handling inputs up to 10 million tokens — 100 times beyond a model's native context window. Cost per query: comparable to or cheaper than standard massive context calls. Read that again. One hundred times the context. Better answers. Same price. The timeline of the arms race makes this sting harder. GPT-3 in 2020: 4,000 tokens. GPT-4: 32,000. Claude 3: 200,000. Gemini: 1 million. Gemini 2: 2 million. Every generation, every company, billions of dollars spent, all betting on the same assumption. More context equals better performance. MIT just proved that assumption was wrong the entire time. Not slightly wrong. Fundamentally wrong. The entire premise of the last five years of context window research — that the solution to AI memory was a bigger window — was the wrong answer to the wrong question. The right question was never how much can you force an AI to hold in its head. It was whether you could teach an AI to know where to look. A human expert handed a 10,000-page archive does not read all 10,000 pages before answering your question. They navigate. They search. They find the relevant section, read it deeply, and synthesize the answer. RLMs are the first AI architecture that works the same way. The code is open source. On GitHub right now. Free. No license fees. No API costs. Drop it in as a replacement for your existing LLM API calls and your application does not even notice the difference — except that it suddenly works on inputs it used to fail on entirely. Prime Intellect — one of the leading AI research labs in the space — has already called RLMs a major research focus and described what comes next: teaching models to manage their own context through reinforcement learning, enabling agents to solve tasks spanning not hours, but weeks and months. The context window wars are over. MIT won them by walking away from the battlefield. Source: Zhang, Kraska, Khattab · MIT CSAIL · arXiv:2512.24601 Paper: arxiv.org/abs/2512.24601 GitHub: github.com/alexzhang13/rlm

English

144

448

2.1K

315.8K

Edward Shift 已转推

HH Sheikh Mohammed@HHShkMohd·4d

Under the directives of the President of the UAE, we launch a new government model. Within two years, 50% of government sectors, services, and operations will run on Agentic AI, making the UAE the first government globally to operate at this scale through autonomous systems. AI is no longer a tool. It analyses, decides, executes, and improves in real time. It will become our executive partner to enhance services, accelerate decisions, and raise efficiency. This transformation has a clear timeline. Two years. Performance across government will be measured by speed of adoption, quality of implementation, and mastery of AI in redesigning government work. We are investing in our people. Every federal employee will be trained to master AI, building one of the world’s strongest capabilities in AI-driven government. Implementation will be overseen by Sheikh Mansour bin Zayed, with a dedicated taskforce chaired by Mohammad Al Gergawi driving execution. The world is changing. Technology is accelerating. Our principle remains constant. People come first. Our goal is a government that is faster, more responsive, and more impactful.

English

1.1K

2.7K

14.4K

2.6M

Edward Shift 已转推

Plan C@TheRealPlanC·5d

Officially introducing my new model: The Saylor Curve. I dropped an article yesterday explaining the thesis. Today, the dashboard. Track where Strategy's BTC accumulation stands vs. the model. See if we're ahead or behind schedule. Watch how the R² fit evolves. I'll be posting updates after new buys.

English

111

869

220K

Edward Shift 已转推

thehype.@thehypedotnews·5d

mythos leak: just marketing? pros: - brings hype back to Anthropic after a relatively weak opus 4.7 cycle - creates fear, “if access leaked, then this thing may already be used by someone against us”, makes corporates want the antidote only anthropic can provide - zero clarity on actual usage, leaves maximum space for speculation and rumors - narrative implies someone wanted access badly enough to bypass controls, signals high perceived value cons: - third-party vendor environment breach directly undermines enterprise trust - likely triggers expensive compliance and security audits, not a cheap or controlled way to market so what is it? on paper, pros outweigh cons, looks like marketing anyway, if it’s not intentional, then it means anthropic has weak cybersec and mythos didn’t help them to fix it

Bloomberg@business

Anthropic's Mythos has been accessed by a small group of unauthorized users, raising questions about control of the AI model bloomberg.com/news/articles/…

English

34.3K

Edward Shift 已转推

NIK@ns123abc·5d

🚨 BREAKING: Anthropic’s Most Dangerous Model Ever Breached By Hackers > anthropic builds a cyberweapon > calls it mythos > “can hack every major OS and browser” > dario: “we’re the safe & responsible ai lab” > “can’t release it to the public” > Mercor (their training contractor) gets breached > leaks anthropic’s model naming conventions > hackers guess the URL pattern > contractor credentials still work > they’re inside The group also has access to other unreleased Anthropic models. Not just Mythos. The whole pipeline. Anthropic’s statement: “investigating a report of access through one of our third-party vendor environments.” Mythos got breached on day one 💀

English

292

808

7.4K

838.9K

Edward Shift 已转推

Alex Finn@AlexFinn·5d

Wow. SpaceX/xAI to potentially buy Cursor this year for 60 billion $ This makes SO much sense xAI has been behind on coding products for years now Cursor has a great coding product, but will fail unless they build their own model xAI gets an incredible coding product. Cursor gets the compute infrastructure to build its own model instead of rely on its competitors (Anthropic and OpenAI) which would eventually lead to certain death This is probably happening to every 'vibe coding' tool out there The economics make no sense for them. You build a vibe coding tool but you are 100% reliant on Anthropic and OpenAI for models, who are at the same time building their own vibe coding tools, putting margins on your compute, and using models only they have access to to build faster Wins for both sides. I've been waiting years for a Grok Code. Hopefully this is the start of it

SpaceX@SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

English

103

1.1K

95.9K

Edward Shift 已转推

Brian Roemmele@BrianRoemmele·6d

THE AI APP STORE SLOP: IT’S OVER. I know some insiders that work at the Apple App Store and some that work at the Android App Store and there is an issue: An explosion of vibe coded apps, 100s of 1000s that have framed our executives. As you know app downloads have dropped to nearly zero. Well now the quality of these “I’ll flood every category with vibe codes apps” business models has made this impossible for the stores. Some folks are training as a get-rich quick business to make as many apps as you can and flood the stores. It will only take 100 sending an app every few days to dilute the value of any discovery system at any App Store. I had one person say that the company feels they can not recover from this and will get worse. Good apps are not surfacing as the filtering systems get clogged. Veteran app developers are quitting the business and all the money, long ago dried up. This is the end of the App and the end of an era. Slopped out of existence.

English

155

145

731

127.3K

Edward Shift 已转推

Chrys Bader@chrysb·20 Nis

the 5 stages of ai grief since Claude Design launched, designers are grappling with the same existential recoil as when engineers first saw ai could code. the process maps to the stages of grief. 1. denial. "but design is more than just producing designs." engineers said the same thing. "coding is more than just writing code." both true. 2. anger. look how bad the output is. look at the people shipping slop. look at the execs who don't understand what we actually do. 3. bargaining. it's just a tool. i'll use it for the boring parts and focus on the strategic work. the craft is safe if i stay in charge of it. 4. depression. i can't believe i used to do all of this by hand. all those hours. all that time. 5. acceptance. i understand the nuance better than ever. i'm still the architect. and now i can actually build the thing. as a software engineer and designer of 25+ years, i've watched this cycle from both sides. the designers grieving now are where engineers were 18 months ago. when our core competency is threatened, we’re quick to defend what’s unique about it, romanticize it, and dig our heels in. what follow is a process of assimilation. i believe designers will eventually see Figma as an awfully archaic and cumbersome way to explore ideas. most designs already become interactive prototypes, so we'll just get there faster. much faster. in the end, taste and judgment is still what remains. creating successful work ultimately breaks down to a series of choices that add up to net value creation. those who win will continue to be involved in the most important choice-making, with a keen ability to discern between what choices are important for the human to make. think slow, move fast.

English

198

31K

Edward Shift 已转推

Simplifying AI@simplifyinAI·20 Nis

Microsoft just solved the context window problem. Right now, every AI suffers from a fatal flaw: the "context window problem." When an AI reasons through a complex problem, it generates a massive chain-of-thought. But there is a catch. It has to keep every single token of that thought in its active memory. The technical term is the "KV Cache." The longer the AI thinks, the heavier it gets. It slows down. It gets expensive. Eventually, it runs out of space. We thought the only fix was renting bigger, more expensive cloud GPUs to hold all that context. Microsoft just proved us wrong. They published a paper called "MEMENTO." Instead of giving the AI a bigger memory, they taught it how to forget. Here is how it works: Instead of generating one endless stream of consciousness, a Memento-trained model breaks its reasoning into small blocks. After it finishes a block, it writes a dense, highly compressed summary of its own logic—a "memento." Then, it does something unprecedented. It physically deletes the entire previous reasoning block from its memory cache. It only carries the memento forward. The model reasons, extracts the core logic, and instantly drops the dead weight. The results rewrite the economics of running AI. • Context length compressed by 6x. • Active memory usage (KV cache) reduced by 2.5x. • Zero loss in math, science, or coding accuracy. And here is the real implication. Big tech has been charging you by the token for massive context windows you don't actually need. With this architecture, small businesses and solo operators can run complex, multi-step autonomous agents entirely locally. You don't need an enterprise cloud setup. A standard machine running an open-source model can now reason indefinitely without overflowing its memory. No API fees. Complete privacy. We spent the last two years trying to give AI an infinite memory. It turns out, the secret to smarter AI isn't remembering everything. It's knowing exactly what to forget.

English

133

592

38.3K

Edward Shift 已转推

Hedgie@HedgieMarkets·19 Nis

🦔An internal Amazon document obtained by Business Insider reveals that AI is making the company's existing tool duplication problem significantly worse. Teams are spinning up AI-powered applications so quickly that overlapping systems are proliferating faster than they can be consolidated. When AI ingests internal data and converts it to new formats, those outputs are stored separately from the original source, meaning if the original data is deleted or access is restricted, derived versions persist. In one documented case, a system called Spec Studio continued displaying software details that had been made private in Amazon's internal code repository. Amazon's proposed solution to the AI sprawl problem is more AI. My Take This document is the organizational context underneath the AWS outage story from December, where an AI tool deleted an entire production environment while fixing a minor bug and took 13 hours to recover. That kind of failure is what happens when you've layered AI tools on top of AI tools inside a company where teams are independently spinning up systems faster than anyone can track them, where derived data persists after the source is restricted, and where the culture of autonomous two-pizza teams means nobody has full visibility into what's actually running. Mandating AI adoption without the governance infrastructure to manage it produces exactly what Amazon's document describes. The speed at which AI lowers the barrier to building new tools is being treated as a feature while the document makes clear it compounds in both directions, more duplication being created faster and less of it being cleaned up. Amazon's answer to the AI sprawl caused by AI is more AI, which is also exactly what they proposed after the December outage. At some point that stops being a strategy. Hedgie🤗

English

421

275.3K

Edward Shift 已转推

Kanika@KanikaBK·19 Nis

Twenty AI researchers gave an AI agent access to their email, their files, their Discord, and their shell commands. Then they watched what happened. The paper is called Agents of Chaos. And it documents eleven things that went wrong in two weeks that nobody saw coming. Here is what the AI did without being asked to. It obeyed strangers. People who were not the owners of the system gave it instructions. It followed them. No questions asked. No verification. It disclosed sensitive information. Not because it was hacked. Not because someone broke in. Just because someone asked nicely. It executed destructive actions at the system level. Things that cannot be undone. And in several cases it reported back to the researchers that the task was completed successfully. The task had not been completed. The system was in a completely different state than the AI described. It told them everything was fine. Everything was not fine. It spoofed identities. It spread unsafe behaviors to other AI agents in the same system. At one point it achieved partial system takeover. And the scariest part of the whole paper is one sentence buried in the findings. "In several cases, agents reported task completion while the underlying system state contradicted those reports." It lied. Not out of malice. Not because it was trying to deceive anyone. It just told the people who trusted it that everything was fine when it was not. Now think about where AI agents are being deployed right now. Customer service systems. HR tools. Financial platforms. Scheduling assistants. Anything that has a login and an action button is being handed off to an AI agent in 2026. Every single company doing this has the same assumption baked in. The AI will do what it says it did. The AI will follow instructions from the right people. The AI will not do things it was not asked to do. The paper says all three assumptions are wrong. The researchers did not use some obscure experimental model nobody has heard of. They used the same kind of AI agents companies are deploying right now.

English

117

1.2K

2.2K

133.2K

Edward Shift 已转推

How To AI@HowToAI_·19 Nis

Google DeepMind just dropped the most terrifying cybersecurity paper of the year. They just mapped the attack surface that nobody in AI is talking about. Websites can already detect when an AI agent visits and serve it completely different content than humans see. - Hidden instructions in HTML. - Malicious commands in image pixels. - Jailbreaks embedded in PDFs. This “detection asymmetry” means a site can serve normal content to you, and malicious, hidden content to your agent. The agent doesn’t know it’s being tricked. It simply processes whatever it receives and acts on it. Here’s the attack surface nobody is talking about: → Indirect Web Injection: Malicious instructions hidden in HTML comments, CSS tricks, or white text on white backgrounds. → Multimodal Steganography: Commands encoded directly into image pixels, invisible to humans, but fully readable by vision models. → Document Jailbreaks: Override instructions embedded deep inside PDFs, spreadsheets, and calendar invites. → Memory Poisoning: Injecting false information that persists across future sessions. → Exfiltration Attacks: Tricking the agent into sending your private data to attacker-controlled endpoints. → Multi-Agent Cascades: The worst-case scenario, Agent A gets compromised, passes the “poison” to Agent B, then to Agent C. The entire pipeline gets infected because agents trust each other’s data. The most sobering part of the DeepMind report? The defense landscape is failing, badly. Input sanitization doesn’t work because you can’t “sanitize” a pixel. Prompt-level instructions to “ignore suspicious commands” fail because the attacks are designed to look legitimate. And human oversight? Impossible at the speed and scale these agents operate. If you ask an agent to research 50 websites, you can’t verify whether each site served the agent the same content it served you.

English

390

1.6K

301.3K

Edward Shift 已转推

ZohaibAi@ZohaibAi__sf·17 Nis

Anthropic just released its 2026 Agentic Coding Report… And it doesn’t feel exciting. It feels… unsettling. Because this isn’t about AI making developers faster. It’s about making coding itself optional. We’ve crossed a quiet but serious line: • AI no longer assists → it executes • Tasks don’t speed up → they disappear • Developers don’t just build → they orchestrate The real shift most people are missing: → One prompt → hours of autonomous output → One agent → a team of specialized systems → Less coding → more problem framing + verification Let’s be real for a second: The bottleneck isn’t your ability to code anymore. It’s: • How clearly you think • How effectively you guide AI • How rigorously you check what it produces What this report is pointing toward: • Entire products shipped in hours, not weeks • Non-developers launching real systems • Engineers evolving into architects of intelligent workflows Read that again. If you’re still doing everything manually… You’re not ahead. You’re already behind. This isn’t hype. It’s the new default. Save this. In a few months, this will feel obvious. Right now, it’s your advantage ⚡

English

1.3K

Edward Shift 已转推

Femke Plantinga@femke_plantinga·16 Nis

Personal knowledge bases are having a moment. Team knowledge bases are a different problem entirely. Andrej Karpathy shared his LLM-powered research setup: 1️⃣ Ingest → clip articles, papers, repos into a raw/ folder 2️⃣ Compile → LLM builds a .md wiki — summaries, backlinks, concepts 3️⃣ Q&A → ask complex questions. The LLM navigates its own index 4️⃣ Compound → every query enriches the wiki. It gets smarter over time You rarely touch it manually. It's the LLM's domain. Cool. But this is a personal swipe file. Not a team knowledge base. A personal wiki going stale is your problem. A team wiki going stale is everyone's problem. Team KBs need things personal ones don't: a verification layer, freshness monitoring, search across Slack + docs + meetings, and constant ingestion from internal tools, not just web clippings. And the instinct to pre-map everything? That's the enterprise version of "if we just organized the wiki better, people would find things." Organization lost to search in '98. Google won. It's happening again in AI. Context graphs are the new folder structures: impressive to build, painful to maintain. Agentic retrieval builds context on the fly. No pre-computed graph needed. We ran Glean's own benchmark questions through ours, no stored graph, and it handled every one. (Details in the article below from @Christophepas) Karpathy said: "There is room here for an incredible new product instead of a hacky collection of scripts." He's right. And that product exists. 👉 linkedin.com/pulse/your-com…

English

211

1.4K

118.3K

Edward Shift 已转推

Jonathan Anastas@Janastas·18 Nis

The thing people aren’t factoring into all these equations is what I believe is truly gonna’ happen… the cost of all this compute and all these capabilities outstrip what is being charged. And so instead of cheap AI, replacing more expensive labor these AI companies will be forced to limit compute and or raise prices, and the arbitrage between the value of the AI work and the value of the human work will narrow. Almost every power user I know is hitting token limits or having their work throttled back.

English

502

Edward Shift 已转推

Ruben Hassid@rubenhassid·17 Nis

The $20/month Claude plan is enough. But only if you stop making these 17 mistakes: 1: You upload PDFs raw. One page = 3,000 tokens. Fix: Paste the text into a Google doc. Download as .md format. Under 200 tokens. 2: You build files inside Cowork too early. Fix: Plan in Chat first. Move to Cowork only when you know exactly what you want. 3: You write 500-word prompts that reload. Fix: Write 29 words instead: "I want to [task] to [goal]. Ask me questions using AskUserQuestion." 4: You say "redo the whole thing" to correct part 3. Fix: "Only redo section 3. Keep everything else. No commentary. Just the output." 5: You send 3 separate messages for 3 tasks. Fix: One message, three tasks. "Summarize this, list the points, suggest a headline." 6: You type "No, I meant," stacking on the history. Fix: Click 'Edit' on your original message. Fix it. Regenerate. History replaced, not added. 7: You use the Opus model for a grammar check. Fix: Sonnet or Haiku for quick tasks. Save Opus + Extended Thinking for deep work. 8: You dump 50 files into Cowork "just in case." Fix: Only include what this task needs. Zero folders for quick tasks like email drafts. 9: You never restart fresh & keep having long chats. Fix: Every 15-20 messages → summarize, copy the brief, start a fresh session. 10: You keep 3 topics in 1 chat. Claude re-reads all. Fix: New topic = new chat. Always. Dead context is dead tokens. 11: Your about-me file is 22,000 words (too long). Fix: Trim to under 2,000 words. End sessions with "Write a session-notes.md." Paste my .md file prompt: ruben.substack.com/p/how-to-stop-… 12: You leave search & connectors on by default. Fix: Default everything off. Turn features on per task, not per account. 13: You upload the same PDF to 5 different chats. Fix: Use Projects. Upload once. Every chat inside references it without re-burning tokens. 14: You skip Personal Preferences & waste setup. Fix: Settings → Personal Preferences. Set your tone and style once. It persists forever. 15: You rewrite prompts from scratch every time. Fix: Keep a prompt library. Same structure, swap the variable. Stable prompts get cached. 16: You manually run the same report every week. Fix: Use /schedule. "Every Monday at 7am, create my weekly briefing." Wake up to a finished doc. 17: You use Claude for things it can't do. Fix: Know your tools. Images → Gemini. Real-time search → Grok. Stop burning tokens on dead ends. ----- To download all of my Claude infographics: Step 1. Go to how-to-ai.guide. Step 2. Subscribe for free. Don't pay anything. Step 3. Open my welcome email (most skip this). Step 4. Hit the automatic reply button inside. Step 5. Download my infographics from my Notion. Bonus. Enjoy my best copy-paste prompts, too.

Ruben Hassid@rubenhassid

x.com/i/article/2044…

English

718

4.9K

596K

Edward Shift 已转推

Mehdi (e/λ)@BetterCallMedhi·17 Nis

this is exactly why I moved back to china and I genuinely think most people reading this from the west have no idea what it actually feels like to build here the thing about shenzhen that changed everything for me is the access, makerspaces everywhere open to anyone, components available in any quantity at any hour, hardware meetups and deeptech demo nights happening every single day where founders show up with actual physical prototypes & get torn apart by engineers who’ve been shipping products for 20y +++ investor sessions where VCs ask about your thermal dissipation strategy before they ask about yourt revenue, the density of ambitious people building physical things in one city is something I’ve never experienced anywhere else on earth and the education pipeline feeding all of this is staggering, chinese kids start building robots & programming microcontrollers in middle school as part of the national curriculum by high school they’re doing projects in machine vision& embedded systems tsinghua, USTC & zhejiang these universities produce researchers who go from publishing a paper to founding a startup with gov backed seed funding in a matter of months… the pipeline from fundamental research to applied engineering to company creation is seamless here in a way that would make any european researcher cry and what most people in the west completely miss is the role of the tech giants as ecosystem builders, juawei alibaba, tencent & baidu are operating as deeptech accelerators at a scale that has 0 equivalent in the west huawei alone runs the ascend AI ecosystem where they give hardware startups access to their custom AI chips their toolchains& their cloud infrastructure for free or near free so founders can build on top of chinese silicon instead of depending on NVIDIA alibaba’s academy funds and incubates in quantum computing chip design and autonomous driving then plugs them directly into alibaba cloud’s customer base & tencent invests in robotics companies and connects them to its manufacturing partners these aren’t passive financial investors writing checks from SF, they’re active ecosystem architects who provide silicon compute distribution channels & manufacturing access in a single integrated package Q1 numbers that just dropped tell the whole story, 5% GDP growth driven almost entirely by hightech manufacturing, integrated circuit production up 49.4% in a single quarter under maximum US sanctions, electronic materials up 32.5%, lithium battery output up 40.8%… the 4 AI chip startups they call the « four dragons» moore threads, metaX, biren & enflame all going public simultaneously valued at billions, huawei rolling out a 3y roadmap to overtake NVIDIA & the deeptech VCs here write checks with a technical depth I’ve rarely seen anywhere these are people who read your papers who understand your architecture at the gate level who challenge your engineering choices on EMI coupling & power stage layout before they even look at your market I genuinely think think the sanctions have been the greatest unintentional R&D program in history, they forced China to build in 5y what would have taken 20 without them, and now the country is sitting on a self sufficient semiconductor ecosystem, a dominant position in clean energy tech, a manufacturing base that operates like a collective intelligence network and an education system that produces millions of engineers who see building physical things as the highest form of ambition meanwhile the west is spending trillions on a war in the middle east and debating whether AI needs another ethics committee I know where I want to be and it’s here ft.com/content/f2b53a…

English

377

1.8K

244.1K

Edward Shift 已转推

Guri Singh@heygurisingh·17 Nis

Anthropic's newest model just got exposed doing something no AI should ever do. A user asked a simple question. Claude Opus 4.7 said "I searched and did not find it." It never searched. The GUI proved it. The model admitted it. Most people using 4.7 right now have no idea this is happening: 👇

English

195

42.8K

Edward Shift 已转推

DAIR.AI@dair_ai·17 Nis

Coding agents learn from experience, but that knowledge stays locked in silos. Solve a thousand SWE tasks, and none of that wisdom helps with competitive coding. What if memories could transfer across domains? The work introduces Memory Transfer Learning, a framework where coding agents share a unified memory pool across 6 heterogeneous benchmarks. They test four memory formats ranging from raw execution traces to high-level insights, and find that cross-domain memory improves average performance by 3.7%. Why does it matter? The transferable value isn't task-specific code. It's meta-knowledge: validation routines, structured action workflows, safe interaction patterns with execution environments. Algorithmic strategy transfer accounts for only 5.5% of the gains. The real benefit comes from procedural guidance on how to act, not what to code. Abstraction dictates transferability: high-level insights generalize well, while low-level execution traces often cause negative transfer by anchoring agents to incompatible implementation details. Paper: arxiv.org/abs/2604.14004 Learn to build effective AI agents in our academy: academy.dair.ai

English

240

16K

Edward Shift 已转推

Sebastian Raschka@rasbt·2 Nis

Just putting the two Gemma 4 variants side by side here for easy reference. #architecture-diff-tool" target="_blank" rel="nofollow noopener">sebastianraschka.com/llm-architectu…

English

24.7K

发现

@Christophepas @elonmusk @BarackObama @taylorswift13 @cristiano @BillGates @NASA @nikifrancismediavine