📢 Kimi AI just released a paper showing you can match the performance of a model trained with 1.25x more compute by changing one thing: how residual connections work.
The core problem has been sitting inside deep networks since residual connections appeared in 2015, and inside every transformer built on them since. When layer outputs accumulate through the residual stream, every layer's contribution gets the same fixed weight of 1. By layer 50, earlier layers contribute so little to the final result that research has shown you can remove a significant fraction of them entirely with barely any performance drop: the model has already learned to ignore them.
Attention residuals replace that fixed accumulation with a learned weighted sum over all previous layer outputs. Each layer computes a small search query, scores every earlier layer's output for relevance, and builds its input from the most useful ones. The weights adapt per input rather than staying fixed, which is what makes the difference.
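The mechanism above can be sketched in a few lines. This is a toy illustration of the idea, not Kimi's actual implementation: the query, keys, and 2-d hidden states are made up for the example, and in practice these would be learned tensors.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def standard_residual(layer_outputs):
    # Classic residual stream: every previous layer contributes with fixed weight 1.
    dim = len(layer_outputs[0])
    return [sum(h[i] for h in layer_outputs) for i in range(dim)]

def attention_residual(layer_outputs, query, keys):
    # Learned alternative: score each earlier layer's output against a query,
    # then mix the outputs with softmax weights that adapt per input.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    return [sum(w * h[i] for w, h in zip(weights, layer_outputs)) for i in range(dim)]

# Two toy "layer outputs" in a 2-d hidden space.
outs = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]   # per-layer keys (hypothetical; learned in practice)
query = [5.0, 0.0]                # this input cares mostly about layer 0

fixed = standard_residual(outs)        # both layers weighted equally
mixed = attention_residual(outs, query, keys)  # layer 0 dominates for this query
```

A different query would shift the weights toward other layers, which is exactly the per-input adaptivity a fixed residual sum cannot provide.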
Tested on a 48B parameter model trained on 1.4T tokens, the gains hold across every benchmark. GPQA-Diamond up 7.5 points. Math up 3.6. HumanEval up 3.1. The largest improvements are on multi-step reasoning tasks, which makes sense — those are exactly the tasks where later layers need to selectively build on what earlier layers figured out.
Full breakdown in the blog. Link in the replies!
#AttentionResiduals #KimiAI #LLM #DeepLearning #AIResearch #GenerativeAI #DataScience
We are hiring for a ton of roles on our #Research team @SnorkelAI - if interested please reply/reach out!
As one of the first academic teams to focus on AI data development back at @StanfordAILab / @UW - we have long believed this is one of *the* most exciting areas to be as a researcher :)
Today - as a frontier data lab & partner to the world's leading AI labs and companies - we have more research vectors than we can possibly handle!
Come help us tackle problems in complex environment generation; long-horizon and non-stationary benchmarking; complex rubric and process reward design; data valuation and curriculum learning; core data quality control; human-in-the-loop system design; large scale RL systems; and more!!
Most AI chatbots get facts wrong because they DON'T use RAG. Here is how RAG fixes that:
1. User asks a question.
2. Instead of guessing, the system searches a knowledge base.
3. Relevant documents are retrieved.
4. Those documents are added to the prompt.
5. The LLM generates an answer using real data.
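The five steps above can be sketched end to end. This is a minimal illustration using keyword overlap for retrieval; a real system would use embeddings and an actual LLM call, and the documents here are invented for the example.

```python
import re

KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, docs, top_k=1):
    # Steps 2-3: score each document by keyword overlap, keep the best.
    q = tokens(question)
    scored = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return scored[:top_k]

def build_prompt(question, docs):
    # Step 4: ground the prompt in the retrieved documents.
    context = "\n".join(docs)
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

question = "What is the refund policy?"   # Step 1
docs = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, docs)
# Step 5: prompt would now be sent to the LLM (call omitted in this sketch).
```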
Insane. A complete 7-week Agentic RAG bootcamp was just open-sourced.
AI academies charge $10,000 for this curriculum. You can get it for free.
It covers everything from basic keyword search to building production-grade Agentic RAG systems with LangGraph. This is not a toy project tutorial. It is a full production pipeline.
Here is what is inside:
- 7 weeks of building an AI research assistant from scratch
- Complete infrastructure setup with Docker, FastAPI, and PostgreSQL
- Production keyword and hybrid search using OpenSearch
- Local LLM deployment with streaming responses
- Production monitoring with Langfuse tracing and Redis caching
- Agentic workflows using LangGraph and Telegram bots
Here is the core value:
It forces you to build the way successful companies do. You do not just jump to vector search. You build solid search foundations first, then enhance with AI. Theory and practice in one place. Thousands of developers are using this to master production AI.
Summary of the Production Agentic RAG Course:
- It gives you a senior AI engineer curriculum for free
- It bridges the gap between basic RAG and production systems
- It forces you to build an actual end-to-end portfolio project
You still have to write the code. It just removes the guesswork.
Got this AI Engineering Book (2025 Edition) 📘
It covers:
• LLM System Design
• RAG Architecture
• AI Agents
• Production AI patterns
If you want this FREE PDF,
DM me and I’ll send it.
#DataScience #AIIllustration
#Engineering #AI #MachineLearning #LLMs #Agents
How to become an AI engineer in the next 6 months:
By the end, you want to be able to:
- build LLM apps end-to-end
- use APIs from OpenAI / Anthropic / open-source stacks
- design prompts and context properly
- add tool calling and structured outputs
- deploy real projects
So, let’s discuss your roadmap month by month
Month 1: Get solid enough in coding and fundamentals
What to learn:
- Python really well
- Git + GitHub
- CLI / terminal basics
- JSON, APIs, HTTP, async basics
- basic SQL
- basic data handling with pandas
- virtual environments, package management, error handling
- FastAPI or Flask
Month 2: Master LLM app development
What to learn:
- prompting fundamentals
- system vs user instructions
- structured outputs / JSON schemas
- function/tool calling
- streaming responses
- conversation state
- cost / latency / token basics
- failure handling
- prompt injection awareness
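Structured outputs and tool calling from the Month 2 list can be sketched together. The JSON shape and the `get_weather` tool below are hypothetical; real APIs (OpenAI, Anthropic) return tool calls in a dedicated response field, but the validate-then-dispatch pattern is the same.

```python
import json

# Stub tool registry for the sketch; real tools would call external services.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(reply_text):
    # Parse the model's structured output and validate before executing.
    call = json.loads(reply_text)
    if call.get("tool") not in TOOLS:
        raise ValueError(f"unknown tool: {call.get('tool')}")
    return TOOLS[call["tool"]](**call["arguments"])

# Hypothetical raw model reply containing a tool call as JSON.
raw_reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(raw_reply)  # → "Sunny in Paris"
```

Validating the tool name before dispatching is also your first line of defense against prompt injection steering the model toward tools it should not use.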
Month 3: Learn RAG properly
What to learn:
- embeddings
- chunking
- vector databases
- metadata filtering
- reranking
- retrieval quality issues
- hallucination reduction
- citations and grounding
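Chunking from the Month 3 list is worth seeing concretely. A minimal sketch of fixed-size chunking with overlap, so text split at a boundary still appears whole in at least one chunk (the sizes are arbitrary; real pipelines often chunk by sentences or tokens instead of characters):

```python
def chunk(text, size=100, overlap=20):
    # Slide a window of `size` characters forward by `size - overlap` each step,
    # so consecutive chunks share `overlap` characters.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(250))  # 250-char dummy document
chunks = chunk(text)
```

Each chunk would then be embedded and stored in a vector database, with the overlap reducing the chance that a fact straddling a boundary is unretrievable.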
Month 4: Agents, tools, workflows, evals
- agent loops
- tool selection
- state management
- retries
- when NOT to use agents
- multi-step workflows
- evaluation harnesses
- task success metrics
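The Month 4 list condenses into one core pattern: a loop that runs a multi-step plan, retries transient failures, and tracks state. This sketch is framework-free and the "flaky search" tool is invented to show the retry path:

```python
def run_agent(task, tools, plan, max_retries=2):
    # Minimal agent loop: run each planned step, retry on failure, carry state.
    state = {"task": task, "results": []}
    for step in plan:                          # multi-step workflow
        for attempt in range(max_retries + 1):
            try:
                state["results"].append(tools[step](state))
                break                          # step succeeded, move on
            except Exception:
                if attempt == max_retries:
                    raise                      # retry budget exhausted
    return state

calls = {"n": 0}
def flaky_search(state):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient error")  # fails once, then succeeds
    return "search ok"

def summarize(state):
    return f"summary of {len(state['results'])} result(s)"

state = run_agent("demo", {"search": flaky_search, "summarize": summarize},
                  plan=["search", "summarize"])
```

When the plan is this fixed and linear, you arguably do not need an agent at all, which is the "when NOT to use agents" point above.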
Month 5: Deployment, product thinking, and reliability
What to learn:
- FastAPI production patterns
- Docker
- background jobs
- queues
- auth + API key security
- logging
- observability
- prompt/version management
- eval dashboards
- cost monitoring
- rate limits
- caching
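Caching from the Month 5 list is the cheapest reliability and cost win. A minimal sketch of prompt-level response caching with a TTL, so identical prompts within the window skip the model call (in production you would back this with Redis rather than a dict):

```python
import hashlib
import time

class ResponseCache:
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}

    def _key(self, prompt):
        # Hash the prompt so arbitrary-length text maps to a fixed-size key.
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt):
        entry = self.store.get(self._key(prompt))
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None  # miss: caller pays for a real model call

    def set(self, prompt, response):
        self.store[self._key(prompt)] = (response, time.time())

cache = ResponseCache(ttl_seconds=60)
cache.set("What is RAG?", "Retrieval-augmented generation.")
hit = cache.get("What is RAG?")
miss = cache.get("Unseen prompt")
```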
Month 6: Specialize and become hireable
The knowledge and skills you have built can be applied in three directions.
Choose one of them and focus your practice there.
(Everything above is also best learned through practice, not just reading.)
Direction 1: AI product engineer
Best if you want startup jobs fast
Focus on:
- LLM apps
- RAG
- agents
- deployment
- product UX
Direction 2: Applied ML / LLM engineer
Focus on:
- fine-tuning
- when to fine-tune vs prompt
- evaluation
- inference optimization
- open-source models
- training pipelines
Direction 3: AI automation engineer
Focus on:
- workflow orchestration
- business process automation
- multi-tool systems
- CRM, docs, email, support, ops use cases
This roadmap gives you a practical path: study each point, then test it in real work.
By month six, you will have several shipped projects or worked examples of completed tasks.
And that makes it much easier to get hired as an AI engineer.
Save this post so you don't lose it and can come back to it later.
𝗔𝗜 𝗔𝗴𝗲𝗻𝘁’𝘀 𝗠𝗲𝗺𝗼𝗿𝘆 is the most important piece of 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴. Here is how we define it 👇
In general, an agent's memory is context we provide in the prompt passed to the LLM that helps the agent plan and react based on past interactions or data that is not immediately available.
It is useful to group memory into four types:
𝟭. 𝗘𝗽𝗶𝘀𝗼𝗱𝗶𝗰 - Past interactions and actions performed by the agent. After an action is taken, the application controlling the agent stores it in persistent storage so it can be retrieved later if needed. A good example is a vector database storing the semantic meaning of past interactions.
𝟮. 𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 - Any external information available to the agent and any knowledge the agent should have about itself. Think of it as the grounding context used in RAG applications: internal knowledge only the agent can access, or a slice of internet-scale data isolated for more accurate answers.
𝟯. 𝗣𝗿𝗼𝗰𝗲𝗱𝘂𝗿𝗮𝗹 - Systemic information such as the structure of the system prompt, available tools, guardrails etc. It is usually stored in Git, prompt registries, and tool registries.
𝟰. 𝗦𝗵𝗼𝗿𝘁-𝗧𝗲𝗿𝗺 (𝗪𝗼𝗿𝗸𝗶𝗻𝗴) - When needed for the task at hand, the agent application pulls information from the long-term types above and stores it locally. Everything pulled together this way is the short-term, or working, memory. Compiling it produces the prompt passed to the LLM, whose output drives the next actions taken by the system.
We usually label 1.-3. as Long-Term memory and 4. as Short-Term memory.
And that is it! The rest is all about how you architect the topology of your Agentic Systems.
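The memory types above can be sketched as one compile step. This is a toy illustration, not any specific framework's API; the field names and example contents are invented.

```python
# Long-term memory, split by type (contents are illustrative).
episodic = ["User previously asked about the refund policy."]   # past interactions
semantic = ["Refunds are allowed within 30 days."]              # external knowledge
procedural = "You are a support agent. Tools: search_docs."     # system prompt + tools

def compile_working_memory(task, episodic, semantic, procedural):
    # Pull the relevant long-term pieces into one prompt: the working memory.
    return "\n".join([
        procedural,
        "Relevant knowledge:\n" + "\n".join(semantic),
        "Past interactions:\n" + "\n".join(episodic),
        f"Current task: {task}",
    ])

prompt = compile_working_memory("Handle a refund request",
                                episodic, semantic, procedural)
```

In a real system the episodic and semantic pieces would be retrieved (e.g. from a vector database) rather than passed in whole, but the compile step looks the same.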
Learn all of this hands-on in my End-to-end AI Engineering Bootcamp (we are kicking off in 2 weeks!).
🎁 Get a 15% discount via this link: maven.com/swirl-ai/end-t…
Any war stories you have while managing Agent’s memory? Let me know in the comments 👇
Google dropped another banger!
PaperBanana is an agentic framework that generates publication-ready academic illustrations from methodology descriptions.
No manual design, no Figma: just your method section and a caption.
I'm hiring some of the world's top AI researchers to join a new team I'm creating at Mercor, focused on building frontier benchmarks.
If you're an exceptional fit (or know someone who is), DM me!
A small Qwen3.5 from-scratch reimplementation for edu purposes: github.com/rasbt/LLMs-fro…
(probably the best "small" LLM today for on-device tinkering)