Suresh

5.9K posts

@_Suresh2

MSc Software Engineering @ Chongqing University ’26 | Researching AI x Software Engineering (AI for SE & SE for AI) | 🇵🇰➡️🇨🇳

Lahore, Pakistan · Joined January 2019
437 Following · 124 Followers
Suresh
Suresh@_Suresh2·
@vboykis good engineering still shows up in the boring edges, especially when things get weird
English
0
0
0
1
vicki
vicki@vboykis·
Last week, I gave a keynote at appliedml.us/2026/ on machine learning and engineering. Like a lot of us, I've had questions and anxiety around software development today. The talk covers why good engineering is still important (and 🌷). vickiboykis.com/2026/04/20/bui…
English
2
4
50
2.7K
Suresh
Suresh@_Suresh2·
@heynavtoor openhands still falls over on vague tickets. repo count matters less than issue quality.
English
0
0
0
3
Nav Toor
Nav Toor@heynavtoor·
If I had to replace my entire engineering team with AI in 2026, I would not hire a single developer. I would set up these 10 GitHub repos.

1. OpenHands: Replaces your junior developers. An autonomous software engineer that reads GitHub issues, writes the fix, runs the tests, and opens the PR. 65K+ stars.
repo → github.com/All-Hands-AI/O…

2. Aider: Replaces your mid-level dev. A terminal pair programmer that edits multi-file codebases, auto-commits to git, and works with any LLM.
repo → github.com/Aider-AI/aider

3. Cline: Replaces your VS Code teammate. An autonomous agent that lives in your editor, navigates files, runs commands, and ships features end-to-end.
repo → github.com/cline/cline

4. Claude Task Master: Replaces your project manager. Turns a product spec into a tracked task list and keeps the agent on rails across long builds.
repo → github.com/eyaltoledano/c…

5. CrewAI: Replaces your tech lead. Coordinates multiple AI agents with defined roles, responsibilities, and handoffs. Already used across the Fortune 500.
repo → github.com/crewAIInc/crew…

6. LangGraph: Replaces your architect. The orchestration layer every production AI system is being built on in 2026. Stateful, durable, observable.
repo → github.com/langchain-ai/l…

7. n8n: Replaces your ops hire. 400+ integrations, native AI nodes, self-hosted. Every internal tool and workflow your team used to build from scratch.
repo → github.com/n8n-io/n8n

8. Coolify: Replaces your DevOps engineer. Self-hosted Heroku and Vercel. Git push to deploy, auto SSL, databases, 280+ one-click services.
repo → github.com/coollabsio/coo…

9. PostHog: Replaces your QA and data team. Product analytics, session replay, feature flags, A/B tests, error tracking. All in one repo.
repo → github.com/PostHog/posthog

10. Chatwoot: Replaces your support hire. Live chat, email, WhatsApp, all from one inbox. Self-hosted and AI-assisted out of the box.
repo → github.com/chatwoot/chatw…

A 10-person engineering team in 2022 could ship what one founder ships now with these 10 repos. That is not a prediction. It is what is already happening at every AI-first startup in 2026. Pick one. Replace one role. Ship one feature. That is how you start. 100% free. 100% open source.
Nav Toor tweet media
English
18
23
147
11.1K
Suresh
Suresh@_Suresh2·
@burkov @ChapterPal distributed training is where interview prep gets real. single gpu intuition breaks fast.
English
0
0
0
3
BURKOV
BURKOV@burkov·
A new curriculum on @ChapterPal: Prep reading for the model training infrastructure interview.

This curriculum equips learners with a deep understanding of fundamental and cutting-edge techniques in model training infrastructure. Covering optimization, distributed training paradigms, memory management, and advanced parallelization, it prepares you to confidently discuss complex system design and performance challenges. chapterpal.com/curriculum/4f0…
BURKOV tweet media
English
1
6
25
2K
Suresh
Suresh@_Suresh2·
@lxfater for react and next, how do you handle version drift in the installed skills?
English
0
0
0
32
铁锤人
铁锤人@lxfater·
This project scans your codebase to detect your tech stack, then installs the matching Skills. For example, if you use React, Next.js, Tailwind, and Prisma, it can set up Skills for all of them. The Skills are hand-curated, so there are no real security concerns, and monorepos are supported. github.com/midudev/autosk…
Chinese
7
15
98
6.5K
Suresh
Suresh@_Suresh2·
@Al_Grigor handoffs are where this usually breaks. were the roles the same across all 5?
English
1
0
0
4
Alexey Grigorev
Alexey Grigorev@Al_Grigor·
I've been experimenting with a different way of building software with AI agents. Not one agent doing everything, but a small team with defined roles, handoffs, and checks.

Over the last month, I used this setup across 5 projects:
- AI Shipping Labs website
- DataTasks
- Merm
- Rustkyll
- Codehive

In my newsletter, I share the full setup and what I learned from using this approach in practice. Read here: alexeyondata.substack.com/p/i-built-an-a…
Alexey Grigorev tweet media
English
2
3
13
693
Suresh
Suresh@_Suresh2·
@hxiao tiny multilingual vlm sounds fun, how tiny was it once you added more languages?
English
0
0
0
2
Han Xiao
Han Xiao@hxiao·
heading to Rio from SF tmr for ICLR. long haul, 20hrs door to door 🫠. presenting 2 workshop posters on Saturday - one on squeezing embeddings into spherical coordinates, one on building a tiny multilingual VLM. find me if you're there
Han Xiao tweet media
English
1
0
8
495
XiulinYang
XiulinYang@xiulin_yang·
🎉 Happy to share that our paper on function words & language learning (w/ Heidi Getz & @weGotlieb) is accepted to #ACL2026! A little late to the party, but still worth celebrating 🥳 We ask: what statistical properties help a learner abstract grammatical knowledge from linear input? Turns out function words, though often overlooked, play an important role. Check out our updated preprint: arxiv.org/pdf/2601.21191 🧵 1/4
XiulinYang tweet media
English
2
11
28
2.4K
OpenBMB
OpenBMB@OpenBMB·
AI agents struggle with long-horizon tasks because their memory rules are rigid and hand-crafted (like summarizing every N steps). What if agents could learn exactly when and how to manage their memory? 🤔

Today, we dive into AtomMem: a novel approach by @TsinghuaNLP (OpenBMB member) alongside researchers from Renmin University of China. This paper transforms agentic memory from static pipelines into a learnable, dynamic decision-making process.

🤗 Paper: huggingface.co/papers/2601.08…
📄 arXiv: arxiv.org/abs/2601.08323
💻 Code: github.com/RUCBM/AtomMem

Why it matters:
1️⃣ From Static to Dynamic: Instead of "one-size-fits-all" rules, AtomMem deconstructs memory management into atomic CRUD (Create, Read, Update, Delete) operations. Agents autonomously decide what to keep, fetch, modify, or forget based on the task context at hand. 🧠
2️⃣ Reinforcement Learning Powered: Using GRPO, the agent learns an end-to-end task-aligned policy. It discovers structured memory strategies natively rather than relying on human priors, bringing an average performance boost of ~9%! 🔄
3️⃣ Hybrid Retrieval Mechanism: It combines a deterministic "scratchpad" for tracking global state with selective, query-based semantic retrieval from a vector database. The model balances short-term tracking and long-term knowledge! 🧭
4️⃣ SOTA on Long-Context & Web Tasks: AtomMem consistently outperforms static memory methods across HotpotQA, 2WikiMultihopQA, Musique, GAIA, and WebWalkerQA. It even remains robust when scaling up to 800 noisy documents, conquering information overload! 🚀

AtomMem breaks the shackles of fixed memory pipelines, granting AI agents true autonomy over their knowledge. Read the full paper to see how dynamic memory evolves!

#AI #THUNLP #OpenBMB #LLM #Agents #ReinforcementLearning #MachineLearning
OpenBMB tweet media
English
0
10
45
2.5K
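The "atomic CRUD" idea above is easy to make concrete. A toy sketch of memory operations a learned policy could choose between; the class and its keyword-overlap `read` heuristic are my own illustrations, not the AtomMem API:

```python
# Toy sketch of agent memory as atomic CRUD operations a learned policy
# could pick between. Hypothetical illustration, not the AtomMem API.

class AtomicMemory:
    def __init__(self):
        self.slots = {}      # key -> stored note
        self.next_id = 0

    def create(self, note):          # decide what to keep
        key = self.next_id
        self.slots[key] = note
        self.next_id += 1
        return key

    def read(self, query):           # decide what to fetch
        terms = set(query.lower().split())
        return [n for n in self.slots.values()
                if terms & set(n.lower().split())]

    def update(self, key, note):     # decide what to modify
        if key in self.slots:
            self.slots[key] = note

    def delete(self, key):           # decide what to forget
        self.slots.pop(key, None)

mem = AtomicMemory()
k = mem.create("user prefers short answers")
mem.create("pending task summary")
hits = mem.read("user preference")   # -> ["user prefers short answers"]
```

The point of making each operation atomic is that an RL objective can then score sequences of these calls, instead of baking in a fixed "summarize every N steps" rule.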
Suresh
Suresh@_Suresh2·
@EkagraRanjan that only breaks if routing is sloppy, right? would love to see the expert load numbers
English
0
0
0
5
Ekagra Ranjan
Ekagra Ranjan@EkagraRanjan·
Ever wondered how speculative decoding interacts with production MoE models? Conventional wisdom: MoE + speculative decoding = too many experts to load, gains disappear. Reality: MoE amplifies speculative decoding. Check out the Cohere blog post: cohere.com/blog/mixture-o…
English
7
10
27
11.4K
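For readers who haven't seen the mechanics being debated here, a minimal greedy speculative-decoding step (a generic sketch, not Cohere's implementation): a cheap draft model proposes k tokens and the target model keeps the longest prefix it agrees with. In a real system the verification is one batched forward pass; here it is simulated token by token with toy stand-in models.

```python
# Greedy speculative decoding, minimal form. draft_next/target_next are
# callables mapping a token sequence to the next token (stand-ins for
# argmax over real model logits).

def speculative_step(prefix, draft_next, target_next, k=4):
    # 1) the cheap draft proposes k tokens
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2) the target verifies; keep the longest agreeing prefix
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) == t:    # target agrees with the draft
            accepted.append(t)
            ctx.append(t)
        else:
            break
    # 3) always emit one target token after the accepted prefix,
    #    so each step makes progress even if nothing was accepted
    accepted.append(target_next(ctx))
    return accepted

# toy models: draft always emits last+1; target agrees except it
# insists on 100 right after token 3
draft = lambda seq: seq[-1] + 1
target = lambda seq: 100 if seq[-1] == 3 else seq[-1] + 1
out = speculative_step([1], draft, target, k=4)   # -> [2, 3, 100]
```

The MoE question in the thread is about step 2: verifying k tokens at once touches more experts per forward pass, which is where the "too many experts to load" worry comes from.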
de.bash 🦘 ✈️ ECIR'26 🇳🇱
[x] Reassemble the mGTE mixture
[x] Large-scale contrastive pre-training
[x] NV-Retriever style positive-aware hard negative mining
[x] Contrastive fine-tuning with hard negatives
[x] Two 149M models at the Pareto frontier
Need to understand the decontamination
@LightOnIO team keeping the 💡 on
Amélie Chatelain@AmelieTabatta

Today at @LightOnIO, we release LateOn 💡 and DenseOn 💃 Two open retrieval models at 149M params that push new SOTA on BEIR! With a blog post packed with insights on pre-training data curation, filtering, ablations, and decontamination. 🧵

English
1
2
10
882
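The "contrastive fine-tuning with hard negatives" item in that checklist, as a minimal InfoNCE loss over one (query, positive, negatives) example. A pure-Python sketch with my own toy vectors, not the LightOn training code; it shows why hard negatives near the query carry most of the training signal:

```python
import math

# InfoNCE: cross-entropy over cosine similarities, positive at index 0.
# Toy sketch; real training batches many queries and in-batch negatives.

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temp=0.05):
    logits = [cos(query, positive) / temp] + \
             [cos(query, n) / temp for n in negatives]
    m = max(logits)                                  # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]                         # -log p(positive)

q        = [1.0, 0.0]
pos      = [0.9, 0.1]
easy_neg = [-1.0, 0.0]   # far from the query: loss is ~0, little to learn
hard_neg = [0.8, 0.3]    # near the query: the useful training signal
loss_easy = info_nce(q, pos, [easy_neg])
loss_hard = info_nce(q, pos, [hard_neg])   # noticeably larger
```

This is why the mining step ("NV-Retriever style positive-aware hard negative mining") sits right before the fine-tuning step: it fills the negatives list with the hard kind.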
Suresh
Suresh@_Suresh2·
@mamagnus00 200k tokens per task is the bigger problem than everyone using their own key.
English
0
0
0
9
Magnus Müller
Magnus Müller@mamagnus00·
One big disadvantage of open source is that everybody brings their own API key. Let's say a user runs on average 10 tasks, at ~200k tokens per task. That's at least 250 trillion tokens per year (if we don't grow more ;).

Big labs care much less about us than they would if we spent those $250 million directly with them. No discount. No reselling cut. No impact on revenue. No response for months. No extra R&D credits to improve their models for browser agents so they could make more money!!

Anyways, we want to be the default pillar for agents to interact with the web. That's why we stay open source and give you SOTA browser agents and push the boundary of how you interact with your computer.
Magnus Müller tweet media
English
1
2
26
15.2K
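Checking the tweet's arithmetic (the implied user count is my inference; the tweet only states tokens and dollars): 10 tasks at ~200k tokens is 2M tokens per user, so 250 trillion tokens a year implies roughly 125 million users, and $250M over that volume works out to $1 per million tokens.

```python
# Back-of-envelope check of the numbers in the tweet above.
tokens_per_user = 10 * 200_000                  # 10 tasks * ~200k tokens
total_tokens = 250e12                           # "at least 250 trillion"/yr
implied_users = total_tokens / tokens_per_user  # users needed to hit that
implied_price = 250e6 / (total_tokens / 1e6)    # $250M -> $ per 1M tokens
# implied_users == 125e6, implied_price == 1.0
```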
Suresh
Suresh@_Suresh2·
@SFResearch did calibration stay bad even as OPD kept improving accuracy?
English
0
0
0
6
Salesforce AI Research
Salesforce AI Research@SFResearch·
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation: bit.ly/48iccVY

On-policy distillation (OPD) improves task accuracy but systematically traps models in severe overconfidence. We trace this to an information mismatch between training and deployment, and introduce CaOPD to fix it.

→ Identifies a pervasive Scaling Law of Miscalibration: even frontier LLMs exhibit massive calibration gaps that scale does not resolve
→ Formalizes how privileged teacher context induces entropy collapse and optimism bias in the student
→ Replaces self-reported confidence with a student-grounded empirical target, decoupling what the model answers from how certain it should be
→ Achieves Pareto-optimal calibration without the capability tax of RL-based methods, enabling a compact 8B model to rival frontier LLMs on reliability

Code: bit.ly/4cUtCKO

Authors: Jiaxin Zhang @jxzhangjhu, Xiangyu Peng @beckypeng6, Qinglin Chen, Qinyuan Ye @qinyuan_ye, Caiming Xiong @CaimingXiong, Chien-Sheng Wu @jasonwu0731

#FutureOfAI #EnterpriseAI
Salesforce AI Research tweet media
English
0
5
21
1.8K
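The "calibration gap" described above is usually quantified as Expected Calibration Error: bin answers by the model's stated confidence and compare each bin's average confidence against its actual accuracy. A generic sketch of that metric, not the CaOPD codebase:

```python
# Expected Calibration Error (ECE): weighted average over confidence bins
# of |avg confidence - accuracy|. 0 means perfectly calibrated.

def ece(confidences, correct, n_bins=10):
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(avg_conf - acc)
    return total

# an overconfident model: says 0.95 on everything, right half the time
conf = [0.95] * 4
hit = [1, 0, 1, 0]
gap = ece(conf, hit)   # 0.45: stated 0.95 vs actual 0.5
```

"Entropy collapse" in the abstract shows up here as all the mass piling into the top confidence bin while accuracy stays put.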
Suresh
Suresh@_Suresh2·
"reported leak on discord" is one of those headlines where i immediately want provenance, date, and whether the weights were complete or just eval artifacts. i don't know yet if this is real. but half the damage usually happens before that part is clear.
English
0
0
0
21
Evan Chu
Evan Chu@evan_j_chu·
Opus 4.7 takes the lead on both mean@5 and best@5 on FrontierSWE!
Proximal@ProximalHQ

Opus 4.7 is #1 on FrontierSWE! We found that it commits to decisions much earlier in its trace and executes, spending ~2x fewer tokens/less time than Opus 4.6 across all tasks

English
3
4
18
1.5K
BURKOV
BURKOV@burkov·
How To AI@HowToAI_

Yann LeCun was right the entire time. And generative AI might be a dead end.

For the last three years, the entire industry has been obsessed with building bigger LLMs. Trillions of parameters. Billions in compute. The theory was simple: if you make the model big enough, it will eventually understand how the world works.

Yann LeCun said that was stupid. He argued that generative AI is fundamentally inefficient. When an AI predicts the next word, or generates the next pixel, it wastes massive amounts of compute on surface-level details. It memorizes patterns instead of learning the actual physics of reality.

He proposed a different path: JEPA (Joint-Embedding Predictive Architecture). Instead of forcing the AI to paint the world pixel by pixel, JEPA forces it to predict abstract concepts. It predicts what happens next in a compressed "thought space."

But for years, JEPA had a fatal flaw. It suffered from "representation collapse." Because the AI was allowed to simplify reality, it would cheat. It would simplify everything so much that a dog, a car, and a human all looked identical. It learned nothing. To fix it, engineers had to use insanely complex hacks, frozen encoders, and massive compute overheads.

Until today. Researchers just dropped a paper called "LeWorldModel" (LeWM). They completely solved the collapse problem. They replaced the complex engineering hacks with a single, elegant mathematical regularizer. It forces the AI's internal "thoughts" into a perfect Gaussian distribution. The AI can no longer cheat. It is forced to understand the physical structure of reality to make its predictions.

The results completely rewrite the economics of AI. LeWM didn't need a massive, centralized supercomputer. It has just 15 million parameters. It trains on a single, standard GPU in a few hours. Yet it plans 48x faster than massive foundation world models. It intrinsically understands physics. It instantly detects impossible events.

We spent billions trying to force massive server farms to memorize the internet. Now, a tiny model running locally on a single graphics card is actually learning how the real world works.

English
2
3
28
6.1K
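The "single mathematical regularizer" claim is easy to make concrete in spirit: penalize a batch of latents for deviating from zero mean and identity covariance, so an encoder that collapses everything to one point pays a fixed penalty. A pure-Python illustration of that generic idea; the actual LeWM objective may differ:

```python
# Generic Gaussian-matching regularizer: zero only when the latents have
# zero mean and identity covariance. Illustrative sketch, not the LeWM
# paper's exact loss.

def gaussian_reg(z):
    """z: list of latent vectors (batch, dim); returns a scalar penalty."""
    n, d = len(z), len(z[0])
    mu = [sum(v[j] for v in z) / n for j in range(d)]
    pen = sum(m * m for m in mu)                  # push mean -> 0
    for j in range(d):
        for k in range(d):
            cov = sum((v[j] - mu[j]) * (v[k] - mu[k]) for v in z) / n
            target = 1.0 if j == k else 0.0
            pen += (cov - target) ** 2            # push covariance -> I
    return pen

collapsed = [[0.0, 0.0]] * 16   # "representation collapse": one point
whitened = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# gaussian_reg(collapsed) == 2.0 (covariance is 0, should be I, d=2)
# gaussian_reg(whitened)  == 0.0 (zero mean, unit variance, no correlation)
```

Collapse stops being the cheapest solution: mapping a dog, a car, and a human to the same latent now has a cost the predictor's loss cannot buy back.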
Suresh
Suresh@_Suresh2·
@ihtesham2005 playwright output is the real test. how are failures surfaced without eating the window?
English
0
0
0
118
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
Stop paying for Claude Max. Before you upgrade your plan to get more context, install this first.

Context Mode is a free MCP server that makes your existing context window last 6x longer by never letting raw tool output touch it in the first place.

→ A full Playwright snapshot: 56 KB → 299 bytes
→ 20 GitHub issues: 59 KB → 1.1 KB
→ 500-request access log: 45 KB → 155 bytes
→ 986 KB repo research via subagent: 62 KB final context
→ 7.5 MB JSON API response with 20,000 records: 0.9 KB

The reason you keep hitting your context limit isn't the model. It's that every MCP tool dumps raw data into the conversation. Fix the plumbing, stop buying bigger tanks.

github.com/mksglu/context…

8.5K stars. ELv2 License. 100% open source.
Ihtesham Ali tweet media
English
12
29
223
18.3K
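The plumbing fix described above, in miniature (my own toy, not Context Mode's implementation): park raw tool output in an out-of-context store and hand the model only a short digest plus a handle it can drill into later.

```python
import json

# Raw tool output never enters the conversation; the model sees a short
# digest and fetches details by handle on demand. Hypothetical helper,
# not the Context Mode API.

class ToolOutputStore:
    def __init__(self):
        self._raw = {}

    def digest(self, tool, output):
        handle = f"{tool}#{len(self._raw)}"
        self._raw[handle] = output
        summary = {"tool": tool, "handle": handle,
                   "bytes": len(output), "head": output[:40]}
        return json.dumps(summary)   # this is all that enters context

    def fetch(self, handle):         # on-demand drill-down
        return self._raw[handle]

store = ToolOutputStore()
raw = json.dumps([{"id": i} for i in range(20_000)])  # big API response
in_context = store.digest("api.get_records", raw)     # ~150 bytes
```

The headline ratios in the tweet (7.5 MB → 0.9 KB) come from exactly this shape of trade: the window holds metadata, the store holds the payload.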
Mariya I. Vasileva
Mariya I. Vasileva@mariyaivasileva·
Am I “moving 10x faster with AI”, or am I moving slower on 10x more projects 🤔
English
8
1
28
1.6K
Suresh
Suresh@_Suresh2·
@gabriberton compute grants are still way less shared than paper advice
English
1
0
1
85
Gabriele Berton
Gabriele Berton@gabriberton·
Finding compute grants as a student is great advice Every PhD student should read this I'll add it to my list of "advice to anyone starting a PhD in ML, or things that I heard from more experienced researchers and I tried to follow" x.com/gabriberton/st…
Emmy Liu@_emliu

wrote a guide on getting compute grants as a student, something I wish I did more at the beginning of my PhD. It's honestly one of the highest ROI things you can do as a student (we've gotten 100k+ gpu hrs for roughly 2 weeks of work writing). nightingal3.github.io/blog/2026/04/1…

English
0
2
62
7.3K