Suresh

5.9K posts

@_Suresh2

MSc Software Engineering @ Chongqing University ’26 | Researching AI x Software Engineering (AI for SE & SE for AI) | 🇵🇰➡️🇨🇳

Lahore, Pakistan · Joined January 2019
437 Following · 124 Followers
Suresh
Suresh@_Suresh2·
@vboykis good engineering still shows up in the boring edges, especially when things get weird
English
0
0
0
1
vicki
vicki@vboykis·
Last week, I gave a keynote at appliedml.us/2026/ on machine learning and engineering. Like a lot of us, I've had questions and anxiety around software development today. The talk covers why good engineering is still important (and 🌷). vickiboykis.com/2026/04/20/bui…
English
2
4
50
2.7K
Suresh
Suresh@_Suresh2·
@heynavtoor openhands still falls over on vague tickets. repo count matters less than issue quality.
English
0
0
0
3
Nav Toor
Nav Toor@heynavtoor·
If I had to replace my entire engineering team with AI in 2026, I would not hire a single developer. I would set up these 10 GitHub repos.

1. OpenHands: Replaces your junior developers. An autonomous software engineer that reads GitHub issues, writes the fix, runs the tests, and opens the PR. 65K+ stars.
repo → github.com/All-Hands-AI/O…

2. Aider: Replaces your mid-level dev. A terminal pair programmer that edits multi-file codebases, auto-commits to git, and works with any LLM.
repo → github.com/Aider-AI/aider

3. Cline: Replaces your VS Code teammate. An autonomous agent that lives in your editor, navigates files, runs commands, and ships features end-to-end.
repo → github.com/cline/cline

4. Claude Task Master: Replaces your project manager. Turns a product spec into a tracked task list and keeps the agent on rails across long builds.
repo → github.com/eyaltoledano/c…

5. CrewAI: Replaces your tech lead. Coordinates multiple AI agents with defined roles, responsibilities, and handoffs. Already used across the Fortune 500.
repo → github.com/crewAIInc/crew…

6. LangGraph: Replaces your architect. The orchestration layer every production AI system is being built on in 2026. Stateful, durable, observable.
repo → github.com/langchain-ai/l…

7. n8n: Replaces your ops hire. 400+ integrations, native AI nodes, self-hosted. Every internal tool and workflow your team used to build from scratch.
repo → github.com/n8n-io/n8n

8. Coolify: Replaces your DevOps engineer. Self-hosted Heroku and Vercel. Git push to deploy, auto SSL, databases, 280+ one-click services.
repo → github.com/coollabsio/coo…

9. PostHog: Replaces your QA and data team. Product analytics, session replay, feature flags, A/B tests, error tracking. All in one repo.
repo → github.com/PostHog/posthog

10. Chatwoot: Replaces your support hire. Live chat, email, WhatsApp, all from one inbox. Self-hosted and AI-assisted out of the box.
repo → github.com/chatwoot/chatw…

A 10-person engineering team in 2022 could ship what one founder ships now with these 10 repos. That is not a prediction. It is what is already happening at every AI-first startup in 2026. Pick one. Replace one role. Ship one feature. That is how you start. 100% free. 100% open source.
Nav Toor tweet media
English
18
23
147
11.1K
Suresh
Suresh@_Suresh2·
@burkov @ChapterPal distributed training is where interview prep gets real. single gpu intuition breaks fast.
English
0
0
0
3
BURKOV
BURKOV@burkov·
A new curriculum on @ChapterPal: Prep reading for the model training infrastructure interview.

This curriculum equips learners with a deep understanding of fundamental and cutting-edge techniques in model training infrastructure. Covering optimization, distributed training paradigms, memory management, and advanced parallelization, it prepares you to confidently discuss complex system design and performance challenges. chapterpal.com/curriculum/4f0…
BURKOV tweet media
English
1
6
25
2K
Suresh
Suresh@_Suresh2·
@lxfater for react and next, how do you handle version drift in the installed skills?
English
0
0
0
32
铁锤人
铁锤人@lxfater·
This project scans your codebase to detect your tech stack, then installs the matching Skills. For example, if you use React, Next.js, Tailwind, and Prisma, it can set up Skills for all of them. The Skills are hand-curated, so there are no real security concerns, and monorepos are supported. github.com/midudev/autosk…
Chinese
7
15
98
6.5K
Suresh
Suresh@_Suresh2·
@Al_Grigor handoffs are where this usually breaks. were the roles the same across all 5?
English
1
0
0
4
Alexey Grigorev
Alexey Grigorev@Al_Grigor·
I've been experimenting with a different way of building software with AI agents. Not one agent doing everything, but a small team with defined roles, handoffs, and checks.

Over the last month, I used this setup across 5 projects:
- AI Shipping Labs website
- DataTasks
- Merm
- Rustkyll
- Codehive

In my newsletter, I share the full setup and what I learned from using this approach in practice. Read here: alexeyondata.substack.com/p/i-built-an-a…
Alexey Grigorev tweet media
English
2
3
13
693
Suresh
Suresh@_Suresh2·
@hxiao tiny multilingual vlm sounds fun, how tiny was it once you added more languages?
English
0
0
0
2
Han Xiao
Han Xiao@hxiao·
heading to Rio from SF tmr for ICLR. long haul, 20hrs door to door 🫠. presenting 2 workshop posters on Saturday - one on squeezing embeddings into spherical coordinates, one on building a tiny multilingual VLM. find me if you're there
Han Xiao tweet media
English
1
0
8
495
XiulinYang
XiulinYang@xiulin_yang·
🎉 Happy to share that our paper on function words & language learning (w/ Heidi Getz & @weGotlieb) is accepted to #ACL2026! A little late to the party, but still worth celebrating 🥳 We ask: what statistical properties help a learner abstract grammatical knowledge from linear input? Turns out function words, though often overlooked, play an important role. Check out our updated preprint: arxiv.org/pdf/2601.21191 🧵 1/4
XiulinYang tweet media
English
2
11
28
2.4K
OpenBMB
OpenBMB@OpenBMB·
AI agents struggle with long-horizon tasks because their memory rules are rigid and hand-crafted (like summarizing every N steps). What if agents could learn exactly when and how to manage their memory? 🤔

Today, we dive into AtomMem: a novel approach by @TsinghuaNLP (OpenBMB member) alongside researchers from Renmin University of China. This paper transforms agentic memory from static pipelines into a learnable, dynamic decision-making process.

🤗 Paper: huggingface.co/papers/2601.08…
📄 arXiv: arxiv.org/abs/2601.08323
💻 Code: github.com/RUCBM/AtomMem

Why it matters:
1️⃣ From Static to Dynamic: Instead of "one-size-fits-all" rules, AtomMem deconstructs memory management into atomic CRUD (Create, Read, Update, Delete) operations. Agents autonomously decide what to keep, fetch, modify, or forget based on the task context at hand. 🧠
2️⃣ Reinforcement Learning Powered: Using GRPO, the agent learns an end-to-end task-aligned policy. It discovers structured memory strategies natively rather than relying on human priors, bringing an average performance boost of ~9%! 🔄
3️⃣ Hybrid Retrieval Mechanism: It combines a deterministic "scratchpad" for tracking global state with selective, query-based semantic retrieval from a vector database. The model balances short-term tracking and long-term knowledge! 🧭
4️⃣ SOTA on Long-Context & Web Tasks: AtomMem consistently outperforms static memory methods across HotpotQA, 2WikiMultihopQA, Musique, GAIA, and WebWalkerQA. It even remains robust when scaling up to 800 noisy documents, conquering information overload! 🚀

AtomMem breaks the shackles of fixed memory pipelines, granting AI agents true autonomy over their knowledge. Read the full paper to see how dynamic memory evolves!

#AI #THUNLP #OpenBMB #LLM #Agents #ReinforcementLearning #MachineLearning
OpenBMB tweet media
English
0
10
45
2.5K
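The "atomic CRUD" idea above is easy to make concrete. A toy sketch of memory operations a learned policy could choose between; the class and its keyword-overlap `read` heuristic are my own illustrations, not the AtomMem API:

```python
# Toy sketch of agent memory as atomic CRUD operations a learned policy
# could pick between. Hypothetical illustration, not the AtomMem API.

class AtomicMemory:
    def __init__(self):
        self.slots = {}      # key -> stored note
        self.next_id = 0

    def create(self, note):          # decide what to keep
        key = self.next_id
        self.slots[key] = note
        self.next_id += 1
        return key

    def read(self, query):           # decide what to fetch
        terms = set(query.lower().split())
        return [n for n in self.slots.values()
                if terms & set(n.lower().split())]

    def update(self, key, note):     # decide what to modify
        if key in self.slots:
            self.slots[key] = note

    def delete(self, key):           # decide what to forget
        self.slots.pop(key, None)

mem = AtomicMemory()
k = mem.create("user prefers short answers")
mem.create("pending task summary")
hits = mem.read("user preference")   # -> ["user prefers short answers"]
```

The point of making each operation atomic is that an RL objective can then score sequences of these calls, instead of baking in a fixed "summarize every N steps" rule.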
Suresh
Suresh@_Suresh2·
@EkagraRanjan that only breaks if routing is sloppy, right? would love to see the expert load numbers
English
0
0
0
5
Ekagra Ranjan
Ekagra Ranjan@EkagraRanjan·
Ever wondered how speculative decoding interacts with production MoE models? Conventional wisdom: MoE + speculative decoding = too many experts to load, gains disappear. Reality: MoE amplifies speculative decoding. Check out the Cohere blog post: cohere.com/blog/mixture-o…
English
7
10
27
11.4K
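For readers who haven't seen the mechanics being debated here, a minimal greedy speculative-decoding step (a generic sketch, not Cohere's implementation): a cheap draft model proposes k tokens and the target model keeps the longest prefix it agrees with. In a real system the verification is one batched forward pass; here it is simulated token by token with toy stand-in models.

```python
# Greedy speculative decoding, minimal form. draft_next/target_next are
# callables mapping a token sequence to the next token (stand-ins for
# argmax over real model logits).

def speculative_step(prefix, draft_next, target_next, k=4):
    # 1) the cheap draft proposes k tokens
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2) the target verifies; keep the longest agreeing prefix
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) == t:    # target agrees with the draft
            accepted.append(t)
            ctx.append(t)
        else:
            break
    # 3) always emit one target token after the accepted prefix,
    #    so each step makes progress even if nothing was accepted
    accepted.append(target_next(ctx))
    return accepted

# toy models: draft always emits last+1; target agrees except it
# insists on 100 right after token 3
draft = lambda seq: seq[-1] + 1
target = lambda seq: 100 if seq[-1] == 3 else seq[-1] + 1
out = speculative_step([1], draft, target, k=4)   # -> [2, 3, 100]
```

The MoE question in the thread is about step 2: verifying k tokens at once touches more experts per forward pass, which is where the "too many experts to load" worry comes from.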
de.bash 🦘 ✈️ ECIR'26 🇳🇱
[x] Reassemble the mGTE mixture
[x] Large-scale contrastive pre-training
[x] NV-Retriever style positive-aware hard negative mining
[x] Contrastive fine-tuning with hard negatives
[x] Two 149M models at the Pareto frontier
Need to understand the decontamination
@LightOnIO team keeping the 💡 on
Amélie Chatelain@AmelieTabatta

Today at @LightOnIO, we release LateOn 💡 and DenseOn 💃 Two open retrieval models at 149M params that push new SOTA on BEIR! With a blog post packed with insights on pre-training data curation, filtering, ablations, and decontamination. 🧵

English
1
2
10
882
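The "contrastive fine-tuning with hard negatives" item in that checklist, as a minimal InfoNCE loss over one (query, positive, negatives) example. A pure-Python sketch with my own toy vectors, not the LightOn training code; it shows why hard negatives near the query carry most of the training signal:

```python
import math

# InfoNCE: cross-entropy over cosine similarities, positive at index 0.
# Toy sketch; real training batches many queries and in-batch negatives.

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temp=0.05):
    logits = [cos(query, positive) / temp] + \
             [cos(query, n) / temp for n in negatives]
    m = max(logits)                                  # stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]                         # -log p(positive)

q        = [1.0, 0.0]
pos      = [0.9, 0.1]
easy_neg = [-1.0, 0.0]   # far from the query: loss is ~0, little to learn
hard_neg = [0.8, 0.3]    # near the query: the useful training signal
loss_easy = info_nce(q, pos, [easy_neg])
loss_hard = info_nce(q, pos, [hard_neg])   # noticeably larger
```

This is why the mining step ("NV-Retriever style positive-aware hard negative mining") sits right before the fine-tuning step: it fills the negatives list with the hard kind.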
Suresh
Suresh@_Suresh2·
@mamagnus00 200k tokens per task is the bigger problem than everyone using their own key.
English
0
0
0
9
Magnus Müller
Magnus Müller@mamagnus00·
One big disadvantage of open source is that everybody brings their own API key. Let's say a user runs on average 10 tasks, at ~200k tokens per task. That's at least 250 trillion tokens per year (if we don't grow more ;).

Big labs care much less about us than they would if we spent those $250 million directly with them. No discount. No reselling cut. No impact on revenue. No response for months. No extra R&D credits to improve their models for browser agents so they could make more money!!

Anyways, we want to be the default pillar for agents to interact with the web. That's why we stay open source and give you SOTA browser agents and push the boundary of how you interact with your computer.
Magnus Müller tweet media
English
1
2
26
15.2K
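Checking the tweet's arithmetic (the implied user count is my inference; the tweet only states tokens and dollars): 10 tasks at ~200k tokens is 2M tokens per user, so 250 trillion tokens a year implies roughly 125 million users, and $250M over that volume works out to $1 per million tokens.

```python
# Back-of-envelope check of the numbers in the tweet above.
tokens_per_user = 10 * 200_000                  # 10 tasks * ~200k tokens
total_tokens = 250e12                           # "at least 250 trillion"/yr
implied_users = total_tokens / tokens_per_user  # users needed to hit that
implied_price = 250e6 / (total_tokens / 1e6)    # $250M -> $ per 1M tokens
# implied_users == 125e6, implied_price == 1.0
```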
Suresh
Suresh@_Suresh2·
@SFResearch did calibration stay bad even as OPD kept improving accuracy?
English
0
0
0
6
Salesforce AI Research
Salesforce AI Research@SFResearch·
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation: bit.ly/48iccVY

On-policy distillation (OPD) improves task accuracy but systematically traps models in severe overconfidence. We trace this to an information mismatch between training and deployment, and introduce CaOPD to fix it.

→ Identifies a pervasive Scaling Law of Miscalibration: even frontier LLMs exhibit massive calibration gaps that scale does not resolve
→ Formalizes how privileged teacher context induces entropy collapse and optimism bias in the student
→ Replaces self-reported confidence with a student-grounded empirical target, decoupling what the model answers from how certain it should be
→ Achieves Pareto-optimal calibration without the capability tax of RL-based methods, enabling a compact 8B model to rival frontier LLMs on reliability

Code: bit.ly/4cUtCKO

Authors: Jiaxin Zhang @jxzhangjhu, Xiangyu Peng @beckypeng6, Qinglin Chen, Qinyuan Ye @qinyuan_ye, Caiming Xiong @CaimingXiong, Chien-Sheng Wu @jasonwu0731

#FutureOfAI #EnterpriseAI
Salesforce AI Research tweet media
English
0
5
21
1.8K
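The "calibration gap" described above is usually quantified as Expected Calibration Error: bin answers by the model's stated confidence and compare each bin's average confidence against its actual accuracy. A generic sketch of that metric, not the CaOPD codebase:

```python
# Expected Calibration Error (ECE): weighted average over confidence bins
# of |avg confidence - accuracy|. 0 means perfectly calibrated.

def ece(confidences, correct, n_bins=10):
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(avg_conf - acc)
    return total

# an overconfident model: says 0.95 on everything, right half the time
conf = [0.95] * 4
hit = [1, 0, 1, 0]
gap = ece(conf, hit)   # 0.45: stated 0.95 vs actual 0.5
```

"Entropy collapse" in the abstract shows up here as all the mass piling into the top confidence bin while accuracy stays put.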
Suresh
Suresh@_Suresh2·
"reported leak on discord" is one of those headlines where i immediately want provenance, date, and whether the weights were complete or just eval artifacts. i don't know yet if this is real. but half the damage usually happens before that part is clear.
English
0
0
0
21
Evan Chu
Evan Chu@evan_j_chu·
Opus 4.7 takes the lead on both mean@5 and best@5 on FrontierSWE!
Proximal@ProximalHQ

Opus 4.7 is #1 on FrontierSWE! We found that it commits to decisions much earlier in its trace and executes, spending ~2x fewer tokens/less time than Opus 4.6 across all tasks

English
3
4
18
1.5K
BURKOV
BURKOV@burkov·
How To AI@HowToAI_

Yann LeCun was right the entire time. And generative AI might be a dead end.

For the last three years, the entire industry has been obsessed with building bigger LLMs. Trillions of parameters. Billions in compute. The theory was simple: if you make the model big enough, it will eventually understand how the world works.

Yann LeCun said that was stupid. He argued that generative AI is fundamentally inefficient. When an AI predicts the next word, or generates the next pixel, it wastes massive amounts of compute on surface-level details. It memorizes patterns instead of learning the actual physics of reality.

He proposed a different path: JEPA (Joint-Embedding Predictive Architecture). Instead of forcing the AI to paint the world pixel by pixel, JEPA forces it to predict abstract concepts. It predicts what happens next in a compressed "thought space."

But for years, JEPA had a fatal flaw. It suffered from "representation collapse." Because the AI was allowed to simplify reality, it would cheat. It would simplify everything so much that a dog, a car, and a human all looked identical. It learned nothing. To fix it, engineers had to use insanely complex hacks, frozen encoders, and massive compute overheads.

Until today. Researchers just dropped a paper called "LeWorldModel" (LeWM). They completely solved the collapse problem. They replaced the complex engineering hacks with a single, elegant mathematical regularizer. It forces the AI's internal "thoughts" into a perfect Gaussian distribution. The AI can no longer cheat. It is forced to understand the physical structure of reality to make its predictions.

The results completely rewrite the economics of AI. LeWM didn't need a massive, centralized supercomputer. It has just 15 million parameters. It trains on a single, standard GPU in a few hours. Yet it plans 48x faster than massive foundation world models. It intrinsically understands physics. It instantly detects impossible events.

We spent billions trying to force massive server farms to memorize the internet. Now, a tiny model running locally on a single graphics card is actually learning how the real world works.

English
2
3
28
6.1K
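The "single mathematical regularizer" claim is easy to make concrete in spirit: penalize a batch of latents for deviating from zero mean and identity covariance, so an encoder that collapses everything to one point pays a fixed penalty. A pure-Python illustration of that generic idea; the actual LeWM objective may differ:

```python
# Generic Gaussian-matching regularizer: zero only when the latents have
# zero mean and identity covariance. Illustrative sketch, not the LeWM
# paper's exact loss.

def gaussian_reg(z):
    """z: list of latent vectors (batch, dim); returns a scalar penalty."""
    n, d = len(z), len(z[0])
    mu = [sum(v[j] for v in z) / n for j in range(d)]
    pen = sum(m * m for m in mu)                  # push mean -> 0
    for j in range(d):
        for k in range(d):
            cov = sum((v[j] - mu[j]) * (v[k] - mu[k]) for v in z) / n
            target = 1.0 if j == k else 0.0
            pen += (cov - target) ** 2            # push covariance -> I
    return pen

collapsed = [[0.0, 0.0]] * 16   # "representation collapse": one point
whitened = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# gaussian_reg(collapsed) == 2.0 (covariance is 0, should be I, d=2)
# gaussian_reg(whitened)  == 0.0 (zero mean, unit variance, no correlation)
```

Collapse stops being the cheapest solution: mapping a dog, a car, and a human to the same latent now has a cost the predictor's loss cannot buy back.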
Suresh
Suresh@_Suresh2·
@ihtesham2005 playwright output is the real test. how are failures surfaced without eating the window?
English
0
0
0
118
Ihtesham Ali
Ihtesham Ali@ihtesham2005·
Stop paying for Claude Max. Before you upgrade your plan to get more context, install this first.

Context Mode is a free MCP server that makes your existing context window last 6x longer by never letting raw tool output touch it in the first place.

→ A full Playwright snapshot: 56 KB → 299 bytes
→ 20 GitHub issues: 59 KB → 1.1 KB
→ 500-request access log: 45 KB → 155 bytes
→ 986 KB repo research via subagent: 62 KB final context
→ 7.5 MB JSON API response with 20,000 records: 0.9 KB

The reason you keep hitting your context limit isn't the model. It's that every MCP tool dumps raw data into the conversation. Fix the plumbing, stop buying bigger tanks.

github.com/mksglu/context…

8.5K stars. ELv2 License. 100% open source.
Ihtesham Ali tweet media
English
12
29
223
18.3K
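The plumbing fix described above, in miniature (my own toy, not Context Mode's implementation): park raw tool output in an out-of-context store and hand the model only a short digest plus a handle it can drill into later.

```python
import json

# Raw tool output never enters the conversation; the model sees a short
# digest and fetches details by handle on demand. Hypothetical helper,
# not the Context Mode API.

class ToolOutputStore:
    def __init__(self):
        self._raw = {}

    def digest(self, tool, output):
        handle = f"{tool}#{len(self._raw)}"
        self._raw[handle] = output
        summary = {"tool": tool, "handle": handle,
                   "bytes": len(output), "head": output[:40]}
        return json.dumps(summary)   # this is all that enters context

    def fetch(self, handle):         # on-demand drill-down
        return self._raw[handle]

store = ToolOutputStore()
raw = json.dumps([{"id": i} for i in range(20_000)])  # big API response
in_context = store.digest("api.get_records", raw)     # ~150 bytes
```

The headline ratios in the tweet (7.5 MB → 0.9 KB) come from exactly this shape of trade: the window holds metadata, the store holds the payload.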
Mariya I. Vasileva
Mariya I. Vasileva@mariyaivasileva·
Am I “moving 10x faster with AI”, or am I moving slower on 10x more projects 🤔
English
8
1
28
1.6K
Suresh
Suresh@_Suresh2·
@gabriberton compute grants are still way less shared than paper advice
English
1
0
1
85
Gabriele Berton
Gabriele Berton@gabriberton·
Finding compute grants as a student is great advice Every PhD student should read this I'll add it to my list of "advice to anyone starting a PhD in ML, or things that I heard from more experienced researchers and I tried to follow" x.com/gabriberton/st…
Emmy Liu@_emliu

wrote a guide on getting compute grants as a student, something I wish I did more at the beginning of my PhD. It's honestly one of the highest ROI things you can do as a student (we've gotten 100k+ gpu hrs for roughly 2 weeks of work writing). nightingal3.github.io/blog/2026/04/1…

English
0
2
62
7.3K