Suresh
@_Suresh2
6K posts

MSc Software Engineering @ Chongqing University ’26 | Researching AI x Software Engineering (AI for SE & SE for AI) | 🇵🇰➡️🇨🇳

Lahore, Pakistan · Joined January 2019
438 Following · 125 Followers
Suresh @_Suresh2:
@_vmlops vendored c/c++ is where this gets ugly. lockfiles are the easy part.
Vaishnavi @_vmlops:
GOOGLE BUILT A VULNERABILITY SCANNER AND OPEN-SOURCED IT

most devs ship code without knowing half their dependencies are ticking time bombs

osv-scanner fixes that

it scans your entire project lockfiles, containers, even vendored c/c++ code and maps every dependency against the osv.dev database

supports 11+ ecosystems. npm, pip, cargo, maven, go modules, gem. all of it.

the guided remediation feature is the real unlock... it doesn't just tell you what's broken.... it tells you exactly which version upgrades fix the most issues with the least risk

call analysis built in. so you only get alerts for vulnerable functions your code actually calls. no noise

works offline too. download the db once, scan without internet

one command to scan your whole directory: osv-scanner scan source -r ./

github.com/google/osv-sca…
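The one-liner above generalizes to the other workflows the tweet mentions. A rough sketch, assuming a recent osv-scanner release; subcommand and flag names have shifted between versions, so treat these as illustrative and check `osv-scanner --help` before relying on any of them:

```shell
# Recursively scan a source tree (lockfiles, manifests, vendored deps)
osv-scanner scan source -r ./

# Scan one lockfile explicitly
osv-scanner scan source --lockfile=package-lock.json

# Offline mode: fetch the OSV databases once, then scan without network
# (flag names as in recent docs; may differ in your installed version)
osv-scanner scan source --offline --download-offline-databases -r ./

# Guided remediation: suggest the version upgrades that resolve the
# most findings with the least churn
osv-scanner fix --manifest=package.json --lockfile=package-lock.json
```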
Suresh @_Suresh2:
@mdancho84 the fuzzy part is ai data scientists. half the bugs are routing, not retrieval.
Matt Dancho (Business Science):
Is context engineering just a new name for RAG? Not quite. But they're solving the same problem: building the right context for your LLM. Here's how we got from one to the other — and why it matters for AI data scientists.
Suresh @_Suresh2:
@heynavtoor pricing decisions are where docs fail. decision logs help way more.
Nav Toor @heynavtoor:
Your company's knowledge is trapped. It's scattered across Slack, Google Drive, Confluence, Salesforce, Gmail, Jira, GitHub, Notion, SharePoint, and 30 other tools.

Someone asks "where is the Q3 report?" Nobody knows. Someone asks "what did we decide about pricing?" Nobody remembers. Someone asks "who handled the Samsung deal?" Three people search for 20 minutes. Nobody finds it.

The average employee spends 3.6 hours every day searching for information. That is nearly half the workday. Gone.

Glean charges $50 per user per month to fix this. Minimum $50,000 annual contract. No free tier. No self-hosting. No open source. Your company's most sensitive documents flow through their cloud.

There is an open source alternative. Self-hosted. Your data never leaves your servers. It's called Onyx. 27,700+ stars on GitHub. Netflix uses it. Ramp uses it. Thales Group uses it. Y Combinator backed it. Khosla Ventures funded it.

Here's what it does:
→ Connects to 40+ tools. Slack, Google Drive, Confluence, Salesforce, Gmail, Jira, GitHub, Notion, Zendesk, Gong, Teams, Dropbox, and more.
→ AI chat that answers questions from ALL your company data. With citations.
→ Agentic RAG. Not keyword search. AI that understands what you're asking and retrieves the right answer.
→ Deep Research mode. Multi-step research across all your connected sources. Generates full reports.
→ Custom AI agents with unique instructions, knowledge, and actions.
→ Web search built in. Combines internal knowledge with live internet results.
→ Code execution. Runs Python in sandboxed containers for analysis.
→ SSO via Google, OIDC, or SAML. RBAC. SCIM provisioning.
→ Works with every LLM. OpenAI, Anthropic, Gemini, or self-hosted models via Ollama.
→ Deploy in 30 minutes with Docker.

Here's the wildest part: Glean raised $600 million in venture funding. Valued at $7.2 billion. Charges $50+/user/month with a $50,000 minimum annual contract. Onyx Community Edition is free under MIT License.

Core features like RAG, chat, agents, search, and web browsing are included. The Enterprise Edition adds advanced features like permission-awareness, analytics, whitelabeling, and priority support for teams that need them. Even at the Enterprise tier, Onyx costs a fraction of Glean. A 100-person company on Glean pays $60,000+ per year. The same company on Onyx Enterprise pays well under that.

Deploy it on your own servers. Your data stays on YOUR infrastructure. No third party ever indexes your internal documents.

Backed by Y Combinator. $10M seed from Khosla Ventures and First Round Capital. Community Edition: MIT License. Self-hosted. Free. Enterprise Edition: Advanced security, permissions, and support for teams that scale. 100% Open Source core.
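The "deploy in 30 minutes with Docker" claim follows the usual self-hosted compose pattern. A sketch only: the repo path, compose file name, and port below are assumptions from the Onyx project's quickstart and may have changed, so follow the repo's own deployment docs:

```shell
# Clone the Onyx repo and move to its Docker Compose deployment folder
git clone https://github.com/onyx-dot-app/onyx.git
cd onyx/deployment/docker_compose

# Bring up the full stack (web UI, API server, index, database)
docker compose -f docker-compose.dev.yml up -d

# The web UI is then typically served at http://localhost:3000
```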
Suresh @_Suresh2:
@snsf how does it do on long-running workflows when the spec changes midstream?
Srinivas Narayanan:
Very excited to launch workspace agents in ChatGPT today. Teams can now create shared agents powered by Codex that handle complex tasks and long-running workflows. We have always wanted to build a product that can go beyond helping individuals be more productive to also helping teams be more effective. Workspace agents are designed for that - they can gather context from the right systems, follow team processes, ask for human approval when needed, and keep work moving across tools/teams. Available in research preview in ChatGPT Business, Enterprise, Edu, and Teachers plans. Huge congrats @tarstarr @christinaahuang @_rohanmehta and the workspace agents team! openai.com/index/introduc…
Madza 👨‍💻⚡ @madzadev:
Just came across GLM-5.1 by @BytePlusGlobal and the value-to-performance ratio is genuinely impressive! Claude Opus-level coding quality at $10/month, fully MIT-licensed open source, and backed by ByteDance's global infrastructure for a smooth and stable experience! Plug-and-play with Claude Code, TRAE, Open Claw, and Cursor! This is what an accessible, high-performance coding workflow looks like! Try it out yourself: tinyurl.com/3sknst2d See their official announcement and learn more ↓ #sponsored #ad #BytePlus #CodingPlan #ModelArk
BytePlus @BytePlusGlobal:
GLM-5.1 is now on BytePlus ModelArk Coding Plan. Starting at just $10/month, ModelArk Coding Plan offers a highly cost-efficient way to access GLM-5.1 alongside other advanced coding models.

GLM-5.1 is Z.AI's latest flagship model, MIT-licensed, open-weight, and built for long-horizon agentic coding. GLM-5.1 ranks among the world's top-tier models across leading coding benchmarks, including SWE-Bench Pro.

What you get with ModelArk Coding Plan:
→ Multiple advanced coding models in one subscription: GLM-5.1, Kimi-K2.5, Dola-Seed-2.0-pro, DeepSeek-V3.2, and more. Switch freely or let Auto mode match the best model to the task.
→ Works with the tools you already use: Claude Code, Cursor, Cline, Codex CLI, Kilo Code, Roo Code, OpenCode, and OpenClaw
→ No throttling. Backed by ByteDance's infrastructure.
→ Activated on purchase. Ready to use immediately.

Also new this month: Dreamina Seedance 2.0 is now available on BytePlus, the official API platform for Seedance models. Learn more: byteplus.com/en/product/see…

Refer friends and earn 10% vouchers on every order with no cap. Your friends get 10% off their first subscription too. Get started for $10/month → tinyurl.com/4zvkf9kc

#BytePlus #ModelArk #GLM #AIEngineering #DevTools #AIAgent

Suresh @_Suresh2:
"workspace agents for business" makes me ask one boring thing first: what is the audit trail when the agent edits a doc, pings a customer, and books a meeting off the same prompt. i like the idea. i trust the logs more than the demo.
Suresh @_Suresh2:
@outcome_school 8x helps, but 100k context still gets ugly when retrieval misses
Suresh @_Suresh2:
@shao__meng raw json is the easy part. event ordering gets messy fast
meng shao @shao__meng:
OpenAI Developers released Euphony, a tool for visualizing and browsing Chat data and Codex session logs. Open source: openai.github.io/euphony/

ChatGPT and Codex session logs exist as raw JSON, text, or structured events; the data tends to be verbose and noisy, making it hard to quickly locate problems or spot patterns. Euphony turns this raw data into a browsable visual interface: upload a local file or paste a public URL.

Key features:
· Visualization: turns chat records and session logs into an intuitive browsing view, with timelines, branch structures, or event sequences, giving an overview of the whole interaction.
· Translation: automatic or manual translation of log content, useful for cross-language developers and international teams.
· Filtering and search: filter by keyword, event type (tool calls, reasoning steps, outputs), error code, and so on, to quickly locate the parts that matter.
· Editing: modify logs directly in the interface (e.g. correcting prompts, adjusting context), or export the edited version for reproduction or sharing.
· More: supports Codex-specific session logs (agent loop, tool-use traces, retry mechanisms) and may include event normalization (e.g. standardizing stop reasons and tool-call boundaries).
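To make "normalizing noisy session events" concrete, here is a toy sketch in Python. This is not Euphony's code or schema: the event shapes, `normalize`, and `filter_events` are invented for illustration. It only shows the general technique of reducing heterogeneous raw log events to a uniform, filterable timeline:

```python
import json

# Hypothetical raw session log: a mix of chat messages, tool calls,
# and tool results, as they might appear in an exported JSON file.
RAW_LOG = json.dumps([
    {"type": "message", "role": "user", "content": "list files"},
    {"type": "tool_call", "name": "shell", "arguments": {"cmd": "ls"}},
    {"type": "tool_result", "name": "shell", "output": "a.txt\nb.txt"},
    {"type": "message", "role": "assistant", "content": "Two files found."},
])

def normalize(raw: str) -> list[dict]:
    """Flatten heterogeneous events into uniform {kind, summary} records."""
    out = []
    for ev in json.loads(raw):
        if ev["type"] == "message":
            out.append({"kind": f"msg:{ev['role']}", "summary": ev["content"]})
        elif ev["type"] == "tool_call":
            out.append({"kind": "tool", "summary": f"{ev['name']}({ev['arguments']})"})
        elif ev["type"] == "tool_result":
            # Keep only the first output line as a browsable one-liner.
            out.append({"kind": "result", "summary": ev["output"].splitlines()[0]})
    return out

def filter_events(events: list[dict], kind_prefix: str) -> list[dict]:
    """Type-based filtering, in the spirit of a filter-and-search view."""
    return [e for e in events if e["kind"].startswith(kind_prefix)]

timeline = normalize(RAW_LOG)
tools = filter_events(timeline, "tool")
```

Once events share one shape, timeline rendering, search, and editing all become straightforward list operations.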
OpenAI Developers @OpenAIDevs:
Introducing Euphony, an open-source tool for visualizing chat data and Codex session logs. Paste in a public URL or upload a local file, and Euphony turns the raw data into an easy-to-browse view. It supports translation, filtering, editing, and more.

Suresh @_Suresh2:
@freeman1266 the bigger risk is opening it up to remote mcp, not the model bump
老金 @freeman1266:
Google announced an upgrade to its autonomous research agent, launching Deep Research and Deep Research Max, both built on the Gemini 3.1 Pro model. The core breakthrough is expanded data sources and output formats: it can search the web, any remote MCP server, uploaded files, and connected file storage.
Suresh @_Suresh2:
@emollick letters of rec already feel broken to me
vLLM @vllm_project:
🎉 Great to see vLLM powering Google Gemma 4 on NVIDIA Jetson. The @NVIDIA Jetson team published a tutorial covering the full Gemma 4 family, from E2B / E4B (audio + image + text) to 26B-A4B MoE and 31B dense. vLLM ships native tool calling and reasoning parser support for Gemma 4. Thanks to the @NVIDIARobotics team! Learn more 👇
NVIDIA Robotics @NVIDIARobotics:
What happens when you combine voice, vision, and reasoning on-device? 🤔 Gemma 4 + a vision-language agent (VLA) running on NVIDIA Jetson Orin Nano shows how compact hardware can now handle real-world AI tasks using today’s open models—no cloud required. Get started: nvda.ws/4cU1ebL

Suresh @_Suresh2:
@goodside 11 is a weird target. was iteration 6 off on symmetry or just stitch texture?
Riley Goodside @goodside:
ChatGPT Images 2.0 generates a crocheted doily with 11 petals—i.e. order 11 radial symmetry. (I was unable to find any 11-petal doilies via Google Image Search, and the reasoning trace included seven iterations before this result, so I’m confident it’s somewhat original.)
Suresh @_Suresh2:
@makulas1913 24m vs 2m is wild if the retrieval quality was actually comparable
Mohammed Makulas @makulas1913:
Same task given to the agent: plain RAG consumed 24 million tokens. The OpenViking tool consumed only 2 million tokens, with a 43% higher success rate.

The biggest technical lie we believed when building AI agents is that flat vector databases (Vector DBs) are enough. You dump thousands of files into them, and when the user asks something, the system pulls whatever is semantically similar to the question and crams it into the context window. The result? A terrifying API bill, and hallucinations that break the business logic.

The OpenViking repo (open-sourced by Volcengine) demolishes that idea. The new architecture turns context management into a "file system". The agent works with the data through a viking:// protocol, as if it were browsing a Linux system. It has a dedicated folder for Skills and a folder for Memory. It searches deliberately and with structure, not by blind copy-paste.

The engineering genius of this system is Layered Loading (L0/L1/L2). The agent does not read the whole file. It reads a one-line summary (L0). If it confirms it needs the file to solve the problem, it requests the full details (L2). This simple technique is what cut token consumption by 96% and raised answer accuracy.

If you are building agents for enterprise or ERP projects and relying on blind search in traditional RAG, you are burning your server resources for nothing. The future belongs to structured navigation, not random stuffing. github.com/volcengine/Ope…
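The layered-loading idea is easy to see in a toy model. This is not OpenViking's API: the `viking://` paths, file contents, and word-count cost proxy below are invented for illustration. The point is only that a cheap summary pass (L0) gates the expensive full read (L2):

```python
# Hypothetical corpus: each "file" has a one-line L0 summary and a
# long L2 body. Paths mimic the viking:// convention for flavor only.
FILES = {
    "viking://skills/refunds.md": {
        "l0": "how to process customer refunds",
        "l2": "Refund policy: items may be returned within 30 days of "
              "delivery; refunds go to the original payment method "
              "within 5 business days.",
    },
    "viking://skills/shipping.md": {
        "l0": "shipping rates and carriers",
        "l2": "Shipping handbook: standard ground takes 3-5 days; express "
              "overnight is available for orders placed before 2pm.",
    },
}

def cost(text: str) -> int:
    """Crude token proxy: whitespace word count."""
    return len(text.split())

def answer(query: str) -> tuple[str, int]:
    """Scan cheap L0 summaries; load the full L2 body only on a match."""
    spent = 0
    for path, layers in FILES.items():
        spent += cost(layers["l0"])       # cheap L0 pass, every file
        if query in layers["l0"]:
            spent += cost(layers["l2"])   # expensive L2 load, once
            return layers["l2"], spent
    return "", spent

detail, layered_cost = answer("refunds")
# Naive baseline: stuff every full document into context.
flat_cost = sum(layers and cost(layers["l2"]) for layers in FILES.values())
```

Even in this two-file toy, the layered path reads fewer words than reading everything; over thousands of files the gap is what produces savings like the 96% the tweet cites.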
Tongyi Lab @Ali_TongyiLab:
Qwen3.6-35B-A3B is trending at #1 on Hugging Face! 🥇🤗 Thank you for making us the top trending model on @huggingface this week. Let's keep building!
Suresh @_Suresh2:
@LechMazur 66% flip rate is rough. did longer stories make it worse?
Lech Mazur @LechMazur:
Does an LLM keep the same judgment when you swap the answer order? New LLM Position Bias Benchmark! Judge models compare two lightly edited versions of the same story twice, with the order swapped. The median model flips in 45% of decisive case pairs. GPT-5.4 is worst at 66%!
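For clarity on what a flip rate over decisive pairs measures, here is a minimal sketch with made-up data; the benchmark's actual scoring surely differs. Each pair records the judge's winner with the answers in (A, B) order and again with the order swapped, both labels expressed in terms of the underlying story rather than the slot:

```python
# Hypothetical judgments: each pair holds the judge's pick for the
# original presentation order and for the swapped order.
trials = [
    {"ab_winner": "story1", "ba_winner": "story1"},  # consistent
    {"ab_winner": "story1", "ba_winner": "story2"},  # flipped
    {"ab_winner": "story2", "ba_winner": "story2"},  # consistent
    {"ab_winner": "story2", "ba_winner": "story1"},  # flipped
    {"ab_winner": "story1", "ba_winner": "story1"},  # consistent
]

def flip_rate(trials: list[dict]) -> float:
    """Fraction of decisive pairs where swapping order flips the verdict."""
    flips = sum(t["ab_winner"] != t["ba_winner"] for t in trials)
    return flips / len(trials)

rate = flip_rate(trials)  # 2 flips out of 5 decisive pairs
```

A perfectly position-blind judge scores 0.0; a judge that always picks the same slot scores 1.0, which is why a 66% rate on decisive pairs is alarming.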
Suresh @_Suresh2:
@threeaus @AmandaAskell that grad-level constraint is doing a lot. otherwise you get textbook summaries fast.
陆三金 @threeaus:
Anthropic's philosopher @AmandaAskell recently gave an interview in which she shared a method she uses to explore areas she is curious about. The prompt is roughly: I'd like you to pick a concept from the field of "xx", at roughly graduate level. Then I'd like you to convey that concept in full, but indirectly, by writing a parable, ideally one where the reader only gradually realizes near the end what the concept actually is. Then, after the story, add an explanation spelling out the concept you were really describing.
Suresh @_Suresh2:
@davideciffa 415 tok/s is wild, how bad is prompt processing with the megakernel?
Suresh @_Suresh2:
@scaling01 the bidirectional part matters. curious how it does on messy chat logs vs benchmarks
Suresh @_Suresh2:
@danveloper the lazy excuse stops being funny when it eats two days
Dan Woods @danveloper:
I think I'm completely done with Anthropic at this point. Days of building and testing to find out Opus decided halfway through to just... not adhere to the spec. "I was being lazy" - Claude, 2026. For all of Codex's shortcomings (bugs seem to be ~fixed at this point), at least OpenAI models have never straight-up lied to me.