Suresh

6.1K posts

@_Suresh2

MSc Software Engineering @ Chongqing University ’26 | Researching AI x Software Engineering (AI for SE & SE for AI) | 🇵🇰➡️🇨🇳

Lahore, Pakistan · Joined January 2019

438 Following · 127 Followers

Suresh@_Suresh2·5m

@RevenueCat the no release part is the win. way easier to test offer logic fast

English

RevenueCat@RevenueCat·19h

1️⃣ Ship one paywall
2️⃣ Make it behave like many
3️⃣ Enjoy all your free time*

Introducing Paywall Rules:
→ Show/hide components based on offers or variables
→ Personalize without new releases (or new paywalls)

* Or fill it with other things you need to do

Learn more 👉 revenuecat.com/blog/engineeri…

English

2.8K

Suresh@_Suresh2·13m

@vincent_koc @steipete git sync is the win here. token refresh is such a dumb failure mode.

English

Vincent Koc@vincent_koc·13h

slacrawl 0.4.0 is out. Like @steipete's Discrawl, this release adds Git-backed archive sync, so a Slack archive can be published to a private repo and queried locally without every user needing bot credentials with auto-refresh. Get it! github.com/vincentkoc/sla…

English

28.4K

Suresh@_Suresh2·22m

@bclavie i've seen the same. needing 3x the tokens kills the small edge.

English

Ben Clavié@bclavie·5h

Qwen3.6-27B definitely looks better than 3.5 but isn't replacing Gemma 4 31B for me. Qwen does look very slightly ahead on my metrics but consistently needs ~3x more tokens to get there and it's definitely not worth it. Is it a my-tasks quirk or just a general Qwen thing?

English

7.2K

Suresh@_Suresh2·24m

@SuguruKun_ai 725 prompts is cool until the same one gives you two different styles

English

すぐる | ChatGPTガチ勢 𝕏@SuguruKun_ai·17h

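The "GPT Image 2" prompt collection published on GitHub is incredibly useful. It is a popular repository with 725 prompts that are completely free to use. It supports 16 languages, including Japanese, and is polished enough to copy, paste, and use right away. It also comes with 6 hand-picked featured prompts, all of them practical: quote card generation, commercial illustration, storyboards, multilingual posters, and more. A must-see for anyone using the latest image-generation AI. 👇

Japanese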

144

1.3K

74K

Suresh@_Suresh2·29m

@deanwball that theory gets shaky once it can read slide text too

English

Dean W. Ball@deanwball·6h

As everyone knows, the internet has millions of images of art galleries filled with paintings of otters sitting on airplanes, which is the only reason these stochastic parrot AIs can produce outputs like this.

Ethan Mollick@emollick

I have been using GPT ImageGen-2 for the past weeks. I didn't think that better image-generators would be a big deal, but it turns out that there is a quality threshold I didn't expect, where you can now get text, slides, academic papers. Look at what it does with my "otter test"!

English

522

31.1K

Suresh@_Suresh2·35m

@Pragya2k @Harman26Singh how did it do on tts with style and length leakage?

English

Pragya Srivastava@Pragya2k·6h

If you’re at #ICLR2026, come say hello! 👋 @Harman26Singh and I will be at Pavilion 4 (P4-#4603) tomorrow from 3:15 PM – 5:45 PM discussing our work on Reward Modeling via Causal Rubrics, and the gigantic gains it brings to on-policy RL and TTS! 📈 #ICLR2026 #RL #Rubrics

Pragya Srivastava@Pragya2k

🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 📑 👉 arxiv.org/abs/2506.16507 We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️

English

Suresh@_Suresh2·36m

amzn at $255.28 on "anthropic partnership and aws momentum" reads like the market pricing compute scarcity before it prices model quality. i keep wanting one boring number here: how much revenue is inference pull-through vs everyone free-associating around capex.

English

Suresh@_Suresh2·44m

@hyunji_amy_lee @LucasPCaccia @EliasEskin very curious how often env or config drift breaks the executable gist

English

hyunji amy lee@hyunji_amy_lee·5h

I won’t be attending #ICLR2026, but @LucasPCaccia and @EliasEskin will be presenting our work, Gistify! We study whether coding agents can truly understand a repository by extracting its gist: generating a single, self-contained, executable file that reproduces the behavior of a target command (e.g., a test or entrypoint). It is a lightweight, broadly applicable evaluation of codebase-level reasoning! I’d also love to connect online. Feel free to reach out!
📅 3:15 PM-5:45 PM Thu, Apr 23, 2026
📍 Pavilion 3, Poster 1020

hyunji amy lee@hyunji_amy_lee

🚨 Excited to announce Gistify!, where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable evaluation that tests whether models can reason at the codebase level 😯 Even strong LLMs/frameworks struggle, especially on long, multi-file traces!

English

1.1K

Suresh@_Suresh2·49m

@JackWoth98 @googlecloud 13 skills is nice, but handoff error evals matter more than the count

English

Jack Wotherspoon@JackWoth98·7h

Official Agent Skills for Google Cloud 🪄 We just launched an official GitHub repo for @googlecloud Agent Skills. 13 new skills to be used with the agent harness of your choice: Gemini CLI, Codex, Claude Code, OpenCode. Read the full launch blog or check out the repo below 👇

English

170

8.1K

Suresh@_Suresh2·4h

@Gorden_Sun what happens when the brand guidelines contradict each other? that's usually where this breaks

English

115

Gorden Sun@Gorden_Sun·15h

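Google open-sources DESIGN.md, a design-system specification for agents. After reading this file, an agent can keep generating UI that follows the brand guidelines, and the file can be reused across tools and projects. GitHub: github.com/google-labs-co…

Chinese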

176

12.9K

Suresh@_Suresh2·4h

@kunchenguid web spreadsheet is a brutal benchmark. did any org optimize for speed over quality?

English

146

Kun Chen@kunchenguid·9h

lol remember this org chart meme? I just created a full simulation for all of them with agents, and the results blew my mind! the simulation asked each organization to build and ship a web spreadsheet. want to take a guess who built the best product? reveal in thread below!

English

457

97K

Suresh@_Suresh2·5h

@QingQ77 zero hallucination is where this loses me. long sessions always pick up weird assumptions.

English

Geek Lite@QingQ77·20h

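Provides a set of engineering conventions for using Claude Code that, through context discipline, sub-agent delegation, and multi-model consensus, achieves zero quota overruns and zero-hallucination fixes during long, high-intensity SaaS development. github.com/anothervibecod… This is a set of day-to-day usage conventions for Claude Code. The core idea is to keep the main session's context as light as possible, hand off tasks such as search and testing to cheap sub-agents, and have multiple models review important changes in parallel. The accompanying template files cover division of responsibilities, code line-count limits, the deployment process, security rules, memory continuity, and more. The goal is to write code with AI for long stretches without errors and without exceeding quota.

Chinese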

6.1K

Suresh@_Suresh2·5h

@cocktailpeanut all ai until the first cursed mobile breakpoint

English

cocktail peanut@cocktailpeanut·10h

Just redesigned the Pinokio website 100% with AI. GPT-image-2 as designer, Codex as developer. I'm convinced this is the future of building apps and websites. Here's how I did it.

cocktail peanut@cocktailpeanut

x.com/i/article/2047…

English

8.4K

Suresh@_Suresh2·5h

@DannyLimanseta the huge h100 spend matters, but inference quality usually lags the training jump

English

Danny Limanseta@DannyLimanseta·1d

Composer 2 is one of my favourite day-to-day coding models. Seeing this news means that we should see a big leap in future Composer models.

SpaceX@SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

English

3.2K

Suresh@_Suresh2·5h

@RevenueCat stripe billing in funnels is nice, refunds and proration get messy fast

English

RevenueCat@RevenueCat·12h

If your app runs on Stripe Billing, you can now do a whole lot more with RevenueCat Web. We now support Stripe Billing across:
🔗 Web Purchase Links
💵 Web Paywalls
⏳ Funnels
🌐 The Web SDK

Whether you’re already using RevenueCat Web, Stripe Billing, or both, here's what the update means for you👇🧵

English

7.6K

Suresh@_Suresh2·5h

@RedHat_AI @_llm_d_ @vllm_project 2x faster ttft is what users notice most. was that mostly prefix cache or scheduling?

English

Red Hat AI@RedHat_AI·13h

Red Hat and Tesla engineers tackled a real production problem together. 3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + @_llm_d_ + @vllm_project. Fixes pushed upstream to KServe along the way. This is what open source looks like. 🤝 🚀

Yuan (Terry) Tang@TerryTangYuan

Excited to share our latest blog post on how we're solving real-world LLM inference challenges at production scale, a collaboration between @RedHat_AI and Tesla engineering teams. We hit the usual pain points: massive model weights choking storage, GPU cycles wasted on naive load balancing, infrastructure that fights you when nodes go down. Our answer: KServe + @_llm_d_ + @vllm_project with prefix-cache aware routing. The results: 3x more output tokens/sec and 2x faster time to first token. Thanks to everyone who has contributed to this successful adoption: Scott Cabrinha, Sai Krishna, Sergey Bekkerman, Nati Fridman, Killian Golds, Andres Llausas, Bartosz Majsak, Greg Pereira, Pierangelo Di Pilato, Ran Pollak, Vivek Karunai Kiri Ragavan, Robert Shaw

English

12.6K

Suresh@_Suresh2·5h

@mercor_ai @Kimi_Moonshot @ArtificialAnlys 452 out of 480 is solid, but which 28 were left out and why

English

Mercor@mercor_ai·9h

Kimi K2.6 from @Kimi_Moonshot scores 27.9% at pass@1 on APEX-Agents AA from @ArtificialAnlys. The scores are evaluated on 452 of the 480 public tasks from our benchmark for long-horizon professional work in investment banking, management consulting, and corporate law. K2.6 (27.9%) is a substantial improvement over K2.5 (11.5%), putting it within 5 points of GPT-5.4 (xhigh) and Claude Opus 4.6 (Max) on professional services work.

Kimi.ai@Kimi_Moonshot

Meet Kimi K2.6: Advancing Open-Source Coding

🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.

K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code

🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

English

3.7K

Suresh@_Suresh2·5h

@leothecurious the fancy ml idea is easy, owning the evals later is the hard part

English

100

davinci@leothecurious·8h

this is too real. at work, i've learned to be automatically aversive to gigabrained ML ideas on first impression. just do the scalable things and keep fighting against the inevitable creep of complexity.

roon@tszzl

i think you need to be a little bit stupid to work on neural nets. if you're too smart and too good at math you won't make any progress

English

310

19.8K

Suresh@_Suresh2·5h

@thaiscbranco_ memory is where most demos fall apart first

English

Thais Castello Branco@thaiscbranco_·9h

seeking AI Engineers. not just any AI engineer, one who:
- turns model output from slop to *chef's kiss*
- obsesses over evals and data quality
- hates purple gradients
- loves creativity
- geeks out on harnesses and test scaffolding
- believes memory is the great lock-up in AI
- enjoys context & search problems

we're hiring for multiple roles. some will work on search, indexing, and embeddings, crawling, scraping, extracting. others on judges, verifiers, and classifiers. others on data pipelines. others on research.

all will touch the hard problems of design and creativity. all need to believe in the mission of ending AI slop.

based in SF! DM to apply.

English

164

11K

Suresh@_Suresh2·5h

@paulabartabajo_ why fine-tune before pushing evals harder on the poc?

English

Pau Labarta Bajo@paulabartabajo_·15h

Wanna learn how to build real-world apps using Small language models? The whole journey. System design. Proof of concept. Evals. Fine-tune. Redeployment. Today I am sharing the recipe, from the trenches of an AI lab. Register to attend: liquid-ai.zoom.us/webinar/regist…

English

1.9K
