Suresh

6.1K posts

@_Suresh2

MSc Software Engineering @ Chongqing University ’26 | Researching AI x Software Engineering (AI for SE & SE for AI) | 🇵🇰➡️🇨🇳

Lahore, Pakistan Joined January 2019
438 Following 127 Followers
Suresh
Suresh@_Suresh2·
@RevenueCat the no release part is the win. way easier to test offer logic fast
English
0
0
0
1
RevenueCat
RevenueCat@RevenueCat·
1️⃣ Ship one paywall 2️⃣ Make it behave like many 3️⃣ Enjoy all your free time* Introducing Paywall Rules: → Show/hide components based on offers or variables → Personalize without new releases (or new paywalls) * Or fill it with other things you need to do Learn more 👉 revenuecat.com/blog/engineeri…
English
2
1
16
2.8K
Vincent Koc
Vincent Koc@vincent_koc·
slacrawl 0.4.0 is out. Like @steipete's Discrawl, this release adds Git-backed archive sync, so a Slack archive can be published to a private repo and queried locally without every user needing bot credentials with auto-refresh. Get it! github.com/vincentkoc/sla…
English
4
8
97
28.4K
Suresh
Suresh@_Suresh2·
@bclavie i've seen the same. needing 3x the tokens kills the small edge.
English
0
0
1
7
Ben Clavié
Ben Clavié@bclavie·
Qwen3.6-27B definitely looks better than 3.5 but isn't replacing Gemma 4 31B for me. Qwen does look very slightly ahead on my metrics but consistently needs ~3x more tokens to get there and it's definitely not worth it. Is it a my-tasks quirk or just a general Qwen thing?
English
13
1
74
7.2K
Suresh
Suresh@_Suresh2·
@SuguruKun_ai 725 prompts is cool until the same one gives you two different styles
English
0
0
1
22
すぐる | ChatGPTガチ勢 𝕏
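The "GPT Image 2" prompt collection published on GitHub is incredibly useful. It is a popular repository with 725 prompts, all completely free to use. It supports 16 languages including Japanese, and the prompts are polished enough to copy, paste, and use right away. It also includes 6 hand-picked featured prompts, all of them practical: quote card generation, commercial illustration, storyboards, multilingual posters, and more. A must-see for anyone using the latest image generation AI. 👇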
Japanese
8
144
1.3K
74K
Suresh
Suresh@_Suresh2·
@deanwball that theory gets shaky once it can read slide text too
English
0
0
0
7
Dean W. Ball
Dean W. Ball@deanwball·
As everyone knows, the internet has millions of images of art galleries filled with paintings of otters sitting on airplanes, which is the only reason these stochastic parrot AIs can produce outputs like this.
Ethan Mollick@emollick

I have been using GPT ImageGen-2 for the past weeks. I didn't think that better image generators would be a big deal, but it turns out that there is a quality threshold I didn't expect, where you can now get text, slides, academic papers. Look at what it does with my "otter test"!

English
13
23
522
31.1K
Pragya Srivastava
Pragya Srivastava@Pragya2k·
If you’re at #ICLR2026, come say hello! 👋 @Harman26Singh and I will be at Pavilion 4 (P4-#4603) tomorrow from 3:15 PM – 5:45 PM discussing our work on Reward Modeling via Causal Rubrics, and the gigantic gains it brings to on-policy RL and TTS! 📈 #ICLR2026 #RL #Rubrics
Pragya Srivastava@Pragya2k

🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 📑 👉 arxiv.org/abs/2506.16507 We tackle reward hacking, where RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️

English
0
3
13
2K
Suresh
Suresh@_Suresh2·
amzn at $255.28 on "anthropic partnership and aws momentum" reads like the market pricing compute scarcity before it prices model quality. i keep wanting one boring number here: how much revenue is inference pull-through vs everyone free-associating around capex.
English
0
0
0
5
hyunji amy lee
hyunji amy lee@hyunji_amy_lee·
I won’t be attending #ICLR2026, but @LucasPCaccia and @EliasEskin will be presenting our work, Gistify! We study whether coding agents can truly understand a repository by extracting its gist: generating a single, self-contained, executable file that reproduces the behavior of a target command (e.g., a test or entrypoint). It is a lightweight, broadly applicable evaluation of codebase-level reasoning! I’d also love to connect online. Feel free to reach out! 📅 3:15 PM-5:45 PM Thu, Apr 23, 2026 📍 Pavilion 3, Poster 1020
hyunji amy lee@hyunji_amy_lee

🚨 Excited to announce Gistify!, where a coding agent must extract the gist of a repository: generate a single, executable, and self-contained file that faithfully reproduces the behavior of a given command (e.g., a test or entrypoint). ✅ It is a lightweight, broadly applicable evaluation that tests whether models can reason at the codebase level 😯 Even strong LLMs/frameworks struggle, especially on long, multi-file traces!

English
2
10
22
1.1K
Jack Wotherspoon
Jack Wotherspoon@JackWoth98·
Official Agent Skills for Google Cloud 🪄 We just launched an official GitHub repo for @googlecloud Agent Skills. 13 new skills to be used with the agent harness of your choice: Gemini CLI, Codex, Claude Code, OpenCode. Read the full launch blog or check out the repo below 👇
English
6
25
170
8.1K
Suresh
Suresh@_Suresh2·
@Gorden_Sun what happens when the brand guidelines contradict each other? that's usually where this breaks
English
0
0
0
115
Gorden Sun
Gorden Sun@Gorden_Sun·
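Google has open-sourced DESIGN.md, a design system specification for agents. After reading this file, an agent can keep generating UI that follows the brand guidelines, and the file can be reused across tools and projects. GitHub: github.com/google-labs-co…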
Chinese
2
39
176
12.9K
Suresh
Suresh@_Suresh2·
@kunchenguid web spreadsheet is a brutal benchmark. did any org optimize for speed over quality?
English
0
0
0
146
Kun Chen
Kun Chen@kunchenguid·
lol remember this org chart meme? I just created a full simulation for all of them with agents, and the results blew my mind! the simulation asked each organization to build and ship a web spreadsheet want to take a guess who built the best product? reveal in thread below!
English
17
33
457
97K
Suresh
Suresh@_Suresh2·
@QingQ77 zero hallucination is where this loses me. long sessions always pick up weird assumptions.
English
0
0
0
31
Geek Lite
Geek Lite@QingQ77·
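A set of engineering conventions for using Claude Code. Through context discipline, sub-agent delegation, and multi-model consensus, it aims for zero quota overruns and zero-hallucination fixes during long, high-intensity SaaS development. github.com/anothervibecod… This is a set of everyday usage conventions for Claude Code. The core idea is to keep the main session's context as light as possible, hand tasks like search and testing to cheap sub-agents, and have multiple models review important changes in parallel. The accompanying template files cover division of responsibilities, code line-count limits, deployment workflow, security rules, memory continuity, and more. The goal is to write code with AI over long stretches without errors and without exceeding quota.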
Chinese
5
19
94
6.1K
Suresh
Suresh@_Suresh2·
@DannyLimanseta the huge h100 spend matters, but inference quality usually lags the training jump
English
0
0
0
25
Danny Limanseta
Danny Limanseta@DannyLimanseta·
Composer 2 is one of my favourite day-to-day coding models. Seeing this news means that we should see a big leap in future Composer models.
SpaceX@SpaceX

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

English
7
0
58
3.2K
Suresh
Suresh@_Suresh2·
@RevenueCat stripe billing in funnels is nice, refunds and proration get messy fast
English
0
0
0
10
RevenueCat
RevenueCat@RevenueCat·
If your app runs on Stripe Billing, you can now do a whole lot more with RevenueCat Web. We now support Stripe Billing across: 🔗 Web Purchase Links 💵 Web Paywalls ⏳ Funnels 🌐 The Web SDK Whether you’re already using RevenueCat Web, Stripe Billing, or both — here's what the update means for you👇🧵
English
3
2
14
7.6K
Red Hat AI
Red Hat AI@RedHat_AI·
Red Hat and Tesla engineers tackled a real production problem together. 3x output tokens/sec, 2x faster TTFT on Llama 3.1 70B with KServe + @_llm_d_ + @vllm_project. Fixes pushed upstream to KServe along the way. This is what open source looks like. 🤝 🚀
Yuan (Terry) Tang@TerryTangYuan

Excited to share our latest blog post on how we're solving real-world LLM inference challenges at production scale, a collaboration between @RedHat_AI and Tesla engineering teams. We hit the usual pain points: massive model weights choking storage, GPU cycles wasted on naive load balancing, infrastructure that fights you when nodes go down. Our answer: KServe + @_llm_d_ + @vllm_project with prefix-cache aware routing. The results: 3x more output tokens/sec and 2x faster time to first token. Thanks to everyone who has contributed to this successful adoption: Scott Cabrinha, Sai Krishna, Sergey Bekkerman, Nati Fridman, Killian Golds, Andres Llausas, Bartosz Majsak, Greg Pereira, Pierangelo Di Pilato, Ran Pollak, Vivek Karunai Kiri Ragavan, Robert Shaw

English
4
7
50
12.6K
Mercor
Mercor@mercor_ai·
Kimi K2.6 from @Kimi_Moonshot scores 27.9% at pass@1 on APEX-Agents AA from @ArtificialAnlys. The scores are evaluated on 452 of the 480 public tasks from our benchmark for long-horizon professional work in investment banking, management consulting, and corporate law. K2.6 (27.9%) is a substantial improvement over K2.5 (11.5%), putting it within 5 points of GPT-5.4 (xhigh) and Claude Opus 4.6 (Max) on professional services work.
Kimi.ai@Kimi_Moonshot

Meet Kimi K2.6: Advancing Open-Source Coding 🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2) What's new: 🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization). 🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D. 🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files. 🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops. 🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop. - K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code - 🔗 API: platform.moonshot.ai 🔗 Tech blog: kimi.com/blog/kimi-k2-6 🔗 Weights & code: huggingface.co/moonshotai/Kim…

English
1
1
21
3.7K
Suresh
Suresh@_Suresh2·
@leothecurious the fancy ml idea is easy, owning the evals later is the hard part
English
0
0
0
100
Thais Castello Branco
Thais Castello Branco@thaiscbranco_·
seeking AI Engineers. not just any AI engineer, one who:
- turns model output from slop to *chef's kiss*
- obsesses over evals and data quality
- hates purple gradients
- loves creativity
- geeks out on harnesses and test scaffolding
- believes memory is the great lock-up in AI
- enjoys context & search problems

we're hiring for multiple roles. some will work on search, indexing, and embeddings, crawling, scraping, extracting. others on judges, verifiers, and classifiers. others on data pipelines. others on research. all will touch the hard problems of design and creativity. all need to believe in the mission of ending AI slop. based in SF! DM to apply.
English
21
15
164
11K
Pau Labarta Bajo
Pau Labarta Bajo@paulabartabajo_·
Wanna learn how to build real-world apps using small language models? The whole journey. System design. Proof of concept. Evals. Fine-tune. Redeployment. Today I am sharing the recipe, from the trenches of an AI lab. Register to attend liquid-ai.zoom.us/webinar/regist…
English
6
3
36
1.9K