
sunil mallya
@sunilmallya
CTO/Co-Founder @_FlipAI; ex-Head of Eng #AmazonComprehend #NLP, Creator #AWSDeepRacer #SF, Tweets on ML, RL, Privacy, Cats

Introducing TurboQuant: our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
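For a rough sense of why compressing the KV cache saves memory, here is a generic per-channel 4-bit quantization sketch in NumPy. This is not TurboQuant's actual algorithm (the linked blog describes that, and a generic scheme like this one is lossy, unlike the zero-accuracy-loss claim above); the function names and the symmetric quantization scheme are assumptions for illustration only.

```python
import numpy as np

# Hypothetical illustration of KV-cache quantization, NOT TurboQuant's
# method. Storing the fp16/fp32 key/value cache as 4-bit integers with
# one scale per channel cuts memory roughly 4x (plus small scale overhead).

def quantize_kv(cache: np.ndarray, bits: int = 4):
    """Symmetric per-channel quantization of a [tokens, channels] cache."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit
    scale = np.abs(cache).max(axis=0) / qmax      # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero channels
    q = np.clip(np.round(cache / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float cache from quantized values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
kv = rng.normal(size=(128, 64)).astype(np.float32)  # toy KV cache
q, scale = quantize_kv(kv)
err = np.abs(dequantize_kv(q, scale) - kv).mean()
print(f"mean abs reconstruction error: {err:.4f}")
```

The interesting part of schemes like TurboQuant is exactly how they avoid the reconstruction error this naive version incurs while still hitting high compression ratios.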

Also: *EXPOSURE DOES NOT MEAN THREAT OF DISPLACEMENT.* It can literally mean the opposite: AI-exposed jobs may see increased hiring and attract higher wages. It all depends on (a) the elasticity of consumer demand and (b) the number of AI-exposed tasks in a job.
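The elasticity point can be made concrete with a toy model (the functional form and all numbers here are illustrative assumptions, not from the post): automation lowers the unit cost of output, and whether demand for the remaining human tasks rises or falls depends on how strongly quantity demanded responds to the lower price.

```python
# Toy model of the elasticity argument (illustrative assumptions only).
# AI automates a share of tasks, lowering unit cost/price; with elastic
# consumer demand, output expands enough that demand for the remaining
# human-performed tasks actually increases.

def human_labor_demand(automated_share: float, elasticity: float) -> float:
    """Relative human labor demand after automation (baseline = 1.0).

    Assumes price falls in proportion to the automated cost share and
    quantity responds with constant demand elasticity: Q = P^(-e).
    """
    price = 1.0 - automated_share          # cheaper output after automation
    quantity = price ** (-elasticity)      # demand response to lower price
    human_tasks = 1.0 - automated_share    # share of tasks still done by people
    return quantity * human_tasks

# Inelastic demand (e = 0.5): automation displaces labor.
print(human_labor_demand(0.30, 0.5))   # < 1.0
# Elastic demand (e = 2.0): output grows enough that hiring increases.
print(human_labor_demand(0.30, 2.0))   # > 1.0
```

With 30% of tasks automated, labor demand shrinks when demand is inelastic but grows when it is elastic, which is exactly the (a)/(b) dependence the tweet describes.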

BREAKING: Amazon reportedly holds mandatory meeting after “vibe coded” changes trigger major outages.

The new multimodal Qwen3.5 @Alibaba_Qwen running locally on an old iPhone 12 at pretty good speed (10 tok/s). Uses a modified llama.cpp backend to support the Metal memory layout for Apple A-series chips, plus SSM ops. Built on top of @ggerganov's great library with code from @claudeai, despite my zero iOS experience and only old kernel-dev experience :)

🚀 Introducing the Qwen 3.5 Small Model Series
Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B

✨ More intelligence, less compute. These small models are built on the same Qwen3.5 foundation: native multimodal, improved architecture, scaled RL.
• 0.8B / 2B → tiny, fast, great for edge devices
• 4B → a surprisingly strong multimodal base for lightweight agents
• 9B → compact, but already closing the gap with much larger models

And yes, we're releasing the Base models as well. We hope this better supports research, experimentation, and real-world industrial innovation.
Hugging Face: huggingface.co/collections/Qw…
ModelScope: modelscope.cn/collections/Qw…

I just built an iOS app that runs @liquidai VL1.6B model locally on an iPhone 12 at ⚡️ speed: 15 tok/sec. I had to write a custom memory allocator for the llama.cpp backend to make it use the iPhone's Metal GPU. Fun project that spanned a few weekends!
