Mudith Jayasekara

69 posts

Mudith Jayasekara

@mudithj

co-founder @parsedlabs sticking it to Big Token, half eng/cs phd @rhodes_trust @UniofOxford, scaling care through intelligence

SF Se unió Ocak 2018

272 Siguiendo169 Seguidores

Tweet fijado

Mudith Jayasekara@mudithj·29 Ağu

Rewatching the greats in this video never gets old. Grateful to now be making our own dent in defining the next paradigm of AI. Working with the most thoughtful and passionate people I know and backed by incredible investors (from LocalGlobe @svennj @asharoraa , HuggingFace @Thom_Wolf, DeepMind, NHS, and others). Thank you to our customers who care enough to glimpse into what the future of language models looks like. Let's build🫡

Charlie O'Neill@oneill_c

Today, we’re launching Parsed. We are incredibly lucky to live in a world where we stand on the shoulders of giants, first in science and now in AI. Our heroes have gotten us to this point, where we have brilliant general intelligence in our pocket. But this is a local minima. We now have an ecosystem of burgeoning tasks where each requires a different kind of intelligence, a different context, a whole host of implicit assumptions and latent knowledge and domain expertise that is very difficult to cram into a system prompt. The big labs want you renting their $50k/month amnesiac interns that forget everything between conversations. Generic behemoths that get quantised, versioned and deprecated behind the scenes, where the only element of control you have is your messy monolithic user prompt. We want people who need their own intelligence to be able to not only access it, but also control it. And whilst the big general models are unbelievably good chatbots and coding agents and purveyors of the world, specialisation of intelligence is required. Clinical scribes, marketing compliance agents, legal red-lining models, insurance policy recommenders, the list goes on. And so that’s what Parsed does: deploy your own frontier model that actually learns. We eval your specific task, build a custom evaluation harness, optimise a model just for you, and host it with continual learning. We bake all the context and knowledge of your task into the model itself, from your engineers to your domain experts to customer feedback, all in a tight SFT → RL loop, with useful interpretability made possible by the open-source ecosystem we build on top of. No more 2000-word prompts with seventeen "IMPORTANT: NEVER DO X" clauses. Your model gets better at YOUR job every single day; the amnesiac pseudo-gods have had their run. Your model, your data, your moat. Let's build 🫡

English

2.2K

Mudith Jayasekara@mudithj·30m

@leonardtang_ 🔨🔨🔨

QME

Leonard Tang@leonardtang_·1h

⚔️⚔️ TOURNO ⚔️⚔️ TOURNAMENT OPTIMIZATION FOR REINFORCEMENT LEARNING IN 🚨NON-VERIFIABLE DOMAINS 🚨 today, models are goated at the easily verifiable: math? ez. code? ez. accounting? ez. …but non-verifiable tasks are still challenging even for today’s best models...

English

3.1K

Mudith Jayasekara@mudithj·1h

We stand on the shoulders of fragile giants. Grateful for the ecosystem but still so much work to get a flexible, bulletproof training stack.

Paras Stefanopoulos@stefanopopoulos

x.com/i/article/2035…

English

Mudith Jayasekara retuiteado

Charlie O'Neill@oneill_c·17 Mar

Contrary to what people think @part_harry_ and I believe about open-source, we know there will always be a place for really large, general models (some of which will be closed-source). A lot of inference volume will come from having many different specialised subagents for different workflows that are basically free and frictionless to train, being orchestrated by a main agent (which in many cases we can also RL an open-source model to do). Tool calls will soon no longer be functions, they will all be calls to specialised qwen 4bs. Search is a great early example of this: remove the complexity of search and the context bloat that comes with it from the main agent, and palm all the responsibility off to a search subagent

Nathan Lambert@natolambert

TLDR: Thinking open models will win on absolute performance is more than just cope, it's holding the open ecosystem from building into something that's different, better, and far more influential at the frontier of AI. Eg Open models can be tools and compliment the best agents.

English

1.3K

Mudith Jayasekara@mudithj·16 Mar

@part_harry_ @willccbb all roads leading to the single layer transformer

English

578

Harry Partridge@part_harry_·16 Mar

Attention residuals and mixture of expert reuse (x.com/yichen4nlp/sta…) are two independent results pointing in the same direction: a single transformer layer, looped n times, is more efficient than n independent transformer layers. As @willccbb has often remarked, the best, most enduring discoveries are when you get improved performance by making the architecture LESS complicated. It seems abundantly clear to me that a single ultra wide layer, looped n times, can be made into a strict generalisation of the current paradigm, whilst also being more elegant in its simplicity.

Kimi.ai@Kimi_Moonshot

Introducing 𝑨𝒕𝒕𝒆𝒏𝒕𝒊𝒐𝒏 𝑹𝒆𝒔𝒊𝒅𝒖𝒂𝒍𝒔: Rethinking depth-wise aggregation. Residual connections have long relied on fixed, uniform accumulation. Inspired by the duality of time and depth, we introduce Attention Residuals, replacing standard depth-wise recurrence with learned, input-dependent attention over preceding layers. 🔹 Enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. 🔹 Introduces Block AttnRes, partitioning layers into compressed blocks to make cross-layer attention practical at scale. 🔹 Serves as an efficient drop-in replacement, demonstrating a 1.25x compute advantage with negligible (<2%) inference latency overhead. 🔹 Validated on the Kimi Linear architecture (48B total, 3B activated parameters), delivering consistent downstream performance gains. 🔗Full report: github.com/MoonshotAI/Att…

English

18.8K

Mudith Jayasekara@mudithj·13 Mar

OPSD-pilled. @kirkby_max and @oneill_c on the heater again

Max Kirkby@kirkby_max

x.com/i/article/2032…

English

199

Mudith Jayasekara@mudithj·10 Mar

@oneill_c Life’s all about range mate

English

Charlie O'Neill@oneill_c·9 Mar

tbpn boys have range, half of the time they're having some philosophical discussion on neuro-based analogies between continual learning approaches in llms to human brain and the other half the time it's just talking about caffeinated beverages

TBPN@tbpn

In 2005, Four Loko came in 23.5oz cans, was 12% ABV and included 156mg of caffeine - the rough equivalent of drinking 4 beers and 2 Red Bulls simultaneously. Phusion Products, the company behind Four Loko - and other drinks like "Earthquake" and "Pirate Water" - is now working with JPMorgan to offload the brand for a rumored $400M. The original Four Loko was "one of the most guarded resources on all college campuses" after it was removed from shelves due to regulatory pressure in 2010, per John.

English

607

Mudith Jayasekara@mudithj·10 Mar

@clarejtbirch @_sholtodouglas @FlintCasey @oneill_c No dish cricket but we’ll take it

English

448

clare ❤️‍🔥@clarejtbirch·10 Mar

very australian weekend tbh

English

114

9.8K

Mudith Jayasekara@mudithj·10 Mar

Couple of Aussies being optimistic and having a crack 🔨 Ty @Stanford_ANZ

Stanford ANZ@Stanford_ANZ

Thank you for coming to our first ANZ in AI Forum @ Stanford! We had a blast. Thank you to incredible speakers: @FlintCasey, @oneill_c, @chrmanning, @clarejtbirch, @danielhanchen, @david__booth, @JamesAlcorn94, @KathrynZ_Skip, @lachygroom, @minmintymin, @mudithj, @saron__berhane, @_sholtodouglas, @TobinSouth, @TristanHeywood! Organized by @janetzhong82, @shiye_su, @saron__berhane

English

514

Mudith Jayasekara retuiteado

Charlie O'Neill@oneill_c·5 Mar

This is one of the early steps we took into researching truly infinite context with repeated, smart KV cache compaction. We have some more bitter lesson continual learning/cache research coming soon!

Baseten@baseten

Long-running agents accumulate context while model memory stays fixed. This leads to a tradeoff: either discard older information or compress it. New work by @oneill_c explores repeated KV-cache compression for persistent agents using Attention Matching. Our research shows one-shot compaction preserves detailed information remarkably well with 65–80% accuracy at 2–5× compression. This far outperforms text summarization. But what happens when you compress, add more context, and compress again repeatedly? baseten.co/research/repea…

English

780

Mudith Jayasekara@mudithj·1 Mar

@oneill_c Pulling up from the logo…

English

Charlie O'Neill@oneill_c·28 Şub

My use of claude code is a slot machine where I try and get my favourite NBA spinner verbs; my favourite is when it hits me with the Mike Breen double bang

English

654

Mudith Jayasekara@mudithj·25 Şub

@amiruci a billion new worlds

English

304

Amir Haghighat@amiruci·25 Şub

You’ve used language models, image models, video models, and voice models. Now it’s time for world models, thanks to World Labs.

English

201

836K

Mudith Jayasekara@mudithj·23 Şub

@philipkiely hardcover printing magic representative of the best content!

English

1.2K

Philip Kiely@philipkiely·23 Şub

Inference Engineering launches today. baseten.com/inference-engi…

English

187

216

2.2K

1.3M

Mudith Jayasekara retuiteado

Charlie O'Neill@oneill_c·17 Şub

Come join @baseten where you get to research and be part of an index on AI to expose yourself to all of it

Amy Tam@amytam01

x.com/i/article/2023…

English

3.5K

Mudith Jayasekara@mudithj·14 Şub

@oneill_c REALLY

English

Charlie O'Neill@oneill_c·14 Şub

x.com/i/article/2022…

ZXX

2.1K

Baseten@baseten·10 Şub

Introducing Kimi K2.5 on Baseten’s Model APIs with the most performant TTFT (0.26 sec) and TPS (340) on Artificial Analysis. Even among a landscape of incredible open source models, Kimi K2.5 stands out with its multi-modal capabilities and it's ability to accommodate an alarmingly large number of tool calls. Get the good stuff here: baseten.co/library/kimi-k…

English

15K

Mudith Jayasekara@mudithj·10 Şub

@baseten model perf 🐐’s

Italiano

143

Mudith Jayasekara@mudithj·28 Oca

@tuhinone model goes brrrr

English

Mudith Jayasekara@mudithj·27 Oca

@oneill_c Always comes back to cartridges

English

Charlie O'Neill@oneill_c·27 Oca

x.com/i/article/2015…

ZXX

1.7K

Mudith Jayasekara@mudithj·23 Oca

@tuhinone Another one

English

100

Tuhin Srivastava@tuhinone·23 Oca

Baseten’s day 0 bet was that inference was the technology that would enable the best user experiences AI could deliver–fast, smart, reliable, secure. And that those experiences would rely not only on a handful of giant general intelligence models, but millions of specialized models built by companies for their specific customers and use cases. Whether you’re a doctor, developer, lawyer, mechanic, researcher, construction worker, marketer, etc, you’re accelerated by specialized tools worthy of your craft. To me, this is one of the most meaningful promises AI can deliver on. We’re starting to see it now. Many of the main-character AI companies on the application layer are built on highly-specialized models for highly-specialized workflows–Abridge, Clay, Cursor, OpenEvidence, Hebbia, Mercor, Notion–these businesses are booming because customers love specialized tools. There are probably hundreds of custom models in production today. Soon, there will be thousands and then millions. All enabled by a high-performing inference layer. Inference has emerged as one of the hardest problems in modern AI systems. Delivering reliable, low-latency experiences requires deep coordination across distributed infrastructure, kernel-level performance, and software ergonomics—even world-class teams struggle to do this well. As a result, as consumers and developers, we’ve grown to accept sluggish performance, frequent downtime, and inconsistent quality across both application companies and model providers. Meanwhile, the demands on inference are accelerating: AI adoption is trending towards ubiquity with reasoning models that are orders of magnitude more compute-intensive. This will only increase as more companies catch on to the virtues of owning their end-to-end IP rather than relying on black-box model APIs on shared infrastructure. Whether we can realize the impact of this generational shift will depend on our ability to serve these models reliably at scale. We knew we could make the technology work, but the biggest delight of it all has been seeing what our customers do with it. The (many-model) future is bright.

Baseten@baseten

We’re thrilled to announce that we have raised $300M at a $5B valuation. The round is led by IVP and CapitalG, both doubling down on their investment in Baseten, and joined by 01A, Altimeter, Battery Ventures, BOND, BoxGroup, Blackbird Ventures, Conviction, Greylock, and NVIDIA. Read more here: baseten.co/blog/announcin…

English

222

78K

Mudith Jayasekara retuiteado

Baseten@baseten·6 Oca

Here’s the part people don’t like to say out loud: in most practical settings, RL is the wrong starting point. The criteria to choose between RL vs. SFT for training your own models is not what you think. Check out pt 2 of our series ⤵️ baseten.co/resources/guid…

English

Descubrir

@leonardtang_ @part_harry_ @willccbb @kirkby_max @oneill_c @clarejtbirch @_sholtodouglas @FlintCasey