Karolus Sariola
@ksariola
398 posts
Augmenting AI @flowaicom

Helsinki, Finland · Joined August 2011
422 Following · 366 Followers

Pinned Tweet
Karolus Sariola @ksariola
Sharing my experiences from building specialized harnesses for analytical SaaS companies. It's likely that your harness requires its own defaults around data, context, multi-tenancy, and evolving business rules. After all, knowledge work is different from software development. Which default behaviors do you encode in your harness today? Are you encoding them in the best way?
Flow AI @flowaicom

Claude Code is a great agent harness, for coding. For analytical SaaS, it is the wrong default. Our CTO @ksariola took that case to AgentCon Silicon Valley this week, drawing on our experience of building specialized harnesses for analytical SaaS. youtube.com/watch?v=pikC5I…

5 replies · 12 reposts · 109 likes · 469.3K views
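The defaults named above (data, context, multi-tenancy, business rules) can be made explicit rather than left implicit in prompt text. A minimal sketch, assuming nothing about Flow AI's actual implementation; every name here is illustrative:

```python
from dataclasses import dataclass

# Hypothetical sketch: harness defaults declared as config, so they are
# visible, reviewable, and hard to forget. Not Flow AI's actual API.

@dataclass
class HarnessDefaults:
    # Data: the agent sees references/schemas, never raw rows in the prompt.
    pass_data_by_reference: bool = True
    # Context: cap how much of the window any single source may occupy.
    max_context_share_per_source: float = 0.25
    # Multi-tenancy: every query is scoped to exactly one tenant.
    enforce_tenant_scope: bool = True
    # Business rules: resolved from a versioned store at run time,
    # so rules can evolve without redeploying the harness.
    business_rules_version: str = "latest"

defaults = HarnessDefaults()
assert defaults.enforce_tenant_scope  # a default you want to be unable to forget
```

Encoding these as typed configuration (instead of prose in a system prompt) is one answer to the "are you encoding them in the best way?" question: defaults become diffable and testable.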
Karolus Sariola retweeted
Igor Kotenkov @stalkermustang
While reading the DeepSeek v4 paper, I ended up writing down over 90 questions. A lot of the paper reviews out there skip over the details, which is usually where the actual learning happens. So, I decided to put together a proper guide: an Annotated Paper Walkthrough.

The core idea is that you still read the original paper as your source material, but whenever things get dense or confusing, I hold your hand through it. You get detailed annotations with visualizations, code snippets, reference links, and, most importantly, the context you need so you don't feel lost. Today I'm releasing v1 with the first 50 notes.

Some of the things I unpack:
• Why swap Softmax and Sigmoid for Sqrt-Softplus in the MoE Router?
• What on earth is a Birkhoff polytope?
• Does attention process some tokens 3 times?
• What are split-KV and split-K, and why did DeepSeek drop them?
• Why use Reverse KL, and where does it even come from?
..and a lot more.

Even the most demanding readers will find something new here. Open-source models are still heavily borrowing from DeepSeek v3, and there's no doubt that v4 details will soon become standard topics in discussions and ML interviews. Hopefully, this guide helps you stay ahead of the curve. As a friend of mine joked, going through this will not only make you a better engineer, but a better man 😂 I can't prove that scientifically, but it's worth a shot.

Check it out: dsv4.interactive.ikot.blog
8 replies · 24 reposts · 237 likes · 37.5K views
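One of the questions above concerns swapping softmax/sigmoid gating for sqrt-softplus in the MoE router. A hedged guess at what such a gate could look like; I have not verified the paper's exact formulation, so treat this as an illustration of the general idea (per-expert independent scores, flattened for large logits), not DeepSeek's method:

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def sqrt_softplus_gate(logits, top_k=2):
    """Illustrative MoE router gate: score experts with sqrt(softplus(logit)).
    Unlike softmax, each expert's score is independent of the others
    (as with sigmoid gating), and the sqrt flattens large logits."""
    scores = np.sqrt(softplus(logits))
    # pick the top-k experts, then normalize their scores to sum to 1
    idx = np.argsort(scores)[-top_k:]
    weights = scores[idx] / scores[idx].sum()
    return idx, weights

idx, w = sqrt_softplus_gate(np.array([0.5, 2.0, -1.0, 1.2]), top_k=2)
# selects experts 1 and 3; weights sum to 1
```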
Karolus Sariola @ksariola
Given that Groq and Cerebras aren't adding new models to their catalogue and are seemingly entering other deals (like serving Spark), do we assume the consumer category of OSS models with fast inference is dead? Are there new players entering?
0 replies · 0 reposts · 0 likes · 27 views
Karolus Sariola retweeted
Bernardo García @bergr7
Spoke at Context is King #4 in SF yesterday about why we ended up building our own specialized agent harness instead of reusing an existing one. I walked through the default behaviors we encoded into the harness, and the implementation choices they led to: how to make schema, organizational knowledge, and business rules available to the agent all at once; how to let our semantic data layer learn at the same pace as knowledge evolves; and how to efficiently manage the context window when working with indivisible data. Thanks @aiven_io for co-organizing, and to everyone who came out.
1 reply · 2 reposts · 2 likes · 85 views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
The most common mistake we see in analytical agents is dumping data into the context. Letting the agent work from references to the data instead keeps the system fast, the numbers right, and each customer's data separated. @ksariola does the best job I have heard of explaining why that matters and what it changes about how you ship reliable agents on top of real customer data.
Flow AI @flowaicom

Claude Code is a great agent harness, for coding. For analytical SaaS, it is the wrong default. Our CTO @ksariola took that case to AgentCon Silicon Valley this week, drawing on our experience of building specialized harnesses for analytical SaaS. youtube.com/watch?v=pikC5I…

0 replies · 2 reposts · 3 likes · 153 views
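The "references, not data" idea above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not Flow AI's implementation: the agent's context holds only an opaque handle, and tools dereference it server-side, inside the tenant's scope, so aggregation happens in code rather than in the model:

```python
# Stand-in for a real, tenant-scoped query engine (illustrative only).
DATASTORE = {}

def run_query(tenant_id: str, sql: str) -> str:
    """Execute a query, keep the result server-side, return only a handle."""
    ref = f"{tenant_id}:result:{len(DATASTORE)}"
    DATASTORE[ref] = {"sql": sql, "rows": [("SKU-1", 3), ("SKU-2", 0)]}  # fake rows
    return ref  # this short string is all the model ever sees

def summarize(ref: str) -> dict:
    """Tool the agent calls with the reference; the arithmetic runs in code,
    so the numbers can't be mangled by the model."""
    rows = DATASTORE[ref]["rows"]
    return {"row_count": len(rows), "total": sum(qty for _, qty in rows)}

ref = run_query("tenant-42", "SELECT sku, qty FROM stock")
stats = summarize(ref)  # the agent reasons over a small, exact summary
```

Because the handle embeds the tenant id, every dereference is naturally scoped to one customer's data, which is where the "kept separated" claim above comes from.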
Karolus Sariola retweeted
Bernardo García @bergr7
Building slides with Claude made PowerPoint feel unnecessary. 🧵
1 reply · 1 repost · 2 likes · 80 views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
We are back in San Francisco for Context is King no. 4 on Tuesday, with over 100 builders already in and a few last spots left. My co-founder @bergr7 is going deep on the harness we built at @flowaicom for data-heavy analytical agents. Search structure, memory pointers, multi-tenancy, the parts that make our agents actually work with real data. Joined on stage by Itai Smith (@trychroma), @RomainSestier (@StackOneHQ), @OtsoVeistera (@thetokenco), and @NathanBurg (@GitHits_com). Let's go 🌉
0 replies · 5 reposts · 6 likes · 221 views
Karolus Sariola @ksariola
If you're in the Bay Area on Monday, come say hi.
1 reply · 0 reposts · 0 likes · 18 views
Karolus Sariola @ksariola
Speaking at AgentCon Silicon Valley on Monday, May 4 at the Computer History Museum.
1 reply · 3 reposts · 3 likes · 77 views
Karolus Sariola retweeted
Bernardo García @bergr7
Speaking at Context is King #4 in SF on May 5th, the meetup we at @flowaicom run with @aiven_io. I'll be talking about agent-ready knowledge, memory pointers for long context, and multi-tenancy in product data agents. Come along if you're in town! luma.com/24jez9g5
2 replies · 3 reposts · 6 likes · 155 views
Karolus Sariola retweeted
Erik Kaunismäki @ErikKaum
Super cool to speak at Context is King #3 in Helsinki 🔥 I spoke about:
- building the best context for agents that orchestrate GPUs
- and demoed ml-intern (credits to @akseljoonas for building it 💪)
Thanks to @aaroisosaari (@flowaicom) and @aiven_io for organizing 🙌
4 replies · 4 reposts · 18 likes · 1K views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
Context is King #3 was the biggest and best edition we have run, with a packed room at Wolt discussing how coding agents deal with context.

Fitting for the theme of the series, the operations side of running these is also getting smoother each time, with agents handling more of the repetitive work so we can spend our time on the parts that actually need taste.

Thanks to everyone involved: @ConfiMind @huggingface @ErikKaum @FSecure @aiven_io @skvark @GitHits_com @woltapp @flowaicom and more.

SF is up next on May 5, with several other cities in the pipeline ✈️

Images: Nguyen Oanh & Tony Bui
0 replies · 4 reposts · 8 likes · 175 views
Karolus Sariola retweeted
GDP @bookwormengr
DeepSeek V4 hits it out of the park and addresses the HBM shortage: DeepSeek proves why it is such a fundamental research lab. In addition to exceeding Opus 4.6 on Terminal Bench and virtually matching it on other performance metrics, the most notable advancement is this statement:

"In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2"

To understand the significance of this point, consider the diagram below, which shows the memory layout for Prefill and Decode nodes. If you implement Decode with Data and Expert Parallelism (DEP16) across 16 GPUs on a GB200 or GB300 NVL72 rack with DeepSeek V3.2, you are left with 104 GB or 176 GB of HBM per GPU respectively (assuming MoE parameters in NVFP4). The remaining HBM per GPU dictates how large a batch size you can run for inference, which determines how many concurrent requests you can serve. Consider a GB300 with 176 GB left:
1. For 128K context, you need 4.45 GB of HBM for KV cache per request, and you can serve only 36 concurrent requests.
2. For 256K context, you need 8.90 GB, and you can serve only 18 concurrent requests.
3. For 512K context, you need 17.80 GB, and you can serve only 9 concurrent requests.
4. For 1M context, you need 35.60 GB, and you can serve only 4 concurrent requests.
You see the point. Now imagine you actually needed 10 times less KV cache at 1M: it basically enables you to serve 10 times more requests with the same resources. Recall that Decode is memory-bound, not compute-bound, unlike Prefill. This is probably the most important contribution of DeepSeek V4. @teortaxesTex @jukan05 @zephyr_z9
[diagram: memory layout for Prefill and Decode nodes]
DeepSeek @deepseek_ai

Structural Innovation & Ultra-High Context Efficiency
🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default across all official DeepSeek services.
4/n

27 replies · 241 reposts · 1.6K likes · 211K views
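The concurrency arithmetic in the thread can be checked directly. The listed figures match floor division if roughly 161 GB of the 176 GB is actually usable for KV cache (about 15 GB of headroom reserved); that usable-budget figure is my assumption, made to reconcile the numbers, not something the thread states:

```python
# Reproducing the thread's concurrency figures. KV cache per request scales
# linearly with context length; the 161 GB usable budget is an assumption.
KV_GB_PER_REQ_128K = 4.45   # GB per request at 128K context, from the thread
USABLE_HBM_GB = 161.0       # assumed usable share of the 176 GB left on GB300

concurrency = []
for ctx_k in (128, 256, 512, 1024):
    kv_gb = KV_GB_PER_REQ_128K * ctx_k / 128   # linear scaling with context
    concurrency.append(int(USABLE_HBM_GB // kv_gb))

# concurrency == [36, 18, 9, 4], matching the thread's four cases.
# With 10x less KV cache at 1M context (3.56 GB/request), the same budget
# would serve int(161.0 // 3.56) == 45 concurrent requests instead of 4.
```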
Karolus Sariola retweeted
DeepSeek @deepseek_ai
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n
1.6K replies · 7.7K reposts · 45.2K likes · 9.7M views
Karolus Sariola retweeted
Bernardo García @bergr7
Agent harnesses are everywhere lately. The easiest way to understand them is to compare them with frameworks:

A framework gives you primitives: LLM calls, tools, state, routing, retries. A harness holds opinions about how an agent should behave: how it plans, uses tools, manages context, asks for approval, recovers from failure, and improves.

At Flow AI, we built a vertical harness for data-heavy SaaS:
1. Semantic data layer: Agents understand schemas, relationships, business rules, terminology, documentation, and large datasets without flooding the context window.
2. Planning and orchestration: The system searches, decomposes the request, resolves affected entities, and creates a structured plan before execution.
3. Safe tool execution: Tools have schemas, validation, structured errors, and approval gates. The model chooses the tool; code controls what happens.
4. Evaluation and improvement: Every run is traceable. User corrections flow through review and back into the semantic layer.

Use a framework when you want primitives and flexibility. But use a harness when you want opinionated defaults for a specific class of work.
1 reply · 1 repost · 1 like · 43 views
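Point 3 above ("the model chooses the tool; code controls what happens") is the most mechanical of the four, so here is a minimal sketch under my own assumptions; the tool name, schema shape, and approval flag are all invented for illustration, not Flow AI's actual code:

```python
# Illustrative only: the model proposes a tool call; code validates the
# arguments against a schema and gates risky writes behind approval.

TOOL_SCHEMA = {"name": "update_price", "required": {"sku": str, "new_price": float}}

def validate(args: dict) -> list:
    """Return structured errors instead of raising, so the agent can retry."""
    errors = []
    for field_name, field_type in TOOL_SCHEMA["required"].items():
        if field_name not in args:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(args[field_name], field_type):
            errors.append(f"{field_name}: expected {field_type.__name__}")
    return errors

def execute(args: dict, approved: bool = False) -> dict:
    errors = validate(args)
    if errors:
        return {"status": "invalid", "errors": errors}
    if not approved:  # approval gate: writes never run on the model's say-so alone
        return {"status": "pending_approval", "args": args}
    return {"status": "done"}

call = {"sku": "SKU-1", "new_price": 9.99}
result = execute(call)                  # held at the approval gate
result = execute(call, approved=True)   # runs only after a human (or policy) approves
```

The key property is that every outcome, including failure, is a structured value the harness can log and route, which is what makes runs traceable (point 4).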
Karolus Sariola retweeted
Bernardo García @bergr7
Most SaaS companies think exposing an API makes them "agent-ready". Is this true?

"Rebalance stock for slow-moving SKUs in EMEA" sounds like a simple query, but it requires:
1. defining "slow-moving"
2. mapping the EMEA region in the db
3. applying business rules

You also need a semantic data layer. APIs provide access. Semantics provide understanding.
0 replies · 1 repost · 3 likes · 50 views
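The three steps above can be sketched as a tiny semantic layer. All names, definitions, and region mappings below are invented for illustration; the point is only that business terms resolve to concrete predicates before any API call is made:

```python
# Illustrative semantic layer (all names invented): "slow-moving" and "EMEA"
# mean what *this* company means by them, encoded once, reused everywhere.

SEMANTIC_LAYER = {
    "slow-moving": {"metric": "units_sold_90d", "op": "<", "value": 10},
    "EMEA": {"column": "region", "in": ["EU", "UK", "MEA"]},  # this org's mapping
}

def to_filter(term: str) -> str:
    """Translate a business term into a concrete SQL predicate."""
    d = SEMANTIC_LAYER[term]
    if "metric" in d:
        return f"{d['metric']} {d['op']} {d['value']}"
    regions = ", ".join(f"'{r}'" for r in d["in"])
    return f"{d['column']} IN ({regions})"

# "Rebalance stock for slow-moving SKUs in EMEA" becomes an unambiguous WHERE clause:
where = " AND ".join(to_filter(t) for t in ("slow-moving", "EMEA"))
# -> "units_sold_90d < 10 AND region IN ('EU', 'UK', 'MEA')"
```

An API alone would accept any filter; the semantic layer is what turns the natural-language request into the *right* filter.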