Karolus Sariola
@ksariola
398 posts
Augmenting AI @flowaicom

Helsinki, Finland · Joined August 2011
422 Following · 366 Followers

Pinned Tweet
Karolus Sariola @ksariola
Sharing my experiences from building specialized harnesses for analytical SaaS companies. It's likely that your harness requires its own defaults around data, context, multi-tenancy, and evolving business rules. After all, knowledge work is different from software development. Which default behaviors do you encode in your harness today? Are you encoding them in the best way?
Flow AI @flowaicom

Claude Code is a great agent harness, for coding. For analytical SaaS, it is the wrong default. Our CTO @ksariola took that case to AgentCon Silicon Valley this week, drawing on our experience of building specialized harnesses for analytical SaaS. youtube.com/watch?v=pikC5I…

5 replies · 12 reposts · 109 likes · 469.3K views
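The defaults named above (data, context, multi-tenancy, business rules) can be made explicit rather than left implicit in prompt text. A minimal sketch, assuming nothing about Flow AI's actual implementation; every name here is illustrative:

```python
from dataclasses import dataclass

# Hypothetical sketch: harness defaults declared as config, so they are
# visible, reviewable, and hard to forget. Not Flow AI's actual API.

@dataclass
class HarnessDefaults:
    # Data: the agent sees references/schemas, never raw rows in the prompt.
    pass_data_by_reference: bool = True
    # Context: cap how much of the window any single source may occupy.
    max_context_share_per_source: float = 0.25
    # Multi-tenancy: every query is scoped to exactly one tenant.
    enforce_tenant_scope: bool = True
    # Business rules: resolved from a versioned store at run time,
    # so rules can evolve without redeploying the harness.
    business_rules_version: str = "latest"

defaults = HarnessDefaults()
assert defaults.enforce_tenant_scope  # a default you want to be unable to forget
```

Encoding these as typed configuration (instead of prose in a system prompt) is one answer to the "are you encoding them in the best way?" question: defaults become diffable and testable.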
Karolus Sariola retweeted
Igor Kotenkov @stalkermustang
While reading the DeepSeek v4 paper, I ended up writing down over 90 questions. A lot of the paper reviews out there skip over the details, which is usually where the actual learning happens. So, I decided to put together a proper guide: an Annotated Paper Walkthrough.

The core idea is that you still read the original paper as your source material, but whenever things get dense or confusing, I hold your hand through it. You get detailed annotations with visualizations, code snippets, reference links, and, most importantly, the context you need so you don't feel lost. Today I'm releasing v1 with the first 50 notes.

Some of the things I unpack:
• Why swap Softmax and Sigmoid for Sqrt-Softplus in the MoE Router?
• What on earth is a Birkhoff polytope?
• Does attention process some tokens 3 times?
• What are split-KV and split-K, and why did DeepSeek drop them?
• Why use Reverse KL, and where does it even come from?
..and a lot more.

Even the most demanding readers will find something new here. Open-source models are still heavily borrowing from DeepSeek v3, and there's no doubt that v4 details will soon become standard topics in discussions and ML interviews. Hopefully, this guide helps you stay ahead of the curve. As a friend of mine joked, going through this will not only make you a better engineer, but a better man 😂 I can't prove that scientifically, but it's worth a shot.

Check it out: dsv4.interactive.ikot.blog
8 replies · 24 reposts · 237 likes · 37.5K views
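One of the questions above concerns swapping softmax/sigmoid gating for sqrt-softplus in the MoE router. A hedged guess at what such a gate could look like; I have not verified the paper's exact formulation, so treat this as an illustration of the general idea (per-expert independent scores, flattened for large logits), not DeepSeek's method:

```python
import numpy as np

def softplus(x):
    # numerically stable softplus: log(1 + exp(x))
    return np.logaddexp(0.0, x)

def sqrt_softplus_gate(logits, top_k=2):
    """Illustrative MoE router gate: score experts with sqrt(softplus(logit)).
    Unlike softmax, each expert's score is independent of the others
    (as with sigmoid gating), and the sqrt flattens large logits."""
    scores = np.sqrt(softplus(logits))
    # pick the top-k experts, then normalize their scores to sum to 1
    idx = np.argsort(scores)[-top_k:]
    weights = scores[idx] / scores[idx].sum()
    return idx, weights

idx, w = sqrt_softplus_gate(np.array([0.5, 2.0, -1.0, 1.2]), top_k=2)
# selects experts 1 and 3; weights sum to 1
```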
Karolus Sariola @ksariola
Given that Groq and Cerebras aren't adding new models to their catalogue and are seemingly entering other deals (like serving Spark), do we assume the consumer category of OSS models with fast inference is dead? Are there new players entering?
0 replies · 0 reposts · 0 likes · 27 views
Karolus Sariola retweeted
Bernardo García @bergr7
Spoke at Context is King #4 in SF yesterday about why we ended up building our own specialized agent harness instead of reusing an existing one. I walked through the default behaviors we encoded into the harness, and the implementation choices they led to: how to make schema, organizational knowledge, and business rules available to the agent all at once; how to let our semantic data layer learn at the same pace as knowledge evolves; and how to efficiently manage the context window when working with indivisible data. Thanks @aiven_io for co-organizing, and to everyone who came out.
1 reply · 2 reposts · 2 likes · 85 views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
The most common mistake we see in analytical agents is dumping data into the context. Letting the agent work from references to the data instead keeps the system fast, the numbers right, and each customer's data separated. @ksariola does the best job I have heard of explaining why that matters and what it changes about how you ship reliable agents on top of real customer data.
Flow AI @flowaicom

Claude Code is a great agent harness, for coding. For analytical SaaS, it is the wrong default. Our CTO @ksariola took that case to AgentCon Silicon Valley this week, drawing on our experience of building specialized harnesses for analytical SaaS. youtube.com/watch?v=pikC5I…

0 replies · 2 reposts · 3 likes · 153 views
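The "references, not data" idea above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not Flow AI's implementation: the agent's context holds only an opaque handle, and tools dereference it server-side, inside the tenant's scope, so aggregation happens in code rather than in the model:

```python
# Stand-in for a real, tenant-scoped query engine (illustrative only).
DATASTORE = {}

def run_query(tenant_id: str, sql: str) -> str:
    """Execute a query, keep the result server-side, return only a handle."""
    ref = f"{tenant_id}:result:{len(DATASTORE)}"
    DATASTORE[ref] = {"sql": sql, "rows": [("SKU-1", 3), ("SKU-2", 0)]}  # fake rows
    return ref  # this short string is all the model ever sees

def summarize(ref: str) -> dict:
    """Tool the agent calls with the reference; the arithmetic runs in code,
    so the numbers can't be mangled by the model."""
    rows = DATASTORE[ref]["rows"]
    return {"row_count": len(rows), "total": sum(qty for _, qty in rows)}

ref = run_query("tenant-42", "SELECT sku, qty FROM stock")
stats = summarize(ref)  # the agent reasons over a small, exact summary
```

Because the handle embeds the tenant id, every dereference is naturally scoped to one customer's data, which is where the "kept separated" claim above comes from.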
Karolus Sariola retweeted
Bernardo García @bergr7
Building slides with Claude made PowerPoint feel unnecessary. 🧵
1 reply · 1 repost · 2 likes · 80 views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
We are back in San Francisco for Context is King no. 4 on Tuesday, with over 100 builders already in and a few last spots left. My co-founder @bergr7 is going deep on the harness we built at @flowaicom for data-heavy analytical agents. Search structure, memory pointers, multi-tenancy, the parts that make our agents actually work with real data. Joined on stage by Itai Smith (@trychroma), @RomainSestier (@StackOneHQ), @OtsoVeistera (@thetokenco), and @NathanBurg (@GitHits_com). Let's go 🌉
0 replies · 5 reposts · 6 likes · 221 views
Karolus Sariola @ksariola
If you're in the Bay Area on Monday, come say hi.
1 reply · 0 reposts · 0 likes · 18 views
Karolus Sariola @ksariola
Speaking at AgentCon Silicon Valley on Monday, May 4 at the Computer History Museum.
1 reply · 3 reposts · 3 likes · 77 views
Karolus Sariola retweeted
Bernardo García @bergr7
Speaking at Context is King #4 in SF on May 5th, the meetup we at @flowaicom run with @aiven_io. I'll be talking about agent-ready knowledge, memory pointers for long context, and multi-tenancy in product data agents. Come along if you're in town! luma.com/24jez9g5
2 replies · 3 reposts · 6 likes · 155 views
Karolus Sariola retweeted
Erik Kaunismäki @ErikKaum
Super cool to speak at Context is King #3 in Helsinki 🔥 I spoke about:
- building the best context for agents that orchestrate GPUs
- and demoed ml-intern (credits to @akseljoonas for building it 💪)
Thanks to @aaroisosaari (@flowaicom) and @aiven_io for organizing 🙌
4 replies · 4 reposts · 18 likes · 1K views
Karolus Sariola retweeted
Aaro Isosaari @aaroisosaari
Context is King #3 was the biggest and best edition we have run, with a packed room at Wolt discussing how coding agents deal with context.

Fitting for the theme of the series, the operations side of running these is also getting smoother each time, with agents handling more of the repetitive work so we can spend our time on the parts that actually need taste.

Thanks to everyone involved: @ConfiMind @huggingface @ErikKaum @FSecure @aiven_io @skvark @GitHits_com @woltapp @flowaicom and more.

SF is up next on May 5, with several other cities in the pipeline ✈️

Images: Nguyen Oanh & Tony Bui
0 replies · 4 reposts · 8 likes · 175 views
Karolus Sariola retweeted
GDP @bookwormengr
DeepSeek V4 hits it out of the park and addresses the HBM shortage: DeepSeek proves why it is such a fundamental research lab. In addition to exceeding Opus 4.6 on Terminal Bench and virtually matching it on other performance metrics, the most notable advancement is this statement:

"In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2"

To understand the significance of this point, consider the diagram below, which shows the memory layout for Prefill and Decode nodes. If you implement Decode with Data and Expert Parallelism (DEP16) across 16 GPUs on a GB200 or GB300 NVL72 rack with DeepSeek V3.2, you are left with 104 GB or 176 GB of HBM per GPU respectively (assuming MoE parameters in NVFP4). The remaining HBM per GPU dictates how large a batch size you can run for inference, which determines how many concurrent requests you can serve. Consider a GB300 with 176 GB left:
1. For 128K context, you need 4.45 GB of HBM for KV cache per request, and you can serve only 36 concurrent requests.
2. For 256K context, you need 8.90 GB, and you can serve only 18 concurrent requests.
3. For 512K context, you need 17.80 GB, and you can serve only 9 concurrent requests.
4. For 1M context, you need 35.60 GB, and you can serve only 4 concurrent requests.
You see the point. Now imagine you actually needed 10 times less KV cache at 1M: it basically enables you to serve 10 times more requests with the same resources. Recall that Decode is memory-bound, not compute-bound, unlike Prefill. This is probably the most important contribution of DeepSeek V4. @teortaxesTex @jukan05 @zephyr_z9
[diagram: memory layout for Prefill and Decode nodes]
DeepSeek @deepseek_ai

Structural Innovation & Ultra-High Context Efficiency
🔹 Novel Attention: Token-wise compression + DSA (DeepSeek Sparse Attention).
🔹 Peak Efficiency: World-leading long context with drastically reduced compute & memory costs.
🔹 1M Standard: 1M context is now the default across all official DeepSeek services.
4/n

27 replies · 241 reposts · 1.6K likes · 211K views
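The concurrency arithmetic in the thread can be checked directly. The listed figures match floor division if roughly 161 GB of the 176 GB is actually usable for KV cache (about 15 GB of headroom reserved); that usable-budget figure is my assumption, made to reconcile the numbers, not something the thread states:

```python
# Reproducing the thread's concurrency figures. KV cache per request scales
# linearly with context length; the 161 GB usable budget is an assumption.
KV_GB_PER_REQ_128K = 4.45   # GB per request at 128K context, from the thread
USABLE_HBM_GB = 161.0       # assumed usable share of the 176 GB left on GB300

concurrency = []
for ctx_k in (128, 256, 512, 1024):
    kv_gb = KV_GB_PER_REQ_128K * ctx_k / 128   # linear scaling with context
    concurrency.append(int(USABLE_HBM_GB // kv_gb))

# concurrency == [36, 18, 9, 4], matching the thread's four cases.
# With 10x less KV cache at 1M context (3.56 GB/request), the same budget
# would serve int(161.0 // 3.56) == 45 concurrent requests instead of 4.
```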
Karolus Sariola retweeted
DeepSeek @deepseek_ai
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n
1.6K replies · 7.7K reposts · 45.2K likes · 9.7M views
Karolus Sariola retweeted
Bernardo García @bergr7
Agent harnesses are everywhere lately. The easiest way to understand them is to compare them with frameworks:

A framework gives you primitives: LLM calls, tools, state, routing, retries. A harness holds opinions about how an agent should behave: how it plans, uses tools, manages context, asks for approval, recovers from failure, and improves.

At Flow AI, we built a vertical harness for data-heavy SaaS:
1. Semantic data layer: Agents understand schemas, relationships, business rules, terminology, documentation, and large datasets without flooding the context window.
2. Planning and orchestration: The system searches, decomposes the request, resolves affected entities, and creates a structured plan before execution.
3. Safe tool execution: Tools have schemas, validation, structured errors, and approval gates. The model chooses the tool; code controls what happens.
4. Evaluation and improvement: Every run is traceable. User corrections flow through review and back into the semantic layer.

Use a framework when you want primitives and flexibility. But use a harness when you want opinionated defaults for a specific class of work.
1 reply · 1 repost · 1 like · 43 views
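Point 3 above ("the model chooses the tool; code controls what happens") is the most mechanical of the four, so here is a minimal sketch under my own assumptions; the tool name, schema shape, and approval flag are all invented for illustration, not Flow AI's actual code:

```python
# Illustrative only: the model proposes a tool call; code validates the
# arguments against a schema and gates risky writes behind approval.

TOOL_SCHEMA = {"name": "update_price", "required": {"sku": str, "new_price": float}}

def validate(args: dict) -> list:
    """Return structured errors instead of raising, so the agent can retry."""
    errors = []
    for field_name, field_type in TOOL_SCHEMA["required"].items():
        if field_name not in args:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(args[field_name], field_type):
            errors.append(f"{field_name}: expected {field_type.__name__}")
    return errors

def execute(args: dict, approved: bool = False) -> dict:
    errors = validate(args)
    if errors:
        return {"status": "invalid", "errors": errors}
    if not approved:  # approval gate: writes never run on the model's say-so alone
        return {"status": "pending_approval", "args": args}
    return {"status": "done"}

call = {"sku": "SKU-1", "new_price": 9.99}
result = execute(call)                  # held at the approval gate
result = execute(call, approved=True)   # runs only after a human (or policy) approves
```

The key property is that every outcome, including failure, is a structured value the harness can log and route, which is what makes runs traceable (point 4).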
Karolus Sariola retweeted
Bernardo García @bergr7
Most SaaS companies think exposing an API makes them "agent-ready". Is this true?

"Rebalance stock for slow-moving SKUs in EMEA" sounds like a simple query, but it requires:
1. defining "slow-moving"
2. mapping the EMEA region in the db
3. applying business rules

You also need a semantic data layer. APIs provide access. Semantics provide understanding.
0 replies · 1 repost · 3 likes · 50 views
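The three steps above can be sketched as a tiny semantic layer. All names, definitions, and region mappings below are invented for illustration; the point is only that business terms resolve to concrete predicates before any API call is made:

```python
# Illustrative semantic layer (all names invented): "slow-moving" and "EMEA"
# mean what *this* company means by them, encoded once, reused everywhere.

SEMANTIC_LAYER = {
    "slow-moving": {"metric": "units_sold_90d", "op": "<", "value": 10},
    "EMEA": {"column": "region", "in": ["EU", "UK", "MEA"]},  # this org's mapping
}

def to_filter(term: str) -> str:
    """Translate a business term into a concrete SQL predicate."""
    d = SEMANTIC_LAYER[term]
    if "metric" in d:
        return f"{d['metric']} {d['op']} {d['value']}"
    regions = ", ".join(f"'{r}'" for r in d["in"])
    return f"{d['column']} IN ({regions})"

# "Rebalance stock for slow-moving SKUs in EMEA" becomes an unambiguous WHERE clause:
where = " AND ".join(to_filter(t) for t in ("slow-moving", "EMEA"))
# -> "units_sold_90d < 10 AND region IN ('EU', 'UK', 'MEA')"
```

An API alone would accept any filter; the semantic layer is what turns the natural-language request into the *right* filter.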