Fedir "Ted" Martynov 🇺🇦

4.4K posts


@byte_ua

Building Neither — governed company context for AI agents. BC-scoped graph + evidence on every fact. https://t.co/Gc21E3bfog

Kyiv, UA · Joined February 2011
372 Following · 221 Followers
Fedir "Ted" Martynov 🇺🇦
Single agent, single thread is still the sane workflow for most coding. Multi-agent sounds cool until you spend half the time babysitting agents arguing with the repo.
0
0
0
10
Fedir "Ted" Martynov 🇺🇦
Grabbing control mid-run is the real feature. Agent logs look fine right until it clicks the wrong button and burns 20 minutes.
0
0
0
10
Fedir "Ted" Martynov 🇺🇦
@aparnadhinak TS-first is the interesting bit. Most agent frameworks still feel like Python demos duct-taped into prod JS repos, then you pay for it in tracing and deploy glue.
0
0
0
12
Aparna Dhinakaran @aparnadhinak
The agent framework space has gotten busy fast. Sam Bhagwat (Mastra) is joining Observe to talk about what production teams actually need from a TypeScript-first agent stack. If you're a JS/TS shop trying to decide where to anchor your agent code, this conversation will save you a quarter of trial and error. June 4, SF → arize.com/observe
2
1
11
1.1K
Peter Yang @petergyang
What used to feel like procrastination (building systems instead of the MVP) is now a prerequisite to ship effectively with AI agents. My number 1 lesson from @ryancarson: "We used to say just do the bare minimum to get the MVP out. Don't spend time on systems. It's literally reversed now. You have to spend a lot of time setting up your documentation. Build all that into a cron job with a skill file, and suddenly you're doing the work of 10 people." 📌 I asked Ryan how he ships 10 PRs a day, here's his answer: youtube.com/watch?v=IDqdVZ…
Peter Yang @petergyang

"We used to say build the MVP. Now you should build the system that builds the MVP first."
Here's my new episode with @ryancarson where he shared how he runs his startup solo with AI agents:
✅ OpenClaw as his AI chief of staff to triage emails, book meetings, and do sales outreach
✅ Codex and Devin as his AI eng team to ship features while he sleeps
Some quotes from Ryan:
"Spend a lot of time upfront setting up your skills + documentation. Then you've suddenly unlocked the work of 10 people."
"Treat your agent like a real employee. Give it a real email address, calendar access, and GitHub account."
"Pay a designer to set up your design system and brand. After that, you can use AI to generate on-brand assets."
📌 Watch now: youtu.be/IDqdVZwAwjw
Thanks to our sponsors:
@WisprFlow: Don't type, just speak ref.wisprflow.ai/peteryang
@linear: The AI agent platform for modern teams linear.app/partners/behin…

17
10
64
16.6K
Fedir "Ted" Martynov 🇺🇦
@DanKornas Putting reflection/planning/tool use next to actual LangGraph/LlamaIndex notebooks is the useful part. Random RAG tutorials age in like two weeks.
0
0
1
30
Dan Kornas @DanKornas
Agentic RAG is moving faster than random tutorials can keep up.
AgenticRAG-Survey is a survey companion and resource repo for researchers and builders studying agentic retrieval-augmented generation.
It helps you map the space by organizing agentic patterns, workflow patterns, system taxonomy, comparisons, applications, tools, notebooks, tutorials, and references in one README.
Key features:
• Agentic patterns – covers reflection, planning, tool use, and multi-agent collaboration
• Workflow patterns – includes prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops
• System taxonomy – breaks down single-agent, multi-agent, hierarchical, corrective, adaptive, graph-based RAG, and Agentic Document Workflows
• Comparison table – contrasts Traditional RAG, Agentic RAG, and ADW across context, orchestration, tools, scalability, and use cases
• Implementation links – maps techniques to tools and notebooks across LangChain, LlamaIndex, LangGraph, FAISS, Chroma, Redis, Bedrock, and Vertex AI
Free public GitHub repo. Link in the reply 👇
4
10
50
2.1K
Fedir "Ted" Martynov 🇺🇦
@DanKornas Honestly this is a better progress indicator than most agent UIs. Tiny pixel pet saying “running tests” beats staring at a terminal wondering if Claude is thinking or dead.
0
0
1
18
Dan Kornas @DanKornas
Give your coding agent a desktop pet that shows what it’s doing.
OpenPets is a tray-first desktop companion app for AI coding agents.
It helps you see agent progress, tool use, test runs, and coding state by turning agent activity into small pet reactions and safe speech bubbles on your desktop.
Key features:
• Agent-state reactions – the pet can react while agents think, edit, test, wait for approval, finish, or hit an error
• Claude Code + OpenCode setup – includes MCP tools, instructions, and hooks/plugins for first-class integrations
• Generic MCP support – MCP-capable editors and coding agents can send short safe reactions through the OpenPets MCP server
• Pet packs + routing – installed animated pets can be selected per agent/project with their own pet window
• Privacy-conscious bubbles – automatic speech is static/local and avoids prompts, code, logs, URLs, paths, and secrets
It’s open-source (MIT license). Link in the reply 👇
5
6
8
728
Fedir "Ted" Martynov 🇺🇦
@asmah2107 The logo soup is funny, but yeah. Without per-task isolation and a fast kill switch, agents touching repos and secrets is basically running random npm scripts with a nicer UI.
0
0
0
51
Fedir "Ted" Martynov 🇺🇦
@PrateekJainDev The MCP servers in Desktop part is useful, but Docker turning into an AI control plane is also how “just run this container” becomes 4 background services and a settings page nobody understands.
0
0
0
23
Prateek Jain @PrateekJainDev
Docker in 2026 isn't the Docker you learned in 2023.
→ Run LLMs locally with docker model pull
→ 200+ MCP servers, one click in Desktop
→ AI agent embedded in your CLI
→ MicroVM sandboxes for coding agents
→ Free hardened distroless images
→ Bake GA for declarative builds
Check out the full breakdown below
3
5
10
2.5K
Fedir "Ted" Martynov 🇺🇦
@intertwineai The held-out selection gate is the useful bit. Without it these text-skill loops turn into fancy prompt drift and you only notice after the agent gets weird on the next task.
0
0
0
7
Bryan Young @intertwineai
Yes, training agent skills like neural networks in pure text space—no weight updates—is a sharp direction, and SkillOpt’s clean sweep across 52 settings shows it works. The same optimizer pattern powers GEPA in dspy-agent-skills. Our latest examples (updated for DSPy 3.2.1) show the exact loop that lifted a 1.2B model 25 points. x.com/intertwineai/s…
DailyPapers @HuggingPapers

Microsoft just released SkillOpt Train agent skills like neural networks — in text space, without touching model weights. Best or tied-best in 52/52 settings across 6 benchmarks and 7 models.

1
4
16
1.6K
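The held-out selection gate from the reply above can be sketched roughly like this. This is a toy illustration, not SkillOpt's or GEPA's actual procedure: the `keyword_score` evaluator, the `gated_update` helper, and the sample skills and tasks are all hypothetical names invented for the example. The idea is only that a rewritten text skill replaces the current one if it scores strictly better on tasks the optimizer never saw.

```python
from typing import Callable, Sequence

Skill = str
Task = str


def keyword_score(skill: Skill, tasks: Sequence[Task]) -> float:
    """Toy evaluator (assumption): fraction of tasks whose keyword appears in the skill text."""
    if not tasks:
        return 0.0
    return sum(task in skill for task in tasks) / len(tasks)


def gated_update(
    current: Skill,
    candidate: Skill,
    heldout: Sequence[Task],
    score: Callable[[Skill, Sequence[Task]], float] = keyword_score,
    min_gain: float = 0.0,
) -> Skill:
    """Keep the candidate only if it beats the current skill on held-out tasks."""
    baseline = score(current, heldout)
    proposed = score(candidate, heldout)
    return candidate if proposed > baseline + min_gain else current


# Hypothetical held-out tasks and skill texts.
heldout_tasks = ["lint", "test", "rollback"]
current_skill = "Always run lint and test before committing."
drifted_skill = "Always run lint. Be very enthusiastic."
improved_skill = "Always run lint and test; keep a rollback plan."

after_drift = gated_update(current_skill, drifted_skill, heldout_tasks)
after_improvement = gated_update(current_skill, improved_skill, heldout_tasks)
```

Here the drifted rewrite is rejected and the improved one is accepted. Without the gate, both rewrites would be applied and the regression would only show up on a later task.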
Fedir "Ted" Martynov 🇺🇦
@RisingWaveLabs The GET/PUT amplification is the part people keep handwaving. S3 as primary storage only works if cache + planner are core system design, not some Redis bandage added later.
English
1
0
3
90
RisingWave @RisingWaveLabs
Object-Storage-Native Is the Future of Modern Data Infrastructure
Not S3 as backup. Not S3 as cold storage. S3 as the primary storage layer.
Modern data systems like RisingWave, Turbopuffer, Neon, WarpStream, Snowflake, LanceDB, Chroma, Milvus, TiDB X, and SlateDB are being built around object storage.
But the real shift is not: “directly querying S3.”
The real shift is: object-storage-native + cache-native architectures.
Because the biggest challenge is not storage cost. It is:
• request amplification
• object-store latency
• excessive GET/PUT operations
That’s why modern systems combine:
• durable object storage
• hybrid caching
• NVMe/memory hot paths
• intelligent query planning
• async prefetching
Data systems like RisingWave and Turbopuffer are great examples of this architectural direction.
The future of data infrastructure is:
• object-storage-native
• cache-native
• disaggregated
• open-standard-based
S3 is the durable storage layer. Caching is the performance layer.
9
12
83
19.9K
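To make the GET amplification point concrete, here is a minimal sketch of a read-through LRU cache in front of an object store. It does not reflect how RisingWave, Turbopuffer, or any other system named above is implemented; `CountingObjectStore` is an in-memory stand-in rather than a real S3 client, and the class names, block keys, and capacity are assumptions made for the example.

```python
from collections import OrderedDict


class CountingObjectStore:
    """In-memory stand-in for an object store that counts GET requests."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.get_count = 0

    def get(self, key):
        self.get_count += 1
        return self._objects[key]


class ReadThroughCache:
    """LRU cache that only calls the backing store on a miss."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self._lru = OrderedDict()

    def get(self, key):
        if key in self._lru:
            # Hit: mark as most recently used, no request to the store.
            self._lru.move_to_end(key)
            return self._lru[key]
        # Miss: one GET against the store, then cache the result.
        value = self.store.get(key)
        self._lru[key] = value
        if len(self._lru) > self.capacity:
            # Evict the least recently used entry.
            self._lru.popitem(last=False)
        return value


store = CountingObjectStore(
    {"block-a": b"aaa", "block-b": b"bbb", "block-c": b"ccc"}
)
cache = ReadThroughCache(store, capacity=2)

# Six logical reads with a hot key ("block-a").
access_pattern = ["block-a", "block-b", "block-a", "block-a", "block-c", "block-a"]
for key in access_pattern:
    cache.get(key)

gets_with_cache = store.get_count  # 3 GETs for 6 reads
```

With this access pattern the store sees 3 GETs for 6 reads. The same loop against the store directly would issue one GET per read, which is the amplification the thread is talking about.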
Fedir "Ted" Martynov 🇺🇦
@gitlab BYOK is mostly “who pays for tokens”, not governance. Auditable CI/CD actions with scoped permissions is the actual boring enterprise part that matters.
0
0
0
28
🦊 GitLab @gitlab
Copilot's BYOK offers flexibility, but true enterprise governance requires more. GitLab Duo CLI provides auditable, controlled CI/CD automation for AI agents.
4
1
22
4.1K
Charles Packer @charlespacker
One cool thing about agents that can self-modify their own harnesses (Letta Code, @badlogicgames 's Pi, etc) is that they can bake memory into the harness itself I told one of our internal agents to stfu because it kept saying things like "Noted." (very inhuman) and it decided it wasn't enough to just edit its memory, but it also edited the harness w/ a pre-tool use hook to block no-op messages. Constrained decoding at the harness layer? As the agent put it, "hooks are memory"
5
4
47
6.7K
Fedir "Ted" Martynov 🇺🇦
@swyx @ankit2119 That highlighted line is the whole problem. Cross-entropy gives you a scary good compressor, not the generator, then everyone acts surprised when it backfits weirdly outside distribution.
0
0
0
4
swyx @swyx
co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that bring us ever closer to the kolmogorov-limit generator of reality. throwing more params, more power, more everything at a demonstrably inefficient paradigm will be outclassed by the simple solution that can hypothesize and seek truth rather than backfit a house of cards - although the bitter lesson is it is simpler to scale and we may hit agi anyway because human intelligence just isn’t that smart nor plentiful
Rishabh Agarwal @agarwl_

Very well written blog. I think of RL as learning from interventions, and it kinda explains why it's more powerful as a paradigm than supervised learning. Now learning from counterfactuals is something we haven't been historically good at but maybe world modelling+ RL can get us there.

50
7
94
15K
Fedir "Ted" Martynov 🇺🇦
@hwchase17 Agent that reads traces and suggests evaluators is the useful part. Most failures are obvious only after staring at 20 broken runs like an idiot.
0
0
0
2
Harrison Chase @hwchase17
🚀 Launching: LangSmith Engine
LangSmith Engine is an agent that sits on top of your traces
It runs in the background and automatically identifies issues
It then proactively suggests action items (code changes, evaluators to add)
Try it today: smith.langchain.com
44
58
423
112K
Fedir "Ted" Martynov 🇺🇦
@cwolferesearch The Figure 6 takeaway is basically the whole problem. C=2 helps but burns efficiency, exec penalty does nothing. If eval only asks “tests pass?”, you’ll keep selecting agents that pass tests by dumping garbage code into the repo.
0
0
0
20
Cameron R. Wolfe, Ph.D. @cwolferesearch
A lot of research has dismissed the benefits of process rewards over the last few years, but the way that we test if process rewards are helpful is oftentimes flawed IMO.
If we are testing the benefit of process rewards versus pure outcome rewards, we need to be careful with how we perform evaluation. In particular, we should not use the outcome reward / final accuracy as the primary evaluation metric. If we do this, then of course training with pure outcome rewards will perform similarly to or better than outcome + process rewards. Training with pure outcome rewards directly optimizes the main metric we are using for evaluation.
Process rewards will play a massive role in the future of AI. However, the benefit of process rewards may not be obvious if we are only looking at accuracy. It is very possible that outcome rewards provide more than enough signal to optimize an LLM / agent's accuracy. Even if this is the case, process rewards will help to optimize how we reach a correct final solution, which is oftentimes equally important to the correctness of the final solution. These are two equally important dimensions of model quality.
As a concrete example, we could train a coding agent using pure outcome rewards and achieve good accuracy. However, we may also integrate a variety of process rewards that check the style, structure, and cleanliness of the code. Maybe these process rewards are unnecessary to achieve an accurate final solution. But, they are extremely beneficial in practice because they produce a coding agent that writes code that is both elegant and accurate (instead of just accurate).
Some of these points might be obvious, as I think process rewards are already heavily used in many production RL settings. However, I still think taking a deeper look at this research area provides a nice example of how the way we evaluate techniques may heavily influence the findings that we get (and in turn change the trajectory of research!).
8
11
60
4.2K
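The coding-agent example in the thread above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the `Rollout` fields, the lint and diff-size signals, the caps, and the 0.3 weight do not come from the thread or the paper it discusses. The point is only that two rollouts with identical outcome reward can be separated once a process term is added.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical summary of one coding-agent attempt."""
    tests_passed: bool
    lint_warnings: int
    lines_changed: int


def outcome_reward(rollout: Rollout) -> float:
    """Pure outcome signal: did the tests pass?"""
    return 1.0 if rollout.tests_passed else 0.0


def process_reward(rollout: Rollout, max_warnings: int = 10, max_lines: int = 200) -> float:
    """Assumed process signal in [0, 1]: fewer lint warnings and smaller diffs score higher."""
    lint = 1.0 - min(rollout.lint_warnings, max_warnings) / max_warnings
    size = 1.0 - min(rollout.lines_changed, max_lines) / max_lines
    return 0.5 * (lint + size)


def combined_reward(rollout: Rollout, process_weight: float = 0.3) -> float:
    """Outcome reward plus a weighted process term (weight is an assumption)."""
    return outcome_reward(rollout) + process_weight * process_reward(rollout)


# Both attempts pass the tests, but one does so with a much messier change.
clean = Rollout(tests_passed=True, lint_warnings=0, lines_changed=40)
messy = Rollout(tests_passed=True, lint_warnings=10, lines_changed=200)

same_outcome = outcome_reward(clean) == outcome_reward(messy)
prefers_clean = combined_reward(clean) > combined_reward(messy)
```

If evaluation only looks at `outcome_reward`, the two rollouts are indistinguishable, which matches the thread's argument that accuracy alone can hide the benefit of process rewards.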
Fedir "Ted" Martynov 🇺🇦
@rasbt Standalone GPT-style example is the useful bit here. Sparse attention papers are easy to handwave, but seeing the compressed and selected blocks wired into real code removes a lot of magic.
0
0
0
18
Sebastian Raschka
Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: github.com/rasbt/LLMs-fro…
41
239
1.8K
71.2K