Fedir "Ted" Martynov 🇺🇦

4.4K posts

@byte_ua

Building Neither — governed company context for AI agents. BC-scoped graph + evidence on every fact. https://t.co/Gc21E3bfog

Kyiv, UA · Joined February 2011

372 Following · 221 Followers

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

Single agent, single thread is still the sane workflow for most coding. Multi-agent sounds cool until you spend half the time babysitting agents arguing with the repo.

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

Grabbing control mid-run is the real feature. Agent logs look fine right until it clicks the wrong button and burns 20 minutes.

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@aparnadhinak TS-first is the interesting bit. Most agent frameworks still feel like Python demos duct-taped into prod JS repos, then you pay for it in tracing and deploy glue.

Aparna Dhinakaran@aparnadhinak·2d

The agent framework space has gotten busy fast. Sam Bhagwat (Mastra) is joining Observe to talk about what production teams actually need from a TypeScript-first agent stack. If you're a JS/TS shop trying to decide where to anchor your agent code, this conversation will save you a quarter of trial and error. June 4, SF → arize.com/observe

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@petergyang @ryancarson Yeah, “don’t build systems” dies fast when agents enter the repo. No docs + no skill file just means every PR starts with expensive archaeology.

Peter Yang@petergyang·1d

What used to feel like procrastination (building systems instead of the MVP) is now a prerequisite to ship effectively with AI agents. My number 1 lesson from @ryancarson: "We used to say just do the bare minimum to get the MVP out. Don't spend time on systems. It's literally reversed now. You have to spend a lot of time setting up your documentation. Build all that into a cron job with a skill file, and suddenly you're doing the work of 10 people." 📌 I asked Ryan how he ships 10 PRs a day, here's his answer: youtube.com/watch?v=IDqdVZ…

Peter Yang@petergyang

"We used to say build the MVP. Now you should build the system that builds the MVP first."
Here's my new episode with @ryancarson where he shared how he runs his startup solo with AI agents:
✅ OpenClaw as his AI chief of staff to triage emails, book meetings, and do sales outreach
✅ Codex and Devin as his AI eng team to ship features while he sleeps
Some quotes from Ryan:
"Spend a lot of time upfront setting up your skills + documentation. Then you've suddenly unlocked the work of 10 people."
"Treat your agent like a real employee. Give it a real email address, calendar access, and GitHub account."
"Pay a designer to set up your design system and brand. After that, you can use AI to generate on-brand assets."
📌 Watch now: youtu.be/IDqdVZwAwjw
Thanks to our sponsors:
@WisprFlow: Don't type, just speak ref.wisprflow.ai/peteryang
@linear: The AI agent platform for modern teams linear.app/partners/behin…

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@DanKornas Putting reflection/planning/tool use next to actual LangGraph/LlamaIndex notebooks is the useful part. Random RAG tutorials age in like two weeks.

Dan Kornas@DanKornas·2d

Agentic RAG is moving faster than random tutorials can keep up.
AgenticRAG-Survey is a survey companion and resource repo for researchers and builders studying agentic retrieval-augmented generation. It helps you map the space by organizing agentic patterns, workflow patterns, system taxonomy, comparisons, applications, tools, notebooks, tutorials, and references in one README.
Key features:
• Agentic patterns – covers reflection, planning, tool use, and multi-agent collaboration
• Workflow patterns – includes prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer loops
• System taxonomy – breaks down single-agent, multi-agent, hierarchical, corrective, adaptive, graph-based RAG, and Agentic Document Workflows
• Comparison table – contrasts Traditional RAG, Agentic RAG, and ADW across context, orchestration, tools, scalability, and use cases
• Implementation links – maps techniques to tools and notebooks across LangChain, LlamaIndex, LangGraph, FAISS, Chroma, Redis, Bedrock, and Vertex AI
Free public GitHub repo. Link in the reply 👇

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@dosco Yeah please. Hand-tuning prompts and hyperparams in random notebooks is such a dumb way to burn tokens.

spacy@dosco·1d

more power to getting dspy and ax everywhere

Tiago Freitas in founder mode@tiagoefreitas

@dosco @badlogicgames makes sense, I had started to make a plan to integrate ax into pi, because pi has a good community, nice extensions, and core is small enough that most things can be extensions

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@DanKornas Honestly this is a better progress indicator than most agent UIs. Tiny pixel pet saying “running tests” beats staring at a terminal wondering if Claude is thinking or dead.

Dan Kornas@DanKornas·2d

Give your coding agent a desktop pet that shows what it’s doing.
OpenPets is a tray-first desktop companion app for AI coding agents. It helps you see agent progress, tool use, test runs, and coding state by turning agent activity into small pet reactions and safe speech bubbles on your desktop.
Key features:
• Agent-state reactions – the pet can react while agents think, edit, test, wait for approval, finish, or hit an error
• Claude Code + OpenCode setup – includes MCP tools, instructions, and hooks/plugins for first-class integrations
• Generic MCP support – MCP-capable editors and coding agents can send short safe reactions through the OpenPets MCP server
• Pet packs + routing – installed animated pets can be selected per agent/project with their own pet window
• Privacy-conscious bubbles – automatic speech is static/local and avoids prompts, code, logs, URLs, paths, and secrets
It’s open-source (MIT license). Link in the reply 👇

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@asmah2107 The logo soup is funny, but yeah. Without per-task isolation and a fast kill switch, agents touching repos and secrets is basically running random npm scripts with a nicer UI.

Ashutosh Maheshwari@asmah2107·1d

Sandboxing is no longer a nice-to-have; it’s the safety layer that makes agentic AI production-ready.
Containment, observability, blast radius, control, and fast revocation are now table stakes.
The sheer number of players shows how foundational this is for the next wave of AI apps.

Jonathon Belotti@jonobelotti_IO

There's about 80 products in the agent sandboxing space right now. By YC Summer '26 we could hit 100

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@PrateekJainDev The MCP-servers-in-Desktop part is useful, but Docker turning into an AI control plane is also how “just run this container” becomes 4 background services and a settings page nobody understands.

English

Prateek Jain@PrateekJainDev·2d

Docker in 2026 isn't the Docker you learned in 2023.
→ Run LLMs locally with docker model pull
→ 200+ MCP servers, one click in Desktop
→ AI agent embedded in your CLI
→ MicroVM sandboxes for coding agents
→ Free hardened distroless images
→ Bake GA for declarative builds
Check out the full breakdown below

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@intertwineai The held-out selection gate is the useful bit. Without it these text-skill loops turn into fancy prompt drift and you only notice after the agent gets weird on the next task.

Bryan Young@intertwineai·2d

Yes, training agent skills like neural networks in pure text space—no weight updates—is a sharp direction, and SkillOpt’s clean sweep across 52 settings shows it works. The same optimizer pattern powers GEPA in dspy-agent-skills. Our latest examples (updated for DSPy 3.2.1) show the exact loop that lifted a 1.2B model 25 points. x.com/intertwineai/s…

DailyPapers@HuggingPapers

Microsoft just released SkillOpt. Train agent skills like neural networks: in text space, without touching model weights.
Best or tied-best in 52/52 settings across 6 benchmarks and 7 models.

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@RisingWaveLabs The GET/PUT amplification is the part people keep handwaving. S3 as primary storage only works if cache + planner are core system design, not some Redis bandage added later.

English

RisingWave@RisingWaveLabs·2d

Object-Storage-Native Is the Future of Modern Data Infrastructure
Not S3 as backup. Not S3 as cold storage. S3 as the primary storage layer.
Modern data systems like RisingWave, Turbopuffer, Neon, WarpStream, Snowflake, LanceDB, Chroma, Milvus, TiDB X, and SlateDB are being built around object storage.
But the real shift is not “directly querying S3.”
The real shift is object-storage-native + cache-native architectures.
Because the biggest challenge is not storage cost. It is:
• request amplification
• object-store latency
• excessive GET/PUT operations
That’s why modern systems combine:
• durable object storage
• hybrid caching
• NVMe/memory hot paths
• intelligent query planning
• async prefetching
Data systems like RisingWave and Turbopuffer are great examples of this architectural direction.
The future of data infrastructure is:
• object-storage-native
• cache-native
• disaggregated
• open-standard-based
S3 is the durable storage layer. Caching is the performance layer.

Fedir "Ted" Martynov 🇺🇦@byte_ua·1d

@gitlab BYOK is mostly “who pays for tokens”, not governance. Auditable CI/CD actions with scoped permissions is the actual boring enterprise part that matters.

🦊 GitLab@gitlab·2d

Copilot's BYOK offers flexibility, but true enterprise governance requires more. GitLab Duo CLI provides auditable, controlled CI/CD automation for AI agents.

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@charlespacker @badlogicgames Blocking “Noted” at pre-tool use is actually the sane version of memory. Way better than hoping the model remembers to stop being a Slack NPC every run.

English

Charles Packer@charlespacker·5d

One cool thing about agents that can self-modify their own harnesses (Letta Code, @badlogicgames's Pi, etc.) is that they can bake memory into the harness itself.
I told one of our internal agents to stfu because it kept saying things like "Noted." (very inhuman), and it decided it wasn't enough to just edit its memory, so it also edited the harness w/ a pre-tool use hook to block no-op messages.
Constrained decoding at the harness layer? As the agent put it, "hooks are memory"

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@swyx @ankit2119 That highlighted line is the whole problem. Cross-entropy gives you a scary good compressor, not the generator, then everyone acts surprised when it backfits weirdly outside the distribution.

English

swyx@swyx·4d

co-sign. a very handy mental framework for what kinds of learning transformers do well today, and why it runs into limitations. when @ankit2119 and i wrote about the need for adversarial world models earlier this year, we were describing a couple of the functions of these rungs of thinking that bring us ever closer to the kolmogorov-limit generator of reality. throwing more params, more power, more everything at a demonstrably inefficient paradigm will be outclassed by the simple solution that can hypothesize and seek truth rather than backfit a house of cards - although the bitter lesson is it is simpler to scale and we may hit agi anyway because human intelligence just isn’t that smart nor plentiful

Rishabh Agarwal@agarwl_

Very well written blog. I think of RL as learning from interventions, and it kinda explains why it's more powerful as a paradigm than supervised learning. Now learning from counterfactuals is something we haven't been historically good at but maybe world modelling+ RL can get us there.

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@yoheinakajima The “commits since release” column is the useful bit. GitHub profiles are mostly pinned-repo cosplay; this actually shows whether stuff is alive.

Yohei@yoheinakajima·3d

quick summary of someone's github, cool! here's me

Peter Steinberger 🦞@steipete

I always wanted a GitHub dashboard: See my repos, open Issues/PRs, what version I released last, how many commits since last release. So I built one for everyone. release.bar/steipete

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@lateinteraction Yeah, the “history as object” part is the real leverage. Otherwise memory is just vibes and you’re debugging a black box with amnesia.

Omar Khattab@lateinteraction·2d

being able to access your own prompts & history as a symbolic object is half of what makes an RLM such a powerful design

Samuel Bodin@samdotb

llms should really learn to copy code with tool instead of using tokens it's insane

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@jerryjliu0 The “when you’re back” part is the sane bit. Staring at Slack with no brain left is just fake productivity.

Jerry Liu@jerryjliu0·3d

Please for the love of god don’t take this to heart Go out, have fun, make friends, touch some grass. You can work as hard as you want when you’re back.

signüll@signulll

kinda fascinating that “going out” became a kind of negative signal.

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@hwchase17 An agent that reads traces and suggests evaluators is the useful part. Most failures only become obvious after staring at 20 broken runs like an idiot.

Harrison Chase@hwchase17·13 May

🚀Launching: LangSmith Engine
LangSmith Engine is an agent that sits on top of your traces.
It runs in the background and automatically identifies issues.
It then proactively suggests action items (code changes, evaluators to add).
Try it today: smith.langchain.com

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@cwolferesearch The Figure 6 takeaway is basically the whole problem. C=2 helps but burns efficiency; the exec penalty does nothing. If eval only asks “tests pass?”, you’ll keep selecting agents that pass tests by dumping garbage code into the repo.

Cameron R. Wolfe, Ph.D.@cwolferesearch·2d

A lot of research has dismissed the benefits of process rewards over the last few years, but the way that we test if process rewards are helpful is oftentimes flawed IMO.
If we are testing the benefit of process rewards versus pure outcome rewards, we need to be careful with how we perform evaluation. In particular, we should not use the outcome reward / final accuracy as the primary evaluation metric. If we do this, then of course training with pure outcome rewards will perform similarly to or better than outcome + process rewards. Training with pure outcome rewards directly optimizes the main metric we are using for evaluation.
Process rewards will play a massive role in the future of AI. However, the benefit of process rewards may not be obvious if we are only looking at accuracy. It is very possible that outcome rewards provide more than enough signal to optimize an LLM / agent's accuracy. Even if this is the case, process rewards will help to optimize how we reach a correct final solution, which is oftentimes equally important to the correctness of the final solution. These are two equally important dimensions of model quality.
As a concrete example, we could train a coding agent using pure outcome rewards and achieve good accuracy. However, we may also integrate a variety of process rewards that check the style, structure, and cleanliness of the code. Maybe these process rewards are unnecessary to achieve an accurate final solution. But, they are extremely beneficial in practice because they produce a coding agent that writes code that is both elegant and accurate (instead of just accurate).
Some of these points might be obvious, as I think process rewards are already heavily used in many production RL settings. However, I still think taking a deeper look at this research area provides a nice example of how the way we evaluate techniques may heavily influence the findings that we get (and in turn change the trajectory of research!).

Fedir "Ted" Martynov 🇺🇦@byte_ua·2d

@rasbt The standalone GPT-style example is the useful bit here. Sparse attention papers are easy to handwave, but seeing the compressed and selected blocks wired into real code removes a lot of the magic.

Sebastian Raschka@rasbt·4d

Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. With motivation, overview, and GPT-style model reference implementation as standalone example code: github.com/rasbt/LLMs-fro…
