Yaniv Markovski

26 posts

@yanivm13

I am excited about artificial intelligence, technological advancements, education, and customer experience. Ex-OpenAI, currently at AI21 Labs.

San Francisco · Joined September 2012

106 Following · 50 Followers

Yaniv Markovski @yanivm13 · 14 Jan

@pamelafox MCPJam looks super handy for iterating on tool calls and OAuth. next pain point after you get requests working is state: parallel runs, rollback, and safe writes. we wrote about workspace isolation for MCP tools here: ai21.com/blog/stateful-…

Pamela Fox @pamelafox · 7 Jan

If you're building MCP servers, check out the MCPJam Inspector: mcpjam.com MCPJam makes it easy to send tool requests plus it includes support for OAuth flows (like the Entra OAuth Proxy DCR flow from FastMCP) along with a step-by-step OAuth debugger.

Yaniv Markovski @yanivm13 · 14 Jan

@levie Bumped into this tweet - looks cool. curious how you handle state when you go beyond read-only. if Claude iterates on the deck in parallel, do you isolate runs, version, and merge changes? we wrote up a workspace approach for MCP tools here: ai21.com/blog/stateful-…

Aaron Levie @levie · 21 Sep

AI agents are getting upgraded capabilities at a rapid rate. Here's the Box MCP server connected to Claude to generate a near-perfect powerpoint presentation based on an existing document. This is a small peek into what the future of knowledge work will look like.

Yaniv Markovski @yanivm13 · 14 Jan

@John_Capobianco VibeOps is the right instinct. MCP is a great standard, but reliability lives in “where does this run” and “what state can it touch”. per-task workspaces made that predictable for us. ai21.com/blog/stateful-…

John Capobianco @John_Capobianco · 30 Dec

Please check out the #autocon4 talk from Senad at NVIDIA. I'm more than humbled by this; I've come to realize that I have to take my *responsibility* as an ambassador for AI and MCP much more seriously. I've started with the new VibeOps Forum - you can join here: join.slack.com/t/vibeopsforum… Full talk here: youtu.be/x2uRzsD4fE8?si…

Yaniv Markovski @yanivm13 · 14 Jan

@DBVolkov skimmed the toc, this looks like a legit curriculum. one add: MCP standardizes how you call tools, but not where they run or how writes stay safe when you go parallel. per-task workspaces helped us a lot: ai21.com/blog/stateful-…

Yaniv Markovski @yanivm13 · 14 Jan

@neo4j This is great. Curious how you handle isolation when multiple agent runs hit the same Aura Graph Analytics project. do you snapshot, clone, or rely on transactions? we’ve been working on MCP workspaces for this exact “write path” problem: ai21.com/blog/stateful-…

Neo4j @neo4j · 7 Jan

Curious about what it takes to build an Aura Graph Analytics #MCP? Tomaz Bratanic breaks down the key lessons learned and the surprising challenges along the way, including:
🔥 How to get data from external sources without building and maintaining a separate integration for each data provider
🔥 How to keep the actual graph data out of the #LLM context, which would quickly get overwhelmed
🔥 If it makes sense to group algorithms into higher-level tools rather than exposing each one individually

Full breakdown: bit.ly/48Hl48b #Neo4j

Yaniv Markovski @yanivm13 · 14 Jan

@_akhaliq this is exactly where MCP shines. i made a similar chat-with-arXiv tool with Maestro. once you add “save to doc”, “run experiments”, “edit repo”, isolation becomes the difference between magic and chaos: ai21.com/blog/stateful-…

AK @_akhaliq · 7 Jan

chat with papers

for any arXiv link to HF paper you can now chat using Hugging Chat

All Hugging Face Papers now include a built-in assistant, powered by HuggingChat and the Hugging Face MCP server. It helps you quickly understand papers by answering questions, summarizing key ideas, and providing context as you browse the latest research

Yaniv Markovski @yanivm13 · 14 Jan

@andy_pavlo Loved the “MCP for every database” section. the proxy part is real, but the hard part is safe writes and rollback. we wrote up a workspace isolation approach for MCP style tools here: ai21.com/blog/stateful-…

Andy Pavlo (@andypavlo.bsky.social) @andy_pavlo · 5 Jan

Here is my latest article on the world of databases: cs.cmu.edu/~pavlo/blog/20…

All the hot topics from the last year:
• More Postgres action!
• MCP for everyone!
• MongoDB gets litigious with FerretDB!
• File formats!
• Market movements!
• The richest person in the world!

Yaniv Markovski @yanivm13 · 14 Jan

@VittoStack Anthropic’s Claude Code course is solid. one tip from their docs that’s worth adopting fast: git worktrees for parallel sessions, so agents don’t step on each other. we wrote up the broader “workspaces” pattern here: ai21.com/blog/stateful-…

Vitto Rivabella @VittoStack · 31 Dec

Anthropic literally released a full 2-hour course to learn Claude Code. Completely for free! Context management, custom automations, MCP servers, GitHub workflows, and much more ✨ If you want to improve your vibe coding skills, you should start here.

Yaniv Markovski @yanivm13 · 14 Jan

@sahnlam MCP is the standardized tool-calling lane. APIs are the endpoints. for agents that write, the missing piece is “where does this run” + “what state can it touch”. per-task workspaces make it boring but safe. ai21.com/blog/stateful-…

Sahn Lam @sahnlam · 30 Dec

MCP vs API: what’s the difference?

Yaniv Markovski @yanivm13 · 13 Jan

@s_scardapane @iwiwi Love this. AB-MCTS makes TTC an explore/exploit problem: go wider (new candidates) or deeper (refine) from feedback. That's the missing layer for long-horizon agents too: branch, validate, stop early, pick winner. ai21.com/blog/test-time…

Simone Scardapane @s_scardapane · 9 Dec

*Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search* by @iwiwi et al. A method for test-time scaling that selects, for each iteration, whether to refine a previous answer or sample something new (adaptive MCTS). arxiv.org/abs/2503.04412

Yaniv Markovski @yanivm13 · 13 Jan

@awnihannun Nice demo - budget forcing in the wild. Love the 'Wait'↔</think> knob to dial compute. For tool-using, long-horizon agents, the next layer is orchestrated TTC: branch when needed, validate, stop early, pick best. Deep dive: ai21.com/blog/test-time…

Awni Hannun @awnihannun · 8 Feb

Made a demo to do test-time-scaling with mlx-lm and a R1-based reasoning model. Same idea as S1:
- To force a response, swap "Wait" for "</think>"
- To think more, swap "</think>" for "Wait"

Runs fast with 4-bit Qwen 32B on an M3 max:
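
The two swaps described in this post can be sketched as a small string-level rule. This is only an illustration and not the demo's actual code: the function name `budget_force`, the token counters, and the `min_tokens`/`max_tokens` thresholds are all assumptions introduced here, and no model is called.

```python
END_THINK = "</think>"  # marker that closes the reasoning block
CONTINUE = "Wait"       # nudge that keeps the model reasoning


def budget_force(text: str, tokens_used: int, min_tokens: int, max_tokens: int) -> str:
    """Return the text to feed back to the model for the next decoding step.

    Hypothetical helper: `text` is the reasoning generated so far, and the
    budgets are counted in whatever unit the caller tracks (assumed tokens).
    """
    if tokens_used >= max_tokens and not text.endswith(END_THINK):
        # Over budget: close the reasoning block so the model has to answer.
        return text + END_THINK
    if tokens_used < min_tokens and text.endswith(END_THINK):
        # Under budget: swap the end marker for "Wait" so the model thinks more.
        return text[: -len(END_THINK)] + CONTINUE
    # Inside the budget window: leave the text unchanged.
    return text
```

In a real decoding loop the returned text would be passed back to the model as the prefix for the next generation call; that loop is left out because it depends on the inference library in use.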

Yaniv Markovski @yanivm13 · 13 Jan

@omarsar0 Practical trick is spending compute where it buys certainty: parallel attempts, verification, early stopping. For long-horizon agents that’s orchestration. We shared how this plays out on SWE-bench here: ai21.com/blog/test-time…

elvis @omarsar0 · 2 Dec

The Art of Scaling Test-Time Compute for LLMs

This is a large-scale study of test-time scaling (TTS). It also provides a practical recipe for selecting the best test-time scaling strategy. (bookmark it)

My takeaways:

Test-time compute scaling works - Allocating more computation during inference (not training) can significantly boost LLM performance on complex reasoning tasks.

Strategic allocation matters - Not all extra compute is equally beneficial; how you spend the additional resources is as important as how much you spend.

Different strategies for different tasks - Certain test-time scaling approaches outperform others depending on the characteristics of the task.

No retraining required - LLMs can be made more capable by intelligently using additional computation at inference time, without modifying model weights.

The paper evaluates various reasoning verification and refinement techniques, plus methods for deciding when/how to use extra computation. The research highlights the trade-offs between different scaling strategies, helping practitioners choose the right approach for their use case.

Great read for AI devs.

Yaniv Markovski @yanivm13 · 13 Jan

Totally. “Tools” is really shorthand for the whole agent runtime. MCP standardizes how you call tools, but it’s silent on where they run, which is fine for read-only tasks and breaks the moment you mutate state (files, docs, downloads, builds). We wrote up a workspace layer for MCP with primitives like initialize/clone/compare/merge/delete so parallel runs don’t collide and you can rollback safely: ai21.com/blog/stateful-…

Ethan Mollick @emollick · 13 Jan

Gemini is held back by lack of tools, a big gap compared to ChatGPT and Claude. Gemini 3 is a really good model, but it just isn't able to do things. For example, take a GDPval prompt that involves downloading from the web, PDFs & docs. ChatGPT wins here, Claude close, Gemini😬

Yaniv Markovski @yanivm13 · 13 Jan

@emollick Our team just posted a series of technical blogs about this topic: ai21.com/blog/test-time…

Yaniv Markovski @yanivm13 · 13 Jan

@emollick One year later it’s even clearer, TTC is orchestration, not just “think longer”. rStar-Math shows search+select. For long-horizon agents you want structured TTC: try a few approaches in parallel, auto-kill the dead ends, and stop as soon as one passes the checks
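
The "try a few in parallel, drop dead ends, stop at the first pass" loop from this post might look roughly like the sketch below. It is an assumption-laden illustration, not any product's orchestration logic: `first_passing`, `attempts`, and `check` are hypothetical names, attempts are plain callables rather than agent runs, and Python 3.9+ is assumed for `cancel_futures`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_passing(
    attempts: Sequence[Callable[[], T]],
    check: Callable[[T], bool],
    max_workers: int = 4,
) -> Optional[T]:
    """Run attempts in parallel and return the first result that passes `check`.

    Returns None if every attempt fails or raises.
    """
    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(attempt) for attempt in attempts]
    try:
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception:
                continue  # a crashed attempt is treated as a dead end
            if check(result):
                return result  # stop early: one candidate passed validation
        return None
    finally:
        # Cancel attempts that have not started yet. Threads already running
        # cannot be interrupted here; they finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
```

For example, `first_passing([fix_a, fix_b, fix_c], check=tests_pass)` would return whichever candidate patch first passes the test suite, where `fix_a`, `fix_b`, `fix_c`, and `tests_pass` are placeholders for real attempt and validation functions. Truly killing in-flight work would require processes or cooperative cancellation, which is outside this sketch.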

Ethan Mollick @emollick · 11 Jan

Paper shows very small LLMs can match or beat larger ones through 'deep thinking' - evaluating different solution paths. Their 7B model beats o1-preview on complex math by exploring 64 different solutions & picking the best one. Test-time compute paradigm seems really fruitful.

Yaniv Markovski @yanivm13 · 23 Oct

@MillionInt Excellent content

Jerry Tworek @MillionInt · 16 Oct

I don't do podcasts very often - in reality this is my first one ever, but if anyone wants to listen to someone talk about RL for an hour, this is it

Matt Turck @mattturck

How GPT-5 thinks, with @OpenAI VP of Research @MillionInt

00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide How Long to Think
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - The Road to OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - OpenAI's Culture of Transparency
29:32 - Balancing Research with Shipping Fast
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models

Yaniv Markovski @yanivm13 · 19 Mar

@Mapbox @PointrTech Go Moritz!

Mapbox @Mapbox · 19 Mar

Two weeks ago, Moritz Förster, Solutions Architect at Mapbox, joined a panel webinar hosted by @PointrTech to discuss how AI and large language models (LLMs) are making maps more intuitive and responsive. "MapGPT is our LLM-powered interface that understands natural language, allowing users to query maps effortlessly. Drivers can engage with maps using simple voice commands without taking their hands off the steering wheel. These technological advances make things possible that weren't possible years ago — like what Pointr accomplished with MapScale. It's remarkable how quickly it processes floor plans and produces high-quality results, demonstrating how AI is transforming both our interfaces and the underlying mapping technologies." -Moritz Read the full webinar recap here: pointr.tech/blog/ai-for-th… #BuiltWithMapbox #AI #Navigation

Yaniv Markovski @yanivm13 · 6 Mar

We just launched Jamba 1.6 - it outperforms Cohere, Mistral and Llama on key benchmarks, including Arena Hard, and narrows the gap with leading closed models. Now available on AI21’s Studio and @huggingface huggingface.co/ai21labs/AI21-…

Yaniv Markovski @yanivm13 · 8 Dec

@FeedTechILUncen I joined the company a year ago. This is the most dynamic industry there is; competitors ship amazing things every week. Without substantial changes we have no place in this space. I'm sure it's not always fun or clear to everyone, but happily there is an interesting vision that requires innovation and collaboration from everyone, and it will be revealed (externally) soon.

פידטק וידויים אנונימיים ללא צנזורה (FeedTech: uncensored anonymous confessions) @FeedTechILUncen · 8 Dec

ai21: 3 rounds of massive organizational changes in less than a year, alongside "streamlining" and senior people who keep leaving. The employees don't understand what the company is trying to do, and apparently neither does management.

*Confess anonymously here 👇 bit.ly/3GCanEe

Yaniv Markovski @yanivm13 · 3 May

@mcavaliere @rao_hacker_one @npew @gdb @OpenAI Yup!

Mike Cavaliere @mcavaliere · 3 May

@yanivm13 @rao_hacker_one @npew @gdb @OpenAI Thanks, that worked! 🎉 I take it you work for @OpenAI 😀

Arun Rao @sudoraohacker · 24 Mar

Lol - looks like payment systems for ChatGPT Plus subscriptions are down - not taking credit cards (I tried 2 cards and both got declined, as the system asked for a debit card). Signing up to play with the multi-modal for some ideas I have for GPT-4. @npew @gdb @OpenAI
