Yaniv Markovski

26 posts

@yanivm13

I am excited about artificial intelligence, technological advancements, education, and customer experience. Ex-OpenAI, currently at AI21 Labs.

San Francisco · Joined September 2012
106 Following · 50 Followers
Yaniv Markovski@yanivm13·
@pamelafox MCPJam looks super handy for iterating on tool calls and OAuth. next pain point after you get requests working is state: parallel runs, rollback, and safe writes. we wrote about workspace isolation for MCP tools here: ai21.com/blog/stateful-…
English
0
0
0
3
Pamela Fox@pamelafox·
If you're building MCP servers, check out the MCPJam Inspector: mcpjam.com MCPJam makes it easy to send tool requests plus it includes support for OAuth flows (like the Entra OAuth Proxy DCR flow from FastMCP) along with a step-by-step OAuth debugger.
English
3
5
27
1.4K
Yaniv Markovski@yanivm13·
@levie Bumped into this tweet - looks cool. curious how you handle state when you go beyond read-only. if Claude iterates on the deck in parallel, do you isolate runs, version, and merge changes? we wrote up a workspace approach for MCP tools here: ai21.com/blog/stateful-…
English
0
0
0
3
Aaron Levie@levie·
AI agents are getting upgraded capabilities at a rapid rate. Here's the Box MCP server connected to Claude to generate a near-perfect powerpoint presentation based on an existing document. This is a small peek into what the future of knowledge work will look like.
English
27
43
356
105K
Yaniv Markovski@yanivm13·
@DBVolkov skimmed the toc, this looks like a legit curriculum. one add: MCP standardizes how you call tools, but not where they run or how writes stay safe when you go parallel. per-task workspaces helped us a lot: ai21.com/blog/stateful-…
English
0
0
0
10
Yaniv Markovski@yanivm13·
@neo4j This is great. Curious how you handle isolation when multiple agent runs hit the same Aura Graph Analytics project. do you snapshot, clone, or rely on transactions? we’ve been working on MCP workspaces for this exact “write path” problem: ai21.com/blog/stateful-…
English
0
0
0
14
Neo4j@neo4j·
Curious about what it takes to build an Aura Graph Analytics #MCP? Tomaz Bratanic breaks down the key lessons learned and the surprising challenges along the way, including:
🔥 How to get data from external sources without building and maintaining a separate integration for each data provider
🔥 How to keep the actual graph data out of the #LLM context, which would quickly get overwhelmed
🔥 If it makes sense to group algorithms into higher-level tools rather than exposing each one individually
Full breakdown: bit.ly/48Hl48b #Neo4j
English
2
4
17
724
Yaniv Markovski@yanivm13·
@_akhaliq this is exactly where MCP shines. i made a similar chat-with-arXiv tool with Maestro. once you add “save to doc”, “run experiments”, “edit repo”, isolation becomes the difference between magic and chaos: ai21.com/blog/stateful-…
English
0
0
0
22
AK@_akhaliq·
chat with papers
for any arXiv link to HF paper you can now chat using Hugging Chat
All Hugging Face Papers now include a built-in assistant, powered by HuggingChat and the Hugging Face MCP server. It helps you quickly understand papers by answering questions, summarizing key ideas, and providing context as you browse the latest research
English
11
27
140
39.9K
Yaniv Markovski@yanivm13·
@andy_pavlo Loved the “MCP for every database” section. the proxy part is real, but the hard part is safe writes and rollback. we wrote up a workspace isolation approach for MCP style tools here: ai21.com/blog/stateful-…
English
0
0
0
15
Andy Pavlo (@andypavlo.bsky.social)
Here is my latest article on the world of databases: cs.cmu.edu/~pavlo/blog/20…
All the hot topics from the last year:
• More Postgres action!
• MCP for everyone!
• MongoDB gets litigious with FerretDB!
• File formats!
• Market movements!
• The richest person in the world!
English
28
163
955
168.6K
Yaniv Markovski@yanivm13·
@VittoStack Anthropic’s Claude Code course is solid. one tip from their docs that’s worth adopting fast: git worktrees for parallel sessions, so agents don’t step on each other. we wrote up the broader “workspaces” pattern here: ai21.com/blog/stateful-…
English
0
0
0
32
Vitto Rivabella@VittoStack·
Anthropic literally released a full 2-hour course to learn Claude Code. Completely for free! Context management, custom automations, MCP servers, GitHub workflows, and much more ✨ If you want to improve your vibe coding skills, you should start here.
English
70
274
3.2K
238.4K
Yaniv Markovski@yanivm13·
@sahnlam MCP is the standardized tool-calling lane. APIs are the endpoints. for agents that write, the missing piece is “where does this run” + “what state can it touch”. per-task workspaces make it boring but safe. ai21.com/blog/stateful-…
English
0
0
0
7
Sahn Lam@sahnlam·
MCP vs API: what’s the difference?
English
19
197
1.2K
49.2K
Yaniv Markovski@yanivm13·
@s_scardapane @iwiwi Love this. AB-MCTS makes TTC an explore/exploit problem: go wider (new candidates) or deeper (refine) from feedback. That's the missing layer for long-horizon agents too: branch, validate, stop early, pick winner. ai21.com/blog/test-time…
English
0
0
1
43
Simone Scardapane@s_scardapane·
*Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search* by @iwiwi et al. A method for test-time scaling that selects, for each iteration, whether to refine a previous answer or sample something new (adaptive MCTS). arxiv.org/abs/2503.04412
English
5
30
153
12.2K
Yaniv Markovski@yanivm13·
@awnihannun Nice demo - budget forcing in the wild. Love the 'Wait'↔</think> knob to dial compute. For tool-using, long-horizon agents, the next layer is orchestrated TTC: branch when needed, validate, stop early, pick best. Deep dive: ai21.com/blog/test-time…
English
0
0
0
28
Awni Hannun@awnihannun·
Made a demo to do test-time-scaling with mlx-lm and a R1-based reasoning model. Same idea as S1:
- To force a response, swap "Wait" for "</think>"
- To think more, swap "</think>" for "Wait"
Runs fast with 4-bit Qwen 32B on an M3 max:
English
21
56
441
60.1K
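The swap described in the tweet above can be sketched on plain strings. This is a minimal illustration, not the mlx-lm demo itself: the function names and the `min_tokens`/`max_tokens` thresholds are hypothetical, and a real implementation would presumably act on tokens during decoding rather than on finished text.

```python
END_THINK = "</think>"   # marker that closes the reasoning section
CONTINUE = "Wait"        # marker that nudges the model to keep reasoning


def extend_thinking(text: str) -> str:
    """If the text ends with the closing marker, swap it for 'Wait'."""
    stripped = text.rstrip()
    if stripped.endswith(END_THINK):
        return stripped[: -len(END_THINK)] + CONTINUE
    return text


def force_answer(text: str) -> str:
    """If the text ends with 'Wait', swap it for the closing marker."""
    stripped = text.rstrip()
    if stripped.endswith(CONTINUE):
        return stripped[: -len(CONTINUE)] + END_THINK
    return text


def apply_budget(text: str, tokens_used: int, min_tokens: int, max_tokens: int) -> str:
    """Hypothetical budget policy: think more below min_tokens, stop at max_tokens."""
    if tokens_used < min_tokens:
        return extend_thinking(text)
    if tokens_used >= max_tokens:
        return force_answer(text)
    return text
```

Between the two thresholds the text is left alone, so the model decides for itself when to stop reasoning.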
Yaniv Markovski@yanivm13·
@omarsar0 Practical trick is spending compute where it buys certainty: parallel attempts, verification, early stopping. For long-horizon agents that’s orchestration. We shared how this plays out on SWE-bench here: ai21.com/blog/test-time…
English
0
0
0
30
elvis@omarsar0·
The Art of Scaling Test-Time Compute for LLMs

This is a large-scale study of test-time scaling (TTS). It also provides a practical recipe for selecting the best test-time scaling strategy. (bookmark it)

My takeaways:

Test-time compute scaling works - Allocating more computation during inference (not training) can significantly boost LLM performance on complex reasoning tasks.

Strategic allocation matters - Not all extra compute is equally beneficial; how you spend the additional resources is as important as how much you spend.

Different strategies for different tasks - Certain test-time scaling approaches outperform others depending on the characteristics of the task.

No retraining required - LLMs can be made more capable by intelligently using additional computation at inference time, without modifying model weights.

The paper evaluates various reasoning verification and refinement techniques, plus methods for deciding when/how to use extra computation. The research highlights the trade-offs between different scaling strategies, helping practitioners choose the right approach for their use case.

Great read for AI devs.
English
17
40
218
13.1K
Yaniv Markovski@yanivm13·
Totally. “Tools” is really shorthand for the whole agent runtime. MCP standardizes how you call tools, but it’s silent on where they run, which is fine for read-only tasks and breaks the moment you mutate state (files, docs, downloads, builds). We wrote up a workspace layer for MCP with primitives like initialize/clone/compare/merge/delete so parallel runs don’t collide and you can rollback safely: ai21.com/blog/stateful-…
English
0
0
0
1.1K
Ethan Mollick@emollick·
Gemini is held back by lack of tools, a big gap compared to ChatGPT and Claude. Gemini 3 is a really good model, but it just isn't able to do things. For example, take a GDPval prompt that involves downloading from the web, PDFs & docs. ChatGPT wins here, Claude close, Gemini😬
English
48
35
511
51.3K
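The reply above names five workspace primitives: initialize, clone, compare, merge, and delete. As a rough sketch of how such a layer could behave, here is an in-memory version. Everything in it is an assumption for illustration: the `WorkspaceStore` class, its method signatures, the `read`/`write` helpers, and the overwrite-on-merge policy are hypothetical and are not taken from the AI21 write-up or from the MCP specification.

```python
from typing import Dict, Optional


class WorkspaceStore:
    """Hypothetical in-memory workspaces: each one maps file paths to contents."""

    def __init__(self) -> None:
        self._spaces: Dict[str, Dict[str, str]] = {}

    def initialize(self, name: str, files: Optional[Dict[str, str]] = None) -> None:
        """Create a workspace, optionally seeded with files."""
        self._spaces[name] = dict(files or {})

    def clone(self, source: str, target: str) -> None:
        """Copy a workspace so a run can write without touching the source."""
        self._spaces[target] = dict(self._spaces[source])

    def write(self, name: str, path: str, content: str) -> None:
        self._spaces[name][path] = content

    def read(self, name: str, path: str) -> str:
        return self._spaces[name][path]

    def compare(self, base: str, other: str) -> Dict[str, Optional[str]]:
        """Return paths that differ, mapped to their content in `other` (None if absent)."""
        a, b = self._spaces[base], self._spaces[other]
        return {p: b.get(p) for p in a.keys() | b.keys() if a.get(p) != b.get(p)}

    def merge(self, source: str, target: str) -> None:
        """Naive merge: make `target` match `source` on every differing path."""
        for path, content in self.compare(target, source).items():
            if content is None:
                self._spaces[target].pop(path, None)
            else:
                self._spaces[target][path] = content

    def delete(self, name: str) -> None:
        """Discard a workspace; deleting an unmerged clone acts as a rollback."""
        del self._spaces[name]
```

In this sketch each parallel run would work in its own clone, and only a run that passes review is merged back. A real layer would also need conflict detection when two clones change the same path, which this version does not attempt.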
Yaniv Markovski@yanivm13·
@emollick One year later it’s even clearer, TTC is orchestration, not just “think longer”. rStar-Math shows search+select. For long-horizon agents you want structured TTC: try a few approaches in parallel, auto-kill the dead ends, and stop as soon as one passes the checks
English
1
0
0
19
Ethan Mollick@emollick·
Paper shows very small LLMs can match or beat larger ones through 'deep thinking' - evaluating different solution paths. Their 7B model beats o1-preview on complex math by exploring 64 different solutions & picking the best one. Test-time compute paradigm seems really fruitful.
English
24
205
1.3K
81.8K
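The "structured TTC" loop in the reply above (try a few approaches in parallel, drop the dead ends, stop as soon as one passes the checks) could look roughly like this. It is a sketch under stated assumptions: `first_passing`, `attempts`, and `check` are hypothetical names, attempts are plain callables rather than agent runs, and a thread pool stands in for whatever orchestration a real system uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Iterable, Optional


def first_passing(
    attempts: Iterable[Callable[[], Any]],
    check: Callable[[Any], bool],
) -> Optional[Any]:
    """Run attempts in parallel and return the first result that passes `check`."""
    attempts = list(attempts)
    if not attempts:
        return None
    with ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [pool.submit(attempt) for attempt in attempts]
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception:
                continue  # a crashed attempt is a dead end; ignore it
            if check(result):
                # cancel() only stops attempts that have not started yet;
                # attempts already running finish before the pool shuts down.
                for other in futures:
                    other.cancel()
                return result
    return None  # no attempt passed the checks
```

Results are inspected in completion order, so the function returns the first attempt to finish and pass, not the best one. A selection step like the one in the quoted paper would instead collect every passing result and rank them.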
Jerry Tworek@MillionInt·
I don't do podcasts very often - in reality this is my first one ever, but if anyone wants to listen to someone talk about RL for an hour, this is it
Matt Turck@mattturck

How GPT-5 thinks, with @OpenAI VP of Research @MillionInt
00:00 - Intro
01:01 - What Reasoning Actually Means in AI
02:32 - Chain of Thought: Models Thinking in Words
05:25 - How Models Decide How Long to Think
07:24 - Evolution from o1 to o3 to GPT-5
11:00 - The Road to OpenAI: Growing up in Poland, Dropping out of School, Trading
20:32 - Working on Robotics and Rubik's Cube Solving
23:02 - A Day in the Life: Talking to Researchers
24:06 - How Research Priorities Are Determined
26:53 - OpenAI's Culture of Transparency
29:32 - Balancing Research with Shipping Fast
31:52 - Using OpenAI's Own Tools Daily
32:43 - Pre-Training Plus RL: The Modern AI Stack
35:10 - Reinforcement Learning 101: Training Dogs
40:17 - The Evolution of Deep Reinforcement Learning
42:09 - When GPT-4 Seemed Underwhelming at First
45:39 - How RLHF Made GPT-4 Actually Useful
48:02 - Unsupervised vs Supervised Learning
49:59 - GRPO and How DeepSeek Accelerated US Research
53:05 - What It Takes to Scale Reinforcement Learning
55:36 - Agentic AI and Long-Horizon Thinking
59:19 - Alignment as an RL Problem
1:01:11 - Winning ICPC World Finals Without Specific Training
1:05:53 - Applying RL Beyond Math and Coding
1:09:15 - The Path from Here to AGI
1:12:23 - Pure RL vs Language Models

English
45
103
1.1K
197.9K
Mapbox@Mapbox·
Two weeks ago, Moritz Förster, Solutions Architect at Mapbox, joined a panel webinar hosted by @PointrTech to discuss how AI and large language models (LLMs) are making maps more intuitive and responsive. "MapGPT is our LLM-powered interface that understands natural language, allowing users to query maps effortlessly. Drivers can engage with maps using simple voice commands without taking their hands off the steering wheel. These technological advances make things possible that weren't possible years ago — like what Pointr accomplished with MapScale. It's remarkable how quickly it processes floor plans and produces high-quality results, demonstrating how AI is transforming both our interfaces and the underlying mapping technologies." -Moritz Read the full webinar recap here: pointr.tech/blog/ai-for-th… #BuiltWithMapbox #AI #Navigation
English
1
0
1
1.1K
Yaniv Markovski@yanivm13·
@FeedTechILUncen I joined the company a year ago. This is the most dynamic industry there is; competitors ship amazing things every week. Without substantial changes we have no place in this space. Sure, it's not always fun or clear to everyone, but happily there is an interesting vision that demands innovation and collaboration from everyone, and it will be revealed (externally) soon.
Hebrew
0
0
14
2.2K
FeedTech: Uncensored Anonymous Confessions
ai21: 3 rounds of massive organizational changes in less than a year, alongside "streamlining" and alongside senior people who keep leaving. The employees don't understand what the company is trying to do, and apparently neither does management
FeedTech: Uncensored Anonymous Confessions@FeedTechILUncen

*Confess anonymously here👇 bit.ly/3GCanEe

Hebrew
11
0
77
30.4K
Arun Rao@sudoraohacker·
Lol - looks like payment systems for ChatGPT Plus subscriptions are down - not taking credit cards (I tried 2 cards and both got declined, as the system asked for a debit card). Signing up to play with the multi-modal for some ideas I have for GPT-4. @npew @gdb @OpenAI
English
3
0
6
1.8K