aj (@anndvision)
1.5K posts

postdoc @Columbia · member of @blei_lab · phd @UniofOxford · prev @OATML_Oxford, @PVG_McGill, intern @Meta · he/they

nyc · Joined March 2011
654 Following · 1.3K Followers

Pinned Tweet
aj @anndvision ·
new preprint: "ReLU to the Rescue: Improve your On-policy Actor-Critic with Positive Advantages". shockingly simple changes to A3C can yield a cautious RL algorithm that is more effective than PPO in some settings. just adding a ReLU is enough! arxiv.org/abs/2306.01460
[image]
2 replies · 16 reposts · 88 likes · 41.2K views
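The trick the tweet describes is clamping advantage estimates to be non-negative before the policy-gradient update, so the policy only moves toward actions that beat the baseline. A minimal sketch in Python/NumPy; the function and variable names are illustrative, not taken from the paper's code:

```python
import numpy as np

def policy_gradient_loss(log_probs, advantages, positive_only=True):
    """Advantage actor-critic policy loss, optionally with ReLU'd advantages.

    log_probs:  log pi(a_t | s_t) for the sampled actions, shape (T,)
    advantages: estimated advantages A(s_t, a_t), shape (T,)

    With positive_only=True, transitions with negative advantage contribute
    nothing, giving the "cautious" update the tweet describes.
    """
    if positive_only:
        advantages = np.maximum(advantages, 0.0)  # the ReLU
    # standard REINFORCE-style objective (negated, since optimizers minimize)
    return -(log_probs * advantages).mean()

# toy check: the two negative-advantage transitions are masked out
log_probs = np.array([-0.5, -1.2, -0.3, -0.9])
advantages = np.array([1.0, -2.0, 0.5, -0.1])
print(policy_gradient_loss(log_probs, advantages))
```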
aj retweeted
TensorZero @TensorZero ·
We’re building TensorZero Autopilot, an automated AI engineer that analyzes LLM observability data, optimizes prompts and models, sets up evals, and runs A/B tests. It dramatically improves the performance of LLM agents on every single benchmark we’ve tried. Read more below.
[image]
1 reply · 6 reposts · 34 likes · 8.1K views
aj @anndvision ·
[image]
0 replies · 0 reposts · 0 likes · 150 views
aj retweeted
TensorZero @TensorZero ·
🗞️ [Blog Post] Bandits in your LLM Gateway: Improve LLM Applications Faster with Adaptive Experimentation (A/B Testing)
• Experimentation (A/B testing) with production traffic is the most reliable way to identify the best prompts and models for your task, but traditional approaches have significant limitations: you must either fix the experiment length in advance (risking wasted data or inconclusive results) or repeatedly check for significance (inflating error rates through p-hacking).
• TensorZero now provides adaptive experimentation directly in its open-source LLM gateway. This multi-armed bandit algorithm overcomes the p-hacking problem, running experiments precisely until there's enough evidence to pick a winner while dynamically allocating LLM inference traffic for maximum efficiency.
• Across a diverse set of realistic and challenging environments, adaptive experimentation reduced the average time to correctly identify the best LLM variants (prompts, models, etc.) by 37% compared to simple A/B testing.
Read more ↓
1 reply · 1 repost · 6 likes · 1.9K views
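The post contrasts fixed-horizon A/B tests with adaptive traffic allocation. As a rough illustration of the bandit idea (not TensorZero's actual algorithm; the post doesn't specify its stopping rule or allocation scheme), here is a minimal Thompson-sampling router over prompt variants with binary feedback:

```python
import random

class ThompsonRouter:
    """Thompson sampling over variants with success/failure feedback.

    Each variant keeps a Beta(successes + 1, failures + 1) posterior over
    its success rate; each request goes to whichever variant draws the
    highest sample, so better variants automatically get more traffic.
    """

    def __init__(self, variants):
        self.stats = {v: {"success": 0, "failure": 0} for v in variants}

    def choose(self):
        # sample a plausible success rate per variant, route to the best draw
        draws = {
            v: random.betavariate(s["success"] + 1, s["failure"] + 1)
            for v, s in self.stats.items()
        }
        return max(draws, key=draws.get)

    def update(self, variant, success):
        self.stats[variant]["success" if success else "failure"] += 1

# toy simulation: variant "b" is truly better (60% vs 45% success rate)
true_rates = {"a": 0.45, "b": 0.60}
router = ThompsonRouter(["a", "b"])
for _ in range(2000):
    v = router.choose()
    router.update(v, random.random() < true_rates[v])
print(router.stats)  # most traffic should have flowed to "b"
```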
TensorZero @TensorZero ·
Shuyang Li was previously a staff software engineer at Google focused on next-generation search infrastructure, LLM-based search, and many other specialized search products (local, travel, maps, etc.). Before that, he worked on ML/analytics products at Palantir and graduated summa cum laude from Notre Dame. Welcome to the team, @_shuyang_!
[image]
1 reply · 1 repost · 7 likes · 752 views
aj @anndvision ·
or during value estimation, matter of fact
0 replies · 0 reposts · 0 likes · 104 views
aj @anndvision ·
changing tab from path completion to a think toggle in claude code is wild
0 replies · 0 reposts · 0 likes · 123 views
aj @anndvision ·
algorithms have been doing me dirty all week, this one's quality

Quoted: aj @anndvision
is reinforcement fine tuning worth it? @OpenAI's RFT can be 700x more expensive than SFT and has stricter content moderation. i tested it on data extraction, agentic coding, and customer service to find out 🧵
0 replies · 0 reposts · 2 likes · 256 views
aj @anndvision ·
i ran this using @tensorzero's open-source stack 💾 github.com/tensorzero/llm…
includes:
• programmatic sft/rft workflows
• llm grader configs
• evaluation methodology
1 reply · 0 reposts · 1 like · 169 views
aj @anndvision ·
is reinforcement fine tuning worth it? @OpenAI's RFT can be 700x more expensive than SFT and has stricter content moderation. i tested it on data extraction, agentic coding, and customer service to find out 🧵
[image]
1 reply · 3 reposts · 4 likes · 658 views
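For context on the comparison: SFT trains on fixed input/output pairs, while RFT optimizes a reasoning model against a grader that scores its outputs (hence the stricter moderation and higher cost). A minimal sketch of launching an SFT job with the OpenAI Python SDK; the file name and model are placeholders, and the RFT variant (which additionally needs a grader config) is omitted:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# upload a JSONL file of {"messages": [...]} chat-format training examples
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # placeholder path
    purpose="fine-tune",
)

# launch a supervised fine-tuning (SFT) job; the model name is a placeholder
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```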
aj @anndvision ·
agentic coding (terminal-bench): RFT wins here. it improved performance where SFT failed (at a 241x cost premium). if you're building agents that benefit from reasoning and have the budget, this might be your use case
[image]
0 replies · 0 reposts · 0 likes · 39 views
aj @anndvision ·
data extraction (CoNLL++ NER): RFT improves performance with 10 examples... but SFT on a larger dataset did better, with:
• 159x lower optimization cost
• 11x cheaper inference
• 3x faster responses
[image]
1 reply · 0 reposts · 0 likes · 47 views