Padarn
@Padarn
282 posts
London, England · Joined May 2009
530 Following · 45 Followers
Padarn@Padarn·
@shi_weiyan Thanks! I don't see the recording at the link though?
1 reply · 0 reposts · 0 likes · 20 views
Weiyan Shi@shi_weiyan·
Recording: tinyurl.com/ymc5tady Hard to moderate a "multi-agent multi-turn human-AI interaction" session😂 so my answer to q1: “I’d love a moderation agent in 1yr🙏” Let's also put AI on more panels! As Claude said, AI can offer views as “someone who sits on both sides”🤣
2 replies · 2 reposts · 7 likes · 1K views
Weiyan Shi@shi_weiyan·
fun panel with @jaseweston @ysu_nlp @willccbb @xwang_lk @natashajaques
- What agents can/can't solve in 1 yr
- 1K+ step tasks
- Academia & long-horizon tasks
- Continual learning: in-context vs weights
- Human-AI co-evolution
Claude joined as our first AI panelist! Recording🧵

Weiyan Shi@shi_weiyan

Finally with a closing keynote by @ysu_nlp on “Computer Use: Modern Moravec’s Paradox”, we connect the history and the future 🙌 — “symbolic reasoning” vs “Perception & Mobility” in agents — future of AI — dragon-slaying on agent plasticity and reliability

9 replies · 13 reposts · 83 likes · 24K views
Weiyan Shi@shi_weiyan·
The afternoon starts with @jaseweston on “challenges in long-horizon tasks” and solutions 🤩
- failure from memory
- credit assignment in outcome-only reward
- lack of environment & generalization
1 reply · 5 reposts · 9 likes · 7.1K views
Padarn@Padarn·
@mitsuhiko OpenAI have a minimal implementation of the Responses API for gpt_oss github.com/openai/gpt-oss… This seems less necessary for open models though: if you have access to the full trace (thinking included), the missing state is the KV-cache, which I'd consider closer to an optimization.
0 replies · 0 reposts · 0 likes · 11 views
Armin Ronacher ⇌@mitsuhiko·
Followup to yesterday's post: I'm starting to think of agents and LLM APIs as a state synchronization problem, and that we might look into what the local-first folks are doing. Dumped my thoughts here: lucumr.pocoo.org/2025/11/22/llm…
15 replies · 19 reposts · 233 likes · 57.6K views
Padarn@Padarn·
@athyuttamre @mitsuhiko Is there any way to extend the retention? Does the whole thread expire at 30d, or just the stored state of some response_ids?
0 replies · 0 reposts · 0 likes · 28 views
Atty Eleti@athyuttamre·
All items are written to the DB and retained for 30d. If the request fails before it reached us, then no changes. If the request fails after it reaches us, you’ll have state mismatch, but your application will always send the previous_response_id that _it_ thinks was the last one, and allow you to continue from where you left off.
2 replies · 0 reposts · 2 likes · 384 views
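The retention and resume semantics described above can be sketched as a toy model. The class, function, and field names here are invented for illustration; this is not the real OpenAI API, just the 30-day retention and previous_response_id bookkeeping:

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # items are retained for 30 days

class ResponseStore:
    """Toy model of the server-side DB: each response item is written
    with a creation timestamp and expires after RETENTION."""
    def __init__(self):
        self.items = {}  # response_id -> (created_at, message history)

    def create(self, response_id, messages, now):
        self.items[response_id] = (now, list(messages))

    def get(self, response_id, now):
        created_at, messages = self.items[response_id]
        if now - created_at > RETENTION:
            raise KeyError(f"{response_id} has expired")
        return messages

def resume(store, previous_response_id, new_input, now):
    """The client only tracks the last response_id *it* saw; resuming
    means replaying the stored history plus the new input."""
    return store.get(previous_response_id, now) + [new_input]
```

In this model, a request that fails before reaching the store writes nothing, so resending with the same previous_response_id picks up cleanly, matching the failure behaviour described in the reply.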
Padarn@Padarn·
Great article from the Spotify Experimentation Team: Beyond Winning: Spotify’s Experiments with Learning Framework engineering.atspotify.com/2025/9/spotify… Would be really interested to hear what "powered" means for a metric. Is there an 'effect size of interest'?
0 replies · 0 reposts · 0 likes · 17 views
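One common reading of "powered" is via a minimum detectable effect: a metric is powered if the experiment's sample size is large enough to detect the effect size of interest with the desired power. A stdlib-only sketch of that calculation (the function name and defaults are my own, not Spotify's):

```python
from statistics import NormalDist

def n_per_group(mde, sd, alpha=0.05, power=0.8):
    """Sample size per arm for a two-sided, two-sample z-test to detect
    an absolute difference of `mde` (the effect size of interest) in a
    metric with standard deviation `sd`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return 2 * ((z_alpha + z_beta) * sd / mde) ** 2

# Detecting a 0.01 absolute lift in a metric with sd 0.5 needs ~39k users per arm.
```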
Padarn@Padarn·
@ezyang That’s a good point: I’ve not had this issue when using Cursor, presumably because there is a separate model call to fit the edit to the current code?
0 replies · 0 reposts · 0 likes · 16 views
Edward Z. Yang@ezyang·
@Padarn The main problem is the formatter is going to change the structure of the edited code, which means we have to tell the LLM that the code changed, and we are more at risk of the LLM hallucinating the old structure
1 reply · 0 reposts · 0 likes · 15 views
Edward Z. Yang@ezyang·
Interesting codemcp problem: if you have an autoformatter, when should it run? I've currently decided to make it run at the end of the task, so the LLM doesn't get sidetracked fixing formatting errors while it still has N other tasks to do. But maybe Sonnet is up to the task? idk
1 reply · 0 reposts · 3 likes · 975 views
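The "run the formatter once, at the end of the task" policy can be sketched as a small session object. The class, the black command, and the method names are hypothetical illustrations, not codemcp's actual design:

```python
import subprocess

class EditSession:
    """Defer autoformatting: record files the LLM edits during a task,
    then run the formatter exactly once when the task finishes, so the
    model never sees mid-task formatting churn."""
    def __init__(self, formatter_cmd=("black",)):  # hypothetical formatter
        self.formatter_cmd = list(formatter_cmd)
        self.dirty = set()

    def record_edit(self, path):
        self.dirty.add(path)

    def finish_task(self, run=subprocess.run):
        touched = sorted(self.dirty)
        if touched:
            # One formatter invocation over every touched file.
            run(self.formatter_cmd + touched, check=True)
        self.dirty.clear()
        return touched
```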
Padarn@Padarn·
@SlackHQ Hey Slack! I'm wondering if there is anywhere I can learn about the roadmap for the Slack API. In particular I want to know when I can get thread information from this event: api.slack.com/events/assista…
1 reply · 0 reposts · 0 likes · 73 views
Amazon Science@AmazonScience·
Anomaly detection on graphs is complex because of graph topologies, and training data is scarce. At @WSDMSocial, Amazon researchers showed how to generate anomalous graphs using a variational graph neural network and diffusion modeling in the latent space. amazon.science/blog/anomaly-d…
1 reply · 2 reposts · 10 likes · 1.5K views
Padarn@Padarn·
@ezyang I couldn’t find it :sad:
0 replies · 0 reposts · 0 likes · 48 views
Edward Z. Yang@ezyang·
Many thanks to the Rust Programming Languages Discord for helping me figure it out.
2 replies · 0 reposts · 3 likes · 859 views
Edward Z. Yang@ezyang·
Performance puzzle! In Rust, you are iterating lines of a file running the regex "\d{2}:\d{2}:\d{2} [A-Za-z0-9]+: (.+)". On production data, it is 20MB/s slow. But when you delete (.+) from the regex, your speed jumps up to 1GB/s. It doesn't repro on test data. What's the problem?
2 replies · 0 reposts · 7 likes · 4.2K views
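One hedged guess at the mechanism: without the capture group the engine only has to recognize the prefix, but `(.+)` forces it to delimit the rest of the line, and in Rust's regex crate resolving capture positions reportedly falls off the fast lazy-DFA path onto a slower engine. Production lines that actually match (and are long) pay that cost on every line; short or non-matching test data doesn't. The shape of the workload difference is visible even in Python:

```python
import re

# Patterns from the puzzle: identical prefix, with and without the
# trailing capture group.
WITH_CAPTURE = re.compile(r"\d{2}:\d{2}:\d{2} [A-Za-z0-9]+: (.+)")
PREFIX_ONLY = re.compile(r"\d{2}:\d{2}:\d{2} [A-Za-z0-9]+: ")

line = "12:34:56 worker1: " + "x" * 1_000_000  # long production-style line

m_full = WITH_CAPTURE.match(line)
m_prefix = PREFIX_ONLY.match(line)

# The prefix-only match stops right after ": "; the capturing match must
# walk the remaining megabyte to delimit group 1.
assert m_prefix.end() == len("12:34:56 worker1: ")
assert m_full.end() == len(line)
assert m_full.group(1) == "x" * 1_000_000
```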
Padarn@Padarn·
@rakyll Curious what, if any, tools you're using to do this? I've found Copilot only "okay" as an experience for test-driven LLM development.
0 replies · 0 reposts · 0 likes · 60 views
Jaana Dogan ヤナ ドガン@rakyll·
LLMs made software development difficult. I'm developing a fairly complicated state machine and my options are:
- Making LLMs manage the flow based on some descriptions of state and steps, and evaluating the hell out of it
- Generating the state machine code based on the same descriptions
- Generating a conformance test suite from the same descriptions and implementing the state machine myself
And various combinations of all three.
9 replies · 6 reposts · 85 likes · 34.8K views
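The second and third options above can share one artifact. A minimal sketch (the states and events are invented for illustration): the same declarative description both drives a table-driven state machine and generates the conformance suite any hand-written implementation must pass:

```python
# Declarative description of states and steps (invented example).
DESCRIPTION = {
    "start":   {"submit": "pending"},
    "pending": {"approve": "done", "reject": "start"},
    "done":    {},
}

def step(state, event, table=DESCRIPTION):
    """Table-driven implementation generated from the description."""
    try:
        return table[state][event]
    except KeyError:
        raise ValueError(f"no transition for {event!r} in state {state!r}")

def conformance_cases(table=DESCRIPTION):
    """Derive a conformance suite from the same description: every
    declared transition becomes a (state, event, expected_next) case."""
    return [(s, e, nxt) for s, ts in table.items() for e, nxt in ts.items()]
```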
Padarn@Padarn·
@ZheqingZhu @AIatMeta Fantastic stuff. Thanks a lot to you and your team for all the writing, really valuable for this field to have practical examples of where RL works today!
0 replies · 0 reposts · 1 like · 239 views
Zheqing (Bill) Zhu@ZheqingZhu·
2023 has been a really fruitful year for the Applied Reinforcement Learning Team at @AIatMeta! A quick summary of our external research contributions across open-source, recommender systems, ads, infra, experimentation and new algorithms:
1. Open-source software: we released Pearl (github.com/facebookresear…), our flagship OSS on production-ready reinforcement learning, which obtained 2K stars and 120+ forks on GitHub within 3 weeks of release and was presented at NeurIPS 2023. We were also nominated by @tryolabs as runners-up to top 2023 Python libraries.
2. Recommender systems: we've released four RL papers regarding recommender systems across exploration, on-policy RL, and offline learning:
- On-policy RL: Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning (RecSys 2023, arxiv.org/abs/2305.13747)
- Scalable neural bandit: Scalable Neural Contextual Bandit for Recommender Systems (CIKM 2023, arxiv.org/pdf/2306.14834)
- Scalable deep neural exploration: Deep Exploration for Recommendation Systems (RecSys 2023, arxiv.org/pdf/2109.12509)
- Offline learning: Learning to Bid and Rank Together (Machine Learning, Springer, ArXiv to be released)
3. Ads and auction systems: a new RL-based pacing algorithm is released that ensures interpretability and compatibility with classic controller-based pacing systems.
- Offline Reinforcement Learning for Optimizing Production Bidding Policies (in submission, arxiv.org/abs/2310.09426)
4. Data center and infrastructure: RL has been a really powerful tool for optimization and data center operations. We released two papers that leverage RL for network migration and resource allocation:
- Resource allocation: Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning (in submission, arxiv.org/abs/2306.17054)
- Network migration: Klotski - Efficient and Safe Network Migration of Large Production Datacenters (ACM SigComm 2023, dl.acm.org/doi/abs/10.114…)
5. Experimentation: to address training data leakage from test group to control group in A/B tests introduced by online exploration, we developed a new experimentation procedure to measure the true impact of exploration methods:
- Evaluating Online Bandit Exploration In Large-Scale Recommender System (KDD Workshop 2023, arxiv.org/abs/2304.02572)
6. New RL/Bandit algorithms: we also spent time advancing methods in non-stationary neural contextual bandits and model-based RL.
- Non-stationary neural contextual bandit: Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling (in submission, arxiv.org/abs/2310.07786)
- Offline model-based RL: IQL-TD-MPC - Implicit Q-Learning for Hierarchical Model Predictive Control (ICML workshop 2023 and in submission, arxiv.org/abs/2306.00867)
Looking forward to 2024, where we will bring more RL magic into the real world!
2 replies · 6 reposts · 60 likes · 10.5K views
Padarn@Padarn·
@eugeneyan Got it. It’s interesting it’s not shown up in the discussion at all when people worry about ordering the context in the prompt. Seems like it’d avoid having to guess about which documents were being conditioned on, and to what extent.
0 replies · 0 reposts · 1 like · 20 views
Eugene Yan@eugeneyan·
@Padarn yes that’s right. it turns out that the simpler approach you mention goes a long way, though there may be some gains from the techniques in the papers
1 reply · 0 reposts · 1 like · 57 views
Eugene Yan@eugeneyan·
Wrote abt patterns for LLM systems/products
• Evals: Track performance
• RAG: Add external knowledge
• Finetuning: Improve specific tasks
• Caching: Reduce latency & cost
• Guardrails: Ensure output quality
• Defensive UX: Anticipate & manage errors
eugeneyan.com/writing/llm-pa…
38 replies · 173 reposts · 790 likes · 141.2K views
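The caching pattern in the list above can be as simple as an exact-match lookup keyed on a hash of the request. A minimal sketch: the class name and keying scheme are my own, and real systems often add TTLs or semantic keys on top:

```python
import hashlib

class LLMCache:
    """Exact-match response cache: call the model only on a miss, so
    repeated prompts cost one API call instead of many."""
    def __init__(self, model_fn):
        self.model_fn = model_fn  # your actual LLM call goes here
        self.store = {}
        self.hits = 0

    def complete(self, model, prompt):
        key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = self.model_fn(model, prompt)
        self.store[key] = result
        return result
```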
Padarn@Padarn·
@thorstenball Seems like this got Amazon a lot of free advertising for a feature… maybe it’s all working out?
0 replies · 0 reposts · 0 likes · 11 views
Thorsten Ball@thorstenball·
So is the myth that every pixel you see on Amazon has been optimized and A/B tested for thousands of lifetimes before you laid your eyes on it — is that actually true? Because, man, I have a hard time accepting that *this* is the best way to present the add-to-wishlist button.
59 replies · 5 reposts · 291 likes · 71.7K views
Padarn@Padarn·
@airvistara Hello, can you help me with a lost Apple Watch on a flight? The flight was from Delhi to Singapore, UK115, departing 23:45 on 27 November.
0 replies · 0 reposts · 0 likes · 56 views
Padarn@Padarn·
@testingham (Off topic: sad those abstracts are not published)
0 replies · 0 reposts · 1 like · 13 views
tom cunningham@testingham·
NEW POST: Thinking about tradeoffs? Draw an ellipse. With applications to (1) experiment launch rules; (2) ranking weights in a recommender; and (3) allocating headcount in a company.
1 reply · 1 repost · 6 likes · 2.2K views
Padarn@Padarn·
@testingham I remember seeing this in a CODE@MIT abstract last year (I didn’t go but a colleague did) and loving the idea. Glad to see this write-up
0 replies · 0 reposts · 1 like · 15 views