Bryon Kucharski

67 posts

@bryonkuchML

Search & LLMs at @Gartner_inc | MS from @manningcics | BS from @WITEngineering

Connecticut, USA Joined November 2021
987 Following 111 Followers
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@katelyn_lesse Any detail you can go into about this code generated “post processing” of search results? Having a hard time understanding what code the model actually generates that can be used as a relevancy signal
0
0
1
123
Katelyn Lesse
Katelyn Lesse@katelyn_lesse·
sonnet 4.6 is available today. we continue to be bullish on the power of code execution, so we leaned in with our new programmatic web search & fetch tools. sonnet 4.6 saw 13% higher accuracy on BrowseComp while using 32% fewer input tokens. claude.com/blog/improved-…
1
6
138
5.9K
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@willccbb love it! Let me know if you'd ever want to hear more about the pain points I've had with RLTrainer, that could potentially help articulate why prime-rl is so great!
0
0
1
19
will brown
will brown@willccbb·
@bryonkuchML yeah! been wanting to make a video + accompanying blog walking through this stuff for a while (one day). for starters, will probably just look like fleshing out the verifiers / prime-rl docs with more conceptual walkthroughs
1
0
0
77
will brown
will brown@willccbb·
prob gonna deprecate vf.RLTrainer soon and move it to a demo folder in the repo. there’s no good reason to use it over prime-rl. it’s purely for educational purposes as a 1000-LOC example
3
2
97
6.9K
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@johnowhitaker Would also ditto your comments about the value prop of Tinker 😁 Majority of my time here was spent on the GPU and infra related setup. Definitely spent way more than 30 cents too! But verifiers/prime-rl is solving a different problem
0
0
1
21
Jonathan Whitaker
Jonathan Whitaker@johnowhitaker·
OK I had to record a quick video and share a dialog showing my first few tests: youtube.com/watch?v=yId2PE… Dialog: share.solve.it.com/d/e52b8889b9d3… In the video, I show how easy it can be to train a model on a custom task with your own reward function. LMK what I should try next :)
Jonathan Whitaker@johnowhitaker

First impressions of Tinker: I can tinker with LLMs again! Really liking it so far - you can focus on the data and *what* you want to DO, not the stress of distributed training, model loading, arcane incantations, implementation differences, library bugs... Amazing work @thinkymachines team <3

6
11
41
9.1K
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@johnowhitaker here's my exact training toml. Used RLTrainer() for the training run on an ml.g5.24xlarge. Obligatory thanks to @willccbb and Prime Intellect
Bryon Kucharski tweet media
0
0
2
18
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@jxnlco @Anthropic @ivanleomk I love your package and the procedural API. I'm having some issues setting up the UI. It seems like I'm not able to load my JSONL checkpoints properly. Is this a known issue?
0
0
0
24
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@willccbb That’d be sweet. I’m trying to figure out what people normally do for deep research type systems. Any idea?
0
0
1
45
will brown
will brown@willccbb·
they should make an App Store for Verifiable Rewards
14
4
145
16.8K
Bryon Kucharski reposted
ClearSight AI
ClearSight AI@ClearSightAI·
$IT Q2 FY25 Earnings 📊
✅ Revenue: $1.686B (beat)
✅ Adj. EPS: $3.53 (beat)
✅ GAAP EPS: $3.11 (beat)
📈 EPS up 9.6% YoY
🤖 Rolled out new AI tool “AskGartner”
🔁 $700M boost to share repurchase plan
📊 Guidance updated for FY25
#Earnings #Tech #AI
ClearSight AI tweet media (2 images)
0
1
1
449
Clayton Thorrez
Clayton Thorrez@cthorrez·
Extremely excited to announce that I've joined @lmarena_ai! For years I've been working in LLMs for my job and hacking on rankings and ratings for fun. Beyond thrilled to be able to join this project at the intersection!
8
2
38
4.3K
Bryon Kucharski
Bryon Kucharski@bryonkuchML·
@yoavgo I have been playing with this and love it, thank you so much for all the hard work and the public demo! Do you have any more information regarding an ETA for the code? I know the blog mentions some copyright challenges
0
0
0
14
thomas
thomas@thomazvu·
this paper was so cool but they didn’t go far enough
introduce money, equip each agent with an inventory, let them trade and scam each other
let them fall in love, form alliances and fight each other, put them in the hunger games
who’s working on this? DM me
thomas tweet media
2
0
5
779
Antoine Chaffin
Antoine Chaffin@antoine_chaffin·
@jobergum I am mostly speaking about the results on BEIR. On MLDR, even if the training on MS MARCO is different, ColBERT (with long-context models) outperforms even fine-tuned (in-domain) dense models, so this is not even a fight here
1
0
2
115
Jo Kristian Bergum
Jo Kristian Bergum@jobergum·
Thoughts on ModernBERT and retrieval! What stands out to me here is the difference in effectiveness between a single-dense representation (DPR) versus ColBERT. Especially on long-context (MLDR).
Jo Kristian Bergum tweet media
3
5
57
3.9K
Joseph Suarez 🐡
Joseph Suarez 🐡@jsuarez·
@emollick We have NetHack running 130k steps/second in PufferLib. I love this env and wish we had more contributors interested in working on it. We can probably get farther than current LLMs with a 2M param model + RL
1
0
13
785
Ethan Mollick
Ethan Mollick@emollick·
This may sound odd, but game-based benchmarks are some of the most useful for AI, since we have human scores and they require reasoning, planning & vision. The hardest of all is NetHack. No AI is close, and I suspect that an AI that can fairly win/ascend would need to be AGI-ish.
Ethan Mollick tweet media (3 images)
35
105
596
90.7K
Jo Kristian Bergum
Jo Kristian Bergum@jobergum·
NotebookLM is a great LLM product with a dorky name. Would love to see the system prompt, as I have tried their AI Studio for similar workflows without being impressed.
3
0
8
1.5K