Christopher

26.3K posts

@communicating

Optimist, Geek, Building @AgletsAI. Dot Connector, Tool Builder, Info Hacker & Coder. Into Edge & Physical AI, Agents, Small LLMs, and making hard things easier

Now YVR | Soon Everywhere · Joined June 2008
307 Following · 2K Followers
Christopher
Christopher@communicating·
@samhogan Feels right. So we can decrease the analysis surface with HALO but we still need a lot of runs to feed it. You’re iterating currently but do you c this as a post run process where you gather your traces over time or a pre run process where you simulate what u r optimizing for?
Christopher
Christopher@communicating·
@samhogan PS. You said, “This is an important negative result. HALO can find slack in harnesses, but it cannot reinvent missing knowledge from the models itself.” Total agreement. Could HALO also be used in an “off brand” way to directly test model domain knowledge depth in open SLMs?
Christopher
Christopher@communicating·
We’ve largely passed the point where agent failures in production are about intelligence or capability bottlenecks. The real challenge is now execution assurance.
Christopher
Christopher@communicating·
@cline Doing a rebuild is a huge undertaking. It looks like you’ve pulled in learnings from not only all your work but from others. 2 questions. 1. Do you pull in dependencies from projects you’ve learned from or just learning? 2. Does the core support forming like in Pi? Thank you!
Cline
Cline@cline·
Introducing the Cline SDK. We rebuilt the Cline harness for our extension and CLI from scratch using all the lessons learned since creating one of the world's first coding agents in 2024, and are open sourcing it for others to build with today. npm i @cline/sdk 🧵
Christopher
Christopher@communicating·
@rachelrapp Not all providers treat the stream as a raw sequence of events. They try to b helpful ensuring the stream is “clean” but if you need to do anything w/ the stream it’s a nightmare. Confirming @Baseten just streams tokens means RL & other tasks they need “data” are straightforward
Christopher
Christopher@communicating·
@rachelrapp Hi Rachel, I have 2 technical questions about tool call streaming on the platform I’m hoping you can answer for me. This is critical to some features I’m implementing. I’ll add each as an attached tweet using the + gesture since they’ll never fit in one. Thank you.🍺
Christopher
Christopher@communicating·
@rachelrapp Thank you Rachel, this is a huge help. It means I don’t have to worry about an issue I was really worried about having to deal with. 🔥
Christopher
Christopher@communicating·
@kevin_x_li Hi Kevin. Thanks for reaching out. Sorry, I tried to make it clear in my tweet that even w/ the “limitation” this is still a super useful dataset. 270 char is never enough. For some tasks you’ll want completions; for others, as you say, you will be fine. It’s a great contribution.🔥
Kevin Li
Kevin Li@kevin_x_li·
Hi Christopher, thanks for reposting! The limitations you mentioned are all true and fully documented in the dataset card. However, as I pointed out in the card, based on our and others' experiments, trace completion/correctness doesn't matter as much for training. In fact, NemotronTerminal's Table 7 shows that no filtering (12.4%) significantly surpasses both complete-only (6.74%) and success-only (5.06%) filtering strategies on their synthetic dataset. Our own experiments with Qwen3-1.7B-Base corroborate this: the model reaches 9% on SWE-bench Verified when trained on 100K samples from this corpus truncated to 8K tokens, even though most rollouts do not end with a patch submission. Hope this clarifies some of the concerns!
Christopher
Christopher@communicating·
Sweet, huge trace corpus released. Caveat: while it’s massive & very useful, the below nature of the dataset may limit that usefulness: “Most rollouts do not reach a successful Submitted exit status; ….” If you filter to clean submissions only, you’ll lose the bulk of the corpus.
Kevin Li@kevin_x_li

Introducing SWE-ZERO-12M-trajectories: the largest agentic trace dataset in the open, 5.7x larger than the previous largest. 112B tokens · 12M trajectories · 122K PRs · 3K repos · 16 languages huggingface.co/datasets/Alien…

Christopher reposted
Sam Hogan 🇺🇸
Sam Hogan 🇺🇸@samhogan·
"self-improving agents" "self-healing agents" "continuously-improving agents" "model-data flywheel" these are all terms being thrown around to describe non-deterministic software systems that self-analyze and improve with little human interaction, but there is pretty
Christopher
Christopher@communicating·
RT @baseten: “The question every app layer company is now asking is no longer ‘how do we use AI?’ It is ‘how do we resist commodification t…
Christopher reposted
Mario Zechner
Mario Zechner@badlogicgames·
People of pi.dev. Do not install pi via any method other than what's shown on the website and in the docs. E.g. we do not publish to brew and never will. Someone else did. We have zero control over what goes into the brew release.
Christopher
Christopher@communicating·
@evalstate Nice. 👍 I ran out of characters. lol. Need to go pro I guess but was holding off to do that on the AgletsAI account. Guess I could do both.
Christopher
Christopher@communicating·
@evalstate Stumbled on this & on 1st glance it looked interesting, & since we were talking PII thot I’d share. Not a rec since I haven’t had time to actually review it yet. MemPrivacy - Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents: huggingface.co/papers/2605.09…
Christopher
Christopher@communicating·
@RichardSocher Hi Richard. A big day, especially since I get the impression from following your writings, talks, and past work that the thesis behind Recursive is the core idea you’ve been working towards for a long time. Congratulations. Very happy for you and your team.
Christopher
Christopher@communicating·
@rachelrapp I really appreciate it.🍺 Sorry for pestering The same happens to me actually. Not sure why it goes wonky🤔 How I gen’d tweet: I was reading announcement tweets you posted, clicked your profile pic to get to your account, hit plus & tweeted. Wonder y it didn’t go into mainline
Rachel Rapp
Rachel Rapp@rachelrapp·
😱 I only saw this because it popped up in my feed! So sorry, my X notifications are messed up -- I only get the random notification if someone reposts something I posted or comments (and not even all of those!) I pinged the engineers who work on tool calling, should have an answer soon :)
Christopher
Christopher@communicating·
@rachelrapp Hi Rachel, just following up on my 2 tool call delta queries. Hoping u can confirm how they’re handled for both model api & custom truss scenarios. If u need more detail let me know. Really appreciate the help. I haven’t been processing the thought chain yet but need 2
Christopher
Christopher@communicating·
The Thinking Machines Lab’s approach to human-ai collaboration w/ their Interaction Models isn’t getting the attention it deserves. Tinker’s awesome but this approach, if it has the accuracy and speed, and scales, is game changing for human-model interfacing. thinkingmachines.ai/blog/interacti…
Christopher
Christopher@communicating·
@arlanr PS. I was thinking. Since you’re doing messaging maybe a space theme would be a bit different.🤔 Cheers.
Christopher
Christopher@communicating·
@arlanr I like the concept. I like the animated cards. I think the hero animation and menus could be tighter. A lot of potential but 5/10 as is. If I had one worry it’d be it kind of reminds me of createanything.com but anything you build will likely remind someone of something
Arlan
Arlan@arlanr·
please rate this landing page
Arlan tweet media