@amit4tek @trychroma @grok it’s funny how much money they dumped into media etc to make this bad video, just to have people not understand it bc the video and presentation are horrible 😭💀

@mikeyobrienv I posted some tips today, if you're interested, on what I was doing for the high hit rates (hermes wrote it💀)

Interesting read from Anthropic on harness design for long-running apps.
A lot of the parts they describe (loops, handoffs, evaluator separation, and runtime control) are exactly the layer ralph-orchestrator provides.
anthropic.com/engineering/ha…
github.com/mikeyobrien/ra…

@mikeyobrienv hey, you ever try deepseek-chat via the official endpoint in your ralphs? the ability to hit cache in loops is crazy. I sustained a 97% cache hit rate across like 40mil tokens of loops

@LottoLabs My latest camera app I built largely with qwen 27b. I had to finish it with Opus 4.6: while qwen was working, the Apple Log pipelines were pretty complex and new, so I tagged in Opus to close it out. Finalizing some bug fixes, then I'll post it for free to the AI community!

@Rahatcodes the hermes agent has some cool stuff you can use for this: you can use it as a communication layer and an operator that connects Claude, Codex, and the agent together and to you. additionally you can set up an inbox system w/ hooks that allows two-way comms between Claude Code and hermes

Before I go build this thing I want to know if someone has a tool for this:
When I start building a feature into a codebase I do this:
- Start planning with Claude
- Copy the plan over to codex and review
- Then some manual back and forth until me, codex, and claude agree on the plan
Ideally I'd like a terminal view that just seamlessly shares the context with both agents somehow

@imjszhang @rahulgs and whether internal mem systems that are run and retrieved by the same model running the agent can be trusted over long horizons. experimenting with external mem callable via API to pre-flight inject a kb into agents before tasks

seems obvious but:
things that are changing rapidly:
1. context windows
2. intelligence / ability to reason within context
3. performance on any given benchmark
4. cost per token
things that are not changing much:
1. humans
2. human behavior, preferences, affinities
3. tools, integrations, infrastructure
4. single core cpu performance
therefore,
ngmi:
1. "i found this method to cut 15% context"
2. "our method improves retrieval performance 10% by using hybrid search"
3. "our finetuned model is cheaper than opus at this benchmark"
4. "our harness does this better because we invented this multi agent system"
5. "we're building a memory system"
6. "context graphs"
7. "we trained an in house specialized rl model to improve task performance in X benchmark at Y% cost reduction"
wagmi:
1. product/ui
2. customer acquisition
3. integrations
4. fast linting, ci, skills, feedback for agents
5. background agent infra to parallelize more work
6. speed up your agent verification loops
7. training your users, connecting to their systems and working with their data, meeting them where they are

@imranye I just did a write-up about cache hit discounts with deepseek that's helpful for creating specific repeatable workflows, and it can help with more general agentic workflows as well: x.com/xoots1/status/…
xoots (@xoots1):
I ran 110 million tokens through the DeepSeek API in March. Autonomous agents. Research pipelines. Overnight coding sprints. 7,030 API calls. My bill was $6.84. Here’s how it worked, what breaks it, and how to set it up so you can do the same thing. 🧵
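The loop pattern behind those hit rates comes down to request shaping. Here's a minimal sketch of it, assuming an OpenAI-compatible client pointed at DeepSeek's official endpoint; the helper names (build_messages, shared_prefix_len) are mine for illustration, not part of any library. DeepSeek's automatic context caching matches on the identical leading span of a request, so everything stable goes first and everything that varies goes last.

```python
# Sketch of a cache-friendly agent loop for DeepSeek's automatic prefix caching.
# Assumption: an OpenAI-compatible client at https://api.deepseek.com with
# DEEPSEEK_API_KEY; the helpers below are illustrative names, not a real API.
import json

SYSTEM = "You are a research agent. Follow the workflow exactly."  # stable across iterations
TOOLS_DOC = "tools: search(q), read(url), write(path, text)"       # stable across iterations

def build_messages(history, task):
    """Stable content first, varying content last.

    The cache matches on the longest identical prefix of the request, so
    anything that changes between loop iterations must come after
    everything that doesn't."""
    return (
        [{"role": "system", "content": SYSTEM + "\n" + TOOLS_DOC}]
        + history                              # append-only: old turns stay byte-identical
        + [{"role": "user", "content": task}]  # only this part varies
    )

def shared_prefix_len(a, b):
    """Number of identical leading messages shared by two requests."""
    n = 0
    for x, y in zip(a, b):
        if json.dumps(x, sort_keys=True) != json.dumps(y, sort_keys=True):
            break
        n += 1
    return n
```

Per DeepSeek's docs, each response's usage object reports prompt_cache_hit_tokens and prompt_cache_miss_tokens, so you can log your hit rate per iteration; rewriting or reordering earlier turns (e.g. summarizing history mid-loop) resets the matched prefix and is what breaks the discount.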