Elroy Tracey
@elroyic
327 posts
Joined September 2008
312 Following · 55 Followers
Sam Altman @sama
Here is a manga made by ChatGPT Images 2.0 of @gabeeegoooh and me looking for more GPUs:
[3 images]
583 replies · 221 reposts · 4K likes · 516.4K views
Carlo @CarloFCesar
@AlexFinn I got 2x DGX Spark and 1x Mac mini 24GB. What is your advised model setup for a fully local multi-agent workflow that runs 24/7 computation -> analysis on a vectorized (ever-growing) database?
2 replies · 0 reposts · 0 likes · 934 views
Alex Finn @AlexFinn
It happened. An open-weights model just dropped that benchmarks higher than Opus 4.6.

If you have 2 Mac Studios w/ 512GB, you can run Opus 4.6-level intelligence completely for free on your desk.

I warned you this would happen months ago. Now Mac Studios and Mac Minis are sold out. The next Mac Studio has been delayed until Q3/Q4, and the price will be significantly higher.

I told you this was going to happen. Intelligence explosion. Hardware bottleneck. Increased efficiency.

Luckily I picked up 2 Mac Studio 512GBs, 2 Mac Minis, and a DGX Spark. I will be loading these up in the next couple of days and will have completely private superintelligence running for me 24/7.

I'm telling you right now: by end of year we will have a local version of Mythos. It's 100% guaranteed.

You called me crazy, but every single prediction I've made has turned out to be true. These models will only get more efficient and require less hardware, but that hardware is only going to get more expensive.

Local/open source is so obviously the future, and if you're still denying this now you are delusional.
Kimi.ai @Kimi_Moonshot
Meet Kimi K2.6: Advancing Open-Source Coding
🔹 Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), CharXiv w/ Python (86.7), Math Vision w/ Python (93.2)
What's new:
🔹 Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹 Motion-rich frontend - videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹 Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹 Proactive Agents - the K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹 Claw Groups (research preview) - bring your own agents; command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…
188 replies · 146 reposts · 1.7K likes · 317.9K views
Elroy Tracey @elroyic
@VisualFidelity Had my doubts when I started playing, but the story actually is very meta; just think of it as you being isekai'd into the main character, and everything flows. The controls didn't make sense at first, but as you get skills they make more sense.
0 replies · 0 reposts · 0 likes · 175 views
Visual Fidelity @VisualFidelity
Are you happy you bought Crimson Desert?
[image]
1.1K replies · 40 reposts · 3.6K likes · 417.1K views
Elroy Tracey @elroyic
@antigravity Thanks for the update. These limits are dumb; cancelled my sub. Thanks for all the fish.
0 replies · 0 reposts · 0 likes · 85 views
Google Antigravity @antigravity
We're evolving Google AI plans to give you more control over how you build. Every subscription includes built-in AI credits, which can now be used for Antigravity, giving you a seamless path to scale.

Google AI Pro is the home for the practical builder: hobbyists, students, and developers who live in the IDE and don't necessarily rely on an agent. This plan features generous limits for Gemini Flash, with a baseline quota included to "taste test" our most advanced premium models.

Google AI Ultra serves as the daily driver for those shipping at the highest scale who need consistent, high-volume access to our most complex models.

If you're on Pro but need "extra juice" for a heavy sprint or deeper access to premium models, simply top up your AI credits to customize your plan. Keep building. Keep shipping.
1.5K replies · 302 reposts · 4.4K likes · 1.5M views
Denis Shiryaev 💙💛 @literallydenis
Just tested: llama.cpp on a $500 MacBook Neo.
Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M
So, smaller models are even more usable.
103 replies · 137 reposts · 2.4K likes · 276.6K views
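Numbers like these can be reproduced with llama-bench, the benchmarking tool bundled with llama.cpp; a minimal sketch, assuming you already have a GGUF file (the model path below is a placeholder, not the poster's exact file):

```shell
# Hedged sketch: llama-bench reports prompt-processing (pp) and
# token-generation (tg) throughput in t/s for a given GGUF model.
# -p 512 times a 512-token prompt; -n 128 times generating 128 tokens.
llama-bench -m ./models/qwen3.5-9b-q3_k_m.gguf -p 512 -n 128
```

Run it on each quant you are considering; the pp/tg pair maps directly onto the "Prompt"/"Generation" figures quoted above.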
Elroy Tracey @elroyic
@sudoingX Running with ROCm on Strix Halo 128GB: Qwen3.5 35B at full context gets 35 tok/s and uses 60GB RAM/VRAM.
1 reply · 0 reposts · 3 likes · 329 views
Sudo su @sudoingX
For anyone on AMD, genuinely curious how ROCm is performing on equivalent models. Haven't tested it yet but it's on the list.
11 replies · 0 reposts · 15 likes · 4.7K views
Sudo su @sudoingX
If you have a single RTX 3090 and want the best local inference setup right now, here's what I landed on after testing 5 open-source models across 7 GPU configs this month.
GPU: 1x RTX 3090 24GB
Model: Qwen 3.5 27B Dense Q4_K_M (16.7GB)
Context: 262K (native max)
Speed: 35 tok/s generation, flat from 4K to 300K+
Reasoning: built-in chain of thought, survives Q4 quant
Config: llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0
What this gives you:
- 27B params all active every token
- no speed degradation as context fills
- full reasoning mode on a consumer GPU
- 7GB VRAM headroom after model load
Tested MoE (faster but less depth per token) and dense Hermes (same speed, degraded under load). Qwen dense hit the sweet spot for single GPU. More architecture comparisons dropping soon.
What's your single-GPU setup? Curious what configs people are running.
69 replies · 57 reposts · 693 likes · 46.1K views
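The one-line config above, expanded into a commented launch script; the flags are exactly the ones given in the post, and only the model path is a placeholder:

```shell
# Hedged sketch of the single-3090 llama-server setup described above.
# -ngl 99              : offload all layers to the GPU
# -c 262144            : full 262K context window
# -fa on               : flash attention
# --cache-type-k/v q4_0: quantize the KV cache so the long context fits
# The model path is a placeholder, not the poster's exact file.
llama-server \
  -m ./models/qwen3.5-27b-q4_k_m.gguf \
  -ngl 99 -c 262144 -fa on \
  --cache-type-k q4_0 --cache-type-v q4_0
```

llama-server then exposes an OpenAI-compatible HTTP endpoint on localhost that local agent frontends can point at.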
@martinlasek
You can pay $200/mo for Claude for eternity. Or buy a Mac Studio for $166/mo and run your own model for free. Best part? You own it after 12 months at 0% interest:
> M4 Max
> 36GB Memory
> 512GB SSD
Hmmm…
[2 images]
439 replies · 43 reposts · 1.1K likes · 663.6K views
sudo rm -rf @itsjustmarky
@TeksEdge @arena It is far too slow on a Strix Halo: 8 tokens/sec when I tested, and way slower as you fill the context.
1 reply · 0 reposts · 2 likes · 978 views
David Hendrickson @TeksEdge
🦞 Clawdbot News: Interested in choosing the best local Qwen3.5 model to run your agent? Qwen3.5-27B dominates the benchmarks for its size, ranking #66 on @arena's latest leaderboard (above MiniMax M2.5, Step 3.5, GPT-5-mini). Benjamin's assessment shows Unsloth and Bartowski continue to be the gold standard: Unsloth shows nearly no quality loss while bringing the footprint down by 65% (55.6GB to 19.46GB), the perfect size to run on an AMD Ryzen AI Max+ 395 rig with 64GB RAM (or an M4 Mac Mini w/ 64GB).
[2 images]
Benjamin Marie @bnjmn_marie
Qwen3.5 27B GGUF evaluation:
✅ UD-Q4_K_L, IQ4_XS, and IQ4_NL perform closely to the original
✔️ UD-IQ3_XXS is good enough
❌ Couldn't find a good Q2
Next: Qwen3.5 9B. Again, early results suggest Q4 quants are very close to the original.
7 replies · 14 reposts · 137 likes · 61.2K views
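Standard quants like the ones Benjamin compares can be produced with llama.cpp's quantize tool; a minimal sketch (file names are placeholders, and note that Unsloth's UD-* dynamic quants use their own per-layer recipes and are published pre-made rather than generated by this command):

```shell
# Hedged sketch: convert an f16 GGUF to a Q4_K_M quant with llama.cpp.
# Usage shape: llama-quantize <input.gguf> <output.gguf> <quant-type>
# File names are placeholders, not the files evaluated in the post.
llama-quantize ./qwen3.5-27b-f16.gguf ./qwen3.5-27b-q4_k_m.gguf Q4_K_M
```

Swapping the last argument (IQ4_XS, IQ4_NL, Q2_K, …) reproduces the other standard variants in the comparison.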
@Axiomofmind ⚡ @axiomofmind
@mattshumer_ Multi-agent on one gateway or multiple gateways? Multi-agent works as long as it's set up right (separate workspaces); you just have the main agent use the session ID to communicate.
2 replies · 0 reposts · 1 like · 1.8K views
Matt Shumer @mattshumer_
For those with multiple OpenClaw agents on one machine, what's your setup to allow them to work together?
163 replies · 9 reposts · 282 likes · 68.5K views
Seth Cronin @SethCronin
@mattshumer_ Follow-up: if each (human) on a team has an OpenClaw, what's the best way to let them work together? My instinct is email, but that seems slow and difficult to audit.
5 replies · 0 reposts · 1 like · 2.6K views
Elroy Tracey @elroyic
@onusoz @mattshumer_ Working on many tasks at once... splitting context across many claws instead of clogging one... performance, and the ability for each claw to focus on one task instead of everything you ever thought of.
0 replies · 0 reposts · 0 likes · 223 views
Onur Solmaz @onusoz
@mattshumer_ Why do you need multiple OpenClaw instances on one machine?
3 replies · 0 reposts · 4 likes · 2.6K views
Elroy Tracey @elroyic
@mattshumer_ Many claws, one machine, one VM per claw. Each claw has its own workspace, plus there is a shared workspace. Discord channels for communication between claws (a cron job checks the channels), and an embedded knowledge graph to share context.
1 reply · 0 reposts · 0 likes · 153 views
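The "cron job to check channels" piece of the setup above can be sketched with Discord's plain HTTP API; a minimal sketch where the bot token, channel ID, and paths are all hypothetical (the post does not show its actual implementation):

```shell
# Hedged sketch: poll a shared Discord channel and drop recent messages
# into this claw's inbox file for it to read on its next turn.
# Token, channel ID, and paths are hypothetical placeholders.
# Example crontab entry (every 5 minutes):
#   */5 * * * * /opt/claw/check-channel.sh
TOKEN="hypothetical-bot-token"
CHANNEL_ID="123456789012345678"
curl -s -H "Authorization: Bot $TOKEN" \
  "https://discord.com/api/v10/channels/$CHANNEL_ID/messages?limit=20" \
  > /opt/claw/shared/inbox.json
```

Each claw polling its own channel this way keeps inter-claw traffic out of the model context until the claw chooses to read its inbox.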
pc @pcshipp
What's the best way to run OpenClaw?
- VPS
- Locally
147 replies · 3 reposts · 128 likes · 27.1K views
Elroy Tracey @elroyic
@sudoingX On the AMD Strix Halo platform with 128GB RAM, getting 25 t/s on a Q8 model with a 200K-token context window. Feels fast enough.
0 replies · 0 reposts · 0 likes · 100 views
Sudo su @sudoingX
The numbers coming in from this thread:
5090: 166 tok/s (z33b0t), 153 tok/s (EmmanuelMr)
4090: 122 tok/s (StubbyTech)
3090: 112 tok/s (sudo), 100 tok/s (Eduardo)
6800XT: 20-30 tok/s (Dark)
Qwen3.5-35B-A3B, 4-bit quant, 19.7GB on disk. Fits entirely on any single 3090 24GB card with room to spare: no offloading, no splitting, full speed. 5090 owners keep pushing the ceiling and we haven't found it yet. The NVIDIA side is stacking up; where are the ROCm numbers? Report your GPU and tok/s below. Building the full map.
[4 images]
Sudo su @sudoingX
Hey, if you're running Qwen3.5-35B-A3B on llama.cpp and stuck at 40-70 tok/s on 24GB+ VRAM, you're leaving speed on the table. Build llama.cpp from source and add these flags: --cache-type-k q8_0 --cache-type-v q8_0 -np 1. Eduardo went from 50 to 100 tok/s on a 3090 (24GB). StubbyTech just hit 122 on a 4090 (24GB). Full 262K context, zero speed loss. UD-Q4_K_XL quant, all layers on GPU. Stop leaving performance on the table.
57 replies · 29 reposts · 518 likes · 121.1K views
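Why the q8_0 cache flags matter: at long context the KV cache is large, and q8_0 roughly halves it versus f16. A back-of-envelope sketch with invented model dimensions (48 layers, 8 KV heads, head dim 128; not Qwen3.5-35B-A3B's actual shape):

```shell
# Hedged arithmetic: KV-cache bytes ≈ 2 (K and V) × layers × kv_heads
# × head_dim × context × bytes_per_element. The model dims below are
# invented for illustration; q8_0 stores ~1 byte/element plus a small
# per-block scale, versus 2 bytes/element for f16.
layers=48
kv_heads=8
head_dim=128
ctx=262144
f16_mib=$(( 2 * layers * kv_heads * head_dim * ctx * 2 / 1024 / 1024 ))
q8_mib=$((  2 * layers * kv_heads * head_dim * ctx * 1 / 1024 / 1024 ))
echo "f16 KV cache:  ${f16_mib} MiB"
echo "q8_0 KV cache: ${q8_mib} MiB"
```

For these made-up dims the f16 cache at full 262K context would dwarf a 24GB card on its own, which is why cache quantization is what unlocks "full 262K context, all layers on GPU".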
Ian Chi @SimplePestMgmt
Hey @steipete and @openclaw, why is OC so bad with dates? This isn't the only time it's made horribly easy date mistakes.
[image]
46 replies · 0 reposts · 107 likes · 64.3K views
Elroy Tracey รีทวีตแล้ว
Vadim @VadimStrizheus
POV: your OpenClaw after you didn't set up a second-brain system. Paste this prompt to fix that: 👇

I want you to build me a second-brain memory system. Create a memory/ folder and a MEMORY.md file in your workspace. Every session, read these FIRST before doing anything; they are your entire memory.

memory/YYYY-MM-DD.md files are your daily journals. As we talk each day, log everything in real time: decisions, tasks, preferences, context, mistakes. Timestamp each entry. These are your raw notes.

MEMORY.md is your long-term memory. This is curated: who I am, my goals, my preferences, active projects, lessons learned, key decisions and why. Every few days, review your daily journals and distill the important stuff into here.

The rule: if it's not written to a file, you don't remember it. When I say "remember this", write it immediately. When you make a mistake, document it so you never repeat it. When you learn something about me, update MEMORY.md.

Over time you should know my communication style, what I care about, what annoys me, my projects, my goals. After a week this should feel like a real assistant that actually knows me. After a month, indispensable.
[image]
60 replies · 60 reposts · 1K likes · 143.7K views
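The prompt above boils down to a tiny file layout; a minimal sketch of the scaffold it asks the agent to maintain (the file names follow the prompt, the journal entry text is invented):

```shell
# Hedged sketch of the "second brain" layout from the prompt above.
# File names follow the prompt; the entry text is an invented example.
mkdir -p memory
touch MEMORY.md                        # curated long-term memory
today="memory/$(date +%F).md"          # daily journal, memory/YYYY-MM-DD.md
printf '%s decision: log everything in real time\n' "$(date +%H:%M)" >> "$today"
```

The agent appends timestamped lines to the day's journal as it works, and periodically distills them into MEMORY.md, exactly as the prompt specifies.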
Elroy Tracey รีทวีตแล้ว
Elroy Tracey รีทวีตแล้ว
Emre Elbeyoglu @elbeyoglu
I built markdown.new. Put markdown.new before any URL → get clean Markdown back. Cloudflare's Markdown for Agents is great, but it only works for enabled sites; markdown.new works for ANY website on the internet. 80% fewer tokens. Also converts PDFs, images, audio. Free. No signup. markdown.new
Cloudflare @Cloudflare
Time to consider not just human visitors, but to treat agents as first-class citizens. Cloudflare's network now supports real-time content conversion to Markdown at the source using content negotiation headers. cfl.re/4ksZQ1S
196 replies · 314 reposts · 4.3K likes · 1.1M views
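Both approaches reduce to a one-line fetch; a hedged sketch (example.com stands in for any page, the markdown.new URL shape is a guess from "put markdown.new before any URL", and the Accept header is the content-negotiation route the Cloudflare post describes):

```shell
# Hedged sketch: two ways to fetch a page as Markdown, per the posts above.
# 1) markdown.new prefix (URL shape inferred from the post, not documented here):
curl -L "https://markdown.new/https://example.com"
# 2) Cloudflare content negotiation, for sites that have enabled it:
curl -L -H "Accept: text/markdown" "https://example.com"
```

An agent can pipe either response straight into its context, which is where the claimed token savings over raw HTML come from.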