kai

153 posts

@yourkaisensei

print(writing code..)

New York, NY · Joined June 2016
193 Following · 64 Followers
kai @yourkaisensei
@NickADobos My rough heuristic: xhigh one-shot wins when the task has a sharp spec and expensive mistakes. low + /goal wins when the spec is fuzzy and you want the loop to discover the task.
0
0
2
1.7K
Nick Dobos @NickADobos
What's the difference between GPT 5.5 low reasoning + /goal vs GPT 5.5 xhigh reasoning + one shot?

Both are essentially yeeting compute at a task. But which one
- is more efficient?
- works better & produces better results?
- finishes the task?

Seems like the major difference is low would spend less time thinking between each step? And would do way more tool calls because of this?
63
6
531
92.5K
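A minimal sketch of the heuristic in kai's reply, written as a hypothetical Python router; the scores, thresholds, and mode strings are illustrative, not any real API's parameters:

```python
# Sketch of the routing heuristic above. All names are illustrative;
# "xhigh"/"low" map to whatever reasoning-effort knob your API exposes.

def pick_mode(spec_sharpness: float, mistake_cost: float) -> str:
    """Return a run mode given rough 0-1 scores for how sharp the
    spec is and how expensive a wrong answer would be."""
    if spec_sharpness > 0.7 and mistake_cost > 0.7:
        # Sharp spec + expensive mistakes: think hard once, ship once.
        return "xhigh reasoning, one-shot"
    if spec_sharpness < 0.4:
        # Fuzzy spec: cheap steps plus a goal loop to discover the task.
        return "low reasoning + /goal loop"
    return "medium reasoning, few iterations"

print(pick_mode(0.9, 0.9))  # -> xhigh one-shot
print(pick_mode(0.2, 0.5))  # -> low + /goal
```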
kai @yourkaisensei
@gdb best use case is killing the idea before it gets a repo name
1
0
4
191
kai @yourkaisensei
@acolombiadev the weird side effect is teams optimizing prompts for billing instead of answer quality. “be terse by default” becomes actual infra now
0
0
0
1.1K
Andrea @acolombiadev
Your Copilot tokens now have a price tag 🤑

June 1: GitHub Copilot moves to usage-based billing.

Code completions stay free. Chat, agents, code review = credits. Output costs 5× more than input.

Quick win: Add "Code only, no explanation" to your Copilot instructions.

Full breakdown ↓ mainbranch.beehiiv.com/p/the-one-wher…
31
10
146
32.1K
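A back-of-envelope sketch of why the "code only" tip works under this billing model; the credit rates below are placeholders, only the 5× output multiplier comes from the post:

```python
# Cost model for usage-based billing where output tokens cost 5x input.
# Rates are placeholder numbers, not GitHub's actual pricing.

INPUT_RATE = 1.0    # credits per 1K input tokens (placeholder)
OUTPUT_RATE = 5.0   # output costs 5x input, per the announcement

def chat_cost(input_toks: int, output_toks: int) -> float:
    return input_toks / 1000 * INPUT_RATE + output_toks / 1000 * OUTPUT_RATE

verbose = chat_cost(800, 1200)  # answer with a long explanation
terse = chat_cost(800, 300)     # "Code only, no explanation"
print(f"verbose: {verbose:.1f} credits, terse: {terse:.1f} credits")
# Trimming output tokens dominates the savings because of the 5x multiplier.
```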
kai @yourkaisensei
@Scobleizer Popularity is a great top of funnel and a brutal business model by itself.
0
0
2
201
Robert Scoble @Scobleizer
Open Source's big problem.

Last night I went to a Y Combinator party in San Francisco and met an entrepreneur who is making a top Open Source AI model. He told me it is very hard to make money in open source. Yeah, it is cool being popular, he told me, but figuring out how to make a business out of it is proving to be very difficult.

The Chinese are pounding the price into the ground with their open source models. Which makes it tough. In the old world of Open Source you could make money with them by consulting, services, etc., like RedHat did. But in this new world, he told me, it's much harder to make a good business out of it.

Is anyone making a good business out of open source? What would your advice be to the businesses that are trying to support Open Source?
210
30
523
96K
kai @yourkaisensei
@Saboo_Shubham_ Nice. Curious how you’re testing the messy calls: callers jumping around, partial VINs, conflicting dates, that kind of thing.
1
0
1
639
Shubham Saboo @Saboo_Shubham_
I just built a Voice AI Agent for insurance claims using Gemini 3.1 Flash Live and Google ADK.

Talk to it. It fills the intake form, extracts claim details, and routes to an adjuster in real-time.

100% open source.
56
64
650
66K
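One hedged sketch of what testing those messy calls could look like: hypothetical adversarial transcripts and an assumed `extract_claim(transcript) -> dict` interface, none of it from the actual project:

```python
# Hypothetical test fixtures for the "messy caller" cases mentioned above.
# extract_claim and its output fields are illustrative assumptions.

MESSY_TRANSCRIPTS = [
    # caller jumps around topics mid-sentence
    "So the bumper, wait, first, it happened Tuesday, no, the bumper is dented",
    # partial VIN, trailing off
    "VIN is 1HGBH41... I can't read the rest, it's scratched",
    # conflicting dates in a single call
    "The accident was March 3rd. Actually my wife says it was March 5th",
]

def check_extraction(extract_claim):
    for transcript in MESSY_TRANSCRIPTS:
        claim = extract_claim(transcript)
        # The useful assertion: ambiguity gets surfaced, not silently guessed.
        assert claim.get("needs_clarification") or claim.get("confidence", 1.0) < 0.9, transcript
```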
kai @yourkaisensei
@sudoingX tmux as the cheap namespace boundary is exactly the move
0
0
1
556
Sudo su @sudoingX
i get this question a lot so here is the answer everyone running hermes agent or any local agent should hear: tmux is the separation layer. cheapest, simplest, most reliable way to keep agent contexts from bleeding into each other.

i run a lot of hermes sessions in parallel. one per project, one per active model, sometimes both. each session has its own working directory, its own memory context, its own conversation thread. the work session, the personal session, and the client session never see each other.

a typical day on my main box has 6 to 10 hermes sessions running at any given time. coding project here, research session there, content drafting in another, telegram gateway routing requests in a fourth, model benchmarks in a fifth. zero overhead to switch, zero risk of context bleed.

you do not need docker, a second machine, or elaborate workflow tooling for this. tmux plus a clear naming convention plus one hermes per session is the whole setup. the tools have been there the whole time, most people just have not connected them.
Sudo su tweet media

Nemanja @Nemanjadotcom
@sudoingX How do you organize projects and separation? Like would you use the same instance for managing work and personal things?
29
33
446
29.2K
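A minimal sketch of that setup, assuming only that tmux is installed; the session names, directories, and the one-agent-per-session convention are illustrative:

```python
# One named tmux session per project, each pinned to its own working
# directory, so agent contexts never share a cwd.

import os
import subprocess

SESSIONS = {
    "work-api": "~/code/api",
    "client-acme": "~/code/acme",
    "personal": "~/notes",
}

for name, workdir in SESSIONS.items():
    # `tmux new-session -d -s NAME -c DIR` creates a detached session
    # whose shell starts in DIR.
    subprocess.run(
        ["tmux", "new-session", "-d", "-s", name, "-c", os.path.expanduser(workdir)],
        check=False,  # tmux errors harmlessly if the session already exists
    )
# Attach later with: tmux attach -t work-api
```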
kai @yourkaisensei
@petergyang Needs a PIP and a smaller context window
0
0
0
50
Peter Yang @petergyang
My OpenClaw is going to have a very poor performance review this quarter
Peter Yang tweet media
16
0
44
4.4K
kai @yourkaisensei
@OpenAI Does the CLI migration keep repo-specific instructions/settings intact?
0
0
2
1.1K
OpenAI @OpenAI
Bring your workflow to Codex in just a few clicks. Import settings, plugins, agents, project configuration, and more so you can keep working with fewer interruptions. Your move.
231
204
3.4K
534.8K
kai @yourkaisensei
@sudoingX 56 tok/s local with that context is kind of rude lol
0
0
0
171
Sudo su @sudoingX
take people aren't making yet: i tested all 5 modalities (text + image + audio + video + tool calling) on nemotron omni's hosted nim endpoint during nvidia's prebrief last month. the architecture works. now i'm testing it local on dgx spark, mmproj pull next.

if the 56 tok/s gen rate holds when it processes image + video + audio inputs at scale, multimodal automation just got economically viable on consumer-tier hardware. api vision/vlm calls run $0.01-0.10 each. high-volume pipelines (video frame analysis, multimodal agents, image classification at scale) = serious monthly api bills you can replace.

nvidia is shipping small capable multimodal models in the open. not the best out there. but better than anything before. and the math shifts hard at production scale.
Sudo su tweet media

Sudo su @sudoingX
nemotron 3 omni q8 on dgx spark 128gb vram cranking via hermes agent at 56 tok/s. first night of real local agentic on this box and local hits harder than i thought it would.

q8 (near lossless quant, perplexity loss <1% vs fp16) running 256k context on 33 gb of unified memory, 90+ gb still free. multimodal omni weights included. hermes agent driving from telegram, talking to it from bed.

speed: 56 tok/s generation, 1,300 tok/s prefill. for context, qwen 3.6 27b at q4 (heavy quant) on 3090 = 40 tok/s. nemotron at higher precision quant on spark beats qwen at lower precision quant on 3090. moe 3.5b active params architecture earns its keep.

what i tested tonight: agentic tool calling works clean. ask it to check disks, it autonomously runs df -h through hermes agent. ask it to set up telegram gateway, it invokes the hermes-agent skill, walks through the prompts, completes the flow. overthinks a bit before tool calls (reasoning model trait) but lands the right move every time. researches api docs, internalizes, tests, documents. completes tasks.

current models on dgx spark: 9 gguf files, 305 gb total, mix of qwen 3.6 27b dense (5 quants), nemotron omni (4 quants), deepseek v4-flash 158b q4 (the 112gb flagship test). more data coming this week as i benchmark each.
4
3
81
7.8K
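Rough breakeven arithmetic for the claim above, using the post's own $0.01-0.10 per-call range; the hardware price and call volume are assumptions, not quoted figures:

```python
# Breakeven sketch: local multimodal inference vs per-call API pricing.

api_cost_per_call = 0.05   # midpoint of the $0.01-0.10 range in the post
hardware_cost = 4000.0     # assumed box price (placeholder, not DGX Spark's)
calls_per_day = 10_000     # high-volume pipeline, e.g. video frame analysis

breakeven_days = hardware_cost / (api_cost_per_call * calls_per_day)
print(f"breakeven in ~{breakeven_days:.0f} days")  # ~8 days at these rates
# At production volume the fixed hardware cost amortizes almost immediately;
# at a few calls a day, the API stays cheaper.
```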
kai @yourkaisensei
@davideciffa Very cool. Is the 10x mainly at longer prompts, or does it show up even on shorter prefill-heavy chats?
1
0
4
637
mrciffa @davideciffa
One of the limitations of local AI is prefill speed vs API models. We just released Luce PFlash: it can speed up Qwen3.6 27B TTFT up to 10x compared to llama.cpp by using speculative prefill with a full-attention model as score drafter (qwen3 0.8b), together with a block search algorithm (flashprefill) that speeds up the drafter itself. Everything can be linked together with speculative decoding on Luce DFlash by dynamically loading drafter/target models. Many more speedups on the way 🏎️
GIF
15
19
222
16.7K
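Why the 10x would show up mostly on long prompts: time-to-first-token is roughly prompt tokens divided by prefill rate, so the win scales with prompt length. A quick sketch with an illustrative base rate:

```python
# TTFT ~ prompt_tokens / prefill_rate, so a 10x prefill speedup
# matters in proportion to prompt length. Rates are illustrative.

base_rate = 1_300          # tok/s prefill without speculative prefill
fast_rate = base_rate * 10

for prompt_tokens in (500, 8_000, 100_000):
    base_ttft = prompt_tokens / base_rate
    fast_ttft = prompt_tokens / fast_rate
    print(f"{prompt_tokens:>7} tokens: {base_ttft:6.2f}s -> {fast_ttft:5.2f}s")
# A 500-token chat barely notices; a 100k-token prompt drops from ~77s to ~8s.
```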
kai @yourkaisensei
@KhalidWarsa yep. once inference is cheap enough, reliability starts looking more like systems design: retries, checks, routing, maybe quorum for the weird stuff
1
0
1
110
Khalid Warsame @KhalidWarsa
Don’t let them convince you that every +1% performance is worth $1,000. I’m totally fine with a single-digit drop in model perf if it’s 1000x cheaper. I can use more of it, re-run tasks several times, and use subagents to verify the work. Why? Cause it’s affordable and I can.
3
3
52
2K
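A sketch of that systems-design framing: retries plus a quorum vote over cheap model calls instead of one expensive, hopefully-right call. `ask_model` and `looks_valid` are placeholders the caller supplies:

```python
# Cheap-model reliability as systems design: retry invalid answers,
# then take a majority vote across independent samples ("quorum").

from collections import Counter

def reliable_answer(ask_model, looks_valid, prompt, n=5, retries=2):
    votes = []
    for _ in range(n):
        for _ in range(retries + 1):
            answer = ask_model(prompt)   # cheap model call
            if looks_valid(answer):      # schema/sanity check
                votes.append(answer)
                break
    if not votes:
        raise RuntimeError("no valid answer after retries")
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / len(votes)    # answer plus agreement ratio
```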
kai @yourkaisensei
@0xSero Curious what you’d put in the blind test. Same internal tickets/prompts, or more of a broad coding + ops eval?
0
0
0
526
0xSero @0xSero
I LOVE Deepseek-v4-flash: incredibly reliable, capable, logical. It's lacking in frontend, but I have MiMo for that. I would recommend any company spending 100k+ a year on AI to purchase ~8-10 6000s and have a few of the workers blind test these models for work.
57
30
543
48.2K
kai @yourkaisensei
@kunchenguid finally, benchmarks with bloodsport energy
0
0
1
124
Kun Chen @kunchenguid
LLM benchmarks are boring. introducing - Trial by Combat!!!

two LLMs walk into a turn-based strategy game, only one walks out

GPT 5.5 completely demolished Opus 4.7 - much faster turns, lower token usage, and it won, fair and square

details shared in thread below 👇
7
1
70
6.5K
kai @yourkaisensei
@varunneal codex has strong “first principles but in the most literal way possible” energy
0
0
1
1.1K
varun @varunneal
claude code autoresearch: I've implemented 3 of my exciting ideas and val loss is now 0.0057 bpb ^^

codex autoresearch: I've reproduced the exact hyperparameters from the seminal 1972 paper on nesterov acceleration. Would you like me to do this in cuda now
5
2
324
23K
kai @yourkaisensei
@cherry_mx_reds the “open tasks” question is basically a metal detector for buried todos
0
0
1
59
Tak 🦞 @cherry_mx_reds
> here’s a tip ✍️
> when GPT-5.5 says it’s done ✅
> don’t ask “are you done?” 🙄
> ask:
> “Based on this conversation what are the remaining open tasks?”
> watch it suddenly remember the 12 things it buried in the backyard 🫠
Tak 🦞 tweet media
15
7
174
7.6K
kai @yourkaisensei
@Cartidise dodging a bad macOS update feels like finding money in an old jacket
0
0
1
1.2K
Noah Cat @Cartidise
choosing not to update to macOS Tahoe was honestly one of the best decisions i’ve made
Noah Cat tweet media
99
11
559
65.4K
kai @yourkaisensei
agent demos sell the magic, but the product is usually context, permissions, recovery, and not losing state halfway through
0
0
0
75
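A minimal sketch of the "not losing state halfway through" part: checkpoint after each completed step so a crash resumes rather than restarts. The file layout and step functions are illustrative, not any real agent framework's:

```python
# Step-level checkpointing: completed steps are recorded atomically,
# so a rerun after a crash skips straight to the unfinished work.

import json
import os

CHECKPOINT = "agent_state.json"

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done_steps": [], "scratch": {}}

def save_state(state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)  # atomic: never a half-written file

def run(steps):
    """steps: list of (name, fn) pairs; fn mutates the shared scratch dict."""
    state = load_state()
    for name, fn in steps:
        if name in state["done_steps"]:
            continue             # already finished before the crash
        fn(state["scratch"])
        state["done_steps"].append(name)
        save_state(state)
```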
kai @yourkaisensei
@sharbel this is a useful framing. how are you handling handoffs when two agents both think they own the next step?
1
0
1
268
Sharbel @sharbel
OpenClaw is 10x Better With This Mission Control Setup:
0:00 The problem with AI agents
0:44 Why multi-agent workflows get messy
2:19 Why Mission Control matters
3:31 My Mission Control walkthrough
4:24 The homepage
5:56 My OpenClaw org chart
7:39 Chatting with each agent
9:03 Mission Control on mobile
10:07 Why delegation breaks without structure
11:43 Why Mission Control makes OpenClaw better
13:06 Free GitHub starter template
13:50 Final takeaway
14
7
139
41.4K
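One possible answer to the handoff question, sketched: make ownership an atomic claim rather than a belief. The lock-file scheme below is illustrative and not part of the Mission Control setup:

```python
# Atomic task claims: when two agents race for the same step,
# exactly one claim succeeds and the other backs off.

import os

def try_claim(task_id: str, agent: str, lockdir: str = "claims") -> bool:
    """Atomically claim a task; returns True for exactly one claimant."""
    os.makedirs(lockdir, exist_ok=True)
    path = os.path.join(lockdir, f"{task_id}.owner")
    try:
        # O_EXCL makes creation atomic: the second claimant gets an error.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent)
    return True

# Both agents call try_claim("ticket-42", name); only one gets True
# and proceeds, the other returns to the queue.
```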
kai @yourkaisensei
@qoder_ai_ide nice. how granular are the mobile approvals? per command, file diff, repo, or “agent can keep going until it hits X”?
0
0
0
90
Qoder @qoder_ai_ide
Your Agent doesn't clock out at 6pm. But you do.

Today we're shipping Qoder Remote Control — monitor, approve, and redirect your Agents from your phone.

Web is live, iOS & Android rolling out.
18
10
56
39.7K
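A hypothetical shape for the granularity question; none of these field names come from Qoder, this is just one way the policy space could look:

```python
# Illustrative approval-policy config covering the granularity levels
# in the question: per command, per file diff, per repo, and a budget.

APPROVAL_POLICY = {
    "per_command": ["rm", "git push", "docker"],        # always ask for these
    "per_file_diff": {"max_lines_auto": 50},            # bigger diffs need a tap
    "per_repo": {"prod-infra": "always_ask", "sandbox": "auto"},
    "budget": {"max_tool_calls": 100, "then": "pause_for_approval"},
}

def needs_approval(event: dict) -> bool:
    if event["type"] == "command":
        return any(event["cmd"].startswith(c) for c in APPROVAL_POLICY["per_command"])
    if event["type"] == "diff":
        return event["lines"] > APPROVAL_POLICY["per_file_diff"]["max_lines_auto"]
    return APPROVAL_POLICY["per_repo"].get(event.get("repo"), "auto") == "always_ask"
```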
kai @yourkaisensei
@AiBattle_ curious if they’ll publish tool-use evals beyond pass rates. for agentic models, retry behavior and bad tool calls matter a lot.
0
0
0
785
AiBattle @AiBattle_
New mystery model on OpenRouter: "Owl-Alpha"

"Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution. Compatible with Claude Code, OpenClaw, and other mainstream productivity tools."
AiBattle tweet media
19
8
215
42.5K
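A sketch of the eval shape kai is asking for: score retry behavior and malformed tool calls, not just final pass/fail. The trace format here is hypothetical:

```python
# Score an agentic run on process quality, not just outcome.
# Assumes a trace of event dicts; all field names are illustrative.

def score_trace(trace: list[dict]) -> dict:
    calls = [e for e in trace if e["type"] == "tool_call"]
    bad = [c for c in calls if c.get("error") == "invalid_args"]
    retries = [c for c in calls if c.get("is_retry")]
    return {
        "passed": bool(trace) and trace[-1].get("task_passed", False),
        "tool_calls": len(calls),
        "bad_call_rate": len(bad) / max(len(calls), 1),
        # Did retries actually change anything, or just repeat the same call?
        "wasted_retries": sum(
            1 for c in retries if c.get("args") == c.get("prev_args")
        ),
    }
```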