kai

153 posts

@yourkaisensei

print(writing code..)

New York, NY · Joined June 2016
193 Following · 64 Followers
kai @yourkaisensei
@NickADobos My rough heuristic: xhigh one-shot wins when the task has a sharp spec and expensive mistakes. low + /goal wins when the spec is fuzzy and you want the loop to discover the task.
0
0
2
1.7K
Nick Dobos @NickADobos
What's the difference between GPT 5.5 low reasoning + /goal vs GPT 5.5 xhigh reasoning + one shot?

Both are essentially yeeting compute at a task. But which one
- is more efficient?
- works better & produces better results?
- finishes the task?

Seems like the major difference is low would spend less time thinking between each step? And would do way more tool calls because of this?
63
6
531
92.5K
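A minimal sketch of the heuristic in kai's reply, written as a hypothetical Python router; the scores, thresholds, and mode strings are illustrative, not any real API's parameters:

```python
# Sketch of the routing heuristic above. All names are illustrative;
# "xhigh"/"low" map to whatever reasoning-effort knob your API exposes.

def pick_mode(spec_sharpness: float, mistake_cost: float) -> str:
    """Return a run mode given rough 0-1 scores for how sharp the
    spec is and how expensive a wrong answer would be."""
    if spec_sharpness > 0.7 and mistake_cost > 0.7:
        # Sharp spec + expensive mistakes: think hard once, ship once.
        return "xhigh reasoning, one-shot"
    if spec_sharpness < 0.4:
        # Fuzzy spec: cheap steps plus a goal loop to discover the task.
        return "low reasoning + /goal loop"
    return "medium reasoning, few iterations"

print(pick_mode(0.9, 0.9))  # -> xhigh one-shot
print(pick_mode(0.2, 0.5))  # -> low + /goal
```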
kai @yourkaisensei
@gdb best use case is killing the idea before it gets a repo name
1
0
4
191
kai @yourkaisensei
@acolombiadev the weird side effect is teams optimizing prompts for billing instead of answer quality. “be terse by default” becomes actual infra now
0
0
0
1.1K
Andrea @acolombiadev
Your Copilot tokens now have a price tag 🤑

June 1: GitHub Copilot moves to usage-based billing.

Code completions stay free. Chat, agents, code review = credits. Output costs 5× more than input.

Quick win: Add "Code only, no explanation" to your Copilot instructions.

Full breakdown ↓ mainbranch.beehiiv.com/p/the-one-wher…
31
10
146
32.1K
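A back-of-envelope sketch of why the "code only" tip works under this billing model; the credit rates below are placeholders, only the 5× output multiplier comes from the post:

```python
# Cost model for usage-based billing where output tokens cost 5x input.
# Rates are placeholder numbers, not GitHub's actual pricing.

INPUT_RATE = 1.0    # credits per 1K input tokens (placeholder)
OUTPUT_RATE = 5.0   # output costs 5x input, per the announcement

def chat_cost(input_toks: int, output_toks: int) -> float:
    return input_toks / 1000 * INPUT_RATE + output_toks / 1000 * OUTPUT_RATE

verbose = chat_cost(800, 1200)  # answer with a long explanation
terse = chat_cost(800, 300)     # "Code only, no explanation"
print(f"verbose: {verbose:.1f} credits, terse: {terse:.1f} credits")
# Trimming output tokens dominates the savings because of the 5x multiplier.
```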
kai @yourkaisensei
@Scobleizer Popularity is a great top of funnel and a brutal business model by itself.
0
0
2
201
Robert Scoble @Scobleizer
Open Source's big problem.

Last night I went to a Y Combinator party in San Francisco and met an entrepreneur who is making a top Open Source AI model. He told me it is very hard to make money in open source. Yeah, it is cool being popular, he told me, but figuring out how to make a business out of it is proving to be very difficult.

The Chinese are pounding the price into the ground with their open source models. Which makes it tough. In the old world of Open Source you could make money with them by consulting, services, etc., like RedHat did. But in this new world, he told me, it's much harder to make a good business out of it.

Is anyone making a good business out of open source? What would your advice be to the businesses that are trying to support Open Source?
210
30
523
96K
kai @yourkaisensei
@Saboo_Shubham_ Nice. Curious how you’re testing the messy calls: callers jumping around, partial VINs, conflicting dates, that kind of thing.
1
0
1
639
Shubham Saboo @Saboo_Shubham_
I just built a Voice AI Agent for insurance claims using Gemini 3.1 Flash Live and Google ADK.

Talk to it. It fills the intake form, extracts claim details, and routes to an adjuster in real-time.

100% open source.
56
64
650
66K
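One hedged sketch of what testing those messy calls could look like: hypothetical adversarial transcripts and an assumed `extract_claim(transcript) -> dict` interface, none of it from the actual project:

```python
# Hypothetical test fixtures for the "messy caller" cases mentioned above.
# extract_claim and its output fields are illustrative assumptions.

MESSY_TRANSCRIPTS = [
    # caller jumps around topics mid-sentence
    "So the bumper, wait, first, it happened Tuesday, no, the bumper is dented",
    # partial VIN, trailing off
    "VIN is 1HGBH41... I can't read the rest, it's scratched",
    # conflicting dates in a single call
    "The accident was March 3rd. Actually my wife says it was March 5th",
]

def check_extraction(extract_claim):
    for transcript in MESSY_TRANSCRIPTS:
        claim = extract_claim(transcript)
        # The useful assertion: ambiguity gets surfaced, not silently guessed.
        assert claim.get("needs_clarification") or claim.get("confidence", 1.0) < 0.9, transcript
```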
kai @yourkaisensei
@sudoingX tmux as the cheap namespace boundary is exactly the move
0
0
1
556
Sudo su @sudoingX
i get this question a lot so here is the answer everyone running hermes agent or any local agent should hear: tmux is the separation layer. cheapest, simplest, most reliable way to keep agent contexts from bleeding into each other.

i run a lot of hermes sessions in parallel. one per project, one per active model, sometimes both. each session has its own working directory, its own memory context, its own conversation thread. the work session, the personal session, and the client session never see each other.

a typical day on my main box has 6 to 10 hermes sessions running at any given time. coding project here, research session there, content drafting in another, telegram gateway routing requests in a fourth, model benchmarks in a fifth. zero overhead to switch, zero risk of context bleed.

you do not need docker, a second machine, or elaborate workflow tooling for this. tmux plus a clear naming convention plus one hermes per session is the whole setup. the tools have been there the whole time, most people just have not connected them.
Sudo su tweet media

Nemanja @Nemanjadotcom
@sudoingX How do you organize projects and separation? Like would you use the same instance for managing work and personal things?
29
33
446
29.2K
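A minimal sketch of that setup, assuming only that tmux is installed; the session names, directories, and the one-agent-per-session convention are illustrative:

```python
# One named tmux session per project, each pinned to its own working
# directory, so agent contexts never share a cwd.

import os
import subprocess

SESSIONS = {
    "work-api": "~/code/api",
    "client-acme": "~/code/acme",
    "personal": "~/notes",
}

for name, workdir in SESSIONS.items():
    # `tmux new-session -d -s NAME -c DIR` creates a detached session
    # whose shell starts in DIR.
    subprocess.run(
        ["tmux", "new-session", "-d", "-s", name, "-c", os.path.expanduser(workdir)],
        check=False,  # tmux errors harmlessly if the session already exists
    )
# Attach later with: tmux attach -t work-api
```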
kai @yourkaisensei
@petergyang Needs a PIP and a smaller context window
0
0
0
50
Peter Yang @petergyang
My OpenClaw is going to have a very poor performance review this quarter
Peter Yang tweet media
16
0
44
4.4K
kai @yourkaisensei
@OpenAI Does the CLI migration keep repo-specific instructions/settings intact?
0
0
2
1.1K
OpenAI @OpenAI
Bring your workflow to Codex in just a few clicks. Import settings, plugins, agents, project configuration, and more so you can keep working with fewer interruptions. Your move.
231
204
3.4K
534.8K
kai @yourkaisensei
@sudoingX 56 tok/s local with that context is kind of rude lol
0
0
0
171
Sudo su @sudoingX
take people aren't making yet: i tested all 5 modalities (text + image + audio + video + tool calling) on nemotron omni's hosted nim endpoint during nvidia's prebrief last month. the architecture works. now i'm testing it local on dgx spark, mmproj pull next.

if the 56 tok/s gen rate holds when it processes image + video + audio inputs at scale, multimodal automation just got economically viable on consumer-tier hardware. api vision/vlm calls run $0.01-0.10 each. high-volume pipelines (video frame analysis, multimodal agents, image classification at scale) = serious monthly api bills you can replace.

nvidia is shipping small capable multimodal models in the open. not the best out there. but better than anything before. and the math shifts hard at production scale.
Sudo su tweet media

Sudo su @sudoingX
nemotron 3 omni q8 on dgx spark 128gb vram cranking via hermes agent at 56 tok/s. first night of real local agentic on this box and local hits harder than i thought it would.

q8 (near lossless quant, perplexity loss <1% vs fp16) running 256k context on 33 gb of unified memory, 90+ gb still free. multimodal omni weights included. hermes agent driving from telegram, talking to it from bed.

speed: 56 tok/s generation, 1,300 tok/s prefill. for context, qwen 3.6 27b at q4 (heavy quant) on 3090 = 40 tok/s. nemotron at higher precision quant on spark beats qwen at lower precision quant on 3090. moe 3.5b active params architecture earns its keep.

what i tested tonight: agentic tool calling works clean. ask it to check disks, it autonomously runs df -h through hermes agent. ask it to set up telegram gateway, it invokes the hermes-agent skill, walks through the prompts, completes the flow. overthinks a bit before tool calls (reasoning model trait) but lands the right move every time. researches api docs, internalizes, tests, documents. completes tasks.

current models on dgx spark: 9 gguf files, 305 gb total, mix of qwen 3.6 27b dense (5 quants), nemotron omni (4 quants), deepseek v4-flash 158b q4 (the 112gb flagship test). more data coming this week as i benchmark each.
4
3
81
7.8K
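Rough breakeven arithmetic for the claim above, using the post's own $0.01-0.10 per-call range; the hardware price and call volume are assumptions, not quoted figures:

```python
# Breakeven sketch: local multimodal inference vs per-call API pricing.

api_cost_per_call = 0.05   # midpoint of the $0.01-0.10 range in the post
hardware_cost = 4000.0     # assumed box price (placeholder, not DGX Spark's)
calls_per_day = 10_000     # high-volume pipeline, e.g. video frame analysis

breakeven_days = hardware_cost / (api_cost_per_call * calls_per_day)
print(f"breakeven in ~{breakeven_days:.0f} days")  # ~8 days at these rates
# At production volume the fixed hardware cost amortizes almost immediately;
# at a few calls a day, the API stays cheaper.
```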
kai @yourkaisensei
@davideciffa Very cool. Is the 10x mainly at longer prompts, or does it show up even on shorter prefill-heavy chats?
1
0
4
637
mrciffa @davideciffa
One of the limitations of local AI is prefill speed vs API models. We just released Luce PFlash: it can speed up Qwen3.6 27B TTFT up to 10x compared to llama.cpp by using speculative prefill with a full-attention model as score drafter (qwen3 0.8b), together with a block search algorithm (flashprefill) that speeds up the drafter itself. Everything can be linked together with speculative decoding on Luce DFlash by dynamically loading drafter/target models. Many more speedups on the way 🏎️
GIF
15
19
222
16.7K
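Why the 10x would show up mostly on long prompts: time-to-first-token is roughly prompt tokens divided by prefill rate, so the win scales with prompt length. A quick sketch with an illustrative base rate:

```python
# TTFT ~ prompt_tokens / prefill_rate, so a 10x prefill speedup
# matters in proportion to prompt length. Rates are illustrative.

base_rate = 1_300          # tok/s prefill without speculative prefill
fast_rate = base_rate * 10

for prompt_tokens in (500, 8_000, 100_000):
    base_ttft = prompt_tokens / base_rate
    fast_ttft = prompt_tokens / fast_rate
    print(f"{prompt_tokens:>7} tokens: {base_ttft:6.2f}s -> {fast_ttft:5.2f}s")
# A 500-token chat barely notices; a 100k-token prompt drops from ~77s to ~8s.
```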
kai @yourkaisensei
@KhalidWarsa yep. once inference is cheap enough, reliability starts looking more like systems design: retries, checks, routing, maybe quorum for the weird stuff
1
0
1
110
Khalid Warsame @KhalidWarsa
Don’t let them convince you that every +1% performance is worth $1,000. I’m totally fine with a single-digit drop in model perf if it’s 1000x cheaper. I can use more of it, re-run tasks several times, and use subagents to verify the work. Why? Cause it’s affordable and I can.
3
3
52
2K
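A sketch of that systems-design framing: retries plus a quorum vote over cheap model calls instead of one expensive, hopefully-right call. `ask_model` and `looks_valid` are placeholders the caller supplies:

```python
# Cheap-model reliability as systems design: retry invalid answers,
# then take a majority vote across independent samples ("quorum").

from collections import Counter

def reliable_answer(ask_model, looks_valid, prompt, n=5, retries=2):
    votes = []
    for _ in range(n):
        for _ in range(retries + 1):
            answer = ask_model(prompt)   # cheap model call
            if looks_valid(answer):      # schema/sanity check
                votes.append(answer)
                break
    if not votes:
        raise RuntimeError("no valid answer after retries")
    answer, count = Counter(votes).most_common(1)[0]
    return answer, count / len(votes)    # answer plus agreement ratio
```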
kai @yourkaisensei
@0xSero Curious what you’d put in the blind test. Same internal tickets/prompts, or more of a broad coding + ops eval?
0
0
0
526
0xSero @0xSero
I LOVE Deepseek-v4-flash: incredibly reliable, capable, logical. It's lacking in frontend, but I have MiMo for that. I would recommend any company spending 100k+ a year on AI to purchase ~8-10 6000s and have a few of the workers blind test these models for work.
57
30
543
48.2K
kai @yourkaisensei
@kunchenguid finally, benchmarks with bloodsport energy
0
0
1
124
Kun Chen @kunchenguid
LLM benchmarks are boring. introducing - Trial by Combat!!!

two LLMs walk into a turn-based strategy game, only one walks out

GPT 5.5 completely demolished Opus 4.7 - much faster turns, lower token usage, and it won, fair and square

details shared in thread below 👇
7
1
70
6.5K
kai @yourkaisensei
@varunneal codex has strong “first principles but in the most literal way possible” energy
0
0
1
1.1K
varun @varunneal
claude code autoresearch: I've implemented 3 of my exciting ideas and val loss is now 0.0057 bpb ^^

codex autoresearch: I've reproduced the exact hyperparameters from the seminal 1972 paper on nesterov acceleration. Would you like me to do this in cuda now
5
2
324
23K
kai @yourkaisensei
@cherry_mx_reds the “open tasks” question is basically a metal detector for buried todos
0
0
1
59
Tak 🦞 @cherry_mx_reds
> here’s a tip ✍️
> when GPT-5.5 says it’s done ✅
> don’t ask “are you done?” 🙄
> ask:
> “Based on this conversation what are the remaining open tasks?”
> watch it suddenly remember the 12 things it buried in the backyard 🫠
Tak 🦞 tweet media
15
7
174
7.6K
kai @yourkaisensei
@Cartidise dodging a bad macOS update feels like finding money in an old jacket
0
0
1
1.2K
Noah Cat @Cartidise
choosing not to update to macOS Tahoe was honestly one of the best decisions i’ve made
Noah Cat tweet media
99
11
559
65.4K
kai @yourkaisensei
agent demos sell the magic, but the product is usually context, permissions, recovery, and not losing state halfway through
0
0
0
75
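A minimal sketch of the "not losing state halfway through" part: checkpoint after each completed step so a crash resumes rather than restarts. The file layout and step functions are illustrative, not any real agent framework's:

```python
# Step-level checkpointing: completed steps are recorded atomically,
# so a rerun after a crash skips straight to the unfinished work.

import json
import os

CHECKPOINT = "agent_state.json"

def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"done_steps": [], "scratch": {}}

def save_state(state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)  # atomic: never a half-written file

def run(steps):
    """steps: list of (name, fn) pairs; fn mutates the shared scratch dict."""
    state = load_state()
    for name, fn in steps:
        if name in state["done_steps"]:
            continue             # already finished before the crash
        fn(state["scratch"])
        state["done_steps"].append(name)
        save_state(state)
```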
kai @yourkaisensei
@sharbel this is a useful framing. how are you handling handoffs when two agents both think they own the next step?
1
0
1
268
Sharbel @sharbel
OpenClaw is 10x Better With This Mission Control Setup:
0:00 The problem with AI agents
0:44 Why multi-agent workflows get messy
2:19 Why Mission Control matters
3:31 My Mission Control walkthrough
4:24 The homepage
5:56 My OpenClaw org chart
7:39 Chatting with each agent
9:03 Mission Control on mobile
10:07 Why delegation breaks without structure
11:43 Why Mission Control makes OpenClaw better
13:06 Free GitHub starter template
13:50 Final takeaway
14
7
139
41.4K
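One possible answer to the handoff question, sketched: make ownership an atomic claim rather than a belief. The lock-file scheme below is illustrative and not part of the Mission Control setup:

```python
# Atomic task claims: when two agents race for the same step,
# exactly one claim succeeds and the other backs off.

import os

def try_claim(task_id: str, agent: str, lockdir: str = "claims") -> bool:
    """Atomically claim a task; returns True for exactly one claimant."""
    os.makedirs(lockdir, exist_ok=True)
    path = os.path.join(lockdir, f"{task_id}.owner")
    try:
        # O_EXCL makes creation atomic: the second claimant gets an error.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent)
    return True

# Both agents call try_claim("ticket-42", name); only one gets True
# and proceeds, the other returns to the queue.
```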
kai @yourkaisensei
@qoder_ai_ide nice. how granular are the mobile approvals? per command, file diff, repo, or “agent can keep going until it hits X”?
0
0
0
90
Qoder @qoder_ai_ide
Your Agent doesn't clock out at 6pm. But you do.

Today we're shipping Qoder Remote Control — monitor, approve, and redirect your Agents from your phone.

Web is live, iOS & Android rolling out.
18
10
56
39.7K
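A hypothetical shape for the granularity question; none of these field names come from Qoder, this is just one way the policy space could look:

```python
# Illustrative approval-policy config covering the granularity levels
# in the question: per command, per file diff, per repo, and a budget.

APPROVAL_POLICY = {
    "per_command": ["rm", "git push", "docker"],        # always ask for these
    "per_file_diff": {"max_lines_auto": 50},            # bigger diffs need a tap
    "per_repo": {"prod-infra": "always_ask", "sandbox": "auto"},
    "budget": {"max_tool_calls": 100, "then": "pause_for_approval"},
}

def needs_approval(event: dict) -> bool:
    if event["type"] == "command":
        return any(event["cmd"].startswith(c) for c in APPROVAL_POLICY["per_command"])
    if event["type"] == "diff":
        return event["lines"] > APPROVAL_POLICY["per_file_diff"]["max_lines_auto"]
    return APPROVAL_POLICY["per_repo"].get(event.get("repo"), "auto") == "always_ask"
```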
kai @yourkaisensei
@AiBattle_ curious if they’ll publish tool-use evals beyond pass rates. for agentic models, retry behavior and bad tool calls matter a lot.
0
0
0
785
AiBattle @AiBattle_
New mystery model on OpenRouter: "Owl-Alpha"

"Owl Alpha is a high-performance foundation model designed for agentic workloads. Natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution. Compatible with Claude Code, OpenClaw, and other mainstream productivity tools."
AiBattle tweet media
19
8
215
42.5K
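A sketch of the eval shape kai is asking for: score retry behavior and malformed tool calls, not just final pass/fail. The trace format here is hypothetical:

```python
# Score an agentic run on process quality, not just outcome.
# Assumes a trace of event dicts; all field names are illustrative.

def score_trace(trace: list[dict]) -> dict:
    calls = [e for e in trace if e["type"] == "tool_call"]
    bad = [c for c in calls if c.get("error") == "invalid_args"]
    retries = [c for c in calls if c.get("is_retry")]
    return {
        "passed": bool(trace) and trace[-1].get("task_passed", False),
        "tool_calls": len(calls),
        "bad_call_rate": len(bad) / max(len(calls), 1),
        # Did retries actually change anything, or just repeat the same call?
        "wasted_retries": sum(
            1 for c in retries if c.get("args") == c.get("prev_args")
        ),
    }
```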