RayBytes

21 posts

@raybytez

Joined June 2023

13 Following · 0 Followers

RayBytes@raybytez·1d

@thegenioo Provider “Weights and Biases”. I don’t think it’s at 500 tk/s always since I only got one result like that, but it’s plenty fast. 15 seconds for a fully functioning app.

Hamza@thegenioo·1d

@raybytez bro which provider on OpenRouter is giving you 500t/s lol

Hamza@thegenioo·1d

Honestly I think Moonshot fumbled big time with Kimi K2.6.
Their previous model K2.5 was just so good, and it just needed that perfect polish and a few bits of upgrades to get a really strong, cheap model. And you know what? That exists! Yes, it is Composer 2.5.
This is what Moonshot should have done: K2.5 should have been polished like Composer 2.5 and released as K2.6.
Now don't get me wrong, K2.6 is a very powerful and strong model, but it has some issues:
- It just overthinks the hell out of things and gets stuck in long, endless thinking loops
- It is unbelievably slow, like really slow
- It is good, but I found DeepSeek and Qwen models more efficient, workable, and faster
So what Cursor has pulled off here with Composer 2.5 should have been done by Moonshot with the release of Kimi K2.6, and I hope they fix these issues with K3.

RayBytes@raybytez·1d

@gtbot2007 @Pikaclicks @OsuKanade @ibxtoycat He filed the trademark too late, once he realised there were material gains to be made. If the trademark went through, skyblock as we know it could be incredibly different. With it not being trademarked, anyone can use skyblock as they want to. The alternative makes no sense.

GTbot2007@gtbot2007·1d

@Pikaclicks @OsuKanade @ibxtoycat "this person was doing a good thing but uh... what if he wasn't"

Andrew (Toycat)@ibxtoycat·1d

An update on the "skyblock" legal case: it is now legally a generic term within Minecraft, and not one that can be trademarked.

RayBytes@raybytez·5d

@christofsalis @theo I mean, their #1 example and something they advertised massively was using thousands of agents with Gemini 3.5 Flash to make an OS, with a full new agentic system push with v2 of Antigravity.

Christof Salis@christofsalis·5d

@theo This is an agentic benchmark I am guessing, right? I noticed that Google really doesn't seem to be too interested in agents (just my hearsay). Very knowledgeable models but not interested in chaining tool calls.

Theo - t3.gg@theo·6d

Oh my god it scored worse than Composer 2! Not even 2.5! And it cost 4x more to run!!! This might be the worst major lab model drop of all time. Llama 4 tier. Insane.

Michael Truell@mntruell

Gemini Flash 3.5 is now on CursorBench, our main coding agent eval. We’ll keep updating the leaderboard as new models come out. cursor.com/evals

RayBytes@raybytez·6d

@scaling01 It’s more expensive than 3.1 Pro in real terms. Check Artificial Analysis’s cost to run the benchmark. It cost 74% more than 3.1 Pro, while losing to 5.5 Medium, which is cheaper than it ($1199 5.5 Medium vs $1552 3.5 Flash). 5.5 Medium is also smarter (57 score vs 55).

Lisan al Gaib@scaling01·6d

Gemini 3.5 Flash Pricing confirmed at $1.5 / $9 per mtoks

Lisan al Gaib@scaling01

Gemini 3.5 Flash Benchmarks

RayBytes@raybytez·18 May

@saltjsx @zxnelli how is this more for normies vs t3chat?

salt@saltjsx·18 May

@zxnelli t3 chat isn't really for normies

salt@saltjsx·18 May

ChatGPT sucks! So I'm building something way better.

RayBytes@raybytez·17 May

@Getlucky12341 @Kirito1262 It was about money, not about claiming anything. He filed his trademark claim too late, once everyone in the community was already branching off the idea.

Getlucky@Getlucky12341·17 May

@Kirito1262 Not wanting other people to claim they have the "og skyblock" is pretty fair tbh

Getlucky@Getlucky12341·17 May

The original Skyblock map in Minecraft is lost media?

RayBytes@raybytez·12 May

@rbranson @gajesh GPT-5.5 is currently at 38 tps… And anthropic is running Opus 4.7 at 39 tps. It’s perfectly usable tbh

ricky b@rbranson·12 May

@gajesh 40tok/s lmao

ricky b@rbranson·12 May

$30K just to get 3 tokens/s on a 2-bit quant of deepseek v4

Ivan Kuleshov@Merocle

M5 Max cluster
72 CPU and 128 GPU cores, 512GB unified RAM
Each MacBook is connected to all the others with Thunderbolt 5 (120Gbit/s).
But I’ll have to use Wi-Fi to connect to the cluster

RayBytes@raybytez·9 May

@StoicYield @GrantSlatton costs could be your time, or more concrete backend costs

StoicYield@StoicYield·9 May

@GrantSlatton If it doesnt cost you anything, why did you have to charge $10 besides greed?

Grant Slatton@GrantSlatton·9 May

what do the "no such thing as ethical billionaires" people say about an individual who codes a great app that sells 100 million copies for $10 each who was exploited? zero marginal cost goods are weird

RayBytes@raybytez·25 Apr

@mihirmodi @AppleBytesPhD @MaxRovensky @Mrwhosetheboss If you actually looked into their claims, you’d very quickly find out their claim is just plain wrong.

Mihir@mihirmodi·25 Apr

@AppleBytesPhD @MaxRovensky @Mrwhosetheboss No, they said that for a user upgrading from M1, the choice would be between an M4 and M5 and (apparently, as per them) this info is not available.

Arun Maini@Mrwhosetheboss·25 Apr

Tech companies are basically lying to you:
- They compare their new products to ones released MANY years ago (to make it as confusing as possible to figure out how much has actually changed this time)
- They invent new specs that mean absolutely nothing (like how much zoom their phone cameras can do)
- They write “up to” just before telling you how much their new product has improved by (so you can’t sue them when you don’t get those numbers)
And a LOT more
So, I decided it was time to break it down with @MKBHD on YouTube - video live now

RayBytes@raybytez·25 Apr

@Samarmendiratta @GregoryMcFadden @borntoFLY11135 @ajisharul @Mrwhosetheboss His video and what he shows in the title card is the “AI performance” part. Here they compare directly on, e.g., “time to first token” (TTFT), which is a legitimate way to benchmark performance. And you can see how they got said perf with what model in a footnote at the bottom.

RayBytes@raybytez·21 Apr

@nahcrof it seems p slow? Do you have plans for a lightning version?

nahcrof@nahcrof·21 Apr

kimi-k2.6 is now available on CrofAI!
$0.55/m input
$0.11/m cached
$2.70/m output
let me know if you have any issues :)
(the model is currently set as precision since this was the community choice but if the cost is low enough or I can get it low enough I'll change it)

RayBytes@raybytez·21 Apr

@anKit0017_ @Kimi_Moonshot $0.95 cache miss is high? (esp for this calibre of model?)

Ankit@anKit0017_·21 Apr

@Kimi_Moonshot Input prices are still high, is K2.6 really worth the cost, or is this just another incremental update?

Kimi.ai@Kimi_Moonshot·21 Apr

📢 Kimi K2.6 API is live
• Input Price (Cache Hit): $0.16 / M tokens
• Input Price (Cache Miss): $0.95 / M tokens
• Output: $4.00 / M tokens
Kimi K2.6 is our latest + most intelligent model - stronger long-horizon coding, better instruction following & self-correction. Native multimodal (text/image/video), thinking + non-thinking, 256K context. Supports tool calls, JSON / Partial mode, and web search.
Try it → platform.kimi.ai

RayBytes reposted

Kimi.ai@Kimi_Moonshot·20 Apr

Meet Kimi K2.6: Advancing Open-Source Coding
🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)
What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code
🔗 API: platform.moonshot.ai
🔗 Tech blog: kimi.com/blog/kimi-k2-6
🔗 Weights & code: huggingface.co/moonshotai/Kim…

RayBytes@raybytez·17 Apr

@bstnxbt Hey, have not been able to replicate your findings on my M1 Pro 16gb. Had made an issue on your repo where other people also found the same problem, but you haven’t responded yet. Mind taking a look?

bstn 👁️@bstnxbt·17 Apr

Just tested Qwen3.6-35B-A3B-4bit on DFlash.
M5 Max, 40-core GPU, stock mlx_lm baseline:
► @ 1024 · 138.98 → 232.64 tok/s (1.67x) · 88.4% acceptance
► @ 2048 · 136.74 → 224.88 tok/s (1.64x) · 88.4% acceptance
► @ 4096 · 128.39 → 170.95 tok/s (1.33x) · 86.3% acceptance
Working on custom Metal kernels to improve long-context decode and optimize the quantized model path.

RayBytes@raybytez·7 Apr

@techdroider How is making everything cpu bound a fair setup, especially for an LLM benchmark?

TechDroider@techdroider·6 Apr

Ran a fresh AI benchmark using Gemma 4 with Temperature set to 0 and Accelerator forced to CPU for a fair comparison, especially since Pixel doesn’t support GPU acceleration in this setup. Devices tested: iPhone 17 Pro Max, Samsung S26 Ultra, Pixel 10 Pro XL, OnePlus 15, Xiaomi 17 Pro Max.

RayBytes@raybytez·29 Mar

@theo I think you’ve just discovered twitter man, not really representative of the majority, just a vocal minority

Theo - t3.gg@theo·29 Mar

Why can’t the “local model community” have any conversation in good faith? They have to open with lies and conspiracy theories. I’ve never seen a healthy community that spends so much energy lying.

Sudo su@sudoingX

a corporate salesman on an openai paycheck tells you local models aren't there yet. an influencer selling you an API wrapper calls the local AI community on X "cancer." meanwhile we're out here modding communities, helping strangers debug their configs at midnight, fighting spam, pushing open source, and doing it all for free. these people don't want you running models on your own hardware. they want you as a customer. every local install is revenue they lose. every migration from their bloat is a subscription cancelled. don't let corporate noise and engagement bait merchants convert you into their recurring revenue. buy a GPU. compile from source. own your thinking. the community they call cancer is the same community that will help you get started for free while they charge you per token.

RayBytes@raybytez·29 Mar

@ZoellaBolkiah @theo Okay, sure, but then that’s $30 per million input tokens, and $150 per million output. Could’ve even used a better example, e.g. Gemini 3 Flash. Either way, the fastest model on OpenRouter is currently OpenAI’s oss model, so idk if throughput is the main issue

Zoe@zosyrai·29 Mar

@raybytez @theo Opus 4.6 fast mode 170t/s

RayBytes@raybytez·29 Mar

@ZoellaBolkiah @theo GPT 5.4 is 44 t/s

Zoe@zosyrai·29 Mar

@theo their LLLMs output 60t/s so they have more time to complain and wait until they have to reprompt it again 50 times

RayBytes@raybytez·6 Mar

@jimmyjames_tech Is Metal comparable to NVIDIA’s OpenCL score on Geekbench? Do they normalise the score somehow?

🦊@jimmyjames_tech·6 Mar

GIF

Andrea Ciccarello@CiccaAndre

@Saishiv17 @VadimYuryev @jimmyjames_tech actually it doesn’t, 5080 has a similar core with OpenCL (which is different than Metal). M5 Max with OpenCL will be much lower.
