toriset

383 posts

@torisetxd

The real toriset @toriset on discord

China · Joined April 2024

43 Following · 11 Followers

toriset@torisetxd·8h

@rattecs cs2

5.7K

ratte 🟪@rattecs·10h

what do these three colours remind you of??

213

1.6K

219.9K

toriset@torisetxd·14h

@stolsvik @teortaxesTex its from google, its just like batching, just higher priority slightly

105

Endre Stølsvik@stolsvik·17h

@teortaxesTex They have made a new thing on OpenRouter; Service Tiers. So there is a «Slow».

7.5K

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·18h

Google should do a reverse-Anthropic and add Gemini 3.5-SLOW. 3-Flash was fast enough, and its cost made it attractive. This thing is maybe better in some ways, but that's not worth the 3x price hike.

BURKOV@burkov

I was super excited reading this and was just about to ask Codex to replace, in ChapterPal, all calls to Gemini 3 Flash with Gemini 3.5 Flash, but then I checked the price and decided to stick to Gemini 3 Flash. The new Flash is priced almost like 3.1 Pro, so I'm expecting that the new 3.5 Pro will be priced even higher. If (or rather when) Google phases out 3 Flash, I'm not sure I'll know what to replace it with for visual reasoning about documents. Gemma 4 31B is almost as good as 3 Flash, but it's only available via OpenRouter where, so far, it's unreliable and slow. Any suggestions?

6.3K

toriset@torisetxd·14h

@trydotworks @xiong_hui_chen Qwen 3 Max was 1T, so likely around that.

[email protected]@trydotworks·17h

@xiong_hui_chen Can you say how big Max is actually?

1.5K

xiong-hui (barry) chen@xiong_hui_chen·23h

our qwen 3.7 max is released, try our best toward agentic frontier🚀 qwen.ai/blog?id=qwen3.7 #qwen

876

52.5K

toriset@torisetxd·1d

@alexworkmode testing in prod?

alex@alexworkmode·1d

testing something

156

6.6K

toriset@torisetxd·1d

@LukeParkerDev check back in a month please, too expensive for a flash-named model, should've used a different name if you're gonna raise the price 3x

477

Luke Parker@LukeParkerDev·1d

Are you guys finally with me about Google, or do I need to check back in a month?

115

17.2K

toriset@torisetxd·1d

@sama wouldn't the plan behind this be rather to sell the spare compute you guys have? i assume the sheer amount of expansion leads to a lot of it being wasted if training isn't actively running

779

Sam Altman@sama·1d

we will offer this until we sell out of our current allocation for this program. (we will make sure to leave enough capacity for ChatGPT, Codex, etc.) we plan to offer it again in the future; our intention remains to build as much compute as fast as we can.

101

939

155.8K

Sam Altman@sama·1d

customers are increasingly asking us for certainty on capacity. as models get better, we expect that the world will be capacity-constrained for some time. we are offering discounted tokens for 1-3 year commits. (it also helps us plan, so hopefully a big win-win.)

OpenAI@OpenAI

Introducing OpenAI Guaranteed Capacity: a new offering that enables customers to guarantee long-term access to OpenAI compute. We’ve made long-term investments in infrastructure, partnerships, and capacity planning to help customers scale reliably. Now, Guaranteed Capacity helps customers plan ahead for critical workloads in a compute-constrained world. openai.com/guaranteed-cap…

656

225

5.4K

1.1M

toriset@torisetxd·1d

@draecomino groq already does this when its not under load, and also its 32B active parameters, not nearly as impressive as actually a 1T param dense model.

226

James Wang@draecomino·1d

Cerebras sets a new record: a one trillion parameter model @ 1,000 tokens/s

Cerebras@cerebras

Cerebras is now running Kimi K2.6 – a trillion parameter model – in enterprise trials. At ~1,000 tokens/s, this is the fastest frontier model performance ever measured by Artificial Analysis @ArtificialAnlys.

536

87.6K

toriset@torisetxd·1d

@AdamHoltererer what skill is this? Taste?

3.2K

Adam Holter@AdamHoltererer·1d

Gemini 3.5 Flash Frontend Test: Without Skill vs. With Skill

294

43.5K

toriset@torisetxd·1d

@cerebras @LukeKabbash @ArtificialAnlys @Kimi_Moonshot per I/O on TPU 8i it can reach up to 1300 t/s

351

Cerebras@cerebras·1d

@LukeKabbash @ArtificialAnlys @Kimi_Moonshot Gemini Flash is <300 TPS per Google

116

13.8K

Cerebras@cerebras·1d

166

309

4.2K

772.8K

toriset@torisetxd·1d

@OfficialLoganK @jessethanley @cursor_ai @claudeai @OpenAI flash definitely not from what we've seen..

Logan Kilpatrick@OfficialLoganK·Jan 27

@jessethanley @cursor_ai @claudeai @OpenAI Model costs will keep going down

410

51.7K

˗ˏˋ Jesse Hanley ˎˊ˗@jessethanley·Jan 27

something I keep thinking about: @cursor_ai I hit $1-2k/mo in overages on the ultra plan + have a @claudeai max $200 plan that limits out + looking at a @OpenAI codex plan too. all three are heavily subsidised and if i was to use the API I would likely burn +$5k/mo in tokens. how long can this gravy chain be sustained?

141

48.5K

toriset@torisetxd·2d

@0xSero 700M in a day is crazy, i barely get that in 3 weeks

115

0xSero@0xSero·2d

8.8B this month on Codex. 6.3B in Droid 1.5B in Claude Weak

6.4K

toriset@torisetxd·2d

@pigeon__s @teortaxesTex that makes no sense, pro is their leading model, that would be the direct competitor to 5.5/5.6, and a competitor to openai's Pro models is DeepThink.

ρ:ɡeσn@pigeon__s·2d

@teortaxesTex 3.5 Flash kinda does have to beat 5.5 even though it'll be cheaper, because oAI are just gonna release 5.6 probably in like 2 weeks and Google takes way longer than oAI to ship, so they need a 5.6 competitor, not a 5.5 competitor, so Flash needs to beat oAI's previous best

1.3K

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·2d

Google is held to an unreasonably high standard. Flash would ordinarily be 10% of the cost of GPT-5.5. We're not in the age where Only Google hill-climbs hard math. Even fucking Anthropic ships models that beat… DeepSeek. Everyone is serious now.

spicylemonade@spicey_lemonade

If this is Gemini 3.2 flash, it's strictly worse than gpt 5.5 at math. Fails MathArena Apex #11 and #12/ IMO p6

218

22.6K

toriset reposted

Elaina@Elaina43114880·2d

@torisetxd Yes, it was already relatively low to begin with, but on the road to AGI, we should never be satisfied!😸

295

toriset@torisetxd·2d

@HermesAgentTips the fact that its dense probably.

Hermes Agent Tips@HermesAgentTips·2d

what makes Qwen 3.6 27B the best local LLM model compared to the others in its class?

14.2K

toriset@torisetxd·2d

@ZaiforStartups i think its actually the code quality, at the beginning the code is okay but it becomes sloppy very quickly once you stop trying to care because it takes a lot more work

885

Z.ai for Startups@ZaiforStartups·2d

The hardest problem in AI agents may no longer be intelligence. It’s coordination. Multi-agent systems are failing 41–87% of the time — mostly from coordination breakdowns, not model weakness. which means: the next infrastructure layer isn’t smarter models. It could be systems that keep agents aligned, verified, and on track.

417

36.4K

toriset@torisetxd·2d

@Elaina43114880 that post might be a teaser at it.. but its a pretty hard task while still maintaining the very niche knowledge gemini has, 50% is already a very good number imo

535

Elaina@Elaina43114880·2d

Can I expect the new Gemini models to significantly reduce hallucination? 😹

Logan Kilpatrick@OfficialLoganK

Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?

101

11.5K

toriset@torisetxd·2d

@OfficialLoganK well, they maybe might by being consistently uncertain

436

Logan Kilpatrick@OfficialLoganK·2d

Why don’t LLM’s just tell you when you are asking a question / doing something that is out of distribution?

313

228.2K

toriset@torisetxd·2d

@pigeon__s deepseek v4 pro is really good ngl

ρ:ɡeσn@pigeon__s·2d

qwen and deepseek have disappointed me recently but i can't really think of the last time i've been disappointed by Moonshot. i'm SO excited for K3!

ρ:ɡeσn@pigeon__s

Kimi-K2.6 has finally been added to ECI! my favorite benchmark since it covers basically every domain in existence so K2.6 is just strictly a better model in every possible capability sense (pretty much) than Gemini 3 Flash and around Sonnet 4.6 level Ants CURRENT mid model

639

toriset reposted

Logan Kilpatrick@OfficialLoganK·3d

The model is the product

209

1.9K

151.2K

toriset@torisetxd·3d

@bindureddy 50% cheaper would ruin their overall margins (not just gross/raw)

Bindu Reddy@bindureddy·4d

The Gemini Pro model is rumored to be a GPT 5.5 level coding model 🧐 The catch - it will be more than 50% cheaper at $12/1M output tokens. Gemini will take the lead over both GPT 5.5 and Opus 4.7 on the price-performance curve

141

1.1K

60.7K
