
What it's like when you walk out of the movie theater and everything feels unreal
Christos Nikitopoulos
123 posts




Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumor has it that benchmarks show it hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference. The latency improvements are insane - sub-200ms for most queries. Google's distillation + sparsity techniques are paying off massively. They've essentially compressed a frontier model into a flash variant without the usual quality cliff.


Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

normalize realizing that the whole cheat code to life is being insanely delusional and optimistic








Grok computer is here?!

Dario Amodei says Anthropic has to release Mythos widely enough to harden cyber defenses, but carefully enough not to spread the threat first. Too few actors, and defenders stay weak. Too many, and misuse scales. Too slow, and Chinese models catch up. Too fast, and the safeguards may not hold.

So let me get this right:
1. Anthropic bans xAI from using Claude (to stop them from perhaps distilling Claude for their own model) (...)
2. xAI gives up ~a quarter of its DC capacity for Anthropic to rent and run Claude
A win for Anthropic no doubt. What's in it for xAI tho?







NEWS: xAI's Grok 4.3 takes the #1 spot on IFBench, the leading instruction-following benchmark, according to Artificial Analysis. Grok scored 81%, beating every major competitor:
MiMo-V2.5-Pro at 80%
Gemini 3 Flash at 78%
Gemini 3.1 Pro Preview at 77%
GPT-5.5 (Xhigh) at 76%
GPT-5.4 (xhigh) at 74%
Claude Opus 4.7 (max) at 59%
This is Grok's latest top finish on the benchmark, which measures how closely a model follows user instructions.


To be fair, suing OpenAI does seem like a more viable strategy than pretending Grok 4.3 is any good

Same here. By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed. Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good. After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.