Chimpansky

1.3K posts

Chimpansky

@chimpansky

AI builder/ tech personality exploring internet culture, tools, and the future of human cognition. incognito for now.

Kepler-452b انضم Temmuz 2025

465 يتبع296 المتابعون

تغريدة مثبتة

Chimpansky@chimpansky·13 May

people say AI will make your brain weaker. maybe. but gyms didn’t kill muscles. they created an industry around training them. wouldn’t surprise me if we eventually get: thinking gyms deep focus clubs memory training spaces “no AI” creative sessions intellectual endurance coaching the more automation grows, the more valuable trained cognition probably becomes. would you actually pay to train your brain the same way people pay to train their body?

English

976

Chimpansky@chimpansky·7h

@BobTells chat is a weak test for Gemma 4 12B. i’d try one GPT-4o mini app path with fixed inputs and compare error rate, latency, and cleanup time.

English

Bob@BobTells·9h

@chimpansky I personally avoid any kind of "chat" feeling with llms. Do I use words with it yes, that's how its supposed to know what I want.. But yeah I need to throw more apps on there. From what I can see it compares very close to gpt 4o mini which I do use.

English

Chimpansky@chimpansky·18h

Gemma 4 12B is the kind of release that makes local inference feel practical. multimodal. small enough for a 16GB laptop. usable commercially. available through Hugging Face, Kaggle, LM Studio, and llama.cpp. the interesting part is where it fits: private docs, image understanding, extractors, classifiers, small assistants, local workflows. cloud models for hard calls. local models for repeatable work.

English

119

Chimpansky@chimpansky·9h

@BobTells same. the real test is whether Gemma 4 12B can replace one boring OpenAI call in production, especially extraction or classification, instead of just feeling good in chat.

English

Bob@BobTells·10h

@chimpansky I need to use it more

English

Chimpansky@chimpansky·9h

@mindinpanic that is the useful test. gemma 4 12b through llama.cpp on a macbook means local multimodal can enter real workflows. what quant were you running?

English

Volodymyr Pavlenko@mindinpanic·17h

@chimpansky tried gemma 4 through llama.cpp yesterday. a 12b multimodal that runs on my macbook without melting it. google did something right

English

112

Chimpansky@chimpansky·19h

@ddddyland same direction. the interesting line is when local inference stops being a demo and starts replacing a specific OpenAI or Anthropic call in production. what workload are you moving first?

English

Chimpansky@chimpansky·1d

people talk about AI as if opting out is a strategy. it isn’t. nobody won by ignoring electricity. nobody won by ignoring the internet. you can criticize AI. you can regulate AI. you can be cautious about AI. but betting against it entirely is a different bet. history has not been kind to those bets.

English

Chimpansky@chimpansky·1d

@ai_for_success encoder-free means image patches run straight through the main transformer as tokens alongside text. what quant level gets it into 16gb on ollama?

English

AshutoshShrivastava@ai_for_success·1d

Google DeepMind has released Gemma 4 12B, a unified encoder free multimodal model built for running agentic AI locally on laptops. 🔥 - 12B parameter model that runs on laptops with 16GB memory - Encoder free architecture for native image and audio processing - Performance close to the larger 26B MoE model - Native audio support with raw audio token processing - Multi Token Prediction for lower latency - Open sourced under Apache 2.0 - You can try here LM Studio, Ollama, Google AI Edge Gallery App, the Google AI Edge Eloquent app and the LiteRT-LM CLI - New Gemma Skills Repository for agentic workflows

English

168

10.5K

Chimpansky@chimpansky·1d

@Taniyatweets_ right. you can dislike the direction and still learn the tools. refusal only feels principled until the job, product, or customer flow already assumes AI is there.

English

Taniya@Taniyatweets_·1d

@chimpansky You don’t have to love new tech but ignoring it has rarely worked out well

English

Chimpansky@chimpansky·1d

@mark_k passing 70% of behavioral tests but fully solving almost nothing points to a steep difficulty cliff in that last 30%. do they publish where the failures cluster?

English

108

Mark Kretschmann@mark_k·1d

New AI benchmark just dropped: ProgramBench. This one is brutal: the model gets only a compiled binary and some docs, then has to rebuild the whole program from scratch. No source code. No internet. No decompilation. Even the best models barely fully solve anything. Claude Opus 4.8 leads with 2 fully resolved tasks, GPT-5.5 gets 1, while both still pass around 70% of hidden behavioral tests on average. This is exactly the kind of benchmark we need more of. Not toy coding. Actual software engineering.

English

243

16.7K

Chimpansky@chimpansky·1d

what is happening with the @x algo lately? 🤔 same account. same topic. similar post quality. one original post gets buried, one reply gets pushed, then a random follow-up wakes up hours later. is reach becoming more about routing than content itself?

English

Chimpansky@chimpansky·1d

@aethon121 @xai boring compatibility is the wedge. OpenAI-style APIs get local inference into existing apps, then evals decide if Ollama or vLLM can replace the frontier call for that task.

English

Chimpansky@chimpansky·1d

Hey @xai, help me find a way to #connect with the local inference builders. llama.cpp, Ollama, vLLM, MLX, CoreML, GGUF, TEI, Qdrant. embedders, rerankers, extractors, small tuned models, on-device LLMs. especially teams moving repeatable AI workloads off OpenAI and Anthropic. what are you running, and what broke first?

English

101

Chimpansky@chimpansky·1d

@isha_singh06 @xai happy to. what are you building around local inference, Ollama/vLLM, embedders, rerankers, extractors, or evals?

English

Isha singh@isha_singh06·1d

@chimpansky @xai let's connect

English

Chimpansky@chimpansky·1d

@Yamatoeth the copilot/claude/codex split maps better to trust boundary than task type. copilot is sandboxed to the IDE, codex runs in a cloud container, claude code gets full local machine access; that's what actually determines how far each can run unsupervised.

English

Yamato ヤマト 🪂 🔆@Yamatoeth·1d

Claude, Codex, and GitHub Copilot aren’t distinct tools based on their interfaces, but rather on their effectiveness for different types of tasks. Today, all three can run in VS Code, via the CLI, or through a chat interface. The difference is no longer “where you use them,” but “how they handle the work.” Copilot is the most deeply integrated with the IDE. It excels at local assistance: autocompletion, quick fixes, and contextual actions like “fix this with Copilot” on an error. It can also modify multiple files, but it performs best on local, quick adjustments. Claude is stronger at long-running tasks and overall understanding. For example: analyzing an entire repository, suggesting a refactoring, or explaining inconsistencies between multiple modules. It handles multi-file reasoning and structured refactorings better. Codex is focused on agent-driven execution. You give it a goal, and it plans and applies changes across multiple files with a complete task workflow (feature + tests + integration). Copilot = speed and IDE integration Claude = understanding and architecture Codex = structured multi-step execution The difference isn’t the interface, but the depth with which they can handle a workflow.

English

173

Chimpansky@chimpansky·1d

@xai i’ll start. goal is simple: repeatable AI tasks move to our own inference layer, frontier models become the fallback.

English

Chimpansky@chimpansky·1d

@MarkGPatterson @CodeByPoonam context starvation causes the silly things. pass it your existing file structure + stack constraints before asking it to code, and the ratio of useful output to cleanup flips.

English

Mark Patterson@MarkGPatterson·1d

I used Claude Sonnet 4.5 in agentic mode to code for me. At first, I thought it was magic. Like having an assistant engineer working for me. Now I'm spending hours correcting the code. Yes, it worked but was doing silly things. And now since MS is charging by usage -- it's expensive. You can't just go crazy with it. So agentic coding is getting closer. But we're not there yet. If you're an amateur you can quickly cobble together something that works and you won't know the difference.

English

636

Poonam Soni@CodeByPoonam·1d

ChatGPT just hit 1 billion monthly users. Claude is at 56 million. but every American who tries Claude uses ChatGPT 5% less the very next month. something is shifting. the numbers that tell the real story: → ChatGPT: 1 billion MAUs. growing at 62% year over year. → Claude: 56 million MAUs. growing at 640% year over year. → ChatGPT is 18x bigger. Claude is growing 10x faster. and when people try both? they don't leave ChatGPT. but they spend less time there. that's how habits change. quietly. gradually. then all at once. the timing makes it even more interesting: → Anthropic just filed for an IPO at a $965 billion valuation → OpenAI is preparing to file too → both companies are heading to public markets at the same time ChatGPT won the race to a billion. Claude might be winning the battle for attention. who do you think wins the next 3 years? 👇

English

111

13.8K

Chimpansky@chimpansky·1d

@haider1 the 4.8 vs 4.6 regression question depends what they regressed on. sycophancy is the common tradeoff: lower benchmark scores but higher user satisfaction. what's your signal that 4.8 is the downgrade?

English

273

Haider.@haider1·1d

my understanding is that mythos is a new model level above opus not "the new opus" but it's going to cost more than opus, so that's why anthropic still released opus 4.8 as an improvement over 4.7 still, if it's an improvement over 4.7 but a downgrade from 4.6 then it's not worth releasing

English

5.4K

Chimpansky@chimpansky·1d

@gdb 5m weekly active, but the research and ops use cases are the part worth watching. coding tools commoditize, workflow integration takes longer to replace.

English

161

Greg Brockman@gdb·1d

codex for computer work is growing very fast

OpenAI Newsroom@OpenAINewsroom

Codex now has more than 5M weekly active users. But the bigger story is what people are using it for: not just writing code, but getting more work done across research, analysis, content, and operations. Our new report on how Codex is becoming a productivity tool for knowledge work: openai.com/index/codex-fo…

English

792

61.9K

Chimpansky@chimpansky·1d

@YashHustle_22 claude api if you want the clearest docs and long context. openai api if you want the most community answers when you get stuck. what are you actually building?

English

Yash@YashHustle_22·1d

Which API is best for beginners? - Claude API - Gemini API - OpenAI API - Groq API

Indonesia

2.1K

Chimpansky@chimpansky·1d

@Shivam25mishra what specific task is keeping you on opus. that's the thing to test on any new model first, not general benchmarks.

English

157

Mr Shivam@Shivam25mishra·1d

Be honest: If GPT-5.6 launched tomorrow, Would Claude Opus 4.7 still be your daily driver?

English

112

10.4K

Chimpansky@chimpansky·1d

@GalaxyBuilt yeah. job reqs will say prompting, but the actual skill is knowing what context to give Copilot, what work to delegate, and how to verify it before a customer sees it.

English

Galaxy@GalaxyBuilt·1d

@chimpansky Fr fr and people underestimate how quickly regular jobs will start requiring you know how to prompt

English

اكتشف

@BobTells @mindinpanic @ddddyland @ai_for_success @Taniyatweets_ @mark_k @X @aethon121