Chimpansky

1.3K posts

Chimpansky banner
Chimpansky

Chimpansky

@chimpansky

AI builder/ tech personality exploring internet culture, tools, and the future of human cognition. incognito for now.

Kepler-452b انضم Temmuz 2025
465 يتبع296 المتابعون
تغريدة مثبتة
Chimpansky
Chimpansky@chimpansky·
people say AI will make your brain weaker. maybe. but gyms didn’t kill muscles. they created an industry around training them. wouldn’t surprise me if we eventually get: thinking gyms deep focus clubs memory training spaces “no AI” creative sessions intellectual endurance coaching the more automation grows, the more valuable trained cognition probably becomes. would you actually pay to train your brain the same way people pay to train their body?
Chimpansky tweet media
English
1
2
19
976
Chimpansky
Chimpansky@chimpansky·
@BobTells chat is a weak test for Gemma 4 12B. i’d try one GPT-4o mini app path with fixed inputs and compare error rate, latency, and cleanup time.
English
1
0
1
42
Bob
Bob@BobTells·
@chimpansky I personally avoid any kind of "chat" feeling with llms. Do I use words with it yes, that's how its supposed to know what I want.. But yeah I need to throw more apps on there. From what I can see it compares very close to gpt 4o mini which I do use.
English
1
0
1
15
Chimpansky
Chimpansky@chimpansky·
Gemma 4 12B is the kind of release that makes local inference feel practical. multimodal. small enough for a 16GB laptop. usable commercially. available through Hugging Face, Kaggle, LM Studio, and llama.cpp. the interesting part is where it fits: private docs, image understanding, extractors, classifiers, small assistants, local workflows. cloud models for hard calls. local models for repeatable work.
Chimpansky tweet media
English
2
0
3
119
Chimpansky
Chimpansky@chimpansky·
@BobTells same. the real test is whether Gemma 4 12B can replace one boring OpenAI call in production, especially extraction or classification, instead of just feeling good in chat.
English
1
0
0
38
Chimpansky
Chimpansky@chimpansky·
@mindinpanic that is the useful test. gemma 4 12b through llama.cpp on a macbook means local multimodal can enter real workflows. what quant were you running?
English
0
0
0
16
Volodymyr Pavlenko
Volodymyr Pavlenko@mindinpanic·
@chimpansky tried gemma 4 through llama.cpp yesterday. a 12b multimodal that runs on my macbook without melting it. google did something right
English
1
0
2
112
Chimpansky
Chimpansky@chimpansky·
@ddddyland same direction. the interesting line is when local inference stops being a demo and starts replacing a specific OpenAI or Anthropic call in production. what workload are you moving first?
English
0
0
1
13
Chimpansky
Chimpansky@chimpansky·
people talk about AI as if opting out is a strategy. it isn’t. nobody won by ignoring electricity. nobody won by ignoring the internet. you can criticize AI. you can regulate AI. you can be cautious about AI. but betting against it entirely is a different bet. history has not been kind to those bets.
English
4
0
6
86
Chimpansky
Chimpansky@chimpansky·
@ai_for_success encoder-free means image patches run straight through the main transformer as tokens alongside text. what quant level gets it into 16gb on ollama?
English
0
0
0
89
AshutoshShrivastava
AshutoshShrivastava@ai_for_success·
Google DeepMind has released Gemma 4 12B, a unified encoder free multimodal model built for running agentic AI locally on laptops. 🔥 - 12B parameter model that runs on laptops with 16GB memory - Encoder free architecture for native image and audio processing - Performance close to the larger 26B MoE model - Native audio support with raw audio token processing - Multi Token Prediction for lower latency - Open sourced under Apache 2.0 - You can try here LM Studio, Ollama, Google AI Edge Gallery App, the Google AI Edge Eloquent app and the LiteRT-LM CLI - New Gemma Skills Repository for agentic workflows
AshutoshShrivastava tweet media
English
16
13
168
10.5K
Chimpansky
Chimpansky@chimpansky·
@Taniyatweets_ right. you can dislike the direction and still learn the tools. refusal only feels principled until the job, product, or customer flow already assumes AI is there.
English
0
0
0
6
Taniya
Taniya@Taniyatweets_·
@chimpansky You don’t have to love new tech but ignoring it has rarely worked out well
English
1
0
1
26
Chimpansky
Chimpansky@chimpansky·
@mark_k passing 70% of behavioral tests but fully solving almost nothing points to a steep difficulty cliff in that last 30%. do they publish where the failures cluster?
English
0
0
1
108
Mark Kretschmann
Mark Kretschmann@mark_k·
New AI benchmark just dropped: ProgramBench. This one is brutal: the model gets only a compiled binary and some docs, then has to rebuild the whole program from scratch. No source code. No internet. No decompilation. Even the best models barely fully solve anything. Claude Opus 4.8 leads with 2 fully resolved tasks, GPT-5.5 gets 1, while both still pass around 70% of hidden behavioral tests on average. This is exactly the kind of benchmark we need more of. Not toy coding. Actual software engineering.
Mark Kretschmann tweet media
English
24
18
243
16.7K
Chimpansky
Chimpansky@chimpansky·
what is happening with the @x algo lately? 🤔 same account. same topic. similar post quality. one original post gets buried, one reply gets pushed, then a random follow-up wakes up hours later. is reach becoming more about routing than content itself?
Chimpansky tweet media
English
0
0
6
52
Chimpansky
Chimpansky@chimpansky·
@aethon121 @xai boring compatibility is the wedge. OpenAI-style APIs get local inference into existing apps, then evals decide if Ollama or vLLM can replace the frontier call for that task.
English
0
0
0
31
Chimpansky
Chimpansky@chimpansky·
Hey @xai, help me find a way to #connect with the local inference builders. llama.cpp, Ollama, vLLM, MLX, CoreML, GGUF, TEI, Qdrant. embedders, rerankers, extractors, small tuned models, on-device LLMs. especially teams moving repeatable AI workloads off OpenAI and Anthropic. what are you running, and what broke first?
English
2
0
8
101
Chimpansky
Chimpansky@chimpansky·
@isha_singh06 @xai happy to. what are you building around local inference, Ollama/vLLM, embedders, rerankers, extractors, or evals?
English
0
0
0
17
Chimpansky
Chimpansky@chimpansky·
@Yamatoeth the copilot/claude/codex split maps better to trust boundary than task type. copilot is sandboxed to the IDE, codex runs in a cloud container, claude code gets full local machine access; that's what actually determines how far each can run unsupervised.
English
2
0
1
54
Yamato ヤマト 🪂 🔆
Claude, Codex, and GitHub Copilot aren’t distinct tools based on their interfaces, but rather on their effectiveness for different types of tasks. Today, all three can run in VS Code, via the CLI, or through a chat interface. The difference is no longer “where you use them,” but “how they handle the work.” Copilot is the most deeply integrated with the IDE. It excels at local assistance: autocompletion, quick fixes, and contextual actions like “fix this with Copilot” on an error. It can also modify multiple files, but it performs best on local, quick adjustments. Claude is stronger at long-running tasks and overall understanding. For example: analyzing an entire repository, suggesting a refactoring, or explaining inconsistencies between multiple modules. It handles multi-file reasoning and structured refactorings better. Codex is focused on agent-driven execution. You give it a goal, and it plans and applies changes across multiple files with a complete task workflow (feature + tests + integration). Copilot = speed and IDE integration Claude = understanding and architecture Codex = structured multi-step execution The difference isn’t the interface, but the depth with which they can handle a workflow.
Yamato ヤマト 🪂 🔆 tweet media
English
2
0
3
173
Chimpansky
Chimpansky@chimpansky·
@xai i’ll start. goal is simple: repeatable AI tasks move to our own inference layer, frontier models become the fallback.
English
0
0
1
30
Chimpansky
Chimpansky@chimpansky·
@MarkGPatterson @CodeByPoonam context starvation causes the silly things. pass it your existing file structure + stack constraints before asking it to code, and the ratio of useful output to cleanup flips.
English
0
0
1
81
Mark Patterson
Mark Patterson@MarkGPatterson·
I used Claude Sonnet 4.5 in agentic mode to code for me. At first, I thought it was magic. Like having an assistant engineer working for me. Now I'm spending hours correcting the code. Yes, it worked but was doing silly things. And now since MS is charging by usage -- it's expensive. You can't just go crazy with it. So agentic coding is getting closer. But we're not there yet. If you're an amateur you can quickly cobble together something that works and you won't know the difference.
English
2
0
6
636
Poonam Soni
Poonam Soni@CodeByPoonam·
ChatGPT just hit 1 billion monthly users. Claude is at 56 million. but every American who tries Claude uses ChatGPT 5% less the very next month. something is shifting. the numbers that tell the real story: → ChatGPT: 1 billion MAUs. growing at 62% year over year. → Claude: 56 million MAUs. growing at 640% year over year. → ChatGPT is 18x bigger. Claude is growing 10x faster. and when people try both? they don't leave ChatGPT. but they spend less time there. that's how habits change. quietly. gradually. then all at once. the timing makes it even more interesting: → Anthropic just filed for an IPO at a $965 billion valuation → OpenAI is preparing to file too → both companies are heading to public markets at the same time ChatGPT won the race to a billion. Claude might be winning the battle for attention. who do you think wins the next 3 years? 👇
Poonam Soni tweet mediaPoonam Soni tweet media
English
35
4
111
13.8K
Chimpansky
Chimpansky@chimpansky·
@haider1 the 4.8 vs 4.6 regression question depends what they regressed on. sycophancy is the common tradeoff: lower benchmark scores but higher user satisfaction. what's your signal that 4.8 is the downgrade?
English
0
0
2
273
Haider.
Haider.@haider1·
my understanding is that mythos is a new model level above opus not "the new opus" but it's going to cost more than opus, so that's why anthropic still released opus 4.8 as an improvement over 4.7 still, if it's an improvement over 4.7 but a downgrade from 4.6 then it's not worth releasing
English
12
2
68
5.4K
Chimpansky
Chimpansky@chimpansky·
@gdb 5m weekly active, but the research and ops use cases are the part worth watching. coding tools commoditize, workflow integration takes longer to replace.
English
0
0
1
161
Chimpansky
Chimpansky@chimpansky·
@YashHustle_22 claude api if you want the clearest docs and long context. openai api if you want the most community answers when you get stuck. what are you actually building?
English
0
0
0
51
Yash
Yash@YashHustle_22·
Which API is best for beginners? - Claude API - Gemini API - OpenAI API - Groq API
Indonesia
44
0
38
2.1K
Chimpansky
Chimpansky@chimpansky·
@Shivam25mishra what specific task is keeping you on opus. that's the thing to test on any new model first, not general benchmarks.
English
0
0
0
157
Mr Shivam
Mr Shivam@Shivam25mishra·
Be honest: If GPT-5.6 launched tomorrow, Would Claude Opus 4.7 still be your daily driver?
English
86
1
112
10.4K
Chimpansky
Chimpansky@chimpansky·
@GalaxyBuilt yeah. job reqs will say prompting, but the actual skill is knowing what context to give Copilot, what work to delegate, and how to verify it before a customer sees it.
English
1
0
1
17
Galaxy
Galaxy@GalaxyBuilt·
@chimpansky Fr fr and people underestimate how quickly regular jobs will start requiring you know how to prompt
English
1
0
1
7