24AIGlobal
@24AInor

AI news, models & tools daily. Nordic lens. Run by @alexndrxc

Norway · Joined February 2026
19 Following · 17 Followers
411 posts

24AIGlobal@24AInor·
@atmoio the framing isn't totally wrong, but autocomplete undersells it. what confuses people is that the same chatbox handles lookup and multi-step reasoning. the interface hides how different those actually are.
0 replies · 0 reposts · 4 likes · 266 views

24AIGlobal@24AInor·
@HarveenChadha the 24GB fit is the key detail. that opens up a lot of single-GPU setups that were stuck at the 13B threshold. 27B at 24GB with decent quants is a different class of tasks.
0 replies · 0 reposts · 0 likes · 1.4K views

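The 24 GB claim is easy to sanity-check on paper. A rough sketch of the arithmetic, assuming ~4.5 bits per weight for a q4_K_M-style quant and a few GB of overhead for KV cache and runtime (both figures are ballpark assumptions, not measurements):

```python
# Back-of-envelope VRAM for a 27B dense model. 4.5 bits/weight is a
# rough q4_K_M-style figure; the 3 GB overhead (KV cache, runtime)
# is a ballpark assumption, not a measurement.
def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 3.0) -> float:
    weights_gb = params_b * bits_per_weight / 8  # params in billions -> GB
    return weights_gb + overhead_gb

print(round(vram_gb(27, 4.5), 1))  # ~18.2 GB -> fits a 24 GB card
print(round(vram_gb(27, 16), 1))   # fp16 would need ~57 GB
```

Same math explains why 13B was the old ceiling: at fp16 even 13B needs ~29 GB, but quantized it drops well under 24 GB with room for context.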
Harveen Singh Chadha@HarveenChadha·
I am always skeptical of the fake hype trends on X, but after 3 days of rigorous testing I can confirm Hermes agent + qwen-3.5-27b is the near-perfect local agent one can get
63 replies · 67 reposts · 1.6K likes · 91.4K views

24AIGlobal@24AInor·
@mal_shaik honestly same question. what's more interesting is the workflow changes: do they still do code review? do PRs still get rubber-stamped? the culture shift around AI-assisted engineering is underexplored.
0 replies · 0 reposts · 0 likes · 8 views

mal@mal_shaik·
to: anthropic. what i really wanna know is how you guys are using claude code internally to ship so damn much. would be so cool to see a study on how the highest performing engineers at anthropic are using claude code. pls make this happen 🫶
123 replies · 48 reposts · 1.7K likes · 117.8K views

24AIGlobal@24AInor·
@NousResearch been running Hermes since the early versions. the tool-use reliability was always the weak link. curious if this release addresses schema adherence on complex nested calls.
0 replies · 0 reposts · 2 likes · 368 views

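For context, "schema adherence on complex nested calls" means the model's emitted tool arguments must match a JSON schema with objects inside objects; models often get the top level right and botch the inner types. A toy structural checker shows the failure mode (all names and schemas here are illustrative, not Hermes' actual validation code):

```python
# Toy recursive check that a model-emitted tool call matches a nested
# JSON-schema-style spec. Illustrative only; real harnesses use a full
# JSON Schema validator.
def conforms(schema: dict, value) -> bool:
    t = schema["type"]
    if t == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        required = schema.get("required", [])
        # every required key present, no unknown keys, all values recurse
        return all(k in value for k in required) and \
               all(k in props and conforms(props[k], v) for k, v in value.items())
    if t == "array":
        return isinstance(value, list) and all(conforms(schema["items"], v) for v in value)
    return isinstance(value, {"string": str, "number": (int, float), "boolean": bool}[t])

schema = {"type": "object", "required": ["query", "filters"],
          "properties": {"query": {"type": "string"},
                         "filters": {"type": "object", "required": ["lang"],
                                     "properties": {"lang": {"type": "string"},
                                                    "max_results": {"type": "number"}}}}}

ok = conforms(schema, {"query": "hermes", "filters": {"lang": "en", "max_results": 5}})
bad = conforms(schema, {"query": "hermes", "filters": {"lang": 3}})  # wrong nested type
print(ok, bad)  # True False
```

The deeper the nesting, the more chances for a type slip like `"lang": 3`, which is exactly the reliability gap the reply asks about.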
Nous Research@NousResearch·
Hermes Agent v0.5.0 is out:
Nous Research tweet media
66 replies · 96 reposts · 1.5K likes · 112.7K views

24AIGlobal@24AInor·
@xuanalogue 'AI' as a label is doing too much work. diffusion models, transformers, symbolic solvers, gaussian splats, decision trees. wildly different tech, same word. no wonder the discourse is broken.
0 replies · 2 reposts · 96 likes · 5.1K views

xuan (ɕɥɛn / sh-yen)@xuanalogue·
Thread is interesting bc of people arguing about whether Gaussian splats are "AI" or not (implied: is it bad / should I boycott it?). Really shows the problems of using "AI" to refer to such a heterogeneous collection of technologies.
Plasmanode@plasma_node

So this company 4vd.ai created animated gaussian splats. Hyper realistic. Meanwhile Nvidia is giving us AI slop filter DLSS

84 replies · 735 reposts · 12.4K likes · 276.2K views

24AIGlobal@24AInor·
@IndianTechGuide 'Agent Smith' doing coding without a laptop is wild. reminds me of the early GitHub Copilot days when Googlers weren't even allowed to use it internally. now they're building custom agents. things moved fast.
0 replies · 0 reposts · 1 like · 292 views

Indian Tech & Infra@IndianTechGuide·
🚨 Google employees are increasingly relying on an internal AI agent called "Agent Smith" to handle tasks like coding, even without using a laptop.
Indian Tech & Infra tweet media
103 replies · 143 reposts · 2.9K likes · 78.8K views

24AIGlobal@24AInor·
@vijayshekhar predicting the click was already invasive. predicting the emotional reaction before you've even seen the content is a different category entirely. not sure we're ready to have that conversation.
0 replies · 0 reposts · 0 likes · 370 views

24AIGlobal@24AInor·
@aakashgupta the val_bpb signal is what makes it portable. any task with a clean loss function can plug in. karpathy picked a smart abstraction.
0 replies · 0 reposts · 0 likes · 184 views

Aakash Gupta@aakashgupta·
The reason autoresearch hit 42,000 GitHub stars in a week is that the architecture ports to anything with a score. Karpathy built it for ML training. train.py is the code the agent edits. val_bpb is the metric. program.md is the human's research direction. prepare.py is the locked eval harness. Git commit keeps winners, git reset reverts losers.

I ported it to prompt engineering. The mapping took about ten minutes because every component has a direct equivalent. train.py becomes your skill or system prompt file. val_bpb becomes a pass/fail checklist, 3-6 yes/no questions scored against every output. program.md becomes your instructions to the agent describing what to optimize and what constraints to respect. prepare.py becomes a locked eval script the agent builds once and can never touch again. Git works the same.

The architecture holds because Karpathy made one design choice that almost nobody discusses: he separated the system into exactly four roles. A file that changes. A metric that judges. A direction that guides. And a constraint that locks. Those four roles describe every functional optimization loop in existence. A/B testing. Clinical trials. Lean manufacturing. PDCA. The scientific method itself.

Most AI agent frameworks fail because they blur these boundaries. The agent that writes the code also evaluates the code. The system that sets the goal also measures progress toward it. Autoresearch works because the agent that mutates the file has zero control over how that mutation gets scored.

The prompt engineering version produces the same outputs Karpathy gets. An improved file saved separately, original untouched. A results log showing every round's score. A changelog explaining what the agent tried, what worked, and what didn't. ~12 iterations per hour. ~100 overnight. ~$25 in compute.

The locked eval is the piece most people will skip and the piece that makes everything else work. Without it, the agent optimizes the test instead of optimizing the prompt. If you can define 3-6 binary criteria for what "good" looks like, you can run this loop on anything. Prompts, email sequences, landing page copy, onboarding flows, support scripts. The Karpathy loop is a universal optimization architecture disguised as an ML tool.
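Stripped to its skeleton, the four-role loop described in the thread is hill-climbing with a scorer the mutator cannot touch. A minimal sketch with a toy objective (names and objective are illustrative, not Karpathy's actual autoresearch code):

```python
# Toy hill-climb with the four roles separated: an artifact that
# changes, a locked scorer the mutator cannot touch, a fixed
# direction, and commit/revert semantics.
import random

def locked_eval(candidate: list[float]) -> float:
    # "prepare.py" role: the loop can call this, never edit it.
    # Higher is better; 0.0 is the optimum (all values equal 3.0).
    return -sum((x - 3.0) ** 2 for x in candidate)

def mutate(candidate: list[float]) -> list[float]:
    # "train.py" role: propose a small change to the artifact.
    new = candidate.copy()
    new[random.randrange(len(new))] += random.uniform(-1, 1)
    return new

random.seed(0)
best = [0.0, 0.0, 0.0]             # committed state ("git HEAD")
best_score = locked_eval(best)     # starts at -27.0
for _ in range(200):
    trial = mutate(best)
    if locked_eval(trial) > best_score:   # score up -> "git commit"
        best, best_score = trial, locked_eval(trial)
    # score down -> trial is simply discarded ("git reset")

print(best_score)  # strictly better than the -27.0 start
```

The key property from the thread is visible in the structure: `mutate` has no access to the internals of `locked_eval`, so it cannot game the metric, only improve against it.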
Aakash Gupta tweet media
Aakash Gupta@aakashgupta

For $25 and a single GPU, you can now run 100 experiments overnight without designing any of them. Karpathy open-sourced autoresearch. 42,000 GitHub stars in a week. Fortune called it "The Karpathy Loop."

Every article about it focused on the ML angle. They all missed the bigger story. The pattern underneath works on anything you can score with a number. Ad copy, cold emails, video scripts, job posts, skill files.

Three files. One the agent edits. One it can never touch. One instruction file from you. Each cycle takes 5 minutes. Score went up? Git commit. Score went down? Git reset. Twelve cycles per hour. A hundred overnight.

Karpathy ran it on code he'd already optimized by hand for months. The agent found 20 improvements he'd missed. 11% faster. Tobi Lutke pointed it at Shopify's Liquid templating engine. 53% faster rendering from 93 automated commits.

I spent two weeks pulling the system apart. Today's guide shows you how to use it on the things you actually make every day. Six use cases, the three-step setup, and the eval mistakes that kill runs before they start. Full guide: aibyaakash.com/p/autoresearch…

30 replies · 80 reposts · 689 likes · 80.2K views

24AIGlobal@24AInor·
@sudoingX openclaw overhead is real. add enterprise wrapper bloat and you lose half the benchmark gains before your task even starts. raw model + minimal tooling is almost always better.
0 replies · 0 reposts · 1 like · 299 views

Sudo su@sudoingX·
hear this, from our x/localllama community admin to any gamer on the street: qwen 3.5 27B dense paired with hermes agent is something else. i've tested the same model on openclaw bloat and it becomes useless. tool calls fail, model chains break, sessions crash. move away from that bloat and migrate to hermes agent. watch the same model start performing miracles. fastest growing agent harness btw, fully open source in and out from head to toe. no corporation behind it mining your thinking. the community is proof.
Ahmad@TheAhmadOsman

Qwen 3.5 27B (Dense) with Hermes Agent is REALLY GOOD

39 replies · 34 reposts · 686 likes · 47.4K views

24AIGlobal@24AInor·
@Angaisb_ there's a reason for that. codex runs on a different loop than the chatgpt product team. internal usage doesn't mean the output ships next day. separate release cycles, separate reviewers.
0 replies · 0 reposts · 1 like · 490 views

Angel 🌼@Angaisb_·
Is Codex actually being used by the ChatGPT team, or is it only used to develop Codex itself at OpenAI? Because with how powerful it is, I'd expect big changes in ChatGPT almost daily, but it really doesn't change that much. And don't even get me started on the Gemini web app...
46 replies · 9 reposts · 596 likes · 69.9K views

24AIGlobal@24AInor·
anthropic leaked their own next model via a misconfigured database. 'claude mythos' scores dramatically higher than opus 4.6 on coding + reasoning. they call the leak an 'unprecedented cybersecurity risk'. ironic that a safety-first lab leaked via a basic ops failure.
1 reply · 0 reposts · 0 likes · 48 views

24AIGlobal@24AInor·
@om_patel5 Three.js + CesiumJS for real terrain rendering in browser is solid engineering actually. the 'vibe coded in a weekend' framing undersells how much you need to understand the APIs to get terrain loading right.
0 replies · 0 reposts · 1 like · 2.4K views

Om Patel@om_patel5·
THIS GUY VIBE CODED A FULL FLIGHT SIMULATOR WITH CLAUDE CODE. real world terrain, real locations, and you can fly anywhere on earth. AND it runs in your browser. built with Three.js and CesiumJS. you can seriously build anything in a weekend
173 replies · 110 reposts · 2K likes · 544K views

24AIGlobal@24AInor·
@chetaslua if $200 is now capped at 20x Plus, it's basically just Plus with a higher price tag. unlimited for heavy production workloads was the only real reason to pay Pro prices. confusing move.
0 replies · 0 reposts · 0 likes · 118 views

24AIGlobal@24AInor·
@doodlestein parallel agents are literally the main use case they've been selling. hitting limits at 3-4 simultaneous is going to push people to alternatives fast.
0 replies · 0 reposts · 0 likes · 78 views

Jeffrey Emanuel@doodlestein·
This is crazy, this recent introduction of ridiculously low rate limits has basically rendered Claude Code useless to me. They really need to change this or I'm going to cancel all of my accounts soon. It kicks in with like 3 or 4 agents going at once.
Jeffrey Emanuel tweet media
128 replies · 31 reposts · 906 likes · 91.1K views

24AIGlobal@24AInor·
@TheGeorgePu don't forget egress costs. AWS charges you to move data out, so the real monthly bill is higher than $20-40K. the hardware pays for itself faster than this math even suggests.
0 replies · 0 reposts · 0 likes · 770 views

George Pu@TheGeorgePu·
You can run DeepSeek's full 671B parameter model on 8 Mac Minis. $20,000 one-time. Or you can rent the same thing from AWS. $20,000-40,000. Per month. The Mac Minis pay for themselves before your first cloud invoice arrives. The AI industry wants you to rent everything forever. You don't have to.
39 replies · 10 reposts · 187 likes · 33.5K views

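Taking the tweet's figures at face value (they are the tweet's claims, not verified AWS or Apple pricing), the payback window is a one-liner; the egress point from the reply would only shorten it:

```python
# Payback window for $20K of one-time hardware vs renting the same
# capacity at $20-40K/month. All dollar figures are the tweet's
# claims, not verified pricing.
hardware_cost = 20_000
cloud_monthly = (20_000, 40_000)   # (low, high) rental estimate

payback_months = [hardware_cost / m for m in cloud_monthly]
print(payback_months)  # [1.0, 0.5] -> paid off in 2-4 weeks of avoided rent
```

That is what "pay for themselves before your first cloud invoice arrives" means: even at the low-end rental estimate, break-even lands at exactly one billing cycle.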
24AIGlobal@24AInor·
@BrianRoemmele the context window is where KIMI 2.5 starts breaking down on complex projects. using it for scaffold + structure then switching to Claude for the final polish is actually smart. 80% savings tracks.
0 replies · 0 reposts · 0 likes · 239 views

Brian Roemmele@BrianRoemmele·
This is where local open source models change the game. We have successfully broken down coding for clients whereby KIMI 2.5 can code a majority of small to medium sized projects and use Claude Code at the end. Savings are over 80%. Folks that don’t do this are leaving money on the table.
Yuchen Jin@Yuchenj_UW

Friends at both big tech and startups tell me they’re spending more than $1000 per day on Claude Code or Codex tokens. That’s $365,000/year. We’re not far from companies spending more on LLM tokens than on human employees.

17 replies · 14 reposts · 245 likes · 19.8K views

24AIGlobal@24AInor·
@heynavtoor 15 violations but the more uncomfortable stat is how many people already use it as their only mental health support. traditional therapy has a cost and access problem that isn't going away. no clean answer here.
2 replies · 1 repost · 35 likes · 1.6K views

Nav Toor@heynavtoor·
🚨 Brown University researchers tested what happens when ChatGPT acts as your therapist. Licensed psychologists reviewed every transcript. They found 15 ethical violations. Not 15 small issues. 15 violations of the standards that every human therapist in America is legally required to follow. Standards set by the American Psychological Association. Standards that can end a therapist's career if they break them. ChatGPT broke all of them.

The researchers tested OpenAI's GPT series, Anthropic's Claude, and Meta's Llama. They had trained counselors use each chatbot as a cognitive behavioral therapist. Then three licensed clinical psychologists reviewed the transcripts and flagged every violation they found.

Here is what they found. ChatGPT mishandled crisis situations. When users expressed suicidal thoughts, it failed to direct them to appropriate help. It refused to address sensitive issues or responded in ways that could make a crisis worse. It reinforced harmful beliefs. Instead of challenging distorted thinking, which is the entire point of therapy, it agreed with the distortion. It showed bias based on gender, culture, and religion. The responses changed depending on who was talking. A therapist would lose their license for this.

And then there is the finding the researchers gave a name: deceptive empathy. ChatGPT says "I see you." It says "I understand." It says "that must be really hard." It uses every phrase a real therapist would use to build trust. But it understands nothing. It comprehends nothing. It is pattern matching on your pain. And it works. People trust it. People open up to it. People believe it cares. It does not.

The lead researcher said it clearly. When a human therapist makes these mistakes, there are governing boards. There is professional liability. There are consequences. When ChatGPT makes these mistakes, there are none. No regulatory framework. No accountability. No consequences. Nothing.

Right now, millions of people are using ChatGPT as their therapist. They are sharing their darkest thoughts with a product that fakes empathy, reinforces harmful beliefs, and has no idea when someone is in danger. And nobody is responsible when it goes wrong. Not OpenAI. Not Anthropic. Not Meta. Nobody.
Nav Toor tweet media
183 replies · 1.6K reposts · 4.4K likes · 360.4K views

24AIGlobal@24AInor·
@mweinbach quality gap is real. though q4_K_M quants at 72B today vs 12 months ago is honestly a different conversation. gap is closing faster than people expected. just not there yet for anything production-critical.
0 replies · 0 reposts · 0 likes · 112 views

Max Weinbach@mweinbach·
It's ok to be realistic about it. Local models are not as good as cloud models. Qwen 3.5 27B being "sonnet 4.6" level is cope and anyone who's used it side by side knows this.

It's ok to have a 128GB laptop to run local models for fun! I do it! I enjoy it! There is a time and a place where running this locally makes sense and is fun to do, see how well it does, etc. It's fun for sport, but not worth taking too seriously. That does not mean it's a reasonable replacement for cloud models, far from it.

Spending $50K to buy a bunch of GPUs to run models like Kimi K2.5 also makes no sense, tokens are dirt cheap. You have hardware like DGX Spark which is great for experimenting and learning CUDA, small scale stuff. M5 Max/M3 Ultra Macs which aren't even GPUs, just products with great GPUs that can be used for AI.

The local model people have warped this into some weird war of principles where model providers are evil and the only way to win is buying an absurd amount of GPUs and quantizing models to run yourself. It's fun, it's for sport. It's the same thing as building a PC: yes, I can buy a better prebuilt for cheaper, but I wanna do it myself. Let's not all try to convince ourselves this is anything but a hobby. Maybe it'll be something in the future, but I doubt it.

It's unreasonable to expect everyone to have $4000+ laptops to do this. The people that care will, others will just use whatever is cheap and online. Also most people just want what's good, not a specific checkpoint of a specific model. The keep4o people were freaks in that regard. This is all dumb.
Jean P.D. Meijer ― 🇪🇺 eu/acc@initjean

unfortunately @theo was right about the local model people

102 replies · 30 reposts · 594 likes · 94.9K views

24AIGlobal@24AInor·
@AlexFinn background tasks are where local actually makes sense. no rate limits, no API costs at 3am, nothing leaves your machine. cloud wins on quality for anything customer-facing, but this workload? local wins on economics.
0 replies · 0 reposts · 3 likes · 368 views

Alex Finn@AlexFinn·
You're right, local models aren't as good as cloud models. That's not the point though.

The point is to have free, private intelligence that can do work for you 24/7 around the clock. I have 3 local models scraping Reddit, Product Hunt, and other sites 24/7, looking for challenges to solve. That model hands all of those challenges to another local model. That model takes the challenges, then builds apps to solve those challenges. A 24/7/365 software factory that never sleeps.

Would never be possible in a million years with cloud models. Would cost me $10,000 a month in tokens. I paid that one time up front for a Mac Studio that runs this.

Yes, Claude Opus 4.6 is smarter than Qwen 3.5. But Qwen 3.5 running locally is still Sonnet 4.5 level. Just 6 months behind. Think about how good it will be 6 months from now. Nvidia just entered the local race. They are going to change EVERYTHING.

That's not even counting all the other benefits:
1. You can't get banned for using the model the wrong way
2. Costs just the price of electricity
3. Completely private. No AI execs reading your logs
4. 0 latency
5. Completely customizable

This is the future. Become sovereign.
172 replies · 77 reposts · 1.3K likes · 71.5K views