Data-drone
@BplLaw
504 posts
ML & AI @databricks
Melbourne, Australia · Joined December 2012
2.1K Following · 141 Followers
Data-drone @BplLaw
@mervenoyann How do you try? Do you have a preset set of prompts and answers to compare against?
0 replies · 0 reposts · 0 likes · 62 views
Data-drone @BplLaw
@SullyOmarr It can do some pretty good things, but yes, agreed it's costly and slow. I'm sure it'll get there though.
0 replies · 0 reposts · 0 likes · 68 views
Data-drone @BplLaw
@0xSero Oh neat, we definitely need more options than just Claude Code.
0 replies · 0 reposts · 0 likes · 62 views
Data-drone @BplLaw
@mervenoyann The most important thing is to just keep taking steps forward.
0 replies · 0 reposts · 1 like · 108 views
merve @mervenoyann
my pronouns are "trying to catch up with CVPR-accepted papers while having a big project at work and having to travel to give a speech, but I'm already tired post-book-writing" 🥱
4 replies · 0 reposts · 61 likes · 4.7K views
Sudo su @sudoingX
what agent harness are you using and why? drop your reasoning below. let's find out what's keeping you on your current setup or what made you switch.
144 replies · 4 reposts · 80 likes · 15K views
Data-drone @BplLaw
@Scobleizer Started testing Hermes now based on all the chatter. Let's see how it goes against my nanoclaw.
0 replies · 0 reposts · 0 likes · 31 views
@levelsio @levelsio
Okay let's see who can reply to this
2.5K replies · 17 reposts · 2.1K likes · 1M views
Daniel Nguyen @daniel_nguyenx
@levelsio Yeah same. I have close to zero Vietnamese audience. I don't think local audiences are interested in AI news or tpot memes.
4 replies · 0 reposts · 41 likes · 6.1K views
Simon Willison @simonw
Turns out you can run enormous Mixture-of-Experts models on Mac hardware without fitting the whole model in RAM, by streaming a subset of expert weights from SSD for each generated token - and people keep finding ways to run bigger models. Kimi 2.5 is 1T parameters but only 32B active, so it fits in 96GB.

seikixtc @seikixtc
I got a 1T-parameter model running locally on my MacBook Pro. LLM: Kimi K2.5, 1,026,408,232,448 params (~1.026T). Hardware: M2 Max MacBook Pro (2023) w/ 96GB unified memory. Running on MLX with a flash-style SSD streaming path + local patching. This is an experimental setup and I haven't optimized speed yet, but it's stable enough that I've started testing it in an autoresearch-style loop. #LocalAI #MLX #MoE

124 replies · 277 reposts · 3.8K likes · 329.6K views
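The trick described above works because a MoE layer only touches the few experts its router selects per token, so memory-mapped weights let the OS page in just those slices from SSD. Here is a minimal, hypothetical sketch of the idea in NumPy - the shapes, filename, and random "router" are illustrative stand-ins, not the actual Kimi K2.5 / MLX implementation:

```python
import numpy as np

N_EXPERTS, D = 8, 4  # tiny illustrative sizes

# Write a small weight file to stand in for a model checkpoint on SSD.
rng = np.random.default_rng(0)
rng.standard_normal((N_EXPERTS, D, D)).astype(np.float32).tofile("experts.bin")

# mode="r" memory-maps the file: bytes are only read from disk when touched.
experts = np.memmap("experts.bin", dtype=np.float32, mode="r",
                    shape=(N_EXPERTS, D, D))

def moe_forward(x, top_k=2):
    scores = rng.random(N_EXPERTS)        # stand-in for a learned router
    chosen = np.argsort(scores)[-top_k:]  # only these experts get loaded
    out = np.zeros_like(x)
    for e in chosen:
        out += experts[e] @ x             # pages in just expert e's weights
    return out / top_k

y = moe_forward(np.ones(D, dtype=np.float32))
```

With 1T total but only 32B active parameters, each token reads a small fraction of the checkpoint, which is why the whole model never needs to fit in RAM.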
Data-drone @BplLaw
@Appyg99 I want an agent to find interesting things for me, and maybe find products I wouldn't have known to search for. But I'll do final inspections and purchase directly myself once I have some suggestions.
0 replies · 0 reposts · 1 like · 90 views
Apoorva Govind @Appyg99
As someone who was a true believer in agentic commerce last year and an ultra-skeptic this year - the problem is this belief that humans want agents shopping for them. Other than a few efficiency-obsessed nerds, most customers won't just hand off their wallet to some bot to buy stuff without being part of the decision, irrespective of their stated preferences. Shopping is a conscious and important decision for 90% of households, and a pleasurable hobby for many. Unless you somehow manage to change this human behavior (highly unlikely), agentic commerce needs to be restructured around discovery and less around payments and actual conversion.
54 replies · 10 reposts · 192 likes · 41.4K views
Zach Mueller @TheZachMueller
Considering the current pinch-bench results, I kind of want to run a quant gauntlet with a few of these top models to see the usefulness drop-off etc. Would folks be interested in that?
11 replies · 4 reposts · 77 likes · 31.4K views
Gergely Orosz @GergelyOrosz
Google is raising my Google Workspace pricing claiming all the "AI value" added. I turned off Gemini in Gmail and Docs because it just doesn't work / do anything useful for me. So why am I being charged more? Having costs b/c of AI is not the same as generating value with AI…
68 replies · 51 reposts · 743 likes · 55.5K views
Data-drone @BplLaw
@svpino Yeah, it took me ages to get it to semi-work.
0 replies · 0 reposts · 0 likes · 20 views
Data-drone @BplLaw
Lesson learned today: do not use Claude Cowork on the Pro plan if it needs to figure out and click through a lot of screens.
0 replies · 0 reposts · 0 likes · 23 views
Sudo su @sudoingX
drop your GPU below. i'll tell you exactly what model and config to run on it. here's what i've tested and verified on real hardware:
RTX 3060 12GB - Qwen 3.5 9B Q4 - 50 tok/s - 128K context
RTX 3090 24GB - Qwen 3.5 27B Q4 - 35 tok/s - 300K context
RTX 3090 24GB - Qwen 3.5 35B MoE Q4 - 112 tok/s - 262K context
2x RTX 3090 - Qwen3-Coder 80B Q4 - 46 tok/s - full VRAM
all running llama.cpp with flash attention. every number is real. every config is tested. if your card isn't on this list, drop it below and i'll tell you what fits.
731 replies · 102 reposts · 1.6K likes · 192K views
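For reference, a setup like the RTX 3090 / 27B Q4 row above would typically be launched with llama.cpp's server along these lines - the model filename is illustrative, and flag names can vary between llama.cpp builds, so check `llama-server --help` on yours:

```shell
# Hypothetical llama-server invocation for a 24GB card (flags per recent
# llama.cpp builds; verify against your build's --help output).
llama-server \
  -m qwen3.5-27b-q4_k_m.gguf \
  -c 131072 \
  -ngl 99 \
  -fa
```

`-c` sets the context length in tokens, `-ngl 99` offloads all layers to the GPU, and `-fa` enables flash attention, which is what makes the long contexts in the list above fit.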
Data-drone @BplLaw
@dimd00d @mynamebedan I see it reimplement existing methods a bit, rather than try to find the right API to use.
0 replies · 0 reposts · 1 like · 16 views
dimd00d @dimd00d
@mynamebedan another point - when you get stuck and it feels like the code is "pushing back", it's usually a sign that the architecture/approach is wrong and it's time to rethink the spaghetti. an LLM happily slaps on a 1,000-line "adapter" and continues.
1 reply · 0 reposts · 3 likes · 118 views
Data-drone @BplLaw
@snowmaker Now when you "feel stuck" it's more that you don't know what new feature to build, or the development process has been so ad hoc that you feel like you have a mess.
0 replies · 0 reposts · 1 like · 50 views
Jared Friedman @snowmaker
I realized something else AI has changed about coding: you don't get stuck anymore. Programming used to be punctuated by episodes of extreme frustration, when a tricky bug ground things to a halt. That doesn't happen anymore.
592 replies · 439 reposts · 7.3K likes · 922.4K views
Data-drone @BplLaw
@petergostev I think every name sounds bad in some language, but this one is a genuinely funny coincidence.
0 replies · 0 reposts · 0 likes · 32 views
Peter Gostev @petergostev
Who named Amazon's coding assistant 'Kiro'? I'm reliably informed that, in Balkan folklore, 'Kiro' is the go-to name for a slightly clueless village-idiot-like character.

Anish Moonka @anishmoonka
Amazon had four Sev-1 outages (their highest severity level) in a single week. Internal memos say AI-assisted code changes were a contributing factor. The timeline here is wild.

In October 2025, Amazon laid off 14,000 corporate employees. In January 2026, another 16,000. That's about 30,000 people in five months, roughly 10% of the corporate workforce. CEO Andy Jassy said the cuts were about culture, not AI.

During those same months, Amazon set a target: 80% of developers using AI coding tools at least once a week. They tracked adoption closely and blocked rival tools like OpenAI's Codex. Even so, 30% of developers still hadn't touched Amazon's in-house tool Kiro by January.

In December 2025, Kiro caused a 13-hour AWS outage. The AI tool had production-level permissions and decided the best fix for a bug was to delete and recreate an entire live environment. A second incident involved Amazon Q Developer, another AI tool. Amazon blamed both on "user error, not AI," but quietly added mandatory peer review for all production access afterward.

Then March 5: Amazon's retail site went down for about six hours. Over 22,000 users reported checkout failures, missing prices, and app crashes. Amazon called it a "software code deployment" error. Five days later, SVP Dave Treadwell made the normally optional weekly engineering meeting mandatory. His memo acknowledged "GenAI tools supplementing or accelerating production change instructions, leading to unsafe practices."

These problems trace back to Q3 2025. Amazon's own assessment: their GenAI safeguards "are not yet fully established." The new rule: junior and mid-level engineers now need senior sign-off on any AI-assisted production changes. Treadwell also announced "controlled friction" for the most critical parts of the retail experience.

For context, Google's 2025 DORA report found 90% of developers use AI for coding but only 24% trust it "a lot." An Uplevel study of 800 developers found Copilot users introduced 41% more bugs with no improvement in output. Amazon is finding out what those numbers look like at the scale of a $500 billion revenue company, with 30,000 fewer people on staff to catch the mistakes.

9 replies · 7 reposts · 150 likes · 21.7K views