josh

3.6K posts

@JoshCaughtFire

I build tech stuff and travel. Meta-programming, ML and breaking stuff. Lost less than $100m. Want to work at @Chilis or @McDonalds

Salt Lake City, UT Joined September 2023
113 Following 542 Followers
Pinned Tweet
josh
josh@JoshCaughtFire·
Developing on the go is my new focus. Went on a long walk last night and had so many ideas I wanted to collaborate with AI on, but mobile isn't there yet. Genuinely excited for the mobile IDE, stoked about producing quality while on a 4-hour walk or in between activities with fam and friends.
0
0
9
2.4K
josh
josh@JoshCaughtFire·
Auto might be nice honestly, if you can break this metric down by task type, stack or whatever, then auto-select the model that works best. Even with this, if I’m doing compiler work vs embedded vs a web app crossed with the task. You guys have the benefit of having more than one family of models to track
1
0
1
17
Pierce Boggan
Pierce Boggan@pierceboggan·
@JoshCaughtFire @intellectronica @code Yeah, we have thought about how we can better educate folks in the product. One attempt was our new "opinionated" model picker where the top options are all top performing in terms of code retained in commits
1
0
2
79
Pierce Boggan
Pierce Boggan@pierceboggan·
Code survival is a great metric for understanding the effectiveness of models on different scenarios in @code! Should we share more metrics like this?
Kyle Daigle@kdaigle

Hot take from looking at @github Copilot telemetry: benchmarks make coding models look wildly different. Production workflows make them look much more similar. 👀 We looked at 23M+ Copilot requests and examined one simple metric: code survivability.

10
1
60
7.1K
josh
josh@JoshCaughtFire·
@smoon_lee I find opus works great in Claude code, less so in copilot
0
0
0
69
Simon Lee
Simon Lee@smoon_lee·
@JoshCaughtFire I've not had this issue with Opus 4.6. I've been using it for the last couple of weeks and had amazing success with it
1
0
2
119
josh
josh@JoshCaughtFire·
So GitHub Copilot... how is it this bad?
> Sonnet 4.6 plans...
> Does 1 thing, but marks everything complete
> I compact
> Switch to Opus 4.6
> "Didn't do X,Y,Z, please audit"
> Read 5 files
> Auto compacts
> "Everything was marked complete, what would you like me to do?"
WTF?
5
0
3
2.4K
josh
josh@JoshCaughtFire·
Claude Opus took 26 minutes and 14k tokens to tell me my directory is empty. My bad for not checking if OneDrive had finished syncing, but still...
0
0
2
93
josh
josh@JoshCaughtFire·
@DavidKPiano @chddaniel Crap, this reminds me I forgot I was going to build React After Effects: declare your AE compositions in React, all the power of AE with all the complexity of React. Think it would be better than Remotion
0
0
0
83
josh
josh@JoshCaughtFire·
@pierceboggan No, but I don’t get the task tracking there, sometimes it writes to a plan md and sometimes it calls sqlite to update a task db, then it gets conflicts between new and old plans (same issue in Claude code recently, but it seems more gracefully handled)
2
0
2
370
Pierce Boggan
Pierce Boggan@pierceboggan·
@JoshCaughtFire hey - sorry you had a bad experience. do you happen to still have this session around? if so, if you could export the logs from "Developer: Show Chat Debug View" I'd love to see what is going on there
3
0
14
2.2K
josh
josh@JoshCaughtFire·
@vyrotek I like the analogy, describes a lot of AI output at the moment
0
0
1
7
JSONB
JSONB@vyrotek·
The Uncanny Valley effect applies to software too. And you can't fix it with more or better AI.
2
1
3
201
josh
josh@JoshCaughtFire·
@SpaceMatthieu @levelsio Yeah, I ended up building a whole multi agent app to bring good coding to my phone, been dogfooding it, kinda cool to use it to build itself, ngl
0
0
0
27
Matthieu Richard
Matthieu Richard@SpaceMatthieu·
@JoshCaughtFire @levelsio Yes, that’s what I was thinking about. Speech recognition is not really a problem in my setup, I just need an easy way to switch between sessions
1
0
2
1.4K
@levelsio
@levelsio@levelsio·
Are you guys aware I am coding mostly on my phone now all day via Termius to Claude Code on my server while I go with gf to the dentist, clothing store, cafe, etc. 😛✌️
rootkid ✌️@rootkid

@levelsio "You" ➡️ the IP your Internet provider assigns you, not your server's IPs. If you had a static IP, I'd like to know why you prefer Tailscale over just adding e.g. your company IP to the firewall's SSH whitelist.

320
88
2.1K
679.7K
josh
josh@JoshCaughtFire·
Haven't seen this before with Claude Code... Instead of compacting, it wants to save a memory and clear.
0
0
1
64
josh
josh@JoshCaughtFire·
@SpaceMatthieu @levelsio Have your agent build your own terminal app, seriously the best thing. On iOS at least you can also build a custom speech recognition grammar, so you can dictate to your phone and it will pick up technical and project related words way better
1
0
3
1.5K
Matthieu Richard
Matthieu Richard@SpaceMatthieu·
@levelsio Is there a good way to jump between tmux sessions on Termius? I find it quite hard to manage multiple codex/claude sessions on the go
36
1
14
223.1K
josh
josh@JoshCaughtFire·
New sparse architecture for my NER model just posted first results. 600 sequences/sec during training (6x dense model), F1 of .74. Still gotta close the gap to F1=.96 from the dense model, but this is promising... Good start to the week.
0
0
0
35
josh
josh@JoshCaughtFire·
Claude Code gets a 1 star for asking for my feedback right after giving me a numbered list of ideas for me to respond to: Me: '1) ...' Claude: 'Thanks for your feedback!'
0
0
0
43
josh
josh@JoshCaughtFire·
Quality control goes both ways... Been really building out my dev setup with coding agents recently, and heavily pushing quality all around. Really fun to build what I use to build.
0
0
0
28
josh
josh@JoshCaughtFire·
I assume this is Claude Code... I wonder if secretly they are doing like SETI @ Home kind of thing with distributed training with all their users
0
0
0
51
josh
josh@JoshCaughtFire·
@ibuildthecloud They wanted to keep it as pure sugar, but no one told them too much sugar is a bad thing
0
0
2
75
Darren Shepherd
Darren Shepherd@ibuildthecloud·
TypeScript is a stupid language, poster child of good intentions.
8
0
18
2.1K
josh
josh@JoshCaughtFire·
OMG, we are bringing back perl! Claude almost always defaults to sed or writes a python script, but guess it just had enough and went with perl
0
0
1
71
josh
josh@JoshCaughtFire·
@levelsio It’s crazy how this isn’t routine, no idea why doctors don’t push for broad tests even just to establish a baseline. So many issues can slide under the radar because things are still ‘in range’ hiding the fact they’ve changed significantly for you
1
0
1
1.3K
@levelsio
@levelsio@levelsio·
Get your blood tested every 3-6 months Include T!
Hans Amato@HansAmato

I spent $12,000 on therapy over 2 years trying to fix my anger before someone checked my blood.
Was snapping at my girlfriend over nothing. Road rage every commute. Constant irritability that I'd mask all day at work and then explode the second I got home. Therapist said it was unprocessed childhood trauma. We spent 2 years unpacking my relationship with my father.
Here's what was actually happening in my body: my gut was leaking endotoxin into my bloodstream which was keeping my immune system in a permanent inflammatory state. Chronic inflammation elevates cortisol. Elevated cortisol burns through magnesium. Low magnesium destroys GABA production. Without GABA your nervous system has no brake pedal.
I didn't have an anger problem. I had no neurological ability to regulate a stress response because my inhibitory system was running on empty.
Nobody tested this. Not my therapist. Not my doctor. Not the psychiatrist who suggested I "might benefit from an SSRI to take the edge off".
3 practitioners. 2 years. $12,000+. Zero blood draws.
Fixed my gut. Restored magnesium and zinc. GABA came back online. The rage disappeared in weeks. Not managed. Not suppressed. Gone. Because the thing causing it was gone.
I think about those 2 years of therapy sessions analyzing my childhood while my body was on fire and nobody thought to check. Talking about my father while my cortisol was 3x baseline because my intestinal lining had holes in it.
The mental health industry is billing $280 billion a year in the US. Almost none of it starts with blood work.
If you're doing everything right psychologically and still can't control your reactions, you might not have a mind problem. You might have an inflammation problem that no one in the room is trained to find.

62
33
1.4K
621.5K
josh
josh@JoshCaughtFire·
@jesseleite85 @aarondfrancis Precisely this, I’ve found them genuinely good in my current workflows, especially where hooks either don’t work or aren’t available across clients, including orchestrating state in an agent agnostic way, especially across hosts. Just drop in and done, no extra work
0
0
1
34
Dr. Elvim Ransom 👽
Dr. Elvim Ransom 👽@jesseleite85·
@aarondfrancis Not to mention, an MCP can inform the harness when it should be called. A CLI needs rules outside of itself, to inform the harness when it should be called. But nuance on the internet is not popular, so ignore me.
1
0
3
281
Aaron Francis
Aaron Francis@aarondfrancis·
As always, be careful who you listen to on x dot com, the everything app. Levels makes a bold proclamation about MCP being dead, Fatih comes in with a reasoned take about MCP being useful in a non-indie-hacker environment. Listen to the nuanced reasonablists like Fatih!
Fatih Arslan@fatih

They are not dead. They are a must-have for any enterprise installation and companies with hundreds of employees. I also thought MCP was dead, but you're thinking like an indie hacker. Assuming you're working at a company with 50 people, people install remote MCP servers, use OAuth and then are done. Everyone is authenticated and you don't have to fiddle around with CLIs. It's also secure. For example, check @cursor_ai's new Plugin MarketPlace and you get the idea why MCPs are blooming and why they solve a real issue.

24
7
145
28.6K