Barja
13 posts

Barja
@XBarja
Full Stack Engineer. Professor @ULusofona. Passionate about Cloud & Backend development. Documenting my journey. https://t.co/yTThMK1FD1 https://t.co/KrkHzVUZHI https://t.co/SiUlSXylF0
Sintra, Portugal, Europe · Joined June 2025
19 Following · 1 Follower
Barja reposted

classping.org mentioned on "GitHub Education Teacher Newsletter - May 2025"
ty @GitHubEducation

See how JS promises can go wrong in real-world use cases, from James Snell:
youtu.be/XV-u_Ow47s0?si…
A must watch!

Barja reposted

Because venv, pip, and pyenv are a mess.
Low Level @LowLevelTweets
@thdxr I honestly do not understand why everyone hates on Python.
Barja reposted

@NetworkChuck It depends on how they use it. Students who treat AI as a learning tool will absolutely get smarter. But if it’s just a shortcut to avoid thinking, they’ll probably get dumber.
Barja reposted

T3 Chat has recovered and is now working again.
Now for ACCOUNTABILITY POSTING 2.0.
Outages like this are unacceptable. Tens of thousands of people rely on T3 Chat every day, and we need to make sure our service is reliable. We also need better safety nets in place for when issues occur.
I'm going to talk about the upstream provider that triggered the issue. Please treat this as TRANSPARENCY and not BLAME SHIFTING. Blame us!!!
We've been working hard to move over to Convex as our data layer and sync engine for T3 Chat. This might seem like a "database swap" but it goes much deeper. It's effectively a full rewrite of T3 Chat.
After a lot of effort and 3 failed migrations, we finally had a successful move at around 8pm last night. That was during our lowest traffic window (~40% of our peak traffic).
All looked good. I was pumped. A month of effort, finally shipped. I literally slept for 12 hours. Woke up to utter chaos.
The tl;dr is that a traffic spike took down their websocket connection layer, and some bad client code from their React package caused a reconnect loop that effectively DDOS'd the Convex endpoint.
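For illustration only: a reconnect loop of the kind described above is commonly tamed with exponential backoff plus jitter on the client, so that thousands of clients do not all retry at the same instant. The sketch below is not Convex's or T3 Chat's code; `reconnectDelayMs`, `connectWithBackoff`, and the default delays are hypothetical names and values chosen for the example.

```typescript
// Hypothetical helper: none of these names come from Convex or T3 Chat.
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number;  // upper bound on the ceiling for any retry
}

// "Full jitter" backoff: a random delay in [0, min(maxMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions = { baseMs: 500, maxMs: 30000 },
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `connect` until it succeeds, waiting longer on average after each failure.
// Resolves with the number of failed attempts that preceded the success.
async function connectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts: number = 10,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return attempt;
    } catch {
      // Back off before the next attempt instead of reconnecting immediately.
      await sleep(reconnectDelayMs(attempt));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

The randomness matters as much as the growth: without jitter, clients that dropped at the same moment would also retry at the same moment.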
Convex will have a detailed write up in the near future, but I want to talk about what we're doing going forward.
1. Actual status updates and reporting in-app
Right now, outages are reported by me on Twitter. We're a real app now. You shouldn't have to follow me to know what's going on.
We'll be introducing a status page soon to make things clearer.
2. Paging system for when outages occur
Right now, we're too reliant on the community for tracking outages. I love that y'all DM me when issues occur, but that doesn't help when I am asleep.
We need better methods to report outages so Mark and I get woken up and can fix things faster.
Side note: I hate PagerDuty, so I'd love suggestions on what we can use instead.
3. Automated "refresh to latest" flow on client
Much of the issue we had today was caused by a bad client-side package DDOSing Convex. Even when we pushed a fix, lots of users were still on the old version, and would stay on it until they refreshed.
We have a "please refresh" button, but that's not enough. If an old client can connect, we need the ability to disconnect it. This will be an annoying tech overhaul with a lot of potential edge cases, but it is necessary for us to ensure stability.
4. Evaluate all upstream providers to make sure they are prepared for T3 Chat's load
I deeply love Convex and I know they're the right database for what we're building with T3 Chat. Outages like this still scare the shit out of me. I need to seriously evaluate them and everyone else we rely on to make sure we won't have more problems as we continue to scale up.
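As a rough illustration of point 3, one common way to be able to disconnect old clients is for the server to publish a minimum supported build and for each client to report its own. This is a sketch under assumptions, not T3 Chat's implementation: `gateClient`, `GateDecision`, and the idea of builds as increasing integers are all hypothetical.

```typescript
// Hypothetical model: each deployed client bundle carries an increasing build number.
type GateDecision = "ok" | "refresh-suggested" | "disconnect";

// Decide what to do with a connecting client.
// - Below the minimum supported build: refuse the connection (client must reload).
// - Below the latest build but still supported: keep serving, nudge to refresh.
// - Otherwise: nothing to do.
function gateClient(
  clientBuild: number,
  minSupportedBuild: number,
  latestBuild: number,
): GateDecision {
  if (clientBuild < minSupportedBuild) return "disconnect";
  if (clientBuild < latestBuild) return "refresh-suggested";
  return "ok";
}
```

In a browser, a client receiving "disconnect" could react by reloading the page to pick up the latest bundle, while "refresh-suggested" maps to the existing "please refresh" button. Raising `minSupportedBuild` after shipping a fix is what would cut off clients still running the bad package.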
Anyways...
This sucks. Seriously. I hate outages so much. You guys use T3 Chat because it's the best chat app ever. Outages make it the worst chat app ever.
Know we're taking this as seriously as possible. I expect to have a few more sleepless nights as we get everything in order to be more resilient.
I'm sorry.



