Arka

3.9K posts

@arkabagchi24

Joined March 2016

732 Following · 496 Followers

Arka@arkabagchi24·3d

For a company convinced they’re about to automate software engineers out of existence, Anthropic sure generates enough server-side API errors to keep human DevOps engineers employed for the next millennium

Arka@arkabagchi24·5d

@Adrian_H Couldn't make it past the first paragraph of the intro without groaning and rolling my eyes

134

Adrian H@Adrian_H·5d

this Honig v. Anson (and mentioning Citron) lawsuit is nuts.. quite the chutzpah remember the big shiny balls storage.courtlistener.com/recap/gov.usco… courtlistener.com/docket/7317854…

751

Arka@arkabagchi24·29 Apr

Friend who manages an eng team had to fire a dev bc they would just correspond with AI slop messages, update documentation way too fast with AI slop, and push AI slop code. Said he’ll have to “adjust interviewing to incorporate some questions around AI ethics, or AI work ethics.”

109

Arka@arkabagchi24·19 Apr

I wonder how many users realize that an LLM’s versatility far exceeds the narrow niches they’re currently tucked into. Tbh this is why it’s way more interesting to meet power users who are NOT tech bros

joolsd@joolsd

it is pretty funny just how much better they are for coding and computer troubleshooting than literally anything else

113

Arka@arkabagchi24·17 Apr

@iScienceLuvr They aren't training models to run startups. They're training them to navigate human workflows. Just bc a company didn't reach a billion dollar valuation doesn't mean their engineers didn't write good code, or that their day-to-day Slack comms aren't perfect training data

185

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr·16 Apr

how is it not obvious to these labs that this is a bad idea??? why do they want to train models to learn to run DEAD STARTUPS?!

Iain Martin@_IainMartin

AI labs are paying hundreds of thousands of dollars to buy email, Slack and Jira threads from dead startups as feedstock for ‘reinforcement learning gyms,’ which specialize in using defunct company data to build simulated work environments forbes.com/sites/annatong…

114

31.9K

Arka retweeted

@·25 Mar

please shut the fuck up i don't even care about the specific thing you're saying i'm just so tired of hearing predictions one after the other telling me what the future is going to be like just please shut the fuck up

390

881

14.7K

621.2K

Arka@arkabagchi24·18 Mar

The Algorithm has decided to start putting these morons on my feed again to entertain me

Tsachy Mishal@CapitalObserver

Muddy Waters @muddywatersre disclosure on their short report

209

Arka@arkabagchi24·2 Mar

@VicVijayakumar What was the rationale to start with Fargate instead of EC2?

Vic 🌮@VicVijayakumar·2 Mar

Breakdown of my February AWS bill to run my side projects:
EC2: $44.22
RDS (reserved Aurora MySQL): $41.31
ELB: $16.96
Data Transfer: $15.12
VPC: $11.61
CodeBuild: $1.29
S3: $1.10
ECS: $0.67
ECR: $0.29
------------
Total: $132.57
For completeness, here's August to February-
August: $203.95
September: $210.77
October: $245.98
November: $261.70
December: $221.30
January: $146.65
February: $132.57
In November, I moved all my instances from Fargate to EC2. <--- cheaper and much more performant.
In December, I fixed the binpack strategy for one of my projects so I didn't pointlessly run an extra EC2 instance. I also moved my RDS to a reserved instance.
In January, I moved the most resource intensive scheduled jobs to Fargate and I was able to drop the base container size, which dropped the EC2 instance sizes. Specifically I am able to see that my scheduled Fargate jobs ran for 13 hours and cost a total of $0.67.
No changes in February that I remember, but it's 3 days shorter than January so 🤷‍♂️

Vic 🌮@VicVijayakumar

Breakdown of my January AWS bill to run my side projects:
RDS (reserved Aurora MySQL): $46.02
EC2: $42.68
Data Transfer: $21.43
ELB: $19.21
VPC: $13.67
CodeBuild: $1.44
S3: $1.01
ECS: $0.89
ECR: $0.29
Cost Explorer: $0.01 (lol what, I didn't realize they even charged for this)
------------
Total: $146.65
For completeness, here's August to January-
August: $203.95
September: $210.77
October: $245.98
November: $261.70
December: $221.30
January: $146.65
In November, I moved all my instances from Fargate to EC2. <--- cheaper and much more performant.
In December, I fixed the binpack strategy for one of my projects so I didn't pointlessly run an extra EC2 instance. I also moved my RDS to a reserved instance.
In January, I moved the most resource intensive scheduled jobs to Fargate and I was able to drop the base container size, which dropped the EC2 instance sizes.

14K

Arka@arkabagchi24·9 Feb

@grfwings @fuckpoasting they put WHAT in their tea???

Griffin@grfwings·9 Feb

@fuckpoasting Sunnyvale had the only Molly Tea in the Bay Area for a bit. Definitely bumped the ranking

110

fudge@fuckpoasting·9 Feb

people will look you in the eyes, say this and then hit you with “sunnyvale”

yuzu@yuzu_4ever

fyi there are cities in California that blow LA and SF out of the water but i'm gatekeeping them. you do not have the California ball knowledge i have and frankly you don't deserve it

123

10.2K

Arka@arkabagchi24·9 Feb

@Adrian_H Most superbowl watchers obviously are interested in trying out new CLI tools / IDE extensions

Adrian H@Adrian_H·9 Feb

this is awesome and I appreciate it but c'mon, not sure what fraction of the superbowl watching population this resonates with

OpenAI@OpenAI

You can just build things.

471

Arka@arkabagchi24·3 Feb

I am currently averaging about 10k LOC per day (35% of the lines are tests) so wow, 15k/day is #goals

174

Arka@arkabagchi24·16 Jan

@seezatnap @MegaBasedChad Ehh, 5.2 seems to make a lot more focused and sensible changes. The architecture astronaut shit is from Opus’ suggested code IME

seezatnap@seezatnap·16 Jan

@MegaBasedChad i use both in an alternating loop and yeah, claude is the eng who will just ship something and it will kinda be bad code but it'll work fine and we're all happy, codex is the L7 code machine who insists on a five week refactor and ships some god tier thing only it understands

Arka@arkabagchi24·14 Jan

GPT-5.2-xhigh enjoyers are pretty punished because it’s hard explaining that this is the smartest AI in the world at the moment but you have to wait around 15 minutes for the AI to actually start responding to you.

Arka@arkabagchi24·23 Dec

5.2 xhigh (only on xhigh reasoning effort) is so spiky, ridiculously uneven capability profile and I mean that in the good sense wrt its ceiling.

Arka@arkabagchi24·13 Dec

@tekbog Everything I use

terminally onλine εngineer@tekbog·13 Dec

whats the hardest aws service to debug right now?

14.4K

Arka@arkabagchi24·8 Dec

I still feel like GPT-5.1 (high reasoning_effort) is better for my software use cases than Opus 4.5 or Gemini 3 Pro. GPT-5.1 still makes the most targeted and focused changes where it understands your system architectures and reuses or extends existing schemas, modules, etc.

175

Arka@arkabagchi24·27 Nov

@AlertFoxes @circlerotator @scaling01 Gotcha. Interesting observation re. Gemini struggling to parse intent with complex/unclear/contradictory instructions. I think intent parsing is this intangible that is hard to benchmark but becomes apparent with personal use

James Wigglesworth@AlertFoxes·27 Nov

Yeah, good question. Gemini isn't as reliable as Anthropic models at tool calling. The CLI is less effective than both Claude Code and Codex (not a model issue though). Gemini struggles with uncertainty, ambiguity, and contradictions more than GPT and Claude. And both GPT and Gemini have issues with intention understanding. Not saying not to use Gemini though! It has a lot of use cases. It's SOTA at vision and physical understanding. It's also extremely smart and a great brainstormer

102

Lisan al Gaib@scaling01·25 Nov

Claude 4.5 Opus is only a slight step above Opus 4.1, but nowhere near Gemini 3 Pro on SimpleBench

152

Arka@arkabagchi24·26 Nov

@AlertFoxes @circlerotator @scaling01 Just curious what issues re. Gemini 3 Pro you’re referring to when you said it has “so many issues” that it isn’t widely useful.

James Wigglesworth@AlertFoxes·26 Nov

@circlerotator @scaling01 Yeah, totally agree about Goodharting. Most of those benchmarks are one-dimensional as well, which is why we have to resort to vibes so much. i.e. Gemini killed it on a lot of benchmarks, but had so many issues with it that it's not widely useful.

Arka@arkabagchi24·26 Nov

@shiels_ai @zephyr_z9 Honestly is hilarious watching big AI labs benchmaxx while we (at LLM-based startups) use their models for tasks that the benchmarks don't even come close to covering. The benchmaxxing is kind of comical atp

Jack Shiels@shiels_ai·25 Nov

@zephyr_z9 Not so sure of this. Still too many hurdles, uncertainty of intent (this is why BAs exist), and labs are too focused on competitive benchmaxxing.

1.8K

Zephyr@zephyr_z9·25 Nov

Application layer will be eaten by the model providers

Soren Larson@hypersoren

when lawyers with no technical training ask Gemini 3 to replicate expensive SaaS so they can stop paying for marked up intelligence This is the squeeze microeconomics predicts

1.4K

220.2K

Arka@arkabagchi24·26 Nov

@zephyr_z9 Subindustries/niches are messy. The UX over an LLM that a transactional business formation attorney wants is way different than what a personal injury litigator wants in their LLM wrapper app.
