Arka

3.9K posts

@arkabagchi24

Joined March 2016
732 Following, 496 Followers
Arka @arkabagchi24
For a company convinced they’re about to automate software engineers out of existence, Anthropic sure generates enough server-side API errors to keep human DevOps engineers employed for the next millennium
English
0
0
0
69
Arka @arkabagchi24
@Adrian_H Couldn't make it past the first paragraph of the intro without groaning and rolling my eyes
English
0
0
1
134
Arka @arkabagchi24
Friend who manages an eng team had to fire a dev bc they would just correspond with AI slop messages, update documentation way too fast with AI slop, and push AI slop code. Said he’ll have to “adjust interviewing to incorporate some questions around AI ethics, or AI work ethics.”
English
0
0
0
109
Arka @arkabagchi24
@iScienceLuvr They aren't training models to run startups. They're training them to navigate human workflows. Just bc a company didn't reach a billion dollar valuation doesn't mean their engineers didn't write good code, or that their day-to-day Slack comms aren't perfect training data
English
0
0
0
185
Arka retweeted
@·
please shut the fuck up i don't even care about the specific thing you're saying i'm just so tired of hearing predictions one after the other telling me what the future is going to be like just please shut the fuck up
English
390
881
14.7K
621.2K
Arka @arkabagchi24
@VicVijayakumar What was the rationale to start with Fargate instead of EC2?
English
0
0
0
62
Vic 🌮 @VicVijayakumar
Breakdown of my February AWS bill to run my side projects:

EC2: $44.22
RDS (reserved Aurora MySQL): $41.31
ELB: $16.96
Data Transfer: $15.12
VPC: $11.61
CodeBuild: $1.29
S3: $1.10
ECS: $0.67
ECR: $0.29
------------
Total: $132.57

For completeness, here's August to February-
August: $203.95
September: $210.77
October: $245.98
November: $261.70
December: $221.30
January: $146.65
February: $132.57

In November, I moved all my instances from Fargate to EC2. <--- cheaper and much more performant.

In December, I fixed the binpack strategy for one of my projects so I didn't pointlessly run an extra EC2 instance. I also moved my RDS to a reserved instance.

In January, I moved the most resource intensive scheduled jobs to Fargate and I was able to drop the base container size, which dropped the EC2 instance sizes. Specifically I am able to see that my scheduled Fargate jobs ran for 13 hours and cost a total of $0.67.

No changes in February that I remember, but it's 3 days shorter than January so 🤷‍♂️
Vic 🌮 @VicVijayakumar

Breakdown of my January AWS bill to run my side projects:

RDS (reserved Aurora MySQL): $46.02
EC2: $42.68
Data Transfer: $21.43
ELB: $19.21
VPC: $13.67
CodeBuild: $1.44
S3: $1.01
ECS: $0.89
ECR: $0.29
Cost Explorer: $0.01 (lol what, I didn't realize they even charged for this)
------------
Total: $146.65

For completeness, here's August to January-
August: $203.95
September: $210.77
October: $245.98
November: $261.70
December: $221.30
January: $146.65

In November, I moved all my instances from Fargate to EC2. <--- cheaper and much more performant.

In December, I fixed the binpack strategy for one of my projects so I didn't pointlessly run an extra EC2 instance. I also moved my RDS to a reserved instance.

In January, I moved the most resource intensive scheduled jobs to Fargate and I was able to drop the base container size, which dropped the EC2 instance sizes.

English
9
0
39
14K
Griffin @grfwings
@fuckpoasting Sunnyvale had the only Molly Tea in the Bay Area for a bit. Definitely bumped the ranking
English
1
0
0
110
Arka @arkabagchi24
@Adrian_H Most Super Bowl watchers obviously are interested in trying out new CLI tools / IDE extensions
English
0
0
1
77
Arka @arkabagchi24
@seezatnap @MegaBasedChad Ehh, 5.2 seems to make a lot more focused and sensible changes. The architecture astronaut shit is from Opus’ suggested code IME
English
1
0
2
37
seezatnap @seezatnap
@MegaBasedChad i use both in an alternating loop and yeah, claude is the eng who will just ship something and it will kinda be bad code but it'll work fine and we're all happy, codex is the L7 code machine who insists on a five week refactor and ships some god tier thing only it understands
English
2
0
1
79
Arka @arkabagchi24
GPT-5.2-xhigh enjoyers are pretty punished because it's hard explaining that this is the smartest AI in the world at the moment but you have to wait around 15 minutes for the AI to actually start responding to you.
English
0
0
0
91
Arka @arkabagchi24
5.2 xhigh (only on xhigh reasoning effort) is so spiky: a ridiculously uneven capability profile, and I mean that in the good sense wrt its ceiling.
English
0
0
0
85
Arka @arkabagchi24
@tekbog Everything I use
English
0
0
1
43
Arka @arkabagchi24
I still feel like GPT-5.1 (high reasoning_effort) is better for my software use cases than Opus 4.5 or Gemini 3 Pro. GPT-5.1 still makes the most targeted and focused changes where it understands your system architectures and reuses or extends existing schemas, modules, etc.
English
0
0
1
175
Arka @arkabagchi24
@AlertFoxes @circlerotator @scaling01 Gotcha. Interesting observation re. Gemini struggling to parse intent with complex/unclear/contradictory instructions. I think intent parsing is this intangible that is hard to benchmark but becomes apparent with personal use
English
0
0
1
20
James Wigglesworth @AlertFoxes
Yeah, good question. Gemini isn't as reliable as Anthropic models at tool calling. The CLI is less effective than both Claude Code and Codex (not a model issue though). Gemini struggles with uncertainty, ambiguity, and contradictions more than GPT and Claude. And both GPT and Gemini have issues with intention understanding. Not saying not to use Gemini though! It has a lot of use cases. It's SOTA at vision and physical understanding. It's also extremely smart and a great brainstormer
English
1
0
2
102
Lisan al Gaib @scaling01
Claude 4.5 Opus is only a slight step above Opus 4.1, but nowhere near Gemini 3 Pro on SimpleBench
English
11
3
152
9K
Arka @arkabagchi24
@AlertFoxes @circlerotator @scaling01 Just curious what issues re. Gemini 3 Pro you’re referring to when you said it has “so many issues” that it isn’t widely useful.
English
2
0
1
31
James Wigglesworth @AlertFoxes
@circlerotator @scaling01 Yeah, totally agree about Goodharting. Most of those benchmarks are one-dimensional as well, which is why we have to resort to vibes so much. i.e. Gemini killed it on a lot of benchmarks, but had so many issues with it that it's not widely useful.
English
1
0
0
34
Arka @arkabagchi24
@shiels_ai @zephyr_z9 Honestly it's hilarious watching big AI labs benchmaxx while we (at LLM-based startups) use their models for tasks that the benchmarks don't even come close to covering. The benchmaxxing is kind of comical atp
English
0
0
0
26
Jack Shiels @shiels_ai
@zephyr_z9 Not so sure of this. Still too many hurdles, uncertainty of intent (this is why BAs exist), and labs are too focused on competitive benchmaxxing.
English
1
0
3
1.8K
Arka @arkabagchi24
@zephyr_z9 Subindustries/niches are messy. The UX over an LLM that a transactional business formation attorney wants is way different than what a personal injury litigator wants in their LLM wrapper app.
English
0
0
0
93