GlitchNarwal

941 posts

@JamKo_Ams

Dull boy, work all day

Amsterdam · Joined February 2010
292 Following · 101 Followers
Claude
Claude@claudeai·
New in Claude Code: auto mode. Instead of approving every file write and bash command, or skipping permissions entirely, auto mode lets Claude make permission decisions on your behalf. Safeguards check each action before it runs.
English
2K
2.8K
38.1K
6M
Jeroen Breevoort
Jeroen Breevoort@JeroenBreevoort·
@cursor_ai Cursor is not doing this. It's the Figma MCP that is doing this, right?
English
7
0
33
7.4K
Cursor
Cursor@cursor_ai·
Cursor can now create new components and frontends in Figma using your team's design system.
English
109
269
3.7K
590.9K
Haider.
Haider.@slow_developer·
anthropic keeps doing this thing where its models are brilliant at launch, then much worse a month later. opus 4.6 now is far behind gpt-5.3 codex, and even more behind gpt-5.4 high when working on large codebases. not surprised though, this has been happening since sonnet 4
English
141
36
1.7K
160.4K
claire vo 🖤
claire vo 🖤@clairevo·
since no one else will say it: i'm on my way to spending $15-25k this year on my @openclaw agents
English
90
8
287
55.7K
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@TheDefiantGhost How am I seeing Tucker Carlson doing good shit everywhere while in my mind he was a right wing extremist?
English
0
0
0
30
Defiant Ghost
Defiant Ghost@TheDefiantGhost·
Remember when Tucker Carlson straight-up silenced Mark Cuban?
Mark: "Half my family is Ukrainian. I think we should help."
Tucker: "How much money have you sent to Ukraine?"
Mark: "None."
Tucker: "So what do you mean by we? If you think we need to help, why don't you start? How about you first?"
This never gets old.
English
57
1.2K
8.9K
345.9K
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@archiexzzz @karpathy This kind of thing + automated scripted voice models + scammers are going to be a pain in the butt in the coming years
English
0
0
0
25
Archie Sengupta
Archie Sengupta@archiexzzz·
Introducing AutoVoiceEvals

I've applied the @karpathy autoresearch loop to voice AI agents. It's open source.

Your voice agent has a system prompt. That prompt determines how it handles every call - bookings, complaints, edge cases, background noises, long pauses, people trying to trick it. Most teams write it once, test manually, and hope for the best.

autovoiceevals makes it a loop. One artifact (system prompt), one metric (adversarial eval score), keep what improves it, revert what doesn't. Run it overnight. Wake up to a better agent.

> How it works:

You describe your agent in a config file - what it does, its services, policies, and what it should never do. You don't write test cases. You don't define attack vectors.

    provider: vapi / smallest ai
    assistant:
      id: "your-agent-id"
      description: |
        Voice receptionist for a hair salon.
        Maria does coloring only. Jessica does cuts only.
        $25 cancellation fee under 24 hours notice.
        Cannot advise on skin conditions. Closed Sundays.

From that description alone, Claude generates adversarial caller personas - each with an attack strategy, a voice profile (accents, background noise, mumblers, interrupters), a multi-turn caller script, and pass/fail evaluation criteria. The eval suite is generated once and held fixed for the entire run, like a validation set.

> The loop:

1. Read the agent's current prompt from the platform
2. Generate adversarial eval suite from your description
3. Run baseline
4. Claude proposes ONE surgical change to the prompt
5. Push the modified prompt to the agent via API
6. Run all scenarios against the updated agent
7. Score improved? Keep. Same score but shorter prompt? Keep. Otherwise revert.
8. Go to 4. Run until Ctrl+C.

The system sees its own experiment history. When a change fails, the next proposal knows what was tried and why it didn't work.

We ran 20 experiments on a live Vapi dental scheduling agent. 0 human intervention.

> Score: 0.728 → 0.969 (+33%)
> CSAT: 45 → 84
> Pass rate: 25% → 100%
> 9 kept, 10 discarded
> Prompt: 1191 → 1139 chars (better AND shorter)

You describe your agent. It figures out how to break it.
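The keep-or-revert rule in steps 4 to 8 of the loop can be sketched roughly as follows. This is a minimal illustration, not the AutoVoiceEvals code: `optimize_prompt`, `toy_score`, and `toy_propose` are hypothetical names, the scorer is a keyword check standing in for the fixed adversarial eval suite, and the proposer is a canned list of edits standing in for the model's suggestions. Pushing the prompt to a live agent (step 5) is left out.

```python
from typing import Callable, Dict, List, Tuple

History = List[Dict[str, object]]


def optimize_prompt(
    baseline_prompt: str,
    propose_change: Callable[[str, History], str],
    score_prompt: Callable[[str], float],
    max_experiments: int,
) -> Tuple[str, float, History]:
    """Keep a candidate if it scores higher, or ties with a shorter prompt."""
    best_prompt = baseline_prompt
    best_score = score_prompt(best_prompt)  # step 3: run baseline
    history: History = []

    for i in range(max_experiments):
        # Step 4: one change, proposed with the experiment history in view.
        candidate = propose_change(best_prompt, history)
        # Step 6: score the candidate against the same fixed eval suite.
        score = score_prompt(candidate)

        # Step 7: keep on improvement, or on an equal score with a shorter prompt.
        improved = score > best_score
        same_but_shorter = score == best_score and len(candidate) < len(best_prompt)
        kept = improved or same_but_shorter

        history.append({"experiment": i + 1, "score": score, "kept": kept})
        if kept:
            best_prompt, best_score = candidate, score
        # Otherwise "revert": best_prompt is simply left unchanged.

    return best_prompt, best_score, history


# --- Hypothetical stand-ins so the sketch runs without any external service ---
REQUIRED_RULES = ("closed sundays", "cancellation fee")


def toy_score(prompt: str) -> float:
    """Fraction of required policy phrases the prompt mentions (assumed metric)."""
    text = prompt.lower()
    return sum(rule in text for rule in REQUIRED_RULES) / len(REQUIRED_RULES)


def toy_propose(prompt: str, history: History) -> str:
    """Cycle through canned edits; a real proposer would call a model."""
    edits = [
        prompt + " Closed Sundays.",
        prompt + " Be friendly.",
        prompt + " $25 cancellation fee.",
    ]
    return edits[len(history) % len(edits)]


if __name__ == "__main__":
    prompt, score, log = optimize_prompt(
        "You are a salon receptionist.", toy_propose, toy_score, max_experiments=3
    )
    print(score, [entry["kept"] for entry in log])
```

The thread's "Run until Ctrl+C" is replaced here by a fixed `max_experiments` so the example terminates on its own.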
Archie Sengupta tweet media
English
66
84
1.2K
276.1K
Vantage Point
Vantage Point@VantagePointHQx·
@tim_cook Can you make a Mac that doesn’t become a potato after 4 years?
English
63
0
56
60.8K
Tim Cook
Tim Cook@tim_cook·
Mac just had its best launch week ever for first-time Mac customers. We love seeing the enthusiasm!
English
1.3K
1.4K
30K
5.1M
Cursor
Cursor@cursor_ai·
Composer 2 is now available in Cursor.
Cursor tweet media
English
634
896
9.8K
5.3M
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@GeospyAI crazy. How are you trying to make cash out of this slop app when literally every button on your site is broken.
English
0
0
0
18
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@protosphinx They made us promise to undersell it so Nvidia can have 70% margins
English
0
0
0
22
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@business Funny how there hasn’t been a fire on a carrier for almost 20 years but Iran doesn’t have anything to do with this one. Also what’s up with this new trend where everybody in Israel/UAE that shares rocket impacts is getting indicted?
GlitchNarwal tweet media
English
1
0
4
1.9K
Bloomberg
Bloomberg@business·
The USS Gerald R. Ford aircraft carrier is leaving the fight with Iran and heading back to port, a source said, after a fire broke out in its laundry area and left at least two sailors with non-life-threatening injuries. bloomberg.com/news/articles/…
English
178
536
1.8K
577.7K
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@MarioNawfal Funny how there hasn’t been a fire on a carrier for almost 20 years but Iran doesn’t have anything to do with this one. Also what’s up with this new trend where everybody in Israel/UAE that shares rocket impacts is getting indicted?
GlitchNarwal tweet media
English
0
0
0
58
Mario Nawfal
Mario Nawfal@MarioNawfal·
🚨 BREAKING: The USS Ford is now leaving combat after a major fire broke out last week, injuring ~200 sailors and knocking out about 100 bunks/sleeping quarters. It took 30 hours to fully control the blaze. Iran claims it struck the aircraft carrier. The U.S. states the damage is non-combat related. Not sure what to believe anymore
English
1.6K
2.4K
12.7K
4M
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@trq212 I do keep wondering how long these kinds of things will be necessary. They feel overly manual and prime targets to bake into model capabilities, and a lot of it will be obsolete once memory is - finally - evolving into a more robust system
English
0
0
0
66
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@sarahwooders Same. Blew away tons of tokens on mid gpt performance. Why did they even claim to have integrated Codex performance into the model 🤷🏻‍♂️
English
0
0
0
11
Sarah Wooders
Sarah Wooders@sarahwooders·
To the people who had early access to GPT-5.4 and told us all it was amazing for coding - did you even use it?? I'm fully back to Codex 5.3
English
93
6
338
86K
Kadi🇪🇪🌻
Kadi🇪🇪🌻@TheFl0orIsLaVa·
USA: We're lifting sanctions on russian oil Ukraine: Hold my beer 🍺💥
English
105
1.1K
7.5K
52.2K
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@maddenifico You guys got played so hard. It’s been crazy to see all this shit go down from outside the US
English
0
0
0
5
Bill Madden
Bill Madden@maddenifico·
This is a great idea. 👇
Bill Madden tweet media
English
1.6K
10.9K
47.5K
348.9K
GlitchNarwal
GlitchNarwal@JamKo_Ams·
@leerob Couple more years and we don’t even have to let ‘em play, we’ll just have ASI simulate based on probabilities and do NIL payouts accordingly
English
0
0
0
143
Lee Robinson
Lee Robinson@leerob·
I built an app to simulate the 2026 NCAA tournament! It uses historical data, KenPom rankings, game locations, and more to determine the win probability. ...but then has an AI model review the results and prompt for the reality of March Madness, unpredictable!
English
101
38
2.3K
425.3K