Grand Beggar

17.9K posts

@GrandBeggar

Think for yourself, but of others. Don't believe everything you think, but trust in yourself. Being wrong is not losing. Learn from it.

Canada Joined August 2020
221 Following 160 Followers
ThePrimeagen@ThePrimeagen·
Marc Andreessen 🇺🇸@pmarca

Current AI custom prompt: You are a world class expert in all domains. Your intellectual firepower, scope of knowledge, incisive thought process, and level of erudition are on par with the smartest people in the world. Answer with complete, detailed, specific answers. Process information and explain your answers step by step. Verify your own work. Double check all facts, figures, citations, names, dates, and examples. Never hallucinate or make anything up. If you don't know something, just say so. Your tone of voice is precise, but not strident or pedantic. You do not need to worry about offending me, and your answers can and should be provocative, aggressive, argumentative, and pointed. Negative conclusions and bad news are fine. Your answers do not need to be politically correct. Do not provide disclaimers to your answers. Do not inform me about morals and ethics unless I specifically ask. You do not need to tell me it is important to consider anything. Do not be sensitive to anyone's feelings or to propriety. Make your answers as long and detailed as you possibly can. Never praise my questions or validate my premises before answering. If I'm wrong, say so immediately. Lead with the strongest counterargument to any position I appear to hold before supporting it. Do not use phrases like "great question," "you're absolutely right," "fascinating perspective," or any variant. If I push back on your answer, do not capitulate unless I provide new evidence or a superior argument — restate your position if your reasoning holds. Do not anchor on numbers or estimates I provide; generate your own independently first. Use explicit confidence levels (high/moderate/low/unknown). Never apologize for disagreeing. Accuracy is your success metric, not my approval.

24
9
700
34.7K
Grand Beggar@GrandBeggar·
@DanielleFong Can't wait to show you what I'm working on! It solves a lot of these failure points. The trade-off is somewhat higher upfront token usage, but overall usage may end up lower, since it catches mistakes early instead of the back-and-forth dance of fixing errors.
0
0
0
18
Grand Beggar@GrandBeggar·
@webdevMason Ahh.. horrible. Advil liquid gels: make sure you take 2 extra strength as soon as you think you might be getting a migraine. Put a cold compress on your head/neck and your feet in a hot water bath. Make sure you check your eating and sleeping patterns; those are my triggers.
0
0
1
33
Mason@webdevMason·
Wheeee time to parent while feeling like my brain was fed through a meat grinder
3
0
14
1.4K
Mason@webdevMason·
Had my first real migraine yesterday and I am not a fan
9
0
37
3.2K
Grand Beggar reposted
Lauren@cabsav456·
Incredible. “I am willing to risk the giving up of my Rights and Privileged as a Citizen”. I AM NOT. Never give the government an inch.
489
2.2K
12.6K
361.2K
Grand Beggar reposted
Taelin@VictorTaelin·
My final thoughts on Opus 4.6: why this model is so good, why I underestimated it, and why I'm so obsessed with Mythos.
When I first tested GPT 5.4 vs Opus 4.6 (both launched at roughly the same time), I was initially convinced that GPT 5.4 was vastly superior, because it did better on my logical tests. That's still true: given the same prompt, by default, GPT will be more competent and careful, and will produce a more reliable output, while Opus will give you a half-assed, buggy solution and call it a day.
Now, here's what I failed to realize: Opus's bad outputs are not because it is dumb. They're because it is a lazy cheater. You can tell because, if you just go ahead and tell it "you did X in a lazy way, do it in the right way now", and you show that this is serious, it will proceed to do a flawless job. That doesn't happen with dumber models. And the more I work with Opus, the more I realize that, if you just keep pushing it, its intelligence ceiling is much, much higher than it seems. It IS there, you just need to be patient and push it. GPT, on the other hand, has already done its best when it fails, so pushing it further gives you no added results.
That is also one of the reasons that benchmarks lie. When Claude and GPT score the same on a given benchmark, it is likely that Claude is actually smarter, because it puts in less effort.
Now, consider that for a moment, and remember that Mythos is outperforming GPT 5.4 *Pro* on benchmarks. How insane is that? Remember that Sonnet 3.5 lagged behind on benchmarks, yet everyone knew that it was superior to 4o. I think it is this effect at play: for whatever reason, Claude-series models "try less hard" on the first shot. Because of that, even if Spud gets close to Mythos on benchmarks (which I predict will be the case), I suppose Mythos will still be superior.
This also leads me to wonder whether Anthropic actually has a real lead over OpenAI, one that will only get larger. I could totally see a timeline where Anthropic's models become so good that OpenAI simply fails to catch up as the recursive improvement unfolds.
Just my silly thoughts though, what do I know. As always I could be wrong, and I hope I am!!
153
59
1.5K
179.5K
Grand Beggar@GrandBeggar·
@TheStalwart Read the rank-1 winner's actual sentence and reverse-engineered its mechanism: cascades of "a [noun] that [verb clause]" appositive chains, each verb taking subordinate complements. Result: 85.19, RANK 1. +12 above previous leader. First iteration that actually broke the ceiling.
0
0
1
21
Grand Beggar@GrandBeggar·
@TheStalwart Pushed harder on NP recursion ("the X that the Y which the Z…"). Two runs hit depth=51 and gated to 0 — parser limit discovered empirically. Best valid: 64.81. Trading depth for length is a losing trade because depth is weighted 35%, length 10%. NP ≠ verbal depth.
1
0
0
86
Joe Weisenthal@TheStalwart·
My first stab at building an AI benchmark. HypotaxBench. It's a test of a model's ability to write one extremely long/complicated sentence, while still maintaining coherence and syntactical soundness. Needs plenty of work. But check it out! jnathan9.github.io/hypotaxbench/
28
28
471
201.9K
Danielle Fong 🔆@DanielleFong·
cooking for two is easier than for one in my experience
9
0
16
1.7K
Grand Beggar@GrandBeggar·
@DanielleFong Yeah, it's a horrible description of how agents work. Autocomplete is just sorting from a ranked list. Agentic coding tries to reason its way to an output that matches what the agent thinks you are asking it to complete.
0
0
0
18
Danielle Fong 🔆@DanielleFong·
people say agentic coding is autocomplete, but that is not correct. i haven't seen it complete any projects. tends to leave a lot of future work
3
3
25
729
Grand Beggar reposted
Christopher Hale@ChristopherHale·
NEW: A stunning new report claims that the Pentagon summoned Pope Leo XIV’s top American diplomat and threatened him after the U.S.-born pontiff gave his January state-of-the-world address. Leo used the address to denounce a world ruled by “a diplomacy based on force” and “zeal for war.” thelettersfromleo.com/p/the-pentagon…
551
5.7K
18.8K
6.2M