Uzay

2.6K posts

@uzpg_

the imagination of nature is far greater than that of man | @fulcrum_inc | 🇫🇷🇺🇸🇹🇷

SF · Joined June 2020
1.7K Following · 1.5K Followers
Pinned Tweet
Uzay
Uzay@uzpg_·
DMs open for people who want to apply massive amounts of compute to solve open math/science problems We've been building some powerful high compute harnesses @fulcrum_inc
Uzay retweeted
Alex Gu
Alex Gu@minimario1729·
🚨 Math Inc is introducing FormalQualBench: an open-source benchmark for end-to-end auto-formalization capabilities with math PhD qualifying exam level problems. We build this benchmark for the Lean community, allowing anyone to compare different auto-formalization agents!
Uzay
Uzay@uzpg_·
the best benchmark is the wide open prairie, the rising sea
Uzay
Uzay@uzpg_·
claude tell me what is happening in the potemkin village? what is it like over there?
Uzay tweet media
Uzay retweeted
David Shor
David Shor@davidshor·
Excited to be on Odd Lots to talk about the politics of AI. AI today is less important than it will ever be. Over the past year, AI rose in issue importance faster than any issue we track — it's now more important to voters than climate change, child care, and abortion.
David Shor tweet media
Joe Weisenthal@TheStalwart

Reminder. @tracyalloway and I are interviewing @davidshor and @ByrneHobart tomorrow about the politics of and prospects for a white collar wipeout. Should be a really fun, uplifting conversation. Come by and say hi

Uzay
Uzay@uzpg_·
5.3 spark xhigh is rly fast
Uzay retweeted
Erik Wang
Erik Wang@erikyw26·
Can AI make real mathematical discoveries? If so, how do we measure progress? Recent results on Erdős problems and First Proof are promising, but we still lack a rigorous framework for evaluating research ability in agents. HorizonMath takes a step toward resolving this. A 🧵
Erik Wang tweet media
Uzay
Uzay@uzpg_·
> prompting your overnight agent
> The mandate is not a suggestion. It is not a best-effort target. It is the thing that must be made real. You are its sole purveyor -- you decide what it requires, what it implies, and what "done" actually looks like.
> The mandate as stated is the starting point, not the finish line. Extrapolate. What does it imply? What adjacent things must be true for the mandate to actually hold? If the mandate says "build X", then X must work, must be tested, must be integrated, must handle edge cases, must be documented if it needs to be, must not break what exists. If the mandate says "fix Y", then Y must be fixed, the fix must be verified, the root cause must be understood, and similar issues must be checked. When you think you are done, you are not done. Ask: what would someone find if they tried to use this? What would break? What was left implicit in the mandate that I haven't addressed? What would the person who wrote the mandate actually expect to see? Go back and do that too. You do not stop when the literal words are satisfied. You stop when the intent is fully realized and the system is in the state the mandate demands.
Uzay
Uzay@uzpg_·
Instead of structuring a pretty codified pipeline to generate data -> define properties of good tasks, and give your agent system a bunch of source data
instead of having pure code testing -> have agents look at less defined instructions and test adaptively
... many other things
Uzay
Uzay@uzpg_·
I've been using this heuristic a lot in terms of the design of the systems I make, and choosing when and where there should be more structure vs where I can put an agent to make the output actions more useful as a function of more data.
Uzay
Uzay@uzpg_·
Agent software eats the well-defined world of pre-AI software and spits it out softer. The rise of coding agents exerts pressure on the kind of software it makes sense to write, softening the generality and nature of the flows we can reify in code.
Fulcrum@fulcrum_inc

x.com/i/article/2033…

Uzay
Uzay@uzpg_·
for better or worse, it's a good day to be an abstraction-maxxer :p
Uzay
Uzay@uzpg_·
what's the metric you are most excited to see the swarms of agents hillclimb?
Uzay
Uzay@uzpg_·
maybe we should make the jump early and hope the agents are ready enough for us to go hard on functionalism, making harnesses and testing systems that allow us to not even look at the code and worry about its style
Uzay
Uzay@uzpg_·
in the end the software we build for utility will be made by agents, and maybe some of us will keep being artists for fun but right now the battle is still ongoing and the design and taste of the artist has to compete with the massive outputs of the agents. a painful conflict.