patrick sphinx

310 posts

@SphinxPatrick

Joined September 2015
224 Following · 5 Followers
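patrick sphinx retweeted
geniusvczh @geniusvczh
GPT loves to swallow exceptions when it writes code, whereas our goal is to make the program crash as often as possible. Prompting cannot achieve that; the only option is to set up test cases that detonate every single point once 🤪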
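The contrast in the post above can be sketched in Python. This is only an illustration under assumed names: `parse_port_swallowing`, `parse_port_strict`, and the `FAILURE_CASES` table are hypothetical and do not come from the original post. The first function hides every failure behind a default; the second lets failures surface, and the small harness triggers each failure point once.

```python
import json


def parse_port_swallowing(text: str) -> int:
    """Anti-pattern: every failure is hidden behind a default value."""
    try:
        return int(json.loads(text)["port"])
    except Exception:
        return 8080  # the caller never learns that the input was broken


def parse_port_strict(text: str) -> int:
    """Fail-fast variant: malformed input raises at the point of failure."""
    return int(json.loads(text)["port"])


# Hypothetical inputs, one per failure point in parse_port_strict.
FAILURE_CASES = [
    ("not json", json.JSONDecodeError),  # json.loads fails
    ("{}", KeyError),                    # the "port" key is missing
    ('{"port": "abc"}', ValueError),     # int() conversion fails
]


def check_every_failure_point() -> int:
    """Trigger each failure point once and count the ones that raised."""
    triggered = 0
    for text, expected in FAILURE_CASES:
        try:
            parse_port_strict(text)
        except expected:
            triggered += 1
    return triggered
```

With the swallowing variant, all three inputs would quietly return the default, so a test suite could pass without ever noticing the broken input.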
patrick sphinx retweeted
Yihao Sun @StarGazerMiao
After 6 years, I am finally able to confidently end the last chapter of the last paper of my PhD with a title in the same style as the first paper I read on the first day of my PhD.
patrick sphinx retweeted
Maxwell Brown @imax153
We on the @EffectTS_ team often recommend cloning the Effect repo into your project so your agent can explore the source directly. I finally wrote up why it works and how to set it up: effect.website/blog/the-one-w…
patrick sphinx retweeted
LemonHX @lemonhxmoe
chiba-lang.org/blog/chiba5
Take a look at how I do PL work in the AI era with the most modern techniques:
- SubAgent
- Context Engineering
- Harness
Anyway, it has everything!
patrick sphinx retweeted
Taelin @VictorTaelin
To whom it may concern
NanoProof.hs: the smallest viable proof checker
I posted something similar before, but it was more of a research experiment with weird λ-encoded shit than something usable. This new repo contains a tiny, self-contained, 1000-LOC Haskell proof checker that you can actually use to prove arbitrary theorems.
The language has just 6 base types:
→ Empty (`⊥`): type with 0 elems
→ Unit (`⊤`): type with 1 elem (`()`)
→ Bool (`𝔹`): type with 2 elems (`0 | 1`)
→ Sigma (`ΣA.B`): dependent pairs (`(x,y)`)
→ Pi (`ΠA.B`): dependent functions (`λx.f`)
→ Equal (`a==b`): propositional equality (`{==}`)
That's all you need. Each of these is needed, as it introduces something fundamental.
The file includes a parser, stringifier, equality, a bidirectional type checker, and a simple CLI. It also includes first-class reduction relations, which allow us to pretty-print goals just like Lean. You can place '()' in a position to inspect the current context and goal there. I also include a demo proof for the commutativity of multiplication.
patrick sphinx retweeted
Talia Ringer 🕊🪬 @TaliaRinger
Our synthetic Euclidean Geometry proof assistant is now open for public contributions! We created this together as a class in my Build Your Own Proof Assistant course this semester. Please give it a spin and consider contributing! github.com/nicegeo/nicegeo
patrick sphinx retweeted
Suwako — e/acc @suwakopro
The pass rates of the various LLMs on the ast-grep tests in ProgramBench are really low @hd_nvim
patrick sphinx retweeted
Kiran @kirancodes
5 lines of Python. An economic game with complex equilibria.
Our new language Pact combines choreographies with game theory and allows expressing economic transactions in a few lines. So simple an agent could write it.
Claude? Make me some money. And make no mistakes.
arxiv.org/abs/2605.03143
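WongSSH @wong_ssh
Parts of the content draw on Chapter 6 of The Calculus of Computation, "Program Correctness: Strategies". So far, it seems that the core of writing formal proofs in Dafny is copying known conditions into the loop invariant, since the basic path method discards a lot of context. The good news is that, so far, LLMs are better at this than a beginner like me and can write out all the core properties.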
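The point about copying known conditions into the loop invariant can be illustrated with a runtime-checked bubble sort in Python. This is a sketch only, not the author's Dafny code: `is_sorted`, `suffix_ok`, and `bubble_sort_checked` are hypothetical names, and Python `assert` statements check the conditions while the program runs, whereas Dafny would verify them statically. The detail to notice is that the inner loop restates the outer loop's fact about the sorted suffix, because the inner loop writes to the array.

```python
def is_sorted(xs: list[int]) -> bool:
    """True if xs is in non-decreasing order."""
    return all(xs[k] <= xs[k + 1] for k in range(len(xs) - 1))


def suffix_ok(a: list[int], i: int) -> bool:
    """a[i:] is sorted and every element of a[:i] is <= every element of a[i:]."""
    return is_sorted(a[i:]) and all(x <= y for x in a[:i] for y in a[i:])


def bubble_sort_checked(items: list[int]) -> list[int]:
    a = list(items)  # work on a copy
    i = len(a)
    while i > 1:
        # Outer invariant: the suffix a[i:] is already in its final place.
        assert suffix_ok(a, i)
        j = 0
        while j < i - 1:
            # Inner invariant, part 1: the outer fact is copied in again,
            # since this loop modifies a.
            assert suffix_ok(a, i)
            # Inner invariant, part 2: a[j] is the maximum of a[:j + 1].
            assert all(a[k] <= a[j] for k in range(j + 1))
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
            j += 1
        i -= 1
    # Postcondition: the whole list is sorted.
    assert is_sorted(a)
    return a
```

In a static verifier, dropping the copied `suffix_ok` condition from the inner loop would typically leave the outer invariant unprovable after the inner loop finishes, which matches the experience described in the post.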
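WongSSH @wong_ssh
Recently I used Dafny to prove several classic array sorting algorithms by hand. In order from simple to advanced, they are bubble sort, quicksort, and merge sort; the hardest to prove was probably quicksort. I wrote a blog post while writing the code. Because I have little hands-on experience so far, the post is not very systematic. Besides the algorithms, the post ends with a proof about a smart contract vulnerability, and I will probably add another case tomorrow.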
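patrick sphinx @SphinxPatrick
@wong_ssh Languages like Dafny feel too tedious if developers really have to write them by hand. You basically have to keep track in your head, at all times, of which conditions must hold at which point. I wonder whether the proofs can still keep up once the program gets large.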
patrick sphinx @SphinxPatrick
@wong_ssh Some formal methods people are now working on using LLMs to generate proofs.
patrick sphinx retweeted
Kiran @kirancodes
Did a survey of all LLM-based VeriCoding benchmarks.
Seems like everyone's focusing on single-file programs.
Have you ever seen a REAL verified system? A file system? An OS? The specs for every function are HUGE. It looks nothing like your Fibonacci LeetCode spec.
We're cooked.
patrick sphinx retweeted
Devdatta Akhawe @frgx
Everyone’s talking about AI-powered attackers finding software vulnerabilities at scale. Hot take: that’s not the risk I’d prioritize first.
patrick sphinx retweeted
Marcel Böhme👨‍🔬 @mboehme_
From an economic perspective, once we are back to equilibrium, bugs in critical software will be just as difficult to find as they were before AI agents (and before fuzzing). More details: arxiv.org/abs/2402.01944… (Security as a function of incentive)
s1r1us (mohan) @S1r1u5_
from firefox blogpost where mythos found 270 new bugs:
> The defects are finite, and we are entering a world where we can finally find them all
it's like lord kelvin saying "there is nothing new to be discovered in physics now". can't tell if firefox has some incentives at play or is just naivete
fascinating example here on what i mean x.com/5aelo/status/2…, saelo wrote a fuzzer with a few files and found crazy bugs. he pulled it off because he already knows the target deeply( he designed ubercage?) and knows how to shape the fuzzer toward the interesting surface.
i still think, operators like saelo + mythos set the ceiling of the bugs that can be found, even then its not all bugs, the next version after mythos would move up, but mythos in a loop on its own sits below the ceiling
you only want the software to be secure from smartest adversary in the world, its not all bugs, cuz rice theorem and stuff means you are not getting there anyway.
sure, for fixed code base like basic web app, the set might be finite and you can exhaust them all, but i cant convince myself that software like firefox has finite set of bugs and you can exhaust em all.
if mythos isn't agi and is still jagged, the narrative that mythos alone is the smartest adversary and will find all "finite" bugs is exactly what a frontier model company would sell untested.
and bro even "our team + mythos will find them all" is a crazy narrative too, it assumes your team has the smartest humans in the world, and that nso or some north korean team won't be pwning you with the same setup at the top of the ceiling
BUT ALSO, mythos alone is probably smarter than 99.9% of humans (vibes-based), and 100s of them running behind api keys is really bad, because most things you’d want to breach don’t need saelo+mythos ceiling bugs to get into.
so we cooked?