SQ Mah

148 posts

@SQMah

Codex Research, OpenAI

San Francisco Bay Area · Joined June 2012
405 Following · 1.5K Followers
Pinned Tweet
SQ Mah@SQMah·
Happy to announce that I’ve won runner-up in the Vesuvius Challenge, an incredible initiative to read ancient Roman scrolls using AI. Taking home $50,000 as the only solo team! Read more about it below:
Nat Friedman@natfriedman

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000 years, we can finally read the scrolls:
This image was produced by @Youssef_M_Nader, @LukeFarritor, and @JuliSchillij, who have now won the Vesuvius Challenge Grand Prize of $700,000. Congratulations!!
These fifteen columns come from the very end of the first scroll we have been able to read and contain new text from the ancient world that has never been seen before. The author – probably Epicurean philosopher Philodemus – writes here about music, food, and how to enjoy life's pleasures. In the closing section, he throws shade at unnamed ideological adversaries – perhaps the stoics? – who "have nothing to say about pleasure, either in general or in particular."
This year, the Vesuvius Challenge continues. The text that we revealed so far represents just 5% of one scroll. In 2024, our goal is to go from reading a few passages of text to entire scrolls, and we're announcing a new $100,000 grand prize for the first team that is able to read at least 90% of all four scrolls that we have scanned.
The scrolls stored in Naples that remain to be read represent more than 16 megabytes of ancient text. But the villa where the scrolls were found was only partially excavated, and scholars tell us that there may be thousands more scrolls underground. Our hope is that the success of the Vesuvius Challenge catalyzes the excavation of the villa, that the main library is discovered, and that whatever we find there rewrites history and inspires all of us.
It's been a great joy to work on this strange and amazing project.
Thanks to Brent Seales for laying the foundation for this work over so many years, thanks to the friends and Twitter users whose donations powered our effort, and thanks to the many contestants whose contributions have made the Vesuvius Challenge successful! Read more in our announcement: scrollprize.org/grandprize

Lakshmi Narayanan G@_glnarayanan·
@SQMah @xdotli As in GPT 5.4 can be used for general purpose & coding? So we no longer need to switch to 5.3-codex for coding tasks?
Xiangyi Li@xdotli·
openai is now
* way behind claude in terms of single model capability (you can use opus 4.6 for everything, not xxx-codex)
* behind claude in terms of coding (am i the only one who are often confused by the output of codex?)
* behind google in distribution (gemini is my driver)
Xiangyi Li@xdotli·
@SQMah ofc! since api is out happy to update results on skillsbench.ai. wonder if you guys do any credits for benchmarks. i topped ~3k earlier and it's mostly gone. we got some insane tractions here x.com/xdotli/status/… also reached 4 citations in <3 weeks and growing
Xiangyi Li@xdotli

first day launching SkillsBench, some stats. all organic, zero promotional effort:
- #1 Trending paper of the day on @askalphaxiv
- reposted by @garrytan, pinned to profile by @omarsar0
- 355 upvotes and 160 comments on HackerNews
This is a long-term and continued effort. We will continue to create the best resources for evaluating skills and agent harnesses efficacy on the most valuable tasks in diverse domains.
The next batch of our tasks will be even more realistic in terms of task setting, longer horizon, and we will further collect and consolidate skills from a larger pool of experts in computer science (software engineering, machine learning, cybersecurity etc), industries (healthcare, finance, logistics, public sector etc), physical world (robotics, energy, infrastructure, manufacturing etc.), and natural sciences (kudos to @DillmannSteven).
We will do in-depth user interviews with all experts we onboard and with people who have been using skills and different agents heavily recently. And we will host an Agent Skills hackathon on March 7 (link in comment)
Our follow up works will include harnesses' efficiency (kudos @ryancarson etc), training dataset and open data recipe (kudos OpenThoughts @NeginRaoof_ @etash_guha @alexgshaw @lschmidt3 @charlie_ruan @ryanmart3n @harborframework etc), continual learning (@charlespacker @a1zhang @LakshyAAAgrawal @withmartian @joshgreaves_ml etc).
If you are interested, don't hesitate to reach out / join our discord (in comment) / schedule a time with me on the website (in comment).

SQ Mah@SQMah·
@xdotli Let me know if you have any feedback!
Xiangyi Li@xdotli·
@SQMah this makes a lot of sense. i will reactivate my plus plan and try it out got very frustrated with gpt-5.3-codex earlier cuz it constantly writes confusing paragraphs and sometimes spits weird tokens
SQ Mah@SQMah·
@PaulSolt Use playwright interactive and turn on js repl in settings
Hanson Wang@hansonwng·
good morning
SQ Mah@SQMah·
@george__wing Browser based ones are the main approach… for now
George Wing@george__wing·
@SQMah Got any advice for cua harnesses? Like would you recommend any that actually control the desktop? or are browser based ones still the main approach?
SQ Mah@SQMah·
Just demoed some of 5.4’s computer use and frontend capabilities - check it out here!
What I really like is that computer use was on an Electron app, so Codex can also make and test desktop apps.
Also yes I need a haircut :)
OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

SQ Mah@SQMah·
@boj_ne Thanks Yueh Han! Give it a try :)
Yueh-Han Huang@boj_ne·
@SQMah Woah good work! We can use 5.4 to navigate old software and it’s unlocking a lot of potentials. Also looks awesome you don’t need a haircut!
hung@hungtran·
@SQMah great work SQ!
SQ Mah@SQMah·
@sgrove Thanks Sean! Hope all is well
Sean Grove@sgrove·
@SQMah Hair's lookin' good, don't sweat it!
SQ Mah@SQMah·
@OpenAIDevs Was a blast showing this demo! And thank you to the rest of the team that made this model awesome @_adiganesh Arshi Eric
OpenAI Developers@OpenAIDevs·
GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…