SQ Mah

148 posts

@SQMah

Codex Research, OpenAI

San Francisco Bay Area · Joined June 2012

405 Following · 1.5K Followers

Pinned Tweet

SQ Mah@SQMah·5 Feb

Happy to announce that I’ve won a runner-up prize in the Vesuvius Challenge, an incredible initiative to read ancient Roman scrolls using AI. Taking home $50,000 as the only solo team! Read more about it below:

Nat Friedman@natfriedman

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000 years, we can finally read the scrolls: This image was produced by @Youssef_M_Nader, @LukeFarritor, and @JuliSchillij, who have now won the Vesuvius Challenge Grand Prize of $700,000. Congratulations!! These fifteen columns come from the very end of the first scroll we have been able to read and contain new text from the ancient world that has never been seen before. The author – probably Epicurean philosopher Philodemus – writes here about music, food, and how to enjoy life's pleasures. In the closing section, he throws shade at unnamed ideological adversaries – perhaps the stoics? – who "have nothing to say about pleasure, either in general or in particular." This year, the Vesuvius Challenge continues. The text that we revealed so far represents just 5% of one scroll. In 2024, our goal is to go from reading a few passages of text to entire scrolls, and we're announcing a new $100,000 grand prize for the first team that is able to read at least 90% of all four scrolls that we have scanned. The scrolls stored in Naples that remain to be read represent more than 16 megabytes of ancient text. But the villa where the scrolls were found was only partially excavated, and scholars tell us that there may be thousands more scrolls underground. Our hope is that the success of the Vesuvius Challenge catalyzes the excavation of the villa, that the main library is discovered, and that whatever we find there rewrites history and inspires all of us. It's been a great joy to work on this strange and amazing project.
Thanks to Brent Seales for laying the foundation for this work over so many years, thanks to the friends and Twitter users whose donations powered our effort, and thanks to the many contestants whose contributions have made the Vesuvius Challenge successful! Read more in our announcement: scrollprize.org/grandprize

8.7K

SQ Mah@SQMah·7 Mar

@Angaisb_ we are working on it!

391

Angel 🌼@Angaisb_·7 Mar

GPT models are already amazing at coding. They really should spend more time improving their creative side. If OpenAI solves creative writing and frontend taste, there would be no reason for me to use other models.

Angel 🌼@Angaisb_

She's now considering getting Claude Max 5x for a month. She's constantly hitting rate limits on Pro. Claude models are still superior for creative writing, unfortunately.

173

9.8K

SQ Mah@SQMah·7 Mar

@_glnarayanan @xdotli yes

906

Lakshmi Narayanan G@_glnarayanan·7 Mar

@SQMah @xdotli As in GPT 5.4 can be used for general purpose & coding? So we no longer need to switch to 5.3-codex for coding tasks?

128

Xiangyi Li@xdotli·7 Mar

openai is now
* way behind claude in terms of single model capability (you can use opus 4.6 for everything, not xxx-codex)
* behind claude in terms of coding (am i the only one who is often confused by the output of codex?)
* behind google in distribution (gemini is my driver)

125

36.7K

SQ Mah@SQMah·7 Mar

@xdotli Seems interesting, @dkundel do you know?

306

Xiangyi Li@xdotli·7 Mar

@SQMah ofc! since api is out happy to update results on skillsbench.ai. wonder if you guys do any credits for benchmarks. i topped ~3k earlier and it's mostly gone. we got some insane tractions here x.com/xdotli/status/… also reached 4 citations in <3 weeks and growing

Xiangyi Li@xdotli

first day launching SkillsBench, some stats. all organic, zero promotional effort:
- #1 Trending paper of the day on @askalphaxiv
- reposted by @garrytan, pinned to profile by @omarsar0
- 355 upvotes and 160 comments on HackerNews
This is a long-term and continued effort. We will continue to create the best resources for evaluating skills and agent harnesses efficacy on the most valuable tasks in diverse domains. The next batch of our tasks will be even more realistic in terms of task setting, longer horizon, and we will further collect and consolidate skills from a larger pool of experts in computer science (software engineering, machine learning, cybersecurity etc), industries (healthcare, finance, logistics, public sector etc), physical world (robotics, energy, infrastructure, manufacturing etc.), and natural sciences (kudos to @DillmannSteven). We will do in-depth user interviews with all experts we onboard and with people who have been using skills and different agents heavily recently. And we will host an Agent Skills hackathon on March 7 (link in comment).
Our follow up works will include harnesses' efficiency (kudos @ryancarson etc), training dataset and open data recipe (kudos OpenThoughts @NeginRaoof_ @etash_guha @alexgshaw @lschmidt3 @charlie_ruan @ryanmart3n @harborframework etc), continual learning (@charlespacker @a1zhang @LakshyAAAgrawal @withmartian @joshgreaves_ml etc). If you are interested, don't hesitate to reach out / join our discord (in comment) / schedule a time with me on the website (in comment).

709

SQ Mah@SQMah·7 Mar

@xdotli Let me know if you have any feedback!

454

Xiangyi Li@xdotli·7 Mar

@SQMah this makes a lot of sense. i will reactivate my plus plan and try it out. got very frustrated with gpt-5.3-codex earlier cuz it constantly writes confusing paragraphs and sometimes spits weird tokens

SQ Mah retweeted

Hanson Wang@hansonwng·6 Mar

x.com/i/article/2029…

249

175.5K

SQ Mah@SQMah·6 Mar

This is how Codex is building Codex

Dwayne@CtrlAltDwayne

OpenAI are massive trolls. Notice in the left side of Codex in the chess demo for GPT-5.4 in Codex the mention of GPT-6? lmao.

468

SQ Mah@SQMah·6 Mar

@PaulSolt Use playwright interactive and turn on js repl in settings

261

Paul Solt@PaulSolt·5 Mar

Any new UI design tips for app development with Codex?

Paul Solt@PaulSolt

What agent skills are you using for UI design for Mac or iOS apps?

11.6K

SQ Mah@SQMah·6 Mar

@hansonwng All part of the masterplan

Hanson Wang@hansonwng·5 Mar

good morning

495

SQ Mah@SQMah·6 Mar

@george__wing Browser based ones are the main approach… for now

George Wing@george__wing·6 Mar

@SQMah Got any advice for cua harnesses? Like would you recommend any that actually control the desktop? or are browser based ones still the main approach?

SQ Mah@SQMah·6 Mar

Just demoed some of 5.4’s computer use and frontend capabilities - check it out here! What I really like is that computer use was on an Electron app, so Codex can make and test desktop apps as well. Also yes I need a haircut :)

OpenAI Developers@OpenAIDevs

GPT-5.4 is here. Native computer-use capabilities. Up to 1M tokens of context in Codex and the API. Best-in-class agentic coding for complex tasks. Scalable tool search across larger ecosystems. More efficient reasoning for long, tool-heavy workflows. openai.com/index/introduc…

2.6K

SQ Mah@SQMah·6 Mar

@boj_ne Thanks Yueh Han! Give it a try :)
