William Ritossa

182 posts

@williamritossa

Australia · Joined January 2017
847 Following · 105 Followers
Pinned Tweet
William Ritossa@williamritossa·
I enjoy @TheEconomist but can't read it all. Their "Your Day in Brief" section summarises key articles, but not all of them (& doesn't include the GOATs @matt_levine and @benthompson). A great thing with ChatGPT/the GPT API is how quickly you can make tools that used to take days/weeks
2 replies · 1 retweet · 22 likes · 7.9K views
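A minimal sketch of the kind of quick tool described above: summarising an article via the OpenAI chat API. The model name, prompt wording, and character cap are illustrative assumptions, not the author's actual implementation.

```python
def build_summary_messages(article: str, max_chars: int = 12000) -> list:
    """Construct chat messages asking for a three-bullet summary."""
    return [
        {"role": "system",
         "content": "Summarise this article in three bullet points."},
        # Crude length guard so very long articles still fit in context
        {"role": "user", "content": article[:max_chars]},
    ]

def summarise(article: str) -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=build_summary_messages(article),
    )
    return resp.choices[0].message.content
```

Loop this over the day's article list and you have a rough "day in brief" generator.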
William Ritossa@williamritossa·
The Nevada Department of Transportation has a live feed of traffic cams, so I quickly Codex’d a page which renders all the cams on the Las Vegas GP F1 track. The UX could be improved, but I'm late for my run. GitHub link in thread; just download and run index.html locally
2 replies · 1 retweet · 3 likes · 1.6K views
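A sketch of how such a page could be generated (the original is plain HTML/JS; this uses Python to emit it). The camera names and URLs below are placeholders, not real NDOT feed endpoints.

```python
# Write an index.html that tiles camera feeds in a simple CSS grid.
# The names/URLs below are placeholders, not real NDOT camera endpoints.
CAMS = [
    ("Turn 1", "https://example.com/cam1.jpg"),
    ("Strip straight", "https://example.com/cam2.jpg"),
    ("Turn 14", "https://example.com/cam3.jpg"),
]

def render_page(cams) -> str:
    tiles = "\n".join(
        f'<figure><img src="{url}" alt="{name}">'
        f"<figcaption>{name}</figcaption></figure>"
        for name, url in cams
    )
    return (
        "<!doctype html>\n<html><head><style>"
        "body{display:grid;grid-template-columns:repeat(3,1fr);gap:8px}"
        "img{width:100%}"
        "</style></head><body>\n"
        f"{tiles}\n</body></html>"
    )

if __name__ == "__main__":
    with open("index.html", "w") as f:  # then open index.html locally
        f.write(render_page(CAMS))
```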
William Ritossa@williamritossa·
Oh gosh, now the joke about having to play a Minecraft parkour reel/TikTok on split screen to hold people’s attention will become a reality
0 replies · 0 retweets · 0 likes · 63 views
William Ritossa@williamritossa·
OpenAI’s pro series reasoning models are now cheaper than GPT-4 32K was when it launched, with the price for the pro models dropping ~90% in less than a year. This opens up so many new product classes as the economic value required per call drops drastically
- Input per 1M: $60 for 4 vs $15 for 5 pro
- Output per 1M: $120 for 4 vs $80 for 5 pro
0 replies · 0 retweets · 0 likes · 25 views
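Working the quoted per-1M-token prices into per-call dollar cost makes the gap concrete; the example token counts are arbitrary.

```python
# Per-call cost at the quoted rates (USD per 1M tokens), from the tweet above.
PRICES = {  # model: (input price, output price)
    "gpt-4-32k": (60.0, 120.0),
    "5-pro": (15.0, 80.0),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the quoted per-1M-token rates."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Example: a call with 30K input and 2K output tokens
old = call_cost("gpt-4-32k", 30_000, 2_000)  # 1.80 + 0.24 = 2.04
new = call_cost("5-pro", 30_000, 2_000)      # 0.45 + 0.16 = 0.61
```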
William Ritossa@williamritossa·
o3 token velocity dropped by ~40% on the 2nd. gpt-4o and 5 are unaffected 👀
0 replies · 0 retweets · 0 likes · 54 views
OpenAI Developers@OpenAIDevs·
The Slack integration and Codex SDK are available to developers on ChatGPT Plus, Pro, Edu, Business, and Enterprise. Admin tools are available on Edu, Business, and Enterprise. More in the blog: openai.com/index/codex-no…
2 replies · 7 retweets · 95 likes · 23.2K views
OpenAI Developers@OpenAIDevs·
Codex is now GA, along with 3 features that make it more useful for engineering teams:
- @Codex in Slack
- Codex SDK
- New admin tools
25 replies · 64 retweets · 779 likes · 438K views
William Ritossa@williamritossa·
Talk about Codex token caching efficiency! 5.37M tokens used before I ran out of context window
0 replies · 0 retweets · 1 like · 52 views
William Ritossa@williamritossa·
On first impressions, GPT-5-Codex is noticeably more rigorous when it needs to be, and at the same time completes simple tasks faster (fewer reasoning tokens). In practice, this means using /model to toggle reasoning effort up/down, and leaving it on gpt-5-codex-high
2 replies · 0 retweets · 2 likes · 128 views
William Ritossa@williamritossa·
Changing the prompt to be clearer about our desired output fixed this 4/4 🧵
0 replies · 0 retweets · 0 likes · 20 views
William Ritossa@williamritossa·
It turns out it's because we were literally asking GPT-5 to output the date after the item! It was just following our instructions 3/4 🧵
1 reply · 0 retweets · 0 likes · 26 views
William Ritossa@williamritossa·
A good comment on GPT-5's instruction-following ability by @sherwinwu and @oliviergodement made me recall one lesson we learnt the hard way when some of our teams who don't use evals upgraded to GPT-5 without prompt changes... model output often worsened and bugs increased, because GPT-5 is much better at instruction following.

Two examples:
1. If your prompt is a bit unclear, or one instruction contradicts a different part of the prompt, older models would handle this gracefully (do what you meant, not what you said), but GPT-5 will over-index on your instructions and follow them literally
2. You had to beg GPT-4 and o3 to talk in a particular way (e.g. be concise, use this tone). Whereas if you beg GPT-5 to do it, it will do it, and it will often over-index on it

After moving to GPT-5 (or writing any new prompt), regardless of whether you have evals:
- Be pedantic when reviewing the prompt (or copy+paste it to GPT and ask it to be pedantic)
- Manually review the model output <-- this is the key, always, ongoing

1/4 🧵
1 reply · 0 retweets · 1 like · 53 views
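The "paste the prompt into GPT and ask it to be pedantic" step above can be sketched as a second model call; the reviewer wording and model name here are assumptions, not the author's setup.

```python
# Ask a model to pedantically review a production prompt for ambiguities
# and contradictions before shipping it. Wording/model are illustrative.
REVIEWER_SYSTEM = (
    "You are a pedantic prompt reviewer. List every ambiguity, internal "
    "contradiction, and instruction that a literal-minded model could "
    "over-index on or follow in an unintended way."
)

def build_review_messages(prompt_under_review: str) -> list:
    return [
        {"role": "system", "content": REVIEWER_SYSTEM},
        {"role": "user", "content": prompt_under_review},
    ]

def review_prompt(prompt_under_review: str) -> str:
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-5",  # assumed; any strong reviewer model works
        messages=build_review_messages(prompt_under_review),
    )
    return resp.choices[0].message.content
```

This automates the review, but the manual output check the thread calls "the key" still has to stay manual.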
William Ritossa@williamritossa·
@sama That’s a really good observation — here’s why you’ve nailed it …
13 replies · 14 retweets · 2.6K likes · 74.1K views
Sam Altman@sama·
i never took the dead internet theory that seriously but it seems like there are really a lot of LLM-run twitter accounts now
3.4K replies · 1.5K retweets · 33.8K likes · 5.8M views
Sam Altman@sama·
really cool to see how much people are loving codex; usage is up ~10x in the past two weeks! lots more improvements to come, but already the momentum is so impressive.
750 replies · 365 retweets · 7.2K likes · 1M views
William Ritossa@williamritossa·
Great take from @btaylor on @AcquiredFM: say you are debugging code that caused a system to shut down. You don’t just restart the system; you find the part of the process that was broken and caused the shutdown.

The same philosophy should apply to Cursor/Codex/Claude. If it produces incorrect code, don’t fix the code, fix the context that Cursor had that produced the bad code.

If you just fix the code you don’t have leverage. If you go back and ask, ‘what context did this coding AI not have that, if it had it, would have produced the correct code?’ it takes longer in the short term, but in the long run it’s the difference between properly leaning into AI vs just using AI
0 replies · 0 retweets · 1 like · 249 views
William Ritossa@williamritossa·
Sharing my GPT-5 tl;dr that I sent internally after watching the livestream at 3am Sydney time:

Cursor/Coding
- GPT-5 is out in Cursor today and is their default model
- Early testers say it’s very good at pair programming, better than Opus 4.1
- Lots of people report it’s very good at very long conversations/pair programming sessions, but say Opus 4/4.1 still wins on agentic coding tasks
- It looks fast - much faster than o3, and than gpt-4 when it first came out

API
- There are 4 models: gpt-5, gpt-5-mini, gpt-5-nano, and gpt-5-chat. I’m particularly keen to see what gpt-5 is like vs gpt-5-chat. When we moved from GPT-3 to GPT-3.5, it got a lot better at chat tasks but lost some abilities in the RLHF process
- It’s available in the API today
- It decides how much to think/reason on a problem. You set this in the API with the reasoning parameter, which has a new minimal reasoning option
- There is a new verbosity parameter (low, medium, or high)
- Structured outputs can now check regex (previously just a JSON schema)

API Pricing
- Input tokens are 50% cheaper than gpt-4o
- Output tokens are the same price as gpt-4o (but with thinking, it would use more tokens)

ChatGPT App
- GPT-5 decides how hard to think on a problem by default. This unlocks reasoning for so many people who didn’t use or have the model picker
- OAI are deprecating all the other models
- Early testers say it’s better than o3 for simple research (where simple = you don’t need to go deep into a problem, not that the problem is hard) but not as good as o3 on deeper research. (I hope that’s not true, since I use o3 to dive deep ~20 times a day)
- They say a big unlock is that it is faster, letting you iterate much faster at a high quality each cycle
0 replies · 0 retweets · 1 like · 277 views
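The two new request parameters mentioned above (reasoning effort and verbosity) can be sketched as a request builder shaped for the OpenAI Responses API; the field names follow OpenAI's published docs, but treat the exact shape as illustrative rather than canonical.

```python
# Build a Responses API request using the new reasoning/verbosity controls.
# Field names per OpenAI docs; treat as a sketch, not canonical.
VALID_EFFORT = {"minimal", "low", "medium", "high"}
VALID_VERBOSITY = {"low", "medium", "high"}

def build_request(prompt: str, effort: str = "minimal",
                  verbosity: str = "low") -> dict:
    if effort not in VALID_EFFORT:
        raise ValueError(f"bad reasoning effort: {effort}")
    if verbosity not in VALID_VERBOSITY:
        raise ValueError(f"bad verbosity: {verbosity}")
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # how hard to think
        "text": {"verbosity": verbosity},  # how long the answer is
    }

# Usage (not run here): client.responses.create(**build_request("Hi"))
```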
William Ritossa@williamritossa·
Before: { "id": 123, "status": "ok" }
After: {"id":123,"status":"ok"}
0 replies · 0 retweets · 0 likes · 55 views
William Ritossa@williamritossa·
GPT defaults to returning JSON pretty-printed, with indents & newlines - great for humans but redundant for computers. We found asking it to output minified JSON reduced tokens significantly, cutting total response time by ~30%. This saved us 30s on a call that was 70s (TTLT)
1 reply · 0 retweets · 0 likes · 119 views
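The saving above in miniature, using Python's `json` module to show the pretty-printed vs minified forms of the same object:

```python
import json

obj = {"id": 123, "status": "ok"}
pretty = json.dumps(obj, indent=2)                 # what GPT emits by default
minified = json.dumps(obj, separators=(",", ":"))  # what we ask it for
# minified == '{"id":123,"status":"ok"}' - fewer characters, fewer tokens
```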