I'm really fascinated by this dataset from the AI poetry survey paper. Here's another visualization I just made. Survey respondents were shown one of these 10 poems and were either told that it was written by an AI, told that it was written by a human, or told nothing about the author.
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning.
State-of-the-art AIs get <10% accuracy and are highly overconfident.
@ai_risk @scaleai
If you have ever tried to read free books from sites like Project Gutenberg, you will have noticed that they can be uncomfortable to read because of their layouts, type & occasional errors.
This project takes those free books and makes them beautiful (and still free). standardebooks.org
The reason everything will not change quickly, even if AI generally exceeds human abilities across fields, is, in large part, the nature of systems.
Organizational and societal change is much slower than technological change, even when the incentives to change quickly are there.
Fantastic work by the AI Safety Institute
aisi.gov.uk/work/our-first…
‘World-leading’ gets touted regularly in the UK, often egregiously.
But the AISI is an *actually* world-leading capability in something very important!
We need to celebrate and support it!!
@bluehost @CMAgovUK And not only do prices rocket after the 1st year - the features available to new subscribers are significantly better than those for existing subscribers, e.g. hosting 10 sites vs. 1 on the basic tier. 🤨 @MartinSLewis wouldn't be impressed! Better switch to @Hostinger or @ionos_help_uk
@richardsargeant @CMAgovUK We certainly appreciate how prices shift over time. However, it's very common to offer introductory rates that, after a certain amount of time, increase to the original, standard rate.
“more money has been spent on the Lower Thames Crossing’s planning application alone than it cost Norway to not only build the world’s longest road tunnel (the Laerdal tunnel), but also the world’s deepest subsea road tunnel.” 😣
open.substack.com/pub/samdumitri…
S. Korea spent $200b trying to increase its birthrate. Hungary spends 5% of GDP.
Both are failing.
Yet the small country of Georgia spiked its birthrate massively without spending a dollar. How?
They understood that fertility isn't about money. It's about status.
New on @airstreetpress: Now that LLMs can convincingly automate much of a bored human’s tasks, attention is turning to “agentic AI”.
In this piece, we evaluate how far advanced this work actually is and look at both the promising research directions and the challenges ahead.
Thread:
Testing, evaluating, and securing GenAI deployments in a scalable and repeatable manner is difficult. @bcgx_ have just released an 'Automated Red Teaming' Kit to help teams do this efficiently, consistently and to a higher standard. Check it out: github.com/BCG-X-Official…
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: openai.com/index/hello-gp…
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks.