Xilin Xia

148 posts

@xia_xilin

Associate Professor @unibirmingham, Turing Fellow @turinginst, #Resilience, #Sustainability and #AI, views my own

Joined January 2019

318 Following, 312 Followers

Pinned Tweet

Xilin Xia@xia_xilin·16 Jul

I am deeply honored and pleasantly surprised to be named in this prize #PSIPW. It gives confirmation that the hard work put into developing open source flood modeling code has made a difference. Thanks to all my supporters!

College of Engineering & Physical Sciences@eps_unibham

Congratulations to Dr Xilin Xia (@xia_xilin), part of a team awarded the 2024 Prince Sultan Bin Abdulaziz International Prize for Water, which recognises groundbreaking solutions covering the entire water research landscape: birmingham.ac.uk/news/2024/birm… @SchoolofEng_UoB @unibirmingham

Xilin Xia retweeted

ARC Prize@arcprize·11 Dec

A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
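
The ~390X figure in the post above can be checked with a quick calculation. This is only a sketch: the two per-task costs are taken from the post (the first is itself a rounded estimate), and the variable and function names are illustrative, not from any ARC Prize tooling.

```python
# Per-task costs quoted in the post; the o3 figure is an estimate ("est. $4.5k").
COST_O3_PREVIEW_USD = 4500.00   # o3 (High) preview, one year earlier
COST_GPT52_PRO_USD = 11.64      # GPT-5.2 Pro (X-High)


def cost_ratio(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


ratio = cost_ratio(COST_O3_PREVIEW_USD, COST_GPT52_PRO_USD)
print(f"{ratio:.1f}x cheaper per task")  # prints "386.6x cheaper per task"
```

The ratio comes out near 387, which rounds to the "~390X" quoted. It compares cost per task only and says nothing about the change in score from 88% to 90.5%.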

Xilin Xia@xia_xilin·5 May

@jsalsman @emollick Agree, obviously the results need to be verified by whoever gave the prompt.

Jim Salsman@jsalsman·4 May

@xia_xilin @emollick Reasonable advice, but sometimes this approach breaks things in subtle ways.

Ethan Mollick@emollick·4 May

"o3 I want you to make a map of the lighthouses of the great lakes. I want the map in “dark mode” but each lighthouse marker should be aesthetically sized so it covers the distance it can be seen on an average night and is the color of the light" Few rounds of feedback later...

Xilin Xia@xia_xilin·4 May

@haider1 For a wide enough range of tasks, it is already AGI, and sometimes ASI.

Haider.@haider1·4 May

if you often look beyond the AI bubble, you'll see two main perspectives:
> AGI/ASI will arrive this decade
> AGI/ASI isn’t real and may never arrive in our lifetime
where do you stand?

Xilin Xia@xia_xilin·2 May

@ben_j_todd In some cases o3 can certainly complete tasks that take a human a day. And in some areas it can do things far better. This is already enough for transformative change.

Benjamin Todd@ben_j_todd·1 May

o3 can't reliably book a restaurant, control a robot, complete 1-day coding projects, or play pokemon better than a 7 year old. General intelligence means you can complete a similar range of *tasks* as humans. *That's* what enables it to have a transformative impact. Sure you can define AGI as "being good at answering short questions" if you like, but that's not a very useful definition – you can't automate labour with just Q&A. A Q&A AI *could* have a transformative impact if it could answer questions at the frontier of human knowledge and have novel insights, but o3 can't yet do that either.

Paul Novosad@paulnovosad

o3 is AGI, by any reasonable definition people would have had in 2015. It’s weird, it’s different, it’s not what we thought AGI would be like. But it’s definitely AGI. Strange times are ahead.

Xilin Xia@xia_xilin·30 Apr

@icodeagents I think it is more complicated than this. If things can be done and communicated by chatting to AI, is it still necessary to create complicated PowerPoint slides, for example?

Sacrificial Pancakes@icodeagents·30 Apr

@xia_xilin Orgs need agents and automation, not chat bots. Soon, devs will understand the power of structured output for subjective classification (is this funny? Does this violate policy? Etc), then things will move forward

Xilin Xia@xia_xilin·7 Apr

A very insightful article indeed. This is also what I have observed: while individuals usually find LLMs useful, the benefits are not necessarily captured by organisations.

Andrej Karpathy@karpathy

x.com/i/article/1909…

Xilin Xia@xia_xilin·30 Apr

@kimmonismus I don’t have the statistics, but from my observation of those around me, most people who are exposed to AI generate at least 70% of their code with AI.

Chubby♨️@kimmonismus·30 Apr

Microsoft: 30% of all code written by AI
Google: 30% of all code written by AI
I think it’s fair to assume that by end of 2025 it’s about 50%

TechCrunch@TechCrunch

Microsoft CEO says up to 30% of the company's code was written by AI | TechCrunch techcrunch.com/2025/04/29/mic…

Xilin Xia@xia_xilin·30 Apr

I also think there needs to be a rethink of human-computer interaction and existing workflows: much software is not really designed to be used as a tool by AI. For example, an LLM can create good enough content for a document but may struggle to format it to a specific template.

Matthew Berman@MatthewBerman·30 Apr

The raw intelligence of models is good enough for 90% of use cases. What we need now is scaffolding:
* Model routing
* Agentic frameworks
* AI coding frameworks
* Memory management
* Guardrails
* Computer/browser use
* Tool use
* Prompt optimization
What else am I missing?

Xilin Xia@xia_xilin·29 Apr

@DeryaTR_ I think if we limit it to terminal-based tasks, the latest models such as o3 and Gemini-2.5 are very close to AGI. In many cases where tasks fail, it is because of the lack of multi-modal capabilities.

Derya Unutmaz, MD@DeryaTR_·28 Apr

@xia_xilin What remains to be developed is full agentic capability that can perform multi-level tasks, learn new abilities unsupervised, and possess much better memory. We are indeed very close to reaching at least level 1 AGI

Derya Unutmaz, MD@DeryaTR_·28 Apr

My condensed definition of AGI: an AI system that has a memory & can both learn & carry out almost any computer-based task that a typical well-educated human can, like writing, coding, researching, planning or problem-solving, without needing to be re-trained for each new job.

Xilin Xia@xia_xilin·20 Apr

@kimmonismus I found this amusing. What o3 revealed is its ability to reason and orchestrate tools. Those funny examples can be easily solved if the right tools are accessible to o3. And with the new models' capabilities, we will soon be able to make task-specific tools more quickly.

Chubby♨️@kimmonismus·20 Apr

To be honest, it kind of bores me when I see the smirks on Reddit and elsewhere from people who enjoy seeing o3 make mistakes when counting fingers or something similar. I wonder what the purpose behind it is. Is it a psychological defense mechanism that suppresses the fear that AI will shatter one's own hubris? Is it a secret fear of AI rendering them irrelevant? o3 is better than me at 99% of all intellectual tasks. It can't (unfortunately) do all the work on a PC yet (I hope Operator is developed further quickly), but when it comes to finding solutions, it's significantly better than me. I very much welcome the fact that AI is becoming smarter than humans. For me, it is a feeling of relief to know that we are not working against technology, but with it to create a better world. And arrogance is an evil in this process that must be overcome.

Xilin Xia@xia_xilin·20 Apr

@iruletheworldmo o3 seems a genuine leap forward in LLM ability, one I am able to feel. It pulls information from online sources seamlessly. The answers are well structured and insightful, on par with what I would expect from an expert, judging by the topics I know well.

🍓🍓🍓@iruletheworldmo·19 Apr

tempted to get pro again for more o3. i dislike the fear i feel in using it.

Xilin Xia@xia_xilin·16 Apr

@danshipper The agency of o3 is definitely impressive: it can try to run code to solve a problem. But sometimes the same problem could be better solved without using a tool, and o3 does not seem smart enough to decide when the right time to use one is.

Dan Shipper 📧@danshipper·16 Apr

o3 can repeatedly zoom and crop into images in order to read small, handwritten text it is CRAZY

Xilin Xia@xia_xilin·5 Apr

@emollick What really struck me is that many organizations take a ‘watch and wait’ attitude. It is true that AI is evolving fast. But the fundamental thinking about the relationship between AI and humans will stand the test of time.

Ethan Mollick@emollick·5 Apr

One of the most important things to figure out is how to make AI work additive to humans. Studies in medicine and creativity keep finding that AI alone outperforms humans working with AI. I don’t think that needs to be the case, but developing better approaches requires serious R&D work.

Eric Topol@EricTopol

Another example today of A.I. performing better than physicians with access to A.I. acpjournals.org/doi/10.7326/AN… @AnnalsofIM

Xilin Xia@xia_xilin·28 Mar

@MatthewBerman And it is fast.

Xilin Xia@xia_xilin·28 Mar

@MatthewBerman gpt4o is not a thinking model, so I wouldn’t expect it to be as good as one. But even compared with other thinking models, the long context window of Gemini 2.5 is quite impressive: it can read my entire project and do some really cool things.

Matthew Berman@MatthewBerman·28 Mar

gpt4o is nowhere near gemini 2.5 pro at coding. like...not even close.

Xilin Xia@xia_xilin·14 Mar

@emollick @kevinroose The last part is well said. Anyone with serious engagement with frontier AI models should come to the same conclusion: the possibility of AGI should be taken seriously.

Ethan Mollick@emollick·14 Mar

“I believe now is the right time to start preparing for AGI” The same warnings are now appearing with increasing frequency from smart outside observers of the AI industry, like @kevinroose (below) & Ezra Klein. I think ignoring the possibility they are right is a real mistake.

Xilin Xia retweeted

UKCEH@UK_CEH·12 Mar

Fantastic to present the STORMS project at today's @DAFNIfacility-DINI showcase event funded by @SciTechgovuk. Clear consensus that data sharing infrastructures are key for sustainable growth & better outcomes for society and the environment. Progress made but more to do!

Xilin Xia@xia_xilin·27 Feb

@iruletheworldmo Where is it from?

🍓🍓🍓@iruletheworldmo·27 Feb

well well well. mr orion.
