Xilin Xia

148 posts

@xia_xilin

Associate Professor @unibirmingham, Turing Fellow @turinginst, #Resilience, #Sustainability and #AI, views my own

Joined January 2019
318 Following, 312 Followers
Pinned Tweet
Xilin Xia @xia_xilin
I am deeply honored and pleasantly surprised to be named in this prize #PSIPW. It gives confirmation that the hard work put into developing open source flood modeling code has made a difference. Thanks to all my supporters!
College of Engineering & Physical Sciences @eps_unibham

Congratulations to Dr Xilin Xia (@xia_xilin), part of a team awarded the 2024 Prince Sultan Bin Abdulaziz International Prize for Water, which recognises groundbreaking solutions covering the entire water research landscape: birmingham.ac.uk/news/2024/birm… @SchoolofEng_UoB @unibirmingham

2
2
20
2.2K
Xilin Xia retweeted
ARC Prize @arcprize
A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at est. $4.5k/task. Today, we’ve verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X efficiency improvement in one year.
156
661
4.6K
2.3M
Xilin Xia @xia_xilin
@jsalsman @emollick Agree, obviously the results need to be verified by whoever gave the prompt.
0
0
0
91
Ethan Mollick @emollick
"o3 I want you to make a map of the lighthouses of the great lakes. I want the map in “dark mode “ but each lighthouse marker should be aesthetically sized so it covers the distance it can be seen on an average night and is the color of the light" Few rounds of feedback later...
23
70
987
122.1K
Xilin Xia @xia_xilin
@haider1 For a wide enough range of tasks, it is already AGI and sometimes ASI.
0
0
1
53
Haider. @haider1
if you often look beyond the AI bubble, you'll see two main perspectives:
> AGI/ASI will arrive this decade
> AGI/ASI isn’t real and may never arrive in our lifetime
where do you stand?
120
12
195
25.6K
Xilin Xia @xia_xilin
@ben_j_todd In some cases o3 can certainly complete a task that takes a human one day. And in some areas it can even do things far better. This is already enough for transformative change.
1
0
0
82
Benjamin Todd @ben_j_todd
o3 can't reliably book a restaurant, control a robot, complete 1-day coding projects, or play pokemon better than a 7 year old. General intelligence means you can complete a similar range of *tasks* as humans. *That's* what enables it to have a transformative impact. Sure you can define AGI as "being good at answering short questions" if you like, but that's not a very useful definition: you can't automate labour with just Q&A. A Q&A AI *could* have a transformative impact if it could answer questions at the frontier of human knowledge and have novel insights, but o3 can't yet do that either.
Paul Novosad @paulnovosad

o3 is AGI, by any reasonable definition people would have had in 2015. It’s weird, it’s different, it’s not what we thought AGI would be like. But it’s definitely AGI. Strange times are ahead.

47
26
694
113.3K
Xilin Xia @xia_xilin
@icodeagents I think it is more complicated than this. If things can be done and communicated by chatting to AI, is it still necessary to create complicated ppt slides (an example)?
1
0
1
15
Sacrificial Pancakes @icodeagents
@xia_xilin Orgs need agents and automation, not chat bots. Soon, devs will understand the power of structured output for subjective classification (is this funny? Does this violate policy? Etc), then things will move forward
1
0
0
19
Xilin Xia @xia_xilin
@kimmonismus I don’t have the statistics, but from my observation of those around me, most people who are exposed to AI generate at least 70% of their code with AI.
0
0
0
290
Xilin Xia @xia_xilin
I also think there needs to be a rethink of human-computer interaction and existing workflows; much software is not really designed to be used as a tool by AI. As an example, an LLM can create good enough content for a document but may struggle to format it to a specific template.
1
0
2
278
Matthew Berman @MatthewBerman
The raw intelligence of models is good enough for 90% of use cases. What we need now is scaffolding:
* Model routing
* Agentic frameworks
* AI coding frameworks
* Memory management
* Guardrails
* Computer/browser use
* Tool use
* Prompt optimization
What else am I missing?
111
23
453
28.5K
Xilin Xia @xia_xilin
@DeryaTR_ I think if we limit it to terminal-based tasks, the latest models such as o3 and Gemini-2.5 are very close to AGI. In many cases, tasks fail because of the lack of multimodal capabilities.
0
0
1
61
Derya Unutmaz, MD @DeryaTR_
@xia_xilin What remains to be developed is full agentic capability that can perform multi-level tasks, learn new abilities unsupervised, and possess much better memory. We are indeed very close to reaching it, at least level 1 AGI.
1
1
4
212
Derya Unutmaz, MD @DeryaTR_
My condensed definition of AGI: is an AI system that has a memory & can both learn & carry out almost any computer-based task that a typical well-educated human can, like writing, coding, researching, planning or problem-solving, without needing to be re-trained for each new job.
25
17
217
17.4K
Xilin Xia @xia_xilin
@kimmonismus I found this amusing. What o3 revealed is its ability to reason and orchestrate tools. Those funny examples can be easily solved if the right tools are accessible to o3. And with the new models' capabilities, we will soon be able to make task-specific tools more quickly.
0
0
1
80
Chubby♨️ @kimmonismus
To be honest, it kind of bores me when I see the smirks on Reddit and elsewhere from people who enjoy seeing o3 make mistakes when counting fingers or something similar. I wonder what the purpose behind it is. Is it a psychological defense mechanism that suppresses the fear that AI will shatter one's own hubris? Is it a secret fear of AI rendering them irrelevant? o3 is better than me at 99% of all intellectual tasks. It can't (unfortunately) do all the work on a PC yet (I hope Operator is developed further quickly), but when it comes to finding solutions, it's significantly better than me. I very much welcome the fact that AI is becoming smarter than humans. For me, it is a feeling of relief to know that we are not working against technology, but with it to create a better world. And arrogance is an evil in this process that must be overcome.
132
47
777
64.7K
Xilin Xia @xia_xilin
@iruletheworldmo o3 seems to be a genuine leap forward in LLM ability, one that I am able to feel. It pulls information from online sources seamlessly. The answers are well structured and insightful, on par with what I would expect from an expert, judging by the topics I know well.
0
0
0
171
🍓🍓🍓 @iruletheworldmo
tempted to get pro again for more o3. i dislike the fear i feel in using it.
30
7
257
23.4K
Xilin Xia @xia_xilin
@danshipper The agency of o3 is definitely impressive; it can try to run code to solve a problem. But sometimes the same problem could be better solved without using a tool, and o3 does not seem smart enough to decide when is the right time to use one.
0
0
1
153
Dan Shipper 📧 @danshipper
o3 can repeatedly zoom and crop into images in order to read small, handwritten text it is CRAZY
130
157
2.6K
398K
Xilin Xia @xia_xilin
@emollick What really struck me is that many organizations take a ‘watch and wait’ attitude. It is true that AI is evolving fast. But the fundamental thinking about the relationship between AI and humans will stand the test of time.
0
0
0
110
Xilin Xia @xia_xilin
@MatthewBerman gpt4o is not a thinking model, so I wouldn’t expect it to be as good as one. But even compared with other thinking models, the long context window of Gemini 2.5 is quite impressive; it can read my entire project and do some really cool things.
1
0
1
437
Matthew Berman @MatthewBerman
gpt4o is nowhere near gemini 2.5 pro at coding. like...not even close.
108
35
1.6K
189.4K
Xilin Xia @xia_xilin
@emollick @kevinroose The last part is well said, anyone with serious engagement with frontier AI models should come to the same conclusion - the possibility of AGI should be taken seriously.
0
1
6
1.7K
Ethan Mollick @emollick
“I believe now is the right time to start preparing for AGI” The same warnings are now appearing with increasing frequency from smart outside observers of the AI industry, like @kevinroose (below) & Ezra Klein. I think ignoring the possibility they are right is a real mistake.
141
177
1.2K
139.2K
Xilin Xia retweeted
UKCEH @UK_CEH
Fantastic to present the STORMS project at today's @DAFNIfacility-DINI showcase event funded by @SciTechgovuk. Clear consensus that data sharing infrastructures are key for sustainable growth & better outcomes for society and the environment. Progress made but more to do!
4
4
8
563
🍓🍓🍓 @iruletheworldmo
well well well. mr orion.
16
14
241
24.6K