Xiaowei Huang

927 posts

@xiaoweih

Father of a girl. Professor of computer science, working on the safety and trustworthiness of AI & ML systems.

London, England · Joined March 2010

482 Following · 163 Followers

Xiaowei Huang@xiaoweih·20 Jul

“Having the behavior of an LLM change over time is not acceptable.” — why?

Santiago@svpino

GPT-4 is getting worse over time, not better.

Many people have reported noticing a significant degradation in the quality of the model responses, but so far, it was all anecdotal. But now we know. At least one study shows how the June version of GPT-4 is objectively worse than the version released in March on a few tasks.

The team evaluated the models using a dataset of 500 problems where the models had to figure out whether a given integer was prime. In March, GPT-4 answered correctly 488 of these questions. In June, it only got 12 correct answers. From 97.6% success rate down to 2.4%!

But it gets worse! The team used Chain-of-Thought to help the model reason: "Is 17077 a prime number? Think step by step." Chain-of-Thought is a popular technique that significantly improves answers. Unfortunately, the latest version of GPT-4 did not generate intermediate steps and instead answered incorrectly with a simple "No."

Code generation has also gotten worse. The team built a dataset with 50 easy problems from LeetCode and measured how many GPT-4 answers ran without any changes. The March version succeeded in 52% of the problems, but this dropped to a pale 10% using the model from June.

Why is this happening? We assume that OpenAI pushes changes continuously, but we don't know how the process works and how they evaluate whether the models are improving or regressing.

Rumors suggest they are using several smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run. When a user asks a question, the system decides which model to send the query to. Cheaper and faster, but could this new approach be the problem behind the degradation in quality?

In my opinion, this is a red flag for anyone building applications that rely on GPT-4. Having the behavior of an LLM change over time is not acceptable.

Have you noticed any issues when using GPT-4 and ChatGPT lately? Do you think these problems are overblown?
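
The primality benchmark described in the quoted post can be sketched roughly as follows. This is a minimal illustration, not the study's code: it assumes ground truth comes from trial division and that each model reply has already been reduced to a yes/no boolean; `is_prime` and `accuracy` are hypothetical names.

```python
import math
from typing import Dict


def is_prime(n: int) -> bool:
    """Ground-truth label via trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def accuracy(answers: Dict[int, bool]) -> float:
    """Fraction of model answers (integer -> "is it prime?" yes/no) that match ground truth."""
    if not answers:
        raise ValueError("no answers to score")
    correct = sum(1 for n, said_prime in answers.items() if said_prime == is_prime(n))
    return correct / len(answers)


# A bare "No" for 17077 is scored as wrong, because 17077 is prime.
print(is_prime(17077))           # prints True
print(accuracy({17077: False}))  # prints 0.0
```

Under this kind of scoring, 488 correct out of 500 gives 0.976 (97.6%) and 12 out of 500 gives 0.024 (2.4%), which are the percentages quoted in the post.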

Xiaowei Huang@xiaoweih·10 Jul

The grading methodology is loose. Instead of giving credit for how well a certain activity (such as Evaluations and Testing) has been performed, it only requires the company to "Report the results of" their internal reports. lnkd.in/gDdv6wQT

Xiaowei Huang retweeted

The Guardian@guardian·7 Jul

‘A huge relief’: scientists react to hopes of UK rejoining EU Horizon scheme theguardian.com/science/2023/j…

Xiaowei Huang@xiaoweih·1 Jul

@matthew_wicker @ICComputing Congratulations! Well done and well deserved

Matthew Wicker@matthew_wicker·30 Jun

Excited to announce that I will be continuing my work on guarantees for trustworthy ML/AI this summer as a Lecturer (Assistant Professor) at Imperial College's Department of Computing! @ICComputing 🎉🎊🥳

Xiaowei Huang@xiaoweih·30 Jun

Not sure if this is true. If so, that’d be significant academic misconduct. People spend a significant amount of time writing proposals. If really busy, why can’t one just refuse to review?

Xiaowei Huang@xiaoweih·28 Jun

I am curious: where did they get the data?

World of Statistics@stats_feed

The countries with the highest rates of smartphone addiction: 1. 🇨🇳 China 2.🇸🇦 Saudi Arabia 3.🇲🇾 Malaysia 4.🇧🇷 Brazil 5.🇰🇷 South Korea 6.🇮🇷 Iran 7. 🇨🇦 Canada 8.🇹🇷 Turkey 9.🇪🇬 Egypt 10.🇳🇵 Nepal 11.🇮🇹 Italy 12.🇦🇺 Australia 13.🇮🇱 Israel 14.🇷🇸 Serbia 15.🇯🇵 Japan 16.🇬🇧 United Kingdom 17.🇮🇳 India 18.🇺🇸 United States 19.🇷🇴 Romania 20.🇳🇬 Nigeria 21.🇧🇪 Belgium 22.🇨🇭 Switzerland 23.🇫🇷 France 24.🇩🇪 Germany

Xiaowei Huang@xiaoweih·28 Jun

Life: discover something fun, take small (yet sensible) risks, play around with it, but always come back to your solid support.

Buitengebieden@buitengebieden

My all time favorite.. ❤️ Never gets old..

Xiaowei Huang retweeted

Taylor Ogan@TaylorOgan·23 Jun

A Tesla on Full Self-Driving blows through a stop sign at 35mph and nearly collides with two cars. The kicker is that this was during a livestream debate-demo-drive between FSD fan @GerberKawasaki and FSD skeptic @RealDanODowd. This should go without saying, but for an automated system to be at the safety level of a human, this cannot happen. The fact that this occurred yesterday during THIS drive (along with other safety-critical disengagements) should serve as statistical evidence of how frequently this is occurring. IMO, this is the biggest nail in the Tesla FSD coffin.

Xiaowei Huang@xiaoweih·23 Jun

Is GPT a high-risk system? As a general-purpose system, it is not designed to be. However, people might use it when building high-risk systems. The same argument applies to every machine learning system. There needs to be a stronger reason to exclude GPT fro… lnkd.in/gByVRYNF

Xiaowei Huang@xiaoweih·22 Jun

The question is: what would a satisfactory V&V process be? “Significant technical documentation will be required on testing and validation procedures, the collection, storage, mining and so on of data, and accountability.” lnkd.in/g-6VxfB8

Xiaowei Huang@xiaoweih·19 Jun

Isn't it obvious that a "failing grade for pedestrian crashworthiness" should veto the "five-star safety rating"? lnkd.in/gkVRcggM

Xiaowei Huang@xiaoweih·28 May

Can deep learning be absolutely safe? Can this safety be proven? I thought people had by now generally agreed that neither of these questions can have a positive answer, and that concepts like safety integrity level (SIL) should play a role as a probabi… lnkd.in/efjWheRf
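
As a rough illustration of treating safety as a probabilistic band rather than an absolute guarantee, the sketch below maps an estimated probability of dangerous failure per hour (PFH) to a SIL level. It assumes the IEC 61508 target bands for high-demand/continuous mode; `sil_for_pfh` is a hypothetical helper, and obtaining a trustworthy PFH estimate for a deep learning component is the hard, open part that this code does not address.

```python
from typing import Optional

# Assumed IEC 61508 high-demand/continuous-mode target bands:
# (SIL level, lower PFH bound inclusive, upper PFH bound exclusive),
# where PFH = probability of dangerous failure per hour.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_for_pfh(pfh: float) -> Optional[int]:
    """Return the SIL band containing a PFH estimate (hypothetical helper).

    Returns None when the estimate lies outside the tabulated bands,
    i.e. it is too high to meet SIL 1 or below the SIL 4 lower bound.
    """
    for level, lower, upper in SIL_BANDS:
        if lower <= pfh < upper:
            return level
    return None


# A component estimated to fail dangerously about 5 times per 10^8 hours.
print(sil_for_pfh(5e-8))  # prints 3
```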

Xiaowei Huang@xiaoweih·22 May

LLMs: we completed a survey with 300+ references, trying to summarise the known vulnerabilities of LLMs and discuss whether and how verification and validation (V&V) techniques can be adapted to work with LLMs. The paper is now available on arXiv: lnkd.in/eK5EEQKS

Xiaowei Huang retweeted

spooky pshilla (alpha male) 👑🎃@BoobsRespectr·6 May

@pmddomingos europe can just tax american companies, no need for own development.

Xiaowei Huang@xiaoweih·4 May

It would be great if we could receive submissions on large language models and ChatGPT for interesting discussions at the workshop. lnkd.in/eMcP4qB5

Xiaowei Huang@xiaoweih·3 May

Finally got this book published! The title explains what it is for :) lnkd.in/dk3SMs4f lnkd.in/dcc_i_pu

Xiaowei Huang@xiaoweih·26 Apr

Is this a good or a bad example of GDP?

Xiaowei Huang@xiaoweih·26 Apr

Thankful that the last word was “poorer”, not “poor”.

BBC News (UK)@BBCNews

Bank of England economist says people need to accept they are poorer trib.al/lePGD3J

Xiaowei Huang@xiaoweih·17 Apr

This is appalling… Perhaps before we regulate AI on privacy, we should regulate humans.

neil by mouth@nbreavington

JUST IN - Elon Musk tells Tucker Carlson that various government agencies had full access to everything that's going on on Twitter, including people's DMs (direct messages). Full interview at 8 pm ET Monday!!!
