Clement Neo
@_clementneo
407 posts
making AI safer
Singapore · Joined March 2021
318 Following · 469 Followers

Pinned Tweet
Clement Neo@_clementneo·
🧠🖼️ New paper on interpreting VLMs! We study Vision-Language Models (VLMs) like LLaVA to understand how they process objects in images. We find surprising insights about how these models identify objects in images and how their inner representations develop through the layers.
Clement Neo tweet media
2 replies · 15 reposts · 65 likes · 14.1K views
Clement Neo@_clementneo·
@xuanalogue @StephenLCasper Yeah this is likely overcounting - they’re counting the full funding amount for when DTC was established in 2022, which is before the DTC was designated the AISI two years later in 2024. The DTC also still works on some other non-AIS work afaik
0 replies · 0 reposts · 1 like · 20 views
Cas (Stephen Casper)@StephenLCasper·
CAISI is very small and bureaucratically suppressed. This is part of why I, as an American, pursued a residency at UK AISI instead of US CAISI. I'm not sure of the exact counts. But I wouldn't be surprised if UK AISI has more Americans working for it than CAISI.
Cole Salvador@ColeSalvador31

In 2022 and 2023, tiny teams of researchers drew straight lines on graphs that predicted the US was headed for an energy bottleneck in AI. But the government had no idea. The future of AI is too important to make the same mistake again. We need talent-dense, AI-focused offices that can skate to where the puck is going and implement President Trump's AI agenda.

In a new piece for AFPI (@A1Policy), we discuss 2 promising offices that could act as hubs of government AI foresight: the Center for AI Standards and Innovation (CAISI) in the Department of Commerce and the Bureau of Emerging Threats (ET) in the Department of State. We found that they have the density of talent to succeed but still lack resources: funding, headcount, and authorization. Here's a summary:

1) The Center for AI Standards and Innovation (CAISI) lacks resources
> It has talented technical staff and a strong track record in evaluations, industry relationships, and insight into China
> But it's chronically underfunded. It's been around for 3 years but only received $30M in total, not annual, funds. That's 11 times less than the UK's equivalent. (It's even short of Canada and Singapore)
> It only has 20-30 employees, who are swamped with workstreams and external requests from agencies like the IC
To solve this, Congress should fund CAISI with an annual budget of $50-100 million.

2) CAISI lacks authorization or a focused mission
> Between Department asks, inbound from other offices, and the AI Action Plan, it has more missions than staff
> Its critical mission could be threatened by future administrations, which could externally pressure it to pursue DEI initiatives
Congress needs to enshrine the office and give it a clear mission. We present an America First vision for CAISI, in which it acts as a technical strike team, bridge between industry and government, frontier analysis unit, and technical standards organization.
3) The Bureau of Emerging Threats (ET) lacks authorization
> ET is similarly talent-dense, with experts in cyber, AI, and international relations
> But it lacks congressional authorization and could be destroyed or co-opted by future administrations
The Bureau needs concrete support from Congress and levers of interagency influence, like regular reports to national security leaders.

With appropriate action, Congress can help ensure the President has the resources he needs to help America win the AI race and usher in a new golden age of human flourishing. Always fun to collaborate with @CrovitzJack and @YusufSMahmood, who have posted about other sections of our piece.

1 reply · 1 repost · 57 likes · 9.2K views
Boyang "Albert" Li@AlbertBoyangLi·
These days I feel LLMs are still worse than humans but I can't say exactly where or how. Do you share the same feeling? Well, maybe we found where and how.

We created 500 stories with fewer than 100 words on average and only 2 choices. On this simple test, GPT-4o falls to the level of random guesses. Without any training, humans are at 92%.

This test requires LLMs to think of story characters as independent actors with their own minds, who choose actions based on their own knowledge. And, if you believe the cognitive scientists, it is a prerequisite skill to the understanding of human intentions.

LLMs, we would argue, do not understand human intentions. At least not yet. And that's why they don't follow your instructions very well.
Boyang "Albert" Li tweet media
1 reply · 0 reposts · 5 likes · 368 views
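The benchmark described above can be pictured with a toy harness. Everything here is a hypothetical stand-in: the story, the choices, and both "models" are illustrative, not the actual dataset or GPT-4o. The key point it demonstrates is the tweet's framing that a two-choice format pins random guessing at 50%, while tracking a character's own (possibly outdated) knowledge is what separates a belief-tracking solver from a guesser.

```python
import random

# Hypothetical example item in the style the tweet describes: a short story,
# two answer choices, and a gold label. (Illustrative, not from the dataset.)
ITEMS = [
    {
        "story": ("Mia puts her keys in the drawer and leaves. While she is "
                  "out, Ben moves the keys to the shelf. Mia returns for them."),
        "question": "Where does Mia look first?",
        "choices": ["the drawer", "the shelf"],
        "answer": 0,  # Mia acts on her own outdated knowledge, not the true state
    },
]

def accuracy(predict, items):
    """Fraction of items where the model picks the gold choice."""
    correct = sum(predict(item) == item["answer"] for item in items)
    return correct / len(items)

random.seed(0)
# A model that ignores belief-tracking and guesses between the two choices;
# this is the ~50% failure mode the tweet attributes to GPT-4o.
guesser = lambda item: random.randrange(len(item["choices"]))

# A belief-tracking "model" that answers from the character's knowledge state
# (a stand-in for the human solvers at 92%).
belief_tracker = lambda item: item["answer"]

print("guess accuracy:", accuracy(guesser, ITEMS * 500))          # near 0.5
print("belief accuracy:", accuracy(belief_tracker, ITEMS * 500))  # 1.0 by construction
```

The two-choice design matters: with a 50% floor, "falls to the level of random guesses" is a precise, falsifiable claim rather than a vague one.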
Clement Neo retweetledi
Greg Burnham@GregHBurnham·
I looked into this and the answer is so funny. In the No Thinking setting, Opus 4.5 repurposes the Python tool to have an extended chain of thought. It just writes long comments, prints something simple, and loops! Here's how it starts one problem:
Greg Burnham tweet media
Epoch AI@EpochAIResearch

Opus 4.5 scores the same on FrontierMath regardless of thinking budget, in contrast to GPT-5.1 where higher reasoning settings correspond to higher scores. However, on OTIS Mock AIME, another math benchmark, we see the thinking budget make a difference for Opus 4.5 as well.

22 replies · 52 reposts · 772 likes · 110.2K views
Clement Neo@_clementneo·
@jetnew_sg What would a systematic/breadth first approach look like? And would you say this is a job better done by the platform (ie a doc/notes type app with AI functionalities) or vice versa (an AI chatbot type app with note functionalities)?
1 reply · 0 reposts · 3 likes · 62 views
Jet New@jetnew_sg·
We need new interfaces for thinking. Notion, Obsidian, Google Docs, Apple Notes - none of them are designed around thinking with AI. ChatGPT, Claude, and a whole range of AI assistant apps are great for the “rabbit hole” type of inquiry. If you need a systematic, breadth-first approach, they unfortunately fall short.
2 replies · 0 reposts · 4 likes · 965 views
Clement Neo@_clementneo·
@jameschua_sg @PradyuPrasad I think the question here is whether the correlation means that it is a necessary evil, or we have just taken it to be so as an excuse to not try to go against the prevailing culture
1 reply · 0 reposts · 2 likes · 31 views
James Chua@jameschua_sg·
@PradyuPrasad (obviously you can be innovative without too many crazies on the street, but you get my point about tradeoffs)
1 reply · 0 reposts · 2 likes · 54 views
Pradyumna (in Bay Area)@PradyuPrasad·
I love the guy who posted the essay because he gets stuff done. But among all the people who complain, very few do. And this has been the case since the 70s! LKY in a speech then said that the Singaporean is a champion grumbler.
Pradyumna (in Bay Area)@PradyuPrasad

What I admire the most in Singaporeans who have made a change (AWARE, Razer and Lee Kuan Yew come to mind), is not only their ability to see the problem but to get stuff done. I think the onus to do that is on us.

2 replies · 1 repost · 27 likes · 1.7K views
polymath daddy@sujantkumarkv·
@kalomaze okay now I'm sold BUT please elaborate more on why it's such a difference.
1 reply · 0 reposts · 0 likes · 227 views
kalomaze@kalomaze·
uv is unfathomably good software
38 replies · 25 reposts · 1K likes · 57.7K views
Alper Canberk@alpercanbe·
how much does the last layer of a VLM retain the original image? i trained a linear probe on the output features of several CLIP/SigLIP models on image reconstruction, and found that *only* with SigLIP, if you multiply the input by 10-100, pixels get reconstructed perfectly??
Alper Canberk tweet media
14 replies · 19 reposts · 202 likes · 51.4K views
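The probe setup Alper describes reduces to a single linear regression. A minimal sketch, with assumptions labeled: the features and pixel targets below are random stand-ins, whereas in the real experiment the inputs would be CLIP/SigLIP output features and the targets flattened image pixels; the reported 10-100x input scaling is part of his finding and not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_feat, d_pix = 256, 64, 3 * 16 * 16   # images flattened to pixel vectors

feats = rng.normal(size=(n, d_feat))      # model output features (stand-in)
pixels = rng.normal(size=(n, d_pix))      # flattened target images (stand-in)

# A "linear probe" for reconstruction is just least squares from features
# to pixels: find W minimizing ||feats @ W - pixels||^2.
W, *_ = np.linalg.lstsq(feats, pixels, rcond=None)
recon = feats @ W

mse = float(np.mean((recon - pixels) ** 2))
print(f"train reconstruction MSE: {mse:.4f}")
```

With random stand-in data the probe cannot reconstruct the targets, which is the expected null result; the surprise in the tweet is that real SigLIP features, scaled up, reportedly do allow near-perfect pixel reconstruction.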
Clement Neo@_clementneo·
This is going to be an interesting social, and the team organizing this are super cool! The public sector is often a good indicator of conservative LLM adopters worried about safety guarantees, and I’ve learned a ton from these folks. Definitely do attend if you’re free.
Gabriel Chua@gabrielchua

interested in LLMs for the public sector? join us at our @iclr_conf social on day 1! we'll share insights on our latest initiatives and discuss collaboration, research, and career opportunities in public sector AI

0 replies · 0 reposts · 3 likes · 496 views
Clement Neo@_clementneo·
@gzcl3000 My take is that the author is a historian, so Sapiens made sense and was good because it's a history book while the other books weren't compelling
0 replies · 0 reposts · 1 like · 41 views
Clement Neo@_clementneo·
@ivanleomk What do you find good? My brain has been so hardwired on the types of tasks I perceive Claude vs 4o to be good at that I find it hard to properly explore how 4o has improved
1 reply · 0 reposts · 1 like · 96 views
Ivan Leo@ivanleomk·
Might switch from Claude to 4o wow
2 replies · 0 reposts · 9 likes · 1.1K views
Clement Neo@_clementneo·
This reminds me of a phenomenon I think I saw (but can't find/verify) where Claude was somewhat aware that its response had been pre-filled and expressed a similar disbelief at the earlier part of its response. Does anyone know what I'm referring to?
Garrison Lovely@GarrisonLovely

My roommates kept asking me if the AIs can count the Rs in "Strawberry" yet. The answer is mostly yes (see below), but holy shit, DeepSeek R1's reasoning legitimately stressed me out. It reads like the inner monologue of the world's most neurotic & least self-confident person🧵

0 replies · 0 reposts · 0 likes · 559 views
Clement Neo@_clementneo·
@jetnew_sg Do you find that memory is useful? A lot of the time with ChatGPT I find that the most random things end up in there and I'm not sure how it improves things, as compared to Claude's project memory where you can stash key files
1 reply · 0 reposts · 1 like · 81 views
Jet New@jetnew_sg·
I really want a way to integrate personal memory or knowledge base with all model providers and applications. I use ChatGPT's memory extensively by giving life and business updates. But this personal memory isn't shared with o1, Claude, DeepSeek, Qwen. Is anyone building this?
2 replies · 0 reposts · 2 likes · 281 views
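The architecture Jet is asking for, a personal memory shared across providers, can be sketched in a few lines. This is a hypothetical design, not an existing product: `MemoryStore`, `memory.json`, and `with_memory` are all names invented for illustration. The idea is simply one local store of facts, injected as a system preamble into any provider's message list, since ChatGPT, Claude, DeepSeek, and Qwen APIs all accept some form of role-tagged messages.

```python
import json
import pathlib

class MemoryStore:
    """Local, provider-agnostic store of facts about the user (hypothetical)."""

    def __init__(self, path="memory.json"):
        self.path = pathlib.Path(path)
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, fact: str):
        # Append a fact and persist, so every app/model sees the same memory.
        self.facts.append(fact)
        self.path.write_text(json.dumps(self.facts))

    def as_preamble(self) -> str:
        return "Known about the user:\n" + "\n".join(f"- {f}" for f in self.facts)

def with_memory(messages, store):
    """Prepend the shared memory to any provider's chat message list."""
    return [{"role": "system", "content": store.as_preamble()}] + messages

store = MemoryStore()
store.remember("Runs a small business; prefers concise answers")
msgs = with_memory([{"role": "user", "content": "Draft a launch email"}], store)
```

The hard parts this sketch skips are exactly why the question is interesting: deciding *what* to remember, and syncing the store into apps that don't expose a system prompt.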
Clement Neo@_clementneo·
I actually worked on this paper for a really long time, my first draft was actually in June last year! I think I’ve grown quite a lot as a researcher since then. Thanks to @FazlBarez for advising me throughout this process and @apartresearch for getting me started in research!
0 replies · 0 reposts · 1 like · 339 views
Clement Neo@_clementneo·
I will be presenting my first ever poster at EMNLP 2024 from 10:30am-12pm today in the Jasmine room! I think I have a really nice poster so come check it out if you’re around :)
Fazl Barez@FazlBarez

📢 🎉 New paper with @_clementneo & Shay Cohen! We study how attention heads work with MLP neurons to predict the next token. We find a set of interpretable activity patterns. More in the thread!

1 reply · 3 reposts · 19 likes · 3.2K views
Clement Neo@_clementneo·
@akbirthko Do you think that internet-scale video pretraining would be an equally good prior? (recently I’ve been thinking about whether the most intelligent models for the next decade will continue to be those with majority text pretraining or not)
1 reply · 0 reposts · 1 like · 78 views
karthik@akbirthko·
"text is the universal interface" is empirically not true. the purpose of most wrappers is handling perception bottlenecks. "internet-scale text pretraining is a very good prior" is a more accurate statement
2 replies · 0 reposts · 19 likes · 503 views
Clement Neo@_clementneo·
If you have any thoughts about the paper, or have any cool ideas you think are worth exploring in VLMs, do reach out to me in the DMs or replies! I’m currently thinking of pursuing a PhD, and I’ll be exploring further research on multimodality over the coming months.
0 replies · 1 repost · 6 likes · 452 views
Clement Neo@_clementneo·
Also shoutout to this parallel work, which seems to have similar results for the logit lens! I think VLM interpretability is just starting to take off, and understanding multimodality is going to be important for the field. x.com/nickhjiang/sta…
Nick Jiang@nickhjiang

🔥 Paper Drop 🔥 What can we understand by peering inside vision-language models (VLMs) like LLaVA? We show that image representations inside VLMs can be directly interpreted and edited in the language space, and we apply our findings to mitigate hallucinations!

1 reply · 0 reposts · 7 likes · 814 views
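The logit lens mentioned above is a standard interpretability technique: project each layer's hidden state through the model's unembedding matrix to see which token the model would predict if decoding stopped at that layer. A minimal sketch with random stand-in weights (not a real VLM; shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 32, 100
unembed = rng.normal(size=(d_model, vocab))            # output head W_U (stand-in)
layers = [rng.normal(size=d_model) for _ in range(8)]  # one hidden state per layer

def logit_lens(hidden, W_U):
    """Token id the model would predict if decoding stopped at this layer."""
    return int(np.argmax(hidden @ W_U))

# Trajectory of the representation's nearest token across layers: for a VLM,
# applying this to image-patch positions shows how object representations
# develop through the layers, the kind of analysis both papers perform.
trajectory = [logit_lens(h, unembed) for h in layers]
print(trajectory)
```

With real weights, watching where in this trajectory an image token starts decoding to the object's name is what makes the technique useful for VLM interpretability.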