Mihai Surdeanu
@msurd
2.8K posts
working on #nlproc at University of Arizona
Joined November 2011
167 Following · 1.3K Followers
Mihai Surdeanu reposted
Ellen Riloff @EllenRiloff
The Computer Science Department at U.Arizona is looking to hire multiple tenure-track and multiple teaching faculty this year. If you are searching for a faculty position and like sunshine, consider applying! ☀️🌵 cs.arizona.edu/currently-open…
Replies: 0 · Reposts: 2 · Likes: 2 · Views: 798
Mihai Surdeanu @msurd
Just an average sunset in Tucson AZ:
[image attached]
Replies: 0 · Reposts: 0 · Likes: 7 · Views: 516
Mihai Surdeanu @msurd
"dorction" - a new and important word invented by ChatGPT:
[image attached]
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 191
Mihai Surdeanu reposted
Dan Jurafsky @jurafsky
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/slp3/
Replies: 9 · Reposts: 70 · Likes: 400 · Views: 34.7K
Mihai Surdeanu @msurd
An important new EMNLP paper from our lab, with several great co-authors :)
Minglai Yang @Yminglai

Our paper was accepted at EMNLP 2025 Main! 🎉 @emnlpmeeting
“How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark”
👉 arxiv.org/abs/2505.18761
📌 We introduce GSM-DC: a controlled benchmark for reasoning under Irrelevant Context (IC). We systematically vary reasoning depth and IC level via a knowledge DAG to study LLM reasoning behavior under distractions, not just accuracy. 🧭
👥 Huge thanks to my awesome team: @_ethan_huang @LiangZhang4825 @msurd @WilliamWangNLP @PanLiangming

Replies: 0 · Reposts: 0 · Likes: 7 · Views: 452
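The GSM-DC setup described above can be caricatured in a few lines. This is a toy sketch under my own assumptions (the function name, the chain-shaped graph, and the distractor format are all invented here), not the paper's actual generator; in particular, the real benchmark uses a knowledge DAG, while this sketch degenerates to a simple chain:

```python
import random

def build_gsm_dc_style_problem(depth, n_distractors, seed=0):
    """Toy generator loosely mimicking the GSM-DC idea: a chain of
    `depth` required arithmetic steps plus irrelevant distractor facts,
    so reasoning depth and irrelevant-context level vary independently."""
    rng = random.Random(seed)
    value = rng.randint(1, 5)
    facts = [f"x0 = {value}"]
    for i in range(1, depth + 1):
        inc = rng.randint(1, 5)
        value += inc
        facts.append(f"x{i} = x{i-1} + {inc}")
    for j in range(n_distractors):
        # Irrelevant context: facts that never feed into x{depth}
        facts.append(f"y{j} = {rng.randint(1, 9)}")
    rng.shuffle(facts)
    return facts, f"What is x{depth}?", value
```

Scaling `depth` stresses multi-step reasoning; scaling `n_distractors` stresses robustness to irrelevant context, which is the axis the paper isolates.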
Mihai Surdeanu reposted
Matt Pocock @mattpocockuk
This is actually a really solid context engineering template. Kudos, @AnthropicAI
[image attached]
Replies: 63 · Reposts: 615 · Likes: 7.9K · Views: 909K
Mihai Surdeanu @msurd
I am truly humbled to receive this award. It represents everything I stand for. I consider it the apex of my career.
[image attached]
Replies: 3 · Reposts: 0 · Likes: 29 · Views: 529
Mihai Surdeanu reposted
Alisa Liu @alisawuffles
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
[image attached]
Replies: 93 · Reposts: 326 · Likes: 2.8K · Views: 368.7K
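A toy sketch of the superword idea (my own simplification, not the actual SuperBPE training procedure): if the merge table is allowed to contain pairs of whole words, frequent multi-word strings collapse into single tokens, shortening sequences. The merge table below is hand-picked for illustration, not learned:

```python
def apply_merges(tokens, merges):
    """Greedily merge adjacent token pairs listed in `merges` until no
    merge applies (a toy stand-in for BPE-style tokenizer inference)."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in merges:
                out.append(tokens[i] + " " + tokens[i + 1])
                i += 2
                changed = True
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Baseline (toy "BPE"): one token per word -> 6 tokens.
words = "in the end it was fine".split()
# Superword merges let token boundaries cross whitespace.
superword_merges = {("in", "the"), ("in the", "end")}
```

Here `apply_merges(words, superword_merges)` yields `["in the end", "it", "was", "fine"]`: 4 tokens instead of 6, which is the kind of sequence-length saving behind the inference-efficiency claim in the tweet.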
Mihai Surdeanu @msurd
Our new paper in Findings of NAACL 2025, with Vlad Negru, @robert_nlp, @CameliaLemnaru, and Rodica Potolea, proposes a new, softer take on Natural Logic, where alignment is generated through text morphing. This yields robust cross-domain performance. arxiv.org/abs/2502.09567
Replies: 0 · Reposts: 5 · Likes: 24 · Views: 5.3K
Mihai Surdeanu reposted
Andrew Ng @AndrewYNg
Using AI-assisted coding to build software prototypes is an important way to quickly explore many ideas and invent new things. In this and future posts, I'd like to share with you some best practices for prototyping simple web apps. This post will focus on one idea: being opinionated about the software stack.

The software stack I personally use changes every few weeks. There are many good alternatives to these choices, and if you pick a preferred software stack and become familiar with its components, you'll be able to develop more quickly. But as an illustration, here's my current default:

- Python with FastAPI for building web-hosted APIs: I develop primarily in Python, so that's a natural choice for me. If you're a JavaScript/TypeScript developer, you'll likely make a different choice. I've found FastAPI really easy to use and scalable for deploying web services (APIs) hosted in Python.

- Uvicorn to run the backend application server (to execute code and serve web pages) for local testing on my laptop.

- If deploying on the cloud, then either Heroku for small apps or AWS Elastic Beanstalk for larger ones (disclosure: I serve on Amazon's board of directors). There are many services for deploying jobs, including HuggingFace Spaces, Railway, Google's Firebase, Vercel, and others. Many of these work fine, and becoming familiar with just 1 or 2 will simplify your development process.

- MongoDB for the NoSQL database: While traditional SQL databases are amazing feats of engineering that result in highly efficient and reliable data storage, the need to define the database structure (or schema) up front slows down prototyping. If you really need speed and ease of implementation, then dumping most of your data into a NoSQL (unstructured or semi-structured) database such as MongoDB lets you write code quickly and sort out later exactly what you want to do with the data. This is sometimes called schema-on-read, as opposed to schema-on-write. Mind you, if an application goes to scaled production, there are many use cases where a more structured SQL database is significantly more reliable and scalable.

- OpenAI's o1 and Anthropic's Claude 3.5 Sonnet for coding assistance, often by prompting directly (when operating at the conceptual/design level), and occasionally Cursor (when operating at the code level). I hope never to have to code again without AI assistance! Claude 3.5 Sonnet is widely regarded as one of the best coding models, and o1 is incredible at planning and building more complex software modules, though you do have to learn to prompt it differently.

On top of all this, of course, I use many AI tools to manage agentic workflows, data ingestion, retrieval-augmented generation, and so on. DeepLearning.AI and our wonderful partners offer courses on many of these tools.

My personal software stack continues to evolve regularly. Components enter or fall out of my default stack every few weeks as I learn new ways to do things. So please don't feel obliged to use the components I do, but perhaps some of them can be a helpful starting point if you are still deciding what to use.

Interestingly, I have found most LLMs not very good at recommending a software stack. I suspect their training sets include too much "hype" around specific choices, so I don't fully trust them to tell me what to use. And if you can be opinionated and give your LLM directions on the software stack you want it to build on, I think you'll get better results.

A lot of the software stack is still maturing, and I think many of these components will continue to improve. With my stack, I regularly build prototypes in hours that, without AI assistance, would have taken me days or longer. I hope you, too, will have fun building many prototypes!

[Original text: deeplearning.ai/the-batch/issu… ]
Replies: 119 · Reposts: 446 · Likes: 3.1K · Views: 293.2K
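Ng's schema-on-read point can be illustrated without MongoDB at all. In this stdlib-only sketch (record shapes and the helper name are invented here, and a plain list of dicts stands in for a NoSQL collection), heterogeneous records are dumped as-is and structure is imposed only at read time:

```python
# Schema-on-read, stdlib only: "dump now, structure later".
# A list of dicts stands in for a NoSQL collection like MongoDB.
collection = [
    {"user": "ana", "score": 91},
    {"user": "bo", "points": "77"},  # different field name, string value
    {"user": "cy"},                  # field missing entirely
]

def read_score(record, default=0):
    """Impose structure at read time: accept either field name,
    coerce to int, and fall back to a default if the field is absent."""
    raw = record.get("score", record.get("points", default))
    return int(raw)

scores = [read_score(r) for r in collection]  # [91, 77, 0]
```

This mirrors "sort out later exactly what you want to do with the data"; as Ng notes, at scaled production you would likely migrate to an explicit schema instead of normalizing on every read.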
Mihai Surdeanu reposted
Firoj Alam @firojalam04
🚀 Registration for CLEF 2025 Labs is NOW OPEN! Don't miss your chance to participate in this year's CheckThat! Lab, where we tackle some of the most critical challenges in fact-checking and information verification.

🔥 Why Join CheckThat! Lab? This year, we bring you four cutting-edge tasks designed to advance the boundaries of Natural Language Processing and Multilingual Fact-Checking:

🔍 Task 1: Subjectivity. Detect subjective text and pave the way for a refined fact-checking pipeline. 🌍 Languages: Arabic, English, Bulgarian, German, Italian, and Multilingual

✏️ Task 2: Claims Extraction & Normalization. Simplify and normalize social media claims across 20 languages! 🌍 Languages include: English, Arabic, Hindi, Spanish, Thai, and more

📊 Task 3: Fact-Checking Numerical Claims. Verify numerical claims. 🌍 Languages: Arabic, English, Spanish

🔬 Task 4: Scientific Web Discourse. Classify online scientific discourse and retrieve the mentioned paper from a pool of candidate papers. 🌍 Language: English

🎓 Who Should Join? Researchers, students, and professionals in NLP, AI, and fact-checking eager to make an impact.

👉 Register Now: …ef2025-labs-registration.dei.unipd.it
👉 Learn More: checkthat.gitlab.io
👉 Access Data & Code: gitlab.com/checkthat_lab/…

🗓️ Key Dates to Remember:
November 2024: Registration opens
December 2024: Training materials released
April–May 2025: Evaluation cycle
[image attached]
Replies: 0 · Reposts: 2 · Likes: 2 · Views: 483
Mihai Surdeanu @msurd
Sunsets in Arizona are something else (no Photoshop):
[image attached]
Replies: 2 · Reposts: 1 · Likes: 31 · Views: 604
Mihai Surdeanu reposted
(((ل()(ل() 'yoav))))👾
and it is like that with any sufficiently challenging NLP task. LLMs are way better than before, but not perfect, and cannot really be improved in an interesting way. frustrating. but we should focus on *other* opportunities they bring, not the old tasks in which we are stuck.
Replies: 5 · Reposts: 3 · Likes: 57 · Views: 3.2K
Mihai Surdeanu @msurd
I just finished teaching an introduction to deep learning course based on our textbook. All content (book, slides, code) is available here: clulab.org/gentlenlp/
Replies: 0 · Reposts: 1 · Likes: 16 · Views: 667
Mihai Surdeanu reposted
Graham Neubig @gneubig
Check out our new benchmark on Evaluating LMs as Synthetic Data Generators! Main findings:
- LMs' ability to generate synthetic data varies
- This is not necessarily correlated with problem-solving ability
- More data from cheaper models is often better than less data from stronger ones
Seungone Kim @seungonekim

#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.

Replies: 0 · Reposts: 27 · Likes: 132 · Views: 9.6K