Mihai Surdeanu
@msurd
2.8K posts
working on #nlproc at University of Arizona
Joined November 2011
167 Following · 1.3K Followers
Mihai Surdeanu reposted
Ellen Riloff @EllenRiloff
The Computer Science Department at U.Arizona is looking to hire multiple tenure-track and multiple teaching faculty this year. If you are searching for a faculty position and like sunshine, consider applying! ☀️🌵 cs.arizona.edu/currently-open…
Replies: 0 · Reposts: 2 · Likes: 2 · Views: 798
Mihai Surdeanu @msurd
Just an average sunset in Tucson AZ:
[image attached]
Replies: 0 · Reposts: 0 · Likes: 7 · Views: 516
Mihai Surdeanu @msurd
"dorction" - a new and important word invented by ChatGPT:
[image attached]
Replies: 0 · Reposts: 0 · Likes: 2 · Views: 191
Mihai Surdeanu reposted
Dan Jurafsky @jurafsky
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/slp3/
Replies: 9 · Reposts: 70 · Likes: 400 · Views: 34.7K
Mihai Surdeanu @msurd
An important new EMNLP paper from our lab, with several great co-authors :)
Minglai Yang @Yminglai

Our paper was accepted at EMNLP 2025 Main! 🎉 @emnlpmeeting
“How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark”
👉 arxiv.org/abs/2505.18761
📌 We introduce GSM-DC: a controlled benchmark for reasoning under Irrelevant Context (IC). We systematically vary reasoning depth and IC level via a knowledge DAG to study LLM reasoning behavior under distractions, not just accuracy. 🧭
👥 Huge thanks to my awesome team: @_ethan_huang @LiangZhang4825 @msurd @WilliamWangNLP @PanLiangming

Replies: 0 · Reposts: 0 · Likes: 7 · Views: 452
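The GSM-DC setup described above can be caricatured in a few lines. This is a toy sketch under my own assumptions (the function name, the chain-shaped graph, and the distractor format are all invented here), not the paper's actual generator; in particular, the real benchmark uses a knowledge DAG, while this sketch degenerates to a simple chain:

```python
import random

def build_gsm_dc_style_problem(depth, n_distractors, seed=0):
    """Toy generator loosely mimicking the GSM-DC idea: a chain of
    `depth` required arithmetic steps plus irrelevant distractor facts,
    so reasoning depth and irrelevant-context level vary independently."""
    rng = random.Random(seed)
    value = rng.randint(1, 5)
    facts = [f"x0 = {value}"]
    for i in range(1, depth + 1):
        inc = rng.randint(1, 5)
        value += inc
        facts.append(f"x{i} = x{i-1} + {inc}")
    for j in range(n_distractors):
        # Irrelevant context: facts that never feed into x{depth}
        facts.append(f"y{j} = {rng.randint(1, 9)}")
    rng.shuffle(facts)
    return facts, f"What is x{depth}?", value
```

Scaling `depth` stresses multi-step reasoning; scaling `n_distractors` stresses robustness to irrelevant context, which is the axis the paper isolates.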
Mihai Surdeanu reposted
Matt Pocock @mattpocockuk
This is actually a really solid context engineering template. Kudos, @AnthropicAI
[image attached]
Replies: 63 · Reposts: 615 · Likes: 7.9K · Views: 909K
Mihai Surdeanu @msurd
I am truly humbled to receive this award. It represents everything I stand for. I consider it the apex of my career.
[image attached]
Replies: 3 · Reposts: 0 · Likes: 29 · Views: 529
Mihai Surdeanu reposted
Alisa Liu @alisawuffles
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
[image attached]
Replies: 93 · Reposts: 326 · Likes: 2.8K · Views: 368.7K
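A toy sketch of the superword idea (my own simplification, not the actual SuperBPE training procedure): if the merge table is allowed to contain pairs of whole words, frequent multi-word strings collapse into single tokens, shortening sequences. The merge table below is hand-picked for illustration, not learned:

```python
def apply_merges(tokens, merges):
    """Greedily merge adjacent token pairs listed in `merges` until no
    merge applies (a toy stand-in for BPE-style tokenizer inference)."""
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in merges:
                out.append(tokens[i] + " " + tokens[i + 1])
                i += 2
                changed = True
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Baseline (toy "BPE"): one token per word -> 6 tokens.
words = "in the end it was fine".split()
# Superword merges let token boundaries cross whitespace.
superword_merges = {("in", "the"), ("in the", "end")}
```

Here `apply_merges(words, superword_merges)` yields `["in the end", "it", "was", "fine"]`: 4 tokens instead of 6, which is the kind of sequence-length saving behind the inference-efficiency claim in the tweet.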
Mihai Surdeanu @msurd
Our new paper in Findings of NAACL 2025, with Vlad Negru, @robert_nlp, @CameliaLemnaru, and Rodica Potolea, proposes a new, softer take on Natural Logic, where alignment is generated through text morphing. This yields robust cross-domain performance. arxiv.org/abs/2502.09567
Replies: 0 · Reposts: 5 · Likes: 24 · Views: 5.3K
Mihai Surdeanu reposted
Andrew Ng @AndrewYNg
Using AI-assisted coding to build software prototypes is an important way to quickly explore many ideas and invent new things. In this and future posts, I'd like to share with you some best practices for prototyping simple web apps. This post will focus on one idea: being opinionated about the software stack.

The software stack I personally use changes every few weeks. There are many good alternatives to these choices, and if you pick a preferred software stack and become familiar with its components, you'll be able to develop more quickly. But as an illustration, here's my current default:

- Python with FastAPI for building web-hosted APIs: I develop primarily in Python, so that's a natural choice for me. If you're a JavaScript/TypeScript developer, you'll likely make a different choice. I've found FastAPI really easy to use and scalable for deploying web services (APIs) hosted in Python.

- Uvicorn to run the backend application server (to execute code and serve web pages) for local testing on my laptop.

- If deploying on the cloud, then either Heroku for small apps or AWS Elastic Beanstalk for larger ones (disclosure: I serve on Amazon's board of directors). There are many services for deploying jobs, including HuggingFace Spaces, Railway, Google's Firebase, Vercel, and others. Many of these work fine, and becoming familiar with just 1 or 2 will simplify your development process.

- MongoDB for the NoSQL database: While traditional SQL databases are amazing feats of engineering that result in highly efficient and reliable data storage, the need to define the database structure (or schema) up front slows down prototyping. If you really need speed and ease of implementation, then dumping most of your data into a NoSQL (unstructured or semi-structured) database such as MongoDB lets you write code quickly and sort out later exactly what you want to do with the data. This is sometimes called schema-on-read, as opposed to schema-on-write. Mind you, if an application goes to scaled production, there are many use cases where a more structured SQL database is significantly more reliable and scalable.

- OpenAI's o1 and Anthropic's Claude 3.5 Sonnet for coding assistance, often by prompting directly (when operating at the conceptual/design level), and occasionally Cursor (when operating at the code level). I hope never to have to code again without AI assistance! Claude 3.5 Sonnet is widely regarded as one of the best coding models, and o1 is incredible at planning and building more complex software modules, though you do have to learn to prompt it differently.

On top of all this, of course, I use many AI tools to manage agentic workflows, data ingestion, retrieval-augmented generation, and so on. DeepLearning.AI and our wonderful partners offer courses on many of these tools.

My personal software stack continues to evolve regularly. Components enter or fall out of my default stack every few weeks as I learn new ways to do things. So please don't feel obliged to use the components I do, but perhaps some of them can be a helpful starting point if you are still deciding what to use.

Interestingly, I have found most LLMs not very good at recommending a software stack. I suspect their training sets include too much "hype" around specific choices, so I don't fully trust them to tell me what to use. And if you can be opinionated and give your LLM directions on the software stack you want it to build on, I think you'll get better results.

A lot of the software stack is still maturing, and I think many of these components will continue to improve. With my stack, I regularly build prototypes in hours that, without AI assistance, would have taken me days or longer. I hope you, too, will have fun building many prototypes!

[Original text: deeplearning.ai/the-batch/issu… ]
Replies: 119 · Reposts: 446 · Likes: 3.1K · Views: 293.2K
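Ng's schema-on-read point can be illustrated without MongoDB at all. In this stdlib-only sketch (record shapes and the helper name are invented here, and a plain list of dicts stands in for a NoSQL collection), heterogeneous records are dumped as-is and structure is imposed only at read time:

```python
# Schema-on-read, stdlib only: "dump now, structure later".
# A list of dicts stands in for a NoSQL collection like MongoDB.
collection = [
    {"user": "ana", "score": 91},
    {"user": "bo", "points": "77"},  # different field name, string value
    {"user": "cy"},                  # field missing entirely
]

def read_score(record, default=0):
    """Impose structure at read time: accept either field name,
    coerce to int, and fall back to a default if the field is absent."""
    raw = record.get("score", record.get("points", default))
    return int(raw)

scores = [read_score(r) for r in collection]  # [91, 77, 0]
```

This mirrors "sort out later exactly what you want to do with the data"; as Ng notes, at scaled production you would likely migrate to an explicit schema instead of normalizing on every read.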
Mihai Surdeanu reposted
Firoj Alam @firojalam04
🚀 Registration for CLEF 2025 Labs is NOW OPEN! Don't miss your chance to participate in this year's CheckThat! Lab, where we tackle some of the most critical challenges in fact-checking and information verification.

🔥 Why Join CheckThat! Lab? This year, we bring you four cutting-edge tasks designed to advance the boundaries of Natural Language Processing and Multilingual Fact-Checking:

🔍 Task 1: Subjectivity. Detect subjective text and pave the way for a refined fact-checking pipeline. 🌍 Languages: Arabic, English, Bulgarian, German, Italian, and Multilingual

✏️ Task 2: Claims Extraction & Normalization. Simplify and normalize social media claims across 20 languages! 🌍 Languages include: English, Arabic, Hindi, Spanish, Thai, and more

📊 Task 3: Fact-Checking Numerical Claims. Verify numerical claims. 🌍 Languages: Arabic, English, Spanish

🔬 Task 4: Scientific Web Discourse. Classify online scientific discourse and retrieve the mentioned paper from a pool of candidate papers. 🌍 Language: English

🎓 Who Should Join? Researchers, students, and professionals in NLP, AI, and fact-checking eager to make an impact.

👉 Register Now: …ef2025-labs-registration.dei.unipd.it
👉 Learn More: checkthat.gitlab.io
👉 Access Data & Code: gitlab.com/checkthat_lab/…

🗓️ Key Dates to Remember:
November 2024: Registration opens
December 2024: Training materials released
April–May 2025: Evaluation cycle
[image attached]
Replies: 0 · Reposts: 2 · Likes: 2 · Views: 483
Mihai Surdeanu @msurd
Sunsets in Arizona are something else (no Photoshop):
[image attached]
Replies: 2 · Reposts: 1 · Likes: 31 · Views: 604
Mihai Surdeanu reposted
(((ل()(ل() 'yoav))))👾
and it is like that with any sufficiently challenging NLP task. LLMs are way better than before, but not perfect, and cannot really be improved in an interesting way. frustrating. but we should focus on *other* opportunities they bring, not the old tasks in which we are stuck.
Replies: 5 · Reposts: 3 · Likes: 57 · Views: 3.2K
Mihai Surdeanu @msurd
I just finished teaching an introduction to deep learning course based on our textbook. All content (book, slides, code) is available here: clulab.org/gentlenlp/
Replies: 0 · Reposts: 1 · Likes: 16 · Views: 667
Mihai Surdeanu reposted
Graham Neubig @gneubig
Check out our new benchmark on Evaluating LMs as Synthetic Data Generators! Main findings:
- LMs' ability to generate synthetic data varies
- This is not necessarily correlated with problem-solving ability
- More data from cheaper models is often better than less data from stronger ones
Seungone Kim @seungonekim

#NLProc Just because GPT-4o is 17 times more expensive than GPT-4o-mini, does that mean it generates synthetic data 17 times better? Introducing AgoraBench, a benchmark for evaluating the data generation capabilities of LMs.

Replies: 0 · Reposts: 27 · Likes: 132 · Views: 9.6K