Sebastian Nehrdich

2.1K posts

@SebastianNehrd2

東北大学 助教 Assistant professor, Tohoku University. Also in charge of Dharmamitra in collaboration with BAIR, UC Berkeley. Research in ancient Asian languages.

Sendai, Japan · Joined November 2020
1.8K Following · 7.4K Followers
Pinned Tweet
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
Thrilled to share that I will be joining Tohoku University in Sendai, Japan, as a tenure-track assistant professor from autumn this year. Happy to work on AI for Sanskrit, Chinese, Tibetan, Pali, and Japanese, and to educate a new generation of students who can work with these tools.
English
43
45
750
35.6K
Sebastian Nehrdich retweeted
Dharmamitra
Dharmamitra@dharmamitra_ucb·
Coming very soon: MITRA Explore will make it possible to ask more open questions and get answers based on the powerful retrieval capabilities of Dharmamitra!
English
3
3
22
1.5K
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
One recent observation by a colleague: “cracking complex Sanskrit sentences, understanding their grammar and translating them used to be a solitary joy. AI tools like dharmamitra make it less solitary, and less joyful.”
English
4
2
42
2.6K
Sebastian Nehrdich retweeted
Dharmamitra
Dharmamitra@dharmamitra_ucb·
We are happy to announce that Dharmamitra now features a board of advisors. They will advise on the kind of data that Dharmamitra includes, on the functionality and design of our applications, and on making sure that we keep providing tools and utility of the highest quality.
English
2
4
9
817
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
A bit late to the party, but this is a great paper, and I am very happy to see that ByT5-Sanskrit is indeed a versatile model that adapts well and outperforms much larger LLMs in task-specific settings for Sanskrit!
Manoj Balaji@manojbalaji1

🧨 Think giant LLMs can do everything? Sanskrit poetry just put them on notice: a small, task-specific model beats instruction-tuned LLMs at converting verse → canonical prose. Curious? Read on. 1/n #AACL #AACLIJCNLP #AACLIJCNLP2025 #ACL #NLP #Sanskrit

English
1
0
15
693
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
This is sooo avoidable! There is zero intellectual effort in looking up the papers and citing them properly; one can expect at least that much effort, right?
English
1
0
7
313
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
I am reviewing for various computer science conferences these days, and I have had to reject roughly 40% of the papers I have looked at so far purely because their bibliographies are filled with hallucinations.
English
2
0
13
423
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
2. Mitrasamgraha: A Comprehensive Classical Sanskrit Machine Translation Dataset arxiv.org/abs/2601.07314 A large dataset of parallel sentence pairs for Classical/Vedic Sanskrit to English, covering multiple domains and time spans. Useful for machine translation!
English
0
1
8
407
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
Two preprints of the Dharmamitra project: 1. MITRA: arxiv.org/abs/2601.06400 This paper describes our large multilingual parallel dataset release, the machine translation model, and our retrieval system. 1/
English
1
5
13
1.3K
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
Deep learning can of course be very helpful when it comes to the creation of resources such as annotated corpora, treebanks, etc. That's where LLMs, POS taggers, etc. can come in extremely handy. I hope we will see more of this in 'mainstream Indology' and Buddhist Studies.
English
1
0
7
160
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
Linguistically transparent dependency treebanks that enable the application of statistical modeling (think Bayes) are very different, because traceable features can be defined and developments can be modeled. 5/
English
1
0
4
161
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
An interesting observation is that people who work on 'the periphery' of Indology/Buddhist Studies (Vedic Sanskrit, Tocharian, Khotanese, etc.) are usually also much more likely to be trained in linguistics and to apply proper linguistic tooling. 1/
English
1
1
8
324
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
Exciting NLP doesn't necessarily require hardware that costs more than the yearly salary of a postdoc, if you have good data and the right ideas. Throwing more compute at a problem often helps, but sometimes going the opposite direction can lead to very elegant solutions.
English
1
0
5
269
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
It took some years for us to make transformer-based implementations truly competitive (i.e. aclanthology.org/2024.findings-…), but the 2018 model has kept the edge even during the LLM explosion of recent years. 2/
English
1
0
5
266
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
It still amazes me that the data-driven Sanskrit word segmentation model that Oliver Hellwig created in 2018, with a small contribution from me (aclanthology.org/D18-1295/), was trained and evaluated entirely on CPU within a couple of hours! 1/
English
1
1
8
315
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
At this year's SAT 令和大蔵経 event in Tokyo!
Japanese
0
0
3
322
Sebastian Nehrdich
Sebastian Nehrdich@SebastianNehrd2·
I gave my first public presentation here at Tohoku University yesterday! And yesterday we had the first snow of the year in Sendai. I am happy to be back in a country that has proper seasons…
English
0
0
4
365