Nahfid Nissar

71 posts

@NahfidN

Joined October 2025
445 Following · 14 Followers
Pinned Tweet
Nahfid Nissar
Nahfid Nissar@NahfidN·
you know, we keep talking about how ai needs more representation for low-resource languages. but kashmiri, with all that incredible literature, centuries of history, was pretty much invisible to modern ocr systems. it bothered us. so we decided to stop talking about it and build something. today, we're open-sourcing koshur pixel. it’s the largest synthetic ocr dataset for the kashmiri language: 613,000 image-text pairs covering words, sentences, even whole pages. really grateful to @Haq_Nawaz_Malik and @FaizanIqbal__52. couldn't have done this without them. paper: arxiv.org/abs/2606.23144 dataset: huggingface.co/datasets/Omarr…
English
1
3
6
213
Yash Pandey
Yash Pandey@_yashhx0·
looking to connect with people on X if you're into
- building SaaS
- GenAI
- system design
- AI tools
- shipping in public
- figuring it out as you go
- AIML research
- Dev Tools
say Hi or drop what you're working on
looking to connect with active ones 🫡
English
24
0
15
758
Nahfid Nissar
Nahfid Nissar@NahfidN·
@sevanlp I am working on opensource kashmiri language datasets. we just released a 613k ocr dataset.
English
0
0
0
4
Sevan Lewis-Payne
Sevan Lewis-Payne@sevanlp·
Trying to meet more people who are building software. Founders, devs, students, AI builders, indie hackers, product people: What are you working on? I’m building in public, learning fast, and trying to make my feed less noise and more real builders shipping useful things.
English
45
3
47
1.7K
Nahfid Nissar
Nahfid Nissar@NahfidN·
@TheSuperEng 3b1b is goated. for basic problem solving i would like to add the organic chemistry tutor and khan academy.
English
1
0
1
84
Shubh
Shubh@TheSuperEng·
Top of the line INTERNET RESOURCES for studying machine learning mathematics:
➜ Distill
➜ Setosa .io
➜ 3Blue1Brown
➜ Explained .ai
➜ CNN Explainer
➜ ImmersiveMath .com
➜ MLU-Explain by AWS
➜ TensorFlow Playground
➜ Seeing Theory by Brown
➜ StatQuest by Josh Starmer
➜ Neural Networks and Deep Learning
English
4
16
119
3K
Moummar
Moummar@MoummarNawafleh·
claude scheduled tasks are so underrated. we have our own apps, granola, email, slack and docs connected to it and have built some cool tasks that speed up our matching
English
2
1
16
1.4K
Nahfid Nissar
Nahfid Nissar@NahfidN·
i build the boring parts that make agents work.
everyone wants to talk about the model. i spend my time on the plumbing around it. the part that decides whether an agent actually ships or just demos well.
at nbyula that's meant a bunch of things: a harness with self-correcting loops, an rlhf annotation and eval pipeline, a browser agent that works through long multi-step forms and recovers from its own mistakes, a moderation system where a fine-tuned model gets second-guessed by an llm judge. messaging agents, voice, the unglamorous infra underneath all of it.
somewhere along the way i stopped seeing engineering as the support role for research. if you can't build the eval and the data pipeline yourself, your good ideas just sit in a queue waiting for someone who can. building is the research.
on the side i write: two ieee papers on cloud security, and i run a blog that emails me a digest of new llm research every morning, fully on autopilot.
i finally put everything in one place. there's even an ai on the site that'll answer questions about me, so you can interrogate it.
nahfid.vercel.app
English
1
1
1
37
Nahfid Nissar
Nahfid Nissar@NahfidN·
@oneill_c You're probably the first one I have seen explaining it this well
English
0
0
0
15
Charlie O'Neill
Charlie O'Neill@oneill_c·
1/ We fine-tune a lot of customer models, so we decided to systematically try and figure out some best practices for finetuning. SFT isn't sexy, but it's still important. We vary one SFT lever at a time across 2 model families, dense + MoE to 235B, on 4 real-world customer datasets. What makes this clean is that each dataset is paired with an eval that took weeks to build with the customer, and the training outputs were generated to pass that eval. So the supervised target and the thing we measure downstream are the same criterion, which strips out the usual confounders
English
21
72
709
147.7K
Ishaan
Ishaan@ishaangodhaa·
if you understand harness and inference in the AI space, i want to talk to you. we're working on something ambitious. not hiring. just looking for the right people to brainstorm with. DM me or drop a comment.
English
27
0
57
5.4K
Aritra
Aritra@ariG23498·
[HF ML Club India] We are pleased to announce the IRL event in Bengaluru. We are partnering with @GoogleDeepMind and this takes place on the 15th of July 2026 (more details below). We will have someone from the GDM team talk about Gemma 4, and an hour long build session with Gemma and @huggingface. Great opportunity for folks to network and build. PS: We will not entertain any "where is the registration link" query. Seek and ye shall find.
English
32
12
303
22.7K
shyamal
shyamal@shyamalanadkat·
after close to four years at @openai, i moved from the bay area to india earlier this year. i still believe deeply in ensuring true superintelligence accelerates science and remains accessible and beneficial to all. having grown up here, i've also always felt deeply connected to the ecosystem here. over the past several weeks, i've been speaking with researchers, engineers, and thinkers across india and apac. it's become clear that there are many who want to build the future from here. moving back felt like the counterintuitive choice. i no longer think that's true. what's been missing is the belief that you can build institutions of global consequence from anywhere. and more importantly, the ambition and the will to pursue ideas that seem impossibly large at first. this may be a once in a generation opportunity. more to come soon. DMs open if this resonates.
English
314
352
4.9K
584.1K
juggernaut
juggernaut@curlysaarthak·
"..sit down and turn all your work in goals/loops, connect your phone with codex or dispatch/code or droid computer. Go outside and enjoy the nature...." a strategy when you have infinite compute at your disposal. who even has 8x B200 to do research?? I still remember how hard it was to get any budget on GNN research work I did for a startup. I had to prompt the founder multiple times to get even ~$10 worth of compute for which I would use 4x3090 for 2.5hrs with lightning to train the models on distributed gpus. whole research took ~$40 and I trained 3 GNN models on a novel dataset, novel task and an architecture I came up with by reading multiple research papers/trying different combinations. Yes the startup was broke but I learned to do research by being resourceful.
English
4
1
54
1.5K
bilal
bilal@bilaltwovec·
@ekzhang1 what if u team up w 7 of my friends to rent a full node and then we can batch requests and then.. oh
English
3
1
52
12.5K
Eric Zhang
Eric Zhang@ekzhang1·
$200/month is enough to buy an H100 GPU for 6 hours every workday
English
119
67
2.6K
552.9K
Nahfid Nissar retweeted
Zulqarnain Ibn Usuf | ذوالقرنین ابن یوسف
The researchers have “publicly released the model, dataset and source code to encourage further research and development for the Kashmiri language ecosystem.”
ETV Bharat Jammu & Kashmir@ETVBharatJK

The researchers, @Haq_Nawaz_Malik, @NahfidN and @FaizanIqbal__52, have unveiled Koshur Diacritizer to add missing diacritics, the small marks used in Kashmiri script that help determine pronunciation and meaning. #Kashmir etvbharat.com/en/state/three…

English
0
1
2
105
Nahfid Nissar retweeted
HAQ NAWAZ MALIK
HAQ NAWAZ MALIK@Haq_Nawaz_Malik·
"LLMs fail at Kashmiri because they’ve barely seen real data." For decades, authentic literature was stuck in **InPage**, a non-Unicode format AI can’t read. I built a converter that unlocked **3.1M words** for Kashmiri LLM training. Read More: haq-nawaz-malik.github.io/hnmblogs/#kash
English
1
2
3
89
Nahfid Nissar
Nahfid Nissar@NahfidN·
kashmiri text needs diacritics for meaning and pronunciation, but digital platforms often just drop them. it's a huge problem for the language. we just open-sourced koshur diacritizer, an ai model to restore them automatically. feels like a crucial step for the community. big thanks to @Haq_Nawaz_Malik and @FaizanIqbal__52. Paper link: arxiv.org/pdf/2606.15883…
English
1
4
5
116
Nahfid Nissar
Nahfid Nissar@NahfidN·
i've been thinking about kashmiri llms lately. it's kinda wild how much effort goes into english or even spanish models, all that data ready to be scraped. then you look at something like kashmiri, and the data just isn't there in the same way. makes building anything useful a completely different ballgame. it's not just scaling bigger, it's about starting from scratch in a lot of places. feels like a huge mountain to climb.
English
1
0
2
48
Prakash Kagitha @ ACL 2026
Prakash Kagitha @ ACL 2026@prakashkagitha·
Small language models with test-time scaling (N calls) beat larger models, but not all test-time scaling harnesses are the same! We evolve test-time scaling harness and prompts with @gepa_ai and 8x Haiku beats Opus by ~20%.
English
4
9
41
2.9K