Nahfid Nissar

71 posts

@NahfidN

Joined October 2025

445 Following · 14 Followers

Pinned Tweet

Nahfid Nissar@NahfidN·7h

you know, we keep talking about how ai needs more representation for low-resource languages. but kashmiri, with all that incredible literature, centuries of history, was pretty much invisible to modern ocr systems. it bothered us. so we decided to stop talking about it and build something. today, we're open-sourcing koshur pixel. it’s the largest synthetic ocr dataset for the kashmiri language: 613,000 image-text pairs covering words, sentences, even whole pages. really grateful to @Haq_Nawaz_Malik and @FaizanIqbal__52. couldn't have done this without them. paper: arxiv.org/abs/2606.23144 dataset: huggingface.co/datasets/Omarr…

Nahfid Nissar@NahfidN·7h

@_yashhx0 for sure, i am open sourcing datasets for kashmiri language

Yash Pandey@_yashhx0·9h

looking to connect people on X if you're into - building SaaS - GenAI - system design - AI tools - shipping in public - figuring it out as you go - AIML research - Dev Tools say Hi or drop what you're working on looking to connect with active ones 🫡

Nahfid Nissar@NahfidN·7h

@sevanlp I am working on opensource kashmiri language datasets. we just released a 613k ocr dataset.

Sevan Lewis-Payne@sevanlp·22h

Trying to meet more people who are building software. Founders, devs, students, AI builders, indie hackers, product people: What are you working on? I’m building in public, learning fast, and trying to make my feed less noise and more real builders shipping useful things.

Nahfid Nissar retweeted

Faizan Iqbal@FaizanIqbal__52·7h

Today, we are releasing #Koshur #Pixel, the largest open-source synthetic #613k image-text #OCR #dataset for the Kashmiri language. I am grateful to my research collaborators, @Haq_Nawaz_Malik and @NahfidN Paper: https://arxiv.org/abs/2606.23144 Dataset: huggingface.co/datasets/Omarr…

Nahfid Nissar@NahfidN·7h

@TheSuperEng 3b1b is goated. for basic problem solving i would like to add the organic chemistry tutor and khan academy.

Shubh@TheSuperEng·13h

Top of the line INTERNET RESOURCES for studying machine learning mathematics: ➜ Distill ➜ Setosa .io ➜ 3Blue1Brown ➜ Explained .ai ➜ CNN Explainer ➜ ImmersiveMath .com ➜ MLU-Explain by AWS ➜ TensorFlow Playground ➜ Seeing Theory by Brown ➜ StatQuest by Josh Starmer ➜ Neural Networks and Deep Learning

Nahfid Nissar@NahfidN·7h

@MoummarNawafleh Yeppp! and u can unlock a lot of websites by using playwright stealth, try it!

Moummar@MoummarNawafleh·7h

@NahfidN it can use chrome right?

Moummar@MoummarNawafleh·7h

claude scheduled tasks are so underrated we have our own apps, granola, email, slack and docs connected to it and have built some cool tasks that speed up our matching

Nahfid Nissar@NahfidN·17h

i build the boring parts that make agents work. everyone wants to talk about the model. i spend my time on the plumbing around it. the part that decides whether an agent actually ships or just demos well. at nbyula that's meant a bunch of things: a harness with self-correcting loops, an rlhf annotation and eval pipeline, a browser agent that works through long multi-step forms and recovers from its own mistakes, a moderation system where a fine-tuned model gets second-guessed by an llm judge. messaging agents, voice, the unglamorous infra underneath all of it. somewhere along the way i stopped seeing engineering as the support role for research. if you can't build the eval and the data pipeline yourself, your good ideas just sit in a queue waiting for someone who can. building is the research. on the side i write: two ieee papers on cloud security, and i run a blog that emails me a digest of new llm research every morning, fully on autopilot. i finally put everything in one place. there's even an ai on the site that'll answer questions about me, so you can interrogate it. nahfid.vercel.app

Nahfid Nissar@NahfidN·21h

@oneill_c You're probably the first one I have seen explaining it this well

Charlie O'Neill@oneill_c·4d

1/ We fine-tune a lot of customer models, so we decided to systematically try and figure out some best practices for finetuning. SFT isn't sexy, but it's still important. We vary one SFT lever at a time across 2 model families, dense + MoE to 235B, on 4 real-world customer datasets. What makes this clean is that each dataset is paired with an eval that took weeks to build with the customer, and the training outputs were generated to pass that eval. So the supervised target and the thing we measure downstream are the same criterion, which strips out the usual confounders

Nahfid Nissar@NahfidN·1d

@ishaangodhaa Sure let's chat

Ishaan@ishaangodhaa·1d

if you understand harness and inference in the AI space, i want to talk to you. we're working on something ambitious. not hiring. just looking for the right people to brainstorm with. DM me or drop a comment.

Nahfid Nissar@NahfidN·1d

@ariG23498 @GoogleDeepMind @chauhan_nilay16 @RisingSayak I have requested access!

Aritra@ariG23498·1d

[HF ML Club India] We are pleased to announce the IRL event in Bengaluru. We are partnering with @GoogleDeepMind and this takes place on the 15th of July 2026 (more details below). We will have someone from the GDM team talk about Gemma 4, and an hour long build session with Gemma and @huggingface. Great opportunity for folks to network and build. PS: We will not entertain any "where is the registration link" query. Seek and ye shall find.

Nahfid Nissar@NahfidN·2d

@shyamalanadkat @OpenAI I DM'ed you, please take a look

shyamal@shyamalanadkat·2d

after close to four years at @openai, i moved from the bay area to india earlier this year. i still believe deeply in ensuring true superintelligence accelerates science and remains accessible and beneficial to all. having grown up here, i've also always felt deeply connected to the ecosystem here. over the past several weeks, i've been speaking with researchers, engineers, and thinkers across india and apac. it's become clear that there are many who want to build the future from here. moving back felt like the counterintuitive choice. i no longer think that's true. what's been missing is the belief that you can build institutions of global consequence from anywhere. and more importantly, the ambition and the will to pursue ideas that seem impossibly large at first. this may be a once in a generation opportunity. more to come soon. DMs open if this resonates.

Nahfid Nissar@NahfidN·3d

@iarthsingh @curlysaarthak That's a lot of gpus brother :)

Arth Singh @ICML’26 🇰🇷@iarthsingh·3d

@curlysaarthak So i think I am lucky to have 12 b200s all the time and 5000 H100 clusters 😭😭

juggernaut@curlysaarthak·3d

"..sit down and turn all your work in goals/loops, connect your phone with codex or dispatch/code or droid computer. Go outside and enjoy the nature...." a strategy when you have infinite compute at your disposal. who even has 8x B200 to do research?? I still remember how hard was it to get any budget on GNN research work I did for a startup. I had to prompt the founder multiple times to get even ~$10 worth of compute for which I would use 4x3090 for 2.5hrs with lightning to train the models on distributed gpus. whole research took ~$40 and I trained 3 GNN models on a novel dataset, novel task and an architecture I came up with by reading multiple research papers/trying different combinations. Yes the startup was broke but I learned to do research by being resourceful.

Nahfid Nissar@NahfidN·3d

@bilaltwovec @ekzhang1 I am in??

bilal@bilaltwovec·Apr 13

@ekzhang1 what if u team up w 7 of my friends to rent a full node and then we can batch requests and then.. oh

Eric Zhang@ekzhang1·Apr 13

$200/month is enough to buy an H100 GPU for 6 hours every workday

Nahfid Nissar retweeted

Zulqarnain Ibn Usuf | ذوالقرنین ابن یوسف@mzzulfi·4d

The researchers have “publicly released the model, dataset and source code to encourage further research and development for the Kashmiri language ecosystem.”

ETV Bharat Jammu & Kashmir@ETVBharatJK

The researchers, @Haq_Nawaz_Malik , @NahfidN and @FaizanIqbal__52 , have unveiled Koshur Diacritizer to add missing diacritics, the small marks used in Kashmiri script that help determine pronunciation and meaning. #Kashmir etvbharat.com/en/state/three…

Nahfid Nissar retweeted

ETV Bharat Jammu & Kashmir@ETVBharatJK·4d

Nahfid Nissar retweeted

HAQ NAWAZ MALIK@Haq_Nawaz_Malik·Dec 11

''LLMs fail at Kashmiri because they’ve barely seen real data.'' For decades, authentic literature was stuck in **InPage**, a non-Unicode format AI can’t read. I built a converter that unlocked **3.1M words** for Kashmiri LLM training. Read More: haq-nawaz-malik.github.io/hnmblogs/#kash…

Nahfid Nissar@NahfidN·6d

kashmiri text needs diacritics for meaning and pronunciation, but digital platforms often just drop them. it's a huge problem for the language. we just open-sourced koshur diacritizer, an ai model to restore them automatically. feels like a crucial step for the community. big thanks to @Haq_Nawaz_Malik and @FaizanIqbal__52. Paper link: arxiv.org/pdf/2606.15883…

Nahfid Nissar@NahfidN·Jun 15

i've been thinking about kashmiri llms lately. it's kinda wild how much effort goes into english or even spanish models, all that data ready to be scraped. then you look at something like kashmiri, and the data just isn't there in the same way. makes building anything useful a completely different ballgame. it's not just scaling bigger, it's about starting from scratch in a lot of places. feels like a huge mountain to climb.

Nahfid Nissar@NahfidN·Jun 15

@prakashkagitha @gepa_ai Will read your code

Prakash Kagitha @ ACL 2026@prakashkagitha·Jun 15

Small language models with test-time scaling (N calls) beat larger models, but not all test-time scaling harnesses are the same! We evolve test-time scaling harness and prompts with @gepa_ai and 8x Haiku beats Opus by ~20%.
