Nikita Pavlichenko

78 posts

Nikita Pavlichenko

@nv_pavlichenko

LLM Lead at @jetbrains

Berlin, Germany Beigetreten Haziran 2020

111 Folgt89 Follower

Nikita Pavlichenko@nv_pavlichenko·1d

@gianmalio I don’t think I was even aware of their existence at that time

English

Gianmalio@gianmalio·1d

@nv_pavlichenko Have you looked at Citadel / BAM?

English

Nikita Pavlichenko@nv_pavlichenko·5d

I interviewed at Jane Street for a Quant Intern in 2019. This was the only time in my life when I had an interview on the **phone**. A guy called me from NY and ask some 9-th grade math questions, I think about how many steps you take going down an escalator and the probs of winning in paper rock scissors with some restrictions. I was so shocked that the phone interview is actually by phone so I fucked it up. I also found it hard to write something with a phone in hand but probably ngmi the moment I took a pen.

Westside L.A. Guy@WestsideLAGuy

Jane Street is so cracked that not even Alex, a brilliant finance professional, could get an offer. Alex crushed Stanford undergrad, fixed income trading at MS, fixed income investing at Bain Capital, HBS, top hedge fund, early finance hire who scaled Ramp.

English

491

Nikita Pavlichenko@nv_pavlichenko·2d

@iamgrigorev @poolsideai Finally! Congrats!

English

142

George Grigorev@iamgrigorev·2d

We just released our first models at @poolsideai – including Laguna XS.2 (open weights) that competes with Qwen3.6-35B. I worked across pretraining — happy to answer questions! We now have great understanding of exactly all components that went into training through principled ablations, and now we're confident to scale. This year would be 🔥 for us! More coming very soon

English

235

21.9K

Nikita Pavlichenko@nv_pavlichenko·4d

@_avichawla Literally interview is over after this answer. DeepSeek dsa requires you to have mla, enabling swa in mid training is sketchy, no mentioning of yarn on rope theta scaling as well as long context data job. FlashAttention irrelevant. Nice rage bait

English

5.4K

Avi Chawla@_avichawla·4d

You're in a Research Scientist interview at OpenAI. The interviewer asks: "How would you expand the context length of an LLM from 2K to 128K tokens?" You: "I will fine-tune the model on longer docs with 128K context." Interview over. Here's what you missed:

English

928

248.8K

Nikita Pavlichenko@nv_pavlichenko·8 Nis

@HungryMinded @deedydas First model to discover gambling

English

Hungry Minded@HungryMinded·8 Nis

@deedydas First model to become a part of the grind culture?

English

7.1K

Deedy@deedydas·7 Nis

Claude Mythos just obliterated every single benchmark in AI. I can't believe what I'm reading.

English

323

751

6.6K

771.8K

Nikita Pavlichenko@nv_pavlichenko·4 Nis

ZXX

119

Nikita Pavlichenko@nv_pavlichenko·4 Nis

nvidia/Nemotron-Instruction-Following-Chat-v1 is my favourite hf dataset from now on. THIS is how we build AGI

English

214

Nikita Pavlichenko@nv_pavlichenko·4 Nis

They don’t rely on hype rather they get you addicted on claude code so you then go beg your enterprise to buy you credits. The moment everyone got addicted and can’t work without it anymore, they rug pull the consumer plans and force everyone to enterprise. Was clear as day from the start

English

335

ℏεsam@Hesamation·4 Nis

Redditor claims Claude Code is nerfed for Pro/Max users vs Enterprise customers and the strategy is to use the paid plan users to generate hype on X and LinkedIn so companies would reach out to them.

English

205

270

3.7K

458.2K

Nikita Pavlichenko@nv_pavlichenko·2 Nis

We're currently wrapping up our first big pre-training run at JetBrains: a coding-focused LLM, which can apparently be the second best fully european open pre-train after Mistral (they have larger models). Though gotta do a research on that, please drop the models we need to compare with. Probably need to shitpost here a little bit since we're really lacking publicity Here's the loss plot for the start (periodic loss spikes are interesting, probably gonna have a section in the report dedicated to what they are). Benchmarks are honestly insane for the first team's run

English

107

Nikita Pavlichenko@nv_pavlichenko·2 Nis

@teortaxesTex So indeed “wrong climate?”

English

1.2K

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·1 Nis

The funny thing about living in the tropics is that you can tell these people are less competent than Russians in every day-to-day manner, have like infinity time preference, goof around, but they end up with a society that's *more* livable than Russia + no fascism or cortisol

Ра@bankukku

@escapefrommelos As a Russian, living in the tropics for a year completely changed me. The reasons are: no police state, no winter, no war, everyone's friendly and chill. We are thriving in the tropics like nobody else.

English

296

20.8K

Nikita Pavlichenko@nv_pavlichenko·2 Nis

@icanvardar By other harnesses being overoptimized for a benchmark

English

1.2K

Can Vardar@icanvardar·1 Nis

how is claude code the worst harness in opus 4.6 benchmarks 💀

English

966

160.9K

Nikita Pavlichenko@nv_pavlichenko·2 Nis

Was waiting for this since 2019

English

Nikita Pavlichenko@nv_pavlichenko·25 Mar

@lvwerra The old architecture will be slow and expensive but definitely better if you count in RL envs. Practically all the architecture changes in the last three years were dedicated to making a dense llama-like model faster (moe, mla, hybrid models, etc), not better

English

253

Leandro von Werra@lvwerra·25 Mar

Which LLM would be better: - today's best architecture trained on 2023's best data - 2023's best architecture trained on today's best data

English

25.3K

Nikita Pavlichenko@nv_pavlichenko·27 Şub

@OjasSharma276 Tasteless rlhf data annotators

Latviešu

Ojas Sharma@OjasSharma276·26 Şub

I don’t get it. ChatGPT and other AI models use so many emojis. On what data were they trained? Because I’ve never seen this much emoji usage anywhere else.

English

181

3.9K

225.7K

Nikita Pavlichenko@nv_pavlichenko·22 Oca

@0xkyle__ It’s literally r/wallstreetbets type of portfolio with long on ai infra and hedge on nvidia

English

335

Kyle@0xkyle__·21 Oca

Bringing this back to remind you that a 25 year old German woke up, got tired of building AI, decided he would outperform all of finance, raised 1.5 billion, and put 20% of his fund into INTC His name is Leopold Aschenbrenner, and INTC is up 10% today.

English

271.5K

Nikita Pavlichenko@nv_pavlichenko·10 Oca

One of the main unsolved problems in coding AI is a cold start of new libraries, languages, and frameworks. The models perform best on what is represented in the training data the most. This is usually old frameworks and JS/Python code. If you ship the next best thing, all the models will struggle with using it. This is why Claude still wants to use pip instead of uv sometimes even when you explicitly tell it not to. There isn’t a proper solution to it really. You can practically go with two general directions: 1. Fine-tune on the frameworks code or hope for continuous pre-training. Helps a little bit but not dramatic since the relative (and absolute) size of it in the datasets will be tiny. Targeted RL (so the data is potentially unlimited) post-training is the closest thing to make it work. 2. Drop the docs into context window inference time either directly or with RAG. Bloats the context window, requires manual work and overall can be done only with a limited amount of specifically targeted technologies. This is a real problem that can prevent adoption of nice things in the industry and when most of the code is generated by LLMs it only becomes worse. We can end up in the 2025 state of tech for longer than we’d want. I do think this highlights some big flaws in the current approach to LLMs that will accelerate the progress when resolved.

English

109

Nikita Pavlichenko@nv_pavlichenko·10 Oca

@rieszspieces @fluxtheorist @DeepDishEnjoyer 75 is enough for measure theory. literally the most intuitive thing from this thread

English

159

mostly harmless graduate student@rieszspieces·9 Oca

@fluxtheorist @DeepDishEnjoyer to be fair you need to have a 175 iq to understand measure theory, the nuances of borel sigma-algebras and convergence in measure do not come easy to mere mortals

English

2.9K

peepeepoopoo@DeepDishEnjoyer·9 Oca

>gets asked for evidence >makes it totally up i can't think of a single graduate level math class that 130 IQ kids can't handle but 140 IQ suddenly can

Charles Murray@charlesmurray

In a sufficiently advanced college math class, lots of the 130 IQ kids have to drop out because they just can't learn the material whereas almost all the 140 kids can. In a history or literature class it's more subjective. My experience is that the 130 range, though really smart, is accompanied by more class contributions that indicate they don't quite get it, whereas that seldom happens with 140 kids. In a class on contemporary politics, a fair number of the 140 kids will take monumentally stupid positions that would never occur to the 130 kids.

English

2.8K

148.6K

Nikita Pavlichenko@nv_pavlichenko·8 Oca

@GenAI_is_real hope this post was written by deepseek then and not 5.2

English

325

Chayenne Zhao@GenAI_is_real·7 Oca

86 pages of pure technical dominance. People keep asking how DeepSeek mastered the synergy between Infra and Algo. It’s not magic; it’s a radical rejection of corporate rot. While xxxx is drowning in "safety" bureaucracy, yyyy is paralyzed by middle management bloat, and zzzz is bogged down by legacy silos, DeepSeek has built the ultimate spec-ops squad. Their blueprint is lethal: They only hire "Full-stack" hybrids who master both Infra & Algo. They pay at the absolute top of the market to keep the elite together. They grant total autonomy with maximum GPU-per-capita. They operate in a flat, high-density environment—no "alignment" meetings, just execution. DeepSeek is the definition of an AI-era elite organization. Everything else is just legacy overhead. The empires are crumbling.

机器之心 JIQIZHIXIN@jiqizhixin

DeepSeek-R1’s paper was updated 2 days ago, expanding from 22 pages to 86 pages and adding a substantial amount of detail. The new content covers topics such as the self-evolution of DeepSeek-R1-Zero, evaluation of DeepSeek-R1, further analysis, and DeepSeek-R1 distillation. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper: arxiv.org/abs/2501.12948…

English

422

38.6K

Nikita Pavlichenko@nv_pavlichenko·7 Oca

@GergelyOrosz There was only like a two years period around covid when you didn’t need a degree anyway

English

Gergely Orosz@GergelyOrosz·6 Oca

One trend that will likely return in tech, thanks to AI: Increasingly only possible to be hired as a new grad with a CS or similar degree. AI agents render writing code less relevant: but eveything else about building software (aka software engineering fundamentals) more!

English

934

86K

Nikita Pavlichenko@nv_pavlichenko·6 Oca

i feel so ashamed that i didn’t end up digging into JEPA papers saga largely because in russian this names kinda sounds like “ass” and it prevents me from taking it seriously enough the only thing that’s helping me feel better is that it was also the reason for multiple respectable ai folks that I personally know

English

Entdecken

@gianmalio @iamgrigorev @poolsideai @_avichawla @HungryMinded @deedydas @teortaxesTex @icanvardar