Nikita Pavlichenko
@nv_pavlichenko
LLM Lead at @jetbrains
Berlin, Germany · Joined June 2020
111 Following · 89 Followers
78 posts
Nikita Pavlichenko @nv_pavlichenko
I interviewed at Jane Street for a Quant Intern role in 2019. It was the only time in my life I had an interview on the **phone**. A guy called me from NY and asked some ninth-grade math questions, I think about how many steps you take going down an escalator and the probability of winning rock paper scissors with some restrictions. I was so shocked that the phone interview was actually by phone that I fucked it up. I also found it hard to write anything with a phone in hand, but I was probably ngmi the moment I took a pen anyway.
Westside L.A. Guy @WestsideLAGuy

Jane Street is so cracked that not even Alex, a brilliant finance professional, could get an offer. Alex crushed Stanford undergrad, fixed income trading at MS, fixed income investing at Bain Capital, HBS, top hedge fund, early finance hire who scaled Ramp.

George Grigorev @iamgrigorev
We just released our first models at @poolsideai, including Laguna XS.2 (open weights), which competes with Qwen3.6-35B. I worked across pretraining, so happy to answer questions! Through principled ablations we now have a solid understanding of every component that went into training, and we're confident to scale. This year is going to be 🔥 for us! More coming very soon.
[image]
Nikita Pavlichenko @nv_pavlichenko
@_avichawla The interview is literally over after this answer. DeepSeek DSA requires you to have MLA; enabling SWA in mid-training is sketchy; no mention of YaRN or RoPE theta scaling, nor of the long-context data job. FlashAttention is irrelevant here. Nice rage bait.
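For context on the jargon in that reply, here is a minimal sketch of RoPE theta scaling, the standard trick for stretching context length; the head dimension and base values below are illustrative, not any specific model's config.

```python
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: theta_i = base^(-2i/d), i in [0, d/2).
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

# Pretrained at 2K context with the usual base of 10_000.
short = rope_inv_freq(128, base=10_000.0)

# "Theta scaling": raise the base so the rotations at 128K positions stay in
# the regime the model saw at 2K, then continue training on long documents.
# The 5e6 value here is purely illustrative.
long = rope_inv_freq(128, base=5_000_000.0)

# Lower inverse frequencies mean slower rotation, so distant positions stay
# distinguishable instead of aliasing at the new lengths.
print(short[-1].item(), long[-1].item())
```

YaRN refines this idea by rescaling different frequency bands unevenly and adjusting attention temperature; either way, it is paired with a long-context data mix, which is the other half of the critique above.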
Avi Chawla @_avichawla
You're in a Research Scientist interview at OpenAI. The interviewer asks: "How would you expand the context length of an LLM from 2K to 128K tokens?" You: "I will fine-tune the model on longer docs with 128K context." Interview over. Here's what you missed:
Hungry Minded @HungryMinded
@deedydas First model to become a part of the grind culture?
[image]
Deedy @deedydas
Claude Mythos just obliterated every single benchmark in AI. I can't believe what I'm reading.
[image]
Nikita Pavlichenko @nv_pavlichenko
nvidia/Nemotron-Instruction-Following-Chat-v1 is my favourite hf dataset from now on. THIS is how we build AGI
[image]
Nikita Pavlichenko @nv_pavlichenko
They don’t rely on hype; rather, they get you addicted to Claude Code so you then go beg your enterprise to buy you credits. The moment everyone is addicted and can’t work without it anymore, they rug-pull the consumer plans and force everyone onto enterprise. Was clear as day from the start.
ℏεsam @Hesamation
Redditor claims Claude Code is nerfed for Pro/Max users vs Enterprise customers and the strategy is to use the paid plan users to generate hype on X and LinkedIn so companies would reach out to them.
[image]
Nikita Pavlichenko @nv_pavlichenko
We're currently wrapping up our first big pre-training run at JetBrains: a coding-focused LLM that could apparently be the second-best fully European open pre-train after Mistral (they have larger models). Though we've got to do some research on that, so please drop the models we should compare against. Probably need to shitpost here a little since we're really lacking publicity.

Here's the loss plot for the start (the periodic loss spikes are interesting; the report will probably have a section dedicated to what they are). Benchmarks are honestly insane for the team's first run.
[image: loss plot]
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
The funny thing about living in the tropics is that you can tell these people are less competent than Russians in every day-to-day matter, have like infinite time preference, goof around, but they end up with a society that's *more* livable than Russia + no fascism or cortisol.
Ра @bankukku

@escapefrommelos As a Russian, living in the tropics for a year completely changed me. The reasons are: no police state, no winter, no war, everyone's friendly and chill. We are thriving in the tropics like nobody else.

Can Vardar @icanvardar
how is claude code the worst harness in opus 4.6 benchmarks 💀
[image]
Nikita Pavlichenko @nv_pavlichenko
Was waiting for this since 2019
[image]
Nikita Pavlichenko @nv_pavlichenko
@lvwerra The old architecture will be slow and expensive, but definitely better if you count in RL envs. Practically all the architecture changes of the last three years were dedicated to making a dense llama-like model faster (MoE, MLA, hybrid models, etc.), not better.
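A back-of-envelope illustration of that "faster, not better" point; the MoE config below is made up purely for the arithmetic, not taken from any specific model.

```python
# Hypothetical MoE layer: 64 experts, top-4 routing (illustrative numbers).
n_experts, top_k = 64, 4
total_ffn_params = n_experts   # FFN parameters stored, in expert-sized units
active_ffn_params = top_k      # FFN parameters actually touched per token

frac = active_ffn_params / total_ffn_params
print(f"per-token FFN compute vs. a dense model of equal size: {frac:.1%}")
# -> 6.2%: you keep most of the capacity while paying a fraction of the
# FLOPs per token, i.e. the change buys speed and cost, not a smarter model.
```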
Leandro von Werra @lvwerra
Which LLM would be better:
- today's best architecture trained on 2023's best data
- 2023's best architecture trained on today's best data
Ojas Sharma @OjasSharma276
I don’t get it. ChatGPT and other AI models use so many emojis. On what data were they trained? Because I’ve never seen this much emoji usage anywhere else.
Nikita Pavlichenko @nv_pavlichenko
@0xkyle__ It’s literally an r/wallstreetbets-type portfolio: long AI infra with a hedge on Nvidia.
Kyle @0xkyle__
Bringing this back to remind you that a 25-year-old German woke up, got tired of building AI, decided he would outperform all of finance, raised 1.5 billion, and put 20% of his fund into INTC. His name is Leopold Aschenbrenner, and INTC is up 10% today.
[image]
Nikita Pavlichenko @nv_pavlichenko
One of the main unsolved problems in coding AI is the cold start of new libraries, languages, and frameworks. Models perform best on whatever is most represented in their training data, which usually means old frameworks and JS/Python code. If you ship the next best thing, every model will struggle to use it. This is why Claude still sometimes wants to use pip instead of uv, even when you explicitly tell it not to.

There isn't really a proper solution to this. Practically, you can go in two general directions (a sketch of the second follows below):

1. Fine-tune on the framework's code, or hope for continual pre-training. This helps a little but not dramatically, since the framework's relative (and absolute) share of the dataset will be tiny. Targeted RL post-training (where the data is potentially unlimited) is the closest thing to making this work.

2. Drop the docs into the context window at inference time, either directly or with RAG. This bloats the context window, requires manual work, and can only cover a limited number of specifically targeted technologies.

This is a real problem that can block the adoption of nice things in the industry, and as most code comes to be generated by LLMs it only gets worse. We could end up stuck in the 2025 state of tech for longer than we'd want. I do think this highlights some big flaws in the current approach to LLMs, and resolving them will accelerate progress.
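A minimal sketch of direction 2, injecting a new framework's docs at inference time; `search_docs`, `build_prompt`, and the uv doc snippets are illustrative stand-ins, not any specific retrieval stack's API.

```python
def search_docs(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Toy lexical retrieval: rank doc chunks by word overlap with the query.
    # A real setup would use embedding search; this is only a stand-in.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(task: str, doc_chunks: list[str]) -> str:
    # Prepend retrieved docs so the model sees the new framework's API,
    # at the cost of context-window space (the bloat mentioned above).
    context = "\n\n".join(doc_chunks)
    return f"Use ONLY the APIs documented below.\n\n{context}\n\nTask: {task}"

docs = [
    "uv add <package> installs a dependency into the project.",
    "uv run <cmd> executes a command inside the project environment.",
    "uv sync resolves and installs the lockfile.",
]
prompt = build_prompt("add requests to the project",
                      search_docs("add a dependency", docs))
print(prompt)  # this string would then be sent to whatever model you use
```

A real setup would chunk actual documentation and retrieve by embeddings, but the shape is the same: retrieval plus a constrained prompt, paid for in context-window tokens.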
Chayenne Zhao @GenAI_is_real
86 pages of pure technical dominance. People keep asking how DeepSeek mastered the synergy between Infra and Algo. It’s not magic; it’s a radical rejection of corporate rot. While xxxx is drowning in "safety" bureaucracy, yyyy is paralyzed by middle-management bloat, and zzzz is bogged down by legacy silos, DeepSeek has built the ultimate spec-ops squad. Their blueprint is lethal:
- They only hire "full-stack" hybrids who master both Infra & Algo.
- They pay at the absolute top of the market to keep the elite together.
- They grant total autonomy with maximum GPU-per-capita.
- They operate in a flat, high-density environment: no "alignment" meetings, just execution.
DeepSeek is the definition of an AI-era elite organization. Everything else is just legacy overhead. The empires are crumbling.
机器之心 JIQIZHIXIN @jiqizhixin

DeepSeek-R1’s paper was updated 2 days ago, expanding from 22 pages to 86 pages and adding a substantial amount of detail. The new content covers topics such as the self-evolution of DeepSeek-R1-Zero, evaluation of DeepSeek-R1, further analysis, and DeepSeek-R1 distillation. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper: arxiv.org/abs/2501.12948…

Nikita Pavlichenko @nv_pavlichenko
@GergelyOrosz There was only like a two-year period around COVID when you didn’t need a degree anyway.
Gergely Orosz @GergelyOrosz
One trend that will likely return in tech, thanks to AI: increasingly, it will only be possible to be hired as a new grad with a CS or similar degree. AI agents render writing code less relevant, but everything else about building software (aka software engineering fundamentals) more!
Nikita Pavlichenko @nv_pavlichenko
i feel so ashamed that i didn’t end up digging into the JEPA papers saga, largely because in russian the name kinda sounds like “ass” and it prevents me from taking it seriously enough. the only thing that’s helping me feel better is that it was also the reason for multiple respectable ai folks that i personally know