Boochi 🇳🇬
824 posts

Boochi 🇳🇬
@boochi_dot_dev
• ICU Nurse • Computer Scientist • NeuralMind • ML Engineer

What’s a better way to evaluate a base model? If you have an ensemble of pre-trained LLMs/LLM checkpoints, perplexity is the most reliable metric for determining which one performs strongest on the language whose ability you want to further improve via post-training. Cursor is a software coding tool, and using perplexity across a set of programming-language corpora seems like the most stable option.
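A minimal sketch of the comparison being described: perplexity is just the exponential of the mean per-token negative log-likelihood, so ranking checkpoints reduces to comparing that number on the same held-out corpus. The per-token log-probs below are made-up illustrative values, not real model outputs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token). Lower is better."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probs from two checkpoints scored
# on the same held-out programming-language corpus.
ckpt_a = [-1.2, -0.8, -2.1, -0.5]
ckpt_b = [-0.9, -0.7, -1.4, -0.6]

best = min(("ckpt_a", perplexity(ckpt_a)), ("ckpt_b", perplexity(ckpt_b)),
           key=lambda pair: pair[1])
print(best[0])  # the checkpoint with the lower perplexity
```

The key caveat for a fair ranking: every checkpoint must be scored with the same tokenizer and the same evaluation set, otherwise the per-token averages aren’t comparable.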


she is cute, should i text her?

Perplexity-based evals? In 2026?

There’s more to the RL vs. SFT debate than just evaluative vs. instructive feedback. In most LLM post-training, RL is specifically designed to improve performance “on-policy”. SFT just mimics a static distribution (i.e., “off-policy”), while RL optimizes the model’s own generated policy against a reward signal (typically with a KL-divergence penalty to keep it from drifting too far from the base model). This reward-maximizing objective pushes the model to explore and master complex strategies that simple supervised imitation can’t capture.
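The KL-penalized objective mentioned above is often implemented by shaping the reward itself: the task reward minus a per-token penalty proportional to how far the policy’s log-prob has drifted from the reference (base) model’s. A minimal sketch, with `beta` as an assumed penalty coefficient:

```python
import math

def shaped_reward(task_reward, logp_policy, logp_ref, beta=0.1):
    """KL-penalized reward used in RLHF-style post-training:
    r_total = r_task - beta * (log pi(a|s) - log pi_ref(a|s)).
    The log-prob difference is a single-sample estimate of the KL term."""
    return task_reward - beta * (logp_policy - logp_ref)

# If the policy hasn't drifted, there is no penalty.
no_drift = shaped_reward(1.0, logp_policy=-0.5, logp_ref=-0.5)

# If the policy assigns more probability than the reference
# (logp_policy > logp_ref), the reward is docked.
drifted = shaped_reward(1.0, logp_policy=-0.2, logp_ref=-0.8)
```

This is why the model can’t just reward-hack its way into degenerate outputs: every token that the base model would have found very unlikely costs reward.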



I love rage-baiting my professors in class. Today I asked my CV professor if he thinks supervised learning is actually just a boring special case of reinforcement learning where the environment is static and rewards are immediate. He thought I was going insane, but we had a nice discussion about evaluative vs. instructive feedback. 😆
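The “special case” framing in that discussion can actually be checked numerically: for a categorical policy, the gradient of the cross-entropy loss on a labeled example is identical to the REINFORCE policy gradient when the “action” is the label and the reward is a constant 1. A small sketch, assuming a 5-way softmax classifier:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=5)
label = 2            # SFT: the target class; RL: the "expert action"
p = softmax(logits)
onehot = np.eye(5)[label]

# SFT: gradient of cross-entropy -log p[label] w.r.t. the logits.
grad_sft = p - onehot

# RL: REINFORCE gradient of -reward * log pi(action) with constant
# reward 1 and an immediate, single-step "episode".
reward = 1.0
grad_rl = reward * (p - onehot)
```

The two gradients match exactly, which is the sense in which SFT is RL with a static environment and immediate, constant rewards; evaluative feedback only becomes interesting when the reward varies with the model’s own actions.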


GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…
