!Briann🗿

4.1K posts


@Access2B

data and for the fun of it

San Junipero · Joined February 2022

184 Following · 77 Followers

!Briann🗿@Access2B·13h

@Ji_Ha_Kim Same their work is soo cool and i read the 100 hours in kimi and i loved everything @Kimi_Moonshot


201

Ji-Ha@Ji_Ha_Kim·18h

I would love to join Kimi, they do amazing research strongly aligned with my interests

Kimi.ai@Kimi_Moonshot

x.com/i/article/2006…


160

15K

!Briann🗿@Access2B·13h

@charmainemahach 6, 7 times


Charmaine Mahachi@charmainemahach·1d

Out of curiosity, how many times do you read a research paper before implementing it?


6.6K

!Briann🗿 retweeted

cinesthetic.@TheCinesthetic·2d

The Society (2019) really had something going, that whole “kids left to run everything” setup spiraling fast, and the fact it got cancelled right when things were opening up still stings.

cinesthetic.@TheCinesthetic

The cancellation of which TV show are you still frustrated about?


924

4.5K

317.6K

!Briann🗿@Access2B·1d

@gpt_alex @Kimi_Moonshot shooters will shoot😂


Alex@gpt_alex·2d

@Kimi_Moonshot

QME

1.7K

Kimi.ai@Kimi_Moonshot·2d

Come introduce yourself to the team, we have your slippers ready. Reach out at: talent@moonshot.ai

ℏεsam@Hesamation

> be Moonshot
> 300 employees, avg age <30
> no departments, no titles, no KPIs
> so many former CEOs and founders
> 80% of company are introverts
> everyone keeps slippers under desk
> no bureaucratic culture
> some mornings you walk in not knowing what to do
> no one tells you if you’re doing well
> doesn’t care about job background, care about “taste”
> “if you ranked AI companies by employees who play instruments, kimi wins”


1.3K

155.1K

!Briann🗿@Access2B·1d

@rasbt Training recipe carrying the model🔥🔥


Sebastian Raschka@rasbt·2d

Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways: Architecture-wise, besides multi-model support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4 maintains a relatively unique Pre- and Post-norm setup and remains relatively classic, with a 5:1 hybrid attention mechanism combining a sliding-window (local) layer and a full-attention (global) layer. The attention mechanism itself is also classic Grouped Query Attention (GQA). But let’s not be fooled by the lack of architectural changes. Looking at the benchmarks, Gemma 4 is a huge leap from Gemma 3. This is likely due to the training set and recipe. Interestingly, on the AI Arena Leaderboard, Gemma 4 (31B) ranks similarly to the much larger Qwen3.5-397B-A17B model. But as I discussed in my model evaluation article, arena scores are a bit problematic as they can be gamed and are biased towards human (style) preference. If we look at some other common benchmarks, which I plotted below, we can see that it’s indeed a very clear leap over Gemma 3 and ranks on par with Qwen3.5 27B. Note that there is also a Mixture-of-Experts (MoE) Gemma 4 variant that is slightly smaller (27B with 4 billion parameters active). The benchmarks are only slightly worse compared to Gemma 4 (31B). I omitted the MoE architecture in the figure below because the figure is already very crowded, but you can find it in my LLM Architecture Gallery. Anyways, overall, it's a nice and strong model release and a strong contender for local usage. Also, one aspect that should not be underrated is that (it seems) the model is now released with a standard Apache 2.0 open-source license, which has much friendlier usage terms than the custom Gemma 3 license.


161

1.1K

60.3K

François Chollet@fchollet·1d

JAX is what a well-designed low-level machine learning framework looks like. Good design lets you deliver much greater performance with much lower effort. Bad design is the exact opposite.


736

50.2K

!Briann🗿@Access2B·1d

@fchollet For real jax breaks down the whole process easily and in a very learnable way


115

!Briann🗿 retweeted

OWICH@PiocheBrio·2d

Rono’s drunk uncle clips are finishing me tbh

R🧚🏽‍♀️@RubyMuasya

Everyone is a content creator now, it's not even fun anymore.


335

2.8K

104.9K

!Briann🗿@Access2B·1d

Those videos are funny 😂

OWICH@PiocheBrio

Rono’s drunk uncle clips are finishing me tbh


143

!Briann🗿 retweeted

.@realgugo·2d

our brain rot probably started with sanjay and craig.


499

31K

681.4K

!Briann🗿 retweeted

✮✮✮@inclusivetwts·2d

long ass weekend has caught me with only a hundred


597

2.6K

31K

!Briann🗿@Access2B·2d

@space_colonist @AnthropicAI You got a slot for one more?


817

Martian@space_colonist·2d

I’m pleased to announce I will be leading a new team on VLA interpretability @AnthropicAI


398

38.7K

!Briann🗿 retweeted

disha@yzybby·4d

parents love giving you job hunting and career advice that are like “Have you tried destroying and betraying yourself for nothing”


105

5.4K

56.2K

1.3M

!Briann🗿@Access2B·2d

@IamIronLAN Damn they even learning mech interp😐🙂


1.3K

krish@IamIronLAN·3d

Stanford is kinda crazy because as a CS undergrad this term you’re choosing between:
- CS336: 0 to hero on frontier model training
- CS224R taught by Chelsea Finn (founder of Pi)
- CS231N taught by Fei Fei Li (Imagenet, WorldLabs CEO)
- CS221M Mech Interp Intro taught with Goodfire
And a host of personal podcasts delivered by $T CEOs.

Jesse Mu@jayelmnop

protip for stanford undergrads: beware the classes with guest speaker lineups that read like AI coachella. you’re basically paying $5k to listen to a live podcast series.


1.4K

188.4K

!Briann🗿 retweeted

maeva@maevaemiliaa·4d

i tried in january i tried in february i tried in march and i will try again in april


14.8K

63.8K

743.8K

!Briann🗿 retweeted

mizukii@mizukiiverse·4d

btw i’d swallow 20 cursed fingers for you, but you’re not nerdy enough to understand


2.3K

9.4K

263.8K

!Briann🗿 retweeted

CJB, Esq.@CJoeBlack·3d

This is why elite universities are elite universities.

Reva Jariwala@reva_jariwala

how is this a class? absolutely insane line-up


543

6.8K

467.5K

!Briann🗿@Access2B·3d

No way I am discovering As You Are by the Weeknd rn...he will ask for water when I am done with this song


!Briann🗿 retweeted

tender@tenderizzation·4d

>72B MoE MNIST classifier

Ethan@torchcompiled

ML interview question: You’re training a 72B MoE MNIST classifier. Layer 53 MLP expert 7 destabilizes when the ones in the dataset are turned upside down. What happened?


787

36.8K

!Briann🗿 retweeted

Nick Khami@skeptrune·4d

"claude usage limit reached. your limit will reset at 7pm"


103

13.3K

355.5K
