!Briann🗿

4.1K posts

@Access2B

data and for the fun of it

San Junipero · Joined February 2022
184 Following · 77 Followers
Charmaine Mahachi (@charmainemahach)
Out of curiosity, how many times do you read a research paper before implementing it?
17 replies · 2 reposts · 48 likes · 6.6K views
!Briann🗿 (@Access2B)
@rasbt Training recipe carrying the model🔥🔥
0 replies · 0 reposts · 0 likes · 63 views
Sebastian Raschka
Flagship open-weight release days are always exciting. I was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways:

Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4 keeps its relatively unique Pre- and Post-norm setup and otherwise remains fairly classic, with a 5:1 hybrid attention mechanism that interleaves five sliding-window (local) layers with one full-attention (global) layer. The attention mechanism itself is also classic Grouped Query Attention (GQA).

But let's not be fooled by the lack of architectural changes. Looking at the benchmarks, Gemma 4 is a huge leap from Gemma 3. This is likely due to the training set and recipe.

Interestingly, on the AI Arena Leaderboard, Gemma 4 (31B) ranks similarly to the much larger Qwen3.5-397B-A17B model. But as I discussed in my model evaluation article, arena scores are a bit problematic, as they can be gamed and are biased toward human (style) preference. If we look at some other common benchmarks, which I plotted below, we can see that it's indeed a very clear leap over Gemma 3 and ranks on par with Qwen3.5 27B.

Note that there is also a Mixture-of-Experts (MoE) Gemma 4 variant that is slightly smaller (27B), with 4 billion active parameters. Its benchmark scores are only slightly worse than those of Gemma 4 (31B). I omitted the MoE architecture from the figure below because the figure is already very crowded, but you can find it in my LLM Architecture Gallery.

Anyway, overall, it's a nice and strong model release and a strong contender for local usage. Also, one aspect that should not be underrated is that (it seems) the model is now released under the standard Apache 2.0 open-source license, which has much friendlier usage terms than the custom Gemma 3 license.
[Attached figure: benchmark plot]
41 replies · 161 reposts · 1.1K likes · 60.3K views
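The 5:1 local/global layout and GQA described in the post above can be illustrated with a small Python sketch. This is a hypothetical illustration, not Gemma 4's code: the helper names (`layer_pattern`, `attention_mask`, `gqa_kv_head`), the assumption that the global layer comes last in each group of six, and the layer count, window size, and head counts in the usage are all made up for the example.

```python
# Hypothetical sketch of a 5:1 local/global attention schedule, the causal masks
# each layer type uses, and GQA head grouping. All sizes are illustrative
# assumptions, not Gemma 4's actual config values.


def layer_pattern(num_layers: int, local_per_global: int = 5) -> list[str]:
    """Label each layer 'local' or 'global'; assumes the global layer closes each group."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]


def attention_mask(seq_len: int, kind: str, window: int) -> list[list[bool]]:
    """mask[q][k] is True if query position q may attend to key position k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q  # never attend to future tokens
            in_window = q - k < window  # local layers only see the last `window` keys
            row.append(causal and (kind == "global" or in_window))
        mask.append(row)
    return mask


def gqa_kv_head(query_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Grouped Query Attention: consecutive query heads share one key/value head."""
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    print(layer_pattern(12))  # five 'local' layers, then one 'global', repeated
    for row in attention_mask(4, "local", window=2):
        print(["x" if allowed else "." for allowed in row])
    print([gqa_kv_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)])
```

The design point the sketch tries to show: most layers pay only for a fixed-size window of keys, while the occasional global layer still lets information flow across the whole sequence, and GQA shrinks the key/value cache by sharing each KV head across a group of query heads.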
François Chollet (@fchollet)
JAX is what a well-designed low-level machine learning framework looks like. Good design lets you deliver much greater performance with much lower effort. Bad design is the exact opposite.
37 replies · 37 reposts · 736 likes · 50.2K views
!Briann🗿 (@Access2B)
@fchollet For real, JAX breaks down the whole process easily and in a very learnable way.
0 replies · 0 reposts · 0 likes · 115 views
!Briann🗿 reposted
@realgugo
our brain rot probably started with sanjay and craig.
499 replies · 5K reposts · 31K likes · 681.4K views
!Briann🗿 reposted
✮✮✮ (@inclusivetwts)
long ass weekend caught me with just a hundred
21 replies · 597 reposts · 2.6K likes · 31K views
Martian (@space_colonist)
I’m pleased to announce I will be leading a new team on VLA interpretability @AnthropicAI
21 replies · 3 reposts · 398 likes · 38.7K views
!Briann🗿 reposted
disha (@yzybby)
parents love giving you job hunting and career advice that's like “Have you tried destroying and betraying yourself for nothing”
105 replies · 5.4K reposts · 56.2K likes · 1.3M views
krish (@IamIronLAN)
Stanford is kinda crazy because as a CS undergrad this term you’re choosing between:
- CS336: 0 to hero on frontier model training
- CS224R taught by Chelsea Finn (founder of Pi)
- CS231N taught by Fei-Fei Li (ImageNet, World Labs CEO)
- CS221M Mech Interp Intro taught with Goodfire
And a host of personal podcasts delivered by $T CEOs.
Quoting Jesse Mu (@jayelmnop):

protip for stanford undergrads: beware the classes with guest speaker lineups that read like AI Coachella. you’re basically paying $5k to listen to a live podcast series.

9 replies · 72 reposts · 1.4K likes · 188.4K views
!Briann🗿 reposted
maeva (@maevaemiliaa)
i tried in january i tried in february i tried in march and i will try again in april
88 replies · 14.8K reposts · 63.8K likes · 743.8K views
!Briann🗿 reposted
mizukii (@mizukiiverse)
btw i’d swallow 20 cursed fingers for you, but you’re not nerdy enough to understand
96 replies · 2.3K reposts · 9.4K likes · 263.8K views
!Briann🗿 (@Access2B)
No way I'm only now discovering "As You Are" by The Weeknd rn... he will ask for water when I am done with this song
0 replies · 0 reposts · 0 likes · 13 views
!Briann🗿 reposted
Nick Khami (@skeptrune)
"claude usage limit reached. your limit will reset at 7pm"
103 replies · 1K reposts · 13.3K likes · 355.5K views