Briann🗿
@Access2B
data and for the fun of it
San Junipero · Joined February 2022
184 Following · 77 Followers · 4.1K posts
Briann🗿 retweeted
𝐀𝐥𝐯𝐢ṋ 𝐊𝐚ṋ𝐢ṋ𝐝𝐨
Nairobi feels like a city stuck between potential and neglect. Every year we talk about the same issues: drainage, garbage, and planning. Nothing really changes.
43 replies · 546 retweets · 1.4K likes · 15.3K views
Briann🗿 retweeted
Majin Jew @RedwoodRogue
Rest in peace my granny. I will not be disclosing the cause of her death for privacy but if you wish to donate to the funeral please direct all funds to the Bazooka Attack Victims Association.
205 replies · 13.7K retweets · 170.4K likes · 2.5M views
Briann🗿 retweeted
siddharth @buildwithsid
nobody in my college has heard of claude. Some guy was flexing that he did the entire assignment with chatgpt; mf didn't even know chatgpt is just the UI. Later on I told him claude models are better for coding, and bro was like "don't teach me, I'm using ai since launch"
99 replies · 27 retweets · 1.5K likes · 128.8K views
Charmaine Mahachi @charmainemahach
Out of curiosity, how many times do you read a research paper before implementing it?
17 replies · 2 retweets · 50 likes · 6.7K views
Briann🗿 @Access2B
@rasbt Training recipe carrying the model🔥🔥
0 replies · 0 retweets · 0 likes · 70 views
Sebastian Raschka
Flagship open-weight release days are always exciting. Was just reading through the Gemma 4 reports, configs, and code, and here are my takeaways:

Architecture-wise, besides multimodal support, Gemma 4 (31B) looks pretty much unchanged compared to Gemma 3 (27B). Gemma 4 maintains a relatively unique pre- and post-norm setup and remains fairly classic, with a 5:1 hybrid attention mechanism combining sliding-window (local) layers and full-attention (global) layers. The attention mechanism itself is also classic Grouped Query Attention (GQA).

But let's not be fooled by the lack of architectural changes. Looking at the benchmarks, Gemma 4 is a huge leap from Gemma 3, likely due to the training set and recipe. Interestingly, on the AI Arena Leaderboard, Gemma 4 (31B) ranks similarly to the much larger Qwen3.5-397B-A17B model. But as I discussed in my model evaluation article, arena scores are a bit problematic, as they can be gamed and are biased toward human (style) preference. If we look at some other common benchmarks, which I plotted below, we can see that it's indeed a very clear leap over Gemma 3 and ranks on par with Qwen3.5 27B.

Note that there is also a Mixture-of-Experts (MoE) Gemma 4 variant that is slightly smaller (27B total, with 4 billion parameters active). Its benchmarks are only slightly worse than those of Gemma 4 (31B). I omitted the MoE architecture in the figure below because the figure is already very crowded, but you can find it in my LLM Architecture Gallery.

Anyway, overall it's a nice, strong model release and a strong contender for local usage. One aspect that should not be underrated: it seems the model is now released under a standard Apache 2.0 open-source license, which has much friendlier usage terms than the custom Gemma 3 license.
[figure: Gemma 4 vs. Gemma 3 benchmark comparison]
42 replies · 165 retweets · 1.2K likes · 61.2K views
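The 5:1 local/global layout described above is easy to see in code. A minimal sketch, assuming a simple interleaving (the helper name and parameter are illustrative, not Gemma's actual config code): every sixth layer attends globally, and the rest use a sliding window.

```python
def attention_pattern(n_layers: int, local_per_global: int = 5) -> list:
    """Label each layer 'local' or 'global' for a (ratio):1 hybrid stack.

    Illustrative helper only: in a 5:1 hybrid, every sixth layer uses
    full (global) attention while the rest use sliding-window (local)
    attention, which keeps KV-cache cost bounded by the window size
    for most layers.
    """
    return [
        "global" if (i + 1) % (local_per_global + 1) == 0 else "local"
        for i in range(n_layers)
    ]

print(attention_pattern(12))
# 12 layers -> global attention at layers 6 and 12 (1-indexed)
```

With 12 layers only two are global, which is where most of the memory savings of the hybrid design come from.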
François Chollet @fchollet
JAX is what a well-designed low-level machine learning framework looks like. Good design lets you deliver much greater performance with much lower effort. Bad design is the exact opposite.
37 replies · 36 retweets · 740 likes · 50.7K views
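A tiny example of the design point being made: JAX's core API is a set of composable function transformations, so compiled gradients come almost for free. The toy loss below is my own illustration; `jax.grad` and `jax.jit` are the real API.

```python
import jax

def loss(w):
    # toy quadratic loss, minimized at w = 2
    return (3.0 * w - 6.0) ** 2

# compose transformations: differentiate the function, then JIT-compile it
grad_loss = jax.jit(jax.grad(loss))

print(grad_loss(1.0))  # analytic gradient: 2 * (3w - 6) * 3 = -18 at w = 1
```

Two one-word transformations turn a plain Python function into a compiled gradient function, which is the "greater performance with lower effort" the tweet is pointing at.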
Briann🗿 @Access2B
@fchollet For real jax breaks down the whole process easily and in a very learnable way
0 replies · 0 retweets · 0 likes · 117 views
Briann🗿 retweeted
. @realgugo
our brain rot probably started with sanjay and craig.
499 replies · 5K retweets · 31K likes · 684.5K views
Briann🗿 retweeted
✮✮✮ @inclusivetwts
long ass weekend has caught me with just a hundred shillings
21 replies · 598 retweets · 2.6K likes · 31.1K views
Martian @space_colonist
I’m pleased to announce I will be leading a new team on VLA interpretability @AnthropicAI
21 replies · 3 retweets · 398 likes · 38.8K views
Briann🗿 retweeted
disha @yzybby
parents love giving you job hunting and career advice that are like “Have you tried destroying and betraying yourself for nothing”
105 replies · 5.4K retweets · 56.2K likes · 1.3M views
krish @IamIronLAN
Stanford is kinda crazy because as a CS undergrad this term you're choosing between:
- CS336: 0 to hero on frontier model training
- CS224R taught by Chelsea Finn (founder of Pi)
- CS231N taught by Fei Fei Li (ImageNet, WorldLabs CEO)
- CS221M Mech Interp Intro taught with Goodfire
And a host of personal podcasts delivered by $T CEOs.
Jesse Mu @jayelmnop

protip for stanford undergrads: beware the classes with guest speaker lineups that read like AI coachella. you’re basically paying $5k to listen to a live podcast series.

9 replies · 72 retweets · 1.4K likes · 188.6K views
Briann🗿 retweeted
maeva @maevaemiliaa
i tried in january i tried in february i tried in march and i will try again in april
88 replies · 14.8K retweets · 63.8K likes · 749.1K views