Chris Bolas

5.3K posts

@chrisbolas

Full stack web developer. Linux, python, javascript, robotics, DIY, gardening and fermentation all pique my interest.

Belton, MO · Joined May 2008
1.1K Following · 419 Followers
Chris Bolas @chrisbolas
@gondowar My general hope with these models is that they use NoPE and have the state space model. It looks like the Granite 4 models train up to 512k, so much longer context capability. Then I want to train the tiny model (7B, 1B active) within an agentic harness.
Gondavar @gondowar
@chrisbolas I experiment with G4 350m, but I don't understand or mess with model architecture. I tried 350m H, but it took much more VRAM, so I went back to using 350m. Are you seeing better results with H? Any tips?
Chris Bolas @chrisbolas
First steps:
1. Benchmark env established.
2. Double control (symmetry vs. SOLAR).
3. 32 layers -> 48/49.
4. Currently applying the dolma3 150B Mix on it as continued pretraining, gathering initial log performance before a parameter sweep.
Quoting Chris Bolas @chrisbolas:

Legit brain surgery too. It's a model upscaling method that sounds cool, and I wonder if it's feasible with the IBM Granite 4 models with just a slight, tasteful modification from the paper: arxiv.org/html/2312.1516…
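
A rough sketch of how the "32 layers -> 48" splice could be planned, assuming the "SOLAR" mentioned in the first-steps tweet refers to the depth up-scaling recipe from the SOLAR 10.7B paper: duplicate the layer stack, drop the last m layers from the first copy and the first m layers from the second, then concatenate. The function name `dus_layer_plan` and its parameters are hypothetical, and this only computes which source layers end up where. It does not touch real weights, and it does not cover the "slight tasteful modification" or the 49-layer variant, which the tweets do not spell out.

```python
def dus_layer_plan(n_layers: int = 32, m_dropped: int = 8) -> list[int]:
    """Return the source-layer index for each layer of the upscaled stack.

    Assumed recipe (SOLAR-style depth up-scaling):
      first copy  keeps layers 0 .. n_layers - m_dropped - 1
      second copy keeps layers m_dropped .. n_layers - 1
    """
    if not 0 <= m_dropped < n_layers:
        raise ValueError("m_dropped must satisfy 0 <= m_dropped < n_layers")
    first_copy = list(range(0, n_layers - m_dropped))
    second_copy = list(range(m_dropped, n_layers))
    return first_copy + second_copy


plan = dus_layer_plan(32, 8)
print(len(plan))             # 48
print(plan[22:26])           # [22, 23, 8, 9]  <- the seam between the copies
```

The seam (layer 23 feeding a copy of layer 8) is where the discontinuity sits. In a hybrid model the layer types on either side of the seam may also matter, so the plan would likely need checking against the real layer layout.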

Chris Bolas @chrisbolas
In that regard, I'm now thinking about what the best hyperparameters will be for a much more thorough continued pretraining session, which will then be repeated on the 48-layer model as well as on the base 32-layer model as a continued pretraining control.
Chris Bolas @chrisbolas
GSM8K is just barely starting to come back. When the model is initially spliced, it outputs repetitions from what I was seeing, so it needs training in order to be okay.
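
One way to put a number on "outputs repetitions" is the share of word n-grams in a generation that occur more than once. This is only an illustrative sketch: the function name `repeated_ngram_fraction`, whitespace tokenization, and n=4 are all assumptions, not something taken from the thread.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of word n-grams in `text` that appear more than once."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words: nothing to measure
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


looping = "the answer is 4 the answer is 4 the answer is 4"
normal = "Janet sells 16 eggs and keeps 3 for breakfast"
print(repeated_ngram_fraction(looping))  # 1.0
print(repeated_ngram_fraction(normal))   # 0.0
```

Tracking this alongside GSM8K accuracy across checkpoints would show whether the looping fades as continued pretraining progresses.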
Chris Bolas retweeted
Teknium (e/λ) @Teknium
Getting Hermes ready to work with the spark over here
Chris Bolas @chrisbolas
If you made it this far, let me know! :D
Chris Bolas @chrisbolas
Will it make a difference? Well, it's already showing some mildly interesting initial traits.
Chris Bolas @chrisbolas
This also makes the full runs seem much more approachable. I was intending to get a guideline from fully brain damaged to "pretrained back to normal".
Chris Bolas @chrisbolas
This was a bad sign, actually! It wasn't using an sm_120-capable causal_conv1d. It went down to 10% load and is only using 2GB VRAM now. Oh! And iterations/sec went up from 1.5 to like... 20+? Whoops, lol.
Quoting Chris Bolas @chrisbolas:

initial mini-evals runs
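
A quick environment check in the spirit of the causal_conv1d issue above might look like the sketch below. It only reports whether the `causal_conv1d` package is importable and which compute capability the GPU reports (sm_120 corresponds to `(12, 0)`). It cannot confirm that the installed build actually contains sm_120 kernels, and the function name `kernel_report` is hypothetical.

```python
import importlib.util


def kernel_report() -> dict:
    """Collect basic facts about the local kernel/GPU setup."""
    report = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "causal_conv1d_installed": importlib.util.find_spec("causal_conv1d") is not None,
        "cuda_available": False,
        "compute_capability": None,  # e.g. (12, 0) for an sm_120 GPU
    }
    if report["torch_installed"]:
        try:
            import torch

            if torch.cuda.is_available():
                report["cuda_available"] = True
                report["compute_capability"] = torch.cuda.get_device_capability(0)
        except Exception:
            # A broken torch install should not crash the report.
            pass
    return report


if __name__ == "__main__":
    for key, value in kernel_report().items():
        print(f"{key}: {value}")
```

If the package is missing or was built without support for the reported capability, a sudden change in GPU load, VRAM use, and iterations/sec like the one described is the kind of symptom to watch for.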
