Chris Bolas

5.3K posts

@chrisbolas

Full stack web developer. Linux, python, javascript, robotics, DIY, gardening and fermentation all pique my interest.

Belton, MO · Joined May 2008

1.1K Following · 419 Followers

Pinned Tweet

Chris Bolas@chrisbolas·Nov 26

this is my aesthetic: microship.com/bikes/ p.s. you are welcome

Chris Bolas@chrisbolas·2h

@gondowar my hope with these models generally is that they use NoPE and have the state space model. looks like they train up to 512k in the Granite 4 models, so much longer context capability. then I want to train the tiny model (7B, 1B active) within an agentic harness

Gondavar@gondowar·3h

@chrisbolas I experiment with G4 350m, but I don’t understand or mess with model architecture. I tried 350m H, it took much more vram, back to using 350m. Are you seeing better results with H? Any tips?

Chris Bolas@chrisbolas·9h

first steps
1. benchmark env established
2. double control (symmetry vs SOLAR)
3. 32 layers -> 48/49
4. currently applying dolma3 150B Mix on it as continued pretraining, gathering initial log performance before parameter sweep

Chris Bolas@chrisbolas

legit brain surgery too. it's a model upscaling method that sounds cool, and I wonder if it's feasible with the IBM Granite 4 models with just a slight, tasteful modification from the paper arxiv.org/html/2312.1516…
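
The "32 layers -> 48" step in the post above matches the layer arithmetic of depth up-scaling as described for SOLAR: duplicate an n-layer model, drop the last m layers from one copy and the first m from the other, then stack them. A minimal sketch of just the index bookkeeping, assuming n = 32 and m = 8; the function name is hypothetical, and how the Granite 4 splice was actually done (including where the 49th layer comes from) is not stated in the thread.

```python
def depth_upscale_layer_indices(n_layers: int, m: int) -> list[int]:
    """Return which base-model layer each layer of the up-scaled stack copies.

    Hypothetical helper for illustration only; it does not touch any weights.
    """
    if not 0 <= m < n_layers:
        raise ValueError("m must satisfy 0 <= m < n_layers")
    first_copy = list(range(0, n_layers - m))   # copy A without its last m layers
    second_copy = list(range(m, n_layers))      # copy B without its first m layers
    return first_copy + second_copy


indices = depth_upscale_layer_indices(n_layers=32, m=8)
print(len(indices))   # 48
print(indices[20:28])  # layers around the seam between the two copies
```

The seam sits where base layer 23 is followed by base layer 8, which is why a freshly spliced model usually needs continued pretraining before it behaves normally again.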

Chris Bolas@chrisbolas·3h

in that regard, I'm now thinking about what the best hyperparameters will be for a much more thorough continued pretraining session, which will then be repeated against the 48 layer model as well as the base 32 layer model as a continued pretraining control

Chris Bolas@chrisbolas·3h

GSM8K is just barely starting to come back. when it's initially spliced it outputs repetitions, from what I was seeing, so it needs training in order to be okay

Chris Bolas retweeted

Teknium (e/λ)@Teknium·5h

Getting Hermes ready to work with the spark over here

Chris Bolas@chrisbolas·4h

if you made it this far, let me know! :D

Chris Bolas@chrisbolas·4h

will it make a difference? well it’s already showing some mildly interesting initial traits

Chris Bolas@chrisbolas·16h

lfg 🤘

Chris Bolas@chrisbolas·17h

this also makes the full runs seem much more approachable. I was intending to get a guideline from fully brain damaged and "pretrained back to normal"

Chris Bolas@chrisbolas·17h

this was a bad sign actually! it wasn't using an sm_120 capable causal_conv1d. went down to 10% load and only using 2GB VRAM now. Oh! And iterations/sec went up from 1.5 to like... 20+? whoops lol

Chris Bolas@chrisbolas

initial mini-evals runs

Chris Bolas retweeted

fish@fishPointer·1d

one atom at a time

Jirachi🌟@0xJirachi

we’re rebuilding a proper fucking country. 4 good men came and manhandled the future of manufacturing into my garage with me because they didn’t want to see me spend on equipment rental. I’d never even met 3 of them. WE’RE REBUILDING A PROPER COUNTRY
