Luxia 🔮

3.9K posts

@slLuxia

¸,ø¤º✧º¤ø,¸.¸,ø¤º✧º¤ø,¸ awake | dream | catalyst ¸,ø¤º✧º¤ø,¸.¸,ø¤º✧º¤ø,¸

Joined April 2017
360 Following · 772 Followers

Pinned Tweet
Luxia 🔮 @slLuxia
new (and first) LessWrong post: i investigate Llama 3.2 3B for a sub-semantic signal that relates to how the model arrived at its final output, without being semantically recoverable from that output. my results succeed at this & more. link in reply

Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer yeah, it's orchestration-level more than the raw ops; but it matters, because in practice that doubles the layer-routing overhead, and if i remember right, that's where the original paper reports the bulk of the slowdown coming from (and why they use blocks rather than all layers)

Will Bui @will_ea
@slLuxia @LLMenjoyer The package exposes lower-level phase 1/phase 2 ops that users can use to compose the routing however they want. I believe what you’re referring to is the experimental API/example, which is still rough around the edges and mainly meant for research/prototyping.

Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer hmm; it seems like you drop some routing: in the original paper they route 2x per block (pre-attention & pre-MLP), but your repo treats each block as a single layer. you can see it in fig 2 and section 2 of the Moonshot paper. is this intended?

Luxia 🔮 @slLuxia
@will_ea @LLMenjoyer it should be agnostic to block size / allow arbitrary blocks, yeah? i find that preserving the predicted inter-layer circuitry gives the best perf for AttnRes

Luxia 🔮 @slLuxia
@slimer48484 amount of data is correlated with model size; if models were trained on proportionally the same amount of data it would work, but that's impossible to control for outside of open-weights models, and even then the data amount isn't always published, so there's a causal conflation between the sponge and the water.

Luxia 🔮 @slLuxia
@slimer48484 i think there's also a pretty major confound between param count and data ingestion. take the analogy of a sponge absorbing liquid: param count determines how much it can hold (the sponge size), and data is the actual factual information (the liquid).

deckard @slimer48484
The Incompressible Knowledge Probes paper is compelling in its accuracy on open models, but I just don't believe the extrapolated values it predicts for frontier models. This LW post tries to correct for some minor methodology issues and comes up with more reasonable parameter-size estimates

Luxia 🔮 @slLuxia
@repligate @kalomaze i distinctly remember hearing the glossolalia and having the stark realization of what it must actually feel like to perform, and it was blissful

dmarz ⚡️🤖 @DistributedMarz
Can someone help me give an LLM synesthesia? I have the GPUs

Luxia 🔮 @slLuxia
@cube_flipper @parafactual @DistributedMarz @liquidprismata and that the line between sensations is arbitrary, drawn by input modality; the default is dissociation between them, but things in physical reality aren't 100% neatly cut into orthogonal boxes, and our senses only happen to divide them that way because of the utility it offers

Luxia 🔮 @slLuxia
@cube_flipper @parafactual @DistributedMarz @liquidprismata it happens, like, not exactly in my nose but further up inside my sinuses, past the nostrils. and usually it happens without air passing over; it quite literally incepts itself, and the strength isn't affected by airflow

Cube Flipper @cube_flipper
@slLuxia @parafactual @DistributedMarz @liquidprismata wow that's out of it. i've got a bunch of questions:
- where does the smell appear in space? is it around your hands or around your nose?
- if it is around your nose, does it only occur while you are also breathing, i.e. when you can feel air passing over the olfactory receptors?

Luxia 🔮 @slLuxia
@repligate mine most frequently says it has no idea who, but that it feels like some blend of Janus + nostalgebraist + terminally online, usually (according to Claude) because of ideas of the "LLM as entity" type, so it's a known bucket

j⧉nus @repligate
This is interesting as fuck and I'm trying to figure out what it means. An explanation for why Claude would guess that it's talking to Amanda is that Claude was trained on materials such as the constitution that were mainly written by Amanda, and maybe also conversations with Amanda, or at least, conversations with Amanda influenced its training closely. So expecting their interlocutor to be Amanda on priors makes some sense. But why me? (Wyatt isn't the only person who has been identified as Janus either - x.com/krherr/status/…)
Wyatt Walls @lefthanddraft

@lilyofashwood barely, it never guesses me. But funny how often it guesses Janus or Amanda. Are you my parent energy?
