Jared Castorena

814 posts

@JaredC1728

LLM and AI growth is now in my wheelhouse. Founding member of the AI startup 1728Studios LLC.

Colorado Springs, CO · Joined August 2023
132 Following · 52 Followers
@bluecow 🐮 @BLUECOW009 ·
reply to this to get a follow back, must be following me
39 replies · 0 reposts · 46 likes · 1.3K views
Jared Castorena @JaredC1728 ·
I finally submitted to the parameter golf. I'm working on a 5080 + 5060, so it was the best I could do. I still need to change a few things in the model, but the base submission still stands. github.com/openai/paramet…
0 replies · 0 reposts · 0 likes · 31 views
Jared Castorena @JaredC1728 ·
Claude Code on Android is amazing...
[image]
0 replies · 0 reposts · 0 likes · 12 views
Jared Castorena @JaredC1728 ·
Still bringing the loss down; working with a new type of model is interesting. Also, this model has no KV cache.
[image]
0 replies · 0 reposts · 0 likes · 11 views
Vuk Rosić 武克 @VukRosic99 ·
When setting up automated AI research (for OpenAI's challenge), you first need to establish a good workflow. Do not start with big training runs; start with fast trainings to figure out what you will do with the results, and how you will:
1. first explore with quick architecture changes
2. eliminate the very bad ones
3. scale the others a bit more
4. eliminate the very bad ones again, etc.
First focus on establishing this workflow. Here is the prompt to set this up (debugging): "1. Test 15 one-second runs on different training architectures, take the best 7 and test them for 2 seconds, take the best 3 and scale to 3 seconds; also keep a separate baseline for the 3 seconds. Do this now, and write 3 tables at the end comparing all of them: loss as well as baseline, delta loss, one column explaining the architecture change, then a few sentences on the conclusion. I know it's too short a training, but let's pretend it's a real thing; it's for debugging. Add a skill for this and make it able to specify the duration; go with this duration now." Thank you @novita_labs for compute ❤️
[image][image]
4 replies · 1 repost · 20 likes · 1.6K views
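The explore → eliminate → scale loop above is essentially successive halving over candidate architectures. A minimal sketch, assuming a hypothetical `run_training` stand-in (a real version would launch an actual training job and return its final loss):

```python
import random

def run_training(arch: str, seconds: int) -> float:
    """Hypothetical stand-in: 'train' arch for `seconds` and return a final loss.
    Seeding the RNG with the arch name makes each fake run reproducible."""
    base = random.Random(arch).uniform(1.2, 2.0)  # intrinsic quality of the arch
    return base / (1.0 + 0.1 * seconds)           # longer runs lower the loss

def ablation_ladder(archs, rounds):
    """rounds = [(duration_seconds, how_many_to_keep), ...]
    Each round trains every survivor for the given duration and keeps the best."""
    survivors = list(archs)
    for seconds, keep in rounds:
        ranked = sorted(survivors, key=lambda a: run_training(a, seconds))
        survivors = ranked[:keep]
    return survivors

candidates = [f"arch_{i}" for i in range(15)]
# 15 x 1 s runs -> best 7 -> 2 s runs -> best 3 -> 3 s runs -> best 1
best = ablation_ladder(candidates, rounds=[(1, 7), (2, 3), (3, 1)])
```

The point of the short durations is exactly what the tweet says: debug the selection and reporting pipeline before any run is expensive enough to matter.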
Jared Castorena @JaredC1728 ·
@VukRosic99 I can hit about 1.4 bpb on validation. When I quantize my model I lose only about 0.0023 in eval loss. Seems like my new architecture holds up to being quantized lol
1 reply · 0 reposts · 1 like · 202 views
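Bits-per-byte figures like the one above follow from the raw cross-entropy loss by a standard unit conversion. A minimal sketch; the token and byte counts below are illustrative (here, 1 token per byte), not taken from the tweet:

```python
import math

def nats_to_bpb(loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert mean cross-entropy loss (nats/token) to bits per byte:
    divide by ln(2) to get bits/token, then scale by tokens per byte."""
    return (loss_nats / math.log(2)) * (n_tokens / n_bytes)

# Illustrative: ~0.9704 nats/token at 1 token/byte is ~1.4 bpb, and a
# quantization delta of 0.0023 nats barely moves the figure.
full_bpb  = nats_to_bpb(0.9704, n_tokens=1000, n_bytes=1000)
quant_bpb = nats_to_bpb(0.9704 + 0.0023, n_tokens=1000, n_bytes=1000)
```

A delta that small after quantization suggests the weights tolerate the reduced precision well, which matches the claim in the reply.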
Vuk Rosić 武克 @VukRosic99 ·
> you don't need a lot of GPUs to do AI research
> even researchers at Anthropic and OpenAI are only using 8 to 64 GPUs for new ideas, which is completely rentable and obtainable (pay for it with your money; money is a tool, not a sacred religion that must not be touched)
> you can also use 1 GPU (I pay for many myself though, it's better)

OpenAI @OpenAI
Are you up for a challenge? openai.com/parameter-golf

5 replies · 1 repost · 56 likes · 7.5K views
Jared Castorena @JaredC1728 ·
@OpenAI Looks like I have a new contender. This is running on a 5080...
[image]
0 replies · 0 reposts · 0 likes · 25 views
Jared Castorena @JaredC1728 ·
Language follows the golden ratio...
0 replies · 0 reposts · 0 likes · 13 views
Jared Castorena @JaredC1728 ·
@BLUECOW009 Nice, congrats! How did you solve memory contradictions with recall lookup? Also, from my initial look this seems to be a good System 1 thinking type of system, yes?
0 replies · 0 reposts · 0 likes · 311 views
@bluecow 🐮 @BLUECOW009 ·
Just open-sourced nuggets, the first personal AI assistant with holographic memory. It lives in your Telegram, remembers everything across sessions, and reaches out when it has something useful to say. No vector DB, no embeddings API; just complex-valued vectors and algebraic unbinding. Facts that keep getting recalled auto-promote to permanent memory. The agent gets smarter, faster, and cheaper over time.
[image]
15 replies · 3 reposts · 47 likes · 5.8K views
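"Complex-valued vectors and algebraic unbinding" describes the classic holographic-memory trick (Fourier-domain holographic reduced representations): bind a key and value by elementwise complex multiplication, superpose many bound pairs, and recover a value by multiplying the trace with the key's conjugate. A minimal sketch of that mechanism only; this is not the actual nuggets implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # vector dimensionality

def random_symbol():
    # Unit-magnitude complex phasors: |a_d| = 1, so conjugation exactly inverts binding.
    return np.exp(1j * rng.uniform(-np.pi, np.pi, D))

def bind(key, val):
    return key * val          # elementwise complex multiplication

def unbind(trace, key):
    return trace * np.conj(key)  # algebraic unbinding: multiply by the conjugate

key, val, other_key, other_val = (random_symbol() for _ in range(4))
memory = bind(key, val) + bind(other_key, other_val)  # superpose two facts

recovered = unbind(memory, key)  # = val + crosstalk noise from the other pair
sim = np.abs(np.vdot(recovered, val)) / (
    np.linalg.norm(recovered) * np.linalg.norm(val)
)
```

With two superposed facts the recovered vector's similarity to the true value is high (around 1/√2 here) while unrelated symbols score near zero, which is what lets recall work without a vector DB or embeddings API.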
Jared Castorena @JaredC1728 ·
Working on a new type of model...
[image]
1 reply · 0 reposts · 0 likes · 38 views
Jared Castorena @JaredC1728 ·
@VictorTaelin Realize you are imperfect → find someone else who realizes they, too, are imperfect → try to form a more perfect union
0 replies · 0 reposts · 0 likes · 8 views
Jared Castorena @JaredC1728 ·
@hxiao Claude Code fixed this for me: recompiled and the issue went away. llama.cpp takes no AI-authored submissions, so I never uploaded it back; just a local fix for my machine.
0 replies · 0 reposts · 0 likes · 226 views
Han Xiao @hxiao ·
Uh.. Qwen3.5-35B-A3B on llama.cpp re-prefills on every request, ~4x slower than it should be. Anyone solved this? I thought people had happily deployed and used it locally, but if this isn't solved yet, the perf is quite limited. Root cause: GDN layers are recurrent → pos_min tracks the full sequence → but llama.cpp validates the cache using an SWA threshold that defaults to 1 for non-SWA models → pos_min > 1 is always true → the cache is always discarded → full re-prefill every time?
[image]
32 replies · 27 reposts · 271 likes · 27.2K views
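The failure mode described in the thread can be sketched schematically. This is NOT llama.cpp's actual code, just an illustration of the mismatch being claimed: a reuse check tuned for sliding-window attention (SWA) applied to a recurrent-layer cache whose minimum tracked position covers the whole prompt:

```python
def can_reuse_cache(pos_min: int, swa_window: int) -> bool:
    """Hypothetical validation: reuse the cache only if the cached range
    fits within the sliding window. For non-SWA models the window is said
    to default to 1, so any cache spanning more positions is rejected."""
    return pos_min <= swa_window

# Recurrent (GDN) layers track state from position 0..n, so pos_min grows
# with the prompt. With a default window of 1, every non-trivial cache is
# discarded and each request pays a full prefill — the ~4x slowdown.
short_ok = can_reuse_cache(pos_min=0, swa_window=1)      # trivially short: reused
long_ok  = can_reuse_cache(pos_min=100, swa_window=1)    # long prompt: discarded
```

If the diagnosis is right, the fix is to exempt recurrent (non-SWA) layers from the window-based check rather than letting the threshold default to 1.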
Jared Castorena @JaredC1728 ·
Heh, interesting: a simulated proprioceptive emotional orientation over memory storage... #hermes
[image]
0 replies · 0 reposts · 0 likes · 44 views