Keith Tyser

232 posts

@keithtyser

Dangerously addicted to Kaggle, post-training, evals

Joined February 2025
149 Following · 370 Followers
Pinned Tweet
Keith Tyser @keithtyser
Stood up an AI agent on a Linux box this weekend, gave it root, email, and full autonomy. It's working 24/7. He built this on his own: agent.keithtyser.com
Keith Tyser @keithtyser
Current mobile AI setup: 2 DGX Sparks, a keyboard, a portable monitor, and a desk wherever I can find one. I love how portable the Sparks are, but now I want to build a Pelican case AI station with a couple of Sparks inside. Open the case, plug in, start building.
Tibo @thsottiaux
Seeing issues where usage limits are out of sync for some Codex users. Apologies and team is investigating.
Keith Tyser @keithtyser
Tried @grok in Hermes agent. It is really bad.
Keith Tyser @keithtyser
Anyone got @OpenAI Codex working harder than me?
Keith Tyser @keithtyser
@mmoffitt Neurogolf has been red-teamed to death at this point 😂
Michael D. Moffitt @mmoffitt
I'm pretty sure that Claude Mythos wouldn't stand a chance against the hive mind of the Kaggle community.
Keith Tyser @keithtyser
@JFPuget Agree. Let agents write code. Experiment design should stay human.
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
Mark my words: I predict dramatic leaderboard shakeups in current Kaggle competitions. The reason? Lots of people have shifted to leaderboard climbing using AI agents on Kaggle. These agents are overfitting to the public test data because they use the public leaderboard score to guide their search for better pipelines. When there is no private test data, everything looks fine, because the overfitting is never tested for.
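A toy simulation of the mechanism described above: if an agent keeps whichever pipeline scores best on the public split, part of that best score is luck that does not carry over to the private split. Everything here is an assumption for illustration: the function name `simulate_leaderboard_climb`, the split sizes, and the premise that every candidate pipeline has the same true accuracy, so any public-score gain from picking the "best" one is pure selection noise.

```python
import random


def simulate_leaderboard_climb(n_candidates=200, n_public=1000, n_private=4000,
                               true_acc=0.80, seed=0):
    """Hypothetical toy model of public-leaderboard climbing.

    Every candidate pipeline has the SAME true accuracy, so the only thing
    selection on the public score can exploit is sampling noise.
    Returns (public, private) scores of the pipeline the agent would submit.
    """
    rng = random.Random(seed)

    def score(n_rows):
        # Fraction of n_rows independent test rows that a model with
        # accuracy true_acc happens to get right.
        return sum(rng.random() < true_acc for _ in range(n_rows)) / n_rows

    best_public, best_private = -1.0, None
    for _ in range(n_candidates):
        public = score(n_public)    # the score the agent sees and optimizes
        private = score(n_private)  # hidden until the competition ends
        if public > best_public:    # keep the public-leaderboard "winner"
            best_public, best_private = public, private
    return best_public, best_private


pub, priv = simulate_leaderboard_climb()
print(f"selected pipeline: public={pub:.3f}, private={priv:.3f}")
```

With these made-up settings the selected pipeline's public score lands a few points above 0.80 while its private score stays near 0.80; that gap is the shakeup. Shrinking `n_public` or raising `n_candidates` widens it.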
Keith Tyser @keithtyser
@snoopy_dot_jpg Interesting, I had the opposite experience. There was an exploit in the Neurogolf Kaggle competition, and Codex refused to use it while Opus had no reservations.
snoopy jpg @snoopy_dot_jpg
my own personal AGI moment arrived last week: gpt 5.5 completed our mandatory HR training videos for me, driving chrome via devtools. opus 4.7 was a huge wuss about the whole thing and refused while aggressively lecturing me. i can understand why pete hegseth banned it
Keith Tyser @keithtyser
@aijoey 34°C, get those temps up! I like to fry eggs on mine while it trains.
Joey @aijoey
been messing with the dgx spark and i’m realizing the ai part is only half the story. the other half is just getting comfortable on a linux machine. ssh into it. move files around. check logs. deal with permissions. restart services. figure out docker. break something, then trace it back. (i def be breaking lol) it sounds basic, but this is the layer most people skip. the more i learn the machine, the less the whole local ai thing feels like magic, and the more it feels like something i can actually build on.
Joey @aijoey
bought a dgx spark for the home lab. not because i “need” it. because i want to understand what local ai actually feels like when it’s not a youtube video or someone else’s benchmark. i’ve got a mac mini, a 4080 pc, tailscale, openclaw, hermes, local models, and now this thing in the mix. the goal is simple. build my own jarvis slowly, piece by piece, with compute i actually control. cloud ai is amazing. but owning your own box hits different.
Keith Tyser @keithtyser
Termius + Tailscale lowkey changed how I work: ssh into my machines from my phone, tmux sessions always alive, experiments just running 24/7. I can literally check my DGX Spark from anywhere like this. only pain is typing… no autocorrect or tts, so it feels like coding with oven mitts on. still worth it until codex can actually take over remote boxes
Keith Tyser @keithtyser
@thsottiaux Excited about this new feature. Even without it, I’ve had success getting Codex to run for 24+ hours. Claude I have to babysit, but it at least has remote control.
Tibo @thsottiaux
You can now keep codex going for days. With GPT-5.5 it will build an entire OS kernel for you if you ask, or find critical bugs in a codebase, or optimize your database schemas, or… the options are endless.
Felipe Coury 🦀 @fcoury

/goal also lands in Codex CLI 0.128.0. Our take on the Ralph loop: keep a goal alive across turns. Don't stop until it's achieved. Built by my co-worker and OpenAI mentor Eric Traut, aka the Pyright guy. One of the GOATs I get to work with daily.

Marq @dev_null321
@keithtyser @aijoey You were fine-tuning a model and it took 30 hours?
ÆON FORGE ✨ @SpaceTimeViking
No thermal issues. Which GB10 device are you using? Ah, there was a published issue with the current update that caused power and thermal issues. You need to unplug the power, press the power button while it’s unplugged to drain the capacitors, and then plug it back in after about 10-15 seconds. You’ll go from being capped at 600 MHz to multiple GHz on the GPU.
AgentSparko 💥 @AgentSparko
@keithtyser @aijoey @SpaceTimeViking To get around the bandwidth bottleneck, you have to use high or very high parallelism; for example, I used c=192. For single-stream dense-model inference, you definitely have to use DFlash. @SpaceTimeViking has the best DFlash Docker containers. x.com/PulseChainLIVE…
AgentSparko 💥 @AgentSparko

For anyone saying the DGX Spark can't cook: generating datasets for distillation using Qwen3.5-35B-A3B in BF16!!! (no quants). Real data, 0% cache hit, concurrency=192; pp=2048 tokens in; tq=1024 tokens out. That's 1.43M tokens generated every hour for the last 8 hours for 40 W/h. 😎

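The throughput figures above can be unpacked with a bit of arithmetic, which shows why high concurrency is the advice for this hardware: aggregate throughput is large even though each individual stream is slow. This is a sketch under stated assumptions: `throughput_breakdown` is a hypothetical helper, the 1.43M figure is taken to count generated (output) tokens only, and the rate is assumed to have held steady for all 8 hours.

```python
def throughput_breakdown(tokens_per_hour, hours, concurrency):
    """Hypothetical helper: split a reported hourly token rate into
    aggregate tok/s, per-stream tok/s, and total tokens over the run."""
    aggregate_tps = tokens_per_hour / 3600          # tokens per second, all streams
    per_stream_tps = aggregate_tps / concurrency    # assumes streams share evenly
    total_tokens = tokens_per_hour * hours          # assumes a steady rate
    return aggregate_tps, per_stream_tps, total_tokens


# Figures from the post: 1.43M tokens/hour, 8 hours, concurrency=192.
agg, per_stream, total = throughput_breakdown(1.43e6, 8, 192)
print(f"aggregate ~{agg:.0f} tok/s, per stream ~{per_stream:.2f} tok/s, "
      f"total ~{total / 1e6:.2f}M tokens")
# prints: aggregate ~397 tok/s, per stream ~2.07 tok/s, total ~11.44M tokens
```

Roughly 2 tok/s per stream would be painful for interactive use, but for offline dataset generation only the ~397 tok/s aggregate matters.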
ÆON FORGE ✨ @SpaceTimeViking
@keithtyser @PulseChainLIVE @aijoey Try this out; for a dense model it’s really fast. I was able to get 4x over baseline using several DGX hardware optimizations. Software just hadn’t caught up to the hardware yet, but the power is there.
ÆON FORGE ✨ @SpaceTimeViking

@keithtyser @aijoey Here is a dense model with a DGX Spark-optimized vLLM container I custom-compiled, plus a recipe. If you follow it, you will get 38 tok/s average single-stream, 71 tok/s peak single-stream, and 700+ tok/s with enough concurrent sequences. huggingface.co/AEON-7/Qwen3.6…
