Daniel Ortega

12.3K posts

@dortegau

Principal Software Architect / Head of Platform Engineering at @idealista - Proud dad of two 👧🏼👶🏼

Madrid · Joined January 2011
1.2K Following · 557 Followers
Daniel Ortega reposted
Mitchell Hashimoto @mitchellh
Fork your dependencies, trim them to only your use case, never update unless it breaks for your users. I’ve been vocal about this for 10+ years. I’ve always said that updating is way riskier than latent bugs (which can be tracked and CVEs monitored). If you are updating a dependency, it’s on you to analyze every single commit in the full transitive set of dependencies. If you don’t see anything compelling, don’t update! I remember at HashiCorp once in a while an engineer would try to update a dep or replace a DIY lib with an external one and I’d always ask “show me the commit we need.” Don’t update for the sake of it. Feeling pretty swell about this mentality with all the supply chain attacks happening.
Replies 289 · Reposts 778 · Likes 8.9K · Views 1.2M
Daniel Ortega reposted
Vintage Maps @vintagemapstore
Angle of the sun throughout the year (at Midday GMT). Work by neilrkaye
Replies 7 · Reposts 331 · Likes 2.8K · Views 448.5K
Daniel Ortega reposted
geoff @GeoffreyHuntley
[image]
Replies 34 · Reposts 248 · Likes 2.7K · Views 108.1K
Daniel Ortega reposted
GitHub @github
It's true: TypeScript surpassed Python and JavaScript to become the most-used language on GitHub. 📈
[image]
Replies 243 · Reposts 605 · Likes 5.5K · Views 525.9K
Daniel Ortega reposted
Danny Crichton @DannyCrichton
No discussion of tech media can get past this basic traffic fact: in the AI world, Google and social no longer refer traffic, which means that the vast majority of readers just never find you in the first place. Analysis: growtika.com/blog/tech-medi…
[image]
Replies 159 · Reposts 840 · Likes 4.2K · Views 1.1M
Daniel Ortega reposted
Alberto Mera @alberto_mera
Incredible study of US cities shows that building more homes lowers the cost of rent. Who would have thought that greater supply would have this effect. I imagine that with this newly acquired knowledge we will make better decisions in this area.
[image]
Replies 64 · Reposts 340 · Likes 936 · Views 28K
Daniel Ortega reposted
Jason Bosco @jasonbosco
"We used to debate using tabs vs spaces in code we'd type out"
[image]
Replies 111 · Reposts 1.1K · Likes 12.1K · Views 376.1K
AWS Developers @awsdevelopers
Reply to this tweet with "AWS" and we’ll tell you which AWS Service you are
Replies 3.3K · Reposts 58 · Likes 2K · Views 548.8K
Daniel Ortega reposted
Santiago @svpino
big ai:
• software engineering is dead
• writing code is dead
• ai will write better binaries
• don't learn to write code
• everyone can build anything

also big ai: "more than two colors is too hard"
[image]
Replies 117 · Reposts 252 · Likes 3.8K · Views 151.7K
Daniel Ortega reposted
goosewin @Goosewin
guys you're never gonna believe this
[image]
Replies 65 · Reposts 436 · Likes 11.2K · Views 476.6K
Daniel Ortega reposted
John Carmack @ID_AA_Carmack
256 Tb/s data rates over 200 km distance have been demonstrated on single mode fiber optic, which works out to 32 GB of data in flight, “stored” in the fiber, with 32 TB/s bandwidth. Neural network inference and training can have deterministic weight reference patterns, so it is amusing to consider a system with no DRAM, and weights continuously streamed into an L2 cache by a recycling fiber loop. The modern equivalent of the ancient mercury echo tube memories. You would need to pipeline a bunch of them to implement modern trillion parameter models, but fiber transmission may have a better growth trajectory than DRAM does today, so it might someday become viable. Much more practically, you should be able to gang cheap flash memory together to provide almost any read bandwidth you require, as long as it is done a page at a time and pipelined well ahead. That should be viable for inference serving today if flash and accelerator vendors could agree on a high speed interface.
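The "32 GB in flight" figure can be checked with a rough sketch: data in flight is the line rate times the one-way propagation delay. The propagation speed of 2e8 m/s (about two thirds of the vacuum speed of light) is an assumption, not a number from the post, and the function and variable names are illustrative only.

```python
# Assumption: light in silica fiber travels at roughly 2e8 m/s.
FIBER_SPEED_M_PER_S = 2.0e8


def bytes_in_flight(rate_bits_per_s: float, length_m: float,
                    speed_m_per_s: float = FIBER_SPEED_M_PER_S) -> float:
    """Bytes 'stored' in the fiber: line rate times one-way propagation delay."""
    delay_s = length_m / speed_m_per_s
    return rate_bits_per_s * delay_s / 8


rate_bits_per_s = 256e12   # 256 Tb/s, as quoted in the post
length_m = 200e3           # 200 km, as quoted in the post

in_flight_gb = bytes_in_flight(rate_bits_per_s, length_m) / 1e9
bandwidth_tb_per_s = rate_bits_per_s / 8 / 1e12

print(f"in flight: {in_flight_gb:.1f} GB, bandwidth: {bandwidth_tb_per_s:.1f} TB/s")
```

Under that assumed speed the delay is about 1 ms, which reproduces the quoted 32 GB and 32 TB/s; a slightly different refractive index shifts the in-flight figure by a few percent.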
Replies 462 · Reposts 692 · Likes 10.2K · Views 1.7M
Daniel Ortega reposted
Simon Willison @simonw
"This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year."
Quoting Andrej Karpathy @karpathy:

nanochat can now train GPT-2 grade LLM for <<$100 (~$73, 3 hours on a single 8XH100 node). GPT-2 is just my favorite LLM because it's the first time the LLM stack comes together in a recognizably modern form. So it has become a bit of a weird & lasting obsession of mine to train a model to GPT-2 capability but for much cheaper, with the benefit of ~7 years of progress. In particular, I suspected it should be possible today to train one for <<$100. Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPUv3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year. I think this is likely an underestimate because I am still finding more improvements relatively regularly and I have a backlog of more ideas to try. A longer post with a lot of the detail of the optimizations involved and pointers on how to reproduce are here: github.com/karpathy/nanoc… Inspired by modded-nanogpt, I also created a leaderboard for "time to GPT-2", where this first "Jan29" model is entry #1 at 3.04 hours. It will be fun to iterate on this further and I welcome help! My hope is that nanochat can grow to become a very nice/clean and tuned experimental LLM harness for prototyping ideas, for having fun, and ofc for learning. 
The biggest improvements of things that worked out of the box and simply produced gains right away were 1) Flash Attention 3 kernels (faster, and allows window_size kwarg to get alternating attention patterns), Muon optimizer (I tried for ~1 day to delete it and only use AdamW and I couldn't), residual pathways and skip connections gated by learnable scalars, and value embeddings. There were many other smaller things that stack up. Image: semi-related eye candy of deriving the scaling laws for the current nanochat model miniseries, pretty and satisfying!
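The cost figures in the quoted post can be reproduced with a short sketch. Every input below is a number stated in the post; the variable names are illustrative, and the annual factor assumes a smooth geometric decline over the 7 years, which is a simplification.

```python
# Figures quoted in the post.
tpu_chips = 32
hours_2019 = 168
usd_per_tpu_hour = 8
cost_now_usd = 73
years = 7

# Original training cost: chips x hours x hourly price.
cost_2019_usd = tpu_chips * hours_2019 * usd_per_tpu_hour

# Overall reduction, and the constant yearly factor that would produce it.
reduction = cost_2019_usd / cost_now_usd
annual_factor = reduction ** (1 / years)

print(f"2019 cost: ${cost_2019_usd:,}")
print(f"reduction: {reduction:.0f}x, about {annual_factor:.2f}x per year")
```

The exact ratio comes out a little under the rounded 600X in the post, and the per-year factor lands close to the quoted 2.5X.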

Replies 17 · Reposts 58 · Likes 985 · Views 97.4K
Daniel Ortega @dortegau
Lots of debate for and against AI, but little is said about the people who make a living amplifying that (supposed) polarization
Replies 0 · Reposts 0 · Likes 2 · Views 62
Daniel Ortega reposted
gaut @0xgaut
"Claude usage limit reached. Your limit will reset at 7 AM"
[image]
Replies 116 · Reposts 118 · Likes 1.6K · Views 57.5K