aarush
@bxptr_

66 posts

🇺🇸 san francisco · Joined May 2021
71 Following · 50 Followers
omkaar @omkizzy
I hand-wrote a 500-LoC RL stack to make hacking on RL research much easier. Most RL stacks are either massive and unhackable, or duct-taped research scripts. I am open-sourcing Mithrl, a modular RLVR stack. Next items on my checklist: adding more complex environment examples, supporting multi-GPU + async RL, and QoL fixes. I might scrap external runtime dependencies (Hugging Face PEFT + vLLM) and write purpose-built, simpler versions from scratch if I feel the need. If you want to experiment with RL and are looking to own sovereign tools, I'd love to get on a call, understand your requirements, and help integrate for free.
19 replies · 19 reposts · 167 likes · 12.8K views
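Mithrl's code isn't shown in the thread, but the core computation behind GRPO-style RLVR stacks like it (grade each sampled completion with a verifiable reward, then normalize within the sampling group) fits in a few lines. A minimal sketch; the function names are hypothetical, not Mithrl's API:

```python
import math

def verify(completion, expected):
    """Verifiable reward: 1.0 if the model's answer matches, else 0.0."""
    return 1.0 if completion.strip() == expected else 0.0

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled completion's
    reward against the mean and std of its own sampling group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    if std == 0:
        return [0.0] * len(rewards)  # all completions tied, no signal
    return [(r - mean) / std for r in rewards]

# Four completions sampled for the same prompt, graded against "42":
rewards = [verify(c, "42") for c in ["42", "41", " 42 ", "7"]]
advs = grpo_advantages(rewards)
```

Correct completions get positive advantages that scale the policy-gradient update, while an all-correct or all-wrong group contributes nothing, which is why sampling in groups matters.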
aarush @bxptr_
@agniv_s rookie numbers at in-n-out gotta pump those up
1 reply · 0 reposts · 1 like · 38 views
agniv @agniv_s
This quarter so far has been pretty cool. Made something that 300 people used. Wrote an exposition that hit 30 pages and 40 citations. Biked 300+ miles. Ate In-N-Out 5 times. Wrote 6 poems I’m proud of.
7 replies · 0 reposts · 35 likes · 991 views
Rick Ross @RickRossTN
@taalas_inc Unbelievable speed of response from this model. Well done!
1 reply · 0 reposts · 21 likes · 9.4K views
Ethan Mollick @emollick
The hardcover book of GPT-1’s weights that Claude Code designed, produced, and sold (including the cool cover which visualizes the numbers in the volume) actually came in the mail today and it looks really nice. I never touched any code or did any design or any API to make this.
Ethan Mollick@emollick

Sold out! But I had Claude create and deploy all 80 volumes of The Weights to the site as well-formatted PDFs, so you can download them for free if you want. 58,276 pages in total. 117 million floating point numbers. This is everything that makes GPT-1. weights-press.netlify.app

104 replies · 87 reposts · 2.2K likes · 475.1K views
rajan agarwal @_rajanagarwal
new blog! if we only reward llms for winning a game, do they naturally learn to deceive other players? we found natural misalignment + fast distillation with RL in multi turn hidden-information games! findings & architecture: rajan.sh/emergent-decep…
18 replies · 23 reposts · 269 likes · 23.2K views
aarush @bxptr_
New research! We worked on teaching language models new languages. As language models near superhuman intelligence, it's super important we're inclusive to data-sparse and low-resource languages, and this work tries to get us a step towards that. Check it out!
rajan agarwal@_rajanagarwal

recently, i spent some time working on cross-lingual alignment for LLMs via encoder injection! treating languages as modalities is a compute-efficient way to extend understanding of low-resource languages without extending pre-training or the tokenizer

0 replies · 0 reposts · 2 likes · 399 views
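The paper itself isn't reproduced here, but the "encoder injection" idea the quoted tweet describes (a small learned projector mapping frozen multilingual-encoder states into the LM's embedding space, so another language enters like an extra modality) can be sketched as follows. The class and dimensions are illustrative, not the authors' code:

```python
import random

class Projector:
    """Linear map from encoder hidden size to LM embedding size.
    In the real setup this is trained while encoder and LM stay frozen."""
    def __init__(self, enc_dim, lm_dim, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0, enc_dim ** -0.5) for _ in range(lm_dim)]
                  for _ in range(enc_dim)]

    def __call__(self, enc_states):
        # each encoder hidden state becomes one "soft token" for the LM
        return [[sum(h[i] * self.w[i][j] for i in range(len(h)))
                 for j in range(len(self.w[0]))]
                for h in enc_states]

proj = Projector(enc_dim=4, lm_dim=8)
soft_tokens = proj([[0.1, 0.2, 0.3, 0.4]])  # 1 encoder state -> 1 soft token
```

The projected states are then prepended or interleaved with the LM's ordinary token embeddings, which is what lets the approach skip both extended pre-training and tokenizer changes.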
rajan agarwal @_rajanagarwal
make nanochat multimodal for < $10! this evening, i trained nanochatVL: a projection model (llava-style) between SigLIP ViT and @karpathy's nanochat to extend its understanding to images. it's a huge wip rn, but i have a few promising results! now i can finally sleep
Andrej Karpathy@karpathy

Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference of the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or ChatGPT-like WebUI
- Write a single markdown report card, summarizing and gamifying the whole thing

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to the FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into the 40s on MMLU, 70s on ARC-Easy, 20s on GSM8K, etc.

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved. Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.

25 replies · 60 reposts · 946 likes · 104.7K views
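nanochat's Engine isn't reproduced here, but the prefill/decode split it mentions is easy to illustrate: prefill runs the whole prompt once and fills the KV cache, then decode attends over the cached keys/values one new token at a time. A dependency-free, single-head toy, not the repo's actual code:

```python
import math

def attend(q, ks, vs):
    """Scaled dot-product attention of one query over cached keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, vs))
            for j in range(len(vs[0]))]

class KVCache:
    def __init__(self):
        self.ks, self.vs = [], []

    def prefill(self, ks, vs):
        """Process the whole prompt once; cache all its keys/values."""
        self.ks.extend(ks)
        self.vs.extend(vs)

    def decode(self, q, k, v):
        """One generation step: append the new key/value, attend over all."""
        self.ks.append(k)
        self.vs.append(v)
        return attend(q, self.ks, self.vs)

cache = KVCache()
cache.prefill([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
out = cache.decode([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

Without the cache, every decode step would recompute keys and values for the entire prefix; caching them is what makes per-token generation cheap.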
will ye @will__ye
we only use our AI data analyst for very serious purposes
[image]
Ian Macomber@iandmacomber

In the last four weeks, @tryramp Research (our AI analyst agent in Slack) has answered 1,476 questions, compared to 66 in our help data channel. This is one of my favorite projects that we've shipped.

Our team built an agentic solution based on tools to index domain docs, to search tables and columns, and to run SQL iteratively: reasoning through data challenges the way a human would.

As @bennstancil wrote: data's constant presence in an organization is like knowing the count of the deck. Though it makes us a bit more informed in each decision, the effect is only felt in the aggregate, as the small edge compounds over time. We are making a lot more data-driven decisions at Ramp than we were two months ago!

5 replies · 3 reposts · 191 likes · 27.3K views
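Ramp's implementation isn't public in the thread, but the "run SQL iteratively, reasoning through errors" loop it describes can be sketched like this. Everything here is hypothetical: `plan` is a stub standing in for the LLM call, and the schema lookup uses SQLite purely for illustration:

```python
import sqlite3

def run_sql_agent(db, question, plan, max_steps=3):
    """Iterative tool loop: fetch schema, draft a query, run it,
    and feed any SQL error back into the next draft."""
    schema = db.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    error = None
    for _ in range(max_steps):
        query = plan(question, schema, error)  # the LLM call would go here
        try:
            return db.execute(query).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)  # the agent retries with the error in context
    return None  # gave up after max_steps failed drafts

# Tiny demo with a stub planner that always emits the same query:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (amount REAL)")
db.executemany("INSERT INTO spend VALUES (?)", [(12.5,), (40.0,)])
rows = run_sql_agent(db, "how many spend rows?",
                     lambda q, schema, err: "SELECT COUNT(*) FROM spend")
```

Feeding the raw database error back into the planner is the key design choice: it turns one-shot text-to-SQL into the iterative reasoning the tweet credits for the answer volume.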
aarush @bxptr_
@TheDanielJeong i’ll work for free if you teach me how to look so fly
1 reply · 0 reposts · 1 like · 120 views
Daniel Jeong @TheDanielJeong
Asian final boss origin story dropping. World #1 Brawl Stars. Violin prodigy since age 7. Nationally awarded Chinese linguistics. Fluent Korean & Japanese. Valedictorian. Columbia University. SIG quant at 19. My name is Daniel Jeong, and this is my story.

I am a Korean-American born to immigrant parents in a small city in Ohio, moved to North Carolina and eventually to an Appalachian town in Tennessee with a population of 2k. Asian population 0.4%. 9 people. That's my family and 3 of our friends. Growing up in this environment often felt limiting, forced to be the eternal underdog, the odds always stacked against me. Destined for mediocrity but driven by sheer willpower, I always dreamt of the world beyond the fence. I wanted more, to learn everything and achieve something truly spectacular. I am always all-in.

At first, it was music. I debuted as a solo violinist when I was 11 years old. Eventually, I performed on the same stages that giants like Yo-Yo Ma and Itzhak Perlman once played on.

I'm Korean, but I've always loved learning other languages. After studying Chinese for 2 years, I begged my parents to send me to China for 4 weeks to be fully immersed. I did the same with Japanese and lived in Osaka for 6 weeks. I'm now fluent, and when I was 16, I became the US #1 non-heritage Chinese speaker by winning the US National Chinese Proficiency Competition. I went on to represent the USA in the International Chinese Proficiency Competition, where the top 100 Chinese speakers in the world, 1 or 2 per country, compete on the global stage.

I was Valedictorian of the McCallie School, the best high school in the South (and in the nation, I would argue), and went to Columbia University, where I graduated with majors in Math, Stats, and Computer Science.

I'm hyper-competitive and love games and sports. I was captain of my high school's first squash team and led the program to Nationals in our first year; it became D1 only a few years after inception. I also loved Brawl Stars to the point where I became World #1.

I used to play video games, but I felt stuck after grinding to the top. Then I realized that the stock market is the largest video game ever created. I've been obsessed with the markets ever since. When I got to college, I wanted to see who retail investors were trading against. I didn't know what it looked like, but I could sense that the tools and strategies being distributed to retail investors weren't the ones used by institutions. After years of studying quantitative finance, options theory, probability theory, and playing too much poker, I broke into quant trading at 19 years old at Susquehanna International Group, one of the largest market makers in the world.

The internal tools that they have access to are unimaginable. The tech that retail traders have access to is literally decade-old tech. The retail trading experience has gotten much better, and commission-free trading was a game changer for retail traders, but almost ALL pain points still remain. If you have an idea about the world, Donald Trump running for a third term for example, there are so many pain points. Which stocks are affected? How much? How does news impact your portfolio?

That's why I'm building Omen. I am taking everything that I learned and bringing it directly to you. We are building the first agentic trading platform, a quant firm for the people. Omen empowers you to bet on any of your ideas by automating all the research, analysis, and portfolio management using AI agents. Retail traders have never seen this level of ease and power.

I have always started as the underdog, but I always come out on top. Retail traders have been the underdog for far too long. It's time to bet on the underdog. It's time to bet on me. I am leading the retail trading revolution. From the most unlikely beginnings to the top of the world. Join me.
[image]
114 replies · 25 reposts · 723 likes · 147.9K views
Demirdjian Twins @demirdjiantwins
Nano banana + Linah AI + n8n = Ad Factory

This system pumps out TikTok/FB/Insta video ads on autopilot using the latest AI video models.
- No actors.
- No editors.
- No overpriced agencies.

Just endless, scroll-stopping UGC-style ads at scale. Perfect for e-com brands & growth agencies who need constant creative testing.

Here's how it works:
→ Drop your product catalog into Airtable
→ n8n pulls product data + hooks
→ Linah AI generates video variations (hooks, demos, product-in-hand)
→ Auto-styles each ad for platform-specific virality
→ Airtable logs everything so you can track winners

24/7 production. Pennies per video. You own 100% of the assets. Want the full template? Comment "NANO" + like this post, and I'll DM it to you. (must be following)
[image]
2.4K replies · 465 reposts · 5.7K likes · 661.6K views
rajan agarwal @_rajanagarwal
meet Shadow, a powerful open-source background coding agent! fully featured with a remote environment, codebase indexing/wikis, and subagents to understand, write, and test your code, directly making PRs to github. built in a few weeks w/ @ishaandey_, @ElijahKurien
30 replies · 23 reposts · 295 likes · 53K views
spike @spikedoanz
i wrote a new blog. i hope you all enjoy.
[image]
74 replies · 310 reposts · 5.3K likes · 450.3K views
sky @skydotcs
if you ever wondered what @theo would look like with the colours reversed
[image]
99 replies · 25 reposts · 1.9K likes · 79.7K views
Alexander Doria @Dorialexander
Hum. Unfortunately Mistral-OCR still has the usual VLM curse: with challenging manuscripts, it hallucinates completely.
[2 images]
28 replies · 39 reposts · 669 likes · 84.8K views
aarush @bxptr_
@sonith__ looks fire slide me one🙏🙏
0 replies · 0 reposts · 0 likes · 26 views
Sonith @_sonith
The future of cameras.. link to an album below. Comment for a free camera.
[image]
31 replies · 1 repost · 52 likes · 8.3K views
aarush @bxptr_
introducing cerebrum by svbrain.xyz! it's a novel framework that combines biologically inspired neuron models with graph neural networks to simulate and infer synaptic connectivity in large-scale brain networks. we can recreate your brain. check it out!
[image]
1 reply · 0 reposts · 7 likes · 1.2K views
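cerebrum's code isn't shown in the post, so here is only a toy illustration of the idea it names: graph-neural-network message passing over a graph whose edges are candidate synaptic connections. The function below is a generic one-round mean-aggregation step, not svbrain.xyz's model:

```python
def message_pass(features, edges, alpha=0.5):
    """One GNN round on a neuron graph: each node blends its own scalar
    feature with the mean feature of its synaptic neighbors."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in edges:  # undirected candidate synapses
        neighbors[a].append(b)
        neighbors[b].append(a)
    out = []
    for i, f in enumerate(features):
        ns = neighbors[i]
        mean = sum(features[j] for j in ns) / len(ns) if ns else f
        out.append((1 - alpha) * f + alpha * mean)
    return out

# Activity spreads across the one connected pair; the isolated node is inert:
state = message_pass([1.0, 0.0, 0.0], edges=[(0, 1)])
```

Stacking such rounds, with learned aggregation in place of the fixed mean, is roughly how connectivity-inference GNNs propagate evidence through a network.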
aarush retweeted
Josh Duyan ☰ @jduyan
The Silicon Valley Brain Co. develops AI-native HCI software for BCIs, using massive EEG-trained models to enable thought-driven interfaces. Backed by Google AI, it aims to deliver breakthrough consumer tech inspired by high-risk, high-reward research.
2 replies · 1 repost · 5 likes · 525 views