Abylay Ospan

247 posts

@AbylayOspan

Engineer at Amazon EC2 | ex-Magic Leap | Linux Kernel Maintainer. Views are my own. Building @sbnb_io (AI Linux)

Miami, FL · Joined March 2014
2.5K Following · 495 Followers
Pinned Tweet
Abylay Ospan@AbylayOspan·
We just won 4th place in Google’s Gemma 3n Impact Challenge! 🚀 Huge thanks to my daughter and teammate @AlsuOspan, to all participants for the amazing projects, and to Google for organizing this challenge and opening Gemma 3n to spark projects that make everyday life better! 💚
Google AI Developers@googleaidevs

Fourth place: Sixth Sense for Security Guards This video monitoring system by @AbylayO and @AlsuOspan combines movement detection with multimodal reasoning, using Gemma 3n to distinguish benign events from genuine threats.

Abylay Ospan@AbylayOspan·
Can’t wait to deploy code on real @CorticalLabs neurons. Whose neurons (and whose DNA) are we running this on? 🧠😅
Abylay Ospan@AbylayOspan·
@karpathy At this rate, pretty soon we’ll just hand over a bucket of sand and it’ll come back as perfectly running silicon with the app already hardwired in lol
Andrej Karpathy@karpathy·
It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

Just to give an example, over the weekend I was building a local video analysis dashboard for the cameras of my home so I wrote: “Here is the local IP and username/password of my DGX Spark. Log in, set up ssh keys, set up vLLM, download and bench Qwen3-VL, set up a server endpoint to inference videos, a basic web ui dashboard, test everything, set it up with systemd, record memory notes for yourself and write up a markdown report for me”. The agent went off for ~30 minutes, ran into multiple issues, researched solutions online, resolved them one by one, wrote the code, tested it, debugged it, set up the services, and came back with the report and it was just done. I didn’t touch anything. All of this could easily have been a weekend project just 3 months ago but today it’s something you kick off and forget about for 30 minutes.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You’re spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. The biggest prize is in figuring out how you can keep ascending the layers of abstraction to set up long-running orchestrator Claws with all of the right tools, memory and instructions that productively manage multiple parallel Code instances for you. The leverage achievable via top tier "agentic engineering" feels very high right now.
It’s not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas. It works a lot better in some scenarios than others (e.g. especially for tasks that are well-specified and where you can verify/test functionality). The key is to build intuition to decompose the task just right to hand off the parts that work and help out around the edges. But imo, this is nowhere near "business as usual" time in software.
Abylay Ospan@AbylayOspan·
@Gavriel_Cohen Next step is Google’s gVisor for tighter Docker sandboxing - maybe when the claws 🦞 cut their way out 😄
Grummz@Grummz·
Taalas has a hardcoded LLM for inference, 100% on chip delivering peaks of up to 17,000 tokens per second. Replies are so fast you miss them if you blink. Demo out now: chatjimmy.ai
Wildminder@wildmindai·
17,000 tokens per second!! Read that again! LLM is hard-wired directly into silicon. no HBM, no liquid cooling, just raw specialized hardware. 10x faster and 20x cheaper than a B200. the "waiting for the LLM to think" era is dead. Code generates at the speed of human thought. Transition from brute-force GPU clusters to actual AI appliances. taalas.com/the-path-to-ub…
Aakash Gupta@aakashgupta·
Nvidia paid $20 billion for Groq’s IP. Taalas raised $169 million with 24 employees. And they just demonstrated 8x faster single-model inference than Cerebras on the same Llama 3.1 8B.

The number everyone’s fixating on is the speed. The number that actually matters is the constraint. HC1 runs exactly one model: Llama 3.1 8B, released July 2024, aggressively quantized to 3-bit and 6-bit precision with measurable quality degradation. You can’t swap in a new architecture. You can’t load different weights. If you want to serve Llama 4, you fabricate an entirely new chip.

This tells you everything about what Taalas is actually betting on. They’re betting model release cadences slow down. That enterprises will lock into stable, mature models for 12+ months. That the two-month tapeout cycle they’ve built with TSMC (N6 process, 815 mm² die, only two metal layers change per model) can keep pace with a frontier that’s still accelerating.

The economics on paper are staggering: 0.75 cents per million tokens versus Cerebras at 10 cents. That’s 13x cheaper. Ten HC1 cards in an air-cooled 2U server pull 2,500 watts total. No HBM, no liquid cooling, no advanced packaging.

The founder, Ljubisa Bajic, cofounded Tenstorrent and grew it to unicorn status before starting Taalas. Jim Keller was his first angel investor. This team has shipped silicon before.

Where it gets interesting is the multi-chip math. Taalas simulated DeepSeek R1 671B across 30 custom HC chips: 12,000 tokens per second at 7.6 cents per million tokens. Nobody has run that in production. Simulated multi-chip inference and production multi-chip inference are different engineering problems with very different failure modes.

The real question is market timing. If model improvements keep delivering large generational gains, the two-month fabrication cycle can’t keep up and you’re perpetually running yesterday’s model in silicon. If improvements plateau and enterprises standardize on stable versions for their highest-volume workloads, Taalas wins on pure economics. Medical devices don’t hot-swap models mid-certification. Satellites don’t patch weights in orbit.

Nvidia just priced Groq’s fast-inference approach at $20 billion. A 24-person team in a different corner of the same design space just showed 45x the single-model throughput of a B200. The acquisition math writes itself. Whether the production math does is the $169 million bet.
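The pricing and power figures in the thread above can be sanity-checked with simple arithmetic. This is only a back-of-the-envelope sketch using the numbers as quoted in the tweet; nothing here is measured or official vendor data.

```python
# Sanity check of the cost and power figures quoted in the thread.
# All inputs are the tweet's own claims, not verified vendor numbers.

taalas_hc1_cents_per_mtok = 0.75   # HC1 price, cents per million tokens (claimed)
cerebras_cents_per_mtok = 10.0     # Cerebras price, cents per million tokens (claimed)

ratio = cerebras_cents_per_mtok / taalas_hc1_cents_per_mtok
print(f"HC1 is ~{ratio:.1f}x cheaper per token")  # ~13.3x, consistent with the "13x" claim

# Ten HC1 cards in an air-cooled 2U server at 2,500 W total:
watts_per_card = 2500 / 10
print(f"{watts_per_card:.0f} W per HC1 card")     # 250 W per card
```

The quoted "13x cheaper" is the rounded ratio of the two per-token prices, and the 2,500 W figure works out to a modest 250 W per card, which is why no liquid cooling is needed.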
Abylay Ospan@AbylayOspan·
Yes, that’s why I’m cautious as well. In the SEC filing, the address is listed as the “principal executive office address,” which usually refers to the company’s main physical location, often considered its “nerve center” or headquarters. That is where the CEO and other senior executives oversee daily operations and make key strategic decisions. So if that filing is accurate, it would suggest more than just a retail presence.
Abylay Ospan@AbylayOspan·
Where to build the next Silicon Valley in Miami? 🌴 Definitely not Downtown Miami, Brickell, Wynwood, or Miami Beach - there’s simply no space to build massive campuses like a Googleplex or Meta’s Menlo Park.

As a former Magic Leap engineer, my vote is Plantation, part of the broader Miami metro area. Magic Leap is the only company in South FL with true engineering depth, and it may serve as the primary nucleation site for the entire Miami Silicon Valley. While FAANG has small offices dispersed across the city, they’re mostly non-technical roles - Plantation is where the builders are 🧠💻

Why is this important? I’ve been here in Miami since 2012 and I’ve seen so many talented engineers move here, spend time kitesurfing, sailing, cycling, or playing tennis, then get bored and move back to California. Or they stay, but become a "Miami enjoyer" with no time or motivation to grind. This is why I call it a trap for tech entrepreneurs - to avoid that churn, talent needs to be together - apes stronger together lol 🦍

P.S. What do we even call it? AI Palms? 🌴🧠

P.P.S. I wrote this post because I’m watching everyone here on X (@chamath @shaig @BillAckman @jonoringer @patrickc @eladgil) force Miami as the Silicon Valley alternative for a lot of reasons - not least of which is the debated tax on unrealized gains being proposed by @RoKhanna.
Abylay Ospan@AbylayOspan·
Spent the day as invited speakers at my daughter’s school @IprepNorth here in Miami. We shared our journey in the @GoogleDeepMind Gemma 3n AI Impact Challenge, where we won a prize. And of course, I brought a GPU-powered PC with me and demonstrated the journey of the tokens 🙂

I showed the class the opening from the @NVIDIAAI GTC March 2025 keynote, and the room went completely silent, watching with mouths open. In my view, it is one of the best videos to explain where AI is today! Then came a flood of questions. I truly enjoyed the conversation with the students. Ad Astra 🚀
mattytay@mattytay·
How will Miami attract and retain the best founders in the world?
Waymo@Waymo·
Miami, your Waymo ride is ready. ☀️ Starting today, we're beginning to welcome the first public riders into our fully autonomous ride-hailing service. Read more: waymo.com/blog/2026/01/m…
reed@reed·
welcome to miami 🌴 @waymo service starts today