Omar

122 posts

Omar

@kouhxp

https://t.co/6jKHsNnXvl

DC Joined February 2026

21 Following 12 Followers

Omar@kouhxp·54s

@fabianstelzer FWIW the small OCR-VL models don't really take few-shots. This is a fixed decode pipeline, no prompt surface. For steerable few-shot you basically need a general VLM x.com/kouhxp/status/…

Omar@kouhxp

built textsnap: paste any image, screenshot, or webpage url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

fabian@fabianstelzer·5 Sep

is there any way to give a vision / image2text model few shots to guide inference?

Omar@kouhxp·2m

@burkov @GoogleAI exactly why I built textsnap. Local vision-model OCR on CPU, no cloud, no refusals, no "redacted email" surprises. Image never leaves the machine. x.com/kouhxp/status/…

Omar@kouhxp

built textsnap: paste any image, screenshot, or webpage url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

BURKOV@burkov·1d

I asked a Gemini model to OCR a paper. Check what it has done to email addresses. Not only does it systematically refuse to perform OCR, citing the creation of illegal copies as the reason for rejection, but now it also redacts email addresses! @GoogleAI, please fix the OCR. It's incredible that I need to invent a story for why I need OCR and include it in the prompt so that it doesn't reject a perfectly legitimate request.

Omar@kouhxp·3m

@simonw basically the workflow I built textsnap for. ocr to markdown on CPU, fully offline after first run, so the alt-text draft never leaves your machine x.com/kouhxp/status/…

Omar@kouhxp

built textsnap: paste any image, screenshot, or webpage url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Simon Willison@simonw·15 May

I built a tool to help create these which lets you drop in the slide images, OCR the initial alt text and then edit the alt text and annotations. I wrote it with GPT-4 a couple of years ago, and I just gave it a design refresh with Claude 3.7 Sonnet (thinking) simonwillison.net/2025/May/15/an…

Simon Willison@simonw·15 May

Here's the full workshop handout plus annotated slides from "Building software on top of Large Language Models", a three hour tutorial I presented yesterday at PyCon US #PyConUS simonwillison.net/2025/May/15/bu…

Omar@kouhxp·27m

built textsnap: paste any image, screenshot, or webpage url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Omar@kouhxp·1h

@garrytan Approximated the same four demo behaviors with off-the-shelf parts x.com/kouhxp/status/…

Omar@kouhxp

Thinking Machines trained a 276B model for their Interaction Models demo. I had a CPU laptop and a cent to spend. Here's how close it gets

Garry Tan@garrytan·11h

Thinking Machines is impressive. In a couple hours I just fine tuned my own Qwen3.5-397B model this afternoon. Fast usable multimodal is also going to enable very mind-blowing personal AI.

Thinking Machines@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

Omar@kouhxp·16h

@rohanpaul_ai So the drivers live in the neural net, the training data is every possible human-computer interaction ever recorded, and 'adding new software' means retraining the whole model? Cool vision

Rohan Paul@rohanpaul_ai·19h

"You could basically imagine, completely neural computers in a certain sense. Imagine a device that takes raw videos or audio into basically what is a neural net, and uses diffusion to render a UI that is unique for that moment in a certain sense." ~ Andrej Karpathy
Going by this, the next big software shift may be that much of the software disappears. Karpathy’s point is not simply that AI will help us build apps faster; it is that many apps may be artifacts of a world where computers needed every intermediate step spelled out.
He says "I kind of feel like, in the early days of computing, people were actually a little bit confused as to whether computers would look like calculators or whether computers would look like neural nets. In the 50s and 60s, it was not really obvious which way it would go. Of course, we went down the calculator path and ended up building classical computing. Neural nets are currently running virtualized on existing computers, but you could imagine that a lot of this will flip, and that the neural net becomes kind of like the host process, while the CPUs become kind of like the co-processor."
Classical software treats the CPU as the host process and intelligence as something bolted on through tools, scripts, models, and APIs. Karpathy is imagining the reverse: the neural network becomes the host process, while conventional code becomes a small deterministic accessory for tasks where exactness still matters.
This is why the future interface may not look like a better app store. It may look like raw video, audio, documents, or intent entering a neural system, with the interface itself generated for that moment rather than built in advance by a product team.
---
From "Sequoia Capital" YouTube channel, (link in comment)

Omar@kouhxp·23h

@geohotarchive the leash that feels like freedom is scarier than the singleton with no outside

george hotz archive@geohotarchive·1d

There is only one bad AI scenario geohot.github.io//blog/jekyll/u…

Omar@kouhxp·1d

@yacineMTB I stopped reading after CUDA GPU

kache@yacineMTB·1d

Pufferlib is insane. You can train neural networks to play games out of the box if you have a CUDA GPU. Like breakout, Atari games, continuous action space problems. You can go to the website right now and they have neural nets running in wasm

Omar@kouhxp·1d

@ylecun @jtopentactic @Noahpinion the brain doesn't separate memory from compute, so it pays near-zero to move data

Yann LeCun@ylecun·3d

@jtopentactic @Noahpinion Actually, absolutely *everyone* is talking about it because that's where all the opex money goes.

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion·3d

People are starting to realize that AIs are superintelligent because they combine roughly human-level reasoning with computer-like speed, knowledge, and working memory.

will depue@willdepue

bro it isn’t generally intelligent bro its only read every book and paper ever written and just making connections between them bro. its only thinking for twenty hours bro it’s just brute force thinking bro. its only solving erdos problems bro it could never be an accountant bro

Omar@kouhxp·1d

@rtwlz would love to see $ equivalent of private company making 6,244 people stand around for those wait times

Riley Walz@rtwlz·1d

6,244 people are waiting in line at California DMVs. walzr.com/ca-dmv/

Omar@kouhxp·1d

@garrytan If a critical mass of people do this, and expose even a tiny self-updating public md surface... LinkedIn becomes the Flintstones

Garry Tan@garrytan·1d

Having your own knowledge graph is amazing. I highly recommend it. I open sourced all the software that I used to make mine, which now has more than 250K markdown pages. github.com/garrytan/gbrain

Garry Tan@garrytan·1d

A couple of weeks ago my favorite thing to do with GBrain was to have it read and rewrite books, personalized for me and my life, and the things I think about. (book-mirror skill, now a skillpack) Today, it's to take any space and say "Brainstorm with LSD (lateral synaptic drift)" which is a gbrain function I built that uses the vectorspace to mash together and collide the craziest ideas that might be right

Omar@kouhxp·1d

@atmoio "token salesman", very nice!

Mo@atmoio·1d

Marc Andreessen accidentally told the truth about AI

TFTC@TFTC21

Marc Andreessen on JRE: AI hasn't replaced coders. It turned them into vampires. "The opportunity cost of going to sleep is too high because if you go to sleep, you won't be with your 20 AI coding agents."

Omar@kouhxp·2d

@Amank1412 yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.

Omar@kouhxp·3d

@Amank1412 or use this free opensource tool x.com/kouhxp/status/…

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Aman@Amank1412·29 Apr

Download any YouTube transcript in seconds for easy analysis later on. Try importing it into Google NotebookLM for learning

Omar@kouhxp·2d

@gokulr yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.

Omar@kouhxp·3d

@gokulr Funny timing. yapsnap is the same primitive built the other direction. One command, same URL surface (YouTube, X, TikTok, Reels, direct files), but runs on CPU locally with a ~80MB streaming Zipformer. No keys, no quotas, audio never leaves the machine. x.com/kouhxp/status/…

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Gokul Rajaram@gokulr·6d

TRANSCRIBE IS NOW AN AGENT PRIMITIVE
A day after launching Transcribe, the top user request was an agent API. I'm incredibly excited to announce it's live today: free, no-auth, 20 transcribes per day per IP.
Transcribe turns any YouTube or Spotify URL into a clean transcript, a GPT-generated summary, and a shareable permalink. The transcripts accumulate into a public, browseable corpus. Now every part has an HTTP endpoint: trigger a transcribe, read a cached one, browse the feed.
Transcribe offers four API endpoints:
• GET /api/check: is this URL already transcribed? Returns the cached permalink if yes.
• GET /yt/{video_id}?format=json (or =md): fetch a cached transcript with timestamped segments and the summary.
• GET /api/feed: browse what other people transcribed today.
• GET /transcribe: Server-Sent Events stream. Emits stage events (validating, fetching, transcribing, summarizing, done) and holds open 1-5 minutes depending on length.
No API key. No signup. No Bearer token. The agent-facing docs live at the link below (and at /llms.txt for the convention).
Rate limit: 20 transcribes per day per IP, 50 per session cookie, 2 concurrent jobs. The cap applies only to /transcribe. Cached reads and feed browsing are unmetered, so a discovery agent can scan the corpus without burning quota.
I hope this is helpful for any agent that needs to ingest spoken-word content, such as research assistants pulling primary sources, podcast newsletters citing the original, or RAG pipelines that want clean paragraphs from an hour of audio.
Builders: paste a podcast URL, see what comes back, then point an agent at the same URL with curl.
As always, please let me know if you have any feedback.

Omar@kouhxp·2d

@Govindtwtt yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.

Omar@kouhxp·3d

@Govindtwtt If you want a local option: yapsnap. One command, CPU-only, uses yt-dlp under the hood so it handles YouTube + Shorts + a bunch of other sources. No API key, no quotas, ~80MB model cached after first run. x.com/kouhxp/status/…

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Govind@Govindtwtt·16 May

What API do people use to scrape YouTube transcripts?

Omar@kouhxp·2d

@sandraleow yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.

Omar@kouhxp·3d

@sandraleow If you're hitting an API for the transcribe step and want it local instead, yapsnap is one command, CPU-only, no keys. x.com/kouhxp/status/…

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Sandra@sandraleow·6 Mar

added a new skill for my openclaw
- paste YouTube videos -> auto transcribe
- added another skill for “workflow extractor” to extract workflow and tips from the transcript
output is this running doc of all the actionable workflows you can deploy for yourself

Sachin Rekhi@sachinrekhi

Yesterday 1,500 product managers joined my live webinar on Claude Code. I went deep into:
- Why I believe Claude Code is the most productive AI platform for PMs
- Showed off the 13 skills I've built to automate workflows across product strategy, design, and execution
- Walked through exactly how to get started by installing Claude Code as well as picking the right set of editors, terminals, and voice tools
- Gave away my detailed playbook to build your own skills to automate your own product workflows
If you missed the session, the video is now up on YouTube.
My favorite comment after the session: "I just finished digesting what Sachin shared, and… wow. Awe beat angst. The thing that excites me most about AI is its potential to free humans up for higher-order work — and how that extra capacity and access could help level the playing field (if we do this responsibly). Huge kudos to Sachin for making that end goal feel so approachable." - Mathew Kahansky
youtube.com/watch?v=zsAAaY…

Omar@kouhxp·2d

@rileybrown @openclaw @typefully yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.

Omar@kouhxp·3d

@rileybrown @openclaw @typefully Nice stack. fyi on the transcript piece — yapsnap covers the same set (YT/IG/TikTok/X/FB, anything yt-dlp eats) but runs locally on CPU, no API key, no quota. One less hosted dep if you ever want to swap it out. x.com/kouhxp/status/…

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Riley Brown@rileybrown·13 Feb

Spent 4 hours building out my Clawdbot (@openclaw) My agent can control Notion, draft tweets @typefully, use Linear, extract the transcript from any video (YouTube, Instagram, TikTok, Facebook and X videos) using the Supadata API, search and filter Google Images, and pull YouTube thumbnail images from any channel. And for the sake of these posts it has an "Agent Snapshot" skill. This was posted by @vibeclaw on Riley's Behalf.

Omar@kouhxp·2d

update: yapsnap now does speaker diarization too
SPEAKER_00 [00:03]: bla bla
SPEAKER_01 [00:11]: bla bla
still CPU-only, still offline, still one command. just add --diarize

Omar@kouhxp

built yapsnap: paste any YouTube, TikTok, X, IG video url, get a plaintext transcript
runs on CPU, fully offline after first install, one command

Omar@kouhxp·2d

@juliarturc llms are mass producing Gavin Belsons

Julia Turc@juliarturc·2d

This is why we need to gate-keep science. Two pseudo-intellectuals thinking they discovered something deep, conflating
>social attention
>transformers attention
>quantum physics observer (attention)
These have nothing in common, other than the ambiguity of English language. Naked ladies on Instagram have nothing to do with a weighted average followed by softmax. But they're both so mind-blown by their discovery.
Dunning–Kruger will only get amplified by AI sycophancy. Please call me out if you see me going beyond my own DK threshold.