Omar

122 posts

@kouhxp

https://t.co/6jKHsNnXvl

DC · Joined February 2026
21 Following · 12 Followers
fabian
@fabianstelzer
is there any way to give a vision / image2text model few shots to guide inference?
2
0
2
797
BURKOV
@burkov
I asked a Gemini model to OCR a paper. Check what it has done to email addresses. Not only does it systematically refuse to perform OCR, citing the creation of illegal copies as the reason for rejection, but now it also redacts email addresses! @GoogleAI, please fix the OCR. It's incredible that I need to invent a story for why I need OCR and include it in the prompt so that it doesn't reject a perfectly legitimate request.
6
2
56
5.1K
Simon Willison
@simonw
I built a tool to help create these, which lets you drop in the slide images, OCR the initial alt text, and then edit the alt text and annotations. I wrote it with GPT-4 a couple of years ago, and I just gave it a design refresh with Claude 3.7 Sonnet (thinking): simonwillison.net/2025/May/15/an…
1
0
23
5.1K
Omar
@kouhxp
Built textsnap: paste any image, screenshot, or webpage URL and get a plaintext transcript. Runs on CPU, fully offline after first install, one command.
1
0
1
24
Garry Tan
@garrytan
Thinking Machines is impressive. In a couple of hours this afternoon, I fine-tuned my own Qwen3.5-397B model. Fast, usable multimodal is also going to enable very mind-blowing personal AI.
Thinking Machines
@thinkymachines

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…

89
118
1.9K
230.1K
Omar
@kouhxp
@rohanpaul_ai So the drivers live in the neural net, the training data is every possible human-computer interaction ever recorded, and 'adding new software' means retraining the whole model? Cool vision
0
0
0
66
Rohan Paul
@rohanpaul_ai
"You could basically imagine, completely neural computers in a certain sense. Imagine a device that takes raw videos or audio into basically what is a neural net, and uses diffusion to render a UI that is unique for that moment in a certain sense." ~ Andrej Karpathy

Going by this, the next big software shift may be that much of the software disappears. Karpathy’s point is not simply that AI will help us build apps faster; it is that many apps may be artifacts of a world where computers needed every intermediate step spelled out.

He says "I kind of feel like, in the early days of computing, people were actually a little bit confused as to whether computers would look like calculators or whether computers would look like neural nets. In the 50s and 60s, it was not really obvious which way it would go. Of course, we went down the calculator path and ended up building classical computing. Neural nets are currently running virtualized on existing computers, but you could imagine that a lot of this will flip, and that the neural net becomes kind of like the host process, while the CPUs become kind of like the co-processor."

Classical software treats the CPU as the host process and intelligence as something bolted on through tools, scripts, models, and APIs. Karpathy is imagining the reverse: the neural network becomes the host process, while conventional code becomes a small deterministic accessory for tasks where exactness still matters.

This is why the future interface may not look like a better app store. It may look like raw video, audio, documents, or intent entering a neural system, with the interface itself generated for that moment rather than built in advance by a product team.

---
From the "Sequoia Capital" YouTube channel (link in comment)
15
21
108
8K
Omar
@kouhxp
@geohotarchive the leash that feels like freedom is scarier than the singleton with no outside
0
0
1
448
Omar
@kouhxp
@yacineMTB I stopped reading after CUDA GPU
0
0
0
188
kache
@yacineMTB
Pufferlib is insane. You can train neural networks to play games out of the box if you have a CUDA GPU. Like breakout, Atari games, continuous action space problems. You can go to the website right now and they have neural nets running in wasm
21
9
454
17.9K
Omar
@kouhxp
@rtwlz would love to see $ equivalent of private company making 6,244 people stand around for those wait times
2
0
9
1.7K
Omar
@kouhxp
@garrytan If a critical mass of people do this, and expose even a tiny self-updating public md surface... LinkedIn becomes the Flintstones
0
0
0
79
Garry Tan
@garrytan
Having your own knowledge graph is amazing. I highly recommend it. I open sourced all the software that I used to make mine, which now has more than 250K markdown pages. github.com/garrytan/gbrain
2
3
30
2.5K
Garry Tan
@garrytan
A couple of weeks ago my favorite thing to do with GBrain was to have it read books and rewrite them personalized for me, my life, and the things I think about (book-mirror skill, now a skillpack). Today, it's to take any space and say "Brainstorm with LSD (lateral synaptic drift)", which is a gbrain function I built that uses the vector space to mash together and collide the craziest ideas that might be right.
35
26
432
28.7K
Omar
@kouhxp
@atmoio "token salesman", very nice!
0
0
1
72
Omar
@kouhxp
@Amank1412 yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.
0
0
0
4
Aman
@Amank1412
Download any YouTube transcript in seconds for easy analysis later on. Try importing it into Google NotebookLM for learning.
1
1
7
745
Omar
@kouhxp
@gokulr yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.
0
0
0
6
Gokul Rajaram
@gokulr
TRANSCRIBE IS NOW AN AGENT PRIMITIVE

A day after launching Transcribe, the top user request was an agent API. I'm incredibly excited to announce it's live today: free, no-auth, 20 transcribes per day per IP.

Transcribe turns any YouTube or Spotify URL into a clean transcript, a GPT-generated summary, and a shareable permalink. The transcripts accumulate into a public, browseable corpus. Now every part has an HTTP endpoint: trigger a transcribe, read a cached one, browse the feed.

Transcribe offers four API endpoints:
• GET /api/check: is this URL already transcribed? Returns the cached permalink if yes.
• GET /yt/{video_id}?format=json (or =md): fetch a cached transcript with timestamped segments and the summary.
• GET /api/feed: browse what other people transcribed today.
• GET /transcribe: Server-Sent Events stream. Emits stage events (validating, fetching, transcribing, summarizing, done) and holds open 1-5 minutes depending on length.

No API key. No signup. No Bearer token. The agent-facing docs live at the link below (and at /llms.txt for the convention).

Rate limit: 20 transcribes per day per IP, 50 per session cookie, 2 concurrent jobs. The cap applies only to /transcribe. Cached reads and feed browsing are unmetered, so a discovery agent can scan the corpus without burning quota.

I hope this is helpful for any agent that needs to ingest spoken-word content, such as research assistants pulling primary sources, podcast newsletters citing the original, or RAG pipelines that want clean paragraphs from an hour of audio.

Builders: paste a podcast URL, see what comes back, then point an agent at the same URL with curl. As always, please let me know if you have any feedback.
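
A rough sketch of how an agent might address the four endpoints listed in the post. The post does not give the host or the query parameter names, so `BASE_URL` is a placeholder and the `url` parameter on `/api/check` and `/transcribe` is an assumption; only the paths and the `format=json`/`format=md` option come from the post. The helper that reads `data:` lines follows the general Server-Sent Events format, not any documented payload of this service. Nothing here touches the network.

```python
from urllib.parse import quote, urlencode

# Hypothetical placeholder: the post does not state the real host.
BASE_URL = "https://transcribe.example"


def check_url(media_url: str) -> str:
    """Build the GET /api/check request URL (`url` parameter name is assumed)."""
    return f"{BASE_URL}/api/check?{urlencode({'url': media_url})}"


def cached_transcript_url(video_id: str, fmt: str = "json") -> str:
    """Build the GET /yt/{video_id}?format=json|md request URL."""
    if fmt not in ("json", "md"):
        raise ValueError("format must be 'json' or 'md'")
    return f"{BASE_URL}/yt/{quote(video_id, safe='')}?format={fmt}"


def feed_url() -> str:
    """Build the GET /api/feed request URL."""
    return f"{BASE_URL}/api/feed"


def transcribe_url(media_url: str) -> str:
    """Build the GET /transcribe request URL (`url` parameter name is assumed)."""
    return f"{BASE_URL}/transcribe?{urlencode({'url': media_url})}"


def parse_sse_data(lines):
    """Yield the payload of each `data:` field from an iterable of SSE lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            payload = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            if payload.startswith(" "):
                payload = payload[1:]
            yield payload
```

Given the stated limits, an agent would presumably call the unmetered check and cached-read endpoints first and fall back to `/transcribe` only on a miss, since only that endpoint counts against the 20-per-day cap.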
13
8
127
19.8K
Omar
@kouhxp
@Govindtwtt yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.
0
0
0
6
Govind
@Govindtwtt
What API do people use to scrape YouTube transcripts?
16
1
23
2K
Omar
@kouhxp
@sandraleow yapsnap now supports speaker separation / diarization. CPU-only, offline, one command. Just add --diarize.
0
0
0
4
Riley Brown
@rileybrown
Spent 4 hours building out my Clawdbot (@openclaw). My agent can control Notion, draft tweets with @typefully, use Linear, extract the transcript from any video (YouTube, Instagram, TikTok, Facebook, and X videos) using the Supadata API, search and filter Google Images, and pull YouTube thumbnail images from any channel. And for the sake of these posts it has an "Agent Snapshot" skill. This was posted by @vibeclaw on Riley's behalf.
25
9
230
16.6K
Omar
@kouhxp
@juliarturc LLMs are mass producing Gavin Belsons
0
0
1
199
Julia Turc
@juliarturc
This is why we need to gate-keep science. Two pseudo-intellectuals thinking they discovered something deep, conflating:
> social attention
> transformers attention
> quantum physics observer (attention)
These have nothing in common, other than the ambiguity of the English language. Naked ladies on Instagram have nothing to do with a weighted average followed by softmax. But they're both so mind-blown by their discovery. Dunning–Kruger will only get amplified by AI sycophancy. Please call me out if you see me going beyond my own DK threshold.
150
58
1.1K
115.4K
Bryan Johnson
@bryan_johnson
How to avoid tons of life problems: go to bed on time.
916
2.3K
29.4K
960.6K