Gobi Dasu

419 posts


@gobidasu

Future of Work · Human-AI · Startups · Travel · Stanford BSCS/MSCS · Northwestern PhD CSEd · https://t.co/T4Mcg2vjOJ · https://t.co/CIzxtsse2B / aka गोविंद

San Francisco, CA · Joined June 2011
458 Following · 761 Followers
Gobi Dasu (@gobidasu)
@elonmusk @karpathy Do you have any intuition on complexity or entropy here? Wouldn't shifting from a token space to a pixel or photon space increase the input and output spaces significantly?
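The reply asks whether moving from a token space to a pixel space enlarges the input space. A rough back-of-the-envelope comparison, under sizes that are not taken from the thread (a hypothetical 100,000-entry vocabulary and a raw 16×16 RGB patch at 8 bits per channel), counts the bits needed to identify one text token versus one uncompressed patch. This bounds raw capacity only; it says nothing about how much information a trained vision encoder actually keeps after compressing a patch into an embedding.

```python
import math


def bits_per_text_token(vocab_size: int) -> float:
    """Bits needed to identify one token drawn from a vocabulary of this size."""
    return math.log2(vocab_size)


def bits_per_raw_patch(height: int, width: int, channels: int, bits_per_channel: int = 8) -> int:
    """Raw bits in one uncompressed image patch, treating every pixel value as independent."""
    return height * width * channels * bits_per_channel


# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 100_000
PATCH = (16, 16, 3)

token_bits = bits_per_text_token(VOCAB_SIZE)
patch_bits = bits_per_raw_patch(*PATCH)

print(f"one text token : {token_bits:.2f} bits")
print(f"one raw patch  : {patch_bits} bits")
print(f"ratio          : {patch_bits / token_bits:.0f}x")
```

Under these assumptions a token carries about 16.6 bits while a raw patch spans 6,144 bits, a gap of several hundred times. That gap is the raw input space the question worries about; whether it matters depends on how aggressively the encoder compresses it.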
Andrej Karpathy (@karpathy)
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention, which is a lot more powerful
- delete the tokenizer (at the input)!!

I already ranted about how much I dislike the tokenizer. Tokenizers are an ugly, separate, not end-to-end stage. The tokenizer "imports" all the ugliness of Unicode and byte encodings, inherits a lot of historical baggage, and adds security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye appear as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, with all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.

So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side quest an image-input-only version of nanochat...
Quoting vLLM (@vllm_project):

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

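The complaint about identical-looking characters and continuation bytes can be seen directly at the Unicode/UTF-8 layer that sits beneath a byte-level tokenizer. This sketch uses only Python's standard library; the `describe` helper is hypothetical, and it stops at code points and bytes, so it does not model how any particular BPE tokenizer would then merge those bytes into tokens.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Show the code point, Unicode name, and UTF-8 bytes of a single character."""
    raw = ch.encode("utf-8")
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "utf8": raw.hex(" "),
        # In UTF-8, every byte of the form 0b10xxxxxx is a continuation byte.
        "continuation_bytes": sum(1 for b in raw if (b & 0xC0) == 0x80),
    }


latin_a = "A"            # LATIN CAPITAL LETTER A
cyrillic_a = "\u0410"    # CYRILLIC CAPITAL LETTER A, renders the same in most fonts
smiley = "\U0001F60A"    # SMILING FACE WITH SMILING EYES

print(latin_a == cyrillic_a)  # False: different strings despite looking identical
for ch in (latin_a, cyrillic_a, smiley):
    print(describe(ch))
```

The two capital A's share no bytes (`41` versus `d0 90`), and the emoji becomes four bytes, three of them continuation bytes, before any model sees it. A rendered image of the same characters would look alike, which is the contrast the post draws.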
Gobi Dasu (@gobidasu)
Specific insights for bootstrapped founders:
1. Bootstrapped founders trying to scale without burning out need to build human-AI systems, since keeping dozens of people on payroll isn't an option.
2. The 'vibe coding' → 'scalable ops' transition is exactly what we're all wrestling with. Establishing a 'system of record' early can avert doing everything through Slack DMs and founder heroics.
3. The trust-but-verify + budget caps approach could be a game-changer for teams that can't afford expensive ops tools yet.

People say you need to hit PMF first, but isn't finding PMF like any extensive heuristic search? Why wouldn't having a more robust operating model help you reach PMF faster?

What's your biggest bottleneck when trying to professionalize your startup?
Quoting Gobi Dasu (@gobidasu):

Some founders scale humans. Others don't. Here’s the difference we gleaned from ops leaders at Alchemy, Meter, Sonder, and FAANGM.

Gobi Dasu (@gobidasu)
@anishackd Plug: Building Hailcube, human‑AI ops that make all of this automatic. Calendly on my profile.
Gobi Dasu (@gobidasu)
Some founders scale humans. Others don't. Here’s the difference we gleaned from ops leaders at Alchemy, Meter, Sonder, and FAANGM.
Gobi Dasu (@gobidasu)
@anishackd What's blocking you from scaling vibe-coded visions into scalable human-AI systems?
Gobi Dasu (@gobidasu)
@anishackd A DAO won't ship a vision; setters and accountable middle managers who update the system of record and align teams will.
Gobi Dasu (@gobidasu)
@anishackd Tribal knowledge is continuity: retain it or document it in the system of record so it's no longer "tribal".
Gobi Dasu (@gobidasu)
@anishackd Hiring for attention to detail means hiring credible people who will actually OWN keeping the system of record up to date.
Gobi Dasu (@gobidasu)
@anishackd Onboarding for alignment; traceability for visibility. All orchestrated through the system of record.
Gobi Dasu (@gobidasu)
@anishackd System of record (ATS→CRM→handbooks) > Slack archaeology.
Gobi Dasu (@gobidasu)
@anishackd Allow tool freedom; standardize roles, flows, and the system of record, but not apps.
Gobi Dasu (@gobidasu)
@anishackd Build vs. buy vs. acquire = finance math. VP-level folks should own these decisions. Decision records > vibes.
Gobi Dasu (@gobidasu)
@anishackd Publish the vision—but silently test edge‑case coverage on delivery.
Gobi Dasu (@gobidasu)
@anishackd Trust‑but‑verify means a culture of bottom‑up suggestions from ICs, but go authoritative during crunch.
Gobi Dasu (@gobidasu)
Thanks John, and congrats on building Kolega across many tiers/use cases. Specifically, we have built a self-running AI Kanban board connected to a vetted tech talent network. In this context, traceability refers to the human actions. In a nutshell, the system can take in a high-level vision, assign subtasks with suggested “prompts to use” to vetted human ICs, and coordinate peer reviews among those ICs. The ICs themselves use AI tools. Every AI-coordinated task assignment, reassignment, extension request, and peer review is logged, linked, and timestamped in the Kanban board interface. This lets the visionary oversee the instrumented actions of many fractional human ICs of varying org affinity before even having the money to keep them on full-time payroll.
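The reply says every AI-coordinated assignment, reassignment, extension request, and peer review is logged, linked, and timestamped. One way to get those three properties is an append-only event log in which each event can point at the event it follows up on. This is a minimal sketch under that assumption; `AuditLog`, `TaskEvent`, `EventKind`, and the example task and IC names are hypothetical, not Hailcube's actual implementation or schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional


class EventKind(Enum):
    ASSIGNMENT = "assignment"
    REASSIGNMENT = "reassignment"
    EXTENSION_REQUEST = "extension_request"
    PEER_REVIEW = "peer_review"


@dataclass(frozen=True)
class TaskEvent:
    event_id: int
    task_id: str
    kind: EventKind
    actor: str                 # the human IC the event concerns (hypothetical field)
    parent_id: Optional[int]   # "linked": the earlier event this one follows up on
    timestamp: datetime        # "timestamped": UTC time of recording


class AuditLog:
    """Append-only log: events are recorded and queried but never edited or deleted."""

    def __init__(self) -> None:
        self._events: List[TaskEvent] = []

    def record(self, task_id: str, kind: EventKind, actor: str,
               parent_id: Optional[int] = None) -> TaskEvent:
        # A parent must already exist, so links always point backwards (no cycles).
        if parent_id is not None and not 0 <= parent_id < len(self._events):
            raise ValueError(f"unknown parent event {parent_id}")
        event = TaskEvent(
            event_id=len(self._events),
            task_id=task_id,
            kind=kind,
            actor=actor,
            parent_id=parent_id,
            timestamp=datetime.now(timezone.utc),
        )
        self._events.append(event)
        return event

    def history(self, task_id: str) -> List[TaskEvent]:
        """All events for one task, in the order they were recorded."""
        return [e for e in self._events if e.task_id == task_id]

    def chain(self, event_id: int) -> List[TaskEvent]:
        """Follow parent links from an event back to its root, oldest first."""
        out: List[TaskEvent] = []
        current: Optional[int] = event_id
        while current is not None:
            event = self._events[current]
            out.append(event)
            current = event.parent_id
        return list(reversed(out))


log = AuditLog()
first = log.record("T-1", EventKind.ASSIGNMENT, "ic_alice")
ext = log.record("T-1", EventKind.EXTENSION_REQUEST, "ic_alice", parent_id=first.event_id)
moved = log.record("T-1", EventKind.REASSIGNMENT, "ic_bob", parent_id=ext.event_id)
review = log.record("T-1", EventKind.PEER_REVIEW, "ic_carol", parent_id=moved.event_id)

for e in log.chain(review.event_id):
    print(e.timestamp.isoformat(), e.kind.value, e.actor)
```

Deriving `event_id` from list position keeps the sketch short; a production system of record would need durable storage and stable identifiers, but the append-only shape is what makes the trail trustworthy for oversight.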
John Pellew (@JohnWPellew)
@gobidasu Love that you spent the week building a human-AI system to fix "vibe coding" with a traceable, reliable HITL workflow. Quick question, how are you surfacing that traceability to engineers today so they can trust and act on suggestions? John Pellew, CTO @ Kolega
Gobi Dasu (@gobidasu)
Summer wrap w/ #SFTechWeek: Malcolm Gladwell × Jay Gambetta, SPC, a16z + IBM meetups, ERA NYC, Stanford/Harker alumni events, hackathons, and a drone show. Spent most of our time heads-down on a human-AI system that fixes vibe coding with a traceable, reliable HITL workflow. DM "aipm" to join the waitlist.
Ananya Chadha (@ananyachdh)
It's official — we are excited to launch Quander.ai in the world. GENERATE A 1-MINUTE MOVIE FROM A MINOR PROMPT. Our AI agent assembles the entire video for you, in your video timeline, and you can manually make changes, if you’d like. It's incredible to see our first users have made videos with their friends as main characters, product ads, faith stories, music videos, fanfictions, book trailers, and more. 🧵 Try it today @quanderAI. If you’ve made it here, we have free credits for you 👇
Gobi Dasu (@gobidasu)
🚀 Important Update: Dear customers and talent, due to high demand, we're actively automating our processes to serve you faster and more efficiently! Thank you for your patience—we're excited to connect with you soon. Stay tuned! 🔥