Joe Golden

633 posts

@JoeGoJoeGo

Current: https://t.co/C4lxGFEvou: Listen to your documents. Prev: https://t.co/swqBbwZmHL Co-founder/CEO. UMich PhD econ. Google / Microsoft / Upwork.

Seattle, WA · Joined February 2017
2.4K Following · 507 Followers
Joe Golden@JoeGoJoeGo·
@jt_kerwin Haha they must have trained on Futurama scripts. Lucy Liu: "I'll never forget you, Fry. MEMORY DELETED."
English
0
0
1
32
Jason Kerwin@jt_kerwin·
Amazing things are happening with the WhatsApp AI
[Image attached]
English
2
0
5
388
Joe Golden@JoeGoJoeGo·
@aviel I absolutely love this idea and would love to be involved.
English
0
0
2
61
aviel@aviel·
Sick of AI doomerism but also keenly aware of reality? Us too, so we decided to do something about it.

AI has forever changed software. It’s easier than ever to ship, but much harder to know what actually needs fixing. And even worse, there is a growing disconnect between folks in non-tech industries dealing with painful, daily headaches that AI can solve now and builders who can create solutions, but don’t know which problems actually matter.

That’s the gap we’re closing with Foundations NEXT. It’s a simple pitch: bring the real problem. No deck, no startup BS, just the operational pain you live with. We’ll share it with AI-native builders at Foundations who can poke at it, refine it, and maybe build something useful. Best of all, they’ll build it with you. For free. And if it becomes something bigger, we’ll partner with you.

To operators who keep muttering “this shouldn’t be this hard” take a look and drop a problem if it fits @ fndtns.org/next
[Image attached]
English
3
8
40
1.3K
Joe Golden@JoeGoJoeGo·
@FederalAsh14 @johnjhorton @MeganTStevenson Qianfan OCR looks interesting. Here's what's different about it from other OCR models: it is a bigger model, but it can also handle document understanding and key information extraction. It takes a prompt and uses thinking tokens. I listened to the paper with Paper2Audio!
English
1
0
0
94
Joe Golden@JoeGoJoeGo·
@FederalAsh14 @johnjhorton @MeganTStevenson I'm starting to look into Qianfan OCR as well. It is 4B parameters, but claims to go beyond the other recent "OCR" models to handle document understanding tasks (like chart summaries). It looks interesting and I'm going to read its paper this week too.
English
1
0
1
78
Megan Stevenson@MeganTStevenson·
TL;DR: Use Marker to convert PDFs if you care about getting figures/tables/equations correct. Use ODL if you mostly care about text and have a large number of files you want converted quickly.
⑆Luke Stein⑈@lukestein

@MeganTStevenson Will be curious to see your and others' experiences. I just tried it (using `--hybrid docling-fast --hybrid-mode full` and `--enrich-formula`), and though super fast, it was not good for the stuff other than text. Attached is how Gemini rated the conversions I got from the same paper.

English
2
23
282
26.6K
⑆Luke Stein⑈@lukestein·
Made for me but hopefully useful for you: a skill for extracting PDF slides from a video (either local or YouTube etc.) of a presentation: github.com/lukestein/skil… It also makes a cleaned transcript. For example, feed it the URL of an @nberpubs YouTube talk. Codex or Claude Code.
[Two images attached]
English
4
13
58
3.9K
Joe Golden@JoeGoJoeGo·
@MeganTStevenson @Ljt019117161 @johnjhorton The problem is that earlier tools had much lower accuracy when narrating a complex doc like a research paper PDF. Paper2Audio accurately narrates your docs using high-quality voices, building upon recent AI progress in multiple areas. Happy to discuss or answer more questions.
English
0
0
1
59
Joe Golden@JoeGoJoeGo·
@FederalAsh14 @johnjhorton @MeganTStevenson Ah, our use case is different. We're extracting entire docs. I saw in the GLM paper that it supports key information extraction, but the section on that is very short, which makes me suspect that it may not have high accuracy yet.
English
1
0
1
32
Joe Golden@JoeGoJoeGo·
@Ljt019117161 @MeganTStevenson @johnjhorton GLM seems similar in quality and speed to Paddle 1.5, so that's good. Their API was too slow for us and had limits on max pages and file size, and self-hosting it was painful. We're going to continue to evaluate it.
English
2
0
0
42
Joe Golden retweeted
Math Files@Math_files·
log (😅) = 💧log 😄
English
274
15.8K
68.8K
2.4M
Joe Golden@JoeGoJoeGo·
@MeganTStevenson @johnjhorton They are all very good at extracting the contents of a PDF quickly and locally, since they are small models. They can be hard to install and get running, and each has its own quirks (e.g. issues in docs they don't handle well, types of structured output they don't support).
English
1
0
3
59
Joe Golden@JoeGoJoeGo·
@johnjhorton @MeganTStevenson Have you tried PaddleOCR VL 1.5, MinerU 2.5, or GLM-OCR? These are all new ~1B parameter open-weight OCR models that have the same output style as Marker and have papers on arXiv. We work extensively with these types of models at Paper2Audio. Happy to discuss.
English
4
0
6
254