Joe Golden

633 posts

@JoeGoJoeGo

Current: https://t.co/C4lxGFEvou: Listen to your documents. Prev: https://t.co/swqBbwZmHL Co-founder/CEO. UMich PhD econ. Google / Microsoft / Upwork.

Seattle, WA Joined February 2017

2.4K Following 507 Followers

Joe Golden@JoeGoJoeGo·4d

@jt_kerwin Haha they must have trained on Futurama scripts. Lucy Liu: "I'll never forget you, Fry. MEMORY DELETED."

Jason Kerwin@jt_kerwin·4d

Amazing things are happening with the WhatsApp AI

Joe Golden@JoeGoJoeGo·4d

@aviel I absolutely love this idea and would love to be involved.

aviel@aviel·5d

Sick of AI doomerism but also keenly aware of reality? Us too, so we decided to do something about it. AI has forever changed software. It’s easier than ever to ship, but much harder to know what actually needs fixing. And even worse, there is a growing disconnect between folks in non-tech industries dealing with painful, daily headaches that AI can solve now and builders who can create solutions, but don’t know which problems actually matter. That’s the gap we’re closing with Foundations NEXT. It’s a simple pitch: bring the real problem. No deck, no startup BS, just the operational pain you live with. We’ll share it with AI-native builders at Foundations who can poke at it, refine it, and maybe build something useful. Best of all, they’ll build it with you. For free. And if it becomes something bigger, we’ll partner with you. To operators who keep muttering “this shouldn’t be this hard” take a look and drop a problem if it fits @ fndtns.org/next

Joe Golden@JoeGoJoeGo·5d

@FederalAsh14 @johnjhorton @MeganTStevenson Qianfan OCR looks interesting. Here's what's different about it from other OCR models. It is a bigger model, but it can also handle document understanding and key information extraction. It takes a prompt and uses thinking tokens. I listened to the paper with Paper2Audio!

Joe Golden@JoeGoJoeGo·5d

@FederalAsh14 @johnjhorton @MeganTStevenson I'm starting to look into Qianfan OCR as well. It is 4B parameters, but claims to go beyond the other recent "OCR" models to handle document understanding tasks (like chart summaries). It looks interesting and I'm going to read its paper this week too.

Megan Stevenson@MeganTStevenson·6d

Tl/dr — use Marker to convert PDFs if you care about getting figures/tables/equations correct. Use ODL if you mostly care about text and have a large number of files you want converted quickly.

⑆Luke Stein⑈@lukestein

@MeganTStevenson Will be curious to see your and others' experiences. I just tried it (using `--hybrid docling-fast --hybrid-mode full` and `--enrich-formula`), and though super fast, it was not good for the stuff other than text. Attached is how Gemini rated the conversions I got from the same paper.

Joe Golden@JoeGoJoeGo·5d

@lukestein @boyuan_chen @nberpubs @NotebookLM @Paper2Audio Thanks for continuing to share Paper2Audio! Please share feature requests or other feedback anytime.

⑆Luke Stein⑈@lukestein·5d

@boyuan_chen @nberpubs There’s so much that we invest in producing knowledge artifacts that don’t get used maximally/optimally. Part of why I like @NotebookLM, @Paper2Audio, even simple stuff like this:

⑆Luke Stein⑈@lukestein

We had a fantastic conference discussion today (from Felipe Severino), so I had a chance to use a favorite LLM prompt to get an “improved” transcription to send to my coauthors. May be useful; copyable version here: gist.githubusercontent.com/lukestein/6a4d…

⑆Luke Stein⑈@lukestein·5d

Made for me but hopefully useful for you: A skill for extracting PDF slides from a video (either local or YouTube etc.) of a presentation: github.com/lukestein/skil… Also makes a cleaned transcript. For example, feed it a URL of an @nberpubs YouTube talk. Codex or Claude Code

Joe Golden reposted

⑆Luke Stein⑈@lukestein·14 Mar

@JoeGoJoeGo @ben_golub @BarbaraBiasi Late reply, but I too learned about @Paper2Audio from @ben_golub, and it's so good. Thank you, Joe!!

Joe Golden@JoeGoJoeGo·5d

@MeganTStevenson @Ljt019117161 @johnjhorton The problem is that earlier tools had much lower accuracy narrating a complex doc like a research paper PDF. Paper2Audio accurately narrates your docs using high quality voices, building upon recent AI progress in multiple areas. Happy to discuss or answer more questions.

Megan Stevenson@MeganTStevenson·5d

@JoeGoJoeGo @Ljt019117161 @johnjhorton @JoeGoJoeGo I was just looking at your paper2audio tool. Can you give me the pitch on this? What problem were you trying to solve as compared to the pre-existing tools?

Joe Golden@JoeGoJoeGo·5d

@Ljt019117161 @MeganTStevenson @johnjhorton What is your use case?

Joe Golden@JoeGoJoeGo·5d

@Ljt019117161 @MeganTStevenson @johnjhorton Thanks for letting me know. We may have jumped the gun by trying to run it shortly after it was officially released, but before vLLM supported it.

Joe Golden@JoeGoJoeGo·5d

@Ljt019117161 @MeganTStevenson @johnjhorton We evaluate these models for Paper2Audio.com, our TTS app for accurate document narration. We use multiple cloud GPU hosts. GLM is very fast, but in our case it was not notably faster than the other comparison models, contrary to what they claimed.

Lucien@Ljt019117161·6d

@JoeGoJoeGo @MeganTStevenson @johnjhorton I’ve pretty much only used docling and GLM ocr but it’s very very fast for me on vllm with mtp, what are you running it with/on?

Joe Golden@JoeGoJoeGo·5d

@FederalAsh14 @johnjhorton @MeganTStevenson Ah- our use case is different. We're extracting entire docs. I saw in the GLM paper that it supports key information extraction, but the section on that is very short which makes me suspect that it may not have high accuracy yet.

Anand Singh@FederalAsh14·6d

@JoeGoJoeGo @johnjhorton @MeganTStevenson I wanted to extract three statements from annual reports. I found Marker (with the LLM Qwen3.5 9B) to be working, but again it breaks for about 30% of docs.

Joe Golden@JoeGoJoeGo·6d

@Ljt019117161 @MeganTStevenson @johnjhorton We couldn't replicate their substantial speed claims from their paper: arxiv.org/pdf/2603.10910. Which models have you used? Have any insights to share about them?

Joe Golden@JoeGoJoeGo·6d

@Ljt019117161 @MeganTStevenson @johnjhorton GLM seems similar in quality and speed to Paddle 1.5, so that's good. Their API was too slow for us with limited max pages and file size, and self-hosting it was painful. We're going to continue to evaluate it.

Joe Golden@JoeGoJoeGo·6d

@Ljt019117161 @MeganTStevenson @johnjhorton We haven't tried dots as much. It is 3B parameters, so likely would be ~3X slower and more expensive, which is unappealing. They have an intriguing brand new paper out that I'm going to read this week: arxiv.org/pdf/2603.13032…

Joe Golden@JoeGoJoeGo·6d

@FederalAsh14 @johnjhorton @MeganTStevenson Interesting. Were you trying to extract specific info from these docs, or the entire doc? Did you find something that does work?

Anand Singh@FederalAsh14·6d

@JoeGoJoeGo @johnjhorton @MeganTStevenson Have tried all; none of them works reliably to get financial statements from annual reports.

Joe Golden reposted

Math Files@Math_files·17 Mar

log (😅) = 💧log 😄

Joe Golden@JoeGoJoeGo·6d

@MeganTStevenson @johnjhorton They are all very good at extracting the contents of a PDF quickly and locally, since they are small models. They can be hard to install and get running, and each has its own quirks (e.g. issues in docs they don't handle well, types of structured output they don't support).

Megan Stevenson@MeganTStevenson·6d

@JoeGoJoeGo @johnjhorton I haven't but thanks for the heads up. Can you tell me the pros and cons of these?

Joe Golden@JoeGoJoeGo·6d

@lukestein @johnjhorton @MeganTStevenson Thanks!!

⑆Luke Stein⑈@lukestein·6d

@JoeGoJoeGo @johnjhorton @MeganTStevenson All I know is however y’all do it, it works great

Joe Golden@JoeGoJoeGo·6d

@johnjhorton @MeganTStevenson @lukestein I meant to tag you too.

Joe Golden@JoeGoJoeGo·6d

@johnjhorton @MeganTStevenson Have you tried PaddleOCR VL 1.5, MinerU 2.5 or GLM-OCR? These are all new ~1B parameter open-weight OCR models that have the same output style as Marker and have papers on Arxiv. We work extensively with these types of models at Paper2Audio. Happy to discuss.
