George Ho

1.4K posts

George Ho
@_eigenfoo

Natural language processing, Bayesian modeling, open source, crosswords, donuts and coffee. Currently ML at @flatironhealth (he/him/his)

NYC · Joined May 2017
703 Following · 1.1K Followers
George Ho @_eigenfoo ·
So, do I know anybody attending #KDD2024 @kdd_news this year? I'll be there next week!
0 replies · 0 reposts · 1 like · 286 views
George Ho reposted
Pablo Montalvo @m_olbap ·
It was hard to find quality OCR data... until today! Super excited to announce the release of the 2 largest public OCR datasets ever 📜 📜 OCR is critical for document AI: here, 26M+ pages, 18b text tokens, 6TB! Thanks to @ucsf_library, @industrydocs and @PDFAssociation 🧶 ↓
[image attached]
7 replies · 101 reposts · 601 likes · 93.5K views
George Ho @_eigenfoo ·
shot / chaser
[two images attached]
0 replies · 0 reposts · 2 likes · 300 views
George Ho reposted
Dr Kareem Carr @kareem_carr ·
The perfect peer-reviewed article title does not exi-
[image attached]
36 replies · 241 reposts · 1.8K likes · 184.1K views
George Ho @_eigenfoo ·
Also from the group chat today Wordle 934 3/6* ⬛⬛🟨🟩⬛ 🟩🟩⬛🟩🟩 🟩🟩🟩🟩🟩
[image attached]
0 replies · 0 reposts · 0 likes · 164 views
George Ho @_eigenfoo ·
My NYT word game group chat has just come up with a new idea: play Wordle, get your score, and then prompt an image generation AI to draw a picture of what you see in your score. I'll go first. Wordle 934 6/6 ⬜🟦⬜⬜⬜ ⬜🟦⬜⬜🟦 🟦🟦🟦🟦⬜ 🟧🟧⬜🟦🟦 🟧🟧⬜🟧🟧 🟧🟧🟧🟧🟧
[image attached]
1 reply · 1 repost · 3 likes · 566 views
George Ho reposted
Armineh @arminehnouri ·
Very excited to introduce DocLLM, a multimodal LLM developed by my colleagues at @jpmorgan. DocLLM-7B outperforms other SotA LLMs on 12/16 benchmarks across four core Document AI tasks! Incredibly proud of the team for their hard work. Check it out at arxiv.org/abs/2401.00908
[image attached]
Quoted: AK @_akhaliq

JPMorgan announces DocLLM, a layout-aware generative language model for multimodal document understanding. Paper page: huggingface.co/papers/2401.00…

Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively.

In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focuses exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers to a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents.

The pre-trained model is fine-tuned using a large-scale instruction dataset covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.

7 replies · 27 reposts · 110 likes · 41.5K views
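The abstract above says text–spatial cross-alignment is captured "by decomposing the attention mechanism in classical transformers to a set of disentangled matrices." A minimal NumPy sketch of that general idea, not the paper's exact formulation: the bounding-box embeddings, projection matrices, and mixing weights below are hypothetical stand-ins (in DocLLM the spatial input comes from OCR bounding boxes, with no image encoder).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T, d = 4, 8  # sequence length, head dimension

# Two input streams per token: a text embedding and a spatial
# (bounding-box) embedding. Random stand-ins for illustration.
x_text = rng.normal(size=(T, d))
x_box = rng.normal(size=(T, d))

# Separate query/key projections for each modality ("disentangled").
Wq_t, Wk_t = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wq_s, Wk_s = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))

Qt, Kt = x_text @ Wq_t, x_text @ Wk_t
Qs, Ks = x_box @ Wq_s, x_box @ Wk_s

# Attention score as a sum of text-text, text-spatial, spatial-text,
# and spatial-spatial terms; lam holds (here fixed) mixing weights.
lam = (1.0, 1.0, 1.0, 1.0)
scores = (lam[0] * Qt @ Kt.T + lam[1] * Qt @ Ks.T +
          lam[2] * Qs @ Kt.T + lam[3] * Qs @ Ks.T) / np.sqrt(d)

attn = softmax(scores, axis=-1)   # rows sum to 1
out = attn @ (x_text @ Wv)        # attended text values

print(out.shape)  # prints (4, 8)
```

The point of the decomposition is that layout influences where attention goes without routing pixels through an image encoder: only the box-derived embeddings enter the score.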
George Ho @_eigenfoo ·
I sawed my copy of The Power Broker in half so that it's easier to carry around. When a book's size becomes an impediment to reading it, I feel like something's gone seriously wrong
[image attached]
0 replies · 0 reposts · 7 likes · 340 views
George Ho reposted
Patrick Collison @patrickc ·
Gerty and Carl Cori won the Nobel Prize together in 1947. Then 6 of their students won Nobel Prizes, all in physiology/medicine and chemistry. (Five separate prizes in total; one was shared.) amazon.com/Crucible-Scien…
14 replies · 60 reposts · 694 likes
George Ho reposted
Jennifer R. Weiser @ProfJRWeiser ·
Beyond ecstatic for our Cooper Brue team from @cooperunion, which won both best beer label and 3rd place overall in the annual beer brewing competition at AIChE. Go team, and thanks Ana for helping us compete! And yes, the poster is hand drawn!
[three images attached]
2 replies · 3 reposts · 20 likes · 1.2K views
George Ho reposted
Loplop @__loplop ·
Hello, long time no #crossword! A new #cryptic is up, and I’m pretty happy with it! My favorite clue: I'm about to stuff fruit with trace of radium — it might bring death (4,6) georgeho.org/crosswords/019/
1 reply · 2 reposts · 8 likes · 665 views
George Ho reposted
Flatiron Health @flatironhealth ·
Extracting meaningful clinical detail from EHRs for millions of patients with cancer is challenging. @FlatironHealth uses #NLP & #ML to extract key information from unstructured documents in the curation of high quality #RWD. Read more on our approach: flatiron.com/resources/appr…
1 reply · 5 reposts · 16 likes · 11.3K views