Institutional Data Initiative

23 posts

@instdin

A research center at Harvard working to strengthen society’s connection to knowledge by advancing our access to and understanding of the data that shapes AI.

Joined August 2024
2 Following · 170 Followers
Institutional Data Initiative reposted
Greg Leppert (@leppert)
Amazing work from an amazing team using @instdin’s Institutional Books data release. Their dedication to detail and accuracy is sorely missing from the vast majority of historical-data work from the AI community. Yet there’s so much work to be done and benefit to getting it right
Quoting David Duvenaud (@DavidDuvenaud):

Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below! with @AlecRad and @status_effects 🧵

Institutional Data Initiative reposted
Greg Leppert (@leppert)
Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details: arxiv.org/abs/2511.11447
Institutional Data Initiative reposted
Greg Leppert (@leppert)
We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. institutional.org/tools
Institutional Data Initiative reposted
Greg Leppert (@leppert)
When libraries participate in Google Books, Google not only scans their books, it also makes a wealth of image, OCR, and metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging.
Institutional Data Initiative
What is the pathway towards greater diversity in data and AI? Hear from Professor Ruth Okediji, scholar of IP Law at Harvard Law School, who will be in conversation with Assistant Dean Amanda Watson of the Harvard Law School Library on Oct 22 at 2PM. harvard.zoom.us/meeting/regist…
Institutional Data Initiative
Can a small visual language model read documents as effectively as models 27 times its size? Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.
Institutional Data Initiative reposted
Greg Leppert (@leppert)
This Monday, @instdin will host @petrknoth to share his experience leading CORE ("The world’s largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.
Institutional Data Initiative reposted
Greg Leppert (@leppert)
Tomorrow, it's our pleasure to host @ayahbdeir to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually. Cohosted by the @instdin and @BKCHarvard.
Institutional Data Initiative
We look forward to growing Institutional Books through community. We welcome collaboration from researchers and model makers as we:
- Evaluate the dataset’s impact on model outputs
- Continue to refine our OCR pipelines
View the dataset on Hugging Face: huggingface.co/datasets/insti…
Institutional Data Initiative
Today we released Institutional Books 1.0, a 242B token dataset from Harvard Library's collections, refined for accuracy and usability. 🧵
Institutional Data Initiative reposted
Fels (@felchang)
I've loved writing words, while loops and wandering wectors, so I'm thrilled to join the @instdin team at Harvard as the director of community and communications! institutionaldatainitiative.org
Institutional Data Initiative reposted
Greg Leppert (@leppert)
As the Institutional Data Initiative (@instdin) expands its mission, we’re announcing a collaboration with the Boston Public Library (@BPLBoston) to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at BPL. 🧵
Institutional Data Initiative reposted
Greg Leppert (@leppert)
I'm pleased to announce we're expanding our mission at the Institutional Data Initiative (@instdin) with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work.