Institutional Data Initiative

22 posts

@instdin

A research center at Harvard working to strengthen society’s connection to knowledge by advancing our access to and understanding of the data that shapes AI.

Joined August 2024
2 Following · 164 Followers
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
Even if you're not a partner library, you might be curious about what it's like to work with GRIN. Our technical report has a wealth of details: arxiv.org/abs/2511.11447
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
We're also sharing the pipeline we developed for Institutional Books that seamlessly dedupes, classifies, and enhances the data once GRIN Transfer brings it down. institutional.org/tools
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
When libraries participate in Google Books, Google not only scans their books, it also makes a wealth of image, OCR, and metadata available to them via the Google Return Interface (GRIN). But working with GRIN can be challenging.
Institutional Data Initiative
What is the pathway towards greater diversity in data and AI? Hear from Professor Ruth Okediji, scholar of IP Law at Harvard Law School, who will be in conversation with Assistant Dean Amanda Watson of the Harvard Law School Library on Oct 22 at 2PM. harvard.zoom.us/meeting/regist…
Institutional Data Initiative
Can a small visual language model read documents as effectively as models 27 times its size? Next Friday, IDI will host Michele Dolfi and Peter Staar from @IBMResearch Zurich to discuss their work on SmolDocling, an “ultra-compact” model for diverse OCR tasks.
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
This Monday, @instdin will host @petrknoth to share his experience leading CORE ("The world’s largest collection of open access research papers") as the rise of AI brings new meaning, and challenges, to stewarding knowledge repositories. Join us virtually via the link below.
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
Tomorrow, it's our pleasure to host @ayahbdeir to talk about the power of data in building an AI ecosystem that's open, transparent, and fair. 11am ET on June 17th. Register at the link below to attend virtually. Cohosted by @instdin and @BKCHarvard.
Institutional Data Initiative
We look forward to growing Institutional Books through community. We welcome collaboration from researchers and model makers as we:
- Evaluate the dataset’s impact on model outputs
- Continue to refine our OCR pipelines
View the dataset on Hugging Face: huggingface.co/datasets/insti…
Institutional Data Initiative
Today we released Institutional Books 1.0, a 242B token dataset from Harvard Library's collections, refined for accuracy and usability. 🧵
Institutional Data Initiative retweeted
Fels (@felchang)
I've loved writing words, while loops and wandering wectors, so I'm thrilled to join the @instdin team at Harvard as the director of community and communications! institutionaldatainitiative.org
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
As the Institutional Data Initiative (@instdin) expands its mission, we’re announcing a collaboration with the Boston Public Library (@BPLBoston) to develop AI-driven tools capable of accelerating new digitization at libraries across the world, starting at BPL. 🧵
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
I'm pleased to announce we're expanding our mission at the Institutional Data Initiative (@instdin) with an open call for institutional collaborators, new digitization at Harvard Law School Library, and additional support to advance this work.
Institutional Data Initiative retweeted
Greg Leppert (@leppert)
Today we're launching the Institutional Data Initiative to work with libraries, gov agencies, and other knowledge institutions to help refine and publish their collections as data, with an eye toward AI. 🧵
Quoting Institutional Data Initiative (@instdin): Hello world. 🧵