Paulo Sousa

272 posts

@pjsousa

Founding Engineer @FinisterraLabs, building @BaselightDB to unify the world's structured data.

Santarem, Portugal · Joined February 2008
35 Following · 36 Followers
Paulo Sousa retweeted
Baselight @BaselightDB
Links to explore the datasets mentioned in the post:
• HUD Housing Affordability (CHAS) and Income Limits (IL): baselight.app/u/hud
• SAMHSA datasets: baselight.app/u/samhsa
• US Census datasets: baselight.app/u/uscensus
You can also ask questions directly across all datasets in Baselight using our AI: baselight.app
Curious to see what questions people explore with this data.
Paulo Sousa retweeted
Baselight @BaselightDB
21 billion new rows added to Baselight this week.
Public data is growing fast - and we're working to make it easier to explore. Over the past 7 days, the Baselight catalog expanded significantly:
Platform Scale
• 428,770,708,355 rows (+21B this week)
• 453,493 tables (+14K)
• 69,413 datasets (+515)
Highlights from this week’s additions
🏠 US Department of Housing and Urban Development (HUD)
Housing Affordability (CHAS) and Income Limits (IL) datasets: key indicators for analyzing housing stress and regional affordability.
🧠 SAMHSA (Substance Abuse and Mental Health Services Administration)
Large public health datasets covering mental health services, treatment programs, and behavioral health infrastructure.
📊 US Census expansion
Census data is now available at town/place-level resolution, enabling much more granular geographic analysis.
The Baselight catalog continues to grow toward a simple goal: making the world’s structured data accessible and queryable in one place.
You can explore the datasets or ask questions directly with Baselight AI. Links in the comments.
Curious to hear from the community: if you had access to all this data in one place, what question would you ask first?
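The "+21B / +14K / +515" deltas in the post above can be cross-checked against the catalog totals quoted in the other weekly update in this feed (407,793,294,092 rows, 439,321 tables, 68,898 datasets). A minimal sketch, assuming those two posts are consecutive weekly snapshots; the `Snapshot` class and `delta` function are illustrative names, not part of any Baselight API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Catalog totals as quoted in one weekly post."""
    rows: int
    tables: int
    datasets: int


# Totals copied from the two posts; treating them as consecutive weeks is an assumption.
previous = Snapshot(rows=407_793_294_092, tables=439_321, datasets=68_898)
current = Snapshot(rows=428_770_708_355, tables=453_493, datasets=69_413)


def delta(prev: Snapshot, curr: Snapshot) -> dict[str, int]:
    """Absolute week-over-week growth for each catalog counter."""
    return {
        "rows": curr.rows - prev.rows,
        "tables": curr.tables - prev.tables,
        "datasets": curr.datasets - prev.datasets,
    }


for name, value in delta(previous, current).items():
    print(f"{name}: +{value:,}")
```

Under that assumption the differences round to the figures the post reports: roughly 21 billion rows, 14 thousand tables, and exactly 515 datasets.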
Paulo Sousa @pjsousa
@georges_c_brain @BaselightDB @WalrusProtocol @adlrocha Quick workaround today: convert your data to Parquet and upload it via the Baselight UI - that’s the easiest way to publish data right now. Out of curiosity, would your data update frequently, or is this more of a one-time / occasional dataset?
Paulo Sousa retweeted
Baselight @BaselightDB
New on Baselight: 21 years of official HUD Income Limits data is now live - FY2005 through FY2025.
The national average Area Median Family Income has nearly doubled in 20 years:
- 2005: $49,887
- 2025: $94,985, a 90% increase
This figure determines who qualifies for Section 8, public housing, and more.
The gap is staggering:
- The highest Area Median Family Income in the U.S. (2025): $195,200
- About 8x the lowest: $24,100
Which states saw the biggest jump in HUD Area Median Family Income from 2024 to 2025?
1) Colorado: +9.0%
2) Hawaii: +8.8%
3) Idaho: +8.2%
4) Puerto Rico: +8.2%
5) South Dakota: +8.0%
4,764 areas. 56 states & territories. All queryable in seconds.
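The headline percentages in the post above can be re-derived from the dollar amounts it quotes. A minimal sketch in plain Python; only the four dollar figures come from the post, while the constant and function names are illustrative:

```python
# Figures quoted in the post (USD); all names here are illustrative.
AMFI_2005 = 49_887
AMFI_2025 = 94_985
HIGHEST_2025 = 195_200
LOWEST_2025 = 24_100


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


growth = percent_change(AMFI_2005, AMFI_2025)
ratio = HIGHEST_2025 / LOWEST_2025

print(f"2005 -> 2025 change: {growth:.1f}%")
print(f"highest / lowest: {ratio:.1f}x")
```

The change comes out just above 90% and the highest area is a little over 8 times the lowest, which matches the rounded figures in the post.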
Paulo Sousa retweeted
Baselight @BaselightDB
Another week of growth across the Baselight catalog: 3 billion new rows and hundreds of new datasets covering sports analytics, disaster response, and public health.
Platform Scale
- Rows: 407,793,294,092 (+3B this week)
- Tables: 439,321 (+898)
- Datasets: 68,898 (+382)
Highlights
Ultimate Soccer Dataset expansion
Now 278M+ rows, covering 273K+ matches across 90+ competitions. Historical results now go back to the 1990s, enabling long-term analysis of league and team performance.
FEMA Disaster Data
Datasets from the Federal Emergency Management Agency (FEMA) are now available on Baselight, providing a comprehensive view of U.S. disaster declarations, assistance programs, and response activity.
CDC / ATSDR Social Vulnerability Index (SVI)
This dataset ranks U.S. census tracts using 16 social indicators grouped into four themes: Socioeconomic Status, Household Characteristics, Racial & Ethnic Minority Status, and Housing Type & Transportation.
Explore the data and start asking questions - links in the comments.
Paulo Sousa retweeted
Baselight @BaselightDB
For those curious about what you can do with the data, here are a few ready-to-use insights, queries, and dashboards built on top of the Ultimate Soccer Dataset:
Queries
• Cristiano Ronaldo scoring rate by age: baselight.app/u/pjsousa/quer…
• Messi vs Ronaldo scoring rate by age: baselight.app/u/pjsousa/quer…
• Top scorers across major soccer leagues: baselight.app/u/pjsousa/quer…
• Referees that show the most cards: baselight.app/u/pjsousa/quer…
Dashboards
• World Soccer Scoreboard: baselight.app/u/pjsousa/dash…
• Premier League 2025/2026: baselight.app/u/pjsousa/dash…
• Premier League Insights (2015–2025): baselight.app/u/pjsousa/dash…
Paulo Sousa retweeted
Baselight @BaselightDB
Building the World’s Most Complete Soccer Dataset - Major Expansion.
We've just significantly expanded the Ultimate Soccer Dataset with:
- Historical match results extended back to the 1990s for major leagues
- New league coverage across multiple countries, including: Austria, China, Denmark, Greece, Ireland, Japan, Mexico, Norway, Romania, Scotland, Turkey
The Ultimate Soccer Dataset is a large, structured collection of global football data compiled by the Baselight team and partners. It includes standardized information on:
- Competitions
- Seasons
- Matches
- Teams
- Players
- Goals & assists
- Lineups
- Transfers
- Betting odds
- and more ...
The dataset spans national leagues, international tournaments, and club competitions worldwide, with all data normalized across competitions and time to make querying and analysis easy.
It’s designed for:
- statistical analysis
- machine learning models
- scouting insights
- historical research
We welcome contributions, corrections, and suggestions from the community - help us make this the most comprehensive football dataset available.
Paulo Sousa retweeted
Baselight @BaselightDB
Grounded AI starts with grounded data.
This week, Baselight added 7 new high-impact public data sources - expanding coverage across research, innovation funding, public spending, and global standards.
New sources now live:
• OpenAIRE Research Graph
• US Grants
• Small Business Innovation Research (SBIR)
• National Institutes of Health (NIH)
• National Science Foundation (NSF)
• USAspending
• ISO reference datasets
Platform scale continues to accelerate:
• 404.6B+ rows (▲ +4B this week)
• 438K+ tables
• 68.5K+ datasets
From biomedical research funding to startup grants and federal spending flows - it’s all structured, queryable, and fully traceable.
Paulo Sousa retweeted
Baselight @BaselightDB
Where does $36 billion of biomedical research funding actually go?
Every year, the U.S. National Institutes of Health (NIH) funds tens of thousands of research projects - quietly shaping the future of medicine, healthcare, and biotechnology.
We analyzed NIH funding data for FY2025 using Baselight to understand where the largest investments are happening and how funding is distributed.
Top funded research areas across ~60,000 projects:
• Cancer: $4.9B (7,884 projects)
• Infectious diseases: $4.7B (6,563 projects)
• Aging: $3.8B (4,484 projects)
• Heart, lung & blood: $3.4B (5,956 projects)
These four areas alone account for more than $16B of NIH funding.
We made NIH funding data fully queryable in Baselight - with AI-powered analysis grounded in real, traceable data you can inspect, verify, and explore yourself.
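The "more than $16B" total in the post above follows from the four quoted amounts. A minimal sketch that sums them and derives a share of the $36B headline figure, assuming the four areas do not overlap (the post does not say either way); all names are illustrative:

```python
# Top funded areas as quoted in the post: (USD billions, project count).
TOP_AREAS = {
    "Cancer": (4.9, 7_884),
    "Infectious diseases": (4.7, 6_563),
    "Aging": (3.8, 4_484),
    "Heart, lung & blood": (3.4, 5_956),
}
TOTAL_FUNDING_B = 36.0  # headline figure from the post, USD billions

top_funding = sum(amount for amount, _ in TOP_AREAS.values())
top_projects = sum(count for _, count in TOP_AREAS.values())

# Share of the headline total; only meaningful if the areas are disjoint.
share = top_funding / TOTAL_FUNDING_B * 100

# Average funding per project in each area, in USD millions.
avg_musd = {
    name: amount * 1_000 / count
    for name, (amount, count) in TOP_AREAS.items()
}

print(f"top four areas: ${top_funding:.1f}B across {top_projects:,} projects")
print(f"share of ${TOTAL_FUNDING_B:.0f}B: {share:.0f}%")
for name, value in avg_musd.items():
    print(f"{name}: ${value:.2f}M per project")
```

On these numbers the four areas sum to $16.8B, a bit under half of the headline total.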
Paulo Sousa @pjsousa
What’s the most cited scientific publication ever? We just made it possible to answer questions like this instantly.
OpenAIRE Graph, one of the world’s largest open scholarly knowledge graphs, is now available in @BaselightDB, providing a 360° view of global research.
Paulo Sousa retweeted
Baselight @BaselightDB
You can explore and query the data directly.
Try the Baselight AI chat: baselight.app
Access all EU Funding & Tenders datasets directly: baselight.app/u/eufundingten…
Includes:
• Open grants and tenders
• Historical grants and tenders
• Grant updates
• Funding & Tenders FAQs
Curious to hear your feedback - and which datasets we should onboard next.
Paulo Sousa retweeted
Baselight @BaselightDB
We’ve just crossed another major milestone in our mission to organize the world’s structured data.
This week’s scale:
• 400.9 billion rows (+5B)
• 437,747 tables (+1K)
• 68,445 datasets (+272)
We also added a new high-impact public data source: the EU Funding & Tenders Portal - making funding and research data more accessible and queryable.
Several major sources are already prepared and ready to go next: OpenAIRE Research Graph, US Gov Grants, SBIR / STTR programs.
Step by step, we’re building a global data infrastructure where anyone can discover, query, and trust structured data.
Paulo Sousa retweeted
Baselight @BaselightDB
Pre-match insight powered by Baselight. Benfica vs Real Madrid - recent form, wins, goals scored, goals conceded. Structured data → instant insight → game time. Let’s see who proves the numbers right tonight. ⚽️
Paulo Sousa retweeted
Baselight @BaselightDB
For anyone who wants to explore the new sources directly:
• Bureau of Indian Affairs: baselight.app/u/bia
• Gun Violence Archive: baselight.app/u/gunviolencea…
• Florida Dept. of Education: baselight.app/u/fldoe
• CIA World Factbook: baselight.app/u/cia/dataset/…
• Swiss Mobility Open Data Platform: baselight.app/u/swissopentra…
Incredible to see this level of high-value public data becoming instantly query-ready across the entire Baselight graph - compounding its power by connecting with everything already there.
Paulo Sousa retweeted
Baselight @BaselightDB
🌍 5 billion new rows of global intelligence - in just one week.
Baselight’s data graph keeps expanding across governments, geopolitics, education, mobility, and public safety:
Scale
• Rows: 395,255,364,002 ▲ +5B
• Tables: 436,795 ▲ +5K
• Datasets: 68,173 ▲ +2K
New high-impact public data sources onboarded
• Bureau of Indian Affairs
• Gun Violence Archive
• Florida Department of Education
• CIA World Factbook
• Swiss Mobility Open Data Platform
Every week, Baselight expands the map of structured reality - ready for analysts, researchers, and AI agents.
If it’s actionable data, it belongs in Baselight.
Paulo Sousa retweeted
adlrocha @adlrocha
Ok, how beautiful is this, @claudeai code completely hammering @BaselightDB for data!
Paulo Sousa @pjsousa
Good read from Microsoft on how subtle prompt changes can break LLM safety: microsoft.com/en-us/security… At @BaselightDB, AI insights are grounded in verifiable, structured data. When answers must come from data that's queried, the prompt-attack surface shrinks significantly.