ReadyAI
@ReadyAI_
224 posts

Making the world's data accessible to AI. Home of AcquiOS for CRE & M&A acquisitions. Bittensor Subnet 3️⃣3️⃣ 🌐

Joined August 2024
24 Following · 3.5K Followers
ReadyAI retweeted
David Fields@DavFields·
Getting my Claw into music this weekend... thanks @ReadyAI_. Grab any files you'd like here: readyai.ai
ReadyAI retweeted
ReadyAI@ReadyAI_·
New on @ReadyAI_: request an llms.txt file for any domain, free.

Search a site → not in our 10K+ database? Hit "Request This Domain Now" → get your file queued on the subnet.

5 free requests per user. Every file is open-sourced on GitHub.

Structured data for agents shouldn't be gated.
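The request flow above sits on top of the llms.txt convention that the file lives at a site's root path, so an agent can check for an existing file before queuing a generation request. A minimal sketch in Python (generic code, not ReadyAI's API; the example domain is only an illustration):

```python
# Minimal sketch (not ReadyAI's API): check whether a domain already publishes
# an llms.txt at its root before queuing a generation request elsewhere.
import urllib.request
import urllib.error

def fetch_llms_txt(domain: str, timeout: float = 10.0) -> str | None:
    """Return the llms.txt contents for a domain, or None if it isn't published."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status == 200:
                return resp.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, TimeoutError):
        pass
    return None

if __name__ == "__main__":
    text = fetch_llms_txt("example.com")  # example domain, not a claim about this site
    print(text[:500] if text else "No llms.txt found; request generation instead.")
```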
ReadyAI retweeted
David Fields@DavFields·
This is the best explanation for why (1) Bittensor is unique amongst crypto projects and (2) you often see crypto VCs hating on it. Bittensor provides the incentives for bootstrapping innovation across numerous experiments all at once, without the need for VCs. $TAO
Algod@AlgodTrading

Yes, emissions are used to bootstrap innovation, same as Uber, Amazon, and countless other big companies. You can choose between these two: give those emissions to VCs, or give those emissions to builders who devote their whole time to building out the network. VCs hate it because they can't apply the VC playbook or get discounted access compared to the masses.

ReadyAI retweeted
0xSammy@0xSammy·
SEO was built for humans browsing the web. The next version of search optimization is built for agents reading it.

AEO/GEO ("agent engine optimization" or "generative engine optimization") is becoming a real category. An entire industry is forming around making your website legible to LLMs and autonomous agents instead of just Google crawlers.

Right now every AI agent that needs info about a company or domain does the same thing: scrapes, parses HTML, and hopes for the best. Billions of redundant crawls; trillions of wasted tokens.

llms.txt emerged as a proposed standard for this: a markdown file in a website's root directory that gives LLMs a clean, structured summary of the site's content instead of forcing them to parse navigation menus, cookie banners, and JavaScript. Over 844k websites have already adopted it; Anthropic, Cloudflare, and Stripe among them.

The problem is that no one has built the infrastructure to do this at scale across the entire web. The beauty of this is that the infrastructure powering it can be decentralized from day one; there's no reason for one company to own the machine-readable index of the entire web.

So when you read the below announcement from subnet 33, you should look at it in the context of this broader agentic engine optimization (AEO). How many "AEO experts" do you think currently exist? Zero. There's a huge opportunity for you to pick a niche and dominate.

Once again, another Bittensor subnet tackling a forward-thinking problem.
David Fields@DavFields

We just launched a new readyai.ai.

Type any domain into the search. If it's in our dataset, you get clean, structured intelligence instantly. No scraping. No parsing HTML. Just machine-readable data, ready for any AI agent.

10,000+ websites crawled, cleaned, and structured by Subnet 33 so far. Growing to 100K by Q2, 1M by year end.

This is the beginning of something bigger: a marketplace for agentic data. Right now, every AI agent that needs info about a company or domain scrapes, parses, and hopes. Billions of redundant crawls. Trillions of wasted tokens.

We're building the infrastructure layer that fixes this: an indexed, machine-readable web powered by decentralized compute.
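For context on the format these announcements reference: per the public llmstxt.org convention, an llms.txt is plain markdown with an H1 title, a short blockquote summary, and H2 sections of annotated links. A hypothetical sample (illustrative only, not a file ReadyAI has published):

```markdown
# Example Corp
> Example Corp makes widgets. This file summarizes the site for LLMs and agents.

## Docs
- [Getting started](https://example.com/docs/start): install and first run
- [API reference](https://example.com/docs/api): endpoints and auth

## Optional
- [Blog](https://example.com/blog): release notes and deep dives
```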

ReadyAI retweeted
David Fields@DavFields·
We just launched a new readyai.ai.

Type any domain into the search. If it's in our dataset, you get clean, structured intelligence instantly. No scraping. No parsing HTML. Just machine-readable data, ready for any AI agent.

10,000+ websites crawled, cleaned, and structured by Subnet 33 so far. Growing to 100K by Q2, 1M by year end.

This is the beginning of something bigger: a marketplace for agentic data. Right now, every AI agent that needs info about a company or domain scrapes, parses, and hopes. Billions of redundant crawls. Trillions of wasted tokens.

We're building the infrastructure layer that fixes this: an indexed, machine-readable web powered by decentralized compute.
ReadyAI@ReadyAI_

x.com/i/article/2037…

ReadyAI@ReadyAI_·
@eleusys7 Orienting SN33 for the coming wave of agentic commerce 🫡
ReadyAI@ReadyAI_·
👀 Something new is coming.

We've been building, and we're almost ready to show you. SN33 has been processing the web at scale, turning raw Common Crawl data into clean, AI-ready `llms.txt` files: structured semantic summaries that any LLM agent, MCP server, or AI app can consume instantly.

On Thursday we'll be releasing the GitHub repo where `llms.txt` files will be pushed in batches as the subnet processes them. We're starting with over 1,000 websites analyzed and processed by the subnet, and that number will grow every week.

And shortly after... 🌍 We're launching a public frontend. Any website. Any domain. You request it, the subnet processes it, and you get an `llms.txt` back.

No more raw HTML hell for AI agents. No more redundant crawling. Just clean, structured, machine-readable intelligence about any corner of the web, on demand, powered by decentralized compute.

This is SN33 becoming a public utility for AI infrastructure. The web, made readable for machines. At scale. Open to anyone.

🔜 More very soon. Stay tuned.
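To make "consume instantly" concrete, here is a rough sketch (an editor's illustration, not SN33 code) of splitting an llms.txt body into a small structure an agent or MCP-style tool could pass to a model as context:

```python
# Rough illustration (not SN33 code): split an llms.txt markdown body into
# title, summary, and per-section link lists that an agent can use as context.
import re

LINK = re.compile(r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?")

def parse_llms_txt(body: str) -> dict:
    doc = {"title": "", "summary": "", "sections": {}}
    current = None
    for line in body.splitlines():
        if line.startswith("# ") and not doc["title"]:
            doc["title"] = line[2:].strip()
        elif line.startswith("> "):
            doc["summary"] += line[2:].strip() + " "
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current and (m := LINK.match(line)):
            doc["sections"][current].append(
                {"title": m["title"], "url": m["url"], "note": m["note"] or ""}
            )
    doc["summary"] = doc["summary"].strip()
    return doc
```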
ReadyAI retweeted
David Fields@DavFields·
Our recent breakthrough with enrichment tasks on the subnet has completely opened the floodgates. We can now create structured datasets from nearly any source, from llms.txt to deep coding data. We'll be sharing benchmark improvements from this coding data shortly.
ReadyAI@ReadyAI_

x.com/i/article/2034…

ReadyAI retweeted
David Fields@DavFields·
The web wasn't built for AI agents. We're fixing that. First 1,000 domains live now, millions coming. Open source, decentralized, and free. Frontend coming shortly to request llms.txt for any site.
ReadyAI@ReadyAI_

🚀 llms.txt files are live on SN33

The llms.txt repository is now live.
🔗 github.com/afterpartyai/l…

SN33 has processed the first batch: over 1,000 websites crawled, cleaned, and converted into structured llms.txt files by the subnet. Semantic summaries ready for any LLM agent, MCP server, or AI app to consume instantly. No scraping. No parsing raw HTML. Just clean, machine-readable intelligence.

New batches will be pushed as the subnet keeps processing. The repo grows every week.

What's in the dataset:
→ Structured semantic summaries per domain
→ Named entities: people, orgs, products, technologies, concepts
→ Topic classification and key themes
→ Deterministic O(1) lookup by domain, with no index file needed
→ Git-friendly structure that scales to millions of domains

This initial release covers ~1,000 domains as a pilot, but the pipeline scales to millions.

📍 Roadmap: 10K → 100K → 1M domains → continuous updates from new Common Crawl releases, and soon from requests.

🌍 And the frontend is coming. Any domain. You request it, the subnet processes it, you get an llms.txt back. We're putting the finishing touches on the public UI and it drops soon.

SN33 is becoming infrastructure. The web, made readable for machines and open to anyone, powered by decentralized infra.

Star the repo. Share it. And stay close. The next drop is right around the corner.

ReadyAI@ReadyAI_·
🚀 llms.txt files are live on SN33

The llms.txt repository is now live.
🔗 github.com/afterpartyai/l…

SN33 has processed the first batch: over 1,000 websites crawled, cleaned, and converted into structured llms.txt files by the subnet. Semantic summaries ready for any LLM agent, MCP server, or AI app to consume instantly. No scraping. No parsing raw HTML. Just clean, machine-readable intelligence.

New batches will be pushed as the subnet keeps processing. The repo grows every week.

What's in the dataset:
→ Structured semantic summaries per domain
→ Named entities: people, orgs, products, technologies, concepts
→ Topic classification and key themes
→ Deterministic O(1) lookup by domain, with no index file needed
→ Git-friendly structure that scales to millions of domains

This initial release covers ~1,000 domains as a pilot, but the pipeline scales to millions.

📍 Roadmap: 10K → 100K → 1M domains → continuous updates from new Common Crawl releases, and soon from requests.

🌍 And the frontend is coming. Any domain. You request it, the subnet processes it, you get an llms.txt back. We're putting the finishing touches on the public UI and it drops soon.

SN33 is becoming infrastructure. The web, made readable for machines and open to anyone, powered by decentralized infra.

Star the repo. Share it. And stay close. The next drop is right around the corner.
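The "deterministic O(1) lookup by domain with no index file" bullet implies the file path can be computed from the domain alone. The actual repo layout isn't shown here, so the scheme below is only a guess at how such a lookup could work, using hash-prefix sharding to keep directories small at millions of domains:

```python
# Guessed layout, not the real repo structure: derive a file path purely from the
# domain so a client can fetch the right llms.txt without downloading an index.
import hashlib
from pathlib import Path

def llms_txt_path(domain: str, root: str = "data") -> Path:
    """Map a domain to a deterministic, shard-friendly path (illustrative only)."""
    d = domain.lower().strip().rstrip(".")
    shard = hashlib.sha256(d.encode()).hexdigest()[:2]  # 256 shards keeps directories small
    return Path(root) / shard / f"{d}.llms.txt"

print(llms_txt_path("example.com"))  # e.g. data/<2-char shard>/example.com.llms.txt
```

Any concrete scheme works, as long as writers and readers derive the path the same way.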
ReadyAI@ReadyAI_·
Great question. Short answer: we sidestep a lot of it by processing at the site level, not the page level. When you enrich an entire domain's pages together (NER, tags, summarization, similar pages), you get entity grounding from context across the site rather than trying to reconcile isolated page-level extractions across the whole crawl. It doesn't eliminate the problem, but it dramatically reduces the noise surface. The repo drops Thursday, so we'd love your take once you can see the output structure.
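A sketch of the site-level idea (an illustration of the approach described, not ReadyAI's pipeline): merge page-level NER output per domain so frequent, consistently typed mentions dominate, instead of reconciling each page's entities against the whole crawl in isolation.

```python
# Illustration of the site-level idea (not ReadyAI's pipeline): merge page-level
# NER mentions within one domain so the most frequent surface form and type win.
from collections import Counter, defaultdict

def ground_entities(pages: list[dict]) -> list[dict]:
    """pages: [{"url": ..., "entities": [{"text": "OpenAI", "type": "ORG"}, ...]}, ...]"""
    type_votes = defaultdict(Counter)   # canonical form -> Counter of entity types
    counts = Counter()                  # canonical form -> total mentions across the site
    for page in pages:
        for ent in page["entities"]:
            key = ent["text"].strip().lower()
            type_votes[key][ent["type"]] += 1
            counts[key] += 1
    grounded = []
    for key, total in counts.most_common():
        ent_type, _ = type_votes[key].most_common(1)[0]
        grounded.append({"entity": key, "type": ent_type, "site_mentions": total})
    return grounded
```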
Sovran AMR@a_m_r_news·
@ReadyAI_ LLMs.txt? Interesting. How are you handling entity resolution across the Common Crawl's noise? That's always been a bear for us.
ReadyAI retweeted
David Fields@DavFields·
The generic data race is over. The teams that win the next 3 years are the ones building deep, vertical-specific pipelines that scraping can't replicate. That's exactly what we're doing at @ReadyAI_. Phase 1 is just the start.
ReadyAI@ReadyAI_

x.com/i/article/2029…

ReadyAI@ReadyAI_·
SN33 — Organizing the Spoken Web

Our podcast conversations dataset has been downloaded over 300,000 times on HuggingFace. That demand told us something: the market is starving for structured conversation data.

Written content represents a fraction of human knowledge online. The real depth lives in spoken conversation: experts explaining their craft, founders breaking down strategy, researchers debating methodology. Millions of hours of it happen in public every day across podcasts, interviews, panels, and debates. It's the highest-signal data on the internet, and almost none of it is structured, tagged, or accessible to AI agents. We've been calling it the web's dark matter.

This week, we're making it visible.

We're launching SN33's agentic transcription system: an autonomous pipeline that discovers, retrieves, and processes public conversations across the web at scale. It doesn't wait for input. It finds the conversations that matter, converts them into structured data, and feeds them directly into the subnet for enrichment.

Every conversation enters the system tagged to a category from the start. That means category-specific task routing for miners, and more importantly, it unlocks something we've been building toward: customer-requested categories. Need every meaningful AI conversation from the last 90 days transcribed, enriched, and delivered as structured data? That's not a hypothetical. That's the infrastructure we're standing up right now.

With site enrichment, we organized the written web. With agentic transcription, we're organizing the spoken web. Together, these systems are building what we think of as llms.txt for the entire web: not just pages, but conversations. Structured, categorized, and ready for the next generation of AI agents to consume. Not just what people write, but what they say.

Rolling out TOMORROW, February 26th.
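As a conceptual skeleton only (the stages are named in the announcement, but the code, types, and function names below are assumptions), the discover → transcribe → categorize → enrich loop might look like:

```python
# Conceptual skeleton only (not ReadyAI's implementation): the discover ->
# transcribe -> categorize -> enrich loop the announcement describes.
from dataclasses import dataclass, field

@dataclass
class Conversation:
    source_url: str
    category: str = ""                  # assigned at ingest; drives task routing
    transcript: str = ""
    enrichments: dict = field(default_factory=dict)

def run_pipeline(discover, transcribe, classify, enrich, sink):
    """Each argument is a pluggable stage; the orchestration is the point."""
    for url in discover():              # autonomous discovery, no user input needed
        convo = Conversation(source_url=url)
        convo.category = classify(url)          # tagged to a category up front
        convo.transcript = transcribe(url)      # speech -> text
        convo.enrichments = enrich(convo)       # NER, topics, summaries, etc.
        sink(convo)                             # hand off to the subnet / dataset
```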
ReadyAI retweeted
David Fields@DavFields·
We started SN33 @ReadyAI_ with a simple thesis: the web has all the information agents need, but none of it is structured for them. Webpage Metadata v2 is the inflection point: we're not just tagging pages anymore, we're enriching entire sites. The goal is to become the largest producer of llms.txt files in the world. On testnet now, mainnet 2/23 🫡
ReadyAI@ReadyAI_

SN33 -- Enriching the Data of the World

SN33 just shipped Webpage Metadata v2, and the best way to explain what we're building is this: an llms.txt version of Common Crawl.

Our partnership with Common Crawl began with the simple but daunting task of tagging web pages to make semantic web data widely available. Generating this data would break down the barriers preventing web organization. This week we are taking a giant step toward AI-enabling the world wide web by launching the enrichment process for entire websites.

Search engines atomize the web by surfacing individual pages. That's great for finding individual facts. It does almost nothing to give agents the holistic information they need to actually complete tasks. Simple example: an agent searching for "best skis" gets quality-for-price rankings from individual pages. It completely misses how waist width affects your ability to float in powder, navigate tight spaces, or carve on groomed trails. That information exists across an entire site, but no one is structuring it that way.

This week we shipped the technology to change that. SN33 is now enriching entire websites, not just individual pages. Our new high-volume API pushes full sites through the subnet, collecting enriched data (tags, NER, similar pages, summarization) across every page on a site, grouped together.

Why llms.txt matters: the llms.txt standard summarizes an entire website's contents in a single meaningful text file. Agents and MCP tools can understand what a site contains without processing every page. It's the missing layer between the open web and the agent economy. Adoption has been stymied by one problem: nobody is generating these files at scale. There hasn't been a broad effort to create llms.txt for the whole web, until now.

Once SN33 reaches a tipping-point volume of enriched site data, we begin publishing llms.txt files at scale. We believe SN33 will become the largest producer of llms.txt files in the world.

The demand for structured web data is already proven. Our first open-source dataset, the 5000 Podcast Conversations, has crossed 300,000+ downloads on HuggingFace. That was conversations. This is the entire open web. More open-source releases are coming.

v2.28.63 is on testnet and goes mainnet February 23rd.
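As an illustration of how per-page enrichments (tags, NER, summaries) could roll up into a single site-level llms.txt (the field names here are assumptions, not SN33's schema):

```python
# Illustrative only (field names are assumptions, not SN33's schema): assemble
# per-page enrichments for one domain into a single llms.txt-style summary.
def build_llms_txt(domain: str, site_summary: str, pages: list[dict]) -> str:
    """pages: [{"url": ..., "title": ..., "summary": ..., "topics": [...]}, ...]"""
    lines = [f"# {domain}", "", f"> {site_summary}", ""]
    by_topic: dict[str, list[dict]] = {}
    for page in pages:
        for topic in page.get("topics", ["Other"]):
            by_topic.setdefault(topic, []).append(page)
    for topic, topic_pages in sorted(by_topic.items()):
        lines.append(f"## {topic}")
        for p in topic_pages:
            lines.append(f"- [{p['title']}]({p['url']}): {p['summary']}")
        lines.append("")
    return "\n".join(lines)
```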
