

bdub

@bdw87



Can crypto help solve the walled garden challenges around data for AI products?

The biggest bottleneck for large language models (LLMs) isn't compute; it's data. GPT-5 is expected to require up to 75 trillion words for training, or eight times the amount needed to build GPT-4. The rate of data consumption by AI models far exceeds that of new content production. With most of the public internet already scraped, indexed, and used in building GPT-4, where will the additional data come from?

Over the last few quarters, open platforms such as Reddit and Stack Overflow have begun charging millions for access to their data, which in turn makes it difficult for smaller AI teams to compete with the giants. Creators who've contributed to such platforms are not compensated when these companies enter data licensing agreements.

The feedback loop strengthens as LLMs continue to develop, exacerbating these issues: LLMs grow -> value of data increases -> platforms close up and charge more. This leads to a future where the best LLMs will be highly centralized and consolidated among the largest, most well-resourced entities. Users and creators on the internet become more of a product than they have ever been.

Crypto might just provide some solutions to these problems. Teams like @Wyndlabs_ai (which is building @getgrass_io) and @getmasafi are tackling these problems head-on by democratizing access to high-quality data and rewarding individuals more equitably.

In our latest article by @shloked, we explore how walled gardens on the web are being broken down by a new generation of Web3 primitives. Read on for a brief on how LLMs evolve, why emerging applications need more data, and the role crypto-native rails are playing in building a fair playing ground for founders.

DMs open if you are building within the sector :)

Link: decentralised.co/p/the-data-mus…









