John P

548 posts

@sql_johnpruitt

Joined October 2021
451 Following · 146 Followers
John P retweeted
Tiger Data - Creators of TimescaleDB
Agents are the new developers, and they need databases built for how they work. Today we're open-sourcing Eon: a production Slack agent that turns institutional knowledge into instant answers. After 6 weeks, nearly 50% of our company uses it regularly.

We learned three things building it: agents need conversational memory, focused context (not API wrappers), and production-grade reliability. So we open-sourced everything:
▪️ tiger-eon – Full reference implementation
▪️ tiger-agents-for-work – Durable event processing with retries
▪️ tiger-slack – Real-time Slack ingestion into TimescaleDB
▪️ MCP servers for GitHub, Linear, and docs with semantic search

Built on Postgres. Battle-tested in production. Ready to deploy in 10 minutes. Start building: tsdb.co/ye5kjkz8
0 replies · 2 reposts · 5 likes · 3.8K views
John P retweeted
Tiger Data - Creators of TimescaleDB
It's Mat (@cevianNY) from @tigerdatabase again. This week I am announcing a new open-source tool for evaluating and improving LLM-based text-to-SQL systems for PostgreSQL. We built this because in developing the semantic catalog text-to-SQL tool I discussed last week, we thought hard about why these systems stay stuck at "kinda works" instead of production-ready. The problem: we're measuring wrong 🧵

Most text-to-SQL evals tell you THAT your system failed. Not WHY it failed. That's useless for actually improving things. Is your LLM bad at finding the right tables? Or is it finding tables but writing garbage SQL? Without knowing this, you're debugging blind.

So we built text-to-sql-eval. It runs your system three ways:
▪️ normal mode (retrieve + generate)
▪️ with full schema provided
▪️ with correct tables given
The performance gaps tell you exactly where to focus. Example: if you get 40% accuracy normally but 75% with correct tables provided? Your retrieval sucks. Fix that first.

We've been dogfooding this for months. It's how we improved our text-to-SQL systematically instead of through vibes. The eval is PostgreSQL-native because database differences matter. Works with any LLM or text-to-SQL system. Includes LLM-as-judge for smarter evaluation.

Just open sourced it all. Even threw in a companion tool to generate test datasets from your own schema. Because eval on toy databases is vastly inferior to testing on YOUR dataset. Link in thread 👇
1 reply · 2 reposts · 3 likes · 162 views
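The three-mode comparison in the thread boils down to simple gap arithmetic. Here is a minimal sketch of that diagnostic idea; the function name and decision rule are mine, not the actual text-to-sql-eval API:

```python
# Sketch of the diagnostic logic the thread describes: run the same
# test set in three modes and let the accuracy gaps point at the
# weak stage. Mode names and decision rule are illustrative only.

def diagnose(normal_acc: float, full_schema_acc: float, correct_tables_acc: float) -> str:
    """Attribute failures to retrieval vs. SQL generation from mode gaps."""
    retrieval_gap = correct_tables_acc - normal_acc   # lost to finding tables
    generation_gap = 1.0 - correct_tables_acc         # lost even with tables given
    if retrieval_gap >= generation_gap:
        return "retrieval"   # finding the right tables is the bottleneck
    return "generation"      # tables are found, but the SQL itself is wrong

# The thread's example: 40% accuracy normally, 75% with correct tables given.
print(diagnose(normal_acc=0.40, full_schema_acc=0.70, correct_tables_acc=0.75))
# prints "retrieval": the 0.35 retrieval gap dominates -> fix retrieval first
```

The same arithmetic flips the verdict when a system already retrieves well: at 90% normal and 92% with correct tables, the residual 8% generation gap is the bigger target.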
John P retweeted
Mike Freedman (@michaelfreedman)
LLMs are becoming new users of the database. But unlike human users, it’s not enough to ask “did the query run?” We need ways to evaluate how well they interact with our data. Over the past few months, our team built an internal library to better understand text-to-SQL performance: where models succeed, where they fail, and what kinds of errors matter. We found it so useful that we’ve decided to open source this Evals library: 👉 github.com/timescale/text…

A few things this library does:
▪️ Runs evals against real PostgreSQL schemas and data (not just toy datasets).
▪️ Surfaces why a query failed—schema retrieval vs. reasoning vs. SQL execution.
▪️ Provides optional LLM-as-a-Judge checks to catch cases where queries are semantically correct but look different.
▪️ Persists and visualizes results over time (backed by TimescaleDB).

If you’re experimenting with text-to-SQL -- whether for internal tools, agents, or LLM apps -- we hope this is helpful. Contributions welcome, and full blog post below. #AgenticPostgres #TextToSQL #LLMevals
1 reply · 5 reposts · 15 likes · 819 views
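Many "semantically correct but looks different" cases can be caught structurally before any LLM judge runs: two queries that return the same multiset of rows are equivalent regardless of ORDER BY. A minimal sketch of that check (my own helper, not the library's implementation):

```python
# Order-insensitive comparison of two query result sets: treat each
# result as a multiset of row tuples, so queries that differ only in
# row ordering still count as equivalent.
from collections import Counter

def rows_equivalent(expected: list[tuple], actual: list[tuple]) -> bool:
    """True iff both result sets contain the same rows, any order."""
    return Counter(expected) == Counter(actual)

# Same rows, different ORDER BY -> still a pass.
print(rows_equivalent([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))  # True
# A duplicated row is a real failure, no judge needed.
print(rows_equivalent([(1, "a")], [(1, "a"), (1, "a")]))            # False
```

An LLM-as-a-Judge pass is then only needed for the harder cases, such as column aliasing or equivalent aggregations that this structural check cannot see.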
John P retweeted
Avthar (@avthar)
Stop paying the OpenAI tax. The best AI devtools are actually open-source, free to use, and give you full control over your data and privacy. While proprietary AI dominated early headlines, the true revolution is happening in open source - where a flourishing ecosystem of smarter models and easy-to-use developer tools is making advanced AI accessible to everyone. After speaking with hundreds of developers, my @TimescaleDB colleague @cevianNY and I have curated the 'Easy Mode' Open Source AI stack - the most developer-friendly tools that work seamlessly together to help you build AI apps:

LLMs: Open-source models like Llama 3 and Qwen 2.5 are matching Claude and GPT’s performance on many benchmarks and come with more data privacy guarantees.

Embeddings: Modern embedding models like JinaAI, BAAI, and Nomic help devs power accurate search and RAG without paying per token or depending on third-party APIs.

Model access and deployment: @ollama enables developers to access and deploy dozens of state-of-the-art open-source models with just one command – no team of PhDs required.

Data and retrieval: PostgreSQL – The world's most trusted database now handles AI workloads better than specialized vector DBs, thanks to extensions like pgvector and pgai.

Backend: FastAPI – The fastest way to build production-ready AI backends that actually scale.

Frontend: NextJS – Build beautiful AI UIs with the framework that handles streaming, caching, and real-time updates out of the box.

How’s your experience been with these tools? What did I miss?
Avthar tweet media
82 replies · 388 reposts · 3.4K likes · 329.5K views
John P retweeted
Avthar (@avthar)
Engineer: "Should we upgrade to the latest OpenAI model? It's cheaper and performs better on benchmarks." Tech lead: "We should explore it, but re-embedding all our data is too much work right now. Let's revisit next month." Engineer: "Hold my beer..."

With pgai Vectorizer, you can create 3 different versions of your data with 3 different OpenAI embedding models in just 3 SQL queries (see code snippet). Maintaining multiple versions of your data with different embedding models not only makes testing new models easier; it also makes it much easier to serve results for A/B tests and to roll out model upgrades gradually without disrupting production.
Avthar tweet media
3 replies · 38 reposts · 319 likes · 29K views
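Once several embedding versions of the same data exist side by side, gradual rollout reduces to routing. One common pattern, sketched here with made-up table names (none of this is pgai API; it is application-side plumbing):

```python
# Deterministic percentage rollout across two embedding versions of
# the same source table. Each user hashes into a stable bucket, so
# the same user always queries the same embedding view.
import hashlib

EMBEDDING_VIEWS = {
    "control":   "blog_embeddings_small",   # e.g. an older, cheaper model
    "candidate": "blog_embeddings_large",   # e.g. the model under test
}

def pick_view(user_id: str, rollout_pct: int) -> str:
    """Return the embedding view this user should query."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    arm = "candidate" if bucket < rollout_pct else "control"
    return EMBEDDING_VIEWS[arm]

# At 0% everyone stays on control; at 100% everyone has moved over.
assert pick_view("alice", 0) == "blog_embeddings_small"
assert pick_view("alice", 100) == "blog_embeddings_large"
```

Between those endpoints, nudging `rollout_pct` upward shifts traffic to the new model without re-embedding anything on the fly or disrupting users still on the old version.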
John P retweeted
Mayo Oshin (@mayowaoshin)
One of the major challenges with building AI chat apps and agents powered by RAG is the complexity of creating, storing, and syncing embeddings with real-time source data. If this isn't managed effectively, your AI app will likely generate outdated, inaccurate information, leading users to lose trust and abandon the app.

Fortunately, there's a simple solution that eliminates this complexity. pgai Vectorizer by @TimescaleDB helps ensure that your AI app always utilises relevant, up-to-date data via efficient RAG and vector search processes. With just one line of SQL (as per image below), you can automate embedding creation and synchronization directly within PostgreSQL. Once you're set up, pgai Vectorizer will automatically sync embeddings with inserts, updates, and deletes to source data. No need for you to build external pipelines or synchronization services. pgai Vectorizer also enables A/B testing of different embedding models and data chunking methods without reprocessing data yourself.

In a nutshell, pgai benefits include:
- Vector embeddings in PostgreSQL: Create, store, and search embeddings all in PostgreSQL.
- No more stale embeddings: Keeps embeddings synced as your underlying data changes.
- All in SQL: Everything is configured in SQL, no extra tools needed.
- Familiar PostgreSQL ecosystem: pgai Vectorizer works with other popular tools for AI in PostgreSQL, like the pgvector, pgvectorscale, and pgai extensions.

If you’re looking to simplify your AI infra for your RAG, search, or AI agent app, check out the pgai Vectorizer GitHub repo here: github.com/timescale/pgai
Mayo Oshin tweet media
0 replies · 6 reposts · 17 likes · 2K views
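The staleness problem the post describes is concrete: a source row edited after its embedding was generated now answers questions with old content. Without an auto-syncing tool, a pipeline has to hunt for those rows itself. A minimal timestamp-based sketch (field names and shapes are illustrative, not any library's API):

```python
# Find rows whose embeddings are missing or older than the source row.
# Both dicts map row id -> a timestamp: last-modified for sources,
# generated-at for embeddings.

def stale_ids(sources: dict[int, float], embeddings: dict[int, float]) -> set[int]:
    """IDs needing (re-)embedding: never embedded, or edited since."""
    return {
        rid for rid, modified_at in sources.items()
        if rid not in embeddings or modified_at > embeddings[rid]
    }

sources = {1: 100.0, 2: 250.0, 3: 300.0}   # row 2 edited at t=250
embeddings = {1: 150.0, 2: 200.0}          # row 2 embedded at t=200; row 3 never
print(stale_ids(sources, embeddings))       # {2, 3}
```

This is exactly the bookkeeping that an in-database vectorizer absorbs: because source and embeddings live in the same database, the "what changed since the last run" question has a built-in answer.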
John P retweeted
Matvey Arye 🇺🇦 (@cevianNY)
𝗧𝗵𝗲 𝗵𝗮𝗿𝗱 𝗽𝗮𝗿𝘁 𝗼𝗳 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗶𝘀 𝗵𝗮𝗻𝗱𝗹𝗶𝗻𝗴 𝘁𝗵𝗲 𝗳𝗮𝗶𝗹𝘂𝗿𝗲 𝗰𝗮𝘀𝗲𝘀.

Yesterday, we launched pgai Vectorizer, a tool that automates embedding generation directly from data stored in the database and keeps it synced as the source data changes. The surprising pushback? Some argued it’s nothing new, pointing out that many vector databases can embed data automatically on insert. But that’s not the hard part—and it doesn’t address the actual pain points developers face.

Sure, calling an embedding provider on insert is straightforward. What’s complex is handling the failure scenarios: what happens if the embedding provider is down, rate limits are exceeded, or unexpected errors occur? Most vector DBs simply propagate the error and fail the insert, leaving developers to handle the retry logic. This forces developers to set up a queuing system, monitor for failures, and take on additional operational burdens.

pgai Vectorizer takes a different approach. It generates embeddings asynchronously, based on data already stored in the database. This means if there’s an intermittent failure, the system automatically retries—without extra developer intervention. The database remains the source of truth, giving developers reliability and simplicity, even when the embedding provider isn’t cooperating. By eliminating these operational headaches, we’re making embedding generation truly seamless and scalable.

#AI #Embeddings #DataEngineering #VectorDatabase #Innovation
Matvey Arye 🇺🇦 tweet media
2 replies · 4 reposts · 10 likes · 724 views
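The asynchronous-retry idea is worth seeing in miniature: because the work queue is driven from rows already committed to the database, a transient provider failure never fails the user's insert; it just leaves the row queued for the next attempt. A simplified sketch (this is my illustration of the pattern, not pgai's implementation):

```python
# Process a backlog of committed rows, retrying transient embedding
# failures. The original insert succeeded long ago; a provider outage
# only delays embedding, it never loses data.

def process_queue(pending: list[str], embed, max_attempts: int = 3) -> dict[str, list[float]]:
    """Embed each pending item, retrying transient provider errors."""
    done: dict[str, list[float]] = {}
    for item in pending:
        for _attempt in range(max_attempts):
            try:
                done[item] = embed(item)
                break
            except ConnectionError:
                continue  # provider down / rate-limited: retry, don't propagate
    return done

# A stand-in provider that fails once, then recovers.
calls = {"n": 0}
def flaky_embed(text: str) -> list[float]:
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("rate limited")
    return [0.0, 1.0]

print(process_queue(["hello"], flaky_embed))  # {'hello': [0.0, 1.0]}
```

A real worker would add backoff, a retry cap with dead-lettering, and persistence of the queue itself, but the contract is the same: failures stay inside the embedding pipeline instead of surfacing on the write path.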
John P retweeted
Matvey Arye 🇺🇦 (@cevianNY)
Vector Databases should actually be Vector Indexes

Imagine if every time you inserted or updated a row, you had to reach out to an external system to update the associated B-trees. Each call risks failure, rate-limits, and throws in queuing, tracking, staleness handling, and overall complexity. Sounds like some 1984-style dystopia, right? (Well, actually, in 1984 Ingres already managed indexes automatically....) And yet, here in 2024, we’re all too willing to deal with this exact BS for vector indexes.

Take a simple example of embedding blog posts. Vector databases treat chunks and embeddings as isolated data atoms, detached from the source data itself. This means each time I publish a new post or edit an old one, I need to manually update embeddings in Pinecone, Qdrant, Weaviate, etc. Or I need to set up a complex web service with monitoring and retry logic to handle it all automatically. Either way, it’s a giant headache, and it shouldn’t have to be this way.

That’s why we built pgai Vectorizer — making embedding creation and synchronization as easy as using an index in PostgreSQL. With Vectorizer, you simply have a blog table in your database, and create a vectorizer with a single line of code as seen below. From there, pgai Vectorizer automatically creates embeddings for your blog entries and keeps them in sync with every insert, delete, or update in your blog table. No custom data workflows, infrastructure, or constant monitoring required.

There are far more interesting (and fun) challenges in AI than babysitting data infrastructure. Let us take on that burden for you.
Matvey Arye 🇺🇦 tweet media
3 replies · 4 reposts · 24 likes · 2.6K views
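The index analogy has a precise contract: a B-tree never drifts from its table because the database maintains it inside every write. The same contract for embeddings, sketched in miniature with a stand-in embedding function (a toy illustration of the idea, not how any real system stores vectors):

```python
# A tiny table whose derived "vector index" is maintained inside
# every write, so it can never go stale or orphan an embedding.

class EmbeddedTable:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.rows: dict[int, str] = {}
        self.vectors: dict[int, list[float]] = {}

    def upsert(self, rid: int, text: str) -> None:
        self.rows[rid] = text
        self.vectors[rid] = self.embed_fn(text)   # updated with the write

    def delete(self, rid: int) -> None:
        self.rows.pop(rid, None)
        self.vectors.pop(rid, None)               # no orphaned embeddings

fake_embed = lambda text: [float(len(text))]      # stand-in for a model call
t = EmbeddedTable(fake_embed)
t.upsert(1, "hello world")
t.upsert(1, "hello world, edited")                # an edit re-embeds automatically
t.delete(1)
print(t.vectors)                                  # {}
```

What pgai Vectorizer adds over this toy is precisely the hard part from the previous post: doing the `embed_fn` call asynchronously and retrying it, so a provider outage does not block the write itself.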
John P retweeted
TimescaleDB (by Tiger Data) (@TimescaleDB)
🛑 Vector Databases Are the Wrong Abstraction. Here’s Why.

They treat embeddings as standalone data, disconnected from their source, leading to outdated embeddings, constant sync issues, and endless maintenance. That’s why we built pgai Vectorizer for Postgres—so every engineer can build AI applications without the headache of managing embedding pipelines. Creating an embedding pipeline is as easy as building an index. It stays synced with your data automatically—no extra tools, no stale embeddings.

👉 Whether you’re a busy AI engineer or just getting started, check out the blog to see how pgai Vectorizer lets you focus on building killer AI apps. 🔗👇

#Postgres #pgaiVectorizer #Data #AI #SQL #DevTools #AIDevelopment #PostgresExtensions #AIinSQL
TimescaleDB (by Tiger Data) tweet media
1 reply · 2 reposts · 13 likes · 921 views
John P retweeted
Eric Zhang (@ekzhang1)
The original SQL paper (1974) really stands the test of time CPUs were 20,000,000,000x more expensive per flop in 1974! SQL is 50 years old: older than the Internet, hard disks, Linux, x86, and CMOS RAM; yet SQL remains in service to people today, for the exact same use cases
Eric Zhang tweet media
14 replies · 77 reposts · 590 likes · 42.1K views
John P retweeted
Matvey Arye 🇺🇦 (@cevianNY)
After my team built this, we only had one question: why doesn’t everyone do embedding this way?
Quoted tweet from Avthar (@avthar):
VECTOR DATABASES ARE THE WRONG ABSTRACTION. Here’s a better way: introducing pgai Vectorizer, a new open-source PostgreSQL tool that automatically creates and syncs embeddings with source data, just like a database index.

❌ Why vector databases fail
Vector databases treat embeddings as independent data, divorced from the source data from which embeddings are created, rather than what they truly are: derived data. This pitfall means that many AI projects that start out as simple vector search implementations inevitably evolve into a complex orchestra of monitoring, synchronization, and firefighting.

😓 Keeping embeddings in sync is hard
In an attempt to avoid stale embeddings, engineering teams have to build and maintain a maze of ETL pipelines, juggle multiple databases (vector DB, metadata store, lexical search), and manage complex queuing systems for updates. Add monitoring for data drift, alert systems for stale results, and validation checks across systems - and you have a brittle infrastructure that inevitably breaks down, leading to stale embeddings and wasted engineering hours. What if you could just use Postgres instead?

✅ Pgai Vectorizer: Vector embeddings as database indexes
Pgai Vectorizer treats embeddings like database indexes. It automatically creates, updates, and maintains embeddings as your data changes. Just like an index, the database handles all the complexity: syncing, versioning, and cleanup happen automatically. This means no manual tracking, zero maintenance burden, and the freedom to rapidly experiment with different embedding models and chunking strategies without building new pipelines.

🤔 Why did we build pgai Vectorizer?
Our team at @timescaledb built pgai Vectorizer because many developers regard PostgreSQL as the “Swiss army knife” of databases, as it can handle everything from vectors and text data to JSON documents. We think an “everything database” like PostgreSQL is the solution to eliminate the nightmare of managing multiple databases, making it the ideal home for vectorizers and the foundation for AI applications.

⚙️ How does pgai Vectorizer work?
Check out the code snippet below – it takes just 6 lines of SQL to put your embedding creation pipeline on autopilot with pgai Vectorizer! Under the hood, pgai Vectorizer checks for modifications to the source table (inserts, updates, and deletes) and asynchronously creates and updates vector embeddings in an external worker.

🧑‍💻 Sounds exciting! How can I get started?
Pgai Vectorizer is open-source under the PostgreSQL license and available for free to use on any PostgreSQL database. You can find installation instructions on the pgai GitHub repository (see end of post). It’s also available as a managed service in Timescale’s PostgreSQL cloud platform.

📚 Learn more
[1] Pgai GitHub repo: github.com/timescale/pgai
[2] Technical explainer post: timescale.com/blog/vector-da…

Share this post with your followers to let them know about pgai Vectorizer and comment your reactions and questions.

1 reply · 2 reposts · 15 likes · 846 views
John P (@sql_johnpruitt)
Building a RAG application? Need to create embeddings and keep them up-to-date as source data changes? Don't want to manage a complex data pipeline? Look no further! pgai Vectorizer is your open-source solution for this automation... and it's built on Postgres!
John P tweet media
1 reply · 0 reposts · 2 likes · 133 views
John P retweeted
Ajay Kulkarni (@acoustik)
🔥 𝗘𝘃𝗲𝗿𝘆 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗶𝘀 𝗻𝗼𝘄 𝗮𝗻 𝗔𝗜 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿: 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗽𝗴𝗮𝗶 𝗩𝗲𝗰𝘁𝗼𝗿𝗶𝘇𝗲𝗿 🔥

𝗽𝗴𝗮𝗶 𝗩𝗲𝗰𝘁𝗼𝗿𝗶𝘇𝗲𝗿 is a developer tool that automatically creates and syncs embeddings right in your PostgreSQL database. In other words: 𝗽𝗴𝗮𝗶 𝗩𝗲𝗰𝘁𝗼𝗿𝗶𝘇𝗲𝗿 𝗺𝗮𝗸𝗲𝘀 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 𝗺𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 𝗮𝘀 𝗲𝗮𝘀𝘆 𝗮𝘀 𝗰𝗿𝗲𝗮𝘁𝗶𝗻𝗴 𝗮𝗻 𝗶𝗻𝗱𝗲𝘅 𝗶𝗻 𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝗦𝗤𝗟. Available 100% open source (PostgreSQL license) and via our Cloud offering.

With pgai, we are building the first and only developer suite 𝘪𝘯𝘴𝘪𝘥𝘦 𝘵𝘩𝘦 𝘥𝘢𝘵𝘢𝘣𝘢𝘴𝘦 for building AI applications. Why are we doing this? To make every engineer an AI engineer, by embracing and extending PostgreSQL, the most loved database by developers. Serve every developer, power the future of computing, and advance the human frontier. Let's go!! 🤖 🐘 🚀
Ajay Kulkarni tweet media
1 reply · 4 reposts · 11 likes · 500 views
John P retweeted
Avthar (@avthar)
VECTOR DATABASES ARE THE WRONG ABSTRACTION. Here’s a better way: introducing pgai Vectorizer, a new open-source PostgreSQL tool that automatically creates and syncs embeddings with source data, just like a database index.

❌ Why vector databases fail
Vector databases treat embeddings as independent data, divorced from the source data from which embeddings are created, rather than what they truly are: derived data. This pitfall means that many AI projects that start out as simple vector search implementations inevitably evolve into a complex orchestra of monitoring, synchronization, and firefighting.

😓 Keeping embeddings in sync is hard
In an attempt to avoid stale embeddings, engineering teams have to build and maintain a maze of ETL pipelines, juggle multiple databases (vector DB, metadata store, lexical search), and manage complex queuing systems for updates. Add monitoring for data drift, alert systems for stale results, and validation checks across systems - and you have a brittle infrastructure that inevitably breaks down, leading to stale embeddings and wasted engineering hours. What if you could just use Postgres instead?

✅ Pgai Vectorizer: Vector embeddings as database indexes
Pgai Vectorizer treats embeddings like database indexes. It automatically creates, updates, and maintains embeddings as your data changes. Just like an index, the database handles all the complexity: syncing, versioning, and cleanup happen automatically. This means no manual tracking, zero maintenance burden, and the freedom to rapidly experiment with different embedding models and chunking strategies without building new pipelines.

🤔 Why did we build pgai Vectorizer?
Our team at @timescaledb built pgai Vectorizer because many developers regard PostgreSQL as the “Swiss army knife” of databases, as it can handle everything from vectors and text data to JSON documents. We think an “everything database” like PostgreSQL is the solution to eliminate the nightmare of managing multiple databases, making it the ideal home for vectorizers and the foundation for AI applications.

⚙️ How does pgai Vectorizer work?
Check out the code snippet below – it takes just 6 lines of SQL to put your embedding creation pipeline on autopilot with pgai Vectorizer! Under the hood, pgai Vectorizer checks for modifications to the source table (inserts, updates, and deletes) and asynchronously creates and updates vector embeddings in an external worker.

🧑‍💻 Sounds exciting! How can I get started?
Pgai Vectorizer is open-source under the PostgreSQL license and available for free to use on any PostgreSQL database. You can find installation instructions on the pgai GitHub repository (see end of post). It’s also available as a managed service in Timescale’s PostgreSQL cloud platform.

📚 Learn more
[1] Pgai GitHub repo: github.com/timescale/pgai
[2] Technical explainer post: timescale.com/blog/vector-da…

Share this post with your followers to let them know about pgai Vectorizer and comment your reactions and questions.
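The "6 lines of SQL" the post points to live in the attached screenshot, which this transcript does not carry. From memory, the call looks roughly like the sketch below; the table name is made up, and the keyword arguments may not match the current pgai API exactly, so treat the GitHub repo as the source of truth:

```python
# Hypothetical reconstruction of a pgai create_vectorizer call,
# held as a string so it can be sent through any Postgres driver.
# Argument names are best-effort recollection, not verified API.
CREATE_VECTORIZER_SQL = """
SELECT ai.create_vectorizer(
    'blog'::regclass,
    destination => 'blog_embeddings',
    embedding   => ai.embedding_openai('text-embedding-3-small', 768),
    chunking    => ai.chunking_recursive_character_text_splitter('content')
);
"""

# Against a live pgai-enabled database you would execute it like any
# other statement, e.g. cur.execute(CREATE_VECTORIZER_SQL).
print("ai.create_vectorizer" in CREATE_VECTORIZER_SQL)  # True
```

From that single statement onward, the external worker described in the post keeps `blog_embeddings` in step with inserts, updates, and deletes on `blog`.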
Avthar tweet media
45 replies · 156 reposts · 1.1K likes · 111.3K views
John P retweeted
TimescaleDB (by Tiger Data) (@TimescaleDB)
🚀 Every Developer is Now an AI Developer - Introducing pgai Vectorizer

No need for specialized tools or vector databases—pgai Vectorizer lets you create, sync, and manage embeddings with just one SQL command.
🔹 Embeddings in Postgres: Build, store, and sync embeddings alongside your relational data—no extra infrastructure needed.
🔹 Real-time sync, rapid testing: Keep embeddings fresh as your data changes. Test models instantly.
🔹 All in Postgres: Everything you need—embeddings, model access, and AI workflows—all with the SQL you already know.

#Postgres #pgaiVectorizer #Data #AI #SQL #DevTools #AIDevelopment #PostgresExtensions #AIinSQL
TimescaleDB (by Tiger Data) tweet media
5 replies · 22 reposts · 70 likes · 16.3K views
John P retweeted
Avthar (@avthar)
👀 launching something big tomorrow...
Avthar tweet media
3 replies · 9 reposts · 151 likes · 13.4K views
John P retweeted
Ajay Kulkarni (@acoustik)
👀 What's this? Something new launching tomorrow? Hm.....
Ajay Kulkarni tweet media
5 replies · 11 reposts · 39 likes · 6.2K views
John P retweeted
Avthar (@avthar)
Build a fully private RAG app with @ollama, Llama3 (@AIatMeta) and PostgreSQL Worried about AI models leaking confidential data? Our new tutorial shows you how to build a private RAG application with no risk of data leakage, by using local models and open-source tools, all running on your local machine. It's a perfect start for developers who want to build AI systems while keeping sensitive information completely under wraps. (Link in next tweet)
Avthar tweet media
3 replies · 44 reposts · 327 likes · 29.2K views
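The private-RAG tutorial's shape reduces to a small skeleton: embed the question, retrieve the nearest documents, generate from the retrieved context, with both models injected as callables so everything can stay on the local machine (Ollama-served Llama 3 in the tutorial; plain Python stubs here, and the document corpus is invented for the example):

```python
# Skeleton of a local RAG pipeline: retrieval by dot-product
# similarity, then generation over the retrieved context. The
# embed/generate callables stand in for local model calls.

def rag_answer(question, documents, embed, generate, top_k=1):
    q = embed(question)
    scored = sorted(
        documents,
        key=lambda d: sum(a * b for a, b in zip(q, embed(d))),
        reverse=True,
    )
    context = "\n".join(scored[:top_k])
    return generate(f"Context:\n{context}\n\nQuestion: {question}")

# Toy stubs in place of local Ollama calls: a 2-d "embedding" and a
# generator that simply echoes the retrieved context line.
embed = lambda text: [1.0 if "postgres" in text.lower() else 0.0, 1.0]
generate = lambda prompt: prompt.splitlines()[1]

docs = ["PostgreSQL supports vectors via pgvector.", "Bananas are yellow."]
print(rag_answer("What does Postgres support?", docs, embed, generate))
# prints: PostgreSQL supports vectors via pgvector.
```

Swapping the stubs for real local calls (an Ollama embedding model and Llama 3, with pgvector holding the document vectors) gives the tutorial's fully private setup: no text ever leaves the machine.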