dltHub

334 posts

@dltHub

dltHub is the creator of data load tool (dlt)

Berlin · Joined November 2022
20 Following · 494 Followers

dltHub @dltHub ·
Curious what people here are actually using for AI coding 👀 Copilot? Cursor? Claude Code? We put together a super short (1-min) survey. 👉 dlthub.notion.site/3039fb8e23cf80…

dltHub @dltHub ·
Microsoft Fabric is great at compute & storage, but data quality enforcement is on you. Use WAP (write-audit-publish) to validate data before it hits the lakehouse. dlt handles schemas, business rules, uniqueness, PII, and monitoring so bad data never reaches analytics. dlthub.com/blog/microsoft…

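The write-audit-publish (WAP) pattern mentioned above can be sketched in plain Python. This is a stdlib-only illustration of the idea, not dlt's actual API; the specific rules (uniqueness on `id`, non-negative `amount`, an email scan on a `note` field) are hypothetical stand-ins for real business rules:

```python
import re

def audit(rows, key="id"):
    """Audit stage: collect violations instead of loading bad rows."""
    errors = []
    seen = set()
    email_re = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
    for i, row in enumerate(rows):
        if row.get(key) in seen:                       # uniqueness check
            errors.append((i, f"duplicate {key}"))
        seen.add(row.get(key))
        if row.get("amount", 0) < 0:                   # business rule
            errors.append((i, "negative amount"))
        if email_re.search(str(row.get("note", ""))):  # naive PII scan
            errors.append((i, "possible email PII in free text"))
    return errors

def write_audit_publish(rows, publish):
    """Publish to the lakehouse only if the audit passes."""
    errors = audit(rows)
    if errors:
        raise ValueError(f"audit failed: {errors}")
    publish(rows)

published = []
write_audit_publish(
    [{"id": 1, "amount": 10, "note": "ok"}, {"id": 2, "amount": 5, "note": "fine"}],
    published.extend,
)
```

The key design point is that the audit runs on a staged copy of the data, so a failed check blocks the publish step rather than corrupting downstream tables.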
dltHub @dltHub ·
LLMs follow the gravity of your vocabulary, not your business logic. That's why your AI-generated stack looks great in the demo and breaks silently in prod. dlthub.com/blog/unvibe

dltHub @dltHub ·
@Snowflake Next modules:
• Build ingestion workflows with the Snowflake Native App
• Run pipelines in Snowsight notebooks
Module inspired by a great article from Martin Seifert 🙏 sfrt.io/can-you-run-dl…

dltHub @dltHub ·
❄️ Module 2 is live in our course: dlt + @Snowflake. Learn how to run dlt pipelines inside Snowflake using Snowpark Container Services (SPCS), enabling native execution and scheduling with no external infrastructure required. Continue the course: dlthub.learnworlds.com/course/dlt-sno…

dltHub @dltHub ·
Small data teams deserve better tools. We're opening early design partnerships for solo and small data teams to try dltHub Pro before launch: early access, a say in the roadmap, and an early-bird discount. dlthub.com/solutions/for-…

dltHub @dltHub ·
Build a data pipeline locally with @DuckDB → ship it to @ClickHouseDB. Join @elviskahoro & Joshua Lee from @AltinityDB for a live demo using dlt:
• ingest APIs & DBs
• run locally
• promote to ClickHouse
• explore with @marimo_io
• run quality checks
📅 Mar 16 altinity.com/events/using-d…

Quoting Altinity @AltinityDB:
Prototyping on #DuckDB locally is great until you need to ship it. 🙃 On March 16, Josh Lee (OS Dev Advocate) & @elviskahoro (DevX at dltHub) will show how to promote your DuckDB pipeline to #ClickHouse® production with dlt. Free, 8 AM PT, & worth it: hubs.la/Q045kCGW0

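The promote-to-production flow described above works because the pipeline body stays the same and only the destination changes (in real dlt this is roughly `dlt.pipeline(destination="duckdb")` during prototyping versus `destination="clickhouse"` in production). A stdlib-only sketch of that idea; the writer classes and the `extract` source below are hypothetical stand-ins:

```python
def extract():
    """Pretend API extraction; a real pipeline would page through an API."""
    yield {"id": 1, "city": "Berlin"}
    yield {"id": 2, "city": "Paris"}

class LocalWriter:
    """Stand-in for a local DuckDB destination."""
    def __init__(self):
        self.rows = []
    def load(self, rows):
        self.rows.extend(rows)

class ClickHouseWriter(LocalWriter):
    """Stand-in for the production ClickHouse destination."""
    pass

def run_pipeline(destination):
    """Same pipeline body for dev and prod; only the destination differs."""
    writer = {"local": LocalWriter, "clickhouse": ClickHouseWriter}[destination]()
    writer.load(extract())
    return writer

dev = run_pipeline("local")        # prototype locally
prod = run_pipeline("clickhouse")  # promote to production
```

Because both writers expose the same `load` interface, promoting the pipeline is a one-line configuration change rather than a rewrite.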
dltHub reposted
Matthäus Krzykowski @matthausk ·
@mattarderne @tayloramurphy @dltHub We are trying to make this happen with our commercial product dltHub: dlthub.com/blog/llm-nativ…

dltHub @dltHub ·
@TheCesarCross @huggingface Exactly, schema inference is doing a lot of the heavy lifting here. dlt handles it automatically, so you get provenance without the usual setup overhead 🙌

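Schema inference of the kind mentioned here can be illustrated with a small stdlib-only sketch. dlt's real inference also handles nested structures, type variants, and schema evolution; this only shows the core step of deriving column types from sample rows, with a hypothetical rule that conflicting types widen to text:

```python
def infer_schema(rows):
    """Infer a column -> type-name mapping from sample rows."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            t = type(val).__name__
            prev = schema.get(col)
            if prev is None or prev == "NoneType":
                schema[col] = t            # first non-null value wins
            elif prev != t and t != "NoneType":
                schema[col] = "text"       # conflicting types widen to text
    return schema

rows = [
    {"id": 1, "name": "ada", "score": 9.5},
    {"id": 2, "name": "bob", "score": None},
]
print(infer_schema(rows))  # → {'id': 'int', 'name': 'str', 'score': 'float'}
```

Nulls are skipped rather than treated as a type, which is why `score` still infers as `float` even though the second row is missing a value.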
cCross @TheCesarCross ·
@dltHub @huggingface Nice pipeline. Versioned Parquet keeps data provenance clear and reproducible.

dltHub @dltHub ·
Your production traces are gold, but they aren't training data yet. We turned raw agent traces into a specialist model using: dlt → @huggingface → Distil Labs. The result? A 0.6B model beating a 120B one by 28 points on a specific task. 🔥 Here's the pipeline ↓

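The first step of a traces-to-training-data pipeline is mostly reshaping JSON. A minimal stdlib-only sketch of that step; the trace fields (`input`, `output`, `status`) are hypothetical, and the real pipeline also does schema inference and incremental loading via dlt before anything reaches the Hub:

```python
import json

def traces_to_examples(trace_lines):
    """Convert JSONL agent traces into prompt/completion training pairs."""
    examples = []
    for line in trace_lines:
        trace = json.loads(line)
        if trace.get("status") != "success":   # keep only successful runs
            continue
        examples.append({
            "prompt": trace["input"],
            "completion": trace["output"],
        })
    return examples

traces = [
    '{"input": "classify: refund request", "output": "billing", "status": "success"}',
    '{"input": "classify: broken login", "output": "auth", "status": "error"}',
]
print(traces_to_examples(traces))
```

Filtering on outcome before training is the important part: only traces where the agent succeeded become supervision signal for the smaller specialist model.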
dltHub reposted
LanceDB @lancedb ·
@dlthub and @huggingface 🤗 just shipped a clean way to ingest Hugging Face datasets into LanceDB 🚀 Query datasets over hf:// with DuckDB, stream them in batches, and load them into LanceDB with embeddings generated during ingest. The result is a simple Python path from Hub dataset to a searchable, explorable table.

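The stream-in-batches part of that flow is a generic pattern worth seeing on its own. A stdlib-only sketch, assuming the source is just an iterable (in the real integration the rows would come from DuckDB reading over hf://, and each batch would be inserted into a LanceDB table):

```python
from itertools import islice

def batches(rows, size):
    """Yield fixed-size lists from any iterable, without materializing it all."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

loaded = []
for batch in batches(range(7), size=3):   # e.g. rows from a Hub dataset
    loaded.append(len(batch))             # stand-in for a LanceDB table insert
print(loaded)  # → [3, 3, 1]
```

Streaming in bounded batches keeps memory flat regardless of dataset size, which is what makes embedding-during-ingest practical for large Hub datasets.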
dltHub @dltHub ·
@tayloramurphy @mattarderne Thanks for the mention. We do think the Python practitioner should empower themselves with data, but BI also creates canonical models which, separate from reporting, may be very useful to DS. We've previously seen DS spend two months building a user table that already existed in the canonical model.

dltHub @dltHub ·
@huggingface Here's a great example of what you can build with it 👇 dlt → @huggingface → Distil Labs x.com/j_golebiowski/…

Quoting Jacek Golebiowski @j_golebiowski:
Your production LLM agent is already generating the training data for its own replacement. Together with @dltHub we built a pipeline that takes those traces and trains a small specialist model from them. A 0.6B model trained this way beat the 120B teacher by 28 points on exact match. 200x smaller, under 50ms locally. You provide traces + a task description. We handle the rest.

dltHub @dltHub ·
The "pipeline-d hug" 🤗 is here. We just launched our @huggingface integration in dlt, bridging two worlds that have been siloed for too long. Training data lives across production DBs, warehouses, and the HF Hub. Now you can connect them with simple Python pipelines. ↓

dltHub reposted
Matthäus Krzykowski @matthausk ·
We are putting dltHub Pro in the hands of early design partners. Two themes keep coming up:
→ Agentic data engineering changes businesses, e.g. consultants can now offer fixed-price projects
→ When AI makes building easy, you need AI-native tooling to ship to production

Quoting dltHub @dltHub:
The team at @TasmanAnalytics runs data engineering projects for mid-market and enterprise clients. Their biggest challenge? Scoping. Every new client meant figuring out which APIs to connect, how long it would take, and what the data looked like, often before seeing a single row.

dltHub reposted
Matthäus Krzykowski @matthausk ·
This is the pipeline that made it happen. Raw agent traces → @dltHub → schema inference, quality checks, incremental loads → @HuggingFace Hub → fine-tuned specialist. No large data engineering team. Pure Python. And now anyone can replicate it 👇 x.com/j_golebiowski/…

Quoting Jacek Golebiowski @j_golebiowski: (the same tweet quoted above)