Toby Mao

1.8K posts

@Captaintobs

Cofounder and CTO of Tobiko Data. Building SQLMesh and SQLGlot.

San Mateo, CA · Joined December 2013

362 Following · 2.1K Followers

Toby Mao @Captaintobs · 12 Nov

SQLMesh is a Python framework that makes writing and orchestrating SQL easier. All you have to do is write SQL and SQLMesh will figure out the rest. It can find dependencies automatically (no need for `ref`). It understands what has been done and what needs to be done (cron and version tracking). It even performs syntax checking and validations so you don't always have to send your queries to the warehouse. But I guess if you are truly five years old, it'd be something like this: Your mommy and daddy help companies understand what's going on every day. When people buy things on the internet, computers get information. SQLMesh helps your mommy and daddy turn that information into numbers so that they can look at them in pretty pictures to make decisions.
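To make the "no need for `ref`" point concrete, here is a toy sketch of inferring dependencies from the SQL itself and ordering models so that upstream models run first. It is an illustration only: the model names are hypothetical, and it spots `FROM`/`JOIN` targets with a regex, whereas SQLMesh parses queries with SQLGlot, which handles CTEs, subqueries, and dialect quoting far more robustly than a regex can.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model definitions: model name -> SQL text.
MODELS = {
    "db.orders_clean": "SELECT id, amount, ds FROM raw.orders WHERE amount > 0",
    "db.daily_revenue": "SELECT ds, SUM(amount) AS revenue FROM db.orders_clean GROUP BY ds",
    "db.revenue_report": (
        "SELECT r.ds, r.revenue, c.n FROM db.daily_revenue AS r "
        "JOIN db.customer_counts AS c ON r.ds = c.ds"
    ),
    "db.customer_counts": "SELECT ds, COUNT(DISTINCT customer_id) AS n FROM raw.orders GROUP BY ds",
}

# Toy pattern: a schema-qualified table name right after FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*\.[A-Za-z_]\w*)", re.IGNORECASE)


def infer_dependencies(models):
    """Map each model to the other models it reads from (external tables like raw.* are ignored)."""
    deps = {}
    for name, sql in models.items():
        refs = set(TABLE_REF.findall(sql))
        deps[name] = {ref for ref in refs if ref in models and ref != name}
    return deps


def execution_order(models):
    """Return an evaluation order in which every model comes after its upstream models."""
    return list(TopologicalSorter(infer_dependencies(models)).static_order())


if __name__ == "__main__":
    print(infer_dependencies(MODELS))
    print(execution_order(MODELS))
```

The point of the design is that the dependency graph is derived from the queries, so there is no separate list of references to keep in sync with the SQL.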

DataEngDude @dataenggdude · 12 Nov

Someone explain SQLMesh to me like I'm 5, please. #dataengineering #sql #sqlmesh

Toby Mao retweeted

Mark Rittman @markrittman · 31 Oct

This week’s episode of @drilltodetail features @Captaintobs from @TobikoData on @Airbnb, #DataOps and the engineering innovation behind @SQLMesh

Toby Mao @Captaintobs · 26 Oct

@wahmedswl @SQLMesh @dagster @kestra_io You could simply use GitHub Actions, or soon Tobiko Cloud.

Waqas Ahmed @wahmedswl · 26 Oct

@Captaintobs @SQLMesh So, is a scheduler like @dagster or @kestra_io mandatory with @SQLMesh?

Toby Mao @Captaintobs · 26 Oct

Exciting new feature for @SQLMesh. Custom signals / triggers. SQLMesh's built-in scheduler controls which models are evaluated when the sqlmesh run command is executed. It determines whether to evaluate a model based on whether the model's cron has elapsed since the previous evaluation. For example, if a model's cron was @daily, the scheduler would evaluate the model if its last evaluation occurred on any day before today. Unfortunately, the world does not always accommodate our data system's schedules. Data may land in our system after downstream daily models already ran. The scheduler did its job correctly, but today's late data will not be processed until tomorrow's scheduled run. You can now use signals to prevent this problem! Just write a function in Python to determine which intervals are ready to be processed. sqlmesh.readthedocs.io/en/latest/guid…
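A minimal sketch of the idea in plain Python, assuming daily intervals identified by their start date. It is not SQLMesh's signal API (the guide linked above documents the real interface); the function names and the `landed` set are hypothetical. The cron decides which intervals are due, and the signal filters them down to the ones whose upstream data has actually arrived, so a late-landing day is deferred and picked up by a later run instead of waiting for tomorrow.

```python
from datetime import date, timedelta


def due_intervals(start, today, processed):
    """Daily intervals (keyed by start date) whose @daily cron has elapsed but that
    have not been processed yet. Day d is complete once today > d."""
    all_days = (start + timedelta(days=i) for i in range((today - start).days))
    return [d for d in all_days if d not in processed]


def make_landed_signal(landed):
    """Hypothetical signal: an interval is ready only if its upstream data has landed.
    `landed` is a set of dates that some external check reports as complete."""
    def signal(intervals):
        return [d for d in intervals if d in landed]
    return signal


def run(start, today, processed, signal):
    """One invocation of a run command: cron says what is due, the signal says what is ready."""
    ready = signal(due_intervals(start, today, processed))
    processed.update(ready)  # stand-in for actually evaluating the model
    return ready


if __name__ == "__main__":
    start, today = date(2024, 10, 20), date(2024, 10, 26)
    processed = {date(2024, 10, d) for d in range(20, 25)}  # the 20th through the 24th are done
    landed = set()
    signal = make_landed_signal(landed)
    print(run(start, today, processed, signal))  # data for the 25th is late, so nothing runs
    landed.add(date(2024, 10, 25))               # the late data lands
    print(run(start, today, processed, signal))  # a later run the same day picks it up
```

Keeping the readiness check separate from the cron means the schedule stays simple while the "is the data actually there?" logic can be arbitrary Python.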

Toby Mao @Captaintobs · 26 Oct

@wahmedswl @SQLMesh Every time you call `run`.

Waqas Ahmed @wahmedswl · 26 Oct

@Captaintobs @SQLMesh Does the built-in scheduler only kick in when the sqlmesh run command is executed, or is it always running?

Toby Mao retweeted

Simon Späti 🏔️ @sspaeti · 23 Oct

SQLMesh concepts with plans that apply to different environments (prod, dev) are elegant. Even `fetchdf` is integrated into the CLI. Also, on the right, you see SQLMesh auto-detecting the new columns as non-breaking and simply applying the (virtual) changes once you confirm with `y`.

Toby Mao retweeted

Nico Ritschel @nicoritschel · 23 Oct

It surprises me how poorly most data orchestrators support batch size for backfills. Meanwhile, @SQLMesh open source supports it out of the box.
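Batching a backfill means splitting one long historical range into several smaller queries, so each query scans a bounded amount of data and a failed chunk can be retried on its own. SQLMesh's incremental-by-time models expose this through a `batch_size` setting; the function below is not SQLMesh code, just a standalone sketch (with hypothetical names) of how a date range can be chunked.

```python
from datetime import date, timedelta


def backfill_batches(start, end, batch_size):
    """Split the daily intervals in [start, end) into batches of at most `batch_size` days.

    Each batch is a (batch_start, batch_end) pair with an exclusive end, so one
    backfill query covers a bounded date range instead of the whole history.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    cursor = start
    while cursor < end:
        batch_end = min(cursor + timedelta(days=batch_size), end)
        batches.append((cursor, batch_end))
        cursor = batch_end
    return batches


if __name__ == "__main__":
    # Backfill ten days of history, three days per query.
    for batch_start, batch_end in backfill_batches(date(2024, 1, 1), date(2024, 1, 11), 3):
        print(batch_start, "->", batch_end)
```

Ten days with a batch size of three yields four queries: three full three-day batches and a final one-day batch.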

Nico Ritschel @nicoritschel

Simplifying our data stack, i.e., killing Airflow

Toby Mao @Captaintobs · 16 Oct

@wahmedswl thanks!

Waqas Ahmed @wahmedswl · 16 Oct

@Captaintobs Awesome read 👍

Toby Mao @Captaintobs · 15 Oct

⚠️⚠️ Read this before you start using dbt's microbatch models. There are three large gaps that could lead to serious data issues. Due to fundamental architectural design choices of dbt, the microbatch implementation is very limited. At its core, dbt is a stateless scripting tool with no concept of time, meaning it is the user's responsibility to figure out what data needs to be processed.
1. dbt's microbatch can lead to silent data gaps
2. dbt's lack of scheduling requires manual orchestration, which could lead to incomplete and incorrect data
3. Mixed time granularities in microbatch can cause incomplete data and wasted compute
tobikodata.com/dbt-incrementa…
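Gap 1 follows from the lack of state. The sketch below is a simplified model, not dbt's actual microbatch implementation: a hypothetical job that only knows the current date and reprocesses a fixed lookback window on each run. If some runs are missed, the days they would have covered are never filled, and nothing reports it. Keeping a record of processed intervals, as a stateful scheduler does, is what makes such a gap detectable.

```python
from datetime import date, timedelta


def stateless_run(today, lookback_days):
    """Days a time-unaware job would process: the `lookback_days` days before `today`, nothing else."""
    return {today - timedelta(days=i) for i in range(1, lookback_days + 1)}


def find_gaps(start, end, processed):
    """Contiguous ranges [gap_start, gap_end) of days in [start, end) that were never processed."""
    gaps = []
    gap_start = None
    day = start
    while day < end:
        if day not in processed and gap_start is None:
            gap_start = day                 # a gap opens
        elif day in processed and gap_start is not None:
            gaps.append((gap_start, day))   # the gap closes
            gap_start = None
        day += timedelta(days=1)
    if gap_start is not None:
        gaps.append((gap_start, end))       # gap runs to the end of the range
    return gaps


if __name__ == "__main__":
    # Daily runs from Jan 2 to Jan 10, but the runs on Jan 5 and Jan 6 never happened.
    run_days = [date(2024, 1, d) for d in range(2, 11) if d not in (5, 6)]
    processed = set().union(*(stateless_run(d, lookback_days=1) for d in run_days))
    print(find_gaps(date(2024, 1, 1), date(2024, 1, 10), processed))
```

With a one-day lookback, skipping the runs on January 5 and 6 leaves January 4 and 5 unprocessed; `find_gaps` reports this as the range from January 4 (inclusive) to January 6 (exclusive), while the stateless job itself never notices.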

Toby Mao @Captaintobs · 12 Oct

@nicoritschel @TobikoData @SQLMesh it's just a prototype so far :)

Nico Ritschel @nicoritschel · 12 Oct

h/t @TobikoData & @SQLMesh

Nico Ritschel @nicoritschel · 12 Oct

Metrics in SQL today 👀

sidequery @sidequerydev

Toby Mao retweeted

Tobiko Data @TobikoData · 10 Oct

Tired of messy data pipelines? 🥹🫣 Check out the @SQLMesh + @dltHub integration for seamless metadata handovers, faster scaffolding, and incremental processing. 💻 Simplify your data workflows! 🔗 tobikodata.com/integrated-dat… #DataEngineering #DataPipelines

Toby Mao @Captaintobs · 7 Oct

@SQLMesh is so good it's banned from dbt's Coalesce conference. If you're interested in learning about what makes it amazing, I'll be in Vegas for the duration of the event.
* Tired of unmaintainable Jinja?
* Want free column-level lineage?
* Can't afford expensive full refreshes?
* Incremental models becoming out of sync / needing repair?
* Sick of waiting 10 minutes for your warehouse to tell you you're missing a parenthesis?
Hit me up and we can grab a coffee to talk about how we fix some of the pain.

Toby Mao @Captaintobs · 4 Oct

If you're gonna be in Vegas next week hit me up. We can get coffee or hit the craps table... and talk data :)

Toby Mao @Captaintobs · 4 Oct

The wait is over! You can now use Athena with @SQLMesh. Both Iceberg and Hive are supported, but we strongly recommend Iceberg since it's a much better experience. And if you're still stuck on dbt, don't worry: we also support the dbt-athena adapter, so you can have a seamless migration! As always, if you run into any issues, feel free to file an issue or let us know on Slack. We're abnormally responsive :) sqlmesh.readthedocs.io/en/stable/inte…

Toby Mao retweeted

DataTalksClub @DataTalksClub · 3 Oct

Featuring SQLMesh on this week's episode of Open-Source Spotlight, our series where we're discovering open-source tools. @TobikoData's Toby Mao, @Captaintobs, joined us in demonstrating how this tool can help data team workflows. Watch the demo here: youtu.be/ASiBidAFdwM

Toby Mao retweeted

Tobiko Data @TobikoData · 2 Oct

Join @Captaintobs and @Al_Grigor from @DataTalksClub on their Open-Source Spotlight series, where they talk through the benefits of @SQLMesh like:
♦ Column-level lineage
♦ Environment Management
♦ Instant Prod deployments
and much more! youtu.be/ASiBidAFdwM

Toby Mao @Captaintobs · 22 Sep

If you’re gonna be at Coalesce in Vegas, make sure to come to @TobikoData’s happy hour October 8 at 5pm! Also if you just wanna meet up with me and grab a coffee, I’d love to chat! DM me! cube.registration.goldcast.io/events/0936487…

Toby Mao @Captaintobs · 22 Sep

@wahmedswl Not at the moment.

Waqas Ahmed @wahmedswl · 22 Sep

@Captaintobs Any plan to have that in OSS?

Toby Mao @Captaintobs · 21 Sep

As a data engineer, you should consider how changes can be done in a non-breaking way. A non-breaking change to a data model is one that won't have any downstream impact, like adding a column or reordering columns. Adding columns only impacts downstream models that use SELECT *, which is one of the reasons why it's best practice to avoid it. On the other hand, a breaking change will have a significant impact on downstream models and usually requires expensive backfills. An example of a breaking change is modifying a WHERE clause, which changes the cardinality of a table.
If you're working at any significant scale where it's expensive and time-consuming to backfill many tables, consider whether a change can be done in a backwards-compatible way and how expensive a breaking change would be. If a breaking change isn't very expensive, it can be easier to maintain, since all models are kept up to date without any legacy, so there's always a trade-off. Even if it's not too costly to backfill many tables, it can be time-consuming to communicate breaking changes to stakeholders or to validate that all data consumers are up to date. Arguably, this is even more challenging than the technical/compute costs of breaking changes.
In software engineering, it's commonplace to consider whether changing an API or a database model should be done in a breaking or non-breaking fashion. I believe this best practice should be adopted by data teams as well. That's why we designed #SQLMesh to automatically detect breaking and non-breaking changes by analyzing your SQL queries. This allows you to assess the impact of your changes at compile time and understand potential costs (both compute and organizational) before you finalize your changes. tobikodata.com/automatically-…
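To illustrate the two rules described above, here is a deliberately simplified sketch. It works on a hypothetical, already-parsed model representation (output columns plus a WHERE clause) rather than on SQL text; SQLMesh's actual detection works on parsed SQL and covers more cases, as described in the linked post. Treating a dropped or redefined column as breaking is this sketch's own assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelVersion:
    """Hypothetical, simplified view of a model: output columns and its row filter."""
    columns: Tuple[Tuple[str, str], ...]  # (name, expression) pairs in SELECT order
    where: Optional[str] = None


def classify_change(old: ModelVersion, new: ModelVersion) -> str:
    """Return 'non-breaking' if downstream models are unaffected, else 'breaking'.

    - Changing the row filter changes the table's cardinality: breaking.
    - Dropping or redefining an existing column: breaking (assumption of this sketch).
    - Adding or reordering columns: non-breaking for consumers that avoid SELECT *.
    """
    if old.where != new.where:
        return "breaking"
    new_cols = dict(new.columns)
    for name, expr in old.columns:
        if new_cols.get(name) != expr:
            return "breaking"  # column dropped or its definition changed
    return "non-breaking"


if __name__ == "__main__":
    base = ModelVersion(columns=(("id", "id"), ("amount", "amount")), where="amount > 0")
    added = ModelVersion(columns=base.columns + (("ds", "ds"),), where="amount > 0")
    refiltered = ModelVersion(columns=base.columns, where="amount > 10")
    print(classify_change(base, added))
    print(classify_change(base, refiltered))
```

Here adding a `ds` column is classified as non-breaking, while tightening the filter from `amount > 0` to `amount > 10` is breaking because it changes which rows downstream models see.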

Toby Mao @Captaintobs · 22 Sep

@wahmedswl We have that in Tobiko Cloud: linkedin.com/posts/toby-mao…

Waqas Ahmed @wahmedswl · 22 Sep

@Captaintobs SQLMesh is an awesome tool; what it's really missing is an orchestration UI. Since we can run cron jobs in SQLMesh, it should have a UI as well.
