Albert Franzi

430 posts

@FranziCros

Director of Data at @supersidehq

Barcelona · Joined May 2011
190 Following · 254 Followers
Miquel Puig i Pey
Miquel Puig i Pey@miquelpuigpey·
Does the opacity depend on the language you speak to the curtains in? 😅
Barcelona, Spain 🇪🇸 · Translated from Catalan
1
0
0
120
Tclvx
Tclvx@Tclvx1·
Gm ☀️ Another textile inspired output. 🧵 Have a good Sunday :)
English
3
5
38
615
Albert Franzi
Albert Franzi@FranziCros·
@Fondes81 It happened again today: right behind us, a girl ended up with a nasty gash on her head.
Translated from Catalan
1
0
0
67
Jaume Alsina
Jaume Alsina@Fondes81·
For those of you who travel on the Rodalies train (#rod1): be careful when passing through Sant Adrià. Someone just threw a stone at our window and broke it while the train was moving. Luckily no one was hurt, but nothing will take the scare away. #renfe #rodalies
Translated from Spanish
8
37
41
0
Albert Franzi
Albert Franzi@FranziCros·
@Jra_tech @thebmbennett @pdrmnvd Define complexity. The solution is just Airflow in K8s with CI/CD pipelines. I believe a lot of companies already use K8s as part of their infrastructure, so it's just reusing some stuff in a really simple way, since the Airflow Helm chart is quite simple to deploy and to integrate with
English
0
0
0
50
Joshua Andrews
Joshua Andrews@Jra_tech·
@thebmbennett @pdrmnvd @FranziCros dbt cloud does have its issues, especially with the development IDE. However when I read that article it keeps saying things like “just do this simple thing” but what I see is a DevOps team required to run and maintain all of the additional tooling. lots of complexity!!
English
2
0
2
561
b bennett | 500+ connections
b bennett | 500+ connections@thebmbennett·
Today I'm reviewing the PR that will migrate my org off dbt Cloud. In addition to saving money, we were inspired by this brilliant article and how-to guide by @FranziCros. They helped me realize I wasn't alone in my dbt Cloud frustration. medium.com/albert-franzi/…
English
6
5
102
16.5K
Albert Franzi
Albert Franzi@FranziCros·
@thebmbennett Happy to inspire you :) I hope the article helped your team succeed. Btw, I would be interested in hearing about your transition experience :) Did you include any other tools?
English
0
0
1
47
Albert Franzi
Albert Franzi@FranziCros·
@fred_irodrigues @code @ApacheAirflow We run all of them, but it would be possible to extract from the merge diff which models were updated. We do something similar to run SQLFluff only on updated models
English
0
0
1
73
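Editor's note: the idea of deriving the updated models from the merge diff could be sketched roughly as below. This is a minimal illustration, not the setup described in the thread: the `models` directory, the `origin/main` base ref, and the helper names `changed_models`, `git_changed_files` and `lint_changed` are all assumptions, and SQLFluff is assumed to be installed and configured (for example through a `.sqlfluff` file).

```python
import subprocess
from pathlib import PurePosixPath


def changed_models(diff_output: str, models_dir: str = "models") -> list[str]:
    """Filter `git diff --name-only` output down to .sql files under models_dir."""
    paths = []
    for line in diff_output.splitlines():
        path = PurePosixPath(line.strip())
        # Keep only SQL files whose first path component is the models directory.
        if path.suffix == ".sql" and path.parts[:1] == (models_dir,):
            paths.append(str(path))
    return paths


def git_changed_files(base: str = "origin/main") -> str:
    """List files added, copied, modified or renamed between base and HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def lint_changed(base: str = "origin/main") -> int:
    """Run `sqlfluff lint` on changed models only; return its exit code."""
    files = changed_models(git_changed_files(base))
    if not files:
        return 0  # nothing to lint
    return subprocess.run(["sqlfluff", "lint", *files]).returncode
```

In a CI job, `lint_changed()` would be called from the repository root and its return value used as the job's exit code.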
Fred Rodrigues
Fred Rodrigues@fred_irodrigues·
@FranziCros @code @ApacheAirflow Thanks for sharing, great article :D When you merge from a PR to a main branch, do you run all the models or just the ones that were modified?
English
1
1
0
90
Albert Franzi
Albert Franzi@FranziCros·
We just published an article about how we successfully moved away from DBT Cloud to DBT Core + @code + @ApacheAirflow. link.medium.com/wuOwVYtRSvb Some context on what triggered our move is in my previous tweet (twitter.com/FranziCros/sta…)
Albert Franzi@FranziCros

So, after @getdbt's announcement that they are doubling their subscription plan from $50/seat to $100/seat, we decided to move everything to DBT Core + @ApacheAirflow + VS (@code). DBT is an excellent open-source project supported by the community and the company, but this move is abusive.

English
2
4
17
2.3K
Albert Franzi
Albert Franzi@FranziCros·
@anna__geller @startdataeng It could be because of Airflow, yes. But also because I prefer to have only orchestration logic in the DAGs and all the transformation logic in their domain repos/Dockers. That way Airflow is always agnostic about which libraries or dark magic you are using in the tasks
English
1
0
1
36
Anna Geller
Anna Geller@anna__geller·
@FranziCros @startdataeng Interesting, could it be that you made that decision because of Airflow limitations? Because I agree this is the right way to use Airflow, but it's not necessarily the right way to approach orchestration in general. Kubernetes is an orchestrator itself and it executes a lot
English
1
1
0
34
Joseph Machado
Joseph Machado@startdataeng·
Thinking about using a data orchestrator, here are a few things to keep in mind 👇 1. Do not process large data in the orchestrator; process them via an external system (Spark, Snowflake, Postgres, etc.). #dataengineering #datapipeline #dataprocessing
English
4
3
30
5.8K
Albert Franzi
Albert Franzi@FranziCros·
@anna__geller @startdataeng I would lean towards a setup where the orchestration and the job to be done are completely decoupled. I try as much as possible to run K8sPodOperators instead of PythonOperators
English
1
0
0
34
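Editor's note: one way to picture the decoupling described here is a DAG that only launches containers, with all dbt logic living in the image. The sketch below is an assumption-laden illustration, not the author's DAG: the image name, namespace, DAG id and the helper `dbt_pod_kwargs` are hypothetical, the import path for `KubernetesPodOperator` depends on the installed version of the Kubernetes provider package, and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime


def dbt_pod_kwargs(
    task_id: str,
    dbt_args: list[str],
    image: str = "registry.example.com/dbt-project:latest",  # hypothetical image
    namespace: str = "data",  # hypothetical namespace
) -> dict:
    """Build KubernetesPodOperator arguments for one dbt command."""
    return {
        "task_id": task_id,
        # Pod names must be DNS-compatible, so underscores become dashes.
        "name": task_id.replace("_", "-"),
        "namespace": namespace,
        "image": image,
        "cmds": ["dbt"],
        "arguments": dbt_args,
        "get_logs": True,
    }


def build_dag():
    """Create the DAG; Airflow is imported lazily so the helper stays importable."""
    from airflow import DAG
    from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

    with DAG(
        dag_id="dbt_daily",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        run = KubernetesPodOperator(**dbt_pod_kwargs("dbt_run", ["run"]))
        test = KubernetesPodOperator(**dbt_pod_kwargs("dbt_test", ["test"]))
        run >> test  # run the models first, then the tests
    return dag
```

The DAG file holds no transformation code or dbt dependencies; changing the dbt project means publishing a new image, not touching Airflow.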
Anna Geller
Anna Geller@anna__geller·
@startdataeng Thanks for sharing. What tools did you use as a basis for this? I think a lot of that applies e.g. to Airflow but not necessarily to all orchestrators. In Prefect it's totally fine to run even heavy processes directly in your flow because scheduling is decoupled from execution
English
2
0
3
368
Albert Franzi
Albert Franzi@FranziCros·
@AndyRitting @floydophone @getdbt @ApacheAirflow @code @dagster Why only changed models? Are you defining all DBT models as views? Otherwise, you will only be updating the new models, not refreshing the existing ones with the new incoming data. You can always use DBT Core locally and then just one seat in DBT Cloud as the orchestrator.
English
1
0
1
129
Andy Ritting
Andy Ritting@AndyRitting·
@floydophone @FranziCros @getdbt @ApacheAirflow @code @dagster I run dbt Core in the published Docker container from GitHub on my local Bitbucket runner. I would assume GitLab could do something similar. The benefit is that the container gets direct access to the code on PR. Using Airflow instead seems more complicated. I wish to run only changed models
English
1
0
0
87
Albert Franzi
Albert Franzi@FranziCros·
So, after @getdbt's announcement that they are doubling their subscription plan from $50/seat to $100/seat, we decided to move everything to DBT Core + @ApacheAirflow + VS (@code). DBT is an excellent open-source project supported by the community and the company, but this move is abusive.
English
13
12
88
16.1K
Albert Franzi
Albert Franzi@FranziCros·
We will be publishing an article explaining the entire solution so others can use it too :)
English
3
0
26
2.1K
Albert Franzi
Albert Franzi@FranziCros·
@fred_irodrigues @getdbt @ApacheAirflow @code Our company is already on GitLab 😅🫠, so we use GitLab Runners in our K8s. Also, I have to admit that GitHub was considerably better at CI/CD and Slack integrations than GitLab is right now. But that could give us material for another thread 🧵🤓
English
1
0
1
410
Albert Franzi
Albert Franzi@FranziCros·
@joshnekoff We are using AWS, but we run everything in K8s, so it shouldn't be an issue.
English
1
0
1
35
giosué
giosué@joshnekoff·
@FranziCros Great! Looking forward to reading it when you publish. On what platform would that be?
English
1
0
0
35
Albert Franzi
Albert Franzi@FranziCros·
@rahulj51 @getdbt @ApacheAirflow @code The Airflow DAG is scheduled to run daily. However, when we merge code to the main branch, we want to speed up our work and execute all the new code straight away. So we trigger the DAG, making sure Prod matches main without waiting for the next scheduled run
English
1
0
2
308
Rahul Jain
Rahul Jain@rahulj51·
@FranziCros @getdbt @ApacheAirflow @code Out of curiosity, how do you trigger dbt from Airflow? 1. The dbt project is bundled on the same machine as AF 2. AF copies the project to the worker at runtime 3. With K8s? 4. Others?
English
1
0
1
347
Albert Franzi
Albert Franzi@FranziCros·
@rahulj51 @getdbt @ApacheAirflow @code Nothing. After switching to VS Code with the DBT plugins, we have everything covered. We also publish the DBT catalog to S3 so we can browse it and share it with the rest of the team.
English
1
0
7
1.2K
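Editor's note: publishing the catalog to S3 could look roughly like the sketch below. It assumes `dbt docs generate` has already been run (it writes `index.html`, `manifest.json` and `catalog.json` into the `target` directory) and that a boto3 S3 client is passed in. The bucket name, key prefix and helper names are hypothetical, and serving the site from the bucket may additionally require setting content types on the objects.

```python
from pathlib import Path

# Files produced by `dbt docs generate` that the static docs site needs.
DOCS_FILES = ("index.html", "manifest.json", "catalog.json")


def docs_uploads(target_dir: str = "target", prefix: str = "dbt-docs") -> list[tuple[Path, str]]:
    """Pair each local docs file with the S3 key it should be uploaded to."""
    target = Path(target_dir)
    clean_prefix = prefix.strip("/")
    return [(target / name, f"{clean_prefix}/{name}") for name in DOCS_FILES]


def publish_docs(s3_client, bucket: str, target_dir: str = "target", prefix: str = "dbt-docs") -> list[str]:
    """Upload the docs files with the given client; return the uploaded keys."""
    uploaded = []
    for path, key in docs_uploads(target_dir, prefix):
        # s3_client is expected to behave like boto3.client("s3").
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

Typical usage would be `publish_docs(boto3.client("s3"), "my-docs-bucket")` as the last step of the scheduled run, so the shared catalog always reflects the latest build.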