Joseph Machado

1.9K posts

@startdataeng

I write about data engineering | SQL | Python | Distributed systems. Get my free data engineering course at https://t.co/sZTEcV0Q9W

New York · Joined April 2020
44 Following · 9.3K Followers
Joseph Machado @startdataeng
Left anti-join is cool! Get all the rows from the left table that have no matching row in the right table: select t1.* from t1 left join t2 on t1.id = t2.id where t2.id is null; #data #dataengineering #SQL #Database
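A minimal sketch of the left anti-join, run against SQLite through Python's built-in sqlite3 module. SQLite stands in for whatever database you use, and the tables t1 and t2 and their rows are made up for illustration:

```python
import sqlite3


def left_anti_join(conn: sqlite3.Connection) -> list:
    """Return rows of t1 that have no matching id in t2 (a left anti-join)."""
    query = """
        SELECT t1.*
        FROM t1
        LEFT JOIN t2 ON t1.id = t2.id
        WHERE t2.id IS NULL
        ORDER BY t1.id
    """
    # Unmatched t1 rows come back with NULL in every t2 column,
    # so filtering on the join key keeps exactly those rows.
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (2);
""")
print(left_anti_join(conn))  # [(1, 'a'), (3, 'c')]
```

Filtering on t2.id works because the join key is NULL only when no t2 row matched; filtering on a nullable non-key column of t2 could wrongly keep matched rows. `WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)` is an equivalent way to write the same anti-join.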
Joseph Machado @startdataeng
Backfilling is an inevitable part of data projects. When designing your data pipelines, take some time to answer the following questions: 1. Do multiple backfill runs cause duplicate data? 2. Can multiple backfills run in parallel? #data #DataEngineering #datapipeline #datasets
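One common way to get a "no" to question 1 and a "yes" to question 2 is to make each backfill run overwrite a single date partition. The sketch below assumes a hypothetical sales table keyed by sale_date and uses SQLite via Python's sqlite3 module; delete-then-insert inside one transaction is just one technique (a MERGE or a warehouse partition overwrite are others):

```python
import sqlite3


def backfill_partition(conn, run_date, rows):
    """Overwrite one date partition so re-runs do not duplicate data.

    The DELETE and INSERT run in a single transaction, so a retry or a
    second backfill for the same run_date replaces the partition
    instead of appending to it.
    """
    with conn:  # commits on success, rolls back on exception
        conn.execute("DELETE FROM sales WHERE sale_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (sale_date, item, amount) VALUES (?, ?, ?)",
            [(run_date, item, amount) for item, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, item TEXT, amount REAL)")

# Hypothetical source data, grouped by the partition it belongs to.
source = {
    "2024-01-01": [("apple", 3.0), ("pear", 2.5)],
    "2024-01-02": [("apple", 1.0)],
}

# Run the whole backfill twice: the second pass leaves the table unchanged.
for _ in range(2):
    for day, rows in source.items():
        backfill_partition(conn, day, rows)

print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 3
```

Since each run touches only its own sale_date, runs for different dates write disjoint data, which is what makes them candidates for parallel execution, assuming your database handles concurrent writers (SQLite itself serializes writes).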
Joseph Machado @startdataeng
Learning data engineering? Build a pipeline locally: 1. Use Python to pull data from an API (e.g., Coincap). 2. Load the data into a local Postgres container. 3. Automate it with cron or Task Scheduler. Start small, build, improve, and repeat. #data #dataengineering #pythonlearning #Python
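A rough sketch of steps 1 and 2, under loudly stated assumptions: the API call is replaced by a hard-coded JSON string with hypothetical field names, and SQLite (Python's sqlite3) stands in for the Postgres container so the example runs anywhere. With Postgres you would swap in a driver such as psycopg; the INSERT ... ON CONFLICT syntax used in the load step is also supported by Postgres.

```python
import json
import sqlite3

# Hypothetical payload standing in for the API response; a real pipeline
# would fetch this over HTTP (for example with urllib.request.urlopen).
SAMPLE_RESPONSE = (
    '{"data": [{"id": "bitcoin", "priceUsd": "42000.5"},'
    ' {"id": "ethereum", "priceUsd": "2500.1"}]}'
)


def extract(raw_json: str) -> list:
    """Parse the API payload into a list of record dicts."""
    return json.loads(raw_json)["data"]


def transform(records: list) -> list:
    """Keep only the fields we need and cast price strings to floats."""
    return [(r["id"], float(r["priceUsd"])) for r in records]


def load(conn: sqlite3.Connection, rows: list) -> None:
    """Upsert rows so that re-running the job does not create duplicates."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS asset_prices "
            "(asset_id TEXT PRIMARY KEY, price_usd REAL)"
        )
        conn.executemany(
            "INSERT INTO asset_prices (asset_id, price_usd) VALUES (?, ?) "
            "ON CONFLICT(asset_id) DO UPDATE SET price_usd = excluded.price_usd",
            rows,
        )


def run(conn: sqlite3.Connection, raw_json: str) -> None:
    load(conn, transform(extract(raw_json)))


conn = sqlite3.connect(":memory:")
run(conn, SAMPLE_RESPONSE)
run(conn, SAMPLE_RESPONSE)  # a second scheduled run: still one row per asset
print(conn.execute("SELECT * FROM asset_prices ORDER BY asset_id").fetchall())
# [('bitcoin', 42000.5), ('ethereum', 2500.1)]
```

Because the load step upserts on asset_id, a cron job (step 3) can rerun the script on a schedule without piling up duplicate rows.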
Joseph Machado @startdataeng
It can be overwhelming to start learning data engineering. I'd recommend starting with the basics of Python, SQL, and UNIX commands, then building a simple data project and updating your GitHub and LinkedIn. Landing a DE job is 60% learning and 40% marketing. See the reply 👇🏽 for helpful links.
Joseph Machado @startdataeng
@matsonj RG35xx? I've got the purple one :) From AliExpress
Jacob Matson @matsonj
in some ways you could say the future is already here
Joseph Machado @startdataeng
What is the biggest issue you face when writing tests? Let me know in the comments below. --- Follow me for more actionable data engineering content.
Joseph Machado @startdataeng
3. Gives engineers new to the codebase the confidence to make changes and move fast 4. Documents how edge cases are handled 5. Enables automated checks as part of CI. Do you write tests for your data pipelines?
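To make points 4 and 5 concrete, here is a sketch of pytest-style tests for a hypothetical pipeline step, clean_amount, which is not from the original thread. The tests are plain functions with bare asserts, so pytest can discover and run them as an automated CI check:

```python
from typing import Optional


def clean_amount(raw: Optional[str]) -> Optional[float]:
    """Hypothetical pipeline step: parse a currency string into a float.

    Edge cases handled: surrounding whitespace, thousands separators,
    and blank or missing values (returned as None instead of raising).
    """
    if raw is None or not raw.strip():
        return None
    return float(raw.strip().replace(",", ""))


# Each test name states the behavior it pins down, so the suite doubles
# as documentation of how edge cases are handled.
def test_parses_thousands_separator():
    assert clean_amount("1,234.50") == 1234.5


def test_strips_surrounding_whitespace():
    assert clean_amount("  42 ") == 42.0


def test_blank_and_missing_values_become_none():
    # Records the decision to treat blanks as missing rather than as 0.
    assert clean_amount("") is None
    assert clean_amount(None) is None
```

A new engineer who changes clean_amount gets an immediate signal if one of these documented behaviors breaks.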
Joseph Machado @startdataeng
Do pipeline code changes feel like playing Jenga? Write tests now! Save your sanity and get the highest return on your time invested. Here's why (well-written) tests are crucial: 👇 #data #datapipeline #datatest #unittest
Joseph Machado @startdataeng
Data engineers often write overly complex code to upsert into tables. Here's THE command you need to know: MERGE INTO / INSERT ... ON CONFLICT #data #dataengineering #SQL
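A minimal upsert sketch using SQLite's INSERT ... ON CONFLICT syntax through Python's sqlite3 module (this needs SQLite 3.24 or newer). The customers table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'old@example.com', '2024-01-01')")

# Incoming batch: id 1 already exists (update it), id 2 is new (insert it).
batch = [
    (1, "new@example.com", "2024-02-01"),
    (2, "b@example.com", "2024-02-01"),
]

# excluded.<column> refers to the row we tried to insert; an unqualified
# column name in the SET clause refers to the row already in the table.
UPSERT = """
    INSERT INTO customers (id, email, updated_at)
    VALUES (?, ?, ?)
    ON CONFLICT(id) DO UPDATE SET
        email = excluded.email,
        updated_at = excluded.updated_at
"""
with conn:
    conn.executemany(UPSERT, batch)

print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'new@example.com', '2024-02-01'), (2, 'b@example.com', '2024-02-01')]
```

In engines that support the SQL-standard MERGE statement (for example PostgreSQL 15+ or Snowflake), the same logic reads `MERGE INTO customers t USING staged_customers s ON t.id = s.id WHEN MATCHED THEN UPDATE SET email = s.email, updated_at = s.updated_at WHEN NOT MATCHED THEN INSERT (id, email, updated_at) VALUES (s.id, s.email, s.updated_at)`, where staged_customers is a hypothetical staging table holding the incoming batch.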
Joseph Machado @startdataeng
Sign up for our free "Data Engineering 101" course to learn core data engineering concepts, building scalable and resilient systems, data best practices, data modeling, and building projects that reflect real-world work. startdataengineering.com/email-course/
Joseph Machado @startdataeng
Let me know in the comments how you use MERGE INTO! Interested in more actionable data engineering content? Check out the website link on my profile.
Joseph Machado @startdataeng
2. Re-running the pipeline with the same input may not give you the same output (i.e., it is not idempotent), which makes debugging challenging. However, with thoughtful design, MERGE INTO / INSERT ... ON CONFLICT are powerful tools every data engineer must understand. ---
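One way point 2 can happen, shown as an illustrative assumption rather than the thread's own example: an upsert whose update clause accumulates values is not idempotent, while one that overwrites with the incoming values is. A sketch using SQLite via Python's sqlite3 and a hypothetical daily_clicks table:

```python
import sqlite3

batch = [("2024-03-01", 10)]

# Non-idempotent: the update adds to the existing value, so every
# re-run of the same batch changes the result.
ACCUMULATE = """
    INSERT INTO daily_clicks (day, clicks) VALUES (?, ?)
    ON CONFLICT(day) DO UPDATE SET clicks = clicks + excluded.clicks
"""

# Idempotent: the update overwrites with the incoming value, so
# re-running the same batch leaves the table unchanged.
OVERWRITE = """
    INSERT INTO daily_clicks (day, clicks) VALUES (?, ?)
    ON CONFLICT(day) DO UPDATE SET clicks = excluded.clicks
"""


def run_twice(sql: str) -> int:
    """Apply the same batch twice (simulating a retry) and return the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_clicks (day TEXT PRIMARY KEY, clicks INTEGER)")
    for _ in range(2):
        with conn:
            conn.executemany(sql, batch)
    return conn.execute("SELECT clicks FROM daily_clicks").fetchone()[0]


print(run_twice(ACCUMULATE))  # 20
print(run_twice(OVERWRITE))   # 10
```

If an accumulating update is genuinely needed, pairing it with a record of which batches have already been applied is one way to keep retries from double counting.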