Joseph Machado

1.9K posts

@startdataeng

I write about data engineering | SQL | Python | Distributed systems. Get my free data engineering course at https://t.co/sZTEcV0Q9W

New York · Joined April 2020
48 Following · 9.4K Followers
Joseph Machado reposted
Mitchell Hashimoto @mitchellh
I strongly believe there are entire companies right now under heavy AI psychosis, and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.

I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again, but now it's... the whole software development industry (maybe the whole world, really).

It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.

The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.

We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying.

I worry.
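The MTBF vs. MTTR trade-off in the post above can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function names and the example numbers are assumptions, not figures from the post. It compares two hypothetical systems that report the same availability even though one fails ten times as often.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected number of failure/recovery cycles in one year."""
    return HOURS_PER_YEAR / (mtbf_hours + mttr_hours)


# Hypothetical system A: fails rarely, recovers slowly.
a = (1000.0, 1.0)
# Hypothetical system B: fails 10x as often, recovers 10x as fast.
b = (100.0, 0.1)

for name, (mtbf, mttr) in {"A": a, "B": b}.items():
    print(
        f"system {name}: availability={availability(mtbf, mttr):.6f}, "
        f"failures/year={failures_per_year(mtbf, mttr):.1f}"
    )
```

Both systems look identical on the availability metric, which is one way a fast-recovery-only mindset can hide how often things actually break.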
Joseph Machado @startdataeng
@sspaeti Nice, how'd you get image view inside vim? I use render markdown but no image protocols. Any recommendations?
Simon Späti 🏔️ @sspaeti
Are you jumping from IDE to IDE? Learn vim motions. An investment that pays off for your career. It'll be here in 50 years - unlike most editors. ssp.sh/brain/vim-moti…
Joseph Machado @startdataeng
The API is the easy part. Production is where the real learning happens.
Joseph Machado @startdataeng
6. Read the Spark UI: slow stages, skewed tasks, and spill to disk are all there. If you can't diagnose a hanging job, you can't own one in production.
7. Observability, audit, and lineage: you need to know what ran, when, on what data, and whether it succeeded.
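Task skew, one of the symptoms listed in point 6 above, shows up on a stage's page in the Spark UI as a maximum task duration far above the median. A minimal sketch of that comparison in plain Python, assuming you have copied the task durations out of the UI yourself; `skew_ratio`, `is_skewed`, and the 5x threshold are hypothetical helpers and values chosen for illustration, not Spark APIs or Spark defaults.

```python
from statistics import median


def skew_ratio(task_durations_s: list[float]) -> float:
    """Ratio of the slowest task to the median task in one stage."""
    if not task_durations_s:
        raise ValueError("need at least one task duration")
    med = median(task_durations_s)
    slowest = max(task_durations_s)
    if med == 0:
        # Avoid dividing by zero when most tasks finish instantly.
        return float("inf") if slowest > 0 else 1.0
    return slowest / med


def is_skewed(task_durations_s: list[float], threshold: float = 5.0) -> bool:
    """Flag a stage whose slowest task is `threshold` times the median."""
    return skew_ratio(task_durations_s) >= threshold


balanced = [10, 11, 9, 10, 12]   # made-up durations in seconds
skewed = [10, 11, 9, 10, 120]    # one straggler task

print(skew_ratio(balanced), is_skewed(balanced))
print(skew_ratio(skewed), is_skewed(skewed))
```

A high ratio only says that one task did far more work than its peers; the usual next step is to check whether a single key or partition holds most of the data.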
Joseph Machado @startdataeng
Spark API is easy to learn. But to debug a hanging job, you need to know Spark internals. Here are 7 topics to know for production Spark 👇
Vic 🌮 @VicVijayakumar
40% of people earning $500,000+ per year say they’re living paycheck to paycheck.
Joseph Machado @startdataeng
PSA: Understand the concepts and read the docs before using LLMs.

Claude sent me on a wild goose chase: hallucinations, complex setup that breaks stuff, etc.

Wasted a lot of time, only to realize the tool (Quarto) I work with already does what I needed.
Joseph Machado @startdataeng
Too many small files in your data lake hurt performance. Detect them with the Spark UI:
1. Go to the Stages tab and look at the event timeline.
2. Many small tasks (1 task = 1 green chunk) indicate a many-small-files (or many-small-partitions) problem.
Fix coming tomorrow. #dataengineering
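The Spark UI shows the symptom; one way to confirm the cause is to look at the file sizes directly. A minimal sketch, assuming the data sits on a local or mounted filesystem (an object store would need its own listing client). `small_file_report` and the 32 MiB default threshold are hypothetical, illustrative choices, not a Spark setting.

```python
import os


def small_file_report(root: str, threshold_bytes: int = 32 * 1024 * 1024) -> dict:
    """Count files under `root` and how many fall below `threshold_bytes`."""
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            total += 1
            if size < threshold_bytes:
                small += 1
    return {
        "total_files": total,
        "small_files": small,
        # Share of files below the threshold; 0.0 for an empty directory.
        "small_ratio": small / total if total else 0.0,
    }
```

Calling `small_file_report("/path/to/table")` on a table directory (the path here is a placeholder) returns the counts and the share of files below the threshold. A share close to 1.0 on a large table is consistent with the many-small-tasks pattern described above.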
Flavio Amiel @fba
Most SEO agencies will never tell you this: you can take a website to 100K visits/month in under a year, without backlinks, a massive team, or ads. Just some YOLO SEO.

Do these 7 moves and you'll be playing in easy mode, I swear.
1. Topical map first: know what you're allowed to rank for before even getting started.
2. Rewrite the existing pages before writing new ones, and make sure you comply with your topical map.
3. 90-day editorial calendar, low-hanging fruit only in the beginning. We'll scale that later.
4. 100-200 pSEO pages on a real strategy (not spam). Preferably, you have proprietary data.
5. 3-5 free tools where the market has gaps. You know the ones: "free AI generators" of all types.
6. 20-50 commercial pages that match buyer intent: landing pages, commercial pages, etc.
7. Weekly optimization on content + internal links. Don't stop optimizing until you see that needle moving. Ok?

Want the full play? Comment "YOLO SEO" + like this post, and I'll DM the link to the full playbook.
Joseph Machado @startdataeng
Starting to learn data engineering and not sure where to begin? Start with dbt. Here’s why:
1. Multi-hop data flow: going from staging → intermediate → data marts, each layer represents progressively transformed data.
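dbt models are written in SQL, but the layering idea in the post above is language-independent. The sketch below mimics the three hops in plain Python. Every function name, column name, and sample row is made up for illustration; none of it is a dbt API, and a real project would express each layer as a SQL model.

```python
from collections import defaultdict

# Hypothetical raw source rows: untyped strings with messy values.
RAW_ORDERS = [
    {"ORDER_ID": "1", "CUSTOMER": " alice ", "AMOUNT": "20.50"},
    {"ORDER_ID": "2", "CUSTOMER": "bob", "AMOUNT": "5.00"},
    {"ORDER_ID": "3", "CUSTOMER": "Alice", "AMOUNT": "4.50"},
]


def staging(rows):
    """Hop 1: rename columns, cast types, and normalize values."""
    return [
        {
            "order_id": int(r["ORDER_ID"]),
            "customer": r["CUSTOMER"].strip().lower(),
            "amount": float(r["AMOUNT"]),
        }
        for r in rows
    ]


def intermediate(rows):
    """Hop 2: apply reusable logic, here total spend per customer."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


def mart(totals):
    """Hop 3: shape the consumer-facing table, highest spend first."""
    table = [
        {"customer": c, "total_spent": round(t, 2)} for c, t in totals.items()
    ]
    return sorted(table, key=lambda r: r["total_spent"], reverse=True)


customer_spend = mart(intermediate(staging(RAW_ORDERS)))
print(customer_spend)
```

Each hop reads only the output of the previous one, which is what makes the data progressively more refined as it moves toward the mart.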
Joseph Machado @startdataeng
@fba My website isn't big enough to need your services. Do you have any book/course recommendations for learning SEO for a tech blog?