AlexP | Data Engineer

111 posts

@cymlai

Clickhouse | dbt | Kafka | TiDB | MySQL 🔨 Real-time Data Engineer 📚 DWH, ML, streaming 🇨🇾 Build real-time, governed data platforms for FinTech

Cyprus · Joined October 2025
32 Following · 28 Followers
AlexP | Data Engineer retweeted
Alexey Grigorev @Al_Grigor
A new cohort of LLM Zoomcamp starts on June 8, 2026. It’s a free 10-week course where you go from LLM basics to building a production-ready AI assistant. For this cohort, I'll update the course content during a series of live workshops.

In the course, you'll learn:
- Retrieval-Augmented Generation
- Vector search and embeddings
- AI agents
- Function calling and tool use
- Evaluation of RAG and agentic systems
- Monitoring LLM applications

Join the new cohort and build your LLM application step by step: github.com/DataTalksClub/…
AlexP | Data Engineer @cymlai
What I like most about this setup:
reproducible analytics
versioned transformations
tested data models
static public delivery
low-friction deployment
It’s a good pattern for shipping public data products from an engineering-first stack.
AlexP | Data Engineer @cymlai
Tech stack:
• Prefect orchestrates the pipeline
• ClickHouse stores and serves analytics fast
• dbt handles transformations and testing
• Evidence turns curated data into a static analytics site
• GitHub Actions / Pages automate delivery
AlexP | Data Engineer @cymlai
Built an end-to-end analytics project for #DEZoomCamp
It turns raw GitHub repository metadata into a dashboard using:
Prefect for orchestration
ClickHouse as the warehouse
dbt for tested transformations
Evidence for the dashboard
Pages for publication
github.com/aipavlo/github…
AlexP | Data Engineer retweeted
Alexey Grigorev @Al_Grigor
Got this message from a Data Engineering Zoomcamp participant. John-Luke landed their first junior DE role while still working on the final project. Always great to see participants finding real value in the material!
AlexP | Data Engineer @cymlai
Module 7 is a solid step into real-time data engineering: not just moving events, but actually extracting insight from live streams. Definitely one of the most engaging modules in the Zoomcamp so far.
AlexP | Data Engineer @cymlai
The most interesting part was working with tumbling and session windows. That’s where streaming starts to feel different from classic batch analytics.
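The difference between the two window types can be sketched in plain Python. This is not PyFlink code, and the event timestamps, the 10-second window size, and the 5-second session gap are made-up values chosen only for illustration.

```python
from collections import defaultdict


def tumbling_windows(timestamps, size):
    """Count events per fixed, non-overlapping window of `size` seconds."""
    counts = defaultdict(int)
    for ts in timestamps:
        window_start = (ts // size) * size  # align to the window boundary
        counts[window_start] += 1
    return dict(sorted(counts.items()))


def session_windows(timestamps, gap):
    """Group events into sessions; a pause of `gap` seconds or more starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)  # still inside the current session
        else:
            sessions.append([ts])  # inactivity gap reached: open a new session
    return sessions


# Hypothetical event times in seconds.
events = [1, 3, 7, 11, 13, 30, 31, 33, 50]

tumbling = tumbling_windows(events, size=10)
sessions = session_windows(events, gap=5)

print(tumbling)
print(sessions)
```

Tumbling windows cut the stream at fixed boundaries, so the burst of events between seconds 1 and 13 is split across two windows. Session windows follow activity instead, so the same burst stays together as one session.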
AlexP | Data Engineer @cymlai
Just wrapped up Module 7 of DE Zoomcamp: a hands-on introduction to streaming data pipelines with Kafka-compatible tools, PyFlink, windowed aggregations, and real event-driven processing. #flink #kafka #dataengineering
AlexP | Data Engineer @cymlai
Spark feels heavy until you start thinking in partitions + transformations + shuffle + actions.
After this homework, I’m more confident in reading/writing Parquet and debugging jobs via Spark UI.
AlexP | Data Engineer @cymlai
Zone lookup join: Load taxi_zone_lookup.csv into Spark, join on location ID, then find the least frequent pickup zone. This is a clean join + groupBy/count exercise that shows how to enrich raw facts with dimensional data.
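The logic of that exercise can be sketched without Spark, using only the Python standard library. The column names (`LocationID`, `Zone`, `PULocationID`) and all sample rows are assumptions for illustration, not data from the homework.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for taxi_zone_lookup.csv (the dimension table).
ZONES_CSV = """LocationID,Zone
1,Newark Airport
2,Jamaica Bay
3,Allerton
"""

# Hypothetical trip facts; only the pickup location ID matters here.
trips = [
    {"PULocationID": 1},
    {"PULocationID": 3},
    {"PULocationID": 1},
    {"PULocationID": 2},
    {"PULocationID": 3},
    {"PULocationID": 1},
]

# Build the lookup: location ID -> zone name.
zone_by_id = {
    int(row["LocationID"]): row["Zone"]
    for row in csv.DictReader(io.StringIO(ZONES_CSV))
}

# "Join" each trip to its zone, then count trips per zone (groupBy/count).
pickups_per_zone = Counter(zone_by_id[t["PULocationID"]] for t in trips)

# Pick the zone with the smallest pickup count.
least_zone, least_count = min(pickups_per_zone.items(), key=lambda kv: kv[1])

print(least_zone, least_count)
```

As with an inner join, zones that never appear in the trips are absent from the counts, so the result is the least frequent zone among those with at least one pickup.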
AlexP | Data Engineer @cymlai
#DEZoomcamp (2026) Module 6 - Batch processing with Apache Spark. This homework was a great “Spark fundamentals in practice” run: Parquet, partitions, DataFrames, Spark UI, and some real NYC taxi analytics.
AlexP | Data Engineer @cymlai
@DataTalksClub @dltHub Another useful detail: some columns received no data during a load, so dlt couldn’t infer their types. Solution: add type hints through the columns argument of dlt.resource. After that, everything materializes cleanly and is easy to query in DuckDB. #dataengineering
AlexP | Data Engineer @cymlai
Just completed the @DataTalksClub Data Engineering Zoomcamp dlt workshop with @dltHub. I built a small ingestion pipeline from a paginated REST API to DuckDB using dlt. Thread with the key steps and some insights.