ByteHouse

337 posts

@bytehousecloud

ByteHouse, a part of @BytedanceTalk, is a lightning-fast, fully managed #CloudDataWarehouse that enables efficient, cost-effective big data analytics.

Singapore · Joined October 2021
58 Following · 24 Followers
ByteHouse @bytehousecloud
This is such an elegant representation of data pipelines from @alexxubyte of @bytebytego #dataengineering #datapipelines
Bytebytego @bytebytego

Data Pipelines Overview. The method to download the GIF is available at the end.

Data pipelines are a fundamental component of managing and processing data efficiently within modern systems. These pipelines typically encompass 5 predominant phases: Collect, Ingest, Store, Compute, and Consume.

1. Collect: Data is acquired from data stores, data streams, and applications, sourced remotely from devices, applications, or business systems.
2. Ingest: During the ingestion process, data is loaded into systems and organized within event queues.
3. Store: Post ingestion, organized data is stored in data warehouses, data lakes, and data lakehouses, along with various systems like databases, ensuring post-ingestion storage.
4. Compute: Data undergoes aggregation, cleansing, and manipulation to conform to company standards, including tasks such as format conversion, data compression, and partitioning. This phase employs both batch and stream processing techniques.
5. Consume: Processed data is made available for consumption through analytics and visualization tools, operational data stores, decision engines, user-facing applications, dashboards, data science, machine learning services, business intelligence, and self-service analytics.

The efficiency and effectiveness of each phase contribute to the overall success of data-driven operations within an organization.

Over to you: What's your story with data-driven pipelines? How have they influenced your data management game?

– Subscribe to our newsletter to download the GIF. After signing up, find the download link on the success page: lnkd.in/eawsYGiA
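The five phases described in the post can be sketched as a chain of small functions. This is a minimal illustration only, not code from the post: the sensor events, the function names, and the in-memory queue and table (standing in for an event queue and a warehouse) are all assumptions made for the example.

```python
from collections import deque
from statistics import mean


def collect():
    # Collect: hypothetical raw events from applications or devices.
    return [
        {"device": "a", "temp_c": "21.5"},
        {"device": "b", "temp_c": "19.0"},
        {"device": "a", "temp_c": "22.5"},
    ]


def ingest(events):
    # Ingest: load events into an in-memory queue (stand-in for an event queue).
    return deque(events)


def store(queue):
    # Store: drain the queue into a list (stand-in for a warehouse table).
    table = []
    while queue:
        table.append(queue.popleft())
    return table


def compute(table):
    # Compute: convert formats (str -> float) and aggregate per device.
    by_device = {}
    for row in table:
        by_device.setdefault(row["device"], []).append(float(row["temp_c"]))
    return {device: mean(values) for device, values in by_device.items()}


def consume(aggregates):
    # Consume: render the aggregates as lines for a dashboard or report.
    return [f"{device}: {avg:.1f} C" for device, avg in sorted(aggregates.items())]


def run_pipeline():
    # Chain the phases in the order the post lists them.
    return consume(compute(store(ingest(collect()))))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

In a real system each phase would likely be a separate service or job, and the compute step could run in batch or streaming mode; the single-process chain here only shows the order of the hand-offs.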

ByteHouse @bytehousecloud
As this year comes to a close, ByteHouse wishes you and your loved ones more peace, love, good health, and success in 2024. Wishing you a Happy and Prosperous New Year! #happynewyear #HappyNewYear2024
ByteHouse @bytehousecloud
ClickHouse for large-scale data ingestion and application
This article explores large-scale data ingestion and its use in applications with Go and ClickHouse. Read it here: medium.com/@kn2414e/utilizing-go-and-clickhouse-for-large-scale-data-ingestion-and-application-146822f7020c #dataengineering #clickhouse #golang
ByteHouse @bytehousecloud
Data ingestion deep dive - Part 1
Sharing part 1 of the two-part series by Jan Meskens: medium.com/@meskensjan/the-art-of-data-ingestion-powering-analytics-from-operational-sources-467552d6c9a2 #dataengineering #dataingestion
ByteHouse retweeted
Bytebytego @bytebytego
What is GraphQL? Is it a replacement for the REST API?

The diagram below shows a quick comparison between REST and GraphQL.

🔹GraphQL is a query language for APIs developed by Meta. It provides a complete description of the data in the API and gives clients the power to ask for exactly what they need.
🔹GraphQL servers sit in between the client and the backend services.
🔹GraphQL can aggregate multiple REST requests into one query. GraphQL server organizes the resources in a graph.
🔹GraphQL supports queries, mutations (applying data modifications to resources), and subscriptions (receiving notifications on schema modifications).

We talked about the REST API in last week’s video and will compare REST vs. GraphQL vs. gRPC in a separate post/video.

Over to you:
1). Is GraphQL a database technology?
2). Do you recommend GraphQL? Why/why not?

– Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): bit.ly/3KCnWXq
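The idea of one client request fanning out to several backends and returning only the requested fields can be sketched in plain Python. This is not a GraphQL implementation and uses no GraphQL library: the two stubbed REST functions, their payloads, and the dictionary used as a stand-in for a selection set are all hypothetical.

```python
# Hypothetical REST backends, stubbed as functions that return full payloads.
def rest_get_user(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "plan": "pro"}


def rest_get_orders(user_id):
    return [
        {"id": 1, "user_id": user_id, "total": 30, "status": "shipped"},
        {"id": 2, "user_id": user_id, "total": 12, "status": "pending"},
    ]


def pick(record, fields):
    # Keep only the fields the client asked for.
    return {name: record[name] for name in fields}


def resolve_user(user_id, selection):
    """Answer one client request by combining two backend calls.

    `selection` is a simplified stand-in for a GraphQL selection set:
    {"fields": [...user fields...], "orders": [...order fields...]}.
    """
    result = pick(rest_get_user(user_id), selection["fields"])
    if "orders" in selection:
        # Only call the orders backend when the client selected orders.
        result["orders"] = [
            pick(order, selection["orders"]) for order in rest_get_orders(user_id)
        ]
    return result


response = resolve_user(7, {"fields": ["name"], "orders": ["id", "total"]})
print(response)
```

The client makes a single call and receives the user's name plus order ids and totals, while unrequested fields such as the email or order status never leave the server. A real GraphQL server would add a typed schema, query parsing, and validation on top of this resolver pattern.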
ByteHouse @bytehousecloud
10 Advanced Data Pipeline Strategies
In the real world, it is common for data engineers to inherit a data pipeline mess or to work under tremendous constraints. In this guide, @Databand_ai shares advanced strategies for managing data pipelines. #dataengineering #datapipeline
ByteHouse retweeted
Zach Wilson @EcZachly
The best tech for each task:
- batch pipeline: Apache Spark
- data visualization: Apache Superset
- web api: NextJS (spring boot close second)
- SQL database: Postgres
- NoSQL database: DynamoDB
- Graph database: Neo4j
- front end web: React
- front end mobile: React Native (Flutter close second)
- CI/CD system: GitHub Actions
- data quality checks: Great Expectations (Deequ close second)
- data lake file management: Apache Iceberg (Delta Lake a close second)
- job orchestration: Apache Airflow (Mage and/or Prefect close second)
- machine learning model: XGBoost (linear regression close second)
- LLM: GPT-4.5 Turbo
- programming language: Python (Rust close second)
- message queue: Kafka (RabbitMQ close second)
- cache: Redis (Memcached close second)

#softwareengineering