ByteHouse

337 posts

@bytehousecloud

ByteHouse, a part of @BytedanceTalk, is a lightning-fast, fully managed #CloudDataWarehouse that enables efficient, cost-effective big data analytics.

Singapore · Joined October 2021

58 Following · 24 Followers

Pinned Tweet

ByteHouse @bytehousecloud · Dec 12

The Modern Data Stack - An essential guide
This blog post provides you with an essential guide to the modern data stack and how you can build one using ByteHouse - bytehouse.cloud/blog/modern-da… #DataEngineering #DataWarehousing

ByteHouse @bytehousecloud · Jan 4

The challenges of semi-structured data
In this insightful post, Daniel Beach talks about the challenges of semi-structured data. open.substack.com/pub/dataengine… #dataengineering #xml #json

ByteHouse @bytehousecloud · Jan 2

This is such an elegant representation of data pipelines from @alexxubyte of @bytebytego #dataengineering #datapipelines

Bytebytego @bytebytego

Data Pipelines Overview. The method to download the GIF is available at the end.

Data pipelines are a fundamental component of managing and processing data efficiently within modern systems. These pipelines typically encompass 5 predominant phases: Collect, Ingest, Store, Compute, and Consume.

1. Collect: Data is acquired from data stores, data streams, and applications, sourced remotely from devices, applications, or business systems.
2. Ingest: During the ingestion process, data is loaded into systems and organized within event queues.
3. Store: Post ingestion, organized data is stored in data warehouses, data lakes, and data lakehouses, along with various systems like databases, ensuring post-ingestion storage.
4. Compute: Data undergoes aggregation, cleansing, and manipulation to conform to company standards, including tasks such as format conversion, data compression, and partitioning. This phase employs both batch and stream processing techniques.
5. Consume: Processed data is made available for consumption through analytics and visualization tools, operational data stores, decision engines, user-facing applications, dashboards, data science, machine learning services, business intelligence, and self-service analytics.

The efficiency and effectiveness of each phase contribute to the overall success of data-driven operations within an organization.

Over to you: What's your story with data-driven pipelines? How have they influenced your data management game?

– Subscribe to our newsletter to 𝐝𝐨𝐰𝐧𝐥𝐨𝐚𝐝 𝐭𝐡𝐞 𝐆𝐈𝐅. After signing up, find the download link on the success page: lnkd.in/eawsYGiA
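The five phases in the post above can be sketched as a chain of small functions. This is a minimal illustration only, assuming in-memory stand-ins for the sources, event queue, and warehouse; every function name and sample record here is hypothetical and not taken from the original post.

```python
from collections import deque
from statistics import mean


def collect():
    """Collect: acquire raw records from (hypothetical) devices."""
    return [
        {"device": "a", "temp_c": "21.5"},
        {"device": "b", "temp_c": "19.0"},
        {"device": "a", "temp_c": "22.5"},
    ]


def ingest(records):
    """Ingest: load records into an event queue (a deque stands in for one)."""
    return deque(records)


def store(queue):
    """Store: drain the queue into a list standing in for a warehouse table."""
    table = []
    while queue:
        table.append(queue.popleft())
    return table


def compute(table):
    """Compute: convert string readings to floats and average them per device."""
    by_device = {}
    for row in table:
        by_device.setdefault(row["device"], []).append(float(row["temp_c"]))
    return {device: mean(values) for device, values in by_device.items()}


def consume(aggregates):
    """Consume: format the aggregates as lines a dashboard could display."""
    return [f"{device}: {avg:.1f} C" for device, avg in sorted(aggregates.items())]


report = consume(compute(store(ingest(collect()))))
print(report)
```

In a real system each phase would typically be a separate service or job (for example, a message broker for ingestion and a warehouse for storage); the function chain only shows how the output of one phase feeds the next.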

ByteHouse @bytehousecloud · Dec 29

As this year comes to a close, ByteHouse wishes you and your loved ones more peace, love, good health, and success in 2024. Wish you a Happy and Prosperous New Year! #happynewyear #HappyNewYear2024

ByteHouse @bytehousecloud · Dec 28

How is data warehousing adapting to the needs of Web3?
In this blog post, we discuss the evolution of data warehousing to accommodate the Web3 and Blockchain space, its use cases, challenges, and solutions. bytehouse.cloud/blog/data-ware… #dataengineering #datawarehousing #web3

ByteHouse @bytehousecloud · Dec 27

Measuring data quality at Airbnb through Data Quality Score
In this insightful article, Clark Wright shares how they developed the DQ Score, and how it will power the future of data quality at @AirbnbEng. medium.com/airbnb-enginee… @Airbnb #dataengineering #dataquality #airbnb

ByteHouse @bytehousecloud · Dec 26

A refresher on Linear Regression is always useful

Josep Ferrer @iamjosepferrer

Struggling with Machine Learning algorithms? 🤖 Then you better stay with me! 🤓 Today I am starting a new series of threads to simplify ML algorithms. ...and Linear Regression is the first one! 👇🏻

ByteHouse @bytehousecloud · Dec 26

ClickHouse for large-scale data ingestion and application
This article explores the application of large-scale data ingestion and use with Go and ClickHouse. Read it here - medium.com/@kn2414e/utili… #dataengineering #clickhouse #golang

ByteHouse @bytehousecloud · Dec 22

Season's greetings from ByteHouse. May this time bring joy to everyone ❤️ #merrychristmas #bytehouse

ByteHouse @bytehousecloud · Dec 21

Data ingestion deep dive - Part 1
Sharing part 1 of the two-part series by Jan Meskens - medium.com/@meskensjan/th… #dataengineering #dataingestion

ByteHouse @bytehousecloud · Dec 20

Open Source Tools powered by AI/OpenAI for Kubernetes
AI-powered tools can help you master Kubernetes by automating tasks, improving reliability, and providing insights. Read about it here: itnext.io/ai-and-kuberne… #dataengineering #kubernetes

ByteHouse @bytehousecloud · Dec 19

10 use cases of a data lakehouse for modern businesses
Read our latest blog post exploring the use cases for a data lakehouse in modern businesses and how to build one with ByteHouse. bytehouse.cloud/blog/use-cases… #dataengineering #datalakehouse #datalake

ByteHouse @bytehousecloud · Dec 15

Advanced SQL for data engineers
SQL is an essential tool to master in the field of data engineering. As the complexity of work increases, it is important to advance one's expertise. Here's a useful playlist by @Alex_TheAnalyst - youtube.com/playlist?list=… #dataengineering #sql

ByteHouse retweeted

Bytebytego @bytebytego · Dec 14

What is GraphQL? Is it a replacement for the REST API?

The diagram below shows a quick comparison between REST and GraphQL.

🔹 GraphQL is a query language for APIs developed by Meta. It provides a complete description of the data in the API and gives clients the power to ask for exactly what they need.
🔹 GraphQL servers sit in between the client and the backend services.
🔹 GraphQL can aggregate multiple REST requests into one query. GraphQL server organizes the resources in a graph.
🔹 GraphQL supports queries, mutations (applying data modifications to resources), and subscriptions (receiving notifications on schema modifications).

We talked about the REST API in last week’s video and will compare REST vs. GraphQL vs. gRPC in a separate post/video.

Over to you:
1). Is GraphQL a database technology?
2). Do you recommend GraphQL? Why/why not?

– Subscribe to our weekly newsletter to get a Free System Design PDF (158 pages): bit.ly/3KCnWXq
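The idea of aggregating several REST requests behind one query, and returning only the fields the client asks for, can be sketched without any GraphQL library. This is an illustrative sketch under stated assumptions, not a real GraphQL server: there is no query parsing or schema, and `get_user`, `get_orders`, `pick`, `resolve_user`, and the sample data are all hypothetical names invented for this example.

```python
# Hypothetical stand-ins for two separate REST endpoints.
def get_user(user_id):
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}


def get_orders(user_id):
    return [
        {"id": 1, "user_id": user_id, "total": 30.0, "status": "shipped"},
        {"id": 2, "user_id": user_id, "total": 12.5, "status": "pending"},
    ]


def pick(record, fields):
    """Keep only the requested fields of a record."""
    return {name: record[name] for name in fields}


def resolve_user(user_id, selection):
    """Answer one client request by combining both backends.

    `selection` is a plain dict standing in for a parsed query:
    "fields" lists the user fields wanted, and the optional
    "orders" key lists the order fields wanted.
    """
    result = pick(get_user(user_id), selection["fields"])
    if "orders" in selection:
        result["orders"] = [
            pick(order, selection["orders"]) for order in get_orders(user_id)
        ]
    return result


# One request replaces two REST round trips and omits unrequested fields.
response = resolve_user(7, {"fields": ["name"], "orders": ["id", "total"]})
print(response)
```

A production setup would use an actual GraphQL implementation with a typed schema; the point here is only that the server, not the client, fans out to the backends and trims the response.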

ByteHouse @bytehousecloud · Dec 14

Europe's new comprehensive AI rules
Europe has always been the forerunner in implementing laws that safeguard the security of its netizens, and this time too, it is the world’s first to draft comprehensive AI rules - apnews.com/article/ai-act… #dataengineering #ai #generatieveai

ByteHouse @bytehousecloud · Dec 13

Why is peter missing?

Benjamin Bennett Alexander @RealBenjizo

Python Question; Happy Tuesday What is the output of this code, and why?🤔🤔

ByteHouse @bytehousecloud · Dec 13

10 Advanced Data Pipeline Strategies
In the real world, it is common for data engineers to inherit a data pipeline mess, or work under tremendous constraints. In this guide, @Databand_ai shares advanced strategies for managing data pipelines. #dataengineering #datapipeline

ByteHouse @bytehousecloud · Dec 8

Try doing a dry run

Benjamin Bennett Alexander @RealBenjizo

Python Question; Another function. Can you figure this one out? What is the output of this code and why?🤔🤔

ByteHouse @bytehousecloud · Dec 8

Explain it like I’m 5 (ELI5) - Apache Kafka
Ever wondered how to explain Apache Kafka to non-data folks? Gently Down the Stream by Mitch Seymour is a gentle (and super cute) introduction to Apache Kafka - gentlydownthe.stream @_round_robin #dataengineering #kafka

ByteHouse retweeted

Zach Wilson @EcZachly · Dec 7

The best tech for each task:
- batch pipeline: Apache Spark
- data visualization: Apache Superset
- web api: NextJS (spring boot close second)
- SQL database: Postgres
- NoSQL database: DynamoDB
- Graph database: Neo4j
- front end web: React
- front end mobile: React Native (Flutter close second)
- CI/CD system: GitHub Actions
- data quality checks: Great Expectations (Deequ close second)
- data lake file management: Apache Iceberg (Delta Lake a close second)
- job orchestration: Apache Airflow (Mage and/or Prefect close second)
- machine learning model: XGBoost (linear regression close second)
- LLM: GPT-4.5 Turbo
- programming language: Python (Rust close second)
- message queue: Kafka (RabbitMQ close second)
- cache: Redis (Memcached close second)

#softwareengineering
