Germain LEFEBVRE

941 posts

@germainlefebvr4

@zatsit_ #Infra #DevOps #Cloud #Docker #Kubernetes #Crossplane #OpenSource

Lille, France · Joined July 2015
300 Following · 161 Followers
Germain LEFEBVRE @germainlefebvr4
@bearstech A long, long time ago, I had started developing an OGame-style game set in the Command & Conquer universe. After 3 months of dev, the C&C Alliance site came out 😭
1 reply · 0 retweets · 3 likes · 200 views
bearstech @bearstech
Do you have a personal project you regret never having seen through?
[image]
4 replies · 5 retweets · 11 likes · 3.1K views
Germain LEFEBVRE @germainlefebvr4
The #Hacktoberfest month is on! We have an eventful month in store for you:
- Development
- Documentation
- Open Source projects
- Meetups
No time to get bored... The goal: sharing!
0 replies · 0 retweets · 2 likes · 36 views
Germain LEFEBVRE retweeted
zatsit @zatsit_
This afternoon, off to Jupyter 😉 After stepping back this morning to really understand that AI doesn't boil down to ChatGPT, we're working through a few labs, for example to run your own LLM locally, generate images... To know when to use #IA, you have to understand how it works!
1 reply · 3 retweets · 5 likes · 131 views
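A minimal sketch of the "run your own LLM locally" lab mentioned above, assuming the ollama Python client and a local Ollama server with a llama3 model already pulled; the model name and prompt are illustrative, not from the original tweet:

```python
# Local-LLM sketch using the ollama Python client
# (assumes `pip install ollama` and a running local Ollama server
# with a "llama3" model already pulled; both are illustrative choices).
import ollama

response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Explain in one sentence why running an LLM locally can be useful."}],
)
print(response["message"]["content"])
```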
Germain LEFEBVRE @germainlefebvr4
This tool makes operating Docker images and managing their patching less painful. By giving application teams direct access to this information, you give them the means to reduce the lag on updates.
0 replies · 0 retweets · 0 likes · 10 views
Germain LEFEBVRE @germainlefebvr4
Using Kubernetes is good; staying up to date on the images used in the cluster is better. Jetstack's Version Checker lets you visualize how far behind your image versions are. github.com/jetstack/versi…
1 reply · 0 retweets · 0 likes · 24 views
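Once version-checker is running in the cluster, its results are exposed as Prometheus metrics, so the "how far behind are we" view can also be pulled programmatically. A minimal sketch, assuming a Prometheus that scrapes version-checker is reachable at localhost:9090 and that the gauge is named `version_checker_is_latest_version` with the label names shown; both the metric and label names are assumptions to verify against the project's README:

```python
# Query Prometheus (standard /api/v1/query HTTP API) for images that
# version-checker reports as NOT running the latest version.
# The Prometheus URL, metric name, and label names are assumptions.
import requests

PROM_URL = "http://localhost:9090/api/v1/query"
QUERY = "version_checker_is_latest_version == 0"  # assumed gauge: 1 = latest, 0 = behind

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    print(
        f"{labels.get('namespace')}/{labels.get('pod')}: "
        f"{labels.get('image')} current={labels.get('current_version')} "
        f"latest={labels.get('latest_version')}"
    )
```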
Germain LEFEBVRE retweeted
DigitalOcean @digitalocean
📱🧑‍💻 Calling all #OpenSource community members: #Hacktoberfest 2024 is right around the corner. Join your fellow #developers all October long in contributing to your favorite open-source projects. Registration opens on September 23. Learn more: digitalocean.com/blog/hacktober…
0 replies · 5 retweets · 10 likes · 3.2K views
Germain LEFEBVRE retweeted
Stanislav Kozlovski @kozlovski
Everyone is using Kafka. But almost no one is using its new Infinite Storage feature. ✨

KIP-405 is introducing the ability to store Kafka data in S3, and any other external store for that matter. It's incredibly needed, because storage is Kafka's biggest flaw right now. ❌

Kafka was not originally developed with elasticity in mind. Its key limitation is that it co-locates the data with the broker, resulting in many problems:

1. Competition for disk IOs ⚔️ Historical consumers can decrease write performance by up to 43% (as shown in the tests below) because they force Kafka to read from the disk (as opposed to its page cache), and that causes extra disk strain. This is especially bad for HDDs, which have notoriously improved exponentially in all aspects BUT their IOPS capacity. 👎 They have been stuck at roughly 120 IOPS for the last two decades, so you can't allow that precious IOPS to be used up... without expecting catastrophic latencies, that is.

2. Speaking of HDDs: Kafka is practically REQUIRED to use them due to their cost-effectiveness. 💾 But they can give you large tail latencies. With tiered storage, you can afford very fast, small local SSD storage. ⚡️ This also means you can provision less memory, because you don't need to be as reliant on the page cache for serving historical reads. 💡 Previously, you'd rely on it to reduce hits to the disk in historical-read scenarios, which result in prolonged periods of under-replicated partitions and extra network bandwidth of the cluster being used up.

3. Disastrous broker disk failure scenario: in such cases, the broker has to re-fetch everything it had on disk (TBs). This sudden extra historical read traffic at the maximum allowed throughput severely impacts latencies. Such broker start-up scenarios can be 12,000% slower and can worsen produce latencies by up to 900%, as shown by the tests below. 😱

4. Slow broker failure recovery: if a broker simply restarts for whatever reason (e.g. the host VM died), it has to catch up with a lot of data, proportional to the time it was dead. This again exhausts precious IOPS, consumes extra bandwidth, and is generally slower to resolve under-replicated partitions. 🐌

5. Reassigning partitions: partitions that hold a lot of data are extremely slow to move. At a decent 100MB/s replication rate, 10TB of data takes a whopping 27.7 hours to move. That's more than one day! In the intermediate state, this means you're replicating 2x as much data for 28 hours. 🤦🏻‍♂️ Any action like expanding or shrinking a cluster will realistically take you days to finish, while also consuming a TON of extra replication bandwidth. It's a real disaster. And finally, you can never effectively rebalance partitions reactively to resolve a problem fast (since fast isn't measured in hours). 👎

6. Impractical to scale storage 😮‍💨 If you decide to increase your retention settings across the cluster for whatever reason (e.g. GDPR), you either need to:
• scale horizontally: add new brokers and waste unnecessary CPU/memory resources
• scale vertically: do some complex and fragile disk swaps on them

7. Your cluster setup ends up dictated by disk requirements 😠 You end up with a significantly larger cluster than you would need if disk size weren't a concern, i.e. you're buying extra CPU/memory you don't need. 👎 This is because HDDs have limited size, so there are cases where you may need extra machines just to fit more HDDs in them. Not to mention the extra maintenance burden of supporting more nodes.

8. High cloud cost ☁️ 💸 In the cloud, it's more expensive to provision larger disk volumes attached to the instance.

9. Max storage limitation per partition 🛑 You're limited in how much data you can store on a single partition by the size of the physical disk on the broker. While admittedly a very niche use case, why couldn't you have a single partition holding terabytes of historical data?

Those are a lot of problems... How do we solve all of this? 😥

Simple. Put the data in S3. ✨

That is what Tiered Storage is: it extends Kafka's storage beyond the local disk by retaining the data in a pluggable external store (HDFS, S3, etc.). Pluggable is the key word here, as it will enable the open-source community to develop implementations for different external stores in parallel. This can be implemented via the RemoteStorageManager interface.

Kafka will end up having TWO tiers of storage placement:
1. a local one (hot) 🥵
2. a remote one (cold) 🥶

You will be able to enable this per topic, with varying local and remote retention settings. This will be done transparently to any clients: they won't be able to tell when they're fetching from the remote store, as the Kafka API remains the same and simply abstracts it away.

😡 Won't this kill latency? In theory, one should expect slower reads from the remote log store. But this isn't a problem in practice, as historical workloads are usually not performance-sensitive. 💡 The latency-sensitive workloads usually read from the tail of the log (the latest data) and are therefore not impacted by this feature.

Performance tests were done nevertheless (using HDFS as the external system). They focused exclusively on write latency and the impacts there:
• The largest produce latency increase in the tests was 21ms → 25ms of p99 produce latency in the steady state.

With different scenarios came different results. Get this: when there are historical reads (out-of-sync consumers), the produce latency actually improved! 🔥 This is because without tiered storage, consumers reading old data compete for disk IOs (normal consumers don't, since their data is served from the page cache). This reduces the IOs available to writes, and write disk latency increases. ❌ The tests showed 42ms of p99 produce latency with tiered storage and 60ms without it in this historical-consumer scenario, a 28% latency decrease.

And the heavy-hitter final case: rebuilding a broker with an empty disk. For just 12TB of data, recovery took almost 4 hours in their test without tiered storage, and only 2 minutes with tiered storage (a 120x improvement). During this broker recovery, the p99 produce latency was 490ms without tiered storage and 56ms with it, a 9x improvement!

Most importantly? This completely flips the script and enables Kafka to be used as a true long-term store. This stands in strong opposition to its most widely used use case today, which is durable but ultimately ephemeral storage, more akin to a pipe than anything else. 🪈

Lakehouses are all the buzz today, but has anybody pondered what a Streaming Lakehouse might look like? 🤔

--

Let's be frank... is there any place with better Kafka content on the internet right now? If you like this and want to see more, help me out in two ways:
1. Repost so your network learns too! 💥
2. Follow me here for more high-quality Kafka content ✅ @kozlovski

They take 10 seconds to do; writing this takes me 10+ hours. ✌️

#Kafka #ApacheKafka
[4 images]
4 replies · 81 retweets · 371 likes · 29.5K views
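As a concrete illustration of the per-topic enablement the thread describes, here is a minimal sketch using the confluent-kafka Python client. It assumes a Kafka 3.6+ cluster whose brokers already have tiered storage turned on (`remote.log.storage.system.enable=true` plus a configured RemoteStorageManager plugin); the topic name and retention values are illustrative:

```python
# Create a topic with tiered storage enabled and split retention:
# ~6 hours hot on broker disks, 30 days total (the remainder lives
# in the remote store). Assumes brokers are already configured with
# a RemoteStorageManager plugin (KIP-405, Kafka 3.6+).
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "events-tiered",                      # illustrative topic name
    num_partitions=6,
    replication_factor=3,
    config={
        "remote.storage.enable": "true",                # per-topic opt-in
        "retention.ms": str(30 * 24 * 60 * 60 * 1000),  # 30 days total
        "local.retention.ms": str(6 * 60 * 60 * 1000),  # ~6 hours hot tier
    },
)

# create_topics() returns {topic_name: future}; .result() raises on failure.
for name, future in admin.create_topics([topic]).items():
    future.result()
    print(f"created {name}")
```

With this split, latency-sensitive tail consumers stay on the local SSD tier, while anything older than the local retention window is served from the remote store.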
Germain LEFEBVRE retweeted
Danilo Poccia @danilop
Today, we're announcing a public preview of AWS App Studio, a generative AI-powered service that uses natural language to create enterprise-grade applications in minutes, without requiring software development skills. More in @donnieprakoso's post: aws.amazon.com/blogs/aws/buil…
[GIF]
2 replies · 49 retweets · 160 likes · 15.5K views
Germain LEFEBVRE @germainlefebvr4
@NFourre It's a shame to compensate for a missing native implementation with a global slowdown by adding a processing layer :/ What's the added value? "Prettier" is not a good answer
1 reply · 0 retweets · 0 likes · 123 views
Germain LEFEBVRE retweeted
DevLille @DevfestLille
The replays are available! 🎥 At this link you can find the playlist with all the videos from Devfest Lille 2024! youtube.com/playlist?list=…
0 replies · 18 retweets · 20 likes · 1.9K views
Germain LEFEBVRE @germainlefebvr4
@bearstech I worked on a project that used ClickHouse, and I can confirm the impressive level of performance, and also of data compression, which barely affects performance at all.
0 replies · 0 retweets · 2 likes · 49 views
bearstech @bearstech
ClickHouse is an open-source column-oriented database management system built for real-time analytics. It can process large amounts of analytical data quickly. github.com/ClickHouse/Cli…
1 reply · 8 retweets · 26 likes · 2.8K views
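A minimal sketch of what using ClickHouse for real-time analytics looks like, via the clickhouse-connect Python driver against a local server; the table name, schema, and query are illustrative, not from the original tweets:

```python
# Sketch using clickhouse-connect (assumes `pip install clickhouse-connect`
# and a ClickHouse server on localhost:8123; names are illustrative).
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost", port=8123)

# MergeTree is ClickHouse's workhorse table engine; the columnar layout
# plus per-column compression is what yields the performance and
# compression ratios the tweets mention.
client.command("""
    CREATE TABLE IF NOT EXISTS page_views (
        event_time DateTime,
        url        String,
        user_id    UInt64
    )
    ENGINE = MergeTree
    ORDER BY (url, event_time)
""")

# A typical analytics query: an aggregation over a large column scan.
result = client.query(
    "SELECT url, count() AS hits FROM page_views "
    "GROUP BY url ORDER BY hits DESC LIMIT 10"
)
for url, hits in result.result_rows:
    print(url, hits)
```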