
Danilo DNA Silva, PhD





Most developers today wouldn't dream of working without version control for their code. The same principles can be applied to data!

@EarthmoverHQ's new open source project, Icechunk, includes version control features built specifically for the @zarr_dev data model, bringing powerful data version control to the world of massive multidimensional arrays.

Features include:
* All updates occur in isolated snapshots
* Tags - immutable pointers to snapshots
* Branches - mutable pointers to snapshots

With Icechunk, you can safely experiment with changes to your data on a "dev" branch before propagating those changes to "main." You can publish an immutable version of your dataset (a tag) while continuing to evolve towards the next version. Or you can simply roll back incorrect changes to an earlier version of your data.

These capabilities make life so much easier for data scientists and teams using array data in production. I've been using data version control with Zarr for the past year via our Arraylake platform, and I'm thrilled that these capabilities are now fully open source. I can't imagine going back to the old way of working.

Learn more at icechunk.io
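
For readers who want to see how snapshots, tags, and branches fit together, here is a minimal toy sketch in plain Python. It is not Icechunk's API: the ToyRepo class, its method names, and the "temperature" array are all hypothetical, made up purely to illustrate the dev/main/tag workflow described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Snapshot:
    """An immutable record of the dataset at one point in time."""
    id: str
    parent: str | None
    data: Mapping[str, tuple]
    message: str


class ToyRepo:
    """Hypothetical toy model of snapshot-based versioning (NOT Icechunk's API)."""

    def __init__(self) -> None:
        self.snapshots: dict[str, Snapshot] = {}
        self.branches: dict[str, str] = {}  # mutable pointers to snapshots
        self.tags: dict[str, str] = {}      # immutable pointers to snapshots
        self.branches["main"] = self._store(None, {}, "init")

    def _store(self, parent: str | None, data: Mapping, message: str) -> str:
        sid = f"s{len(self.snapshots)}"
        # Copy the data and wrap it read-only so a snapshot never changes.
        frozen = MappingProxyType(dict(data))
        self.snapshots[sid] = Snapshot(sid, parent, frozen, message)
        return sid

    def resolve(self, ref: str) -> str:
        """Turn a branch name, tag name, or snapshot id into a snapshot id."""
        for table in (self.branches, self.tags):
            if ref in table:
                return table[ref]
        if ref in self.snapshots:
            return ref
        raise KeyError(ref)

    def commit(self, branch: str, changes: Mapping, message: str) -> str:
        """Write a new snapshot on top of the branch tip and advance the branch."""
        base = self.snapshots[self.branches[branch]]
        sid = self._store(base.id, {**base.data, **changes}, message)
        self.branches[branch] = sid
        return sid

    def create_branch(self, name: str, ref: str) -> None:
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.resolve(ref)

    def create_tag(self, name: str, ref: str) -> None:
        if name in self.tags:
            raise ValueError(f"tag {name!r} is immutable and already exists")
        self.tags[name] = self.resolve(ref)

    def reset_branch(self, branch: str, ref: str) -> None:
        """Repoint a branch, e.g. to promote "dev" or to roll back a mistake."""
        self.branches[branch] = self.resolve(ref)

    def read(self, ref: str) -> Mapping[str, tuple]:
        return self.snapshots[self.resolve(ref)].data


repo = ToyRepo()
v1 = repo.commit("main", {"temperature": (280.0, 281.5)}, "initial data")
repo.create_tag("v1.0", "main")      # publish an immutable version
repo.create_branch("dev", "main")    # experiment in isolation
repo.commit("dev", {"temperature": (280.0, 999.0)}, "experimental change")
main_during_experiment = dict(repo.read("main"))  # "main" is untouched so far
repo.reset_branch("main", "dev")     # propagate the dev changes to main
main_after_promote = dict(repo.read("main"))
repo.reset_branch("main", v1)        # the change was wrong, so roll back
```

The one idea this toy shares with the post is that data is never edited in place: each commit produces a new snapshot, and branches and tags are only names pointing at snapshots. That is why rolling back amounts to moving a pointer.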







Please visit oceanhackweek.org/ohw23/ for details. Application deadline: June 2, 2023. Accepted applicants will be notified no later than June 23!










