Tom Nicholas

501 posts

@TEGNicholasCode

OSS for science, #python, @pangeo_data, @xarray_dev team. Ex-fusion, oceanography, now @_cworthy 🦋 https://t.co/4jprMBeckr

New York Joined February 2020
686 Following 667 Followers
Pinned Tweet
Tom Nicholas
Tom Nicholas@TEGNicholasCode·
At AGU I talked to NASA people about how agencies could better support open-source tools they rely on. I argued that our recent collaboration between Xarray and NASA ESDIS on xarray.DataTree was a good model to copy - read about how it happened here! xarray.dev/blog/datatree
Tom Nicholas retweeted
Beto
Beto@betolink·
Science needs a social network for sharing big data hackmd.io/@TomNicholas/H… by @TEGNicholasCode
Tom Nicholas retweeted
Pangeo
Pangeo@pangeo_data·
We're moving over to BlueSky and LinkedIn for all our future announcements. Follow us at bsky.app/profile/pangeo… to find out more about tomorrow's showcase 😉 (p.s., it's on Xpublish at Scale at 4 PM EST 🚀) Connect with us on LinkedIn at linkedin.com/company/pangeo…
Roger Creel
Roger Creel@rogercreel·
@TEGNicholasCode Exciting! Friendly encouragement to switch to Bluesky -- would love to follow you there!
Tom Nicholas
Tom Nicholas@TEGNicholasCode·
@alekpetty I'm hoping that virtual zarr datasets will make it easier to cloud-optimize data that was dumped in a bucket in a legacy format, and allow creating aggregated datasets with relevant derived information alongside it. github.com/zarr-developer…
Alek Petty
Alek Petty@alekpetty·
@TEGNicholasCode yeah definitely. And yes data is migrating to the cloud (NSIDC is my main DAAC I'm involved with, and they are moving everything over through NASA mandates). Trying to do the same for our more derived datasets too but a little unclear on best strategies/repositories for that.
Tom Nicholas
Tom Nicholas@TEGNicholasCode·
Completely agree - "in theory" we have the simple scalability of the cloud, but in practice it's often a headache, for no good reason, which prevents adoption by most users (including many scientists)
Matthew Rocklin@mrocklin

New Post: Cloud Computing is Broken matthewrocklin.com/cloud-is-broke… Investor asks: "What's next for Data/Cloud Infrastructure?" My answer: "Boring stuff. People struggle with basics." Cloud feels like MP3 players before iPod. In theory everything is good. In practice adoption is low

Tom Nicholas
Tom Nicholas@TEGNicholasCode·
@alekpetty Makes total sense. On (1) and (2), some intermediate services (e.g. Coiled, Modal) would like to sell you the solution to this, but it's annoying that NASA + AWS can't just get it right the first time. On (3): is your data in the cloud at least, even if not in a cloud-optimized format?
Alek Petty
Alek Petty@alekpetty·
@TEGNicholasCode yes, e.g., 1) NASA isn't reliably providing us with cloud compute beyond some hubs, 2) AWS is a lot, hard for us to learn the fast-changing ecosystem in the limited time we have, 3) lots of our data not cloud optimized anyway so why bother
Tom Nicholas retweeted
Ian Schuler
Ian Schuler@ianschuler·
@mouthofmorrison @rabernat @betolink @EarthmoverHQ @steadyflux That said, it isn't 100% clear that NASA's best move is to immediately convert 10000+ data sets into cutting edge ARCO formats. Kerchunk and Virtual Zarr offer benefits of ARCO while keeping data in the native formats.
Beto
Beto@betolink·
@mouthofmorrison @rabernat @EarthmoverHQ Tricky question, NASA has a mandate to keep it in archival, self-describing formats... data would need to be duplicated, and data conversion in most cases is not trivial. So at least 2x storage costs plus the overhead of data conversion.
Jacob Tomlinson
Jacob Tomlinson@_JacobTomlinson·
Had a great time at the @pydatanyc sprints today! Looking forward to the rest of the conference.
Tom Nicholas retweeted
Joe Hamman
Joe Hamman@_jhamman·
We've talked a lot about #Icechunk's performance this week 🚀. But the Zarr-Python 3 results are also very encouraging! We're a few weeks away from the 3.0 launch but what this chart shows is that the new AsyncIO + multi-threading functionality in Zarr is going to be really good.
Tom Nicholas@TEGNicholasCode

ALSO this release is the first to be compatible with the much anticipated v3 implementation of zarr-python! (still on its beta branch right now) This brings (a) big performance benefits when reading @zarr_dev on S3 via async and (b) compatibility with @EarthmoverHQ's Icechunk.

Tom Nicholas
Tom Nicholas@TEGNicholasCode·
All these integrations represent literally years-worth of effort, all coming out at once 🤯 And that's not even mentioning all the other changes you see in a typical xarray release!
Tom Nicholas
Tom Nicholas@TEGNicholasCode·
ALSO this release is the first to be compatible with the much anticipated v3 implementation of zarr-python! (still on its beta branch right now) This brings (a) big performance benefits when reading @zarr_dev on S3 via async and (b) compatibility with @EarthmoverHQ's Icechunk.