John Sheffield

1.4K posts

@johnmsheffield

#rstats, #rspatial, machine learning with @climate_risQ (acquired by Intercontinental Exchange) in Boston.

Joined January 2011
1.4K Following, 212 Followers
John Sheffield retweeted
François Chollet @fchollet
The Functional API in Keras is super concise -- it lets you define complex models in about half as much code as the subclassing approach. It also saves you a lot of debugging time, because it checks all input compatibility assumptions at construction time, long before you run the model. Any model you can build, will run. This is similar to using statically typed code -- but for layers.
4 replies · 9 retweets · 134 likes · 19.7K views
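A minimal sketch of the construction-time checking Chollet describes, assuming the `keras` package is installed. The layer sizes and input shape are arbitrary illustrations, not taken from the tweet.

```python
import keras
from keras import layers

# Functional API: each layer is called on a symbolic tensor, so shapes
# are checked while the graph is being wired up, before any data flows.
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)

# Wiring a Conv2D onto the rank-2 output is an incompatible connection.
# It is rejected here, at construction time, not at the first training step.
construction_error = None
try:
    layers.Conv2D(8, 3)(outputs)
except ValueError as err:
    construction_error = err

print(model.output_shape)
print("rejected at construction time:", construction_error is not None)
```

The subclassing approach would only surface the same mistake once the model is first called on real data.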
John Sheffield retweeted
Jack Morris @jxmnop
We spent a year developing cde-small-v1, the best BERT-sized text embedding model in the world. today, we're releasing the model on HuggingFace, along with the paper on ArXiv. I think our release marks a paradigm shift for text retrieval. let me tell you why👇
68 replies · 426 retweets · 3.2K likes · 433.8K views
John Sheffield retweeted
Leland McInnes @leland_mcinnes
Conversely, some folks believe that RAG is pretty much the point of embeddings, but there's so much more. Embeddings let you unlock the potential of unstructured data -- for exploration, for curation, for explainable models, and data understanding in general.
Quoting Maria Khalusova @mariaKhalusova:

Some folks believe that you _must_ have embeddings to build RAG. Though very common, this is technically completely optional. You can, _if you want_, have RAG without embedding the data.

3 replies · 7 retweets · 83 likes · 7.4K views
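To make the quoted point concrete, here is a rough sketch of retrieval for RAG with no embeddings at all, using plain keyword overlap weighted by inverse document frequency. The function names (`tokenize`, `retrieve`, `build_prompt`) and the sample documents are hypothetical, chosen only for illustration. A production system would more likely use BM25 or a search engine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=2):
    """Rank documents by IDF-weighted keyword overlap with the query."""
    doc_tokens = [Counter(tokenize(d)) for d in documents]
    n_docs = len(documents)

    def idf(term):
        # Smoothed inverse document frequency: rarer terms count for more.
        df = sum(1 for toks in doc_tokens if term in toks)
        return math.log((n_docs + 1) / (df + 1)) + 1.0

    query_terms = set(tokenize(query))
    scored = []
    for doc, toks in zip(documents, doc_tokens):
        # Counter returns 0 for terms the document does not contain.
        score = sum(toks[t] * idf(t) for t in query_terms)
        scored.append((score, doc))
    # Highest score first; drop documents sharing no terms with the query.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini-corpus for illustration only.
docs = [
    "ggsave writes a ggplot to disk as SVG, PNG, or PDF.",
    "GDAL reads and writes raster and vector geospatial formats.",
    "Term life insurance pays out if the policyholder dies during the term.",
]
prompt = build_prompt("Which tool reads raster formats?", docs, k=1)
print(prompt)
```

The resulting prompt would then be passed to whatever language model is in use. That step is omitted here because it depends on the provider.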
John Sheffield retweeted
Leonardo Jo @leonardojo
It took me SEVEN years to figure out that you can save a ggplot as a vectorized .svg file with ggsave("file.svg"). The svg file can be opened in PowerPoint and it will be completely vectorized; even text is still recognized as text boxes! No more recreating plots in Illustrator!!!!!
134 replies · 920 retweets · 7K likes · 925.1K views
John Sheffield retweeted
Leland McInnes @leland_mcinnes
If you could get a clustering algorithm and library specifically designed for fast clustering of embedding vectors (CLIP, sentence-transformers, Cohere-embed, etc.), what features would you most want it to have?
29 replies · 14 retweets · 103 likes · 20K views
John Sheffield retweeted
Sebastian Raschka @rasbt
Wanted to share the good news that "Build an LLM from Scratch" is now in its final stages. Currently putting the final touches on the classification-finetuning chapter, and then moving on to the last chapter: instruction finetuning! The new estimated publication date is Summer 2024. (In the meantime, there's currently a discount on the Manning website: manning.com/books/build-a-…)
30 replies · 160 retweets · 1.2K likes · 130.6K views
John Sheffield retweeted
Patrick McKenzie @patio11
And then I’ll make my annual exhortation: a relatively small number of geeks thinking they are playing a 45 year game will find they were actually playing a 15 or 20 year game due to issues mostly beyond their control. Buy term life and own occupation disability insurance.
3 replies · 7 retweets · 184 likes · 11.5K views
John Sheffield retweeted
Bojan Tunguz @tunguz
Machine Learning can be an overwhelming field when you first start out. But with years of study, experience, and hard work you'll realize that you know even less than you had thought.
35 replies · 153 retweets · 2K likes · 113.7K views
John Sheffield @johnmsheffield
Hi #gischat — I’m hiring for a senior spatial analytics role in Sustainable Finance at Intercontinental Exchange. We play with data on climate, housing, remote sensing, social/demog, lots more. If you love GDAL and fun problems, please apply/say hi in DMs! egdd.fa.us6.oraclecloud.com/hcmUI/Candidat…
0 replies · 1 retweet · 5 likes · 608 views
John Sheffield retweeted
Robin Lovelace @robinlovelace
Ever wanted to do spatial clustering of origin-destination (OD) data? Leeds ITS + Turing PhD student Hussein Mahfouz - @h_mahfouz - has created this early-stage visualisation of spatial clustering of these zone-zone flows 🏗️ Case study of #Leeds. Looks beautiful AND useful 🎉
3 replies · 10 retweets · 89 likes · 7.3K views
John Sheffield retweeted
Yohan @yohaniddawela
High-resolution satellite images can be insanely expensive to buy. So here's a list of free datasets you can access. These datasets can be used to build foundation models, super-resolution models, or for segmentation.
14 replies · 132 retweets · 708 likes · 86.8K views
John Sheffield retweeted
Zach Mueller @TheZachMueller
Best resources for learning about RAGs? (Assume no prior knowledge about it). cc @HamelHusain @abacaj
19 replies · 14 retweets · 214 likes · 62.4K views
John Sheffield retweeted
Sebastian Raschka @rasbt
"Simplifying Transformer Blocks" ranks easily among my favorite research papers that I've read this year. Here, the authors look into how the standard transformer block, essential to LLMs, can be simplified without compromising convergence properties and downstream task performance. Based on signal propagation theory and empirical evidence, they find that many parts can be removed to simplify GPT-like decoder architectures as well as encoder-style BERT models:
- skip connections
- normalization layers (LayerNorm)
- projection and value parameters
- sequential attention and MLP sub-blocks (in favor of a parallel layout)
The authors also did a great job referencing tons of related work motivating their experiments. I definitely recommend reading this paper just for the references alone: arxiv.org/abs/2311.01906
47 replies · 565 retweets · 3.6K likes · 1M views
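For readers unfamiliar with the "sequential" versus "parallel" layout mentioned in the tweet above, the two are commonly written as below. The notation is the usual Pre-LN convention and is an assumption here, not taken from the tweet. It shows only the layout change, not the paper's further removals of skip connections and normalization.

```latex
% Sequential (standard Pre-LN) block: attention first, then the MLP
x' = x + \mathrm{Attn}(\mathrm{LN}_1(x)), \qquad
y  = x' + \mathrm{MLP}(\mathrm{LN}_2(x'))

% Parallel block: both sub-blocks read the same normalized input
y = x + \mathrm{Attn}(\mathrm{LN}(x)) + \mathrm{MLP}(\mathrm{LN}(x))
```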
John Sheffield retweeted
OpenTopography @OpenTopography
New blog post highlighting a suite of Jupyter Notebooks to enable programmatic access to cloud-hosted USGS 3D Elevation Program (3DEP) #lidar data. Notebooks demonstrate how to use @pointcloudpipe to access & process data hosted by Amazon Open Data. opentopography.org/blog/new-colle… 1/2
1 reply · 46 retweets · 253 likes · 28.1K views