
In early testing, a simple filter-and-sort over a few thousand rows dropped from roughly 330ms to 150ms. The planner skips unnecessary shuffles when it knows the data fits on a single node.
These features, which are on track to merge in Spark's 4.2/4.3 releases, lower the barrier for developers who want to start small and scale up without switching tools.
Huge thanks to Daniel Tenedorio and Liang-Chi Hsieh for spearheading the efforts! 🚀