Introselect algorithm
Introselect
When implementing a beam search, one must track the top K elements to prune the beam. One algorithm for this is Introselect, implemented in the C++ std library as nth_element.
Heaps
Heaps can also select the top-k elements. Push all elements into the heap, then pop k times. Useful for something like, ORDER BY score LIMIT k
to pull the top-k elements.
In C++ we can use: make_heap.
Sketches
Sketches can scale to handle large data streams. KLL Sketch enables distribution tracking for large streams of data. https://github.com/apache/datasketches-cpp