The problem is to find the “running median” for an input stream of integers. That is, given the input
12, 4, 5, 3, 8, 7
The median after we have read the 12 is: 12
The median after we have read the 4 (so far we have 4, 12) is: 8 ((4+12)/2)
The median after we have read the 5 (so far we have 4, 5, 12) is: 5
One possibility is to keep a sorted linked list. Then, for each new element we have to insert it sorted and find the new median. This would be O(n) for each element, which is expensive considering that the input may contain up to 105 elements.
We can observe that for each new element either the median needs to consider one more element to its right (or larger) and/or one more element to its left (or smaller). That is, it would be very useful if we could maintain at each stage the largest and smallest numbers at each side of the list respectively.
This is where heaps come in useful. We can keep the largest and smallest element in a Maximum Priority Queue or Minimum Priority Queue, while updating the values efficiently (O(log n)).
Then the algorithm has to consider a few cases:
If both heaps are empty, insert it the new element to any one of them and return this value as the median.
If the left-side heap contains more elements than the right-side heap then:
If the new element is greater than the maximum of the left-side heap, insert the new element in the right-side heap.
Otherwise, 1-insert the new element in the left-side heap, and 2-move the new maximum of the left-side heap to be the minimum of the right-side heap
The similar changes are done if the right-side heap has more elements than the left-side heap. This keeps the two heaps balanced (with a difference of at most one)
If both have the same number of elements, then the new element is added to the one it would naturally go. For instance, in our last scenario, a “2” would be inserted in the left-side heap.
Then, at any given moment, the median is very easy to calculate:
- The heaps have a different number of elements and we take the one with one more element
- The heaps have the same number of elements and we take the average of the maximum of the maxHeap and the minimum of the minHeap.
An initial solution implementing these ideas is available here (Solution.java).