Question
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
- For example, forÂ
arr = [2,3,4]
, the median isÂ3
. - For example, forÂ
arr = [2,3]
, the median isÂ(2 + 3) / 2 = 2.5
.
Implement the MedianFinder class:
MedianFinder()
 initializes theÂMedianFinder
 object.void addNum(int num)
 adds the integerÂnum
 from the data stream to the data structure.double findMedian()
 returns the median of all elements so far. Answers withinÂ105
 of the actual answer will be accepted.
Example 1:
Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]
Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0
Leetcode
Let’s do some reframing
- What is a median?
- The median is the value at the exact center of a data set; half of the data points have a value smaller than or equal to it, and the other half have a value greater than or equal to it.
- Does the mid value(median) need to be taken on sorted dataset?
- Yes, however in our case the numbers come in randomly.
- So we will have to sort the dataset before we can take the median? Yes that is a possible way. 🤔
- Does the dataset's evenness or oddness affect the median?
- Odd — The exact middle number
- Even— average of the two middle numbers
Intuition
- We need to calculate the median, we we need to have the middle numbers or numbers—based on the evenness of the dataset.
- The numbers are coming in a random order, so we will have to maintain the sortedness of the array with each element added. Find its length and, based on whether it is even or odd, return the median. This will cost us
O(nlogn)
each time we add a number, which can be expensive. Finding the median will be a constant time operation! - We can also not sort the whole array but find the insertion point as we did in Search Insert Position(Sorted Array) Upper Bound which will cost us O(log n), but to insert the element, we will have to move every item after it by one. In the worst case, this can be O(N).
- Does the whole array need to be sorted for us to find the median, with the way we are doing it Yes. But is there a better way?
- What if we divide the data into two halves; the first half containing 50% of the values and the second half containing the other 50%. We don’t need the whole half sorted just the maximum from the smallest half and the minimum from the larger half. We are assuming the data set is even at present.
- Since we only need the min and the max from each half. What data structure can we use to maintain this infrastructure with minimal cost?
- Small pile—max value, dynamically changing the max value based on new data coming in. Will use a max heap.
- Large pile—min value, dynamically changing the min value based on new data coming in. Will use a min heap.
- How will be sure where the incoming element will go?
- Let’s assume the dataset is as that is above dataset = [1,2,3,4,5,6,7,8,9,10]. An element comes in where should it go?
- 11 : Compare the incoming number to the maximum element in the smaller half (in this case, 5). Since it is larger than 5, it will be larger than all elements the smaller pile. Therefore, it should be added to the larger pile.
- 2 : It is smaller maximum element in the smaller half (in this case, 5), so will be smaller than all elements in the larger pile. Therefore, it should be added to the smaller pile.
- If both heaps are empty at the start of the system, we cannot compare the first element with the top element in the smaller heap. Where should we add the incoming element?
- We can add the incoming element to either heap, but in our case, we will start with the smaller one. This is because we will be using the max from that heap to compare where the future incoming elements will go.
- Remember to add a negative sign to simulate a max heap in python.
- Let’s run the examples we discussed above.
- Let's add 11 to a dataset [1,2,3,4,5,6,7,8,9,10]. In addition to adding it to the large pile it also made the dataset odd. This will change the median. But where will it be it will be? Min value in the larger half. Why? As the mid of the dataset will not fall exclusively in the larger half.
- Let's add 2 to a dataset [1,2,3,4,5,6,7,8,9,10]. In addition to adding it to the small pile it also made the dataset odd. Similar to the adding 11. This will also change the median.The median will become the max value in the smaller pile—The half that has more values.
- Where is the median?
- When the elements are unevenly distributed (one heap has more elements than the other), the median is the element at the top of the heap with more elements.
- When the elements are evenly distributed, the median is the average of the maximum and minimum of the two heaps.
- From an evenly separation, then to odd, and then back to an even rebalance.
- Let’s consider adding two items to the dataset, we don't know where they go. In a case where both items go into the same heap, rebalancing is necessary. For example, if we add 11 and 12 to the dataset, the smaller heap holds 5 items and the larger one contains 7. This moves the elements used to calculate the median further down, such that the median of the values below will be 6.5 (an average of 6 and 7). Since 7 is not accessible at the top, the heap solution loses its purpose. In this case, the heap does not represent the halves in the dataset.
- In the case below, an element must be removed from the larger heap. The value 6 must be moved to the smaller half to rebalance, as it is the smallest number and now belongs in the smaller half.
- Normal —> Overflow —> Normal
- The larger heap will have to give, it gives its smallest number so the halves are rebalanced.
- When the smaller pile has two more elements than the larger one, we will do something similar: take the elements from the overflowing heap and add them to the larger one. Will rebalance the heaps after each addition.
def __init__(self):
self.small = [] #max heap
self.large = [] #min heap
def addNum(self, num: int) -> None:
starting = len(self.small) == 0 and len(self.large) == 0
belongsToSmall = not(starting) and - 1 * self.small[0] > num
if starting or belongsToSmall:
heappush(self.small,-num)
else:
heappush(self.large,num)
def findMedian(self) -> float:
evenDataSet = len(self.small) == len(self.large)
if evenDataSet:
first = -1 * self.small[0]
second = self.large[0]
result = (first+second)/2
else: #odd dataset
if len(self.small) > len(self.large):
result = -1 * self.small[0]
else:# len(self.small) < len(self.large):
result = self.large[0]
return result
def rebalance(self) -> None:
smallIsOverFlowing = len(self.small) - len(self.large) == 2
largeIsOverFlowing = len(self.large) - len(self.small) == 2
if smallIsOverFlowing:
value = heappop(self.small)
heappush(self.large,-1 * value)
if largeIsOverFlowing:
value = heappop(self.large)
heappush(self.small, -1 * value)
Implementation
from heapq import heappop,heappush
class MedianFinder:
def __init__(self):
self.small = [] #max heap
self.large = [] #min heap
def addNum(self, num: int) -> None:
starting = len(self.small) == 0 and len(self.large) == 0
belongsToSmall = not(starting) and - 1 * self.small[0] > num
if starting or belongsToSmall:
heappush(self.small,-num)
else:
heappush(self.large,num)
self.rebalance()
def findMedian(self) -> float:
evenDataSet = len(self.small) == len(self.large)
if evenDataSet:
first = -1 * self.small[0]
second = self.large[0]
result = (first+second)/2
else: #odd dataset
if len(self.small) > len(self.large):
result = -1 * self.small[0]
else:# len(self.small) < len(self.large):
result = self.large[0]
return result
def rebalance(self) -> None:
smallIsOverFlowing = len(self.small) - len(self.large) == 2
largeIsOverFlowing = len(self.large) - len(self.small) == 2
if smallIsOverFlowing:
value = heappop(self.small)
heappush(self.large,-1 * value)
if largeIsOverFlowing:
value = heappop(self.large)
heappush(self.small, -1 * value)
Complexities
Time complexity: O(n*log(n))
Space complexity: O(n)
n
= numberOfItemAdded