Prerequsites : kth smallest element in an array
Question
Design a class to find the kth
largest element in a stream. Note that it is the kth
largest element in the sorted order, not the kth
distinct element.
Implement KthLargest
class:
KthLargest(int k, int[] nums)
Initializes the object with the integerk
and the stream of integersnums
.int add(int val)
Appends the integerval
to the stream and returns the element representing thekth
largest element in the stream.
Example 1:
Input
["KthLargest", "add", "add", "add", "add", "add"]
[[3, [4, 5, 8, 2]], [3], [5], [10], [9], [4]]
Output
[null, 4, 5, 5, 8, 8]
Explanation
KthLargest kthLargest = new KthLargest(3, [4, 5, 8, 2]);
kthLargest.add(3); // return 4
kthLargest.add(5); // return 5
kthLargest.add(10); // return 5
kthLargest.add(9); // return 8
kthLargest.add(4); // return 8
Leetcode
Reframing
- In the previous questions we were given an array without any values being added from a strea. With an array, you can sort it and return the kth element either from the start or the end, depending on whether you want the minimum or maximum value. The time complexity for that will be
O(n(log(n))).
Using a method like that with a fixed size array is bad, but it's even worse when dealing with a stream. When a new element is added to an array for finding the kth largest/smallest element, the array needs to be sorted again. This process must be repeated for each new element, which increases the complexity toO(n*nlogn)
. - This results in multiple values for the kth element. Moreover?
- Example: Adding 23 to an already sorted array requires sorting the entire array again.
- This question is very similar to the previous two. In "The greedy merchant—'No one is wise from another man's woe,'" we created a heap of the k most expensive items by adding them to a min heap. We needed a min heap because we had to remove the least valuable item to make room for a new valuable item if it came. We needed the least valuable item at the top, which is only possible with a min heap. Similar to the "Kth smallest element in an array" problem, we can use a max heap instead of a min heap. By removing the largest minimum and replacing it with the smaller minimum from the array.
- If you are confused about why we will use a min heap to get the maximums or a max heap to get the minimums, look at the section "A Counterintuitive Mindset?" in The greedy merchant—"No one is wise from another man's woe”.
- We can start by initializing a bucket of maximums with the largest K items up until that point. To remove the smallest of these, we'll use a min heap. After initilization we can check whether incoming items will replace any elements in the heap or bucket. If they do, we replace the minimum in the heap with the incoming item. If not, that element will never be part of the top K elements.
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.k = k
self.array = nums
def add(self, val: int) -> int:
self.array.append(val)
self.array.sort()
kthLargest = len(self.array)-(self.k)
return self.array[kthLargest]
# example
topK3 = [1,2,3] # min heap
toAdd = 4
add 4 at the expense of 1
topK3 = [2,3,4]
1 will never be in the topK3 again
Implementation
- During initialization, we add all items from
nums
to a min heap. However, we need the heap to contain only thek
largest elements at initialization and there might be more element in the intial array then k. - The minimums from
nums
will be at the top, so we remove the smaller items from the top of the heap until we have thek
largest items fromnums
in the heap. - Adding new elements from the stream
- First, we need to ensure that the heap contains K items. There may be cases where
nums
is less than K and a heap smaller than K is initialized. In such cases, we simply need to add the incoming item to the heap. - Example : Test case
- If we have at least k elements,
- If the incoming element from the stream is greater than the (smallest element in the heap) top element, we can replace that item with the incoming number.
- To retain the size of the heap as k, the item at the top will be removed in favor of the incoming item.
- The item at that was removed will never be part of the top k again.
- if the incoming item is less than the topmost element it has no power to replace it, so we disregrad it.
- To find the Kth largest item in a heap or array, simply return the item located K positions away from the maximum. Since it is a min heap the maximum will be at the bottom and k positions away from it will be the top most item in the heap.
def __init__(self, k: int, nums: List[int]):
self.k = k
self.heap = []
for i in range(len(nums)):
heappush(self.heap, nums[i])
if len(nums) > k:
while len(self.heap) > k:
heappop(self.heap)
# not enough ks
sizeLessThanK = len(self.heap) < self.k
if sizeLessThanK:
heappush(self.heap, val)
input =
["KthLargest","add"]
[[2,[0]],[-1]]
topK3 = [1,2,3]
add = 4
add 4 at the expense of 1
topK3 = [2,3,4]
1 will never be in the topK3 again
topK3 = [1,2,3]
add = 0
topK3 = [1,3,4]
0 can't be in the top 3
TopHeapValueWillNeverBeinTopKAgain = not(sizeLessThanK) and val > self.heap[0]
elif TopHeapValueWillNeverBeinTopKAgain:
heappop(self.heap)
heappush(self.heap,val)
#else: pass
return self.heap[0]
not(sizeLessThanK)
ensures that self.heap[0]
is not run, which helps us avoid a KeyError
when the heap is empty. This strategy is commonly referred to as Short Circuit Evaluation.
return self.heap[0]
Code
from heapq import heappop, heappush, heapify
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.k = k
self.heap = []
for i in range(len(nums)):
heappush(self.heap, nums[i])
if len(nums) > k:
while len(self.heap) > k:
heappop(self.heap)
def add(self, val: int) -> int:
sizeLessThanK = len(self.heap) < self.k
TopHeapValueWillNeverBeinTopKAgain = not(sizeLessThanK) and val > self.heap[0]
if sizeLessThanK:
heappush(self.heap, val)
elif TopHeapValueWillNeverBeinTopKAgain:
heappop(self.heap)
heappush(self.heap,val)
return self.heap[0]
Exmaples:
- If new data is constantly arriving or old data is leaving, and we ONLY need to return the maximum or minimum values, think of using heaps
- consider the following scenarios.
- There queue for running tasks. The highest-priority task is processed and removed from the queue once completed, while new tasks with varying priority are added to it.
- Sales conducted throughout the day, with the top three sales always available.
- The two indicators for the viability of using a heap are the need to find the Kth largest/smallest element and the use of a stream. Since both of these criteria are met in this case, using a heap is a viable solution.
To Heapify or not to Heapify
- We push all the elements to the heap, creating it. This operation has a time complexity of nlog(n).
- We can simply use heapify which cost O(N).
- O(N)??? if you are curious why it is O(N), look at the section Create O(N) / heapify in A heap upclose: Simulating a max heap.
- Heapify is most effective when the elements are known beforehand. Since we only know the "nums" numbers beforehand, we can use heapify for that case. However, we can't use it effectively for elements in a stream.
for i in range(len(nums)):
heappush(self.heap, nums[i])
self.heap = nums
heapify(self.heap)
Complexities
Time Complexity:
Non-Heapify:
Pushing all items to the heap separately has a time complexity of O(len(nums) * log(len(nums)))
.
n
is the number of items added after initialization. Adding items after initialization has a time complexity of n * log(k)
.
The overall time complexity is O(len(nums) * log(len(nums)) + n * log(k)),
The theoretical time complexity is O(n * log(k))
.
Heapify:
Heapify has a time complexity of O(len(nums))
.
n
is the number of items added after heapify. Adding items after heapify has a time complexity of n * log(k)
.
The overall time complexity is O(len(nums) + n * log(k))
.
The theoretical time complexity is O(n * log(k))
.
The space complexity is O(k)
.
O(len(nums))
is insignificant compared to O(n * log(k))
as the number of added items increases.
Implementation— Keep adding elements till k + 1
- There is another way to write the code that keeps adding items until the total number of elements surpasses k.
- Let's imagine we are adding a new element. We add it to the heap, and when the size of the heap is k + 1, we remove the smallest element from the heap to balance it back to k.
- Bonus: Use the
add
method to add all existing values innums
to the heap, thus reducing the amount of code. - Always add the incoming value to the heap. If the heap contains more than
k
elements, remove the smallest element by popping the top item. - This will allow us to keep only the k largest elements in the heap. The top element will be the kth largest, which can then be returned.
k = 3
nums = [1,2,3]
# adding 1,2,3 before size reaches k
heap = [1,2,3]
# adding 4
heap = [1,2,3,4]
# size exceeded, remove the smallest to have the *k* largest elements
# remove 1 (smallest of the kLargestElements)
heap = [2,3,4]
Code
from heapq import heappop, heappush, heapify
class KthLargest:
def __init__(self, k: int, nums: List[int]):
self.k = k
self.heap = []
for number in nums:
self.add(number) # call add instead of heap's built in function
def add(self, val: int) -> int:
heappush(self.heap, val)
if len(self.heap) > self.k:
heappop(self.heap)
return self.heap[0]
Complexities
Time Complexity: O(n * log(k))
Space Complexity: O(k)
Dedications
To peak or not to peak
- Both of the solution’s for add above are doing the same thing, but in the way they do it is a little different.
- Keep adding and popping.
- Add the incoming value and remove the smallest element (if the heap contains more than
k
elements). This leavesk
largest items in the heap, and since it is a min heap, the smallest amongst them becomes theKth
largest. - Peaking and adding.
- We must be careful when both peaking and then adding. Before adding incoming items, we should check if they are useful for determining the Kth largest elements. For example, if K is 3 and the heap contains 2, 3, and 4, then an incoming item of 1 will never be included in the K largest values. Since the heap is a min heap of size K, and the incoming item is smaller than the smallest item in the heap, it is automatically smaller than the top K items and is discarded.
- For peaking and adding. We need to ensure that we do not pop out items when we still do not have k values in the heap. We don’t have to worry about that in the add method as we keep adding items till k + 1.
k = 3
heap = [2,3,4]
incoming stream = 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
k = 2
heap = [0]
incoming stream = 1
#make sure zero is not kicked out in favour of 1 before we have reached the size 2(k)
Add (and pop)
Peak and then add(and pop)
def add(self, val: int) -> int:
heappush(self.heap, val)
if len(self.heap) > self.k:
heappop(self.heap)
return self.heap[0]
def add(self, val: int) -> int:
sizeLessThanK = len(self.heap) < self.k
TopHeapValueWillNeverBeinTopKAgain = not(sizeLessThanK) and val > self.heap[0]
if sizeLessThanK:
heappush(self.heap, val)
elif TopHeapValueWillNeverBeinTopKAgain:
heappop(self.heap)
heappush(self.heap,val)
return self.heap[0]
But what about the time complexities
- They both will theoretically be the same as they both can have different worst cases. But in some application the peak and then add might have some benefits.
- Imagine you have an arcade machine and the score distribution is skewed towards smaller scores but you want the top 8 items in the score board. While there are incoming scores.
- It is better to
peek
first and then add, rather than adding and thenpopping
. For items that will eventually be discarded, the logarithmic complexity of adding and popping isO(log(k))
, whereas peeking first and not adding it into the heap isO(1)
.