Find the kth smallest element in an array.
Input: nums = [3,2,1,5,6,4], k = 3
Output: 3
Leetcode
Reframing 1.0
- This question asks us to find the kth smallest element in an array. For example, given the array [3,2,1,5,6,4] and k = 3, we want to find the 3rd smallest number (1st = 1, 2nd = 2, 3rd = 3).
- Let's consider a simpler value for k = 1, in order to build intuition for this question.
- In this case, the kth smallest element is equivalent to the 1st smallest element. Also known as the minimum.
- To find the minimum value in an array, we can sort the entire array and return the first element. However, this approach has a time complexity of O(nlog(n)) at best.
- Is there a better way to find the minimum value in an array?
- We are doing a-lot of extra work when sorting the whole array.
- One technique you might be familiar with is to traverse the entire array and use a variable to store the minimum value. Simply update this variable as you find smaller values in the array. The time complexity of that is O(N). Way better than
O(nlog(n))
. - Imagine the minimum value as a bucket. We only remove from the bucket when we encounter a smaller number, updating the value in the bucket.
Input: nums = [3,2,1,5,6,4], k = 1
Output: 3
def find_minimum(nums):
nums.sort()
return nums[0]
myList = [3, 2, 1, 5, 6, 4]
minimum = myList[0] # start with the first element as the minimum
for num in myList:
if num < minimum:
minimum = num # update the minimum if a smaller value is found
print(minimum) # output the minimum value
A step further
- Let's increase the value of k. When k equals 2, we are looking for the second smallest value. Building on the intuitions from previous section. To find the second smallest value in an array, you can sort the array and return the element at index 1.
- Is it necessary to sort the array? Sorting takes NlogN time. In the previous section, we optimized the process using the minimum variable. Can we apply a similar optimization here?
- We cannot store two items in a single variable, What do we do?
- List is a variable with multiple values, we can use a list to store two smallest items(or minimums).
- We can store the two smallest items in a list and keep adding elements from the array. If incoming elements are smaller than those in the list, we can simply replace the items in the list with the incoming items.
- We will try to maintain this condition.
- For naming sake we will call this list a bucket—A bucket of minimums.
- Let's examine some cases where we have an initial bucket and an incoming value.
- The incoming item(value of 1) is smaller than all elements in the bucket.
- The incoming item(value of 1 ) is than the maximum(value of 4 ) in the bucket
- The incoming item(value of 5 )Smaller than none of the items in the array
- Let's see if the invariance holds for all values of k.
- A casual Proof: By loop invariance
- We need to demonstrate that, at every iteration of the loop, the bucket always contains the k smallest elements from the array that has been traversed. Thus, at the end of the array traversal, the bucket will contain the k smallest elements in the array.
- Initialization
- To begin, we create a bucket of size k and add the first k items from the array to it. Since both the array and the bucket are of size k, we can conclude that the bucket contains the k smallest items from the array that have been traversed.
- Maintenance
- We need to demonstrate that at the end of each iteration, the bucket still holds the k smallest items from the traversed array.
- A test of the element at k + 1.
- Players
e
element to addr
random element in the bucketm
maximum item in the bucket- If element
e
is greater than all elements in the bucket. - No change, invariance holds
- If the element
e
is less than any random item in the bucketr
- To maintain the size of the bucket at k and have the smallest k elements in the bucket, we need to remove y and make room for e.
- Proof by contradiction .
- If
e
is smaller thany
, it will also be smaller than the maximum (assuming that the maximum is noty
). - Doing so will result in the maximum item still being in the bucket. Why is that a problem. If y is smaller than the maximum. We have kicked a smaller item out of the bucket which might be have been our k smallest elements, the variance will not hold.
- So can we remove the maximum every-time.
- If you have to remove an element to make room for a smaller element, it's best to remove the maximum. This is because the maximum is always the largest element in the bucket and will always be larger than all other elements in the bucket, including the incoming element. By removing the maximum, we ensure that the size of the bucket remains k and that the invariance holds.
- Termination
- At the end of the array traversal, the bucket contains the k smallest elements in the array. Therefore, the bucket always contains the k smallest items from the array that has been traversed. The invariance holds for all iterations of the loop.
- Building the solution.
- Start by initializing the minimum bucket of size k as "minimumBucket.”
- Iterate over the rest of the list.
- If the item coming in is smaller than any random item.
- If the number coming in is smaller than any. Find the max Item.
- Remove the max item and append the number coming in
- Where will the kth smallest be, the largest element in the minimumBucket.
- Can sort and return the last element
Ok(log(k))
- Or return the maximum from the bucket O(k)
Input: nums = [3,2,1,5,6,4], k = 2
Output: 2
def find_minimum(nums):
nums.sort()
return nums[1] # k = 2
The list always contains the k (2 in this case) smallest items from the array that has been traversed (looked at).
bucket = [4,2]
number to add = 1
bucket = [2,1]
number 4 is kicked out
bucket = [2,4]
number to add = 3
bucket = [2,3]
number 4 is kicked out
bucket = [2,4]
number to add = 5
bucket = [2,4]
No change...
The list always contains the k smallest items from the array that has been traversed(looked at)
bucket = [2,4]
number to add = 5
bucket = [2,4]
No change...
bucket = [4,2]
number to add(e) = 1
smaller than 2(random) also smaller than 4(maximum)
bucket = [4,1] # invarience doesn't hold
The list always contains the k smallest items from the array that has been traversed.
minimumBucket = []
for i in range(k):
minimumBucket.append(nums[i])
for comingInIndex in range(k,len(nums)):
smallerThanAny = False
numberComingIn = nums[comingInIndex]
for i in range(len(minimumBucket)):
if numberComingIn < minimumBucket[i]:
smallerThanAny = True
break
if smallerThanAny:
maximumInBucket = float("-inf")
maxIndex = -1
for m in range(len(minimumBucket)):
if maximumInBucket < minimumBucket[m]:
maximumInBucket = minimumBucket[m]
maxIndex = m
minimumBucket.pop(maxIndex)
minimumBucket.append(numberComingIn)
minimumBucket.sort()
return minimumBucket[-1]
return max(minimumBucket)
Implementation 1.0
def kthSmallest(self,nums,k):
minimumBucket = []
for i in range(k):
minimumBucket.append(nums[i])
for comingInIndex in range(k,len(nums)):
smallerThanAny = False
numberComingIn = nums[comingInIndex]
for i in range(len(minimumBucket)):
if numberComingIn < minimumBucket[i]:
smallerThanAny = True
break
if smallerThanAny:
maximumInBucket = float("-inf")
maxIndex = -1
for m in range(len(minimumBucket)):
if maximumInBucket < minimumBucket[m]:
maximumInBucket = minimumBucket[m]
maxIndex = m
minimumBucket.pop(maxIndex)
minimumBucket.append(numberComingIn)
return max(minimumBucket)
Complexities
Time Complexity: O(nk)
Space Complexity: O(k)
Reframing 2.0
- The time complexity of the code above is O(NK). This can still be significant, especially if the value of K increases.
- Previously we proved that the maximum will always be taken out when the incoming item is smaller than any items in the bucket, to maintain the invariance property.
- Do we need to check all elements? —A greedy approach.
- For an incoming new element. We only need to check the maximum number in the bucket. (Instead of all the numbers). Why
- If the incoming element is smaller than the maximum, then it will also be smaller than an some element in the array if that element is not the maximum.
- If the incoming element is greater than the maximum then it won't be smaller than any item in the bucket.
- Thus, we can simplify the code significantly by removing the maximum and adding the new item if it is smaller than the maximum. If the incoming item is greater than the maximum, we do not need to remove any element.
- We are removing the maximum every-time. We iterate over the entire bucket to find the maximum value, which takes
O(len(bucket)) = O(k)
and then extract it. Can we do better? - Let's consider a different perspective. Instead of iterating through the entire bucket, what if we keep a sorted bucket and check the maximum as the last element of that bucket? This approach could work, but it would cost us O(klog(k)) at each iteration to maintain the sortedness. This cost is more than O(k). Consequently, the total cost would increase to
O(N( klog(k)))
, which is a little more expensive than O(NK) for some cases. However, this approach could provide valuable insights for optimizing the solution further. Let's implement it and see how we can use this insight in the next section.
The list always contains the k smallest items from the array that has been traversed.
Complexities
Time Complexity: O(n(klog(k))
Space Complexity: O(k)
Implementation 2.0
class Solution:
def kthSmallest(self,nums, k):
minimumBucket = []
for i in range(k):
minimumBucket.append(nums[i])
# Time : O(klog(k))
minimumBucket.sort()
for comingInIndex in range(k,len(nums)):
smallerThanMax = False
maxIndexOfTheBucket = minimumBucket[-1]
numberComingIn = nums[comingInIndex]
if numberComingIn < maxIndexOfTheBucket:
smallerThanMax = True
if smallerThanMax:
minimumBucket.pop() # last element = max
minimumBucket.append(numberComingIn)
# Time : O(klog(k))
minimumBucket.sort()
return minimumBucket[-1]
Reframing 3.0
- Up to this point, we have been iterating over the array, adding smaller qualified items while to make room for smaller items removing maximums from the bucket.
- We are popping and adding new items to the bucket, we need to keep it sorted. However, sorting the entire array is too expensive since we only need the maximum value. Instead, we can use a heap data structure, which we will discuss in detail in this roadmap.
- The heap data structure allows us to maintain partial sorting of the data. Which reduces the cost of sorting while still allowing us to access the maximum or minimum values. It is a good middle ground between the cost of maintaining a complete order and the cost of searching through random chaos.
- Comparisons visualized.
- When adding a new item 23 and returning the maximum value, a sorted array takes
O(klog(k))
time instead oflog(k)
in a heap. We will simulate adding a new number to both data structures to see how they react. For a detailed explanation of heaps, see A Heap Upclose: Simulating a Max Heap. - A sorted array (Play the video).
- Will have to sort the whole array to find the actual position of the element, the simulation below is using Merge sort (a fast sorting algorithm).
- A heap (Play the video)
- Although not fully sorted, a heap data structure allows us to access the greatest or smallest element in constant time (O(1)). Additionally, it can provide the maximum element again after the addition of a new element in O(height of the heap) time, which is faster than an array that needs to be completely re-sorted.
- One thing to note about Python is that we only have access to min heaps (apart from coding them from scratch is using the default
heapq
module ). To convert a min heap into a max heap, we can multiply all elements going into the heap by -1. This brings all the original maximum values to the top because as they become the most minimum. - For example, to find the largest element in a collection of items
[1, 7, 8]
, we can convert the array to[-1, -7, -8]
. The largest element will then be at the top of the heap, which is8
, the minimum-8
in the context of a min heap. - When using the values for comparison we multiply it again by -1 to neutralize the effect of the previous multiplication.
Implementation 3.0
from heapq import heapify, heappop,heappush
class Solution:
def kthSmallest(self,nums, k):
minimumBucket = []
for i in range(k):
heappush(minimumBucket,-1 * nums[i])
# Time : O(N)
for comingInIndex in range(k,len(nums)):
smallerThanMax = False
maxOfTheBucket = -1 * minimumBucket[0]
numberComingIn = nums[comingInIndex]
if numberComingIn < maxOfTheBucket:
smallerThanMax = True
# Time : O(log(k))
if smallerThanMax:
heappop(minimumBucket)
heappush(minimumBucket,-1 * numberComingIn)
return -1 * minimumBucket[0]
Complexities
Time Complexity: O(nlog(k))
Space Complexity: O(k)
Iterating over n items while having at most k item in the heap. Adding and removing from the heap will be O(logK) for n times in the worst case it will be O(nlog(k)).