Return the kth smallest element in an unsorted array.
array = [7,2,1,9,3,6,8] k = 2
Brainstorming
- The simplest way to achieve this is by sorting the array using an efficient sorting algorithm and returning the Kth element from the start. This approach has a time complexity of O(nlogn).
- There are other ways to find the Kth smallest element in an array. The second method involves using a max heap to maintain a collection of the Kth smallest elements. We iterate over the array, maintaining the Kth smallest element with each iteration by pushing the largest element out to make way for a smaller one. When the whole array is traversed, we have the Kth smallest element and return the element on the top of the heap which happens to be the Kth smallest element. We used this method in our previous document on finding the Kth smallest element in an array. This method has a time complexity of O(nlog(k)).
- There is another method called quickSelect, which is similar to the Quicksort algorithm. In this tutorial, we will explain how to implement quickSelect. There are two parts to this algorithm: finding the correct location of the pivot and using that information to reduce the range where the Kth smallest element can be found.
sorted = [1,2,3,6,7,8,9], k = 2
output = 2 at index 1
array = [7,2,1,9,3,6,8]
array.sort()
output = array[k-1]
class Solution:
def kthSmallest(self,nums, k):
nums = arr
minimumBucket = []
for i in range(k):
heappush(minimumBucket,-1 * nums[i])
# Time : O(N)
for comingInIndex in range(k,len(nums)):
smallerThanMax = False
maxOfTheBucket = -1 * minimumBucket[0]
numberComingIn = nums[comingInIndex]
if numberComingIn < maxOfTheBucket:
smallerThanMax = True
# Time : O(log(k))
if smallerThanMax:
heappop(minimumBucket)
heappush(minimumBucket,-1 * numberComingIn)
return -1 * minimumBucket[0]
More about Quick-select
- We choose a pivot and move it to its original location.
- How will finding the correct location of the pivot help us? It is in how we will find the pivots correct location.
- To find the original location, we move all the elements less than it to one side and all the elements greater than it to the other side.
- After we have done that. What can we know about the kth smallest element by dividing the array into two halves.
- If the
pivotIndex + 1
is equal tok
, then thekth
smallest element has been found. - Suppose Kth Smallest = 5, since 7 is also the 5th smallest element we have found it.
- If the
pivotIndex + 1
(4) is greater thank
then thekth
smallest element is in the left half of the array. - Imagine Kth Smallest = 2. We know that it will be in the lesser half since 7 is the 5th smallest element and all elements smaller than it will be in the lesser half.
- If the
pivotIndex + 1
(4) is less thank
, then thekth
smallest element is in the right half of the array. - Suppose Kth Smallest = 6. We can infer that it will be in the greater half of the elements, since 7 is the 5th smallest element and all elements greater than 7 will be in the greater half.
- Why
pivotIndex + 1
? ThepivotIndex
is zero-index based, while the kth element is one-index based. - We repeatedly keep partitioning the array around the pivot until the pivot is in its original location. With each partition, due to the reducing possible location there is a more likely chance that our pivot and Kth smallest element will be in the same location. Once this is the case, we have found our Kth smallest element.
- Imagine that the K value is less than the pivot at index 4. In this case, we would look for the K value in the "less than" half. This reduces the worst-case scenario for finding the pivot from seven iterations to four iterations.
- After selecting 2 as the pivot and determining its correct location, we separate the halves again. With each partition, the number of elements we must examine to find the correct location of the pivot decreases. When we are left with only one element, we can be certain that it is the Kth smallest element. At this point, the recursion stops. If we haven't found the Kth smallest element before reaching this point, then we have definitely found it after the recursion stops.
- Imagine k again was less than the pivot, we will have only one element left. We have for sure found our pivot.
- Comparisons decrese with each partition
Code for the partitions
- Calculate the pivot and see what its relationship with the Kth smallest is
- We calculate the pivot again by passing the left and right pointers to the recursive function and creating an abstract partition of the array.
- The k value remains the same as long as the index values remain consistent.
class Solution:
def kthSmallest(self,array, left, right, k):
onlyOneElement = left == right
if onlyOneElement:
return array[right]
pivot = self.pivot(array,left,right) # Section below
pivotAndKthSame = k == pivot + 1
kthSmallerThanPivot = k < pivot + 1
kthLargerThanPivot = k > pivot + 1
if pivotAndKthSame:
return array[pivot]
elif kthSmallerThanPivot:
leftBound = left
rightBound = pivot-1
return self.kthSmallest(array,leftBound,rightBound,k)
elif kthLargerThanPivot:
leftBound = pivot+1
rightBound = right
return self.kthSmallest(array,leftBound,rightBound,k)
A treatises into finding the correct location of the pivot
- Our goal is to find the correct position of the pivot.
- What will be our pivot?
- There are different ways to approach this. Some people choose the pivot to be the rightmost index, while others choose the leftmost index. In our case, we will choose the leftmost index (0) as the pivot.
- How will we find the correct location of the pivot?By putting all the elements greater than the pivot on one side, and all the elements smaller than the pivot on the other side, we can determine the correct location for the pivot. This holds true regardless of the sortedness of the halves.
- To find the correct position of the pivot, start by considering the entire array as the initial range where the pivot can be located. We reduce the range from the left by swapping the left pointer(pivot) with elements that are less than the pivot. Swapping brings them to the smaller half as the pivot moves rightward. We reduce the range from the right by moving the right pointer to the left if the element at the right pointer is larger than the pivot. This means we have discarded that value as a possible pivot and placed it in the larger half. We converge the left and right pointers until the range for where the pivot can be becomes one, which is the original location of the pivot.
- Diving into the details.
- To reduce the range from the left, compare the left pointer's value with the one at left + 1, and swap their positions if the one on the left + 1 is smaller.
- Case 1: If
array[left] >= array[left+1]
, then left + 1 is smaller or equal (we are assuming the equal elements to the pivot will be in the smaller half) - Lets see it in action. For an array
[7,2,X,X,X,8]
where X’s are random elements. - We need to swap 7 and 2 to move 2 to the correct side of the pivot. Then, we can move the left pointer to the right since 2 is now in the correct place relative to the pivot, shortening the range—moving the pivot closer to its original location
- A casual proof on shortening the range
- Since the pivot can't be outside the range of the array, it also can't be on the left side of the left pointer.
- In an array, the pivot location can either be at the initial left pointer or to its right. For example, in the array
[1(L), 2, 3, 4]
, the pivot will remain in its original position, as all elements greater than it are in the larger half already. However, in the case of[2(L), 1, 3, 4]
, the pivot will have to move to the right. - If
array[left]
(7)>= array[left+1]
(2): - The pivot is not on the left side of the left pointer.
- The pivot is not in its original location(since there are elements to the right of it smaller than it i.e 2).
- It must belong to the right side.
- We can reduce the range of the pivot on the left side by moving the left pointer one position to the right.
- Case 2: if
array[left]
(7)< array[left+1]
(10), thenleft + 1
is larger. - Lets take another example of an array. The element at index left + 1 with a value of 10 is currently on the correct side, we cannot be sure if it will remain on the right side of the pivot when the pivot is restored to its original location. We can’t be fully sure how to reduce the range with certinity. What can we do? we leave it for now things will become clear in the next points.
- To reduce the range from the right, compare the left pointer with the right pointer. The right pointer can be moved more freely to reduce the range from the right since it doesn't involve any swaps. range is reduced by moving the elements on the right pointer to the larger side by moving the rigth pointer to the left.
- If
array[left] < array[right]
, then the element on the right pointer is greater than the left pointer (pivot). - The number 8 (on the right pointer) is greater than the pivot 7 (on the left pointer). Therefore, we can be sure that the pivot belongs on the right side, regardless of where the pivot ends up.
- When starting from the rightmost index, the pivot cannot be on the right side of the right pointer. It will always be on the left side or where the right pointer is(in the case of [4,3,2,1]).
- When the value at
array[right]
(8) is greater than the value atarray[left]
(7), it means that the value atarray[right]
(8) will be placed on the right side of the pivot. - The pivot cannot be on the right side of the right pointer.
- Can’t be on the location of where the right pointer is .
- Pivot will need to be larger than the value at the right pointer to be in its place. i.e [4(L),3,2,1(R)]) 4 willl have to move all the way to the position to the right pointer. Since it is larger than the pivot. We can be sure the pivot will not be on the location of 8
- Thus, it is safe to shift the right index to the left.
- Case 2 if
array[left] >= array[right]
right is smaller or equal to the left pointer(pivot). - We cannot move the right pointer because 4 cannot be on the right side of the pivot. How about another idea? What if we swap it with 7? This will place 7 in the incorrect location and shorten the range to 1. This doesn’t work, we will leave it for now and continue with the partitioning process.
- Locations where we were not able to shorten range.
- From the moving the left pointer to the right: In the case below we couldn’t be sure whether 10 will be on the right side of the pivot when the pivot is restored to its original location. We had left it due to this reason
array[left] < array[left+1]
- In another case of moving the right pointer to the left: We are also not sure if 4 should be on the left side of the pivot. If we swap it with 7, it will end up in the wrong location and shorten the range to 1. case
array[left] >= array[right]
- The only way these conditions are true,
- When both of these conditions are false, the conditions above will be true(as they will be their else cases). If both of these conditions are false what will happen, we will not be able to shorten the range from either the right or the and will be stuck in an infinite loop.
- We will rectify this by swapping the value of left + 1 with that of right.
- The statement
array[left] < array[left+1]
will no longer be true. In the next iteration, L(P) will be able to move, since this statement will become true:leftOfThePivot = array[left] >= array[left+1]
. - In the previous case where
array[left] >= array[right]
(case 10), the same behavior occurs: the condition is not true any more and allows forrightOfThePivot = array[left] < array[right]
, enabling us to move the right pointer. In the subsequent iterations after the swap, we can choose to shorten the range from either side, will depend on how we order our if else statements. - If we cannot move the range from either the left or right, it means that the element at left+1 is larger and the element at right is smaller, than pivot. In this scenario, we will swap these elements to break the deadlock and continue moving the range towards a converging point.
- How will we know we have found the pointer? When L == R
- When we have moved all the elements belonging to the right side by reducing the range from the right and the elements on the left side by moving the range from the left.
[7,2,1,9,3,6,8]
[X,X,X,X,7,X,X]
^ original position
#7 is the fifth smallest element and will be at index 4.
[7,2,1,9,3,6,8]
^ ^
L(P) R (initial range)
[2,1,6,3,7,9,8]
^P (original position(4th index))
[7,2,X,X,X,8]
^ ^
L(P) R
[7,2,X,X,X,8]
^ ^
L(P) R
^pivot can't be on the left side
[7,2,X,X,X,8]
^ ^
L(P) R
[2,7,X,X,X,8]
^ ^
L(P) R (swap 2 with 7)
[7,10,X,X,X,4]
^ ^
L(P) R
[7,2,X,X,X,8]
^ ^
L(P) R ^pivot not possible on the right
[7,2,X,X,X,8]
^ ^
L(P) R
[7,2,X,X,X,8]
^ ^
L(P) R (reduce the range from the right)
[7,2,X,X,X,4]
^ ^
L(P) R
[4,2,X,X,X,7]
^
L(P)
^
R
[7,10,X,X,X,4]
^ ^
L(P) R
[7,10,X,X,X,4]
^ ^
L(P) R
array[left] < array[left+1] # (in the case of 10)
array[left] >= array[right] # (in the case of 4)
leftOfThePivot = array[left] >= array[left+1]
rightOfThePivot = array[left] < array[right]
[7,10,X,X,X,4]
^ ^
L(P) R (swap 4 and 10)
[7,4,X,X,X,10]
^ ^
L(P) R
[4,7,X,X,X,10]
^ ^
L(P) R
[7,10,X,X,X,4]
^ ^
L(P) R (swap 4 and 10)
[7,4,X,X,X,10]
^ ^
L(P) R
[7,4,X,X,X,10]
^ ^
L(P) R
Code for finding the pivot
def pivot(array,left,right):
while left < right:
leftOfThePivot = array[left] >= array[left+1]
rightOfThePivot = array[left] < array[right]
if leftOfThePivot:
swap(array,left,left+1)
left+=1
elif rightOfThePivot:
# don't need a swap here since we are not moving the pivot
right-=1
else: # array[left] < array[left+1] and array[left] >= array[right]
swap(array,left+1,right)
return array
def swap(array,a,b):
temp = array[a]
array[a] = array[b]
array[b] = temp
SandBox
Time complexity
Info: "n" represents the most number of comparisons needed to find the pivot. This value is derived from the while loop that searches for the pivot.
- Worst Case O(n^2).
- When we chose the kth value such that we are able to reduce the partion by only one.
- The expression
n + (n-1) + (n-2) + (n-3) + (n-4)
converges to(n^2 + n)/2
. The time complexity of this convergence is O(n^2). - Average case O(N)
- We are assuming the pivot will be in the middle of the array(given the random distribution of the values), this will half the array with each recursion.
- The sum of this series will never exceed 2*N:
n + n/2 + n/4 + n/8 + n/16
converges to 2n - 1 which theoratically in time complexity is n. (Link)- Visually explained, after 1/64, all partitions can fit within the white space. The total area for
n/2 + n/4 + n/8 + n/16
combined will never exceedn
.
Find Kth largest
To determine the Kth largest value, you can use the Kth smallest function and then map out the corresponding K smallest values with respect to the Kth largest. This will provide the answer you are looking for.
class Solution:
def findKthLargest(self, nums: List[int], k: int) -> int:
kthSmallest = len(nums) - k + 1
return self.kthSmallest(nums,0,len(nums)-1,kthSmallest)
Similar Questions
Here is a link to use quickSelect in similar questions like this: Link
Dedications
Test case provided by: Truly understanding Algorithms