Prerequisites: kth smallest element in an array
Question
You're given an array that is almost sorted: each of the n
elements may be misplaced by no more than k
positions from its correct sorted position. Find a space- and time-efficient algorithm to sort the array.
Example 1
Input = [6, 5, 3, 2, 8, 10, 9], k = 3
Output = [2, 3, 5, 6, 8, 9, 10]
Example 2
Input = [10, 9, 8, 7, 4, 70, 60, 50], k = 4
Output= [4, 7, 8, 9, 10, 50, 60, 70]
GeeksForGeeks
Sorting using Insertion Sort
Insertion Sort is a simple sorting algorithm that works by comparing each element with the elements before it and swapping it backward as necessary. It is efficient for small data sets, and its worst-case time complexity is O(n^2).
The simplified version of Insertion Sort is as follows:
- Start with the second element in the array.
- Compare the second element with the first element. If the second element is smaller, swap them.
- Move to the third element and compare it with the second and first elements. Swap it with the appropriate element if necessary.
- Continue this process until the end of the array is reached.
Here is an example of how the simplified Insertion Sort works:
Input: [5, 2, 4, 6, 1, 3]
sorted = [5] [2,4,6,1,3] = unsorted
sorted = [2,5] [4,6,1,3] = unsorted
sorted = [2,4,5] [6,1,3] = unsorted
sorted = [2,4,5,6] [1,3] = unsorted
sorted = [1,2,4,5,6] [3] = unsorted
sorted = [1,2,3,4,5,6] [] = unsorted
In this example, the second element (2) is compared with the first element (5) and swapped, resulting in the array [2, 5, 4, 6, 1, 3]. The third element (4) is then compared with the second element (5) and swapped, resulting in the array [2, 4, 5, 6, 1, 3]. This process continues until the array is fully sorted.
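A minimal sketch of this swap-based simplification (the function name is mine; the shifting variant coded later in the article does the same work with fewer writes):

```python
def insertion_sort_swaps(nums):
    # Walk each element left via adjacent swaps until it is no
    # longer smaller than its left neighbour.
    for i in range(1, len(nums)):
        j = i
        while j > 0 and nums[j - 1] > nums[j]:
            nums[j - 1], nums[j] = nums[j], nums[j - 1]
            j -= 1
    return nums

print(insertion_sort_swaps([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```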
A proof of correctness
At the start of the algorithm, we can be sure that nums[0]
is a sorted array of size 1. We grow the sorted prefix one element at a time until it covers all n
elements while maintaining the sorted invariant. This can be proven by induction on the size of the prefix.
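As an illustrative sketch, the invariant can be asserted at the top of every iteration (the assert is only for demonstration, not part of the algorithm):

```python
def insertion_sort_checked(nums):
    for i in range(1, len(nums)):
        # Invariant: nums[:i] is already sorted before inserting nums[i].
        assert all(nums[j] <= nums[j + 1] for j in range(i - 1)), "prefix not sorted"
        j = i
        while j > 0 and nums[j - 1] > nums[j]:
            nums[j - 1], nums[j] = nums[j], nums[j - 1]
            j -= 1
    return nums

print(insertion_sort_checked([6, 5, 3, 2, 8, 10, 9]))  # [2, 3, 5, 6, 8, 9, 10]
```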
A look at the time complexities of insertion sort
- Worst case: O(n^2)
- Best case: O(n)
- For k-sorted arrays?
Input: [11, 7, 5, 3, 2]
sorted = [11] [7,5,3,2] = unsorted (1 comparison: 7 with 11)
sorted = [7,11] [5,3,2] = unsorted (2 comparisons: 5 with 11, 7)
sorted = [5,7,11] [3,2] = unsorted (3 comparisons: 3 with 11, 7, 5)
sorted = [3,5,7,11] [2] = unsorted (4 comparisons: 2 with 11, 7, 5, 3)
sorted = [2,3,5,7,11] [] = unsorted
The number of comparisons per insertion grows on the order of n, which gives the O(n^2) worst case.
Note: swaps are not the same as comparisons.
Input: [2,3,5,7,11]
sorted = [2] [3,5,7,11] = unsorted (1 comparison 3 with 2)
sorted = [2,3] [5,7,11] = unsorted (1 comparison 5 with 3)
sorted = [2,3,5] [7,11] = unsorted (1 comparison 7 with 5)
sorted = [2,3,5,7] [11] = unsorted (1 comparison 11 with 7)
sorted = [2,3,5,7,11] [] = unsorted
Input: [5,3,2,7,11] k = 2
sorted = [5] [3,2,7,11] = unsorted (1 comparison: 3 with 5)
sorted = [3,5] [2,7,11] = unsorted (2 comparisons: 2 with 5, 3; the most possible for k = 2)
sorted = [2,3,5] [7,11] = unsorted (1 comparison: 7 with 5)
sorted = [2,3,5,7] [11] = unsorted (1 comparison: 11 with 7)
sorted = [2,3,5,7,11] [] = unsorted
No matter how many elements there are, each element sits at most k positions away from its sorted position. For example, if k = 2, each insertion needs at most 2 comparisons with shifted elements, because no element can be offset by more than 2.
Another example with a larger array:
k = 2
[1, 4, 5, 2, 3, 7, 8, 6, 10, 9]
sorted = [1] [4, 5, 2, 3, 7, 8, 6, 10, 9] = unsorted (1 comparison: 4 with 1)
sorted = [1, 4] [5, 2, 3, 7, 8, 6, 10, 9] = unsorted (1 comparison: 5 with 4)
sorted = [1, 4, 5] [2, 3, 7, 8, 6, 10, 9] = unsorted (2 comparisons: 2 with 5 and 4)
sorted = [1, 2, 4, 5] [3, 7, 8, 6, 10, 9] = unsorted (2 comparisons: 3 with 5 and 4)
sorted = [1, 2, 3, 4, 5] [7, 8, 6, 10, 9] = unsorted (1 comparison: 7 with 5)
sorted = [1, 2, 3, 4, 5, 7] [8, 6, 10, 9] = unsorted (1 comparison: 8 with 7)
sorted = [1, 2, 3, 4, 5, 7, 8] [6, 10, 9] = unsorted (2 comparisons: 6 with 8 and 7)
sorted = [1, 2, 3, 4, 5, 6, 7, 8] [10, 9] = unsorted (1 comparison: 10 with 8)
sorted = [1, 2, 3, 4, 5, 6, 7, 8, 10] [9] = unsorted (1 comparison: 9 with 10)
sorted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [] = unsorted
With k = 2, at most 2 comparisons per element are needed for the array to be sorted again.
At most k comparisons will be made n times, where n is the number of elements in the array. Therefore, the time complexity of insertion sort on a nearly-sorted array with an offset of k will be O(N*K), and the space complexity will be O(1). While this may be sufficient for passing a test, for perfection-seeking individuals like us, there is room for improvement ;).
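As an informal check of the O(N*K) claim, an instrumented insertion sort can count how far any single insertion had to shift (the helper name is mine, not from the original problem):

```python
def insertion_sort_count(nums):
    # Sorts nums in place and returns the maximum number of shifts
    # any single insertion needed (bounded by k for a k-sorted array).
    max_shifts = 0
    for i in range(1, len(nums)):
        key, j, shifts = nums[i], i - 1, 0
        while j >= 0 and nums[j] > key:
            nums[j + 1] = nums[j]
            j -= 1
            shifts += 1
        nums[j + 1] = key
        max_shifts = max(max_shifts, shifts)
    return max_shifts

# The k = 2 example from above: no insertion shifts more than k positions.
print(insertion_sort_count([1, 4, 5, 2, 3, 7, 8, 6, 10, 9]))  # 2
```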
Let’s code the algorithm
- Start with the second element (toAdd = 1) and iterate rightward until len(nums).
- The element to insert is at index toAdd; the first comparison is with index toAdd - 1.
- While the element to insert is smaller than the element on its left, we keep shifting that larger element one position to the right.
- Intermediate swap
- The position for the element to insert changes at every shift until it reaches its final position.
- For example, after the first shift, the slot for the value 0 moves from index 3 to index 2. We therefore cannot keep using the index obtained from the outer for loop (toAdd); we need an intermediate position tracking where the element currently belongs.
- We have reached the final position when no element to the left is greater than it; then we drop the element into place.
- I left a bug in the code below. Can you find it? The code returns [5,5,5,5] for the input [5,2,3,1] instead of [1,2,3,5]. Why?

for toAdd in range(1, len(nums)):
    toCompare = toAdd - 1
    # Intermediate swaps: inserting the value 0 into [1,2,3]:
    # 0 with 3 and shift = [1,2,0,3]
    # 0 with 2 and shift = [1,0,2,3]
    # ...               = [0,1,2,3]
    while toCompare >= 0 and nums[toCompare] > nums[toAdd]:
        intermediatePosition = toCompare + 1
        nums[intermediatePosition] = nums[toCompare]
        toCompare -= 1
    nums[intermediatePosition] = nums[toAdd]  # outside the while loop

- The value at nums[toAdd] keeps changing as larger elements are shifted over it, so we cannot use it inside the while loop and keep the logic correct. Instead of

while toCompare >= 0 and nums[toCompare] > nums[toAdd]:

do

key = nums[toAdd]
while toCompare >= 0 and nums[toCompare] > key:

and instead of

nums[intermediatePosition] = nums[toAdd]

do

nums[intermediatePosition] = key

- Final Code
from typing import List

class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        for toAdd in range(1, len(nums)):
            key = nums[toAdd]
            toCompare = toAdd - 1
            while toCompare >= 0 and nums[toCompare] > key:
                intermediatePosition = toCompare + 1
                nums[intermediatePosition] = nums[toCompare]
                toCompare -= 1
            # drop the saved key into the gap
            nums[toCompare + 1] = key
        return nums
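The same routine as a standalone function (the snake_case name is mine), checked against the two examples from the problem statement:

```python
def sort_array(nums):
    # Shift larger elements right, then drop the saved key into the gap.
    for to_add in range(1, len(nums)):
        key = nums[to_add]
        to_compare = to_add - 1
        while to_compare >= 0 and nums[to_compare] > key:
            nums[to_compare + 1] = nums[to_compare]
            to_compare -= 1
        nums[to_compare + 1] = key
    return nums

print(sort_array([6, 5, 3, 2, 8, 10, 9]))        # [2, 3, 5, 6, 8, 9, 10]
print(sort_array([10, 9, 8, 7, 4, 70, 60, 50]))  # [4, 7, 8, 9, 10, 50, 60, 70]
```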
Using the k offset to our advantage: an analysis
Let's try another sorting algorithm, merge sort, and examine its time complexity. Spoiler alert: it will be O(NlogN). You can see from the complexity that k has no effect on the algorithm: the time complexity of merge sort remains the same even on a nearly sorted array. Unlike with insertion sort, we cannot exploit the characteristics of the nums
array being nearly sorted (with k offset values) to reduce the complexity, the way insertion sort dropped from O(N^2) to O(NK). This leaves the question: which is better? Comparing O(NlogN) to O(NK), if the value of k is small, insertion sort might be more effective than another sorting algorithm like merge sort. However, if the k value is large, merge sort might provide a better outcome.
Is there a better way to exploit the nearly sortedness of the array?
- For a nearly sorted array, each element only needs to be within k positions of its sorted position. To take advantage of this property, we can maintain a window of k + 1 elements, sort it, take out its smallest element, and slide the window forward. We only need k + 1 elements in the window because an element can be offset by at most k positions from its sorted position.
- For example, if k is 2 and we start from the left at i = 0, the element that belongs at index 0 of the sorted array can be at index 2 at most. Thus we include the elements at indices 0, 1, and 2 in our window. This is repeated for all n elements on windows no larger than k + 1, at a sorting cost of O(KlogK) per element.
- Therefore, the total sorting cost will be N*(KlogK). Note that we re-sort the window every time we slide in a new element, which is a lot of repeated work. We will also need O(N) space for the output.
- We only need to find the minimum value among k + 1 values. Fully sorting the array every time leads to unnecessary work. Is there a way to reduce the workload while still getting the up-to-date minimum value in the array segment without sorting it every time? Yes, there is: heaps.
- Before delving into heaps, let's revisit sorting the entire array outright. Some might argue it's better to use a standard sorting algorithm, such as merge sort. Merge sort has a worst-case time complexity of O(NlogN), which beats N*(KlogK) as the value of k increases. However, merge sort does not take advantage of the characteristics of the dataset (the k offset of values), as evidenced by its time complexity of NlogN, which is not affected by k. Therefore, it may not be what we are looking for. On to heaps.
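As a rough comparison of the two cost terms (constant factors ignored, and the N = 1,000,000 figure is an arbitrary choice of mine, so this is only indicative):

```python
import math

def costs(n, k):
    # Cost terms for insertion sort on a k-sorted array vs. merge sort.
    return n * k, n * math.log2(n)

for k in (2, 3, 4, 20):
    nk, nlogn = costs(1_000_000, k)
    winner = "insertion O(NK)" if nk < nlogn else "merge O(NlogN)"
    print(f"k={k:2d}: N*K={nk:.0f}  N*log2(N)={nlogn:.0f}  -> {winner}")
```

With N = 1,000,000, log2(N) is about 20, so insertion sort's N*K term wins for small k and merge sort takes over once k grows past that.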
- Let's discuss how heaps can assist us in constructing a sorting algorithm for an array with k-offset values.
- To avoid sorting the window of size k + 1 each time, we can store it in a min heap. Since we only need the minimum value in that segment, we don't need to sort the entire segment; heaps save us that extra work. We only need k + 1 elements because an element can only be offset by at most k positions from its sorted position. After initializing the heap, the value that belongs at index 0 of the sorted array will be at the top of the heap. We can pop it and place it at index 0 of the merged array.
- We can initialize the heap with k+1 elements using heapify, as this will be cost-effective.
- To obtain the remaining elements to pass through our heap, we need to iterate from k+1 to len(nums). Each time, we pop the minimum element from the heap, add it to the merged array, and insert a new element into the heap. The size of the heap never exceeds k + 1.
- The first pop will remove the minimum value among indices 0 to 2 (given k is 2), solidifying its position at the first index of the newly merged array. We will continue adding elements to the merged array as we move to the right. After the i-th pop, the element belonging to the i-th position will be popped and added to the merged array.
- After popping and pushing len(nums) - (k + 1) times, we will be left with a merged (sorted) array of len(nums) - (k + 1) items and a heap of size k + 1. We can then pop all the remaining items from the heap until it's empty, as these items belong at the very end of the list. Popping from the heap until it's empty yields the minimum value needed for each remaining index. When we return the merged array, all the elements will be in the correct place. The time complexity of this approach is O(n*log(k)), as popping and pushing items on the heap takes log(k) time, and we do it n times. We will also need O(N) for space complexity.
merged = []
sliced = nums[:k+1]
sliced.sort()
for i in range(k+1, len(nums)):
    merged.append(sliced.pop(0))
    sliced.append(nums[i])
    sliced.sort()
return merged + sliced
from heapq import heappop, heappush, heapify
class Solution:
    def nearlySorted(self, nums, k):
        merged = []
        heap = nums[:k+1]
        heapify(heap)
        for i in range(k+1, len(nums)):
            merged.append(heappop(heap))
            heappush(heap, nums[i])
        while heap:
            merged.append(heappop(heap))
        return merged
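The heap approach can also be exercised as a standalone function (the snake_case name is mine) against the examples from the problem statement:

```python
from heapq import heapify, heappop, heappush

def nearly_sorted(nums, k):
    # Keep a min heap of k + 1 candidates; its root is always the next
    # element of the sorted output.
    merged = []
    heap = nums[:k + 1]
    heapify(heap)
    for i in range(k + 1, len(nums)):
        merged.append(heappop(heap))
        heappush(heap, nums[i])
    while heap:
        merged.append(heappop(heap))
    return merged

print(nearly_sorted([6, 5, 3, 2, 8, 10, 9], 3))        # [2, 3, 5, 6, 8, 9, 10]
print(nearly_sorted([10, 9, 8, 7, 4, 70, 60, 50], 4))  # [4, 7, 8, 9, 10, 50, 60, 70]
```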
Obtaining the minimum value by sorting the k + 1 window every time takes O(N*(KlogK)).
To get the minimum value at every index, a heap can be used instead. The time complexity of that algorithm is O(N*logK).
Time complexities
- Insertion sort: Time O(n*k), Space O(1)
- Efficient sorting algorithm (e.g. merge sort): Time O(n*log(n)), Space O(n), depending on the algorithm
- Heaps: Time O(n*log(k)), Space O(n)
Building algorithms around the characteristics of the data can give us an edge. By doing so, we can explore how different algorithms exploit the underlying nature of the data. The best time and space complexity for an algorithm will always depend on the nature of the task and its specific purpose.