Prerequisites: kth smallest element in an array
Question
You're given an array that is almost sorted: each of the n
elements may be misplaced by no more than k
positions from its correct sorted position. Find a space- and time-efficient algorithm to sort the array.
Example 1
Input = [6, 5, 3, 2, 8, 10, 9], k = 3
Output = [2, 3, 5, 6, 8, 9, 10]
Example 2
Input = [10, 9, 8, 7, 4, 70, 60, 50], k = 4
Output= [4, 7, 8, 9, 10, 50, 60, 70]
GeeksForGeeks
Sorting using Insertion Sort
Insertion Sort is a simple sorting algorithm that works by comparing each element with the elements before it and swapping it backward as necessary. It is efficient for small data sets, and its worst-case time complexity is O(n^2).
The simplified version of Insertion Sort is as follows:
- Start with the second element in the array.
- Compare the second element with the first element. If the second element is smaller, swap them.
- Move to the third element and compare it with the second and first elements. Swap it with the appropriate element if necessary.
- Continue this process until the end of the array is reached.
Here is an example of how the simplified Insertion Sort works:
Input: [5, 2, 4, 6, 1, 3]
sorted = [5] [2,4,6,1,3] = unsorted
sorted = [2,5] [4,6,1,3] = unsorted
sorted = [2,4,5] [6,1,3] = unsorted
sorted = [2,4,5,6] [1,3] = unsorted
sorted = [1,2,4,5,6] [3] = unsorted
sorted = [1,2,3,4,5,6] [] = unsorted
In this example, the second element (2) is compared with the first element (5) and swapped, resulting in the array [2, 5, 4, 6, 1, 3]. The third element (4) is then compared with the second element (5) and swapped, resulting in the array [2, 4, 5, 6, 1, 3]. This process continues until the array is fully sorted.
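A minimal sketch of this swap-based simplification (the function name is mine; the shifting variant coded later in the article does the same work with fewer writes):

```python
def insertion_sort_swaps(nums):
    # Walk each element left via adjacent swaps until it is no
    # longer smaller than its left neighbour.
    for i in range(1, len(nums)):
        j = i
        while j > 0 and nums[j - 1] > nums[j]:
            nums[j - 1], nums[j] = nums[j], nums[j - 1]
            j -= 1
    return nums

print(insertion_sort_swaps([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```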
A proof of correctness
At the start of the algorithm, we can be sure that nums[0]
is a sorted array of size 1. We grow the sorted prefix one element at a time until it covers all n
elements while maintaining the sorted invariant. This can be proven by induction on the size of the prefix.
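As an illustrative sketch, the invariant can be asserted at the top of every iteration (the assert is only for demonstration, not part of the algorithm):

```python
def insertion_sort_checked(nums):
    for i in range(1, len(nums)):
        # Invariant: nums[:i] is already sorted before inserting nums[i].
        assert all(nums[j] <= nums[j + 1] for j in range(i - 1)), "prefix not sorted"
        j = i
        while j > 0 and nums[j - 1] > nums[j]:
            nums[j - 1], nums[j] = nums[j], nums[j - 1]
            j -= 1
    return nums

print(insertion_sort_checked([6, 5, 3, 2, 8, 10, 9]))  # [2, 3, 5, 6, 8, 9, 10]
```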
A look at the time complexities of insertion sort
- Worst case: O(n^2)
- Best case: O(n)
- For k-sorted arrays?
Input: [11, 7, 5, 3, 2]
sorted = [11] [7,5,3,2] = unsorted (1 comparison: 7 with 11)
sorted = [7,11] [5,3,2] = unsorted (2 comparisons: 5 with 11, 7)
sorted = [5,7,11] [3,2] = unsorted (3 comparisons: 3 with 11, 7, 5)
sorted = [3,5,7,11] [2] = unsorted (4 comparisons: 2 with 11, 7, 5, 3)
sorted = [2,3,5,7,11] [] = unsorted
The number of comparisons per insertion grows on the order of n, which gives the O(n^2) worst case.
Note: swaps are not the same as comparisons.
Input: [2,3,5,7,11]
sorted = [2] [3,5,7,11] = unsorted (1 comparison 3 with 2)
sorted = [2,3] [5,7,11] = unsorted (1 comparison 5 with 3)
sorted = [2,3,5] [7,11] = unsorted (1 comparison 7 with 5)
sorted = [2,3,5,7] [11] = unsorted (1 comparison 11 with 7)
sorted = [2,3,5,7,11] [] = unsorted
Input: [5,3,2,7,11] k = 2
sorted = [5] [3,2,7,11] = unsorted (1 comparison: 3 with 5)
sorted = [3,5] [2,7,11] = unsorted (2 comparisons: 2 with 5, 3; the most possible for k = 2)
sorted = [2,3,5] [7,11] = unsorted (1 comparison: 7 with 5)
sorted = [2,3,5,7] [11] = unsorted (1 comparison: 11 with 7)
sorted = [2,3,5,7,11] [] = unsorted
No matter how many elements there are, each element sits at most k positions away from its sorted position. For example, if k = 2, each insertion needs at most 2 comparisons with shifted elements, because no element can be offset by more than 2.
Another example with a larger array:
k = 2
[1, 4, 5, 2, 3, 7, 8, 6, 10, 9]
sorted = [1] [4, 5, 2, 3, 7, 8, 6, 10, 9] = unsorted (1 comparison: 4 with 1)
sorted = [1, 4] [5, 2, 3, 7, 8, 6, 10, 9] = unsorted (1 comparison: 5 with 4)
sorted = [1, 4, 5] [2, 3, 7, 8, 6, 10, 9] = unsorted (2 comparisons: 2 with 5 and 4)
sorted = [1, 2, 4, 5] [3, 7, 8, 6, 10, 9] = unsorted (2 comparisons: 3 with 5 and 4)
sorted = [1, 2, 3, 4, 5] [7, 8, 6, 10, 9] = unsorted (1 comparison: 7 with 5)
sorted = [1, 2, 3, 4, 5, 7] [8, 6, 10, 9] = unsorted (1 comparison: 8 with 7)
sorted = [1, 2, 3, 4, 5, 7, 8] [6, 10, 9] = unsorted (2 comparisons: 6 with 8 and 7)
sorted = [1, 2, 3, 4, 5, 6, 7, 8] [10, 9] = unsorted (1 comparison: 10 with 8)
sorted = [1, 2, 3, 4, 5, 6, 7, 8, 10] [9] = unsorted (1 comparison: 9 with 10)
sorted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [] = unsorted
With k = 2, at most 2 comparisons per element are needed for the array to be sorted again.
At most k comparisons will be made n times, where n is the number of elements in the array. Therefore, the time complexity of insertion sort on a nearly-sorted array with an offset of k will be O(N*K), and the space complexity will be O(1). While this may be sufficient for passing a test, for perfection-seeking individuals like us, there is room for improvement ;).
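As an informal check of the O(N*K) claim, an instrumented insertion sort can count how far any single insertion had to shift (the helper name is mine, not from the original problem):

```python
def insertion_sort_count(nums):
    # Sorts nums in place and returns the maximum number of shifts
    # any single insertion needed (bounded by k for a k-sorted array).
    max_shifts = 0
    for i in range(1, len(nums)):
        key, j, shifts = nums[i], i - 1, 0
        while j >= 0 and nums[j] > key:
            nums[j + 1] = nums[j]
            j -= 1
            shifts += 1
        nums[j + 1] = key
        max_shifts = max(max_shifts, shifts)
    return max_shifts

# The k = 2 example from above: no insertion shifts more than k positions.
print(insertion_sort_count([1, 4, 5, 2, 3, 7, 8, 6, 10, 9]))  # 2
```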
Let’s code the algorithm
- Start with the second element (toAdd = 1) and iterate rightward until len(nums).
- The element to insert is at index toAdd; the first comparison is with index toAdd - 1.
- While the element to insert is smaller than the element on its left, we keep shifting that larger element one position to the right.
- Intermediate swap
- The position for the element to insert changes at every shift until it reaches its final position.
- For example, after the first shift, the slot for the value 0 moves from index 3 to index 2. We therefore cannot keep using the index obtained from the outer for loop (toAdd); we need an intermediate position tracking where the element currently belongs.
- We have reached the final position when no element to the left is greater than it; then we drop the element into place.
- I left a bug in the code below. Can you find it? The code returns [5,5,5,5] for the input [5,2,3,1] instead of [1,2,3,5]. Why?

for toAdd in range(1, len(nums)):
    toCompare = toAdd - 1
    # Intermediate swaps: inserting the value 0 into [1,2,3]:
    # 0 with 3 and shift = [1,2,0,3]
    # 0 with 2 and shift = [1,0,2,3]
    # ...               = [0,1,2,3]
    while toCompare >= 0 and nums[toCompare] > nums[toAdd]:
        intermediatePosition = toCompare + 1
        nums[intermediatePosition] = nums[toCompare]
        toCompare -= 1
    nums[intermediatePosition] = nums[toAdd]  # outside the while loop

- The value at nums[toAdd] keeps changing as larger elements are shifted over it, so we cannot use it inside the while loop and keep the logic correct. Instead of

while toCompare >= 0 and nums[toCompare] > nums[toAdd]:

do

key = nums[toAdd]
while toCompare >= 0 and nums[toCompare] > key:

and instead of

nums[intermediatePosition] = nums[toAdd]

do

nums[intermediatePosition] = key

- Final Code
from typing import List

class Solution:
    def sortArray(self, nums: List[int]) -> List[int]:
        for toAdd in range(1, len(nums)):
            key = nums[toAdd]
            toCompare = toAdd - 1
            while toCompare >= 0 and nums[toCompare] > key:
                intermediatePosition = toCompare + 1
                nums[intermediatePosition] = nums[toCompare]
                toCompare -= 1
            # drop the saved key into the gap
            nums[toCompare + 1] = key
        return nums
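The same routine as a standalone function (the snake_case name is mine), checked against the two examples from the problem statement:

```python
def sort_array(nums):
    # Shift larger elements right, then drop the saved key into the gap.
    for to_add in range(1, len(nums)):
        key = nums[to_add]
        to_compare = to_add - 1
        while to_compare >= 0 and nums[to_compare] > key:
            nums[to_compare + 1] = nums[to_compare]
            to_compare -= 1
        nums[to_compare + 1] = key
    return nums

print(sort_array([6, 5, 3, 2, 8, 10, 9]))        # [2, 3, 5, 6, 8, 9, 10]
print(sort_array([10, 9, 8, 7, 4, 70, 60, 50]))  # [4, 7, 8, 9, 10, 50, 60, 70]
```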
Using the k offset to our advantage: an analysis
Let's try another sorting algorithm, merge sort, and examine its time complexity. Spoiler alert: it will be O(NlogN). You can see from the complexity that k has no effect on the algorithm: the time complexity of merge sort remains the same even on a nearly sorted array. Unlike with insertion sort, we cannot exploit the characteristics of the nums
array being nearly sorted (with k offset values) to reduce the complexity, the way insertion sort dropped from O(N^2) to O(NK). This leaves the question: which is better? Comparing O(NlogN) to O(NK), if the value of k is small, insertion sort might be more effective than another sorting algorithm like merge sort. However, if the k value is large, merge sort might provide a better outcome.
Is there a better way to exploit the nearly sortedness of the array?
- For a nearly sorted array, each element only needs to be within k positions of its sorted position. To take advantage of this property, we can maintain a window of k + 1 elements, sort it, take out its smallest element, and slide the window forward. We only need k + 1 elements in the window because an element can be offset by at most k positions from its sorted position.
- For example, if k is 2 and we start from the left at i = 0, the element that belongs at index 0 of the sorted array can be at index 2 at most. Thus we include the elements at indices 0, 1, and 2 in our window. This is repeated for all n elements on windows no larger than k + 1, at a sorting cost of O(KlogK) per element.
- Therefore, the total sorting cost will be N*(KlogK). Note that we re-sort the window every time we slide in a new element, which is a lot of repeated work. We will also need O(N) space for the output.
- We only need to find the minimum value among k + 1 values. Fully sorting the array every time leads to unnecessary work. Is there a way to reduce the workload while still getting the up-to-date minimum value in the array segment without sorting it every time? Yes, there is: heaps.
- Before delving into heaps, let's revisit sorting the entire array outright. Some might argue it's better to use a standard sorting algorithm, such as merge sort. Merge sort has a worst-case time complexity of O(NlogN), which beats N*(KlogK) as the value of k increases. However, merge sort does not take advantage of the characteristics of the dataset (the k offset of values), as evidenced by its time complexity of NlogN, which is not affected by k. Therefore, it may not be what we are looking for. On to heaps.
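As a rough comparison of the two cost terms (constant factors ignored, and the N = 1,000,000 figure is an arbitrary choice of mine, so this is only indicative):

```python
import math

def costs(n, k):
    # Cost terms for insertion sort on a k-sorted array vs. merge sort.
    return n * k, n * math.log2(n)

for k in (2, 3, 4, 20):
    nk, nlogn = costs(1_000_000, k)
    winner = "insertion O(NK)" if nk < nlogn else "merge O(NlogN)"
    print(f"k={k:2d}: N*K={nk:.0f}  N*log2(N)={nlogn:.0f}  -> {winner}")
```

With N = 1,000,000, log2(N) is about 20, so insertion sort's N*K term wins for small k and merge sort takes over once k grows past that.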
- Let's discuss how heaps can assist us in constructing a sorting algorithm for an array with k-offset values.
- To avoid sorting the window of size k + 1 each time, we can store it in a min heap. Since we only need the minimum value in that segment, we don't need to sort the entire segment; heaps save us that extra work. We only need k + 1 elements because an element can only be offset by at most k positions from its sorted position. After initializing the heap, the value that belongs at index 0 of the sorted array will be at the top of the heap. We can pop it and place it at index 0 of the merged array.
- We can initialize the heap with k+1 elements using heapify, as this will be cost-effective.
- To obtain the remaining elements to pass through our heap, we need to iterate from k+1 to len(nums). Each time, we pop the minimum element from the heap, add it to the merged array, and insert a new element into the heap. The size of the heap never exceeds k + 1.
- The first pop will remove the minimum value among indices 0 to 2 (given k is 2), solidifying its position at the first index of the newly merged array. We will continue adding elements to the merged array as we move to the right. After the i-th pop, the element belonging to the i-th position will be popped and added to the merged array.
- After popping and pushing len(nums) - (k + 1) times, we will be left with a merged (sorted) array of len(nums) - (k + 1) items and a heap of size k + 1. We can then pop all the remaining items from the heap until it's empty, as these items belong at the very end of the list. Popping from the heap until it's empty yields the minimum value needed for each remaining index. When we return the merged array, all the elements will be in the correct place. The time complexity of this approach is O(n*log(k)), as popping and pushing items on the heap takes log(k) time, and we do it n times. We will also need O(N) for space complexity.
merged = []
sliced = nums[:k+1]
sliced.sort()
for i in range(k+1, len(nums)):
    merged.append(sliced.pop(0))
    sliced.append(nums[i])
    sliced.sort()
return merged + sliced
from heapq import heappop, heappush, heapify
class Solution:
    def nearlySorted(self, nums, k):
        merged = []
        heap = nums[:k+1]
        heapify(heap)
        for i in range(k+1, len(nums)):
            merged.append(heappop(heap))
            heappush(heap, nums[i])
        while heap:
            merged.append(heappop(heap))
        return merged
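The heap approach can also be exercised as a standalone function (the snake_case name is mine) against the examples from the problem statement:

```python
from heapq import heapify, heappop, heappush

def nearly_sorted(nums, k):
    # Keep a min heap of k + 1 candidates; its root is always the next
    # element of the sorted output.
    merged = []
    heap = nums[:k + 1]
    heapify(heap)
    for i in range(k + 1, len(nums)):
        merged.append(heappop(heap))
        heappush(heap, nums[i])
    while heap:
        merged.append(heappop(heap))
    return merged

print(nearly_sorted([6, 5, 3, 2, 8, 10, 9], 3))        # [2, 3, 5, 6, 8, 9, 10]
print(nearly_sorted([10, 9, 8, 7, 4, 70, 60, 50], 4))  # [4, 7, 8, 9, 10, 50, 60, 70]
```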
Obtaining the minimum value by sorting the k + 1 window every time takes O(N*(KlogK)).
To get the minimum value at every index, a heap can be used instead. The time complexity of that algorithm is O(N*logK).
Time complexities
- Insertion sort: Time O(n*k), Space O(1)
- Efficient sorting algorithm (e.g. merge sort): Time O(n*log(n)), Space O(n), depending on the algorithm
- Heaps: Time O(n*log(k)), Space O(n)
Building algorithms around the characteristics of the data can give us an edge. By doing so, we can explore how different algorithms exploit the underlying nature of the data. The best time and space complexity for an algorithm will always depend on the nature of the task and its specific purpose.