Prerequsites : kth smallest element in an array
Question
You are given K sorted integer arrays in a form of 2D integer matrix arrays of size .You need to merge them into a single array and return it.
Example 1
input:
array = [
[1,4,5],
[1,3,4],
[2,6]
]
output: [1,1,2,3,4,4,5,6]
Question Link
Brainstorming some Brute force methods
- Add all elements to an array sort it and return the Kth largest element.
def merge_k_arrays(arrays):
elements = []
for array in arrays:
elements += array
elements.sort()
return elements
Complexities
Time complexity: O(n * log(n) )
Where n
is the number of elements in all arrays.
Space Complexity: O(k)
Understanding a basic version
- Before we start merging k arrays, let's try merging two arrays by using two pointers to traverse the lists. We add the smaller value first and move the pointer of the list from which we added it.
array1 = [1,4,5]
^
array2 = [2,3,4]
^
____________________________
merged = [1]
array1 = [1,4,5]
--^
array2 = [2,3,4]
^
____________________________
merged = [1,2]
array1 = [1,4,5]
--^
array2 = [2,3,4]
--^
merged = [1,2,3]
... and so on
until both pointers are exhausted.
- Initial code
- The merging of two arrays was successful, but one element was left out as one pointer exceeded the length of the array while the other was left behind, due to the and statement.
- We can fix this by concatenating the array that still has elements left with the merged array.
- Full code
def mergeTwoArrays(array1,array2):
p1 = 0
p2 = 0
merged = []
while p1 < len(array1) and p2 < len(array2):
if array1[p1] < array2[p2]:
merged.append(array1[p1])
p1+=1
else:
merged.append(array2[p2])
p2+=1
return merged
array1 = [1,4,5]
array2 = [2,3,4]
print(mergeTwoArrays(array1,array2))
while p1 < len(array1) and p2 < len(array2):
while p1 < len(array1) and p2 < len(array2)::
<code>
#atleast one pointer exhauseted
stillSomethingLeftInArray1 = p1 < len(array1)
stillSomethingLeftInArray2 = p2 < len(array2)
if stillSomethingLeftInArray1:
arrayLeft = array1[p1:]
merged = merged + arrayLeft
if stillSomethingLeftInArray2:
arrayLeft = array2[p2:]
merged = merged + arrayLeft
def mergeTwoArrays(array1,array2):
p1 = 0
p2 = 0
merged = []
while p1 < len(array1) and p2 < len(array2):
if array1[p1] < array2[p2]:
merged.append(array1[p1])
p1+=1
else:
merged.append(array2[p2])
p2+=1
#atleast one pointer exhauseted
stillSomethingLeftInArray1 = p1 < len(array1)
stillSomethingLeftInArray2 = p2 < len(array2)
if stillSomethingLeftInArray1:
arrayLeft = array1[p1:]
merged = merged + arrayLeft
if stillSomethingLeftInArray2:
arrayLeft = array2[p2:]
merged = merged + arrayLeft
return merged
array1 = [1,4,5]
array2 = [2,3,4]
print(mergeTwoArrays(array1,array2))
SandBox (code to play with on PythonTutor)
3 arrays and onwards.
Input: arrays = [[1,4,5],[1,3,4],[2,6]]
Output: [1,1,2,3,4,4,5,6]
- We previously used two pointers to add the smallest value went to the merged array. For three, we will need three pointers. We start by taking the first index of each array and adding the minimum value amongst these three pointers to the merged array.
- step 1
- Step 2
- Step 3
- At each iteration, we choose the minimum item among the pointers and add it to the merged array. While we can write a laborious logic to merge three arrays, what about merging 100 arrays? There must be a better way. Can you think of one?
- We need to access the minimum among a set of values from pointers while new values come from other indexes where the pointer moves to (like a stream).
- You guessed it we will use a heap. But how will it work?
- Step By Step
- Step 1
- Step 2, min in the heap is merged to a new array and we move across the array we removed from.
- Step 3
- Step 7 ish something, there will come a time when there are no more elements left to merge from the array. We will not add the values from that array into the heap.After that, there will be only two values left in the heap, denoting two pointers. Then one, then zero. When there are zero values left, we know we have exhausted all the arrays. The heap acts as an abstract multi pointer.
- last Step.
array1 = [1, 4, 5]
^
array2 = [1, 3, 4]
^
array3 = [2, 6]
^
min between (1,1,2)
1 at index 0 for array 1
or
1 at index 0 for array 2
We can pick either to lets go with index 0 in array 1
merged = [1]
array1 = [1, 4, 5]
^
array2 = [1, 3, 4]
^
array3 = [2, 6]
^
min betweeen (1,2,4)
1 at index 0 of array 2
merged = [1,1]
array1 = [1, 4, 5]
^
array2 = [1, 3, 4]
^
array3 = [2, 6]
^
min betweeen (3,2,4)
2 at index 0 of array3
merged = [1,1,2]
We will go on untill the pointers have been exhausted for all arrays pointers
Implementation
- Creating the initial heap.
- While the heap is not exhausted
- Take the min element out of the heap. Add it to the merged array
- While an arrays is not exhausted keep adding the element from it to heap
def merge_k_sorted_lists(arrays):
merged = []
heap = []
for currentArray in array:
# push first element of each array into the heap
firstindex = 0
firstElement = currentArray[0]
heappush(heap, (FirstElement, currentArray, Firstindex)) # 1
#After initilization
#heap = [(1, [1, 3, 4], 0), (1, [1, 4, 5], 0), (2, [2, 6], 0)]
# The first value in the input is used to determine the order of sortedness
# --in the image above [1,1,2]
while heap:
element, currentArray, onIndex = heappop(heap)
merged.append(element)
nextIndex = onIndex + 1
stillMoreElementsInTheArray = nextIndex < len(currentArray)
if stillMoreElementsInTheArray:
heappush(heap, (currentArray[nextIndex], currentArray, nextIndex))
return merged
Heap is empty now—all arrays have been exhausted.
Code
from heapq import heappop, heappush
from typing import List
def mergeKSortedArrays(arrays):
merged = []
heap = []
for currentArray in arrays:
# push first element of each array into the heap
firstindex = 0
firstElement = currentArray[0]
heappush(heap, (firstElement, currentArray, firstindex))
while heap:
element, currentArray, onIndex = heappop(heap)
merged.append(element)
nextIndex = onIndex + 1
stillMoreElementsInTheArray = nextIndex < len(currentArray)
if stillMoreElementsInTheArray:
heappush(heap, (currentArray[nextIndex], currentArray, nextIndex))
return merged
SandBox(code to play with on PythonTutor)
Complexities
Time complexity: O(n*log(k))
Where n
is the number of elements in all arrays and k
is the number of arrays. This is because we are inserting and popping n
elements from the min heap, each of which takes O(log(k))
time—k is the maximum number of elements represneting pointers that can be in a heap.
Space Complexity: O(k)
FollowUp
- This question can also be asked in the form of "merge k sorted linked lists." Can you solve that? Here is a link:https://leetcode.com/problems/merge-k-sorted-lists/
Personal Opinion
I was asked this question in an Uber interview, and I absolutely messed it up. This tutorial is dedicated to that and all the things I learned from it.