Prerequisites: Heaps vs Priority Queue
WHAT A HEAP IS and WHAT IT IS NOT
Heap: an abstract data type (ADT) defined by the set of operations it must support (such as insert, delete, and extract-min/max).
Disclaimer: A heap can be viewed as both a data structure and an ADT. For now we are viewing it as an ADT—the underlying implementation is not specified.
Array as a heap?
A sorted array can be used to implement the heap ADT (max or min), but certain operations then have a higher time complexity: every insert and delete forces us to re-sort the entire array, resulting in O(n log n) per operation. (A quick sketch of this approach follows the array below.)
heap = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
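To make the cost concrete, here is a minimal sketch of a max-heap backed by a sorted array; the class and method names are made up for illustration and are not part of any library. The point is only that the max is always at the front, while every insert pays for re-sorting.

class SortedArrayMaxHeap:
    def __init__(self):
        self.items = []          # kept sorted in descending order, like the array above

    def insert(self, value):
        self.items.append(value)
        self.items.sort(reverse=True)   # re-sort on every insert: O(n log n)

    def extract_max(self):
        return self.items.pop(0)        # the max is at the front, but removing it shifts everything: O(n)

h = SortedArrayMaxHeap()
for v in [3, 10, 7]:
    h.insert(v)
print(h.extract_max())   # 10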
Heap as a Binary Tree
To efficiently perform operations such as extracting the max/min, insertion, and deletion, we typically use a binary tree as the underlying structure. Since we only need the max/min to be in a known location, we don't need to sort the whole set of values. This brings us to an important point: a heap is not a sorted structure. It is only partially ordered, requiring just that the max (or min) element sits at the top of the heap. Sacrificing full sortedness and keeping only the max/min in place, with a binary tree underneath, lets us perform all heap operations in O(log n). (Why log n? We will get to that in the sections below.)
Heaps Characteristics
Representing heaps as binary trees requires us to satisfy two conditions:
- They need to be represented as complete binary trees.
- Every node must be either less than or equal to, or greater than or equal to, its parent (depending on the kind of heap):
- The min-heap property states that the value of each node must be greater than or equal to the value of its parent.
- The max-heap property states that the value of each node must be less than or equal to the value of its parent.
Throughout this chapter, the word "heap" will always refer to a max-heap.
First Characteristic: Is it a Complete Binary Tree?
Complete Binary Tree — In a complete binary tree, all levels are completely filled, except possibly the last level, which must be filled from left to right.
Here is an example of a complete binary tree:
In the case of a complete binary tree shown below, all levels are filled, and in the last level, the values are filled from left to right: 4, 5, 6, making it complete.
1
/ \
2 3
/ \ /
4 5 6
Here is an example of an incomplete binary tree:
The second level is not completely filled (the root is missing its right child), so even though the last level is filled from left to right, that is not sufficient.
1
/
2
/ \
3 4
Here's another example of an incomplete binary tree:
The second level is filled, but the last level is not filled from left to right: node 2 is missing its left child while nodes further to the right (5 and 6) exist.
Question: How will you make this tree complete?
1
/ \
2 3
\ / \
4 5 6
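If you want to check completeness programmatically, a level-order traversal works: once a missing child is seen, no later node may have any children. This is a minimal sketch of that idea, using a hypothetical Node class (the heap code later in this chapter uses an array instead of nodes).

from collections import deque

class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def is_complete(root):
    # Level-order traversal: after the first missing child,
    # every later node must have no children at all.
    if root is None:
        return True
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        for child in (node.left, node.right):
            if child is None:
                seen_gap = True
            elif seen_gap:
                return False     # a node appears after a gap: not complete
            else:
                queue.append(child)
    return True

# the complete example from above
root = Node(1, Node(2, Node(4), Node(5)), Node(3, Node(6)))
print(is_complete(root))   # True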
Second Characteristic: Every node must be less than or equal to (max-heap) or greater than or equal to (min-heap) its parent.
- Note that the "less than" relationship is not level-specific but parent-to-child specific. For example, node 7 (with a value of 55) sits on a lower level than node 2 (with a value of 50), yet the tree is still a valid max-heap, because node 7 is a child of node 3, not of node 2.
- The max value is at node 1 (value 90).
These two characteristics help us maintain a partially sorted order. In the next section, "Extraction and Insertion," we will see how it works.
Extraction and Insertion
Given a heap
- Extraction
- If 90 (the max value) is removed, 36 (the last element) takes its place; it has to be the last element, because moving any other node would destroy the completeness of the binary tree. That leaves two candidates for the top: 81 and 51. Since every element below them is smaller, one of them is the rightful next maximum, and the larger of the two rises to the top with a single swap, in O(1). The max is thus restored at the root. To ensure that every child is still less than its parent, we then need to move 36 (the value we placed at the top) to its correct location by sifting it down. The complexity of extraction comes from this sift-down: we keep swapping 36 downward until it is less than its parent. In the worst case (as in the example below), we may have to bring it down the entire tree to the last level, requiring O(height) comparisons. For a complete binary tree, the height is log(n), so the complexity of extraction is O(log n).
- Insertion into a heap
- Let's insert the number 99 into the tree we extracted from earlier. We can't add the node anywhere we like, because we want to maintain completeness, so we add it at the leftmost open spot on the last level. However, since 99 is larger than its parent 39, it breaks the second characteristic of a heap. To restore it, we swap 99 with 39 and keep sifting it upwards. This continues until the heap property is restored, which in the worst case (as demonstrated here) means sifting up the whole height of the tree. That requires O(height) comparisons, and for a complete binary tree the height is log(n), just as in extraction. Therefore, the complexity of insertion is also O(log n).
- When sifting up, if a number is larger than its parent, it is also larger than everything underneath that parent, including the parent's other child. For instance, in the image below, 74 is the parent of 36, so 74 is larger than 36; since 99 is larger than 74, it is also larger than 36. If the element reaches the top, it is larger than the whole other subtree as well. (A tiny worked trace follows.)
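As a tiny worked trace of one extraction, here is a made-up max-heap stored as a 1-indexed list (index 0 is a placeholder; the concrete values are hypothetical). It just replays the move-the-last-element-to-the-root, then sift-down steps described above.

# hypothetical max-heap, 1-indexed: index 0 is unused padding
heap = [None, 90, 81, 51, 20, 36]
#        -    1   2   3   4   5     (indices)

# extract the max (90): move the last element (36) to the root...
maximum = heap[1]
heap[1] = heap.pop()                   # heap is now [None, 36, 81, 51, 20]

# ...then sift 36 down: its larger child, 81, swaps up
heap[1], heap[2] = heap[2], heap[1]    # heap is now [None, 81, 36, 51, 20]
# 36 is now larger than its only remaining child (20), so it stops here
print(maximum, heap)                   # 90 [None, 81, 36, 51, 20]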
A slight digression
Why can't the heap be an incomplete binary tree?
The reason has to do with the height of the tree. To restore the heap property we may need to perform swaps along a root-to-leaf path. If the heap were an incomplete tree, its height would not be guaranteed to be log(n) (in the worst case it degenerates toward a chain), which would increase our time complexity. That is why a heap is always represented as a complete binary tree.
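A quick back-of-the-envelope check of how much the height matters (the node count of one million is arbitrary, just to make the numbers concrete):

import math

n = 1_000_000
complete_height = math.floor(math.log2(n))   # height of a complete binary tree with n nodes
print(complete_height)                        # 19: at most ~19 swaps per sift
# a fully degenerate, chain-like tree with the same n nodes has height n - 1 = 999_999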
ADT TO DS
CHOOSING A DATA STRUCTURE: What data structure should be used to implement heaps?
- We know that a heap is represented in the form of a binary tree. Creating a binary tree in Python is fairly straightforward. You can follow this tutorial to learn how.
- When inserting a new node into a max-heap, we need access to the parent node so we can compare the new value to its parent's value and swap them if necessary. This is required to maintain the heap property (the value of a node must be less than or equal to the value of its parent). With a node-based tree, we would have to add a parent pointer to the heap node class.
- It is important to initialize the parent pointer of any new node inserted into the max-heap. Similarly, a node's left and right child pointers must be reset to None when the corresponding child is removed from the max-heap.
- It appears we would need to put in a lot of effort just to keep the structure consistent while performing operations on it with this approach.
- However, we can use arrays to simulate the structure of heaps, which makes things easier. This ties back to the earlier discussion of Heaps vs Priority Queue: when implementing a heap (the ADT), not all data structures are created equal, and the implementation can vary. In this case, we implement the heap using an array instead of a tree of nodes. This is not the sorted array shown at the start of this section; it is an array simulating a tree.
- Why an array?
- An array is a natural choice for implementing a heap, since it gives a consistent mapping between parent and child positions. Note that this would not work if the heap were an incomplete binary tree.
- The parent and children of each node can be located in the array using simple index arithmetic.
class Node:                      # a plain binary-tree node
    def __init__(self, data):
        self.left = None
        self.right = None
        self.data = data

class Node:                      # a heap node: we also need a parent pointer to sift up
    def __init__(self, data):
        self.left = None
        self.right = None
        self.parent = None       # must be kept consistent on every insert and delete
        self.data = data

node = Node(10)
Heaps as Arrays
Array representing a tree-like structure (figures: the tree and its corresponding array).
Tree to heap in 7 seconds: follow the indexes.
Moving around the heap.
- Let's take a look at some patterns in the heap above. What do you see here?
- The parent of node 16 (value 2) is node 8 (value 6).
- The parent of node 8 (value 6) is node 4 (value 25).
- The parent of node 4 (value 25) is node 2 (value 36).
- The parent of node 2 (value 36) is node 1 (value 90).
- To move up the heap from a left child, we simply divide the index by 2 (16 / 2 = 8, 8 / 2 = 4, 4 / 2 = 2, 2 / 2 = 1).
- Let's see if this works for right nodes:
- The parent of node 15 (value 2) is node 7 (value 4): 15 / 2 = 7.5, which integer division rounds down to 7.
- The parent of node 7 (value 4) is node 3 (value 17): 7 / 2 = 3.5, rounded down to 3.
- The parent of node 3 (value 17) is node 1 (value 90): 3 / 2 = 1.5, rounded down to 1.
- So to move up from a right child, we also divide by 2 and round the fraction down, which is exactly what integer division does. In short, the parent of node i is node i // 2 for every node.
- Let's see if we can reverse the process to move down:
- From parent node 1 (value 90) to its left child at node 2 (value 36): 1 * 2 = 2.
- From parent node 2 (value 36) to its left child at node 4 (value 25): 2 * 2 = 4.
- What about the right child? We multiply by 2 and add 1 to go from a parent to its right child.
- From parent node 1 (value 90) to its right child at node 3 (value 17): (1 * 2) + 1 = 3.
- This relationship holds because heaps, as a structure, are complete binary trees.
- These relationships can be expressed with either a 1-indexed or a 0-indexed array. The former gives the cleaner arithmetic: parent at i // 2, children at i * 2 and i * 2 + 1. It is also possible to start at index 0, but the relationships become slightly messier: the parent of the node at index i is at (i - 1) // 2, and its children are at 2i + 1 and 2i + 2.
- No matter where we are in the heap, we can move up and down using a little bit of math. What will this look like in code?
heap = [0, x, x, x, x, x]   # index 0 is a placeholder; the root lives at index 1
root = heap[1]
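Here is a small, self-contained sketch of that index arithmetic; the function names are just illustrative (the MaxHeap class below defines its own versions), and the values past node 4 are made up:

def parent(i):        # 1-indexed
    return i // 2

def left_child(i):
    return i * 2

def right_child(i):
    return i * 2 + 1

# a hypothetical 1-indexed heap array; index 0 is a placeholder
heap = [0, 90, 36, 17, 25, 30, 4, 10]
print(heap[parent(7)])        # parent of node 7      -> heap[3] == 17
print(heap[left_child(2)])    # left child of node 2  -> heap[4] == 25
print(heap[right_child(1)])   # right child of node 1 -> heap[3] == 17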
What it would look like in code
- Setting the stage
- Initialization and relationships
- Valid indexes, to check whether we are out of the bounds of the heap or not.
- Swap indexes
- Insertion: Add the element to the end of the array (the rightmost open spot on the last level) and sift it up until its correct location is found.
- What is the correct location? The point at which the element being sifted up is no longer larger than its parent, and is therefore already greater than all the elements below it.
- SIFT UP
- Visualizing adding elements to an array: insertion entails sifting up.
- See where the elements are being inserted and how they move up.
- Step By Step (LINK)
- Removing the maximum
- Extracting the maximum
- SiftDown
- We only check whether the left child is valid in the while-loop condition, because if the left child is invalid there are no children at all: given the completeness of the binary tree, a right child cannot exist without a left child. Therefore, when there is no left child, the loop can exit.
- The code seems daunting but it is really simple.
class MaxHeap:
    def __init__(self):
        self.heap = [0]          # index 0 is a placeholder; the root lives at index 1
        self.size = 0

    def parent(self, i):
        return i // 2

    def leftChild(self, i):
        return i * 2

    def rightChild(self, i):
        return i * 2 + 1

    def validIndex(self, index):
        # valid positions are 1..size (inclusive)
        return index >= 1 and index <= self.size

    def swapValues(self, first, second):
        self.heap[first], self.heap[second] = self.heap[second], self.heap[first]

    def insert(self, p):
        self.size += 1
        self.heap.append(p)
        self.siftUp(self.size)

    def siftUp(self, elementBeingSiftedUp):
        parentIndex = self.parent(elementBeingSiftedUp)
        while self.validIndex(parentIndex) and self.heap[parentIndex] < self.heap[elementBeingSiftedUp]:
            self.swapValues(parentIndex, elementBeingSiftedUp)
            elementBeingSiftedUp = parentIndex
            parentIndex = self.parent(elementBeingSiftedUp)

    def extractMax(self):
        if self.size == 0:
            return None          # nothing to extract
        maximum = self.heap[1]
        # replace the root with the last item
        lastItem = self.heap[self.size]
        self.heap[1] = lastItem
        # remove the last item
        self.size -= 1
        self.heap.pop()
        # sift the new root down to its correct location
        self.siftDown(1)
        return maximum

    def siftDown(self, elementBeingSiftedDown):
        leftChildIndex = self.leftChild(elementBeingSiftedDown)
        leftChildIsValid = self.validIndex(leftChildIndex)
        while leftChildIsValid:
            # initially assume the left child is the larger one
            largerChildIndex = leftChildIndex
            rightChildIndex = self.rightChild(elementBeingSiftedDown)
            rightChildIsValid = self.validIndex(rightChildIndex)
            # larger child: left or right (if the right child exists)
            rightIsLargerThanLeft = rightChildIsValid and self.heap[largerChildIndex] < self.heap[rightChildIndex]
            if rightIsLargerThanLeft:
                largerChildIndex = rightChildIndex
            # swap and continue to the next iteration of the while loop
            if self.heap[largerChildIndex] > self.heap[elementBeingSiftedDown]:
                self.swapValues(largerChildIndex, elementBeingSiftedDown)
                elementBeingSiftedDown = largerChildIndex
                leftChildIndex = self.leftChild(elementBeingSiftedDown)
                leftChildIsValid = self.validIndex(leftChildIndex)
            # neither child is larger: the element has found its correct position
            else:
                break
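To see the class in action, here is a short usage example (the values are arbitrary): repeatedly extracting the max returns the elements in descending order.

h = MaxHeap()
for value in [17, 90, 36, 25, 4]:
    h.insert(value)

print(h.heap[1])           # 90: the maximum sits at the root

while h.size > 0:
    print(h.extractMax())  # 90, 36, 25, 17, 4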
Creating a heap from a given set of values
A lot of the questions we have done ask us to create a heap from a set of known values (instead of a stream coming in). Here we talk about two methods that can be used to create a heap from a known set of values.
- Create by inserting each element O( N log N )
- Worst case: The time complexity is O(N log N). We take all N values and insert them one by one into an empty heap. If the values are inserted in increasing order, each insertion makes the new node travel from the insertion point all the way up to the root of the tree. Each insert therefore costs O(log N) (the height of the tree bounds the number of swaps), so inserting N values costs O(N log N) in total.
- Best case: If the values happen to arrive in an order that never requires swaps during heap building (for a max-heap, decreasing order), each insertion is O(1) and the overall time complexity is O(N).
- Create in O(N) via heapify (optional; not necessary for the scope of an interview).
- Instead of inserting the elements one by one, we initialize an array (representing the heap) with all the items we already know we need. Roughly half of the items (the leaves) are already in a valid position by default; the remaining elements need to be sifted so the heap property holds from the last internal node back to the root. To achieve this we use a siftDown approach, starting from the last internal node and moving backwards towards the top, sifting each item down until it reaches its correct location. The time complexity of this method is O(n), because the work required for each item is proportional to that item's height in the heap: the nodes at the bottom need no work at all, while the root may need log(n) work. The total work is proportional to the sum of the heights over all nodes, and that sum converges to O(n). The key point is that the cost of a single siftDown(i) call is not the gross upper bound O(log N) but O(h), where h is the height of the node being sifted down. As a result, building a heap this way is O(n) instead of O(n log n). You don't need to follow all of the math, but there are great resources available if you'd like to learn more (and a small sketch of this approach follows the example arrays below).
- As we ascend the tree, the work per node grows while the number of nodes shrinks: there is no work required at the bottom, and the highest amount of work, log(n), is required for the element at the root. Things to note: count the number of swaps per level.
- level 1 (root level ): log(n)
- level 2(root level - 1): log(n/2)
- level 3 (root level - 2): log(n/4)
- The last level: theoretically log(1) = 0
- Some complex math that I barely understand, lol. (Some help from WolframAlpha to see how the series converges: Link)
- There may be similarities with Quick Select, which also converges to 2N, since in both cases the amount of work is halved at every step.
- The sum of this series will never exceed 2N:
n + n/2 + n/4 + n/8 + n/16 + ...
converges to 2n - 1, which in time-complexity terms is just O(n). (Link)
- Visually explained: after 1/64, all remaining partitions fit within the leftover white space. The total area of n/2 + n/4 + n/8 + n/16 + ... combined will never exceed n.
values = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]   # increasing order: worst case when building a max-heap by repeated insertion
values = [18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1]   # decreasing order: best case when building by repeated insertion
values = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]   # previously the worst case for a max heap; heapify handles it in O(N)
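Here is a minimal sketch of the O(N) heapify approach, written as standalone functions over a 1-indexed list rather than as methods of the MaxHeap class above; the names buildMaxHeap and siftDown are just one way to express the idea.

def siftDown(heap, size, i):
    # sift heap[i] down until both children are smaller (1-indexed array, index 0 unused)
    while i * 2 <= size:
        child = i * 2
        if child + 1 <= size and heap[child + 1] > heap[child]:
            child += 1                       # pick the larger of the two children
        if heap[child] <= heap[i]:
            break                            # heap property restored
        heap[i], heap[child] = heap[child], heap[i]
        i = child

def buildMaxHeap(values):
    heap = [0] + list(values)                # copy into a 1-indexed array
    size = len(values)
    # the leaves (indices size // 2 + 1 .. size) are already valid one-node heaps,
    # so we only sift down the internal nodes, from the last one back to the root
    for i in range(size // 2, 0, -1):
        siftDown(heap, size, i)
    return heap

print(buildMaxHeap([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18])[1])   # 18 at the root

Compare this with building by repeated insertion (h = MaxHeap(), then h.insert(v) for each value), where every insert may pay up to log(N) swaps.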
One more thing
It should be noted that retrieving the ith smallest (or largest) number from a heap is not possible without tearing the structure down. A heap is not fully sorted like a sorted array; its shape only allows efficient retrieval of the minimum or the maximum, depending on whether it is a min-heap or a max-heap. To retrieve the ith value, you would need to extract from the heap i times until the desired value comes out, which destroys the original heap unless you work on a copy. (A short sketch follows.)
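As a small illustration using the MaxHeap class from above (the helper name kthLargest is made up for this example), here is what "extract i times" looks like, copying the values first so the original heap survives:

def kthLargest(maxheap, k):
    # work on a copy so the original heap is not destroyed
    copy = MaxHeap()
    for value in maxheap.heap[1:maxheap.size + 1]:
        copy.insert(value)
    result = None
    for _ in range(k):           # k extractions: O(k log N)
        result = copy.extractMax()
    return result

h = MaxHeap()
for v in [17, 90, 36, 25, 4]:
    h.insert(v)
print(kthLargest(h, 2))   # 36, the 2nd largest
print(h.size)             # 5: the original heap is untouched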