Prerequsites : kth smallest element in an array, Merge K sorted Arrays
Question
Design a simplified version of Twitter where users can post tweets, follow/unfollow another user, and is able to see the 10
most recent tweets in the user's news feed.
Implement the Twitter
class:
Twitter()
Initializes your twitter object.void postTweet(int userId, int tweetId)
Composes a new tweet with IDtweetId
by the useruserId
. Each call to this function will be made with a uniquetweetId
.List<Integer> getNewsFeed(int userId)
Retrieves the10
most recent tweet IDs in the user's news feed. Each item in the news feed must be posted by users who the user followed or by the user themself. Tweets must be ordered from most recent to least recent.void follow(int followerId, int followeeId)
The user with IDfollowerId
started following the user with IDfolloweeId
.void unfollow(int followerId, int followeeId)
The user with IDfollowerId
started unfollowing the user with IDfolloweeId
.
Example 1:
Input
["Twitter", "postTweet", "getNewsFeed", "follow", "postTweet", "getNewsFeed", "unfollow", "getNewsFeed"]
[[], [1, 5], [1], [1, 2], [2, 6], [1], [1, 2], [1]]
Output
[null, null, [5], null, null, [6, 5], null, [5]]
Explanation
Twitter twitter = new Twitter();
twitter.postTweet(1, 5); // User 1 posts a new tweet (id = 5).
twitter.getNewsFeed(1); // User 1's news feed should return a list with 1 tweet id -> [5]. return [5]
twitter.follow(1, 2); // User 1 follows user 2.
twitter.postTweet(2, 6); // User 2 posts a new tweet (id = 6).
twitter.getNewsFeed(1); // User 1's news feed should return a list with 2 tweet ids -> [6, 5]. Tweet id 6 should precede tweet id 5 because it is posted after tweet id 5.
twitter.unfollow(1, 2); // User 1 unfollows user 2.
twitter.getNewsFeed(1); // User 1's news feed should return a list with 1 tweet id -> [5], since user 1 is no longer following user 2.
Leetcode
Refraiming the question
- We know that the system has users who can:
- Post tweets
- Follow and unfollow one another
- Access the latest news feed (up to 10 tweets)
- To store tweets posted by a single user, we need a data structure that tracks which user posted which tweet. A hashmap could be used for this purpose.
- To display the feed, we need to know who follows whom. This allows us to query all the tweets from the users that the person (whose feed is being created) follows and display them.
- We may ask why we don't store who follows the user instead of who the user follows? Twitter is represented by a directed graph(information usually flows one way), which means we can only see posts from people we follow, not those who follow us. Therefore, this information is not necessary for displaying the feed.
- Since there are going to be more than 1 value for the tweets a person has and the people they follow. We will need another data strcture in the hashmap to store those values.
- Which is better for storing a collection of tweets and follows: a list or a set?
- Both will work but we have to see what gives us better performance. Lets go with lists right now(will realize why that can be a bad option)
- Follow/unfollow classes.
- For following, when a user follows another user, we can add them to the
follows
hashmap and remove them as necessary. - User 1 follows User 3.
- User 1 unfollows User 3.
- Code, this is not fully complete what needs to be added?
- Imagine we receive a call to follow someone we are already following. This may result in duplicates in the list. When querying, we might end up displaying the same tweets twice.
- Or if we are trying to remove a follower that doesn’t exist we will get an error. So we check if the user was followed before and then remove it.
- Choosing a list to store items can be inefficient. When using
.remove(followeeId)
, the removal time is equivalent to traversing the entire list (O(n)). To find the index of the item to remove, you must traverse the list, and then shift all the elements down by one index. This is time-consuming and can be avoided by using other data structures. What other data strucure can we use? A set the time to remove an item from a set is O(1). This will also help us write less code as we will not need to worry about duplicates as a set always retains unique item. Storing tweets in a list is suitable, since sets do not retain the order. Here, time is important, so the list will represent the chronological order of the tweets(link). - One thing to consider is that TweetIDs do not represent time, as they do not follow a chronological order (e.g. 5 does not come after 4). Therefore, we must include a timestamp for each post and increment it after each post.
- Posting the tweet—add the tweetcounter as metadata to represent the age of the tweet(important in the order of how tweets show up in the feed)
- Now we have all the ingredients to creating the feed.
- We need to starts by getting all of the users that the given user follows and quering all their tweets, we also need to add the users own tweets to the feed as well so before quering we will simply add the user to the his own follows list.
- Once we have these tweets we need to have them ordered in the 10 most recent.
- We can merge all the lists and get the last 10 items. This is a viable option, but it will be costly. To achieve this, we must merge all the lists and take the last ten out, which will cost us OO
O(nLog(n)
. Here,n
is the total number of tweets by all the users the person whose feed we are generating follows. This could become complex if we If we have a large number of tweets, such as 10,000 we will sort all of them and get the last ten out. There will be so much extra work we need to do for only getting 10 last tweets. - But since all the tweets from users are sorted even they are in differnt array, can we use this to our advantage. Does the previous question we did ring a bell Merge K sorted Arrays. Where we used heap as a pointer to track the most recent items uptill a point and then merged them. This is basically the same with two changes.
- Instead of merging all values, we merge until the feed size is 10.
- We will use a max heap instead of a min heap, going from the highest value to the lowest. Tweets with a higher tweetCounter will be at the top and the first ones to be popped out—this will make us start from the very end instead of the beginning of the array.
- Same as in Merge K sorted Arrays We are initializing a heap with the most recent tweet from each of these users. This is determined by the tweet counter, we also add the tweet, the set of tweets from the user and the index of the most latest tweet(we have not added to the feed yet).
- Why use -1 multiplied with tweetCounter? In Python, we only have a min-heap. To use it as a max-heap, we can multiply the number by a negative sign. This way, the highest number becomes the lowest—10 becomes -10
- There is a slight incompletness in the code. What if we have a have a user that has no tweets, what will happen with the code below?
- How can we fix that, we simply check if there are tweets to merge before pushing them into the heap.
- The system retrieves up to a maximum of 10 of the most recent tweets from all users that are followed. The tweetCounter is used to order the tweets in the heap and tracks the most recent tweet; if there are more tweets from the same user, these are added back to the heap. The news feed is then returned.
- While we have a heap or the len(feed) is not 10. We keep merging the most recent tweets by popping them from the top of the heap.
- We use
or
betweenheap
andlen(feed)
because there may be fewer than 10 tweets from all users. In such cases, we stop the program once the heap is empty. - We add the index of the most recent tweet that hasn't been added to the feed to the payload. Every time we pop it and add it to the heap, we decrement it. If it drops below zero, we know there are no more tweets to add, so we don't add the payload to the hea[. Once all tweets from all users have been merged, or the size reaches 10, we return the feed.
class Twitter:
def __init__(self):
self.tweets = defaultdict(list) # userId : list of [tweetIds]
self.follows = defaultdict(list) # userId : list of people one follows
def follow(self, followerId: int, followeeId: int) -> None:
self.follows[followerId].append(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
self.follows[followerId].remove(followeeId)
follow(1,2)
follow(1,2) #twice
user1 :[user2,user2]
def follow(self, followerId: int, followeeId: int) -> None:
if not(followeeId in self.follows[followerId]): # change here
self.follows[followerId].append(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
if followeeId in self.follows[followerId]: # change here
self.follows[followerId].remove(followeeId)
class Twitter:
def __init__(self):
self.tweets = defaultdict(list)
self.follows = defaultdict(set) # change here
def follow(self, followerId: int, followeeId: int) -> None:
# change here: no need for extra line now we are using set.
self.follows[followerId].add(followeeId) # change here : append to add
def unfollow(self, followerId: int, followeeId: int) -> None:
if followeeId in self.follows[followerId]: # same as before, still need to check if the element exists.
self.follows[followerId].remove(followeeId)
class Twitter:
def __init__(self):
self.tweets = defaultdict(list) # userId : list of [tweetCounter,tweetIds] # change here
self.follows = defaultdict(set) # userId : set of people one follows
self.tweetCounter = 0 # change here
def postTweet(self, userId: int, tweetId: int) -> None:
self.tweets[userId].append((self.tweetCounter,tweetId))
self.tweetCounter+=1
self.follows[userId].add(userId)
userFollows = self.follows[userId]
feed = []
heap = []
#initilize heap most latest tweets
for user in userFollows:
tweetsByUser = self.tweets[user]
mostRecentTweetIndex = len(tweetsByUser)-1
tweetCounter,tweetID = tweetsByUser[mostRecentTweetIndex]
payLoad = (-1*tweetCounter,tweetID,tweetsByUser,mostRecentTweetIndex)
heappush(heap,payLoad)
payLoad = (-1*tweetCounter,tweetID,tweetsByUser,mostRecentTweetIndex)
time,tweetID = tweetsByUser[mostRecentTweetIndex] # index out of bounds error
#fixed
# some feeds might be followed by the user but be empty
if (len(tweetsByUser)) > 0:
tweetCounter,tweetID = tweetsByFollower[index-1]
while heap and len(feed) < 10:
# retrieves the most recent tweet from all followed users and adds it to the news feed
tweetCounter,tweetID,tweetsByFollower,index = heappop(heap)
feed.append(tweetID)
stillHasMoreTweets = (index-1) >= 0
if stillHasMoreTweets:
tweetCounter,tweetID = tweetsByFollower[index-1]
payLoad = (-1*tweetCounter,tweetID,tweetsByFollower,index-1)
heappush(heap,payLoad)
Implementation
from heapq import heappush, heappop
class Twitter:
def __init__(self):
self.tweets = defaultdict(list) # userId : list of [tweetCounter,tweetIds] # change here
self.follows = defaultdict(set) # userId : set of people one follows
self.tweetCounter = 0
def postTweet(self, userId: int, tweetId: int) -> None:
self.tweets[userId].append((self.tweetCounter,tweetId))
self.tweetCounter+=1
def getNewsFeed(self, userId: int) -> List[int]:
self.follows[userId].add(userId)
userFollows = self.follows[userId]
feed = []
heap = []
#initilize heap most latest tweets
for user in userFollows:
tweetsByUser = self.tweets[user]
mostRecentTweetIndex = len(tweetsByUser)-1
if (len(tweetsByUser)) > 0:
tweetCounter,tweetID = tweetsByUser[mostRecentTweetIndex]
payLoad = (-1*tweetCounter,tweetID,tweetsByUser,mostRecentTweetIndex)
heappush(heap,payLoad)
while heap and len(feed) < 10:
tweetCounter,tweetID,tweetsByFollower,index = heappop(heap)
feed.append(tweetID)
stillHasMoreTweets = (index-1) >= 0
if stillHasMoreTweets:
tweetCounter,tweetID = tweetsByFollower[index-1]
payLoad = (-1*tweetCounter,tweetID,tweetsByFollower,index-1)
heappush(heap,payLoad)
return feed
def follow(self, followerId: int, followeeId: int) -> None:
self.follows[followerId].add(followeeId)
def unfollow(self, followerId: int, followeeId: int) -> None:
if followeeId in self.follows[followerId]:
self.follows[followerId].remove(followeeId)
Complexities
Time complexity: O(n*log(k))
k
= numberOfUsersFollowed
n
= numberOfItemsInTheMergedFeed
(10 in this case)
This is because we are inserting and popping elements n
times from the min heap, each of which takes O(log(k))
time—k numberOfUsersFollowed
is the maximum number of items that can be in a heap.
Space Complexity: O(k)