search unique URL

https://stackoverflow.com/questions/12210233

29-06-2021
|

Question

Given a set of 1 million (very large) no. of URL's. Find the "first" "unique" URL from the list.

My Approach: Build a hash using perfect hashing function, that can help. But my question is to hash large data is not possible., then how can I solve this question.

Is there any method to do inplace? Please help. Thanks in advance.

Solution

Given an input list of ["c","a","b","a","c"], my first approach would be:

Convert the list of URLs into a list of tuples which associates each element which its position in the list. Now you have [(0,"c"),(1,"a"),(2,"b"),(3,"a"),(4,"c")].
Sort the list lexicographically by the second tuple element (the URL). Now you have [(1,"a"),(3,"a"),(2,"b"),(0,"c"),(4,"c")].
Group sequences of subsequent equal tuples (a tuple is equal if the second element equals) into sub-lists. Now you have [[(1,"a"),(3,"a")],[(2,"b")],[(0,"c"),(4,"c")]].
Filter the list so that you only have lists of length 1. Now you have [[(2,"b")]].
If the resulting list is empty, there is no unique URL in the list. If it is non-empty, Sort the list by the first tuple element (the position in the string). In this case, you get the same list back - [[(2,"b")]].
take the first element of the list. Now you have [(2,"b")].
The (only) tuple in this list tells you the first unique URL, and the position in the input list: it's the URL b at position 2 in the input list.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow