Question

I'm wondering what the best way is to build a list while avoiding duplicates.

I have some data in MySQL where one field contains comma-separated product types.

For example:

id ------- category

1 --------   food, drink, vege

2 --------   food, drink

3 --------   vege, baby goods

4 --------   fish

The output I'm aiming for is:

['food','drink','vege','baby goods','fish']

(Please note that order does NOT matter to me.)

The data set has over 40,000 records, so checking it manually is certainly not an option.

I would appreciate any note or suggestion on how to achieve this.


Solution

Python sets don't allow duplicates, so you can build a set of unique categories with a set comprehension. Assuming cur is the cursor over your query results and row[1] is the category column, it looks like this

unique_categories = {item.strip() for row in cur for item in row[1].split(",")}

For example,

a = "food, drink, vege"
print({item.strip() for item in a.split(",")})

Output (Python 3; a set is unordered, so the element order may vary)

{'food', 'drink', 'vege'}

You can iterate over a set just like a list. If you need an actual list later on, you can convert it with the list function like this

unique_categories = list(unique_categories)
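Putting the pieces together, here is a minimal self-contained sketch. The rows list is a hypothetical stand-in for the database cursor, filled with the sample data from the question, and it assumes each row arrives as an (id, category) tuple.

```python
# Hypothetical stand-in for the cursor: (id, category) tuples from the question.
rows = [
    (1, "food, drink, vege"),
    (2, "food, drink"),
    (3, "vege, baby goods"),
    (4, "fish"),
]

# Split every category string on commas, trim whitespace,
# skip empty pieces, and let the set discard duplicates.
unique_categories = {
    item.strip()
    for _id, category in rows
    for item in category.split(",")
    if item.strip()
}

# Convert to a list; sorted() is used only to make the printed order stable.
category_list = sorted(unique_categories)
print(category_list)  # ['baby goods', 'drink', 'fish', 'food', 'vege']
```

With a real cursor, rows would simply be replaced by cur, and the comprehension processes the rows one at a time as it iterates.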

OTHER TIPS

Just convert the list to a set and the duplicates are dropped.

Example:

x = ['food','drink','vege','baby goods','food']

If you want the following output (the order of the elements is not guaranteed)

x = ['food','drink','vege','baby goods']

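just do this

x = list(set(x))

That's it.

A set cannot contain duplicate members, so converting to a set and back to a list removes them. Note that a set does not preserve the original order.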
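The question says order does not matter, but if the first-seen order should be kept, one option is dict.fromkeys instead of a set. This is a small sketch that assumes Python 3.7 or later, where dicts preserve insertion order; the name deduped is just illustrative.

```python
x = ['food', 'drink', 'vege', 'baby goods', 'food']

# Dict keys are unique and (in Python 3.7+) keep insertion order,
# so this drops the repeated 'food' while keeping first-seen order.
deduped = list(dict.fromkeys(x))
print(deduped)  # ['food', 'drink', 'vege', 'baby goods']
```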

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow