Question

I have tweets scraped into a MySQL database, and I can connect to it and query the column that contains the tweets' text. Now I want to parse that text and extract the hashtags into a CSV file.

So far I have this code, which works up to the last loop:

import re
import MySQLdb

# connects to database
mydb = MySQLdb.connect(host='****',
    user='****',
    passwd='****',
    db='****')
cursor = mydb.cursor()

# queries for column with tweets text
getdata = 'SELECT text FROM bitscrape'
cursor.execute(getdata)
results = cursor.fetchall()

for i in results: 
    hashtags = re.findall(r"#(\w+)", i)
    print hashtags

I get the following error: TypeError: expected string or buffer. The problem is in the line hashtags = re.findall(r"#(\w+)", i).

Any suggestions?

Thanks!


Solution

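cursor.fetchall() returns a sequence of tuples, one per row, even when the query selects a single column. Each i in your loop is therefore a one-element tuple rather than a string, which is why re.findall() raises the TypeError. Take the first element of each row and pass it to findall():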

for row in results:
    # row is a one-element tuple such as (text,), so row[0] is the tweet text
    hashtags = re.findall(r"#(\w+)", row[0])
    print hashtags
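If it is useful, here is a minimal sketch of the remaining step, getting the hashtags into a CSV file. It uses Python 3 syntax, whereas the question's code is Python 2. The helper names (extract_hashtags, write_hashtags_csv), the sample rows and the "hashtag" column header are made up for illustration. The sample rows only stand in for what cursor.fetchall() would return, so the snippet runs without a database.

```python
import csv
import io
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(rows):
    """Return one list of hashtags per row; each row is a tuple like (text,)."""
    return [HASHTAG_RE.findall(row[0]) for row in rows]


def write_hashtags_csv(rows, out):
    """Write a header plus one CSV line per hashtag to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["hashtag"])  # assumed column name
    for tags in extract_hashtags(rows):
        for tag in tags:
            writer.writerow([tag])


# Hypothetical rows shaped like the result of cursor.fetchall()
sample_rows = [
    ("Buying #bitcoin and #eth today",),
    ("no tags here",),
    ("#BTC to the moon",),
]

# Write into an in-memory buffer so the example has no side effects
buffer = io.StringIO()
write_hashtags_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, you could pass a file handle opened with open("hashtags.csv", "w", newline="") (the filename is just an example) and call write_hashtags_csv(results, f). Note that a row whose text column is NULL comes back as None and would still raise a TypeError, so you may want to skip those rows.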

Hope that helps.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow