Question
I have a table of Twitter data in MySQL where the columns is_retweet and is_reply hold binary values (1 = yes, 0 = no). If a user retweeted multiple times in a day, there are multiple rows with 1 in the is_retweet column for that user on that day.
account_id, datetime, user_screenname, is_retweet, is_reply,followers_count
'9', '2008-06-11 20:06:35','Access2', '1', '0', '811'
'9', '2008-06-11 23:06:35','Access2', '1', '1', '812'
'9', '2008-06-12 20:01:21','Access2', '0', '1', '813'
'7', '2008-06-11 17:01:00','actingparty', '1', '1', '2000'
How should I structure my SQL view to give a result like the table below, where the retweets and replies are summed per day and per username? That is, what I am trying to get is:
- for a username on any given day, the total number of retweets, the total number of replies, and the highest follower count.
account_id, date, user_screenname, sum_retweet, sum_reply, followers_count
'9', '2008-06-11', 'Access2', '2', '1', '812'
'9', '2008-06-12', 'Access2', '0', '1', '813'
Here is my SQL code:
CREATE VIEW `tweet_sum` AS
select
`tweets`.`account_id` AS `account_id`,
`tweets`.`user_screenname` AS `user_screenname`,
CAST(`tweets`.`datetime` as date) AS `period`,
MAX(`tweets`.`followers_count`) AS `followers_count`,
SUM(`tweets`.`is_reply`) AS `sum_reply`,
SUM(`tweets`.`is_retweet`) AS `sum_retweet`
from
`tweets`
group by cast(`tweets`.`datetime` as date)
However, my results don't match what I want: the SQL seems to sum up all users' retweets for each day. How can I group by username as well as by day?
Thanks! J
******EDIT*************************************
I would like to extend the question. Say I want one more column, Reach, which is equal to followers_count multiplied by the number of summed columns (sum_retweet, sum_reply) that are greater than zero. For example, in the output table below, sum_retweet and sum_reply are both greater than zero for 2008-06-11, so the Reach value is followers_count * 2 = 1624.
How can I structure my SQL code to do that?
account_id, date, user_screenname, sum_retweet, sum_reply, followers_count, **Reach**
'9', '2008-06-11', 'Access2', '2', '1', '812', '1624'
'9', '2008-06-12', 'Access2', '0', '1', '813', '813'
Solution
Just change your GROUP BY to the following, so that rows are grouped per account and username as well as per day:
group by
`tweets`.`account_id`,
`tweets`.`user_screenname`,
cast(`tweets`.`datetime` as date)
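
For reference, here is a sketch of the whole view with that GROUP BY applied. It reuses the table and column names from the question, has no comma after the last selected column, and uses CREATE OR REPLACE VIEW on the assumption that `tweet_sum` already exists from the earlier attempt. The last column is one possible way to get the Reach value from the follow-up: it relies on MySQL evaluating a comparison such as `SUM(...) > 0` to 1 or 0, so adding the two comparisons counts how many of the sums are non-zero. This assumes is_retweet, is_reply and followers_count are numeric columns, and the alias `reach` is just a name picked for illustration.

```sql
-- Sketch only: assumes is_retweet and is_reply are numeric 0/1 columns
-- and followers_count is numeric.
CREATE OR REPLACE VIEW `tweet_sum` AS
SELECT
    `tweets`.`account_id`             AS `account_id`,
    CAST(`tweets`.`datetime` AS DATE) AS `period`,
    `tweets`.`user_screenname`        AS `user_screenname`,
    SUM(`tweets`.`is_retweet`)        AS `sum_retweet`,
    SUM(`tweets`.`is_reply`)          AS `sum_reply`,
    MAX(`tweets`.`followers_count`)   AS `followers_count`,
    -- Reach (follow-up question): each comparison yields 1 or 0 in MySQL,
    -- so the sum in parentheses is 0, 1 or 2.
    MAX(`tweets`.`followers_count`)
        * ((SUM(`tweets`.`is_retweet`) > 0) + (SUM(`tweets`.`is_reply`) > 0)) AS `reach`
FROM
    `tweets`
GROUP BY
    -- one output row per account, username and calendar day
    `tweets`.`account_id`,
    `tweets`.`user_screenname`,
    CAST(`tweets`.`datetime` AS DATE);
```

With the sample rows from the question, account 9 on 2008-06-11 has a maximum followers_count of 812 and both sums above zero, giving 812 * 2 = 1624; on 2008-06-12 only sum_reply is above zero, giving 813 * 1 = 813.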