Question

I have a table of Twitter data in MySQL where the columns is_retweet and is_reply hold binary values (1 = yes, 0 = no). If a user retweeted multiple times in a day, there are multiple rows with a 1 in the is_retweet column for that user on that day.

account_id,    datetime,        user_screenname, is_retweet, is_reply,followers_count
'9',      '2008-06-11 20:06:35','Access2',        '1',         '0',     '811' 
'9',      '2008-06-11 23:06:35','Access2',        '1',         '1',     '812' 
'9',      '2008-06-12 20:01:21','Access2',        '0',         '1',     '813' 
'7',      '2008-06-11 17:01:00','actingparty',    '1',         '1',     '2000' 

How should I structure my SQL view to give a result like the table below, where the retweets and replies are summed for any given day and by username? That is, what I am trying to get is:

- for a username on any day, the total number of retweets, the total number of replies, and the highest follower count.

account_id,    date,        user_screenname, sum_retweet, sum_reply, followers_count
'9',         '2008-06-11',        'Access2',        '2',         '1',     '812' 
'9',         '2008-06-12',        'Access2',        '0',         '1',     '813' 

Here is my SQL code:

CREATE VIEW `tweet_sum` AS
    select 
        `tweets`.`account_id` AS `account_id`,
        `tweets`.`user_screenname` AS `user_screenname`,
        CAST(`tweets`.`datetime` as date) AS `period`,
        MAX(`tweets`.`followers_count`) AS `followers_count`,
        SUM(`tweets`.`is_reply`) AS `sum_reply`,
        SUM(`tweets`.`is_retweet`) AS `sum_retweet`
    from
        `tweets`
    group by cast(`tweets`.`datetime` as date)

However, my results don't match what I want: the query seems to sum all users' retweets for each day. How can I group by both day and username?

Thanks! J

******EDIT*************************************


I would like to extend the question. Say I want one more column, Reach, equal to followers_count times the number of summed columns (sum_retweet, sum_reply) that are greater than zero. For example, in the output table below, sum_retweet and sum_reply are both greater than zero for 2008-06-11, so Reach for that row is followers_count * 2 = 812 * 2 = 1624.

How can I structure my SQL code to do that?

account_id,    date,        user_screenname, sum_retweet, sum_reply, followers_count, **Reach** 
'9',         '2008-06-11',        'Access2',        '2',         '1',     '812',      '1624'
'9',         '2008-06-12',        'Access2',        '0',         '1',     '813',       '813'

Solution

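Grouping by the date alone collapses every user into one row per day. Just change your GROUP BY so that it also includes the account and username columns: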

group by
   `tweets`.`account_id`,
   `tweets`.`user_screenname`,
   cast(`tweets`.`datetime` as date)
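
For reference, here is a sketch of how the whole view might look with that GROUP BY in place, plus one possible way to get the Reach column from the edit. The view name `tweet_sum_reach` and the `reach` expression are my own assumptions, not something from the question. It also assumes is_retweet, is_reply and followers_count are stored as numeric columns, so that SUM and MAX behave numerically.

```sql
-- Sketch only: `tweet_sum_reach` is a hypothetical view name, and the
-- `reach` expression is an assumption based on the rule in the question's edit.
CREATE VIEW `tweet_sum_reach` AS
    SELECT
        `tweets`.`account_id` AS `account_id`,
        CAST(`tweets`.`datetime` AS DATE) AS `period`,
        `tweets`.`user_screenname` AS `user_screenname`,
        SUM(`tweets`.`is_retweet`) AS `sum_retweet`,
        SUM(`tweets`.`is_reply`) AS `sum_reply`,
        MAX(`tweets`.`followers_count`) AS `followers_count`,
        -- In MySQL a comparison evaluates to 1 (true) or 0 (false), so adding
        -- the two comparisons counts how many of the totals are non-zero.
        MAX(`tweets`.`followers_count`)
            * ((SUM(`tweets`.`is_retweet`) > 0) + (SUM(`tweets`.`is_reply`) > 0)) AS `reach`
    FROM
        `tweets`
    GROUP BY
        `tweets`.`account_id`,
        `tweets`.`user_screenname`,
        CAST(`tweets`.`datetime` AS DATE);
```

With the sample rows from the question, the 2008-06-11 row for Access2 has two non-zero totals, so reach would be 812 * 2 = 1624. The 2008-06-12 row has one, giving 813. If you prefer something less MySQL-specific, each comparison can be written as CASE WHEN ... > 0 THEN 1 ELSE 0 END instead.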
Licensed under: CC-BY-SA with attribution