Question

I have to collect a large set (3000-5000) of randomly selected tweets for manual annotation, but I have some restrictions:

  1. all the tweets must have an image attached;
  2. duplicate tweets are not allowed;
  3. I need information such as the comments, so it wouldn't be useful to collect tweets a few seconds after they are posted;
  4. I have to process my data in Java, so I intend to use Twitter4J or Hosebird Client;
  5. if possible, the tweets must be chosen randomly, without specifying any tag (like "food" or "sport").

Using Hosebird Client, I managed to create a stream and fill a BlockingQueue with tweets, but this way I don't satisfy restrictions 3 and 5.

With Twitter4J, on the other hand, using the Query class, I can't find a solution that satisfies restrictions 1 to 5 at the same time.

I feel that using a Queue will help me with the issue of duplicate tweets, but I haven't managed to satisfy all my requirements.

My question is: is it possible to satisfy all my restrictions with Twitter4J or HBC? If so, can someone give me some code or advice?

Thanks


Solution

I think Twitter4J is enough to satisfy all your conditions. You can use its streaming API, which delivers a stream of tweets even if you don't give it a filtering parameter. Going through your conditions one by one:

Condition-1) While you are receiving tweets from the stream, you can look at each tweet's media entities to check that there is at least one media object whose type is a photo. If so, you can save the tweet in a database table.

For instance:

 MediaEntity[] mediaEntities = status.getMediaEntities();

 for (MediaEntity mediaEntity : mediaEntities) {
     // Compare strings with equals(); == only compares object references
     if ("photo".equals(mediaEntity.getType())) {
         // Save the status object, which holds the tweet and its metadata
         break;
     }
 }

Condition-2) Each tweet has a unique tweet ID, and a single streaming application should not deliver the same tweet twice. However, if you run two streaming apps independently, both may receive the same tweet. In that case, check whether the tweet ID already exists in the database table before saving.
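As a minimal sketch of that ID check, assuming the IDs fit in memory (for 3000-5000 tweets they should): the class name SeenTweetIds and its methods are hypothetical and not part of Twitter4J. A database unique constraint on the tweet ID column would serve the same purpose across several apps.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of Twitter4J): remembers which tweet IDs were already stored.
class SeenTweetIds {
    private final Set<Long> seen = new HashSet<>();

    // Returns true the first time an ID is offered and false for every repeat.
    synchronized boolean markIfNew(long tweetId) {
        return seen.add(tweetId);
    }

    // Number of distinct IDs seen so far.
    synchronized int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        SeenTweetIds ids = new SeenTweetIds();
        long[] incoming = {101L, 102L, 101L, 103L, 102L};
        for (long id : incoming) {
            if (ids.markIfNew(id)) {
                System.out.println("save " + id);
            } else {
                System.out.println("skip duplicate " + id);
            }
        }
    }
}
```

Inside onStatus you would call something like ids.markIfNew(status.getId()) and save the tweet only when it returns true.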

Condition-3) (Please elaborate on what you mean!) As far as I know, Twitter does not have a commenting mechanism like Facebook's. If you mean retweets, you can look up the retweets of a particular tweet from a separate Twitter4J application running alongside the stream.
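If the concern in restriction 3 is only that a tweet needs time to gather reactions, one possible approach (my assumption, not something Twitter4J provides) is to remember each tweet ID with its creation time and to process it only once it has reached a minimum age. The sketch below uses a hypothetical AgingBuffer class and assumes tweets arrive roughly in creation order, as they do on a stream.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper (not part of Twitter4J): holds tweet IDs until they reach a minimum age.
class AgingBuffer {
    private static final class Entry {
        final long tweetId;
        final Instant createdAt;

        Entry(long tweetId, Instant createdAt) {
            this.tweetId = tweetId;
            this.createdAt = createdAt;
        }
    }

    private final Deque<Entry> pending = new ArrayDeque<>();
    private final Duration minAge;

    AgingBuffer(Duration minAge) {
        this.minAge = minAge;
    }

    // Called as tweets arrive; assumes arrival order roughly follows creation order.
    synchronized void add(long tweetId, Instant createdAt) {
        pending.addLast(new Entry(tweetId, createdAt));
    }

    // Removes and returns the IDs of tweets that are at least minAge old at time 'now'.
    synchronized List<Long> drainMature(Instant now) {
        List<Long> mature = new ArrayList<>();
        while (!pending.isEmpty()
                && !pending.peekFirst().createdAt.plus(minAge).isAfter(now)) {
            mature.add(pending.pollFirst().tweetId);
        }
        return mature;
    }
}
```

A scheduled job could call drainMature(Instant.now()) periodically and re-fetch each returned ID, for example with Twitter4J's twitter.showStatus(id), to read the counts as they stand at that point.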

Condition-4) I don't know Hosebird Client, but I know Twitter4J and use it a lot. Twitter4J is a pure Java library: you just add its jar files as references to your Java application and it is ready to use. It is quite simple.

Condition-5) I give my streaming application a set of keywords so that it catches the tweets containing those keywords or hashtags. In your case you would not give any keywords; with Twitter4J that means calling twitterStream.sample() instead of twitterStream.filter(fq), which delivers a random sample of public tweets rather than every tweet. For reference, here is my filtering mechanism:

 FilterQuery fq = new FilterQuery();
 String[] keywords = {"sport", "politics", "health"}; // etc.

 fq.track(keywords);

 twitterStream.addListener(statusListener);
 twitterStream.filter(fq);
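Since a stream delivers far more tweets than the 3000-5000 needed, one way to keep a fixed-size random subset while it runs is reservoir sampling (Algorithm R). This is a technique I am suggesting here, not something built into Twitter4J, and the TweetReservoir class below is a hypothetical sketch that stores only tweet IDs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper (not part of Twitter4J): keeps a uniform random sample
// of at most 'capacity' tweet IDs from a stream of unknown length.
class TweetReservoir {
    private final int capacity;
    private final Random random;
    private final List<Long> sample = new ArrayList<>();
    private long offered = 0;

    TweetReservoir(int capacity, Random random) {
        this.capacity = capacity;
        this.random = random;
    }

    // Offers one tweet ID; every ID offered so far ends up in the sample
    // with equal probability.
    synchronized void offer(long tweetId) {
        offered++;
        if (sample.size() < capacity) {
            sample.add(tweetId);
            return;
        }
        // Pick a slot in [0, offered); only slots below capacity cause a replacement.
        long slot = (long) (random.nextDouble() * offered);
        if (slot < capacity) {
            sample.set((int) slot, tweetId);
        }
    }

    // Returns a copy of the current sample.
    synchronized List<Long> snapshot() {
        return new ArrayList<>(sample);
    }
}
```

You would offer an ID only after the tweet has passed the photo and duplicate checks, so the sample is drawn from qualifying tweets alone.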

Finally, here is a full Java method as an example of how you can use it. I hope it helps. :D

  // Requires: import twitter4j.*;
  // "config" and "logWriter" are defined elsewhere in my application.
  private static void GetTweetStreamForKeywords()
  {
      TwitterStream twitterStream = new TwitterStreamFactory(config).getInstance();

      StatusListener statusListener = new StatusListener() {

          @Override
          public void onStatus(Status status) {
              // The main section where you get the tweet. You can access it through the status object.
              // You can save it in a database table.
          }

          // The stream invokes the callbacks below during normal operation,
          // so they are left as no-ops instead of throwing an exception.
          @Override
          public void onDeletionNotice(StatusDeletionNotice sdn) { }

          @Override
          public void onTrackLimitationNotice(int i) { }

          @Override
          public void onScrubGeo(long l, long l1) { }

          @Override
          public void onStallWarning(StallWarning sw) { }

          @Override
          public void onException(Exception ex) {
              logWriter.WriteErrorLog(ex, "onException()");
          }
      };

      FilterQuery fq = new FilterQuery();

      String[] keywords = {"sport", "politics", "health"};

      fq.track(keywords);

      twitterStream.addListener(statusListener);
      twitterStream.filter(fq);
  }
Licensed under: CC-BY-SA with attribution