Selecting tweets based on search words in R

https://stackoverflow.com/questions/22926148

29-06-2023

Question

For my thesis I am working with tweets, and I am trying to select only the tweets that contain certain words. Because I am analysing the tweets geographically, I have them as a SpatialPointsDataFrame (SPDF). I want to see on a map where the tweets containing these words come from, so I want to select them from the SPDF as a new SPDF.

I figured this would be easy with the tm (text mining) package or with general functions like scan. Unfortunately, I cannot find a function that lets me scan the tweets for a certain word. My fallback would be a workaround: convert the tweets in the SPDF to a text file, select the tweets using one of the functions I have been trying on the SPDF, and then link them back to the SPDF to make them spatial again.

Someone told me that in R you should not start by writing your own functions, since most functions you would try to write already exist. So before I break my brain on this, I am posting it here hoping someone has the answer at hand.

So I have an SPDF with a lot of tweets, and I want to select all tweets that contain a certain word. That's it! It still sounds easy to me, and I feel like I am just not finding the right line of thought at the moment.

Please help!

EDIT:

all_tweets_containing_word_test_are_true <- grepl('test', spatialpointdataframe$twt_txt)

How do I now select all the TRUE rows of spatialpointdataframe?

Solution

In a regular data frame you use grepl like this:

Sub.DF <- DF[grepl('test', DF$twt_txt),]
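To make the indexing concrete, here is a minimal sketch on a made-up data frame. The four example tweets and the `id` column are invented for illustration; only the column name `twt_txt` comes from the question. It also shows that a plain `'test'` pattern is case-sensitive and matches inside longer words, which may or may not be what you want.

```r
# Toy data frame standing in for the tweets (contents are made up).
DF <- data.frame(
  id = 1:4,
  twt_txt = c("this is a test", "nothing here", "Test in capitals", "latest news"),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row; use that vector as the row index.
Sub.DF <- DF[grepl("test", DF$twt_txt), ]

# Variant: case-insensitive, whole-word match. \\b marks a word boundary,
# so "latest" is no longer matched but "Test" is.
Sub.word <- DF[grepl("\\btest\\b", DF$twt_txt, ignore.case = TRUE, perl = TRUE), ]
```

With this toy data, `Sub.DF` keeps rows 1 and 4 (the second because "latest" contains "test"), while `Sub.word` keeps rows 1 and 3.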

As long as an SPDF supports the same row indexing, this should be what you want. You don't even need to create the Sub.DF object if you only want to plot the result. If you use ggplot2, ggmap, or something similar, just pass data=DF[grepl('test', DF$twt_txt),] directly.
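For the spatial case, the same pattern can be tried directly on the SPDF. The sketch below assumes the sp package is installed and builds a small SpatialPointsDataFrame from invented coordinates and tweets; the names `tweets`, `hits`, and `tweets_sub` are placeholders, not part of the original question.

```r
library(sp)

# Made-up points and tweet texts; twt_txt is the column name from the question.
tweets <- data.frame(
  lon = c(4.90, 5.12, 4.48),
  lat = c(52.37, 52.09, 51.92),
  twt_txt = c("a test tweet", "something else", "another test"),
  stringsAsFactors = FALSE
)

# Promote the data frame to a SpatialPointsDataFrame.
coordinates(tweets) <- ~ lon + lat

# Logical vector: TRUE where the tweet text contains "test".
hits <- grepl("test", tweets$twt_txt)

# Row-subsetting with [ keeps the object spatial.
tweets_sub <- tweets[hits, ]
```

Here `tweets_sub` is still a SpatialPointsDataFrame holding only the two matching points, so something like `plot(tweets_sub)` should draw just those locations.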

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow