I am trying to generate an .arff file from a csv data file I have. Now I am totally new to Weka and have started using it just a day back. I am trying out a simple twitter sentiment analysis with this for starters. I have generated training data in CSV. Contents of CSV file are as follows:
tweet,affinScore,polarity
ATAUTHORcfoblog is giving away a $25 Amex gift card (enter to win over $600 in prizes!) http://t.co/JD8EP14c ,4,4
"American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite-retailer platform, no? ",0,1
African-American Demos Express Ethnic Identity Differently http://t.co/gInv4bKj via ATAUTHORmediapost ,0,3
Google ???????? Visa ? American Express http://t.co/eEZTSiHY ,0,4
Secrets to Success from Small-Business Owners : Lifestyle :: American Express OPEN Forum http://t.co/b85F8JX0 via ATAUTHOROpenForum ,2,1
RT ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1
Winning Surveys $1500 american express Huggies Sweeps http://t.co/WoaTFowp ,4,1
I root for Square mostly because a small business that takes Square is also one that takes American Express. ,0,1
I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Uh oh... RT ATAUTHORBlackArrowBella: I dont know how bitch be acting American Express but they cards be saying DEBIT ON IT HAVE A ?? PLEASE!!! ,-5,2
Just got another credit card. A Blue Sky card with American Express. Its gonna help pay for the honeymoon! ATAUTHORAmericanExpress ,-1,1
Follow ATAUTHORShaveMagazine and ReTweet this msg to be entered to #Win an American Express Gift card. Winners contacted bi-weekly by direct msg! ,2,4
American Express Gold zakelijk aanvragen: http://t.co/xheZwmbt ,0,3
RT ATAUTHORhunterwalk: American Express has always been my dark horse acquirer of ATAUTHORFoursquare. Bundle in Square-like payments & its a lite ... ,0,1
Here first attribute is actual tweet, second is AFFIN score and third is actual classification class (1- Positive, 2-Negative, 3-Neutral, 4-Spam)
Now I try to generate .arff format from it using code:
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import java.io.File;
public class CSV2Arff {
/**
* takes 2 arguments:
* - CSV input file
* - ARFF output file
*/
public static void main(String[] args) throws Exception {
if (args.length != 2) {
System.out.println("\nUsage: CSV2Arff <input.csv> <output.arff>\n");
System.exit(1);
}
// load CSV
CSVLoader loader = new CSVLoader();
loader.setSource(new File(args[0]));
Instances data = loader.getDataSet();
// save ARFF
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
saver.setFile(new File(args[1]));
saver.setDestination(new File(args[1]));
saver.writeBatch();
}
}
This generates .arff file that looks somewhat like:
@relation file
@attribute tweet {_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._
@data
_ATAUTHORcfoblog_is_giving_away_a_$25_Amex_gift_card_(enter_to_win_over_$600_in_prizes!)_http://t.co/JD8EP14c_,4,4
'American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite-retailer_platform,_no?_',0,1
African-American_Demos_Express_Ethnic_Identity_Differently_http://t.co/gInv4bKj_via__ATAUTHORmediapost_,0,3
Google_????????_Visa_?_American_Express__http://t.co/eEZTSiHY_,0,4
Secrets_to_Success_from_Small-Business_Owners_:_Lifestyle_::_American_Express_OPEN_Forum_http://t.co/b85F8JX0_via__ATAUTHOROpenForum_,2,1
RT__ATAUTHORhunterwalk:_American_Express_has_always_been_my_dark_horse_acquirer_of__ATAUTHORFoursquare._Bundle_in_Square-like_payments_&_its_a_lite_..._,0,1
I am new to Weka but from what I have read, I have a suspicion that this ARFF is not correctly formed. Can anyone comment on it?
Also if it is wrong, can someone point me to where exactly am I going wrong?