Question

I am trying to calculate Pearson's correlation between 13 variables in a tab-delimited text file where each column is a variable. I am using Java and was hoping somebody could give me some guidance as to which libraries or functions I should be using. I am guessing I will first need to read the contents of the file, but I can't figure out how to make the program treat each column as an array, which would then enable me to do the required calculations. I would have thought the java.io package would be the best place to start, but I can't figure out which classes I could use for my problem. I have also looked at http://commons.apache.org/math/, which has a function for measuring Pearson's correlation, but that would be too easy, and as this is a uni assignment I have to implement it from scratch. Looking at the Apache Pearson's correlation implementation, they seem to have approached the problem as a matrix where each column of the matrix is a variable.

Sorry for the lengthy description of my problem. If you know of any websites, good keywords to search for, or any other information, I would greatly appreciate it. Thanks, Arlind.

Solution

You should be able to do this using just the standard Java Math, String, and file I/O libraries, plus a few arrays and loops!

Read this first to learn how to read the file line by line: http://www.roseindia.net/java/beginners/java-read-file-line-by-line.shtml

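Inside the loop, parse each line using the String.split(String regex) method. Your file is tab-delimited, so split on the tab character, e.g. strLine.split("\t"). (For a comma-separated file it would be strLine.split(",").)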

Convert the result to an array of doubles by calling Double.parseDouble on each String in the String[].
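The reading, splitting, and parsing steps above could be put together roughly like this. It is only a minimal sketch, assuming the file has no header row and every row has the same number of tab-separated numeric fields. The names ColumnReader and readColumns are made up for illustration, and the sample data in main is invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
class ColumnReader {

    // Reads tab-delimited numeric text and returns columns[variable][observation].
    // Assumes no header row and the same number of fields on every line.
    static double[][] readColumns(Reader source) {
        List<double[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\t");
                double[] row = new double[fields.length];
                for (int i = 0; i < fields.length; i++) {
                    row[i] = Double.parseDouble(fields[i].trim());
                }
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (rows.isEmpty()) {
            return new double[0][0];
        }
        // Transpose: the file is row-oriented, but each variable is a column.
        int columnCount = rows.get(0).length;
        double[][] columns = new double[columnCount][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            double[] row = rows.get(r);
            if (row.length != columnCount) {
                throw new IllegalArgumentException("Row " + (r + 1) + " has " + row.length
                        + " fields, expected " + columnCount);
            }
            for (int c = 0; c < columnCount; c++) {
                columns[c][r] = row[c];
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // In-memory sample so the sketch runs without a file on disk.
        // For a real file you could pass new java.io.FileReader("data.txt")
        // instead ("data.txt" is a placeholder name).
        String sample = "1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n";
        double[][] columns = readColumns(new StringReader(sample));
        System.out.println("variables: " + columns.length
                + ", observations: " + columns[0].length);
    }
}
```

After this, columns[0] holds every value of the first variable, columns[1] the second, and so on, which is the "each column is an array" layout the question asks about.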

From there you can use the Math.sqrt(double a) and Math.pow(double a, double b) functions along with some simple loops to calculate your correlation for each pair of variables.
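For reference, the sample Pearson correlation of two variables x and y is r = sum((x_i - mean_x) * (y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)). Below is a minimal sketch of that formula with plain loops and Math.sqrt. The class and method names (PearsonSketch, correlation, correlationMatrix) are hypothetical, and it squares by multiplying rather than calling Math.pow, which gives the same result here.

```java
// Hypothetical helper class, not part of any library.
class PearsonSketch {

    // Pearson correlation of two equally long arrays.
    // Returns NaN if either array is constant (zero variance).
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Need two arrays of equal length >= 2");
        }
        int n = x.length;

        // First pass: the two means.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Second pass: sums of products of deviations from the means.
        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    // Correlation for every pair of variables, with columns[variable][observation].
    static double[][] correlationMatrix(double[][] columns) {
        int k = columns.length;
        double[][] r = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i; j < k; j++) {
                double value = correlation(columns[i], columns[j]);
                r[i][j] = value;
                r[j][i] = value; // the matrix is symmetric
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Invented sample data: three variables, five observations each.
        double[][] columns = {
            {1, 2, 3, 4, 5},
            {2, 1, 4, 3, 5},
            {5, 4, 3, 2, 1}
        };
        double[][] r = correlationMatrix(columns);
        for (double[] row : r) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

With 13 variables, correlationMatrix would return a 13 x 13 symmetric matrix, so only the upper triangle needs to be computed, as the inner loop does.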

Hopefully that's enough info to get you started, feel free to post back if you want more help!

Licensed under: CC-BY-SA with attribution