Question

I have two data sets: one shows school enrollment for 6 countries, and the other shows the GDP of each country. I want to calculate the correlation coefficient between school enrollment and GDP for each country. I have looked at the question: How can I create a correlation matrix in R?

But I have a problem with the dimensions of the two datasets (they differ in the number of rows and columns) …

GDP dataset: https://drive.google.com/file/d/0B1NJGKqdrgRtTjcySzZOM2xKZU0/edit?usp=sharing

    CountryName year_2000   year_2004   year_2008   year_2012
    Comoros 201899884   362420484   4880000000  6800000000
    Jordan  8457923945  11407566660 54082389393 58768800833
    UAEmirates  104337375343    147824374543    21902892584 36044457920
    Egypt   99838540997 78845185709 840000000   1240000000
    Qatar   17759889598 31675273812 131611819294    210279947256
    Syria   19325894913 25086930693 88882967742 95981572517

School enrollment dataset: https://drive.google.com/file/d/0B1NJGKqdrgRtRm9SWm9ObGpwbU0/edit?usp=sharing

Indicator   com_2000    com_2004    com_2008    com_2012    Jor_2000    Jor_2004    Jor_2008    Jor_2012    ARE_2000    ARE_2004    ARE_2008    ARE_2012    Egy_2000    Egy_2004    Egy_2008    Egy_2012    Qat_2000    Qat_2004    Qat_2008    Qat_2012    Syr_2000    Syr_2004    Syr_2008    Syr_2012
preprimary (% gross)    2.39124 4.3563  23.68581    24.80515401 31.08014    32.71263    37.38376    33.81492    63.34796    81.92245    91.926025   71.14425    11.94312    15.1121 23.49822    27.3631 29.23454    32.69621    49.64917    73.42391    8.67231 10.00469    9.93459 10.6214
primary (% gross)   116.7763    121.0558    112.08  117.3767    102.3871    106.8326    102.04  98.87783    94.22761    102.304 107.5285    108.3284    101.3365    105.5968    109.9804    108.6207    104.7228    106.0118    104.0118    102.94  107.6219    121.8342    118.0423    122.2586
secondary (% gross) 31.8468 48.04706    60.04706    73.48619    85.90683    91.6662 93.89221    89.05884    45.0041 57.57103    68.905185   72.91143    85.83446    87.64275    89.48275    76.06258    86.4097 110.453 93.25074    12.14547    43.96275    66.56304    72.69195    74.42249
tertiary (% gross)  1.41838 3.00913 6.474124923 11.42145    28.28053    39.41155    44.30046    39.93893    0   0   0   0   31.62423    30.32905    31.64919    28.7532 22.565405   17.80551    11.3693 12.14547    12.00074    15.0151 24.20384    25.63541

The x-axis should show the years (2000, 2004, 2008, 2012) and the y-axis the enrollment type. I want a separate graph for each country (the graph link is in the comments).

The code is not quite right, but this is my starting point:

    library(lattice)
    xtest <- read.csv(file.choose(), header=T, sep=",")
    ytest <- read.csv(file.choose(), header=F, sep=",")
    xvalues <- as.matrix(xtest)
    yvalues <- as.matrix(ytest)
    corvalue <- cor(xvalues, yvalues)
    image(x=seq(dim(xvalues)[2]), y=seq(dim(yvalues)[2]), z=corvalue, xlab="x column", ylab="y column")
    text(expand.grid(x=seq(dim(xvalues)[2]), y=seq(dim(yvalues)[2])), labels=round(c(corvalue),2))

As a test, I took a subset of the original GDP dataset, xtest:

Comoros Comoros Comoros Comoros
201899884   201899884   201899884   201899884
362420484   362420484   362420484   362420484
4880000000  4880000000  4880000000  4880000000
6800000000  6800000000  6800000000  6800000000

And for school enrollment, I took a subset of the data, ytest:

0   2.39124 4.3563  23.68581    24.80515401
99.78652    116.7763    121.0558    112.08  117.3767
0   31.8468 48.04706    60.04706    73.48619
0.82459 1.41838 3.00913 6.474124923 11.42145

Any suggestions for a better output? The output result is in the comments.


Solution

I used this code:

    xtest <- read.csv(file.choose(), header=T, sep=",")
    ytest <- read.csv(file.choose(), header=F, sep=",")
    xvalues <- as.matrix(xtest)
    yvalues <- as.matrix(ytest)
    corvalue <- cor(xvalues, yvalues)
    image(x=seq(dim(xvalues)[2]), y=seq(dim(yvalues)[2]), z=corvalue, xlab="x column", ylab="y column")
    text(expand.grid(x=seq(dim(xvalues)[2]), y=seq(dim(yvalues)[2])), labels=round(c(corvalue),2))

The datasets used were as follows. ytest:

0   2.39124 4.3563  23.68581    24.80515401
99.78652    116.7763    121.0558    112.08  117.3767
0   31.8468 48.04706    60.04706    73.48619
0.82459 1.41838 3.00913 6.474124923 11.42145

xtest:

Comoros Comoros Comoros Comoros
201899884   201899884   201899884   201899884
362420484   362420484   362420484   362420484
4880000000  4880000000  4880000000  4880000000
6800000000  6800000000  6800000000  6800000000
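
One way to get around the mismatch in rows and columns between the two tables is to pair, for each country, the four yearly GDP values with the four yearly values of each enrollment indicator. The following is only a sketch under some assumptions: the data for Comoros and Jordan are typed in by hand from the tables in the question rather than read from the CSV files, and the names `prefix` and `cor_by_country` are made up for this illustration. Because `cor()` correlates columns, the enrollment block is transposed so that years are in rows and indicators are in columns.

```r
years <- c(2000, 2004, 2008, 2012)

# GDP: one row per country, one column per year (values from the question)
gdp <- rbind(
  Comoros = c(201899884, 362420484, 4880000000, 6800000000),
  Jordan  = c(8457923945, 11407566660, 54082389393, 58768800833)
)
colnames(gdp) <- years

# Enrollment: one row per indicator, columns named <prefix>_<year>
enrol <- rbind(
  preprimary = c(2.39124, 4.3563, 23.68581, 24.80515401,
                 31.08014, 32.71263, 37.38376, 33.81492),
  primary    = c(116.7763, 121.0558, 112.08, 117.3767,
                 102.3871, 106.8326, 102.04, 98.87783),
  secondary  = c(31.8468, 48.04706, 60.04706, 73.48619,
                 85.90683, 91.6662, 93.89221, 89.05884),
  tertiary   = c(1.41838, 3.00913, 6.474124923, 11.42145,
                 28.28053, 39.41155, 44.30046, 39.93893)
)
colnames(enrol) <- c(paste0("com_", years), paste0("Jor_", years))

# Hypothetical lookup: GDP country name -> enrollment column prefix
prefix <- c(Comoros = "com", Jordan = "Jor")

# For each country, correlate its GDP series with each indicator series
cor_by_country <- t(sapply(names(prefix), function(country) {
  cols <- paste0(prefix[[country]], "_", years)
  # t() puts years in rows and indicators in columns, matching the GDP vector
  drop(cor(gdp[country, ], t(enrol[, cols])))
}))

# Rows are countries, columns are enrollment indicators
print(round(cor_by_country, 2))
```

The result has one row per country and one column per indicator, so it could be passed to `image()` in the same way as `corvalue` above. Note that each coefficient is based on only four yearly observations, and that a constant series (such as the all-zero tertiary values for ARE) makes `cor()` return `NA` with a warning.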
Licensed under: CC-BY-SA with attribution