Question

I have a data file that represents a contingency table that I need to work with. The problem is I can't figure out how to load it properly.

Data structure:

  • Rows: individual churches
  • 1st Column: Name of the church
  • 2nd - 12th column: Mean age of followers
  • Every cell: Number of people who follow the corresponding church and fall into the corresponding age group.

Note: the original data set only gave age ranges (e.g. 60-69), so to allow computation I replaced each range with its mean age (e.g. 64.5 instead of 60-69).

Data sample:

name;7;15;25
catholic;25000;30000;15000
hinduism;5000;2000;3000
...

I tried to simply load the data and make them a 'table' so I could expand it but it didn't work (only produced something really weird).

dataset <- read.table("C:/.../dataset.csv", sep=";", quote="\"")
dataset_table <- as.table(as.matrix(dataset))

When I tried to use the data as they were to produce a simple graph, it didn't work either.

barplot(dataset[2,2:4])
Error in barplot.default(dataset[2,2:4]) : 'height' must be a vector or a matrix

Checking the class of dataset[2,2:4] showed me that it is a 'list', which I don't understand (I guess it is because dataset is a data.frame and not a table).

If someone could point me in the right direction on how to properly load the data as a table and then work with them, I'd be forever grateful :).


Solution

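If your file is already a contingency table, don't use as.table(). Read it with header=TRUE (otherwise the line "name;7;15;25" is read as a data row and every column becomes text), then convert the row you want to plot to a matrix.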

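# Read the sample; header=TRUE takes the first line as column names
df <- read.table(header=TRUE,sep=";",text="name;7;15;25
catholic;25000;30000;15000
hinduism;5000;2000;3000")
# Strip the "X" that read.table() prepends to numeric column names
colnames(df)[-1] <- substring(colnames(df)[-1],2)
# barplot() needs a vector or matrix, so convert the one-row data frame
# (with the header consumed, row 2 is hinduism)
barplot(as.matrix(df[2,2:4]), col="lightblue")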

The colnames(...) line is needed because read.table() makes column names syntactically valid by default, and a valid R name can't start with a digit, so it prepends X. This code just strips that X off again.
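
As an alternative to renaming the columns afterwards, read.table() has a check.names argument, and setting it to FALSE keeps the headers as written. Below is a minimal sketch on the same sample data; the names df2 and heights are only illustrative. It also shows why the barplot() call in the question failed and uses unlist() as another way to get a plain vector.

```r
# Same sample data; check.names = FALSE stops R from prepending "X"
df2 <- read.table(header = TRUE, sep = ";", check.names = FALSE,
                  text = "name;7;15;25
catholic;25000;30000;15000
hinduism;5000;2000;3000")

# Headers are kept exactly as they appear in the file
colnames(df2)

# A one-row slice of a data frame is still a data frame (a list of columns),
# which is why barplot() rejected it in the question
is.data.frame(df2[2, 2:4])

# unlist() flattens the row into a named numeric vector that barplot() accepts
heights <- unlist(df2[2, 2:4])
barplot(heights, col = "lightblue", main = as.character(df2$name[2]))
```

Note that names such as "7" are not syntactically valid, so they have to be quoted with backticks when used with $, e.g. df2$`7`.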

EDIT (Response to OP's comment)

If you want to convert the df defined above to a table suitable for use with expand.table(...) you have to set dimnames(...) and names(dimnames(...)) as described in the documentation for expand.table(...).

tab  <- as.matrix(df[-1])
dimnames(tab)        <- list(df$name,colnames(df)[-1])
names(dimnames(tab)) <- c("name","age")
library(epitools)
x.tab <- expand.table(tab)
str(x.tab)
# 'data.frame': 80000 obs. of  2 variables:
#  $ name: Factor w/ 2 levels "catholic","hinduism": 1 1 1 1 1 1 1 1 1 1 ...
#  $ age : Factor w/ 3 levels "7","15","25": 1 1 1 1 1 1 1 1 1 1 ...
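
If installing epitools is not an option, roughly the same expansion can be sketched in base R. This is a swapped-in approach, not a description of what expand.table() does internally: it turns the matrix into a long data frame with as.data.frame() and then repeats each row Freq times. The names long and expanded are illustrative, and check.names = FALSE is used here instead of the substring() step.

```r
# Read the sample, keeping the numeric headers as they are
df <- read.table(header = TRUE, sep = ";", check.names = FALSE,
                 text = "name;7;15;25
catholic;25000;30000;15000
hinduism;5000;2000;3000")

# Count matrix with named dimensions (church name x age)
tab <- as.matrix(df[-1])
dimnames(tab) <- list(name = as.character(df$name), age = colnames(df)[-1])

# Long format: one row per (name, age) combination plus a Freq column
long <- as.data.frame(as.table(tab), stringsAsFactors = FALSE)

# Repeat each combination Freq times: one row per person
expanded <- long[rep(seq_len(nrow(long)), long$Freq), c("name", "age")]

# Ages come back as text; convert them so they can be used in computations
expanded$age <- as.numeric(expanded$age)

nrow(expanded)   # 80000, the sum of all counts in the sample

# Example computation: mean age per church
tapply(expanded$age, expanded$name, mean)
```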
Licensed under: CC-BY-SA with attribution