Question

My question today concerns a data frame I'm working on in R. Its header looks like this: String (unique), Integer N0 to N23. In other words, there is one string column followed by 24 integer columns.

Those 24 integer values are the frequency of the string in each hour of the day. Logically, the integer values in each row sum to the total number of times the string appears in the data.
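To make that layout concrete, here is a minimal sketch of such a data frame. The topic strings and counts are made up for illustration; only the column names N0 to N23 follow the header described above. `rowSums()` then recovers each string's overall frequency.

```r
# Hypothetical data: 3 unique strings with made-up counts for hours 0..23
counts <- matrix(1:72, nrow = 3, byrow = TRUE)
df <- data.frame(String = c("alpha", "beta", "gamma"), counts)
names(df)[-1] <- paste0("N", 0:23)

# Summing the 24 hourly counts in a row gives how often that string occurs overall
df$Total <- rowSums(df[paste0("N", 0:23)])
df[c("String", "Total")]
```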

The thing is, I don't need the raw frequency of the string at a given hour, but the percentage that frequency represents relative to the sum of the integer values across all rows, i.e. the whole table.

My lecturer hinted that table() might be the right R tool for this, but I honestly don't understand how it is supposed to help me.
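As context for the lecturer's hint: `table()` builds a contingency table from raw, unaggregated observations, and `prop.table()` is designed to operate on such tables. The sketch below assumes hypothetical raw data with one entry per occurrence (a topic string plus the hour it was seen). If the data is already aggregated into N0 to N23 columns, as in the question, `table()` is not needed and `prop.table()` can be applied to the count matrix directly.

```r
# Hypothetical raw observations: one entry per occurrence of a string
topic <- c("alpha", "alpha", "beta", "alpha", "beta", "gamma")
# factor() with levels 0:23 keeps all 24 hour columns, even hours with no hits
hour <- factor(c(0, 1, 1, 23, 23, 23), levels = 0:23)

# table() cross-tabulates occurrences: rows = strings, columns = hours
tab <- table(topic, hour)

# prop.table() with no margin divides every cell by the grand total
shares <- prop.table(tab)
round(100 * shares, 1)
```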

If all else fails, I'll calculate it in Java, although I'd really appreciate your help doing this in R.

Thanks for reading so far and thanks in advance for your help,

Rickyfox

Edit (please read):

With James's help I got the following prop.table result:

The thing is, the percentages sum to 100 within each row, but they should sum to 100 over the whole table. Is there a way to do that?


Solution

Use prop.table on a matrix containing the values:

x <- data.frame(id=letters[1:3],val0=1:3,val1=4:6,val2=7:9)
# margin=1 divides each value by its row sum, so every row sums to 1
prop.table(as.matrix(x[-1]),margin=1)
           val0      val1      val2
[1,] 0.08333333 0.3333333 0.5833333
[2,] 0.13333333 0.3333333 0.5333333
[3,] 0.16666667 0.3333333 0.5000000
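The `margin` argument decides what each cell is divided by. As a sketch on the same toy data frame `x`: `margin = 1` normalises rows, `margin = 2` normalises columns, and leaving `margin` at its default `NULL` divides by the grand total, which is what the follow-up question asks for.

```r
x <- data.frame(id = letters[1:3], val0 = 1:3, val1 = 4:6, val2 = 7:9)
m <- as.matrix(x[-1])

by_row   <- prop.table(m, margin = 1)  # each row sums to 1
by_col   <- prop.table(m, margin = 2)  # each column sums to 1
by_total <- prop.table(m)              # the whole table sums to 1

rowSums(by_row)
colSums(by_col)
sum(by_total)
# by_total equals m / sum(m); multiply by 100 to get percentages
```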

Edit: A fully working example:

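tt <- read.table("topichitsperhod.csv", sep=",", header=TRUE)
# drop the first column (the topic strings) and any rows with missing counts
tt <- na.omit(tt[-1])
# margin=NULL divides every cell by the grand total, so the whole table sums to 1
pt <- prop.table(as.matrix(tt), margin=NULL)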

The first column is dropped because it holds the topic strings.
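Dropping the string column loses the link between each row and its topic. A minimal sketch of one way to keep it is to store the strings as row names before computing percentages. The inline CSV below is hypothetical and stands in for topichitsperhod.csv, with only three hour columns to keep it short.

```r
# Hypothetical CSV content standing in for topichitsperhod.csv
# (only three hour columns shown; the real file would have N0 to N23)
csv <- "topic,N0,N1,N2
alpha,1,2,3
beta,4,NA,6
gamma,7,8,9"
tt <- read.csv(text = csv)

# Drop rows with missing counts, then keep the topic strings as row names
tt <- na.omit(tt)
counts <- as.matrix(tt[-1])
rownames(counts) <- tt$topic

# Percentage of the grand total for every (topic, hour) cell
pct <- 100 * prop.table(counts)
round(pct, 2)
```

Note that rows removed by `na.omit()` are also excluded from the grand total, so the remaining percentages still add up to 100.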

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow