Question

I have a fairly large data frame called FTSE. Here is its structure:

str(FTSE)

'data.frame':   21167 obs. of  5 variables:
 $ Name         : Factor w/ 2 levels "FTSE MIB","FTSE MIB NET TOT ": 1 1 1 1 1 1 1 1 1 1 ...
 $ DateLastTrade: Factor w/ 18 levels "12/10/13","12/11/13",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ LastPrice    : num  19091 19008 19002 19018 19018 ...
 $ Open         : num  19091 19091 19091 19091 19091 ...
 $ LastClose    : num  19021 19021 19021 19021 19021 ...

When I summarize it, I get:

summary(FTSE)
                Name        DateLastTrade     LastPrice          Open         LastClose    
 FTSE MIB         :10289   12/3/13 : 1370   Min.   :17750   Min.   :17811   Min.   :17805  
 FTSE MIB NET TOT :10878   12/4/13 : 1370   1st Qu.:18124   1st Qu.:18055   1st Qu.:18124  
                           12/6/13 : 1370   Median :18321   Median :18310   Median :18313  
                           12/2/13 : 1369   Mean   :18366   Mean   :18375   Mean   :18352  
                           12/5/13 : 1369   3rd Qu.:18595   3rd Qu.:18752   3rd Qu.:18697  
                           12/23/13: 1353   Max.   :19091   Max.   :19091   Max.   :19021  
                           (Other) :12966      

Pay attention to the "LastPrice" column. If I summarize LastPrice directly (it is the variable I actually need in my analysis), I get this, which is noticeably different from the output above:

summary(FTSE$LastPrice)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  17750   18120   18320   18370   18600   19090 

I'm fairly new to R and I really can't figure out why the values are different. Is it a rounding issue? I've read a lot of answers about this, but I can't find a way to make the results consistent. I'm really stuck on this problem.

Thanks to anybody who can help me or even try to understand my problem. Regards

EDIT for shujaa:

max(FTSE$LastPrice) 
[1] 19091.3

FTSE[which.max(FTSE$LastPrice), ]
      Name DateLastTrade LastPrice    Open LastClose
1 FTSE MIB       12/2/13   19091.3 19091.3  19021.48

Solution

It's a rounding problem. All of the output from summary(FTSE$LastPrice) has only 4 significant digits. If you look at the Usage section of ?summary, you see that the default for digits (as a named argument), coupled with the default for digits as an option, gets you to 4.

 # summary(object, ..., digits = max(3, getOption("digits")-3))

> getOption("digits")
[1] 7

So try:

summary(FTSE$LastPrice, digits=7)
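
As a minimal sketch of where the 4 comes from, the snippet below evaluates the default expression from the Usage line and applies signif() to the LastPrice statistics shown in the question. It assumes getOption("digits") is 7, and it uses the displayed (already formatted) values as stand-ins for the exact ones, so it illustrates the rounding rather than reproducing the original computation.

```r
# Default from the Usage line, assuming options(digits = 7)
default_digits <- max(3, 7 - 3)

# LastPrice statistics as displayed by summary(FTSE) in the question
# (stand-ins for the exact values; the maximum comes from max(FTSE$LastPrice))
stats <- c(Min = 17750, Q1 = 18124, Median = 18321,
           Mean = 18366, Q3 = 18595, Max = 19091.3)

# Round each statistic to 4 significant digits
rounded <- signif(stats, default_digits)
print(rounded)
```

The rounded values are 17750, 18120, 18320, 18370, 18600 and 19090, which is what summary(FTSE$LastPrice) printed in the question.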

One question remains, however: why does summary.data.frame not round to the same degree, given that the default for digits is the same in the .default and the .data.frame methods? Looking at the code, you can see that summary.data.frame first runs summary.default on its columns with a fixed value of digits=12L, and only later uses the digits argument for formatting. The help page seemed somewhat obscure to me on this point in its description of the arguments:

digits: integer, used for number formatting with signif() (for summary.default) or 
                                                 format() (for  summary.data.frame).

It completely ignores the fact that the default (and fixed) signif for data.frame columns is quite different.
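
The difference between the two paths can be sketched with the maximum from the question, 19091.3. This is only an illustration of signif() versus format() with 4 digits, following the help text quoted above; it is not the actual source of either summary method, and the variable names are made up for the example.

```r
x <- 19091.3

# Vector path: the value itself is rounded to 4 significant digits
rounded_value <- signif(x, digits = 4)

# Data frame path: the value keeps its precision and is only formatted;
# format() does not drop digits from the integer part
formatted_value <- format(x, digits = 4)

print(rounded_value)    # 19090
print(formatted_value)  # "19091"
```

This matches the two maxima in the question: 19090 from summary(FTSE$LastPrice) and 19091 from summary(FTSE).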

Other suggestions

The summary method for a data.frame is different from the default method for a vector, which probably results in a different precision being used for the output. If you explicitly specify digits, it works:

Let's create the data:

nr <- 21167
set.seed(nr)
# Simulated stand-in for the FTSE data frame from the question
temp <- data.frame(Name = sample(c("FTSE", "FTSE NET"), nr, replace = TRUE),
                   DateLastTrade = sample(1:18, nr, replace = TRUE),
                   LastPrice = sample(18000:21000, nr, replace = TRUE),
                   Open = 19091,
                   LastClose = 19021
                   )


str(temp)

Now let's reproduce what you got:

summary(temp)
summary(temp$LastPrice)

Now let's fix that:

summary(temp,digits=7)
summary(temp$LastPrice,digits=7)

Note: This will only work if your numbers are below 10^7, i.e. if they have at most 7 digits. You won't see the effect at 10^7 itself, but if 10^7+1 is present, its last digit will be rounded down to zero. To solve that you'll have to increase digits to 8 (and so on). It is safer to use a larger value, e.g. digits=10.
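
A small sketch of that limit, using signif() directly on the 10^7+1 example from the note (the variable names are hypothetical, and sprintf() is used only to avoid scientific notation when printing):

```r
big <- 10^7 + 1   # needs 8 significant digits

seven_digits <- signif(big, digits = 7)  # last digit is rounded away
eight_digits <- signif(big, digits = 8)  # value is preserved

print(sprintf("%.0f", seven_digits))  # "10000000"
print(sprintf("%.0f", eight_digits))  # "10000001"
```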

Licensed under: CC-BY-SA with attribution