Question

I need to create a plot of the monthly median house price over time. The data is in random order and consists of the selling prices of individual houses.

I already converted the daily dates to monthly and converted Value into a numeric column. But I can't manage to calculate the median per month.

Below are the characteristics of the dataset.
> str(a)
'data.frame':   1411764 obs. of  2 variables:
 $ Date : Factor w/ 498 levels "1977-11","1978-06",..: 108 60 12 58 51 60 12 59 60 60 ...
 $ Value: num  223000 171528 110269 172436 181512 ...
> head(a)
    Date    Value
1  2003-01 223000.0
2  1999-01 171528.0
3  1992-01 110268.6
5  1998-11 172436.5
9  1998-04 181512.1
10 1999-01 197848.0

Solution 2

I'd use plyr for this. Something like this should get you a data.frame with the median per month:

library(plyr)
result_df = ddply(a, .(Date), summarize, median_value = median(Value))

plyr is known to be a little slow on larger datasets, but I would still give the code above a try first. A very good alternative is data.table, which provides roughly the same functionality but is often orders of magnitude faster.
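If neither package is installed, the same per-month median can be cross-checked with base R's aggregate(). This is only a minimal sketch: the toy data frame and its values are invented for illustration and merely mimic the shape of `a`.

```r
# Toy data in the same shape as `a` (values are made up for illustration)
toy <- data.frame(
  Date  = factor(c("2003-01", "1999-01", "1999-01", "1999-01", "2003-01")),
  Value = c(223000, 171528, 197848, 180000, 200000)
)

# One row per month, holding the median of Value within that month
monthly_median <- aggregate(Value ~ Date, data = toy, FUN = median)
names(monthly_median)[2] <- "median_value"
monthly_median
```

On the real data, replacing `toy` with `a` should give one row per level of Date.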

Other tips

If you have a lot of data, you will find data.table very efficient for such operations. If you don't, you will still find data.table very useful:

library(data.table)
dt <- data.table(a)  # `a` is the data frame from the question
dt[,list(medianvalue = median(Value)), by = "Date"]
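To get from the per-month medians to the plot the question asks for, the months need to be in chronological order first. The sketch below is one possible way to do that, assuming Date holds "YYYY-MM" strings as in the str() output; the toy data, the `Month` column, and the axis labels are illustrative choices and not part of the original answer.

```r
library(data.table)

# Toy data in the same shape as `a` (values are made up for illustration)
a_toy <- data.frame(
  Date  = factor(c("2003-01", "1999-01", "1992-01", "1999-01")),
  Value = c(223000, 171528, 110268.6, 197848)
)

dt <- as.data.table(a_toy)
monthly <- dt[, list(medianvalue = median(Value)), by = "Date"]

# "YYYY-MM" is not a complete date, so append a day before parsing
monthly[, Month := as.Date(paste0(as.character(Date), "-01"))]

# Sort chronologically so the line is drawn left to right
setorder(monthly, Month)

plot(monthly$Month, monthly$medianvalue, type = "l",
     xlab = "Month", ylab = "Median house price")
```

Converting to a real Date class lets plot() space the months correctly on the x-axis, which a factor would not do when months are missing.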
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow