Question

I have a data frame with only one column and 158112 values. The values are not in random order: every 24 values represent one day. Every day is listed 18 times and is then followed by the next day, e.g. 18x24 values for 01.01.2012, 18x24 values for 02.01.2012, and so on.

        df
1       593
2       939
3       734
4       791
5       184
6       495
...
158112  683

I want to organise them in a new data frame in a different structure. The process would kind of look like this:

Take the first 24 values and put them into column no. 1 of the new data frame "new_df", take the next 24 values and put them into column no. 2, take the next 24 values and put them into column no. 3. Do this until 18 columns are filled with 24 values each, then start again with column no. 1 and add the next 24 values below, and so on. So at the end I would like to have "new_df" with 18 columns of 8784 rows each.

Any ideas?


Solution

I think you want something like the following:

# sample data
mydf <- data.frame(df=rnorm(18*8784,0,1))
# split dataframe into chunks (of 18*24)
mylist <- split(mydf,rep(1:366,each=432))
# turn each chunk into a matrix of the right shape and `rbind` them back together
new_df <- do.call(rbind, lapply(mylist, function(x) matrix(x[,1],nrow=24)))

You can check if this is right with:

all.equal(mydf[1:24,1],new_df[1:24,1]) # first 24 values are first column
all.equal(mydf[25:48,1],new_df[1:24,2]) # next 24 values are second column
all.equal(mydf[433:456,1],new_df[25:48,1]) # day 2 starts in the first column

All of those should be TRUE. And I guess you want it as a data.frame, so just use as.data.frame(new_df) to get the result back into a data.frame.
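
The same idea can be written without hard-coding 366 and 432. The following is only a minimal sketch and not part of the original answer: the helper name reshape_blocks and its arguments hours and n_cols are made up for illustration, and it assumes the number of values is an exact multiple of 24*18.

```r
# Hypothetical helper: build one 24 x 18 matrix per day and stack them.
reshape_blocks <- function(x, hours = 24, n_cols = 18) {
  block <- hours * n_cols                       # values per day (432 here)
  stopifnot(length(x) %% block == 0)            # refuse partial days
  n_days <- length(x) / block
  chunks <- split(x, rep(seq_len(n_days), each = block))
  mats <- lapply(chunks, matrix, nrow = hours)  # each chunk fills column by column
  as.data.frame(do.call(rbind, mats))
}

# Small check with 3 days of consecutive integers (made-up data)
x <- seq_len(24 * 18 * 3)
res <- reshape_blocks(x)
dim(res)        # 72 18
res[1:3, 1:3]
```

With consecutive integers as input it is easy to see where each value ends up, e.g. res[25, 1] should be 433, the first value of day 2.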

Other tips

Try this:

set.seed(1)  
df <- data.frame(df=sample(1:999, 158112, TRUE))  # creating some data
new_df <- data.frame(matrix(unlist(df), ncol=18)) # putting df into an 8784 x 18 data.frame (filled column by column)
dim(new_df) # checking the dimensions of new_df
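
Note that matrix() fills column by column, so with ncol=18 alone the first 8784 values all land in column 1 instead of being spread out in blocks of 24. The sketch below uses consecutive integers (made-up data, 3 days instead of 366) to show the difference, together with one possible way to get the block layout by going through an array and aperm(). This variant is an illustration and not part of the original answer.

```r
n_days <- 3                          # illustrative; the real data has 366 days
x <- seq_len(24 * 18 * n_days)

# Plain column-wise fill: row 25 of column 1 is still value 25
plain <- matrix(x, ncol = 18)
plain[25, 1]         # 25

# Block layout: the array dims are (hour, column, day); swapping the last
# two dims stacks the days underneath each other within each column
a <- array(x, dim = c(24, 18, n_days))
blocked <- matrix(aperm(a, c(1, 3, 2)), ncol = 18)
blocked[1, 1:3]      # 1 25 49
blocked[25, 1:3]     # 433 457 481
```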

Perhaps better than the alternatives so far is to use an array to manipulate your data into your desired structure. Since you are just dealing with a single vector and you want to fill your data in by columns, you just need to assign the dims to your vector.

Here is a simplified example. We'll start with a vector of length 40.

mydata <- rep(1:8, each = 5)
mydata
#  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 
# [21] 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8

Now, imagine we want to convert this into four columns where the first 20 values are grouped together and the second 20 values are grouped together. (In your data, it would be the first 24*18 values grouped together to represent 18 columns of records for one day.)

Here's how we would do that:

myarray <- array(mydata, dim=c(5, 4, 2),
                 dimnames = list(NULL, NULL,
                                 c("2012-01-01", "2012-01-02")))
myarray
# , , 2012-01-01
# 
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    1    2    3    4
# [3,]    1    2    3    4
# [4,]    1    2    3    4
# [5,]    1    2    3    4
# 
# , , 2012-01-02
# 
#      [,1] [,2] [,3] [,4]
# [1,]    5    6    7    8
# [2,]    5    6    7    8
# [3,]    5    6    7    8
# [4,]    5    6    7    8
# [5,]    5    6    7    8
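
Once the data is in this shape, single days or single columns can be pulled out by indexing. A small sketch on the same toy array (the names day2 and col2 are just for illustration):

```r
mydata <- rep(1:8, each = 5)
myarray <- array(mydata, dim = c(5, 4, 2),
                 dimnames = list(NULL, NULL,
                                 c("2012-01-01", "2012-01-02")))

# All values for one day, selected by its dimname
day2 <- myarray[, , "2012-01-02"]
dim(day2)             # 5 4

# Column 2 across both days: the result has one column per day
col2 <- myarray[, 2, ]
colnames(col2)        # "2012-01-01" "2012-01-02"
```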

Perhaps you want to stop at this point. However, if you want to go all the way to a single data.frame, that's also easily possible.

Using @Jilber's sample data just for purposes of easy replication:

set.seed(1)
df <- data.frame(df=sample(1:999, 158112, TRUE))
# Hopefully you've done your math correctly
#   R will recycle if the dims aren't correct
#   for your data.
Ndays <- nrow(df)/(24*18)
dfarray <- array(df$df, 
                 dim = c(24, 18, Ndays), 
                 # Add dimnames by creating a date sequence
                 dimnames = list(NULL, NULL, as.character(
                   seq(as.Date("2012-01-01"), by = "1 day", 
                       length.out = Ndays))))
# Use `apply` to convert this to a `list` of `data.frame`s
temp <- apply(dfarray, 3, as.data.frame)
# Use `lapply` to create your intermediate `data.frame`s
out <- lapply(names(temp), function(x) {
  data.frame(date = as.Date(x), temp[[x]])
})
# Use `do.call(rbind, ...)` to get your final `data.frame`
final <- do.call(rbind, out)

The first few lines of the output look like this:

head(final)
#         date  V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15
# 1 2012-01-01 266 267 732 347 455 991 729 724 101 649 307 702 133 841 443
# 2 2012-01-01 372 386 693 334 410 496 453 338 927 953 578 165 222 720 157
# 3 2012-01-01 573  14 478 476 811 484 175 630 283 953 910  65 227 267 582
# 4 2012-01-01 908 383 861 892 605 174 746 840 590 340 143 754 132 495 970
# 5 2012-01-01 202 869 438 864 655 755 105 856 111 263 415 620 981  84 989
# 6 2012-01-01 898 341 245 390 353 454 864 391 840 166 211 170 327 354 177
#   V16 V17 V18
# 1 109 232  12
# 2 333 241 940
# 3 837 797 993
# 4 277 831 358
# 5 587 114 747
# 6 836 963 793
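
On the caveat in the code comments above: array() silently recycles or truncates its input when the length does not match the requested dims. One way to make that explicit is a check before building the array. This is a minimal sketch; the helper name make_day_array is hypothetical and the input is made-up data.

```r
# Hypothetical helper: refuse to build the array unless the length fits
make_day_array <- function(x, hours = 24, n_cols = 18) {
  block <- hours * n_cols
  if (length(x) %% block != 0) {
    stop("length of x is not a multiple of ", block)
  }
  array(x, dim = c(hours, n_cols, length(x) / block))
}

ok <- make_day_array(seq_len(24 * 18 * 2))
dim(ok)                                   # 24 18 2

# One value short of two full days: an error instead of a silently wrong array
bad <- try(make_day_array(seq_len(24 * 18 * 2 - 1)), silent = TRUE)
inherits(bad, "try-error")                # TRUE
```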

I would still strongly suggest becoming familiar with the "xts" package if you're going to be doing a lot of work with time series data.

Conversion from the "final" data.frame above to an xts object is easy:

library(xts)
Final <- xts(final[-1], order.by=final[[1]])

And this will let you easily do fun things like this:

apply.quarterly(Final, mean)
#                  V1       V2       V3       V4       V5       V6
# 2012-03-31 490.5256 493.8338 507.4272 503.5421 495.0929 494.4025
# 2012-06-30 511.5792 508.1493 500.9043 500.2152 509.0614 499.9881
# 2012-09-30 496.2672 501.1399 496.3542 493.7423 504.8170 507.1671
# 2012-12-31 503.9583 502.5616 502.8936 509.2120 503.2387 502.4678
#                  V7       V8       V9      V10      V11      V12
# 2012-03-31 490.2477 492.2115 510.6525 499.8168 506.9510 494.3654
# 2012-06-30 494.0962 497.0357 506.9267 500.2198 501.4263 494.1117
# 2012-09-30 509.9561 487.0543 497.2206 485.4511 498.1191 494.5190
# 2012-12-31 503.0095 500.7903 494.7428 494.1409 502.0181 496.9764
#                 V13      V14      V15      V16      V17      V18
# 2012-03-31 504.4130 499.8581 503.0023 501.0137 499.1021 504.7711
# 2012-06-30 500.0504 501.2903 490.7582 502.7395 503.5737 496.4821
# 2012-09-30 493.4860 499.2088 500.7260 503.1907 491.9583 490.4293
# 2012-12-31 500.4348 507.9475 499.3637 486.4438 496.8220 492.8890
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow