Question

I have a data frame "foo" that looks like this:

Date       Return
1998-01-01  0.02
1998-01-02  0.04
1998-01-03 -0.02
1998-01-04 -0.01
1998-01-05  0.02
...
1998-02-01  0.1
1998-02-02 -0.2
1998-02-03 -0.1
etc.

I would like to add a new column to this data frame that shows the density value of the corresponding return. I tried:

foo$density <- for(i in 1:length(foo$Return)) density(foo$Return, 
from = foo$Return[i], to = foo$Return[i], n = 1)$y

But it didn't work. I really have trouble applying a "function" to each row. But maybe there is another way to do it that doesn't use density()?

What I essentially want to do is extract the fitted density values from density() for the returns in foo. If I just do plot(density(foo$Return)) it gives me the curve, but I would like to have the density values attached to the returns.

@Joris:

foo$density <- density(foo$Return, n=nrow(foo$Return))$y 

calculates something, but it seems to return the wrong density values.

Thanks for helping me out! Dani

Solution

On second thought, forget about the density function; I suddenly realized what you wanted to do. Most density functions return values on a grid, so they don't give you the estimate at the exact data points. If you want that, you can, for example, use the sm package:

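library(sm)
foo <- data.frame(Return = rpois(100, 5))
# evaluate the kernel density estimate at the observed returns themselves
foo$density <- sm.density(foo$Return, eval.points = foo$Return)$estimate
# the plot: sort by Return so that lines() draws a proper curve
id <- order(foo$Return)
hist(foo$Return, freq = FALSE)
lines(foo$Return[id], foo$density[id], col = "red")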
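
To see why reading $y straight off density() does not line up with the rows of the data frame, here is a small sketch with made-up data (the vector x is an assumption for illustration, not the asker's returns). density() evaluates the estimate on an equally spaced grid stored in $x, so its $y values belong to those grid points rather than to the observations:

```r
set.seed(42)
x <- rnorm(50)                    # made-up returns, for illustration only
d <- density(x, n = length(x))    # ask for one grid point per observation

# the evaluation grid is equally spaced ...
grid_steps <- diff(d$x)
range(grid_steps)

# ... and does not coincide with the (sorted) data
head(cbind(grid = d$x, data = sort(x)))
```

Even with as many grid points as observations, the two columns differ, which is why the values have to be evaluated or interpolated at the data points.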

If the number of distinct values is small, you can instead count how often each value occurs with ave():

foo$counts <- ave(foo$Return,foo$Return,FUN=length)
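
As a quick illustration of what ave() returns here, the sketch below uses a tiny made-up data frame; the rel_freq column is an addition for illustration and simply divides the counts by the number of rows:

```r
foo <- data.frame(Return = c(1, 2, 2, 3, 3, 3))   # made-up discrete values

# ave() applies FUN within each group of equal values and
# returns a vector aligned with the original rows
foo$counts <- ave(foo$Return, foo$Return, FUN = length)

# relative frequency of each row's value (illustrative extra column)
foo$rel_freq <- foo$counts / nrow(foo)
foo
```

Each row carries the count of its own value, so the result can be stored as a column without any reordering.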

If the purpose is to plot a density function, there's no need to calculate it like you did. Just use

plot(density(foo$Return))

Or, to add a histogram underneath (note the option freq = FALSE, which puts the histogram on the density scale):

hist(foo$Return, freq = FALSE)
lines(density(foo$Return), col = "red")

Other tips

An alternative to sm.density is to evaluate the density on a finer grid than default, and use approx or approxfun to give the interpolated values of the density at the Returns you want. Here is an example with dummy data:

set.seed(1)
foo <- data.frame(Date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"),
                             by = "days"),
                  Returns = rnorm(365))
head(foo)
## compute the density on a fine grid (512*8 points)
dens <- with(foo, density(Returns, n = 512 * 8))

At this point, we could use approx() to interpolate the x and y components of the returned density, but I prefer approxfun() which does the same thing, but returns a function which we can then use to do the interpolation. First, generate the interpolation function:

## x and y are components of dens, see str(dens)
BAR <- with(dens, approxfun(x = x, y = y))

Now you can use BAR() to return the interpolated density at any point you wish, e.g. for the first Returns:

> with(foo, BAR(Returns[1]))
[1] 0.3268715

To finish the example, add the density for each datum in Returns:

> foo <- within(foo, Density <- BAR(Returns))
> head(foo)
        Date    Returns   Density
1 2010-01-01 -0.6264538 0.3268715
2 2010-01-02  0.1836433 0.3707068
3 2010-01-03 -0.8356286 0.2437966
4 2010-01-04  1.5952808 0.1228251
5 2010-01-05  0.3295078 0.3585224
6 2010-01-06 -0.8204684 0.2490127
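
For comparison, approx() gives the same numbers in a single call when the interpolating function is not needed afterwards. A minimal sketch, regenerating the dummy data so that it runs on its own (the names via_fun and via_approx are made up here):

```r
set.seed(1)
returns <- rnorm(365)
dens <- density(returns, n = 512 * 8)

# approxfun(): build the interpolating function once, then call it
via_fun <- approxfun(dens$x, dens$y)(returns)

# approx(): interpolate directly at the requested points via xout
via_approx <- approx(dens$x, dens$y, xout = returns)$y

all.equal(via_fun, via_approx)
```

Both use linear interpolation by default, so the choice is mostly a matter of whether you want to reuse the function.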

To see how well the interpolation is doing, we can plot the density and the interpolated version and compare. Note we have to sort Returns because to achieve the effect we want, lines needs to see the data in increasing order:

plot(dens)
with(foo, lines(sort(Returns), BAR(sort(Returns)), col = "red"))

This gives something like the following:

[Figure: density (in black) and interpolated version (in red)]

As long as the density is evaluated at a sufficiently fine set of points (512*8 in the above example), you shouldn't have any problems and will be hard pushed to tell the difference between the interpolated version and the real thing. If there are "gaps" in the values of your Returns, the straight line segments drawn across those gaps might not follow the black density, because lines() just joins the points you ask it to plot. This is an artefact of the gaps and of how lines() works, not a problem with the interpolation.
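
One way to check this claim is to compare the interpolated values with the kernel estimate evaluated directly at each datum. The sketch below assumes the default Gaussian kernel of density() and reuses the bandwidth it selected (dens$bw); exact_kde is a hypothetical helper written for this illustration, not part of any package:

```r
set.seed(1)
returns <- rnorm(365)
dens <- density(returns, n = 512 * 8)
BAR <- approxfun(dens$x, dens$y)

# Gaussian kernel density estimate evaluated exactly at the points x0:
# the average of normal densities centred on each observation
exact_kde <- function(x0, data, bw) {
  vapply(x0, function(p) mean(dnorm(p, mean = data, sd = bw)), numeric(1))
}

exact <- exact_kde(returns, returns, dens$bw)

# largest discrepancy between direct evaluation and interpolation
max(abs(exact - BAR(returns)))
```

The direct evaluation costs on the order of n^2 kernel evaluations, which is fine for a few thousand rows; for larger data the grid-plus-interpolation route is the cheaper one.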

If we ignore the density issue, which @Joris expertly answers, you don't seem to have grasped how to set up a loop. What a for loop returns is the value NULL. That is the value being assigned to foo$density, and it does not create a column, because NULL is an empty object: as far as R is concerned, there is nothing to insert. See ?'for' for further details.

> bar <- for(i in 1:10) {
+     i + 1
+ }
> bar
NULL

> foo <- data.frame(A = 1:10, B = LETTERS[1:10])
> foo$density <- for(i in seq_len(nrow(foo))) {
+     i + 1
+ }
> head(foo) ## No `density`
  A B
1 1 A
2 2 B
3 3 C
4 4 D
5 5 E
6 6 F

If you want to keep the value computed in each iteration of the loop, you must do the assignment inside the loop, and that means you should pre-allocate the storage before you enter the loop. For example, to compute i + 1 for i in 1,...,10, we could do this:

> bar <- numeric(length = 10)
> for(i in seq_along(bar)) {
+     bar[i] <- i + 1
+ }
> bar
 [1]  2  3  4  5  6  7  8  9 10 11

Of course, you would not do such a calculation as this via a loop, because R is vectorized and will work with vectors of numbers rather than you having to code each computation element by element as you might in C or other programming languages.

> bar <- 1:10 + 1
> bar
 [1]  2  3  4  5  6  7  8  9 10 11

Notice that R has turned 1 into a vector of 1s of sufficient length to allow the computation to proceed, something known as recycling in R-speak.
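
Recycling is not limited to single values; a shorter vector is repeated as often as needed to match the longer one. A small sketch (the name res is made up):

```r
# c(10, 20) is recycled three times to match the six elements of 1:6
res <- 1:6 + c(10, 20)
res
# 11 22 13 24 15 26
```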

Sometimes, you might need to iterate over an object with a loop or using one of the s|l|t|apply() family, but most often you will find a function that works for an entire vector of data in one go. This is one of the advantages of R over other programming languages, but does require you to get your head into vectorized mode.
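
To tie this back to the original question, here is a sketch of the same per-row density lookup written three ways: a pre-allocated loop, vapply(), and a single vectorized call. It uses made-up data and an interpolating function built with approxfun(); the names f, out_loop, out_vapply and out_vec are chosen for this illustration only:

```r
set.seed(1)
returns <- rnorm(20)                       # made-up returns
dens <- density(returns, n = 512 * 8)
f <- approxfun(dens$x, dens$y)             # interpolated density

# 1. explicit loop, assigning into pre-allocated storage
out_loop <- numeric(length(returns))
for (i in seq_along(returns)) {
  out_loop[i] <- f(returns[i])
}

# 2. vapply(), which also checks that each result is a single number
out_vapply <- vapply(returns, f, numeric(1))

# 3. vectorized call: f() accepts the whole vector at once
out_vec <- f(returns)

all.equal(out_loop, out_vec)
all.equal(out_vapply, out_vec)
```

All three give the same values; the vectorized form is the shortest and is usually the one to reach for first.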

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow