Histogram, density kernel and normal distribution

https://stackoverflow.com/questions/15408735

23-03-2022

Question

I want to plot a histogram, a Gaussian kernel density estimate and the corresponding fitted normal distribution for the simple losses of Allianz SE. (The simple losses are the negated simple returns.)

I have the following code:

hist(alvsloss,breaks = 100, freq=F,main="Histogramm,
 density curve (gaussian kernel) of Allianz simple losses ",xlab="loss in percent",ylab="density")
lines(density(alvsloss), col="red", lwd=2)
curve(dnorm(x, mean = mean(alvsloss), sd = mean(alvsloss)), add=TRUE, col="blue", lty="dotted")

Now I have the first problem:

The fitted normal distribution is not drawn. I get this warning (translated from German):

In dnorm(x, mean = mean(alvsloss), sd = mean(alvsloss)) :
  NaNs produced

The normal distribution curve is not plotted.

The second is a general question: if I leave out the normal distribution, I only have the histogram and the density. Then I can switch between the two modes via the argument

 freq=T

 freq=F

I attached a screenshot of both plots (I had to upload it elsewhere, since I do not have the 10 reputation points needed to post images). I do not understand them. As I understand it, freq=T means I get density values on the y axis, so there should be values like 0.0012 or 0.1, not 300 or 400; shouldn't density values be relative? The kernel density also does not match the histogram in any way, so it is clearly wrong. With freq=F I get the right picture. I assumed this one shows absolute values, e.g. that there were 30 cases with a return of about 0.0 (the high peak in the middle), right? Here the density does fit, but I would have expected it not to fit in this case, since I thought it belongs to the freq=T values. So it should be the other way round, and this picture should be the wrong one?

If this is answered, I have further questions. I do not like the x axis; how can I get a more detailed scale? Is the following correct: the right tail from 0.5 up to 0.1 is heavier than the left tail, so in this range losses are more probable than gains? The extreme values, however, occur only on the left side: values of -0.2 and even one at approximately -0.4. So extreme losses do not occur here, whereas extreme gains are realized. Is this right?

What is my mistake? I cannot see it.

Screenshot:

You can find the data here

It is the alvsloss data

The complete solution is:

hist(alvsloss, breaks = 100, freq = FALSE, main = "Histogram, density curve (Gaussian kernel) of Allianz simple losses", xlab = "loss in percent", ylab = "density")
lines(density(alvsloss), col = "red", lwd = 2)
curve(dnorm(x, mean = mean(alvsloss), sd = sd(alvsloss)), add = TRUE, col = "blue", lwd = 2)

which gives the following picture:

[Image: resulting plot]

seems to be correct, right?

Solution

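The R help for the freq argument of hist says:

logical; if TRUE, the histogram graphic is a representation of frequencies, the counts component of the result; if FALSE, probability densities, component density, are plotted

When freq is TRUE, the bar heights are counts, i.e. the number of observations that fall into each bin. When freq is FALSE, the heights are rescaled so that the total area of the bars is 1: each height is the relative frequency divided by the bin width. If you have a vector with 400 times the value 1 and 300 times the value 0, the bars have heights 400 and 300 when freq=TRUE. When freq=FALSE they have heights 4/7 and 3/7 only if the bins have width 1; with narrower bins the densities are larger. This is also why a kernel density curve, which always integrates to 1, lines up with the freq=FALSE histogram and looks flat against the counts of a freq=TRUE histogram.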
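The counts and densities can be inspected without drawing anything. The following is a minimal sketch using the toy vector from the paragraph above; the explicit breaks are an assumption chosen so that every bin has width 1.

```r
# Toy data from the example: 400 ones and 300 zeros.
x <- c(rep(1, 400), rep(0, 300))

# plot = FALSE returns the histogram object without drawing it.
# Two bins of width 1: (-0.5, 0.5] and (0.5, 1.5].
h <- hist(x, breaks = c(-0.5, 0.5, 1.5), plot = FALSE)

h$counts                          # bar heights used when freq = TRUE
h$density                         # bar heights used when freq = FALSE
sum(h$density * diff(h$breaks))   # total bar area under freq = FALSE

# Narrower bins (width 0.1): same counts in the occupied bins,
# but larger densities, because density = count / (n * bin width).
h2 <- hist(x, breaks = seq(-0.05, 1.05, by = 0.1), plot = FALSE)
max(h2$density)
```

Comparing h$density with h2$density shows that density values are not bounded by 1; only the area under the bars is.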

For the NaN warning: if there are NA values in your vector, you have to calculate the mean with:

mean(...,na.rm=TRUE)

Furthermore, as ndoogan said, I think there is a typo in your code: sd is set to mean(alvsloss) instead of sd(alvsloss). dnorm returns NaN whenever sd is negative, which is what happens here if the mean of the losses is below zero. Try this instead:

dnorm(x, mean = mean(alvsloss,na.rm=TRUE), sd = sd(alvsloss,na.rm=TRUE))
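To see why passing the mean as sd can produce NaNs, here is a small sketch. The vector losses is hypothetical and chosen only so that its mean is negative; whether the mean of alvsloss is actually negative is an assumption, since the data is not shown here.

```r
# Hypothetical loss values with a negative mean (not the real alvsloss data).
losses <- c(-0.03, -0.01, 0.00, 0.01, -0.02)

m <- mean(losses)   # negative for this vector
s <- sd(losses)     # a standard deviation is never negative

# A negative sd is invalid, so dnorm returns NaN (and warns "NaNs produced").
bad <- suppressWarnings(dnorm(0, mean = m, sd = m))

# With the actual standard deviation the density is a finite positive number.
good <- dnorm(0, mean = m, sd = s)

is.nan(bad)
is.finite(good)
```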

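Finally, you cannot use curve to plot a precomputed vector. It expects a function or an expression written in terms of x. To draw precomputed values, build an x grid and pass it to lines together with the matching densities:

x <- seq(ToBeFilled, ToBeFilled, length.out = 200)
lines(x, dnorm(x, mean = mean(alvsloss,na.rm=TRUE), sd = sd(alvsloss,na.rm=TRUE)), col="blue", lty="dotted")

Or keep curve, put the parameters inside the expression (curve does not forward mean and sd to dnorm) and set add=TRUE to draw on top of the histogram:

curve(dnorm(x, mean = mean(alvsloss,na.rm=TRUE), sd = sd(alvsloss,na.rm=TRUE)), from=ToBeFilled, to=ToBeFilled, add=TRUE, col="blue", lty="dotted")

ToBeFilled stands for the lower and upper bounds of the interval you want to plot.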
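Putting the pieces together, a self-contained sketch might look like the following. The vector sim_loss is simulated and only stands in for alvsloss, which is not included here; the distribution, sample size and scale are arbitrary assumptions. The last part rescales the kernel density so that it can also be overlaid on a count histogram.

```r
set.seed(1)
# Simulated stand-in for alvsloss (an assumption; not the real data).
sim_loss <- rt(2000, df = 4) * 0.015

m <- mean(sim_loss)
s <- sd(sim_loss)

# Density histogram: the bars, the kernel density and the normal curve
# all share the same y scale (total area 1).
h <- hist(sim_loss, breaks = 100, freq = FALSE,
          main = "Histogram, kernel density and fitted normal",
          xlab = "simple loss", ylab = "density")
lines(density(sim_loss), col = "red", lwd = 2)
curve(dnorm(x, mean = m, sd = s), add = TRUE,
      col = "blue", lwd = 2, lty = "dotted")

# Count histogram: multiply the density by n * bin width so that the
# curve is expressed in counts per bin instead of probability density.
hc <- hist(sim_loss, breaks = 100, freq = TRUE,
           main = "Counts with rescaled kernel density",
           xlab = "simple loss")
d <- density(sim_loss)
scale_factor <- length(sim_loss) * diff(hc$breaks)[1]
lines(d$x, d$y * scale_factor, col = "red", lwd = 2)
```

The rescaling relies on all bins having the same width, which is the case when breaks is given as a single number.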

OTHER TIPS

I don't know where to get the data you are working with, but try setting your standard deviation in the dnorm plot to the standard deviation of your data...

curve(dnorm(x, mean = mean(alvsloss), sd = sd(alvsloss)), add=TRUE, col="blue", lty="dotted")

Licensed under: CC-BY-SA with attribution
