Normal Probability Plot interpretation [closed]

https://stackoverflow.com/questions/4858904

27-10-2019

Question

I have a very basic question. What is the basis of the normal probability plot i.e. what do the probabilities represent? I am testing for a standard normal distribution. My normplot (in MATLAB) revealed that the values were more or less in a straight line BUT the probability of 0.5 corresponded to a value other than zero.

My question is, how do I interpret this? Does this mean that my data is normally distributed but has a non-zero mean (i.e. not standard normal) or does this probability only reflect something else? I tried Google and one link said the probabilities are the cumulative probabilities from the z-table, and I can't figure out what to make of it.

Also, in MATLAB, is it true that as long as the values fall along the line drawn by the program (the red dotted line), the values come from a normal distribution? In one of my graphs the dotted line is very steep but the values fit it; does this mean that the one or two values that are way off this line are just outliers?

I'm very new to stats, so please help!

Thanks!

Solution

My question is, how do I interpret this? Does this mean that my data is normally distributed but has a non-zero mean (i.e. not standard normal) or does this probability only reflect something else?

You are correct. If you run normplot and the data lie very close to the reference line, your data's cumulative distribution function is very close to that of a normal distribution. The 0.5 probability point corresponds to the median of the data, which for a normal distribution coincides with the mean, so a value other than zero there indicates a non-zero mean. (It looks like about 0.002 in your case.)

The reason you get a straight line is that the y-axis is nonlinear: it is "warped" in such a way that a perfect Gaussian cumulative distribution maps onto a straight line. The y-axis marks are spaced linearly in the inverse normal CDF of the probability (equivalently, a scaled inverse error function).
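To make the warping concrete, here is a minimal sketch in Python (chosen only because it runs with the standard library; it is not what normplot does internally). It assumes plotting positions of (i - 0.5)/n for the i-th smallest point, and the helper names `normal_plot_coords` and `fit_line` as well as the sample data are made up for illustration. Each probability is pushed through the inverse normal CDF, and for normal data the sorted values then fall on a straight line whose value at probability 0.5 (z = 0) is the mean and whose slope is the standard deviation.

```python
from statistics import NormalDist, mean

def normal_plot_coords(data):
    """Return (sorted values, z-scores) for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: i-th smallest point gets probability (i - 0.5) / n.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    # The "warp": map each probability through the inverse normal CDF.
    zs = [std_normal.inv_cdf(p) for p in probs]
    return xs, zs

def fit_line(zs, xs):
    """Least-squares line xs = intercept + slope * zs."""
    zbar, xbar = mean(zs), mean(xs)
    sxz = sum((z - zbar) * (x - xbar) for z, x in zip(zs, xs))
    szz = sum((z - zbar) ** 2 for z in zs)
    slope = sxz / szz
    return xbar - slope * zbar, slope

# Hypothetical data: exact quantiles of a normal with mean 3 and std 2,
# i.e. "perfectly normal" data that is not standard normal.
true_dist = NormalDist(mu=3.0, sigma=2.0)
n = 99
data = [true_dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

xs, zs = normal_plot_coords(data)
intercept, slope = fit_line(zs, xs)
print("value at probability 0.5 (z = 0):", intercept)  # estimates the mean
print("slope of the line:", slope)                     # estimates the std
```

The data here are exactly normal, so the points are collinear; what differs from the standard normal case is only where the line crosses probability 0.5 and how steep it is.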

If the points at the ends have steeper slopes than the reference line, your distribution has shorter tails than a normal distribution, i.e. there are fewer extreme values than a normal distribution would produce, perhaps due to some physical constraint that prevents excessive variation from the mean.
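A rough way to see the short-tail effect, sketched in Python with only the standard library: take a distribution that is known to have short tails (a uniform distribution on [-1, 1], picked here purely as an example) and compare its extreme quantiles with a straight reference line drawn through its first and third quartiles. Drawing the line through the quartiles is an assumption of this sketch, and `uniform_quantile` and `line_value` are hypothetical helper names.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def uniform_quantile(p, lo=-1.0, hi=1.0):
    """Exact quantile of a uniform distribution on [lo, hi]."""
    return lo + (hi - lo) * p

# Reference line through the first and third quartiles (assumed convention).
z25, z75 = std_normal.inv_cdf(0.25), std_normal.inv_cdf(0.75)
x25, x75 = uniform_quantile(0.25), uniform_quantile(0.75)
slope = (x75 - x25) / (z75 - z25)
intercept = x25 - slope * z25

def line_value(p):
    """Data value the reference line predicts at probability p."""
    return intercept + slope * std_normal.inv_cdf(p)

for p in (0.005, 0.995):
    print(f"p={p}: actual quantile {uniform_quantile(p):.3f}, "
          f"reference line {line_value(p):.3f}")
```

At both extremes the actual quantile is closer to the centre than the line predicts. On a plot with data on the x-axis and warped probability on the y-axis, that shows up as the ends of the point cloud bending to be steeper than the line.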

OTHER TIPS

The normal distribution is a continuous distribution described by a density function. The probability of any single value is 0. This is because the total probability (= 1) is distributed over an infinite number of values (it's a continuous function).

What you have in the graph of the normal distribution is how the probability is distributed (y-axis) over the values (x-axis). So what you can get from the graph is the probability of an interval: between two points, from -infinity to any point, or from any point to +infinity. This probability is obtained by integrating the normal density function from point1 to point2.

But you don't have to do this integral, since you have the z-table. The z-table gives you the probability of the variable falling between -infinity and x, after applying the equation that relates x to z: z = (x - mean) / (standard deviation).
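As a small illustration of the z-table lookup, here is a sketch in Python, where `statistics.NormalDist().cdf` plays the role of the table. The function name `prob_between` and the example numbers (mean 10, standard deviation 2) are made up for the example.

```python
from statistics import NormalDist

def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) for X ~ Normal(mu, sigma), computed via z-scores."""
    std_normal = NormalDist()  # stands in for the z-table
    z_a = (a - mu) / sigma     # convert each bound to a z-score
    z_b = (b - mu) / sigma
    # Table value at z is P(Z <= z); an interval is the difference of two lookups.
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)

p_below_mean = NormalDist().cdf(0.0)                      # P(Z <= 0)
p_one_sigma = prob_between(-1.0, 1.0)                     # within one std of the mean
p_shifted = prob_between(8.0, 12.0, mu=10.0, sigma=2.0)   # same interval in z units

print(p_below_mean, p_one_sigma, p_shifted)
```

The last two probabilities agree because 8 and 12 are exactly one standard deviation either side of a mean of 10, which is the whole point of converting x to z before looking anything up.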

I don't have MATLAB here, but I guess the straight line you mention is the cumulative distribution function, which tells you the probability of the variable falling between -infinity and x. It is determined by the sum (or integral, in this case) of the density from -infinity to the value of x, or it can be obtained from the z-table.

Sorry if my English was bad. I hope I was helpful.

Licensed under: CC-BY-SA with attribution
