Question

Now I have a column of data like this:

0.000000
0.000000
0.000000
0.000000
0.024995
0.024996
0.024996
0.024997
0.024997
0.024997
0.024997
0.025004
0.025010
0.025011
0.025996
0.025996
0.025996

First, I want to calculate the cumulative probability of these data and show it in column B. Then, based on columns A and B, I want to draw a CDF graph.

Does anyone know what formula I should use?

Solution

In the cell to the right of the first entry (B1 in my example), enter the following:

=COUNT(A$1:A1)/COUNT($A$1:$A$17)

Then fill this down the column. This assumes column A is sorted in ascending order, as it is in your example.

To create the CDF chart, create a scatter plot (with interpolated lines) with x-values =A1:A17 and y-values =B1:B17.

Note:
Since you have several duplicate values at the start of your data, you may want to plot only x-values =A4:A17 and y-values =B4:B17. This really depends on the nature of your variable. You can do it this way if it's clear that the minimum possible value is zero.
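
If you want to sanity-check the spreadsheet numbers outside Excel, here is a minimal Python sketch of the same rank/n calculation. The function name empirical_cdf is just something I made up for illustration, and the list is the 17 values from the question.

```python
# The 17 values from column A in the question, already sorted ascending.
data = [
    0.000000, 0.000000, 0.000000, 0.000000,
    0.024995, 0.024996, 0.024996,
    0.024997, 0.024997, 0.024997, 0.024997,
    0.025004, 0.025010, 0.025011,
    0.025996, 0.025996, 0.025996,
]


def empirical_cdf(values):
    """Return (x, p) pairs with p = rank / n.

    Mirrors =COUNT(A$1:A1)/COUNT($A$1:$A$17) filled down a sorted column.
    (Hypothetical helper name, not part of any library.)
    """
    ordered = sorted(values)  # the spreadsheet formula relies on sorted input
    n = len(ordered)
    return [(x, (i + 1) / n) for i, x in enumerate(ordered)]


cdf = empirical_cdf(data)
for x, p in cdf:
    # Two tab-separated columns, like columns A and B in the sheet.
    print(f"{x:.6f}\t{p:.4f}")
```

The second column should match what column B shows after filling the formula down, so it can be pasted next to the sheet for comparison.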

OTHER TIPS

I am assuming the numbers you are providing are a Probability Density Function (PDF) and that you want to compute a Cumulative Distribution Function (CDF) from that PDF. In that case...

B1 would simply be =A1, B2 would be =B1+A2, B3 would be =B2+A3, etc. Then highlight the data in column B, click the "Insert" tab, and select a line graph. Alternatively, you could enter =SUM(A$1:A1) in B1 and fill down.

BTW, CDFs are usually monotonically increasing between 0 and 1. Your PDF doesn't generate a CDF that goes all the way to 1. So, if this is what you're aiming for, either you're not listing all of the data in your PDF, or you need to scale things a little differently. You could divide each element in column A by the sum of those elements, so that the scaled values sum to 1. If all you want is a properly defined CDF, though, you can do it directly by setting B1 to =SUM(A$1:A1)/SUM(A:A) and again filling down.
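
For what it's worth, the normalized running sum can be sketched in a few lines of Python, under the same assumption that column A holds PDF-like weights. The name cdf_from_weights is hypothetical, and the list is the data from the question.

```python
from itertools import accumulate

# Column A from the question, treated here as (unnormalized) PDF weights.
weights = [
    0.000000, 0.000000, 0.000000, 0.000000,
    0.024995, 0.024996, 0.024996,
    0.024997, 0.024997, 0.024997, 0.024997,
    0.025004, 0.025010, 0.025011,
    0.025996, 0.025996, 0.025996,
]


def cdf_from_weights(values):
    """Running sum divided by the total, like =SUM(A$1:A1)/SUM(A:A).

    (Hypothetical helper name, not part of any library.)
    """
    total = sum(values)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [running / total for running in accumulate(values)]


cdf = cdf_from_weights(weights)
for w, p in zip(weights, cdf):
    print(f"{w:.6f}\t{p:.4f}")
```

Dividing by the total is what makes the last value come out at 1 (up to floating-point rounding). Without it, the running sum for this data stops well short of 1.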

Licensed under: CC-BY-SA with attribution