Unsure how to plot a histogram with variable break points from a one column matrix in R

StackOverflow https://stackoverflow.com/questions/22927459

  •  29-06-2023

Question

I have a matrix with approximate dimensions 20000 x 1. I would like to plot its values in a histogram with bins of width 0.01 from -0.05 to +0.15. However, the values in the matrix are fairly arbitrary, for example 0.0123421, 0.0124523, 0.124523, -0.011234, etc. So I first need to count the number of values that fall into each bin, and then plot a histogram. For the numbers I gave, I'd have 2 values between 0.01 and 0.02, 1 between -0.02 and -0.01, and so on, and those counts are what the histogram should show. Is there an easy way to do this? I'm relatively new to R, so any help is appreciated!

Solution

As an example illustrating breaks (content summarized from an excellent post on R-bloggers), let's assume that you start with some normally distributed data. In R, you can generate normal data with the rnorm() function:

data <- rnorm(n=1000, mean=24.2, sd=2.2)

We can then generate a simple histogram using the following call:

hist(data)

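Now, let's assume that you want coarser or finer groups for your bins. There are a number of ways to do this. You could, for example, use the breaks argument of hist(). When breaks is a single number, R treats it as a suggestion and rounds to "pretty" breakpoints, so the actual number of bins may differ. Below is a tidy example illustrating this: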

hist(data, breaks=20, main="Breaks=20")
hist(data, breaks=5, main="Breaks=5")
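
A quick way to check which breakpoints R actually chose is to inspect the object that hist() returns. The following is a minimal sketch; the seed and the variable names h20 and h5 are assumptions added for illustration, not part of the original answer:

```r
set.seed(1)  # assumed seed, only so the example is reproducible
data <- rnorm(n = 1000, mean = 24.2, sd = 2.2)

# plot = FALSE returns the histogram object without drawing anything
h20 <- hist(data, breaks = 20, plot = FALSE)
h5  <- hist(data, breaks = 5,  plot = FALSE)

h20$breaks          # the breakpoints R settled on
length(h20$counts)  # number of bins actually used; may differ from 20
length(h5$counts)   # likewise, may differ from 5
```

The counts element of the same object holds the number of values in each bin, which is handy when you want the numbers as well as the picture.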

Now, if you want more control over exactly where the breakpoints between bins fall, you can be more precise and give the breaks argument a vector of breakpoints, like this:

hist(data, breaks=c(17,20,23,26,29,32), main="Breaks is vector of breakpoints")

This dictates exactly the start and end point of each bin. Of course, you could give the breaks vector as a sequence like this to cut down on the messiness of the code:

hist(data, breaks=seq(17,32,by=3), main="Breaks is vector of breakpoints")
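
One caveat with an explicit breaks vector: it has to span the whole range of the data, otherwise hist() stops with an error. With random data like the above, a value can easily fall below 17 or above 32. A minimal sketch of one way around this, assuming you want to keep a bin width of 3 and let the outer breakpoints follow the data (the names width, lower and upper are illustrative):

```r
set.seed(1)  # assumed seed, only so the example is reproducible
data <- rnorm(n = 1000, mean = 24.2, sd = 2.2)

width <- 3
# Round the data range outwards to multiples of the bin width
lower <- floor(min(data) / width) * width
upper <- ceiling(max(data) / width) * width
breaks <- seq(lower, upper, by = width)

h <- hist(data, breaks = breaks, main = "Breaks cover the full data range")
```

If the breakpoints must stay fixed instead, the alternative is to drop or clip the values that fall outside them before calling hist().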

Note that when giving breakpoints, the default in R is that the histogram cells are right-closed (left-open) intervals of the form (a,b]. You can change this with the argument right=FALSE, which makes the intervals of the form [a,b). This matters if you have a lot of points exactly at a breakpoint.
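
A small sketch makes the difference visible. The toy vectors x and b are made up for illustration; the values are chosen so that several of them sit exactly on a breakpoint:

```r
x <- c(1, 2, 2, 3)   # 1, 2 and 3 all coincide with breakpoints
b <- c(0, 1, 2, 3)

# Default: intervals (a, b], with the lowest break included
right_closed <- hist(x, breaks = b, plot = FALSE)$counts
right_closed   # 1 2 1

# right = FALSE: intervals [a, b), with the highest break included
left_closed <- hist(x, breaks = b, right = FALSE, plot = FALSE)$counts
left_closed    # 0 1 3
```

The same four values give different bin counts, purely because of which side of each interval is closed.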

Other suggestions

hist(x, breaks = seq(-.05, .15, .01))
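
Applied to the one-column matrix from the question, this could look like the sketch below. The matrix m is a simulated stand-in, and dropping values outside [-0.05, 0.15] is an assumption about what is wanted; hist() raises an error if any value lies outside the breaks, so they have to be handled one way or another:

```r
set.seed(42)  # assumed seed, only so the example is reproducible
# Hypothetical stand-in for the 20000 x 1 matrix in the question
m <- matrix(rnorm(20000, mean = 0.05, sd = 0.03), ncol = 1)

x <- as.vector(m)                       # flatten the matrix to a plain vector
breaks <- seq(-0.05, 0.15, by = 0.01)   # 21 breakpoints, 20 bins

# Keep only values inside the requested range (assumption, see above)
inside <- x[x >= min(breaks) & x <= max(breaks)]

h <- hist(inside, breaks = breaks, main = "Bins of width 0.01", xlab = "value")
h$counts   # number of values per bin, no manual counting needed
```

hist() does the counting itself, so there is no need to tabulate the bins before plotting.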

See ?hist

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow