Clustering dots in a scatterplot

https://stackoverflow.com/questions/23348550

11-07-2023

Question

Let's say I have this data.frame:

df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

and I want to plot df$y vs. df$x.

Since the x values are constant, points that have identical or close y values will be plotted on top of each other in a simple scatterplot, which kind of hides the density of points at such y-values. One solution for that situation is of course to use a violin plot.

I'm looking for another solution: plotting clusters of points instead of the individual points, which would look similar to a bubble plot. In a bubble plot, however, a third dimension is required to make the bubbles meaningful, and I don't have one in my data. Does anyone know of an R function/package that takes points as input (and probably a defined radius), clusters them, and plots the clusters?

Solution 5

Look at the sunflowerplot function (and the xyTable function that it uses to count overlapping points).

You could also use the my.symbols function from the TeachingDemos package with the results of xyTable to use other shapes (polygrams, for example).
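
As a rough sketch of that idea, the code below bins the y values so that nearby points coincide, counts them with xyTable, and draws one symbol per cluster. The seed, the 0.5 bin width, and the cex scaling are assumptions chosen for illustration; they are not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# Round y to the nearest 0.5 so that close points fall on the same value
# (the 0.5 bin width is an arbitrary choice standing in for a "radius")
y_binned <- round(df$y * 2) / 2

# xyTable() returns the distinct (x, y) pairs and how many points share each
tab <- xyTable(df$x, y_binned, digits = 6)

# One filled circle per cluster; sqrt() makes the area grow with the count
plot(tab$x, tab$y, pch = 16, cex = 1.5 * sqrt(tab$number),
     xlab = "x", ylab = "y (binned to 0.5)")

# Alternative: sunflowerplot() draws one petal per overlapping point
sunflowerplot(df$x, y_binned, xlab = "x", ylab = "y (binned to 0.5)")
```

A smaller bin width gives more, smaller clusters; a larger one merges more points into each symbol.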

OTHER TIPS

You can jitter the x values:

plot(jitter(df$x), df$y)
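
If the default spread is not what you want, a slightly more controlled variant might look like the following. The amount of 0.05, the seed, and the axis limits are assumed values for illustration only.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# Shift each x by a uniform random offset in [-0.05, 0.05]
x_jit <- jitter(df$x, amount = 0.05)

# Fix the x limits so the jittered points stay visually grouped around x = 1,
# and label only the true x value on the axis
plot(x_jit, df$y, xlim = c(0.5, 1.5), xaxt = "n", xlab = "x", ylab = "y")
axis(1, at = 1)
```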

You could try a hexplot, using either the hexbin package or stat_binhex in ggplot2.

http://cran.r-project.org/web/packages/hexbin/

http://docs.ggplot2.org/0.9.3/stat_binhex.html

The other standard approach (vs. jitter) is to use a partially transparent color, so that overlapping points will appear darker than "lone" points.

De gustibus, etc.
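
In base graphics, the partially transparent approach could be sketched like this. The alpha of 0.2, the point size, and the seed are assumptions, and the output device has to support transparency.

```r
set.seed(1)  # assumed seed, only so the example is reproducible
df <- data.frame(x = rep(1, 20), y = runif(20, 10, 20))

# Build a semi-transparent black; overlapping points stack up and look darker
pt_col <- adjustcolor("black", alpha.f = 0.2)

plot(df$x, df$y, pch = 16, cex = 2, col = pt_col, xlab = "x", ylab = "y")
```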

Using transparency is another solution. E.g.:

library(ggplot2)

ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 0.2, size = 3)

When there is only one x value, a density plot:

ggplot(df, aes(x=y)) +
  stat_density(geom="line")

or a violin plot:

ggplot(df, aes(x=x, y=y)) +
  geom_violin()

might also be options for displaying your data.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow