Question

I have a large data set that I am trying to discretise and create a 3d surface plot with:

  rowColFoVCell wpbCount Feret
1  001001001001       1  0.58
2  001001001001       1  1.30
3  001001001001       1  0.58
4  001001001001       1  0.23
5  001001001001       2  0.23
6  001001001001       2  0.58

There are currently 695302 rows in this data set. I am trying to discretise the third column, 'Feret', within each value of the second column, 'wpbCount': that is, for each 'wpbCount', bin the 'Feret' values and count how many fall into each bin.

I think the solution will involve using cut but I am not sure how to go about this. I would like to end up with a data frame something like this:

  wpbCount Feret     Count
1  1       [0.0,0.2] 3
2  1       [0.2,0.4] 5
3  1       [0.4,0.6] 6
4  1       [0.8,0.8] 9
5  2       [0.0,0.2] 6
6  2       [0.4,0.6] 23

No correct solution

OTHER TIPS

This is to answer the first part:

Create Some data

DF <- data.frame(wpbCount = sample(1:1000, 1000),
                 Feret = sample(seq(0, 1, 0.001), 1000))

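1) Discretize. Use cut with right = FALSE so the intervals are closed on the left and open on the right, [a, b). I normally find this more useful than the default.

# include.lowest = TRUE closes the last interval, so Feret == 1 is not turned into NA
DF$cut_it <- cut(DF$Feret, right = FALSE, include.lowest = TRUE,
                 breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1))

2) Aggregate

# Count the rows that fall into each bin
TABLE <- data.frame(table(DF$cut_it))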

EDIT Another attempt

library(data.table)
DT <- data.table(DF)
DT <- DT[, list(wpbCount = length(wpbCount),
                Feret = length(Feret)
                ), by=cut_it]
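
The counts above are grouped by bin only, whereas the question asks for a count per wpbCount and Feret bin. A minimal base R sketch of that two-way count follows; the data are synthetic, and the 0.2 bin width and the column names of the result are assumptions taken from the desired output in the question.

```r
# Synthetic stand-in for the real data (values are made up for illustration)
set.seed(1)
DF <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
                 Feret = runif(1000, 0, 1))

# Bin Feret into [0,0.2), [0.2,0.4), ...; include.lowest = TRUE closes the last bin
DF$bin <- cut(DF$Feret, breaks = seq(0, 1, by = 0.2),
              right = FALSE, include.lowest = TRUE)

# Cross-tabulate, then reshape to one row per (wpbCount, bin) pair
counts <- as.data.frame(table(wpbCount = DF$wpbCount, Feret = DF$bin),
                        responseName = "Count")
head(counts)
```

Unlike aggregate(), this keeps combinations with a count of zero, which can be convenient if the result is later turned into a grid for plotting.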

Perhaps you are just trying to discretize and not aggregate. Try this:

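DF2 <- data.frame(wpbCount = sample(1:3, 1000, replace = TRUE),
                  Feret = sample(seq(0, 1, 0.001), 1000))

# Bin DF2's own Feret column; the top break of 1.1 keeps Feret == 1 inside the last bin
DF2$Feret2 <- cut(DF2$Feret, right = FALSE,
                  breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1.1))

# Keep only wpbCount and the binned column
DF2 <- DF2[, c(1, 3)]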

Thanks very much for your help. I used the following functions in R:

x$bin <- cut(x$Feret, right = FALSE, breaks = seq(0,max(wpbFeatures$Feret), by=0.1))

y <- aggregate(x$bin, by = x[c('wpbCount', 'bin')], length)
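
One thing to watch with breaks built as seq(0, max(...), by = 0.1): seq() stops at the last multiple of 0.1 that does not exceed the maximum, so the largest values can fall outside every interval and become NA. A small sketch with made-up Feret values, where extending the upper limit by one bin width is just one possible fix:

```r
# Made-up values; the maximum (1.37) is not a multiple of the bin width
Feret <- c(0.05, 0.23, 0.58, 1.37)
width <- 0.1

# Breaks end at 1.3, so 1.37 is not covered by any interval
bad <- cut(Feret, breaks = seq(0, max(Feret), by = width), right = FALSE)

# Extending the upper limit by one bin width guarantees a break above the maximum
good <- cut(Feret, breaks = seq(0, max(Feret) + width, by = width), right = FALSE)

is.na(bad)   # shows which values were left unbinned
is.na(good)
```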

From your suggestions I have been able to get the data frame that I require:

wpbCount | bin       | x
1        | [0.2,0.3) | 72
2        | [0.2,0.3) | 142
3        | [0.2,0.3) | 224
4        | [0.2,0.3) | 299
5        | [0.2,0.3) | 421
6        | [0.2,0.3) | 479

Now I need to plot this in 3D, and I am not sure how to do so with a non-numerical column, i.e. the bin column, which is a factor.

Does anyone know how I can plot these three columns against each other?

Check out this link. There are some 3d plots. However, 3d plots aren't the greatest tool to analyze data. If you insist on the 3d approach, try stat_contour() from the ggplot2 package, which draws the contour lines of a 3d surface in 2d.

However, a probably better approach is to do a few plots in 2d, or use facet_grid(). Take a look at the current ggplot2 documentation also.

Try this based on your last answer (not tested):

library(ggplot2)

# y is the aggregated data frame with columns wpbCount, bin and x
ggplot(y, aes(wpbCount, x)) +
  geom_point() +
  facet_grid(. ~ bin)

The idea is to use the factor variable (in this case, bin) to facet the plot.
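
If a 3d surface is still wanted, one possible route in base R is to replace each bin by its numeric midpoint and pass a matrix of counts to persp(). This is only a sketch on synthetic data: the 0.1 bin width, the 0 to 1 Feret range, and the names z, wpb and mids are assumptions for illustration.

```r
# Synthetic stand-in for the real data (values are made up)
set.seed(1)
x <- data.frame(wpbCount = sample(1:6, 2000, replace = TRUE),
                Feret = runif(2000, 0, 1))

breaks <- seq(0, 1, by = 0.1)
x$bin <- cut(x$Feret, breaks = breaks, right = FALSE, include.lowest = TRUE)

# Matrix of counts: rows are wpbCount values, columns are Feret bins
z <- unclass(table(x$wpbCount, x$bin))

# Numeric axes: the wpbCount values and the midpoint of each bin
wpb <- as.numeric(rownames(z))
mids <- head(breaks, -1) + diff(breaks) / 2

# persp() needs increasing x and y vectors and a length(x) by length(y) matrix
persp(wpb, mids, z, theta = 40, phi = 25, ticktype = "detailed",
      xlab = "wpbCount", ylab = "Feret (bin midpoint)", zlab = "Count")
```

The same matrix can be passed to image() or contour() for a 2d view of the counts.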

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow