Stacked area histogram in R

https://stackoverflow.com/questions/2241290

19-09-2019
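Question

I ran a Pig job on a Hadoop cluster to crunch a large amount of data down into something R can handle, so that I can do a cohort analysis. I have the following script, and as of the second-to-last line the data has this format: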

> names(data)
[1] "VisitWeek" "ThingAge"    "MyMetric"

VisitWeek is a date. ThingAge and MyMetric are integers.

The data looks like this:

2010-02-07     49  12345

The script I have so far is:

# Load ggplot2 for charting 
library(ggplot2);

# Our file has headers - column names
data = read.table('weekly_cohorts.tsv',header=TRUE,sep="\t");

# Print the names
names(data)

# Convert to dates
data$VisitWeek = as.Date(data$VisitWeek)
data$ThingCreation = as.Date(data$ThingCreation)

# Fill in the age column
data$ThingAge = as.integer(data$VisitWeek - data$ThingCreation)

# Filter data to thing ages lt 10 weeks (70 days) + a sanity check for gt 0, and drop the creation week column
data = subset(data, data$ThingAge <= 70, c("VisitWeek","ThingAge","MyMetric"))
data = subset(data, data$ThingAge >= 0)

print(ggplot(data, aes(x=VisitWeek, y=MyMetric, fill=ThingAge)) + geom_area())

This last line does not work. I have tried many variations, bars, and histograms, but as usual the R docs defeat me.

I want it to show a standard Excel-style stacked area chart: one time series for each ThingAge, stacked across the weeks on the x axis, with the date on the y axis. An example of this kind of chart is here: http://upload.wikimedia.org/wikipedia/commons/a/a1/mk_zuwander.png

I have read the docs here: http://had.co.nz/ggplot2/geom_area.html and http://had.co.nz/ggplot2/geom_histogram.html and this blog http://chartsgraphs.wordpress.com/2008/10/05/r-lattice-plot-beats-excel-stacked-area-trend-chart/ but I can't quite make it work for me.

How can I achieve this?

Solution

library(ggplot2)
set.seed(134)
df <- data.frame(
    VisitWeek = rep(as.Date(seq(Sys.time(),length.out=5, by="1 day")),3),
    ThingAge = rep(1:3, each=5),
    MyMetric = sample(100, 15))

ggplot(df, aes(x=VisitWeek, y=MyMetric)) + 
    geom_area(aes(fill=factor(ThingAge)))

This gives me the image below. I suspect your problem lies in correctly specifying the fill mapping for the area plot: fill=factor(ThingAge)
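Applied to data shaped like the question's, the same idea might look like the sketch below. The sample rows are made up for illustration (the real data comes from weekly_cohorts.tsv), and the names cohorts and p are hypothetical. ThingAge arrives as an integer, which ggplot2 treats as a continuous variable, so converting it to a factor is what should give one filled band per age.

```r
library(ggplot2)

# Made-up sample in the same shape as the question's data:
# three weekly visits for each of three thing ages (in days).
cohorts <- data.frame(
    VisitWeek = rep(as.Date("2010-02-07") + 7 * (0:2), times = 3),
    ThingAge  = rep(c(0L, 7L, 14L), each = 3),
    MyMetric  = c(10, 12, 9, 20, 18, 25, 5, 7, 6))

# Convert the integer age to a factor so each age becomes its own series.
cohorts$ThingAge <- factor(cohorts$ThingAge)

# Build the stacked area chart; the fill mapping now refers to a discrete variable.
p <- ggplot(cohorts, aes(x = VisitWeek, y = MyMetric, fill = ThingAge)) +
    geom_area()
```

Calling print(p) draws the chart. Converting the column once, rather than writing factor(ThingAge) inside aes(), also keeps the legend title short.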

Other tips

ggplot(data.set, aes(x = time, y = value, color = type)) + geom_area(aes(fill = type), position = 'stack')

You need to give geom_area a fill aesthetic and stack it (though stacking may be the default).

Found here: http://www.mail-archive.com/r-help@r-project.org/msg84857.html
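To see what stacking means for the numbers, here is a small base R sketch with made-up values. It only illustrates the arithmetic: at each time point, the upper edge of a band is the running total of the series up to and including it. The columns top and bottom are hypothetical names, and the order in which ggplot2 actually stacks the series may differ from the row order used here.

```r
# Made-up long-format data: two series observed at three time points.
data.set <- data.frame(
    time  = rep(1:3, times = 2),
    type  = rep(c("a", "b"), each = 3),
    value = c(1, 2, 3, 10, 20, 30))

# Within each time point, the running total gives the upper edge of each band.
data.set$top <- ave(data.set$value, data.set$time, FUN = cumsum)

# The lower edge is the upper edge minus the series' own value.
data.set$bottom <- data.set$top - data.set$value

print(data.set)
```

Series "a" sits on the zero line, and series "b" starts where "a" ends, so the top of "b" is the total of both series at each time point.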

I was able to get my result with this:

I loaded the stackedPlot() function from https://stat.ethz.ch/pipermail/r-help/2005-august/077475.html

The function (not mine, see the link) is:


stackedPlot = function(data, time=NULL, col=1:length(data), ...) {

  if (is.null(time))
    time = 1:length(data[[1]]);

  plot(0,0
       , xlim = range(time)
       , ylim = c(0,max(rowSums(data)))
       , t="n" 
       , ...
       );

  for (i in length(data):1) {

    # Sum of all columns up to the current one
    prep.data = rowSums(data[1:i]);

    # The polygon must have its first and last point on the zero line
    prep.y = c(0
                , prep.data
                , 0
                )

    prep.x = c(time[1]
                , time
                , time[length(time)]
                )

    polygon(prep.x, prep.y
            , col=col[i]
            , border = NA
            );
  }
}

Then I reshaped the data into wide format, and it worked!


wide = reshape(data, idvar="ThingAge", timevar="VisitWeek", direction="wide");
stackedPlot(wide);
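As a rough illustration of the wide format, the sketch below reshapes a made-up long table with base R's reshape(). Reading the function above, stackedPlot() sums across columns and uses rows as time points, so this variation puts one row per VisitWeek and one column per ThingAge. That choice of idvar and timevar is an assumption and differs from the call above; the names long, wide and series are hypothetical.

```r
# Made-up long-format data: four weeks for each of three thing ages (in days).
long <- data.frame(
    VisitWeek = rep(as.Date("2010-02-07") + 7 * (0:3), times = 3),
    ThingAge  = rep(c(0, 7, 14), each = 4),
    MyMetric  = c(10, 12, 9, 11, 20, 18, 25, 22, 5, 7, 6, 8))

# One row per week, one MyMetric.<age> column per thing age.
wide <- reshape(long, idvar = "VisitWeek", timevar = "ThingAge",
                direction = "wide")

# Drop the date column so only the numeric series remain.
series <- wide[, -1]

# The row sums are the total height of the stack in each week.
print(rowSums(series))

# stackedPlot(series)  # assumes the function above has already been defined
```

Dropping the identifier column before plotting matters because the function sums every column it is given.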

I turned the integers into factors and used geom_bar rather than geom_area.

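library(ggplot2)

# Build a 10 x 6 grid with a random value per cell
df <- expand.grid(x = 1:10, y = 1:6)
df <- cbind(df, val = runif(60))
# Turn the integer columns into factors so they are treated as discrete
df$fx <- factor(df$x)
df$fy <- factor(df$y)
# Stacked bars; geom_col() replaces the older qplot(..., geom='bar') call
ggplot(df, aes(x = fy, y = val, fill = fx)) + geom_col()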

License: CC-BY-SA with attribution

Not affiliated with StackOverflow