Question

Using Stanford's sentiment classification routine (contained in the CoreNLP toolkit), I am trying to plot the "sentiment" of each sentence in a given document. The data from the sentiment classification essentially consists of five columns and n rows:

0.0374 0.1311 0.1502 0.5761 0.1052
0.0117 0.0301 0.1748 0.5980 0.1854
0.1261 0.7332 0.1182 0.0156 0.0069

Each row represents a sentence in the file fed to the classifier, and each column in that row holds the classifier's confidence that the sentence carries a particular sentiment: the first column is the confidence that the sentence is "very negative", the second that it is "somewhat negative", the third that it carries "no sentiment" (i.e. is descriptive), the fourth that it is "positive", and the fifth that it is "very positive".

For each row in the data, it's easy enough to identify the column with the maximum value and then plot those values in sequential order, using negative values if the maximum falls in the first or second column, zero if it falls in the third, and positive values if it falls in the fourth or fifth:

[Plot: each sentence's maximum-confidence sentiment score, in sentence order]
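For reference, a minimal sketch of that maximum-confidence mapping in R (assuming the probabilities are in an n-by-5 matrix P; the -2..2 score vector is one illustrative choice):

# map each sentence to the score of its highest-confidence class
scores  <- c(-2, -1, 0, 1, 2)                # very negative ... very positive
top     <- apply(P, 1, which.max)            # column index of the max in each row
maxsent <- data.frame(n = seq_len(nrow(P)), sentiment = scores[top])
# maxsent can then be plotted in sentence order, e.g.
# ggplot(maxsent, aes(n, sentiment)) + geom_point()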

If I only plot the sentiment score with the largest confidence value in each row, though (which is what I've done in the plot above), I end up throwing out four of the five columns of data for each row. Is it possible to represent all the columns of this data in a reasonably intuitive fashion using ggplot2? I realize that this question is borderline off-topic on SO, but I thought that others with more familiarity with ggplot (and dataviz more broadly) might be able to point me towards a better visualization method for my data structure. In any event, I would be eager to hear others' thoughts on this question.


Solution

So here is an expansion on my comment, using your Romeo and Juliet dataset.

Rather than using the maximum probability for each sentence as a surrogate for sentiment, you could use the weighted average, or "expected sentiment". This is calculated, for each sentence, as:

$$E(S_i) = \sum_{j=1}^{5} p_{i,j}\,L_j, \qquad L = (-2,\,-1,\,0,\,1,\,2),$$

where $p_{i,j}$ is the probability in row $i$, column $j$, and $L_j$ is the $j$th element of the Likert-style score vector $L$.
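For the first sample row above, for instance, $E(S_1) = -2(0.0374) - 1(0.1311) + 0(0.1502) + 1(0.5761) + 2(0.1052) \approx 0.58$: mildly positive, consistent with most of the probability mass sitting in the fourth column.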

You could also calculate the uncertainty in sentiment for each sentence as:

$$V(S_i) = \sum_{j=1}^{5} p_{i,j}\,\bigl[L_j - E(S_i)\bigr]^2,$$

with standard deviation $SD(S_i) = \sqrt{V(S_i)}$.
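Continuing with the first sample row, $V(S_1) = 0.0374(-2 - 0.58)^2 + \cdots + 0.1052(2 - 0.58)^2 \approx 0.94$, so $SD(S_1) \approx 0.97$; the probability mass is spread widely enough that the "mildly positive" estimate carries considerable uncertainty.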

In R code:

library(ggplot2)
library(reshape2)

P <- read.csv("romeo.and.juliet.txt", sep=" ",
              header=FALSE)                # file you provided; no header row, as in the sample above
P <- as.matrix(P)                          # needs to be a matrix for %*%

# calculate expected sentiment, E(S), based on the Likert scale
# E(S) = sum(p_ij * L_j)   [L_j in -2:2]
L  <- c(-2, -1, 0, 1, 2)   # Likert scale
ES <- P %*% L              # E(S), one value per sentence
sentiment <- data.frame(n=1:length(ES), ES)

# calculate sentiment variability for each sentence
# V(S)  = sum(p_ij * (L_j - E(S))^2)
# SD(S) = sqrt(V(S))
LL <- matrix(rep(L, each=nrow(P)), ncol=ncol(P))  # each row of LL is a copy of L
LL <- apply(LL, 2, function(X) X - ES)            # deviations: L_j - E(S_i)
LL.sq <- LL^2
VS <- P %*% t(LL.sq)       # diagonal holds the probability-weighted sums
SD <- sqrt(diag(VS))
sentiment$SD <- SD

# reshape to long format for plotting with ggplot
gg <- melt(sentiment, id="n")
ggplot(gg, aes(x=n, y=value, color=variable)) +
  geom_point(size=1.5, alpha=.5) +
  stat_smooth(method="loess", size=1) +
  facet_grid(variable~., scales="free_y") +
  scale_color_discrete(name="", labels=c("Expected Sentiment", "Uncertainty")) +
  theme(legend.position="bottom")
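As an aside, P %*% t(LL.sq) materializes a full n-by-n matrix only to read off its diagonal, which can be wasteful for long documents. A minimal equivalent sketch, assuming the objects defined above:

# same V(S) values without forming the n-by-n matrix
VS.alt <- rowSums(P * LL.sq)             # sum_j p_ij * (L_j - E(S_i))^2, per row
all.equal(unname(VS.alt), diag(VS))      # TRUE: matches the diagonal above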

This data looks completely random, which makes me question the classification method (is it appropriate for Elizabethan English?). Nevertheless, this does illustrate the technique.

Licensed under: CC-BY-SA with attribution