Question

Imagine you have a rather large dataset of 2,000,000 points randomly distributed over some polygonal space. A density function has to be estimated each time from a randomly chosen sample of 4,000 points, and this process has to be repeated 200 times. My code doesn't handle this problem very well. Any suggestions on how to improve it?

# coord is a SpatialPoints object
library(sp)
library(maptools)
library(maps)

You can get the polygon shapefile from the following link: https://www.dropbox.com/sh/65c3rke0gi4d8pb/LAKJWhwm-l

germG <- readShapePoly("vg250_gem.shp")
coord <- spsample(germG, 2e06, "random") # this command takes a few minutes to run

# R is the number of simulations
R <- 200
M <- matrix(NA, R, 256)
ptm <- proc.time()
for(r in 1:R) {
  ix <- sample(1:2e06, size=4000)        # draw 4,000 of the 2,000,000 points
  Dg <- spDists(coord[ix])               # full pairwise distance matrix
  Dg <- as.vector(Dg[Dg!=0])             # drop the zero self-distances
  kg <- density(Dg, bw="nrd0", n=256)    # kernel density of the distances
  M[r,] <- kg$y
}
runningtime <- proc.time() - ptm
cat("total run time (sec) =", round(runningtime[3],1), "\n")

The code above reports a total run time (sec) = 964.8 on a Core i3, 2.27 GHz, with 4 processors and 4 GB RAM. How can I speed up this for-loop simulation? I'd be very thankful for any comments, criticism and suggestions.


Solution

Not really an answer, just some observations:

  1. If R = # iterations and S = sample size per iteration (e.g., R = 200 and S = 4000), then your run time will be ~O(R × S²). So doubling the runs and halving the sample size will reduce run time by a factor of 2.
  2. The default distance metric in spDists(...) is Euclidean. If this is what you want, then you are much better off using the dist(...) function - it is much more efficient (see code below). If you want geographic distance (e.g., Great Circle) you have to use spDists(..., longlat=T).
  3. spDists(...) calculates the full distance matrix, not just the lower triangle. This means that all non-zero distances show up twice, which has an effect on your densities. This is why M1 and M2 in the code below are not equal (see the sketch after this list).
  4. For large S, profiling your code (using Rprof) shows that about 46% of the time is spent in density(...), and another 38% is spent in spDists(...). So this is one case where replacing the for loop with calls to apply, lapply etc. will not help much.
  5. There are other functions for calculating a geographic distance matrix - assuming that's what you want, but none were faster than what you are already using. I tried earth.dist(...) in the fossil package, distm(...) from the geosphere package, and rdist.earth(...) in the fields package, but none of these improved run times.
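
To make point 3 concrete, here is a minimal, self-contained sketch (on a small made-up coordinate matrix, not the OP's shapefile) showing that taking only the lower triangle of the spDists() matrix recovers each pairwise distance exactly once, i.e., the same set of values that dist() returns:

# small self-contained check; xy is an arbitrary set of 10 2-D points
library(sp)
set.seed(42)
xy <- matrix(runif(20), ncol=2)
d_full  <- spDists(SpatialPoints(xy))    # full symmetric matrix (every distance twice)
d_lower <- d_full[lower.tri(d_full)]     # each distance exactly once
all.equal(sort(d_lower), sort(as.vector(dist(xy))))
# expected: TRUE (Euclidean case)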

Code:

library(sp)
library(maptools)
germG <- readShapePoly("vg250_gem.shp")
coord <- spsample(germG, 1e4, "random") # Just 10,000 points...
R <- 200

# dist(...) and spDists(..., longlat=F) give same result
zz <- coord[sample(1e4,size=200)]
d1 <- spDists(zz)
d2 <- dist(zz@coords)
max(abs(as.matrix(d1)-as.matrix(d2)))
# [1] 0
# but dist(...) is much faster
M1 <- matrix(NA,R, 256)
set.seed(1)
system.time({
  for(r in 1:R) {
    ix <- sample(1e4,size=200)    # S = 200; test case
    Dg <- spDists(coord[ix])      # using spDists(...)
    Dg <- as.vector(Dg[Dg!=0])
    kg <- density(Dg,bw="nrd0",n=256)
    M1[r,] <- kg$y
  }
})
#    user  system elapsed 
#   11.08    0.17   11.28 

M2 <- matrix(NA,R, 256)
set.seed(1)
system.time({
  for(r in 1:R) {
    ix <- sample(1e4,size=200)    # S = 200; test case
    Dg <- dist(coord[ix]@coords)  # using dist(...)
    Dg <- as.vector(Dg[Dg!=0])
    kg <- density(Dg,bw="nrd0",n=256)
    M2[r,] <- kg$y
  }
})
# user  system elapsed 
# 2.67    0.03    2.73 

Edit In response to the OP's request, below is the profiling code with size=200.

R=200
M <- matrix(NA,R, 256)
Rprof("profile")
set.seed(1)
system.time({
  for(r in 1:R) {
    ix <- sample(1e4,size=200)    # S = 200; test case
    Dg <- spDists(coord[ix])      # using spDists(...)
    Dg <- as.vector(Dg[Dg!=0])
    kg <- density(Dg,bw="nrd0",n=256)
    M[r,] <- kg$y
  }
})
Rprof(NULL)
head(summaryRprof("profile")$by.total,10)
#                   total.time total.pct self.time self.pct
# "system.time"          11.52    100.00      0.02     0.17
# "spDists"               7.08     61.46      0.02     0.17
# "matrix"                6.76     58.68      0.24     2.08
# "apply"                 6.58     57.12      0.26     2.26
# "FUN"                   5.88     51.04      0.22     1.91
# "spDistsN1"             5.66     49.13      3.36    29.17
# "density"               3.18     27.60      0.02     0.17
# "density.default"       3.16     27.43      0.06     0.52
# "bw.nrd0"               1.98     17.19      0.00     0.00
# "quantile"              1.76     15.28      0.02     0.17

As S gets bigger, calculating density begins to dominate because the results must be sorted. You can run this code with ix <- sample(1e4,size=4000) to see it.

OTHER TIPS

You might find it marginally faster to define a blank matrix Dg ahead of time, rather than letting it be created anew inside the loop.

Beyond that, you may want to consider a multicore approach, provided you have sufficient RAM.
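
As a rough sketch of that suggestion (not something from the original answers), the loop could be pushed through the base parallel package. This assumes coord has been built as in the question, that each worker has enough memory for its own distance matrix, and that plain Euclidean distances via dist() are acceptable:

library(sp)
library(parallel)

R <- 200
n_cores <- max(1, detectCores() - 1)       # leave one core free

one_run <- function(r) {
  ix <- sample(2e06, size=4000)
  Dg <- dist(coord[ix]@coords)             # Euclidean distances, each pair counted once
  kg <- density(as.vector(Dg), bw="nrd0", n=256)
  kg$y
}

# mclapply() forks on Unix-alikes; on Windows use makeCluster()/parLapply() instead.
# For reproducible parallel streams, set RNGkind("L'Ecuyer-CMRG") and a seed first.
res <- mclapply(seq_len(R), one_run, mc.cores=n_cores)
M <- do.call(rbind, res)                   # R x 256 matrix of density estimates

Whether this pays off depends on how many physical cores are available and on the per-worker memory footprint.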

Licensed under: CC-BY-SA with attribution