As I pointed out in my comments, this question is really multiple questions, and does not reflect the title. In future, please try to keep questions manageable and discrete. I'm not going to attempt to answer your third point (about K-means clustering) here. Search SO and I'm sure you will find some relevant questions/answers.
Regarding your other questions, have a careful look at the following. If you don't understand what a particular function is doing, refer to ?function_name (e.g. ?tapply). For further insight, run nested code from the inside out: for foo(bar(baz(x))), you could examine baz(x), then bar(baz(x)), and finally foo(bar(baz(x))). This is an easy way to get a handle on what's going on, and is also useful when debugging code that produces errors.
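To make the inside-out approach concrete, here is a minimal sketch on a made-up vector. The vector x and its values are purely illustrative and are not part of your data; each step simply peels off one layer of the nested call.

```r
# Hypothetical toy data, not the data frame used in this answer
x <- c(3, 1, 2, 3, 3, 1)

# Innermost call first: count how often each value occurs
table(x)

# Next layer: sort those counts in decreasing order
sort(table(x), decreasing = TRUE)

# Outermost call: extract the name of the most frequent value
names(sort(table(x), decreasing = TRUE))[1]
# [1] "3"
```

If one of the intermediate results looks wrong, that layer is where to start debugging.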
d <- read.csv(text='Country,LifeExpectancy,Region
India,60,Asia
Srilanka,62,Asia
Myanmar,61,Asia
USA,65,America
Canada,65,America
UK,68,Europe
Belgium,67,Europe
Germany,69,Europe
Switzerland,70,Europe
France,68,Europe', header=TRUE)
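# Bar plot of the number of countries in each region
barplot(with(d, tapply(Country, Region, length)), cex.names=0.8,
        ylab='No. of countries', xlab='Region', las=1)
# Box plot of life expectancy, grouped by region
boxplot(LifeExpectancy ~ Region, data=d, las=1,
        xlab='Region', ylab='Life expectancy')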
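# Country with the lowest life expectancy (which.min returns the first minimum)
d$Country[which.min(d$LifeExpectancy)]
# [1] India
# Levels: Belgium Canada France Germany India Myanmar Srilanka Switzerland UK USA
# Country with the highest life expectancy (which.max returns the first maximum)
d$Country[which.max(d$LifeExpectancy)]
# [1] Switzerland
# Levels: Belgium Canada France Germany India Myanmar Srilanka Switzerland UK USA
# Note: the Levels line is printed only when Country is a factor
# (the read.csv default before R 4.0.0); otherwise you get a quoted string.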
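One caveat: which.min and which.max return only the first matching position, so tied countries are silently dropped. That doesn't matter for the data above, where the minimum and maximum are unique, but if ties are possible, something like the following sketch may be safer. The data frame le and its values are made up for illustration.

```r
# Hypothetical toy data with a tie for the lowest value
le <- data.frame(Country = c("A", "B", "C", "D"),
                 LifeExpectancy = c(60, 65, 60, 70),
                 stringsAsFactors = FALSE)

# which.min() reports only the first minimum
le$Country[which.min(le$LifeExpectancy)]
# [1] "A"

# Comparing against min() keeps every tied row
le$Country[le$LifeExpectancy == min(le$LifeExpectancy)]
# [1] "A" "C"
```

The same pattern with max() returns all countries tied for the highest value.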