Question

While reading Elements of Statistical Learning, I came across this quote:

A large subset of the most popular techniques in use today are variants of these two simple procedures. In fact 1-nearest-neighbor, the simplest of all, captures a large percentage of the market for low-dimensional problems.

It is referencing linear regression and k-nearest-neighbor algorithms. I am wondering how accurate this quote is. More specifically, which popular ML methods being used today do not rely on assumptions and characteristics derived from these two simple methods?


The solution

Here's the complement of your question -- some methods that are derived from KNN and linear regression.

Techniques for regression

  • LASSO, ridge regression and elastic net are linear regression with a regularization penalty (see the ridge sketch after this list).

  • Local linear regression builds a nonlinear estimator out of locally linear models; it's essentially a combination of linear regression and k-nearest-neighbor (also sketched after this list).

  • Nonlinear regression techniques generalize to the case where there isn't a simple linear dependency on the inputs, but there is still a parametric model.

  • Poisson regression generalizes linear regression to the case where the quantity being estimated is a count (i.e. it cannot be negative, and it is always a whole number).

  • Hierarchical linear models, for example where A is used to predict B, which is in turn used to predict C.

  • Least absolute deviation tries to minimize the L1 norm, rather than the L2 norm as in linear regression.

  • Various robust regression techniques try to provide robustness in the face of outliers.
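To make the "linear regression plus a penalty" point concrete, here is a minimal NumPy sketch of ridge regression on synthetic data (the data and the penalty value lam are purely illustrative). It uses the same closed-form solver as ordinary least squares, with only an extra lam * I term; LASSO replaces the squared penalty with an absolute-value one and therefore needs an iterative solver instead.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: y depends linearly on 3 features, plus noise.
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

    def fit_linear(X, y, lam=0.0):
        # Ordinary least squares when lam == 0; ridge regression otherwise.
        # The only change is the lam * I term added to X^T X.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    w_ols = fit_linear(X, y)             # plain linear regression
    w_ridge = fit_linear(X, y, lam=5.0)  # same solver, plus shrinkage
    print(w_ols, w_ridge)                # ridge weights are shrunk toward zero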
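And here is a sketch of the local linear regression idea mentioned above (again a toy setup; the neighborhood size k is an arbitrary choice). kNN supplies the neighborhood, and linear regression supplies the local model fitted within it.

    import numpy as np

    rng = np.random.default_rng(1)

    # A 1-D nonlinear target: neither a single global line nor plain
    # neighbor-averaging captures it well.
    x = np.sort(rng.uniform(-3, 3, size=200))
    y = np.sin(x) + 0.1 * rng.normal(size=200)

    def local_linear_predict(x_train, y_train, x0, k=20):
        # Fit an ordinary least-squares line to the k nearest neighbors
        # of x0 and evaluate that line at x0.
        idx = np.argsort(np.abs(x_train - x0))[:k]
        Xn = np.column_stack([np.ones(k), x_train[idx]])  # intercept + slope
        beta, *_ = np.linalg.lstsq(Xn, y_train[idx], rcond=None)
        return beta[0] + beta[1] * x0

    print(local_linear_predict(x, y, 1.5))  # close to sin(1.5) ~ 0.997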

Techniques for classification

  • Logistic regression and probit regression fall into the class of generalized linear models, and so are related to linear regression (a minimal sketch follows this list).

  • Similarly, they generalize to multinomial probit and multinomial logit models when there are more than two categories.

  • Some neural nets can be viewed as hierarchical, multinomial logistic regressions.

  • Support vector machines can, in some cases, be viewed as a least-squares regression with binary targets, operating in a higher-dimensional space than that occupied by the original data.
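As a concrete illustration of the generalized-linear-model point, here is a minimal NumPy sketch of logistic regression trained by gradient descent (toy blob data; the learning rate and iteration count are arbitrary). The model is still a linear function of the inputs; only the link function and the loss differ from ordinary linear regression.

    import numpy as np

    rng = np.random.default_rng(2)

    # Two Gaussian blobs with labels 0 and 1.
    X = np.vstack([rng.normal(-1, 1, size=(50, 2)),
                   rng.normal(+1, 1, size=(50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Gradient descent on the logistic loss: the score X @ w + b is the
    # same linear predictor as in linear regression.
    w, b = np.zeros(2), 0.0
    for _ in range(2000):
        p = sigmoid(X @ w + b)
        w -= 0.1 * (X.T @ (p - y)) / len(y)
        b -= 0.1 * np.mean(p - y)

    print(((sigmoid(X @ w + b) > 0.5) == y).mean())  # training accuracy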

Techniques not inspired by KNN or LR

Some techniques that aren't obviously inspired by k nearest neighbors or linear regression include

  • Decision trees (and random forests, a development of decision trees).

  • Naive Bayes, which works with estimated probability distributions rather than distances or linear fits (see the sketch after this list).

  • Markov chain, hidden Markov model, Kalman filter and particle filter models, which impose additional structure on the problem that isn't easily captured by nearest neighbors or linear dependence.
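For instance, here is a minimal sketch of Gaussian naive Bayes (a toy two-class setup; everything about the data is illustrative). Nothing in it resembles a nearest-neighbor search or a linear fit: each class is summarized by a prior plus an independent Gaussian per feature, and prediction just compares posteriors.

    import numpy as np

    rng = np.random.default_rng(3)

    X = np.vstack([rng.normal(-1, 1, size=(50, 2)),
                   rng.normal(+1, 1, size=(50, 2))])
    y = np.array([0] * 50 + [1] * 50)

    def fit_gnb(X, y):
        # Per class: a prior plus a per-feature Gaussian (mean, variance).
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0))
        return params

    def predict_gnb(params, x):
        def log_posterior(c):
            prior, mu, var = params[c]
            return (np.log(prior)
                    - 0.5 * np.sum(np.log(2 * np.pi * var)
                                   + (x - mu) ** 2 / var))
        return max(params, key=log_posterior)

    params = fit_gnb(X, y)
    print(predict_gnb(params, np.array([1.2, 0.8])))  # most likely class 1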

Other tips

Decision trees are a prominent class of techniques/algorithms that don't really have anything in common with nearest-neighbour classification or linear regression. That said, they rely (to some extent) on the same kinds of assumptions about data characteristics that kNN and linear regression rely on, most notably that the features of the data represent the values to predict in some way.

The techniques themselves are completely different though, both in terms of how the models are built and what they look like.

This quote is far too far-fetched. It is similar to saying "all algorithms rely on loops". Yes, these are core concepts, but from the point of view of today's models the similarities are so small that thinking about modern ML methods as linear regression and/or kNN is simply naive. It is a good quote in the context of an introductory course, but without that context it is imho simply wrong. Everything hinges on the definition of "variant": how much change qualifies as no longer being a "variant"?

I would like to see a (short!) answer explaining how any of these models are linear regression/kNN models: HMM, CRF, GMM, SVM (linear classification in terms of large margins has nothing to do with linear regression), random forest, ...

Apart from the methods Chris Taylor already mentioned, there is another large class of methods that is not derived from linear regression or kNN: the whole class of unsupervised learning algorithms. The most popular one is k-means for clustering, sketched below.
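Here is a minimal sketch of Lloyd's algorithm for k-means (toy blob data; k and the iteration count are arbitrary, and the sketch ignores edge cases such as empty clusters). Note that no response variable appears anywhere, which is what makes it unsupervised.

    import numpy as np

    rng = np.random.default_rng(4)

    # Three well-separated blobs; no labels are used below.
    X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (-2, 0, 2)])

    def kmeans(X, k=3, iters=50):
        # Lloyd's algorithm: assign each point to its nearest centroid,
        # then move each centroid to the mean of its assigned points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        return centroids, labels

    centroids, labels = kmeans(X)
    print(np.round(centroids, 2))  # roughly (-2, -2), (0, 0), (2, 2)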

Also, the whole class of dimension reduction techniques is not really a variant of linear regression or kNN (although there are some connections, e.g. for principal components analysis).
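A sketch of that PCA connection (synthetic correlated data, purely illustrative): the principal directions come from an SVD of the centered data, with no response variable involved. The kinship with least squares shows up in that the first component is the line minimizing orthogonal (rather than vertical) squared distances.

    import numpy as np

    rng = np.random.default_rng(5)

    # Correlated 2-D data with most of its variance along one direction.
    X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
    Xc = X - X.mean(axis=0)  # PCA operates on centered data

    # The right singular vectors are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[0]                  # projection onto the first PC
    print(Vt[0], s ** 2 / (len(X) - 1))  # direction, explained variances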

It is also important to note that the authors of Elements of Statistical Learning use the word "variant" in a very broad sense here, which you can see when they list examples of "variants" just after the quote you mention. While all of these examples use important ideas from linear regression and/or kNN, they also draw on quite a few other deep and ingenious ideas.
