Setting sample weights changes how the sklearn linear_model Ridge object processes your data, especially when the matrix is tall (n_samples > n_features), as in your case. Without sample weights it exploits the fact that X.T.dot(X) is a relatively small matrix (100x100 in your case) and therefore inverts a matrix in feature space. With sample weights given, the Ridge object decides to stay in sample space in order to weight the samples individually (see the relevant lines here and here for the branching to _solve_dense_cholesky_kernel, which works in sample space), and thus needs to invert a matrix of the same size as X.dot(X.T). In your case that is n_samples x n_samples = 200000 x 200000, which causes a memory error before the matrix is even created. This is really an implementation issue; please see the manual workaround below.
TL;DR: The Ridge object cannot apply sample weights in feature space, so it builds an n_samples x n_samples matrix instead, which causes your memory error.
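To see the scale difference concretely, here is a small sketch with reduced, made-up dimensions (the real problem is 200000 x 100) comparing the sizes of the two Gram matrices:

```python
import numpy as np

# Hypothetical, scaled-down dimensions; the question has 200000 x 100
n_samples, n_features = 2000, 100
X = np.random.randn(n_samples, n_features)

feature_gram = X.T.dot(X)  # feature space: n_features x n_features
sample_gram = X.dot(X.T)   # sample space:  n_samples x n_samples

print(feature_gram.shape)  # (100, 100)
print(sample_gram.shape)   # (2000, 2000)
print(sample_gram.nbytes / feature_gram.nbytes)  # 400x more memory

# At the original size, the sample-space matrix alone would need
# 200000**2 * 8 bytes = 320 GB of RAM, hence the memory error.
```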
While waiting for a possible remedy within scikit-learn, you can solve the problem in feature space explicitly, like so:
import numpy as np

alpha = 1.  # you did not specify alpha in your Ridge object; 1. is the default penalty
sample_weights = w_tr.ravel()  # make sure this is 1D
target = y.ravel()             # make sure this is 1D as well
n_samples, n_features = X.shape

# Weighted ridge solution in feature space:
# coef = (X^T W X + alpha * I)^-1 X^T W y
coef = np.linalg.inv((X.T * sample_weights).dot(X) +
                     alpha * np.eye(n_features)).dot(
                         X.T.dot(sample_weights * target))
For a new sample X_new, your prediction would be
prediction = np.dot(X_new, coef)
To confirm the validity of this approach, you can compare these coef to model.coef_ (after fitting the model) from your code on a smaller number of samples (e.g. 300), which does not cause the memory error when used with the Ridge object.
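You can also sanity-check the formula with NumPy alone: weighted ridge is equivalent to unweighted ridge on data rescaled by the square roots of the weights. A sketch with synthetic data (all names and sizes are made up for illustration):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 300, 10
X = rng.randn(n_samples, n_features)
target = rng.randn(n_samples)
sample_weights = rng.rand(n_samples) + 0.5
alpha = 1.

# Closed-form weighted ridge in feature space
coef = np.linalg.inv((X.T * sample_weights).dot(X) +
                     alpha * np.eye(n_features)).dot(
                         X.T.dot(sample_weights * target))

# Equivalent: unweighted ridge on sqrt-weight-rescaled data
sw = np.sqrt(sample_weights)
Xw, yw = X * sw[:, np.newaxis], target * sw
coef_rescaled = np.linalg.inv(Xw.T.dot(Xw) +
                              alpha * np.eye(n_features)).dot(Xw.T.dot(yw))

print(np.allclose(coef, coef_rescaled))  # True
```

The rescaling trick works because Xw.T.dot(Xw) equals X.T W X and Xw.T.dot(yw) equals X.T W y, so both paths solve the same normal equations.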
IMPORTANT: The code above only coincides with the sklearn implementation if your data are already centered, i.e. your data must have zero mean. Implementing a full ridge regression with intercept fitting here would amount to a contribution to scikit-learn, so it would be better to post it there. Centering your data works as follows:
X_mean = X.mean(axis=0)
target_mean = target.mean() # Assuming target is 1d as forced above
You then use the provided code on
X_centered = X - X_mean
target_centered = target - target_mean
For predictions on new data, you need
prediction = np.dot(X_new - X_mean, coef) + target_mean
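Putting the centering and prediction steps together, here is an end-to-end sketch on synthetic, deliberately uncentered data (the data, the intercept of 2, and the tiny alpha are all assumptions chosen so the true coefficients are recovered exactly):

```python
import numpy as np

rng = np.random.RandomState(42)
n_samples, n_features = 500, 5
X = rng.randn(n_samples, n_features) + 3.  # deliberately uncentered
true_coef = rng.randn(n_features)
target = X.dot(true_coef) + 2.             # intercept of 2, no noise
sample_weights = np.ones(n_samples)
alpha = 1e-8                               # tiny penalty so recovery is exact

# Center the data
X_mean = X.mean(axis=0)
target_mean = target.mean()
X_centered = X - X_mean
target_centered = target - target_mean

# Closed-form weighted ridge on the centered data
coef = np.linalg.inv((X_centered.T * sample_weights).dot(X_centered) +
                     alpha * np.eye(n_features)).dot(
                         X_centered.T.dot(sample_weights * target_centered))

# Predict on new data: un-center the inputs, re-add the target mean
X_new = rng.randn(10, n_features) + 3.
prediction = np.dot(X_new - X_mean, coef) + target_mean
print(np.allclose(prediction, X_new.dot(true_coef) + 2.))  # True
```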
EDIT: As of April 15th, 2014, scikit-learn's ridge regression can deal with this problem (bleeding-edge code). It will be available in the 0.15 release.