Question

I'm trying to run a simple scikit-learn Ridge regression with an array of sample weights. X_train is a ~200k by 100 2D NumPy array. I get a MemoryError when I use the sample_weight option; it works just fine without it. For the sake of simplicity I reduced the features to 2, and sklearn still throws a MemoryError. Any ideas?

from sklearn import linear_model

model = linear_model.Ridge()
model.fit(X_train, y_train, sample_weight=w_tr)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 449, in fit
    return super(Ridge, self).fit(X, y, sample_weight=sample_weight)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 338, in fit
    solver=self.solver)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/linear_model/ridge.py", line 286, in ridge_regression
    K = safe_sparse_dot(X, X.T, dense_output=True)
  File "/home/g/anaconda/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 83, in safe_sparse_dot
    return np.dot(a, b)
MemoryError

Solution

Setting sample weights changes the way the sklearn linear_model Ridge object processes your data, especially when the matrix is tall (n_samples > n_features), as in your case. Without sample weights it exploits the fact that X.T.dot(X) is a relatively small matrix (100 x 100 in your case) and therefore inverts a matrix in feature space. With sample weights given, the Ridge object decides to stay in sample space (in order to be able to weight the samples individually; see the branching to _solve_dense_cholesky_kernel in ridge.py, which works in sample space) and thus needs to invert a matrix of the same size as X.dot(X.T) (which in your case is n_samples x n_samples = 200000 x 200000 and causes a MemoryError before it is even created). This is really an implementation issue; please see the manual workaround below.
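To see why the sample-space path blows up, you can compare the two Gram matrices directly. A minimal illustration with a small random stand-in for your data (the shapes are the point, not the values):

import numpy as np

# Small stand-in for X_train so this actually runs; at the original
# 200000 x 100 size, the sample-space Gram matrix alone would need
# 200000**2 * 8 bytes, i.e. about 320 GB as float64.
X = np.random.randn(1000, 100)

K_feature = X.T.dot(X)  # feature space: (n_features, n_features) = (100, 100)
K_sample = X.dot(X.T)   # sample space: (n_samples, n_samples) = (1000, 1000)

print(K_feature.shape, K_sample.shape)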

TL;DR: The Ridge object is unable to treat sample weights in feature space, and will generate an n_samples x n_samples matrix, which causes your memory error.

While waiting for a possible remedy within scikit-learn, you could try to solve the problem in feature space explicitly, like so:

import numpy as np

X = X_train
alpha = 1.  # you did not specify this, but it is the Ridge object's default penalty
sample_weights = w_tr.ravel()  # make sure this is 1D
target = y_train.ravel()       # make sure this is 1D as well
n_samples, n_features = X.shape

# Solve the weighted normal equations (X^T W X + alpha * I) coef = X^T W y
coef = np.linalg.inv((X.T * sample_weights).dot(X) +
                     alpha * np.eye(n_features)).dot(X.T.dot(sample_weights * target))
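As a side note, explicit matrix inversion is fine for illustration, but solving the linear system directly with np.linalg.solve is numerically preferable and computes the same coefficients:

coef = np.linalg.solve((X.T * sample_weights).dot(X) + alpha * np.eye(n_features),
                       X.T.dot(sample_weights * target))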

For a new sample X_new, your prediction would be

prediction = np.dot(X_new, coef)

In order to confirm the validity of this approach, you can compare this coef to model.coef_ (after fitting the model) on a smaller number of samples (e.g. 300) that does not cause the memory error when used with the Ridge object.
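A minimal sketch of that sanity check, using the names from the question (fit_intercept=False keeps Ridge on exactly the penalized system solved above; alternatively, center the data first as described below):

from sklearn import linear_model
import numpy as np

# Subset small enough for Ridge's sample-space path to fit in memory
n_check = 300
Xs, ys, ws = X_train[:n_check], y_train[:n_check], w_tr[:n_check]

model = linear_model.Ridge(alpha=1., fit_intercept=False)
model.fit(Xs, ys, sample_weight=ws)

coef_check = np.linalg.solve((Xs.T * ws).dot(Xs) + 1. * np.eye(Xs.shape[1]),
                             Xs.T.dot(ws * ys))

print(np.allclose(model.coef_, coef_check))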

IMPORTANT: The code above only coincides with the sklearn implementation if your data are already centered, i.e. your data must have mean 0. Implementing a full ridge regression with intercept fitting here would amount to a contribution to scikit-learn, so it would be better to post it there. The way to center your data is as follows:

X_mean = X.mean(axis=0)
target_mean = target.mean()   # Assuming target is 1d as forced above

You then use the provided code on

X_centered = X - X_mean
target_centered = target - target_mean

For predictions on new data, you need

prediction = np.dot(X_new - X_mean, coef) + target_mean
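Putting the pieces together, here is a self-contained sketch of the whole workaround (weighted_ridge is a helper name introduced only for this example; data names follow the question):

import numpy as np

def weighted_ridge(X, y, w, alpha=1.):
    # Feature-space solve of sample-weighted ridge, with the intercept
    # handled by (unweighted) centering exactly as in the snippets above.
    w = w.ravel()
    y = y.ravel()
    X_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - X_mean
    yc = y - y_mean
    n_features = X.shape[1]
    coef = np.linalg.solve((Xc.T * w).dot(Xc) + alpha * np.eye(n_features),
                           Xc.T.dot(w * yc))
    intercept = y_mean - np.dot(X_mean, coef)
    return coef, intercept

coef, intercept = weighted_ridge(X_train, y_train, w_tr)
prediction = np.dot(X_new, coef) + intercept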

EDIT: As of April 15th, 2014, scikit-learn's ridge regression can deal with this problem (in bleeding-edge code). The fix will be available in the 0.15 release.

OTHER TIPS

What NumPy version do you have installed?

Looks like the ultimate method call that does it is numpy.dot(X, X.T), which with X.shape = (200000, 2) would generate a 200000 x 200000 matrix, roughly 200000**2 * 8 bytes, or about 320 GB as float64, no matter how few features you have.

Try converting your observations to a sparse matrix type, or reduce the number of observations you use. A variant of ridge regression that processes a few observations at a time may also help; see the sketch below.
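One batch-style option along those lines, assuming an L2-penalized linear model (rather than the exact closed-form ridge solution) is acceptable, is scikit-learn's SGDRegressor, which processes samples incrementally, never forms a Gram matrix, and accepts sample weights:

from sklearn import linear_model

# Approximate, incremental alternative to the closed-form solution.
# Note that SGDRegressor's alpha is not on the same scale as Ridge's.
sgd = linear_model.SGDRegressor(penalty='l2', alpha=1e-4)
sgd.fit(X_train, y_train, sample_weight=w_tr)
prediction = sgd.predict(X_new)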

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow