Question

I have a number of items in groups of varying size. For each group, one (known) item is the "correct" one. A function assigns a score to each item, which yields a flat vector of item scores, plus vectors giving the index where each group begins and how big it is. I want to apply a "softmax" over the scores in each group to turn them into probabilities, and then take the sum of the logs of the probabilities of the correct answers. Here is a simpler version, which just returns the score of the correct answer without the softmax and the logarithm.

import numpy                                                                                                                                                                                                                                                                          
import theano                                                                                                                                                                                                                                                                         
import theano.tensor as T                                                                                                                                                                                                                                                             
from theano.printing import Print                                                                                                                                                                                                                                                     

def scoreForCorrectAnswer(groupSize, offset, correctAnswer, preds):  
    # for each group, this will get called with the size of
    # the group, the offset of where the group begins in the 
    # predictions vector, and which item in that group is correct                                                                                                                                                                                                                                                                                                                                                                                                                                              
    relevantPredictions = preds[offset:offset+groupSize]                                                                                                                                                                                                                              
    ans = Print("CorrectAnswer")(correctAnswer)                                                                                                                                                                                                                                       
    return relevantPredictions[ans]       

groupSizes = T.ivector('groupSizes')                                                                                                                                                                                                                                                  
offsets = T.ivector('offsets')                                                                                                                                                                                                                                                        
x = T.fvector('x')                                                                                                                                                                                                                                                                    
W = T.vector('W')                                                                                                                                                                                                                                                                     
correctAnswers = T.ivector('correctAnswers')                                                                                                                                                                                                                                          

# for this simple example, we'll just score the items by
# element-wise product with a weight vector                                                                                                                                                                                                                                                                                  
predictions = x * W                                                                                                                                                                                                                                                                   

(values, updates) = theano.map(fn=scoreForCorrectAnswer,                                                                                                                                                                                                                                       
   sequences = [groupSizes, offsets, correctAnswers],                                                                                                                                                                                                                                
   non_sequences = [predictions] )                                                                                                                                                                                                                                                    

func = theano.function([groupSizes, offsets, correctAnswers,                                                                                                                                                                                                                          
        W, x], [values])                                                                                                                                                                                                                                                              

sampleInput = numpy.array([0.1,0.7,0.3,0.05,0.3,0.3,0.3], dtype='float32')                                                                                                                                                                                                            
sampleW = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], dtype='float32')                                                                                                                                                                                                           
sampleOffsets = numpy.array([0,4], dtype='int32')                                                                                                                                                                                                                                     
sampleGroupSizes = numpy.array([4,3], dtype='int32')                                                                                                                                                                                                                                  
sampleCorrectAnswers = numpy.array([1,2], dtype='int32')                                                                                                                                                                                                                              

data = func (sampleGroupSizes, sampleOffsets, sampleCorrectAnswers, sampleW, sampleInput)                                                                                                                                                                                             
print data                                                                                                                                                                                                                                                                            

# all three of these raise the same exception (see below)
gW1 = T.grad(cost=T.sum(values), wrt=W)                                                                                                                                                                                                                                               
gW2 = T.grad(cost=T.sum(values), wrt=W, disconnected_inputs='warn')                                                                                                                                                                                                                   
gW3 = T.grad(cost=T.sum(values), wrt=W, consider_constant=[groupSizes,offsets])   

This correctly calculates the output, but when I attempt to take the gradient with respect to the parameter W, I get (paths abbreviated):

Traceback (most recent call last):
  File "test_scan_for_stackoverflow.py", line 37, in <module>
    gW = T.grad(cost=T.sum(values), wrt=W)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 438, in grad
    outputs, wrt, consider_constant)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 698, in _populate_var_to_app_to_idx
    account_for(output)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 694, in account_for
    account_for(ipt)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 669, in account_for
    connection_pattern = _node_to_pattern(app)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 554, in _node_to_pattern
    connection_pattern = node.op.connection_pattern(node)
  File "Theano-0.6.0rc2-py2.7.egg/theano/scan_module/scan_op.py", line 1331, in connection_pattern
ils)
  File "Theano-0.6.0rc2-py2.7.egg/theano/scan_module/scan_op.py", line 1266, in compute_gradient
    known_grads={y: g_y}, wrt=x)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 511, in grad
    handle_disconnected(elem)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 497, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError: grad method was asked to compute 
the gradient with respect to a variable that is not part of the 
computational graph of the cost, or is used only by a 
non-differentiable operator: groupSizes[t]

Now, the groupSizes are constant, so there is no reason to take any gradients with respect to them. Ordinarily you could deal with this either by suppressing DisconnectedInputErrors or by telling Theano to treat groupSizes as a constant in your T.grad call (see the last lines of the sample script). But there doesn't seem to be any way to pass such options down to the internal T.grad calls made during the gradient computation for the ScanOp.

Am I missing something? Is there a way to get the gradient computation to work through the ScanOp here?
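
For reference, the full objective described at the top of the question (a softmax within each group, then the sum of the log-probabilities of the correct items) can be sketched in plain NumPy. This is only a minimal reference for checking the values a Theano implementation should produce, not the asker's code: the function name grouped_log_likelihood and its argument names are made up for illustration, and the data is the sample input from the script above.

```python
import numpy as np

def grouped_log_likelihood(scores, group_sizes, offsets, correct_answers):
    """Sum, over groups, of the log softmax probability of the correct item.

    Hypothetical helper for illustration; not part of Theano or the original script.
    """
    total = 0.0
    for size, offset, correct in zip(group_sizes, offsets, correct_answers):
        # Scores belonging to this group, promoted to float64 for the reduction.
        group = scores[offset:offset + size].astype(np.float64)
        # Log-softmax via log-sum-exp, shifted by the max for numerical stability.
        shifted = group - group.max()
        log_probs = shifted - np.log(np.exp(shifted).sum())
        total += log_probs[correct]
    return total

# Same sample data as in the question.
scores = np.array([0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3], dtype='float32')
group_sizes = np.array([4, 3], dtype='int32')
offsets = np.array([0, 4], dtype='int32')
correct_answers = np.array([1, 2], dtype='int32')

result = grouped_log_likelihood(scores, group_sizes, offsets, correct_answers)
print(result)
```

The second group has three equal scores, so its contribution is log(1/3) regardless of which item is marked correct, which makes a quick sanity check.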


Solution

This turns out to be a Theano bug as of mid-February 2013 (0.6.0rc2). It is fixed in the development version on GitHub as of the date of this post.
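
For anyone stuck on the affected release, the simplified example may not need a scan at all: the correct item of each group sits at flat position offsets + correctAnswers, so a single integer-array index picks out the same scores. The NumPy sketch below only checks that identity on the sample data from the question. It is an assumption, not tested here against Theano 0.6.0rc2, that the analogous symbolic expression predictions[offsets + correctAnswers] would let T.grad work without passing through the ScanOp. The names looped, flat_index and gathered are illustrative.

```python
import numpy as np

# Sample data from the question.
preds = np.array([0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3], dtype='float32')
group_sizes = np.array([4, 3], dtype='int32')
offsets = np.array([0, 4], dtype='int32')
correct_answers = np.array([1, 2], dtype='int32')

# Per-group slicing, mirroring what scoreForCorrectAnswer does inside the map.
looped = np.array([
    preds[offset:offset + size][correct]
    for size, offset, correct in zip(group_sizes, offsets, correct_answers)
])

# The same values with one integer-array index and no loop:
# item `correct` of the group starting at `offset` is at flat position offset + correct.
flat_index = offsets + correct_answers
gathered = preds[flat_index]

print(looped)
print(gathered)
```

This shortcut covers only the simplified version. The full per-group softmax still needs a reduction within each group, so it is not a general replacement for the fix in the development version.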

Licensed under: CC-BY-SA with attribution