Question

I want to build a small project using neural networks in Python, and pybrain looks like the best fit. So far, though, none of the examples and questions I have found have helped me.

I have a sequence of numbers, hundreds of rows long. Some values are missing, and in their place there is an "x" instead of a number.

For example

1425234838636**x**40543485435097**x**43953458345345430843967067045764607457607645067045**x**04376037654067458674506704567408576405

and so on. This is just an example, not my actual sequence.

My idea is to read the values one by one and train my neural net on them; when I reach an "x", I predict the number and then continue training with the numbers that follow.

The only training examples I have found so far look like this:

trainSet.addSample([0,0,0,0],[1])

with some inputs and some outputs.

Any advice on how to continue?

Edit: I have worked something out and would like feedback, because I don't know whether it is right.

I still have the string from above. I split it into a list, so each element of the list is a single number.

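from pybrain.datasets import SupervisedDataSet

ds = SupervisedDataSet(1, 1)  # one input value, one target value
# Pair each known number with the next one; stop at the first "x"
for ind in range(len(myList) - 1):
    if myList[ind] != "x" and myList[ind+1] != "x":
        ds.addSample([int(myList[ind])], [int(myList[ind+1])])
    else:
        break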

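from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, FullConnection
from pybrain.supervised.trainers import BackpropTrainer

# 1 input -> 1 sigmoid hidden unit -> 1 linear output
net = FeedForwardNetwork()
inp = LinearLayer(1)
h1 = SigmoidLayer(1)
outp = LinearLayer(1)

net.addOutputModule(outp)
net.addInputModule(inp)
net.addModule(h1)

net.addConnection(FullConnection(inp, h1))
net.addConnection(FullConnection(h1, outp))

net.sortModules()

trainer = BackpropTrainer(net, ds)
trainer.trainOnDataset(ds, 1000)  # train for 1000 epochs
trainer.testOnData(verbose=True)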

# Predict the missing value from the number before it (not from its index)
myList[ind+1] = net.activate((int(myList[ind]),))[0]

Then go back to the beginning and continue from the last "x", which has now been replaced by the result of net.activate().

What do you think? Do you believe that something like this will work?


Solution

In general, if you are training your ANN using back propagation, you are basically training an input-output map. This means that your training set has to comprise known input-output relations (none of your unknown values included in the training set). The ANN then becomes an approximation of the actual relationship between your inputs and outputs.

You can then call x = net.activate(seq), where seq is the input sequence associated with the unknown value x.

If x is an unknown input sequence for a known result, then you have to call the inverse of the ANN. I do not think there is a simple way of inverting an ANN in pybrain, but you could just train an ANN with the inverse of your original training data. In other words, use your known results as the training inputs, and their associated sequences as the training results.
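
Building the inverse training set amounts to swapping each pair. A minimal sketch, assuming the known samples are held as (inputs, result) tuples in a hypothetical `forward_samples` list (the values are made up for illustration):

```python
# Hypothetical forward samples: (input sequence, known result)
forward_samples = [
    ([1, 4, 2], 5),
    ([4, 2, 5], 2),
    ([2, 5, 2], 3),
]

# Swap roles: the known result becomes the input, the sequence the target
inverse_samples = [([result], inputs) for inputs, result in forward_samples]

for inputs, target in inverse_samples:
    print(inputs, "->", target)
```

The dataset dimensions swap as well: here the inverse net would take one input and produce three outputs.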

The main thing to consider is the appropriateness of the tool and the training data for what you are trying to do. If you just want to predict x as a function of the previous number, then I think you are training correctly. I am guessing x is going to be a function of the previous n numbers though, in which case you want to update your data set as:

n = 10
# ds must be created with n inputs and 1 target, e.g. SupervisedDataSet(n, 1)
for ind in range(n, len(myList)):
    # The n values before position ind are the input; myList[ind] is the target
    window = myList[ind-n:ind+1]

    # Check that our sequence is valid: skip windows with a missing value
    if "x" in window:
        continue

    # Add valid training sequence to data set
    ds.addSample([int(v) for v in window[:-1]], [int(window[-1])])
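
Once a network has been trained on windows of the previous n numbers, the missing entries could be filled in from left to right. The sketch below is library-independent: `predict` is a hypothetical callable standing in for the trained network (with pybrain it would wrap `net.activate`), and `fill_missing` and `toy_predict` are illustrative names only.

```python
def fill_missing(values, n, predict, missing="x"):
    """Replace each missing entry using the n values that precede it."""
    filled = list(values)
    for i, value in enumerate(filled):
        if value != missing:
            continue
        window = filled[max(0, i - n):i]
        # Skip entries that do not have n known predecessors
        if len(window) < n or missing in window:
            continue
        filled[i] = predict(window)
    return filled


# Toy stand-in for a trained network: predicts the last value plus one
def toy_predict(window):
    return window[-1] + 1


data = [1, 2, 3, "x", 5, 6, "x", 8]
result = fill_missing(data, n=2, predict=toy_predict)
print(result)
```

Because earlier gaps are filled first, a predicted value can serve as input for a later gap, which matches the "replace the x and continue" idea from the question.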

Other tips

What you are describing is a statistical application called Imputation: substituting missing values in your data. The traditional approach does not involve neural networks, but there has certainly been some research in this direction. This is not my area, but I recommend you check the literature.
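
To make the idea concrete, one of the simplest imputation schemes carries the last known value forward. This is only a baseline for illustration, not a recommendation for this data; `carry_forward` is a hypothetical helper, and the "x" marker is taken from the question.

```python
def carry_forward(values, missing="x"):
    """Fill each missing entry with the most recent known value."""
    filled = []
    last_known = None
    for value in values:
        if value == missing:
            # Leading gaps stay missing because there is nothing to copy yet
            filled.append(last_known if last_known is not None else missing)
        else:
            filled.append(value)
            last_known = value
    return filled


print(carry_forward([1, 4, "x", 2, "x", "x", 7]))
```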

I cannot give you a specific answer for that Python library, but as I see it, you have a neural net and you give it samples of the form

    [ i0 i1 ... in ] --> [ o0 o1 ... on ]
    (input vector)       (output vector)

Right now you train the net with sample vectors of length 1. Your net does not know about the order of the numbers presented to it; that order only influences the outcome of the training.

To get a network that knows about the sequence, you could present vectors of consecutive numbers as input and the single number you want as output. You leave out the sequences containing the X. Example:

    Sequence: 1 2 3 4 X 2 3 4 5 6 7 8
    Training with input length 3, output length 1:
    [1 2 3] -> 4
    [2 3 4] -> 5 (the second one, as the first one is not available)
    [3 4 5] -> 6
    [4 5 6] -> 7
    [5 6 7] -> 8

I think using this, your net can adapt a little to the input sequence. The "how" to extract the right training sequences as input, I have to leave to the domain expert (you).
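
The extraction step from the example above can be sketched in plain Python. This is an illustration only and not tied to any library: `make_windows` and its parameters are hypothetical names, and missing values are assumed to be marked with the string "X".

```python
def make_windows(seq, width, missing="X"):
    """Return (inputs, target) pairs for every run of width + 1 known values."""
    pairs = []
    for start in range(len(seq) - width):
        chunk = seq[start:start + width + 1]
        # Leave out any window that touches a missing value
        if missing in chunk:
            continue
        pairs.append((chunk[:-1], chunk[-1]))
    return pairs


sequence = [1, 2, 3, 4, "X", 2, 3, 4, 5, 6, 7, 8]
pairs = make_windows(sequence, width=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

For this sequence the function yields the same five pairs as listed in the example; each pair could then be handed to whatever training call the library provides.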

License: CC-BY-SA with attribution
Not affiliated with StackOverflow