Question

I'm working with the back-propagating neural network written in Python found here. It works quite well with the simple XOR example provided.

However, I want to use it to do something a bit more complex: attempt to predict stock prices. I know that neural networks aren't necessarily the best choice and may not be accurate at all, but I would still like to try.

My first attempt was to get 10 days of past closing prices for a specified stock (GOOG, for example). I then hoped to train the neural network with this data and predict the next day's closing price, but then I realized something: I only had 1 input value, and would not have any input to provide when trying to get the prediction. This is the root of my confusion: the number of input/hidden/output nodes.

In a paper here, they mention using the lowest, the highest, and the average value of a stock in the last d days as input. This is 3 inputs (or 4? if you count d), yet in order to predict the next day's price you wouldn't be able to provide any of this as input (except maybe d?).

How do you account for this variation in the number of inputs when training and predicting with a neural network? Am I missing some fundamental part of NNs and how they're used? Thanks!


Solution

@anana's comment helped me click on how the neural network should work. As she said, I can just provide the average value of a stock in the last d (in my case, 5) days as the input to attempt to get a prediction.

This means that my training input is of the format:

[[rollingAverage, rollingMinimum, rollingMaximum], normalizedClosePrice] for the past five days (so a total of 9 days are analyzed due to the rolling windows).

When I want to get a prediction after training, I provide input nodes of just the format:

[rollingAverage, rollingMinimum, rollingMaximum] for the most recent 5 days.
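
To make the two formats concrete, here is a minimal Python 3 sketch using made-up prices. The helper name window_features and the price values are assumptions for illustration only. It also assumes each training target is the raw close on the day after its window, which is one possible pairing and not necessarily the one used in the code further down.

```python
def window_features(prices, d):
    """Return [average, minimum, maximum] for each run of d consecutive prices."""
    features = []
    for start in range(len(prices) - d + 1):
        window = prices[start:start + d]
        features.append([sum(window) / d, min(window), max(window)])
    return features

# Nine made-up closing prices, oldest first (illustrative values only).
prices = [10.0, 11.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0]
d = 5

features = window_features(prices, d)  # 5 windows from 9 prices

# Training pairs: each window except the last is paired with the close
# that follows it (an assumed choice of target, left unnormalized here).
training = [[features[i], [prices[i + d]]] for i in range(len(features) - 1)]

# Prediction input: the most recent window, whose next close is not yet known.
prediction_input = features[-1]
```

The point is that training and prediction use the same three input nodes; the only difference is that the prediction window has no known output yet.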

Below is all the relevant logic, combined with the neural network I linked in the original question:

## ================================================================

def normalizePrice(price, minimum, maximum):
    return ((2*price - (maximum + minimum)) / (maximum - minimum))

def denormalizePrice(price, minimum, maximum):
    # exact inverse of normalizePrice: price = (n*(max - min) + (max + min)) / 2
    return (price*(maximum - minimum) + (maximum + minimum)) / 2

## ================================================================

def rollingWindow(seq, windowSize):
    it = iter(seq)
    win = [next(it) for cnt in range(windowSize)] # First window (next()/range work on Python 2.6+ and 3)
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

def getMovingAverage(values, windowSize):
    movingAverages = []

    for w in rollingWindow(values, windowSize):
        movingAverages.append(sum(w)/len(w))

    return movingAverages

def getMinimums(values, windowSize):
    minimums = []

    for w in rollingWindow(values, windowSize):
        minimums.append(min(w))

    return minimums

def getMaximums(values, windowSize):
    maximums = []

    for w in rollingWindow(values, windowSize):
        maximums.append(max(w))

    return maximums

## ================================================================

def getTimeSeriesValues(values, window):
    movingAverages = getMovingAverage(values, window)
    minimums = getMinimums(values, window)
    maximums = getMaximums(values, window)

    returnData = []

    # build items of the form [[average, minimum, maximum], normalized price]
    for i in range(0, len(movingAverages)):
        inputNode = [movingAverages[i], minimums[i], maximums[i]]
        price = normalizePrice(values[len(movingAverages) - (i + 1)], minimums[i], maximums[i])
        outputNode = [price]
        tempItem = [inputNode, outputNode]
        returnData.append(tempItem)

    return returnData

## ================================================================

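try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

def getHistoricalData(stockSymbol):
    historicalPrices = []

    # login to API
    urlopen("http://api.kibot.com/?action=login&user=guest&password=guest")

    # get 14 days of data from API (business days only, could be < 10)
    url = "http://api.kibot.com/?action=history&symbol=" + stockSymbol + "&interval=daily&period=14&unadjusted=1&regularsession=1"
    apiData = urlopen(url).read().decode("utf-8").split("\n")
    for line in apiData:
        line = line.strip()  # drop trailing "\r" and skip blank lines
        if line:
            tempLine = line.split(',')
            # second comma-separated field; check the API's column order
            # to confirm this is the price you want (e.g. the close)
            price = float(tempLine[1])
            historicalPrices.append(price)

    return historicalPrices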

## ================================================================

def getTrainingData(stockSymbol):
    historicalData = getHistoricalData(stockSymbol)

    # reverse it so we're using the most recent data first, ensure we only have 9 data points
    historicalData.reverse()
    del historicalData[9:]

    # get five 5-day moving averages, 5-day lows, and 5-day highs, associated with the closing price
    trainingData = getTimeSeriesValues(historicalData, 5)

    return trainingData
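
As a quick sanity check on the scaling: the mapping used by normalizePrice sends a window's minimum to -1 and its maximum to 1, so its algebraic inverse is price = (n*(max - min) + (max + min)) / 2. The sketch below is Python 3, with hypothetical snake_case names and made-up numbers; it only illustrates the round trip and is not part of the original code.

```python
def normalize(price, minimum, maximum):
    """Scale price linearly so that minimum -> -1 and maximum -> 1."""
    return (2 * price - (maximum + minimum)) / (maximum - minimum)

def denormalize(value, minimum, maximum):
    """Invert normalize(): recover the price from a scaled value."""
    return (value * (maximum - minimum) + (maximum + minimum)) / 2

low, high = 100.0, 110.0   # made-up 5-day low and high
close = 104.0              # made-up closing price

scaled = normalize(close, low, high)        # falls between -1 and 1
restored = denormalize(scaled, low, high)   # should recover the close
```

Note that the scaling divides by (maximum - minimum), so a window in which every price is identical would need special handling.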

OTHER TIPS

A supervised machine learner is an algorithm that takes a set of cases, each consisting of features (a set of numbers, the input) and a result (the output), and learns to map one to the other.

What you need is a training dataset, for example a time series covering several months, for which you know the output. Once your network is trained, you feed it the stock values from the last few days (known, because they have already happened) to predict what is going to happen tomorrow, so you know what to buy.

Finally, d is not an input; it is a constant. The number of inputs is also independent of the number of outputs (as long as you have enough input features). In theory, having more features can increase the accuracy of the predictions, but it takes more processing time, needs bigger training sets, and may make the model more prone to overfitting.

You're eliminating a lot of information by using rolling averages. There are a few other ways to present time-series data to a NN, e.g. the sliding windows approach.

Say you use 3 days' worth of data as input in order to forecast the 4th day. Instead of averaging the previous 3 days, you could present each day's price to its own input node. Roll this 3-day window over the first half of your data to train your model. To test, present the 3 days' worth of prices immediately prior to the day you wish to forecast. E.g.

Training Set

[[day 1 price, day 2 price, day 3 price], day 4 price]
[[day 2 price, day 3 price, day 4 price], day 5 price]
[[day 3 price, day 4 price, day 5 price], day 6 price]
[[day 4 price, day 5 price, day 6 price], day 7 price] 

Testing

[day 5 price, day 6 price, day 7 price] 
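
The layout above can be generated mechanically. This is a minimal Python 3 sketch; the function name sliding_windows and the seven prices are made up for illustration.

```python
def sliding_windows(prices, n_inputs):
    """Pair each run of n_inputs consecutive prices with the price that follows it."""
    pairs = []
    for start in range(len(prices) - n_inputs):
        inputs = prices[start:start + n_inputs]
        target = prices[start + n_inputs]
        pairs.append([inputs, [target]])
    return pairs

# Seven made-up daily prices for day 1 .. day 7.
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Four training pairs: days 1-3 -> day 4, ..., days 4-6 -> day 7.
training_set = sliding_windows(prices, 3)

# Input for forecasting day 8: the three most recent prices.
test_input = prices[-3:]
```

With n_inputs = 3 the network needs three input nodes and one output node, regardless of how many days of history are used to build the training set.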
Licensed under: CC-BY-SA with attribution