Question

I am looking for some relatively simple data sets for testing and comparing different training methods for artificial neural networks. I would like data that won't take too much pre-processing to turn it into my input format of a list of inputs and outputs (normalized to 0-1). Any links appreciated.


Solution

Why not try something simple like the sin function as the training data? Since you are comparing the training methods and don't really care about what you are training the network for, it should work and be easy to generate the training data.

Train the network on sin(x), where x is the input and the function value is the target output. An added benefit in your case is that sin(x) is bounded: it ranges from -1 to 1, so its absolute value is already in the range 0-1, and the signed value can be mapped to 0-1 with (sin(x) + 1) / 2. Other mathematical functions would work equally well.
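
As a rough sketch of how such training data could be generated, the snippet below samples sin(x) over one period and rescales both the input and the output to 0-1. The function name `make_sin_dataset`, the sample count, and the choice of interval are assumptions made for illustration, not part of the original answer.

```python
import math

def make_sin_dataset(n_samples=100, x_min=0.0, x_max=2.0 * math.pi):
    """Return a list of (input, target) pairs, both scaled to [0, 1]."""
    samples = []
    for i in range(n_samples):
        t = i / (n_samples - 1)              # evenly spaced fraction in [0, 1]
        x = x_min + (x_max - x_min) * t      # actual argument passed to sin
        y_scaled = (math.sin(x) + 1.0) / 2.0 # map sin(x) from [-1, 1] to [0, 1]
        samples.append((t, y_scaled))        # t doubles as the normalized input
    return samples

dataset = make_sin_dataset()
```

Each pair can then be fed to the network as one input value and one target value.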

OTHER TIPS

https://archive.ics.uci.edu/ml is the University of California Irvine repository of machine learning datasets. It's a really great resource, and I believe that they are all in CSV files.

Some resources are

  • The sinC function:

           +----
           |   sin(x)
           |  -------        when x != 0
           |     x
    sinC = |
           |
           |     1           otherwise
           +----
    
  • The sin(x) function, as @adrianbanks suggested.

  • The good old n-parity tests, for testing a new modification to an algorithm.

  • The Iris dataset, the Semeion handwritten digit data set, and many others, as well as any other mathematical functions.

  • The UCI Machine Learning Repository: archive.ics.uci.edu/ml/datasets.html

  • Another resource with many regression datasets: www.dcc.fc.up.pt/~ltorgo//Regression/DataSets.html . Many of these also come from the UCI ML Repository.
  • Kaggle (https://www.kaggle.com/) hosts a variety of practical data sets.
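
The two synthetic benchmarks from the list above (the sinC function and n-parity) could be generated along these lines. This is only a sketch: the function names are made up for illustration, and the parity target is assumed to be 1 when the number of set bits is odd.

```python
import itertools
import math

def sinc(x):
    """sin(x) / x, with the value 1 at x == 0 (the piecewise definition above)."""
    return math.sin(x) / x if x != 0 else 1.0

def make_parity_dataset(n_bits):
    """All 2**n_bits binary input vectors, each paired with its parity bit."""
    return [
        (list(bits), sum(bits) % 2)  # target is 1 when the count of ones is odd
        for bits in itertools.product([0, 1], repeat=n_bits)
    ]

parity = make_parity_dataset(3)
```

Note that sinC takes negative values for some x, so its output would still need rescaling before it fits a 0-1 target range.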

I don't think these require a lot of pre-processing. For categorical variables, you can quickly replace the values with binary codes using the find-and-replace feature of a GUI text editor. For example, the Abalone dataset has one categorical attribute, Sex (gender), which has three values: "M" for male, "F" for female and "I" for infant. You can press Ctrl + R in your text editor and replace all occurrences of "M" with 1,0,0, all occurrences of "F" with 0,1,0 and all occurrences of "I" with 0,0,1 (assuming the file is in CSV format). This converts the categorical variable quickly.
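
The same replacement can be scripted instead of done by hand. The sketch below assumes the categorical value sits in the first column of each CSV row; the sample rows are illustrative values, not the real file, and `one_hot_first_column` is a hypothetical helper name.

```python
import csv
import io

# Binary codes for the three categories, as described above.
SEX_CODES = {"M": ["1", "0", "0"], "F": ["0", "1", "0"], "I": ["0", "0", "1"]}

def one_hot_first_column(rows):
    """Replace the first field of each row with its three-column binary code."""
    for row in rows:
        yield SEX_CODES[row[0]] + row[1:]

# Illustrative rows only; a real file would be opened with open(path, newline="").
sample = "M,0.455,0.365\nF,0.530,0.420\nI,0.330,0.255\n"
encoded = list(one_hot_first_column(csv.reader(io.StringIO(sample))))
```

Unlike a global search and replace, this only touches the one column, so letters appearing elsewhere in the file are left alone.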

If you are in R, you can use the normalizeData function from the RSNNS package to scale and normalize your data to the range 0 to 1.

If you are in another environment such as Octave or MATLAB, you can invest some time in writing your own code. I am not aware of ready-made functions in these environments; I use my own code to scale and/or normalize the data.

Wrapping these steps in functions makes the work much easier. Once the data is prepared, save the modified data to a file so you do not have to repeat the pre-processing.
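
For environments without a ready-made helper, min-max scaling is short enough to write by hand. Here is a minimal sketch in Python; the function name and the choice to map constant columns to 0 are assumptions for illustration.

```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so map it to 0.0 to avoid dividing by zero.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]]
normalized = min_max_normalize(data)
```

The minimum and maximum of each column should be kept if new data has to be scaled the same way later.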

Remember one thing: the goal of training a neural network is not just to make it work well on a particular training set. The main goal is to train the network so that it has the lowest possible error on new data that it has not seen (directly or indirectly).
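
One common way to measure error on unseen data is to hold back part of the data set and never train on it. The sketch below shows such a split; the 80/20 ratio, the fixed seed, and the toy samples are assumptions chosen only for illustration.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold back a fraction of them for testing."""
    shuffled = list(samples)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy data: inputs in [0, 1] paired with their squares.
samples = [(i / 99, (i / 99) ** 2) for i in range(100)]
train, test = train_test_split(samples)
```

When comparing training methods, using the same split for every method keeps the comparison fair.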

There are many sample projects and well-known data sets at http://neuroph.sourceforge.net/sample_projects.html

Here are some handwriting and other databases for training purposes.

http://www.cs.nyu.edu/~roweis/data.html

As a side note, ~roweis committed suicide in 2010 after fighting with his wife: http://www.huffingtonpost.com/2010/01/14/sam-roweis-nyu-professor-_n_421500.html.

I learnt ANNs as an undergraduate by using them to perform OCR (Optical Character Recognition). I think this is a nice use case.

Scan in two pages of text, extract the letters, label them, and form training and testing datasets (e.g. 8x8 pixels per letter leads to 64 input nodes). Train the ANN and get a score using the testing dataset. Then change the network topology/parameters and tune the network to get the best score.
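
To illustrate the step from an 8x8 letter image to 64 input nodes, here is a small sketch that flattens a pixel grid into one normalized input vector. It assumes 8-bit grayscale pixels (0 to 255); the glyph below is a made-up diagonal stroke, not real scanned data.

```python
def flatten_glyph(pixels, max_value=255):
    """Turn a 2D grid of pixel intensities into a flat list scaled to [0, 1]."""
    return [value / max_value for row in pixels for value in row]

# Hypothetical 8x8 glyph: a diagonal line of fully dark pixels.
glyph = [[255 if col == row else 0 for col in range(8)] for row in range(8)]
inputs = flatten_glyph(glyph)
```

Each element of `inputs` would correspond to one input node of the network.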

You can find some interesting datasets, ranging from NLP and NER to image classification and bounding boxes, here: https://dataturks.com/projects/trending

Licensed under: CC-BY-SA with attribution