Question

In Encog 3.x, how do you normalize data, use it for training, and denormalize results?

There is no good documentation on this, and a simple example that applies each of these steps would go a long way toward reducing the learning curve for Encog. I haven't figured it all out yet, but here are some resources.

(1) *How does Encog 3.0 Normalize?*

This code works for saving a new normalized CSV. However, it is not clear how to take the output of AnalystNormalizeCSV and convert it to an MLDataSet that can actually be used for training.

EncogAnalyst analyst = new EncogAnalyst();
AnalystWizard wizard = new AnalystWizard(analyst);
wizard.wizard(sourceFile, true, AnalystFileFormat.DECPNT_COMMA);
final AnalystNormalizeCSV norm = new AnalystNormalizeCSV();
norm.analyze(sourceFile, true, CSVFormat.ENGLISH, analyst);
norm.setOutputFormat(CSVFormat.ENGLISH);
norm.setProduceOutputHeaders(true);
norm.normalize(targetFile);

(2) *How do I normalize a CSV file with Encog (Java)*

This code, again, works for producing a normalized CSV file, but it is unclear how to take the normalized data and actually use it. There is a method for setting the target as MLData, but it assumes all columns are inputs and leaves no room for ideal (expected output) values. Furthermore, both of these options are difficult to use when the file has headers and/or unused columns.

try {
    // MYDIR is defined elsewhere: the directory that holds the data files
    File rawFile = new File(MYDIR, "iris.csv");

    // download Iris data from UCI
    if (rawFile.exists()) {
        System.out.println("Data already downloaded to: " + rawFile.getPath());
    } else {
        System.out.println("Downloading iris data to: " + rawFile.getPath());
        BotUtil.downloadPage(new URL("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), rawFile);
    }

    // define the format of the iris data
    DataNormalization norm = new DataNormalization();
    InputField inputSepalLength, inputSepalWidth, inputPetalLength, inputPetalWidth;
    InputFieldCSVText inputClass;

    norm.addInputField(inputSepalLength = new InputFieldCSV(true, rawFile, 0));
    norm.addInputField(inputSepalWidth = new InputFieldCSV(true, rawFile, 1));
    norm.addInputField(inputPetalLength = new InputFieldCSV(true, rawFile, 2));
    norm.addInputField(inputPetalWidth = new InputFieldCSV(true, rawFile, 3));
    norm.addInputField(inputClass = new InputFieldCSVText(true, rawFile, 4));
    inputClass.addMapping("Iris-setosa");
    inputClass.addMapping("Iris-versicolor");
    inputClass.addMapping("Iris-virginica");

    // define how we should normalize
    norm.addOutputField(new OutputFieldRangeMapped(inputSepalLength, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputSepalWidth, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputPetalLength, 0, 1));
    norm.addOutputField(new OutputFieldRangeMapped(inputPetalWidth, 0, 1));
    norm.addOutputField(new OutputOneOf(inputClass, 1, 0));

    // define where the output should go
    File outputFile = new File(MYDIR, "iris_normalized.csv");
    norm.setCSVFormat(CSVFormat.ENGLISH);
    norm.setTarget(new NormalizationStorageCSV(CSVFormat.ENGLISH, outputFile));

    // process
    norm.setReport(new ConsoleStatusReportable());
    norm.process();
    // report the normalized file, not the raw input file
    System.out.println("Output written to: " + outputFile.getPath());

} catch (Exception ex) {
    ex.printStackTrace();
}
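For reference, this is roughly what that configuration computes for each row: every numeric column is mapped linearly into [0, 1] (OutputFieldRangeMapped with 0 and 1), and the class column becomes three values, presumably 1 for the matching class and 0 elsewhere (OutputOneOf with 1 and 0). The plain-Java sketch below does not use Encog; the class and method names are hypothetical, and the sepal-length range of 4.3 to 7.9 is an assumed value standing in for the min/max that would be collected in a first pass over the data.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not Encog code): plain-Java versions of the two
// per-row transformations configured above.
final class IrisRowEncoder {

    /** Linearly maps value from [actualLow, actualHigh] into [normLow, normHigh]. */
    static double rangeMap(double value, double actualLow, double actualHigh,
                           double normLow, double normHigh) {
        return (value - actualLow) / (actualHigh - actualLow)
                * (normHigh - normLow) + normLow;
    }

    /** One-of-N: trueValue at the label's index, falseValue everywhere else. */
    static double[] oneOf(String label, List<String> classes,
                          double trueValue, double falseValue) {
        int index = classes.indexOf(label);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown class: " + label);
        }
        double[] encoded = new double[classes.size()];
        Arrays.fill(encoded, falseValue);
        encoded[index] = trueValue;
        return encoded;
    }

    public static void main(String[] args) {
        List<String> classes = List.of("Iris-setosa", "Iris-versicolor", "Iris-virginica");
        // Assumed actual range for sepal length (would come from a first pass).
        double sepalLow = 4.3, sepalHigh = 7.9;
        System.out.println(rangeMap(6.1, sepalLow, sepalHigh, 0, 1));
        System.out.println(Arrays.toString(oneOf("Iris-versicolor", classes, 1, 0))); // prints [0.0, 1.0, 0.0]
    }
}
```

Note that the min/max used here must be kept somewhere, because the same numbers are needed again to denormalize.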

(3) *Denormalizing*

I'm at a total loss as to how to take all of this and denormalize the results according to each field's minimum and maximum.


Solution

Here are a few resources where you can get more detailed information about normalization and denormalization with the Encog framework.

These great e-books, written by Jeff Heaton himself, are must-haves for Encog users:

1. Programming Neural Networks with Encog3 in C#, 2nd Edition, by Jeff Heaton (Oct 2, 2011)
2. Introduction to Neural Networks for C#, 2nd Edition, by Jeff Heaton (Oct 2, 2008)

You can also have a look at the Pluralsight course "Introduction to Machine Learning with ENCOG", which also includes a few examples of normalization and denormalization.

Now, regarding your question: "It is not clear here though how to take the AnalystNormalizeCSV and convert it to an MLDataSet to actually use it."

You can use AnalystNormalizeCSV to normalize your training file, and then use the LoadCSV2Memory method of the EncogUtility class to load the normalized training file into an MLDataSet. Here is sample code in C#:

var trainingSet = EncogUtility.LoadCSV2Memory(Config.NormalizedTrainingFile.ToString(), network.InputCount, network.OutputCount,true, CSVFormat.English,false);

It takes the normalized training file as the first parameter, the network's input neuron count as the second, and the network's output neuron count as the third. The fourth parameter is a boolean indicating whether your CSV file has a header row, the fifth is the CSV format, and the sixth indicates whether the file contains a significance column.
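The input and output counts are what make the normalized file usable for training: each row is split so that the first inputCount columns become the input vector and the next idealCount columns become the ideal (target) vector. A minimal plain-Java sketch of that split, assuming a simple comma-separated file with no quoted fields; NormalizedCsvLoader and Pair are hypothetical names, not Encog's API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Encog's API): splits each normalized CSV row into
// an input vector followed by an ideal (target) vector.
final class NormalizedCsvLoader {

    static final class Pair {
        final double[] input;
        final double[] ideal;

        Pair(double[] input, double[] ideal) {
            this.input = input;
            this.ideal = ideal;
        }
    }

    static List<Pair> load(List<String> lines, int inputCount, int idealCount,
                           boolean hasHeaders) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = hasHeaders ? 1 : 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline
            }
            String[] cells = line.split(",");
            if (cells.length < inputCount + idealCount) {
                throw new IllegalArgumentException("Row " + i + " has too few columns");
            }
            double[] input = new double[inputCount];
            double[] ideal = new double[idealCount];
            for (int c = 0; c < inputCount; c++) {
                input[c] = Double.parseDouble(cells[c].trim());
            }
            for (int c = 0; c < idealCount; c++) {
                ideal[c] = Double.parseDouble(cells[inputCount + c].trim());
            }
            pairs.add(new Pair(input, ideal));
        }
        return pairs;
    }

    static List<Pair> load(Path file, int inputCount, int idealCount, boolean hasHeaders)
            throws IOException {
        return load(Files.readAllLines(file), inputCount, idealCount, hasHeaders);
    }
}
```

For the normalized Iris file from the question (4 range-mapped inputs, 3 one-of-N class columns), the call would be something like `NormalizedCsvLoader.load(Path.of("iris_normalized.csv"), 4, 3, hasHeaders)`, with `hasHeaders` matching how the file was written.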

Once you have this dataset in memory, you can use it for training. A similar approach can be taken in the cross-validation and evaluation steps.

Regarding denormalization, you can first persist the analyst file and later use it to denormalize individual columns. For example:

var denormalizedOutput = analyst.Script.Normalize.NormalizedFields[index].DeNormalize(item.Input[index]);
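Persisting the analyst file matters because denormalizing a range-mapped column is just the inverse of the linear mapping, so you need the same four numbers per column: the actual low/high and the normalized low/high. A plain-Java sketch of the round trip; ColumnRange is a hypothetical class, not Encog's NormalizedField, and the 10 to 50 price range is an assumed example.

```java
// Hypothetical class (not Encog's NormalizedField): holds the four numbers
// needed to map a column into a normalized range and back again.
final class ColumnRange {
    final double actualLow, actualHigh, normLow, normHigh;

    ColumnRange(double actualLow, double actualHigh, double normLow, double normHigh) {
        if (actualHigh == actualLow || normHigh == normLow) {
            throw new IllegalArgumentException("Ranges must have non-zero width");
        }
        this.actualLow = actualLow;
        this.actualHigh = actualHigh;
        this.normLow = normLow;
        this.normHigh = normHigh;
    }

    /** Maps a raw value into [normLow, normHigh]. */
    double normalize(double x) {
        return (x - actualLow) / (actualHigh - actualLow) * (normHigh - normLow) + normLow;
    }

    /** Inverse of normalize: maps a normalized value (e.g. a network output) back to raw units. */
    double denormalize(double y) {
        return (y - normLow) / (normHigh - normLow) * (actualHigh - actualLow) + actualLow;
    }

    public static void main(String[] args) {
        // Assumed range for a price column, normalized into [-1, 1].
        ColumnRange price = new ColumnRange(10.0, 50.0, -1.0, 1.0);
        double n = price.normalize(30.0);
        System.out.println(n);                    // prints 0.0
        System.out.println(price.denormalize(n)); // prints 30.0
    }
}
```

A network output outside the normalized range denormalizes to a value outside the original min/max; whether to clamp it is up to you.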

A similar approach can be used to turn denormalized class fields back into their labels. For example:

var predictedClass = analyst.Script.Normalize.NormalizedFields[index].Classes[predictedClassInt].Name;
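Here predictedClassInt is the index of the winning class. With one-of-N output, a common way to obtain it is to take the index of the largest output neuron (argmax) and look that index up in the list of class names. A plain-Java sketch; OneOfNDecoder is a hypothetical name, and the output values in main are made up for illustration.

```java
import java.util.List;

// Hypothetical helper: turns a one-of-N network output into a class label
// by picking the index of the largest activation (argmax).
final class OneOfNDecoder {

    static int argMax(double[] output) {
        if (output.length == 0) {
            throw new IllegalArgumentException("Empty output");
        }
        int best = 0;
        for (int i = 1; i < output.length; i++) {
            if (output[i] > output[best]) {
                best = i;
            }
        }
        return best;
    }

    static String decode(double[] output, List<String> classNames) {
        if (output.length != classNames.size()) {
            throw new IllegalArgumentException("One output neuron per class expected");
        }
        return classNames.get(argMax(output));
    }

    public static void main(String[] args) {
        List<String> classes = List.of("Iris-setosa", "Iris-versicolor", "Iris-virginica");
        double[] networkOutput = {0.08, 0.71, 0.23}; // illustrative values, not real results
        System.out.println(decode(networkOutput, classes)); // prints Iris-versicolor
    }
}
```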

OTHER TIPS

The Encog analyst is fantastic for normalizing data. It can take information stored in a CSV file and automatically determine the normalized fields and their encoding type (including one-of-N and equilateral encoding).
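Equilateral encoding represents N classes with N-1 outputs by placing each class at a vertex of a regular simplex, so every pair of classes is the same distance apart; a network output is decoded to the nearest vertex. The plain-Java sketch below builds such vertices with a standard recursive construction and decodes by nearest Euclidean distance. It is independent of Encog's own Equilateral class, and EquilateralSketch is a hypothetical name.

```java
// Hypothetical sketch of equilateral (regular-simplex) class encoding.
final class EquilateralSketch {

    /** Returns n vectors of length n-1, pairwise equidistant, scaled into [low, high]. */
    static double[][] build(int n, double low, double high) {
        if (n < 2) {
            throw new IllegalArgumentException("Need at least 2 classes");
        }
        double[][] v = new double[n][n - 1];
        v[0][0] = -1.0;
        v[1][0] = 1.0;
        // Add one vertex per step while keeping every vertex at unit length.
        for (int k = 2; k < n; k++) {
            double f = Math.sqrt((double) k * k - 1.0) / k;
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k - 1; j++) {
                    v[i][j] *= f;         // shrink the existing coordinates
                }
                v[i][k - 1] = -1.0 / k;   // move old vertices along the new axis
            }
            v[k][k - 1] = 1.0;            // new vertex sits on the new axis
        }
        // Components lie in [-1, 1]; map them linearly into [low, high].
        for (double[] row : v) {
            for (int j = 0; j < row.length; j++) {
                row[j] = (row[j] + 1.0) / 2.0 * (high - low) + low;
            }
        }
        return v;
    }

    /** Decodes an output vector to the index of the nearest class vertex. */
    static int decode(double[] output, double[][] vertices) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < vertices.length; c++) {
            double d = 0;
            for (int j = 0; j < output.length; j++) {
                double diff = output[j] - vertices[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

Compared with one-of-N, this uses one fewer output neuron, and a single wrong output value shifts the prediction toward all other classes equally rather than toward one.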

The only downside of this is that the logic is tightly coupled with the ReadCSV class.

Favouring extension over modification, I created extension methods and alternative classes that let the analyst normalize a generic .NET DataSet.

I also added a new test class which shows you how to use it (it's very similar to the standard Encog implementation).

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Collections.Generic;
using Encog.ML.Data.Market;
using Encog.ML.Data.Market.Loader;
using Encog.App.Analyst;
using Encog.App.Analyst.Wizard;
using EncogExtensions.Normalization; //<- This is my extension lib
using System.Data;
using System.Linq;

// The enclosing class name is illustrative; any MSTest [TestClass] works.
[TestClass]
public class NormalizationTests
{
    [TestMethod]
    public void Normalize_Some_In_Memory_Data()
    {
        // Download some stock data
        List<LoadedMarketData> MarketData = new List<LoadedMarketData>();
        MarketData.AddRange(DownloadStockData("MSFT", TimeSpan.FromDays(10)));
        MarketData.AddRange(DownloadStockData("AAPL", TimeSpan.FromDays(10)));
        MarketData.AddRange(DownloadStockData("YHOO", TimeSpan.FromDays(10)));

        // Convert stock data to dataset using encog-extensions
        DataSet dataSet = new DataSet().Convert(MarketData, "Market DataSet");

        // use encog-extensions to normalize the dataset
        var analyst = new EncogAnalyst();
        var wizard = new AnalystWizard(analyst);
        wizard.Wizard(dataSet);

        // DataSet Goes In... 2D Double Array Comes Out...
        var normalizer = new AnalystNormalizeDataSet(analyst);
        var normalizedData = normalizer.Normalize(dataSet);

        // Assert data is not null and differs from original
        Assert.IsNotNull(normalizedData);
        Assert.AreNotEqual(normalizedData[0, 0], dataSet.Tables[0].Rows[0][0]);
    }

    private static List<LoadedMarketData> DownloadStockData(string stockTickerSymbol, TimeSpan timeSpan)
    {
        IList<MarketDataType> dataNeeded = new List<MarketDataType>();
        dataNeeded.Add(MarketDataType.AdjustedClose);
        dataNeeded.Add(MarketDataType.Close);
        dataNeeded.Add(MarketDataType.Open);
        dataNeeded.Add(MarketDataType.High);
        dataNeeded.Add(MarketDataType.Low);
        dataNeeded.Add(MarketDataType.Volume);

        List<LoadedMarketData> MarketData =
            new YahooFinanceLoader().Load(
                new TickerSymbol(stockTickerSymbol),
                dataNeeded,
                DateTime.Now.Subtract(timeSpan),
                DateTime.Now).ToList();

        return MarketData;
    }
}
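The extension above is .NET-specific, but the core idea of normalizing in-memory data is small: scan each column for its min and max, then map every value into the target range. A plain-Java sketch on a double[][] table; it is not the EncogExtensions library, InMemoryNormalizer is a hypothetical name, and mapping a constant column to the midpoint of the range is an assumption made here to avoid dividing by zero.

```java
import java.util.Arrays;

// Hypothetical sketch (not EncogExtensions): column-wise min/max normalization
// of an in-memory table. Returns a new array and leaves the input untouched.
final class InMemoryNormalizer {

    static double[][] normalize(double[][] data, double normLow, double normHigh) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find each column's actual range.
        for (double[] row : data) {
            if (row.length != cols) {
                throw new IllegalArgumentException("All rows must have the same length");
            }
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }

        // Second pass: map each value linearly into [normLow, normHigh].
        double[][] out = new double[data.length][cols];
        for (int r = 0; r < data.length; r++) {
            for (int c = 0; c < cols; c++) {
                double span = max[c] - min[c];
                out[r][c] = span == 0
                        ? (normLow + normHigh) / 2 // assumption: constant column -> midpoint
                        : (data[r][c] - min[c]) / span * (normHigh - normLow) + normLow;
            }
        }
        return out;
    }
}
```

In practice you would also return or persist `min` and `max`, since denormalizing results later needs exactly those values.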
Licensed under: CC-BY-SA with attribution