Question

I am trying to figure out how to use the Accord.Net Framework to make a Bayesian prediction with the machine learning NaiveBayes class. I have followed the example code in the documentation and have been able to create the model from the example.

What I can't figure out is how to make a prediction based on that model.

The way the Accord.Net framework works is that it translates a table of strings into a numeric, symbolic representation of those strings using a class called Codification. Here is how I build the DataTable and derive the inputs and outputs used to train the model (90% of this code is straight from the example):

        var dt = new DataTable("Categorizer");
        dt.Columns.Add("Word");
        dt.Columns.Add("Category");

        foreach (string category in categories)
        {
            rep.LoadTrainingDataForCategory(category,dt);
        }

        var codebook = new Codification(dt);
        DataTable symbols = codebook.Apply(dt);
        double[][] inputs = symbols.ToArray("Word");
        int[] outputs = symbols.ToIntArray("Category").GetColumn(0);

        IUnivariateDistribution[] priors = {new GeneralDiscreteDistribution(codebook["Word"].Symbols)};
        int inputCount = 1;
        int classCount = codebook["Category"].Symbols;
        var target = new NaiveBayes<IUnivariateDistribution>(classCount, inputCount, priors);

        target.Estimate(inputs, outputs);

And this all works successfully. Now, I have new input that I want to test against the trained data model I just built. So I try to do this:

        var testDt = new DataTable("Test Data");
        testDt.Columns.Add("Word");
        foreach (string token in tokens)
        {
            testDt.Rows.Add(token);
        }

        DataTable testDataSymbols = codebook.Apply(testDt);
        double[] testData = testDataSymbols.ToArray("Word").GetColumn(0);

        double logLikelihood = 0;
        double[] responses;
        int cat = target.Compute(testData, out logLikelihood, out responses);

Notice that I am using the same codebook object that I was using previously when I built the model. I want the data to be codified using the same codebook as the original model, otherwise the same word might be encoded with two completely different values (the word "bob" in the original model might correspond to the number 23 and in the new model, the number 43... no way that would work.)

However, I am getting a NullReferenceException error on this line:

        DataTable testDataSymbols = codebook.Apply(testDt);

Here is the error:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Accord.Statistics.Filters.Codification.ProcessFilter(DataTable data)
   at Accord.Statistics.Filters.BaseFilter`1.Apply(DataTable data)
   at Agent.Business.BayesianClassifier.Categorize(String[] categories, String testText) 

None of the objects I am passing in are null, so this must be happening deeper in the code, but I am not sure where.

Thanks for any help. And if anyone knows of an example where a prediction is actually made from the Bayesian example for Accord.Net, I would be much obliged if you shared it.


Solution

Sorry about the lack of documentation on the final part. In order to obtain the same integer codification for a new word, you could use the Translate method of the codebook:

        // Compute the result for a sunny, cool, humid and windy day:
        double[] input = codebook.Translate("Sunny", "Cool", "High", "Strong").ToDouble();

        int answer = target.Compute(input);

        string result = codebook.Translate("PlayTennis", answer); // result should be "no"

It should also have been possible to call codebook.Apply to apply the same transformation to a new dataset, though. If you feel this is a bug, would you like to file a bug report in the issue tracker?
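For the single-column model in the question, the same idea could be applied one token at a time. The following is only a sketch under a few assumptions: `TokenCategorizer` is a hypothetical helper (not part of Accord.Net), `codebook` and `target` are the objects built in the question, and the codebook exposes a `Translate(columnName, value)` overload for a single column. Words that never appeared in the training table are not handled here and would need their own check.

```csharp
using Accord.MachineLearning.Bayes;
using Accord.Statistics.Distributions;
using Accord.Statistics.Filters;

// Hypothetical helper for illustration; not part of Accord.Net.
public static class TokenCategorizer
{
    // `codebook` and `target` are assumed to be the trained objects from the question.
    public static string[] Categorize(
        Codification codebook,
        NaiveBayes<IUnivariateDistribution> target,
        string[] tokens)
    {
        var categories = new string[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            // Encode the word with the training codebook so it maps to the same integer.
            int symbol = codebook.Translate("Word", tokens[i]);

            // The model was created with inputCount = 1, so each input vector holds one value.
            double[] input = { symbol };
            int answer = target.Compute(input);

            // Map the predicted class index back to its category name.
            categories[i] = codebook.Translate("Category", answer);
        }
        return categories;
    }
}
```

Note that this classifies each token separately, because the model was trained on one word per row. Passing the whole token array as a single input vector, as in the question, would not match an inputCount of 1.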

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow