Question

I'm trying to run Mallet in Java and am getting the following error.

Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.

I'm trying to run the example from Mallet's website (http://mallet.cs.umass.edu/topics-devel.php). Below is my code. Any help is appreciated.

package scriptAnalyzer;

import cc.mallet.util.*;
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;

import java.util.*;
import java.util.regex.*;
import java.io.*;

public class Mallet {

    public static void main(String[] args) throws Exception {

        String filePath = "C:/mallet/ap.txt";
        // Begin by importing documents from text to feature sequences
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

        // Pipes: lowercase, tokenize, remove stopwords, map to features
        pipeList.add( new CharSequenceLowercase() );
        pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
        pipeList.add( new TokenSequenceRemoveStopwords(new File("stoplists/en.txt"), "UTF-8", false, false, false) );
        pipeList.add( new TokenSequence2FeatureSequence() );

        InstanceList instances = new InstanceList (new SerialPipes(pipeList));

        Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
        instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                                               3, 2, 1)); // data, label, name fields

        // Create a model with 5 topics, alpha_t = 0.2, beta_w = 0.01
        //  Note that the first parameter is passed as the sum over topics, while
        //  the second is the parameter for a single dimension of the Dirichlet prior.
        int numTopics = 5;
        ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);

        model.addInstances(instances);

        // Use two parallel samplers, which each look at one half the corpus and combine
        //  statistics after every iteration.
        model.setNumThreads(2);

        // Run the model for 50 iterations and stop (this is for testing only, 
        //  for real applications, use 1000 to 2000 iterations)
        model.setNumIterations(50);
        model.estimate();

        // Show the words and topics in the first instance

        // The data alphabet maps word IDs to strings
        Alphabet dataAlphabet = instances.getDataAlphabet();

        FeatureSequence tokens = (FeatureSequence) model.getData().get(0).instance.getData();
        LabelSequence topics = model.getData().get(0).topicSequence;

        Formatter out = new Formatter(new StringBuilder(), Locale.US);
        for (int position = 0; position < tokens.getLength(); position++) {
            out.format("%s-%d ", dataAlphabet.lookupObject(tokens.getIndexAtPosition(position)), topics.getIndexAtPosition(position));
        }
        System.out.println(out);

        // Estimate the topic distribution of the first instance, 
        //  given the current Gibbs state.
        double[] topicDistribution = model.getTopicProbabilities(0);

        // Get an array of sorted sets of word ID/count pairs
        ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords();

        // Show top 5 words in topics with proportions for the first document
        for (int topic = 0; topic < numTopics; topic++) {
            Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator();

            out = new Formatter(new StringBuilder(), Locale.US);
            out.format("%d\t%.3f\t", topic, topicDistribution[topic]);
            int rank = 0;
            while (iterator.hasNext() && rank < 5) {
                IDSorter idCountPair = iterator.next();
                out.format("%s (%.0f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight());
                rank++;
            }
            System.out.println(out);
        }

        // Create a new instance with high probability of topic 0
        StringBuilder topicZeroText = new StringBuilder();
        Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();

        int rank = 0;
        while (iterator.hasNext() && rank < 5) {
            IDSorter idCountPair = iterator.next();
            topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID()) + " ");
            rank++;
        }

        // Create a new instance named "test instance" with empty target and source fields.
        InstanceList testing = new InstanceList(instances.getPipe());
        testing.addThruPipe(new Instance(topicZeroText.toString(), null, "test instance", null));

        TopicInferencer inferencer = model.getInferencer();
        double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5);
        System.out.println("0\t" + testProbabilities[0]);
    }

}

Solution

You will get this error whether you build Mallet 2.0.8-SNAPSHOT from source (https://github.com/mimno/Mallet) or use the latest Maven release at the time of writing (2.0.7).

The reason is that Mallet expects the file logging.properties inside the generated target\classes\cc\mallet\util\resources folder. When you build the project with Maven, this file is not copied there, so the lookup in MalletLogger.java fails and the message above is printed.

Someone should configure Maven properly so that the logging.properties file ends up in the target folder. Until then, a temporary workaround is to modify the Mallet code so that it reads logging.properties from another path.
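
To confirm this diagnosis in your own project, you can check whether the file is visible on the classpath at runtime. This is only a minimal sketch: the resource path is my assumption, derived from the target folder named above, and the class name LoggingResourceCheck is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

final class LoggingResourceCheck {

    // Assumed resource path, derived from target/classes/cc/mallet/util/resources.
    static final String MALLET_LOGGING_RESOURCE =
            "/cc/mallet/util/resources/logging.properties";

    /** Returns true if the given absolute resource path is visible on the classpath. */
    static boolean isOnClasspath(String resourcePath) {
        // getResourceAsStream returns null when the resource cannot be found.
        try (InputStream in = LoggingResourceCheck.class.getResourceAsStream(resourcePath)) {
            return in != null;
        } catch (IOException e) {
            return false; // closing the stream failed; treat the resource as unreadable
        }
    }

    public static void main(String[] args) {
        System.out.println(MALLET_LOGGING_RESOURCE + " on classpath: "
                + isOnClasspath(MALLET_LOGGING_RESOURCE));
    }
}
```

If this reports false when run with the same classpath as your application, the file is probably missing from the build output.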

Andere Tipps

Mallet looks for its own logging file if none is specified in the system properties. If you are using Maven, the simplest fix is to put the file at:

src/main/resources/cc/mallet/util/resources/logging.properties 

The standard Maven build process then copies it automatically to:

target/classes/cc/mallet/util/resources/logging.properties 

So you do not need any special configuration. The file can be empty; it is presumably left out deliberately so that you configure your own logging.
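
The other route hinted at in the first sentence is to specify a logging file in the system properties yourself. Below is a minimal sketch, assuming the property Mallet checks is the standard java.util.logging.config.file (I have not confirmed this for every Mallet version). The class and method names are made up for illustration, and the one-line configuration is just an example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;
import java.util.logging.Logger;

final class LoggingSetup {

    /** Writes the given java.util.logging configuration to a temporary file. */
    static Path writeTempConfig(String contents) {
        try {
            Path file = Files.createTempFile("logging", ".properties");
            Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Points java.util.logging at the file and reloads the configuration. */
    static void useConfigFile(Path file) {
        // Standard JDK property; that Mallet honours it is an assumption.
        System.setProperty("java.util.logging.config.file",
                file.toAbsolutePath().toString());
        try {
            LogManager.getLogManager().readConfiguration();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Example configuration: only warnings and above on the root logger.
        Path config = writeTempConfig(".level = WARNING\n");
        useConfigFile(config);
        System.out.println("Root level: " + Logger.getLogger("").getLevel());
    }
}
```

For this to help, the call would have to run before the first Mallet class is loaded, for example at the top of main. Passing -Djava.util.logging.config.file=... on the command line should have the same effect.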

For anyone else who is using Maven and trying to configure Mallet's logging, try this:

Create a new text file at src/mallet-resources/logging.properties. It doesn't actually need to specify anything; an empty file is enough to shut Mallet up.

Then modify your pom.xml file to make sure that file gets copied to the location mentioned in the other answer. To do this, in the <build><plugins> section, add:

<!--Mallet logging is horrifically verbose, and is not easy to configure-->
<!--We have to use this complicated process to copy the logging.properties file to the right location -->
<plugin>
    <artifactId>maven-resources-plugin</artifactId>
    <version>2.6</version>
    <executions>
        <execution>
            <id>copy-resources</id>
            <phase>validate</phase>
            <goals>
                <goal>copy-resources</goal>
            </goals>
            <configuration>
                <outputDirectory>
                    ${basedir}/target/classes/cc/mallet/util/resources
                </outputDirectory>
                <resources>
                    <resource>
                        <directory>src/mallet-resources</directory>
                        <filtering>true</filtering>
                    </resource>
                </resources>
            </configuration>
        </execution>
    </executions>
</plugin>

The "Couldn't open edu.umass.cs.mallet.base.util.MalletLogger resources/logging.properties file" error can also appear, for example, when running run.sh (or another script or command) in BANNER Named Entity Recognition, which uses MALLET.

Solution:

Copy 'logging.properties' from

src/main/java/edu/umass/cs/mallet/base/util/resources/logging.properties

to

target/scala-2.11/classes/edu/umass/cs/mallet/base/util/resources/logging.properties

[I am using the BANNER provided at https://github.com/clulab/banner ]
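
If you would rather script the copy than do it by hand, something like the following should work. It is only a sketch: the two relative paths are the ones given above, while the class name CopyLoggingProperties and the idea of passing the project root as an argument are my own choices for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class CopyLoggingProperties {

    // Both relative paths are taken from the answer above.
    static final String SOURCE =
            "src/main/java/edu/umass/cs/mallet/base/util/resources/logging.properties";
    static final String TARGET =
            "target/scala-2.11/classes/edu/umass/cs/mallet/base/util/resources/logging.properties";

    /** Copies logging.properties inside projectRoot, creating target folders as needed. */
    static Path copy(Path projectRoot) {
        Path source = projectRoot.resolve(SOURCE);
        Path target = projectRoot.resolve(TARGET);
        try {
            Files.createDirectories(target.getParent());
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The project root defaults to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Copied to " + copy(root));
    }
}
```

Run it from the BANNER checkout, or pass the checkout directory as the first argument.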

Another error I encountered at the same time (... Logging configuration class "edu.umass.cs.mallet.base.util.Logger.DefaultConfigurator" failed) can be safely ignored:

https://osdir.com/ml/ai.mallet.devel/2007-11/msg00008.html >> "I think this is a bug with the distribution but that it affects logging only. I've always ignored this warning."

http://comments.gmane.org/gmane.comp.ai.mallet.devel/200 >> "This bug should not affect your output."

http://courses.washington.edu/ling572/winter09/teaching_slides/1_08_Mallet.pptx >> Slide 20: "Please ignore this message." [Fei Xia, Jan 2009, 'Introduction to Mallet', Andrew McCallum's group at UMass (https://people.cs.umass.edu/~mccallum/)]

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow