Question

So I am thinking about building a hobby project, one off kind of thing, just to brush up on my programming/design.

It's basically a multi threaded web spider, updating the same data structure object->int.

So it is definitely overkill to use a database for this, and the only thing I could think of is a thread-safe singleton used to contain my data structure. http://web.archive.org/web/20121106190537/http://www.ibm.com/developerworks/java/library/j-dcl/index.html

Is there a different approach I should look in to?

Was it helpful?

Solution

Double-checked locking has been proven to be incorrect and flawed (as least in Java). Do a search or look at Wikipedia's entry for the exact reason.

First and foremost is program correctness. If your code is not thread-safe (in a multi-threaded environment) then it's broken. Correctness comes first before performance optimization.

To be correct you'll have to synchronize the whole getInstance method

public static synchronized Singleton getInstance() {
   if (instance==null) ...
}

or statically initialize it

private static final Singleton INSTANCE = new Singleton();

OTHER TIPS

Using lazy initialization for the database in a web crawler is probably not worthwhile. Lazy initialization adds complexity and an ongoing speed hit. One case where it is justified is when there is a good chance the data will never be needed. Also, in an interactive application, it can be used to reduce startup time and give the illusion of speed.

For a non-interactive application like a web-crawler, which will surely need its database to exist right away, lazy initialization is a poor fit.

On the other hand, a web-crawler is easily parallelizable, and will benefit greatly from being multi-threaded. Using it as an exercise to master the java.util.concurrent library would be extremely worthwhile. Specifically, look at ConcurrentHashMap and ConcurrentSkipListMap, which will allow multiple threads to read and update a shared map.

When you get rid of lazy initialization, the simplest Singleton pattern is something like this:

class Singleton {

  static final Singleton INSTANCE = new Singleton();

  private Singleton() { }

  ...

}

The keyword final is the key here. Even if you provide a static "getter" for the singleton rather than allowing direct field access, making the singleton final helps to ensure correctness and allows more aggressive optimization by the JIT compiler.

If your life depended on a few microseconds then I would advise you to optimize your resource locking to where it actually mattered.

But in this case the keyword here is hobby project!

Which means that if you synchronized the entire getInstance() method you will be fine in 99.9% of all cases. I would NOT recommend doing it any other way.

Later, if you prove by means of profiling that the getInstance() synchronization is the bottleneck of your project, then you can move on and optimize the concurrency. But I really doubt it will cause you trouble.

Jeach!

Try the Bill Pugh solution of initialization on demand holder idiom. The solution is the most portable across different Java compilers and virtual machines. The solution is thread-safe without requiring special language constructs (i.e. volatile and/or synchronized).

http://en.wikipedia.org/wiki/Singleton_pattern#The_solution_of_Bill_Pugh

as Joshua Bloch argues in his book "effective java 2nd edition" I also agree that a single element enum type is the best way to implement a singleton.

public enum Singleton {
  INSTANCE;

  public void doSomething() { ... }
}

If you look at the very bottom of that article, you'll see the suggestion to just use a static field. That would be my inclination: you don't really need lazy instantiation (so you don't need getInstance() to be both an accessor and a factory method). You just want to ensure that you have one and only one of these things. If you really need global access to one such thing, I'd use that code sample towards the very bottom:

class Singleton
{
  private Vector v;
  private boolean inUse;
  private static Singleton instance = new Singleton();

  private Singleton()
  {
    v = new Vector();
    inUse = true;
    //...
  }

  public static Singleton getInstance()
  {
    return instance;
  }
}

Note that the Singleton is now constructed during the installation of static fields. This should work and not face the threading risks of potentially mis-synchronizing things.

All that said, perhaps what you really need is one of the thread-safe data structures available in the modern JDKs. For example, I'm a big fan of the ConcurrentHashMap: thread safety plus I don't have to write the code (FTW!).

Why don't you create a data structure you pass to each of the threads as dependency injection. That way you don't need a singleton. You still need to make the thread safe.

The article you referenced only talks about making the creation of the singleton object, presumably a collection in this case, thread-safe. You also need a thread-safe collection so that the collection operations also work as expected. Make sure that the underlying collection in the singleton is synchronized, perhaps using a ConcurrentHashMap.

Check out this article Implementing the Singleton Pattern in C#

public sealed class Singleton
{
    Singleton()
    {
    }

    public static Singleton Instance
    {
        get
        {
            return Nested.instance;
        }
    }

    class Nested
    {
        // Explicit static constructor to tell C# compiler
        // not to mark type as beforefieldinit
        static Nested()
        {
        }

        internal static readonly Singleton instance = new Singleton();
    }
}

How about:

public static Singleton getInstance() {
  if (instance == null) {
    synchronize(Singleton.class) {
      if (instance == null) {
         instance = new Singleton();
      }
    }
  }

  return instance;
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top