Question

I have a variable of type HashMap<String, HashSet<Long>>, and its size can grow up to 100 MB. I need to write it to secondary storage.

Serialization is not an option because it's too slow for me. Is there a better way to dump the byte structure to the hard drive?

PS: I only care about the speed of writing to disk; slow reading isn't an issue.


Solution

You can serialize it yourself with a DataOutputStream, and you can also compress the data to make it smaller.

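import java.io.*;
import java.util.*;
import java.util.zip.*;

public static void write(String filename, Map<String, Set<Long>> data) throws IOException {
    // Layout (Deflate-compressed): entry count, then per entry the key,
    // the set size and each long. writeUTF limits a key to 65,535 encoded bytes.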
    try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
            new DeflaterOutputStream(new FileOutputStream(filename))))) {
        dos.writeInt(data.size());
        for (Map.Entry<String, Set<Long>> entry : data.entrySet()) {
            dos.writeUTF(entry.getKey());
            Set<Long> value = entry.getValue();
            dos.writeInt(value.size());
            for (Long l : value) {
                dos.writeLong(l);
            }
        }
    }
}

To read it back, do the same thing in reverse, reading instead of writing.

public static Map<String, Set<Long>> read(String filename) throws IOException {
    Map<String, Set<Long>> ret = new LinkedHashMap<>();
    try (DataInputStream dis = new DataInputStream(new BufferedInputStream(
            new InflaterInputStream(new FileInputStream(filename))))) {
        for (int i = 0, size = dis.readInt(); i < size; i++) {
            String key = dis.readUTF();
            Set<Long> values = new LinkedHashSet<>();
            ret.put(key, values);
            for (int j = 0, size2 = dis.readInt(); j < size2; j++)
                values.add(dis.readLong());
        }
    }
    return ret;
}

public static void main(String... ignored) throws IOException {
    Map<String, Set<Long>> map = new LinkedHashMap<>();
    for (int i = 0; i < 20000; i++) {
        Set<Long> set = new LinkedHashSet<>();
        set.add(System.currentTimeMillis());
        map.put("key-" + i, set);
    }
    for (int i = 0; i < 5; i++) {
        long start = System.nanoTime();
        File file = File.createTempFile("delete", "me");
        write(file.getAbsolutePath(), map);
        Map<String, Set<Long>> map2 = read(file.getAbsolutePath());
        if (!map2.equals(map)) {
            throw new AssertionError();
        }
        long time = System.nanoTime() - start;
        System.out.printf("With %,d  keys, the file used %.1f KB, took %.1f to write/read ms%n", map.size(), file.length() / 1024.0, time / 1e6);
        file.delete();
    }
}

prints

With 20,000  keys, the file used 44.1 KB, took 155.2 to write/read ms
With 20,000  keys, the file used 44.1 KB, took 84.9 to write/read ms
With 20,000  keys, the file used 44.1 KB, took 51.6 to write/read ms
With 20,000  keys, the file used 44.1 KB, took 21.4 to write/read ms
With 20,000  keys, the file used 44.1 KB, took 21.6 to write/read ms

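So, once the JIT has warmed up, writing and reading 20K entries takes about 21 ms in total, and the file uses only about 2.2 bytes per entry. (The test data is very repetitive, with sequential keys and nearly identical timestamps, so real data will likely compress less well.)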
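Because the question only cares about write speed, one variation worth trying is a faster compression level. The sketch below is a hypothetical `writeFast` variant (the class and method names are illustrative, not from the answer) that writes the same format as `write` but passes a `Deflater` set to `Deflater.BEST_SPEED`. Whether it is actually faster for your data is something to measure rather than assume.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Set;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class FastDeflateWriter {
    // Same layout as write(): entry count, then per entry key, set size, longs.
    // Only the compression level differs: BEST_SPEED trades file size for CPU time.
    public static void writeFast(String filename, Map<String, Set<Long>> data) throws IOException {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try (DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(
                new DeflaterOutputStream(new FileOutputStream(filename), deflater, 64 * 1024)))) {
            dos.writeInt(data.size());
            for (Map.Entry<String, Set<Long>> entry : data.entrySet()) {
                dos.writeUTF(entry.getKey());
                Set<Long> value = entry.getValue();
                dos.writeInt(value.size());
                for (long l : value) {
                    dos.writeLong(l);
                }
            }
        } finally {
            // A Deflater supplied by the caller is not released by close(), so end it here.
            deflater.end();
        }
    }
}
```

Since the layout is unchanged and an inflater can read any Deflate level, the `read` method above should read these files as-is; expect them to be somewhat larger than with the default level.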

OTHER TIPS

Use any suitable serialization library to get the data into a compact form (some are fast; Google Protocol Buffers, for example, is fast and produces small messages), then compress it in memory and dump the result to disk.
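A minimal sketch of the compress-in-memory-then-dump approach, assuming your serialization library already gives you a `byte[]` (for a Protocol Buffers message, `toByteArray()` does this). The class and method names are hypothetical, and GZIP is just one possible choice of compressor.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class CompressThenDump {
    // Gzips already-serialized bytes in memory, then writes them to disk in one call.
    // Returns the number of compressed bytes written.
    public static int compressAndWrite(Path target, byte[] serialized) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(Math.max(32, serialized.length / 4));
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer, 64 * 1024)) {
            gzip.write(serialized);
        } // closing the GZIPOutputStream flushes the remaining data and the gzip trailer
        byte[] compressed = buffer.toByteArray();
        Files.write(target, compressed);
        return compressed.length;
    }
}
```

This holds the whole compressed result in memory before the single `Files.write` call, which is simple but, unlike the streaming `DeflaterOutputStream` in the solution above, needs room for both the serialized and the compressed bytes at once. To read the file back, wrap `Files.newInputStream(path)` in a `GZIPInputStream`.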

Disk I/O time will be your main bottleneck in most cases, so compressing the data to reduce it will help a lot.

We can also do this with the Jackson API, writing the map out as JSON.

Prerequisites: add the following JARs to your classpath. You can download these from here.

  • Jackson Core
  • Jackson Annotations
  • Jackson Databind

Here is an example for the data structure HashMap<String, HashSet<Long>>.

Step 1: Create a sample class (DataStructure) that holds your data structure in a field.

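import java.util.HashMap;
import java.util.HashSet;

public class DataStructure {
    // Public field, so Jackson can serialize it without getters/setters
    public HashMap<String, HashSet<Long>> data = new HashMap<>();

    // No-arg constructor, needed if Jackson is later used to read the file back
    public DataStructure() {
    }

    public DataStructure(HashMap<String, HashSet<Long>> data) {
        this.data = data;
    }
}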

Step 2: Create a method that stores the data structure in a file.

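import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;

static void storeToFile(HashMap<String, HashSet<Long>> data) {
    String fileName = "test.txt";
    DataStructure ds = new DataStructure(data);
    ObjectMapper objectMapper = new ObjectMapper();
    try {
        // writeValue(File, Object) streams the JSON to the file as UTF-8 and closes it,
        // so no intermediate String or manual close() is needed, even on failure
        objectMapper.writeValue(new File(fileName), ds);
    } catch (IOException e) {
        System.out.println("storeToFile: " + e.getMessage());
    }
}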

After step 2, your data structure is stored as a JSON string in the specified file.

For more information: http://tutorials.jenkov.com/java-json/index.html

I also wrote a blog post about retrieving the data: https://tech-scribbler.blogspot.com/2020/04/how-can-you-store-any-complex-data.html

Licensed under: CC-BY-SA with attribution