Topojson: quantization VS simplification

Question 1

The total size of your geometry is controlled by two factors: the number of points and the number of digits (the precision) of each coordinate.

Say you have a large geometry with 1,000,000 points, where each two-dimensional point is represented as longitude in ±180° and latitude in ±90°:

[-90.07231180399987,29.501753271000098],[-90.06635619599979,29.499494248000133],…

Real numbers can have arbitrary precision (in JSON; in JavaScript they are limited by the precision of IEEE 754) and thus an infinite number of digits. But in practice the above is pretty typical, so say each coordinate has 18 digits. Including extra symbols ([, ] and ,), each point takes at most 1 + 18 + 1 + 18 + 1 = 39 bytes to encode in JSON, and the entire geometry is about 39 * 1,000,000 ≈ 39MB.

Now say we convert these real numbers to integers: both longitude and latitude are reduced to integers x and y where 0 ≤ x ≤ 99 and 0 ≤ y ≤ 99. A simple mapping between real-number points ⟨λ,φ⟩ and integer coordinates ⟨x,y⟩ is:

x = floor((λ + 180) / 360 * 100);
y = floor((φ + 90) / 180 * 100);

Since each coordinate now takes at most 2 digits to encode, each point takes at most 1 + 2 + 1 + 2 + 1 = 7 bytes to encode in JSON, and the entire geometry is about 7MB; we reduced the total size by 82%.

Of course, nothing comes for free: if you remove too much information, you will no longer be able to display the geometry accurately. The rule of thumb is that the size of your grid should be at least twice as big as the largest expected display size for the entire map. For example, if you’re displaying a world map in a 960×500 pixel space, then the default 10,000×10,000 (-q 1e4) is a reasonable choice.

So, quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid. This reduces the size of the generated TopoJSON file because each coordinate is represented as an integer (such as between 0 and 9,999) with fewer digits.

In contrast, simplification removes information by removing points, applying a heuristic that tries to measure the visual salience of each point and removing the least-noticeable points. There are many different methods of simplification, but the Visvalingam method used by the TopoJSON reference implementation is described in my Line Simplification article so I won’t repeat myself here.

While quantization and simplification address these two different types of information mostly independently, there’s an additional complication: quantization is applied before the topology is constructed, whereas simplification is necessarily applied after to preserve the topology. Since quantization frequently introduces coincident points ([24,62],[24,62],[24,62]…), and coincident points are removed, quantization can also remove points.

The reason that quantization is applied before the topology is constructed is that geometric inputs are often not topologically valid. For example, if you takes a shapefile of Nevada counties and combine it with a shapefile of Nevada’s state border, the coordinates in one shapefile might not exactly match the coordinates in the other shapefile. By quantizing the coordinates before constructing the topology, you snap the coordinates to a regular grid and can get a cleaner topology with fewer arcs, hopefully correctly identifying all shared arcs. (Of course, if you over-quantize, then you can cause too many coincident points and get self-intersecting arcs, which causes other problems.)

In a future release, maybe 1.5.0, TopoJSON will allow you to control the quantization before the topology is constructed independently from the quantization of the output TopoJSON file. Thus, you could use a finer grid (or no grid at all!) to compute the topology, then simplify, then use a coarser grid appropriate for a low-resolution screen display. For now, these are tied together, so I recommend using a finer grid (e.g., -q 1e6) that produces a clean topology, at the expense of a slightly larger file. Since TopoJSON also uses delta-encoded coordinates, you rarely pay the full price for all the digits anyway!

Question 2

The two are related, but have different purposes and results.

I believe quantization collapses nearby points based on the parameter (which you tune to the expected resolution of the view) - no point in having a resolution higher than the pixels that will be drawing the map. But it doesn't go out of the way to analyze the path to determine the optimal number of points needed to represent the shape.

Simplification is an algorithm that will analyze the polygon and reduce the number of points in an optimal manner such that the overall deformation of the polygon is minimized. Basically, it can be used to dramamatically reduce the number of points (and thus file size) without noticeable impact to the quality of the path.

As a parallel case study, consider a straight line made up of 10 points. Quantization will reduce the number of points (collapsing nearby or coincident points) based on the value you use. Simplification will analyze the line and realize that 8 out of the ten points can be removed without significantly changing the polygon's overall shape, and reduce the line to two points (because there is no deformation of the path by removing points on a line).

See also:

Topojson reference: https://github.com/mbostock/topojson/wiki/Command-Line-Reference
M. Bostock's Simplification article: http://bost.ocks.org/mike/simplify/

Both should be used in combination: quatization to reduce the map to a right sized grid, simplification to optimize the paths.