Question

It seems like all of the static site generators I have found regenerate the entire site every time a single file in it changes.

For example, one of the more popular site generators in use is Jekyll, which powers GitHub Pages. Every time an author makes a change (say, a grammar correction in a post file, or a tweak to the about.html layout) and needs to regenerate that content, Jekyll gives no choice other than to regenerate the entire site, even if there are hundreds of files whose output is unchanged by the recent edits.

The time it takes to regenerate large sites seems to be a common complaint against most static site generators.

Is there any technical reason (from the point of view of developing or engineering a static site generator) that prevents someone from writing a generator that is "intelligent" about its content: one that understands which files were changed and which output files depend on them (or vice-versa), and regenerates only the necessary files?

Since most people (especially Jekyll/GitHub Pages users) store their sites in a git repository, it even seems like a site generator could make use of the commit history to track changes and rely on that information to know which files need to be regenerated and which can be left alone. Thoughts?
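
A rough sketch of the idea: ask git which source files changed since the last successful build. The `.last_build` marker file here is just a hypothetical place to record the commit hash of the previous build, not part of any real tool.

```python
import subprocess
from pathlib import Path

def changed_since_last_build(repo_dir="."):
    """List source files changed since the commit recorded at the last build."""
    marker = Path(repo_dir) / ".last_build"   # hypothetical marker file
    if not marker.exists():
        return None  # no record of a previous build: fall back to a full rebuild
    last_commit = marker.read_text().strip()
    result = subprocess.run(
        ["git", "diff", "--name-only", last_commit, "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```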

Solution

Short answer: it's hard.

The hard part isn't knowing which files changed. The hard part is knowing which output files are affected by the files that changed. For example, if you change the title of a blog post, the main blog index will need to be updated. So will any tag pages. So will any page that lists that post as a "related post". If you have excerpts on your homepage, same deal.

But that's not impossible to deal with. You can keep a directed acyclic graph which tracks the dependencies for any given page, and regenerate the pages which include bits of other pages that change. It adds overhead and code complexity, as well as computation time, but doing this would probably be worth the effort.
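
Here is a minimal sketch of that approach, in Python rather than any particular generator's code. Names like `deps` and `pages_to_rebuild` are illustrative; the mapping would be populated during a full build by recording every source file each page reads.

```python
from collections import defaultdict

# Maps each source file to the output pages that were built from it.
deps = defaultdict(set)

def record_dependency(output_page, source_file):
    """Called during rendering whenever a page reads a source file."""
    deps[source_file].add(output_page)

def pages_to_rebuild(changed_files):
    """Return every output page reachable from the changed source files."""
    dirty = set()
    stack = list(changed_files)
    while stack:
        node = stack.pop()
        for page in deps.get(node, ()):
            if page not in dirty:
                dirty.add(page)
                # A regenerated page may itself feed other pages
                # (e.g. an excerpt pulled into the homepage), so follow it too.
                stack.append(page)
    return dirty
```

Everything not returned by `pages_to_rebuild` can be left as it is on disk.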

Harder than that, though, is knowing which pages need to be regenerated as a result of changes to items they're not already associated with. What happens if you add a new tag to a blog post? Now the tag page for that new tag needs to be regenerated as well. If you're using tags to generate "related posts", every post on your site should be regenerated, since the "best" relations for any given post could be different now. What happens when you add a new post? To avoid regenerating everything, the static site generator must know which pages would have included that post had it existed earlier, and regenerate those as well.
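
One way to cope with that (a sketch of a general technique, not any specific generator's design) is to record dependencies on queries over the content, such as "all posts tagged ruby", rather than only on individual files. When the result of a query changes because something was added, every page that consumed the query gets rebuilt. All names below are made up for illustration:

```python
queries = {}       # query key -> (query function, pages that used its result)
last_results = {}  # query key -> result of the query at the previous build

def register_query(key, fn, consumer_page):
    """Remember that consumer_page was built from the result of this query."""
    _, consumers = queries.setdefault(key, (fn, set()))
    consumers.add(consumer_page)

def pages_invalidated_by_queries(content):
    """Re-run every recorded query; rebuild the consumers of any changed result."""
    dirty = set()
    for key, (fn, consumers) in queries.items():
        result = fn(content)
        if result != last_results.get(key):
            dirty.update(consumers)
        last_results[key] = result
    return dirty

# Example: the tag page for "ruby" depends on the set of posts carrying that
# tag, so a brand-new post with that tag changes the query result and
# invalidates the tag page, even though no file the page read was edited.
register_query(
    "tag:ruby",
    lambda posts: sorted(p["path"] for p in posts if "ruby" in p["tags"]),
    "tags/ruby.html",
)
```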

Note that, in all these cases, false positives (pages which haven't changed, but are recompiled anyway) are acceptable, but false negatives (pages which should be recompiled, but are not) are absolutely unacceptable. So in every case, the site generator must err on the side of caution: if there's any possibility that a page would change were it compiled again, it must be recompiled.
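
In code, that conservative rule amounts to treating "unknown" the same as "changed". A tiny sketch, again with invented names:

```python
def must_rebuild(page, changed_sources, known_deps):
    """Err on the side of caution: rebuild unless we can prove it's unnecessary."""
    deps = known_deps.get(page)
    if deps is None:
        # No dependency information for this page (e.g. it runs template code
        # we couldn't trace). A false negative is unacceptable, so rebuild it.
        return True
    return any(src in changed_sources for src in deps)
```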

Nanoc, for example, does track changes in the way you describe. It keeps a directed acyclic graph of pages that depend on other pages, and it caches that graph between compiles to limit the number of recompiles. It doesn't regenerate every page every time, but it does often recompile pages that didn't need to be recompiled. There's still a lot of room for improvement.
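
Persisting such a graph between runs can be as simple as serialising it next to the output. A rough sketch of the caching idea (not nanoc's actual implementation, whose internals differ):

```python
import json
from pathlib import Path

CACHE = Path(".dep_cache.json")  # hypothetical cache location

def save_graph(deps):
    """deps: mapping of source file -> set of output pages built from it."""
    CACHE.write_text(json.dumps({src: sorted(pages) for src, pages in deps.items()}))

def load_graph():
    if not CACHE.exists():
        return {}  # first build, or cache deleted: regenerate everything
    return {src: set(pages) for src, pages in json.loads(CACHE.read_text()).items()}
```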

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow