Hierarchical here means that memory cgroups can be nested within other cgroups. For example, you can create a parent cgroup P that has a child cgroup C. There can be processes in P (say p1 and p2) and in C (c1, c2). With use_hierarchy=1, the memory stats at P show the total of C's usage plus the usage of all processes directly in P. They also account for any tmpfs usage charged to P.
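To make the accounting concrete, here is a minimal toy model in Python. It does not touch the real cgroup filesystem; the `Cgroup` class, its field names, and all byte counts are made up for illustration, and it only assumes the roll-up behaviour described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cgroup:
    """Toy stand-in for a memory cgroup (hypothetical, not a kernel API)."""
    name: str
    # task name -> memory charged to that task, in arbitrary units (made-up numbers)
    tasks: Dict[str, int] = field(default_factory=dict)
    # tmpfs memory charged directly to this cgroup (made-up number)
    tmpfs: int = 0
    children: List["Cgroup"] = field(default_factory=list)

    def local_usage(self) -> int:
        # Usage by tasks attached directly to this cgroup, plus its tmpfs.
        return sum(self.tasks.values()) + self.tmpfs

    def hierarchical_usage(self) -> int:
        # With use_hierarchy=1, a parent's usage also includes every descendant.
        return self.local_usage() + sum(c.hierarchical_usage() for c in self.children)


C = Cgroup("C", tasks={"c1": 30, "c2": 20})
P = Cgroup("P", tasks={"p1": 100, "p2": 50}, tmpfs=10, children=[C])

print("P local:", P.local_usage())
print("P hierarchical:", P.hierarchical_usage())
print("C hierarchical:", C.hierarchical_usage())
```

The difference between `P.local_usage()` and `P.hierarchical_usage()` is exactly C's usage, which is the part that gets added to P's stats when hierarchy is enabled.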
If cgroup P goes over its limit, it can reclaim memory from p1, p2, and child cgroup C. If C goes over its limit, it reclaims only from c1 and c2.
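The reclaim scope can be sketched the same way. This is a simplified illustration only: the kernel reclaims pages charged to cgroups rather than picking tasks, and the `TASKS` and `CHILDREN` tables and the `reclaim_scope` helper below are hypothetical names for this example.

```python
from typing import Dict, List

# Hypothetical layout matching the example: tasks attached directly to each cgroup.
TASKS: Dict[str, List[str]] = {"P": ["p1", "p2"], "C": ["c1", "c2"]}
# Child cgroups of each cgroup.
CHILDREN: Dict[str, List[str]] = {"P": ["C"], "C": []}


def reclaim_scope(cgroup: str) -> List[str]:
    """Return the tasks whose memory may be reclaimed when `cgroup` hits its limit."""
    scope = list(TASKS[cgroup])          # tasks directly in this cgroup
    for child in CHILDREN[cgroup]:       # plus everything below it
        scope.extend(reclaim_scope(child))
    return scope


print("P over limit:", reclaim_scope("P"))
print("C over limit:", reclaim_scope("C"))
```

Reclaim walks down the tree from the cgroup that hit its limit and never up, so pressure in C does not touch p1 or p2.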
I think the point the documentation leaves unclear is that there can be tasks attached directly to P, not only to one of its child cgroups.