Question

I have looked around quite a bit to try to answer this question, but to no avail. I am parsing Wikimedia page dumps to process certain pages (yes, I am aware of several tools for parsing Wikimedia page dumps, but they don't work for me as well as my own parser).

The question is simple. I know how to detect the start of a section (e.g. "==External References=="); that's easy. What's not well defined is how to detect where a section ends. For most sections I can scan until the start of the next section header, but that isn't reliable. I looked at Wikimedia's help page on sections, but it doesn't say how to detect the end of a section.

Solution

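There is no "section end" marker in MediaWiki syntax. A section extends until the next section header of the same or a higher level (that is, a header with the same number of = signs or fewer), or until the end of the page if no such header follows. (There is also a "section 0" containing all the text before the first section header.)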

Yes, this implies that sections at different levels can overlap, as in this example:

This text is in section 0.

== Section 1 begins here ==

This text is in section 1.

=== Section 2 begins here ===

This text is in sections 1 and 2.

=== Section 3 begins here ===

This text is in sections 1 and 3.

== Section 4 begins here ==

This text is in section 4.
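
As a rough illustration of the rule above, here is a minimal Python sketch that computes where each section starts and ends. It is not MediaWiki's own parser: the regex HEADING_RE and the function section_spans are hypothetical names made up for this example, and the sketch assumes that every line of the form "== Title ==" is a real heading. It does not handle templates, <nowiki> blocks, HTML comments, or the other special cases mentioned in this answer.

```python
import re

# Assumption: a heading is a line that starts with 1-6 '=' signs and ends
# with the same run of '=' signs (trailing spaces or tabs allowed).
# This is a simplification of what MediaWiki actually accepts.
HEADING_RE = re.compile(r"^(={1,6})(.+?)\1[ \t]*$", re.MULTILINE)


def section_spans(wikitext):
    """Return a list of (level, title, start, end) tuples.

    start and end are character offsets into wikitext. Section 0 (the text
    before the first heading) is reported with level 0 and an empty title.
    """
    headings = [
        (len(m.group(1)), m.group(2).strip(), m.start())
        for m in HEADING_RE.finditer(wikitext)
    ]

    # Section 0 runs from the top of the page to the first heading.
    first_start = headings[0][2] if headings else len(wikitext)
    spans = [(0, "", 0, first_start)]

    for i, (level, title, start) in enumerate(headings):
        # By default a section runs to the end of the page...
        end = len(wikitext)
        # ...unless a later heading has the same number of '=' signs or fewer.
        for next_level, _, next_start in headings[i + 1:]:
            if next_level <= level:
                end = next_start
                break
        spans.append((level, title, start, end))
    return spans


# The example from this answer, used as test input.
SAMPLE = """This text is in section 0.

== Section 1 begins here ==

This text is in section 1.

=== Section 2 begins here ===

This text is in sections 1 and 2.

=== Section 3 begins here ===

This text is in sections 1 and 3.

== Section 4 begins here ==

This text is in section 4.
"""

if __name__ == "__main__":
    for level, title, start, end in section_spans(SAMPLE):
        print(level, repr(title), start, end)
```

Slicing SAMPLE[start:end] for "Section 1 begins here" yields everything up to the "== Section 4 begins here ==" line, including both subsections, which matches the overlap shown in the example. Because the pattern only matches lines made of = signs, headings written with HTML tags are skipped automatically.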

Note that headings created using the HTML <h1>, <h2>, etc. tags don't begin or end sections, and won't have section edit links, even though they look otherwise identical to section headings.

Section headings inside templates do get section edit links, which let you edit the corresponding section of the template, but they're treated specially and are not considered part of the normal section structure of the containing page. There are also some weird special cases here involving section headers inside template parameters which I don't fully remember off the top of my head.

The automatically generated first-level heading at the top of every page also doesn't count as a section heading, although any extra first-level headings created with = Heading = do.

Licensed under: CC-BY-SA with attribution