If I'm reading this correctly, you count only one n-gram per line, so the line
"<author>James Parker</author><year>2008</year><lang>English</lang>"
has one trigram and three unigrams. You don't need every combination of tags for each line.
The simplest way to count this is to use a dictionary keyed by either the tag or the tuple of tags. That takes a single pass over the input and should scale roughly linearly with the number of input lines. I use a regular expression to pull out the opening tag of each element (this means the input has to be well formed), then index into the counter by each tag name and by the n-tuple formed from the line's sorted tag names. Sorting means the same combination of tags is counted under one key regardless of the order in which the tags appear on the line.
    import collections
    import re

    string = """<author>James Parker</author><year>2008</year><lang>English</lang>
    <author>Van Wie</author><year>2002</year>
    <year>2012</year><lang>English</lang>
    <year>2002</year><lang>French</lang>"""
    strings = string.split("\n")

    counter = collections.Counter()
    # match opening tags only, e.g. "<year>" but not "</year>"
    tag_re = r"<[^/>]*>"

    for s in strings:
        tags = re.findall(tag_re, s)
        # sort so that tag order within a line does not matter
        tags.sort()
        # count each tag on its own (unigrams)
        for tag in tags:
            counter[tag] += 1
        # count the line's full tag combination as one n-gram
        ngram = tuple(tags)
        counter[ngram] += 1

    print(counter)
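This prints the following. Because the tags are sorted, each tuple lists its tags alphabetically; the order of entries with equal counts may vary between Python versions:

    Counter({'<year>': 4, '<lang>': 3, '<author>': 2, ('<lang>', '<year>'): 2, ('<author>', '<lang>', '<year>'): 1, ('<author>', '<year>'): 1})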
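
If the input is large, the same idea can be applied to a stream of lines instead of one in-memory string. The following is only a minimal sketch under the same well-formed-input assumption: the function name `count_tag_ngrams` is made up for illustration. It also keeps unigram and n-gram counts in two separate counters, which is a design choice of this sketch rather than something the question requires.

```python
import collections
import re

# Matches opening tags only, e.g. "<year>" but not "</year>".
TAG_RE = re.compile(r"<[^/>]*>")


def count_tag_ngrams(lines):
    """Count tags and per-line tag combinations in a single pass.

    `lines` can be any iterable of strings (a list, an open file, ...).
    Hypothetical helper, written only to illustrate the approach.
    """
    unigrams = collections.Counter()
    ngrams = collections.Counter()
    for line in lines:
        # Sort so the same set of tags always maps to the same tuple.
        tags = sorted(TAG_RE.findall(line))
        if not tags:
            continue  # skip blank lines or lines without tags
        unigrams.update(tags)
        ngrams[tuple(tags)] += 1
    return unigrams, ngrams


sample = [
    "<author>James Parker</author><year>2008</year><lang>English</lang>",
    "<author>Van Wie</author><year>2002</year>",
    "<year>2012</year><lang>English</lang>",
    "<year>2002</year><lang>French</lang>",
]
unigrams, ngrams = count_tag_ngrams(sample)
print(unigrams["<year>"])            # 4
print(ngrams[("<lang>", "<year>")])  # 2
```

Since the function accepts any iterable, an open file object can be passed in directly, so the lines are read one at a time instead of being loaded all at once.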