Question

I'm trying to use cheerio.js with Node.js to get the text of an entry in an XBRL file (in this case, '10-Q'). The line is:

<dei:DocumentType contextRef="D2013Q3YTD" id="Fact-DB2A50C2A485F9CC21D51934C6E61D42">10-Q</dei:DocumentType>

I've tried:

$('dei:DocumentType').text

and a few other variations, to no avail. There is no unique id or anything else that I can see.

Sample file:

http://www.sec.gov/Archives/edgar/data/1018724/000144530513002495/amzn-20130930.xml

So how could I go about extracting this text? Thanks.


Solution

It turns out that the file above can be parsed with Cheerio after all.

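This works using Cheerio:

$('dei\\:CurrentFiscalYearEndDate').text().trim();

The colon has to be escaped, and that escape has to be escaped again. In a CSS selector ':' introduces a pseudo-class, so the selector engine must see '\:'. Inside a JavaScript string literal a backslash must itself be doubled, so the source code contains '\\:'. The same pattern works for the element in the question: $('dei\\:DocumentType').text().trim();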
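To make the double escaping concrete, here is a minimal sketch in plain JavaScript (no Cheerio needed to run it). The helper name qnameSelector is hypothetical. It escapes ':' and '.', the two characters in an XML qualified name that CSS selector syntax treats specially (':' starts a pseudo-class, '.' starts a class), and it shows that the literal "dei\\:DocumentType" holds only one backslash at runtime.

```javascript
// Hypothetical helper: turn an XML qualified name such as "dei:DocumentType"
// into a CSS selector by backslash-escaping ':' and '.'.
function qnameSelector(qname) {
  // In the replacement string "\\$1", "\\" is one literal backslash and
  // "$1" re-inserts the matched character.
  return qname.replace(/([:.])/g, "\\$1");
}

// The source literal contains "\\", but the runtime string contains a single
// backslash; that single backslash is the CSS escape the selector engine sees.
const literal = "dei\\:DocumentType";

console.log(literal); // dei\:DocumentType
console.log(qnameSelector("dei:DocumentType") === literal); // true
```

With a document loaded via const $ = cheerio.load(xml, { xmlMode: true }) (passing xmlMode is an assumption here, suggested because the input is XML rather than HTML), the lookup from the solution can then be written as $(qnameSelector('dei:DocumentType')).text().trim().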

OTHER TIPS

XBRL is XML, not an HTML DOM, so instead of relying on an HTML-oriented library like cheerio you can use an XML parser with XPath support, such as xpath, libxml or o3-xml.

Then, after binding the dei prefix to its namespace URI, you can get the value with an XPath expression like this:

/*/dei:DocumentType/text()
Licensed under: CC-BY-SA with attribution