Question

How do I access data from the StackExchange API using Matlab?

The naive

sitedata = urlread('http://api.stackoverflow.com/1.1/questions?tagged=matlab')

fails since the data is compressed. However, when I write this to file (using fprintf(fileID,'%s',sitedata)), I get a zip-file that cannot be uncompressed.

Solution

Try urlwrite() instead:

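% The response is gzip-compressed, so save the raw bytes to disk first.
urlwrite('http://api.stackoverflow.com/1.1/questions?tagged=matlab',...
  'tempfile.gz');
gunzip('tempfile.gz');   % writes the decompressed data to 'tempfile'
fid = fopen('tempfile');
str = textscan(fid,'%s','Delimiter','\n');   % one cell entry per line
fclose(fid);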

A better version of this snippet would use tempname to dynamically generate temporary filenames.

Other suggestions

Matlab's urlread assumes you're getting text data back, not binary. The gzip binary data is probably getting mangled in one of two places, or maybe both: when urlread decodes the character data to Unicode values to stick in Matlab chars, or when the formatted-output fprintf function writes them out and encodes them to UTF-8 (or whatever default character encoding is in effect for fileID), changing the byte sequence.

IIRC, urlread defaults to ISO-8859-1 encoding, which means the bytes are turned into the Unicode code points with the same numeric values, effectively just a widening. So you can get the byte data back by doing sitebytes = uint8(sitedata). (That's a regular uint8() conversion, not a typecast().) If this isn't the case, you can probably fiddle with urlread's CharSet option.
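To make the "widening" point concrete, here is a small Java sketch (Java because that is what sits underneath Matlab's string handling). It is only an illustration: the class name Latin1RoundTrip and its two methods are made up for this example and are not part of urlread. It pushes the first bytes of a gzip stream through an ISO-8859-1 round trip and through a UTF-8 round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name and methods are illustrative only.
public class Latin1RoundTrip {
    /** Decode bytes as ISO-8859-1 and encode them back; every byte value survives. */
    public static byte[] viaLatin1(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Same round trip through UTF-8; malformed sequences are replaced, so bytes change. */
    public static byte[] viaUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Start of a gzip stream: magic number 0x1f 0x8b, then method 8 (deflate).
        byte[] header = {(byte) 0x1f, (byte) 0x8b, (byte) 0x08, (byte) 0x00};
        System.out.println(Arrays.equals(header, viaLatin1(header))); // prints true
        System.out.println(Arrays.equals(header, viaUtf8(header)));   // prints false
    }
}
```

The 0x8b byte is not valid UTF-8 on its own, so the UTF-8 path swaps it for a replacement character and the gzip header is already broken. That is the kind of damage described above.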

If you can't get the right bytes out from urlread by fiddling with the encoding and casts, then you can drop down and make calls against the Java HttpAgent like urlread does and bypass the character set decoding step, or fiddle with its options. See the urlread source for how to do it.

Once you have the right bytes in memory, you can write them out to a file using the lower-level fwrite() function, which won't mangle them by doing character set encoding. Then you'll have a valid gzip file of the site's original response. (I think it'll work if you also just use fwrite(fileID, sitedata, 'uint8') directly on the char string, but it's uglier IMHO.)

You can also unzip it in memory using Java classes and save a trip to the filesystem. Do jsitebytes = typecast(sitebytes, 'int8') to get them as Java-friendly signed bytes, then stick them into a ByteArrayInputStream and read them out through a GZIPInputStream. You'll need to build a little Java helper class because Matlab doesn't play well with passing byte[] buffers by reference like java.io wants, but it may be worthwhile if you do a lot of in-memory munging like this.
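A minimal sketch of what such a helper might look like, assuming Java 7 or later for try-with-resources. The class name GunzipHelper and the method gunzip are hypothetical, chosen for this example. The read loop stays on the Java side, so Matlab only hands over one byte[] and gets one back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; the class and method names are illustrative only.
public class GunzipHelper {
    /** Decompress a complete gzip stream held in memory and return the raw bytes. */
    public static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // read() fills the buffer by reference, which is the part that is awkward from Matlab.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Assuming the compiled class is on Matlab's Java class path (for example via javaaddpath), a call such as GunzipHelper.gunzip(jsitebytes) should return the decompressed bytes as an int8 array, which typecast(..., 'uint8') and native2unicode can then turn into text.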

When working with web services or fancier data downloads (e.g. sites that need sessions or certificates), I've often ended up dropping down and coding directly against the HttpAgent and java.io classes from within Matlab.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow