Question

Would it be possible to get the data behind the interactive chart on this webpage (sorry, the website requires a login)?

When I hover over the chart with a mouse, the data shows up, but how do I get that data?

Here's an extract of the HTML source code from that website:

<svg height="460" version="1.1" width="1037" xmlns="http://www.w3.org/2000/svg" style="overflow: hidden; position: relative; left: -0.5px;">
<desc>Created with Raphaël 2.1.0</desc>
<defs>

<path style="" fill="none" stroke="#f1f1f1" d="M20,130L1017,130M20,159.66666666666666L1017,159.66666666666666M20,189.33333333333331L1017,189.33333333333331M20,219L1017,219M20,248.66666666666666L1017,248.66666666666666M20,278.3333333333333L1017,278.3333333333333M20,308L1017,308">
<path style="" fill="none" stroke="#f1f1f1" d="M295.0344827586207,130L295.0344827586207,337.66666666666663M295.0344827586207,365L295.0344827586207,415M535.6896551724138,130L535.6896551724138,337.66666666666663M535.6896551724138,365L535.6896551724138,415M776.3448275862069,130L776.3448275862069,337.66666666666663M776.3448275862069,365L776.3448275862069,415M1017,130L1017,337.66666666666663M1017,365L1017,415">
<path style="" fill="none" stroke="#cccccc" d="M17,337.66666666666663L1018,337.66666666666663">
<path style="" fill="none" stroke="#cccccc" d="M17,365L1018,365">
<rect x="20" y="130" width="997" height="207.66666666666666" r="0" rx="0" ry="0" fill="#ff0000" stroke="none" style="opacity: 0;" opacity="0">
<path style="" fill="none" stroke="#6e87d7" d="M20,281.030303030303L54.37931034482759,316.6902356902357L88.75862068965517,318.78787878787875L123.13793103448276,318.78787878787875L157.51724137931035,318.78787878787875L191.89655172413794,312.4949494949495L226.27586206896552,285.2255892255892L260.65517241379314,312.4949494949495L295.0344827586207,314.59259259259255L329.41379310344826,316.6902356902357L363.7931034482759,297.8114478114478L398.1724137931035,318.78787878787875L432.55172413793105,335.56902356902356L466.9310344827586,293.61616161616155L501.3103448275862,276.8350168350168L535.6896551724138,272.6397306397306L570.0689655172414,274.7373737373737L604.448275862069,272.6397306397306L638.8275862068965,216.00336700336698L673.2068965517242,216.00336700336698L707.5862068965517,239.07744107744105L741.9655172413793,281.030303030303L776.344827586207,144.68350168350165L810.7241379310345,245.37037037037032L845.1034482758621,239.07744107744105L879.4827586206897,247.46801346801345L913.8620689655172,245.37037037037032L948.2413793103449,245.37037037037032L982.6206896551724,207.61279461279457L1017,163.56228956228955" stroke-width="2">
<path style="" fill="none" stroke="#f1f1f1" d="M20,390L1017,390M20,415L1017,415">
<path style="opacity: 

There are many more of these path elements, which I haven't pasted here.


Solution

You would have to parse that information (and, guessing from your tags, you'll want to do it in Python). However, having had a quick look at the Raphaël documentation, I'm fairly sure there is a quicker way to get the data: it has to exist as a JavaScript array somewhere. Try looking for that first.
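A minimal sketch of that search, assuming the chart data is embedded in one of the page's scripts as a plain numeric array literal such as [3, 1.5, 4]. The function name find_numeric_arrays and the sample script string are hypothetical. If the chart is fed arrays of objects instead (for example {x: ..., y: ...}), this regex will miss them, and searching the scripts by hand for the key names is the better bet.

```python
import re

# Matches a bracketed, comma-separated list of plain numbers, e.g. [12, 3.5, -7].
# Assumption: the chart data appears in the page's JavaScript in this form.
NUMERIC_ARRAY = re.compile(r"\[\s*-?\d+(?:\.\d+)?(?:\s*,\s*-?\d+(?:\.\d+)?)+\s*\]")


def find_numeric_arrays(js_source, min_length=3):
    """Return every numeric array literal in js_source with at least min_length items."""
    arrays = []
    for match in NUMERIC_ARRAY.finditer(js_source):
        values = [float(v) for v in match.group(0).strip("[]").split(",")]
        if len(values) >= min_length:
            arrays.append(values)
    return arrays


# Hypothetical snippet standing in for a <script> block on the page.
script = "var chart = draw({ labels: ['a', 'b'], values: [3, 1.5, 4, 1, 5] });"
print(find_numeric_arrays(script))  # → [[3.0, 1.5, 4.0, 1.0, 5.0]]
```

Run it over the text of each script block (and any .js files the page loads), then compare the candidate arrays with the values you see when hovering over the chart.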

Ultimately, the SVG you found is generated from this JavaScript data. If you read the description of the SVG path element, you'll see how the M (moveto) and L (lineto) commands are interpreted, and you should then be able to parse those paths into whatever Python data structure you like.
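A sketch of such a parser, covering only the commands that appear in the extract (absolute M and L) plus their relative lowercase forms m and l; curves, arcs and closepath are rejected rather than silently misread. The function name parse_path is hypothetical. It follows the SVG rule that extra coordinate pairs after a moveto are treated as linetos, and it keeps each M-started line as its own list of points.

```python
import re

# A token is a single command letter or a number (optionally signed, with an
# optional fraction and exponent), e.g. "M", "20", "-0.5", "1e3".
TOKEN = re.compile(r"[A-Za-z]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_path(d):
    """Split an SVG path 'd' string that uses only M/L commands into subpaths.

    M (moveto) starts a new subpath and L (lineto) extends the current one;
    lowercase m/l are relative to the current point. Any other command
    (C, Q, A, Z, ...) raises ValueError.
    """
    tokens = TOKEN.findall(d)
    subpaths = []
    x = y = 0.0
    command = None
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.isalpha():
            if token not in ("M", "m", "L", "l"):
                raise ValueError(f"unsupported path command: {token}")
            command = token
            i += 1
            continue
        if command is None:
            raise ValueError("path data must start with a moveto command")
        a, b = float(tokens[i]), float(tokens[i + 1])
        i += 2
        if command in ("m", "l"):  # relative: offset from the current point
            x, y = x + a, y + b
        else:  # absolute coordinates
            x, y = a, b
        if command in ("M", "m"):
            subpaths.append([(x, y)])
            # SVG rule: extra coordinate pairs after a moveto are implicit linetos.
            command = "L" if command == "M" else "l"
        else:
            subpaths[-1].append((x, y))
    return subpaths


# The two-line gridline path from the question's extract.
gridlines = "M20,390L1017,390M20,415L1017,415"
print(parse_path(gridlines))
# → [[(20.0, 390.0), (1017.0, 390.0)], [(20.0, 415.0), (1017.0, 415.0)]]
```

Applied to the blue path in the extract (stroke="#6e87d7"), this gives a single subpath of 30 points, whereas the gridline paths come back as several two-point subpaths.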

That said, it is hard for us to pin down what you are looking for without even a picture to go on (is it a histogram? a line chart?). The lines drawn with L may be all you need.

As an example, taking the first path you listed, you could do this in a Python session:

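import re

# First path from the question: horizontal lines from x=20 to x=1017, as alternating M/L commands
svg_string = "M20,130L1017,130M20,159.66666666666666L1017,159.66666666666666M20,189.33333333333331L1017,189.33333333333331M20,219L1017,219M20,248.66666666666666L1017,248.66666666666666M20,278.3333333333333L1017,278.3333333333333M20,308L1017,308"
# Split on the command letters; [1:] drops the empty string before the leading "M"
data = [tuple(map(float, xy.split(','))) for xy in re.split('[ML]', svg_string)[1:]]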

Note that this only works because every command in this string is an absolute M (moveto) or L (lineto) followed by exactly one x,y pair, and it discards which points start a new line and which continue one. Still, all the other paths look like they are generated the same way, which makes me even more confident that the dataset is sitting in a JavaScript file you haven't looked at yet.
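Either way, the parsed numbers are pixel coordinates, not data values. Assuming the axes are linear, two reference points per axis (for example two gridlines whose labels you can read on the chart) are enough to invert the mapping. The calibration below is hypothetical: in the extract the plot area runs from y=130 down to the axis line at y=337.67, but the labels 0 and 100 are made up, so substitute the real ones. SVG's y axis points downwards, which the negative slope takes care of.

```python
def make_inverse_scale(p0, v0, p1, v1):
    """Return a function mapping a pixel coordinate back to a data value.

    Assumes a linear axis on which pixel p0 displays value v0 and pixel p1
    displays value v1.
    """
    slope = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + (p - p0) * slope


# Hypothetical calibration: bottom axis line (y=337.67) reads 0, top gridline (y=130) reads 100.
to_value = make_inverse_scale(337.66666666666663, 0.0, 130.0, 100.0)

# First two points of the blue data path from the question (x, y in pixels).
points = [(20, 281.030303030303), (54.37931034482759, 316.6902356902357)]
for x, y in points:
    print(x, round(to_value(y), 2))
```

The same function works for the x axis given two labelled vertical gridlines; the 30 points of the data path are evenly spaced, about 34.38 px apart.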

Finally, to retrieve the page source programmatically, look into urllib2 (its Python 3 counterpart is urllib.request).
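One caveat: urllib.request (like urllib2) returns the HTML as the server sends it, without running any JavaScript. Because Raphaël draws the SVG in the browser, the path elements may not be in the downloaded source at all, although the script holding the data should be; the login also means the request has to carry your session cookie. The sketch below assumes you supply that cookie yourself (fetch_page and its cookie parameter are hypothetical) and shows how to pull the stroke colour and d attribute of every path element out of markup, for example the rendered SVG saved from your browser's developer tools. In the extract, the data line appears to be the path stroked #6e87d7, while the other colours are used for gridlines and axes.

```python
import urllib.request
from html.parser import HTMLParser


def fetch_page(url, cookie=None):
    """Download raw HTML; cookie is a hypothetical 'name=value' session string."""
    headers = {"Cookie": cookie} if cookie else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


class PathCollector(HTMLParser):
    """Record (stroke, d) for every <path> element fed to the parser."""

    def __init__(self):
        super().__init__()
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if tag == "path":
            attributes = dict(attrs)
            if "d" in attributes:
                self.paths.append((attributes.get("stroke"), attributes["d"]))


def extract_paths(markup):
    """Return (stroke, d) pairs for all <path> elements in an HTML/SVG string."""
    collector = PathCollector()
    collector.feed(markup)
    collector.close()
    return collector.paths


# Shortened, rounded sample of the SVG from the question (no request is made).
markup = (
    '<svg><path fill="none" stroke="#cccccc" d="M17,365L1018,365"></path>'
    '<path fill="none" stroke="#6e87d7" d="M20,281.03L54.38,316.69" stroke-width="2"></path>'
    "</svg>"
)
data_paths = [d for stroke, d in extract_paths(markup) if stroke == "#6e87d7"]
print(data_paths)
```

Self-closing path elements (path .../>) are handled too, since HTMLParser routes them through handle_starttag.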

Other tips

A good option for this case is to combine Selenium with a scraping tool such as Scrapy in Python.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow