Question

I have a script that visits fcc.gov, then clicks a link which triggers a download:

require "mechanize"

docket_number = "12-268" #"96-128"

url = "http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=#{docket_number}"
agent = Mechanize.new
agent.pluggable_parser.default = Mechanize::DirectorySaver.save_to 'downloads'

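# Fetch the search results page, then click the export link.
agent.get(url) do |page|
  link = page.link_with(:text => "Export to Excel file")
  # Responses with no registered parser are saved under 'downloads' by DirectorySaver.
  xls = agent.click(link)
end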

This works fine when docket_number is "12-268". But when you change it to "96-128", Mechanize downloads the HTML of the page instead of the desired spreadsheet.

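The URLs for both pages, as built by the script above, are:

http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=12-268
http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=96-128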

As you can see, if you visit each page in a browser (I'm using Chrome) and click "Export to Excel file", a spreadsheet file is downloaded without any problem. "96-128" has many more rows, so when you click the Export link, it takes you to a new page that refreshes every 10 seconds or so until the file begins downloading. How can I get around this, and why is there this inconsistency?


Solution

Clicking Export on 96-128 takes you to a page that refreshes itself using this kind of tag (I'd never heard of it before):

<meta http-equiv="refresh" content="5;url=/ecfs/comment_search/export?exportType=xls"/>
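To make the tag concrete, here is a minimal sketch of how its content attribute breaks down into a delay in seconds and a target URL. The helper name parse_meta_refresh is hypothetical, and the base URL is an assumption (the search page from the question); because the target path is absolute, only the host part of the base matters here.

```ruby
require "uri"

# Hypothetical helper: split a meta refresh "content" value into
# [delay_in_seconds, absolute_target_url].
def parse_meta_refresh(content, base_url)
  delay, target = content.split(";", 2)
  # Drop the leading "url=" (case-insensitive) from the target part.
  path = target.to_s.strip.sub(/\Aurl=/i, "")
  # With no target, the browser reloads the current page.
  resolved = path.empty? ? base_url : URI.join(base_url, path).to_s
  [delay.to_i, resolved]
end

# Assumed base URL: the search page from the question.
delay, target = parse_meta_refresh(
  "5;url=/ecfs/comment_search/export?exportType=xls",
  "http://apps.fcc.gov/ecfs/comment_search/execute?proceeding=96-128"
)
puts "wait #{delay}s, then fetch #{target}"
```

In other words, the page asks the client to wait 5 seconds and then request the export URL again, which is what a browser does automatically.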

By default, Mechanize will not follow these refreshes. To get around that, change a setting on agent:

agent.follow_meta_refresh = true

Source: https://stackoverflow.com/a/2166480/94154

Other tips

Proceeding 12-268 has 48 entries; 96-128 has 4046. When I click 'Export to Excel File' on the latter, a page sometimes appears saying:

Finished processing 933 of 4046 records. Click if this page does not reload automatically.

I guess Mechanize is seeing this, too.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow