Question

I've found a number of posts that are really close to this, but none that get me close enough.

I need to set up an automation that will:

- Go to a webpage (http://webpageinquestion.com/things/3445)
- Find a specific HTML tag on that page (<small>sometext</small>)
- Take the value wrapped in that tag ("sometext")
- Save that value to a text document as a list, prefixed with the name of the page (3445_sometext)

By the end, I need a list that looks like:

3445_sometext
3446_someothertext
3447_yetmoretext
3845_textext
4564_textThetext
9837_texty

I've explored different methods that might use Wget and jQuery GET requests, but I clearly don't understand either tool well enough to accomplish this. I'm sure cURL could do something like this, but I've never used it myself.

Any ideas? This has been such a puzzle...


Solution

Using jQuery, I think the simplest and fastest method would be something like this:

  • Use a jQuery AJAX request to get the contents of that web page.
  • Use regex to get the contents within the <body> tags.
    • Regex will only work if you know for sure that every page has an opening and closing body tag that is properly formatted. If you can't ensure this, you'll need to crawl the DOM instead.
  • Put the extracted content into a new jQuery object: var $contents = $(bodyContents)
  • Use typical jQuery functions to find what you need: $contents.find('small').text()
  • Write the value to the file.
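The extraction steps could be sketched roughly as follows. This is only a minimal sketch, and it swaps jQuery for plain JavaScript with a regex so it can run outside a browser. The function names (extractSmallText, pageNameFromUrl, buildEntry) are made up for illustration, and the regex assumes a simple <small> element with no tags nested inside it.

```javascript
// Hypothetical helper: return the text of the first <small> element in an
// HTML string, or null if there is none. Assumes no nested tags inside it.
function extractSmallText(html) {
  const match = /<small\b[^>]*>([\s\S]*?)<\/small>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Hypothetical helper: use the last path segment of the URL as the page name,
// e.g. "3445" for http://webpageinquestion.com/things/3445
function pageNameFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  return segments.length ? segments[segments.length - 1] : '';
}

// Combine both into one list entry of the form "<pageName>_<text>".
function buildEntry(url, html) {
  const text = extractSmallText(html);
  return text === null ? null : `${pageNameFromUrl(url)}_${text}`;
}

// Example with a hard-coded page body instead of a real request:
const sampleHtml = '<html><body><small>sometext</small></body></html>';
console.log(buildEntry('http://webpageinquestion.com/things/3445', sampleHtml));
// prints "3445_sometext"
```

If the markup is less predictable than this, a real DOM lookup such as $contents.find('small').text() is the safer choice.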

It'd be a rather substantial amount of code to do all that, so I'm not going to try.

Also, JavaScript running in a browser can't write to the local file system (at least not with the technologies you've tagged), so you'll need another way to save the file. Some options:

  • Send an AJAX call to a server where it can store it.
  • Run the script as a Node script which can access the file system.
  • Use something like HTML5 Local Storage: http://diveintohtml5.info/storage.html
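For the Node option, a rough sketch might look like the one below. It assumes Node 18 or newer (for the global fetch) and that every page lives at a base URL followed by its ID. The names saveEntries and collectEntries and the output file name list.txt are hypothetical, not part of any library.

```javascript
const fs = require('fs');

// Hypothetical helper: write one entry per line to a text file.
function saveEntries(entries, filePath) {
  fs.writeFileSync(filePath, entries.join('\n') + '\n', 'utf8');
}

// Hypothetical driver: fetch each page and collect "<id>_<text>" entries.
// fetchHtml can be replaced (for example in tests); by default it uses the
// global fetch, which is an assumption that holds on Node 18+.
async function collectEntries(
  baseUrl,
  ids,
  fetchHtml = async (url) => (await fetch(url)).text()
) {
  const entries = [];
  for (const id of ids) {
    const html = await fetchHtml(baseUrl + id);
    // Same simple regex assumption as before: a plain <small> with no nested tags.
    const match = /<small\b[^>]*>([\s\S]*?)<\/small>/i.exec(html);
    if (match) {
      entries.push(`${id}_${match[1].trim()}`);
    }
  }
  return entries;
}

// Usage (left commented out because it makes real network requests):
// collectEntries('http://webpageinquestion.com/things/', [3445, 3446, 3447])
//   .then((entries) => saveEntries(entries, 'list.txt'));
```

Pages without a <small> tag are skipped here rather than written as empty entries; that is a design choice you may want to change.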

Good luck.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow