I'm trying to scrape articles from news agencies, but I can't figure out how to get the author of an article using python-goose. I've read through the documentation and source code, and searched Google.

from goose import Goose

def getArticle(url):
    g = Goose()
    article = g.extract(url=url)
    print article.title
    # print article.author
    # print article.writer

So, is there a built-in way to extract the author of an article using python-goose?

Link to the python-goose code and documentation: http://github.com/grangier/python-goose


Solution

From their documentation:

Goose will try to extract the following information:

  • Main text of an article
  • Main image of article
  • Any Youtube/Vimeo movies embedded in article
  • Meta Description
  • Meta tags

They don't promise to get the author; you will need to look into the metadata to see if it's included and extract it manually.
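As a rough illustration of the manual approach, the sketch below scans a page's HTML for <meta> tags that commonly carry an author name. It uses only the standard library and is not part of python-goose. The names AuthorMetaParser and extract_authors, and the set of meta keys it checks, are assumptions chosen for illustration; many sites use different keys or publish no author metadata at all. The HTML string can come from whatever you use to fetch the page (python-goose's article object appears to expose it as article.raw_html, but verify that against your installed version).

```python
# Python 3 import; on Python 2 the module is named HTMLParser instead.
from html.parser import HTMLParser


class AuthorMetaParser(HTMLParser):
    """Collect the content of <meta> tags whose name/property looks like an author field."""

    # Hypothetical list of common keys; extend it for the sites you scrape.
    AUTHOR_KEYS = {"author", "article:author", "dc.creator"}

    def __init__(self):
        HTMLParser.__init__(self)
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Sites put the key in either the "name" or the "property" attribute.
        key = (attrs.get("name") or attrs.get("property") or "").lower()
        content = (attrs.get("content") or "").strip()
        if key in self.AUTHOR_KEYS and content and content not in self.authors:
            self.authors.append(content)


def extract_authors(html):
    """Return a list of author strings found in the page's meta tags (possibly empty)."""
    parser = AuthorMetaParser()
    parser.feed(html)
    return parser.authors
```

An empty list simply means none of the checked keys were present, in which case you would have to fall back to site-specific markup such as a byline element.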

Other tips

Newspaper may satisfy your requirements.

Here is the usage:

>>> article.authors
[u'Leigh Ann Caldwell', 'John Honway']
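The authors attribute is only populated after the article has been downloaded and parsed. A minimal sketch of the calls that lead up to it, assuming the newspaper package is installed; the helper name fetch_authors is hypothetical, and the result depends on the page and may be an empty list:

```python
def fetch_authors(url):
    """Download and parse one article with newspaper, then return its author list."""
    # Imported here so this snippet can be loaded even where newspaper is not installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML over the network
    article.parse()     # extract title, text, authors, etc.
    return article.authors
```

Call it with the URL of the article you want, for example fetch_authors(url) inside the getArticle function from the question.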

You can find more details in its documentation or on GitHub: http://newspaper.readthedocs.org/en/latest/

It is quite simple and powerful.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow