Question

I'm trying to scrape articles from news agencies, but I can't figure out how to get the author of an article using python-goose. I've read through the documentation and source code, and searched Google.

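from goose import Goose

def getArticle(url):
    g = Goose()
    # Download the page and run Goose's extractors on it
    article = g.extract(url=url)
    print(article.title)
    # Neither of these works; the extracted article has no such attributes:
    # print(article.author)
    # print(article.writer)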

So, is there a built-in way to extract the author of an article using python-goose?

Link to the python-goose code and documentation: http://github.com/grangier/python-goose


Solution

From their documentation:

Goose will try to extract the following information:

  • Main text of an article
  • Main image of article
  • Any Youtube/Vimeo movies embedded in article
  • Meta Description
  • Meta tags

The author is not on that list, so Goose does not promise to extract it. You will need to check whether the page's metadata includes the author and extract it yourself.
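
A minimal sketch of that manual approach, assuming you already have the page's HTML as a string (fetched separately). It uses only Python 3's standard-library html.parser and collects the content attribute of <meta> tags whose name or property is one of a few common author keys. That key list (AUTHOR_KEYS) and the helper names are hypothetical choices for illustration, not something Goose defines; also note that Open Graph's article:author is often a profile URL rather than a plain name.

```python
from html.parser import HTMLParser

# Hypothetical list of meta keys that commonly carry an author name.
# This is an assumption based on common page conventions, not a Goose API.
AUTHOR_KEYS = {"author", "article:author", "dc.creator"}


class AuthorMetaParser(HTMLParser):
    """Collect content="..." values from <meta> tags whose name/property is an author key."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = (attrs.get("name") or attrs.get("property") or "").strip().lower()
        content = (attrs.get("content") or "").strip()
        # Keep first-seen order and skip duplicates (pages often repeat the author)
        if key in AUTHOR_KEYS and content and content not in self.authors:
            self.authors.append(content)


def extract_authors(html):
    """Return the author names found in the page's <meta> tags (possibly empty)."""
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.authors


if __name__ == "__main__":
    page = ('<html><head><meta name="author" content="Jane Doe">'
            '<meta property="article:author" content="Jane Doe"/></head></html>')
    print(extract_authors(page))  # ['Jane Doe']
```

If the list comes back empty, the page simply doesn't expose the author in its meta tags, and you would have to look for a byline element in the page body instead.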

OTHER TIPS

Newspaper may satisfy your requirements: unlike Goose, it extracts article authors.

Here is the usage:

>>> article.authors
[u'Leigh Ann Caldwell', 'John Honway']

You can find more details in its documentation (http://newspaper.readthedocs.org/en/latest/) or on GitHub.

It is quite simple and powerful.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow