How do I get the author of an article using python-goose?

https://stackoverflow.com/questions/21079015

27-09-2022

Question

I'm trying to scrape articles from news agencies, but I can't figure out how to get the author of an article using python-goose. I've read through the documentation and the source code, and searched Google.

from goose import Goose

def getArticle(url):
    g = Goose()
    article = g.extract(url=url)
    print(article.title)
    # Guessed attributes; python-goose's Article object has neither:
    # print(article.author)
    # print(article.writer)
    return article

So, is there a built-in way to extract the author of an article using python-goose?

python-goose source code and documentation: http://github.com/grangier/python-goose

Solution

From the python-goose documentation:

Goose will try to extract the following information:

- Main text of an article
- Main image of article
- Any Youtube/Vimeo movies embedded in article
- Meta Description
- Meta tags

The author is not on that list, so Goose doesn't promise to extract it. You will need to check the page's metadata (for example, its meta tags) to see whether the author is included, and extract it yourself.
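A minimal sketch of that manual step, using only Python 3's standard library (`html.parser`). The meta keys it checks (`author`, `article:author`, `dc.creator`) are an assumption based on common conventions; a given news site may use other keys or only print the byline in the page body. python-goose keeps the downloaded page on the article object as `raw_html`, so that string can be fed in (verify against your installed version). python-goose itself targets Python 2, where this parser module is named `HTMLParser` instead.

```python
from html.parser import HTMLParser

# Assumed set of meta keys that commonly carry an author; not universal.
AUTHOR_KEYS = ("author", "article:author", "dc.creator")


class MetaAuthorParser(HTMLParser):
    """Collect the content of <meta> tags whose name/property is an author key."""

    def __init__(self):
        super().__init__()
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Sites use either name="..." or property="..." for the key.
        key = (attrs.get("name") or attrs.get("property") or "").lower()
        content = (attrs.get("content") or "").strip()
        # Keep document order and skip duplicates across different keys.
        if key in AUTHOR_KEYS and content and content not in self.authors:
            self.authors.append(content)


def authors_from_meta(html):
    """Return author values found in the page's <meta> tags."""
    parser = MetaAuthorParser()
    parser.feed(html)
    parser.close()
    return parser.authors


if __name__ == "__main__":
    sample = '<head><meta name="author" content="Jane Doe"></head>'
    print(authors_from_meta(sample))  # ['Jane Doe']
```

Note that `article:author` often holds a profile URL rather than a name, so you may want to filter the results before using them.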

Other tips

Newspaper may meet your needs: it exposes an article's authors directly.

Here is the usage:

>>> article.authors
[u'Leigh Ann Caldwell', 'John Honway']
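The usage line above assumes `article` has already been created and parsed. A minimal sketch of the full flow, assuming the Python 3 fork `newspaper3k` (imported as `newspaper`) and its documented `Article(url)`, `download()`, `parse()` sequence. The function name `newspaper_authors` is hypothetical, and calling it needs network access.

```python
def newspaper_authors(url):
    """Download and parse an article with Newspaper, then return its authors."""
    # Imported here so a script still loads when the optional
    # newspaper package is not installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page over the network
    article.parse()     # run Newspaper's extractors, including authors
    return article.authors
```

For example, `newspaper_authors("https://example.com/some-story")` (placeholder URL) returns a list of name strings, which is empty if Newspaper finds no byline.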

You can find more details in its documentation or on GitHub: http://newspaper.readthedocs.org/en/latest/

It is quite simple and powerful.

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow