스크립트에서 스택 오버플로 질문을 통해 검색하려면 어떻게해야합니까?

https://stackoverflow.com/questions/196755

10-07-2019
|

문제

"Python 모범 사례"와 같은 일련의 키워드가 주어지면 Python 스크립트에서 관련성 (?)에 의해 정렬 된 키워드를 포함하는 첫 10 개의 스택 오버플로 질문을 얻고 싶습니다. 내 목표는 튜플 목록 (제목, URL)으로 끝나는 것입니다.

이것을 어떻게 달성 할 수 있습니까? 대신 Google 쿼리를 고려 하시겠습니까? (파이썬에서 어떻게 하시겠습니까?)

해결책

>>> from urllib import urlencode
>>> params = urlencode({'q': 'python best practices', 'sort': 'relevance'})
>>> params
'q=python+best+practices&sort=relevance'
>>> from urllib2 import urlopen
>>> html = urlopen("http://stackoverflow.com/search?%s" % params).read()
>>> import re
>>> links = re.findall(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>', html)
>>> links
[('/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines &#8220;pythonian&#8221; or &#8220;pythonic&#8221;?'), ('/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')]
>>> from urlparse import urljoin
>>> links = [(urljoin('http://stackoverflow.com/', url), title) for url,title in links]
>>> links
[('http://stackoverflow.com/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150', 'What are the best RSS feeds for programmers/developers?'), ('http://stackoverflow.com/questions/3088/best-ways-to-teach-a-beginner-to-program#13185', 'Best ways to teach a beginner to program?'), ('http://stackoverflow.com/questions/13678/textual-versus-graphical-programming-languages#13886', 'Textual versus Graphical Programming Languages'), ('http://stackoverflow.com/questions/58968/what-defines-pythonian-or-pythonic#59877', 'What defines &#8220;pythonian&#8221; or &#8220;pythonic&#8221;?'), ('http://stackoverflow.com/questions/592/cxoracle-how-do-i-access-oracle-from-python#62392', 'cx_Oracle - How do I access Oracle from Python? '), ('http://stackoverflow.com/questions/7170/recommendation-for-straight-forward-python-frameworks#83608', 'Recommendation for straight-forward python frameworks'), ('http://stackoverflow.com/questions/100732/why-is-if-not-someobj-better-than-if-someobj-none-in-python#100903', 'Why is if not someobj: better than if someobj == None: in Python?'), ('http://stackoverflow.com/questions/132734/presentations-on-switching-from-perl-to-python#134006', 'Presentations on switching from Perl to Python'), ('http://stackoverflow.com/questions/136977/after-c-python-or-java#138442', 'After C++ - Python or Java?')]

이것을 함수로 변환하는 것은 사소해야합니다.

편집하다: 도대체 할게 ...

def get_stackoverflow(query):
    import urllib, urllib2, re, urlparse
    params = urllib.urlencode({'q': query, 'sort': 'relevance'})
    html = urllib2.urlopen("http://stackoverflow.com/search?%s" % params).read()
    links = re.findall(r'<h3><a href="([^"]*)" class="answer-title">([^<]*)</a></h3>', html)
    links = [(urlparse.urljoin('http://stackoverflow.com/', url), title) for url,title in links]

    return links

다른 팁

stackoverflow에는 이미이 기능이 있으므로 검색 결과 페이지의 내용을 가져 와서 필요한 정보를 긁어 내면됩니다. 다음은 관련성에 의한 검색을위한 URL입니다.

https://stackoverflow.com/search?q=python+best+practices&sort= relevance

소스를 보면 각 질문에 필요한 정보가 다음과 같은 줄에 있음을 알 수 있습니다.

<h3><a href="/questions/5119/what-are-the-best-rss-feeds-for-programmersdevelopers#5150" class="answer-title">What are the best RSS feeds for programmers/developers?</a></h3>

따라서 해당 형식의 문자열을 검색하여 처음 10을 얻을 수 있어야합니다.

REST API를 추가 할 것을 제안하십시오. http://stackoverflow.uservoice.com/

유효한 HTTP 요청에서 반환 된 HTML을 스크레이프 할 수 있습니다. 그러나 그로 인해 업장이 나쁘고 숙면을 즐길 수있는 능력이 상실됩니다.

PyCurl을 사용하여 검색어를 쿼리 URI에 연결합니다.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow