Can anybody share a simple example of using Mathematica and Google Scholar to extract academic research information?

StackOverflow https://stackoverflow.com/questions/6109520

Question

How can I use Mathematica and Google Scholar to find the number of papers a person published in 2011?


Answer

Google Scholar is not well suited to this, as AFAIK it has no formal API, and it doesn't return results in a structured format (e.g. XML) either. So we have to resort to a quick (and very, very fragile!) text-pattern-matching hack like this:

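 searchGoogleScholarAuthor[author_String] :=
  (* Fetch the results page for the author query, then pull the hit count
     out of the "Results ... of about N. (...)" header text *)
  First[StringCases[
    Import["http://scholar.google.com/scholar?start=0&num=1&q=" <>
      (* "A Einstein" -> "author:A+author:Einstein"; StringDrop removes the trailing "+" *)
      StringDrop[
       StringJoin @@ ("author:" <> # <> "+" & /@ 
          StringSplit[author]), -1] <> "&hl=en&as_sdt=1,5"],
    (* p captures the run of digits and commas that follows "of about" *)
    ___ ~~ "Results" ~~ ___ ~~ "of about" ~~ Shortest[___] ~~ 
      p : Longest[(DigitCharacter | ",") ..] ~~ ___ ~~ "." ~~ ___ ~~ 
      "(" ~~ ___ :> p]]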

In[191]:= searchGoogleScholarAuthor["A Einstein"]

Out[191]= "6,400"

In[190]:= searchGoogleScholarAuthor["Einstein"]

Out[190]= "9,400"

In[192]:= searchGoogleScholarAuthor["Wizard"]

Out[192]= "197"

In[193]:= searchGoogleScholarAuthor["Vries"]

Out[193]= "70,700"

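If you want a number rather than a string, remove the thousands separators before calling ToExpression, e.g. ToExpression[StringReplace[result, "," -> ""]], because ToExpression cannot parse "6,400" as is. To restrict the publication years, add &as_ylo=2011&as_yhi=2011 to the query string, setting the start (as_ylo) and end (as_yhi) years as needed.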
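A minimal sketch of the year-restricted variant, assuming the as_ylo/as_yhi parameters still behave as described above (not re-checked against the current site). The names scholarAuthorYearURL and hitCountToInteger are hypothetical helpers; the URL is built exactly like the one in searchGoogleScholarAuthor, so it can be passed to Import and the same StringCases pattern.

```mathematica
(* Hypothetical helper: same query as searchGoogleScholarAuthor, plus the
   as_ylo/as_yhi year filter mentioned in the answer (assumed parameter names) *)
scholarAuthorYearURL[author_String, fromYear_Integer, toYear_Integer] :=
 "http://scholar.google.com/scholar?start=0&num=1&q=" <>
  StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] <>
  "&hl=en&as_sdt=1,5&as_ylo=" <> ToString[fromYear] <>
  "&as_yhi=" <> ToString[toYear]

(* Hypothetical helper: turn a hit count such as "6,400" into an integer.
   The commas are removed first because ToExpression cannot parse them. *)
hitCountToInteger[count_String] := ToExpression[StringReplace[count, "," -> ""]]

scholarAuthorYearURL["A Einstein", 2011, 2011]
(* "http://scholar.google.com/scholar?start=0&num=1&q=author:A+author:Einstein&hl=en&as_sdt=1,5&as_ylo=2011&as_yhi=2011" *)

hitCountToInteger["6,400"]
(* 6400 *)
```

Keeping the URL construction separate from the Import call makes the fragile network part easy to swap out while the query string itself can be checked offline.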

Please note that authors with common names will generate lots of spurious hits, as there is no way to uniquely identify a single author. Additionally, Scholar returns a variety of hits, including citations, books, reprints and more. So, really, this ain't very useful for counting.

A bit of explanation:

Scholar's query syntax puts each initial and name of an author (or co-author) in its own author: field, with the fields joined by a +. The StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1] part of the code builds that string: it prefixes every word of the name with author:, appends a + to each, concatenates the pieces, and StringDrop removes the trailing +.
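The query-building step can be tried on its own. authorQueryOriginal below is just that fragment of the answer's code wrapped in a function, and authorQueryRiffle is a hypothetical equivalent using StringRiffle (available from Mathematica 10.1), which joins the fields with "+" so there is no trailing "+" to drop.

```mathematica
(* The query-building fragment from searchGoogleScholarAuthor, wrapped in a function *)
authorQueryOriginal[author_String] :=
 StringDrop[StringJoin @@ ("author:" <> # <> "+" & /@ StringSplit[author]), -1]

(* Hypothetical equivalent (Mathematica 10.1+): prefix each word with "author:"
   and riffle "+" between the fields instead of appending and dropping it *)
authorQueryRiffle[author_String] :=
 StringRiffle["author:" <> # & /@ StringSplit[author], "+"]

authorQueryOriginal["A Einstein"]  (* "author:A+author:Einstein" *)
authorQueryRiffle["A Einstein"]    (* "author:A+author:Einstein" *)
```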

The StringCases part contains a large text pattern that searches for the header Scholar places at the top of each results page ("Results ... of about N. (...)"), which contains the number of hits. The named pattern p isolates that number, and First returns it as a string.
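The extraction step can also be tested offline. This is a simplified sketch of the same idea, not the answer's exact pattern: it assumes the header still contains "of about " followed by the count. The sample strings are made up to mimic the format the original pattern expects (they are not captured from Scholar), and extractHitCount is a hypothetical name. Unlike First[StringCases[...]], it returns Missing["NotFound"] instead of an error when nothing matches.

```mathematica
(* Hypothetical helper: simplified pattern capturing the run of digits and
   commas that follows "of about " *)
extractHitCount[text_String] :=
 Replace[
  StringCases[text, "of about " ~~ n : (DigitCharacter | ",") .. :> n],
  {{first_, ___} :> first, {} -> Missing["NotFound"]}]

(* Made-up header line in the format the original pattern expects *)
extractHitCount["Results 1 - 1 of about 6,400. (0.12 sec)"]
(* "6,400" *)

extractHitCount["Your search did not match any articles."]
(* Missing["NotFound"] *)
```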

License: CC BY-SA with attribution