I would like to fetch some data from Google Scholar automatically via a MATLAB script. I am mostly interested in Google Scholar's BibTeX entries and the forward citation feature. However, it seems that there is no API for Google Scholar. Is there a way to automatically fetch bibliographic data from Google Scholar using MATLAB? Are there tools or code already available for this?

Solution

If you really want to use MATLAB for this (which I don't really advise), then you can look at various web scraping examples, and there is this code that already gets some info from Google Scholar. Basically, just google 'matlab web scraping' and off you go.

I personally would recommend using Python for this because Python is better for general programming IMHO. For instance, this guy has already done something similar to what you want with Python. However, if you know MATLAB and don't have the interest or time for Python, then follow the links in the first paragraph.

Other tips

A word of caution I found while working further on this project.

There is a reason why Google Scholar does not have an API. Using bots to collect data from Google Scholar is against the EULA. The basic idea is that a program interfacing with Google Scholar must not do so in a qualitatively different way than an end user would. In other words, you cannot automatically fetch large amounts of data. Although the script in @JustinPeel's answer does not necessarily violate the terms, putting it in a massive loop would.

Some specific points from this EULA:

You shall not, and shall not allow any third party to: ...

(i) directly or indirectly generate queries, or impressions of or clicks on Results, through any automated, deceptive, fraudulent or other invalid means (including, but not limited to, click spam, robots, macro programs, and Internet agents);

...

(l) "crawl", "spider", index or in any non-transitory manner store or cache information obtained from the Service (including, but not limited to, Results, or any part, copy or derivative thereof);

If you look at the Google Scholar robots.txt then you can also see that no bots of any kind are allowed.
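
If you want your own script to respect robots.txt before it fetches anything, Python's standard-library `urllib.robotparser` can do the check. Below is a minimal sketch. The rules in `SAMPLE_ROBOTS_TXT`, the `example.org` URLs, the `my-bot` user agent, and the `is_allowed` helper are all made up for illustration; they are not Google Scholar's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real Google Scholar robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /scholar
Allow: /intl/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/scholar?q=test",
        "https://example.org/intl/en/about.html",
    ):
        print(url, "->", is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", url))
```

For a live site you would instead call `set_url(...)` and `read()` on the parser so that it downloads the real file. Keep in mind that robots.txt is only part of the picture: the EULA terms quoted above apply regardless of what robots.txt permits.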

I have heard from some colleagues that you will get in trouble if you try to circumvent this policy, which can result in your lab losing access to Google Scholar.

Licensed under: CC-BY-SA with attribution