
Do you trust Google Scholar?

Saturday, December 18th, 2010

Are you using Google Scholar? For finding scientific literature? For obtaining citation counts and publication lists of researchers? Have you ever thought about how trustworthy the information you get from Google Scholar is?

My colleague and I performed several tests with Google Scholar and found that it is really easy to fool. You can easily increase the citation counts of articles and thereby improve the articles’ rankings. You can easily add invisible keywords to articles and make them appear relevant for searches they are not actually relevant to. You can also create completely nonsensical articles with the paper generator SciGen and make Google Scholar index them. And you can place any kind of advertisement in manipulated articles and get users of Google Scholar to download them.
Of course, our results do not mean that you cannot trust Google Scholar at all or shouldn’t use it at all. Despite our results I use Google Scholar frequently; imho it’s still the best academic search engine on the market. However, as with all other search engines, you should be aware that there might be spam and manipulated information, and you should be careful when using citation counts from Google Scholar. Maybe there are few or no manipulations right now. But the more citation counts from Google Scholar are used for performance evaluations, the higher the incentive for researchers to manipulate them (and, as said, it’s really easy).

What I am interested in now is: What’s your opinion on this subject? Have you ever found something on Google Scholar that was suspicious? Please let me know.

If you are interested in more information read the full article, titled “Academic Search Engine Spam and Google Scholar’s Resilience Against it”, here.

Update 2010/12/31:

We received a few questions about when we ran the experiments on Google Scholar (unfortunately we didn’t state that in the paper). The answer: between early and mid-2009. We first submitted the paper to WWW2010 in November 2009, but it was rejected. Well, and then it took… many, many months (and edits) before the Journal of Electronic Publishing finally accepted and published the paper :-) .

Update 2011/01/02:

There is another really interesting article about spamming Google Scholar: Cyril Labbe created a fake researcher called Ike Antkare and made him one of the most cited authors of all time (according to Google Scholar). Read the article here.

New Paper: On the Robustness of Google Scholar against Spam

Saturday, June 12th, 2010

Update: The final article is published. Please read here.

I am currently in Toronto presenting our new paper, titled “On the Robustness of Google Scholar against Spam”, at Hypertext 2010. The paper is about some experiments we did on Google Scholar to find out how reliable its citation data and other information are. The paper will soon be downloadable from our publication page, but for now I will post a pre-print version here in the blog:

Abstract

In this research-in-progress paper we present the current results of several experiments in which we analyzed whether spamming Google Scholar is possible. Our results show that it is possible: We ‘improved’ the ranking of articles by manipulating their citation counts, and we made articles appear in searches for keywords the articles did not originally contain by placing invisible text in modified versions of the articles.

1. Introduction

Researchers should have an interest in having their articles indexed by Google Scholar and other academic search engines such as CiteSeer(X). The inclusion of their articles in the index improves the ability to make their articles available to the academic community. In addition, authors should not only be concerned about the fact that their articles are indexed, but also where they are displayed in the result list. As with all ranked search results, articles displayed in top positions are more likely to be read.

In recent studies we researched the ranking algorithm of Google Scholar [1-3] and gave advice to researchers on how to optimize their scholarly literature for Google Scholar [4]. However, there are reservations in the academic community about what we called “Academic Search Engine Optimization” [4]. The concern is that some researchers might use their knowledge of ranking algorithms to ‘over-optimize’ their papers in order to push their articles’ rankings in illegitimate ways.

We conducted some experiments to find out how robust Google Scholar is against spamming. Not all of the experiments are completed yet, but those that are show interesting results, which are presented in this paper. (more…)

Academic Search Engine Optimization: What others think about it

Sunday, April 18th, 2010

In January we published our article about Academic Search Engine Optimization (ASEO). As expected, the feedback varied widely. Here are some of the opinions on ASEO:

Search engine optimization (SEO) has a golden age in this internet era, but to use it in academic research, it sounds quite strange for me. After reading this publication (pdf) focusing on this issue, my opinion changed.

[...] on first impressions it sounds like the stupidest idea I’ve ever heard.

ASEO sounds good to me. I think it’s a good idea.

Good Article..

As you have probably guessed from the above criticisms, I thought that the article was a piece of crap.

In my opinion, being interested in how (academic) search engines function and how scientific papers are indexed and, of course, responding to these… well… circumstances of the scientific citing business is just natural.

Check out the following blogs to read more about it (some in German and Dutch) (more…)