Reading the articles in the current Cambridge University Libraries Information Bulletin (CULIB) about open access, journals and the DSpace@Cambridge project started me off on this whole subject of access to academic research.
You will probably not have heard of the University’s very own institutional repository: DSpace@Cambridge. Or, indeed, of institutional repositories at all! I will attempt a brief and hopefully clear explanation of what they are and why they are important to you.
Basically, institutional repositories (IRs), besides being a mouthful and tricky to type, are digital storage areas freely accessible to anyone online. Although anything that can be digitized can be placed in a repository, current interest centres on research staff posting copies of their journal-published articles in their institution’s repository. Over 90% of peer-reviewed academic journals now allow researchers to post a copy of their published articles in their institution’s repository: so-called ‘self-archiving’. In some cases the copy is the ‘pre-print’, which can link to any later revisions; in other cases it is the final peer-reviewed version.
Because the repositories are freely accessible online, the exciting implication of self-archiving is that almost any published research article can be freely available online via the author’s institutional repository. This is known as ‘open access’ (OA): all would-be users can access any research article irrespective of whether their own institution can afford to subscribe to the journal it was published in.
Like Cambridge University, most academic institutions have set up repositories. The main problem now seems to be getting staff to place material in them - especially getting them to self-archive all their published research articles. For example, most of the material currently in DSpace@Cambridge is archival records - there are very few recently published research articles. In future, it may be made a condition of funding that all Research Councils UK (RCUK) funded research must be self-archived by the author in the institution’s repository.
Indeed, the UK government House of Commons Science and Technology Committee investigated the whole area of access to research published in scientific, technical and medical (STM) journals. Their report, Scientific publications: free for all?, was published in July. It advised the government to oblige UK authors to publish all research on their institutions’ websites or in their repositories.
Many specialist search engines are available, designed to search for articles in repositories across institutions worldwide. For example, OAIster, one of the most comprehensive, now indexes over 3.7m records across 363 institutions, and there are smaller experimental OA citation databases such as Citebase Search. Our Library User and Resource Guide will be covering open access, institutional repositories and IR search engines in the next edition, available in January 2005. The wonderful thing about a search engine such as OAIster is that every single search result is freely accessible online with no restrictions! Others, such as Elsevier’s Scirus scientific search engine, cover some repositories, as indeed do Yahoo! Search (by collaborating with OAIster) and Google, especially its new Google Scholar offshoot (see previous item below).
Google Scholar, or ‘Schoogle’, a new web search service from Google, should prove useful to those of you searching for ‘scholarly’ (i.e. academic) research articles in your studies. Google has isolated a subset of its (recently increased) index of 8 billion pages which it considers to be scholarly material. This means that searches using Google Scholar should exclude thousands of unsuitable search results you may get using regular Google.
Google does not disclose the sources of its data or even the size of the scholarly subset. It has clearly made special arrangements with publishers and other data providers to allow it to access material in passworded subscription-only ‘deep web’ areas. As an indication of the size of the Scholar database, a search for ‘the’ using Google Scholar currently gives 289m results (compared with 8bn for regular Google).
The most interesting thing about Google Scholar seems to be its citation data, which make it an excellent citation database. Google scrapes the complete text of articles, including the citations. Google’s robots seem to be capable of reading and understanding citations - the number of citations is one of the measures used to rank the search results. But, interestingly, Google Scholar actually provides a ‘cited by nnn’ link to a list of citations it knows about for each search result. Citation analysis is nothing new, but the comprehensive services tend to be subscriber-only databases - for example Thomson’s ISI Web of Science (subscribed to by the University), or Elsevier’s recently launched Scopus (not currently subscribed to by the University). Google Scholar, however, is freely available to all online.
Google have been working with the CrossRef organization on a comprehensive journal search project called CrossRef Search which is at pilot stage - try out the CrossRef Search box at Cambridge University Press for instance. CrossRef Search is planned as a freely accessible full text cross journal search service that, for example, any library would be able to include in its website. One cannot help wondering if Scholar is an offshoot of Google’s work on CrossRef Search. (By the way, it is possible to modify the standard Google search URL to restrict it to CrossRef: add “&restrict=crossref” [without quotes] to the URL of a search you have done.)
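For anyone who would rather script the URL tweak than edit the address bar by hand, here is a minimal Python sketch. It only assumes what is described above, namely that adding a restrict=crossref parameter to a Google search URL limits the results to CrossRef; the function name and the example URL are hypothetical, and the parameter may not keep working once the pilot ends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def restrict_to_crossref(search_url: str) -> str:
    """Return search_url with restrict=crossref added to its query string.

    Assumption: the 'restrict' parameter behaves as described in the article.
    """
    parts = urlsplit(search_url)
    # Keep the existing parameters, dropping any earlier 'restrict' value
    # so that the parameter is never duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "restrict"
    ]
    params.append(("restrict", "crossref"))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical example: a search you have already done.
    print(restrict_to_crossref("http://www.google.com/search?q=open+access"))
```

Rebuilding the query string, rather than simply gluing text on to the end, means the result is still well formed if the URL already carries a restrict value.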
Note that, unlike regular Google, you may not be able to access the full text of many of the articles found by Google Scholar. Google stipulates that the abstract and citations must be accessible for every article it indexes, but not necessarily the full text of the article itself. You may be asked by a data provider to pay a small fee to see the full article - but please check with the Library first - we may have the actual journal in hard copy or you may be entitled to access the article for free via UL Electronic Resources. The UL home page now displays a message emphasizing this point. So, if you are accessing on-campus (and Cambridge subscribes to the relevant journal’s online access) you should be able to freely access the full text. Or, off-campus, you can use Athens passwords for those sources allowing it (get your Athens password from the library). Also note that some of Google Scholar’s results (from its citation gathering) will be offline resources - books, for example.
Google Scholar has one unique special syntax - author:[lastname]. The best way to search for a particular article is to use the author syntax plus a phrase (within double quotes). For example, author:einstein “theory of relativity”.
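If you want to build such searches programmatically, the sketch below assembles the author syntax plus a quoted phrase and turns it into a link. The helper names are hypothetical, and the Scholar address with its q parameter is an assumption based on how regular Google search URLs work, not something Google documents for the beta service.

```python
from urllib.parse import urlencode

# Assumed address of the Scholar beta; check it against the live service.
SCHOLAR_URL = "http://scholar.google.com/scholar"


def build_scholar_query(lastname: str, phrase: str) -> str:
    """Combine the author: syntax with an exact phrase in double quotes."""
    return f'author:{lastname.strip().lower()} "{phrase.strip()}"'


def build_scholar_link(lastname: str, phrase: str) -> str:
    """Return a search link for the query (the 'q' parameter is an assumption)."""
    return SCHOLAR_URL + "?" + urlencode({"q": build_scholar_query(lastname, phrase)})


if __name__ == "__main__":
    print(build_scholar_query("Einstein", "theory of relativity"))
    print(build_scholar_link("Einstein", "theory of relativity"))
```

Straight double quotes are used in the query because that is what you type into the search box; the curly quotes in the example above are only typography.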
Google Scholar is currently a ‘beta’ (preview) service. Expect refinements and improvements. Some areas are not well served, conference proceedings for example, and it would help if open access (OA) articles were flagged. Google Scholar is an attractive, easy to use search tool. But it is important to acquire a wide repertoire of search resources - an excellent and vast range of which is free to Cambridge students and staff through UL Electronic Resources.