Open Citation Data and Reference Linking

Citations are of central importance in science, whether in methodology, in research evaluation, or simply in information retrieval. In recent years, Open Science initiatives have succeeded in opening up citation data for general use. What is open citation data? How is it made available? And how can you use it?

Citations are a central element of scholarly work. By referencing literature, sources, or data, researchers link to the state of knowledge, substantiate their argument, enable its verification, and anchor their research in the history of science.

In scientometrics, citations form the basis of bibliometric analyses: the well-known impact factor, which is influential in publishing and in research evaluation, is calculated from citations.

Bibliographic references are, of course, also important resources for information retrieval. Following the literature cited in relevant texts is one of the most effective search strategies, and it becomes even more efficient when citations are available as Linked (Open) Data. (Klein 2017, pp. 127–128; Baykoucheva 2022, pp. 12–15)

For all these reasons, citations are also of great importance under the paradigm of Open Science. In recent years, there have been successful efforts to make citation data accessible comprehensively and without restriction.

I4OC Establishes Open Citations

Until recently, two fee-based services were the main sources of reliable and analyzable data on citations in journal articles: Clarivate’s ‘Web of Science’ and Elsevier’s Scopus. The two databases are important retrieval platforms for many disciplines and have for many years been the basis for calculating the best-known bibliometric indicators. (Schiermeier 2017)

The University Library Zurich and the Zentralbibliothek Zurich license the Web of Science and Scopus services for members of the University of Zurich.

The Web of Science and its predecessors pioneered the linking and analysis of citation data. However, its expensive subscription model prevents many universities and researchers from using the data for scientific innovation and from calculating and verifying metrics. In the interest of reproducible science and a transparent scientometric evaluation of research, the Open Science community has come to the conviction that equal access to bibliographic citation data is necessary. For this reason, various actors began to make their citation data freely accessible.

An important impetus came from the non-profit organization OpenCitations, which provides a data model and infrastructure, namely indexes, for open citation data. Together with Wikidata and other partners, OpenCitations co-founded the ‘Initiative for Open Citations’ (I4OC), launched in 2017. Within a few years, the initiative succeeded in convincing the major scientific publishers to release the reference data of their publications. By now, OpenCitations’ most important citation index, COCI, contains well over one billion citation records. (Shotton 2018)

The University Library Zurich, the Zentralbibliothek Zurich and other Swiss university libraries support OpenCitations through the crowdfunding of the ‘Global Sustainability Coalition for Open Science Services’ (SCOSS).

What is an Open Citation?

In the indexes of OpenCitations, citations are registered not only as links but as independent data units with descriptive properties. These include, for example, the digital identifiers of the citing and cited literature, the creation date of the citation, and whether it is a self-citation. (Baykoucheva 2022, p. 52; Heibi, Peroni, Shotton 2019, p. 1216; Peroni, Shotton 2020, p. 436)

A citation is considered ‘open’ if the citation data are machine-readable; if they are self-contained, that is, usable independently of access to the citing and cited resources; and if they can be retrieved and used without restriction, ideally in the public domain. (Peroni, Shotton 2018, [pp. 3–4])
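To make the idea of a citation as a self-contained data unit more concrete, here is a minimal Python sketch that turns one record from the OpenCitations COCI REST API into a typed object. It assumes the field names of version 1 of that API (`oci`, `citing`, `cited`, `creation`, `timespan`, `journal_sc`, `author_sc`, with the self-citation flags encoded as "yes"/"no"); please check them against the current OpenCitations API documentation. The class and function names and the sample record are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citation treated as a data unit with its own properties."""
    oci: str                     # identifier of the citation itself
    citing: str                  # DOI of the citing work
    cited: str                   # DOI of the cited work
    creation: str                # date of the citation (publication of the citing work)
    timespan: str                # ISO 8601 duration between cited and citing publication
    journal_self_citation: bool
    author_self_citation: bool


def parse_coci_record(record: dict) -> Citation:
    """Convert one JSON object from the (assumed) COCI v1 API into a Citation."""
    return Citation(
        oci=record["oci"],
        citing=record["citing"],
        cited=record["cited"],
        creation=record["creation"],
        timespan=record["timespan"],
        # The assumed API encodes the self-citation flags as "yes"/"no" strings.
        journal_self_citation=record["journal_sc"] == "yes",
        author_self_citation=record["author_sc"] == "yes",
    )


# Hypothetical record, shaped like one item of a COCI v1 response.
example = {
    "oci": "hypothetical-oci",
    "citing": "10.1234/citing.example",
    "cited": "10.1234/cited.example",
    "creation": "2019-05",
    "timespan": "P2Y",
    "journal_sc": "no",
    "author_sc": "yes",
}
citation = parse_coci_record(example)
print(citation.author_self_citation)  # True
```

Because the record carries both identifiers and its own descriptive properties, it can be analyzed without opening either the citing or the cited publication.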

The Crossref Data Hub

Crossref plays a central role in the distribution of references. Crossref is a non-profit organization supported by numerous participating publishers and organizations. It acts as a registration agency for ‘Digital Object Identifiers’ (DOIs) and ensures that publications from different publishers are linked to one another. Participating publishers agree to include DOIs in their reference lists. This process is called ‘reference linking’.

In addition, publishers can include the cited literature in the metadata they deposit with Crossref for registered publications (registering references). This gives them access to the Cited-by service, which establishes a reciprocal link between cited and citing literature and thus also shows how often and where a scholarly text is cited. (Bilder 2016; Tolwinska 2018)
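Deposited references and the resulting citation count can be inspected through the public Crossref REST API. The Python sketch below assumes the `/works/{DOI}` endpoint and the response fields `reference` (each entry optionally carrying a matched `DOI`) and `is-referenced-by-count`; the full list of citing works is provided by the Cited-by service to participating members and is not part of this public response. The function names and the sample message are hypothetical.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Download the Crossref metadata record ("message") for one DOI."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi)
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def summarize_references(message: dict) -> dict:
    """Split deposited references into linked (with DOI) and unlinked ones."""
    references = message.get("reference", [])
    linked = [ref["DOI"] for ref in references if "DOI" in ref]
    return {
        "deposited": len(references),
        "linked_dois": linked,
        "unlinked": len(references) - len(linked),
        "cited_by_count": message.get("is-referenced-by-count", 0),
    }


# Hypothetical message, shaped like the "message" object of a /works response.
sample = {
    "reference": [
        {"key": "ref1", "DOI": "10.1234/cited.one"},
        {"key": "ref2", "unstructured": "Doe, J. (1990). A book without DOI."},
    ],
    "is-referenced-by-count": 3,
}
print(summarize_references(sample))
# {'deposited': 2, 'linked_dois': ['10.1234/cited.one'], 'unlinked': 1, 'cited_by_count': 3}
```

With network access, `summarize_references(fetch_work("..."))` would apply the same summary to a real DOI.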

Application: OA publications and literature research

The services of Crossref are used on a daily basis in applications at the University of Zurich:

The open access publication platform HOPE offers its journals the option of registering the bibliographies of published articles together with the article metadata. Crossref then automatically enriches the bibliographies with DOIs. The citation data are subsequently available for bibliometric analyses, and at the same time the articles published on HOPE become part of the corresponding data networks.
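When a journal registers a bibliography together with the article metadata, each reference travels in the deposit as a separate citation element. The following Python sketch builds only such a citation list with the standard library. It assumes the Crossref deposit schema elements `citation_list`, `citation` (with a `key` attribute), `doi`, and `unstructured_citation`, omits XML namespaces and the rest of the deposit, and uses a hypothetical helper name and sample references; the Crossref schema documentation is authoritative for real deposits.

```python
import xml.etree.ElementTree as ET


def build_citation_list(references: list[dict]) -> ET.Element:
    """Build a namespace-free, Crossref-style citation_list element.

    Each reference dict has a plain-text "text" and, if already known, a "doi".
    References without a DOI are sent as unstructured text only, which
    Crossref can try to match to a DOI on its side.
    """
    citation_list = ET.Element("citation_list")
    for number, ref in enumerate(references, start=1):
        citation = ET.SubElement(citation_list, "citation", key=f"ref{number}")
        if ref.get("doi"):
            ET.SubElement(citation, "doi").text = ref["doi"]
        ET.SubElement(citation, "unstructured_citation").text = ref["text"]
    return citation_list


# Hypothetical references: one with a known DOI, one print-only book without.
refs = [
    {"text": "Doe, J. (2015). An example article. Journal of Examples, 1, 1-10.",
     "doi": "10.1234/example.2015"},
    {"text": "Roe, R. (1990). A print-only book. Example Press."},
]
print(ET.tostring(build_citation_list(refs), encoding="unicode"))
```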

Open and linked citations are also used in literature searching. Linked references enable a level of discovery not previously provided by bibliographic metadata. (Lauscher et al. 2018, p. 109) The library research portal swisscovery offers the ‘Citation Trails’ function within the holdings of the ‘Central Discovery Index’ (CDI), where citing and cited literature are directly linked:

Linked citation data in the swisscovery search area of the ‘Central Discovery Index’ (CDI)
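A citation trail follows these links in both directions: backward to the works a text cites, and forward to the works that cite it. The Python sketch below illustrates the idea as a breadth-first walk over a small in-memory list of citing/cited DOI pairs. It does not describe how swisscovery or the CDI implement the feature; the DOIs and function names are hypothetical.

```python
from collections import defaultdict, deque


def build_index(pairs):
    """Index (citing, cited) pairs in both directions."""
    cites = defaultdict(set)     # work -> works it cites (backward trail)
    cited_by = defaultdict(set)  # work -> works citing it (forward trail)
    for citing, cited in pairs:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by


def citation_trail(start, links, max_depth=2):
    """Breadth-first walk along one link direction, up to max_depth steps."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        work = queue.popleft()
        if seen[work] == max_depth:
            continue
        for neighbour in sorted(links.get(work, ())):
            if neighbour not in seen:
                seen[neighbour] = seen[work] + 1
                queue.append(neighbour)
    del seen[start]
    return seen  # work -> number of steps from the start


# Hypothetical citation links: a cites b, b cites c, d cites a.
pairs = [("10.1234/a", "10.1234/b"), ("10.1234/b", "10.1234/c"),
         ("10.1234/d", "10.1234/a")]
cites, cited_by = build_index(pairs)
print(citation_trail("10.1234/a", cites))     # {'10.1234/b': 1, '10.1234/c': 2}
print(citation_trail("10.1234/a", cited_by))  # {'10.1234/d': 1}
```

Limiting the depth keeps the trail manageable, since citation networks grow quickly with every additional step.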

Gaps and Outlook

By now, a critical mass of available, public-domain citation data has been reached, and making citation data openly available has become established practice. In 2021, Elsevier, the last of the major scientific publishers, also joined I4OC. (Martin-Martin 2021; Hutchins 2021)

However, the existing open data do not cover all scientific disciplines equally, and they show large gaps overall, especially for older publications. Various projects are trying to close these gaps:

OpenCitations, for example, has created an additional citation index that uses crowdsourcing to make previously inaccessible reference lists available. Researchers, editors, and publishers are invited to submit citation data that are not yet available. The legal barriers to doing so are low. (Heibi, Peroni, Shotton 2019)

A possible contribution by libraries is also under discussion: they could, for instance, support the cataloging of citation data. The ‘Linked Open Citation Database’ (LOC-DB) project has designed a workflow in which libraries semi-automatically capture bibliographic references and link them in their metadata. In this way, holdings that exist only in print and not in electronic form can also be included. (Lauscher et al. 2018, p. 110)

Samuel Nussbaum, Team Open Science Services