Last edited one month ago
by Anna Lionetti

The LOD Platform Technology

Revision as of 15:20, 22 February 2024 by Anna Lionetti


The LOD (Linked Open Data) Platform is a highly innovative technological framework: an integrated ecosystem for managing bibliographic, archival and museum catalogues and converting them to linked data according to the BIBFRAME 2.0 ontology (https://www.loc.gov/bibframe/docs/bibframe2-model.html), extensible as needed for specific purposes.

The core of the LOD Platform was designed in the EU-funded ALIADA project, with the aim of creating a scalable, configurable framework that adapts to ontologies from different domains and automates the entire process of creating and publishing linked open data, regardless of the source data format.

The aim of this framework is to open up the possibilities of linked data to libraries, archives and museums, providing greater interoperability, visibility and availability for all types of resources.

Applying the LOD Platform requires a careful analysis of the standards, formats and models used by the target institution. Its coverage, based on BIBFRAME 2.0 as the core ontology, can be enriched with a suite of additional ontologies and vocabularies, such as Schema.org, PROV-O, MADS, RDFS, the LC vocabularies and the RDA vocabularies. The framework is flexible enough to implement further ontologies, vocabularies and modelling according to specific needs.

By incorporating standards, models and technologies recognized as key elements for the creation of new processes of management and use of knowledge, the LOD Platform allows:

  • creation of a data structure based on the Agent, Work, Instance, Item and Place entities defined by BIBFRAME, extensible to reconcile other entities;
  • data enrichment through connections to external data sources;
  • reconciliation and clustering of the entities created from the original data;
  • conversion of the data to RDF (Resource Description Framework), the standard model recommended by the W3C for linked open data;
  • delivery of the converted and enriched data to the target institution for reuse in its own systems;
  • publication of the dataset as linked data in an RDF store (triplestore);
  • creation of a discovery portal with a web user interface based on BIBFRAME or on other ontologies defined in specific projects.

High-level steps

In the implementation of a system that uses the LOD Platform, data from libraries, archives and museums are transformed into linked data through entity identification, reconciliation and enrichment processes.

Attributes are used to uniquely identify a person, work or other entity, and variant forms are reconciled into a cluster of data referring to the same entity. The data are then further reconciled with, and enriched from, external sources to create a network of information and resources. The result is an open relationship database and Cluster Knowledge Base (CKB) in RDF.
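As an illustration of the clustering idea only, the sketch below groups heading strings into clusters when they share a normalized name key or an external identifier, using a small union-find. The record shape (`name`, optional `viaf`), the normalization rule and the function names are assumptions made for this example; they are not the LOD Platform's actual matching logic, which relies on richer attributes.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name: str) -> str:
    """Hypothetical match key: strip accents and punctuation, lower-case, sort tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.findall(r"[a-z0-9]+", no_accents.lower())
    return " ".join(sorted(tokens))


def cluster_headings(records: list[dict]) -> list[list[str]]:
    """Cluster records that share a normalized name or an external ID (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: dict[str, int] = {}
    for i, rec in enumerate(records):
        keys = ["name:" + normalize(rec["name"])]
        if rec.get("viaf"):
            keys.append("viaf:" + rec["viaf"])
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two clusters
            else:
                first_seen[key] = i

    clusters: dict[int, list[str]] = defaultdict(list)
    for i, rec in enumerate(records):
        clusters[find(i)].append(rec["name"])
    return list(clusters.values())


if __name__ == "__main__":
    # "12345" is a made-up identifier, not a real VIAF record.
    records = [
        {"name": "Eco, Umberto", "viaf": "12345"},
        {"name": "Umberto Eco"},
        {"name": "Eco, U.", "viaf": "12345"},
        {"name": "Calvino, Italo"},
    ]
    print(cluster_headings(records))
    # → [['Eco, Umberto', 'Umberto Eco', 'Eco, U.'], ['Calvino, Italo']]
```

Name-only matching is deliberately naive here: two different people with the same name would be merged, which is why real reconciliation weighs additional attributes (dates, roles, linked works) before clustering.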

The database follows semantic web paradigms but lets the target institution manage its data independently. It provides:

  • enrichment of data with URIs, both for the original library records and for the output linked data entities; sources for URI enrichment include ISNI, VIAF, FAST, GeoNames, LC Classification, LCSH, LC NAF and Wikidata;
  • conversion of data to RDF using the BIBFRAME vocabulary and other ontologies;
  • creation of a virtual discovery platform with a web user interface;
  • creation of a database of relationships and clusters accessible in RDF through a triplestore;
  • tools for direct interaction with the data, enabling validation, updating, long-term control and maintenance of the clusters and of the URIs identifying the entities (see below);
  • batch/automated data updating procedures;
  • batch/automated data dissemination to libraries;
  • progressive implementation of additional workflows, such as APIs for integrated library systems (ILS), back-conversion for local acquisition and administration systems, and reporting.
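URI enrichment can be pictured as adding links from a local entity to the matching records in external authorities. The minimal sketch below emits such links as N-Triples. Using `owl:sameAs` as the linking property is an assumption made for illustration (the platform may use other properties), the URI patterns cover only three sources, and the entity URI and identifiers in the example are placeholders.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# URI patterns of a few external authorities (assumed mapping for this sketch).
URI_PATTERNS = {
    "isni": "https://isni.org/isni/{}",
    "viaf": "http://viaf.org/viaf/{}",
    "wikidata": "http://www.wikidata.org/entity/{}",
}


def same_as_triples(entity_uri: str, external_ids: dict[str, str]) -> list[str]:
    """Return N-Triples lines linking a local entity URI to external authority URIs."""
    lines = []
    for source, identifier in sorted(external_ids.items()):
        pattern = URI_PATTERNS.get(source)
        if pattern is None:
            continue  # unknown source: skip rather than guess a URI pattern
        target = pattern.format(identifier)
        lines.append(f"<{entity_uri}> <{OWL_SAME_AS}> <{target}> .")
    return lines


if __name__ == "__main__":
    # Placeholder identifiers, not real authority records.
    for line in same_as_triples("https://example.org/agent/1",
                                {"wikidata": "Q123", "viaf": "12345", "local": "x"}):
        print(line)
```

Skipping unknown sources keeps the output limited to links whose URI pattern is known, instead of minting URIs that might not resolve.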

The goal is to bring out the richness of existing collections by revealing the large amount of data that often remains hidden or unexpressed in closed silos (“containers”).

Benefits

The LOD Platform, designed around the principle of functionality, provides several environments and interfaces for creating and enriching data, with workflows that respond to the different needs of librarians, archivists, museum operators, other professionals, scholars, researchers and participating students.

There are several advantages:

  • integration of the processes of a collaborative environment with local systems and tools;
  • integration into the semantic web while maintaining ownership and control of the data, benefiting from the simplified administration of the environment and a large pool of data;
  • integration of library/archive/museum data into the collaborative environment and pool of data;
  • standards and infrastructures for "future-proof" data, i.e. data compatible with the structure of linked data and the semantic web;
  • enrichment of data with further information and relationships not previously expressed in the established metadata formats in use (e.g. MARC), increasing the possibilities of discovery for all types of resources;
  • an environment useful to both end users and professionals (librarians, archivists, museum operators);
  • wider, direct interaction with linked data entities for librarians, who can edit them through the Cluster Knowledge Base Editor (more details in the next section);
  • advanced search interfaces that improve the user experience and return broader search results;
  • disclosure of data that would otherwise remain hidden in silos, giving end users access to a large amount of information that the library can both import and export.

This approach fully harnesses the potential of linked data, connecting library information to the advantage of scholars, patrons and all library users in a dynamic research environment that unlocks new ways of accessing knowledge.

Added values

Notably, the LOD Platform is currently being enhanced with a module for editing and updating entities in the Cluster Knowledge Base (CKB). This Cluster Knowledge Base editor, named JCricket, is conceived as a collaborative environment with different levels of access to and interaction with the data. It enables several manual and automatic actions on the clusters of entities stored in the database, including creating, modifying and merging clusters of works, agents, etc.

JCricket consists of two main layers:  

  1. automatic checks and updates of the data, performed by the LOD system;
  2. manual checks and edits of the data, performed by the user through a web interface.
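To give a flavour of what the automatic layer might check, here is a hypothetical sketch of two consistency rules over clusters: each cluster should have exactly one authorized form, and a form shared by several clusters is flagged as a merge candidate for manual review. The cluster structure (`uri`, `authorized`, `variants`) and the rules themselves are assumptions, not JCricket's documented behaviour.

```python
from collections import defaultdict


def check_clusters(clusters: list[dict]) -> list[str]:
    """Return problems found in clusters shaped like
    {"uri": str, "authorized": list[str], "variants": list[str]} (assumed shape)."""
    problems = []
    owners: dict[str, set[str]] = defaultdict(set)  # form -> URIs of clusters using it
    for c in clusters:
        n = len(c["authorized"])
        if n != 1:
            problems.append(f"{c['uri']}: {n} authorized forms (expected 1)")
        for form in c["authorized"] + c["variants"]:
            owners[form].add(c["uri"])
    # A form that appears in more than one cluster hints at a possible duplicate.
    for form, uris in sorted(owners.items()):
        if len(uris) > 1:
            problems.append(f"'{form}' shared by {', '.join(sorted(uris))}: review for merge")
    return problems


if __name__ == "__main__":
    sample = [
        {"uri": "ex:agent/1", "authorized": ["Eco, Umberto"], "variants": ["Umberto Eco"]},
        {"uri": "ex:agent/2", "authorized": [], "variants": ["Umberto Eco"]},
    ]
    for problem in check_clusters(sample):
        print(problem)
```

In this split, the automatic layer only reports candidates; the decision to merge stays with the user in the manual layer.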

All changes to entities, both automatic and manual, are recorded in the Entity Registry, a source (also available in RDF) that tracks the updates to each entity, especially those that affect the persistent entity URI.
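The key property of the Entity Registry is that a persistent URI is never silently lost. The following sketch, built on assumptions (the class, its methods and the event fields are all hypothetical), logs every change with its origin (automatic or manual) and keeps the URI of a merged cluster as a redirect to the surviving one, so that old links still resolve.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EntityRegistry:
    events: list[dict] = field(default_factory=list)
    redirects: dict[str, str] = field(default_factory=dict)  # retired URI -> successor

    def record(self, uri: str, action: str, source: str, detail: str = "") -> None:
        """Append a change event; `source` is e.g. "automatic" or "manual"."""
        self.events.append({
            "uri": uri, "action": action, "source": source, "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def merge(self, kept_uri: str, merged_uri: str, source: str) -> None:
        """Merge two clusters; the merged URI becomes a redirect instead of disappearing."""
        self.redirects[merged_uri] = kept_uri
        self.record(merged_uri, "merged", source, detail=f"now resolves to {kept_uri}")
        self.record(kept_uri, "absorbed", source, detail=merged_uri)

    def resolve(self, uri: str) -> str:
        """Follow redirects to the current URI, guarding against cycles."""
        seen = set()
        while uri in self.redirects:
            if uri in seen:
                raise ValueError(f"redirect cycle at {uri}")
            seen.add(uri)
            uri = self.redirects[uri]
        return uri

    def history(self, uri: str) -> list[dict]:
        return [e for e in self.events if e["uri"] == uri]


if __name__ == "__main__":
    reg = EntityRegistry()
    reg.merge("ex:agent/1", "ex:agent/2", source="manual")
    reg.merge("ex:agent/3", "ex:agent/1", source="automatic")
    print(reg.resolve("ex:agent/2"))  # → ex:agent/3
```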

High-level workflow of the Cluster Knowledge Base editor

A further added value is the ability of the LOD Platform to interact directly with external data sources such as ISNI and Wikidata. The interaction with Wikidata is currently under analysis; it will be triggered from the CKB editor itself, allowing users to search Wikidata from the editor and to enrich the LOD Platform entities with information from Wikidata, and vice versa. In this way, the editor will allow new identifiers to be created both in the external data sources (where possible or applicable) and in the Cluster Knowledge Base.
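As a sketch of what a search from the editor into Wikidata could look like, the code below builds a request to the public Wikidata API action `wbsearchentities` and extracts candidate entities from its JSON response. How the CKB editor actually queries Wikidata is not documented here; the helper names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_url(term: str, language: str = "en", limit: int = 5) -> str:
    """Build a wbsearchentities request for Wikidata items matching `term`."""
    params = {"action": "wbsearchentities", "search": term, "language": language,
              "type": "item", "limit": limit, "format": "json"}
    return f"{WIKIDATA_API}?{urlencode(params)}"


def candidates(response_text: str) -> list[tuple[str, str, str]]:
    """Extract (id, label, description) for each hit in the JSON response."""
    data = json.loads(response_text)
    return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
            for hit in data.get("search", [])]


def fetch(term: str) -> list[tuple[str, str, str]]:
    """Live query (needs network access); Wikimedia asks clients to send a User-Agent."""
    req = Request(search_url(term), headers={"User-Agent": "lod-sketch/0.1 (example)"})
    with urlopen(req, timeout=10) as resp:
        return candidates(resp.read().decode("utf-8"))
```

The candidates would then be shown to the user, who picks the right item before its information is merged into the entity.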

Results from a query on Wikidata displayed on the editor interface: the editor is ready to enrich the entity with Wikidata information that will be saved in the Entity Knowledge Base.

How the LOD Platform works

The developed components and tools aim to create a useful environment for knowledge management, with advanced search interfaces to improve the user experience and provide wider results to libraries, archives, museums and their users:

  • Authify: a RESTful module that provides search and full-text services over external datasets (downloaded, stored and indexed in the system), mainly authority files (VIAF, Library of Congress Name Authority File, etc.), and that can be extended to other types of datasets. It consists of two main parts: a Solr infrastructure that indexes the datasets and exposes the related search services, and a logical layer that orchestrates these services to find matches within the entity clusters.
  • Entity Knowledge Base: stored in a PostgreSQL database, it is the result of the data processing and enrichment procedures, using external data sources, for each entity; typically, clusters of agents (authorized and variant forms of the names of persons, corporate bodies and families) and clusters of titles (authorized access points and variant forms of the titles of works). The Cluster Knowledge Base, also called Entities Knowledge Base, also contains other entities produced through identification and clustering processes (such as places, roles and languages).
  • RDFizer: a RESTful module that automates the entire process of converting data to RDF according to the BIBFRAME 2.0 ontology and publishing it, in a linear and scalable way. It is flexible and can be adapted to multiple situations, managing the classes and properties not only of BIBFRAME but also of other ontologies as needed.
  • Triple store: the LOD Platform can currently be integrated with two types of triplestore: an open-source one (Blazegraph), more suitable for small or medium-sized projects (up to about 2,000,000 bibliographic records), and a commercial one supporting RDF and SPARQL (Amazon Neptune), more suitable for larger datasets. Neptune is a valid alternative because it is integrated with the Amazon Web Services infrastructure already used for the whole system, and the whole LOD Platform has already migrated to Neptune; this solution can therefore provide better performance.
  • Discovery portal: data presentation portal, for retrieving and browsing data in a user-friendly discovery interface.
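To make the RDFizer's role concrete, the sketch below turns one simplified record into BIBFRAME 2.0 triples (a Work with its title and contributor, and an Instance linked to the Work), serialized as N-Triples. The input record shape, the base URI, the blank-node labels and the choice of `bf:Person` for the contributor are assumptions for illustration; the real RDFizer works from full source formats and a configurable mapping.

```python
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    return f"<{value}>"


def lit(value: str) -> str:
    """N-Triples string literal with backslash, quote and newline escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def rdfize(record: dict, base: str = "https://example.org/") -> str:
    """Map a hypothetical flat record to BIBFRAME Work/Instance triples."""
    rid = record["id"]  # assumed alphanumeric, so it is safe in blank-node labels
    work = iri(f"{base}work/{rid}")
    instance = iri(f"{base}instance/{rid}")
    agent = iri(f"{base}agent/{record['agent_id']}")
    title, contrib = f"_:title{rid}", f"_:contrib{rid}"
    triples = [
        (work, iri(RDF_TYPE), iri(BF + "Work")),
        (work, iri(BF + "title"), title),
        (title, iri(RDF_TYPE), iri(BF + "Title")),
        (title, iri(BF + "mainTitle"), lit(record["title"])),
        (work, iri(BF + "contribution"), contrib),
        (contrib, iri(RDF_TYPE), iri(BF + "Contribution")),
        (contrib, iri(BF + "agent"), agent),
        (agent, iri(RDF_TYPE), iri(BF + "Person")),
        (agent, iri(RDFS_LABEL), lit(record["agent_name"])),
        (instance, iri(RDF_TYPE), iri(BF + "Instance")),
        (instance, iri(BF + "instanceOf"), work),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(rdfize({"id": "42", "title": "Il nome della rosa",
                  "agent_id": "7", "agent_name": "Eco, Umberto"}))
```

Output in this form could then be bulk-loaded into a triplestore such as Blazegraph or Neptune, both of which accept N-Triples.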