ShareDoc:PublicDocumentation/LODPlatform/EntityEditor

Revision as of 08:33, 25 October 2024

JCricket Entity Editor and Shared Cataloguing Tool

The LOD Platform technology enables institutions to embrace the advantages of linked open data. The Cluster Knowledge Base (CKB, or Entity Knowledge Base) is the result of entity identification and resolution, obtained through clustering processes applied to data from the participating libraries and enriched with external sources. JCricket is the Linked Data Entity Editor developed by the Share-VDE community, a tool designed for collaboratively curating the entity data held in the CKB.

The entity editor JCricket opens new forms of cooperation among institutions. The CKB is conceived as a live source of authoritative data: through the JCricket editor, member institutions engage in collaborative entity management and in a new way of performing authority control. JCricket enables entity curation in accordance with the BIBFRAME ontology (e.g. creation of new entities, entity modification, and the application of entity merge and split functions). The aim is to improve the quality of the Entity Knowledge Base, a live source produced through clustering and daily update processes. JCricket is an application for manual curation that makes it possible to manage bibliographic data in the form of entities. It empowers professional librarians to enhance the quality of the machine-generated output of the MARC to BIBFRAME transformation.

The availability of this tool also enables new data workflows, which are being analysed to support the use of JCricket even outside the Share Family LOD Platform. We are considering several scenarios in which linked data systems (e.g. the Library of Congress Marva or LD4P Sinopia BIBFRAME editors) can operate locally with their own tools while sharing and cooperatively editing linked data resources across different environments using JCricket.

JCricket operates on the Share-VDE CKB - Cluster Knowledge Base (and on the CKB of the other Share Family tenants), which is not a local data storage for a single library, but rather the outcome of complex integration processes that generate entities from local data in traditional formats, such as MARC, and new formats, like RDF. Its optimal application is within a large data pool formed through contributions from multiple libraries, such as Share-VDE; therefore, it does not impact original data that reside in member libraries’ systems, unless libraries want to use ad hoc APIs for entity updates both in SVDE and in their local systems, which is also an option.

In addition, JCricket not only operates on member institutions' data processed by the LOD Platform, but can also serve as a bridge to cross-domain contexts, creating links with third parties such as the FOLIO community and Wikidata and supporting integration with third parties' ILS / LSP.

References

This page provides a high-level description of JCricket functions. All manual functions are also available through corresponding Curation APIs.

For more detailed descriptions and specifications refer to:

the JCricket user manual, dedicated to professional users (librarians) who will use JCricket for shared cataloguing;
the JCricket UX guide, describing the concept and functions of JCricket user interface and user experience;
a presentation summarising the technical aspects of JCricket;
Curation APIs dedicated to Share Family members.

JCricket has been demonstrated at:

BIBFRAME Workshop in Europe 2024 (slides are also available);
ALA Conference 2023, during the Share-VDE Workshop;
LD4 Conference 2023 (slides are also available).

JCricket overview

The LOD Platform system fetches MARC records (or records in other formats) from the institution for ingestion and BIBFRAME conversion; linked data entities are created from this process and stored in the Cluster Knowledge Base (CKB). In this context, JCricket enables librarians with specific authorization levels to refine and enhance the quality of the data. By facilitating human intervention, it acknowledges the inherent limitations of automated processes and draws on the expertise of librarians to improve the accuracy and richness of the shared knowledge base.

JCricket is integrated in the discovery portal web interface, for authenticated users.

Within the knowledge base, each cataloger can work on the dataset, using the Provenance to identify their own data. Through the dedicated JCricket web interface, librarians can engage in a spectrum of operations. Key operations include the ability to edit, update, create, delete properties or invalidate clusters.

Search on the portal: JCricket is integrated into the discovery portal web interface, for authenticated users.

The JCricket user interface allows librarians to merge multiple clusters into one, addressing cases where different clusters should logically belong to the same entity but were produced as different clusters in automated processes. Conversely, librarians can split a cluster into multiple clusters to rectify cluster reconciliation discrepancies. These edits on entities are saved in the system. Any operation performed on the entity is reported on the Logs page, to have a clear idea of the status of the entity/cluster.

JCricket user interface.

Provenance and Prism

To understand how JCricket operates, the foundational concept of Provenance should be mentioned.

In the LOD Platform, a tenant is represented by a set of institutions contributing to the same Cluster Knowledge Base. An institution within a tenant is called Provenance. We use that term because we always want to retain the relationship between entities and the data that originally contributed to their creation. Each record coming from a Provenance contributes in building/enriching one or more clusters. Therefore, a cluster can be seen as a prism where each facet represents data coming from a given provenance. Each cluster maintains a link to the records it originated from.

The concept of Prism and its Provenances.
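The prism idea described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, field names and sample values below are hypothetical and do not reflect the actual CKB data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One property value contributed by one Provenance (hypothetical shape)."""
    provenance: str      # contributing institution, e.g. "LIB_A" (made-up code)
    prop: str            # property name, e.g. "name"
    value: str
    source_record: str   # identifier of the record the value came from


@dataclass
class Prism:
    """A cluster viewed as a prism: one facet per Provenance."""
    cluster_id: str
    statements: list = field(default_factory=list)

    def add(self, statement):
        self.statements.append(statement)

    def facets(self):
        """Group the statements by the Provenance that contributed them."""
        grouped = defaultdict(list)
        for statement in self.statements:
            grouped[statement.provenance].append(statement)
        return dict(grouped)

    def source_records(self):
        """Records the cluster originated from, sorted and without duplicates."""
        return sorted({s.source_record for s in self.statements})


# Two institutions contribute to the same cluster.
prism = Prism("C1")
prism.add(Statement("LIB_A", "name", "Austen, Jane", "libA-001"))
prism.add(Statement("LIB_B", "name", "Jane Austen", "libB-917"))
prism.add(Statement("LIB_B", "birthDate", "1775", "libB-917"))
```

Keeping the Provenance and the source record on every statement is what lets the cluster retain its link to the original data, whichever facet is later edited.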

The image below shows a very high-level picture of the JCricket flow, as applied to a LOD Platform tenant with multiple institutions (= Provenances):

original data flows into the LOD Platform from libraries, institutions and third-party sources (e.g. VIAF, ISNI, FAST);
the Cluster Knowledge Base contains the integrated/clustered/enriched entities converted to linked data;
the entity discovery portal (e.g. svde.org) shows end users the result of this process: data can be searched on the portal;
data is manually edited through JCricket;
the entity discovery portal shows the result of manual editing and data curation.

The Big Picture: from Genesis to Edit.

The concept of Provenance as applied to the LOD Platform processes has been illustrated in the presentation Share-VDE perspective on Cluster Knowledge Base and Provenance.

JCricket v.1.0 features recap

AAA: Authentication + Authorization + Auditing:
- integrated in the discovery portal web interface, for authenticated users;
- user types: basic and advanced.
Cluster Status API: captures the status of an entity/cluster.
Edit function:
- real time notifications (through GraphQL subscriptions) about cluster property changes.

Merge function: C1, C2, C3 => C1 (C2 and C3 are invalidated)
- Multiple phases: create the merge list, edit the merge list, edit clusters, request for review, approve (or deny the merge).

Split function: C1 => C1, C2
- Multiple phases: create the split-set, edit the split-set, edit clusters, request for review, approve (or deny the split).

Create function
Delete function (Invalidate):
- the Knowledge Base does not perform deletes by removing data. Instead, Prisms are invalidated, meaning they are marked with tombstones. Invalidated Prisms are no longer searchable in the public portal; however, they can still be managed through the JCricket API.
Dictionary API: What are the available cluster types? Which attributes belong to a cluster type? Which relationships? Given an attribute, which is its cardinality? Is it mandatory or not?
Review workflow: merge and split operations are reviewed by advanced editors.
Entity Event Log: tracks the history of changes.
User notifications: for managing the merge/split review lifecycle.
Data changes synchronization across storages (e.g. RDF Store, Search Engine, RDBMS).
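The Delete (Invalidate) behaviour listed above follows the general tombstone pattern, which can be sketched as follows. This is a minimal in-memory illustration, not the JCricket API: the `KnowledgeBase` class, its method names and the record layout are all assumptions made for the example.

```python
from datetime import datetime, timezone


class KnowledgeBase:
    """Toy store that invalidates Prisms instead of removing them (hypothetical)."""

    def __init__(self):
        self._prisms = {}

    def put(self, prism_id, label):
        # A live Prism carries no tombstone.
        self._prisms[prism_id] = {"label": label, "invalidated_at": None}

    def invalidate(self, prism_id):
        # "Delete" only records a tombstone timestamp; the data stays in place.
        self._prisms[prism_id]["invalidated_at"] = datetime.now(timezone.utc)

    def public_search(self, text):
        # The public portal only sees Prisms without a tombstone.
        return sorted(
            prism_id
            for prism_id, prism in self._prisms.items()
            if prism["invalidated_at"] is None
            and text.lower() in prism["label"].lower()
        )

    def get(self, prism_id):
        # Curation tools can still read (and manage) an invalidated Prism.
        return self._prisms[prism_id]


kb = KnowledgeBase()
kb.put("C1", "Austen, Jane")
kb.put("C2", "Austen, J.")
kb.invalidate("C2")
```

After the call to `invalidate`, a public search for "austen" in this sketch returns only `C1`, while `kb.get("C2")` still returns the tombstoned Prism.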

Manual curation - Edit function

The Edit function is available for JCricket editors to add, remove and amend attributes, relationships and links belonging to a single Entity.

If the user is a basic editor, only the properties coming from the user’s provenance will be editable. If the user is an advanced editor, the whole Prism (meaning all properties) will be editable.

Edit function.
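The rule for basic and advanced editors can be expressed as a simple filter over the properties of a Prism. The sketch below assumes hypothetical `Property` and `Editor` types and the role strings "basic" and "advanced"; it is not taken from the JCricket code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    value: str
    provenance: str   # institution that contributed this property


@dataclass(frozen=True)
class Editor:
    username: str
    role: str         # assumed values: "basic" or "advanced"
    provenance: str   # institution the editor belongs to


def editable_properties(editor, prism_properties):
    """Return the properties of a Prism that the given editor may change."""
    if editor.role == "advanced":
        # Advanced editors can edit the whole Prism.
        return list(prism_properties)
    if editor.role == "basic":
        # Basic editors can only edit what their own Provenance contributed.
        return [p for p in prism_properties if p.provenance == editor.provenance]
    raise ValueError(f"unknown role: {editor.role!r}")


properties = [
    Property("name", "Austen, Jane", "LIB_A"),
    Property("name", "Jane Austen", "LIB_B"),
    Property("birthDate", "1775", "LIB_B"),
]
basic = Editor("anna", "basic", "LIB_A")
advanced = Editor("bruno", "advanced", "LIB_B")
```

In this example the basic editor from `LIB_A` gets one editable property, while the advanced editor gets all three regardless of Provenance.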

Manual curation - Merge function

The Merge function is available for users with the “advanced editor” role to merge one or more source Entities into one, picking the source Cluster(s) properties that must be ported.

The user picks two or more Clusters to merge, then designates the destination one. The remaining Clusters are automatically marked as “source”. The destination and source Clusters are marked with the dedicated statuses “MD” (Merge Destination) and “MS” (Merge Source) respectively.

After that, the user can choose which properties to copy to the destination Entity.

After picking all the properties to put in the destination Cluster, the user confirms the merge and requests a review action by designating a reviewer. The destination Cluster shifts its status to “RN” (Review Needed).

The reviewer may approve the merge; at that point the destination Cluster shifts its status to “SV” (Saved), while source Clusters acquire the status “IN” (Invalidated).

Invalidated Clusters will remain in the system, but they won’t be indexed anymore, i.e. they will not appear in search results.

The reviewer may instead reject the merge; in that case the destination Cluster shifts back to the “MD” status, and the reviewer provides rejection notes to guide the editor.

Merge function.
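The merge lifecycle described above amounts to a small state machine over the status codes MD, MS, RN, SV and IN. The following sketch is one possible reading of it: the `Cluster` and `MergeRequest` classes are hypothetical, and the initial status “SV” for clusters that are not yet part of a merge is an assumption, since the text does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: str
    status: str = "SV"   # assumption: clusters start out as Saved
    properties: dict = field(default_factory=dict)


class MergeRequest:
    """Tracks one merge from the merge list to approval or rejection."""

    def __init__(self, clusters, destination_id):
        if len(clusters) < 2:
            raise ValueError("a merge needs at least two clusters")
        by_id = {c.cluster_id: c for c in clusters}
        if destination_id not in by_id:
            raise ValueError("the destination must be one of the picked clusters")
        self.destination = by_id[destination_id]
        self.sources = [c for c in clusters if c.cluster_id != destination_id]
        self.destination.status = "MD"        # Merge Destination
        for source in self.sources:
            source.status = "MS"              # Merge Source
        self.reviewer = None
        self.rejection_notes = None

    def copy_property(self, source_id, name):
        # The editor chooses which source properties are ported.
        source = next(s for s in self.sources if s.cluster_id == source_id)
        self.destination.properties[name] = source.properties[name]

    def request_review(self, reviewer):
        self.reviewer = reviewer
        self.destination.status = "RN"        # Review Needed

    def approve(self):
        self._require_pending_review()
        self.destination.status = "SV"        # Saved
        for source in self.sources:
            source.status = "IN"              # Invalidated, no longer indexed

    def reject(self, notes):
        self._require_pending_review()
        self.destination.status = "MD"        # back to Merge Destination
        self.rejection_notes = notes

    def _require_pending_review(self):
        if self.destination.status != "RN":
            raise RuntimeError("no review is pending for this merge")
```

A rejected merge leaves the source clusters in “MS” in this sketch, so the editor can adjust the chosen properties and request a new review.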

Manual curation - Split and Create functions

The Split and Create function is available for users with the “advanced editor” role to let them move one or more properties between two clusters.

The user picks the “Giver” and “Receiver” Clusters. The giver Cluster takes the “SG” (Split Giver) status, while the receiver takes the “SR” (Split Receiver) status.

The user can then choose the giver’s properties to be moved to the receiver.

When satisfied with the choice, the user confirms the split and requests a review action by designating a reviewer. The receiver Cluster shifts its status to “RN” (Review Needed).

The reviewer may approve the split; at that point the giver and receiver Clusters shift their status to “SV” (Saved).

The reviewer may instead reject the split; in that case the receiver Cluster shifts back to the “SR” (Split Receiver) status, and the reviewer provides rejection notes to guide the editor.

Split function.
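The split steps can be sketched in a table-driven style, with the receiver's status transitions listed explicitly. The dictionary layout of a cluster, the function names and the action names below are assumptions made for illustration only.

```python
# Allowed status changes of the receiver Cluster, keyed by (status, action).
RECEIVER_TRANSITIONS = {
    ("SR", "request_review"): "RN",   # Split Receiver -> Review Needed
    ("RN", "approve"): "SV",          # Review Needed  -> Saved
    ("RN", "reject"): "SR",           # Review Needed  -> Split Receiver
}


def start_split(giver, receiver):
    """Mark the two clusters picked by the user."""
    giver["status"] = "SG"            # Split Giver
    receiver["status"] = "SR"         # Split Receiver


def move_property(giver, receiver, name):
    """Move one property from the giver to the receiver."""
    receiver["properties"][name] = giver["properties"].pop(name)


def apply_review_action(giver, receiver, action):
    """Apply request_review, approve or reject to a split in progress."""
    key = (receiver["status"], action)
    if key not in RECEIVER_TRANSITIONS:
        raise ValueError(f"{action!r} is not allowed in status {receiver['status']!r}")
    receiver["status"] = RECEIVER_TRANSITIONS[key]
    if action == "approve":
        # On approval the giver is saved together with the receiver.
        giver["status"] = "SV"


giver = {"id": "C1", "status": "SV", "properties": {"name": "Austen", "isbn": "123"}}
receiver = {"id": "C2", "status": "SV", "properties": {}}
start_split(giver, receiver)
move_property(giver, receiver, "isbn")
apply_review_action(giver, receiver, "request_review")
apply_review_action(giver, receiver, "approve")
```

Listing the transitions in one table makes it easy to see that approval and rejection are only possible while a review is pending.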

Dashboard, Review and Log

Editor users have access to a personal dashboard that displays information relevant to them. At present, the dashboard contains a ‘notifications’ pane that shows notifications about merges and splits for which a review has been requested, or that have been approved or rejected.

The screenshot shows the dashboard page, displayed as part of the ‘dashboard’ view in the site navigation; from there it is easy to switch to your most recent activities in other navigation tabs.

JCricket dashboard.

The Review and Entity Event Log functions complete the data curation flow:

Review workflow: merge and split operations are reviewed by advanced editors.
Entity Event Log: tracks the history of changes.
