Share Family Bulletin - n. 5 May 2022

We are happy to announce the Share-VDE semi-annual workshop for current and former members and for observer institutions, to be held on Monday, June 27th, from 8am to 10am PDT | 11am to 1pm EDT | 5pm to 7pm CEST.

It’s a great pleasure to announce that the meeting will be held in hybrid format, both in person and online. For those who are able to attend in person, the venue is Room LJ119 in the Jefferson Building at the Library of Congress.

Much gratitude goes to the Library of Congress for hosting the meeting.

In the file https://bit.ly/SVDE_workshop_2022-Jun-27 you can find the link to the agenda document that includes the instructions to join remotely.

See below an extensive update on the latest developments towards the production environment for Share-VDE and the other branches of the Share Family, which are supported by the same LOD Platform technology. Since the last update shared via this mailing list a few months ago, important advancements have been made, particularly to the system infrastructure and to the features that leverage the API layer orchestrating the query of SVDE data.

A system that hosts union catalogs, integrating catalogs from many libraries into a rich and articulated structure of linked data entities, needs adequate technological and infrastructural resources to support the resulting complexity and amount of data. For this reason, SVDE has recently adopted AWS (Amazon Web Services), which will better support system scalability, making the system more robust and helping it meet high availability expectations.

Obviously, this further enhancement of infrastructure meant a lot of work in setting up the AWS backbone and integrating it with the pre-existing components, in particular the clustering module and algorithms that are constantly refined and enriched based on the input of the SVDE community itself, particularly through the work of the SEI - Sapientia Entity Identification group.

In light of this, in March we processed an initial set of approximately 42 million bibliographic and authority MARC records, which were converted to create 98 million clusters of linked data entities. This process will be repeated until all components of the system are stable; therefore in the next few weeks we will continue with the data uploads and the final numbers will be much higher.

A similar process is being replicated for the other branches and portals housed by the Share Family, that is: the portal dedicated to National Bibliographies, which hosts the data of the British National Bibliography, with the British Library being the first participating institution in this tenant; the open pool of PCC-quality BIBFRAME data; and the portal of the Kubikat-LOD pilot project, which aggregates the catalogs of the four German art history libraries of the Kubikat group.

The SVDE staff is testing the results of loading the data on the various portals through a structured method that will be useful for making constant optimizations to the functioning of the system. (Please note a caveat: because of this reprocessing work, this week and next you might experience intermittent outages or downtime on the Share Family portals while the data is re-loaded. If some of the links included in this message do not work properly, this is the reason.)

In parallel with the infrastructure improvement, a number of advancements have been developed, including clustering and search algorithm refinements. These enhancements benefit all Share Family tenants, and the new processing run of library catalogues currently in progress is a prerequisite for activating the new functions.

Just to mention some of the most relevant new features involved:

1. Improvements and refinements to the search functions, including:

Publication simple search, including the default simple search on Publications and the ability to filter by Language, Year, Type, Library, Electronic / Print, Auctions / Exhibitions. The default simple search on Publications will be enabled only on some tenants of the Share Family. As soon as it is available, we’ll be able to show you more on this.

The search mechanism and search features like facets are configurable at tenant level.

Support for federated search and integration of data from external sources (e.g. JSTOR for the Kubikat-LOD pilot portal).

2. Initial version of Subject management, including subject strings linked in the Subject tab of the Publication page and the display of concepts; currently this is visible, for example, on the Kubikat-LOD pilot, and will be displayed soon on the other Share Family portals. The current feature will also be improved in future versions of the system.

3. User interface functions that enable connections with local library environments or connected services, including the link to the local OPACs and, in the future, interactive features like circulation request buttons (see, for example, the “Available at” button). Other similar functions are being analysed to further extend the actions available to end users.

4. Links to resources connected with the records (e.g. table of contents, review, publisher website etc.).

5. Data representation in different formats including JSON, MARC, MARCXML, RIS (other linked-data based representation formats are in progress, including JSON-LD, RDF/XML, N-Triples, N3, Turtle, N-Quads, TriX, TriG).

6. Management of the Provenance of library resources: the system can be queried via API to return the bibliographic records of a given Provenance (= institution) connected to an Instance, and also to show which institutions have contributed to a given linked data cluster (through the Provenance itself). This is crucial to support the editing of linked data entities that will be enabled by the JCricket editor, which is under development. See more on Provenance in a recent presentation.
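To make the two Provenance queries in point 6 more concrete, here is a minimal in-memory sketch in Python. It is not the SVDE API: the class names, field names, and institution codes (`Record`, `InstanceCluster`, `LIB_A`, and so on) are hypothetical and only illustrate the idea that each record in an Instance cluster carries the Provenance it came from.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A bibliographic record tagged with its Provenance (hypothetical model)."""
    record_id: str
    provenance: str  # hypothetical institution code, e.g. "LIB_A"


@dataclass
class InstanceCluster:
    """An Instance cluster built from records of several institutions (hypothetical)."""
    cluster_id: str
    records: list[Record] = field(default_factory=list)

    def records_from(self, provenance: str) -> list[Record]:
        # Records of one given Provenance connected to this Instance.
        return [r for r in self.records if r.provenance == provenance]

    def contributors(self) -> list[str]:
        # Institutions that contributed at least one record to this cluster.
        return sorted({r.provenance for r in self.records})


# Illustrative data only; identifiers are made up.
cluster = InstanceCluster(
    "instance-001",
    [
        Record("rec-a", "LIB_A"),
        Record("rec-b", "LIB_B"),
        Record("rec-c", "LIB_A"),
    ],
)

lib_a_records = cluster.records_from("LIB_A")
contributing_institutions = cluster.contributors()
```

In this sketch, `records_from` corresponds to asking for the records of a given Provenance connected to an Instance, and `contributors` to asking which institutions have contributed to a cluster. The real system exposes these queries through its API layer, whose actual endpoints and response shapes are not described here.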

Concerning other work strands, the developments for the JCricket linked data entity editor are moving forward: a lot of work is going on behind the scenes to implement the back-end features that will support the editing of the data in the Cluster Knowledge Base. Examples include federated authentication and authorization to manage JCricket users; an entity status mechanism to govern concurrent editing and cluster variations; and notification management.

Much work is also being done to optimise the general performance of the system and in particular of the indexing process, which is a key part of the data load to the web portals. This is not a trivial task given the very large size of the SVDE tenant, which supports a very large number of clusters created from the original library records. It is critical to have a responsive system and we are doing our best to provide the best user experience possible.

Developments are also ongoing to complete customised skin portals and SVDE localisations, starting from the University of Pennsylvania dedicated portal.

From the community work standpoint, the SEI - Sapientia Entity Identification working group has concluded the work on the definition of the Instance entity and how it should be clustered among the other SVDE entities, and defined the rules to model the relationships between entities in order to relate them appropriately. This represents a significant contribution to the BIBFRAME based SVDE ontology that is also compatible with IFLA LRM.

Moreover, we are very happy with the progress of the National Bibliographies working group. After the initial discussion phase opened in the first months of the group’s work, we are now structuring the use cases for a tenant hosting a collective global catalogue of national bibliographies in linked open data, thanks to the input of the participating libraries. Starting from this input, we have distilled a first set of high-level use cases that the group is analyzing; the next step is for the group to define priority use cases, that is, those that will be transformed into functions that libraries want to be developed first in the National Bibliographies tenant.

It’s also worth mentioning that the involvement in the discussions stemming from the PCC BIBFRAME Data Exchange meeting will be of great value for the advancement of interoperability and cooperation within the community, and for the Share Family itself. This focus on interoperability among BIBFRAME nodes will also be beneficial to the harmonisation of SVDE - LD4P3 interaction, and the connection with the Sinopia environment.

Regarding other Share-VDE connected services, the initial release of the authority control features for MARC-based environments is now available, while we look forward to designing the BIBFRAME-based authority workflows which will complete the picture of authority control services.

This is a lot of information to process, so don’t hesitate to ask for functional, infrastructural or technical details on any of the above.

To provide feedback on the new version of https://svde.org, please report bugs and suggestions through the forum https://forum.svde.org/.

For further information on any of the above, do not hesitate to contact us at [[1]] or consult the SVDE wiki at https://wiki.svde.org, which is the source of information about Share-VDE and the Share Family.