Pilot Engineering Repository Xsearch

Shared Services Report

M.Moffat (M.Moffat@hw.ac.uk) - Ver 1.0 (19/10/06)

One of the aims of the PerX project is to investigate the practical management and maintenance issues associated with enabling resource discovery across multiple digital repository collections. Associated with this is an objective to analyse the suitability of JISC infrastructural shared services for use within the pilot, and experiment with the incorporation of sufficiently mature services.

In line with this objective, the project aimed to identify relevant infrastructural shared services which form part of the JISC Information Environment [1] and to detail usage scenarios for those shared services which are relevant to the PerX Pilot Cross Search Demonstrator. However, at the present time, shared services are largely either at the conceptual stage or under development. Where a shared service was found to be sufficiently mature, the project trialled it and produced an implementation report to feed back the results.

Identified infrastructural shared services of relevance to PerX included:

  1. Service Registries
  2. Identifier and Resolver Services
  3. Terminology Services
  4. Metadata Schema Registries

Note that Authorisation/Authentication services were considered out of scope because the PerX Pilot Cross Search Demonstrator cross searches only freely available data or metadata. Other possible shared services (e.g. alerting, user preferences, harvesting) are, at the time of writing (Oct 06), not sufficiently mature for consideration.


1. Service Registries

Overview
A Service Registry stores, manages and makes available descriptions of service instances for the benefit of service providers such as PerX or portal-type services. Within the JISC IE, the Information Environment Service Registry (IESR) is a fully functional registry demonstrator which is currently in a 'service-in-development' phase and is funded by JISC until July 2009.

PerX Usage Scenario
A PerX administrator queries the IESR by subject for services relevant to engineering. Collection descriptions in the IESR records returned are used to determine whether the services are appropriate for inclusion in the PerX cross search service. The PerX administrator uses the IESR records to determine which protocols are supported for selected services (e.g. Z39.50, OAI-PMH, SRU/W) and the necessary client configuration information is retrieved from the IESR to enable it to be added to the PerX cross search. PerX software automatically checks the IESR periodically for any changes to service details. The PerX administrator is alerted by the IESR to new engineering services which become available.
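The selection and change-checking steps in this scenario can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ServiceRecord` is a hypothetical, much simplified stand-in for a registry service description, the endpoints are placeholders, and no real IESR interface is used or implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical, simplified stand-in for a registry service description."""
    identifier: str
    subjects: frozenset
    protocol: str
    endpoint: str
    modified: str  # date the record was last changed


# Protocols named in the scenario (SRU/W split into SRU and SRW).
SUPPORTED_PROTOCOLS = {"Z39.50", "OAI-PMH", "SRU", "SRW"}


def select_targets(records, subject):
    """Keep records on the given subject that use a supported protocol."""
    return [
        r for r in records
        if subject in r.subjects and r.protocol in SUPPORTED_PROTOCOLS
    ]


def diff_targets(configured, current):
    """Compare configured targets with a fresh registry snapshot.

    Returns three sorted lists of identifiers: (added, changed, withdrawn).
    """
    old = {r.identifier: r for r in configured}
    new = {r.identifier: r for r in current}
    added = sorted(new.keys() - old.keys())
    withdrawn = sorted(old.keys() - new.keys())
    changed = sorted(i for i in new.keys() & old.keys() if new[i] != old[i])
    return added, changed, withdrawn
```

Run periodically, `diff_targets` would give the administrator a short list of targets to review rather than a silent reconfiguration, which seems the safer choice while the currency of registry records is in doubt.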

IESR Implementation Report (October 2006)

A full IESR Implementation Report is available as a separate document. Its main conclusions are:

  • At the present time, the coverage restrictions of the IESR limit its usefulness.
  • IESR Records are not up to date. IESR records must be current if service providers are to depend on them for accuracy. Arguably an out of date service registry is worse than no registry at all.
  • There are issues regarding the subject classification of resources in the IESR. All IESR records must be adequately classified by subject if service providers are to get lists of appropriate services on a subject basis.
  • In the PerX test implementation, the IESR record for the COPAC Z39.50 service was adequate for enabling target setup in the PerX Pilot, whereas the JORUM IESR OAI-PMH record contained only minimal information and required further work by the service provider.
  • The PerX-IESR interaction was not automated due to concerns with currency of the IESR.
  • Web services may be the ideal means to automate the interaction between the IESR and service providers, but these are not yet available.
  • It was felt that more documentation may be required to help service providers automate their IESR interactions.

Comments from the IESR project team on the PerX Implementation Report are also available.


2. Identifier and Resolver Services

Overview
An Identifier Service "maintains and provides an association between an identifier and some metadata about the identified resource. Typically, an identifier service takes an identifier of a resource and returns a locator for it (usually in the form of a URL)" [2]. Within the JISC IE, the only shared service currently operating in this area is the OpenURL Router based at EDINA. The OpenURL Router works by offering a central registry of UK HE/FE institutions' OpenURL resolvers. An institution registers details of its resolver at the central registry and, once it is registered, any service provider can provide users from that institution with OpenURL links to their resolver. Services such as PerX can use the OpenURL Router to determine a user's institution, and hence their resolver.

PerX Usage Scenario
An end user from a UK HE/FE institution searches the PerX Pilot Service and finds useful resources from a number of collections which are cross searched. PerX software uses the OpenURL router behind the scenes to check the user's institution and to identify that the institution in question does have an OpenURL resolver. The end user is provided with OpenURL links via the PerX Pilot Service which lead them to appropriate copies of identified resources.
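The final step of this scenario, turning a search result into a link to the user's resolver, can be sketched as follows. The sketch assumes the resolver's base URL has already been obtained; the router lookup itself is not shown. The base URL and citation values in the usage note are hypothetical placeholders. The keys follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journal articles.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base, citation):
    """Return an OpenURL 1.0 KEV link for a journal article.

    resolver_base: base URL of the institution's resolver (assumed known).
    citation: dict with optional keys atitle, jtitle, issn, volume, spage, date.
    """
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
    ]
    # Only include the referent fields that the search result actually supplied.
    for key in ("atitle", "jtitle", "issn", "volume", "spage", "date"):
        if citation.get(key):
            params.append((f"rft.{key}", citation[key]))
    separator = "&" if "?" in resolver_base else "?"
    return resolver_base + separator + urlencode(params)
```

For example, `build_openurl("http://resolver.example.ac.uk/openurl", {"atitle": "Fatigue in welded joints", "issn": "1234-5678"})` yields a link that the (hypothetical) institutional resolver could use to locate an appropriate copy.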

Implementation Report
No implementation as yet. Trial implementation is planned for later in the project.


3. Terminology Services

Overview
Terminology Services are perceived to be shared services which offer a range of terminology-related functions, for example mapping a term from one controlled vocabulary to another or expanding terms within a thesaurus. As conceded in the JISC Terminology Services and Technology review [3], terminology services "can be confusing in that they span very different application areas, vocabularies, communities, and can provide quite different kinds of services." The HILT project aims to "research, investigate, pilot, and develop solutions for, problems pertaining to cross-searching multi-subject scheme information environments." In a nutshell, this involves mapping users' search terms to subject based controlled vocabularies in order to improve searching. HILT is currently a JISC project with identified service potential and is funded until January 2007 to provide a machine-to-machine (m2m) demonstrator which offers web services access.

PerX Usage Scenario
The HILT m2m Feasibility Study [4] details a number of possible usage scenarios. At the simplest level a scenario is along the following lines:

An end user types a term into the PerX cross-search box. The term is sent to the HILT Terminology Service to generate a set of additional search terms that can be used to improve the cross search results. The original and derived terms are passed back to PerX and are run against the databases to be cross searched. Results are returned to the end user. The user notices no substantial differences in the result set (apart from, hopefully, a larger number of results) between the non-enhanced query and a query enhanced first via m2m interaction with HILT.
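The flow of this scenario can be sketched as below. Everything here is illustrative: `stub_lookup` is a hypothetical stand-in for the terminology service (the real HILT m2m interface is not reproduced), and `make_target` builds toy in-memory targets in place of real Z39.50, SRU or OAI-based collections.

```python
def expand_terms(term, lookup):
    """Return the original term followed by derived terms, without duplicates."""
    terms = [term]
    seen = {term.lower()}
    for candidate in lookup(term):
        if candidate.lower() not in seen:
            seen.add(candidate.lower())
            terms.append(candidate)
    return terms


def cross_search(term, lookup, targets):
    """Run every expanded term against every target and merge unique hits."""
    results = []
    seen_ids = set()
    for query in expand_terms(term, lookup):
        for search in targets:
            for record_id, title in search(query):
                if record_id not in seen_ids:
                    seen_ids.add(record_id)
                    results.append((record_id, title))
    return results


def stub_lookup(term):
    """Hypothetical terminology lookup; a real service would be called over m2m."""
    table = {"welding": ["Welding", "metal joining", "brazing"]}
    return table.get(term.lower(), [])


def make_target(records):
    """Build a toy search function over (id, title) pairs."""
    def search(query):
        return [(i, t) for i, t in records if query.lower() in t.lower()]
    return search
```

Because the original term is always searched first and hits are de-duplicated by identifier, the enhanced result list begins with exactly the hits the non-enhanced query would have returned, followed by any extra hits from the derived terms.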

Implementation Report
No implementation viable within the timescales of the project.


4. Metadata Schema Registries

Overview
A Metadata Schema Registry is a network service that stores and makes available information about the metadata schemas in use by other services. "The primary intention of this service is to allow portals, brokers and aggregators to automatically determine information about appropriate search terms and the structure of metadata records that will be returned to them." [2]. The JISC Information Environment Metadata Schema Registry (IEMSR) is a project with identified service potential which is currently funded until Sept 2006.

PerX Usage Scenario
IEMSR usage scenarios for PerX are fairly complex. A range of usage scenarios from phase I of the IEMSR project is available on the IEMSR project website. Further usage scenarios from phase II, which will define more clearly the benefits that a pilot registry service would deliver, are not yet apparent on the IEMSR website (Oct 06).

Some speculative ideas for how PerX could use the IEMSR are as follows. PerX harvests data from Data Provider X in a particular format (e.g. oai_dc or LOM). PerX software then queries the IEMSR to retrieve details about Data Provider X's metadata records. The retrieved IEMSR records allow the PerX administrator to determine specific details about the harvested metadata. For example, it might be established that the <dc:subject> field in the harvested oai_dc records uses Dewey Decimal Classification (DDC). This information might allow the PerX administrator to filter the harvested records reliably by subject. Such filtering is often desirable for subject based services, because many multidisciplinary collections do not offer OAI sets on a subject basis; accurately partitioning such collections along subject lines would allow these services to filter out subjects which are not relevant and so become more targeted.

Alternatively, with the knowledge that DDC is being used, PerX software could utilise Terminology Services (such as HILT) to perform a crosswalk to other classification schemes such as Library of Congress (LC). Retrieved IEMSR records may also allow services such as PerX to determine whether it is worth harvesting and processing a seemingly richer metadata format (e.g. LOM rather than MARC) rather than basic oai_dc.

Note that in these scenarios it is crucial that the IEMSR record specifies exactly how Data Provider X's internal application profile (i.e. the metadata used internally by Data Provider X) relates to their exposed metadata formats (e.g. oai_dc or LOM).
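The subject filtering idea above can be illustrated with a short sketch. It assumes, as in the speculative scenario, that a schema registry has confirmed that Data Provider X puts DDC notation in <dc:subject>; the sample record is hypothetical. The prefix "62" corresponds to the DDC 620 range (engineering and allied operations).

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical harvested record, for illustration only.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Hypothetical record</dc:title>
  <dc:subject>621.3</dc:subject>
</oai_dc:dc>"""


def is_engineering(record_xml, ddc_prefix="62"):
    """Return True if any dc:subject value starts with the given DDC prefix.

    Only meaningful when dc:subject is known to hold DDC notation.
    """
    root = ET.fromstring(record_xml)
    for subject in root.iterfind(".//dc:subject", NS):
        notation = (subject.text or "").strip()
        if notation.startswith(ddc_prefix):
            return True
    return False
```

Without the registry's assurance about what <dc:subject> contains, a prefix test like this would be unreliable, since free-text keywords or another classification scheme could appear in the same field.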

Implementation Report
No implementation viable within the timescales of the project.

 

[1] Note that the JISC Information Environment work, in combination with the e-learning framework initiative, is now being taken forward as the e-framework for education and research.

[2] JISC Shared Infrastructure Services Synthesis Study 2006. Available at
http://www.jisc.ac.uk/media/documents/jisc-sis-report-final-2006-09-28.pdf

[3] JISC Terminology Services and Technology review 2006. Available at
http://www.jisc.ac.uk/media/documents/terminology_services_and_technology_review_sep_06.pdf

[4] HILT m2m Feasibility Study 2005. Available at
http://hilt.cdlr.strath.ac.uk/hiltm2mfs/0HILTM2MFinalReportRepV3.1.pdf