IESR Implementation Report (October 2006)
PerX aimed to use the Information Environment Service Registry (IESR) to obtain up-to-date target information in order to set up and maintain search targets for the Pilot Cross Search service.
Ideally this would allow the PerX pilot service to:
- Get a list of all of the Engineering Targets relevant to PerX from the registry.
- Get details of the means of interoperability of each target (i.e. Z39.50, OAI-PMH, SRW).
- Use that information to set up the targets in the PerX Pilot.
And on a continuing basis:
- Continue to monitor for any updates to the list of current targets.
- Continue to monitor for any new targets which are relevant to engineering.
However, an initial manual trawl of the IESR in December 2005 revealed that there were very few IESR records relevant to engineering, and therefore very little overlap with the proposed PerX target list. For example, although the PerX team had identified a number of OAI-PMH repositories covering engineering, none were found to have been included in the IESR.
As indicated on the IESR website the registry is limited in coverage as it contains only a "selected set of electronic resources within the JISC Information Environment".
Thus the following plan of action was established:
- Identify/Add a small sample of suitable targets.
- Retrieve and Analyse the usefulness of IESR Records.
- Automate the PerX-IESR interaction to enable ongoing maintenance of selected targets.
2. Identification/Addition of Suitable Targets
In December 2005 a search via the IESR Web interface using the term 'engineering' in the subject field resulted in 5 hits: Inspec, Statistical Accounts of Scotland, CrossFire, Gmelin and Zetoc. However, a number of other services which are relevant to engineering and are included in the IESR were not returned, for example EEVL, JORUM and COPAC. This raises issues with regard to the subject classification used within the IESR. Services must be appropriately classified if service providers are to get lists of appropriate services on a subject basis. In this case, one of the services not returned via a subject search clearly has its main focus on engineering (EEVL), while the others (JORUM and COPAC) are multidisciplinary in nature but have sufficient engineering content to be of relevance.
It was decided to use the IESR records for COPAC and JORUM for the purposes of the PerX implementation study, as these represented two different machine-to-machine (m2m) interfaces which were already being used by PerX, namely Z39.50 and OAI-PMH.
The IESR COPAC record was found to contain details of the COPAC Z39.50 service. However, the JORUM record contained no details of the JORUM OAI-PMH service, despite this interface being available. This raises issues regarding the currency of IESR records: if service providers are to depend on the IESR to obtain details of service interfaces, IESR records must be current and regularly updated. After contacting IESR and JORUM staff, it was confirmed that the JORUM IESR record was 'terribly out of date', and it was subsequently amended to include details of the OAI-PMH service. Further evidence that IESR records are not up to date is that, as of September 2006, no record exists for the Intute service, which was launched in mid 2006, while records still exist for each of the individual component hubs (SOSIG, PSIgate, EEVL, HUMBUL etc.), which have since been decommissioned.
It is possible that allowing trusted services (e.g. PerX) to add, modify and update IESR records would reduce the latency in the updating of records. It could be argued that services which intend to consume IESR records have the greatest interest in ensuring that they are up to date.
3. Retrieval and Analysis of IESR Records
From the four available IESR interfaces (Web Search, Z39.50, OAI-PMH, OpenURL Link-to Resolver) provided at http://iesr.ac.uk/use/, Z39.50 access was initially chosen to retrieve the technical details for the COPAC and JORUM targets. Subsequently it was decided also to investigate the Web Search and OAI-PMH interfaces in order to determine whether these could provide any richer technical information on the targets.
3.1 Retrieval of IESR records via Z39.50
A search for COPAC using the Z39.50 Access Interface returned the XML file listed in Appendix A. In this file three main elements were relevant to setting up COPAC as a PerX target:
Mandatory information to establish a connection with COPAC Z39.50 service.
The XML file provided in the <iesr:interface> element is listed in Appendix B. This file has sufficient information to set up the COPAC Z39.50 service as a PerX target.
Service help information, of use if the <iesr:interface> element does not provide information on the attributes needed for properly searching COPAC via Z39.50.
A further element of potential interest was:
c. 33 million records
This gives an idea of how long a search of COPAC via Z39.50 may take, so that timeouts can be set without cutting searches off unnecessarily.
A search for JORUM via the IESR Z39.50 Interface returned the XML file listed in Appendix C. In this file only one element was found to be of real relevance to setting up the JORUM OAI-PMH service as a PerX target:
This is the base URL of the JORUM OAI repository.
One further element of potential interest was:
Available to 5/99 and X4L project team members. Account required.
3.2 Retrieval of IESR records via OAI-PMH
To retrieve the COPAC record via the IESR OAI-PMH Access Interface, it was first necessary to use the IESR Web Search Interface to establish the unique identifier assigned by the IESR to COPAC. Although not a major issue, it was felt that this dependence on the web search interface should be avoidable: an m2m mechanism for identifying the IESR unique identifier would allow service administrators to work entirely within their own administrative interface.
The IESR unique identifier of COPAC is oai:iesr.ac.uk:1084445734-12758. This identifier was used to harvest the COPAC record via OAI-PMH in both oai_dc and oai_iesr formats via the following URLs:
The COPAC XML files returned by the IESR OAI-PMH Access Interface did not provide any relevant information beyond what was already available via the IESR Z39.50 Access Interface.
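As an illustration of the harvesting step described above, the following minimal Python sketch builds OAI-PMH GetRecord requests for the COPAC identifier in both metadata formats. The query parameters (verb, identifier, metadataPrefix) are those defined by the OAI-PMH protocol; the base URL is a hypothetical placeholder, since the address of the IESR OAI-PMH interface is not reproduced in this report.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real IESR OAI-PMH base URL is not given here.
IESR_OAI_BASE_URL = "http://example.org/oai"

# IESR unique identifier for COPAC, as quoted in the report.
COPAC_ID = "oai:iesr.ac.uk:1084445734-12758"


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request for one record in one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# One request per metadata format mentioned in the report.
urls = [get_record_url(IESR_OAI_BASE_URL, COPAC_ID, prefix)
        for prefix in ("oai_dc", "oai_iesr")]

for url in urls:
    print(url)
```

Note that urlencode percent-encodes the colons in the identifier, which OAI-PMH requires for reserved characters in request arguments.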
After using the IESR Web Search Interface to find the IESR Unique Identifier of JORUM the IESR OAI-PMH Interface was used to harvest the IESR JORUM record in both oai_dc and oai_iesr formats:
Appendixes D and E show the XML record for JORUM in both oai_dc and oai_iesr formats.
The JORUM XML files returned by the IESR OAI-PMH Access Interface did not provide any relevant information beyond what was already available via the IESR Z39.50 Access Interface.
In the case of the COPAC Z39.50 service, the aim of the IESR to make the setup of Z39.50 targets easier for other services by providing them with relevant technical information seems to have been adequately accomplished. In particular, the element <iesr:interface xsi:type="dcterms:URI"> contains the information required for Z39.50 target implementation by a service such as PerX. It should therefore be possible for services similar to PerX to identify and successfully set up a Z39.50 target using the appropriate IESR record, often without the need for further analysis of the Z39.50 target itself.
For the JORUM OAI-PMH repository the situation is different. In practical terms the IESR offered only basic details of the OAI-PMH service, namely the base URL of the JORUM OAI-PMH repository. Further effort is required by service providers to gather the information needed for setting up OAI-PMH harvesting (e.g. what metadata formats are supported, whether sets are supported, whether harvesting can be achieved via ListIdentifiers or via ListRecords, what the approximate size of the repository is, etc.). Dedicated OAI-PMH registries such as the OAI-PMH Data Provider Registry of the Grainger Engineering Library Information Center at the University of Illinois (http://gita.grainger.uiuc.edu/registry/) provide richer and more practical information for setting up OAI services. The Grainger registry is useful because it provides information collected from the Identify, ListSets and ListMetadataFormats responses, together with sample records, from the OAI repositories. In addition, the registry is searchable via SRU, which makes it a good choice for m2m retrieval. Grainger also offers RDF/RSS feeds of the latest changes to the registry. Unfortunately Grainger seems to be American-oriented and includes only fully OAI-compliant repositories. This limits its relevance for PerX, because the technological challenge with OAI arises when the federated search service needs to deal with data providers that do not follow the OAI-PMH standards and recommendations in full. It is in these cases that the help of services such as IESR or Grainger is most desirable.
From the perspective of a service provider, the content of the IESR JORUM record was not particularly useful in setting up the JORUM OAI-PMH target. A trial and error process of manually examining the OAI-PMH repository was still required in order to gather the kind of information that dedicated OAI-PMH registries such as Grainger provide.
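The manual examination described above could in principle be scripted. The sketch below is an illustration only, not the procedure used by PerX: it builds the three standard OAI-PMH requests that describe a repository (Identify, ListMetadataFormats, ListSets) and extracts the supported metadata prefixes from a ListMetadataFormats response. The base URL and the sample response are hypothetical; only the verbs and the OAI-PMH 2.0 namespace come from the protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def probe_urls(base_url):
    """Return the requests that describe a repository, keyed by verb."""
    verbs = ("Identify", "ListMetadataFormats", "ListSets")
    return {verb: f"{base_url}?verb={verb}" for verb in verbs}


def metadata_prefixes(list_formats_xml):
    """Extract metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(list_formats_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


# Hypothetical, abbreviated response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""

print(probe_urls("http://example.org/oai"))
print(metadata_prefixes(SAMPLE_RESPONSE))
```

A script of this kind only helps with repositories that answer these requests as the protocol specifies; non-compliant data providers would still need manual inspection.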
4. Automation of PerX-IESR Interaction
Bearing in mind the currency issues regarding the IESR, it was decided not to implement the automation of the PerX-IESR interaction at this time. However, it would be possible to create a script which periodically retrieved the relevant IESR records, extracted the relevant details from them, and updated the target details in the PerX Administrative Interface (PAIN) if necessary. This type of 'bespoke' automation for each type of target would require a reasonable amount of technical effort and may not be a practical, scalable solution.
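The "update if necessary" step of such a script might look like the following minimal Python sketch. It assumes that target details have already been extracted from an IESR record into a simple dictionary; the field names and values are hypothetical, and nothing here reflects the actual data model of the PerX Administrative Interface.

```python
def changed_fields(stored, fetched):
    """Return {field: (old, new)} for each fetched field that differs."""
    return {key: (stored.get(key), value)
            for key, value in fetched.items()
            if stored.get(key) != value}


def sync_target(stored, fetched):
    """Return an updated copy of the stored target plus the changes made."""
    changes = changed_fields(stored, fetched)
    updated = {**stored, **fetched}  # fetched values win; stored is untouched
    return updated, changes


# Hypothetical target details, for illustration only.
stored = {"host": "z3950.example.org", "port": 210, "database": "EXAMPLE"}
fetched = {"host": "z3950.example.org", "port": 2100, "database": "EXAMPLE"}

updated, changes = sync_target(stored, fetched)
if changes:
    print("Target needs updating:", changes)
```

Returning the list of changes, rather than silently overwriting, would let an administrator review a change before it is applied, which may be prudent while the currency of the registry is in doubt.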
Because of the differences in the types of services involved, different modes of updating might be possible e.g.
- For OAI-PMH targets, automate the checking of the IESR record to run before each harvest of the repository.
- For Z39.50 targets, automate the checking of the IESR record at a predefined interval (e.g. weekly), or whenever the Z39.50 target becomes unavailable.
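The two modes listed above can be expressed as a small decision function. The following Python sketch is illustrative only; the function name, its parameters and the one-week interval are assumptions based on the examples in the list, not part of any implemented PerX component.

```python
from datetime import datetime, timedelta

# Assumed interval, taken from the "weekly" example above.
CHECK_INTERVAL = timedelta(weeks=1)


def should_check_iesr(protocol, last_checked, now,
                      about_to_harvest=False, target_available=True):
    """Decide whether the IESR record for a target should be re-checked."""
    if protocol == "OAI-PMH":
        # Check immediately before each harvest of the repository.
        return about_to_harvest
    if protocol == "Z39.50":
        # Check on a fixed schedule, or as soon as the target is unreachable.
        overdue = now - last_checked >= CHECK_INTERVAL
        return overdue or not target_available
    raise ValueError(f"unknown protocol: {protocol}")


now = datetime(2006, 10, 15)
two_days_ago = now - timedelta(days=2)

print(should_check_iesr("OAI-PMH", two_days_ago, now, about_to_harvest=True))
print(should_check_iesr("Z39.50", two_days_ago, now))
print(should_check_iesr("Z39.50", two_days_ago, now, target_available=False))
```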
Ideally the PerX-IESR interaction would be automated via web services. For example, the IESR could alert service providers (such as PerX) immediately via web services when a particular target changes, or when an additional target within their subject area is added. A further web service would then be used to automate the update/input of the target into the service provider's administrative interface. At the time of writing (October 2006) the IESR website (Use section) indicates that "A Web services (SRW) interface will be a later development to further enable the inclusion of the IESR in meta-searching applications". Further documentation, perhaps with case studies, may be useful to help service providers automate their IESR interactions.
5. Conclusions
- At the present time, the coverage restrictions of the IESR limit its usefulness.
- IESR records must be current if service providers are to depend on them for accuracy. Arguably an out-of-date service registry is worse than no registry at all.
- All IESR records must be adequately classified by subject if service providers are to get lists of appropriate services on a subject basis.
- In this test implementation the IESR record for the COPAC Z39.50 service was adequate for enabling target setup in the PerX Pilot, whereas the JORUM IESR OAI-PMH record contained only minimal information, requiring further work by the service provider.
- The PerX-IESR interaction was not automated due to concerns about the currency of the IESR.
- Web services may be the ideal means to automate the interaction between the IESR and service providers, but these are not yet available.
- It was felt that more documentation may be required to help service providers automate their IESR interactions.
6. Appendix - Example IESR Records