SRU/W Client Implementation
The latest release of PerX Toolkit (Beta 1.5 27/03/07) includes an SRU/W client. PerX can now cross-search databases that expose their data via SRU/W servers. After two months of testing, we can report some early findings:
- Searching via SRU/W can be faster than searching via Z39.50
- SRU/W can provide a means of searching the full text (as opposed to only the metadata) of data providers that plug their SRU/W servers directly into their native search engines. Early testing has shown that in these cases SRU/W returns more and better search results than OAI-based or Z39.50-based search services
- SRU/W can be as fast as OAI-based implementations, if not faster
- In general, adding a new SRU/W target is as simple as adding a Z39.50 target. Everything is done via PAIN (the PerX Admin Interface)
- An SRU/W target requires almost no maintenance effort
- Data providers that have created their SRU/W servers on top of, or as an extension of, their Z39.50 servers tend to be slower, to return unnecessarily complex metadata formats (e.g. UNIMARC encoding) and not to give access to richer data (e.g. full text) that is searchable by their native search engines
- The PerX SRU/W Client handles short or full Dublin Core (DC) metadata. Support for other XML schemas has been left for a real service implementation
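As an illustration of the kind of exchange described above, the following is a minimal sketch of an SRU searchRetrieve request and of pulling Dublin Core fields out of the response. It is not the PerX client code. The function names, the base URL and the sample response are hypothetical, and the `recordSchema` value `dc` is an assumption, since servers differ in the schema identifiers they accept.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SRW_NS = "http://www.loc.gov/zing/srw/"      # namespace of SRU/W 1.1 responses
DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace


def build_sru_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Build an SRU 1.1 searchRetrieve URL (hypothetical helper).

    record_schema="dc" is an assumption; check the target's explain record.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)


def parse_dc_records(xml_text):
    """Return one dict per record, mapping DC element names to lists of values."""
    root = ET.fromstring(xml_text)
    records = []
    for record_data in root.iter("{%s}recordData" % SRW_NS):
        fields = {}
        for elem in record_data.iter():
            # Keep only elements in the Dublin Core namespace that carry text.
            if elem.tag.startswith("{%s}" % DC_NS) and elem.text:
                name = elem.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(elem.text.strip())
        records.append(fields)
    return records


# Made-up response used only to exercise the parser; not real server output.
SAMPLE_RESPONSE = """
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/"
                            xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>1</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema">
          <dc:title>Example title</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </srw_dc:dc>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""

if __name__ == "__main__":
    print(build_sru_url("http://example.org/sru", 'dc.title = "bridge"', 5))
    print(parse_dc_records(SAMPLE_RESPONSE))
```

Because the request is just a URL, a target of this kind can be tried from a browser before it is configured in a client, which may be part of why such targets need little maintenance.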
We have used the following databases for testing the PerX SRU/W client:
COPAC offers Z39.50 and SRU/W. We have noticed that its SRU/W interface is faster than its Z39.50 interface. We have also observed that its SRU/W interface is more stable than its Z39.50 interface, as the latter is prone to return "Server is not replying queries" errors. Therefore, we are now searching COPAC via SRU/W.
So far we have only been able to get data from JORUM via OAI. As we have reported before, its OAI interface doesn't follow any "OAI-PMH best practices" for harvesting. We have been waiting for its SRU/W interface since Phil announced it a long time ago at a PerX meeting. A couple of weeks ago, JORUM announced that its new release of the Intrallect software included SRU/W. However, we have learned from JORUM that this SRU/W implementation was buggy and that Intrallect has now "disabled" SRU/W support in its software, so we couldn't test SRU/W with JORUM. Some sources say that its SRU/W may be fixed by September.
Inderscience's SRU/W service is faster than its OAI repository, searches full text and produces more hits. Therefore, we have decided to use SRU/W for searching Inderscience from now on.
Intute is one of our few targets that offers Z39.50, SRU/W and OAI options. We have evaluated all of them and our recommendation is to use Z39.50, so Intute is now being searched via Z39.50. The main reason for not using its SRU/W is that it uses a rather Z39.50-oriented format (MARC) that makes our SRU/W XSLT parser very slow. I am not sure if Intute has followed a "best practice" implementation for its SRU/W.
Other Possible SRU/W Targets
- ADT (ASK Project)
- NDLTD http://alcme.oclc.org/ndltd/SearchbySru.html
- JSTOR http://www.jstor.org/about/xml_gateway.html
- Arrow (http://www.valaconf.org/vala2006/papers2006/57_Treloar_Final.pdf)
- DSpace MIT (OCLC Implementation)
We have not implemented an SRW (SOAP-type) client as the literature has indicated that there are reasons to be concerned about the efficiency of SRW and SOAP-based Web Services as opposed to SRU and REST-style services, at least in high-throughput multi-threaded clients.
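To make the structural difference concrete, here is a minimal sketch that builds the same search as an SRU URL and as an SRW SOAP envelope. It only shows the shape of each request and measures nothing about performance. The function names and the example base URL are hypothetical, and the envelope is a simplified illustration rather than a complete SRW message.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"                 # SRU/W 1.1 namespace


def sru_request(base_url, cql_query):
    """SRU: the whole request is a URL, sent with a plain HTTP GET."""
    params = {"operation": "searchRetrieve", "version": "1.1", "query": cql_query}
    return base_url + "?" + urlencode(params)


def srw_request(cql_query):
    """SRW: the same parameters wrapped in a SOAP envelope, sent with HTTP POST."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    request = ET.SubElement(body, "{%s}searchRetrieveRequest" % SRW_NS)
    ET.SubElement(request, "{%s}version" % SRW_NS).text = "1.1"
    ET.SubElement(request, "{%s}query" % SRW_NS).text = cql_query
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    query = 'dc.title = "bridge"'
    print(sru_request("http://example.org/sru", query))
    print(srw_request(query))
```

With SRW, the client has to build and serialise an XML document for every query and the server has to parse it before the search can start, whereas the SRU form needs only URL encoding.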
Last Updated 10-May-07