1. Building queries for searching Z39.50 targets
Z39.50 supports several types of query formats, including CQL and even SQL. However, the most common form of query is Reverse Polish Notation (RPN). When it is sent across the wire, it is encoded into the structure of the request rather than as a string, and there seems never to have been an official string representation. In practice, Prefix Query Format (PQF), also known as Prefix Query Notation (PQN), has become the de facto standard for string encoding. YAZ, the software API used by PerX for querying Z39.50 targets, uses PQF/PQN for constructing the queries sent to Z39.50 servers.
Below we first give a quick introduction to the basic concepts, followed by some examples of building PQF queries taken from the algorithms used by the PerX MetaSearch Engine.
Firstly, RPN does not have a concept of an 'index', 'table', 'column' or any physical representation of a collection of data. Instead it abstracts this into a vector of 'attributes'. Each attribute has a numeric identifier. These attributes are collected together in 'attribute sets'. (This is where CQL context sets have been derived from.)
The most commonly supported attribute set is BIB1 -- the bibliographic attribute set version 1. BIB1 has 6 attribute types: Use (1), Relation (2), Position (3), Structure (4), Truncation (5) and Completeness (6).
Each type has several values, the most populated being, unsurprisingly, the Use attributes. There is a very long list of Use attributes available, in theory, on the Bib1 reference page. Common ones are:
1 Personal Name
12 Local Number
1010 Body of Text
To start, let's put everything together into a simple query using PQF.
A search for the word XML in the title field
could be written as:
@attr 1=4 @attr 2=3 "XML"
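The clause above is just an attribute vector followed by a term, so it can be assembled mechanically. The following is a minimal sketch in Python, not the PerX code: the function names pqf_clause and quote_term are hypothetical, and the backslash escaping of quotes inside a term is an assumption that should be checked against the PQF parser in use.

```python
from typing import Mapping


def quote_term(term: str) -> str:
    """Wrap a search term in double quotes.

    Assumption: backslash-escaping of backslashes and double quotes is
    accepted by the target PQF parser.
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def pqf_clause(term: str, attrs: Mapping[int, int]) -> str:
    """Build one PQF clause from {attribute type: attribute value} pairs."""
    # Emit attributes in ascending type order so the output is deterministic.
    parts = [f"@attr {atype}={value}" for atype, value in sorted(attrs.items())]
    parts.append(quote_term(term))
    return " ".join(parts)


# Attribute type 1 with value 4 and type 2 with value 3, as in the query above.
print(pqf_clause("XML", {1: 4, 2: 3}))  # prints: @attr 1=4 @attr 2=3 "XML"
```

Keeping the attribute vector as a mapping makes it easy to add further attribute types to a clause without touching the string handling.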
Clauses like the one above can be linked with a prefixed boolean operator.
Thus, title = xml and author = sanderson
could be written as:
@and @attr 1=4 @attr 2=3 "XML" @attr 1=1003 @attr 2=3 "Sanderson"
Available booleans, with the expected semantics, are: @and, @or, @not
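Each PQF boolean operator takes exactly two operands, so joining three or more clauses means nesting operators. A minimal sketch of one way to do this in Python is shown below; the helper name combine is hypothetical, and left-to-right folding is only one of several valid groupings.

```python
from functools import reduce
from typing import Sequence

BOOLEAN_OPERATORS = {"@and", "@or", "@not"}


def combine(operator: str, clauses: Sequence[str]) -> str:
    """Join already-built PQF clauses with a binary prefix operator.

    Clauses are folded left to right, so [a, b, c] becomes
    "<op> <op> a b c", i.e. ((a op b) op c).
    """
    if operator not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    if not clauses:
        raise ValueError("at least one clause is required")
    return reduce(lambda left, right: f"{operator} {left} {right}", clauses)


title = '@attr 1=4 @attr 2=3 "XML"'
author = '@attr 1=1003 @attr 2=3 "Sanderson"'
print(combine("@and", [title, author]))
# prints: @and @attr 1=4 @attr 2=3 "XML" @attr 1=1003 @attr 2=3 "Sanderson"
```

With a single clause the function returns it unchanged, which is convenient when the number of search fields filled in by the user varies.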
The above queries let the server decide if the query is a keyword or exact
search, amongst other possibilities. To specify this, we need to add
another attribute into the vector:
Thus @attr 1=4 @attr 2=3 @attr 4=2 "xml" is a keyword search.
In theory, that is all you need to know for bibliographic searching using Z39.50. However, for a stable and reliable service, we cannot rely on such a limited syntax and assume that the default settings of every Z39.50 server will suit us. Below we include some slightly more advanced queries that use less common facilities, in the hope that they will be useful to implementers working with Z39.50.
When a server completes a search it creates a named result set containing
references to the matched records. This result set may be referenced in a
query, for example to merge result sets. Using PQF:
@and @set resultSet1 @set resultSet2
would perform the intersection of resultSet1 and resultSet2.
More advanced sample queries include the groupings of Boolean operators and combinations of attribute types, for example:
@attrset bib-1 @or @and @attr 1=4 @attr 2=3 @attr 3=1 @attr 6=1 "XML with Java" @attr 1=1016 @attr 2=104 @attr 3=3 @attr 6=1 "Morrinson" @and @attr 1=62 @attr 2=104 @attr 3=3 "XML" @attr 1=62 @attr 2=104 @attr 3=3 "Java"
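which will search for records whose title equals the phrase "XML with Java" AND which match "Morrinson" (intended as the author, although the query uses Use attribute 1016 rather than the 1003 used for author earlier), OR for records with the keywords XML AND Java in the abstract.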
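Nested prefix queries like the one above are easy to get wrong by hand, because a missing operand only shows up when the server rejects the query. The following Python sketch checks that every boolean operator has two operands before a query is sent. It is a deliberately simplified reader, not a full PQF parser: the name parse_pqf is hypothetical, attribute values are assumed to be numeric, @attr is only accepted directly in front of a term, and operators such as @prox are rejected.

```python
import shlex

BINARY_OPERATORS = {"@and", "@or", "@not"}


def parse_pqf(query: str):
    """Parse a simplified PQF query into nested tuples, or raise ValueError."""
    tokens = shlex.split(query)  # keeps "XML with Java" as a single token
    pos = 0
    if tokens[:1] == ["@attrset"]:  # optional leading attribute set name
        if len(tokens) < 2:
            raise ValueError("@attrset is missing its name")
        pos = 2
    node, pos = _parse_node(tokens, pos)
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing token: {tokens[pos]!r}")
    return node


def _parse_node(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("query ends where an operand was expected")
    token = tokens[pos]
    if token in BINARY_OPERATORS:
        left, pos = _parse_node(tokens, pos + 1)
        right, pos = _parse_node(tokens, pos)
        return (token, left, right), pos
    if token == "@set":
        if pos + 1 >= len(tokens):
            raise ValueError("@set is missing a result set name")
        return ("@set", tokens[pos + 1]), pos + 2
    attrs = []
    while pos < len(tokens) and tokens[pos] == "@attr":
        if pos + 1 >= len(tokens):
            raise ValueError("@attr is missing its type=value pair")
        atype, sep, value = tokens[pos + 1].partition("=")
        if not sep:
            raise ValueError(f"malformed attribute: {tokens[pos + 1]!r}")
        attrs.append((int(atype), int(value)))
        pos += 2
    if pos >= len(tokens):
        raise ValueError("attribute list is not followed by a search term")
    if tokens[pos].startswith("@"):
        raise ValueError(f"expected a search term but found {tokens[pos]!r}")
    return ("term", tuple(attrs), tokens[pos]), pos + 1


tree = parse_pqf('@and @attr 1=4 @attr 2=3 "XML" @set resultSet1')
print(tree)
```

A query that parses here is not guaranteed to be accepted by a given server; the check only catches structural mistakes such as a dropped operand.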
While BIB1 is the most common attribute set, there are others. For example, a more advanced specification is the Attribute Architecture, which adds further dimensionality to the request, including notions of semantic and functional qualifiers. An Author might be a name, qualified with 'personal' and 'creator'. Attributes within a single clause may come from different attribute sets, so you might see:
@attrset XD @attr 1=3 @attrset BIB2 @attr 2=3 @attrset UTIL @attr 12=2 Rob
which specifies name, personal and creation, in this order.
While the Attribute Architecture is technically superior, only the most advanced Z39.50 implementations actively support it, which does not include any (to my knowledge) of the commonly used library systems. None of the Z39.50 targets of interest to PerX are using the Attribute Architecture specification.
One thing to take into account is proximity searches. Boolean searches
are all very well, but if you want to find two keywords which are near
each other but not adjacent, then you need to use proximity. PQF's
representation of proximity leaves a lot to be desired in terms of
expressiveness. Below is a simplified example of how to construct the
@prox operator. We have not used proximity in PerX.
@or bridge concrete @prox construction
2. Z39.50 targets error detection study
Each time we query a Z39.50 database, we need to sequentially execute four operations:
An error can occur in any of the above operations. To be able to act as soon as an error happens, we introduced an "error interceptor" mechanism in the code handling Z39.50 queries. The interceptor was able to detect errors for each operation. However, after a number of trials with the interceptor, we noticed the following points:
3. Further information