Pilot Engineering Repository Xsearch


Engineering Digital Repositories Landscape Analysis, and Implications for PerX
Version 1.0 (10/11/05)

Authors: Roddy MacLeod (R.A.Macleod@hw.ac.uk) and Malcolm Moffat (M.Moffat@hw.ac.uk)


Document History
Version Date Comments
1.0 10/11/05  




1. Introduction

This paper draws on the experience of compiling a Listing of engineering repository sources and aims to help inform the development of a pilot subject based engineering repository cross search service.

The analysis reviews the landscape in the area of digital repositories within the subject area of engineering, as identified in the Listing. The document provides a synopsis of the current state of digital repository and metadata repository provision (including obvious gap areas) within engineering related disciplines. It proceeds to discuss various issues concerning repository provision and how they might relate to the provision of an Engineering Repository cross search service. The document concludes by considering the implications of the current engineering digital repository landscape for the PerX project.

For the most part, 'closed access' commercially produced repositories which do not currently allow free searching (such as various commercially available full-text and Abstract & Indexing services) are not included in the analysis or discussion.  Many of these, such as ScienceDirect, Ei Compendex, Inspec, Technology Research Index, Web of Science, etc., are recognised as being extremely important, in terms of information provision, for the engineering community.  However, the primary purpose of this list is to help inform the development of a Pilot Engineering Repository Xsearch service which does not require authentication (i.e. which cross-searches only freely available data or metadata). 

 

2. Synopsis of Engineering Repository Sources

Table 2 sums up the status of repository and metadata repository provision uncovered by the Listing of Engineering Repositories exercise:

Table 2. Synopsis of position by repository type
Repository Type
Brief Synopsis of Position
Research Data
Repositories
  • Obvious gap area.
  • Little primary research data is available, and few specific research data repositories have so far been identified in engineering.
  • Currently there is much international interest in the area of research data repositories. The ability to access and integrate data from different fields and the expanding number of data sets in a range of disciplines are creating numerous possibilities to reutilise primary data in various ways. Researchers in areas such as climate modelling, demographic studies and genetics already frequently use research data originally generated by others which are accessed via the internet. Recently, Canada and the US have made recommendations regarding the importance of establishing the necessary infrastructure for long term management of research data (National Science Board Report (2005), Strong and Leach (2005)).
  • A number of Digital Repository Programme projects (e.g. Claddier, R4L, Spectra & StORe) are investigating issues surrounding research data repositories although none are looking specifically into the engineering subject area.
  • Engineering portal consultancy groups, run as part of the Subject Portals Project in 2002, suggested a need for an 'archive for engineering test data' as well as a means of manipulating research data.

Metadata Repositories

  • Obvious gap area.
  • Only one metadata source fitting this category has so far been identified (ESDU, which provides validated design data), although it could be argued that sources such as Knovel and the McGraw Hill Digital Engineering Library might also be considered here.

Research Outputs 1: Preprints/Postprints

Repositories

  • Wide scale adoption of Institutional Repositories seems likely following on from impetus arising from the UK Select Committee on Science and Technology Report and subsequent activities.
  • There is a growing number of institutional repositories in the UK. Adoption is being encouraged through various initiatives such as SHERPA and the Digital Repositories Programme.
  • As yet there is no established national infrastructure for coordination of UK HE research output. Swann et al (2005) discuss a model for eprint content in the UK, and the ARROW project provides an example of a national resource discovery service for research outputs in Australia.
  • Most institutional repositories contain multidisciplinary material. There are often no means available for a subject based service to select subsets of such multidisciplinary collections based on subject coverage. Clearly OAI-PMH Sets could be used but in practice few repositories provide subject based sets.

Metadata Repositories

  • No general purpose engineering Preprints/Postprints metadata repositories have been identified.
  • A number of more specialised metadata repositories exist (e.g. ASEE Conference Proceedings).
Research Outputs 2: Technical Reports
Repositories
  • Gap area.
  • A number of substantial engineering technical report repositories are in existence which cover technical reports published in the USA (e.g. NASA, NACA).
  • There are few equivalents for the UK, one exception being reports available via the CCLRC ePublication Archive. On the whole, bibliographic control of technical report series tends to be complex: the series are often poorly catalogued and scattered across the academic, government and commercial sectors. This makes technical reports difficult to identify, locate and access.
  • The MAGIC project investigated issues surrounding access to technical reports, and produced the METReS demonstrator service, which was in essence a prototype UK National Technical Reports Catalogue. The MAGIC project ended in October 2002; the METReS service is no longer supported and is available only for archival purposes.

Metadata Repositories

  • A number of technical report metadata repositories are in existence. Coverage of US technical reports is good (NTIS, STINET). Other metadata repositories in this area tend to include various types of 'grey' literature as well as technical reports (Energy Citations Database, GrayLit Network, etc.).
e-Thesis
Repositories
  • Considerable progress has been made in the UK on the back of FAIR projects (e.g. Electronic Thesis, Thesis Alive! and DAEDALUS).
  • The Ethos Project aims to deliver a fully operational UK database of theses (UKDoT).
  • There are some international exemplars, including NDLTD (US) and ADT (Australia).

Metadata Repositories

  • French doctoral theses are covered by THESA.
Learning Materials
Repositories
  • There are many initiatives in the Learning Objects repositories area. Within the engineering subject area there are several UK projects (e.g. RESET, Learn'EM) and numerous international projects (including, for example, GROW, DLNET) aimed at producing engineering learning objects.
  • JORUM will act as a national UK repository source for learning objects, with content to be surfaced via the Connect Learning & Teaching Portal.
  • Some Learning Object Repositories identified tend to have less visible means of interoperability (i.e. no explicit OAI-PMH or Z39.50 interfaces identified), perhaps relating to issues of IPR, copyright and reuse.

Metadata Repositories

  • No sizable engineering specific learning materials metadata repositories have been identified.
Multimedia
Repositories
  • Several multimedia initiatives have been identified, with some large collections from the USA (e.g. EQIIS, NASA Image eXchange), and also some international multidisciplinary collections, including Flickr Engineering Subset.
  • Few current sizable UK engineering image or video repositories have been identified.
  • Work in this area, though not specifically related to engineering, has included the outputs of the Biomed Image Archive project, the DRP funded MIDESS project and an ongoing feasibility study investigating community based image archives. MIDESS is looking into the issues surrounding the management of images in a distributed environment via the development of a digital repository infrastructure and exploration of the national context.

Metadata Repositories

  • No metadata repositories have been identified.
Assessment Materials
Repositories
  • Gap area.
  • Developments in the area of assessment materials repositories appear to be in their early stages, although there is much current interest in the area of 'item banks' of assessment materials.
  • At the institutional level, a number of HE institutions have ad hoc access to previous exam papers online.
  • A small number of initiatives in the UK have addressed collaborative sharing of engineering assessment materials (e.g. E3AN).
  • The DRP funded 'UK collaboration for a digital repository to support sharing high quality, high-stakes assessment items' project is investigating this area.

Metadata Repositories

  • No metadata repositories have been identified.
Journal
Repositories
  • Gap area.
  • A number of engineering journals and trade publications are freely available in full text on the web (e.g. EESE listing and DOAJ listing).
  • Very few journal or trade publication providers have established any means of interoperability (e.g. OAI-PMH or Z39.50). This is likely to be because many are produced by independent, smaller scale publishers with little motivation or means to provide interoperability.
  • The DOAJ provides a means to harvest article level metadata of open access journals using OAI-PMH sets. The Technology and Engineering set currently contains over 2400 article level metadata records.
  • Central community effort to establish suites of open access journals within the engineering subject area is sparse, though some initiatives such as Petroleum Journals Online and Hindawi are beginning to make progress.

Metadata Repositories

  • A large, and growing, number of metadata repositories exist for journals. Some are essentially freely available subject-based Abstracting and Indexing databases (e.g. RAM, TRIS). Some professional associations provide data (IoP, ASCE, SPIE). A growing number of aggregators provide data (Scitation). Commercial journal publishers are increasingly likely to make their metadata freely available in order to increase its visibility. One example of a commercial publisher which provides OAI-PMH interoperability is Inderscience.
National

Repositories

A number of UK repository related projects exist which aim to coordinate national access to resources (e.g. ePrints UK, Ethos, IRI Scotland). Additional nationally coordinated UK effort seems logical/possible in the following areas:

  • Research Data (e.g. establishment of National Data Repository Services). Recent reports from the US and Canada illustrate that there are considerable opportunities, but substantial large scale national effort is necessary (National Science Board Report (2005), Strong and Leach (2005)).
  • Research Output (e.g. establishment of a National Repository Infrastructure for Research outputs). This would build on the developments of the ePrints UK pilot and learn from the ARROW project in Australia. Swann et al (2005) suggest a range of possible models for the management of eprint content in UK Further and Higher Education.
  • eThesis (e.g. establishment of a National electronic Thesis service). Currently being investigated/developed by Ethos. International examples include the Australian Digital Thesis Programme and NDLTD.
  • Image (e.g. establishment of a National Image Repository Service). Feasibility Studies underway as part of Digital Repositories Programme.
  • Journal. Considerable encouragement would be necessary to develop some form of 'open access' effort in engineering using the BioMed Central/PLOS model. Most current initiatives are in the areas of physics, the life sciences and medicine.

Metadata Repositories

  • Books. UK Book metadata is available via Copac.
Subjects
Repositories
  • Gap area.
  • No substantial subject based repositories exist specifically for engineering. Some other disciplines (e.g. Physics: ArXiv, Computing: Computing Research Repository (CoRR), Cognitive sciences: Cogprints, Information Science: E-LIS and Languages: OLAC) have established such repositories.

Metadata Repositories

  • A number of metadata repositories exist which cover engineering related websites (e.g. EEVL, Aerade, Avel, etc.).
Other

Metadata Repositories

  • Several metadata repository sources covering patents and standards are relevant to engineering.

 

3. Summary and Analysis

 

3.1 Summary

From the above synopsis it is clear that a number of gap areas exist in the engineering repository landscape, including:

  • Research data,

  • Subject Based Access,

  • Technical Reports,

  • Journal,

  • Assessment Materials repositories.

These gap areas are discussed further in sections 3.3 to 3.7.

The picture concerning other repository types is fluid and in a state of fairly rapid development. Considerable advances are currently being made in the development and deployment of repositories for learning materials, e-thesis and at the institutional level. Often, coverage in these repositories is multidisciplinary in nature with no means for a subject based service to select subsets based on subject coverage. This presents a considerable challenge for subject based services which wish to cross search multiple resources, and is discussed in section 3.2.

Overall, the level of digital repository provision specifically for the engineering community appears to be relatively low. A considerable number of gap areas have been revealed and there is little in the way of repositories specifically aimed at the engineering community.  A growing number of multidisciplinary repository sources will help in the future to improve access to certain content types, but with no specific emphasis on engineering materials. Some are interoperable and others are not.

At the same time, albeit with some notable gap areas, there are numerous metadata repositories of interest to engineering, some of which are already interoperable, others for which interoperable interfaces are in development, and yet more for which the situation is unknown.

This situation will have a bearing on the coverage provided by the Pilot Engineering Repository Xsearch. Inevitably, initial coverage will be very patchy.

 

3.2 Enabling Subject Based Resource Discovery from Multidisciplinary Resources

A number of multidisciplinary repository sources have been identified with content which would be of interest to engineering (e.g. national repositories of research outputs, e-thesis, learning materials, technical reports and institutional/departmental materials). Clearly, subject based cross search services would wish to include relevant materials from such collections, and OAI-PMH sets would seem to offer a mechanism to achieve this. In practice, however, the adoption of subject based sets within repositories is piecemeal, with some repositories offering clearly delineated subject groupings, others offering no subject breakdown whatever, and still others offering a more complex picture of multiple sets (e.g. sets by publication status, sets by type of content (i.e. abstract/full text), sets by originating unit or author, etc.). The initial impression is that sets are often produced by data providers based upon their particular needs and internal organisational structures rather than the likely needs of service providers.
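To illustrate how a service provider might inspect what sets a repository offers, the sketch below parses an OAI-PMH ListSets response and picks out sets whose names suggest engineering coverage. It is a minimal sketch only: the sample response, the repository address and the keyword list are invented for illustration and do not describe any repository named in this paper.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical ListSets response, invented for illustration only.
SAMPLE_LIST_SETS = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-11-10T12:00:00Z</responseDate>
  <request verb="ListSets">http://repository.example.org/oai</request>
  <ListSets>
    <set><setSpec>status:published</setSpec><setName>Published items</setName></set>
    <set><setSpec>subj:eng</setSpec><setName>Engineering</setName></set>
    <set><setSpec>type:fulltext</setSpec><setName>Full text available</setName></set>
  </ListSets>
</OAI-PMH>
""".strip()

# Assumed keywords; a real service would need a more careful mapping.
SUBJECT_TERMS = ("engineering", "technology")


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for node in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = node.findtext("oai:setName", default="", namespaces=OAI_NS)
        pairs.append((spec, name))
    return pairs


def subject_sets(pairs, terms=SUBJECT_TERMS):
    """Keep only sets whose name mentions one of the subject terms."""
    return [(spec, name) for spec, name in pairs
            if any(term in name.lower() for term in terms)]


all_sets = parse_sets(SAMPLE_LIST_SETS)
engineering_sets = subject_sets(all_sets)
```

Matching on set names is deliberately crude. In this invented response, the sets organised by publication status and content type are ignored, which mirrors the difficulty described above of gauging the relevance of sets that were not designed with subject services in mind.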

Examples of Repositories which provide Subject Based Sets

  • NASA Technical Report Server (NTRS)
  • Digital Library Network for Engineering and Technology (DLNET)
  • Directory of Open Access Journals (DOAJ)
  • ArXiv.org

Examples of Repositories which do not provide Subject Based Sets

  • Networked Digital Library of Theses and Dissertations Union Catalog (NDLTD): sets provided are based on institution of origin.
  • National Engineering Education Delivery System (NEEDS): no sets provided.
  • Council for the Central Laboratory of the Research Councils (CCLRC) ePublication Archive: sets provided are based on type of content, i.e. metadata or full text.
  • Australian Research Repositories Online to the World (ARROW): no sets provided.

Examples of Repositories which provide many multiple Sets

The current picture is therefore one in which the use of OAI-PMH sets to select particular subject groupings is limited to a small number of repositories which provide suitable sets. For others, suitable sets are simply not available or the situation is made prohibitively complex by large numbers of sets and the inability to easily gauge their relevance. In these situations the only current option is to harvest the entire multidisciplinary collection.
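The choice described above, between selective harvesting by set and harvesting the entire collection, can be sketched as follows. This is an illustrative sketch under stated assumptions: the repository address and set identifier are hypothetical, and `oai_dc` is used because it is the metadata format that OAI-PMH requires every repository to support.

```python
from urllib.parse import urlencode


def harvest_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def plan_harvest(base_url, subject_set_specs):
    """Return the requests a service provider would issue for one repository.

    If suitable subject sets are known, harvest only those; otherwise fall
    back to harvesting the whole multidisciplinary collection.
    """
    if subject_set_specs:
        return [harvest_url(base_url, spec) for spec in subject_set_specs]
    return [harvest_url(base_url)]


# Hypothetical repository address and set identifier.
BASE = "http://repository.example.org/oai"
selective = plan_harvest(BASE, ["subj:eng"])
everything = plan_harvest(BASE, [])
```

The fallback branch is the costly one for a subject service, since every harvested record then has to be filtered or classified after the event.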

Some possible means to address this situation include:

  1. Encouraging the adoption of broad subject sets by data providers. By raising the issue of the usefulness of broad subject categorisation among the repository community, more repositories may be encouraged to consider the provision of subject based sets. Projects involved in encouraging Institutional Repository adoption, such as SHERPA, may be able to provide support and guidance and suggest suitable subject classification schemes (e.g. JACS). It is likely that automated means of broadly classifying documents deposited into repositories could be used to produce subject sets automatically.

  2. Alternatively, the generation of subject based sets could be tackled at the level of service providers. Such a model was proposed by the ePrints UK project to enhance the metadata provided from UK Institutional Repositories. ePrints UK proposed the use of a web services subject classification service (from OCLC) which would take the available metadata and provide a suitable subject classification. The enhanced records could presumably then be used to provide broad subject based sets suitable for subject based cross search services.
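The second option, generating subject sets at the service provider, can be sketched as below. Simple keyword matching stands in here for a classification service. It is not the OCLC service proposed by ePrints UK, and the subject labels, keyword lists and record identifiers are all invented for illustration.

```python
import re
from collections import defaultdict

# Assumed keyword lists; a real service would use a proper classification scheme.
SUBJECT_KEYWORDS = {
    "engineering": {"turbine", "steel", "structural", "fatigue"},
    "physics": {"quantum", "photon", "lattice"},
}


def classify(record):
    """Return the broad subject labels whose keywords appear in the record."""
    text = (record.get("title", "") + " " + record.get("description", "")).lower()
    words = set(re.findall(r"[a-z]+", text))
    return sorted(label for label, keys in SUBJECT_KEYWORDS.items() if words & keys)


def build_subject_sets(records):
    """Group harvested record identifiers into broad subject sets."""
    sets = defaultdict(list)
    for record in records:
        labels = classify(record) or ["unclassified"]
        for label in labels:
            sets[label].append(record["id"])
    return dict(sets)


# Hypothetical harvested records from a multidisciplinary repository.
RECORDS = [
    {"id": "oai:example:1", "title": "Fatigue in welded steel beams"},
    {"id": "oai:example:2", "title": "Quantum effects in photon detectors"},
    {"id": "oai:example:3", "title": "Library opening hours survey"},
]

subject_sets = build_subject_sets(RECORDS)
```

A subject based cross search service would then index only the identifiers grouped under the label it cares about, leaving the rest of the harvested collection aside.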

3.3 Analysis of Gap Areas

A number of gap areas in repository provision have been identified, including Research Data, Technical Reports, Journal Repositories, Subject Access and Assessment.

Few repository sources were identified for engineering-specific research data.  This situation can be contrasted to that of the Social Sciences and Humanities, for which the UK Data Archive  provides access to over 5000 computer-readable datasets for research and teaching purposes. 

Another gap area is that of UK technical reports. Technical reports from the USA are covered well by repositories and metadata repositories, but UK technical reports have few equivalents apart from the CCLRC ePublication Archive. Technical reports are also not particularly well covered by commercially available 'closed access metadata repositories', such as the SIGLE database. The British National Bibliography for Report Literature, which included this type of material, ceased publication in 2003.

A further notable gap area within engineering is that of journal repositories. Although some initiatives such as Petroleum Journals Online and Hindawi are beginning to make progress, they are relatively small-scale and as yet unproven, and the situation can be contrasted with well established open access journal repositories in some other subject areas (e.g. BioMed Central, and PLOS), and subject areas for which public funding is currently encouraging development. Various journal metadata repositories exist, and there are also major commercially available 'closed access repositories' such as ScienceDirect, and 'closed access metadata repositories', such as CSA Technology Research Database and Ei Compendex. This situation will have a bearing on the content and potential effectiveness, from an information retrieval perspective, of a Pilot Engineering Repository Xsearch service.  Coverage of certain resource types will be very patchy.

A major gap area identified is that of subject-based repository access for engineering.   No sizeable subject-based repositories exist for engineering and no real community movement is evident which is likely to change this situation. Yet various subject based repositories have been developed which serve other disciplines.  Several have existed for some time and have become well-established and heavily used, serving their respective communities. They include:

ArXiv.org (Physics, Mathematics, Non-linear Science, Computer Science, and Quantitative Biology).

Chemistry, Math and Computer Science Preprint Archives (Chemistry, Mathematics and Computer Science)

ClinMed NetPrints (Clinical Medicine and Health Research)

CogPrints (Cognitive Sciences: including Psychology, Neuroscience, Linguistics, and many areas of Computer Science (artificial intelligence, robotics, vision, learning, speech, neural networks), Philosophy (e.g., mind, language, knowledge, science, logic), and Biology.)

Computing Research Repository (CoRR) (Computing)

Cryptology ePrint Archive (Cryptology)

E-BioSci (Life Sciences)

E-LIS (Library and Information Science)

EPRINT (Natural Science and Technology)

Project Euclid (Mathematics)

PubMed Central (Biomedical and Life Sciences)

Social Science Research Network (SSRN) (Social Science)

RePEc (Economics)

In addition to the above, the AMS Directory of Mathematics Preprint and e-Print Servers lists a number of subject repositories in mathematics, and 34 archives participate in the Open Language Archive Community (OLAC).

It is apparent therefore that subject repository services exist for some disciplines but not others. It has been suggested that "Actually establishing and running a repository calls for a level of infrastructure, maintenance and administration which many subject communities cannot provide." (Hubbard 2005). With respect specifically to the engineering community, is the lack of subject based or journal repositories and the comparative lack of uptake of repositories simply a chicken and egg scenario? In other words, if a well-supported subject based repository were developed for engineering, or if a number of open access journals were initiated for engineering, would there be a resultant increase in uptake by the engineering community? The answer is probably no, or at least not to as great an extent as has happened in some other disciplines.

To explain much of the above, and to understand how repositories fit into the information landscape of engineering, it is necessary to understand how the information and communication needs of engineers, the complexity of their information landscape, and their information seeking behaviour affect information retrieval within engineering.

3.4 Information and Communication Needs of Engineers

Tenopir and King (2004) studied the communication patterns of engineers and the information resources used to perform their work. They synthesised the findings of numerous previous studies, looked at various aspects of communication patterns amongst engineers, examined the literature that distinguished the information needs and uses of engineers from those of scientists, and reviewed previous studies that had explored the communication practices of engineers.  Their study is the most complete survey of this topic to date.

Their conclusions reveal that engineers rely on both interpersonal and informal means of communication more than scientists, who read journals more frequently and are more inclined to use other formal means of communication as well. They found that engineers engage in many types of activity including research, design, development, production, construction, teaching, management, and marketing. As a result, they use numerous formal and informal channels to satisfy their communication and information needs. There are many written publication types which may be important to engineers, including scholarly and trade journals, books, internal and external reports, patent documents, conference proceedings, standards, regulations and dissertations.

In another study, Ward (2005) described the engineering knowledge base as consisting of explicit knowledge, such as that contained in print and online sources, and tacit knowledge which is inside people's heads or embedded in organisational structures and practice, or attached to cultural and other objects such as engineering products. He, too, identified the importance of personal and informal sources of information for engineers, along with various explicit information sources such as books, departmental files, confidential departmental databases, trade literature, standards and specifications, legislation, technical papers, journal papers, internal reports, external technical reports, reference works, test sheets and technical data, and various multimedia sources.

A third study (Needham et al 2002) found similar results, and that engineers' information needs tend to be multifaceted, complex and changing in nature. Engineers were found to spend between 40% and 66% of their time communicating, with personal and informal sources of information being preferred. Engineers spent a significant proportion of their time using scientific and technical information. As well as other types of publication, sources such as technical reports were important and useful to engineers.

Pinelli (2001) confirms this finding: "Journal articles are appropriate for scientists to describe the development and testing of one idea.  In contrast, technical reports are more appropriate for engineers to document engineering outcomes."

Elsewhere, Carstensen (1997) found a need amongst engineers for information and publications pertaining to a number of areas including design, product information, component specifications, standards, production data, materials and components, papers and research results, people and project documentation.

It is important to note that the studies mentioned above refer for the most part to the information and communication needs of engineers of all types, and not just engineers within academia. The information requirements of academic engineers will not necessarily exactly mirror the information requirements of the profession as a whole. For example, within academia a greater emphasis would be expected on the importance of scholarly journal articles and papers as an information source. Confirming this, a recent study on the needs of UK academic researchers in different disciplines (Sparks, 2005) showed that journal articles are very important for engineering researchers. However, that same study showed that several other types of publication can also be important for engineering researchers.

The literature thus shows that as well as interpersonal and informal communication and sources (tacit knowledge), a large range of publication types (e.g. trade material, books, standards, patents, legislation, reference works, and component specifications) can be expected to be important for academic engineers.  Many of these sources are not produced by academics and researchers and are also not the result of publicly funded research. In many cases, therefore, they are unlikely to result in materials which will be deposited in repositories.

Repositories per se are therefore unlikely to become as important in engineering as they have become, or may become, in some other disciplines. Instead, it is more likely that repositories may become one more additional source, amongst many, of potentially useful information for engineers. 

A service that focused only on materials in repositories, and ignored materials found in other sources, for which metadata repositories may be available, would therefore be unlikely to be regarded as an essential information retrieval tool.

More conclusions may be drawn from an analysis of some aspects of the published engineering information landscape.

3.5 Complexity of the Published Engineering Information Landscape

Published material makes up the bulk of the explicit knowledge base relevant to engineering, and the complexity of the engineering information landscape has been alluded to above.  Many different types of publication have been mentioned as being potentially important for engineering. For each and every one of these publication types there is often quite a range of commercial and non-commercial publishing bodies, plus a variety of secondary sources of information. When viewed as a whole, this amounts to a complex information landscape, and a landscape which is more complicated than that of many other disciplines.

For example, in the area of journals alone, academic, professional, trade and house journals can all be important sources of information for the engineer. Many engineering scholarly journals are published by the same commercial publishing houses that publish in other disciplines. In addition, however, professional societies play a particularly important role in the science, technology and medical journal publication landscape - "Society and other non-profit titles account for 85% of the top 20, three quarters of the top 200 and two thirds of the top 500 ISI ranked titles" (Morris and Olivieri (2004)). The figure is likely to be higher still if engineering titles are taken alone. Engineering professional societies tend to serve memberships made up of practicing engineers as well as academic engineers. The content of the journals they publish, and the origin of authors, reflects this to a great extent, albeit with some differences between engineering sub-disciplines, with papers being written by practicing engineers as well as those in academia. So far, enthusiasm for open access models for journal publishing among engineering professional societies appears to have been less than in some other disciplines. Findings from a recent JISC survey show that, in two cases, learned societies publishing technology journals believed that the author community for their journals was not enthusiastic about being charged for publication (Waltham, 2005). Given that potential authors originating from industry are unlikely to be able to meet the cost of prepayment for publication from research grants, this is not surprising.

Some engineering professional societies publish journal titles which are less scholarly in orientation, and are aimed at their particular professional sectors (which will include a proportion of academics). In addition, many trade journals in engineering are published by relatively small commercial publishing houses, for whom the academic market is only one sector, alongside engineers in industry. Trade journals can often be important for academic engineers, as they provide information such as industry-related news and analysis, product and market information, technical information about materials and processes, software reviews, product reviews, and trade literature and advertisements. For less-scholarly titles, advertising and job announcements not only bring significant income to the publication, but are also an integral part of the publication's content. There is often little motivation for these publications, or those who author materials in them, to deposit materials in repositories.

House journals, published by organisations, businesses and public service providers, are yet another type of publication, and often have their own specific concerns. A proportion of the titles identified from the DOAJ listings which provide open access could be classified as house journals.

Whilst papers and other materials published in any of these types of journals might legitimately be regarded as potential targets for deposit within repositories, in practice there are various cases in which such disclosure may not be appropriate, in which it might damage the financial viability of the publication, or in which motivation for doing so is low.

As a result, a proportion of the literature which appears in the various types of engineering journals has little chance of appearing, as preprints or post-prints, in repositories. 

The corollary is that those within engineering who are seeking topical information in journals will continue to rely upon a number of different sources.  As is the case with other types of publications, materials deposited in repositories will be one of a number of potential sources of information. Commercial 'closed access repositories' and 'closed access metadata repositories' are major, if often incomplete, sources of information in engineering.  If materials deposited in repositories are neither included in such sources, nor cross-searchable at the same time via a broadcast search, they are less likely to be discovered.  Provision of resource discovery services, even subject-based ones, which are limited to materials in digital repositories, is therefore unlikely to significantly raise the visibility of those materials in engineering.  This conclusion should be noted, as web visibility of materials in repositories has been identified as being important for the success of the digital repository movement (Whitehead, 2005).

In addition, much relevant engineering research information is not published, even in the variety of journal types mentioned above.  Pinelli (2001) found that the experiences of the scientific and engineering communities concerning the disclosure (communication and publication) of knowledge were fundamentally different.  Whilst the scientific community, generally speaking, makes its research knowledge freely available, "...disclosure is neither required or expected in the technology community".  "Technological knowledge is not easily or completely codified, nor is it freely communicated.  Unlike science, the output of technology is not made universally available". Once again, it must be remembered that he was reporting on the commercial world of engineering and technology rather than the academic community.  For academic engineers, disclosure, and the resultant rewards disclosure reaps, can be just as important as it is for academic scientists, especially in the case of research funded by public bodies and research councils.  However, much research undertaken by academic engineers is funded or sponsored by commercial concerns, where disclosure may actually be undesirable (as it might assist the funder's competitors).  Secrecy can, in fact, be desirable.  As Needham et al (2002) found, the publication and distribution of technical reports is often controlled by the particular company or agency performing or sponsoring the work. As a result, many reports will never be distributed externally and the motivation to do so is often lacking.

Commercial interests will also mean that research data is often not disclosed.  In addition, much research data will be collected as a result of experiments which are applied to engineers' own systems under analysis, and will in any case often not be as universally applicable as the data collected by scientists.  It is noticeable that of the data currently available via the CCLRC Data Portal, little can be classified as engineering data.

These factors mean that, overall, there is less emphasis on open provision of research outputs in engineering than there is in science.  They also mean that engineers might not expect to find all, or even a significant proportion, of research publications relevant to their own needs openly available.

Most successful subject based repositories have developed as a result of the high dependency of academics in the subjects concerned on scholarly papers which can relatively easily be deposited as e-prints in archives. In an imaginary and simplistically monolithic information landscape, it is easy to see how digital repositories might play an extremely important role in such cases.  The scenario is that literature-based research is undertaken, and results are disclosed via papers deposited in repositories, and subsequently in scholarly journals, etc.  Discovery of the bulk of research results can thus be anticipated via a search of relevant repositories/subject repositories and/or a search of a relevant journal full-text or indexing service.

In a more complex information landscape, such as that of engineering, any one particular source of information for any one particular type of publication has to struggle for the limited attention of those seeking relevant information.  It is therefore, perhaps, not surprising that there is not the same growing movement towards open access in engineering as in some other disciplines.

To summarise, much information which is important for engineering cannot, and will not, be deposited in repositories, and any discovery service which concentrates on engineering repository materials alone will find it difficult to attract significant usage.  If engineering materials deposited within repositories are difficult to find, or are missed when literature searches are undertaken, there will be a resulting reluctance on the part of engineers to deposit their own materials in repositories.

A Pilot Engineering Repository Xsearch service therefore has a role to play in attempting to cross-search whatever is available, either through repositories or metadata repositories.  Through advocacy work with publishers, professional societies and others, more sources may become interoperable and can subsequently be included in the cross-search service.

3.6 Information Seeking Behaviour of Engineers

As far as the information seeking habits of engineers are concerned, Tenopir and King (2004) concluded that: "Engineers' information seeking...varies with each information need, and generally differs by engineering discipline (e.g., aeronautical compared with civil engineering), the nature of work being performed (e.g., research vs. design), country (e.g., based on access to technology, funds available, and culture), and personal characteristics (e.g., gender and age). What tends to be common to all engineers is the need to obtain information quickly with as little effort as possible."  [our emphasis]

In a much earlier paper, Pinelli (1991) reported similar findings: "What an engineer usually wants...is a specific answer in terms and format that are intelligible to him - not a collection of documents that he must sift, evaluate, and translate before he can apply them" [our emphasis].  In a later paper, Pinelli (2001) reports that "Engineers and scientists exhibit many other important differences in education, technical discipline, and type of work activities.  These differences point to differences in their information-seeking behaviors and information needs."

The Magic report (Needham et al 2002) provided a useful summary as to what is currently known about the information seeking behaviour of engineers, with results similar to other studies:

  • The dominant factor in the use of an information source is its accessibility, followed by its ease of use.
  • Engineers' need for information was found to be task-oriented and based on problem solving.
  • Engineers needing technical information tend to use the most accessible sources rather than the highest quality sources.
  • Engineers are reluctant users of formal information sources.
  • Engineers may be missing significant information through lack of awareness of information sources.
  • Increasingly, engineers are frequent users of the Internet, which is used to locate information for their work.

The conclusion to be drawn is that, generally, those involved in engineering, whether in academia or in industry, tend to require information quickly and without much effort, in order to satisfy task-based requirements.  At the same time, they tend not to be overly committed to the information gathering process.  To some extent, therefore, materials residing in repositories will present simply another source to be added (or, in practice, often not added) to many others in the information gathering process, and one which will further complicate that process.

A subject-based repository search service for engineering which focused only on materials deposited in digital repositories would add yet another service that engineers would have to search.  Such a service might not contain very much, certainly not at first and perhaps not even in the longer term, given that much engineering research is not Research Council funded and hence will not be subject to a requirement for deposit in repositories.  Usage levels of such a service would possibly be low compared with those of equivalent services in other disciplines.

The corollary is that a subject-based search service which cross-searched materials deposited in digital repositories as well as numerous other metadata repositories containing relevant content from a variety of different types of publication might well satisfy more of the information needs of engineers.

Much of the above analysis refers to information retrieval of materials for research. So far, much less is known about information retrieval of learning materials in engineering, and it is important to note that this is a growth area due to the rapid uptake of Virtual and Managed Learning Environments.  This area is less mature in terms of information provision and resource discovery analysis.  Of course, one person's scholarly article may in fact be another person's teaching material, and there cannot always be clear-cut divisions between different types of resources.  This is an area that requires more study.

3.7 Importance of a Subject Based Approach

The importance of subject access to information has been recognised in the literature.  As Peters (2002) points out: "Ultimately, most seekers and users of scholarly information are pursuing a topic or train of thought.  Although the publisher, author, and the institution with which the author was associated may be of some interest to seekers and users of scholarly information, usually those interests pale in comparison to the topic (and scholarly task) at hand.  Ultimately, a good, user-centric scholarly information system must meet the needs of students and scholars. These end-users need a system that enables broadcast searching across a wide variety of e-print servers, digital libraries, and institutional digital repositories to identify and retrieve potentially pertinent scholarly content."

The same case is made in even stronger terms by Stephen and Harrison (2002), who state "We feel more strongly than ever that there are significant advantages to a disciplinary approach to electronic services supporting advanced scholarship and higher education".  They continue "Unfortunately, we have seen little of the structure of the disciplinary community in electronic services."

It is important to note that the disciplinary approach is obviously not satisfied by services such as Google Scholar.

A disciplinary approach can take various forms.  From the various discussions above, it can be concluded that an arXiv-type model for a subject repository service, supported by central funding, or an OLAC model (an international partnership of institutions and individuals who create a worldwide virtual library of resources by developing a network of interoperating repositories and services for housing and accessing such resources), are both unlikely to be particularly successful in engineering.  As Stephen and Harrison (2002) have pointed out, "...It is precisely because disciplines are such distinct cultures that electronic systems designed to speed scholarly communication, such as Paul Ginsparg's preprint server in high-energy physics (Ginsparg, 1994), may be revolutionary in particular fields but completely irrelevant in many others..." What may work for one discipline does not necessarily work well for another discipline.

Accordingly, PerX intends to pilot a distributed subject model, whereby engineering materials within digital repositories will be cross-searchable, at the same time as metadata repositories containing relevant engineering content.  In this way, it is hoped that the resulting pilot service will potentially satisfy some of the specific needs of the engineering community and take account of the advice of Stephen and Harrison (2002): "Electronic services need to be designed differentially and should deploy technologies selectively in service of the varying scholarly practices that define different fields.  The disciplinary community is everything and it is our belief that significant benefits would accrue if this insight, translated into a guiding principle of design, were to be more fully exploited among today's electronic services for the research and education community"

This will have the added advantage that better subject-based resource discovery options in engineering are likely to help raise the profile of digital repository-related work within the community, and are subsequently likely to encourage growth in the rate at which relevant materials are deposited in repositories.  In other words, if engineers can find materials deposited in digital repositories easily alongside other relevant material, this in itself may help to encourage them to deposit their own materials in repositories.

It is important that the distributed subject model cross-search service envisaged above should not be seen as the only point of access to the plethora of materials identified as being of interest to engineering.  As Fraser (2005) has pointed out, "...the building blocks of a VRE will comprise a mixture of institutional, (inter)national and discipline-based systems and services." Subject-based cross-search services such as that envisaged by PerX could become a component of relevant Virtual Research Environments, just as they could become a component of an institutional portal or other aggregating services.

 

4. Implications for PerX

Implications for the Pilot Engineering Repository Xsearch which emerge from the preceding analysis include:

  1. Virtue of a Subject Based Approach to Resource Discovery. Subject based approaches to resource discovery have considerable merits. Differences between disciplines need to be carefully considered when evaluating which approaches are likely to be successful.
  2. Complexity of the Engineering Information Environment. Any potential service which aims to facilitate resource discovery from repositories must take into account the complex information environment in engineering and the specific needs of the engineering community.  We intend that the Pilot Engineering Repository Xsearch service should attempt to include as many different relevant repositories and metadata repositories as possible, providing coverage of a variety of repository types.  Even then, the service's usefulness will be limited if it does not include 'closed access repositories' and 'closed access metadata repositories', which raises longer term authentication and authorisation issues. There is a range of repositories and metadata repositories that can be included immediately within the pilot, although overall coverage may initially be patchy.  More repositories and metadata repositories are likely to be added to the pilot in due course. The provision of a subject-based cross-searching service which searches multiple resource types may in fact raise further issues, such as the relevance of particular resource-types for some searches. Little is known about retrieval requirements in the area of learning materials.  Feedback on the pilot may reveal relevant data.
  3. Existence of Gap Areas. Within the engineering subject area there are obvious gap areas where provision of repository sources is very limited or non-existent, namely: research data repositories, technical reports repositories, journal repositories, engineering subject based repositories and repositories of assessment materials. Coverage of engineering materials within an Engineering Repository Xsearch service will therefore be patchy and some gap areas will exist. A critical mass of materials may not be available from the initial PerX Pilot.
  4. Identification of Suitable Repositories. Currently, identification of suitable repositories is time consuming and frustrated by lack of collection descriptions. There is usually no means to determine the relevance and coverage of repositories without resorting to time-consuming manual analysis via use of tools such as the OAI repository explorer. Clearly there is a need for better collection level descriptions, as is being proposed and developed by projects such as OpenDOAR. In addition it is often difficult to establish exactly what is being aggregated by some service providers. Multiple levels of aggregation and lack of provenance may lead to a confusing picture. Identification of suitable repositories via the Information Environment Service Registry (IESR) is currently limited by the fact that the IESR holds information for only a "selected set of electronic resources within the JISC Information Environment". For example, only one OAI-PMH repository covering engineering is currently included.
  5. Subject Perspectives on Multidisciplinary Repositories. There are many relevant repository sources which are multidisciplinary in nature and many currently offer no effective means to subdivide collections on a subject basis.
  6. Differing means of interoperability. Sources vary in their means of interoperability, from those which are currently not interoperable, to non-standard interoperability (i.e. proprietary APIs), to fully functional interoperability based on established standards (e.g. Z39.50, SRW/U, OAI-PMH). It is likely that any effective cross search service must be able to deal with a range of these interoperability mechanisms.
  7. Advocacy Work. There is potential for various avenues of advocacy type work for the PerX team, including:
  8. Usability. The Pilot service must also be easy to use and be designed so that little knowledge or 'buy-in' is required for its use. Carstensen (1997) recommended that in any information support system for engineers, there is a need to provide easy access, including seamless switching between use of different search strategies and information sources, and this advice should be heeded in the design of a Pilot Engineering Repository Xsearch service.
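
As an illustration of point 6 above, the following is a minimal sketch, in Python, of how a cross-search layer might hide differing interoperability mechanisms behind one common interface. It is not a description of the PerX implementation. The function names, the Record type and the example base URLs are all hypothetical; only the request parameters (verb, metadataPrefix and set for OAI-PMH; operation, version, query and maximumRecords for SRU 1.1) are taken from the respective protocols. Note that OAI-PMH is a harvesting protocol rather than a search protocol, so the sketch assumes that harvested metadata would be searched from a local index, while SRU targets would be queried live.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode


@dataclass
class Record:
    """A search hit, labelled with the target that supplied it (hypothetical type)."""
    title: str
    source: str


def oai_pmh_list_records_url(base_url: str, set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core.

    The optional 'set' argument restricts the harvest to one set, which is
    one way a repository can expose a subject subdivision, if it defines one.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def sru_search_url(base_url: str, query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve request carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": query,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Each target is wrapped as a function from a query string to a list of titles,
# so the cross-search loop does not need to know which protocol lies behind it.
SearchFn = Callable[[str], List[str]]


def cross_search(query: str, targets: Dict[str, SearchFn]) -> List[Record]:
    """Send the query to every target and merge whatever comes back."""
    results: List[Record] = []
    for name, search in targets.items():
        try:
            titles = search(query)
        except Exception:
            # Assumption: one unavailable target should not fail the whole search.
            continue
        results.extend(Record(title=t, source=name) for t in titles)
    return results


if __name__ == "__main__":
    # Stand-in targets using in-memory data; a real adaptor would issue the
    # requests built above and parse the XML responses.
    harvested_index = ["Fatigue of welded steel joints", "Wind turbine blade design"]

    def local_index_search(q: str) -> List[str]:
        return [t for t in harvested_index if q.lower() in t.lower()]

    def unavailable_target(q: str) -> List[str]:
        raise ConnectionError("target not responding")

    print(oai_pmh_list_records_url("http://repository.example.org/oai", "engineering"))
    print(sru_search_url("http://catalogue.example.org/sru", "fatigue AND steel"))
    for record in cross_search(
        "steel", {"harvested": local_index_search, "offline": unavailable_target}
    ):
        print(record.source, "-", record.title)
```

Wrapping every source as a simple query function is only one possible design; its attraction here is that further mechanisms (for example Z39.50 or a proprietary API) could be added as new adaptors without changing the cross-search loop.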

Clearly there are considerable challenges and obstacles which subject based cross search services must address. As Shreeves et al (2005) point out, much work is required as service providers “try to cope with the chaos that develops from aggregating data from diverse sources”.

 

5. References