Pilot Engineering Repository Xsearch
Investigating resource discovery issues in engineering digital repositories
Engineering Digital Repositories Landscape Analysis, and Implications for PerX
This paper draws on the experience of compiling a Listing of engineering repository sources and aims to help inform the development of a pilot subject based engineering repository cross search service.
The analysis reviews the landscape in the area of digital repositories within the subject area of engineering, as identified in the Listing. The document provides a synopsis of the current state of digital repository and metadata repository provision (including obvious gap areas) within engineering related disciplines. It proceeds to discuss various issues concerning repository provision and how they might relate to the provision of an Engineering Repository cross search service. The document concludes by considering the implications of the current engineering digital repository landscape for the PerX project.
For the most part, 'closed access' commercially produced repositories which do not currently allow free searching (such as various commercially available full-text and Abstract & Indexing services) are not included in the analysis or discussion. Many of these, such as ScienceDirect, Ei Compendex, Inspec, Technology Research Index, Web of Science, etc., are recognised as being extremely important, in terms of information provision, for the engineering community. However, the primary purpose of this list is to help inform the development of a Pilot Engineering Repository Xsearch service which does not require authentication (i.e. which cross-searches only freely available data or metadata).
Table 2 sums up the status of repository and metadata repository provision uncovered via the Listing of Engineering Repositories exercise:
|Brief Synopsis of Position|
Research Outputs 1: Preprints/Postprints
|Research Outputs 2: Technical Reports||Repositories
A number of UK repository related projects exist which aim to coordinate national access to resources (e.g. ePrints UK, Ethos, IRI Scotland). Additional nationally coordinated UK effort seems logical/possible in the following areas:
From the above synopsis it is clear that a number of gap areas exist in the engineering repository landscape, including:
Subject Based Access,
Assessment Materials repositories.
These gap areas are discussed further in sections 3.3 to 3.7.
The picture concerning other repository types is fluid and in a state of fairly rapid development. Considerable advances are currently being made in the development and deployment of repositories for learning materials and e-theses, and at the institutional level. Often, coverage in these repositories is multidisciplinary in nature, with no means for a subject based service to select subsets based on subject coverage. This presents a considerable challenge for subject based services which wish to cross search multiple resources, and is discussed in section 3.2.
Overall, the level of digital repository provision specifically for the engineering community appears to be relatively low. A considerable number of gap areas have been revealed and there is little in the way of repositories specifically aimed at the engineering community. A growing number of multidisciplinary repository sources will help in the future to improve access to certain content types, but with no specific emphasis on engineering materials. Some are interoperable and others are not.
At the same time, albeit with some notable gap areas, there are numerous metadata repositories of interest to engineering, some of which are already interoperable, others for which interoperable interfaces are in development, and yet more for which the situation is unknown.
This situation will have a bearing on the coverage provided by the Pilot Engineering Repository Xsearch. Inevitably, initial coverage will be very patchy.
A number of multidisciplinary repository sources have been identified with content which would be of interest to engineering (e.g. national repositories of research outputs, e-theses, learning materials, technical reports and institutional/departmental materials). Clearly, subject based cross search services would wish to include relevant materials from such collections, and OAI-PMH sets would seem to offer a mechanism to achieve this. In practice, however, the adoption of subject based sets within repositories is piecemeal in nature, with some repositories offering clearly delineated subject groupings, others offering no subject breakdown whatever, and still others offering a more complex picture of many multiple sets (e.g. sets by publication status, sets by type of content (i.e. abstract/full text), sets by originating unit or author, etc.). The initial impression is that sets are often produced by data providers based upon their particular needs and internal organisational structures rather than the likely needs of service providers.
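To illustrate the mechanism, the following minimal sketch builds the two OAI-PMH requests a service provider would issue: one asking a data provider which sets it exposes, and one harvesting records restricted to a chosen set. The verbs and parameter names (ListSets, ListRecords, metadataPrefix, set) come from the OAI-PMH protocol; the base URL and the set name "engineering" are hypothetical and used for illustration only.

```python
from urllib.parse import urlencode

# Hypothetical data provider address, for illustration only.
BASE_URL = "https://repository.example.org/oai"


def list_sets_url(base_url):
    """Return the request URL asking a data provider which sets it exposes."""
    return base_url + "?" + urlencode({"verb": "ListSets"})


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Return the request URL for harvesting records.

    If set_spec is given, the harvest is restricted to that set;
    otherwise the whole collection is requested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# "engineering" is an assumed set name; real setSpec values vary by provider.
print(list_sets_url(BASE_URL))
print(list_records_url(BASE_URL, set_spec="engineering"))
```

Selective harvesting of this kind only helps where the data provider has defined a set that corresponds to the subject of interest.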
Examples of Repositories which provide Subject Based Sets
- NASA Technical Report Server (NTRS) -[ListSets]
- Digital Library Network for Engineering and Technology (DLNET) -[ListSets]
- Directory of Open Access Journals (DOAJ) -[ListSets]
- ArXiv.org -[ListSets]
Examples of Repositories which do not provide Subject Based Sets
- Networked Digital Library of Theses and Dissertations Union Catalog (NDLTD) -[ListSets], Sets provided are based on institution of origin.
- National Engineering Education Delivery System (NEEDS) -[ListSets], No sets provided.
- Council for the Central Laboratory of the Research Councils (CCLRC) ePublication Archive -[ListSets], Sets provided are based on type of content, i.e. metadata or Full text.
- Australian Research Repositories Online to the World (ARROW) -[ListSets], No Sets provided.
Examples of Repositories which provide many multiple Sets
The current picture is therefore one in which the use of OAI-PMH sets to select particular subject groupings is limited to a small number of repositories which provide suitable sets. For others, suitable sets are simply not available or the situation is made prohibitively complex by large numbers of sets and the inability to easily gauge their relevance. In these situations the only current option is to harvest the entire multidisciplinary collection.
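The decision described above can be sketched as follows: inspect a data provider's ListSets response, pick out any sets whose names suggest engineering content, and fall back to harvesting the entire collection when none is found. This is only an illustrative sketch. The keyword list, the sample responses and the function names are assumptions, and matching on set names is a crude heuristic that would not cope with the "many multiple sets" case without human review.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative keywords only; a real service would need a considered list.
KEYWORDS = ("engineering", "technology")


def subject_sets(list_sets_xml, keywords=KEYWORDS):
    """Return the setSpec of every set whose setName contains a keyword."""
    root = ET.fromstring(list_sets_xml)
    matches = []
    for set_el in root.iterfind(".//oai:set", OAI_NS):
        spec = set_el.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = set_el.findtext("oai:setName", default="", namespaces=OAI_NS)
        if any(keyword in name.lower() for keyword in keywords):
            matches.append(spec)
    return matches


def harvest_plan(list_sets_xml):
    """Return the sets to harvest; [None] means harvest the whole collection."""
    sets = subject_sets(list_sets_xml)
    return sets if sets else [None]


# Hypothetical responses: one provider with subject sets, one with no sets.
WITH_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>eng</setSpec><setName>Engineering</setName></set>
    <set><setSpec>hist</setSpec><setName>History</setName></set>
  </ListSets>
</OAI-PMH>"""

NO_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="noSetHierarchy">This repository does not support sets</error>
</OAI-PMH>"""

print(harvest_plan(WITH_SETS))
print(harvest_plan(NO_SETS))
```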
Some possible means to address this situation include;
A number of gap areas in repository provision have been identified, including: Research data, Technical Reports, Journal Repositories, Subject Access and Assessment.
Few repository sources were identified for engineering-specific research data. This situation can be contrasted to that of the Social Sciences and Humanities, for which the UK Data Archive provides access to over 5000 computer-readable datasets for research and teaching purposes.
Another gap area is that of UK technical reports. Technical reports from the USA are covered well by repositories and metadata repositories, but UK technical reports have few equivalents apart from the CCLRC ePublication Archive. Technical reports are also not particularly well covered by commercially available 'closed access metadata repositories', such as the SIGLE database. The British National Bibliography for Report Literature, which included this type of material, ceased publication in 2003.
A further notable gap area within engineering is that of journal repositories. Although some initiatives such as Petroleum Journals Online and Hindawi are beginning to make progress, they are relatively small-scale and as yet unproven, and the situation can be contrasted with well established open access journal repositories in some other subject areas (e.g. BioMed Central, and PLOS), and subject areas for which public funding is currently encouraging development. Various journal metadata repositories exist, and there are also major commercially available 'closed access repositories' such as ScienceDirect, and 'closed access metadata repositories', such as CSA Technology Research Database and Ei Compendex. This situation will have a bearing on the content and potential effectiveness, from an information retrieval perspective, of a Pilot Engineering Repository Xsearch service. Coverage of certain resource types will be very patchy.
A major gap area identified is that of subject-based repository access for engineering. No sizeable subject-based repositories exist for engineering and no real community movement is evident which is likely to change this situation. Yet various subject based repositories have been developed which serve other disciplines. Several have existed for some time and have become well-established and heavily used, serving their respective communities. They include:
ArXiv.org (Physics, Mathematics, Non-linear Science, Computer Science, and Quantitative Biology).
Chemistry, Math and Computer Science Preprint Archives (Chemistry, Mathematics and Computer Science)
ClinMed NetPrints (Clinical Medicine and Health Research)
CogPrints (Cognitive Sciences: including Psychology, Neuroscience, Linguistics, and many areas of Computer Science (artificial intelligence, robotics, vision, learning, speech, neural networks), Philosophy (e.g., mind, language, knowledge, science, logic), and Biology.)
Computing Research Repository (CoRR) (Computing)
Cryptology ePrint Archive (Cryptology)
E-BioSci (Life Sciences)
E-LIS (Library and Information Science)
EPRINT (Natural Science and Technology)
Project Euclid (Mathematics)
PubMed Central (Biomedical and Life Sciences)
Social Science Research Network (SSRN) (Social Science)
In addition to the above, the AMS Directory of Mathematics Preprint and e-Print Servers lists a number of subject repositories in mathematics, and 34 archives participate in the Open Language Archive Community (OLAC).
It is apparent therefore that subject repository services exist for some disciplines but not others. It has been suggested that "Actually establishing and running a repository calls for a level of infrastructure, maintenance and administration which many subject communities cannot provide." (Hubbard 2005). With respect specifically to the engineering community, is the lack of subject based or journal repositories and the comparative lack of uptake of repositories simply a chicken and egg scenario? In other words, if a well-supported subject based repository were developed for engineering, or if a number of open access journals were initiated for engineering, would there be a resultant increase in uptake by the engineering community? The answer is probably no, or at least not to as great an extent as has happened in some other disciplines.
To explain much of the above, and to understand how repositories fit into the information landscape of engineering, it is necessary to consider three factors that affect information retrieval within the discipline: the information and communication needs of engineers, the complexity of their information landscape, and their information seeking behaviour.
Tenopir and King (2004) studied the communication patterns of engineers and the information resources used to perform their work. They synthesised the findings of numerous previous studies, looked at various aspects of communication patterns amongst engineers, examined the literature that distinguished the information needs and uses of engineers from those of scientists, and reviewed previous studies that had explored the communication practices of engineers. Their study is the most complete survey of this topic to date.
Their conclusions reveal that engineers rely on both interpersonal and informal means of communication more than scientists, who read journals more frequently and are more inclined to use other formal means of communication as well. They found that engineers engage in many types of activity including research, design, development, production, construction, teaching, management, and marketing. As a result, they use numerous formal and informal channels to satisfy their communication and information needs. There are many written publication types which may be important to engineers, including scholarly and trade journals, books, internal and external reports, patent documents, conference proceedings, standards, regulations, and dissertations.
In another study, Ward (2005) described the engineering knowledge base as consisting of explicit knowledge, such as that contained in print and online sources, and tacit knowledge which is inside people's heads or embedded in organisational structures and practice, or attached to cultural and other objects such as engineering products. He, too, identified the importance of personal and informal sources of information for engineers, along with various explicit information sources such as books, departmental files, confidential departmental databases, trade literature, standards and specifications, legislation, technical papers, journal papers, internal reports, external technical reports, reference works, test sheets and technical data, and various multimedia sources.
A third study (Needham et al 2002) found similar results, and that engineers' information needs tend to be multifaceted, complex and changing in nature. Engineers were found to spend between 40% and 66% of their time communicating, with personal and informal sources of information being preferred. Engineers spent a significant proportion of their time using scientific and technical information. As well as other types of publication, sources such as technical reports were important and useful to engineers.
Pinelli (2001) confirms this finding: "Journal articles are appropriate for scientists to describe the development and testing of one idea. In contrast, technical reports are more appropriate for engineers to document engineering outcomes."
Elsewhere, Carstensen (1997) found a need amongst engineers for information and publications pertaining to a number of areas including design, product information, component specifications, standards, production data, materials and components, papers and research results, people and project documentation.
It is important to note that the studies mentioned above refer for the most part to the information and communication needs of engineers of all types, and not just engineers within academia. The information requirements of academic engineers will not necessarily exactly mirror the information requirements of the profession as a whole. For example, within academia a greater emphasis would be expected on the importance of scholarly journal articles and papers as an information source. Confirming this, a recent study on the needs of UK academic researchers in different disciplines (Sparks, 2005), showed that journal articles are very important for engineering researchers. However, that same study showed that several other types of publication can also be important for engineering researchers.
The literature thus shows that as well as interpersonal and informal communication and sources (tacit knowledge), a large range of publication types (e.g. trade material, books, standards, patents, legislation, reference works, and component specifications) can be expected to be important for academic engineers. Many of these sources are not produced by academics and researchers and are also not the result of publicly funded research. In many cases, therefore, they are unlikely to result in materials which will be deposited in repositories.
Repositories per se are therefore unlikely to become as important in engineering as they have become, or may become, in some other disciplines. Instead, it is more likely that repositories may become one more additional source, amongst many, of potentially useful information for engineers.
A service that focused only on materials in repositories, and ignored materials found in other sources, for which metadata repositories may be available, would therefore be unlikely to be regarded as an essential information retrieval tool.
More conclusions may be drawn from an analysis of some aspects of the published engineering information landscape.
Published material makes up the bulk of the explicit knowledge base relevant to engineering, and the complexity of the engineering information landscape has been alluded to above. Many different types of publication have been mentioned as being potentially important for engineering. For each and every one of these publication types there is often quite a range of commercial and non-commercial publishing bodies, plus a variety of secondary sources of information. When viewed as a whole, this amounts to a complex information landscape, and a landscape which is more complicated than that of many other disciplines.
For example, in the area of journals alone, academic, professional, trade and house journals can all be important sources of information for the engineer. Many engineering scholarly journals are published by the same commercial publishing houses that publish in other disciplines. In addition, however, professional societies play a particularly important role in the science, technology and medical journal publication landscape - "Society and other non-profit titles account for 85% of the top 20, three quarters of the top 200 and two thirds of the top 500 ISI ranked titles" (Morris and Olivieri, 2004). The figure is likely to be higher still if engineering titles are taken alone. Engineering professional societies tend to serve memberships made up of practising engineers as well as academic engineers. The content of the journals they publish, and the origin of authors, reflects this to a great extent, albeit with some differences between engineering sub-disciplines, with papers being written by practising engineers as well as those in academia. So far, enthusiasm for open access models for journal publishing among engineering professional societies appears to have been less than in some other disciplines. Findings from a recent JISC survey show that, in two cases, learned societies publishing technology journals believed that the author community for their journals was not enthusiastic about being charged for publication (Waltham, 2005). Given that potential authors originating from industry are unlikely to be able to cover the cost of prepayment for publication from research grants, this is not surprising.
Some engineering professional societies publish journal titles which are less scholarly in orientation, and are aimed at their particular professional sectors (which will include a proportion of academics). In addition, many trade journals in engineering are published by relatively small commercial publishing houses, for whom the academic market is only one sector, alongside engineers in industry. Trade journals can often be important for academic engineers, as they provide information such as industry-related news and analysis, product and market information, technical information about materials and processes, software reviews, product reviews, and trade literature and advertisements. For less-scholarly titles, advertising and job announcements not only bring significant income to the publication, but are also an integral part of the publication's content. There is often little motivation for these publications, or those who author materials in them, to deposit materials in repositories.
House journals, published by organisations, businesses and public service providers, are yet another type of publication, and often have their own specific concerns. A proportion of the titles identified from the DOAJ listings which provide open access could be classified as house journals.
Whilst papers and other materials published in any of these types of journals might legitimately be regarded as potential targets for deposit within repositories, there are various reasons why this often does not happen: such disclosure may not be appropriate, it might damage the financial viability of the publication, or the motivation for doing so may be low.
As a result, a proportion of the literature which appears in the various types of engineering journals has little chance of appearing, as preprints or post-prints, in repositories.
The corollary is that those within engineering who are seeking topical information in journals will continue to rely upon a number of different sources. As is the case with other types of publications, materials deposited in repositories will be one of a number of potential sources of information. Commercial 'closed access repositories' and 'closed access metadata repositories' are major, if often incomplete, sources of information in engineering. If materials deposited in repositories are neither included in such sources nor cross-searchable at the same time via a broadcast search, they are less likely to be discovered. Provision of resource discovery services, even subject-based ones, which are limited to materials in digital repositories is therefore unlikely to significantly raise the visibility of those materials in engineering. This conclusion should be noted, as web visibility of materials in repositories has been identified as being important for the success of the digital repository movement (Whitehead, 2005).
In addition, much relevant engineering research information is not published, even in the variety of journal types mentioned above. Pinelli (2001) found that the experiences of the scientific and engineering communities concerning the disclosure (communication and publication) of knowledge were fundamentally different. Whilst the scientific community, generally speaking, makes its research knowledge freely available, "...disclosure is neither required or expected in the technology community". "Technological knowledge is not easily or completely codified, nor is it freely communicated. Unlike science, the output of technology is not made universally available". Once again, it must be remembered that he was reporting on the commercial world of engineering and technology rather than the academic community. For academic engineers, disclosure, and the resultant rewards disclosure reaps, can be just as important as it is for academic scientists, especially in the case of research funded by public bodies and research councils. However, much research undertaken by academic engineers is funded or sponsored by commercial concerns, where disclosure may actually be undesirable (as it might assist the funder's competitors). Secrecy can, in fact, be desirable. As Needham et al (2002) found, the publication and distribution of technical reports is often controlled by the particular company or agency performing or sponsoring the work. As a result, many reports will never be distributed externally and the motivation to do so is often lacking.
Commercial interests will also mean that research data is often not disclosed. In addition, much research data will be collected as a result of experiments which are applied to engineers' own systems under analysis, and will in any case often not be as universally applicable as the data collected by scientists. It is noticeable that of the data currently available via the CCLRC Data Portal, little can be classified as engineering data.
These factors mean that, overall, there is less emphasis on open provision of research outputs in engineering than there is in science. They also mean that engineers might not expect to find all, or even a significant proportion, of research publications relevant to their own needs openly available.
Most successful subject based repositories have developed as a result of the high dependency of academics in the subjects concerned on scholarly papers which can relatively easily be deposited as e-prints in archives. In an imaginary and simplistically monolithic information landscape, it is easy to see how digital repositories might play an extremely important role in such cases. The scenario is that literature-based research is undertaken, and results are disclosed via papers deposited in repositories, and subsequently in scholarly journals, etc. Discovery of the bulk of research results can thus be anticipated via a search of relevant repositories/subject repositories and/or a search of a relevant journal full-text or indexing service.
In a more complex information landscape, such as that of engineering, any one particular source of information for any one particular type of publication has to struggle for the limited attention of those seeking relevant information. It is therefore, perhaps, not surprising that there is not the same growing movement towards open access in engineering as in some other disciplines.
To summarise, much information which is important for engineering cannot, and will not, be deposited in repositories, and any discovery service which concentrates on engineering repository materials alone will find it difficult to attract significant usage. If engineering materials deposited within repositories are difficult to find, or are missed when literature searches are undertaken, there will be a resulting reluctance on the part of engineers to deposit their own materials in repositories.
A Pilot Engineering Repository Xsearch service therefore has a role to play in attempting to cross-search whatever is available, either through repositories or metadata repositories. Through advocacy work with publishers, professional societies and others, more sources may become interoperable and can subsequently be included in the cross-search service.
As far as the information seeking habits of engineers are concerned, Tenopir and King (2004) concluded that: "Engineers' information seeking...varies with each information need, and generally differs by engineering discipline (e.g., aeronautical compared with civil engineering), the nature of work being performed (e.g., research vs. design), country (e.g., based on access to technology, funds available, and culture), and personal characteristics (e.g., gender and age). What tends to be common to all engineers is the need to obtain information quickly with as little effort as possible." [our emphasis]
In a much earlier paper, Pinelli (1991) reported similar findings: "What an engineer usually wants...is a specific answer in terms and format that are intelligible to him - not a collection of documents that he must sift, evaluate, and translate before he can apply them" [our emphasis]. In a later paper, Pinelli (2001) reports that "Engineers and scientists exhibit many other important differences in education, technical discipline, and type of work activities. These differences point to differences in their information-seeking behaviors and information needs."
The Magic report (Needham et al 2002) provided a useful summary as to what is currently known about the information seeking behaviour of engineers, with results similar to other studies:
The conclusion to be drawn is that, generally, those involved in engineering, whether in academia or in industry, tend to require information quickly and without much effort, in order to satisfy task-based requirements. At the same time, they tend not to be overly committed to the information gathering process. To some extent, therefore, materials residing in repositories will present simply another source to be added (or, in practice, often not added) to many others in the information gathering process, and one which will further complicate that process.
A subject-based repository search service for engineering which focused only on materials deposited in digital repositories would add yet another possible service engineers would have to search. Such a service might not contain very much (certainly not at first, and given that much engineering research is not Research Council funded, and hence will not have a requirement to be deposited in repositories, perhaps not even in the longer term). Usage levels of such a service would possibly be low compared with other disciplines.
The corollary is that a subject-based search service which cross-searched materials deposited in digital repositories as well as numerous other metadata repositories containing relevant content from a variety of different types of publication might well satisfy more of the information needs of engineers.
Much of the above analysis refers to information retrieval of materials for research. So far, much less is known about information retrieval of learning materials in engineering, and it is important to note that this is a growth area due to the rapid uptake of Virtual and Managed Learning Environments. This area is less mature in terms of information provision and resource discovery analysis. Of course, one person's scholarly article may in fact be another person's teaching material, and there cannot always be clear-cut divisions between different types of resources. This is an area that requires more study.
The importance of subject access to information has been recognised in the literature. As Peters (2002) points out: "Ultimately, most seekers and users of scholarly information are pursuing a topic or train of thought. Although the publisher, author, and the institution with which the author was associated may be of some interest to seekers and users of scholarly information, usually those interests pale in comparison to the topic (and scholarly task) at hand. Ultimately, a good, user-centric scholarly information system must meet the needs of students and scholars. These end-users need a system that enables broadcast searching across a wide variety of e-print servers, digital libraries, and institutional digital repositories to identify and retrieve potentially pertinent scholarly content".
The same case is made in even stronger terms by Stephen and Harrison (2002), who state "We feel more strongly than ever that there are significant advantages to a disciplinary approach to electronic services supporting advanced scholarship and higher education". They continue "Unfortunately, we have seen little of the structure of the disciplinary community in electronic services."
It is important to note that the disciplinary approach is obviously not satisfied by services such as Google Scholar.
A disciplinary approach can take various forms. From the various discussions above, it can be concluded that neither an ArXiv-type model for a subject repository service, supported by central funding, nor an OLAC model (an international partnership of institutions and individuals who create a worldwide virtual library of resources by developing a network of interoperating repositories and services for housing and accessing such resources) is likely to be particularly successful in engineering. As Stephen and Harrison (2002) have pointed out, "...It is precisely because disciplines are such distinct cultures that electronic systems designed to speed scholarly communication, such as Paul Ginsparg's preprint server in high-energy physics (Ginsparg, 1994), may be revolutionary in particular fields but completely irrelevant in many others..." What may work for one discipline does not necessarily work well for another discipline.
Accordingly, PerX intends to pilot a distributed subject model, whereby engineering materials within digital repositories will be cross-searchable, at the same time as metadata repositories containing relevant engineering content. In this way, it is hoped that the resulting pilot service will potentially satisfy some of the specific needs of the engineering community and take account of the advice of Stephen and Harrison (2002): "Electronic services need to be designed differentially and should deploy technologies selectively in service of the varying scholarly practices that define different fields. The disciplinary community is everything and it is our belief that significant benefits would accrue if this insight, translated into a guiding principle of design, were to be more fully exploited among today's electronic services for the research and education community"
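The idea of a distributed subject model can be illustrated with a minimal sketch in which one query is sent to several search targets and the results are merged into a single list. This is not the PerX implementation. The in-memory sources, the record fields and the title-based de-duplication are all assumptions made purely for illustration; a real service would query remote repositories and metadata repositories over their own interfaces.

```python
def make_source(name, titles):
    """Build a stand-in search target over an in-memory list of titles."""
    def search(query):
        q = query.lower()
        return [{"title": t, "source": name} for t in titles if q in t.lower()]
    return search


def cross_search(query, sources):
    """Send one query to every source and merge the results.

    Records whose titles match after lower-casing are treated as duplicates,
    and only the first occurrence is kept.
    """
    seen = set()
    merged = []
    for search in sources:
        for record in search(query):
            key = record["title"].strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical targets: one digital repository and one metadata repository.
repository = make_source(
    "repository", ["Fatigue in welded joints", "Bridge design notes"]
)
metadata_index = make_source(
    "metadata index", ["Fatigue in Welded Joints", "Fatigue testing standards"]
)

results = cross_search("fatigue", [repository, metadata_index])
for record in results:
    print(record["source"], "-", record["title"])
```

The point of the sketch is that the user issues one search and sees repository and non-repository material side by side, without needing to know which source each record came from.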
This will have the added advantage that better subject-based resource discovery options in engineering are likely to help raise the profile of digital repository-related work within the community, and are subsequently likely to encourage growth in the rate that relevant materials are deposited in repositories. In other words, if engineers can find materials deposited in digital repositories easily alongside other relevant material, this in itself may help to encourage them to deposit their own materials in repositories.
It is important that the distributed subject model cross-search service envisaged above should not be seen as the only point of access to the plethora of materials identified as being of interest to engineering. As Fraser (2005) has pointed out, "...the building blocks of a VRE will comprise a mixture of institutional, (inter)national and discipline-based systems and services." Subject-based cross-search services such as that envisaged by PerX could become a component of relevant Virtual Research Environments, just as they could become a component of an institutional portal or other aggregating services.
Implications for the Pilot Engineering Repository Xsearch which emerge from the preceding analysis include:
Clearly there are considerable challenges and obstacles which subject based cross search services must address. As Shreeves et al (2005) point out, much work is required as service providers “try to cope with the chaos that develops from aggregating data from diverse sources”.