Implementing Learning Technology
Evaluation: A practical guide to methods
Philip Crompton
This article seeks to cover the principles for conducting an evaluation, whether of a small or a large project. An understanding of the theory and background to evaluation is beneficial in order to better plan, design and conduct an evaluation programme. Above all, there is no substitute for having a clear purpose for an evaluation: defining the right question is a prerequisite.
"Evaluation is the collection of, analysis and interpretation of information about any aspect of a programme of education or training as part of a recognised process of judging its effectiveness, its efficiency and any other outcomes it may have."
Mary Thorpe in "Handbook of Education Technology" (Ellington, Percival and Race, 1988)
As the authors point out, Rowntree (1992) makes comment on this definition as follows:-
Evaluation and assessment, although often used interchangeably, refer to different levels of investigation. Evaluation is concerned with the macro or holistic level of the learning event, taking into account the context of learning and all the factors that go with it, whereas assessment can be seen as the measurement of student learning and is one of the elements that go into an evaluation, the micro-level. One aspect of any sound evaluation is the allowance for the unexpected. Above all, an evaluation is a designed and purposeful enquiry which is open to comment.
There are initially two distinct approaches to evaluation: the scientific and the illuminative.
The scientific approach is concerned with the measurement of the effects of specific variables against the resultant outcomes. It seeks to examine the achievement of stated goals in relation to a learner's pre-knowledge and skills. The approach is geared towards the measurement of the efficiency of the educational intervention as well as the effectiveness of the learning outcomes.
The illuminative approach on the other hand seeks to examine and explore the process of the educational intervention. The techniques, as such, are therefore more qualitative, some might say, more subjective in nature as they call on personal judgements.
Though these approaches appear to be at either end of a continuum, it is possible to make use of both within the complexity of what is educational research. A selected mid-point will be governed by that which is being evaluated as well as the reason behind the evaluation. It is not uncommon to see a mix of techniques from both approaches combined in an evaluation project. The degree of combination will depend largely on a process of negotiation between the evaluator and the instigator of the evaluation as well as the environment and the time frame in which the evaluation is being performed.
Whichever approach or combination of approaches is used, the evaluation should always be a clear well thought out undertaking. The more effort that goes into the pre-planning of a piece of evaluation the better the evaluation. Before conducting any evaluation it is important to not only have defined that which you are trying to investigate but also how you are going to go about it.
Romiszowski (1988) differentiates between the scope (Levels of Evaluation) and depth (Levels of Analysis) of an evaluation.
It is necessary to predetermine at which level the evaluation is to be conducted. Evaluation is of great importance in all aspects and stages of teaching and learning. All too often it is seen as the last activity and yet as Laurillard (1993) states it is an iterative process and should take place at every stage in the design, production, implementation/integration of a new educational intervention (e.g. using LT materials) whether it be as a complete course, part of a course, or a particular session or teaching aid.
There are two possible stages for evaluation: the courseware in isolation, and the courseware within the course itself.
The evaluation of a piece of courseware in isolation will tend to focus inwardly on various aspects of the software itself. It will look at aspects like navigation, screen design, text layout, etc. It will examine the scope of the coverage of the material content, e.g. is the subject presented to such a breadth and depth that its use is not limited to beginners?
The evaluation of the courseware within the course itself will allow us to examine other factors which will lend themselves to the successful integration of the product within the course. These will be:
The iterative nature of evaluation should then assist in making the learning experience both more efficient and more effective, as the feedback is used to continuously improve matters.
It is important to recognise at the outset that evaluation is a time consuming exercise and better planning can save time and effort in the actual evaluation process. It is more advisable to do a small evaluation well than to try to do a large evaluation and run out of time. Remember that an adequate amount of time needs to be allocated to both the analysis and writing up stages.
Two main concerns for evaluation in the context of the TLTP initiative have been the need to justify technology-based learning on the grounds of effectiveness and efficiency. A third factor should be added to the list and that is relevance: the applicability and appropriateness to the intended employers and users of the technology, the teachers and students, even the department or institution. Whereas Romiszowski (1988) includes this under factors of effectiveness, there is a case for treating it separately. Project Varsetile (Allen, Booth, Crompton & Timms, 1996) encountered technology that would score highly on grounds of both effectiveness and efficiency and yet at the same time be wholly inapplicable for the given context of instruction and learning. Successfully integrating or embedding the CBL package within a course is dependent on relevance.
The following are factors open to consideration and investigation by an evaluation study.
The evaluation of these factors will lend itself to improvements in both the course, in terms of aims, objectives and content, and the procedures for the design and development of the course itself.
This section will look at the various instruments available to the evaluator and while the list is not exhaustive it is representative of the main techniques used in any evaluation.
This is probably one of the easiest and simplest instruments to create for data collection. However this apparent simplicity belies the fact that creating and organising your questions can be quite arduous and time consuming. You should be clear about the areas and concerns that the questionnaire is to cover. Do not attempt to ask everything as this will tend to make the questionnaire too long and by so doing decrease the willingness of the respondent to answer fully and thoughtfully.
In designing the layout of the questionnaire, leave enough space for responses to open-ended questions but at the same time keep such questions to a minimum if possible. These areas are often better left to semi-structured interviews where a fuller answer can be sought. If these are necessary, keep them to short-answer questions allowing the respondent some space to elaborate on their response.
This evaluation is best used after you have collated and analysed your questionnaire. This will then allow you to obtain an elaboration on important points arising out of your questionnaire. It even allows you the opportunity to cover those points you did not manage to cover in the questionnaire.
While developing your questions from points arising out of your questionnaire do not feel confined by these, allow the interviewees to raise points of concern to them about the learning event. By so doing you may discover points that you yourself had not thought of. However, do not allow yourself to be side-tracked too much.
These were developed by the TILT Project (University of Glasgow) and are extremely useful in gauging a student's confidence on particular points of a given subject area. They are useful as self-indicators of learning rather than anything more concrete. Although their relationship to actual learning is unproven, they can be a useful guide to a student's self-appraisal of his or her own knowledge. It is beneficial within a learning event if the learner's confidence is higher in order that they may actually learn. Under-confidence as much as over-confidence can be an inhibiting factor as a foundation for learning.
This is taken from an evaluation carried out on a piece of software produced by a TLTP Economics consortium. The Confidence Logs demonstrated an increase in the learners' self-assurance with the material. On average their confidence moved from a feeling of "Some Confidence" to that of "Confident". Although this does not necessarily translate into increased attainment scores, it is a necessary element of any learning environment.
| Very Confident | Confident | Some Confidence | Little Confidence | No Confidence |
Table 1: Confidence Logs
Crichton (1995), citing Belcha (1992) in her preliminary observations of an earlier version of WinEcon, states that "the subjective measure used here has the strength of reflecting a student's own judgement about whether or not the software was helpful in learning course material."
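The averaging of Confidence Log responses described above can be sketched in a few lines of Python. This is only an illustration: the numeric scores attached to the five labels, the helper names `average_confidence` and `nearest_label`, and the sample responses are all assumptions made for the example and are not taken from the TILT or WinEcon evaluations.

```python
from statistics import mean

# Five-point scale from Table 1. Scoring it 5 (highest) to 1 (lowest)
# is an assumption made for this sketch.
SCALE = {
    "Very Confident": 5,
    "Confident": 4,
    "Some Confidence": 3,
    "Little Confidence": 2,
    "No Confidence": 1,
}
LABELS = {score: label for label, score in SCALE.items()}


def average_confidence(responses):
    """Return the mean numeric score for a list of scale labels."""
    return mean(SCALE[r] for r in responses)


def nearest_label(score):
    """Map a mean score back to the closest label on the scale."""
    return LABELS[min(LABELS, key=lambda s: abs(s - score))]


# Invented responses from one group, before and after a session.
before = ["Some Confidence", "Little Confidence", "Some Confidence", "Confident"]
after = ["Confident", "Some Confidence", "Confident", "Very Confident"]

before_mean = average_confidence(before)
after_mean = average_confidence(after)
print(nearest_label(before_mean), "->", nearest_label(after_mean))
```

Treating the labels as equally spaced numbers is itself a judgement call, so a mean of this kind is best read as a rough indicator alongside the raw counts for each label.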
Again, as with any instrument, the better prepared and the more focused an observation is, the more likely it is that useful data can be collected. Although with video-taped recordings it is possible to make observations at a later date, it is far better to have a sense of what one is seeking to observe, whether it be inter-personal communications, skills or behaviour patterns (e.g. study strategies - note taking, etc.).
Use of a structured observation sheet is essential whether the session is video-taped or not. If video-taping is not possible, then the use of a tape-recorder is useful as students' verbal interactions and comments may be synchronised with the notes and comments from the observation sheets.
It is useful to have a code or shorthand that can help in recording your observations. This will help you focus on what it is that you are trying to observe while at the same time allowing for those unexpected but enlightening events which may occur. If you are conducting the observation exercise with a colleague make sure that you are both clear on the meaning of the shorthand code.
This particular observation sheet was created for use in the evaluation of a piece of Japanese CAL software. A pair of students were video-taped using the program and their actions were noted on the observation log when the tape was later analysed.
It is useful to start any evaluation by obtaining a profile for each student as a baseline against which any future changes in both attitude and opinion may be referenced. Within a Student Profile questionnaire it is appropriate to ask about the learner's attitude towards computer-based learning as well as their use of computers in general. At the same time, it is appropriate to ask questions concerning the learner's academic background in the particular and related areas of the content subject under evaluation. For instance, in an evaluation programme on CBL in economics, information about the learner's level of knowledge not only of economics (e.g. Standard Grade or Higher, O or A levels etc.) but also of the related areas of maths and statistics would be useful.
Profiles are also useful when selecting evaluation groups by stratified random sampling, through which it is possible to obtain groups of learners that meet specific criteria. When wanting to make sure that there is a range of student abilities in a certain group, a student profile questionnaire can be used as a basis for this selection.
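As a minimal sketch of how profile data might feed stratified random sampling, the Python below groups learners by one profile field and draws the same number at random from each group. The field name `maths`, the function `stratified_sample` and the learner records are hypothetical and exist only for this example.

```python
import random
from collections import defaultdict


def stratified_sample(profiles, key, per_stratum, seed=None):
    """Pick `per_stratum` learners at random from each stratum of `key`."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    strata = defaultdict(list)
    for profile in profiles:
        strata[profile[key]].append(profile)

    sample = []
    for level in sorted(strata):
        members = strata[level]
        if len(members) < per_stratum:
            raise ValueError(f"stratum {level!r} has only {len(members)} learners")
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Invented profile records; a real profile would hold many more fields.
profiles = [
    {"name": "S01", "maths": "Higher"},
    {"name": "S02", "maths": "Higher"},
    {"name": "S03", "maths": "Higher"},
    {"name": "S04", "maths": "Standard Grade"},
    {"name": "S05", "maths": "Standard Grade"},
    {"name": "S06", "maths": "Standard Grade"},
]

group = stratified_sample(profiles, key="maths", per_stratum=2, seed=1)
print([p["name"] for p in group])
```

Drawing an equal number from each stratum is one design choice among several; sampling in proportion to the size of each stratum would be an equally reasonable alternative.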
This is perhaps one of the most difficult techniques in terms of educational research. The idea of performing a pre-test before the students' use of a particular program, followed by a post-test in order to measure any educational gain, is not as simple as it might first appear.
The concern is that of test validity, which is fundamental to the purpose of any testing. Cohen and Manion (1980) raise the point:
"Conditions that threaten to jeopardise the validity of experiments... are of greater consequence to the validity of quasi-experiments (more typical in education research) than to true experiments...."
The problem is one of both 'internal validity' and 'external validity.'
Internal validity is concerned with whether the educational intervention makes any difference to the outcome in question, whereas external validity is concerned with how far the results are generalisable and in what settings and with what population. There are different factors which affect validity which fall outside the scope of this article. However, one of the main problems within higher education is that it is becoming increasingly difficult if not time consuming to create a 'true' experimental design.
Notwithstanding the potential pitfalls, and accepting the limitations of this technique, indicators of learning gains may be obtained and, while not being generalisable, may be useful when taken together with other findings.
This example is once again taken from the evaluation of the economics software package WinEcon together with Economics in Action and the two recommended coursebooks that these software programs are based on. The students were randomly split into three groups and each group was given a different learning resource for each test. They were tested at both the beginning and the end of each session.
| TEST 1 | +/- % | TEST 2 | +/- % | TEST 3 | +/- % |
The pre- and post-test results appear to indicate that the software was certainly no worse than more traditional forms of book-based learning. The first test appeared to indicate that the possible automatic note taking by the students had a beneficial impact, which was somewhat equalised as all the students adopted this approach in the subsequent two tests when using either piece of software.
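A change figure of the kind headed "+/- %" in the table can be derived from pre- and post-test scores as sketched below. The sketch assumes the figure is the relative change in the group mean; the original table may equally have reported a simple difference in percentage points. The function name `percentage_change` and the scores are invented for illustration.

```python
def percentage_change(pre, post):
    """Relative change (%) from the pre-test mean to the post-test mean."""
    pre_mean = sum(pre) / len(pre)
    post_mean = sum(post) / len(post)
    return (post_mean - pre_mean) / pre_mean * 100


# Invented scores for one group in one session.
pre_scores = [40, 50, 60]
post_scores = [55, 60, 65]

change = percentage_change(pre_scores, post_scores)
print(f"{change:+.1f}%")
```

A figure like this describes the group only; as the discussion of validity makes clear, it does not by itself show that the learning resource caused the change.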
A one-way analysis of variance was carried out on the data in order to identify whether any group was better at economics. The pre-test scores of each group were subjected to this analysis but there were no significant differences. The nearest to any effect occurred in Test 1, though this was not statistically significant. No matter which medium was used, the students performed equally well; however, with such a small sample, small changes in performance were undetectable.
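For readers unfamiliar with the technique, the following is a minimal sketch of how the F statistic in a one-way analysis of variance is calculated. It uses only the Python standard library, the function name `one_way_anova` is an assumption of this example, and the three groups of scores are invented; they are not the WinEcon data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented pre-test scores for three groups of three students.
groups = [[4, 5, 6], [5, 6, 7], [6, 7, 8]]
f_stat, df_between, df_within = one_way_anova(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The resulting F value is then compared with the critical value for the two degrees of freedom at the chosen significance level. With groups this small the test has little power, which is the same limitation noted above for the original study.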
It is useful to document all aspects of the evaluation procedure and check this off once completed. Tessmer and Harris (1992) make great use of checklists as templates for conducting evaluations. Having a checklist of the stages within an evaluation can help in the shaping and structuring of the overall evaluation design, besides their use in the evaluation itself.
The TLTP project TILT have developed a useful checklist of all learning resources available to students. At the end of a course or programme the students complete the checklist indicating what they found useful. It could include such resources as books, handouts, tutor's time, other students, CBL packages etc. This is useful in indicating those resources which a student uses and finds most helpful over the period of a course.
Another excellent source is the "Telematics Applications for Education and Training USABILITY GUIDE, Version 2" by Smith & Mayes (1995) as it contains references to other evaluation sources. It is freely available over the world wide web via URL "http://www.icbl.hw.ac.uk/tet/usability-guide/".
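Completed resource checklists of this kind can be summarised with a simple tally, sketched here in Python. The function name `tally_resources` and the three completed checklists are hypothetical; the resource names are those mentioned above.

```python
from collections import Counter


def tally_resources(checklists):
    """Count how many students marked each resource as useful."""
    counts = Counter()
    for ticked in checklists:
        counts.update(set(ticked))  # set() guards against double ticks
    return counts


# Invented checklists, one list of ticked resources per student.
checklists = [
    ["books", "handouts", "CBL package"],
    ["handouts", "tutor's time"],
    ["CBL package", "handouts", "other students"],
]

counts = tally_resources(checklists)
ranking = counts.most_common()
for resource, votes in ranking:
    print(f"{resource}: {votes}")
```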
This checklist can be used when initially planning an evaluation in order to begin to create an evaluation plan. It could then be written up as the basis for a contract, either formal or informal, for any evaluation work to be performed.
Who? (Target - know your audience)
What? (Area - understand what it is you are evaluating)
When? (Timing - don't start until you are ready)
How? (Techniques - what is most appropriate)
The effort that is put into the design of any piece of evaluation will pay rich dividends; however, defining the right question is always the key starting point. There are degrees of correctness of definition but this should always be something that is measurable and possible within the time frame in which you are working. Each evaluation will have its own combination of costs and benefits (see Breakwell and Millward, 1995), and estimating them is all part of the work of an evaluation.
Evaluation is also addressed in the following chapters: Chapter 11 - a conceptual introduction; Chapter 3 - the role of evaluation in the overall process of implementation, and Chapter 7 - a practical guide on how to evaluate LT materials that you may be considering using in your teaching.
HTML by Phil Barker
Last modified: 30 December 1999. (formatting)
First web version: 03 October 1997.
First Published: July 1996.