Language testing: Different facets and parameters [Archives:2005/842/Education]
By Dr. Ayid Sharyan
Faculty of Education,
Many students everywhere complain about exams and their results. Both teachers and students pass comments on exam papers: not valid, unreliable, not objective, covering only one part of the curriculum, etc. For this reason, Sana'a University formed a team this semester to check the university exams of the first term of 2004-2005. The University of Science and Technology held a series of workshops with the help of Professor Mahmoud Aukasjha, a visiting professor from Cairo who is an expert in measurement and testing. Having participated in this activity, and being a member of the team of experts entrusted with the task of evaluating the exams of Sana'a University, I thought it worthwhile to discuss this topic and share ideas with a wider readership, to initiate a healthy dialogue and create awareness about the issues involved in test construction as exams approach.
The word test makes learners nervous; teachers do not feel happy either. But can we measure the progress of learners without a test? Not only is learners' achievement checked through the mechanism of testing, but life at large is also full of situations where we have to choose among alternatives and make a decision: choosing a life partner, a major at the university, a job, a place to stay, a friend, a political party, the way to dress, speak, eat, etc. Job interviews, as a form of test, serve to select new employees. To admit new entrants to the Department of English, one cannot do without tests. What does a test actually seek to do?
A test measures the ability, knowledge, or performance of a candidate. Test methods in the EFL situation vary from alternative-response items (yes/no) and fixed-response or closed-ended items (such as choosing a, b, c, or d) to free-response or open-ended items. These tests examine English language skills such as listening, speaking, reading, and writing, and sub-skills like pronunciation, intonation, stress, accuracy, fluency, literary appreciation, grammar, vocabulary and so on.
Since exams go hand in hand with any learning process, it is normal that students who take a course have to sit for a test. Preliminary tests help to place candidates at a certain level, or to diagnose their shortcomings so that these can be overcome or addressed through remedial teaching. From time to time teachers need to evaluate what has been covered to see the progress of learning. The performance of candidates is measured periodically by formative tests, which are indispensable for successful learning. At the end of the course, teachers need to take a decision about the level of attainment of learners. Summative evaluation is crucial here to sum up the achievement and measure the gains. Evaluating students also assists in evaluating the whole program: input, processing and output. Evaluation of this sort is possible if it makes use of varied types of tests such as progress, achievement, proficiency, placement, aptitude and diagnostic tests.
But does this mean a pen-and-paper test is the only means of evaluating EFL learners? What about interviews, observation cards (such as questionnaires on a Likert scale or the extended technique of Thurstone), portfolios, progress reports, research projects or reports by students, goals tables, checklists, etc.? Unfortunately, teachers are often oblivious of measurements like these. As a result, traditional testers rely heavily on pen-and-paper tests, which dominate the educational arena.
But why all this fuss about testing? A test item is the first building block in the whole of national education. In evaluating academic curricula, the grades in a teacher's book mean a lot for national progress, sometimes more than a standardised test such as the GRE or TOEFL. Some voices now demand some kind of standardized comprehensive test to check that the output attains the minimum requirement by international standards. But this is not the need of the hour. What is needed now is to improve teachers' exams to obtain precise measurement and so ensure the quality of education. Assessment of education starts from such exams. Program evaluation or curriculum development fails unless it takes testing as its base. If such importance is assigned to testing, one wonders what to test: knowledge, cognitive skills, practical skills, transferable skills or all of them? Since testing is the means of taking a decision, test constructors differ in their opinions about what to test: linguistic competence or performance. A test designer thinks of a range of levels of knowledge (e.g. memory, comprehension, application, analysis, synthesis, or evaluation) when constructing a test. Other equally significant factors are comprehensiveness, variety, test format, test organization, validity, reliability, objectivity and proper layout. Characteristics such as authenticity, interactiveness, impact, and practicality are also necessary test requirements.
Since the learner is going to be encapsulated in one number (i.e. a mark), test designers are compelled to be fair and objective in issuing judgments that have the potential to shape the future of a test-taker. An exam should not only elicit knowledge; it should also add something new to the learning process.
How can we check that exams are doing what they are supposed to do? Mark registers or mark sheets reveal discrepancies. The entry point, then, is the control sheet with its frequency tables, percentages of failures and correlations among all courses. A telling evaluation is not possible without taking a sample of the learners' scripts to check the answer sheets and compare them with the grades awarded. Examples of the learners' assignments, research projects and portfolios disclose the exact level of the exam takers and the processing of input in a program. This reveals what happens in terms of processing and shows whether or not the test matches the minimum requirements. Evaluating exams means assessing the educational system and reporting its pros and cons. The curriculum (intended, implemented or attained) is seen in the light of exam evaluation. The type of achievement tells clearly whether the intended curriculum has been achieved or the attained curriculum is something totally different. To judge the reliability of exams accurately, one needs to bear in mind some criteria.
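As an illustration of the control-sheet figures mentioned above (frequency tables, percentage of failures, correlation between courses), here is a minimal Python sketch. The course names, the marks and the pass mark of 50 are invented for the example and are not taken from any real university record; the band width of 10 marks is likewise an assumption.

```python
from collections import Counter
from math import sqrt

# Hypothetical marks (out of 100) for the same five students in two courses.
marks = {
    "Grammar": [35, 48, 62, 71, 90],
    "Writing": [40, 45, 58, 75, 85],
}
PASS_MARK = 50  # assumed pass mark, for illustration only


def frequency_table(scores, width=10):
    """Count how many scores fall in each band (30 means 30-39, and so on)."""
    bands = Counter((s // width) * width for s in scores)
    return dict(sorted(bands.items()))


def failure_percentage(scores, pass_mark=PASS_MARK):
    """Percentage of scores that fall below the pass mark."""
    return 100 * sum(s < pass_mark for s in scores) / len(scores)


def pearson(x, y):
    """Pearson correlation between two equally long lists of marks."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


for course, scores in marks.items():
    print(course, frequency_table(scores), failure_percentage(scores))
print("correlation:", round(pearson(marks["Grammar"], marks["Writing"]), 2))
```

A very low or negative correlation between two related courses, or a failure rate far out of line with the others, would be the kind of discrepancy that justifies pulling a sample of scripts for closer inspection.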
An important criterion of exams is variety of questions, to measure the level of students and their gains and so to measure up to the national level of the expected outcomes. Variety in exams provides valuable feedback on the match or mismatch with the intended curriculum that was chalked out by the educational planners and policy makers. Accuracy of tests is of paramount importance to screen the implemented curriculum and find out the exact attained curriculum. Comprehensiveness of an exam is a feature that shows what the teacher has covered in his teaching. Since a test is a sort of document that reflects the level of the teacher and the level of students, it must have some face validity to show that it measures certain prescribed levels of knowledge. Test constructors, in terms of content, need to strike a balance between performance, skill, knowledge, and mastery of rules.
Major types of exams
An essay-type exam is easy to prepare but difficult to mark. Such exams help to gauge students' higher levels of thinking: analyzing, organizing and discussing ideas. But their problems are many. They take a lot of time to correct, and marking lacks objectivity: marks may be influenced to a great extent by the subjective impression of the examiner. It is also difficult to cover all the goals of the course. This type of exam can be improved by carefully delimiting the aim of the question. The phrasing of the question can pinpoint exactly what is required; examples are:
1. Compare Blake's London with Wordsworth's London in terms of time and place in the two poems;
2. Give the reasons that led Pip in Great Expectations to believe that Miss Havisham was his benefactor;
3. By looking at the invocation in Paradise Lost, differentiate between the fall of Man in Christianity and Islam by referring to the story of Satan, Adam and Eve.
Essay-type questions can also be improved by deciding the level of knowledge that the question measures. Questions that require long answers need to be avoided. Thinking in advance about the time allowed, the model answer and the rules of marking minimizes flaws in these exams. Many test developers also stress avoiding optional questions, so that all testees are exposed to the same experience and evaluation is fair.
The second type that is commonly used is the objective exam with all its varieties: multiple choice, true/false, filling in the blanks, matching, rearrangement, etc. This type is known for being easy to correct. It is objective in terms of marking. Its validity and reliability tend to be higher than those of essay-type questions. It is more comprehensive and it allows for different levels of knowledge at the same time. Its minus points are that it takes a lot of time to prepare, some testees may guess and get marks, and some find it easy to cheat. In addition to its occasional cost, it does not allow learners to show their ability to write, organize ideas and express their opinions.
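The guessing problem mentioned above can be made concrete with a little arithmetic. The sketch below is not part of the workshops or the evaluation described in this article; it assumes the conventional "correction for guessing" formula, in which the corrected score is the number of right answers minus the number of wrong answers divided by (options minus one), and the function names and figures are purely illustrative.

```python
def expected_guess_score(n_items, n_options):
    """Average number of items answered correctly by blind guessing alone."""
    return n_items / n_options


def corrected_score(right, wrong, n_options):
    """Conventional correction for guessing: right - wrong / (options - 1).

    Omitted items are neither rewarded nor penalised.
    """
    return right - wrong / (n_options - 1)


# Illustrative figures: a 40-item test with four options per item.
print(expected_guess_score(40, 4))  # marks a pure guesser gets on average
print(corrected_score(28, 12, 4))   # 28 right, 12 wrong, nothing omitted
```

On a four-option test a testee who guesses every item blindly still collects about a quarter of the marks, which is why some examiners apply a correction of this kind or simply add more options per item.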
To sum up, there is no ideal way in language testing, but a combination of essay-type and objective tests is more effective and more practical. Such a method allows for variety, comprehensiveness and the ability to measure all levels of knowledge, in addition to abilities such as organizing ideas and expressing one's views, which become clear in writing. Internal assessment may include other measurements such as observation cards, interviews and portfolios, as well as research projects. Taken together, in-term and end-term exams give teachers, course developers, curriculum designers and program evaluators a fair idea about the testee. This is what makes language testing an important area of research nowadays.