Mustafa's Evaluation and Analysis

**Practicality**

 * 1) Did the test take as long as expected to design?
 * 2) Was the test easy to administer? (arrangement of seating, distribution of the test among the learners, supervision, necessary equipment, timing, etc.)
 * 3) Were the instructions clear and unambiguous, and examples for each task type provided, so that the test-takers knew exactly what to do?
 * 4) Was it easy to mark and score?

**Reliability**

 * 1) Does the test provide consistent and credible measurement of language ability?
 * 2) Was the variety of tasks and items enough to provide a representative sample of language ability?
 * 3) Did the answer key provide an objective scoring of test items?
 * 4) Did the format of the test reflect the format of the activities in the classroom, in order to ensure that the learners were familiar with the tasks and rubrics?
 * 5) Were the scores affected by intra-marker variables? For example: the marker's personal knowledge of a student, the sequence of marking (a good test following a poor one may be graded higher), and the order of marking (fatigue means the test marked last may be graded differently from the first).

NOTE: Practicality and reliability are more significant in norm-referenced tests (e.g. placement and proficiency tests); in criterion-referenced testing, validity is more important. The multiple-choice format suits both norm- and criterion-referenced testing, but it is restricted to assessing receptive skills only. Tests of speaking and writing have high validity, but pose reliability and practicality problems.

**Validity**

 * 1) What was the face validity, i.e. did the test appear to test what it was supposed to test?
 * 2) What was the content validity, i.e. did the test really measure what it was supposed to measure, and nothing else? (E.g. summarizing a text heard from a tape checks not only writing but also listening comprehension and the ability to select, extract, and condense the most essential information; general-knowledge, intelligence-testing, and culturally loaded questions test not linguistic competence but the extralinguistic knowledge or analytical skills of the testee.)
 * 3) What was the construct validity, i.e. did the test reflect the relative importance of the elements specified in the test construct?

**Authenticity**

 * 1) Was the test as realistic as possible and closely related to the situations in which the examinees will perform in real life?

**Washback**

 * 1) Did the test have any positive or negative washback effect on the teaching programme?
 * 2) Did the test reinforce the relative importance of skills and language focus in the classroom?