VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF LANGUAGES AND INTERNATIONAL STUDIES
FACULTY OF ENGLISH LANGUAGE TEACHER EDUCATION
GRADUATION PAPER
THE DEVELOPMENT AND VALIDATION
OF AN ENGLISH SPEAKING ACHIEVEMENT TEST
FOR 8TH GRADERS IN A SECONDARY SCHOOL
IN HANOI
Supervisor: Dương Thu Mai, PhD.
Student: Nguyễn Vân Anh
Course: QH2014.F1.E1
HÀ NỘI – 2018
ACCEPTANCE PAGE
I hereby state that I: Nguyễn Vân Anh, QH2014.F1.E1, being a candidate
for the degree of Bachelor of Arts (Fast-track program), accept the requirements
of the College relating to the retention and use of Bachelor’s Graduation Paper
deposited in the library.
In terms of these conditions, I agree that the original of my paper deposited in the library should be accessible for the purposes of study and research, in accordance with the normal conditions established by the librarian for the care, loan or reproduction of the paper.
Signature
Date
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to my supervisor, Ms. Duong Thu Mai, PhD, for her constant support of my study. Without her valuable advice and suggestions, together with her encouragement and passion, I could not have completed this graduation paper.
I deeply appreciate the testing experts at FELTE, ULIS-VNU, as well as the 8th grade teachers and students at Dong Da Secondary School, who were enthusiastic respondents in my research.
I would also like to send my thanks to my lecturers and my classmates for the insightful comments and encouraging words they have given me.
Last but not least, I owe my gratitude to my beloved family members, who have motivated me throughout the writing of this research.
ABSTRACT
Thanks to the trend of communicative language teaching and the introduction of the pilot English program, the role of speaking and speaking assessment is increasingly highlighted in secondary school English education. However, speaking assessment instruments for secondary students are limited, and few of them are of good quality, owing to the complexity of test development and validation. This empirical study is an attempt to design and evaluate the content validity of an English speaking achievement test for 8th graders based on a conceptualized framework of test construction. To select appropriate test content, observations of the English 8 textbook content and of teaching practice were conducted, and teachers’ opinions were collected by questionnaire. A test was constructed at the end of the study. It was revealed that responsive tasks were the most appropriate for assessing 8th graders’ speaking performance. Survey data subsequently gathered from testing experts showed that the test content is valid in terms of relevance to objectives, duration, instructions and amount of interaction. The rating scale and some tasks in the test, however, need further revision to be more relevant to the course contents.
Key terms: language tests, test development, test evaluation, content validity
TABLE OF CONTENTS
ACKNOWLEDGEMENTS ................................................................................. i
ABSTRACT ......................................................................................................... ii
TABLE OF CONTENTS ................................................................................... iii
LIST OF FIGURES, TABLES, AND ABBREVIATIONS ............................ vi
CHAPTER 1. INTRODUCTION ...................................................................... 1
1. Background of the study.................................................................................... 1
2. Aims and objectives of the study ...................................................................... 2
3. Scope of the study ............................................................................................. 4
4. Significance of the study ................................................................................... 4
5. Organization of the study .................................................................................. 5
CHAPTER 2. LITERATURE REVIEW .......................................................... 6
1. Key concepts of language assessment ............................................................... 6
1.1. Language testing ....................................................................................... 6
1.2. Developing classroom language tests ..................................................... 13
1.3. Validating a classroom language test...................................................... 17
1.4. Testing speaking/oral production............................................................ 20
2. Review of related studies ................................................................................ 27
2.1. Studies on developing and validating speaking tests worldwide............ 27
2.2. Studies on developing and validating speaking tests in Vietnam ........... 27
CHAPTER 3. METHODOLOGY ................................................................... 30
1. The instruction and assessment of speaking skill for 8th graders at Dong Da
Secondary School ............................................................................................. 30
1.1. The instruction of speaking skill ............................................................. 30
1.2. The assessment of speaking skill ............................................................ 31
2. Research questions .......................................................................................... 31
2.1. What are the components of an English speaking achievement test for
8th graders? ..................................................................................................... 31
2.2. To what extent is the test valid in terms of content validity? ................. 31
3. Sampling .......................................................................................................... 32
4. Data collection methods .................................................................................. 32
4.1. Observation ............................................................................................. 32
4.2. Survey ..................................................................................................... 33
4.3. Data collection procedure ....................................................................... 34
5. Data analysis methods ..................................................................................... 35
CHAPTER 4. FINDINGS AND DISCUSSION .............................................. 37
1. The development of the speaking achievement test for 8th graders at Dong
Da Secondary School ........................................................................................ 37
1.1. Test rationale ........................................................................................... 37
1.2. Content selection ..................................................................................... 38
1.3. Development of specifications (including scoring procedures) and
writing of materials ........................................................................................ 47
2. The validation of the speaking achievement test for 8th graders at Dong Da
Secondary School ............................................................................................. 49
CHAPTER 5. CONCLUSION ......................................................................... 55
1. Conclusion and implications ........................................................................... 55
2. Limitations of the study ................................................................................... 56
3. Suggestions for further research ...................................................................... 57
REFERENCES .................................................................................................. 58
APPENDICES.................................................................................................... 63
APPENDIX 1. TEST SPECIFICATIONS .......................................................... 63
APPENDIX 2. QUESTIONNAIRE FOR EXPERT INFORMANTS ................ 70
APPENDIX 3. QUESTIONNAIRE FOR TESTING EXPERTS ....................... 75
APPENDIX 4. OBSERVATION SCHEME FOR TEXTBOOK ....................... 80
APPENDIX 5. OBSERVATION SCHEME FOR ENGLISH LESSON ............ 81
APPENDIX 6. BOOK MAP OF ENGLISH 8 TEXTBOOK ............................. 82
LIST OF FIGURES, TABLES, AND ABBREVIATIONS
LIST OF FIGURES
Figure 1. Steps to designing an effective test
Figure 2. Different components involved in communication
Figure 3. Testing experts’ opinions on the difficulty of test tasks
LIST OF TABLES
Table 1. A framework for describing the speaking construct
Table 2. Speaking performance objectives of the new English 8 course
Table 3. Teachers’ opinions on the frequency of task types in English lessons
Table 4. Teachers’ opinions on the suitability of speaking task types
Table 5. Teachers’ opinions on the likelihood of using speaking task types in real-life communicative situations
Table 6. Teachers’ opinions on the level of interest of students in different speaking tasks
Table 7. Common Reference Levels: global scale
Table 8. Frequency of speaking tasks recommended by the textbooks to measure each performance objective
Table 9. Frequency of speaking tasks used by teachers to measure each objective
Table 10. Testing experts’ opinions on the relevance of test tasks to course content
Table 11. Testing experts’ opinions on the relevance of test tasks to task objectives
Table 12. Testing experts’ opinions on the instructions/prompts of test tasks
Table 13. Testing experts’ opinions on the types of interaction in test tasks
Table 14. Testing experts’ opinions on the duration of each task and the test as a whole
LIST OF ABBREVIATIONS
VSTEP – Vietnam Standardized Test of English Proficiency
IELTS – International English Language Testing System
TOEFL – Test of English as a Foreign Language
DDSS – Dong Da Secondary School
CEFR – Common European Framework of Reference for Languages
ULIS – University of Languages and International Studies
VNU – Vietnam National University
CHAPTER 1. INTRODUCTION
1. Background of the study
The role of assessment in teaching and learning is indispensable. Not only does assessment measure “the level or magnitude of some attribute of a person” (Mousavi, 2009); it also uses the collected information to render “decisions about students, curricula and programs, and educational policy” (Nitko, 2009). Thanks to such well-timed decisions, learning can be reinforced and students can be motivated (Heaton, 1988). To attain valuable data about learning and teaching, a variety of assessment instruments can be listed, such as tests, portfolios, diaries and conferences. Among these instruments, the test is an outstanding tool for educational assessment in general and language assessment in particular. First, it highlights the “strengths and weaknesses in the learned abilities of the students” (Henning, 1987). Moreover, according to Heaton (1988), testing enables educators to make necessary adjustments in their teaching; it also locates areas of difficulty in the language program and motivates students through fair evaluation. Thus, not only teachers but also learners may benefit from testing, which explains why testing takes place in schools in various forms and at different levels.
Designing a test takes time and is never an easy procedure. As Abeywickrama and Brown (2010) put it, “constructing a good test is a complex task involving both science and art.” This is no exaggeration, especially in Vietnam, where English is not spoken natively. Constructing a test to measure oral ability is even more challenging, since speaking is “more than just knowing the language” (Chastain, 1988). Unlike reading and listening tests, speaking tests “do not easily fit the conventional assumption about people and testing” (Underhill, 1987). This productive skill requires test-takers to actually perform the language, and is “far too complex a skill to permit any reliable analysis to be made for the purpose of objective testing” (Heaton, 1990).
After the construction process comes the validation process, in which the validity of the test is evaluated. This procedure can take place before and after the test administration, telling how “sound” and “to the point” the test is (Cumming & Berwick, 1996). As Cumming and Berwick also claim, validation in language testing is enormously significant, since it takes into account “educational and linguistic policies, instructional decisions, pedagogical practices, as well as tenets of language theory and research” (Cumming & Berwick, 1996).
Validity is undoubtedly the top concern when it comes to evaluating the value of a test (Bachman & Palmer, 1996). According to Heaton (1988), a test’s validity is “the extent to which it measures what it is supposed to measure and nothing else” (p. 159). Among the different dimensions of validity, content validity is one of the most important, as its assessment is the initial stage of establishing the validity of any instrument (Ozer, Fitzgerald, Sulbaran & Garvey, 2014). Evidence of content validity is often reflected in the relevance of the content and the coverage of important parts of the construct domain (Messick, 1995).
2. Aims and objectives of the study
In a context where communicative language teaching is increasingly promoted (Abeywickrama & Brown, 2010), together with the introduction of The Pilot English Program for Secondary School Students (Ministry of Education and Training, 2012), the role of speaking and speaking assessment is being highlighted. Yet, due to the complexity of test development, current classroom speaking achievement tests tend to be either adapted from standardized tests such as VSTEP, IELTS and TOEFL, or self-designed by teachers. The problem with standardized tests, however, lies in the fact that what is tested might not be what is taught (Vu, 2010). On the other hand, tests designed by teachers might very well reflect the learning situation, but they encounter problems regarding validation, because a close and thorough investigation into the qualities of such homemade tests tends to be neglected. This absence of test evaluation might lead to several consequences, as only when a test is of good quality can teachers and learners make the most use of it (Ozer, Fitzgerald, Sulbaran & Garvey, 2014) and can the relationship between learning and assessment be fortified. In addition, to the knowledge of the author, there is still little research on developing speaking tests compared to tests of other skills. Studies on speaking test construction and validation that aim specifically at students of lower levels – such as secondary or primary school students – are even fewer. Given the importance of speaking test development and validation, the researcher believes this domain deserves more attention.
All the aforementioned reasons have motivated the researcher to conduct a study in which an English speaking achievement test for 8th graders is designed and validated based on language assessment theories and experts’ opinions. Hopefully, the results of the study can highlight the importance of having a well-constructed speaking test to facilitate secondary school English learning and motivate other researchers to investigate test development and validation further.
In particular, this research primarily aims at designing and validating an English speaking achievement test for 8th graders at a secondary school in Hanoi. More specifically, the development of the basic components of the test, including the test specifications, test items and rating scale, will be featured in this study. The test will then be validated through data collected regarding its content validity.
To accomplish these goals, the paper seeks to address the following
questions:
2.1. What are the components of an English speaking achievement test
for 8th graders?
2.2. To what extent is the test valid in terms of content validity?
3. Scope of the study
The study focuses on testing only the speaking skill of secondary students who are using the new English 8 textbook. Speaking is integrated more deeply into the new English textbooks, which follow the communicative teaching approach (Ministry of Education and Training, 2017a) and contain an increased number of speaking-related drills and activities.
Besides, the research aims at rendering an achievement test rather than other kinds of test. The reason is that the syllabus of the English 8 course and its learning objectives are publicly available, which provides the necessary conditions for constructing an achievement test. Moreover, the test administration time is near the end of the second semester; hence, an achievement test is the most suitable for the level and amount of knowledge that the students have acquired by that point.
Last but not least, although a good test should possess several qualities, namely reliability, validity, washback, authenticity and practicality (Brown, 2004), this study can only cast its light on one of them: validity. This is firstly because the scale of the study and the availability of resources leave the researcher no choice but to be selective. Validity was chosen because it is dubbed an “essential measurement quality” by Bachman and Palmer (1996, p. 19). Moreover, among the dimensions of validity, it is asserted by Ozer, Fitzgerald, Sulbaran and Garvey (2014) that the assessment of a test should begin with content validity. This explains why content validity is the focus of the study when it comes to validation. The literature used in the study is restricted to English as a Second Language and Language Assessment materials.
4. Significance of the study
The study is expected to provide a reference source for test designers to construct assessment instruments, especially achievement tests, to measure the speaking performance of secondary school students. Specifically, the test specifications drafted in this study might help formulate similar speaking achievement tests for 8th graders. Additionally, the framework adapted and developed in this study might serve as a foundation for developing speaking tests of high quality in the future. With such high-quality tests, more valuable information about the practice of teaching and learning with the new English textbooks can be gathered.
In addition, the development procedure of the test might reveal the most problematic areas in test design that deserve more attention. Meanwhile, the validation process is likely to highlight several factors that affect the test’s content validity. Such information would be valuable for test makers seeking to develop more valid assessment instruments in the future.
5. Organization of the study
The study is divided into five chapters:
Chapter 1. Introduction
This chapter presents the statement of the problem, the rationale, scope, aims and objectives, as well as the organization of the study.
Chapter 2. Literature review
This chapter reviews the literature related to language testing, test design and test validation.
Chapter 3. Methodology
This chapter describes the methods of the study, the selection of
respondents, the materials and the methods of data collection and data analysis.
Chapter 4. Findings and discussion
This chapter presents and discusses the results of the data collection and
data analysis process.
Chapter 5. Conclusion
This chapter summarizes the study, names some limitations and offers
recommendations for further study.
CHAPTER 2. LITERATURE REVIEW
This chapter attempts to provide a theoretical background for the
research. Key concepts of language assessment – including language testing,
test development, validity and reliability issues – together with international
and domestic studies in the domain will be reviewed.
1. Key concepts of language assessment
1.1. Language testing
1.1.1. Definitions of tests
The developing field of language assessment has seen various definitions of the term test. Carroll (1968) gives the following definition:
“A psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual.” (Carroll, 1968, p. 46)
Building on this notion, Bachman (1990) asserts that a test is one type of measurement tailored to elicit “a specific sample of an individual’s behavior”. Abeywickrama and Brown (2010) construe the term in a simpler way, defining a test as “a method of measuring a person’s ability, knowledge, or performance in a given domain” (p. 3).
Much as these interpretations differ, they agree on several crucial points. First, a test is a method, an instrument, or a procedure that requires performance from the test-takers. According to Abeywickrama and Brown (2010), in order to qualify as a test, the method needs to be explicit and structured: multiple-choice questions with an answer key, a writing prompt with a scoring rubric, or an oral interview accompanied by a question script or a checklist of anticipated responses.
Second, a test must “measure”, which might be understood as “a process of quantifying a test taker’s performance according to explicit procedures or rules” (Bachman, 1990, pp. 18-19). The measured target might be anything from general ability to specific competencies or objectives (Abeywickrama & Brown, 2010). The communication of results, hence, also ranges from letter grades and comments to numerical scores (in standardized tests, for instance), as noted by Abeywickrama and Brown (2010).
The third point on which the aforementioned definitions converge is that a test must measure the ability, knowledge or performance of “an individual” (Carroll, 1968; Bachman, 1990; Abeywickrama & Brown, 2010). Therefore, as perceived by Abeywickrama and Brown (2010), it is important to have a deep insight into the test-takers’ previous language experience and background. These data help the tester decide whether the test suits the test-takers’ ability, and they also support appropriate score interpretation (Abeywickrama & Brown, 2010).
Among these scholars, Abeywickrama and Brown (2010) explain most clearly what a test can convey by pointing out three aspects that can be concluded from test results: the “ability”, “knowledge” and “performance” of test-takers (Abeywickrama & Brown, 2010, p. 3). Although a test measures performance, it is the test-taker’s ability (or competence) that is reflected in the results. Sometimes knowledge about the language is tested as well (Abeywickrama & Brown, 2010).
1.1.2. Testing and assessment
Despite some overlap in the meanings of “testing” and “assessment”, these two terms are not equivalent. Whereas testing is understood as above, assessment – as claimed by Mousavi (2009) – is “appraising or estimating the level or magnitude of some attribute of a person” (p. 36). Educationally speaking, Abeywickrama and Brown (2010) refer to assessment as “an ongoing process that encompasses a wide range of methodological techniques” (p. 3). When a student replies to a question or tries out a new vocabulary item and the teacher observes, assessment occurs. When a student writes an essay and submits it to the teacher for a score, assessment also takes place. Hence, assessment can be formal or informal, conscious or subconscious, incidental or intended (Abeywickrama & Brown, 2010, p. 3). This leads to the conclusion that assessment has a broader meaning than testing, or that tests are only “a subset of assessment, a genre of assessment techniques” (Abeywickrama & Brown, 2010, p. 3).
1.1.3. Classification of tests
Hughes (2003) categorizes tests according to the information they offer. In his work, tests are divided into four types: proficiency tests, diagnostic tests, placement tests and achievement tests. Abeywickrama and Brown (2010) add the aptitude test to the list, making it a fifth type. However, the recent unpopularity of aptitude tests, due to certain limitations mentioned in Stansfield and Reed’s study (2004), explains their absence from the categorization below.
1.1.3.1. Proficiency test
Proficiency tests, as explained by Hughes (2003, p. 9), “are designed to measure people’s ability in a language regardless of any training they may have had in that language”. Rather than relying on course content or objectives, the content of such tests is based on what candidates have to be able to do in order to be considered proficient.
1.1.3.2. Diagnostic test
This kind of test is used to diagnose students’ strengths and weaknesses. It is intended primarily to identify the instruction that will be needed in the future.
1.1.3.3. Placement test
As its name suggests, this kind of test is employed to place students at the stage which suits their current levels or abilities. Since no single placement test will work for every circumstance (Hughes, 2003), such tests are best tailor-made.
1.1.3.4. Achievement test
Unlike proficiency tests, achievement tests relate directly to language courses (Hughes, 2003), their purpose being to determine how successful learners have been in achieving course objectives. According to Gronlund (1982), achievement testing plays a central part in all types of educational programs. It is the most widely used method of assessing learner achievement (Gronlund, 1982; Abeywickrama & Brown, 2010). Gronlund (1982) defines achievement test as follows:
“An achievement test is a systematic procedure for determining the amount a student has learned. Although the emphasis is on measuring learning outcomes, it should not be implied that testing is to be done only at the end of instruction. All too frequently, achievement testing is viewed as an end-of-unit or end-of-course activity that is used primarily for assigning course grades.” (Gronlund, 1982, p. 1)
Abeywickrama and Brown (2010) concur with this definition by saying:
“the primary role of an achievement test is to determine whether course
objectives have been met” (p. 9).
It can be learnt from the above definitions that an achievement test may take place at the end of either a unit or a whole course. Thus, regarding classification, Hughes (2003) introduces two kinds of achievement test, namely the final achievement test and the progress achievement test.
Final achievement test
As their name suggests, final achievement tests are “those administered at the end of a course of study” (Hughes, 2003). The test writers, hence, might be education ministries, official examining authorities or members of teaching institutions.
There used to be a debate, as summarized by Hughes (2003), over whether the content of a final achievement test should be based on the syllabus content or on the course objectives. Achievement tests following the syllabus-content approach seem fairer, since learners are examined on what they are thought to have learnt. However, if the syllabus, books or materials are badly designed or selected, results can be misleading, since successful performance on the test may not indicate successful achievement of course objectives (Hughes, 2003). Then comes the alternative approach, in which the test is based directly on the objectives of the course. The key part of this approach lies in the objectives: they need to be explicitly stated. Hughes (2003) himself expresses his preference for this approach, since he believes it will give “more accurate information about individual and group achievement, and it is likely to promote a more beneficial washback effect on teaching” (p. 11).
Progress achievement test
This kind of test is intended to “measure the progress that students are making” (Hughes, 2003). Since the word progress here refers to progress towards achieving the course objectives, Hughes (2003) holds that progress achievement tests should relate to those objectives as well.
This notion gives rise to two approaches to conducting progress achievement tests. The first involves repeated administration of the final achievement test. With each attempt, the scores are expected to increase, exhibiting the progress made. The obvious flaw in this approach is that students might earn very low scores in the early stages of the course, which can be discouraging. The other way to develop progress achievement tests is to base them on a well-established set of short-term objectives (Hughes, 2003). The objectives for each progress achievement test must relate closely to each other and to the course objectives. In other words, they must show “a clear progression” towards the final test (Hughes, 2003, p. 12).
This study opted to design a final achievement test, since it is the most efficient instrument for deciding how successful students have been in achieving course objectives. Moreover, the research was conducted with 8th graders, who rarely have to take proficiency, diagnostic or placement tests. Furthermore, the objectives of the course are publicly available, which made it easier to construct a final achievement test.
1.1.4. Two major issues in modern language testing
Before moving on to the development and validation of classroom language tests, it is necessary to have an overview of some current issues in language assessment. According to Abeywickrama and Brown (2010), these issues involve behavioral influences, different approaches to language testing and performance-based assessment. The following paragraphs look at some of these issues, especially those related directly to the research topic, through reviewing the work of different researchers in the domain.
1.1.4.1. Communicative language testing
This “hot topic” (Abeywickrama & Brown, 2010) is, in fact, not new to the field of language assessment. Together with the essay-translation, structuralist and integrative approaches, communicative language testing was identified long ago in Heaton’s work (1988) as one of the main approaches to testing.
As stated by Heaton (1988), communicative tests focus mostly on “how language is used in communication” (p. 19). Consequently, tests following the communicative approach tend to adopt tasks that closely approximate real-life ones (Heaton, 1988). The underpinning belief, according to Heaton (1988), is that success is judged by the effectiveness of communication rather than by formal linguistic accuracy. Bachman and Palmer (1996) share this viewpoint, stating: “In order for a particular language test to be useful for its intended purposes, test performance must correspond in demonstrable ways to language use in non-test situations” (p. 9).
The communicative language testing approach is often linked to the integrative one (Heaton, 1988; McNamara, 1996; Fulcher, 2003; Abeywickrama & Brown, 2010). This association stems from the fact that both approaches emphasize an utterance’s meaning rather than its form and structure. However, one fundamental difference between the two lies in their focus. The integrative approach promotes the idea of testing several abilities simultaneously instead of separately (Oller, 1979, as cited in Fulcher, 2003), whereas the communicative testing approach highlights the role of communication that resembles real-life interaction.
Heaton (1988), therefore, mentions two significant features of this approach, namely the emphasis on context and the use of authentic materials. This view was later reinforced by many scholars in the field (Skehan, 1988; Fulcher, 2003; Leung & Lewkowicz, 2006). The need for authentic tasks and genuine texts poses a challenge to test designers: not only must the kinds of real-world tasks be identified, but the sampling of these tasks also requires thorough validation (Abeywickrama & Brown, 2010). Weir (1990) emphasizes the complexity of these procedures: “to measure language proficiency... account must now be taken of: where, when, how, with whom and why language is to be used, and on what topics, and with what effects” (p. 11).
Nevertheless, when these challenges are overcome, the communicative testing approach brings certain advantages. Firstly, it introduces qualitative modes of assessment, which are preferred to quantitative ones. Qualitative assessment – represented by detailed statements of performance levels – helps increase the reliability of scoring (Heaton, 1988). Secondly, a more “humanistic attitude” (Heaton, 1988, p. 21) is conveyed through this testing approach: rather than being compared with other performances, a student’s performance is assessed based on his or her ability to perform the language tasks. Last but not least, more informative feedback can be provided to both teachers and testees. Via the written descriptions presented in the rubric, instructors and students learn more about the problematic areas in a performance (Heaton, 1988). It is this feedback that they need, rather than merely numerical test scores.
The new English 8 textbook claims its communicative teaching approach in its very first pages (Ministry of Education and Training, 2012). Some insight into this approach is therefore of utmost importance. As the research revolves around designing a communicative test, this part helps highlight several issues worth considering, such as the need for authentic tasks in the test.
1.1.4.2. Performance-based assessment
In language courses worldwide, attention is now drawn towards this student-centered approach to assessment (Bachman, 2002; Leung & Lewkowicz, 2006). According to Price, Pierson and Light (2011), performance-based assessment is among the top six most effective classroom assessment strategies of the 21st century.
There are various definitions of performance-based assessment, including those by Fitzpatrick and Morrison (1971), Chalhoub-Deville (1995), Haertel (1992) and Abeywickrama and Brown (2010). These interpretations share common ground regarding the complexity of the performance task as well as its closeness to reality. For Abeywickrama and Brown (2010), performance-based assessment acts as an alternative to paper-and-pencil tests that engages “oral production, written production, open-ended responses, integrated performance (across skill areas), group performance, and other interactive tasks” (p. 16).
One of the most outstanding characteristics of performance-based assessment, as its name suggests, is that communicative performance has to be elicited. This feature is mentioned by McNamara (1996), who claims that learners are required to perform relevant tasks. Abeywickrama and Brown (2010) agree, terming this kind of task an “interactive” one (p. 16). Because of the presence of tasks in this approach, performance-based assessment is also known as task-based assessment.
As the study aims at developing a test that measures 8th graders’ oral production, it should follow the performance-based assessment strategy. This brief review of performance-based assessment has highlighted the need to elicit a speaking performance in tests following this strategy.
1.2. Developing classroom language tests
Due to its importance for teaching and learning, test development is never considered an easy process. It takes test designers much time and effort to plan a good test. As Abeywickrama and Brown (2010) put it, constructing a good test “is a complex task involving both science and art” (p. 4).
Different models of test construction have been introduced by various scholars in the field. A common structure is the one proposed by Abeywickrama and Brown (2010), demonstrated in the following diagram:
Figure 1. Steps to designing an effective test
(Abeywickrama & Brown, 2010, p. 81)
According to the flowchart, the first step in constructing an effective test is to determine its purpose and intended usefulness. After this purpose is clarified, concrete objectives of the test are formulated. The test writers then draw up test specifications based on these objectives. Subsequently, the test tasks and items are either designed or selected, before being put in a systematic order. The scoring and grading system, in this model, is applied after the test administration procedure.
Another model of test planning is the one developed by McNamara (1996), specially designed for measuring language performance. This ten-step model is more specific and manages to take into account the characteristics of a performance-based test. The stages involve rendering the test rationale, acknowledging resources/constraints, selecting content, developing specifications, piloting, selecting and training raters, analyzing trial data, revising materials and specifications, setting standards and, lastly, implementing and monitoring the test.
Step 1. Test rationale
Articulating the test rationale is agreed to be the first and most important step in constructing any classroom test (Bachman & Palmer, 1996; McNamara, 1996; Abeywickrama & Brown, 2010), since it “constrains the test specification in basic ways” (McNamara, 1996, p. 92).
The test rationale, or test purpose, can be interpreted as the reasons for the introduction of the test (McNamara, 1996). According to McNamara (1996), in order to visualize the rationale, test designers have to seek answers to several questions, including who wants to know the results, what they want to know, and about whom the information is required.
Step 2. Resources/constraints
Performance tests, according to McNamara (1996), can be costly. They involve thorough rater training and repeated trial administrations, both of which require considerable expense. The test and its development, as a result, need to be practical in terms of time, financial resources, personnel resources and administrative load.
Step 3. Content selection
McNamara (1996) believes that real-life communicative tasks should be carefully studied before being sampled in a performance-based test. Ideally, research into the target language use situation should cover four components: consultation with expert informants (the people responsible for instruction), a literature search, observation, and examination of texts.
Step 4. Development of specifications (including scoring procedures)
and writing of materials
Test specifications are often regarded as the “blueprint” of a test (Alderson, Clapham & Wall, 1995, p. 9; Bachman, 1996, p. 176; Abeywickrama & Brown, 2010, p. 59). Test specs are used as an outline by test and item writers to produce different versions of the test. These data often involve the characteristics of texts, the format of items or tasks, the exact