An Evaluation of the Reliability and Validity of CET-SET 6

    Han Yiqiong

    【Abstract】The College English Test-Spoken English Test (CET-SET) is a nationwide spoken English test with two bands – CET-SET 4 and CET-SET 6. It is designed to test the oral communicative ability of university and college students in China. Because studies of CET-SET 6 remain limited, this article evaluates it in detail. After a general description of CET-SET, the article evaluates the test in terms of its reliability and validity and offers suggestions for improving CET-SET 6.

    【Key words】CET-SET 6; reliability; validity; suggestions

    【About the Author】Han Yiqiong, Guangzhou University Sontan College.

    1. Introduction

    The College English Test-Spoken English Test (CET-SET) is part of the College English Test (CET), a criterion-related norm-referenced test in China. As one of the large-scale, high-stakes standardized language tests in China, CET has been researched increasingly often over the past ten years (Cheng, 2008). Researchers have studied CET from different angles: Ren (2011) used questionnaires and interviews to investigate the effects of CET on English education in universities in Tianjin; He and Dai (2006) studied the CET-SET group discussion on the basis of a corpus of test performances and found that candidates interacted only to a low degree in the group discussion; Li (2009) studied CET writing and found that teachers did not teach to the test. However, few researchers have evaluated the reliability and validity of CET-SET 6 in detail, which is what this paper intends to do.

    2. The Description of CET-SET 6

    Authorized by the Ministry of Education, the National College English Testing Committee (NCETC) has administered CET since 1987. CET aims to assess the fulfillment of the College English Teaching Syllabus in colleges, measure the English communication ability of non-English majors at the tertiary level, and provide feedback to teachers and students (Wang, Yan, and Liu, 2014). It is considered a high-stakes test because the CET-4 certificate is a prerequisite for graduation or a bachelor's degree in many universities in China (He and Dai, 2006; Ren, 2011; Wang, Yan, and Liu, 2014). CET comprises two levels: CET-4 checks the basic requirements of the College English Curriculum Requirements (CECR), while CET-6 checks the intermediate requirements (Du, 2012). Before 2016, the CET report showed a total score with three sub-scores for writing and translation, listening, and reading; speaking was absent from the report because speaking ability was not assessed. In 1999, the NCETC launched CET-SET, which tests the English communication ability of students in higher education in China. Since 2016, candidates have been able to take CET-SET 6 without any restriction. As the face-to-face interview has been replaced by the Internet-based format, this article focuses only on the Internet-based CET-SET 6.

    The Internet-based CET-SET 6 is conducted in a small group consisting of one virtual examiner, who in fact acts as an interlocutor, and two candidates. The oral test has three parts. In the first part, the candidates take turns giving a 20-second self-introduction, and then both answer, at the same time, one question asked by the interlocutor within 30 seconds; the question is related to the topic of the test. The second part is an individual presentation followed by a group discussion. Each candidate takes a turn to give a 90-second presentation based on a visual prompt (sentences or pictures) shown on the screen, after 60 seconds of preparation. The information given to the candidates in the same group concerns the same topic. After the presentations, the candidates are instructed to take part in a group discussion on the given topic and to try to reach an agreement in the end. In the third part, the two candidates simultaneously answer a question from the interlocutor about the given topic within 45 seconds. The total length of the test is around 18 minutes.

    3. The Evaluation of Reliability in CET-SET 6

    The more similar the scores a candidate would obtain on repeated administrations of a test, the more reliable the test is said to be (Hughes, 2003). In other words, reliability means consistency in scores regardless of when and how many times a particular test is taken. The reliability of CET-SET 6 is achieved through standardized administration procedures, the format of the interlocutor's engagement, and standardized rating procedures (Zhang and Elder, 2009). The testing procedures are regulated and organized for both the interlocutor and the candidates, so the time and location at which candidates take the test make no difference to the score. Interviewer variability has also been shown to influence test performance and scores (Van Moere, 2006). In CET-SET 6, however, the interviewer's engagement does not affect the result, since the interviewer simply reads out the instructions and questions.

    In addition, standardized scoring procedures can minimize variation in the scoring process and in the application of the scoring criteria (Zhang and Elder, 2009). Candidates' performance in CET-SET 6 is scored by authorized, trained raters using a formal rating scale designed according to the requirements of the CECR. The scoring criteria are also indispensable to score reliability, which is an essential component of test reliability (Bachman and Palmer, 1996). In CET-SET 6, the criteria cover three aspects: 1) accuracy and range; 2) length and coherence; 3) flexibility and appropriateness. Candidates' performance is scored on a scale from 1 to 5 against these criteria. With the scoring criteria and standard sample performances for reference, raters can make sound judgements on candidates' performances, and such analytic scoring helps an oral test achieve higher score reliability (Li, 2011). Nevertheless, scoring an interview is inevitably subjective, which causes inconsistency in ratings (Bachman, 1990), and these potential sources of inconsistency cannot be eliminated entirely (Bachman and Palmer, 1996).
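    To make the notion of score reliability more concrete, the sketch below illustrates one common way of checking inter-rater consistency: correlating the analytic totals that two raters assign to the same candidates. The scores, the rater labels and the choice of a simple Pearson correlation are illustrative assumptions made for this paper, not part of the official CET-SET 6 scoring procedure.

    # A minimal illustrative sketch with hypothetical data (not the official
    # CET-SET 6 scoring system): inter-rater consistency estimated as the
    # Pearson correlation between two raters' analytic totals. Each candidate
    # is scored 1-5 on the three criteria described above (accuracy and range;
    # length and coherence; flexibility and appropriateness).
    from statistics import correlation  # requires Python 3.10+

    rater_a = [(4, 4, 3), (3, 3, 3), (5, 4, 4), (2, 3, 2), (4, 5, 4), (3, 2, 3)]
    rater_b = [(4, 3, 3), (3, 3, 2), (5, 5, 4), (2, 2, 2), (4, 4, 4), (3, 3, 3)]

    totals_a = [sum(scores) for scores in rater_a]  # each rater's total per candidate
    totals_b = [sum(scores) for scores in rater_b]

    # A coefficient close to 1 suggests the two raters rank candidates consistently,
    # i.e. high inter-rater reliability; a low value would signal rater disagreement.
    print(f"Inter-rater correlation: {correlation(totals_a, totals_b):.2f}")

    In practice the NCETC relies on trained raters and benchmark samples rather than on such a toy calculation, but the underlying idea is the same: the reliability claim rests on the consistency of scores across raters and occasions.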

    4. The Evaluation of Validity in CET-SET 6

    Validity concerns whether a test measures what it is intended to measure (Hughes, 2003). It is the most important consideration in test development, interpretation and use (Bachman, 1990). Among the different types of test validity, the most common are content validity, criterion-related validity, face validity and construct validity (Li, 2011).

    Content validity means that the content of the test constitutes a representative sample of the language skills, structures, etc. that it is meant to cover (Hughes, 2003). A speaking test has content validity only if it samples the relevant structures, such as dialogue and discussion, which are easy to identify in the description of CET-SET 6 above. Moreover, the three parts of CET-SET 6 test a variety of content, because candidates complete the tasks through description, negotiation, persuasion, debate, argumentation and so on, all of which are required in the CECR (Yang, 2003). Overall, CET-SET 6 has high content validity.

    Criterion-related validity is divided into concurrent validity and predictive validity. Concurrent validity is established by comparing candidates' performance on the test with their performance on an independent, established measure of the same ability taken at about the same time: the more similar the two sets of scores, the higher the concurrent validity. Predictive validity measures the degree to which a test can predict candidates' future performance. There is little research on the criterion-related validity of CET-SET 6.
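    Concurrent validity is typically reported as a correlation coefficient between scores on the test under study and scores on the external criterion measure. The sketch below uses invented scores for both measures purely for illustration; no such published coefficient exists for CET-SET 6.

    # A minimal illustrative sketch with invented data: concurrent validity viewed
    # as the correlation between CET-SET 6 scores and scores on a hypothetical
    # external oral measure taken at roughly the same time.
    from statistics import correlation  # requires Python 3.10+

    cet_set_6 = [3.5, 4.0, 2.5, 4.5, 3.0, 3.5, 4.0, 2.0]      # hypothetical CET-SET 6 scores (1-5 scale)
    external_oral = [6.0, 6.5, 5.0, 7.5, 5.5, 6.0, 7.0, 4.5]  # hypothetical scores on an established oral test

    # A coefficient near 1 would be evidence of high concurrent validity; a value
    # near 0 would suggest the two tests measure different abilities.
    print(f"Concurrent validity coefficient: {correlation(cet_set_6, external_oral):.2f}")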

    Face validity means that the test looks as if it measures what it is supposed to measure (Hughes, 2003). A speaking test can gain face validity by using direct methods such as dialogue, group discussion and role play, which replicate contexts in which the target language is actually used (Li, 2011). CET-SET 6 tries to provide a real-life interactive context in order to achieve face validity. However, it is difficult to judge whether its face validity is high: since the test is conducted in a small group, each candidate may have a different attitude towards the test format.

    Construct validity refers to the extent to which performance on the test is consistent with the ability the test is intended to measure (Bachman, 1990): if a test measures the ability it is intended to measure, it is said to have construct validity. Because candidates' performance does not fit well with the types of test task and the intended results, the construct validity of CET-SET 6 is considered relatively low (Jing and Ma, 2012).

    5. Suggestions for Improving CET-SET 6

    Based on the above discussion, several suggestions can be made to improve CET-SET 6. Firstly, Ma (2014) concluded that the spoken English test should be an inseparable part of the College English Test as a whole for the sake of test validity; this paper endorses that suggestion. Secondly, CET-SET 6 needs more question-and-answer interaction in the third part, as Lei (2019) also proposes in his study of the IELTS Speaking Test and CET-SET 4/6; since daily communication seldom stops after a single question and answer, this part should allow more extended exchanges. Thirdly, CET-SET 6 should enrich its content by drawing on the IELTS Speaking Test (Lei, 2019): choosing topical issues from daily life would let candidates express their ideas and interact with each other, so as to better assess their communicative ability; this would enhance both the reliability and the validity of CET-SET 6 and help it achieve its aim. Lastly, raters could work alongside a computer-automated scoring system, since differences between raters' ratings are unavoidable; this may make rating more complicated, but it helps ensure the reliability and validity of the test scores.

    References:

    [1]Bachman, L. Fundamental considerations in language testing[M]. Oxford: Oxford University Press, 1990.

    [2]Bachman, L. & Palmer, A. Language testing in practice[M]. Oxford: Oxford University Press, 1996.

    [3]Cheng, L. Y. The key to success: English language testing in China[J]. Language Testing, 2008,25(1):15-37.

    [4]Du, H. College English teaching in China: responses to the new teaching goal[J]. TESOL in Context, 2012,22(S3):1-13.

    [5]He, L. Z. & Dai, Y. A corpus-based investigation into the validity of the CET–SET group discussion[J]. Language Testing, 2006,23(3): 370-401.

    [6]Hughes, A. Testing for language teachers[M]. Cambridge: Cambridge University Press, 2003.

    [7]Jing H. W. & Ma L. L. A Study of the CET-SET Validity[J]. Journal of Gansu Normal Colleges, 2012,17(6):97-99.

    [8]Lei, X. The Implication on the Preparation and Testing on CET-SET from IELTS[J]. Overseas English, 2019,13:79-80.

    [9]Li, H. L. Are Teachers Teaching to the Test? A Case Study of the College English Test (CET) in China[J]. International Journal of Pedagogies and Learning, 2009,5(1):25-36.

    [10]Li, W. Validity Considerations in Designing an Oral Test[J]. Journal of Language Teaching & Research, 2011,2(1):267-269.

    [11]Ma, F. College English Test: To Be Abolished or To Be Polished[J]. Journal of Language Teaching & Research, 2014,5(5):1176-1184.

    [12]Ren, Y. A study of the washback effects of the College English Test (band 4) on teaching and learning English at tertiary level in China[J]. International Journal of Pedagogies and Learning, 2011, 6(3):243-259.

    [13]Van Moere, A. Validity evidence in a university group oral test[J]. Language Testing, 2006,23(4):411-440.

    [14]Wang, C., Yan, J. & Liu, B. An Empirical Study on Washback Effects of the Internet-Based College English Test Band 4 in China[J]. English Language Teaching, 2014,7(6).

    [15]Yang, Y. L. On the Validity and Reliability of CET-SET[J]. Journal of Sanming College, 2003,20(4):152-156.

    [16]Zhang, Y. & Elder, C. Measuring the speaking proficiency of advanced EFL learners in China: The CET-SET solution[J]. Language Assessment Quarterly, 2009,6(4):298-314.