THE UNIVERSITY OF EDINBURGH
MORAY HOUSE SCHOOL OF EDUCATION
Language Testing

Part 1: Reflective Journal

The first case arose when we were designing the reading comprehension section of our test and a disagreement appeared. My group members proposed using a cloze task and I agreed, but when they decided to include the cloze within the reading comprehension section, I felt there might be problems. I had agreed to use a cloze because I share Gellert and Elbro's view that "cloze tests of reading have a relatively high number of items per 100 words as compared with other formats with intact texts" (2013, p. 16), which makes the format quite time-efficient. However, I doubted whether it could be used to test reading comprehension, as I believed that a cloze tests only grammar, vocabulary and, at most, sentence-level comprehension. I held this view mainly because I had taken many cloze tests in senior high school. Since the cloze is one part of the College Entrance Examination, every practice exam, mid-term or final exam, and mock exam included one, always as a section separate from reading comprehension. Those tests left me with the impression that they measured discrete grammar or vocabulary knowledge, such as tense, collocation, synonyms, antonyms and idioms, with little or even no sentence-level comprehension. I remember that even when a text was so difficult that I could hardly understand it, I could still score 6-7 out of 10 because I knew the specific grammar or vocabulary points. My view was close to Carlisle and Rice's warning that cloze tests are more inclined to assess "local comprehension", that is, the "grammatical and semantic constraints on meaning" (2004, p. 535). There is also research indicating that, instead of being sensitive to higher-order comprehension processes, some cloze tests appear to assess "decoding and word-level processes" (Keenan et al., 2008). However, one of my group members had exactly the opposite experience: she felt that if she could not understand the whole text, she could get only 2-3 out of 10, and in her university tests the cloze was part of the reading comprehension section. I was surprised and decided to look further into whether the cloze is an appropriate format for testing reading comprehension.

The first thing that changed my view was an example provided by Gellert and Elbro, in which the local context suggests an incorrect answer while "understanding of the subtext at the unspoken socio-emotional level" (2013, p. 17) points to the right one. I had assumed that a cloze assesses only surface-level word or sentence processing, but the example shows the opposite: a cloze can be used to test understanding of the text beyond discrete grammar or vocabulary points at sentence level. Even after realizing that a cloze could be used to test higher-order comprehension abilities, I was still hesitating between a cloze and a question-answering comprehension test. According to Gellert and Elbro's research, "a cloze test can be developed to provide results on a par with a standard comprehension measure in terms of reliability and validity" (2013, p. 25). On grounds of practicality, however, since a cloze is time-efficient and "easy and quick to use" (Gellert & Elbro, 2013, p. 25), I chose to abandon the question-answering format and use a cloze for reading comprehension.

I was also inspired by the experiment of Cain, Patson and Andrews (2005), which indicates that competent comprehenders do better than poor comprehenders at reinserting words that serve as cohesive ties. I therefore deliberately left a blank (item 2 in the cloze) and provided one distractor based solely on knowledge of the conjunction in the sentence. In a pilot test, 21 out of 35 participants chose the distractor. When I asked some participants about their choices, those who chose the correct answer explained it in terms of their comprehension of the whole paragraph or text, while those who chose the wrong answer tended to explain it in terms of their knowledge of the conjunction "or". This reinforced my belief that a cloze can be a useful tool for assessing overall reading comprehension rather than only word or grammar knowledge.

The second case arose when my group members and I were designing the items for the listening section. We all agreed on the use of multiple-choice questions (MCQs), as this format has the "advantage of sampling broad domains of knowledge efficiently and hence reliably" (Sim & Rasiah, 2006, p. 67). However, we were aware of the demerits of MCQs, so we decided to add another item type which we hoped would have higher construct validity, and we argued about which format might be more valid, reliable and practical. My group members suggested dictation, while I was more in favor of gap-filling on summaries. I suspected they wanted to use dictation because the item could be constructed easily: one simply chooses a text and plays it four times, the first and fourth at normal speed and the second and third at a slower speed with pauses. However, when I thought about validity, reliability and practicality, I was firmly against this format. First, considering Target Language Use (Bachman & Palmer, 1996, pp. 44-45) and the specifications of this test, my experience suggests that students entering university are very unlikely to need to complete a dictation task in their academic life. A listening task that is not closely related to the TLU domain could affect the validity of the test because, as Wagner suggests, "the characteristics of the test tasks should be similar to and representative of the real-world tasks of the domain that the test is trying to assess" (2013, p. 179). Previously, I had doubted whether TLU mattered. If I define the constructs as purely linguistic knowledge such as vocabulary and grammar, and use dictation, which I initially thought tests only linguistic knowledge, then I can claim that the inferences drawn from the scores are valid, since I am measuring exactly those constructs. However, my group members reminded me that, given the context and specifications of the test, it is a proficiency test that aims to determine whether students' English is proficient enough for university study. I realized that testing only linguistic knowledge would be insufficient and possibly meaningless. Admittedly, compared with gap-filling on summaries, which is "hard to develop" (Templeton, 1977), a dictation is simple to construct and administer (Irvine, Atai & Oller, 1974). However, "scoring the standard form of the dictation may be a challenge" (Cai, 2012, p. 182).

I remember that when I was at university, markers always argued about inter-rater reliability. Some markers thought that marks should not be deducted for punctuation errors, minor spelling mistakes or missing words that do not impede communication, while others held that everything should match the original text exactly or the test-taker should lose marks. It seems unfair for students who make only punctuation or minor spelling mistakes to receive the same mark as those who have trouble understanding the key words. What is more, marking dictation takes a great deal of time and is very demanding for markers. After I raised these problems, my group members pointed out that the standard form I described is not the only kind of dictation; the format can be adapted to suit different needs and contexts. I then read more about dictation and came to realize that, in order to ease raters' burden of marking entire texts and "improve its connection to real-life tasks, dozens of variations have been invented" (Davis & Rinvolucri, 1988). I had never expected that so many types of dictation existed, because I was confined to my own experience; past experience can mislead, as things change all the time. My group members then suggested that partial dictation might be an alternative. I thought about it and agreed that it resembles real academic tasks such as note-taking, that a rating scale is easy to set, and that it is much easier to mark. However, as Buck (2001) proposes, the listening construct includes not only "knowledge of the sound system and understanding local as well as full linguistic meanings" but also "inferred meanings and communicative listening ability". I held that partial dictation tests only what Anderson and Lynch (1988) call "lower-order knowledge", that is, knowledge of the language system, whereas the format I intended to use, gap-filling on summaries, tests "higher-order knowledge". I had this idea mainly because, in dictation tests at university, I could sometimes write down a whole sentence without understanding the text; the process felt like recognizing individual words and writing them down. When I told my group members about this experience, they gave me a partial dictation to do. To my surprise, in 4 out of 6 blanks I could not write down every exact word of the sentence if I listened only to that sentence without understanding what was said before or after it. Wondering why there should be such a difference, I read some studies on partial dictation. Cai's study finds that a partial dictation is not only "easy to construct" but also that a "consistent test can be obtained from it" (2012, p. 195). She also points out that both lower-order and higher-order abilities may be involved (2012, p. 195). After thinking it over, I accepted this format, partial dictation, for our test.

After the two cases, I began to look back and wonder why my views had changed. In case one, I realized that my group member and I had different opinions about the cloze because the purposes of the tests we had taken were different: the cloze tests I took were designed to test specific knowledge of grammar and vocabulary, while the one my group member took was designed to test reading comprehension. This is probably why the cloze tests I took were separated from the reading comprehension section.

I also realized in case one that it is not the cloze format itself that tests comprehension; it is the items selected and deliberately designed that test reading abilities. This echoes Gellert and Elbro's point that "a cloze test of comprehension needs comprehension-demanding gaps" (2013, p. 18). The same realization is reflected in case two. I had been prejudiced against partial dictation mainly because I took it for granted that certain item types inherently test lower-order abilities while others test higher-order ones, and that one item type is born with higher reliability or validity than another. It is possible that some of the dictation tasks I did at university amounted to mere word recognition because of problematic item design. This point is of great importance to our design of the partial dictation items. As Buck suggests, "to guarantee good construct validity, blanks in tasks need to be designed on purpose to test higher-order knowledge" (2001). I now realize that a test method in itself does not test lower-order or higher-order abilities; it is the items as designed that do so. Therefore, we need to place the blanks in sentences with more content words and a higher information load, which requires test-takers not only to understand specific sentences but also to understand the whole text and to activate their "background and procedural knowledge" (Anderson & Lynch, 1988). As Spaan notes, "no one single item type has been found useful in and by itself" (2007, p. 279). This also has implications for the design of MCQs: if they are not carefully and strictly designed, MCQs may not have the high reliability I had assumed. It is interesting to see how my attitudes towards cloze and dictation have changed and how my knowledge of these and other item types has developed. Besides the reflections above, another thing that impressed me is the importance of group work in designing a test. I have become aware of how valuable other members' ideas are, because they sometimes provide inspiring insights into aspects I had never considered or had ignored because of stereotypes. Both cases indicate that knowledge shaped only by my own experience can be limited, and that reluctance to consider and accept others' opinions holds my development back.

Part 2: Comparison and evaluation of the two tests

Both the International English Language Testing System (IELTS) and the Pearson Test of English Academic (PTE Academic), according to their websites, are English proficiency tests. IELTS claims its results are "a secure, valid and reliable indicator of true-to-life ability to communicate in English for education, immigration and professional accreditation" (IELTS website), and similarly, PTE Academic results are accepted as an English language qualification for meeting the UK's requirements for settlement and naturalization (PTE website). As Taylor and Geranpayeh suggest, IELTS, created in 1989, "testifies to a strongly contextualized approach to assessing advanced language proficiency and shapes later development of test designs" (2011, p. 90), which include the newer PTE Academic. However, whether PTE Academic really is a development of IELTS is still open to debate. First, when comparing the two tests, there appear to be differences in the content of the audio input and in the task types. In PTE Academic, all the texts concern academic life, for example lectures and presentations about science, and even the everyday conversations are related to the academic domain.
So do the items, which test, for example, summarizing and note-taking. In IELTS, by contrast, the first two parts are more about social life, such as applying for a homestay or booking a tour, while sections 3 and 4 are more about academic lectures and presentations, and the items test students' abilities both on and off campus. As early as 1982, UCLES emphasized a shift from "literature-oriented texts" to "authentic spoken English in a variety of realistic contexts" (1982, p. 28).