This study examined the relation of TOEFL® performance to a widely used variant of the cloze procedure, the multiple-choice (MC) cloze method. A main objective was to determine whether categories of MC cloze items could be identified that related differentially to the various parts of the TOEFL. MC cloze items were prepared and classified according to whether the involvement of reading comprehension, as defined by sensitivity to long-range textual constraints, was primary or secondary. For two categories, reading comprehension was primary and knowledge of grammar or vocabulary was secondary; for the other two categories, knowledge of grammar or vocabulary was primary and reading comprehension was secondary. Examinees taking an operational TOEFL at domestic test centers were given the three basic sections of the test along with a fourth section containing the MC cloze items. Performance was examined for each of nine major language groups. Exploratory and confirmatory factor analyses of the basic TOEFL were performed first, to provide a basis for relating the MC cloze items to the TOEFL structure. These factor analyses suggested that, from a practical standpoint, TOEFL performance can be adequately described by just two factors, which relate to (a) Listening Comprehension and (b) all other parts of the test (Structure, Written Expression, Vocabulary, and Reading Comprehension). Examination of the MC cloze test showed that the total MC cloze score was relatively reliable and that item response theory parameters for the MC cloze items could be estimated with reasonable accuracy. Thus, the development of the MC cloze items was successful in these respects. However, the correlations among scores for the four MC cloze item categories were approximately as high as their reliabilities, providing no strong empirical evidence that the item types within the MC cloze test reflected distinct skills.
Correlational analyses related the four MC cloze categories to the five parts of the TOEFL. These analyses revealed a slight tendency for MC cloze items that involved a combination of grammar and reading to relate more highly to the Structure and Written Expression parts of the TOEFL than to the other parts, and for MC cloze items that involved a combination of vocabulary and reading to relate more highly to the Vocabulary and Reading Comprehension parts than to the other parts. Although this pattern was relatively consistent across language groups, the differences among correlations were not large enough to be of practical importance. Multiple regression analyses were performed, using total MC cloze score as the dependent variable and the five TOEFL parts as independent variables. The resulting multiple Rs were mostly in the lower to upper .90s, suggesting that total MC cloze performance can be predicted from TOEFL performance with a relatively high degree of accuracy. In general, the study provided no evidence that distinct skills are measured by the nonlistening parts of the TOEFL or by the four categories of MC cloze items. It would appear that the skills associated with grammar, vocabulary, and reading comprehension, as assessed by the TOEFL and the MC cloze test, are highly interrelated.
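The argument that category correlations "approximately as high as their reliabilities" leave no room for distinct skills rests on a standard psychometric relation: the observed correlation between two scores is limited by their reliabilities. Spearman's correction for attenuation, $r_{xy}/\sqrt{r_{xx} r_{yy}}$, makes this explicit. The sketch below is illustrative only; the abstract does not report the individual coefficients, so the values used are hypothetical.

```python
import math


def disattenuated_correlation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation.

    r_xy: observed correlation between scores X and Y
    r_xx, r_yy: reliability coefficients of X and Y, each in (0, 1]
    Returns the estimated correlation between the underlying true scores.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values (not from the study): two MC cloze category scores,
# each with reliability .80, that correlate .78 with each other.
corrected = disattenuated_correlation(0.78, 0.80, 0.80)
print(f"{corrected:.3f}")  # 0.975
```

A corrected value this close to 1.0 is what the pattern described in the abstract implies: the two categories share nearly all of their reliable variance. In practice, corrected values can slightly exceed 1.0 because the reliability estimates themselves contain sampling error.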

Figures - uploaded by Charles Stansfield

... With an increasing level of difficulty, TOEFL takers tend to have a very high chance of making errors (Hale et al., 1988). This is also similar to when L1 interference is related to mistakes and errors (Watcharapunyawong & Usaha, 2013) ... to a noticeable deviation from the adult grammar of a native speaker (H. ...

  • Rolisda Yosintha
  • Sukma Shinta Yunianti
  • Boris Ramadhika

This study aimed to investigate the students' linguistic and non-linguistic constraints in completing the Structure and Written Expression section of the TOEFL. This was a qualitative case study carried out at two universities in Magelang, Central Java, Indonesia. The data comprised a document analysis of 42 students' TOEFL answer sheets and interviews with four students. The data were analyzed using the difficulty index (IF) formula proposed by Brown (2004) for the quantitative data and the interactive model developed by Miles and Huberman (1994) for the qualitative data. The findings revealed that the students encountered three linguistic constraints involving grammatical items, caused by both intralingual and interlingual interference: active-passive verbs, double comparatives, and pronoun-noun agreement. In addition, non-linguistic factors such as unpleasant past learning experiences and limited exposure to the L2 worsened their performance on the test.
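The difficulty index cited here is usually computed as item facility: the proportion of examinees who answer an item correctly, so higher values mean easier items. A minimal sketch, assuming each answer sheet is reduced to the option chosen for a single item and that omitted answers count as incorrect; the responses below are invented for illustration and are not the study's data.

```python
from typing import Optional, Sequence


def item_facility(responses: Sequence[Optional[str]], key: str) -> float:
    """Proportion of examinees who chose the keyed option.

    responses: the option each examinee marked (None = omitted, which this
               sketch counts as incorrect)
    key: the correct option for the item
    """
    if not responses:
        raise ValueError("need at least one response")
    correct = sum(1 for r in responses if r == key)
    return correct / len(responses)


# Hypothetical answer-sheet data for one Structure item (key = "B").
item_responses = ["B", "A", "B", None, "C", "B", "B", "D"]
print(item_facility(item_responses, "B"))  # 0.5
```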

... In the cloze literature, there have been various suggestions on how to use MCC test techniques in class (Bortnick & Lopardo, 1973; Carr et al., 1989; Eze, 2015; Grant, 1979; Hale, Stansfield, Rock, Hicks, Butler, & Oller, 1988; Mostow, Huang, Jang, & Weinstein, 2017; Weaver, 1979). Cameron, Linton, & Hunt (1987) concluded that "as instructional devices, when used in conjunction with peer discussion, cloze passages increase linguistic flexibility." ...

We explored using multiple-choice cloze (MCC) tests for classroom instruction. The practice of "testing leading teaching" is frequently criticized because it might distort the original teaching objectives. We do not primarily emphasize how to get high scores; instead, we show how to use testing techniques and teaching activities to provide feedback that energizes teaching methods and increases learning effectiveness. We analyzed MCC test-taking strategies, which include leading students to: 1) skim the first and the last sentence in cloze passages; 2) read the whole cloze passage to grasp its general idea; 3) look for contextual clues; 4) orally express ("thinking out loud") their reasons for choosing one MCC test item instead of another; and 5) conduct group discussions. Finally, 6) teachers guided the entire class, discussed contextual and situational clues, and provided feedback about student choices and reasons. The experimental design primarily compared the performance of two groups, Experimental and Control. After six 25-minute MCC test lessons, Experimental group students had better MCC test scores than did Control group students: differences in cloze scores between the two groups were significant, but differences in reading comprehension scores were not. Our findings supported our hypothesis that MCC instruction, even for a short time, would improve performance on a cloze test. We also discuss how to use MCC tests to teach strategies for answering MCC test items.

  • John Oller

This list was updated on January 28, 2021 at 1:22 PM Central Standard Time. It is intended to make the works in question accessible for free, as much as is possible.

This study investigated the construct validity of a local speaking test for international teaching assistants (ITAs) from a fairness perspective, by employing a multi-group confirmatory factor analysis (CFA) to examine the impact of task type and examinee first language (L1) background on the internal structure of the test. The test consists of three types of integrated speaking tasks (i.e., text-speaking, graph-speaking, and listening-speaking) and the three L1s that are most represented among the examinees are Mandarin, Hindi, and Korean. Using scores of 1804 examinees across three years, the CFA indicated a two-factor model with a general speaking factor and a listening task factor as the best-fitting internal structure for the test. The factor structure was invariant for examinees across academic disciplines and L1 backgrounds, although the three examinee L1 groups demonstrated different factor variances and factor means. Specifically, while Korean examinees showed a larger variance in oral English proficiency, Hindi examinees demonstrated a higher level of oral proficiency than did Mandarin and Korean examinees. Overall, the lack of significance for multiple task factors and the invariance of factor structure suggest that the test measures the same set of oral English skills for all examinees. Although the factor variances and factor means for oral proficiency differed across examinee L1 subgroups, they reflect the general oral proficiency profiles of English speakers from these selected L1 backgrounds in the university and therefore do not pose serious threats to the fairness of the test. Findings of this study have useful implications for fairness investigations on ITA speaking tests.

  • Philip K. Oltman
  • Lawrence J. Stricker
  • Thomas S. Barrows

Responses on the TOEFL® test may reflect both the influence of the examinees' native language and their level of English proficiency. The aim of this study was to appraise the effect of these examinee variables on the structure of the test. The interrelations among TOEFL items, using all of the information provided by the various responses to the items (the four alternatives, omitted, and not reached), were analyzed by three-way multidimensional scaling for samples of examinees systematically varying in native language and level of English proficiency. Four dimensions were identified: three corresponded to the sections of the test, and the fourth was an end-of-test phenomenon. The dimensions were predominantly defined by easy items and were most salient for low-scoring examinees. The salience of the dimensions did not differ for the various language groups, except for the end-of-test dimension. Major conclusions were that the TOEFL's construct validity is supported, the test's interpretation varies with the examinees' English proficiency, easy and difficult items differ in their potential for diagnosis and global screening, and the dimensionality of the TOEFL and of competence in English depend on the examinees' English proficiency.

The TOEFL testing program is currently exploring a change in Section 3 of the TOEFL® test that would replace the vocabulary subpart with additional reading comprehension questions. This change has been proposed by internal test development specialists and is supported by external experts in the field of English as a second language. The purpose of this study was to investigate the proposed revision to Section 3 in terms of the length and timing that would be necessary to address concerns about the speededness of the section. The study was carried out using an experimental design with test length and testing time defined as independent variables, and examinee test performance defined as the dependent variable. In addition, several psychometric issues relating to the proposed revision to Section 3 were investigated as part of the study. The results of the study supported the implementation of a revised TOEFL Section 3 consisting of five reading passages with a total of 50 items. The results of the study also suggested that a total testing time of no less than 55 minutes should be allowed for the revised TOEFL Section 3. Additional psychometric analyses indicated that the current TOEFL score scale can be maintained with the revised Section 3, and that the proposed revisions will not appreciably affect the reliability and validity of Section 3 of the TOEFL test.

This study assessed the factor structure of the LanguEdge™ test and the invariance of its factors across language groups. Confirmatory factor analyses of individual tasks and subsets of items in the four sections of the test (Listening, Reading, Speaking, and Writing) were carried out for Arabic-, Chinese-, and Spanish-speaking test takers. Two factors were identified: Speaking and a fusion of the other sections of the test. The number of factors, the factor loadings, and the factors' error variances were invariant in the three samples, although the correlations between the factors differed. The failure to find separate factors for each section of the LanguEdge test necessarily raises questions about the test's functioning that need to be resolved.

  • Mary Schedl
  • Ann Gordon
  • Patricia A. Carey
  • K. Linda Tang

The issue of exactly what is measured by different types of reading items has been a matter of interest in the field of reading research for many years. Language teaching and testing specialists have raised the question of whether a reading test for foreign students wishing to enter university in the United States should include questions testing abilities beyond linguistic and very general discourse competencies, or indeed whether it is possible to separate these language competencies from other competencies. The purpose of this study was to investigate the dimensionality of the TOEFL® reading test, based on the specifications in use as of April 1991. Of particular interest was whether four item types identified in the test specifications as "reasoning items" could be shown to measure, in addition to general reading ability, any abilities not measured by the other item types in the TOEFL reading test. Two techniques, Stout's procedure and NOHARM analyses, were employed to investigate the hypothesized two-factor model. In both cases the data failed to fit the model, indicating that TOEFL "reasoning items" cannot be shown to measure a unique construct. However, the follow-up exploratory analyses indicated that all 10 test forms used in the study violated the assumption of essential unidimensionality, and all of the forms appeared to fit a two-factor model where the second factor may be related to passage content or position.

  • Linda Steinman

The cloze test, or completion test, is a procedure that consists of deleting one word every n words from a text passage; the reader must then restore the missing words on the basis of the textual context. This type of exercise, invented by Taylor in the 1950s, makes it possible to test reading comprehension in the native language and is also used in foreign language teaching. The author explores the different types of this activity (multiple-choice, free response, etc.), examines how it can measure comprehension, suggests ways of using it as a test, and offers advice on scoring for the teacher.
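The cloze construction described here, deleting one word every n words, can be sketched in a few lines. This is a simplified illustration, not a procedure taken from the cited work: it splits on whitespace (so punctuation stays attached to words), deletes the nth, 2nth, ... words, and uses exact-word scoring that ignores case. Published cloze tests often leave the opening sentence intact and may accept contextually appropriate alternatives; neither refinement is modeled here.

```python
from typing import List, Tuple


def make_cloze(text: str, n: int, blank: str = "_____") -> Tuple[str, List[str]]:
    """Delete every nth word (the nth, 2nth, ...) and return the
    mutilated passage together with the deleted words (the answer key)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    passage, key = [], []
    for i, word in enumerate(text.split(), start=1):
        if i % n == 0:
            key.append(word)
            passage.append(blank)
        else:
            passage.append(word)
    return " ".join(passage), key


def score_exact(key: List[str], answers: List[str]) -> int:
    """Exact-word scoring: one point per blank restored with the original
    word (case-insensitive). Blanks with no answer supplied score zero,
    because zip stops at the shorter list."""
    return sum(k.lower() == a.strip().lower() for k, a in zip(key, answers))


passage, key = make_cloze("The quick brown fox jumps over the lazy dog", n=3)
print(passage)  # The quick _____ fox jumps _____ the lazy _____
print(key)      # ['brown', 'over', 'dog']
print(score_exact(key, ["brown", "under", "Dog"]))  # 2
```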

This book introduces and reclassifies disorders across the board from the vantage point of a more dynamic, comprehensive, consistent, and coherent theory of sign systems. In doing so, it presents newly discovered theoretical connections along with up-to-date published empirical and experimental demonstrations. It is becoming increasingly evident that disorders and disease conditions, especially the unexpectedly persistent ones singled out as "communication disorders," invariably involve problems in representation. Such problems range from difficulties at the deepest levels of genetics to the highest levels of human emotion and intelligence manifested in experience, actions, language, and reasoning. More than ever before it is clear that health depends profoundly on dynamic representations, especially true ones, of the way things really are and how they are changing over time. Disease conditions and disorders invariably involve mistaking fictional or deliberately false representations for true ones. Disease agents, it is clear, can falsely represent themselves to the body's defenses. Toxins can disrupt the capacity of the body and its immune systems to represent things correctly from genetics upward through metabolism and on to the highest levels of emotion, cognition, language, and reasoning. When fictions are mistaken for true representations, as when deliberate deceptions or mere fictions are taken to be true representations of actual facts, problems result. Such communication problems, from the deepest levels of genetics and metabolism right on up to the most general forms of language and thought, form the underlying basis for disease, disorder, and mortality. Without exception, disorders of communication, and disease conditions in general, are the consequence of breakdowns and failures in systems of representation.
At the core of the distinctly human capacities of communication are the dynamic pragmatic mapping relations by which sensory impressions of the physical world are linked through actions to abstract concepts of the linguistic kind. The simplest examples involve naming: for instance, if we refer to one of our editors as "Sandy," we aim to map the surface form of the name onto a certain person. If we succeed, our representation qualifies, as far as it is intended to qualify, as a true and valid representation. Another simple example of the pragmatic mapping relation would be a baby waving goodbye when someone else is actually taking leave of that baby, or vice versa. In that case the waving would be appropriately associated by way of reference or signification with the act of taking leave. Such pragmatic mapping relations, as is well demonstrated in the study of communication disorders, are fundamentally programmed into our neurological systems. It is not too much to say that they are dynamically built into the architecture of the brain. With that in mind as the basis for the dynamic connection between abstract ideas and concrete things through intelligence, it is also becoming increasingly evident, as our readers and students are discovering, that the many fields of study concerned with human communication and its disorders are undergoing a paradigm shift from static theories of distinct bits and pieces, surface forms, and independent components toward theories that take account of dynamic, interconnected systems that communicate with each other. The dynamic systems-oriented approaches to human experience are central to the paradigm shift that we believe is already underway in the health sciences. The shift is bringing with it a better understanding of the central role of valid communications in well-being.
It is ultimately the dissolution of representations themselves, or we could say the development of communication disorders, that leads to diseases and disordered conditions. This book chronicles the initial stages of the paradigm shift that we believe is underway, and it anticipates some of the ways in which the systems orientation must continue to develop in the coming months and years. Teachers who adopt the book and course, and their students, are assured of a cutting-edge introduction to the best of current theories and the ongoing empirical research being applied to test them. No other introduction offers as much historical depth or experimental currency concerning well-researched cases. Nor does any other course provide a simpler, more coherent, or more intelligible theoretical perspective.

  • John Oller
  • Frances Butler Hinofotis

Two hypotheses proposed to explain the variance in second language tests are investigated. Hypothesis 1 (H1) claims that language skill is separable into components related either to linguistically defined categories (e.g., phonology, syntax, and lexicon) or to the traditionally recognized skills (i.e., listening, speaking, reading, and writing). Although tests of the presumed separable components are believed to produce substantial overlapping variances, it is assumed in H1 that tests aimed at a certain component (e.g., listening skill or vocabulary knowledge) should also produce some meaningful variance that is unique to that component (i.e., not overlapping with variances of tests aimed at other components). Another possibility (H2) is that second language ability may be a more unitary factor, such that once the common variance on a variety of language tasks is explained, essentially no meaningful unique variance attributable to separate components will remain. Previous studies have provided rather convincing support for H2, though it seems to be the less obvious of the two alternatives. Data from 159 Iranian subjects at the University of Tehran, Iran, who took a cloze test, a dictation, and the five subparts of the Test of English as a Foreign Language also support H2 in this report. However, when an oral interview task is included, the picture is less clear. Data from 106 foreign students (from mixed language backgrounds) at the Center for English as a Second Language at Southern Illinois University suggest the possibility of unique variances associated with components of grammatical knowledge (e.g., syntax versus phonology, or vocabulary).

  • Richard B. Baldauf
  • Ivan K. Propst

Few reliable and valid measures of reading achievement are available to evaluate programs for elementary English-as-a-second-language (ESL) pupils. Four variations on the cloze procedure, which has been previously used with disadvantaged and ESL elementary pupils, were evaluated using randomly assigned groups of fourth and fifth grade students. Matching and multiple-choice variations were selected for comparison because they are in greater consonance with current psycholinguistic theories of the reading process than are other types of reading comprehension measures. Although the overall results were quite similar for the four cloze variations examined, the matching cloze procedure seems to be preferable for elementary ESL students, since these tests produced better item characteristics and were more easily constructed.

  • Winton H. Manning

The purpose of this study was to investigate the validity of cloze-elide tests of English proficiency for students who are similar to the TOEFL candidate population. Cloze-elide tests used in this research consisted of exercises in which an examinee is required to edit a prose passage by eliminating extraneous words that have been randomly interspersed throughout the original text of the passage. Students enrolled in university-level intensive English language programs were administered a series of tests including, in addition to cloze-elide tests, a form of the TOEFL, a multiple-choice cloze test, a traditional cloze exercise, and an essay that was holistically scored. Students were also rated by their instructors in twelve areas of English proficiency, and the students rated their own language competency through self-assessments in ten areas. A variety of student background information was also obtained. The design of the study aimed at contributing evidence of the construct validity of cloze-elide tests. Concurrent validity information is relevant to this question, but the evidence arising from factor analyses of the intercorrelations among these many measures is more pertinent. In summary, the new cloze-elide measures demonstrated very strong concurrent validity with the TOEFL and other, more widely used measures of second language proficiency. The factor analyses suggest that cloze-elide tests are good, indirect measures of English language proficiency, comparing very favorably with more commonly used testing procedures. Multiple regression analyses confirmed the usefulness of cloze-elide tests, which were generally one of the two best predictors of teacher ratings of students' English proficiency.

  • Sybil B. Carlson
  • Brent Bridgeman
  • Roberta Camp
  • Janet Waanders

Four writing samples were obtained from 638 applicants for admission to U.S. institutions as undergraduates or as graduate students in business, engineering, or social science. The applicants represented three major foreign language groups (Arabic, Chinese, and Spanish), plus a small sample of native English speakers. Two of the writing topics were of the compare and contrast type and the other two involved chart and graph interpretation. The writing samples were scored by 23 readers who are English as a second language specialists and 23 readers who are English writing experts. Each of the four writing samples was scored holistically, and during a separate rating session two of the samples from each student were assigned separate scores for sentence-level and discourse-level skills. Representative subsamples of the papers also were scored descriptively with the Writer's Workbench computer program and by graduate-level subject matter professors in engineering and the social sciences.

  • Marna Golub-Smith

The study was conducted to provide information for the TOEFL program about the effects of option rearrangement as a means of increasing test security during administrations of the listening comprehension section of the TOEFL. Two forms, consisting of items whose options were systematically scrambled, were spiraled together with the base versions during the pretesting phase of test development. The results indicated that scrambling an item's options does produce differences in both the estimated item response functions and equating functions. Therefore, it was recommended that this procedure not be adopted for the operational program.

This report examines the content characteristics of the Test of English as a Foreign Language (TOEFL) from a communicative viewpoint based on current theory in applied linguistics and language proficiency assessment. After a review of relevant literature, the authors developed and applied a four-part operational framework for analyzing the communicative characteristics of a language proficiency test. The first component of this framework consists of the grammatical, sociolinguistic, and discourse competencies required by test tasks. The second component consists of eight factors that could influence test performance. The third component consists of judgements of the relevance of the content of test items to academic and social language use. The final component relates the language and language tasks appearing in TOEFL items and sections to a criterion-referenced scale of language proficiency. In this case, the Interagency Language Roundtable scale was used. Finally, the report discusses test design features that might improve the quality of language proficiency tests.

  • Deborah Hosley
  • Keith Meredith

The present study provides validity information for the TOEFL by examining some of its inter- and intra-test correlates. Inter-test correlates included: 1) grades in an intensive English program, 2) accumulated scores from objective quizzes administered after each of 15 lessons in a course designed to teach listening comprehension and note taking skills, and 3) scores on the Comprehensive English Language Test (CELT) with subtests representing structure, listening comprehension and vocabulary abilities. In addition, intercorrelations were computed within ability groups (high, medium and low, as measured by students' levels in their English program) to determine whether correlation patterns vary according to the academic level of the student. Intra-test correlates consisted of investigations of correlations among subtests within the TOEFL. Factor analyses were used to aid in interpretation of the various correlation structures. The purpose of this study is to initiate a validation study of the content of the TOEFL. One factor was identified through factor analysis of TOEFL subtest scores, with the reading comprehension subtest having the highest factor loading. The interrelated nature of the TOEFL subtests is supported by positive correlations (greater than .50) within TOEFL subtests and between TOEFL and CELT subtests. A high correlation between the listening comprehension subtest of the TOEFL and another listening comprehension measure, the listening tracts, as well as a considerable correlation between listening tract scores and TOEFL totals, suggest that listening comprehension may be a separate skill that is significantly interrelated with total score success. Grades in an intensive English program are not predictors of TOEFL success, although relative academic level is.

  • Donald L. Alderman
  • Paul W. Holland

The Test of English as a Foreign Language (TOEFL) was examined for instances in which the item performance of examinees with comparable scores differed according to their native languages. A chi-square procedure, sensitive to deviations of less than ten percent from the expected frequencies of correct item responses across several language groups, revealed significant differences on seven-eighths of the TOEFL items. Reviewers familiar with particular languages could attribute the relative advantage or disadvantage of those language groups on a specific item to linguistic similarities or dissimilarities with the English language. Reviewers could not, however, identify which items would exhibit differential performance across groups based upon inspection of a test form and answer key alone. These findings suggest that examinees' performance on given items in a test of proficiency in a second language will vary according to linguistic contrasts with their native language and that statistical procedures will be necessary for identifying items with exaggerated or unexplained differences across language groups.
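The core of a chi-square screen for differential item performance is a comparison of observed and expected counts of correct responses across language groups among examinees with comparable scores. The sketch below computes a Pearson chi-square test of independence for one item within a single score band. It is a simplified stand-in, not Alderman and Holland's exact procedure, and the group labels and counts are hypothetical. A full analysis would repeat this across score bands and take p-values from a statistics library.

```python
from typing import Dict, Tuple


def item_chi_square(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square for a (language group) x (correct, incorrect) table.

    counts maps each group to (n_correct, n_incorrect) for one item, among
    examinees in the same total-score band.
    Returns (chi-square statistic, degrees of freedom).
    """
    rows = list(counts.values())
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    grand = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / grand
            if expected > 0:  # an empty column contributes nothing
                stat += (row[j] - expected) ** 2 / expected
    df = (len(rows) - 1) * (2 - 1)
    return stat, df


# Hypothetical counts for one item in one score band (50 examinees per group).
band = {"Group A": (40, 10), "Group B": (30, 20), "Group C": (35, 15)}
stat, df = item_chi_square(band)
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 4.76, df = 2
```

With df = 2, the critical value at the .05 level is about 5.99, so these hypothetical counts would not flag the item at that level.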