Sessions / Testing and Evaluation
Whether the goal is to present research findings or to prepare for the global workforce after graduation, the ability to deliver an effective presentation can be of great importance to university students. However, one of the greatest inhibitors of a successful presentation is anxiety. While much research has measured and analyzed foreign language anxiety and public speaking anxiety, little has examined the anxiety that foreign-language learners experience when giving presentations. This presentation will detail the development of a Presenting in a Foreign Language Anxiety Scale (PFLAS) specifically designed to measure the anxiety Japanese university students experience when making presentations in English. It is hoped that the information gleaned from this survey will allow the researchers to better understand which aspects of classroom instruction increase or reduce anxiety. This will in turn inform changes in instruction that help students become more successful presenters.
Quantitative research into language learning commonly relies on null-hypothesis statistical significance testing (NHST) to show the effectiveness or otherwise of different treatments for groups. However, as has been well documented, multiple problems exist with this approach (see The American Statistician, 2019, Volume 73, sup1). To move beyond the limitations of NHST, Cumming (2012, 2014) introduces the concept of the New Statistics. Here, the focus shifts from reporting p-values to estimating effect sizes and confidence intervals (CIs) as a way to better explain what a difference between groups may mean. To help facilitate a New Statistics approach, Ho et al. (2019) have developed a software package, DABEST, for the creation of estimation graphics. These are data-rich plots which display effect sizes and CIs alongside the distribution of all data points from the samples. In this presentation, following an overview of the issues above, I will introduce an online application for quantitative data analysis using DABEST. The application, which I built, generates estimation plots and other essential statistical output to help researchers understand their results through a New Statistics framework, and provides output for use in presentations or publications.
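As a rough illustration of the estimation approach described above, the following sketch computes a mean difference between two groups together with a percentile bootstrap confidence interval. It is a minimal example under stated assumptions and is not the DABEST implementation or the presenter's application: the function name `bootstrap_mean_diff`, its parameters, and the sample scores are all hypothetical, and a plain percentile bootstrap is used for simplicity.

```python
import random
import statistics


def bootstrap_mean_diff(control, treatment, n_resamples=5000, ci=0.95, seed=0):
    """Return (observed, lower, upper) for mean(treatment) - mean(control).

    Hypothetical helper: a plain percentile bootstrap, used here only to
    illustrate reporting an effect size with a confidence interval.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.fmean(treatment) - statistics.fmean(control)

    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))

    diffs.sort()
    alpha = (1 - ci) / 2
    lower = diffs[int(alpha * n_resamples)]
    upper = diffs[min(n_resamples - 1, int((1 - alpha) * n_resamples))]
    return observed, lower, upper


if __name__ == "__main__":
    # Made-up scores for two groups, for illustration only.
    control_scores = [52, 61, 58, 49, 66, 55, 60]
    treatment_scores = [63, 70, 59, 68, 72, 65, 61]
    diff, low, high = bootstrap_mean_diff(control_scores, treatment_scores)
    print(f"mean difference = {diff:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the difference with its interval, rather than a p-value alone, shows both the size of the effect and the uncertainty around it.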
The Duolingo English Test (D.E.T.) is an online, on-demand English proficiency test. It measures test-takers' language ability across the four skills of reading, listening, writing and speaking in a blended manner, with all skills assessed in a single test. The test is typically much shorter than other proficiency tests and can be taken in examinees’ own homes using their own computers. Because the test is adaptive, it adjusts its questions based on the preceding (correct or incorrect) answers, enabling it to measure competency rapidly and accurately.
Given the current coronavirus pandemic, taking language proficiency examinations at test centers is often not a viable option. Therefore, the D.E.T. may appear to be an affordable, convenient alternative to more traditional tests. However, despite its apparent advantages, it is not without its complications, so it should not be considered a panacea for the deficiencies of conventional testing.
This presentation will introduce teacher and student experiences of the D.E.T. based on information gleaned from questionnaires and interviews. It will elucidate the advantages and disadvantages of the test and also propose best practice guidelines for others wishing to employ this test in their own contexts.
This TEVAL SIG Forum will consist of three 20-minute presentations and a 30-minute open-floor discussion which will address the proposed adoption of four-skills tests for university admissions in Japan. The main purpose is to give an overview of the selection and use of tests and the potential impact that their use will have on English education in Japan. Firstly, David Allen will briefly discuss why MEXT is recommending the use of external tests for admissions purposes and present various reactions to the proposal. He will describe key features of the recommended tests (i.e., Cambridge Assessment exams, EIKEN, GTEC, IELTS, TEAP and TOEFL) and discuss the main issues facing key test stakeholders (i.e., test takers, parents, teachers, university administrators) when determining which test(s) to adopt. Secondly, Kingo Shiratori will discuss the various factors involved in selecting four-skills tests for university entrance purposes. By referring to his recent study (Shiratori, 2019), which investigated the use of the Cambridge Preliminary B1 exam at his own institution, he will illustrate how researchers and other stakeholders can evaluate the appropriateness of using specific tests in specific contexts. Thirdly, Tatsuro Tahara will discuss future research directions into the engineering of washback from test use in the Japanese context with reference to contemporary washback theories. He will focus on two aspects of Japanese education that have been under-researched yet are likely to play an important role in test washback in this context: shadow education (i.e., juku and yobiko) and Japanese test culture.
Research into second-language vocabulary size has suffered from inattention to psychometric issues, with ordinal-level raw scores often analyzed as if they represented ratio-level measurement. Additionally, contextual effects have been largely ignored, leading to concern over the interpretation of research findings. This study used many-faceted Rasch measurement to analyze vocabulary data from 1,872 Japanese university students. A test of word synonymy was linked to the Vocabulary Size Test, and the contextual variables of item position and time of administration were analyzed as measurement facets. Major findings were that data-model fit was sufficient to allow local linking of different item types and contextual variables, allowing meaningful comparison of results and score gains on a scale of vocabulary size, and that item placement within a test form had a substantive effect on item difficulty.
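For readers unfamiliar with many-faceted Rasch measurement, the sketch below shows the general idea of treating a contextual variable as a measurement facet: the log-odds of a correct response are modeled as person ability minus item difficulty minus any additional facet effects, all on a shared logit scale. This is a generic illustration, not the study's actual model specification or estimation procedure. The function name `rasch_probability`, the additive parameterization, and the numeric values are assumptions made for the example.

```python
import math


def rasch_probability(ability, item_difficulty, facet_effects=()):
    """Probability of a correct response under a simple facets-style Rasch model.

    Hypothetical helper: all arguments are in logits. Each entry in
    facet_effects (for example, a penalty for late item position) is
    subtracted from the log-odds of success.
    """
    logit = ability - item_difficulty - sum(facet_effects)
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Illustrative values only: the same person and item, with and without
    # an assumed 0.4-logit difficulty increase for a late item position.
    early = rasch_probability(ability=1.0, item_difficulty=0.5)
    late = rasch_probability(ability=1.0, item_difficulty=0.5, facet_effects=(0.4,))
    print(f"early position: {early:.3f}, late position: {late:.3f}")
```

Under this kind of parameterization, an item placed where the position facet adds difficulty yields a lower success probability for the same person, which is how an effect of item placement on difficulty can be expressed on the measurement scale.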