**Statistics Outside the Ivory Tower: The Connections Between Hypotheses, Study Design, Measures and Analyses** (Gunnar Jacob)

**Regression Modelling for Learner Corpus and SLA Research** (Stefan Th. Gries)

**Research Synthesis and Meta-analysis in Applied Linguistics** (Luke Plonsky)

**Doing Replication Research in Second Language Acquisition** (Kevin McManus)

**Social Network Analysis in Applied Linguistics Research** (Jeremi Ochab, Andrzej Jarynowski, Michał B. Paradowski)

**Statistics Outside the Ivory Tower: The Connections Between Hypotheses, Study Design, Measures and Analyses**

Gunnar Jacob, University of Kaiserslautern

**Course description:**

A statistical analysis does not happen in an empty space. Instead, any data set is based on a particular study design, and designing a study involves a number of crucial decisions about research questions, hypotheses, what to measure, and how to measure it. These decisions have far-reaching consequences for statistical analyses. In the course, we will discuss what can go wrong if these crucial connections are not taken into account from the very start.

**Topics:**

**1: Means can be pretty mean – A comprehensive guide to successful self-deception using descriptive statistics**

*A few weirdos can mess things up* – The influence of outliers

*Should I stay or should I go?* – Exclusion of data points, items, or participants

*Alice in Wonderland* – Self-deception through outlier exclusion

*Enough is enough… but when is enough really enough?* – Researcher degrees of freedom and *p*-hacking
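As a small illustration of the first topics in this block, the following R sketch shows how a single extreme value can pull the mean while leaving the median almost unchanged. The reaction times are invented for illustration and are not course data, and the 2.5 SD cut-off in the hypothetical helper `flag_outliers()` is an assumption (one common convention), not a recommendation from the course.

```r
# Hypothetical reaction times in ms (invented for illustration)
rt_clean   <- c(480, 510, 495, 530, 505, 520, 490, 515)
rt_outlier <- c(rt_clean, 4200)  # the same data plus one very slow trial

# Summarise a vector with its mean, median, and standard deviation
describe <- function(x) c(mean = mean(x), median = median(x), sd = sd(x))

print(round(describe(rt_clean), 1))
print(round(describe(rt_outlier), 1))

# Flag values lying more than k standard deviations from the mean.
# Note that the extreme value inflates the SD it is judged against.
flag_outliers <- function(x, k = 2.5) abs(x - mean(x)) > k * sd(x)

print(rt_outlier[flag_outliers(rt_outlier)])
```

Because the cut-off `k` is chosen by the researcher, rerunning the last line with different values of `k` is one simple way to see how exclusion criteria can change which data points survive.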

**2: You reap what you sow – The connections between study design and statistics**

*Everything is connected… unless…* – Experimental vs. non-experimental studies

*Comparing monolingual apples with bilingual oranges* – The use of experimental designs in second-language research and consequences for statistical analyses

*All scale levels are created equal… but some are more equal than others* – Subtle differences between nominal, ordinal, and interval data, and how they can mess up your analysis

**3: 1 or 0, that is the question – the pitfalls of categorical data analysis**

*In the disguise of a simpleton* – The hidden issues with percentages

*An insufficient cure from the 1950s* – Do transformations help?

*Cutting the Gordian Knot* – Logit mixed-effects models: The solution?
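One of the issues with percentages can be sketched in a few lines of R: the same 10-point difference corresponds to very different differences on the log-odds (logit) scale, depending on where on the scale it occurs. The accuracy rates below are invented for illustration. The sketch only shows the scale problem and is not the logit mixed-effects analysis itself; `qlogis()` is base R's logit function.

```r
# Hypothetical proportions correct (invented for illustration)
p_mid  <- c(before = 0.50, after = 0.60)  # improvement in the middle of the scale
p_high <- c(before = 0.85, after = 0.95)  # same 10-point improvement near ceiling

# On the percentage scale the two improvements look identical
diff_pct <- c(mid  = unname(diff(p_mid)),
              high = unname(diff(p_high)))

# On the log-odds scale, log(p / (1 - p)), they differ considerably
diff_logit <- c(mid  = unname(diff(qlogis(p_mid))),
                high = unname(diff(qlogis(p_high))))

print(round(diff_pct, 2))
print(round(diff_logit, 2))
```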

**4: Let’s look them straight in the eye – an example from an eye-tracking study**

*Remember these weirdos?* – Outliers in eye-tracking data

*Remember these zeros?* – Categorical eye-tracking measures

*Remember these apples?* – Comparing native speakers vs. second-language learners

*Remember Alice?* – Cherry-picking of particular measures

**5: Making a deal with the stats devil – consequences of getting an analysis wrong**

*Remember that damn eye-tracking study?* – A summary of disasters

*Publish or perish* – How supervisors, reviewers, or parents can make things even worse

In addition to the theoretical sessions, each of the five blocks will include a hands-on session in which participants either try to detect hidden methodological and statistical pitfalls in sample studies, or explore a sample data set and perform an analysis themselves.

**Regression Modelling for Learner Corpus and SLA Research**

Stefan Th. Gries, University of California, Santa Barbara

**Course description:**

The workshop is an introduction to the statistical analysis of data from learner corpus research and second language acquisition using fixed- and mixed-effects regression modelling. The course presupposes at least an understanding of the logic of statistical testing and significance values, and ideally some basic knowledge of monofactorial statistical analysis with R. Using the open-source software and programming language R, we will:

- briefly discuss the basic principles and applications of linear and logistic regression modelling, in particular how to define regression models, how to interpret the numerical results, how to validate/diagnose models, and how to visualize regression results;

- talk about the ways in which mixed-effects modelling can advance our understanding of data and how we define and interpret such models;

- work in detail through two case studies that (i) exemplify basic aspects of these regression modelling techniques and (ii) at least briefly point towards more sophisticated approaches such as a priori contrasts, curvature, and random-effects exploration.
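To give a rough idea of what defining and interpreting such models looks like in R, here is a minimal sketch using only base R (`lm()` and `glm()`). All data, variable names, and groups are invented for illustration and are not taken from the workshop materials. The mixed-effects models mentioned above need an additional package and are not shown.

```r
# Hypothetical learner data (invented for illustration)
learners <- data.frame(
  proficiency = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  essay_len   = c(212, 228, 259, 270, 301, 309, 342, 351, 378, 400)
)

# Linear regression: numeric response as a function of a numeric predictor
m_lin <- lm(essay_len ~ proficiency, data = learners)
print(coef(m_lin))                # intercept and slope
print(summary(residuals(m_lin)))  # a first, very basic look at the residuals

# Logistic regression on aggregated counts: how often each (hypothetical)
# speaker group chose construction A rather than construction B
choices <- data.frame(
  group   = factor(c("native", "learner"), levels = c("native", "learner")),
  constrA = c(40, 20),
  constrB = c(10, 30)
)
m_log <- glm(cbind(constrA, constrB) ~ group, data = choices, family = binomial)
print(coef(m_log))       # coefficients are on the log-odds scale
print(exp(coef(m_log)))  # exponentiated: baseline odds and odds ratio
```

The coefficient for `grouplearner` is the difference in log-odds between the two groups, which is why it is usually exponentiated or converted to predicted probabilities before being reported.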

**Topics:**

1: Linear and logistic regression modelling basics: *Do*s and *don’t*s of model definition, selection, and visualization

2: Towards mixed-effects modelling: Numeric and visual interpretation

3: Case studies part 1: An example from LCR

4: Case studies part 2: An example from SLA

Recommended reading: Gries, S. Th. (2013). *Statistics for Linguistics with R* (2nd edn.), chs. 1-4. De Gruyter Mouton.

**Required software:**

- R (v. 3.6 or newer)

- RStudio (v. 1.2 or newer)

- the following packages installed: `car, effects, rms, lme4, MuMIn, party, doParallel, foreach, rgl`
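One possible way to check which of these packages still need to be installed is sketched below. The helper `missing_packages()` is a hypothetical name introduced here for illustration. The actual installation call is left commented out because it needs an internet connection.

```r
# Packages named in the list above
required <- c("car", "effects", "rms", "lme4", "MuMIn",
              "party", "doParallel", "foreach", "rgl")

# Return those entries of `pkgs` that are not among the installed packages
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# install.packages(to_install)
```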

**Research Synthesis and Meta-analysis in Applied Linguistics**

Luke Plonsky, Northern Arizona University

**Course description:**

Research synthesis and meta-analysis comprise a set of well-developed techniques that greatly improve upon traditional literature reviews. Among other benefits, research synthesis and meta-analysis provide enhanced objectivity, comprehensiveness, and precision with regard to the effects or relationships examined in a body of research. Consequently, the application of meta-analysis in applied linguistics has expanded dramatically in recent years, following much of the social and medical sciences. The workshop is designed for applied linguistics students and researchers interested in learning the conceptual motivations and hands-on techniques for (a) conducting research synthesis and meta-analysis, and (b) embracing and practising a synthetic approach. The workshop will not be technical and only a very basic understanding of quantitative methods is assumed.

The workshop will begin with a conceptual discussion of the rationale for meta-analytic thinking at the primary and secondary levels. We will address notions central to this approach, such as synthetic-mindedness and the relative merits of statistical significance (NHST / *p* values) and practical significance (effect sizes). The remainder of the course will take participants through the four major stages of completing a meta-analysis: (1) Defining the domain and searching for primary studies; (2) Developing and implementing a coding scheme, and extracting effect sizes; (3) Analysis: Aggregating effects across studies; and (4) Data interpretation and presentation.
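The arithmetic behind stages (2) and (3) can be sketched in a few lines of base R. This is a minimal illustration only: the summary statistics are invented, the effect size is Cohen's *d* with its usual large-sample variance approximation, and the aggregation is a simple inverse-variance weighted average. Whether the workshop uses this or another weighting scheme is an assumption here.

```r
# Hypothetical summary statistics from three primary studies
# (treatment vs. control; all numbers invented for illustration)
studies <- data.frame(
  study = c("A", "B", "C"),
  m_t = c(78, 65, 82), sd_t = c(10, 12, 9),  n_t = c(20, 35, 50),
  m_c = c(70, 61, 75), sd_c = c(11, 13, 10), n_c = c(20, 30, 50)
)

# Stage 2: extract an effect size per study.
# Cohen's d = mean difference divided by the pooled standard deviation
sd_pooled <- with(studies, sqrt(((n_t - 1) * sd_t^2 + (n_c - 1) * sd_c^2) /
                                  (n_t + n_c - 2)))
studies$d <- (studies$m_t - studies$m_c) / sd_pooled

# Approximate sampling variance of d
studies$var_d <- with(studies,
                      (n_t + n_c) / (n_t * n_c) + d^2 / (2 * (n_t + n_c)))

# Stage 3: aggregate across studies with inverse-variance weights,
# so that more precise studies contribute more to the average
studies$w <- 1 / studies$var_d
d_mean  <- sum(studies$w * studies$d) / sum(studies$w)
se_mean <- sqrt(1 / sum(studies$w))
ci      <- d_mean + c(-1, 1) * qnorm(0.975) * se_mean  # 95% confidence interval

print(studies[, c("study", "d", "var_d", "w")])
print(c(d_mean = d_mean, lower = ci[1], upper = ci[2]))
```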

Examples from several meta-analytic projects will be used throughout to illustrate the points and procedures that are discussed. However, workshop participants are more than welcome to bring their own topics for discussion and possible application of meta-analysis. Upon completing the workshop, the participants will be in a very strong position to conduct and critically examine meta-analyses in applied linguistics.

A combination of Excel and SPSS will be used for data analysis. However, suggestions for other free and web-based tools, packages, and templates will also be provided.

**Doing Replication Research in Second Language Acquisition**

Kevin McManus, Pennsylvania State University

**Course description:**

Replication research is essential to the conduct of good science and the advancement of knowledge. Responding to a groundswell of interest in carrying out replication research, the course will demonstrate in a step-by-step approach how to go about replicating research in second language acquisition and will instruct participants how to design and execute their own replication study. It will focus on four major aspects: (1) fundamental questions about previous research to motivate replication, (2) critical reflection on research design and data analysis, (3) a step-by-step guide to executing and writing up a replication study using models published in high-quality journals, (4) dissemination of the replication study. In addition to regular class discussions focused on replication research and data analysis, students will choose an important study in their area of interest to motivate and design a replication study. In short, the course seeks to answer a number of questions on the practical aspects of replication research, in particular:

- *what* a replication study is
- *how* to select a suitable study for replication
- *why* such a study lends itself to such an approach
- *what* kind of replication approach is most useful given the nature of the target study
- *how* to carry out the study to maximise its replicative potential
- *how* to write up the study to highlight its comparative core, and
- *where* to publish the work to maximise its impact on the field

Before the course begins, participants should read and familiarise themselves with the following study, which will be used as the original study for designing the replication: Bitchener, J., & Knoch, U. (2010). Raising the linguistic accuracy level of advanced L2 writers with written corrective feedback. *Journal of Second Language Writing, 19*(4), 207-217.

**Social Network Analysis in Applied Linguistics Research**

Jeremi Ochab, Jagiellonian University, Kraków;

Andrzej Jarynowski, Interdisciplinary Research Institute, Wrocław;

Michał B. Paradowski, Institute of Applied Linguistics, University of Warsaw

**Course description:**

The workshop will try to address some of the following research questions: What types of social/language interactions can be represented by networks? Can we infer how they affect language learners/users? How are these interactions correlated with learners’ education, gender, ethnic background or social conditions? Can they predict success?

Each block will begin with an introduction of theoretical concepts, real-world examples and presentation of an analysis workflow and will be followed by a hands-on session. The focus of these will be on tools available in the R statistical programming environment and some other open-source programmes, depending on the participants’ programming skills. The data sets used in the workshop will be provided by the instructors, but the participants are encouraged to bring their own. Some basic familiarity with statistics and programming is most welcome, but no expert knowledge of R in particular is required.

**Topics:**

1: Types of networks and network representations in SLA contexts: Sociograms, layers of interactions (e.g. communication), peer vs. vertical learning in groups

2: Network structure: Clustering, small-world phenomenon, core-periphery, (dis)assortativity, preferential attachment, cliques and clubs

3: Examples of:

- social networks
- literary and linguistic networks

4: Centrality measures: degree, PageRank, closeness, betweenness

5: Clustering of networks, community detection algorithms

6: Visualisation in Gephi and in R

- filters and layouts
- interactive formats
- annotation of networks
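As a taste of the centrality measures listed under topic 4, the sketch below computes degree and closeness for a tiny invented network using base R only, so that it runs without any packages. The names and ties are hypothetical. In practice the igraph package offers `degree()`, `closeness()`, `betweenness()` and `page_rank()` for this purpose.

```r
# Hypothetical undirected network of five learners (invented for illustration):
# Ana is connected to everyone, and Ben and Cem also talk to each other
people <- c("Ana", "Ben", "Cem", "Dee", "Eli")
edges  <- rbind(c("Ana", "Ben"), c("Ana", "Cem"), c("Ana", "Dee"),
                c("Ana", "Eli"), c("Ben", "Cem"))

# Build a symmetric adjacency matrix from the edge list
adj <- matrix(0, nrow = 5, ncol = 5, dimnames = list(people, people))
adj[edges] <- 1           # tie from first to second person
adj[edges[, 2:1]] <- 1    # and the same tie in the other direction

# Degree centrality: number of direct contacts
degree <- rowSums(adj)

# Shortest path lengths between all pairs (Floyd-Warshall algorithm)
path_len <- adj
path_len[adj == 0] <- Inf
diag(path_len) <- 0
for (k in people) for (i in people) for (j in people) {
  path_len[i, j] <- min(path_len[i, j], path_len[i, k] + path_len[k, j])
}

# Closeness centrality: inverse of the summed distances to all other nodes
closeness <- 1 / rowSums(path_len)

print(degree)
print(round(closeness, 3))
```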

**Software prerequisites:**

- R

- RStudio

- in R, install the following packages: `dplyr, ggplot2, tidyverse, igraph, stylo, networkD3, readr`. This can be done, for instance, with a command such as `install.packages("tidyverse")`, run once per package.

- Gephi