Distant Classmates: speech and silence in online and telephone language tutorials

This paper presents findings from the Interaction Study Group, a team of four researchers based at the Open University (OU) investigating tutorial provision on beginners' distance language courses. Patterns of verbal interaction in online and telephone tutorials are investigated using social network analysis and gaps and silences between interaction turns are analysed using an ethnographic approach. The results lead to recommendations for tutor training and student preparation for online and telephone tutorials.


Dieser Artikel basiert auf Forschungsergebnissen der Interaction Study Group, einer vierköpfigen Forscherinnengruppe an der Open University (OU) in Großbritannien. Unter Zuhilfenahme der sozialen Netzwerkanalyse untersucht die Gruppe verbale Interaktionen in Online- und Telefontutorien aus dem Anfängersprachunterricht im Fernstudium. Zusätzlich wird ein ethnographischer Ansatz verwendet, um Pausen und Schweigephasen zwischen Interaktionszyklen zu analysieren. Aufgrund der bisherigen Ergebnisse der Studie schlägt die Gruppe erste Richtlinien zur Weiterbildung von Online-TutorInnen und zur Vorbereitung von Studierenden auf den Online-Unterricht vor.

distance learning, language tutorials, interaction, silence



Despite considerable cultural and methodological differences in language teaching practice, most language teachers would agree that one important feature of a successful language class is oral practice. In other words: we expect our students to talk.

For beginner language learners, the form and content of what they can produce in the second language (L2) are restricted, owing to limited vocabulary and knowledge of more complex structures. On the other hand, there are a variety of activities designed to help beginner learners in the difficult work of expressing themselves orally in their second language, e.g. drills, rote learning, substitution exercises, structured and semi-structured role-plays, etc.; and there are strategies to compensate for a lack of skills in the L2. Many of the exercises can be practised or simulated in distance learning without the immediate presence of a co-locutor, and many of the strategies can be taught, explicitly or implicitly, through task design. However, even well-designed speaking tasks performed by a learner on their own or with the help of recordings lack the spontaneity and responsiveness of real-life conversations. Unexpected and uncertain responses are what novice L2 users dread; having to "think on their feet" can make them "freeze up".

In distance language learning, the opportunities to practise spontaneous interactive speaking are limited (White, 2003). Language tutorials are built into distance courses so that students can meet up with other speakers of the L2, peers and tutors, and practise what they have learned working through the materials more or less on their own. The Open University (OU) offers its students a choice of tutorial mode to cater for learners with differing needs. Apart from the traditional face-to-face tutorials that take place in classrooms in study centres throughout the UK, language tutorials are held online, in an audio-graphic conferencing system called Lyceum, and – as a fall-back option – are also provided via telephone conferencing for those students who live too far from the tutorial venues and have no access to a networked computer or lack the necessary computer skills for online conferencing[i].

Although tutors are carefully selected to teach on OU language courses, they all bring their different experience and their individual teaching style to this role. Some manage better than others to exploit the media and adapt the suggested tasks to maximise opportunities for verbal interaction. Some have more tutor-centred styles, providing frequent feedback and remaining the focal point of most interactions; others encourage communication exchanges between peers. In some beginners' tutorials, the emphasis is on routines, drills, pre-formed chunks of language ("structured exchanges"), often focusing on forms; in other tutorials there is a clear attempt to simulate some simplified form of genuine communicative exchange, often neglecting form in favour of content or fluency.

Whatever the tutor's style, synchronous tutorials offer the best chance for our students to practise spontaneous interactions in the L2 in a non-threatening and supportive atmosphere with the help of a competent speaker (tutor) when needed and the positive challenge of having to understand and react to unexpected input from peers who are not always correct or even easily comprehensible in the target language.

The Study

This paper forms part of a wider study analysing language tutorials in different media. In an attempt to investigate the opportunities for speaking and interacting in different media, four researchers in the Department of Languages (the "Interaction Study Group") decided to focus on spoken verbal interaction, recording synchronous live tutorials from German beginner courses. Face-to-face tutorials were recorded on video and separately on audio; online tutorials held via audio-graphic conferencing were recorded using screen-capture software (Camtasia) and separate audio recordings. In addition, one telephone tutorial was recorded on audio.

Because of the relative novelty of spoken online interaction for language learning purposes, there is no clearly established methodology for analysing this data. The Interaction Study Group has begun to investigate different methods of analysis and to evaluate their suitability (Heins et al., in press). Verbal interaction in tutorials was transcribed and analysed using three different methods: analysis of interaction patterns using a variation of Social Network Analysis (Stickler et al., 2005), analysis of discourse features using QSR N6, a piece of computer software for qualitative data analysis (Duensing et al., 2006), and an analysis of silence based on an ethnographic approach to the investigation of significant features[ii]. The screen-captures and video recordings will also be used for the analysis of non-verbal interaction. Two of the methods are employed in this particular paper: sna and analysis of silence. More information on the use of QSR N6 for the analysis of discourse features in spoken interaction online and face-to-face can be found in two other papers of the Interaction Study Group: Duensing et al., 2006 and Heins et al., in press[iii].

Triangulation (see Müller-Hartmann, 2001) is achieved in three ways:

  1. data triangulation: by collecting data from different tutors' classes at different times in the course;

  2. researcher triangulation: by comparing results from four different researchers and negotiating a common outcome; and

  3. method triangulation: by analysing the same data using three different approaches (sna, QSR and ethnography).

All three approaches allow us to compare interaction occurring in the different media, providing us with more information on students' actual behaviours in beginners' language tutorials online and in a classroom. More importantly, however, these comparisons allow us to reach conclusions about best practice in online teaching and learning, which can be used for tutor training (Hampel & Stickler, 2005) and student preparation for future courses.

Inspiring the wider project are the fundamental questions intended to improve the tutorial provision for language learners:

  • what opportunity do distance students have to practise interactive speaking in their tutorials?

  • how do these opportunities differ in different media (Stickler et al., 2005)?

  • how can task design, tutor behaviour and tool use influence these opportunities (Duensing et al., 2006)?

Tackling these questions in a systematic way meant that the researchers also had to approach questions of research design and the role of interaction in second language acquisition (Heins et al., in press) and had to select, adapt and evaluate methods for data analysis.

Although the more general questions are still relevant for this paper, the focus here is on two media, audio-graphic conferencing and telephone conferencing, which leads to narrower questions:

  • how do speaking opportunities differ in online and telephone tutorials?

  • what are the differences in interaction patterns in these two media?

An initial analysis of the two non face-to-face media brought to light a very significant feature: silences in online and telephone tutorials - both are media without direct information on facial expression and bodily presence - have a fundamentally different quality and need special attention. This led to a third research question:

  • is there a significant difference in the occurrence of silences in online and telephone tutorials?

The underlying question of research design needs to be addressed concurrently: which methods are most suitable for the investigation of certain features?

Compared to face-to-face tutorials or even video-conferencing, telephone communication and audio-graphic conferencing have "lower bandwidth", i.e. they only provide synchronous input in a limited number of modes. "Media that transmit auditory and visual channels, for example, have higher bandwidth than media that transmit only auditory cues" (Straus et al., 2001). Both media allow for multi-point, two-way verbal exchanges, but audio-graphic conferencing offers an additional interface: a shared screen with graphic input that can be manipulated and added to by participants. The OU uses Lyceum, an internet-based conferencing software developed in house (for more details see Buckingham Shum et al., 2001). For the purpose of language tutorials, the interface has been slightly changed to accommodate a choice of four different languages.

In 2003, beginners' language courses in German and Spanish were the first OU courses to offer students a choice of tutorial mode, either face-to-face or online; since then all language courses have become dual strand. Because of the relative newness of the medium and to make the adjustment to the new teaching environment easier, tutorial materials were prepared for the online version. Online tutors are provided with a suggested lesson plan and also linked teaching materials (electronic whiteboards, documents etc.). Many tutors do indeed use these materials as the basis for their online classes. They are, however, free to adapt, change or even discard the suggested activities. Some tutors prefer to develop and use their own materials. Face-to-face tutors or telephone tutors are not provided with lesson plans and teaching materials. They are able to access the online lesson plans if they wish and adapt them to the face-to-face or telephone classroom environment[iv].

Participation in tutorials is optional for all OU language courses (Baumann, 1999), and tutorials are only a minor part of the overall teaching in distance teaching mode. In the twelve-month beginners' course, for example, there are only approximately 21 hours of live tuition. However, these group tutorials do play an important role in student motivation.

Two Lyceum tutorials and one telephone tutorial are studied here in detail in order to find out whether there are any significant differences in patterns of interaction and in silences between spoken exchanges. The two Lyceum tutorials each lasted approximately 75 minutes; the telephone tutorial lasted 60 minutes. Participation was as follows: two students (S1 = John[v], S2 = Ted), one tutor (LT1 = Paul) and one observer (O)[vi] in Lyceum tutorial 1; two students (S1 = Michele, S2 = Isobel), one tutor (TT1 = Silke) and one operator in the telephone tutorial; and four students (S1 – S4, Frances, Edwin, Fanny and Gerry), one tutor (LT2 = Ella) and one observer (O) in the second Lyceum tutorial. All three tutorials took place at a comparable stage in the language course, after approximately eight months of language learning.

Patterns of interaction: sna[vii]

Social Network Analysis is used in the social sciences to investigate the patterns of interaction between agents (Cook & Whitmeyer, 1992; Wellman, 1983). On the one hand, this can express the diachronic development of relationships and "networks" for individuals ("ego networks"); on the other hand, communication in organisations, e.g. in a large company, can be interpreted as "networks" (Cook & Whitmeyer, 1992), presenting a picture of "nodes" and "outliers" and identifying the frequency and direction of interactions (Wortham, 1999).

SNA has been used to study diachronic developments of "ego networks" of bilingual children (Wei, 2006) and in the analysis of asynchronous computer-mediated communication in L2 learning (Reffay & Chanier, 2002). In our study, we use a simplified version of sna (Nardi, 2004) and focus on very short time spans: language tutorials of 60 – 75 minutes duration.

To represent verbal interactions between participants in the tutorials numerically and graphically, every utterance (spoken verbal contribution) by student or tutor is coded as one turn, given a direction and one of three language options: German (G), English (E) or a mixture of both (M). The direction of the interaction is determined not by the intention of the speaker but by the respondent. So, for example, Isobel (S2) in the telephone tutorial asking Michele a question by naming her but being answered - or interrupted - by Silke (TT1) would be tagged as "S2 – TT1" rather than "S2 – S1".


<Telephone tutorial>

  1. Isobel: Oh, sorry. Michele, ich bin der Meinung, dass wir mehr Deutsch lernen muessen.

  2. Silke: Was denkst du? oder Stimmt das?

  3. Michele: Uh uh, ich habe nicht versteht. Dass in Deutschland? Ich habe nicht versteht.

  4. Silke: Kannst du das wiederholen, Isobel?

Tagged as:

  Turn 1 (M): S2 – TT1

  Turn 2 (G): TT1 – S1

  Turn 3 (G): S1 – TT1



Although most turns are coded as directed to one individual respondent, certain utterances are clearly directed to the whole group, not to an individual student, evidenced, for example, where instructions for silent work are followed by a long pause. In addition, a number of other interactions can also be tagged with a direction towards the whole group (e.g. elicitation of chorus response, instructions eliciting non-verbal responses from whole group).
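As an illustration only, the coding rule described above could be expressed in a few lines of Python. This is a minimal sketch and not the tool used by the Interaction Study Group: the names `Turn`, `tally_turns` and `GROUP` are hypothetical, and the data are simply turns 1 to 3 of the telephone extract, with the direction of each turn set by whoever actually responds.

```python
from collections import Counter
from dataclasses import dataclass

GROUP = "GROUP"  # hypothetical label for turns directed at the whole group


@dataclass(frozen=True)
class Turn:
    speaker: str     # e.g. "S2"
    respondent: str  # whoever actually responds, or GROUP
    language: str    # "G" (German), "E" (English) or "M" (mixture)


def tally_turns(turns):
    """Count turns per (speaker, respondent, language) combination."""
    return Counter((t.speaker, t.respondent, t.language) for t in turns)


# Turns 1-3 of the telephone extract; direction follows the respondent,
# not the person the speaker intended to address.
extract = [
    Turn("S2", "TT1", "M"),  # Isobel names Michele, but Silke responds
    Turn("TT1", "S1", "G"),  # Silke's question is answered by Michele
    Turn("S1", "TT1", "G"),  # Michele's reply is picked up by Silke
]

tally = tally_turns(extract)
```

A tutor instruction followed by a long pause would, under the same assumptions, be recorded as `Turn("TT1", GROUP, "G")`.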

The resulting charts (for full tables, see Appendix A) have been transformed into graphic representations, with participants represented by circles (S = student, T = tutor, O = observer), the whole group shown as a rectangle in the middle, every turn represented by one fine line (for the whole tutorial graph only: 10 turns = 1 thick line); German is red, English blue and the mixture shown in purple.

The overall results, although not informative about the content of the lessons, provide a quick and easy image of the interactions taking place, a first impression of the major contributors and the less active participants and an indication of the teaching style employed.
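The drawing convention can be made concrete with a short Python sketch. It rests on an assumption that is not stated in the text: in the whole-tutorial graph, any turns left over after grouping into tens are drawn as fine lines. The function name `line_counts` and the dictionary `LANGUAGE_COLOURS` are hypothetical.

```python
# Colour convention taken from the description of the graphs.
LANGUAGE_COLOURS = {"G": "red", "E": "blue", "M": "purple"}


def line_counts(turns, whole_tutorial=True):
    """Return (thick_lines, fine_lines) for one directed link.

    Assumption: in the whole-tutorial graph 10 turns become one thick
    line and the remainder is drawn as fine lines; in a single-task
    graph every turn is one fine line.
    """
    if turns < 0:
        raise ValueError("turn count cannot be negative")
    if not whole_tutorial:
        return (0, turns)
    return divmod(turns, 10)
```

Under this assumption, a link with 37 German turns in a whole-tutorial graph would be drawn as three thick and seven fine red lines.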

Figure 1. Lyceum Tutorial 1: 2 Students

Figure 2. Lyceum Tutorial 2: 4 Students

Figure 3. Telephone Tutorial: 2 Students

Differences in the interaction patterns in these three language tutorials can be identified at first glance: whereas Lyceum tutorial 1 has a strong tutor focus - virtually every interaction is directed to or from the tutor - the telephone tutorial and Lyceum tutorial 2 display considerably more interaction between students. Lyceum tutorial 2, the tutorial with most participants, also shows frequent tutor-turns directed towards the whole group (eliciting responses from all students or turns that end without response from one individual student). A student-centred or learner-centred approach, according to Cynthia White's definition, would make sure that the interactions are relevant for and responsive to the learner, that there is a commitment to knowledge construction by the learner and that there is a prevailing culture of enquiry (White, 2007). In our limited investigation, the definitions of "student focussed" or "tutor focussed" are based simply on the frequency of exchanges between students or between tutor and student(s) and not on the actual content of exchanges.
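The frequency-based notion of "tutor focussed" versus "student focussed" can be illustrated with a small calculation. The following Python sketch is an assumption-laden illustration, not the procedure used in the study: the function name `interaction_shares` is hypothetical, the counts are invented, and any turn that has the tutor as speaker or respondent (including tutor turns addressed to the whole group) is counted as a tutor-student exchange.

```python
def interaction_shares(tally, tutor):
    """Split turns into tutor-student and student-student shares.

    tally maps (speaker, respondent) pairs to a number of turns.
    """
    total = sum(tally.values())
    if total == 0:
        raise ValueError("tally contains no turns")
    with_tutor = sum(
        count
        for (speaker, respondent), count in tally.items()
        if tutor in (speaker, respondent)
    )
    return {
        "tutor_student": with_tutor / total,
        "student_student": (total - with_tutor) / total,
    }


# Invented counts for illustration only; these are not data from the study.
example = {
    ("T", "S1"): 20,
    ("S1", "T"): 18,
    ("S1", "S2"): 6,
    ("S2", "S1"): 6,
}
shares = interaction_shares(example, tutor="T")
```

A high tutor-student share would correspond to what is called a tutor-focussed pattern here; the measure says nothing about the content or quality of the exchanges.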

Social Network Analysis can be usefully adapted to show patterns of interaction in small groups and even to identify a limited number of different types of "exchange" (Cook & Whitmeyer, 1992; Wellman, 1983). What cannot be shown with this method is the quality of exchanges: a turn could be several sentences or just one word, and it could be accurate and appropriate German or grammatically wrong. Seconds or even minutes could elapse between exchanges. What we can show with the analysis of interaction patterns is that responsiveness to students is possible regardless of the medium. What we cannot show with this limited method is that learner-centred exchanges in the detailed definition by White (2007) do occur in these tutorials.

Nevertheless, from the sna we can form some tentative assumptions about the differences between the three tutorials, and about the two media.


  1. Both media, telephone conference and audio-graphic conference, lend themselves to student-focussed language tutorials, giving students maximum opportunity to practise the L2.

  2. Students have ample opportunity to speak in German in all three tutorials, in both media.

  3. Different tutor styles, tutor-focussed or student-focussed interaction patterns, can establish themselves in the online medium.

  4. A tutor can focus virtually every interaction on himself or herself.

  5. A tutor can address the whole group in both media (although the responses will be different; e.g. choral response "yes" vs. non-verbal indication of approval "yes tick" in voting).

  6. The number of exchanges for every student can be influenced by design: more student-focussed tasks like pair work or group work maximise the turns between students and hence the turns for every individual student (which becomes relevant especially in larger groups).

To exemplify some of these assumptions, we will present excerpts from tutorial transcriptions.

Assumption 1: Extract from Lyceum Tutorial 2

Fanny: Was ist getrennt lebend?

Ella: Okay. Weiß das jemand von den anderen? Does anyone else know? Getrenntlebend? # Frances?

Frances: Uhm. To be separated. Live apart.

Assumption 2: Extract from Lyceum Tutorial 1

Paul: Okay. Und was brauchst du für deine Arbeit? Was musst du haben, um deine Arbeit gut auszuführen?

John: Man muss uhm, uh [???] What's got uhm a degree?

Paul: Ein Diplomen.

John: Ah ja. Ein Diplomen in Chemie haben.

Assumption 5: Extract from Lyceum Tutorial 2

Frances: Verwitwet ist der Familienstand. ### [3 secs]

Ella: Ja, again the others if you could hear Frances ok give her a yes-tick, bitte. ### [5 secs] Okay. Danke.

The last point (Assumption 6) can be illustrated by creating an sna-graph of just one task performed in Lyceum tutorial 2: a pair work activity where students are split up to practise language more or less independently. The graph shows an entire pair work task, starting with instructions by the tutor to the whole group and clarification offered to individual students. After this introduction in the plenary, students are sent off to breakout rooms for practice in pairs. The tutor joins each pair for a short period to offer help and feedback on their performance. The last section of the task is spent in the plenary again to present results and give group feedback.

Activity 4: Pair work (S3-S4)

Figure 4. Lyceum Group 2: 4 Students

As the observer, in this case, could only follow one set of students into one breakout room, the focus here is on the language practice of Fanny and Gerry (S3 and S4). It becomes immediately obvious that both students have ample opportunity to speak. They are prepared to take this chance and use German in a majority of their utterances[viii]. Unfortunately, the increase in interaction between students cannot be quantified for the Lyceum tutorials as the practicalities of gathering data influence what is available to us to investigate. In order to make recordings of pair work, the tutor or observer needs to be in the breakout room with the students. Only by being in the same room is it possible to physically make the recordings. The result of this is that it is only ever possible, given the current restrictions of the technology used, to record one pair or small group of students at a time.

The gaps in-between: A study of silences

Social network analysis, as stated above, is insufficient if we want to identify silences and gaps in the communication. To investigate the different silences, we employed an ethnographic approach; ethnography has been used extensively in classroom research (Hammersley & Atkinson, 1995; Silverman, 2001) and in applied linguistics research (Davis, 1995; Lazaraton, 2003). By observing (live or recorded) classroom interaction systematically, attention is gradually focussed on pertinent occurrences.

Silences and gaps are of interest to our research because we work from the assumption that practising oral skills is the main purpose of tutorials in the distance language course. It follows that the opposite must also be true: silence, the absence of verbal interaction, must be considered undesirable for the language tutor. However, anyone who has ever taught in a classroom will know that this is stating the situation too simply.

"If we look into a classroom, for example, we can sometimes see the teacher's struggle to get the pupils to be silent. But then again a moment later what happens when a pupil fails to participate during a planned discussion, and remains silent …?" (Alerby & Elídóttir, 2003, p. 47)

Silence can very often be a desideratum, and tutors and teachers can demand it from their learners and see it as a sign of attention and "hard work".

"Being silent is doing or not doing something: it is a form of action or inaction, a way of engaging or a refusal to engage; and it is a linguistic move regulated by norms and subject to normative assessments." (Medina, 2004, p. 564)

Silences - gaps in the conversation - are natural parts of every exchange. In beginners' language classes, silences are especially pertinent as they occur frequently for a number of possible reasons. Lack of linguistic skill (vocabulary, structures) is only one of those mentioned by language researchers (Harder, 1980; Jackson, 2002). There are cultural reasons for preferring to stay silent (Carbaugh et al., 2006; Jackson, 2002; Jones, 1999; Tsui, 1996; Zhou et al., 2005), and power (Jaworski & Sachdev, 1998) and gender differences influencing reticence outside (Mills, 2006; Stanley, 1993) and in the language classroom (Julé, 2003, 2004)[ix].

Some researchers interpret continuing silence at the beginning of language learning as a "Silent Phase"; this can be based not just on the inability to say certain things but on a lack of expression for the self altogether (Granger, 2004). Silence is the consequence of such a lack of expression. Twisting Wittgenstein's philosophical statement on silence, "7. Worüber man nicht sprechen kann, darüber muß man schweigen" [7. What we cannot speak about we must pass over in silence] (Wittgenstein, 1974, 2004), we can use it to highlight the dilemma of the beginner language learner and their inability to fully express themselves[x].

Borrowing further from the philosophy of Wittgenstein, we could interpret the position of the language learner as a peculiar space between language games; their utterances might be nonsensical or intelligible, depending on the listener and the perspective (see Medina, 2004, p. 567). Beginners might be bolder and experiment more freely with the new language in a non-threatening environment, e.g. a language tutorial, addressing a well-meaning expert speaker, e.g. a tutor trained to interpret and guess at the meaning of unconventional statements. However, the relatively narrow range of expression in the L2 does limit what one can talk about and also the profundity and complexity of discussion. For adult language learners the chasm between L1 knowledge and everyday expression thereof and the limited range of expressions available in the L2 is more pronounced than for children. Rather than only expressing a limited range of one's thought, some (adult) learners will prefer to stay silent.

"Most learners will probably, in deciding what to say (if anything) have a sort of cut-off point for the reduction they will tolerate, below which silence is preferable. Instead of seeing silence as the extreme point on the scale of message reduction, it can also be seen as the alternative to it." (Harder, 1980, p.269)

Silence can become the preferable option where the learners lack an understanding of the importance of mistakes in the learning process. They perceive mistakes as failure rather than an opportunity to improve. Hence, they may remain silent to avoid mistakes or what they perceive as "making a fool of themselves" in front of others, i.e. the teacher or their peers.

While silence can, on the one hand, be the refraining from speech, on the other hand it may be just a gap indicating necessary "thinking time" for the communicator. The speed of delivery (of native speakers' expected question-answer rhythm) can hinder genuine contributions from less confident learners. This kind of rapid-fire exchange can also be linked to language anxiety in the L2 classroom (Tsui, 1996). Jones (1999) provides an interesting example of a conversation between an American native speaker of English and a Vietnamese learner of English. The one-word responses by the Vietnamese co-locutor were interpreted by the American as a lack of interest in the conversation, while the L2 speaker explained them in terms of not having had enough time to think about and provide more complex answers. The content of the conversation changed radically when the American was asked to pause for one second after each question turn (Jones, 1999, p. 253)[xi].

Silence is not uniform: "There is not one but many silences, and they are an integral part of the strategies that underlie and permeate discourses" (Foucault, 1984, p. 310). As well as being an indication of a "loss for words", silence can also signify time for thought, and the permission to remain silent (in educational contexts) can foster reflection and enable deeper thought and creativity (see Alerby & Elídóttir, 2003; see Lamy & Goodfellow, 1999 for the value of reflection in the L2 class). Silent "periods" in class time may well be necessary for the whole group as well as for the individual learner to think and plan.

Silences are not "empty": "… one cannot not communicate" (Watzlawick et al., 1968, p. 49). In any situation where people are together – virtually or physically – various means and modes of communication are being employed. Oral verbal interaction is just one of them. Even the absence of it, the silence between speakers, is a form of communication. According to Alerby & Elídóttir (2003), silence is choosing not to speak when one does have something to say. In German that distinction is more easily expressed with the difference between the words "Schweigen" and "Stille". Whereas Stille means the absence of sound, Schweigen means the absence of speech.

However, the absence of communication, Schweigen / silence, can be very expressive or even aggressive in itself; and we habitually interpret silences and give them meaning ("beredtes Schweigen", "betretenes Schweigen"). In the absence of verbal clues, we base our interpretation on a variety of other modes: glance and facial expression, relative positioning and movement, gesture, "atmosphere", etc. In the absence of non-verbal clues or if these clues prove inconclusive, we can revert to speculation and base our interpretation on context, experience, expectation, assumptions, etc. Obviously, in both the media we have chosen to study here, a majority of clues that would be present in face-to-face tutorials will be absent, such as facial expressions, gesture, physical proximity or distance.

Imagine, for example, a classroom situation where the tutor has momentarily lost his place in the lecture plan and is looking through his class schedule for the next activity.

In a face-to-face classroom students can see the movement, the direction of glance, can hear the shuffling of papers, maybe breathing or a sigh and they can interpret facial expression (frustration, anxiety, worry, nervousness, etc.). Very observant students might even use contextual and comparative clues to interpret whether the tutor is tired (slower than usual movements), distracted (frequent shifting of gaze) or anxious (trembling of hands), etc.

Now imagine the same occurrence taking place in the media we analyse:

On the telephone students may just pick up the aural clues (paper rustling, sigh). But in the computer conferencing environment this continuous audio-feed is absent (Lyceum only transmits sound when the speaking button is pressed).

In both cases, learners have to rely much more on speculation than on observation and clues. They can interpret the silence by referring back to clues from surrounding utterances or they can guess. Or, if their tutor is experienced and skilful, they can rely on the tutor's constant awareness of the limitations of the medium, which may lead him or her to comment more explicitly on what he or she is doing. For example, an experienced telephone or online tutor would provide verbal ad hoc information or post hoc explanation that might be unnecessary face-to-face.

<Telephone tutorial>

Silke: Ahm. Ich rufe jetzt aehm, den Operator an, und dann bekommt ihr 15 Minuten. Okay?

Isobel: Okay.

Silke: Ja, okay, bis gleich.

[sound of telephone dial] ### [60'' – 1 full minute while Silke talks to operator etc.]

<Lyceum tutorial 1>

Paul: OK, first of all I'm going to save this page. ### OK. Can you all see that an empty page in front of you now? Sieht ihr seht ihr eine leere Seite vor euch?

<Lyceum tutorial – other>

Anna: Dann ### bleibt bitte/ wartet ein' Moment. Ich mache ein neues Whiteboard auf. ###

A comparison of silences of more than one second in length in the two media shows some significant differences:

Silences on the telephone are less frequent and generally shorter (often they are just 2 – 3 second gaps spent waiting for responses, compared to 3 – 14 seconds on Lyceum). Some of this difference is explicable on technical grounds, as speakers in Lyceum have to press a speaking button. Some network connections also cause a minimal delay. The length of time a tutor waits for responses might also depend indirectly on the medium, as Lyceum tutors might assume that technical difficulties can cause delays. The telephone is also a medium with which most learners (and tutors) are more familiar, which might likewise explain the shorter waiting time.
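Where silences are annotated in the transcripts in the form "### [3 secs]", as in the Lyceum extracts quoted earlier, collecting those longer than one second is a simple filtering step. The Python sketch below is illustrative only and is not the procedure used in the study: the function name `silences_longer_than` and the regular expression are assumptions, markers without a duration in whole seconds are ignored, and each silence is attributed to the speaker on whose transcript line it is annotated.

```python
import re

# Matches annotations such as "### [3 secs]" or "### [1 sec]".
SILENCE = re.compile(r"#+\s*\[(\d+)\s*secs?\]")


def silences_longer_than(lines, threshold=1):
    """Return (speaker, seconds) for each annotated silence above threshold."""
    found = []
    for line in lines:
        # Everything before the first colon is taken to be the speaker label.
        speaker, _, text = line.partition(":")
        for match in SILENCE.finditer(text):
            seconds = int(match.group(1))
            if seconds > threshold:
                found.append((speaker.strip(), seconds))
    return found


# Two lines from the Lyceum tutorial 2 extract quoted earlier in the paper.
transcript = [
    "Frances: Verwitwet ist der Familienstand. ### [3 secs]",
    "Ella: Ja, again the others if you could hear Frances ok give her "
    "a yes-tick, bitte. ### [5 secs] Okay. Danke.",
]

long_silences = silences_longer_than(transcript)
```

The interpretation of each silence (waiting for responses, technical glitch, and so on) still has to be made by a human reader of the transcript and, where available, the screen recording.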

Longer silences on the telephone have a limited number of possible interpretations:

Telephone tutorial

  Length of silence    Interpretation
  60 seconds
  2 – 6 seconds        lack of requisite skills (e.g. comprehension)
  2 – 5 seconds        waiting for students' responses
  2 – 4 seconds        unexplained (possibly taking notes)
  2 seconds            surprise (delay in responding)

Silences in the audio-graphic medium, on the other hand, open up a whole range of speculation and possible explanations. Here are a few selected examples:

Lyceum tutorials

  Length of silence         Interpretation
  3 – 14 seconds            waiting for students' responses
  9 seconds                 waiting for students' questions
  5 – 17 seconds            waiting for students' non-oral / non-verbal reactions
  9 seconds                 tutor using alternative mode
  5 seconds – 2 minutes
  50 seconds – 4 minutes    task that does not involve speaking
  22 – 41 seconds           handling mistake / technical glitch
  more than 24 seconds      connection failure (participant absent, other participants carry on speaking after short pause)

Some of these speculative interpretations can be confirmed when looking at the non-verbal affordances of the medium: the possibility of inviting responses through voting, for example, or via text chat and typing into a document, the confirmation of absence of a participant through absence indicator or the disappearance of his or her name are options that are not given in the telephone medium. (See Appendix B for a list of additional features in Lyceum.)

In most cases, the use of these additional affordances can be confirmed by screen recordings, showing typed text, voting ticks or crosses, etc.

Lyceum tutorials | confirmation (screen recording)
waiting for students' questions | raised hand
waiting for students' non-oral / non-verbal reactions | voting ticks or crosses; typing
tutor using alternative mode | typing in document or text chat
task that does not involve speaking | [none, e.g. silent reading] typing
connection failure | disappearance of name

One clear distinction between different forms of silence is based on tutors' planning of their interaction with students and of their class organisation. Some silences are planned: they can be operational (e.g. connecting / disconnecting, changing settings, moving to a different room or conference arrangement, etc.) or part of a specific task that does not involve speaking (e.g. silent reading, typing, drawing, etc.). As these silences are planned, they can be announced beforehand by the tutor (and indeed, all our tutors on the telephone and in Lyceum make such announcements).

<Lyceum tutorial 2>

Ella: No the verb is scheiden. And scheiden can/ very old fashioned in German it means to to to part to part from one another but in modern German it means err well it means to divorce. But I type this down as well. Just a moment please. Are there any other questions? ###

Unplanned silences are by their nature unannounced. However, some of them may be explained afterwards: connection failure, handling mistakes, etc. can be made explicit.

<Lyceum tutorial 2 [with interpretation]>

Ella: Okay. Uhm zwei haben wir noch. Uhm Edwin, kannst du bitte <eliciting response> das nächste Wort uns nennen? <[13 secs pause] waiting for student's turn>

Edwin: means cousin.

Ella: Okay. Und welche Kategorie ist Cousin? And, uhm, in German we have more of a French pronunciation. So it's Cousin not cousin Cousin. So you have the nasal "ou" as in French. Sorry.

Edwin: Sorry, I must have forgotten to press the button. <could explain long pause above>

 Cousin ist Familienbeziehung.

<different Lyceum tutorial of Paul>

Paul: OK sehr gut, erm es gibt im Deutschen erm einige Verben, die immer mit Dativ Dativ gehen, zum Beispiel danken oder helfen und gehören gehen immer mit Dativ. Like I help the man ### [22 secs] I'm sorry I was totally talking to myself then I was going off on a monologue and I didn't press the button [laughs]! I'm sorry about that...

Sarah: I heard you say about danken and gehören always taking the dative.

Where silences are not explained, sometimes speculation about what may be happening is made explicit.

<Telephone tutorial [with interpretation]>

Michele: Ja. Isobel, ich nehme an, die Zahlen über Deutschland uh in 2050 sind {fale} <incorrect pronunciation of „falsch" = wrong>. Die die , uh, Deutsch. Die Zahlen über Deutscher 2050 sind falsch. ###

[audible pause, c. 4 secs]

 Verstehen Sie? Nein <speculation by student: pause due to interlocutor not comprehending>

Silke: Isobel, hast du das verstanden?

Isobel: Nein. <confirmation>

Apparently, the student herself interpreted the silence of her co-locutor as incomprehension; an interpretation that was later confirmed.

<Lyceum tutorial 2>

Edwin: No sign of Ella, err. I don't know if O wants to say anything. ### [24 secs] Ah she is back in the lobby. ### [23 secs] Well, she was in the lobby. Can anybody else see where she went? ## [3 secs]

Fanny: Maybe she is in transit if she is still on dialogue.

The speculation in Lyceum is supported by seeing the name of the tutor appear in a different part of the screen (see Appendix B for a screenshot of Lyceum). However, no confirmation or later explanation is forthcoming during the tutorial, as the tutor has technical difficulties and has disappeared.


To investigate the first of our questions, how speaking opportunities differ in online and telephone tutorials, we used social network analysis to create an overview of actual speaking turns. Our analysis has shown that telephone conferencing and audio-graphic conferencing via the internet can both be employed for student-focussed language tutorials at beginners' level and offer ample opportunity for students to use the L2. A more detailed analysis shows that interaction patterns do differ in the different tutorials. However, the medium cannot be shown to be the predominant reason for this: task design and tutor style have a strong influence on the number and frequency of students' verbal contributions in the tutorial and seem to be responsible for the difference in interaction patterns.

Overall, sna has proven to be useful for a quick overview of tutorial interaction and can – in turn – be used to demonstrate to tutors the interaction patterns following different tasks and different styles of teaching online or on the telephone. However, sna has proven inadequate for more detailed investigations of the content, form and relevance of individual turns. The quality of silences and gaps in particular escapes the numerical and graphical data analysis of sna. This is equally difficult to track through the discourse-focussed analysis by means of QSR N6 as this is mainly based on the transcriptions of verbal exchanges (Heins et al., in press). To investigate silences we therefore employed an ethnographic approach.
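The kind of quick overview of speaking turns discussed above can be illustrated with a minimal sketch. It assumes that a tutorial has been reduced to an ordered list of speaker labels; the function name `turn_transitions` and the sample sequence are hypothetical and do not reproduce the sna procedure or the data of this study.

```python
from collections import Counter
from typing import List


def turn_transitions(speakers: List[str]) -> Counter:
    """Count how often one speaker is directly followed by a different speaker."""
    pairs = Counter()
    for prev, nxt in zip(speakers, speakers[1:]):
        if prev != nxt:  # ignore a speaker continuing their own turn
            pairs[(prev, nxt)] += 1
    return pairs


# Invented turn sequence for illustration only (assumption, not study data).
sequence = ["tutor", "student_a", "tutor", "student_b", "student_a", "tutor"]

transitions = turn_transitions(sequence)
turns_per_speaker = Counter(sequence)

print(turns_per_speaker)
print(transitions)
```

Counts like these show who follows whom and how often each participant speaks, but, as noted above, they say nothing about the content of a turn or about the silences between turns.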

The third of our initial questions, the difference in quality of silences online and on the telephone, merits a more detailed answer. The quality of silences and gaps in language tutorials is varied. This variation depends partly on the medium used and partly on the use of silences by the participants. How much or how little speculation is necessary to interpret the silences depends on the following factors:

  • additional information provided by the medium (i.e. constant audio-feed on the telephone vs. the additional affordances of Lyceum)

  • the explicitness of the participants (e.g. how much the tutor announces in advance, how aware participants are of the necessity to verbalise non-obvious occurrences)

  • participants' expectations of the medium (e.g. students expecting technical difficulties in online conferencing vs. students expecting and accepting interference from the operator on the telephone)

The situation is similar for all beginner learners in synchronous communication, regardless of the medium they use. They have to cope with

  • a lack of vocabulary and structures;

  • a need for compensation (or communication) strategies;

  • a loss of "ego" or depth of self or sophistication in the second language.

They also have to bring to the tutorial an understanding that 'having a go at speaking' is essential for the learning process and that mistakes do not constitute failure but are instead a valuable opportunity to improve. We can further assume that shyness or loss of face do have an influence in online and non-visual media as well as in face-to-face learning situations (albeit perhaps to a differing degree). However, the difference between media with high and low bandwidth is most pronounced with regard to cognitive demands or cognitive overload (students on the telephone may find concentration on aural input and output easier if they do not have to interpret visual cues at the same time). Differences between telephone conferences and internet-based audio-conferencing can be found in the actual length of pauses (based partly on the delay caused by the software and by signal transmission over the internet) and in the additional information available to students during these pauses (i.e. presence or absence of shared visual input).

One of the reasons why some silences are explained and others are not may be the gap between what is expected and what actually occurs. Participants with some experience can, for example, reasonably assume that pauses before a response will be longer in Lyceum than on the telephone, but if the delay is much longer than expected, the need to explain arises (Edwin: "Sorry, I must have forgotten to press the button.").

Expectations will, of course, change with growing familiarity with the medium. Meanwhile, however, an important role can be played by active and planned training of tutors and preparation of students for the use of alternative media in L2 communication. This is not just a convenience for the distance language tutorial; it also reflects changing means and media in everyday communication. Different media influence how we and our co-locutors interpret and evaluate the communication (Straus et al., 2001), and by using new media for language tutorials we prepare our learners not only for communication in the L2 but also for communication via different channels.

Based on the results of the Interaction Study Group's investigations so far and on our analyses of telephone and online beginners' tutorial interactions, we can make the following recommendations for tutor and student preparation:

Recommendations for tutor training:

  • raise tutors' awareness of the influence of task design and tutor style on the number of turns and exchanges focussed on tutor vs. students;
  • ensure that tutors know the features of the medium well and adapt their expectations accordingly;
  • ensure that they can make maximum use of non-oral features of the medium;

Recommendations for student preparation:

  • ensure that students are aware of the features of the medium and adapt their expectations accordingly;
  • ensure that they are also prepared for non-oral features of the medium and pay attention to different modes;
  • raise expectations that they should / can fully contribute orally in whatever medium;
  • make students aware of the benefits of pair work and small group work for speaking opportunities.

The current study is limited by the small number of students in the tutorials investigated; a larger group of students might produce different interaction patterns. A further limitation arises from the fact that this paper is part of a wider study: the temptation to avoid duplicating background information can lead to a "truncated" style and missing explanations. The same applies to more extensive references to the literature and theory of second language acquisition. We have also found the study challenging, as we have had to develop and evaluate methods at the same time as analysing the data. The new area of investigating Spoken Online Learning Events in the field of language learning has yet to establish a research framework and methodology suitable for the investigation of such events (Coleman, 2007).

As our study is ongoing, we are already planning next steps for research as well as for tutor training. Apart from completing the necessary feedback loop to the tutors on our online courses and ascertaining that they are aware of the results and conclusions of our research, we would like to further investigate the relevance of "social presence" in the different media, complete the sna analysis of all recorded tutorials (including face-to-face tutorials) and focus on non-verbal interaction in the online medium where this can be used to compensate for a lack of immediacy.

[i] This option is slowly being phased out, as Open University students are committed to a course of study that integrates basic ICT literacy within all their courses.

[ii] For an overview of qualitative methods used in applied linguistics research, see Davis (1995).

[iii] The part of the study reported in Duensing et al. (2006) showed the effect of task design on spoken interaction in the beginner language tutorial, regardless of whether this took place face-to-face or online. Given the centrality of interaction in SLA research, Heins et al. (in press) explored the qualitative features of this interaction. Their findings suggest that both students and tutors produced comprehensible in- and output. Overall there was a clear emphasis on structured L2 exchanges, as could be expected with beginners' limited linguistic abilities. However, the study also showed that the different learning environments affected the level and nature of interaction: compared to the face-to-face classrooms, online tutorials produced a higher ratio of student L2 in- and output, showed less evidence of unstructured student exchanges and tended to be characterised by a higher level of classroom management and tutor dominance. L1 student-to-student exchanges seldom occurred.

[iv] For samples and an analysis of tutorial tasks, see Rosell-Aguilar (2005).

[v] All names of participants have been changed to secure anonymity.

[vi] In Lyceum, every participant is visible to others in the group, so an observer's name will appear in the list of participants. One researcher was present during the Lyceum tutorials for recording and observation purposes and is included in the list as O (observer).

[vii] We use the small letter abbreviation "sna" here to distinguish our simplified version from the more complex method (SNA) used in social sciences.

[viii] Frances and Edwin were practising at the same time in a different breakout room. It can be assumed, although not proven here, that their interaction pattern would have been similar.

[ix] Coincidentally, two of the tutorials analysed here are single-gender classrooms: all participants in Lyceum tutorial 1 were male, all participants in the telephone tutorial were female.

[x] In German "nicht sprechen kann" (nicht können) can express the inability on the part of the subject (speaker) as well as an impossibility caused by the topic.

[xi] An analysis with sna could not show the difference between the one-word answers and the more in-depth conversation taking place at a lower pace of exchanges.


We would like to thank the tutors and students on L(ZX)193 Rundblick for their participation in this research and their willingness to be recorded and analysed.

Appendix A

Tables for sna analysis

Table 1. Lyceum tutorial 1


Table 2. Telephone tutorial


Table 3. Lyceum tutorial 2


Appendix B

Figure 5. Lyceum Screenshot


Table 4. List of selected features:



bold room number | participants in room
list of names | participants present
arrow out | temporary absence
raised hand | intention to speak
loudspeaker symbol left of name | 
name in separate box | 
tick next to name | yes vote
cross next to name | no vote
name and text chat line | text chat contribution
document mode only: key and initials | person currently typing
concept map only: highlight and initials | person currently contributing
highlight only | someone currently contributing



