A Benchmarking Study of K-Means and Kohonen Self-Organizing Maps Applied to Features of MOOC Participants

Rosa Cabedo Gallén, Edmundo Tovar Caro, Technical University of Madrid, Spain


The MOOC format is characterized by the great diversity of its enrolled participants. This heterogeneity offers an opportunity to identify the underlying relationships in the internal structure of the data. This paper aims to identify and analyse MOOC participants’ profiles by running two unsupervised clustering techniques: K-Means as a partitioning approach and Kohonen’s Self-Organizing Maps (SOM) as a competitive learning technique.
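To make the competitive learning idea concrete, the following is a minimal sketch of SOM training in plain Python. It is not the configuration used in the study: the rectangular grid, the linearly decaying learning rate, the Gaussian neighbourhood with a floor on its radius, and the names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM; returns a dict {(row, col): weight_vector}."""
    rng = random.Random(seed)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # assumed starting neighbourhood radius
    # Initialise every unit with a randomly chosen data sample.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, with a floor
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Units close to the winner on the grid are pulled more strongly.
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

The neighbourhood term `h` is what distinguishes this from a partitioning method such as K-Means: each sample moves not only the winning unit but also its grid neighbours, which is how the map acquires a topological ordering.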

The dataset comes from the data collection of the MOOCKnowledge project. After both clustering algorithms are run, the results are evaluated with a selection of validation measures: an intra-cluster measure and an overall quality criterion (average Silhouette width) for K-Means, and two measures related to topological ordering for SOM.
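As a rough illustration of the K-Means side of this evaluation, the sketch below runs Lloyd's algorithm and computes two measures. The Spanish abstract names the overall criterion as average Silhouette width; the intra-cluster measure is not specified here, so the within-cluster sum of squares is used as an assumed stand-in. The function names, the Euclidean distance and the random initialisation are illustrative assumptions, not the study's setup.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Assumed intra-cluster measure: sum of squared distances to own centroid."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]  # cohesion: mean distance within own cluster
        if a is None:
            continue  # singleton cluster contributes s(i) = 0 by convention
        # Separation: mean distance to the nearest other cluster.
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Values of the average silhouette width close to 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping ones. The within-cluster sum of squares always decreases as the number of clusters grows, so on its own it cannot select the number of clusters.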

The resulting profiles are interpreted with the help of a matrix of prevalence levels. The similarities between the two resulting clusterings, as well as some pinpointed differences, are highlighted. They cannot be evaluated in advance without the opinion of an expert familiar with the specifications of the MOOC.
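The abstract does not define how the matrix of prevalence levels is built. One plausible reading, sketched below purely as an assumption, is a cluster-by-feature table in which each cell reports how common a binary feature is among the members of a cluster, binned into coarse levels. The thresholds (one third and two thirds), the level names and the feature names in the usage example are hypothetical.

```python
def prevalence_level(share, thresholds=(1 / 3, 2 / 3)):
    """Map a share in [0, 1] to a coarse level (thresholds are hypothetical)."""
    low, high = thresholds
    if share < low:
        return "low"
    if share < high:
        return "medium"
    return "high"


def prevalence_matrix(rows, labels, features, thresholds=(1 / 3, 2 / 3)):
    """rows: list of dicts mapping feature name -> 0/1; labels: cluster per row.

    Returns {cluster: {feature: level}}.
    """
    matrix = {}
    for cluster in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == cluster]
        matrix[cluster] = {}
        for f in features:
            # Share of cluster members for whom the feature is present.
            share = sum(m[f] for m in members) / len(members)
            matrix[cluster][f] = prevalence_level(share, thresholds)
    return matrix


# Hypothetical usage with made-up feature names and values.
rows = [
    {"employed": 1, "prior_mooc": 0},
    {"employed": 1, "prior_mooc": 0},
    {"employed": 1, "prior_mooc": 1},
    {"employed": 1, "prior_mooc": 0},
    {"employed": 0, "prior_mooc": 1},
    {"employed": 1, "prior_mooc": 1},
]
labels = [0, 0, 0, 0, 1, 1]
matrix = prevalence_matrix(rows, labels, ["employed", "prior_mooc"])
```

Building one such matrix per clustering (K-Means and SOM) gives two tables over the same features, which can then be compared cell by cell to locate where the profiles agree and where they differ.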

After this preliminary study, the results are not considered conclusive. There is still a long way to go in helping stakeholders identify and select the appropriate clustering according to several quality criteria.

Abstract (translated from Spanish)

This article aims to identify and analyse a set of MOOC participant profiles by applying two unsupervised clustering techniques: K-Means as a partitional algorithm and Kohonen Self-Organizing Maps (SOM) as a representative competitive learning technique.

The dataset for the study originates from the MOOCKnowledge project. After the execution stage, the clustering is evaluated with a selection of validation measures: one within each cluster (intra-cluster) and a second for the overall quality of the clustering (average Silhouette width) for K-Means, as well as two measures related to topological ordering for SOM.

The profiles resulting from the two clusterings are interpreted with the help of a matrix of prevalence levels. Neither the similarities nor the differences identified in the matrices can be evaluated in advance without the opinion of an expert familiar with the MOOC format.

After a preliminary study, the results are not considered conclusive. There is undoubtedly still a long way to go in helping stakeholders identify and select the appropriate clustering according to several quality criteria.

Keywords: MOOC profiles, K-Means, Kohonen’s Self-Organizing Maps, SOM, cluster analysis, clustering
