5.1 Implementation Platform
1. Hardware
   1. Processor: Intel Duo Core
   2. RAM: 8 GB
   3. GPU: NVIDIA Tesla K80
2. Software
   1. Operating System: Linux
   2. Programming Languages: Python
   3. Server: Redis
5.2 Dataset Introduction
The dataset used in this work is from the first year (Fall 2012, Spring 2013,
and Summer 2013 semesters) of MITx and HarvardX courses offered on the edX
platform. The dataset includes 641138 records, and each record summarizes
several activities of one learner in a course. In each record, we focus on the
information shown in Figure 5 below. The dataset consists of learners’ activity
data in 11 completed courses. In 57400 records the learner’s grade is null, so
these records are deleted.
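The cleaning step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the record layout (a list of dictionaries), the field name "grade", and the sample rows are assumptions made for the example.

```python
def drop_null_grades(records, grade_key="grade"):
    """Keep only the records whose grade is present (not None and not blank)."""
    kept = []
    for rec in records:
        value = rec.get(grade_key)
        # Treat a missing key, None, or a blank string as a null grade.
        if value is None or str(value).strip() == "":
            continue
        kept.append(rec)
    return kept


# Hypothetical sample rows, invented for illustration only.
sample = [
    {"user": "u1", "grade": "0.0"},
    {"user": "u2", "grade": ""},
    {"user": "u3", "grade": None},
    {"user": "u4", "grade": "0.87"},
]
cleaned = drop_null_grades(sample)
```

A grade of 0 is kept on purpose: only null grades are removed, since learners with grade 0 form one of the categories analysed later.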
Dept. of CSE, DSCE, Bangalore 78 Page 16
Predictions Of Dropouts In MOOCs 2015-19
Figure 5: Variables in each record of dataset.
Based on the Chapters variable, all learners can be divided into three groups:
only registered, general, and active. The only registered group contains the
learners who never access the courseware, the general group contains the
learners who access the courseware but access less than half of the available
courseware chapters, and the active group contains the learners who access
more than half of the available courseware chapters. Nearly 37%, 57% and 6%
of the learners belong to the only registered, general and active groups
respectively. So most learners are not enthusiastic about learning in MOOCs,
and only a few have studied most of the course content.
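The grouping rule can be written as a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the text does not say where a learner who accessed exactly half of the chapters belongs, so the sketch places that case in the general group.

```python
def engagement_group(chapters_accessed, chapters_available):
    """Assign a learner to one of the three groups based on chapters accessed."""
    if chapters_accessed == 0:
        return "only registered"
    # Assumption: exactly half of the chapters still counts as "general".
    if chapters_accessed > chapters_available / 2:
        return "active"
    return "general"


groups = [engagement_group(n, 12) for n in (0, 3, 6, 7)]
```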
5.3 Behavior Analysis
Both the grade and certificate variables are important indicators for
evaluating the learning effect for learners in MOOCs. We therefore computed
statistics on grade to show the distribution of learners in the 11 courses.
The statistics show that the learners in each course can be divided into three
categories: most learners have a grade of 0, about 10%-20% of learners have a
grade over 0 but did not earn a certificate, and about 3%-10% of learners
earned a certificate. For convenience, we call the cases of grade = 0,
grade > 0 but not certificated, and certificated class 0, class 1 and class 2
respectively.
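The class definition can be expressed directly in code. This is a minimal sketch: the function name and the boolean "certified" argument are assumptions made for illustration.

```python
def performance_class(grade, certified):
    """Map a learner to class 0, 1 or 2 from the grade and certificate status."""
    if certified:
        return 2  # earned a certificate
    if grade > 0:
        return 1  # grade over 0 but no certificate
    return 0      # grade of 0


labels = [performance_class(g, c) for g, c in [(0.0, False), (0.4, False), (0.9, True)]]
```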
Based on the above analysis, we try to understand the difference between
the three categories. We calculated the mean, minimum, first quartile, median,
third quartile, and maximum of the learners’ behavior features: events, days,
videos, chapters, and forum. Based on these statistics, we can observe that
the learners who get high grades are more active in MOOCs than the learners
with low grades. For example, for class 0 the means of events, days, videos,
and chapters are around 100, 3, 17 and 1 respectively, for class 1 the means
are larger than 1000, 13, 100, and 5 respectively, and for class 2 the means
are larger than 5000, 40, 400 and 10 respectively.
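The per-class statistics can be computed along the following lines. This sketch uses only the Python standard library; the row layout, the "cls" key, and the toy values are assumptions for illustration and are not taken from the dataset. The quartile method ("inclusive") is also an assumption, since the text does not say how quartiles were computed.

```python
import statistics


def summarize(values):
    """Return mean, minimum, quartiles and maximum of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def summarize_by_class(rows, feature):
    """Group rows by their "cls" label and summarize one behavior feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row["cls"], []).append(row[feature])
    return {cls: summarize(vals) for cls, vals in sorted(groups.items())}


# Toy rows, invented for illustration only (not values from the dataset).
toy_rows = [
    {"cls": 0, "events": 1}, {"cls": 0, "events": 2}, {"cls": 0, "events": 3},
    {"cls": 0, "events": 4}, {"cls": 0, "events": 5},
    {"cls": 1, "events": 10}, {"cls": 1, "events": 30},
]
stats = summarize_by_class(toy_rows, "events")
```

Calling summarize_by_class once per feature (events, days, videos, chapters, forum) would give one summary table per feature.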
In order to understand how the learners of the different categories are
distributed in behavior feature space, we applied the K-means algorithm to
the learners’ behavior features (Events, Days, Videos, Chapters, and Forum in
Table 1) for one course. All the learners are clustered into two clusters or
three clusters, and we then calculated the mean, minimum, first quartile,
median, third quartile, and maximum of the learners’ grades in each cluster,
as shown in Table 2 and Table 3.
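For readers unfamiliar with K-means, the clustering step can be sketched with a plain implementation of Lloyd's algorithm. This is an illustrative sketch, not the code used in this work: the function name, the random initialization, the seed, and the toy points are assumptions, and the text does not state whether the features were scaled before clustering.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of equal-length tuples into k groups (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k data points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Toy 2-D points, invented for illustration: two well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

In the setting described above, each point would be a learner's five-dimensional behavior vector, and k would be 2 or 3.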
From the results in Table 2, we can reasonably infer that cluster A and
cluster B represent the learners who have a poor performance and a better
performance in online learning respectively. Further, cluster A and cluster B
can be regarded as the learners with a grade of 0 and the learners with a
grade over 0 respectively. The average distance between the points of
cluster A and the cluster B is 6.23 in behavior feature space.
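One way to read "average distance between the points" of two clusters is the mean Euclidean distance over all cross-cluster pairs of points. The sketch below implements that reading; it is an assumption, since the text does not define the measure precisely, and the function name and toy points are hypothetical.

```python
import math


def average_intercluster_distance(cluster_a, cluster_b):
    """Mean Euclidean distance over all pairs (p, q), p in cluster_a, q in cluster_b."""
    total = sum(math.dist(p, q) for p in cluster_a for q in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


# Toy points, invented for illustration only.
d = average_intercluster_distance([(0, 0)], [(3, 4), (6, 8)])
```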
From the results in Table 3, we can observe that cluster A’ and cluster C’
can be regarded as class 0 and class 2 respectively. Cluster B’ includes
records that come from both cluster A and cluster B in Table 2, and
represents the learners with average learning performance and medium grades
in examination. So we can regard cluster B’ in Table 3 as representing class
1. The average distances between the points of cluster A’ and cluster B’,
cluster B’ and cluster C’, and cluster A’ and cluster C’ are 5.91, 2.72 and
7.87 respectively in behavior feature space. This means that there is
overlap between class 1 and class 2, and better separability between class 0