
5.1 Implementation Platform

1. Hardware

1. Processor: Intel Duo Core

2. RAM: 8 GB

3. GPU: NVIDIA Tesla K80

2. Software

1. Operating System: Linux

2. Programming Languages: Python

3. Server: Redis

5.2 Dataset Introduction

The dataset used in this work comes from the first year (Fall 2012, Spring 2013, and Summer 2013 semesters) of MITx and HarvardX courses offered on the edX platform. It contains 641,138 records, each of which represents several of a learner's activities in a course, and covers learners' activity data in 11 completed courses. In each record, we focus on the variables shown in Figure 5 below. The 57,400 records in which the learner's grade is null are deleted.

Dept. of CSE, DSCE, Bangalore 78 Page 16

Predictions Of Dropouts In MOOCs 2015-19

Figure 5: Variables in each record of dataset.

Based on the Chapters variable, all learners can be divided into three groups: only registered, general, and active. The only registered group consists of learners who never access the courseware; the general group consists of learners who access the courseware but fewer than half of the available chapters; and the active group consists of learners who access more than half of the available chapters. Nearly 37%, 57%, and 6% of the learners belong to the only registered, general, and active groups, respectively. Most learners are therefore not strongly engaged in learning in MOOCs, and only a few have studied most of the course content.
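The grouping rule above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function name `engagement_group` and its two parameters are hypothetical, a learner with zero chapters accessed is assumed to have never opened the courseware, and a learner who accessed exactly half of the chapters (a case the text does not cover) is placed in the general group.

```python
def engagement_group(chapters_accessed, chapters_available):
    """Return 'only registered', 'general', or 'active' for one learner.

    Hypothetical helper: both argument names are assumptions, not field
    names taken from the dataset.
    """
    if chapters_available <= 0:
        raise ValueError("chapters_available must be positive")
    if chapters_accessed == 0:
        # Assumption: zero chapters means the courseware was never opened.
        return "only registered"
    if 2 * chapters_accessed > chapters_available:
        # Strictly more than half of the available chapters.
        return "active"
    # Assumption: exactly half is treated the same as less than half.
    return "general"
```

For a course with 10 chapters, for instance, a learner who accessed 3 chapters would fall in the general group and a learner who accessed 6 would fall in the active group.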

5.3 Behavior Analysis

Both the grade and certificate variables are important indicators for evaluating the learning effect for learners in MOOCs. We therefore compute statistics on grade to show the distribution of learners in the 11 courses. These statistics show that the learners in each course can be divided into three categories: most learners have a grade of 0; about 10%-20% of learners have a grade over 0 but did not earn a certificate in the course; and about 3%-10% of learners earned a certificate. For convenience, we refer to the cases of grade = 0, grade > 0 but not certificated, and certificated as class 0, class 1, and class 2, respectively.
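The three-class labelling can be written as a small function. This is a sketch under stated assumptions: the name `performance_class` and its parameters are hypothetical, records with a null grade are assumed to have been removed beforehand, and a certificate is assumed to take precedence over the grade value.

```python
def performance_class(grade, certified):
    """Map one learner record to class 0, 1, or 2.

    Hypothetical helper: `grade` is a number and `certified` is a boolean;
    neither name is taken from the dataset.
    """
    if certified:
        # Assumption: a certificate always means class 2.
        return 2
    if grade > 0:
        # Grade over 0 but no certificate.
        return 1
    # Grade of 0.
    return 0
```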

Based on the above analysis, we try to understand the difference between the three categories. For each of the learner behavior features (events, days, videos, chapters, and forum), we calculated the mean, minimum, first quartile, median, third quartile, and maximum. These statistics show that learners who get high grades are more active in MOOCs than learners with low grades. For example, for class 0 the means of events, days, videos, and chapters are around 100, 3, 17, and 1, respectively; for class 1 the means are larger than 1000, 13, 100, and 5; and for class 2 they are larger than 5000, 40, 400, and 10.
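The per-class summary statistics described above could be computed along the following lines. This is only a sketch using the Python standard library: the record layout (a dict with a `"label"` key plus one key per feature) and the function names are assumptions made for illustration, and the quartile convention (`method="inclusive"`) is one of several reasonable choices that the text does not specify.

```python
from statistics import mean, quantiles


def summarize(values):
    """Return mean, min, quartiles, and max for a list of numbers.

    Needs at least two values, since quartiles are undefined otherwise.
    """
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": mean(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


def summarize_by_class(records, feature):
    """Group records by their class label and summarize one feature.

    Assumption: each record is a dict with a 'label' key (0, 1, or 2)
    and a numeric value stored under `feature`.
    """
    groups = {}
    for record in records:
        groups.setdefault(record["label"], []).append(record[feature])
    return {label: summarize(vals) for label, vals in groups.items()}
```

Calling `summarize_by_class(records, "events")` would return one summary dictionary per class, which can then be compared across classes.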

To understand how learners of the different categories are distributed in the behavior feature space, we applied the K-means algorithm to the learners' behavior features (Events, Days, Videos, Chapters, and Forum in Table 1) for one course. The learners were clustered into either two or three clusters, and we then calculated the mean, minimum, first quartile, median, third quartile, and maximum of the learners' grades in each cluster, as shown in Table 2 and Table 3.
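To make the clustering step concrete, the sketch below implements a basic K-means (Lloyd's iteration) in plain Python and then collects the grades that fall into each cluster. It is not the implementation used in this work, which is not specified; a library routine such as scikit-learn's `KMeans` would be a common choice in practice. The function names, the random initialization, and the absence of any feature scaling are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of feature tuples into k groups with Lloyd's iteration.

    Returns (assignments, centroids), where assignments[i] is the cluster
    index of points[i]. Initial centroids are k randomly sampled points.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the iteration has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


def grades_by_cluster(assignments, grades):
    """Collect the grades of the learners assigned to each cluster."""
    clusters = {}
    for cluster_id, grade in zip(assignments, grades):
        clusters.setdefault(cluster_id, []).append(grade)
    return clusters
```

Each list returned by `grades_by_cluster` can then be summarized (mean, minimum, quartiles, maximum) to compare grades across clusters.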

From the results in Table 2, we can infer with some confidence that cluster A and cluster B represent the learners with poor and better performance in online learning, respectively. Further, cluster A and cluster B can be regarded as the learners with a grade of 0 and the learners with a grade over 0, respectively. The average distance between the points of cluster A and cluster B is 6.23 in the behavior feature space.
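The text does not state exactly how the average distance between two clusters is computed. One plausible reading, assumed here, is the mean Euclidean distance over all pairs formed by one point from each cluster. The sketch below follows that reading; the function name is hypothetical.

```python
import math
from itertools import product


def average_inter_cluster_distance(cluster_a, cluster_b):
    """Mean Euclidean distance over all (point in A, point in B) pairs.

    Assumption: this pairwise-average definition is one interpretation of
    the "average distance between the points" of two clusters.
    """
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must be non-empty")
    total = sum(math.dist(p, q) for p, q in product(cluster_a, cluster_b))
    return total / (len(cluster_a) * len(cluster_b))
```

Under this definition, a larger value means the two clusters lie further apart in the feature space, so a smaller value between two clusters suggests they are harder to separate.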

From the results in Table 3, we can observe that cluster A’ and cluster C’ can be regarded as class 0 and class 2, respectively. Cluster B’ contains records that come from both cluster A and cluster B in Table 2, and represents the learners with average learning performance and medium grades in examinations. We can therefore regard cluster B’ in Table 3 as representing class 1. The average distances between the points of cluster A’ and cluster B’, cluster B’ and cluster C’, and cluster A’ and cluster C’ are 5.91, 2.72, and 7.87, respectively, in the behavior feature space. This indicates that class 1 and class 2 overlap, while class 0 is better separated from the other two classes.
