In this paper, we explore student dropout behavior in Massive Open Online Courses (MOOCs). We use a case study of a recent Coursera course, from which we develop a survival model that allows us to measure the influence of factors extracted from that data on student dropout rate. Specifically, we explore factors related to student behavior and social positioning within discussion forums using standard social network analytic techniques. The analysis reveals several significant predictors of dropout.
With the recent boom in the development of educational resources in both industry and academia, Massive Open Online Courses (MOOCs) have rapidly moved into a place of prominence in the media, in scholarly publications, and in the mind of the general public.
The hope is that this new surge of development will bring the vision of equitable access to lifelong learning opportunities within practical reach. MOOCs offer many valuable learning experiences to students, from video lectures, readings, assignments and exams, to opportunities to connect and collaborate with others through threaded discussion forums and other Web 2.0 technologies. Nevertheless, despite all this potential, MOOCs have so far failed to produce evidence that this potential is being realized in the current instantiation of MOOCs. Of particular concern are the very high rates of attrition that have been reported for this first generation of MOOCs.
One critical hurdle that prevents MOOCs from reaching their transformative potential is that they fail to provide the kind of social environment that is conducive to sustained engagement and learning, especially as students arrive in waves to these online learning communities. The unique developmental history of MOOCs creates challenges that can only be met with insight into the inner workings of massive-scale social interaction. Specifically, rather than evolving gradually like better understood types of online communities, MOOCs spring up overnight and then expand in waves as new cohorts of students arrive from week to week to begin the course. As massive communities of strangers that lack the shared practices that would enable them to form supportive bonds of interaction, these communities grow in an unruly manner.
In the remainder of the paper we first offer a more detailed review of current research related to attrition in MOOCs. Next, we describe the exploratory work that motivated our statistical analysis. We then outline our methodology for extracting indicators of course engagement along the way, including both social network measures and measures of engagement in discussion forums. We then present results from a survival analysis model to investigate the factors that contribute to course dropout along the way. We conclude with discussion and our vision for future research, particularly focused on opportunities for modeling using machine learning and graphical modeling techniques.
In this section we outline the perspectives on student attrition that have been explored so far in the emerging literature on MOOCs. Much of this work successfully leverages machine learning and other advanced computational methods.
Some prior work has focused on questions related to the preparation of students for participating in MOOC learning. For example, Tinto et al. explain student dropout as a socio-psychological process that occurs as students transition from their life before university participation to their new engagement in university life. Thus, this work has focused on factors that affect readiness to engage in this new stage of life, including gender roles and expectations, financial resources, etc. Other work in this area focuses on environmental factors within the university learning community itself.
The environmental factors just mentioned each dramatically affect the external forces that influence student motivation, as well as the meaning and significance of dropout itself as a construct.
In an attempt to better understand the individual differences and environmental factors that influence student disengagement in MOOCs, Kizilcec et al. applied an unsupervised machine learning approach to characterize patterns of engagement and disengagement in three computer science courses of varying difficulty levels. They identify emerging clusters of learners characterized as completing, auditing, disengaging and sampling learners. In their work, they evaluated factors such as demographic information from surveys, geo-location of learners, stated intentions upon enrolling in courses, and preliminary forum activity. Their goal was to understand the factors that affect dropout along the way in order to motivate the development of interventions that might address these shortcomings. Huang et al. analyzed students' submission behaviors in MOOCs, mainly focusing on the syntactic and functional similarity of the submissions, in order to explore the extent to which these characteristics predict commitment to the course over time. While this work provides a conceptual foundation for our own efforts, it does not address our specific questions about the association between social relations within the discussion threads and how that affects dropout.
3. Data and Exploratory Analysis
In preparation for engaging in a partnership with a team for a Coursera MOOC that was launched in Fall of 2013, we obtained permission from Coursera to scrape and study a small number of other courses. Our goal was to gain insights that would help us develop tools for instructor support. [...] at least once, and the social network graph we constructed from interactions within the discussion forums contained a total of 3848 edges.
Our preliminary work revealed clearly different trajectories through the course for cohorts that began at different times. The earliest cohorts completed more of the course and were less likely to drop out. Students from later cohorts, perhaps in an attempt to catch up, appeared to adopt the strategy of beginning their active engagement with the materials from the week before the week in which they joined, rather than beginning with the week one materials.
What appeared most important in our exploratory work was that the later cohort students appeared to have trouble getting integrated into the community discussion. The few high centrality participants were primarily from the earliest cohort. Members of the first cohort were also more likely than others to continue participating in discussion forums dedicated to earlier portions of the course long after they had advanced past that material. Nevertheless, the students who joined in later cohorts and posted in the same subforums where the early cohort members were active were not very likely to engage in discussions with them. Students in later cohorts were more likely to remain at the periphery, while students from the earlier cohorts continued interacting with one another. Later cohort students appeared to adopt a less intense form of participation. They posted at a lower rate than the earlier cohorts, and they rarely returned to discussions from earlier weeks once they had advanced to later weeks of material.
This pattern points either to psychological differences between students who join early and students who join late, or to challenges that students who join the course late face in getting properly integrated into the course. In order to formalize these findings and understand this pattern better, we employed a well-known applied machine learning technique that has been used previously to understand participation patterns in other types of online communities.
4. Method: Exploring Factors Affecting Dropout through Operationalizations of Social Positioning
Our exploratory analysis suggested that student behavior in the discussion forum might predict attrition. This makes sense intuitively. For example, a student who has decided to finish an online course might be more likely to dig into the details of each assignment and lecture, which might make them more inclined to actively post questions, reply or comment in the discussion forum. In an attempt to operationalize these factors, we define metrics related to posting behavior and social positioning within the resulting reply network.
4.1 Posting Behavior
The features we considered for each person each week include:
Thread starter- Refers to whether a student has started a thread within the particular week (binary value of 0/1).
Sub thread starter- Refers to whether a student has started a sub thread within the particular week (binary value of 0/1). Such students are discussion initiators within threads, whose posts generate more comments than a particular threshold. We arbitrarily choose a threshold of 3 comments.
Post length- Refers to the number of posts by a particular user.
Post duration- Refers to the time difference between the first post and the last post in the current week.
Post density- Refers to the Post length divided by the Post duration for the weeks a student survived.
Content length- Refers to the number of characters written on the discussion forum.
Content density- Refers to the Content length divided by the Post length for the weeks a student survived.
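The posting features above can be sketched in a few lines of Python. This is a minimal illustration, not the extraction code used in the study: the `Post` record, its field names, and the choice to compute every feature within a single week are assumptions, as is returning 0.0 for a density whose denominator is zero.

```python
from dataclasses import dataclass

@dataclass
class Post:
    # Hypothetical record layout; field names are assumptions for illustration.
    author: str
    week: int
    timestamp: float           # time of posting, in any consistent unit
    text: str
    starts_thread: bool = False
    starts_subthread: bool = False
    n_comments: int = 0        # comments this post received

SUBTHREAD_THRESHOLD = 3        # threshold value stated in the text

def weekly_features(posts, author, week):
    """Compute the posting features for one student in one week."""
    mine = [p for p in posts if p.author == author and p.week == week]
    post_length = len(mine)
    content_length = sum(len(p.text) for p in mine)
    times = [p.timestamp for p in mine]
    post_duration = max(times) - min(times) if times else 0.0
    return {
        "thread_starter": int(any(p.starts_thread for p in mine)),
        "sub_thread_starter": int(any(
            p.starts_subthread and p.n_comments > SUBTHREAD_THRESHOLD
            for p in mine)),
        "post_length": post_length,
        "post_duration": post_duration,
        # Assumption: density is 0.0 when the duration is zero.
        "post_density": post_length / post_duration if post_duration > 0 else 0.0,
        "content_length": content_length,
        # Assumption: density is 0.0 when the student did not post.
        "content_density": content_length / post_length if post_length else 0.0,
    }

example_posts = [
    Post("a", 1, 0.0, "hello", starts_thread=True),
    Post("a", 1, 10.0, "hi there", starts_subthread=True, n_comments=4),
    Post("b", 1, 5.0, "x"),
]
features_a = weekly_features(example_posts, "a", 1)
```

Each call yields one student-week record, which is the granularity at which the features are described above.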
The motivation behind examining the above forum features is to gain insights into:
1) whether thread starters and sub thread starters differ in their probabilities of surviving, as opposed to people who only reply to threads/sub threads;
2) whether the pattern of posting makes any difference to students' survival;
3) whether survival of students is affected by starting/not starting threads/sub threads and then engaging/not engaging in active discussion afterwards. For example, when there is a stream of bursty posts from a student, is it a potential indicator of their interest in the course, and are they more or less likely to drop out afterward?
4.2 Social Network Behavior
To fully understand the structured discussion forum, we explore a wide range of standard social network analytic measures that capture aspects of social positioning within the resulting reply networks. In our network formalization, thread starters or thread initiators have an outward link to all those who have posted or commented within that particular thread. If people post more than once within a thread, we count them as having a stronger tie strength to the thread starter. We use directed links in our network construction. Because we are interested in how behavior within a week affects survival to the next week of the course, we construct a separate network specifically for each week of participation and then extract the measures from that network. We do this for each student in each week of their active participation.
This representation of each student-week of behavior is then input to the survival model.
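The network construction described above could look roughly like the following sketch. The input layout (a tuple of week, thread id, author, and a thread-starter flag) is hypothetical, and two details are assumptions the text does not settle: a reply is counted in the week it was posted, and the starter's own follow-up posts do not create links.

```python
from collections import defaultdict

def build_weekly_networks(posts):
    """Build one directed, weighted reply network per week.

    posts: iterable of (week, thread_id, author, is_thread_starter) tuples
    (a hypothetical layout). Returns {week: {(starter, replier): weight}}.
    """
    posts = list(posts)
    # Map each thread to the student who started it.
    starters = {thread_id: author
                for _, thread_id, author, is_starter in posts if is_starter}
    networks = defaultdict(lambda: defaultdict(int))
    for week, thread_id, author, is_starter in posts:
        starter = starters.get(thread_id)
        if is_starter or starter is None or author == starter:
            continue  # assumption: no self-links and no links without a known starter
        # Outward link from the thread starter to the poster; repeated
        # posts in the same thread increase the tie strength.
        networks[week][(starter, author)] += 1
    return {week: dict(edges) for week, edges in networks.items()}

example_posts = [
    (1, "t1", "alice", True),
    (1, "t1", "bob", False),
    (1, "t1", "bob", False),
    (1, "t1", "carol", False),
    (2, "t1", "dave", False),
    (2, "t2", "bob", True),
    (2, "t2", "alice", False),
]
weekly_networks = build_weekly_networks(example_posts)
```

Keeping a separate edge dictionary per week mirrors the choice to measure how behavior within one week relates to survival into the next.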
Centrality measures- These include Degree (the average of the number of inlinks and outlinks), Eigenvector centrality (which measures node importance in a network based on a node's connections), Betweenness (which measures how often a node appears on the shortest path between nodes in the network) and Closeness centrality (which measures the average distance from a given starting node to all other nodes in the network). Using a similar analogy as in Borgatti et al., we can say that Eigenvector centrality will capture the long term direct and indirect effect of a student's interaction patterns and the implication of their connections in the MOOC network, while Degree centrality will capture more immediate effects.
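Two of these measures are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the study's implementation: edges are unweighted (source, target) pairs, distances follow link direction, and the average distance is taken over reachable nodes only. It computes the average distance as described in the text; many libraries report closeness as the reciprocal of this quantity.

```python
from collections import deque

def degree(edges, node):
    """Average of the number of in-links and out-links of a node."""
    out_links = sum(1 for u, _ in edges if u == node)
    in_links = sum(1 for _, v in edges if v == node)
    return (in_links + out_links) / 2

def bfs_distances(edges, source):
    """Shortest-path lengths from source, following link direction."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_distance(edges, source):
    """Average distance from source to the nodes it can reach.

    Assumption: unreachable nodes are ignored; returns None if source
    reaches no other node.
    """
    dist = bfs_distances(edges, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else None

example_edges = [("a", "b"), ("b", "c"), ("a", "d")]
degree_a = degree(example_edges, "a")
avg_dist_a = average_distance(example_edges, "a")
```

Eigenvector and Betweenness centrality need more machinery (power iteration and shortest-path counting respectively) and are better taken from an established graph library.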
Average Clustering coefficient- Indicates how nodes are embedded in their neighborhood, which can be thought of as an overall indication of the small world effect or clustering in the network. The motivation for examining this factor as a potential indicator of dropout is that, in the absence of cliquishness in discussion forums, students would not find enough active partners to engage in discussions. Lacking support and fruitful means of engagement, they would be more inclined to leave the course. Having more tightly knit neighbors influences the structural location of students in the network, which in turn facilitates discussion and motivates students to remain in the course.
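As a rough illustration, the average clustering coefficient can be computed as below. The sketch assumes the reply network is treated as undirected and unweighted, and that nodes with fewer than two neighbors contribute a coefficient of 0; the text does not say which convention was used.

```python
from itertools import combinations

def average_clustering(edges):
    """Average local clustering coefficient of an undirected graph.

    edges: iterable of (u, v) pairs; direction and weights are ignored
    (an assumption made for this sketch).
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-links
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return 0.0
    total = 0.0
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient counted as 0 for such nodes
        # Number of links that exist among the node's neighbors.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        total += links / (k * (k - 1) / 2)
    return total / len(neighbors)

# A triangle a-b-c with one extra student d attached to c.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
clustering = average_clustering(example_edges)
```

In the example, a and b each score 1, c scores 1/3, and d scores 0, so the average is 7/12.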
Eccentricity – Indicates the distance from a given starting node to the farthest node in the network. As an extreme measure of centrality, the importance of examining this variable in the MOOC network is to intuitively monitor how response time affects students' participation. In the MOOC, students with low eccentricity values will primarily be at the center of the graph and therefore more receptive to information, compared to students with high eccentricity, who will belong to the periphery of the network. If students are partially cut off from the influence of the core or central group of students in the network, their chances of dropout might be increased.
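Eccentricity follows directly from a breadth-first search, as in this minimal sketch. It assumes links are treated as undirected and that only nodes reachable from the starting node are considered; both are simplifications chosen here, since eccentricity is otherwise undefined or infinite in a disconnected network.

```python
from collections import deque

def eccentricity(edges, source):
    """Distance from source to the farthest node it can reach.

    edges: iterable of (u, v) pairs, treated as undirected (assumption).
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

# A chain of five students: a - b - c - d - e.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
ecc_center = eccentricity(chain, "c")
ecc_edge = eccentricity(chain, "a")
```

In the chain, the middle student c has the lowest eccentricity and the end student a the highest, which matches the center versus periphery reading given above.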
Authority and Hub scores – These indicate how valuable the information stored in a node is and the quality of the node's links. In a MOOC, students with good authority scores are those who engage other students in discussions. Students with good hub scores are those who get engaged in discussions initiated by many active learners such as thread starters or sub thread starters. We find a strong correlation between these two measures in our data.
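Hub and authority scores are commonly obtained with the HITS algorithm, sketched below as a plain power iteration. This follows the standard definition (a node's authority sums the hub scores of nodes linking to it, and its hub score sums the authority scores of the nodes it links to) and is not necessarily the tool used in the study. Edge weights are ignored and scores are normalized to sum to 1, both assumptions. Which of the two scores lines up with thread starters depends on the direction chosen for the links.

```python
def _normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if all are zero)."""
    total = sum(scores.values())
    return {n: s / total for n, s in scores.items()} if total else scores

def hits(edges, iterations=50):
    """Standard HITS by power iteration on unweighted directed edges."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of nodes pointing at the node.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        auth = _normalize(auth)
        # Hub: sum of authority scores of nodes the node points to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        hub = _normalize(hub)
    return hub, auth

example_edges = [("s", "a"), ("s", "b"), ("t", "a")]
hub_scores, authority_scores = hits(example_edges)
```

In the example, s links to more nodes than t and so receives the higher hub score, while a is linked from both s and t and receives the higher authority score.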