At the beginning of each course, we use SurveyGizmo to send all enrollees a questionnaire about their demographics, their reasons for enrolling, and what they hope to get out of the course. At the end of the course, we send a second questionnaire asking participants about their experience and satisfaction with the course.ⁱ Coursera also collects demographic data on course participants, and we merge that data into our own dataset to fill any gaps.
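As a rough illustration of filling gaps with platform data, the sketch below merges one participant's Coursera demographic record into their survey record. The field names and the rule that survey answers take precedence are assumptions made for the example, not a description of the actual pipeline.

```python
def fill_gaps(survey_record, platform_record):
    """Return a copy of survey_record with missing fields taken from platform_record.

    Assumption: a field counts as missing when it is absent, None, or an
    empty string, and survey answers are never overwritten.
    """
    merged = dict(survey_record)
    for field, value in platform_record.items():
        if merged.get(field) in (None, ""):
            merged[field] = value
    return merged


# Hypothetical records for a single participant.
survey = {"age": 34, "gender": "", "education": None}
platform = {"age": 33, "gender": "female", "education": "bachelor"}
merged = fill_gaps(survey, platform)
```

Here the survey's `age` is kept, while `gender` and `education` come from the platform record.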
Web Event Logs and Event Data
About a month after the end of a course, we collect the Web Event Logs and the event data recorded by the Coursera platform. The Web Event Logs data includes every click by every participant within a course site, such as forum, wiki, and video views. This data tells us which videos participants watched, how many times they watched them, and whether the videos were downloaded or streamed. For streamed videos, we can also see when participants paused, stopped, or restarted playback. The Web Event Logs data also records the time and date of each click, the type of device and browser used to access each page in the course site, and the IP address of every participant. The second source, the event data, contains information about lectures viewed, quizzes taken, assignments submitted, dates of enrollment, the content of forum posts, quiz scores, and certificates earned.
Defining Course Activity
We define "activity" as a multifaceted action made up of any combination of typical online course behaviors, such as watching lectures, taking quizzes, submitting assignments, and participating in forums.
Defining Course Participants
For session-based courses, we define a course participant as someone who displays activity within a course for at least one day. The participant category includes individuals who discontinued their activity while remaining technically enrolled in the course, as well as those who intentionally un-enrolled themselves from a course. Participants who registered after the official course end date are removed from our dataset.
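The participant definition above can be sketched as a simple filter. The function name, its arguments, and the use of calendar days to count activity are assumptions made for illustration.

```python
from datetime import date, datetime


def is_participant(enrolled_on, activity_timestamps, course_end):
    """Return True if an enrollee counts as a course participant.

    Assumed reading of the definition: the enrollee registered on or
    before the official course end date and shows activity on at least
    one calendar day. Un-enrolling later does not change the result.
    """
    if enrolled_on > course_end:
        return False  # registered after the official course end date
    active_days = {ts.date() for ts in activity_timestamps}
    return len(active_days) >= 1


# Hypothetical course end date and enrollees.
course_end = date(2013, 3, 1)
active = is_participant(date(2013, 1, 10), [datetime(2013, 1, 12, 9, 30)], course_end)
inactive = is_participant(date(2013, 1, 10), [], course_end)
late = is_participant(date(2013, 3, 5), [datetime(2013, 3, 6, 8, 0)], course_end)
```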
IP data is extracted from the Web Event Logs and event data for individual participants who are marked as "students" in the Coursera dataset. Only clicks made within the official course period are counted. Each IP address is then matched against an IP geolocation country list, and each participant is assigned the country from which they most frequently logged into the courses. This aggregated data defines users' locations.
While most individuals' locations can be identified, some IP addresses cannot be resolved. In addition, some users log in through anonymous proxies or satellite providers, and some log in equally often from multiple countries, which makes it impossible to determine a specific location. These users are therefore included in our analyses but are not linked to a particular location.
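A minimal sketch of this location rule follows. It assumes each click has already been matched to a country code by a geolocation lookup that returns `None` for unresolvable addresses, anonymous proxies, and satellite providers; the function name and input format are hypothetical.

```python
from collections import Counter
from datetime import date


def assign_country(clicks, course_start, course_end):
    """Return a participant's most frequent country, or None if undetermined.

    clicks: iterable of (day, country) pairs, where country is None when
    the IP address could not be matched to a country (assumed convention).
    """
    counts = Counter(
        country
        for day, country in clicks
        if country is not None and course_start <= day <= course_end
    )
    if not counts:
        return None  # no resolvable clicks within the course period
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # equally frequent countries: no single location
    return ranked[0][0]


# Hypothetical course period and click records.
start, end = date(2013, 1, 1), date(2013, 3, 1)
clicks = [
    (date(2013, 1, 5), "US"),
    (date(2013, 1, 6), "US"),
    (date(2013, 1, 7), "IN"),
    (date(2013, 6, 1), "IN"),  # outside the course period, ignored
    (date(2013, 6, 2), "IN"),  # outside the course period, ignored
]
location = assign_country(clicks, start, end)
```

Participants for whom the function returns `None` stay in the dataset; they simply carry no location.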
ⁱ We do not have survey data for the first offering of Introduction to Sustainability or Heterogeneous Parallel Programming.