Systems for Data Science

COMPSCI 532

In this course, students will learn the fundamentals behind large-scale systems used for data science. We will cover the issues involved in scaling parallelism up (to many processors) and out (to many nodes) in order to perform fast analyses on large datasets. These include locality and data representation, concurrency, distributed databases and systems, and performance analysis and understanding. We will explore the details of existing and emerging data science platforms, including map-reduce and data analytics systems like Hadoop and Apache Spark, graph databases, stream processing systems, and systems for machine learning.

Class meetings: Monday/Wednesday 4:00pm-5:15pm Eastern Time. Lectures are synchronous and will be recorded and made available.

TA: Aidan O’Neill (aconeill@umass.edu)

Graders: Nikhil Maryala (nmaryala@umass.edu) and Kautilya Mukund Rajbhara (krajbhara@cs.umass.edu)

Prerequisites: COMPSCI 311, COMPSCI 345, and COMPSCI 377.

Credits: 3

Required Texts: This is an emerging topic so we will read and review recent technical papers, which represent the reading material for the exams. The course slides will be made available before exams.

Course Format

Classes will be synchronous and will be offered on Zoom. They will be recorded and made available online. Synchronous attendance is not mandatory, although it is encouraged. Students who cannot attend synchronously because of the time zone are encouraged to ask questions during the office hours, which will also be synchronous.

Students are expected to attend the lectures, synchronously or asynchronously, and study the material presented during the lectures. They will also have to participate in the following activities:

  • Reviews and homework:
    • Before most lectures, students will have to submit either a paper review or a homework report, depending on the class schedule.
    • For paper reviews, students will have to read a paper and submit a review with the following fields:
      • Summary
      • Strengths of the system
      • Aspects that were not clear or too difficult
      • Limitations of the system or possible extensions
    • Homework assignments are hands-on. Students will have to submit a short report.
    • Reviews and homework will be graded as pass/fail, where a pass is recorded as 100 and a fail as 50. A missing submission will be graded 0.
  • Quizzes:
    • There will be weekly multiple-choice quizzes about the material discussed in the week.
  • Projects:
    • There will be two or three larger programming assignments.
    • Projects can be performed in groups of up to three people.
    • Each student, including students in a group, will submit their work individually. Each submission will be graded on its own.
  • Group policy and plagiarism:
    • Members of the same group can engage in low-level discussions about the project. Submissions of students in the same group can (but don’t have to) be similar or even identical.
    • Low-level discussions about the project between students in different groups will be considered plagiarism. Similarities between submissions of students in different groups will be considered as signs of potential plagiarism.
  • Project submissions:
    • Work will be submitted by committing to a private GitHub repository created by the student. Students will share the repository with the instructor.
    • Submissions will include:
      • The code of the submission.
      • A small report describing how you fulfilled the different tasks (more details will be provided with the project description).
      • An automated script for deploying, running, and testing the code with a single shell command. It is the responsibility of the student to ensure that the code runs correctly on the computer of the grader.
  • Exams:
    • There will be two exams. The midterm exam will only be about the material discussed before the exam. The final exam will mainly be about the material after the midterm but it will require familiarity with the material of the entire course.

Examination schedule

Quizzes and exams will be administered on Moodle.

  • Quizzes. Quizzes will be released weekly on Thursdays at 1 pm U.S. Eastern Time. Students will have 24 hours to start a quiz session. Once started, there will be a limited time to complete the quiz.

  • Exams. There will be a midterm and a final exam. Students will have 24 hours to start an exam session. Once started, there will be a limited time to complete the exam. The examination dates are:

    • Midterm: Students can start the exam within 24 hours of Friday, September 24, 8 am Eastern Time. The duration of the exam will be approximately 2 hours.
    • Final: Students can start the exam within 24 hours of Thursday, November 3, 8 am Eastern Time. The duration of the exam will be approximately 2 hours.

Grading criteria

Each assignment will be given a grade on a 0-100 scale. The average of all assignments in a category will be weighted as follows:

  • Reviews and homework: 5%
  • Quizzes: 20%
  • Projects: 30%
  • Midterm exam: 20%
  • Final exam: 25%
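
As an illustration of how the weights above combine, the following is a minimal sketch in Python. It assumes that each category score is the plain average of its assignments and that the course grade is the weighted sum of those averages. The names WEIGHTS, review_score, and course_grade are hypothetical and are not part of any course tooling, and the example scores are made up.

```python
from statistics import mean

# Category weights as listed in the grading criteria (they sum to 100%).
WEIGHTS = {
    "reviews_homework": 0.05,
    "quizzes": 0.20,
    "projects": 0.30,
    "midterm": 0.20,
    "final": 0.25,
}


def review_score(status):
    """Map a review/homework outcome to its grade: pass=100, fail=50, missing=0."""
    return {"pass": 100, "fail": 50, "missing": 0}[status]


def course_grade(scores):
    """Weighted sum of per-category averages.

    `scores` maps each category name to a list of grades on the 0-100 scale.
    """
    return sum(WEIGHTS[cat] * mean(scores[cat]) for cat in WEIGHTS)


# Made-up example scores, for illustration only.
example = {
    "reviews_homework": [review_score(s) for s in ["pass", "pass", "fail", "missing"]],
    "quizzes": [80, 90],
    "projects": [85, 95],
    "midterm": [70],
    "final": [88],
}

print(course_grade(example))
```

Because reviews and homework carry only 5% of the grade, a single missing submission moves the overall result by a small amount under these assumptions, while each exam counts for four to five times as much as that whole category.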

Accommodation Statement

The University of Massachusetts Amherst is committed to providing an equal educational opportunity for all students. If you have a documented physical, psychological, or learning disability on file with Disability Services (DS), you may be eligible for reasonable academic accommodations to help you succeed in this course. If you have a documented disability that requires an accommodation, please notify me within the first two weeks of the semester so that we may make appropriate arrangements.

Academic Honesty Statement

Since the integrity of the academic enterprise of any institution of higher education requires honesty in scholarship and research, academic honesty is required of all students at the University of Massachusetts Amherst. Academic dishonesty is prohibited in all programs of the University. Academic dishonesty includes but is not limited to: cheating, fabrication, plagiarism, and facilitating dishonesty. Appropriate sanctions may be imposed on any student who has committed an act of academic dishonesty. Instructors should take reasonable steps to address academic misconduct. Any person who has reason to believe that a student has committed academic dishonesty should bring such information to the attention of the appropriate course instructor as soon as possible. Instances of academic dishonesty not related to a specific course should be brought to the attention of the appropriate department Head or Chair. Since students are expected to be familiar with this policy and the commonly accepted standards of academic integrity, ignorance of such standards is not normally sufficient evidence of lack of intent (http://www.umass.edu/dean_students/codeofconduct/acadhonesty/).