Systems for Machine Learning

COMPSCI 692S (seminar)

Machine learning is employed in an increasingly wide range of applications. Using ML entails developing end-to-end pipelines to collect data, clean it, and run learning and inference algorithms in a scalable manner. This results in computationally intensive workloads and complex software pipelines. Systems for ML help users organize their data and scale these workloads to ever larger datasets. At the same time, ML is having an increasing impact on systems design: fine-tuned analytical heuristics and cost models are being replaced by learned models, following trends observed in other fields. This seminar will review cutting-edge research on these topics and give students the opportunity to work on a hands-on project. The course primarily involves reading, presenting, and discussing papers (1 credit) and, for students in the 3-credit section, a final project building an end-to-end machine learning pipeline (3 credits).

Class meetings: Wednesday 11:15 AM-1:15 PM, CS 142

Piazza: https://piazza.com/umass/spring2020/cs692s

Seminar structure

Tutorials

All students will be required to prepare a tutorial. Each tutorial will be presented by a group of 3 students. A tutorial will cover an area and present at least 3 papers from the reading list.

The typical structure of a presentation would be along these lines:

  • What is the problem being addressed? Give context assuming people know nothing about the area. Why is the problem important? (Approx. 15 minutes)
  • Present the papers. Focus on the big ideas rather than the technicalities, but give enough details to make the presentation informative. (Approx. 40 minutes)
  • Discussion: Comparison between the different papers, strengths and weaknesses of each. (Approx. 10 minutes)
  • Q&A (Approx. 10 minutes)

Reviews

Before each class, the presenting group will announce the 3 main papers it will present. All students must read these papers and write a short review by the end of the day before the class. Links will be provided on Piazza and the course website.

Projects (3 credits only)

Students who have registered for the 3-credit section will also have to prepare a project. Working in groups of 3, they will pick either (1) a systems problem that can benefit from the use of ML algorithms, or (2) an ML application that requires systems support for data collection, data cleaning, machine learning, inference, or a pipeline composing these steps. The choice must be agreed upon with the instructor.

Students will then prepare a “problem statement” report where they describe the application and identify challenges in terms of scalability, reducing running time, and/or usability. Students will also propose a solution. The claims need to be validated experimentally.

Students will then implement the solution and finally write a report describing it. The final report will validate the system design through performance measurements and/or user studies. There will be a final presentation for all projects.

Seminar structure

The course will consist of meetings with presentations. Students will be expected to participate in the following activities.

  • Presentations. Each student will have to present, alone or in a group, at one of the meetings. The presentation will cover one paper taken from a reading list published on Moodle. Each presentation will have 4 parts:
    1. Background and motivation. Present the general topic addressed in this paper, the prior related work, and the gaps that this paper addresses. Presenters are encouraged to read the most relevant work in the area to prepare this part of the presentation. (~15 minutes)
    2. Paper content. Presentation of the technical content of the paper. (~15 minutes)
    3. Potential extensions. Propose potential research directions extending the work. Optionally, this section can be presented as a specific project proposal with expected goals and intermediate milestones. (~10 minutes)
      • The instructor will offer some 3-credit independent studies based on well-defined project proposals.
    4. Discussion. Questions about the paper and discussion on the general area, the paper contributions, and the potential extensions. (open ended)
  • Paper reviews. Each student will have to read the presented paper and submit a review through a Google form no later than two days before the presentation. After the deadline, reviews will be published to the class to encourage discussion. Reviews will also be discussed in class.

  • Attendance. Attendance at the presentations is mandatory.