Project

Due Monday December 10 at 11:59pm

Homework policies and submission instructions

To get started, please consult the lecture notes (especially Lecture 23) and the textbook (especially Sections 12.3 and 12.4).

Obtain the activities of daily life dataset from the UC Irvine machine learning website: https://archive.ics.uci.edu/ml/datasets/Dataset+for+ADL+Recognition+with+Wrist-worn+Accelerometer. Ignore any data that is in a MODEL folder.

Problem

Build a classifier that classifies the given files into the appropriate activity: 'Use_telephone', 'Standup_chair', 'Walk', 'Climb_stairs', 'Sitdown_chair', 'Brush_teeth', 'Comb_hair', 'Eat_soup', 'Pour_water', 'Descend_stairs', 'Eat_meat', 'Drink_glass', 'Getup_bed', 'Liedown_bed'.

The data items are the files themselves. The classifier you train should take an activity file of arbitrary length and assign it one of the activity labels. Because the files differ in length, you will use vector quantization to turn each file into a fixed-length feature vector.
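As a starting point, the step of cutting a variable-length file into fixed-length samples might look like the sketch below. It assumes each file is plain text with one row of three accelerometer readings per time step. The names `load_signal` and `segment_signal` are hypothetical helpers, and the non-overlapping split is only one possible choice (overlapping windows would also work).

```python
def load_signal(path):
    """Read a whitespace-separated text file into a list of [x, y, z] rows.

    Assumes one time step per line; blank lines are skipped.
    """
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]


def segment_signal(signal, segment_len):
    """Split a signal into non-overlapping chunks of segment_len rows.

    Each chunk is flattened into a single vector of 3 * segment_len values.
    Trailing rows that do not fill a whole chunk are dropped.
    """
    segments = []
    for start in range(0, len(signal) - segment_len + 1, segment_len):
        chunk = signal[start:start + segment_len]
        segments.append([v for row in chunk for v in row])
    return segments


# Toy signal with 10 time steps (made-up values, not taken from the dataset).
signal = [[t, t + 1, t + 2] for t in range(10)]
segments = segment_signal(signal, 4)
```

With 10 time steps and a segment length of 4, this yields two segments of 12 values each, and the last two time steps are discarded.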

To obtain your classifier's features, use vector quantization: represent each data item by a histogram that records how often each cluster center is the closest match to the fixed-length samples cut from that file. Use k-means clustering to construct the pattern vocabulary. You may use whichever multi-class classifier you wish.
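The vocabulary and histogram steps can be sketched as follows. This is a minimal illustration, not a reference solution: it uses a hand-written Lloyd's-algorithm k-means on NumPy arrays (a library implementation would serve equally well), and the function names, the number of iterations, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: returns a (k, dim) array of cluster centers."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers


def histogram_features(segments, centers):
    """Turn one file's segments into a normalized histogram over the centers."""
    labels = assign(np.asarray(segments, dtype=float), centers)
    counts = np.bincount(labels, minlength=len(centers)).astype(float)
    return counts / counts.sum()


# Synthetic stand-in for segments pooled from all training files.
rng = np.random.default_rng(1)
all_segments = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(10, 1, (50, 6))])
centers = kmeans(all_segments, k=4)

# Pretend the first 10 segments came from a single file.
features = histogram_features(all_segments[:10], centers)
```

Every file, whatever its length, ends up as a vector with one entry per cluster center, which is what lets an ordinary multi-class classifier be trained on it. Normalizing the counts keeps long and short files comparable; whether to normalize is a design choice.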

Deliverables

Please hand in the following:

  1. Report your total error rate and the class confusion matrix for your classifier.
  2. Then improve your classifier by (a) modifying the number of cluster centers in your k-means and (b) modifying the size of the fixed-length samples that you use.
  3. Hand in your source code, along with the total error rate and class confusion matrix for your final classifier, and an explanation of how you selected your parameters and why your chosen parameters performed well. Your explanation should not be sparse: a one-sentence explanation would be considered insufficient and will be graded as such. Instead, consider describing the experiments you conducted to obtain your parameters and why you believe they are optimal with respect to the data and the operation of your classifier. We don't expect your explanation to be more than 1 page in length, though you're welcome to expand it further if you feel like doing so. Likewise, if you feel your explanation can be adequately presented in less than 1 page, that is reasonable as well.
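For the error rate and confusion matrix asked for above, a small sketch in plain Python is given below. The helper names are hypothetical, and the labels and predictions are invented purely to show the bookkeeping; they are not results from the dataset. Rows are taken to be true classes and columns predicted classes, which is a common convention but still an assumption here.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions: entry [i][j] is the number of items whose true class
    is classes[i] and whose predicted class is classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def error_rate(matrix):
    """Fraction of items that fall off the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total


# Invented labels for illustration only.
classes = ["Walk", "Climb_stairs", "Brush_teeth"]
y_true = ["Walk", "Walk", "Climb_stairs", "Climb_stairs", "Brush_teeth", "Brush_teeth"]
y_pred = ["Walk", "Climb_stairs", "Climb_stairs", "Climb_stairs", "Brush_teeth", "Walk"]

matrix = confusion_matrix(y_true, y_pred, classes)
err = error_rate(matrix)
```

Computing both quantities on held-out files (for example with cross-validation) for each combination of cluster count and sample size gives a table of experiments that can back up the parameter explanation.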

Submission guidelines

As part of your final submission, please submit two files:

  1. A PDF file with all of the above written deliverables and output deliverables (e.g. error rate and confusion matrix).
  2. A compressed file that contains all of the source code used to complete the assignment.

If you choose to present code inline in your PDF report (e.g. by exporting a Jupyter notebook as a PDF), use clear titles to identify the parts of the report, so that it is immediately apparent where each deliverable is located. This additional note does not apply to those who keep all code separate from the written report.

Any infraction of the submission guidelines will be met with grading penalties.