Practical 1: Logistic Regression & Perceptron
Due: 16. February (23:59)
The project files are located here.
Changelog
February 14th, 2018
- Updated typo from 'words' to 'features' in README.md
February 12th, 2018
- Pushed an update fixing a bug where the model was trained and tested on the same data.
January 26th, 2018
- Practical 1 released.
Overview
In this assignment you will implement logistic regression and a basic perceptron model, both trained with stochastic gradient descent (SGD). The task we will apply them to is predicting whether or not a patient has diabetes (0: no diabetes, 1: diabetes). The information we are given about each patient includes:
- Number of times pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)
You should not use any libraries that implement the logistic regression, perceptron, or SGD functionality for you. In future assignments we will be able to use premade implementations, but this one serves as a warm-up and should not take more than an hour to complete.
You will turn in the assignment using the submit server by 2/16 at 11:59 p.m.
Assignment
Coding (20 points):
- Understand how the code works.
- (5 points) Logistic Regression: In `logreg.py`, implement the `predict` function that predicts the class given the feature vector and weights. The `sigmoid` function is already implemented for you; just call `self.sigmoid(float value)`.
- (5 points) Logistic Regression: In `logreg.py`, finish implementing the `sg_update` function to update the weights based on the predicted class value using SGD.
- (5 points) Perceptron: In `perceptron.py`, implement the `predict` function that predicts the class given the feature vector and weights.
- (5 points) Perceptron: In `perceptron.py`, finish implementing the `sg_update` function to update the weights based on the predicted class value using SGD.
The accuracy of both models should be above 60% for full credit.
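As a rough sketch of the math behind the four functions above (the starter code defines the exact class interfaces; the standalone function names and signatures here are illustrative, not the starter code's):

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logreg_predict(weights, features):
    # probability that the label is 1: sigmoid of the dot product
    z = sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def logreg_sg_update(weights, features, label, eta):
    # one SGD step on the log-loss: w <- w + eta * (y - p) * x
    p = logreg_predict(weights, features)
    return [w + eta * (label - p) * x for w, x in zip(weights, features)]

def perceptron_predict(weights, features):
    # hard threshold on the dot product
    z = sum(w * x for w, x in zip(weights, features))
    return 1 if z >= 0 else 0

def perceptron_sg_update(weights, features, label, eta):
    # updates the weights only on a mistake: w <- w + eta * (y - y_hat) * x
    y_hat = perceptron_predict(weights, features)
    return [w + eta * (label - y_hat) * x for w, x in zip(weights, features)]
```

Note how similar the two update rules are: logistic regression nudges the weights by a real-valued error `(y - p)`, while the perceptron moves by the integer error `(y - y_hat)`, which is zero whenever the prediction is already correct.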
Analysis (10 points):
Questions (1-4) refer to the logistic regression model.
1. What is the role of the learning rate?
2. How many passes over the data do you need to complete?
3. What features are the best predictors of each class? How (mathematically) did you find them?
4. What features are the poorest predictors of classes? How (mathematically) did you find them?
5. What is an advantage of the perceptron algorithm compared to logistic regression?
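One common starting point for questions about predictor strength (an illustrative sketch, not the only valid approach) is to rank features by the absolute value of their learned weights, since large-magnitude weights move the dot product, and hence the prediction, the most:

```python
def rank_features(weights, names):
    # pair each feature name with its learned weight and sort by |weight|,
    # strongest predictor first; near-zero weights barely affect the output
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)

# illustrative feature names and learned weights, not real results
names = ["glucose", "bmi", "age"]
weights = [1.8, -0.9, 0.05]
print(rank_features(weights, names))
```

This comparison is only meaningful when the features are on comparable scales, which is part of why the normalization extra credit is useful.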
Extra credit:
- Implement the `normalize_dataframe` function.
  - Normalize all the real-valued features to be between -1 and 1.
  - Explain why this would be useful.
- Use a schedule to update the learning rate.
  - Supply an appropriate argument to the step parameter.
  - Support it in your `sg_update` function.
  - Show the effect in your analysis document.
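A rough sketch of both extra-credit ideas, written in plain Python for a single column of values (in the assignment you would apply the normalization per feature column of the DataFrame; the function names and the specific decay schedule here are illustrative assumptions, not requirements):

```python
def normalize_column(values):
    # min-max normalization: rescale real values linearly into [-1, 1]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def learning_rate(eta0, step):
    # one common schedule: decay the initial rate with the update count,
    # so early updates move far and later updates fine-tune
    return eta0 / (1.0 + step)
```

A schedule like this lets SGD settle instead of oscillating around the optimum late in training; comparing the learning curves with and without it is one way to "show the effect" in your analysis document.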
What to turn in
- Submit your `logreg.py` and `perceptron.py` files (include your name at the top of the source).
- Submit your `analysis.pdf` file:
  - no more than one page
  - pictures are better than text
  - include your name at the top of the PDF
Hints
- The methods for the perceptron model and the logistic regression model should look very similar with only slight differences in the update logic and prediction.
- Try debugging your code with a small number of epochs first and check if your accuracy improves over time.
- You may find `df['feature'].max()`, `df['feature'].min()`, and `df['feature'].mean()` to be useful for the `normalize_dataframe` function extra credit.