# Practical 1: Logistic Regression & Perceptron

Due: 16. February (23:59)

The project files are located here.


## Changelog

###### February 14th, 2018

- Updated typo from 'words' to 'features' in README.md

###### February 12th, 2018

- Pushed an update fixing a bug where the model was trained and tested on the same data.

###### January 26th, 2018

- Practical 1 released.

## Overview

In this assignment you will implement logistic regression and a basic perceptron model, both trained with stochastic gradient descent (SGD). The task is to predict whether or not a patient has diabetes (0: no diabetes, 1: diabetes). The information we are given about each patient includes:

- Number of times pregnant
- Plasma glucose concentration at 2 hours in an oral glucose tolerance test
- Diastolic blood pressure (mm Hg)
- Triceps skin fold thickness (mm)
- 2-Hour serum insulin (mu U/ml)
- Body mass index (weight in kg/(height in m)^2)
- Diabetes pedigree function
- Age (years)

You should **not** use any libraries that implement the logistic regression, perceptron, or SGD functionality for you. In future assignments we will be able to use premade implementations, but this one serves as a warm-up and should not take more than an hour to complete.

You will turn in the assignment using the submit server by 2/16 at 11:59 p.m.

## Assignment

### Coding (20 points):

- Understand how the code works.
- (5 points) Logistic Regression: In `logreg.py`, implement the `predict` function that predicts the class given the feature vector and weights. The `sigmoid` function is already implemented for you; just call `self.sigmoid(float value)`.
- (5 points) Logistic Regression: In `logreg.py`, finish implementing the `sg_update` function to update the weights based on the predicted class value using SGD.
- (5 points) Perceptron: In `perceptron.py`, implement the `predict` function that predicts the class given the feature vector and weights.
- (5 points) Perceptron: In `perceptron.py`, finish implementing the `sg_update` function to update the weights based on the predicted class value using SGD.
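As a rough sketch of what the logistic regression pieces might look like (the class, attribute, and parameter names here are assumptions, not the actual starter code in `logreg.py`):

```python
import math

class LogRegSketch:
    def __init__(self, num_features, eta=0.1):
        self.w = [0.0] * num_features  # one weight per feature
        self.eta = eta                 # learning rate

    def sigmoid(self, value):
        # numerically safe logistic function
        if value < -30:
            return 0.0
        return 1.0 / (1.0 + math.exp(-value))

    def predict(self, x):
        # probability that the label is 1 given feature vector x
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return self.sigmoid(score)

    def sg_update(self, x, y):
        # single SGD step on one (x, y) example: the gradient of the
        # log-loss with respect to weight i is (p - y) * x_i
        p = self.predict(x)
        for i, xi in enumerate(x):
            self.w[i] -= self.eta * (p - y) * xi
```

Note that the update uses the predicted *probability*, not the thresholded class label.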

The accuracy of both models should be above 60% for full credit.
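A rough sketch of the perceptron side (again with hypothetical class and attribute names, not the actual starter code in `perceptron.py`):

```python
class PerceptronSketch:
    def __init__(self, num_features, eta=1.0):
        self.w = [0.0] * num_features  # one weight per feature
        self.eta = eta                 # learning rate

    def predict(self, x):
        # hard threshold on the dot product: class 1 if score >= 0, else 0
        score = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if score >= 0 else 0

    def sg_update(self, x, y):
        # mistake-driven update: only change the weights when the
        # prediction is wrong, moving them toward the true class
        y_hat = self.predict(x)
        if y_hat != y:
            for i, xi in enumerate(x):
                self.w[i] += self.eta * (y - y_hat) * xi
```

The key contrast with logistic regression: the perceptron updates only on mistakes and uses hard 0/1 predictions rather than probabilities.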

### Analysis (10 points):

Questions (1-4) refer to the logistic regression model.

1. What is the role of the learning rate?
2. How many passes over the data do you need to complete?
3. What features are the best predictors of each class? How (mathematically) did you find them?
4. What features are the poorest predictors of each class? How (mathematically) did you find them?
5. What is an advantage of the perceptron algorithm compared to logistic regression?
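For the questions about best and poorest predictors, one common approach is to inspect the learned weight vector: large positive weights pull predictions toward class 1, large negative weights toward class 0, and near-zero weights carry little signal (this comparison is most meaningful when features are on similar scales). A sketch, with made-up weights and assuming the trained weights live in a list aligned with the feature names:

```python
w = [0.8, -0.05, 1.3, -0.9]                    # example learned weights (made up)
feature_names = ["pregnancies", "glucose", "bmi", "age"]

# sort features by signed weight: top predicts class 1, bottom predicts class 0
ranked = sorted(zip(feature_names, w), key=lambda pair: pair[1], reverse=True)
best_for_class1 = ranked[0]    # most positive weight
best_for_class0 = ranked[-1]   # most negative weight

# the poorest predictors are the weights closest to zero
weakest = min(zip(feature_names, w), key=lambda pair: abs(pair[1]))
```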

### Extra credit:

- Implement the `normalize_dataframe` function.
  - Normalize all the real-valued features to be between -1 and 1.
  - Explain why this would be useful.
- Use a schedule to update the learning rate.
  - Supply an appropriate argument to the step parameter.
  - Support it in your `sg_update` function.
  - Show the effect in your analysis document.
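One simple learning rate schedule (just one option; the assignment leaves the choice open) is inverse decay, where the rate shrinks as the step count grows:

```python
def scheduled_eta(eta0, step, decay=100.0):
    # inverse decay: starts at eta0 and shrinks as more updates are made,
    # so early steps move fast and later steps fine-tune
    return eta0 / (1.0 + step / decay)
```

You would pass the current update count as `step` each time `sg_update` is called, instead of using a fixed learning rate.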

### What to turn in

- Submit your `logreg.py` and `perceptron.py` files (include your name at the top of the source).
- Submit your `analysis.pdf` file:
  - no more than one page
  - pictures are better than text
  - include your name at the top of the PDF

### Hints

- The methods for the perceptron model and the logistic regression model should look very similar with only slight differences in the update logic and prediction.
- Try debugging your code with a small number of epochs first and check if your accuracy improves over time.
- You may find `df['feature'].max()`, `df['feature'].min()`, and `df['feature'].mean()` to be useful for the `normalize_dataframe` function extra credit.
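Putting those hints together, min-max rescaling to [-1, 1] could look like this sketch (assuming a pandas DataFrame of real-valued columns; how the actual starter code selects columns may differ):

```python
import pandas as pd

def normalize_dataframe(df):
    # rescale each column to the range [-1, 1] using
    # 2 * (x - min) / (max - min) - 1; constant columns are set to 0
    out = df.copy()
    for col in out.columns:
        lo, hi = out[col].min(), out[col].max()
        if hi > lo:
            out[col] = 2.0 * (out[col] - lo) / (hi - lo) - 1.0
        else:
            out[col] = 0.0
    return out
```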