
CAP 5625: Programming Assignment 3

Due on Canvas by Friday, November 10, 2023 at 11:59pm

Preliminary instructions

You may consult with other students currently taking CAP 5625 in your section at FAU on this
programming assignment. If you do consult with others, then you must indicate this by
providing their names with your submitted assignment. However, all analyses must be
performed independently, all source code must be written independently, and all students
must turn in their own independent assignment. Note that for this assignment, you may choose
to pair up with one other student in your section of CAP 5625 and submit a joint assignment. If
you choose to do this, then both your names must be associated with the assignment and you
will each receive the same grade.

Though it should be unnecessary to state in a graduate class, I am reminding you
that you may not turn in code (partial or complete) that is written or inspired by
others, including code from other students, websites, past code that I release
from prior assignments in this class or from past semesters in other classes I
teach, or any other source that would constitute an academic integrity violation.
All instances of academic integrity violations will receive a zero on the
assignment and will be referred to the Department Chair and College Dean for
further administrative action. A second offense could lead to dismissal from the
University and any offense could result in ineligibility for Departmental Teaching
Assistant and Research Assistant positions.

You may choose to use whatever programming language you want. However, you must provide
clear instructions on how to compile and/or run your source code. I recommend using a
modern language, such as Python, R, or Matlab, as learning these languages can help you if you
were to enter the machine learning or artificial intelligence field in the future.

All analyses performed and algorithms run must be written from scratch. That is, you may not
use a library that can perform coordinate descent, cross validation, elastic net, least squares
regression, optimization, etc. to successfully complete this programming assignment (though you
may reuse your relevant code from Programming Assignments 1 and 2). The goal of this
assignment is not to learn how to use particular libraries of a language, but rather to
understand how key methods in statistical machine learning are implemented. With that
stated, I will provide 5% extra credit if you additionally implement the assignment using built-in
statistical or machine learning libraries (see Deliverable 6 at end of the document).

Note, credit for deliverables that request graphs, discussion of results, or specific values will not
be given if the instructor must run your code to obtain these graphs, results, or specific values.


Brief overview of assignment

In this assignment you will still be analyzing the same credit card data from N = 400 training
observations that you examined in Programming Assignment 1. The goal is to fit a model that
can predict credit balance based on p = 9 features describing an individual, which include an
individual's income, credit limit, credit rating, number of credit cards, age, education level,
gender, student status, and marriage status. Specifically, you will perform a penalized
(regularized) least squares fit of a linear model using elastic net, with the model parameters
obtained by coordinate descent. Elastic net will permit you to provide simultaneous parameter
shrinkage (tuning parameter λ ≥ 0) and feature selection (tuning parameter α ∈ [0, 1]). The
two tuning parameters λ and α will be chosen using five-fold cross validation, and the best-fit
model parameters will be inferred on the training dataset conditional on an optimal pair of
tuning parameters.

Data

Data for these observations are given in Credit_N400_p9.csv, with individuals labeled on
each row (rows 2 through 401), and input features and response given on the columns (with
the first row representing a header for each column). There are six quantitative features, given
by columns labeled "Income", "Limit", "Rating", "Cards", "Age", and "Education", and three
qualitative features with two levels labeled "Gender", "Student", and "Married".

Detailed description of the task

Recall that the task of performing an elastic net fit to training data
{(x_1, y_1), (x_2, y_2), …, (x_N, y_N)} is to minimize the cost function

𝐽(𝛽, πœ†, 𝛼) = βˆ‘(𝑦𝑖 βˆ’βˆ‘π‘₯𝑖𝑗𝛽𝑗

𝑝

𝑗=1

)

2
𝑁

𝑖=1

+ πœ†(π›Όβˆ‘π›½π‘—
2

𝑝

𝑗=1

+ (1 βˆ’ 𝛼)βˆ‘|𝛽𝑗|

𝑝

𝑗=1

)

where y_i is a centered response and where the input p features are standardized (i.e., centered
and divided by their standard deviation). Note that we cannot use gradient descent to minimize

this cost function, as the component Σ_{j=1}^{p} |β_j| of the penalty is not differentiable. Instead, we

use coordinate descent, where we update each parameter k, k = 1, 2, …, p, in turn, keeping all
other parameters constant, and using sub-gradient rather than gradient calculations. To
implement this algorithm, depending on whether your chosen language can quickly compute
vectorized operations, you may implement coordinate descent using either Algorithm 1 or
Algorithm 2 below (choose whichever you are more comfortable implementing). Note that in
languages like R, Python, or Matlab, Algorithm 2 (which would be implemented by several
nested loops) may be much slower than Algorithm 1. Also note that if you are implementing
Algorithm 1 using Python, use numpy arrays instead of Pandas data frames for computational
speed. For this assignment, assume that we will reach the minimum of the cost function within
a fixed number of steps, with the number of iterations being 1000.


Algorithm 1 (vectorized):
Step 1. Fix tuning parameters λ and α
Step 2. Generate N-dimensional centered response vector y and N × p standardized
(centered and scaled to have unit standard deviation) design matrix X
Step 3. Precompute c_k, k = 1, 2, …, p, as

c_k = Σ_{i=1}^{N} x_{ik}^2

Step 4. Randomly initialize the parameter vector β = [β_1, β_2, …, β_p]

Step 5. For each k, k = 1, 2, …, p:
compute

a_k = x_k^T (y − Xβ + x_k β_k)

and set

β_k = sign(a_k) (|a_k| − λ(1 − α)/2)_+ / (c_k + λα)

Step 6. Repeat Step 5 for 1000 iterations or until convergence (vector β does not change)

Step 7. Set the last updated parameter vector as β̂ = [β̂_1, β̂_2, …, β̂_p]
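For concreteness, Algorithm 1 can be sketched in Python with NumPy roughly as follows. This is only an illustrative sketch, not a prescribed structure for your submission: the function name and signature are my own, X is assumed to be the already-standardized design matrix, and y the already-centered response.

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=1000, seed=0):
    """Vectorized coordinate descent for the elastic net cost (Algorithm 1 sketch)."""
    N, p = X.shape
    rng = np.random.default_rng(seed)
    beta = rng.uniform(-1.0, 1.0, size=p)   # Step 4: small random start
    c = (X ** 2).sum(axis=0)                # Step 3: c_k precomputed once
    for _ in range(n_iter):                 # Step 6: fixed number of sweeps
        for k in range(p):                  # Step 5: update one coordinate at a time
            a_k = X[:, k] @ (y - X @ beta + X[:, k] * beta[k])
            soft = max(abs(a_k) - lam * (1.0 - alpha) / 2.0, 0.0)
            # sign(a_k) * (.)_+ / (c_k + lam * alpha); copysign yields +soft when a_k >= 0
            beta[k] = np.copysign(soft, a_k) / (c[k] + lam * alpha)
    return beta
```

With λ = 0 the update reduces to β_k = a_k / c_k, so the sketch can be sanity-checked against an ordinary least squares fit; with a very large λ and α = 0, every coefficient is thresholded to exactly zero.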


Algorithm 2 (non-vectorized):
Step 1. Fix tuning parameters λ and α
Step 2. Generate N-dimensional centered response vector y and N × p standardized
(centered and scaled to have unit standard deviation) design matrix X
Step 3. Precompute c_k, k = 1, 2, …, p, as

c_k = Σ_{i=1}^{N} x_{ik}^2

Step 4. Randomly initialize the parameter vector β = [β_1, β_2, …, β_p]

Step 5. For each k, k = 1, 2, …, p:
compute

a_k = Σ_{i=1}^{N} x_{ik} (y_i − Σ_{j=1, j≠k}^{p} x_{ij} β_j)

and set

β_k = sign(a_k) (|a_k| − λ(1 − α)/2)_+ / (c_k + λα)

Step 6. Repeat Step 5 for 1000 iterations or until convergence (vector β does not change)

Step 7. Set the last updated parameter vector as β̂ = [β̂_1, β̂_2, …, β̂_p]

Note that we define

sign(π‘₯) = {
βˆ’1 if π‘₯ < 0
1 if π‘₯ β‰₯ 0

π‘₯+ = {
0 if π‘₯ < 0
π‘₯ if π‘₯ β‰₯ 0

and we use the notation x_k as the kth column of the design matrix X (the kth feature vector).
This vector by definition is an N-dimensional column vector.
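These two conventions are easy to get wrong when translating to code (for instance, NumPy's sign function returns 0 at zero, not 1). A small sketch with hypothetical helper names:

```python
def sign(x):
    """This assignment's sign convention: sign(x) = 1 when x >= 0, else -1."""
    return -1.0 if x < 0 else 1.0

def pos_part(x):
    """(x)_+ : the positive part of x."""
    return x if x >= 0 else 0.0
```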

When randomly initializing the parameter vector, I would make sure that the parameters start
at small values. A good strategy here may be to randomly initialize each of the β_j, j = 1, 2, …, p,
parameters from a uniform distribution between −1 and 1.
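A minimal sketch of this initialization in NumPy (the seed is arbitrary and my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)           # any seed; fixing it makes runs reproducible
beta = rng.uniform(-1.0, 1.0, size=9)    # one coefficient per feature, p = 9
```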

Effect of tuning parameter on inferred regression coefficients

You will consider a discrete grid of nine tuning parameter values λ ∈ {10^−2, 10^−1, 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6}, where the tuning parameter is evaluated across a wide range of values on a log scale, as well as six tuning parameter values α ∈ {0, 1/5, 2/5, 3/5, 4/5, 1}.

For each tuning parameter value pair, you will use coordinate descent to infer the best-fit
model. Note that when α = 0, we obtain the lasso estimate, and when α = 1, we obtain the
ridge regression estimate.
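The two grids can be generated directly rather than typed by hand; a sketch in NumPy:

```python
import numpy as np

lambdas = 10.0 ** np.arange(-2, 7)   # 10^-2, 10^-1, ..., 10^6 (nine values)
alphas = np.arange(6) / 5.0          # 0, 1/5, 2/5, 3/5, 4/5, 1 (six values)
```

Looping over all 9 × 6 = 54 (λ, α) pairs then yields one coordinate descent fit (or one cross validation run) per pair.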


Deliverable 1: Illustrate the effect of the tuning parameter on the inferred elastic net
regression coefficients by generating six plots (one for each α value) of nine lines (one for
each of the p = 9 features), with the y-axis as β̂_j, j = 1, 2, …, 9, and the x-axis the
corresponding log-scaled tuning parameter value log10(λ) that generated the particular β̂_j.
Label both axes in all six plots. Without the log scaling of the tuning parameter λ, the plots
will look distorted.

Choosing the best tuning parameter

You will consider a discrete grid of nine tuning parameter values λ ∈ {10^−2, 10^−1, 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6}, where the tuning parameter is evaluated across a wide range of values on a log scale, as well as six tuning parameter values α ∈ {0, 1/5, 2/5, 3/5, 4/5, 1}.

For each tuning parameter value pair, perform five-fold cross validation and choose the pair of
λ and α values that give the smallest

CV_(5) = (1/5) Σ_{i=1}^{5} MSE_i

where MSE_i is the mean squared error on the validation set of the ith fold.

Note that during the five-fold cross validation, you will hold out one of the five sets (here 80
observations) as the Validation Set and the remaining four sets (the other 320 observations)
will be used as the Training Set. On this Training Set, you will need to center the output and
standardize (center and divide by the standard deviation across samples) each feature. These
identical values used for centering the output and standardizing the input will need to be
applied to the corresponding Validation Set, so that the Validation set is on the same scale.
Because the Training Set changes based on which set is held out for validation, each of the five
pairs of Training and Validation Sets will have different centering and standardization
parameters.
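One way to sketch the fold construction and the training-statistics-only standardization described above (the function names are my own, and X and y denote the raw feature matrix and response):

```python
import numpy as np

def five_fold_indices(N, seed=0):
    """Shuffle 0..N-1 and split into five folds (N = 400 gives folds of 80)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(N), 5)

def standardize_with_train_stats(X_train, y_train, X_val):
    """Center/scale with TRAINING statistics only, then apply them to the validation fold."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    y_bar = y_train.mean()
    return (X_train - mu) / sd, y_train - y_bar, (X_val - mu) / sd, y_bar
```

After predicting on the standardized validation features, the training mean y_bar must be accounted for (added back to the predictions, or subtracted from the validation responses) before computing MSE_i.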

Deliverable 2: Illustrate the effect of the tuning parameters on the cross validation error by
generating a plot of six lines (one for each α value) with the y-axis as CV(5) error, and the
x-axis the corresponding log-scaled tuning parameter value log10(λ) that generated the
particular CV(5) error. Label both axes in the plot. Without the log scaling of the tuning
parameter λ, the plot will look distorted.

Deliverable 3: Indicate the pair of values λ and α that generated the smallest CV(5) error.

Deliverable 4: Given the optimal λ and α pair, retrain your model on the entire dataset of
N = 400 observations and provide the estimates of the p = 9 best-fit model parameters.
How do these estimates compare to the estimates obtained from ridge regression (α = 1
under optimal λ for α = 1) and lasso (α = 0 under optimal λ for α = 0) on the entire
dataset of N = 400 observations?


Deliverable 5: Provide all your source code that you wrote from scratch to perform all analyses
(aside from plotting scripts, which you do not need to turn in) in this assignment, along with
instructions on how to compile and run your code.

Deliverable 6 (extra credit): Implement the assignment using statistical or machine learning
libraries in a language of your choice. Compare the results with those obtained above, and
provide a discussion as to why you believe your results are different if you found them to be
different. This is worth up to 5% additional credit, which would allow you to get up to 105% out
of 100 for this assignment.
