BB Multiple Regression

UNCLASSIFIED / FOUO

National Guard
Black Belt Training
Module 37
Multiple Regression
CPI Roadmap Analyze
Note: Activities and tools vary by project. Lists provided here are not necessarily all-inclusive.
TOOLS
Value Stream Analysis
Process Constraint ID
Takt Time Analysis
Cause and Effect Analysis
Brainstorming
5 Whys
Affinity Diagram
Pareto
Cause and Effect Matrix
FMEA
Hypothesis Tests
ANOVA
Chi Square
Simple and Multiple Regression
ACTIVITIES
Identify Potential Root Causes
Reduce List of Potential Root Causes
Confirm Root Cause to Output Relationship
Estimate Impact of Root Causes on Key Outputs
Prioritize Root Causes
Complete Analyze Tollgate
8-STEP PROCESS
1. Validate the Problem
2. Identify Performance Gaps
3. Set Improvement Targets
4. Determine Root Cause
5. Develop Countermeasures
6. See Countermeasures Through
7. Confirm Results & Process
8. Standardize Successful Processes
DMAIC phases: Define, Measure, Analyze, Improve, Control
Learning Objectives
Understand how to identify correlation with multiple
variables
Learn how to create a mathematical model for the
effect of multiple inputs on an output variable
Understand and identify multicollinearity
Understand how to use best subsets to identify the
best model
Examine unusual observations to learn more about
the data
Multiple Regression
In Simple Linear Regression, we had:
$Y = B_0 + B_1 X$
In Multiple Linear Regression, we have:
$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3$
We'd like to identify which, if any, of the predictor variables are useful in predicting Y
[Diagram: inputs X1 through X5 feeding the output Y, with Y = f(X)]
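As a rough illustration of how the B coefficients in a multiple regression equation can be estimated, the sketch below fits Y = B0 + B1 X1 + B2 X2 by least squares using the normal equations. The function name `fit_least_squares` and the data values are hypothetical and chosen only for illustration; they are not taken from the training data, and this is not a description of how Minitab computes its results.

```python
def fit_least_squares(xs, y):
    """Return [B0, B1, ..., Bk] that minimize the sum of squared errors.

    xs: list of rows, each row a list of k predictor values.
    y:  list of responses, same length as xs.
    """
    # Design matrix with a leading 1 for the constant term B0.
    design = [[1.0] + [float(v) for v in row] for row in xs]
    m = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = []
    for i in range(m):
        row = [sum(r[i] * r[j] for r in design) for j in range(m)]
        row.append(sum(r[i] * yi for r, yi in zip(design, y)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no noise),
# so the fit should recover those three coefficients.
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8], [6, 4]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in xs]
coefficients = fit_least_squares(xs, y)
print([round(b, 6) for b in coefficients])
```

With real, noisy data the recovered coefficients are estimates, which is why the later slides look at their standard errors and P-values.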
When Should I Use Multiple Regression?
The tool depends on the data type. Regression is typically used with a continuous
input and a continuous response but may also be used with count or categorical
inputs and outputs.
                          Independent Variable (X)
Dependent Variable (Y)    Continuous             Attribute
Continuous                Regression             ANOVA
Attribute                 Logistic Regression    Chi-Square (χ²) Test
Basic Steps for Regression Modeling
STEPS, OBJECTIVES, AND KEY QUESTIONS
1. Process Flowchart, SIPOC
   Objective: To identify KPIVs and KPOVs
   Key question: Which KPIVs will significantly improve which KPOVs?
2. Scatter Plot, Histogram
   Objective: To visualize the data
   Key question: Does it look like there is a C&E relationship?
3. Correlation, Test Hypothesis
   Objective: To qualify the C&E relationship (Strength, % Variability, P-value)
   Key question: How strong is the C&E relationship?
4. Regression Analysis
   Objective: To quantify the C&E relationship (Method of Least Squares)
   Key question: What is the prediction equation?
5. Residual Analysis
   Objective: To validate the model selected
   Key question: Is there anything suspicious with the model selected?
KPIV = Key Process Input Variables; KPOV = Key Process Output Variables
Example: Production Plant
A chemical engineer is investigating the amount of silver required in the high-volume production of contact switches for a new Army radio. Although only a small amount of silver is deposited on the switches, a larger amount is wasted through a multiple-step process. She has collected data and would like to develop a prediction model. (Data file: A-06 Production Plant)
Step 1: The variables identified as KPIVs are given below:
X1 = Average temperature of rinse bath (degrees C)
X2 = Speed of reel that feeds the switches through the line (inches/min)
X3 = Thickness of silver deposit (angstroms)
X4 = Water consumed (gallons per day)
Y = Amount of silver consumed (pounds/day)
Source: Applied Regression Analysis, Draper and Smith
What questions would you ask about this data?
Step 2: Visualize the Data!
Data file: A-06 Production Plant.mtw
Select Graph > Matrix Plot
Looking for relationships between variables...
Step 2: Visualize the Data!
This dialog box comes up first
Select Matrix of plots: Simple, since we have only one (Y) variable and no groups
Click on OK to go to the next dialog box
Double click on all of the
variables you want to include in
the Matrix, to place them in the
Graph variables box
Select Matrix Options to move
on to the next dialog box
Step 2: Visualize the Data!
Select Lower left to place all
the graph labels to the
lower left of the boxes
Click on OK here and on the
previous dialog box to get
the matrix
Step 2: Visualize the Data!
Correlation Table
There appear to be some relationships
between certain variables and the response.
[Figure: Matrix Plot of Temp, Speed, Thickness, Water, Amt of Ag; Amt of Ag is the response variable (Y)]
Is this good or bad?
Quantify the Relationships Between Variables
Select Stat > Basic Statistics > Correlation
Step 3: Quantify the relationship
Double click on all of the
variables you want to
include, to place them in
the Variables box
Check to display p-values
(default setting)
Click on OK to get the
Correlation Matrix in your
Session Window
Evaluating coefficients of correlation among predictors...
Correlation Matrix
Correlation Matrix
Pairwise correlations between predictor variables larger than about 0.5 to 0.7 are signs of trouble: multicollinearity. We will explain more shortly.
The TOP number in each pair is the Pearson coefficient of correlation (r-value), while the BOTTOM number is the p-value.
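To make the pairwise check concrete, here is a minimal sketch that computes the Pearson r for every pair of predictors and flags pairs above a chosen threshold. The column names A, B, C, the data values, and the helper names are hypothetical; the 0.7 threshold is the upper end of the range quoted on the slide. P-values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_correlated_predictors(columns, threshold=0.7):
    """Return (name_a, name_b, r) for predictor pairs with |r| > threshold."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) > threshold:
                flagged.append((names[i], names[j], round(r, 3)))
    return flagged


# Hypothetical predictors: B is nearly 2*A, C is unrelated to both.
predictors = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],
    "C": [5, 1, 4, 2, 6, 3],
}
flagged = flag_correlated_predictors(predictors)
print(flagged)
```

Only the A/B pair is flagged in this made-up data, which is the kind of pair the later multicollinearity slides are concerned with.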
Finding the Regression Equation...
Select Stat > Regression > Regression
Step 4: Develop a prediction model
Double click on C5 Amt of Ag to place it in the Response: variable box, then double click on all the variables you want to place in the Predictors: box.
Select Options to go to next
dialog box.
Finding the Regression Equation... (Cont.)
In this dialog box, the only
thing you have to do is check
Variance inflation factors
Click on OK here and on
previous dialog box to get the
regression analysis in your
Session Window
Finding the Regression Equation... (Cont.)
Regression Equation
Minitab displays the following regression equation:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P      VIF
Constant    5.72      10.83     0.53   0.607
Temp        -0.01558  0.02616   -0.60  0.563  1.276
Speed       0.2393    0.2644    0.90   0.383  10.997
Thickness   0.443     1.033     0.43   0.675  11.671
Water       0.04495   0.01481   3.04   0.010  1.731
S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
The P-values indicate whether a particular predictor is significant in the presence of the other predictors in the model
This new model explains 80.9% of response variability
R-Sq(adj) adjusts R-Sq for the number of predictors, penalizing variables that add no real value. It should be used when comparing models with different numbers of predictors
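The relationship between R-Sq and R-Sq(adj), and between Coef, SE Coef, and T, can be checked by hand. The sketch below uses the standard formulas with the numbers from the output above; the sample size of 17 is taken from the N reported on the residual plots later in this module, and the function names are illustrative only.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def t_ratio(coef, se_coef):
    """T statistic for a coefficient: the estimate divided by its standard error."""
    return coef / se_coef


n_obs = 17  # N shown on the residual plots for the Production Plant data
adj = adjusted_r_squared(0.809, n_obs, n_predictors=4)
t_water = t_ratio(0.04495, 0.01481)
print(f"R-Sq(adj) = {adj:.1%}")
print(f"T for Water = {t_water:.2f}")
```

Both values agree with the Minitab output (74.5% and 3.04). Small differences can appear for other rows because the displayed coefficients are rounded.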
Interpreting P-values
The P column gives the p-value for each term in the model
Typically, if a P value is less than or equal to 0.05, the variable is considered significant (i.e., the null hypothesis is rejected)
If a P value is greater than 0.10, the term is typically removed from the model. A practitioner might leave the term in the model if the P value is within the gray region between these two probability levels (0.05 to 0.10)
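The rule of thumb above can be written as a small decision function. This is only a sketch of the slide's guideline: the function name and the returned labels are assumptions made for illustration, and the P values are the ones from the full-model regression output shown earlier.

```python
def classify_term(p_value, keep_at_or_below=0.05, drop_above=0.10):
    """Apply the slide's rule of thumb to a single P value."""
    if p_value <= keep_at_or_below:
        return "significant"
    if p_value > drop_above:
        return "remove"
    return "gray region"


# P values from the full-model output for the Production Plant data.
p_values = {"Temp": 0.563, "Speed": 0.383, "Thickness": 0.675, "Water": 0.010}
decisions = {name: classify_term(p) for name, p in p_values.items()}
print(decisions)
```

Note that the later slides on multicollinearity explain why P values for correlated predictors can be inflated, so a "remove" label here is a prompt for further checking rather than a final decision.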
Variance Inflation Factor
Regression output in Minitab's Session Window:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P      VIF
Constant    5.72      10.83     0.53   0.607
Temp        -0.01558  0.02616   -0.60  0.563  1.276
Speed       0.2393    0.2644    0.90   0.383  10.997
Thickness   0.443     1.033     0.43   0.675  11.671
Water       0.04495   0.01481   3.04   0.010  1.731
S = 0.412748   R-Sq = 80.9%   R-Sq(adj) = 74.5%
High VIF values are signs of trouble (VIF > 10)
Problems with Several Predictor Variables
Sometimes the Xs are correlated (dependent). This condition is
known as Multicollinearity
Multicollinearity can cause problems (sometimes severe)
Estimates of the coefficients are affected (unstable, inflated
variances)
Difficulty isolating the effects of each X
Coefficients depend on which Xs are included in the model
High multicollinearity inflates the standard error estimates,
which increases the P values
In case of extreme multicollinearity, Minitab will throw out one term and give you notice
Graphical Representation of Multicollinearity
[Diagram: overlapping regions labeled Total Variation in Y, Variation Explained by X1, and Variation Explained by X2]
Overlap represents correlation
X1 and X2 are both correlated with Y
X1 and X2 are highly correlated
If X1 is in the model, we don't need X2, and vice versa
Assessing the Degree of Multicollinearity
We use a metric called Variance Inflation Factor (VIF):
$VIF_i = \frac{1}{1 - R_i^2}$
Where:
$R_i^2$ is the $R^2$ value you get when you regress $X_i$ against the other Xs
A large $R_i^2$ suggests that a variable is redundant
Rule of Thumb:
$R_i^2 > 0.9$ is a cause for concern (high degree of collinearity) (VIF > 10)
$0.8 < R_i^2 < 0.9$ (moderate degree of collinearity) (VIF > 5)
For the Production Plant data, Minitab gives us:
Predictor   VIF
Temp        1.276
Speed       10.997
Thickness   11.671
Water       1.731
Select Stat > Regression > Regression > Options > Display variance inflation factors
Two VIFs are a bit large, but in this case, with an R-Sq of 80.9%, some multicollinearity can be tolerated
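The VIF formula can be inverted to see what each reported VIF implies about $R_i^2$. The sketch below does this for the four Production Plant predictors; the function names and the "low" label for VIFs at or below 5 are assumptions added for illustration, while the other two labels follow the rule of thumb above.

```python
import math


def vif_from_r_squared(r_i_squared):
    """VIF_i = 1 / (1 - R_i^2)."""
    return 1.0 / (1.0 - r_i_squared)


def r_squared_from_vif(vif):
    """Invert the VIF formula: R_i^2 = 1 - 1 / VIF_i."""
    return 1.0 - 1.0 / vif


# VIF values reported by Minitab for the Production Plant data.
minitab_vifs = {"Temp": 1.276, "Speed": 10.997, "Thickness": 11.671, "Water": 1.731}
implied_r_squared = {}
for name, vif in minitab_vifs.items():
    implied_r_squared[name] = r_squared_from_vif(vif)
    level = "high" if vif > 10 else "moderate" if vif > 5 else "low"
    # sqrt(VIF) is the factor by which the coefficient's standard error is inflated.
    print(f"{name}: VIF = {vif}, implied R_i^2 = {implied_r_squared[name]:.3f}, "
          f"SE inflation = {math.sqrt(vif):.2f}x ({level})")
```

Speed and Thickness each have an implied $R_i^2$ just above 0.9, meaning each is almost fully explained by the other predictors.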
Some Cautions About the Coefficients
Remember the prediction equation obtained earlier:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Relative importance of predictors cannot be determined from the size of their coefficients:
The coefficients are scale dependent
The coefficients are influenced by correlation among the predictor variables
If a high degree of multicollinearity exists, even the signs of the coefficients may be misleading
Residual Analysis
Select Stat > Regression > Regression
Step 5: Validate the selected model
Is there anything
suspicious with
this model?
Double click on C5 Amt of Ag to place it in the Response variable box, then double click on all the variables you want to place in the Predictors box
Select Graphs to go to next
dialog box
Residual Analysis (Cont.)
Select Four in one to get all four residual plots on one graph, or pick and choose the plots you want
Click on OK here and on the previous dialog box to get the residual plots
Residual Analysis (Cont.)
Residual Analysis (Cont.)
[Figure: Residual Plots for Amt of Ag, four panels: Normal Probability Plot (Percent vs. Residual; N = 17, AD = 0.249, P-Value = 0.705), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
Not too bad overall
If you want to see the value for any observation, just hold your cursor over that point
How to Address Multicollinearity
Eliminate one or more input variables
We'll look at a technique called Best Subsets Regression
Collect additional data
Use process knowledge to determine the principal relationship
Use DOE to further assess the multicollinearity
If neither of two correlated variables is significant, eliminate both from the analysis
Best Subsets Regression
Rather than relying on the p-values alone, the
computer looks at all possible combinations of
variables and prints the resulting model
characteristics
Statistics like adjusted R-Sq and MS Error will improve as important model terms are added, then worsen as junk terms are added to the model
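To show what "all possible combinations of variables" means for the Production Plant example, the short sketch below lists every non-empty subset of the four predictors. It only enumerates the candidate models; fitting each one and computing its statistics is left to the regression software. The variable names in the code are illustrative.

```python
from itertools import combinations

predictors = ["Temp", "Speed", "Thickness", "Water"]

# Every non-empty subset of the predictors is one candidate model.
candidate_models = [
    subset
    for size in range(1, len(predictors) + 1)
    for subset in combinations(predictors, size)
]

for model in candidate_models:
    print(len(model), ", ".join(model))
print("Number of candidate models:", len(candidate_models))
```

With k predictors there are 2^k - 1 candidate models, which is why the search is left to the computer.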
Best Subsets Regression Considerations
Objective: We want to select a model with predictive
accuracy and minimum multicollinearity
Seek compromise between:
Overfitting (including model terms with only
marginal, or no, contribution)
Underfitting (ignoring or deleting relatively
important model terms)
What are some problems with overfitting?
What are some problems with underfitting?
[Diagram: overfit vs. underfit]
Best Subsets Regression
Evaluating Candidate Models
Four things to look at when evaluating candidate models:
1. $R^2$ (large $R^2$ is desired, although $R^2$ increases as we add more predictors to the model, so this should only be used for comparing models with the same number of terms)
2. Adjusted $R^2$ (large is desired)
3. Mallows' Cp statistic (small Cp desired, close to the number of terms in the model)
4. s (the estimate of the standard deviation around the regression)
Generally, the best three models are selected and checked for
significance of all factors and residual assumptions
More on the Mallows C-p Statistic
In practice, the number of parameters needed in the model is the one at which the Mallows C-p statistic is at a minimum
Rule of Thumb:
We want C-p small and close to the number of input variables
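The slides do not give the formula for the Mallows C-p statistic. A common textbook definition is Cp = SSE_p / MSE_full - (n - 2p), where p counts the parameters in the candidate model including the constant. The sketch below applies that definition to the S values from the Best Subsets output shown later in this module, assuming n = 17 observations (the N reported on the residual plots); the function name and argument names are illustrative.

```python
def mallows_cp(s_subset, n_predictors, s_full, n_obs):
    """Mallows Cp from the residual standard deviations S of two models.

    s_subset:     S of the candidate model
    n_predictors: number of input variables in the candidate model
    s_full:       S of the model containing all predictors
    n_obs:        number of observations
    """
    p = n_predictors + 1                      # parameters, including the constant
    sse_subset = s_subset ** 2 * (n_obs - p)  # S^2 = SSE / (n - p)
    mse_full = s_full ** 2
    return sse_subset / mse_full - (n_obs - 2 * p)


n_obs = 17
s_full = 0.41275  # S of the four-predictor model
# (number of input variables, S) for each row of the Best Subsets output
rows = [(1, 0.50387), (1, 0.51836), (2, 0.39047), (2, 0.40200),
        (3, 0.39959), (3, 0.40237), (4, 0.41275)]
cp_values = [mallows_cp(s, k, s_full, n_obs) for k, s in rows]
for (k, s), cp in zip(rows, cp_values):
    print(f"Vars = {k}, S = {s:.5f}, Cp = {cp:.1f}")
```

Under this definition the full model always has Cp equal to its own parameter count (5.0 here), so Cp is only informative for the smaller models.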
Best Subsets Regression
Select Stat > Regression > Best Subsets
Minitab data set: Production Plant
Best Subsets Regression (Cont.)
Enter Response variable
Enter Predictor variables
(Input Variables)
Click on OK to get analysis
in Session Window
Best Subsets Regression (Cont.)
Best Subsets Regression: Amt of Ag versus Temp, Speed, Thickness, Water
Response is Amt of Ag
Vars  R-Sq  R-Sq(adj)  Mallows Cp  S        Predictors marked (of Temp, Speed, Thickness, Water)
1     64.4  62.0       9.4         0.50387  X
1     62.3  59.8       10.7        0.51836  X
2     80.0  77.2       1.5         0.39047  X X
2     78.8  75.8       2.3         0.40200  X X
3     80.6  76.1       3.2         0.39959  X X X
3     80.3  75.8       3.4         0.40237  X X X
4     80.9  74.5       5.0         0.41275  X X X X
What model(s) are the best candidates?
R-Sq: Look for the highest value
when comparing models with the
same number of input variables
Best Subsets Regression (Cont.)
R-Sq (adj): Look for the
highest value when comparing
models with different number
of input variables
Best Subsets Regression (Cont.)
Cp: Look for models where Cp is
small and close to the number of
input variables in the model
Best Subsets Regression (Cont.)
S: We want S, the estimate of
the standard deviation about
the regression, to be as small
as possible
Best Subsets Regression (Cont.)
Once the Candidate Models Are Identified
Evaluate the candidate models under a microscope
Outliers
High leverage
Influential observations
Residuals
Prediction quality
Once a model has been selected, find the new
regression equation
Test its predictive capability for observations NOT
originally used in the modeling
Regression with Reduced Model
We select the best model with two variables, Speed & Water,
and run Minitab again to obtain the new regression equation:
Select Stat > Regression > Regression
Regression with Reduced Model (Cont.)
Enter Amt of Ag as the
Response
Enter only Speed and Water
as Predictors
Click on OK to get analysis
in Session Window
Regression with Reduced Model (Cont.)
Session window of Minitab yields the following regression equation for the reduced model:
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003
S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
to compare with the previous model:
Amt of Ag = 5.7 - 0.0156 Temp + 0.239 Speed + 0.44 Thickness + 0.0449 Water
Predictor   Coef      SE Coef   T      P
Constant    5.72      10.83     0.53   0.607
H2O Temp    -0.01558  0.02616   -0.60  0.563
Speed       0.2393    0.2644    0.90   0.383
Thick.      0.443     1.033     0.43   0.675
Water       0.04495   0.01481   3.04   0.010
S = 0.4127   R-Sq = 80.9%   R-Sq(adj) = 74.5%
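Once the reduced model is chosen, its equation can be used directly for prediction. The sketch below wraps the reduced-model coefficients from the Minitab output in a small function; the function name and the operating point (Speed = 12.0 inches/min, Water = 160.0 gallons per day) are hypothetical values chosen only to illustrate the call.

```python
def predict_amt_of_ag(speed, water):
    """Predicted silver consumption (pounds/day) from the reduced model.

    Coefficients are the Coef column of the reduced-model output:
    Constant 9.919, Speed 0.35689, Water 0.04253.
    """
    return 9.919 + 0.35689 * speed + 0.04253 * water


# Hypothetical operating point, not a row from the data file.
prediction = predict_amt_of_ag(speed=12.0, water=160.0)
print(f"Predicted Amt of Ag: {prediction:.2f} pounds/day")
```

Predictions are only as trustworthy as the model within the range of Speed and Water values actually observed in the data.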
Unusual Observations
Session window of Minitab also gives us the following output:
Obs  Speed  Amt of Ag  Fit      SE Fit  Residual  St Resid
3    11.5   21.0000    20.3784  0.2477  0.6216    2.06R
R denotes an observation with a large standardized residual
An unusual observation here means one with a large standardized residual
Let's see what would happen if we eliminated such an observation from our collected data!
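The St Resid value can be reproduced from the other columns. The sketch below uses two standard identities: the leverage of an observation is h = (SE Fit / S)^2, and the standardized residual is Residual / (S * sqrt(1 - h)). S = 0.3905 comes from the reduced-model output, n = 17 from the residual plots, and p = 3 counts the constant plus two predictors; the function names are illustrative.

```python
import math


def leverage_from_se_fit(se_fit, s):
    """Leverage h of an observation, using SE Fit = S * sqrt(h)."""
    return (se_fit / s) ** 2


def standardized_residual(residual, s, leverage):
    """Residual divided by its estimated standard deviation."""
    return residual / (s * math.sqrt(1 - leverage))


s = 0.3905      # S of the reduced model (Speed, Water)
n_obs = 17      # N reported on the residual plots
n_params = 3    # constant + Speed + Water

h = leverage_from_se_fit(se_fit=0.2477, s=s)
st_resid = standardized_residual(residual=0.6216, s=s, leverage=h)
leverage_cutoff = 2 * n_params / n_obs  # the 2p/n rule from the Regression Checklist

print(f"Leverage h = {h:.3f} (2p/n cutoff = {leverage_cutoff:.3f})")
print(f"Standardized residual = {st_resid:.2f}")
```

The computed standardized residual matches the 2.06 in the Minitab output, and the implied leverage of observation 3 is also above the 2p/n guideline listed in the Regression Checklist.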
Impact of the Unusual Observation
Without the Unusual Observation, the Session window of Minitab yields the following regression equation:
Amt of Ag = 8.61 + 0.237 Speed + 0.0577 Water
Predictor   Coef      SE Coef   T      P
Constant    8.610     1.567     5.49   0.000
Speed       0.23698   0.08960   2.64   0.020
Water       0.05775   0.01226   4.71   0.000
S = 0.3383   R-Sq = 85.0%   R-Sq(adj) = 82.7%
to compare with the regression equation of our previous reduced model:
Amt of Ag = 9.92 + 0.357 Speed + 0.0425 Water
Predictor   Coef      SE Coef   T      P
Constant    9.919     1.694     5.86   0.000
Speed       0.35689   0.08544   4.18   0.001
Water       0.04253   0.01206   3.53   0.003
S = 0.3905   R-Sq = 80.0%   R-Sq(adj) = 77.2%
R-Sq goes up a little because we've gotten rid of noise in the model
Takeaways
Regression analysis can be used with historical data as well as data from designed experiments to build prediction models
Care must be exercised when using historical data
Correlation does not imply a cause and effect relationship
There may be serious problems with multicollinearity and high leverage observations
There are several diagnostic tools available to evaluate regression models:
Fit: $R^2$, adjusted $R^2$, Cp, S
Unusual observations: residual plots, leverage, Cook's D
Multicollinearity: VIFs (Variance Inflation Factors)
Considerations in Regression
Set goals before doing the analysis (what do you want to learn,
how well do you need to predict, etc.).
Gather enough observations to adequately measure error and
check the model assumptions.
Make sure that the sample of data is representative of the
population.
Excessive measurement error of the inputs (Xs) creates
uncertainty in the estimated coefficients, predictions, etc.
Be sure to collect data on all potentially important explanatory
variables.
Regression Checklist
Scatterplots (Y vs. X)
Histograms and/or Boxplots of Ys and Xs
Coefficients
Significance (p < .05 - .10)
$R^2$ and adjusted $R^2$
S
Residuals (no obvious pattern)
Unusual Y values (standardized residuals > 2)
Unusual X values (leverage > 2p/n)
Overfitting vs. underfitting (C-p close to the number of input variables in model)
Multicollinearity (VIF > 5-10)
What other comments or questions
do you have?
References
Neter, Wasserman, and Kutner, Applied Linear Regression Models, Irwin, 1989
Draper and Smith, Applied Regression Analysis, Wiley, 1981
Schulman, Robert S., Statistics in Plain English, Chapman and Hall, 1992
Gunst and Mason, Regression Analysis and its Application, Marcel Dekker, 1980
Myers, Raymond H., Classical and Modern Regression with Applications, Duxbury, 1990
Dielman, Applied Regression Analysis for Business and Economics, Duxbury, 1991
Hosmer and Lemeshow, Applied Logistic Regression, Wiley, 1989
Iglewicz and Hoaglin, How to Detect and Handle Outliers, ASQ Press
Crocker, Douglas C., How to use Regression Analysis in Quality Control, ASQ Press
National Guard
Black Belt Training
APPENDIX
Additional Exercises
Anthony's Pizza
Customer Satisfaction
A Study of Supervisor Performance
Additional Practice Example:
Anthony's Pizza
We have received Voice of the Customer feedback telling us that customers are dissatisfied if we cannot accurately predict the time of their pizza delivery when it is beyond the 30-minute target
We would like to develop a model so that when the
customer calls, we can accurately predict delivery time
Additional Practice Example:
Six Sigma Pizza
Our Minitab data can be found in the file Multiple
Regression - Pizza.mpj
Based on the data that we have collected, we are going to
study the effects of total pizzas ordered, defects, and
incorrect order on delivery time
Additional Practice Exercise:
Customer Satisfaction
Bob Black Belt would like to get a better understanding of the
customer satisfaction data
Use the data provided in the Minitab file A-06 Customer
Satisfaction Data.mtw to create a Regression Model to predict
Overall Satisfaction
Each row of data is a monthly average of how customers rated the
services on a scale of 1-10. For example, in January, the average
of customer ratings for Staff Responsiveness was a 7.9.
Additional Practice Exercise:
Customer Satisfaction (Cont.)
Consider Staff Responsiveness, Check-out Speed,
Frequent Guest Program, and Problems Resolved as
possible inputs that could be used to predict Overall
Satisfaction.
First, study correlation with a Matrix Plot and Correlation
Table
Next, create the initial Regression Model
Find the best combination of inputs with Best Subsets
Finally, run the reduced Regression Model
Additional Practice Exercise:
A Study of Supervisor Performance
A recent survey of clerical employees in a large financial organization
included questions related to employee satisfaction with their
supervisors. The company was interested in any relationships between
specific supervisor characteristics and overall satisfaction with
supervisors as perceived by the employees.
Y = Overall rating of the job being done by the supervisor
X1 = Handles employee complaints
X2 = Does not allow special privileges
X3 = Provides opportunity to learn new things
X4 = Raises based on performance
X5 = Too critical of poor performance
X6 = Rate of advancing to better jobs (employees' perception of their own advancement rate)
Source: Regression Analysis by Example, Chatterjee and Price
Additional Practice Exercise:
A Study of Supervisor Performance
The survey responses were on a scale of 1-5
For purposes of analysis, a score of 1 or 2 was considered
favorable, while a score of 3, 4, or 5 was considered unfavorable
Data was collected from 30 departments, selected randomly from the organization. Each department had approximately 35 employees with one supervisor
For each department, the data was aggregated and the data
recorded was the percent favorable for each item
Data file is A-06 Attitude.mtw
Questions:
Can we predict the overall supervisor rating using this data?
What variable(s) have the strongest correlation with the supervisor rating?
Are there any unusual observations?
Comments on the data?
