Title: | Data Sets from Montgomery, Peck and Vining |
Version: | 2.0 |
Description: | Most of this package consists of data sets from the textbook Introduction to Linear Regression Analysis (3rd ed), by Montgomery, Peck and Vining. Some additional data sets and functions are also included. |
Maintainer: | W.J. Braun <john.braun@ubc.ca> |
LazyLoad: | true |
LazyData: | true |
Depends: | R (≥ 2.0.1), lattice, KernSmooth |
ZipData: | no |
License: | Unlimited |
NeedsCompilation: | no |
Repository: | CRAN |
Packaged: | 2025-04-14 03:30:49 UTC; peterhall |
Author: | W.J. Braun [aut, cre], S. MacQueen [aut] |
Date/Publication: | 2025-04-14 04:30:02 UTC |
Aberrant Crypt Foci in Rat Colons
Description
Numbers of aberrant crypt foci (ACF) in colons of 66 rats subjected to varying numbers of doses of the carcinogen azoxymethane (AOM), sacrificed at 3 different times.
Usage
ACF
Format
This data frame contains the following columns:
- INJ
The number of carcinogen injections
- T
Time of sacrifice, in weeks following injection of AOM
- COUNT
The number of ACF observed in each rat colon
Source
Ranjana P. Bird, Faculty of Human Ecology, University of Manitoba, Winnipeg, Canada.
References
E.A. McLellan, A. Medline and R.P. Bird. Dose response and proliferative characteristics of aberrant crypt foci: putative preneoplastic lesions in rat colon. Carcinogenesis, 12(11): 2093-2098, 1991.
Examples
sapply(split(ACF$COUNT,ACF$T),var)
Confidence Intervals for Bias Corrected Local Regression
Description
Graphs of confidence interval estimates for the bias and standard deviation of bias-corrected local polynomial regression curve estimates.
Usage
BCCIPlot(data, k1=1, k2=2, h, h2, output, g, layout, incl.biasplot, plotdata)
Arguments
data |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
output |
if TRUE, numeric output is printed to the console window. |
g |
the target function, if known (for use in simulations). |
layout |
if TRUE, a 2x1 layout of plots is sent to the graphics device. |
incl.biasplot |
if TRUE, the confidence intervals for the bias of the uncorrected estimate are plotted. |
plotdata |
if TRUE, the data points are plotted as a scatter plot. |
Value
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates. Graphs of the curve estimate confidence limits and the bias confidence limits are also produced.
Author(s)
W. John Braun and Wenkai Ma
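No example accompanies this function. As a rough illustration of the kind of local polynomial curve estimate that the k1 and h arguments refer to, the sketch below fits a local linear curve to simulated data with locpoly from KernSmooth (listed under Depends). It does not call BCCIPlot and makes no claim about how BCCIPlot computes its estimates; the simulated data, the bandwidth 0.1 and the target function g are assumptions made for illustration only.

```r
library(KernSmooth)

set.seed(1)
n <- 200
x <- sort(runif(n))
g <- function(x) sin(2 * pi * x)      # hypothetical target function
y <- g(x) + rnorm(n, sd = 0.3)
xy <- data.frame(x = x, y = y)        # explanatory variable first, response second

# Local linear (degree 1) fit on a 401-point grid with bandwidth h = 0.1
fit <- locpoly(xy$x, xy$y, degree = 1, bandwidth = 0.1, gridsize = 401)

plot(xy, col = "grey")
lines(fit, lwd = 2)                   # estimated curve
curve(g, add = TRUE, lty = 2)         # true curve, known only because the data are simulated
```

The gap between the solid and dashed curves near the peaks and troughs is the kind of smoothing bias that the bias estimator (degree k2, bandwidth h2) is meant to quantify.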
Bias for Bias-Corrected Local Polynomial Regression
Description
Confidence interval estimates for bias in local polynomial regression.
Usage
BCLPBias(xy,k1,k2,h,h2,numgrid=401,alpha=.95)
Arguments
xy |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
numgrid |
number of gridpoints used in the curve estimator. |
alpha |
nominal confidence level. |
Value
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates and corresponding bias-corrected estimates.
Author(s)
W. John Braun and Wenkai Ma
Local Polynomial Bias and Variability
Description
Graphs of confidence interval estimates for the bias and standard deviation of local polynomial regression curve estimates.
Usage
BiasVarPlot(data, k1=1, k2=2, h, h2, output=FALSE, g, layout=TRUE)
Arguments
data |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
output |
if TRUE, numeric output is printed to the console window. |
g |
the target function, if known (for use in simulations). |
layout |
if TRUE, a 2x1 layout of plots is sent to the graphics device. |
Value
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates. Graphs of the curve estimate confidence limits and the bias confidence limits are also produced.
Author(s)
W. John Braun and Wenkai Ma
Biochemical Oxygen Demand
Description
The BioOxyDemand data frame has 14 rows and 2 columns.
Usage
data(BioOxyDemand)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
Source
Devore, J. L. (2000) Probability and Statistics for Engineering and the Sciences (5th ed), Duxbury
Examples
plot(BioOxyDemand)
summary(lm(y ~ x, data = BioOxyDemand))
Cloth Strength Measurements
Description
Strength measurements of 5 bolts of cloth, each treated with varying amounts of a chemical.
Usage
ClothStrength
Format
This data frame contains the following columns:
- Bolt
a factor with 5 levels
- Chemical
a factor with 4 levels
- Strength
a numeric vector
Graphical ANOVA Plot
Description
Graphical analysis of one-way ANOVA data. It allows visualization of the usual F-test.
Usage
GANOVA(dataset, var.equal=TRUE, type="QQ", center=TRUE, shift=0)
Arguments
dataset |
A data frame, whose first column must be the factor variable and whose second column must be the response variable. |
var.equal |
Logical: if TRUE, within-sample variances are assumed to be equal |
type |
"QQ" or "hist" |
center |
if TRUE, center and scale the means to match the scale of the errors |
shift |
on the histogram, lift the points representing the means above the horizontal axis by this amount. |
Value
A QQ-plot or a histogram and rugplot
Author(s)
W. John Braun and Sarah MacQueen
Source
Braun, W.J. 2013. Naive Analysis of Variance. Journal of Statistics Education.
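The "usual F-test" that this plot is meant to visualize can be sketched in base R as follows. The data frame below is hypothetical (it only mimics the required layout, with the factor in the first column and the response in the second), and the code does not call GANOVA.

```r
set.seed(2)
# Hypothetical one-way layout: factor in column 1, response in column 2
dataset <- data.frame(group = gl(3, 5, labels = c("A", "B", "C")),
                      response = rnorm(15, mean = rep(c(10, 10, 12), each = 5)))

# Usual F-test, assuming equal within-sample variances
fit <- anova(lm(response ~ group, data = dataset))
fit

# The same F statistic by hand: between-group over within-group mean square
n <- nrow(dataset)
k <- nlevels(dataset$group)
means <- tapply(dataset$response, dataset$group, mean)
sizes <- tapply(dataset$response, dataset$group, length)
msb <- sum(sizes * (means - mean(dataset$response))^2) / (k - 1)
msw <- sum((dataset$response - ave(dataset$response, dataset$group))^2) / (n - k)
F_by_hand <- msb / msw
F_by_hand
```

The ratio compares the spread of the group means with the spread of the errors, which is the comparison the plot displays graphically.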
Graphical F Plot for Significance in Regression
Description
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
Usage
GFplot(X, y, plotIt=TRUE, sortTrt=FALSE, type="hist", includeIntercept=TRUE, labels=FALSE)
Arguments
X |
The design matrix. |
y |
A numeric vector containing the response. |
plotIt |
Logical: if TRUE, a graph is drawn. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
type |
"QQ" or "hist" |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
labels |
logical: if TRUE, names of predictor variables are used as labels; otherwise, the design matrix column numbers are used as labels |
Value
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
Author(s)
W. John Braun
Source
Braun, W.J. 2013. Regression Analysis and the QR Decomposition. Preprint.
Examples
# Example 1
X <- p4.18[,-4]
y <- p4.18[,4]
GFplot(X, y, type="hist", includeIntercept=FALSE)
title("Evidence of Regression in the Jojoba Oil Data")
# Example 2
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
GFplot(simdata[,-1], simdata[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in Simulated Data Set")
# Example 3
GFplot(table.b1[,-1], table.b1[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in NFL Data Set")
# An example where stepwise AIC selects the complement
# of the set of variables that are actually in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,2))
GFplot(X, y)
GFplot(X, y, sortTrt=TRUE)
GFplot(X, y, type="QQ")
GFplot(X, y, sortTrt=TRUE, type="QQ")
X <- table.b1[,-1] # NFL data
y <- table.b1[,1]
GFplot(X, y)
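The Source entry points to the QR decomposition as the basis for this display. As a minimal sketch of that idea, assuming simulated data and base R only, the F statistic for significance of regression can be recovered from the components of Q'y. This is an illustration of the underlying algebra, not the code used inside GFplot; all object names below are hypothetical.

```r
set.seed(3)
n <- 30
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))     # hypothetical predictors
y <- 1 + 2 * X$x1 + rnorm(n)                      # hypothetical response

Xmat <- cbind(Intercept = 1, as.matrix(X))        # design matrix with intercept column
p <- ncol(Xmat)
qty <- qr.qty(qr(Xmat), y)                        # Q'y: one orthogonal component per entry

ss_reg <- sum(qty[2:p]^2)                         # predictor effects (intercept excluded)
ss_res <- sum(qty[(p + 1):n]^2)                   # remaining components carry only error
F_qr <- (ss_reg / (p - 1)) / (ss_res / (n - p))

F_lm <- summary(lm(y ~ x1 + x2, data = X))$fstatistic[["value"]]
all.equal(F_qr, F_lm)
```

The first p entries of Q'y play the role of the effects and the remaining n - p entries play the role of the errors, so plotting the two sets on a common scale gives a visual version of the F-test.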
Graphical Regression Plot
Description
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
Usage
GRegplot(X, y, sortTrt=FALSE, includeIntercept=TRUE, type="hist")
Arguments
X |
The design matrix. |
y |
A numeric vector containing the response. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
type |
Character: hist, for histogram; dot, for stripchart |
Value
A histogram or dotplot and rugplot
Author(s)
W. John Braun
Source
Braun, W.J. 2014. Visualization of Evidence in Regression Analysis with the QR Decomposition. Preprint.
Examples
# Example 1
X <- p4.18[,-4]
y <- p4.18[,4]
GRegplot(X, y, includeIntercept=FALSE)
title("Evidence of Regression in the Jojoba Oil Data")
# Example 2
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
GRegplot(simdata[,-1], simdata[,1], includeIntercept=FALSE)
title("Evidence of Regression in Simulated Data Set")
# Example 3
GRegplot(table.b1[,-1], table.b1[,1], includeIntercept=FALSE)
title("Evidence of Regression in NFL Data Set")
# An example where stepwise AIC selects the complement
# of the set of variables that are actually in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,1))
GRegplot(X, y)
GRegplot(X, y, sortTrt=TRUE)
X <- table.b1[,-1] # NFL data
y <- table.b1[,1]
GRegplot(X, y)
Juliet
Description
Juliet has 28 rows and 9 columns. The data describe the input and output of the spirit still "Juliet" from Endless Summer Distillery. It is suggested to split the data by the Batch factor for ease of use.
Usage
Juliet
Format
The data frame contains the following 9 columns.
Batch
a factor indicating how many times the volume has been through the still.
Vol1
Volume in litres, initial
P1
Percent alcohol present, initial
LAA1
Litres Absolute Alcohol initial, Vol1*P1
Vol2
Volume in litres, final
P2
Percent alcohol present, final
LAA2
Litres Absolute Alcohol final, Vol2*P2
Yield
Percent yield obtained, LAA2/LAA1
Date
Character, Date of run
Details
The purpose of this information is to determine the optimal initial volume and percentage. The information is broken down by Batch. A batch factor 1 means that it is the first time the liquid has gone through the spirit still. The first run through the still should have the most loss due to the "heads" and "tails".
Literature states that the first run through a spirit still should yield 70 percent.
A batch factor 2 means that it is the second time the liquid has gone through the spirit still.
A batch factor 3 means that it is the third time or more that the liquid has gone through the spirit still.
Each subsequent distillation should result in a higher yield, never to exceed 95 percent.
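The derived columns described under Format can be illustrated with a small worked example. The three runs below are hypothetical and are not rows of Juliet; the division by 100 assumes that the percentages are stored on a 0 to 100 scale, which is an assumption about the units rather than something stated above (the yield ratio itself is unaffected by that choice).

```r
# Hypothetical runs; these are not rows of Juliet
runs <- data.frame(Batch = factor(c(1, 2, 3)),
                   Vol1 = c(400, 300, 250), P1 = c(30, 55, 70),
                   Vol2 = c(120, 180, 185), P2 = c(72, 82, 88))

runs$LAA1 <- runs$Vol1 * runs$P1 / 100      # litres of absolute alcohol going in
runs$LAA2 <- runs$Vol2 * runs$P2 / 100      # litres of absolute alcohol coming out
runs$Yield <- 100 * runs$LAA2 / runs$LAA1   # percent yield
runs
```

In this made-up example the first run yields 72 percent and the later runs yield more, in line with the pattern described above.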
Source
Charisse Woods, Endless Summer Distillery, (2015).
Examples
summary(Juliet)
#Split apart the Batch factor for easier use.
juliet<-split(Juliet,Juliet$Batch)
juliet1<-juliet$'1'
juliet2<-juliet$'2'
juliet3<-juliet$'3'
plot(LAA1~LAA2,data=Juliet)
plot(LAA1~LAA2,data=juliet1)
Local Polynomial Bias
Description
Confidence interval estimates for bias in local polynomial regression.
Usage
LPBias(xy,k1,k2,h,h2,numgrid=401,alpha=.95)
Arguments
xy |
A data frame, whose first column must be the explanatory variable and whose second column must be the response variable. |
k1 |
degree of local polynomial used in curve estimator. |
k2 |
degree of local polynomial used in bias estimator. |
h |
bandwidth for regression estimator. |
h2 |
bandwidth for bias estimator. |
numgrid |
number of gridpoints used in the curve estimator. |
alpha |
nominal confidence level. |
Value
A list containing the confidence interval limits, pointwise estimates of bias, standard deviation of bias, curve estimate, standard deviation of curve estimate, and approximate confidence limits for curve estimates.
Author(s)
W. John Braun and Wenkai Ma
PRESS statistic
Description
Computation of Allen's PRESS statistic for an lm object.
Usage
PRESS(x)
Arguments
x |
An lm object. |
Value
Allen's PRESS statistic.
Author(s)
W.J. Braun
See Also
lm
Examples
data(p4.18)
# supply the data frame directly instead of attaching it
y.lm <- lm(y ~ x1 + I(x1^2), data = p4.18)
PRESS(y.lm)
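For readers who want to see what the statistic measures, the following sketch computes PRESS in base R in two ways, using the cars data set that ships with R. It is an illustration of the definition and is not claimed to be the code inside PRESS().

```r
fit <- lm(dist ~ speed + I(speed^2), data = cars)   # cars ships with base R

# Shortcut: PRESS residuals are ordinary residuals divided by (1 - leverage)
press_hat <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute force: refit without observation i, then predict observation i
press_loo <- sum(sapply(seq_len(nrow(cars)), function(i) {
  fit_i <- lm(dist ~ speed + I(speed^2), data = cars[-i, ])
  (cars$dist[i] - predict(fit_i, newdata = cars[i, ]))^2
}))

all.equal(press_hat, press_loo)
```

The leverage shortcut avoids refitting the model n times, which is why PRESS is cheap to compute for linear models.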
Analysis of Variance Plot for Regression
Description
This function analyzes regression data graphically. It allows visualization of the usual F-test for significance of regression.
Usage
Qyplot(X, y, plotIt=TRUE, sortTrt=FALSE, type="hist", includeIntercept=TRUE, labels=FALSE)
Arguments
X |
The design matrix. |
y |
A numeric vector containing the response. |
plotIt |
Logical: if TRUE, a graph is drawn. |
sortTrt |
Logical: if TRUE, an attempt is made at sorting the predictor effects in descending order. |
type |
"QQ" or "hist" |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
labels |
logical: if TRUE, names of predictor variables are used as labels; otherwise, the design matrix column numbers are used as labels |
Value
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
Author(s)
W. John Braun
Source
Braun, W.J. 2013. Regression Analysis and the QR Decomposition. Preprint.
Examples
# Example 1
X <- p4.18[,-4]
y <- p4.18[,4]
Qyplot(X, y, type="hist", includeIntercept=FALSE)
title("Evidence of Regression in the Jojoba Oil Data")
# Example 2
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
Qyplot(simdata[,-1], simdata[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in Simulated Data Set")
# Example 3
Qyplot(table.b1[,-1], table.b1[,1], type="hist", includeIntercept=FALSE)
title("Evidence of Regression in NFL Data Set")
# An example where stepwise AIC selects the complement
# of the set of variables that are actually in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,2))
Qyplot(X, y)
Qyplot(X, y, sortTrt=TRUE)
Qyplot(X, y, type="QQ")
Qyplot(X, y, sortTrt=TRUE, type="QQ")
X <- table.b1[,-1] # NFL data
y <- table.b1[,1]
Qyplot(X, y)
Plot of Multipliers in Regression ANOVA Plot
Description
This function graphically displays the coefficient multipliers used in the Regression Plot for the given predictor.
Usage
Uplot(X.qr, Xcolumn = 1, ...)
Arguments
X.qr |
The design matrix or the QR decomposition of the design matrix. |
Xcolumn |
The column(s) of the design matrix under study; this can be either integer valued or a character string. |
... |
Additional arguments to barchart. |
Value
A bar plot is displayed.
Author(s)
W. John Braun
Examples
# Jojoba oil data set
X <- p4.18[,-4]
Uplot(X, 1:4)
# NFL data set; see GFplot result first
X <- table.b1[,-1]
Uplot(X, c(2,3,9))
# In this example, x8 is the only predictor in
# the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
pathoeg.F <- GFplot(X, y, plotIt=FALSE)
Uplot(X, "x8")
Uplot(X, 9) # same as above
Uplot(pathoeg.F$QR, 9) # same as above
X <- table.b1[,-1]
Uplot(X, c("x2", "x3", "x9"))
Winnipeg Maximum Temperatures
Description
The Wpgtemp data frame has 7671 observations on daily maximum temperatures at the Winnipeg International Airport for the years 1960 through 1980.
Usage
data(Wpgtemp)
Format
This data frame contains the following columns:
- temperature
A numeric vector containing the temperatures in degrees Celsius
- day
A numeric vector denoting the observation date in numbers of days after December 31, 1959
Source
Environment Canada
Examples
summary(Wpgtemp)
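Because day counts days after December 31, 1959, it can be converted to calendar dates with as.Date. The sketch below uses a few stand-alone day numbers rather than the data frame itself, and assumes the day values run consecutively from 1 to 7671.

```r
day <- c(1, 366, 367, 7671)                    # example values of the day column
dates <- as.Date(day, origin = "1959-12-31")
format(dates)
# "1960-01-01" "1960-12-31" "1961-01-01" "1980-12-31"
```

The last value confirms the count given in the Description: 21 years containing 6 leap years give 21 * 365 + 6 = 7671 days.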
Electricity Usage in Air Conditioning Systems
Description
The airconditioner data frame has 20 observations on 3 variables related to measurements on electricity usage during a summer month for four different kinds of air conditioning systems. The measurements were taken in houses that were randomly selected from five different home types which depended on factors such as floor space.
Usage
data(airconditioner)
Format
This data frame contains the following columns:
- HomeType
a factor representing type of home
- SystemType
a factor representing the air conditioning system
- Usage
a numeric vector representing electricity usage in kWh
Source
Devore, J.L., and Farnum, N. (2005) Applied Statistics for Engineers and Scientists. 2nd Edition, Thomson.
Paper Airplane Flying Distances
Description
Flight distances (in meters) for 12 paper airplanes of varying weights.
Usage
data("airplane")
Format
A data frame with 12 observations on 2 variables.
weight
factor with 3 levels
distance
numeric flight distances
Simulated Paper Airplane Flying Distances - Replicate 1
Description
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there is no difference in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
Usage
data("airplane.sim01")
Format
A data frame with 12 observations on 2 variables.
weight
factor with 3 levels
distance
numeric flight distances
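A data set of this shape could be generated along the following lines. This is only a sketch of simulating under the stated null hypothesis with noise variance 0.96; the common mean of 5 meters, the level names and the balanced design of 4 airplanes per weight are assumptions, and the code does not reproduce the values stored in the package.

```r
set.seed(4)
weight <- gl(3, 4, labels = c("light", "medium", "heavy"))  # hypothetical level names
mu <- 5                                    # hypothetical common mean distance, in meters
distance <- mu + rnorm(12, sd = sqrt(0.96))
sim <- data.frame(weight, distance)

anova(lm(distance ~ weight, data = sim))   # F-test when the null hypothesis is true
```

Repeating the simulation with different seeds shows how much the F statistic varies when weight has no effect.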
Simulated Paper Airplane Flying Distances - Replicate 2
Description
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there is no difference in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
Usage
data("airplane.sim02")
Format
A data frame with 12 observations on 2 variables.
weight
factor with 3 levels
distance
numeric flight distances
Simulated Paper Airplane Flying Distances - Replicate 3
Description
Simulated flight distances (in meters) for 12 paper airplanes of varying weights. These data were generated under the assumption that there are differences in mean flight distance due to differences in the weight of the paper. The noise variance was assumed to be 0.96.
Usage
data("airplane.sim11")
Format
A data frame with 12 observations on 2 variables.
weight
factor with 3 levels
distance
numeric flight distances
Paper Airplane Flying Distances Replicated Study
Description
Flight distances (in meters) for 20 paper airplanes of varying weights.
Usage
data("airplane2")
Format
A data frame with 20 observations on 2 variables.
weight
factor with 4 levels
distance
numeric flight distances
Paper Airplane Flying Distances - Second Replicated Study
Description
Flight distances (in meters) for 20 paper airplanes of varying weights.
Usage
data("airplane3")
Format
A data frame with 20 observations on 2 variables.
weight
factor with 4 levels
distance
numeric flight distances
Blood Pressure Measurements on a Single Adult Male
Description
Systolic and diastolic blood pressure readings were taken on a 56-year-old male over a 39-day period, sometimes in the morning (AM) and sometimes in the evening (PM). A varying number of replicate measurements was taken at each time point.
Usage
bp
Format
A data frame with 121 observations on the following 4 variables.
TimeofDay
factor with levels AM and PM
Date
numeric
Systolic
numeric
Diastolic
numeric
Examples
require(lattice)
xyplot(Date ~ Diastolic|TimeofDay, groups=cut(Systolic, c(0, 130, 140,
200)), data = bp, col=c(3, 1, 2), pch=16)
matplot(bp[, c(3, 4)], type="l", lwd=2, ylab="Pressure")
n <- nrow(bp)
abline(v=(1:n)[bp[,1]=="PM"]-.5, col="grey")
abline(v=(1:n)[bp[,1]=="PM"], col="grey")
abline(v=(1:n)[bp[,1]=="PM"]+.5, col="grey")
bp.stk <- stack(bp, c("Systolic", "Diastolic"))
bp.tmp <- rbind(bp[,1:2], bp[,1:2])
bp.stk <- cbind(bp.tmp, bp.stk)
names(bp.stk) <- c("TimeofDay", "Date", "Pressure", "Type")
reps <- NULL
for (j in rle(paste(bp.stk$Date, bp.stk$TimeofDay))$lengths) reps <- c(reps, (1:j))
bp.stk$Rep <- reps
xyplot(Pressure ~ I(Date+Rep/24)|TimeofDay, groups=Type, data = bp.stk, xlab="Date", pch=16)
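The loop that builds reps numbers the replicate measurements within each run of identical Date and TimeofDay values. The same index can be obtained without a loop by passing the run lengths to sequence; the short key vector below is hypothetical and stands in for paste(bp.stk$Date, bp.stk$TimeofDay).

```r
key <- c("1 AM", "1 AM", "1 PM", "2 AM", "2 AM", "2 AM")  # hypothetical Date/TimeofDay labels
reps <- sequence(rle(key)$lengths)   # restart the count at 1 for every run of equal labels
reps
# 1 2 1 1 2 3
```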
Table B21 - Cement Data
Description
The cement data frame has 13 rows and 5 columns.
Usage
data(cement)
Format
This data frame contains the following columns:
- y
a numeric vector
- x1
a numeric vector
- x2
a numeric vector
- x3
a numeric vector
- x4
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(cement)
pairs(cement)
Cigarette Butts
Description
On a university campus there are a number of areas designated for smoking. Outside of those areas, smoking is not permitted. One of the smoking areas is towards the north end of the campus near some parking lots and a large walkway towards one of the residences. Along the walkway, cigarette butts are visible in the nearby grass. Numbers of cigarette butts were counted at various distances from the smoking area in 200x80 square-cm quadrats located just west of the walkway.
Usage
data("cigbutts")
Format
A data frame with 15 observations on the following 2 variables.
distance
distance from gazebo
count
observed number of butts
Earthquakes Data
Description
The earthquake data frame contains measurements of latitude, longitude, focal depth and magnitude for all earthquakes having magnitude greater than 5.8 between 1964 and 1985.
Usage
earthquake
Format
This data frame contains 2178 observations on the following columns:
- depth
numeric vector of focal depths.
- latitude
latitudinal coordinate.
- longitude
longitudinal coordinate.
- magnitude
numeric vector of magnitudes.
Source
Jeffrey S. Simonoff (1996), Smoothing Methods in Statistics, Springer-Verlag, New York.
Examples
summary(earthquake)
Micro-fires recorded in a lab setting
Description
Rate of spread measurements (inches/s) in each of four directions (East, West, North and South) for each of 31 experimental runs at given slopes, measured over the given time period (in seconds) of each run.
Usage
fires
Format
A data frame with 31 observations on the following 7 variables.
Run
numeric
Slope
numeric: vertical rise divided by horizontal run, inclined from East to West
ROS_E
numeric: rate of spread measured in easterly direction
ROS_W
numeric: rate of spread measured in westerly direction
ROS_S
numeric: rate of spread measured in southerly direction
ROS_N
numeric: rate of spread measured in northerly direction
Time
numeric
Source
Braun, W.J. and Woolford, D.G. (2013) Assessing a stochastic fire spread simulator. Journal of Environmental Informatics. 22:1-12.
Natural Gas Consumption in a Single-Family Residence
Description
This data frame contains the average monthly volume of natural gas used in the furnace of a 1600 square foot house located in London, Ontario, for each month from 2006 until 2011. It also contains the average temperature for each month, and a measure of degree days. Insulation was added to the roof on one occasion, the walls were insulated on a second occasion, and the mid-efficiency furnace was replaced with a high-efficiency furnace on a third occasion.
Usage
data("gasdata")
Format
A data frame with 70 observations on the following 9 variables.
month
numeric 1=January, 12=December
degreedays
numeric, Celsius
cubicmetres
total volume of gas used in a month
dailyusage
average amount of gas used per day
temp
average temperature in Celsius
year
numeric
I1
indicator that roof insulation is present
I2
indicator that wall insulation is present
I3
indicator that high efficiency furnace is present
Length Guesses Data
Description
The lengthguesses list consists of 2 numeric vectors: one giving the guesses, made in feet by 69 students and converted to meters, of the length of an auditorium whose actual length was 13.1 m, and the other giving the guesses of 44 other students, made in meters.
Usage
data(lengthguesses)
Format
This list contains the following elements:
- imperial
a numeric vector of 69 student guesses as to the length of an auditorium using the imperial system, converted to meters.
- metric
a numeric vector of 44 student guesses as to the length of an auditorium using the metric system.
Source
Hills, M. and the M345 Course Team (1986) M345 Statistical Methods, Unit 1: Data, distributions and uncertainty, Milton Keynes: The Open University. Tables 2.1 and 2.4.
References
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. (1994) A Handbook of Small Data Sets. Boca Raton: Chapman & Hall/CRC.
Examples
with(lengthguesses, t.test(imperial, metric))
Lesions in Rat Colons
Description
Numbers of aberrant crypt foci (ACF) in each of six cross-sectional regions of the colons of 66 rats subjected to varying doses of the carcinogen azoxymethane (AOM), sacrificed at 3 different times.
Usage
lesions
Format
This data frame contains the following columns:
- T
Incubation time factor, levels: 6, 12 and 18 weeks
- INJ
Number of injections
- SECT
Section of colon, a factor with levels 1 through 6, where 1 denotes the proximal end of the colon and 6 denotes the distal end
- RAT
Label for animal within a particular T-INJ factor level combination
- ACF.Total
Total number of ACF lesions in a section of a rat's colon
- ACF.total.mult
Sum of ACF multiplicities for a section of a rat's colon
- id
Identifier for each of the 66 rats.
Source
Ranjana P. Bird, University of Northern British Columbia, Prince George, Canada.
References
E.A. McLellan, A. Medline and R.P. Bird. Dose response and proliferative characteristics of aberrant crypt foci: putative preneoplastic lesions in rat colon. Carcinogenesis, 12(11): 2093-2098, 1991.
Examples
summary(lesions)
ACF.All <- aggregate(ACF.Total ~ id + INJ + T, FUN=sum, data = lesions)
lesions.glm <- glm(ACF.Total ~ INJ * T, data = ACF.All, family=poisson)
summary(lesions.glm)
lesions.qp <- glm(ACF.Total ~ INJ * T, data = ACF.All, family=quasipoisson)
summary(lesions.qp)
lesions.noInt <- glm(ACF.Total ~ INJ + T, data = ACF.All, family=quasipoisson)
summary(lesions.noInt)
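The move from the poisson family to the quasipoisson family in these examples allows for overdispersion. As a sketch of what changes, the code below uses simulated overdispersed counts (not the lesions data; all names and parameter values are hypothetical) and shows that the quasi-Poisson dispersion estimate is the Pearson chi-square statistic divided by the residual degrees of freedom.

```r
set.seed(5)
# Hypothetical overdispersed counts; not the lesions data
sim <- data.frame(INJ = rep(1:3, each = 20))
sim$count <- rnbinom(nrow(sim), mu = exp(1 + 0.5 * sim$INJ), size = 2)

fit_p  <- glm(count ~ INJ, data = sim, family = poisson)
fit_qp <- glm(count ~ INJ, data = sim, family = quasipoisson)

# Pearson chi-square over residual degrees of freedom
dispersion <- sum(residuals(fit_p, type = "pearson")^2) / df.residual(fit_p)
dispersion
summary(fit_qp)$dispersion     # the same estimate, as reported by the quasi-Poisson fit

# Coefficients agree; quasi-Poisson standard errors are scaled by sqrt(dispersion)
coef(summary(fit_qp))[, "Std. Error"] / coef(summary(fit_p))[, "Std. Error"]
```

A dispersion estimate well above 1 suggests that Poisson standard errors are too small, which is the usual reason for preferring the quasi-Poisson summaries.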
Motor Vibration Data
Description
Noise measurements for 5 samples of motors, each sample based on a different brand of bearing.
Usage
data("motor")
Format
A data frame with 5 columns.
Brand 1
A numeric vector length 6
Brand 2
A numeric vector length 6
Brand 3
A numeric vector length 6
Brand 4
A numeric vector length 6
Brand 5
A numeric vector length 6
Source
Devore, J. and N. Farnum (2005) Applied Statistics for Engineers and Scientists. Thomson.
noisy image
Description
The noisyimage object is a list. The third component is a noisy version of the third component of tarimage.
Usage
data(noisyimage)
Format
This list contains the following elements:
- x
a numeric vector having 101 elements.
- y
a numeric vector having 101 elements.
- xy
a numeric matrix having 101 rows and columns
Examples
with(noisyimage, image(x, y, xy))
oldwash
Description
The oldwash data frame has 49 rows and 8 columns.
The data are from the start up of a wash still considering the amount of time it takes to heat up to a specified temperature and possible influencing factors.
Usage
data("oldwash")
Format
A data frame with 49 observations on the following 8 variables.
Date
character, the date of the run
startT
degrees Celsius, numeric, initial temperature
endT
degrees Celsius, numeric, final temperature
time
in minutes, numeric, amount of time to reach final temperature
Vol
in litres, numeric, amount of liquid in the tank (max 2000L)
alc
numeric, the percentage of alcohol present in the liquid
who
character, relates to the person who ran the still
batch
factor with levels 1 = first time through, 2 = second time through
Details
The purpose of the wash still is to increase the percentage of alcohol and strip out unwanted particulate. It can take a long time to heat up and this can lead to problems in meeting production time limits.
Source
Charisse Woods, Endless Summer Distillery (2014)
Examples
oldwash.lm<-lm(log(time)~startT+endT+Vol+alc+who+batch,data=oldwash)
summary(oldwash.lm)
par(mfrow=c(2,2))
plot(oldwash.lm)
data2<-subset(oldwash,batch==2)
hist(data2$time)
data1<-subset(oldwash,batch==1)
hist(data1$time)
oldwash.lmc<-lm(time~startT+endT+Vol+alc+who+batch,data=data1)
summary(oldwash.lmc)
plot(oldwash.lmc)
oldwash.lmd<-lm(time~startT+endT+Vol+alc+who+batch,data=data2)
summary(oldwash.lmd)
plot(oldwash.lmd)
Data For Problem 11-12
Description
The p11.12 data frame has 19 observations on satellite cost.
Usage
data(p11.12)
Format
This data frame contains the following columns:
- cost
first-unit satellite cost
- x
weight of the electronics suite
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Simpson and Montgomery (1998)
Examples
data(p11.12)
# supply the data frame directly instead of attaching it
plot(cost ~ x, data = p11.12)
Data set for Problem 11-15
Description
The p11.15 data frame has 9 rows and 2 columns.
Usage
data(p11.15)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Ryan (1997), Stefanski (1991)
Examples
data(p11.15)
plot(p11.15)
# add a lowess smooth without attaching the data frame
with(p11.15, lines(lowess(x, y)))
Data Set for Problem 12-11
Description
The p12.11 data frame has 44 observations on the fraction of active chlorine in a chemical product as a function of time after manufacturing.
Usage
data(p12.11)
Format
This data frame contains the following columns:
- xi
time
- yi
available chlorine
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p12.11)
plot(p12.11)
lines(lowess(p12.11))
Data Set for Problem 12-12
Description
The p12.12 data frame has 18 observations on a chemical experiment. A nonlinear model relating concentration to reaction time and temperature with an additive error is proposed to fit these data.
Usage
data(p12.12)
Format
This data frame contains the following columns:
- x1
reaction time (in minutes)
- x2
temperature (in degrees Celsius)
- y
concentration (in grams/liter)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p12.12)
# fitting the linearized model
logy.lm <- lm(log(y) ~ log(x1) + log(x2), data = p12.12)
summary(logy.lm)
plot(logy.lm, which=1) # checking the residuals
# fitting the nonlinear model
y.nls <- nls(y ~ theta1 * x1^theta2 * x2^theta3, data = p12.12,
start=list(theta1=.95, theta2=.76, theta3=.21))
summary(y.nls)
plot(resid(y.nls)~fitted(y.nls)) # checking the residuals
Data Set for Problem 12-8
Description
The p12.8 data frame has 14 rows and 2 columns.
Usage
data(p12.8)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p12.8)
Data Set for Problem 13-1
Description
The p13.1 data frame has 25 observations on the test-firing results for surface-to-air missiles.
Usage
data(p13.1)
Format
This data frame contains the following columns:
- x
target speed (in Knots)
- y
hit (=1) or miss (=0)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.1)
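A binary hit-or-miss response of this kind is typically analyzed with logistic regression. The sketch below uses simulated data with the same two-column layout (x for target speed, y for hit or miss); the speeds and the coefficients used to generate the data are hypothetical and are not taken from p13.1.

```r
set.seed(6)
# Hypothetical test firings; not the p13.1 data
sim <- data.frame(x = seq(200, 500, length.out = 25))            # target speed
sim$y <- rbinom(25, size = 1, prob = plogis(6 - 0.017 * sim$x))  # 1 = hit, 0 = miss

fit <- glm(y ~ x, data = sim, family = binomial)
summary(fit)

# Estimated probability of a hit at two speeds
p_hat <- predict(fit, newdata = data.frame(x = c(250, 450)), type = "response")
p_hat
```

The same call, with data = p13.1, would presumably apply to the packaged data, since its columns are also named x and y.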
Data Set for Problem 13-16
Description
The p13.16 data frame has 16 rows and 5 columns.
Usage
data(p13.16)
Format
This data frame contains the following columns:
- X1
a numeric vector
- X2
a numeric vector
- X3
a numeric vector
- X4
a numeric vector
- Y
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.16)
Data Set for Problem 13-2
Description
The p13.2
data frame has 20 observations on home ownership.
Usage
data(p13.2)
Format
This data frame contains the following columns:
- x
family income
- y
home ownership (1 = yes, 0 = no)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.2)
Data Set for Problem 13-20
Description
The p13.20
data frame has 30 rows and 2 columns.
Usage
data(p13.20)
Format
This data frame contains the following columns:
- yhat
a numeric vector
- resdev
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.20)
Data Set for Problem 13-3
Description
The p13.3
data frame has 10 observations on the
compressive strength of an alloy fastener used in
aircraft construction.
Usage
data(p13.3)
Format
This data frame contains the following columns:
- x
load (in psi)
- n
sample size
- r
number failing
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.3)
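Here each row gives a count r of failures out of n fasteners tested at load x, so the response can be treated as grouped binomial data. The sketch below assumes the MPV package is installed; the object names fail.glm and ord are hypothetical and chosen only for illustration.

```r
library(MPV)   # assumed installed; supplies p13.3
data(p13.3)
# grouped binomial response: r failures and n - r survivors at each load
fail.glm <- glm(cbind(r, n - r) ~ x, family = binomial, data = p13.3)
summary(fail.glm)
# observed failure proportions with the fitted logistic curve
plot(r/n ~ x, data = p13.3, pch = 16)
ord <- order(p13.3$x)
lines(p13.3$x[ord], fitted(fail.glm)[ord])
```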
Data Set for Problem 13-4
Description
The p13.4
data frame has 11 observations on the
effectiveness of a price discount coupon on the
purchase of a two-litre beverage.
Usage
data(p13.4)
Format
This data frame contains the following columns:
- x
discount
- n
sample size
- r
number redeemed
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.4)
Data Set for Problem 13-5
Description
The p13.5
data frame has 20 observations on
new automobile purchases.
Usage
data(p13.5)
Format
This data frame contains the following columns:
- x1
income
- x2
age of oldest vehicle
- y
new purchase less than 6 months later (1=yes, 0=no)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.5)
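With two candidate predictors, nested logistic models can be compared through the change in deviance. This is a sketch only, assuming the MPV package is installed; buy.glm1 and buy.glm2 are hypothetical object names.

```r
library(MPV)   # assumed installed; supplies p13.5
data(p13.5)
# logistic regression on income alone, then with age of oldest vehicle added
buy.glm1 <- glm(y ~ x1, family = binomial, data = p13.5)
buy.glm2 <- glm(y ~ x1 + x2, family = binomial, data = p13.5)
# likelihood ratio (deviance) test for the contribution of x2
anova(buy.glm1, buy.glm2, test = "Chisq")
```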
Data Set for Problem 13-6
Description
The p13.6
data frame has 15 observations
on the number of failures of a particular type of valve
in a processing unit.
Usage
data(p13.6)
Format
This data frame contains the following columns:
- valve
type of valve
- numfail
number of failures
- months
months
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p13.6)
Data Set for Problem 13-7
Description
The p13.7
data frame has 44 observations on the coal
mines of the Appalachian region of western Virginia.
Usage
data(p13.7)
Format
This data frame contains the following columns:
- y
number of fractures in upper seams of coal mines
- x1
inner burden thickness (in feet), shortest distance between seam floor and the lower seam
- x2
percent extraction of the lower previously mined seam
- x3
lower seam height (in feet)
- x4
time that the mine has been in operation (in years)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Myers (1990)
Examples
data(p13.7)
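Since y is a count of fractures, a Poisson regression with the log link is one plausible starting model. The sketch below assumes the MPV package is installed; frac.glm is a hypothetical object name, and the deviance ratio is only a rough indication of overdispersion.

```r
library(MPV)   # assumed installed; supplies p13.7
data(p13.7)
# Poisson regression of fracture counts on the four mine characteristics
frac.glm <- glm(y ~ x1 + x2 + x3 + x4, family = poisson, data = p13.7)
summary(frac.glm)
# rough overdispersion check: residual deviance per residual degree of freedom
deviance(frac.glm) / df.residual(frac.glm)
```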
Data Set for Problem 14-1
Description
The p14.1
data frame has 15 rows and 3 columns.
Usage
data(p14.1)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
- time
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p14.1)
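The time column suggests checking the regression residuals for serial correlation. The sketch below computes the Durbin-Watson statistic directly from its definition, the sum of squared successive residual differences divided by the residual sum of squares. It assumes the MPV package is installed and that sorting by time puts the rows in observation order; d, e and dw are hypothetical names.

```r
library(MPV)   # assumed installed; supplies p14.1
data(p14.1)
d <- p14.1[order(p14.1$time), ]   # rows in time order (assumption)
y.lm <- lm(y ~ x, data = d)
e <- resid(y.lm)
# Durbin-Watson statistic; values well below 2 suggest positive autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
dw
# residuals against time
plot(e ~ time, data = d, type = "b")
```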
Data Set for Problem 14-2
Description
The p14.2
data frame has 18 rows and 3 columns.
Usage
data(p14.2)
Format
This data frame contains the following columns:
- t
a numeric vector
- xt
a numeric vector
- yt
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p14.2)
Data Set for Problem 15-4
Description
The p15.4
data frame has 40 rows and 4 columns.
Usage
data(p15.4)
Format
This data frame contains the following columns:
- x1
a numeric vector
- x2
a numeric vector
- y
a numeric vector
- set
a factor with levels e and p
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p15.4)
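One way the set factor might be used is for data splitting: fit the model on one subset and check its predictions on the other. The sketch below assumes the MPV package is installed and, as a labeled guess, that level e marks estimation rows and level p marks prediction rows; est, pred and pred.err are hypothetical names.

```r
library(MPV)   # assumed installed; supplies p15.4
data(p15.4)
# assumption: set == "e" marks estimation rows, set == "p" prediction rows
est  <- subset(p15.4, set == "e")
pred <- subset(p15.4, set == "p")
# fit on the estimation rows only
y.lm <- lm(y ~ x1 + x2, data = est)
# prediction errors on the held-out rows
pred.err <- pred$y - predict(y.lm, newdata = pred)
# compare the residual mean square with the mean squared prediction error
c(MSres = summary(y.lm)$sigma^2, MSPE = mean(pred.err^2))
```

A mean squared prediction error much larger than the residual mean square would suggest that the fitted model does not predict new data as well as it fits the estimation data.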
Data Set for Problem 2-10
Description
The p2.10
data frame has 26 observations on weight and
systolic blood pressure for randomly selected males in the 25-30
age group.
Usage
data(p2.10)
Format
This data frame contains the following columns:
- weight
in pounds
- sysbp
systolic blood pressure
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p2.10)
attach(p2.10)
cor.test(weight, sysbp, method="pearson") # tests rho=0
# and computes 95% CI for rho
# using Fisher's Z-transform
detach(p2.10)
Data Set for Problem 2-12
Description
The p2.12
data frame has 12 observations on
the number of pounds of steam used per month at a plant and
the average monthly ambient temperature.
Usage
data(p2.12)
Format
This data frame contains the following columns:
- temp
ambient temperature (in degrees F)
- usage
usage (in thousands of pounds)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p2.12)
attach(p2.12)
usage.lm <- lm(usage ~ temp)
summary(usage.lm)
predict(usage.lm, newdata=data.frame(temp=58), interval="prediction")
detach(p2.12)
Data Set for Problem 2-13
Description
The p2.13
data frame has 16 observations on the number
of days the ozone levels exceeded 0.2 ppm in the
South Coast Air Basin of California for the years 1976 through
1991. It is believed that these levels are related to temperature.
Usage
data(p2.13)
Format
This data frame contains the following columns:
- days
number of days ozone levels exceeded 0.2 ppm
- index
a seasonal meteorological index giving the seasonal average 850 millibar temperature.
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Davidson, A. (1993) Update on Ozone Trends in California's South Coast Air Basin. Air Waste, 43, 226-227.
Examples
data(p2.13)
attach(p2.13)
plot(days~index, ylim=c(-20,130))
ozone.lm <- lm(days ~ index)
summary(ozone.lm)
# plots of confidence and prediction intervals:
ozone.conf <- predict(ozone.lm, interval="confidence")
lines(sort(index), ozone.conf[order(index),2], col="red")
lines(sort(index), ozone.conf[order(index),3], col="red")
ozone.pred <- predict(ozone.lm, interval="prediction")
lines(sort(index), ozone.pred[order(index),2], col="blue")
lines(sort(index), ozone.pred[order(index),3], col="blue")
detach(p2.13)
Data Set for Problem 2-14
Description
The p2.14
data frame has 8 observations on the molar
ratio of sebacic acid and the intrinsic viscosity of copolyesters.
One is interested in predicting viscosity from the sebacic acid ratio.
Usage
data(p2.14)
Format
This data frame contains the following columns:
- ratio
molar ratio
- visc
viscosity
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Hsuie, Ma, and Tsai (1995) Separation and Characterizations of Thermotropic Copolyesters of p-Hydroxybenzoic Acid, Sebacic Acid and Hydroquinone. Journal of Applied Polymer Science, 56, 471-476.
Examples
data(p2.14)
attach(p2.14)
plot(p2.14, pch=16, ylim=c(0,1))
visc.lm <- lm(visc ~ ratio)
summary(visc.lm)
visc.conf <- predict(visc.lm, interval="confidence")
lines(ratio, visc.conf[,2], col="red")
lines(ratio, visc.conf[,3], col="red")
visc.pred <- predict(visc.lm, interval="prediction")
lines(ratio, visc.pred[,2], col="blue")
lines(ratio, visc.pred[,3], col="blue")
detach(p2.14)
Data Set for Problem 2-15
Description
The p2.15
data frame has 8 observations on the impact
of temperature on the viscosity of toluene-tetralin blends.
This particular data set deals with blends with a 0.4 molar
fraction of toluene.
Usage
data(p2.15)
Format
This data frame contains the following columns:
- temp
temperature (in degrees Celsius)
- visc
viscosity (mPa s)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Byers and Williams (1987) Viscosities of Binary and Ternary Mixtures of Polyaromatic Hydrocarbons. Journal of Chemical and Engineering Data, 32, 349-354.
Examples
data(p2.15)
attach(p2.15)
plot(visc ~ temp, pch=16)
visc.lm <- lm(visc ~ temp)
plot(visc.lm, which=1)
detach(p2.15)
Data Set for Problem 2-16
Description
The p2.16
data frame has 33 observations on the
pressure in a tank and the volume of liquid.
Usage
data(p2.16)
Format
This data frame contains the following columns:
- volume
volume of liquid
- pressure
pressure in the tank
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Carroll and Spiegelman (1986) The Effects of Ignoring Small Measurement Errors in Precision Instrument Calibration. Journal of Quality Technology, 18, 170-173.
Examples
data(p2.16)
attach(p2.16)
plot(pressure ~ volume, pch=16)
pressure.lm <- lm(pressure ~ volume)
plot(pressure.lm, which=1)
summary(pressure.lm)
detach(p2.16)
Data Set for Problem 2-17
Description
The p2.17
data frame has 17 observations on the
boiling point of water (in Fahrenheit degrees)
for various barometric pressures (in inches of mercury).
Usage
data(p2.17)
Format
This data frame contains the following columns:
- BoilingPoint
numeric vector
- BarometricPressure
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
References
Atkinson, A.C. (1985) Plots, Transformations and Regression, Clarendon Press, Oxford.
Examples
data(p2.17)
attach(p2.17)
plot(BoilingPoint ~ BarometricPressure, pch=16)
detach(p2.17)
Data Set for Problem 2-18
Description
The p2.18
data frame has 21 observations on the
advertising expenses (in millions of US dollars) and retained
impressions (in millions per week)
for various companies.
Usage
data(p2.18)
Format
This data frame contains the following columns:
- Firm
character vector
- Amount.Spent
numeric vector
- Returned.Impressions
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Examples
data(p2.18)
attach(p2.18)
plot(Returned.Impressions ~ Amount.Spent, pch=16)
detach(p2.18)
Data Set for Problem 2-7
Description
The p2.7
data frame has 20 observations on the
purity of oxygen produced by a fractionation process. It
is thought that oxygen purity is related to the percentage
of hydrocarbons in the main condenser of the processing
unit.
Usage
data(p2.7)
Format
This data frame contains the following columns:
- purity
oxygen purity (percentage)
- hydro
hydrocarbon (percentage)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p2.7)
attach(p2.7)
purity.lm <- lm(purity ~ hydro)
summary(purity.lm)
# confidence interval for mean purity at 1% hydrocarbon:
predict(purity.lm,newdata=data.frame(hydro = 1.00),interval="confidence")
detach(p2.7)
Data Set for Problem 2-9
Description
The p2.9
data frame has 25 rows and 2 columns. See
help on softdrink
for details.
Usage
data(p2.9)
Format
This data frame contains the following columns:
- y
a numeric vector: time
- x
a numeric vector: cases stocked
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p2.9)
Data Set for Problem 4-18
Description
The p4.18
data frame has 13 observations on an
experiment to produce a synthetic analogue to jojoba oil.
Usage
data(p4.18)
Format
This data frame contains the following columns:
- x1
reaction temperature
- x2
initial amount of catalyst
- x3
pressure
- y
yield
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Coteron, Sanchez, Martinez, and Aracil (1993) Optimization of the Synthesis of an Analogue of Jojoba Oil Using a Fully Central Composite Design. Canadian Journal of Chemical Engineering.
Examples
data(p4.18)
y.lm <- lm(y ~ x1 + x2 + x3, data=p4.18)
summary(y.lm)
y.lm <- lm(y ~ x1, data=p4.18)
Data Set for Problem 4-19
Description
The p4.19
data frame has 14 observations on
a designed experiment studying the relationship
between abrasion index for a tire tread compound
and three factors.
Usage
data(p4.19)
Format
This data frame contains the following columns:
- x1
hydrated silica level
- x2
silane coupling agent level
- x3
sulfur level
- y
abrasion index for a tire tread compound
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Derringer and Suich (1980) Simultaneous Optimization of Several Response Variables. Journal of Quality Technology.
Examples
data(p4.19)
attach(p4.19)
y.lm <- lm(y ~ x1 + x2 + x3)
summary(y.lm)
plot(y.lm, which=1)
y.lm <- lm(y ~ x1)
detach(p4.19)
Data Set for Problem 4-20
Description
The p4.20
data frame has 26 observations
on a designed experiment to determine the influence
of five factors on the whiteness of rayon.
Usage
data(p4.20)
Format
This data frame contains the following columns:
- acidtemp
acid bath temperature
- acidconc
cascade acid concentration
- watertemp
water temperature
- sulfconc
sulfide concentration
- amtbl
amount of chlorine bleach
- y
a measure of the whiteness of rayon
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Myers and Montgomery (1995) Response Surface Methodology, pp. 267-268.
Examples
data(p4.20)
y.lm <- lm(y ~ acidtemp, data=p4.20)
summary(y.lm)
Data Set for Problem 5-1
Description
The p5.1
data frame has 8 observations on the impact
of temperature on the viscosity of toluene-tetralin blends.
Usage
data(p5.1)
Format
This data frame contains the following columns:
- temp
temperature
- visc
viscosity
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Byers and Williams (1987) Viscosities of Binary and Ternary Mixtures of Polyaromatic Hydrocarbons. Journal of Chemical and Engineering Data, 32, 349-354.
Examples
data(p5.1)
plot(p5.1)
Data Set for Problem 5-10
Description
The p5.10
data frame has 27 observations on the
effect of three factors on a printing machine's ability
to apply coloring inks on package labels.
Usage
data(p5.10)
Format
This data frame contains the following columns:
- x1
speed
- x2
pressure
- x3
distance
- yi1
response 1
- yi2
response 2
- yi3
response 3
- ybar.i
average response
- si
standard deviation of the 3 responses
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.10)
attach(p5.10)
y.lm <- lm(ybar.i ~ x1 + x2 + x3)
plot(y.lm, which=1)
detach(p5.10)
Data Set for Problem 5-11
Description
The p5.11
data frame has 8 observations on an
experiment with a catapult.
Usage
data(p5.11)
Format
This data frame contains the following columns:
- x1
hook
- x2
arm length
- x3
start angle
- x4
stop angle
- yi1
response 1
- yi2
response 2
- yi3
response 3
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.11)
attach(p5.11)
ybar.i <- apply(p5.11[,5:7], 1, mean)
sd.i <- apply(p5.11[,5:7], 1, sd)
y.lm <- lm(ybar.i ~ x1 + x2 + x3 + x4)
plot(y.lm, which=1)
detach(p5.11)
Data Set for Problem 5-12
Description
The p5.12
data frame has 27 observations on 9
variables.
Usage
data(p5.12)
Format
This data frame contains the following columns:
- i
a numeric vector
- xi
a numeric vector
- x2
a numeric vector
- x3
a numeric vector
- yi1
response 1
- yi2
response 2
- yi3
response 3
- in211.1.gif
a numeric vector
- si
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
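data(p5.12)
# mean and standard deviation of the three responses (columns 5 to 7)
ybar.i <- apply(p5.12[,5:7], 1, mean)
sd.i <- apply(p5.12[,5:7], 1, sd)
# regress the mean response on the three predictors (columns 2 to 4)
y.lm <- lm(ybar.i ~ ., data=p5.12[,2:4])
plot(y.lm, which=1)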
Data Set for Problem 5-2
Description
The p5.2
data frame has 11 observations on the vapor
pressure of water for various temperatures.
Usage
data(p5.2)
Format
This data frame contains the following columns:
- temp
temperature (K)
- vapor
vapor pressure (mm Hg)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.2)
plot(p5.2)
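The scatterplot is strongly curved, so a transformation is worth trying. Physical theory (the Clausius-Clapeyron relation) suggests that the logarithm of vapor pressure is roughly linear in the reciprocal of absolute temperature. The sketch below compares residual plots for the two fits; it assumes the MPV package is installed, and vapor.lm and vapor.tlm are hypothetical names.

```r
library(MPV)   # assumed installed; supplies p5.2
data(p5.2)
# straight-line fit on the original scale
vapor.lm <- lm(vapor ~ temp, data = p5.2)
# fit after transforming: log(vapor) against 1/temp
vapor.tlm <- lm(log(vapor) ~ I(1/temp), data = p5.2)
# residuals versus fitted values for both models, side by side
par(mfrow = c(1, 2))
plot(vapor.lm, which = 1)
plot(vapor.tlm, which = 1)
par(mfrow = c(1, 1))
```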
Data Set for Problem 5-3
Description
The p5.3
data frame has 12 observations on the
number of bacteria surviving in a canned food product and the
number of minutes of exposure to 300 degree Fahrenheit heat.
Usage
data(p5.3)
Format
This data frame contains the following columns:
- bact
number of surviving bacteria
- min
number of minutes of exposure
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.3)
plot(bact~min, data=p5.3)
Data Set for Problem 5-4
Description
The p5.4
data frame has 8 observations on 2 variables.
Usage
data(p5.4)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.4)
plot(y ~ x, data=p5.4)
Data Set for Problem 5-5
Description
The p5.5
data frame has 14 observations on the average
number of defects per 10000 bottles due to stones in the bottle
wall and the number of weeks since the last furnace overhaul.
Usage
data(p5.5)
Format
This data frame contains the following columns:
- defects
a numeric vector
- weeks
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p5.5)
defects.lm <- lm(defects~weeks, data=p5.5)
plot(defects.lm, which=1)
Data Set for Problem 7-1
Description
The p7.1
data frame has 10 observations on a predictor variable.
Usage
data(p7.1)
Format
This data frame contains the following columns:
- x
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.1)
attach(p7.1)
x2 <- x^2
detach(p7.1)
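A squared term is often highly correlated with the predictor itself, and centering the predictor before squaring usually reduces that correlation. The following sketch illustrates the comparison; it assumes the MPV package is installed, and xc is a hypothetical name for the centered predictor.

```r
library(MPV)   # assumed installed; supplies p7.1
data(p7.1)
x <- p7.1$x
# correlation between the predictor and its square
cor(x, x^2)
# the same correlation after centering the predictor at its mean
xc <- x - mean(x)
cor(xc, xc^2)
```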
Data Set for Problem 7-11
Description
The p7.11
data frame has 11 observations on production cost
versus production lot size.
Usage
data(p7.11)
Format
This data frame contains the following columns:
- x
production lot size
- y
average production cost per unit
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.11)
plot(y ~ x, data=p7.11)
Data Set for Problem 7-15
Description
The p7.15
data frame has 6 observations
on vapor pressure of water at various temperatures.
Usage
data(p7.15)
Format
This data frame contains the following columns:
- y
vapor pressure (mm Hg)
- x
temperature (degrees Celsius)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.15)
y.lm <- lm(y ~ x, data=p7.15)
plot(y ~ x, data=p7.15)
abline(coef(y.lm))
plot(y.lm, which=1)
Data Set for Problem 7-16
Description
The p7.16
data frame has 26 observations on the
observed mole fraction solubility of a solute at a
constant temperature.
Usage
data(p7.16)
Format
This data frame contains the following columns:
- y
negative logarithm of the mole fraction solubility
- x1
dispersion partial solubility
- x2
dipolar partial solubility
- x3
hydrogen bonding Hansen partial solubility
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
(1991) Journal of Pharmaceutical Sciences 80, 971-977.
Examples
data(p7.16)
pairs(p7.16)
Data Set for Problem 7-19
Description
The p7.19
data frame has 10 observations on the concentration
of green liquor and paper machine speed from a kraft paper
machine.
Usage
data(p7.19)
Format
This data frame contains the following columns:
- y
green liquor (g/l)
- x
paper machine speed (ft/min)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
(1986) Tappi Journal.
Examples
data(p7.19)
y.lm <- lm(y ~ x + I(x^2), data=p7.19)
summary(y.lm)
Data Set for Problem 7-2
Description
The p7.2
data frame has 10 observations on solid-fuel
rocket propellant weight loss.
Usage
data(p7.2)
Format
This data frame contains the following columns:
- x
months since production
- y
weight loss (kg)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.2)
y.lm <- lm(y ~ x + I(x^2), data=p7.2)
summary(y.lm)
plot(y ~ x, data=p7.2)
Data Set for Problem 7-4
Description
The p7.4
data frame has 12 observations on two variables.
Usage
data(p7.4)
Format
This data frame contains the following columns:
- x
a numeric vector
- y
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.4)
y.lm <- lm(y ~ x + I(x^2), data = p7.4)
summary(y.lm)
Data Set for Problem 7-6
Description
The p7.6
data frame has 12 observations on softdrink
carbonation.
Usage
data(p7.6)
Format
This data frame contains the following columns:
- y
carbonation
- x1
temperature
- x2
pressure
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p7.6)
y.lm <- lm(y ~ x1 + I(x1^2) + x2 + I(x2^2) + I(x1*x2), data=p7.6)
summary(y.lm)
Data Set for Problem 8-11
Description
The p8.11
data frame has 25 observations on the tensile
strength of synthetic fibre used for men's shirts.
Usage
data(p8.11)
Format
This data frame contains the following columns:
- y
tensile strength
- percent
percentage of cotton
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Montgomery (2001)
Examples
data(p8.11)
y.lm <- lm(y ~ percent, data=p8.11)
model.matrix(y.lm)
Data Set for Problem 8-3
Description
The p8.3
data frame has 25 observations on delivery
times taken by a vending machine route driver.
Usage
data(p8.3)
Format
This data frame contains the following columns:
- y
delivery time (in minutes)
- x1
number of cases of product stocked
- x2
distance walked by route driver
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(p8.3)
pairs(p8.3)
Data Set for Problem 9-10
Description
The p9.10
data frame has 31 observations
on the rut depth of asphalt pavements prepared under
different conditions.
Usage
data(p9.10)
Format
This data frame contains the following columns:
- y
change in rut depth/million wheel passes (log scale)
- x1
viscosity (log scale)
- x2
percentage of asphalt in surface course
- x3
percentage of asphalt in base course
- x4
indicator
- x5
percentage of fines in surface course
- x6
percentage of voids in surface course
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Gorman and Toman (1966)
Examples
data(p9.10)
pairs(p9.10)
Pathological Example
Description
Artificial regression data which causes stepwise regression with AIC to produce a highly non-parsimonious model. The true model used to simulate the data has only one real predictor (x8).
Usage
pathoeg
Format
This data frame contains the following columns:
- x1
a numeric vector
- x2
a numeric vector
- x3
a numeric vector
- x4
a numeric vector
- x5
a numeric vector
- x6
a numeric vector
- x7
a numeric vector
- x8
a numeric vector
- x9
a numeric vector
- y
a numeric vector
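The behaviour described above can be examined with step(), which selects terms by AIC. This is a sketch under stated assumptions: the MPV package is installed, and backward elimination from the full model is used, which may differ from the stepwise variant the description has in mind. The names full.lm, aic.lm and true.lm are hypothetical.

```r
library(MPV)   # assumed installed; supplies pathoeg
# full model with all nine candidate predictors
full.lm <- lm(y ~ ., data = pathoeg)
# backward elimination by AIC (step() default when no scope is given)
aic.lm <- step(full.lm, trace = 0)
formula(aic.lm)
# model containing only the true predictor, for comparison
true.lm <- lm(y ~ x8, data = pathoeg)
AIC(true.lm, aic.lm)
```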
Unstack Vectors into a Data Frame
Description
Pads the vectors of an unstacked data frame with missing values so that all vectors in the resulting list have equal length. The list is then coerced to a data frame, which makes it easier to produce tables.
Usage
postunstack(x, form, ...)
Arguments
x |
A list or data frame to be stacked or unstacked. |
form |
a two-sided formula whose left side evaluates to the vector to be unstacked and whose right side evaluates to the indicator of the groups to create. Defaults to 'formula(x)' in the data frame method for 'unstack'. |
... |
further arguments passed to or from other methods. |
Value
a data frame of columns according to the formula 'form'. If the columns do not all have the same length, the resulting list is coerced to a data frame by padding with missing values.
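The padding idea can be sketched in base R. The code below is not the package's implementation; it only illustrates the step described above, using a hypothetical toy data frame (toy, with columns resp and grp) whose groups have unequal sizes.

```r
# toy data with unequal group sizes (hypothetical values)
toy <- data.frame(resp = c(5.1, 4.8, 6.0, 5.5, 7.2, 6.9),
                  grp  = c("a", "a", "a", "b", "b", "c"))
# unstack() returns a list here because the groups differ in length
groups <- unstack(toy, resp ~ grp)
# pad each group with NA up to the length of the longest group
n.max <- max(sapply(groups, length))
padded <- as.data.frame(lapply(groups,
                               function(v) c(v, rep(NA, n.max - length(v)))))
padded
```

The result has one column per group, with missing values filling out the shorter groups.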
Author(s)
W. John Braun
See Also
unstack
QQ Plot for Analysis of Variance
Description
This function uses a QQ plot to display the weight of the evidence against null main effects in data coming from a one-factor design. In practice, it is often called via the function GANOVA.
Usage
qqANOVA(x, y, plot.it = TRUE, xlab = deparse(substitute(x)),
ylab = deparse(substitute(y)), ...)
Arguments
x |
numeric vector of errors |
y |
numeric vector of scaled responses |
plot.it |
logical value indicating whether to draw the plot |
xlab |
character, x-axis label |
ylab |
character, y-axis label |
... |
any other arguments for the plot function |
Value
A QQ plot is drawn.
Author(s)
W. John Braun
Quadratic Overlay
Description
Overlays the fitted curve from a quadratic regression model on an existing scatterplot.
Usage
quadline(lm.obj, ...)
Arguments
lm.obj |
A |
... |
Other arguments to the |
Value
The function superimposes a quadratic curve onto an existing scatterplot.
Author(s)
W.J. Braun
See Also
lm
Examples
data(p4.18)
attach(p4.18)
y.lm <- lm(y ~ x1 + I(x1^2))
plot(x1, y)
quadline(y.lm)
detach(p4.18)
Radon Release
Description
Percentage of radon from water released in showers with orifices of various diameters. Four replicates were obtained, but it should be noted that the temperatures for the replicates (in degrees Celsius) are 21, 30, 38, and 46, respectively. This information should really be accounted for in any serious analysis of the data.
Usage
data("radon")
Format
A data frame with 15 observations on the following 5 variables.
diameter
shower orifice diameter in mm
rep 1
percentage radon released in first run
rep 2
percentage radon released in second run
rep 3
percentage radon released in third run
rep 4
percentage radon released in fourth run
Source
Hazin, C.A. and Eichholz, G.G. (1992) Influence of Water Temperature and Shower Head Orifice Size on the Release of Radon During Showering, Environment International, 18, 363-369.
Length Measurements on Rectangular Objects
Description
Heights, widths and diagonal lengths of several rectangular objects, such as books and photographs, were measured. Only the data in MPV versions 1.62 and later can be trusted; there were errors in the third column in previous versions.
Usage
rectangles
Format
A data frame with 51 observations on the following 4 variables.
h
numeric, heights in centimeters
w
numeric, widths in centimeters
d
numeric, diagonal lengths in centimeters
index
numeric, sum of squares of heights and widths
Examples
x <- sqrt(rectangles$index)
y <- rectangles$d
y.lp <- locpoly(x, y, bandwidth=dpill(x,y), degree=1)
plot(y ~ x)
lines(y.lp, col=2, lty=2)
abline(0,1) # y = x + measurement error
plot(y.lp$y - y.lp$x, type="l", col=2)
Seismic Timing Data
Description
The seismictimings
data frame has 504 rows and 3 columns.
It records the thickness of a layer of Alberta substratum, as measured by
several transects of geophones.
Usage
seismictimings
Format
This data frame contains the following columns:
- x
longitudinal coordinate of geophone.
- y
latitudinal coordinate of geophone.
- z
time for signal to pass through substratum.
Examples
plot(y ~ x, data = seismictimings)
Softdrink Data
Description
The softdrink
data frame has 25 rows and 3 columns.
Usage
data(softdrink)
Format
This data frame contains the following columns:
- y
a numeric vector
- x1
a numeric vector
- x2
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(softdrink)
Soil Moisture Data
Description
Percent soil moisture measurements at 26 different locations in a forest in southwestern British Columbia. Some of the locations were in stands that had been thinned.
Usage
data("soilstudy")
Format
A data frame with 26 observations on the following 3 variables.
location
character vector identifying forest stand
moisture
numeric vector, percentage moisture content
treatment
character vector identifying fuel treatment: thinned or unthinned
Source
Millikin, R.L., Braun, W.J., Alexander, M.E., Fani, S. (2024), The Impact of Fuel Thinning on the Microclimate in Coastal Rainforest Stands of Southwestern British Columbia, Canada. Fire. Vol 7(8), 2024, pp 285-309.
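A first look at these data might compare moisture between thinned and unthinned locations. The sketch below assumes the MPV package (version 2.0 or later) is installed; it is descriptive only and ignores any grouping of locations within stands, which a serious analysis would need to consider.

```r
library(MPV)   # assumed installed; supplies soilstudy
data("soilstudy")
# side-by-side boxplots of moisture for each fuel treatment
boxplot(moisture ~ treatment, data = soilstudy)
# mean moisture within each treatment
moist.means <- tapply(soilstudy$moisture, soilstudy$treatment, mean)
moist.means
```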
Solar Data
Description
The solar
data frame has 29 rows and 6 columns.
Usage
data(solar)
Format
This data frame contains the following columns:
- total.heat.flux
a numeric vector
- insolation
a numeric vector
- focal.pt.east
a numeric vector
- focal.pt.south
a numeric vector
- focal.pt.north
a numeric vector
- time.of.day
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(solar)
Stain Removal Data
Description
Data on an experiment to remove ketchup stains from white cotton
fabric by soaking the stained fabric in one of five substrates for
one hour. Remaining stains were scored visually and subjectively
according to a 6-point scale (0 = completely clean, 5 = no change).
The stain
data frame has 15 rows and 2 columns.
Usage
data(stain)
Format
This data frame contains the following columns:
- treatment
a factor
- response
a numeric vector
Examples
data(stain)
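With one factor and a numeric score, a one-way analysis of variance is a natural first model. Because the scores are ordinal, a rank-based test is shown alongside it. This is a sketch, assuming the MPV package is installed; stain.aov is a hypothetical object name.

```r
library(MPV)   # assumed installed; supplies stain
data(stain)
# scores by treatment
boxplot(response ~ treatment, data = stain)
# one-way analysis of variance
stain.aov <- aov(response ~ treatment, data = stain)
summary(stain.aov)
# rank-based alternative for the ordinal scores
kruskal.test(response ~ treatment, data = stain)
```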
Table B1
Description
The table.b1
data frame has 28 observations on National
Football League 1976 Team Performance.
Usage
data(table.b1)
Format
This data frame contains the following columns:
- y
Games won in a 14 game season
- x1
Rushing yards
- x2
Passing yards
- x3
Punting average (yards/punt)
- x4
Field Goal Percentage (FGs made/FGs attempted)
- x5
Turnover differential (turnovers acquired - turnovers lost)
- x6
Penalty yards
- x7
Percent rushing (rushing plays/total plays)
- x8
Opponents' rushing yards
- x9
Opponents' passing yards
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b1)
attach(table.b1)
y.lm <- lm(y ~ x2 + x7 + x8)
summary(y.lm)
# over-all F-test:
y.null <- lm(y ~ 1)
anova(y.null, y.lm)
# partial F-test for x7:
y7.lm <- lm(y ~ x2 + x8)
anova(y7.lm, y.lm)
detach(table.b1)
Table B10
Description
The table.b10
data frame has 40 observations
on kinematic viscosity of a certain solvent system.
Usage
data(table.b10)
Format
This data frame contains the following columns:
- x1
Ratio of 2-methoxyethanol to 1,2-dimethoxyethane
- x2
Temperature (in degrees Celsius)
- y
Kinematic viscosity (0.000001 m^2/s)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Viscosimetric Studies on 2-Methoxyethanol + 1,2-Dimethoxyethane Binary Mixtures from -10 to 80 °C. Canadian Journal of Chemical Engineering, 75, 494-501.
Examples
data(table.b10)
attach(table.b10)
y.lm <- lm(y ~ x1 + x2)
summary(y.lm)
detach(table.b10)
Table B11
Description
The table.b11
data frame has 38 observations on the
quality of Pinot Noir wine.
Usage
data(table.b11)
Format
This data frame contains the following columns:
- Clarity
a numeric vector
- Aroma
a numeric vector
- Body
a numeric vector
- Flavor
a numeric vector
- Oakiness
a numeric vector
- Quality
a numeric vector
- Region
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b11)
attach(table.b11)
Quality.lm <- lm(Quality ~ Clarity + Aroma + Body + Flavor + Oakiness +
factor(Region))
summary(Quality.lm)
detach(table.b11)
Table B12
Description
The table.b12
data frame has 32 rows and 6 columns.
Usage
data(table.b12)
Format
This data frame contains the following columns:
- temp
a numeric vector
- soaktime
a numeric vector
- soakpct
a numeric vector
- difftime
a numeric vector
- diffpct
a numeric vector
- pitch
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b12)
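A possible regression fit is sketched below. It assumes the MPV package is installed and, since the Format section does not say which column is the response, it treats pitch as the response and the remaining five columns as predictors; that choice is an assumption.

```r
library(MPV)  # assumption: the MPV package is installed
data(table.b12)
# Assumption: pitch is the response; the other five columns are predictors
pitch.lm <- lm(pitch ~ temp + soaktime + soakpct + difftime + diffpct,
               data = table.b12)
summary(pitch.lm)
# Residuals against fitted values, as a quick check of the fit
plot(pitch.lm, which = 1)
```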
Table B13
Description
The table.b13
data frame has 40 rows and 7 columns.
Usage
data(table.b13)
Format
This data frame contains the following columns:
- y
a numeric vector
- x1
a numeric vector
- x2
a numeric vector
- x3
a numeric vector
- x4
a numeric vector
- x5
a numeric vector
- x6
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b13)
Table B14
Description
The table.b14
data frame has 25 observations on the transient
points of an electronic inverter.
Usage
data(table.b14)
Format
This data frame contains the following columns:
- x1
width of the NMOS Device
- x2
length of the NMOS Device
- x3
width of the PMOS Device
- x4
length of the PMOS Device
- x5
a numeric vector
- y
transient point of PMOS-NMOS Inverters
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b14)
y.lm <- lm(y ~ x1 + x2 + x3 + x4, data=table.b14)
plot(y.lm, which=1)
Table B15 - Air Pollution and Mortality Data
Description
The table.b15
data frame has 60 observations on mortality, environmental, and demographic variables for a sample of American cities.
Usage
data(table.b15)
Format
This data frame contains the following columns:
- City
character vector
- Mort
numeric vector, age-adjusted mortality from all causes per 100000
- Precip
numeric vector, precipitation in inches
- Educ
numeric vector, median number of school years completed
- Nonwhite
numeric vector, percentage of 1960 population that is nonwhite
- Nox
numeric vector, relative pollution potential of oxides of nitrogen
- SO2
numeric vector, relative pollution potential of sulfur dioxide
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
References
McDonald, G. C. and Ayers, J.A. [1978], "Some applications of Chernoff faces: A technique for graphically representing multivariate data", in Graphical Representation of Multivariate Data, Academic Press, New York.
Examples
data(table.b15)
pairs(table.b15[,-1])
Table B16 - Life Expectancy Data
Description
The table.b16
data frame has 38 observations on 6 variables. Each observation
corresponds to an individual country.
Usage
data(table.b16)
Format
This data frame contains the following columns:
- Country
character vector
- LifeExp
numeric vector, in years
- People.per.TV
numeric vector
- People.per.Dr
numeric vector
- LifeExpMale
numeric vector, in years
- LifeExpFemale
numeric vector, in years
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
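Examples
A minimal sketch of one way to explore these data, assuming the MPV package is installed. The log transformations of the two "people per" variables and the choice of predictors are illustrative assumptions, not taken from the source.

```r
library(MPV)  # assumption: the MPV package is installed
data(table.b16)
# Life expectancy against log(people per TV)
plot(LifeExp ~ log(People.per.TV), data = table.b16)
# Regression on both log-transformed predictors (illustrative choice)
life.lm <- lm(LifeExp ~ log(People.per.TV) + log(People.per.Dr),
              data = table.b16)
summary(life.lm)
```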
Table B17 - Satisfaction Survey
Description
The table.b17
data frame has 25 observations on 5 variables.
Usage
data(table.b17)
Format
This data frame contains the following columns:
- Satisfaction
numeric vector
- Age
numeric vector, in years
- Severity
numeric vector
- Surgical.Medical
numeric vector
- Anxiety
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
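Examples
The sketch below fits satisfaction on the other four variables and then carries out a partial F-test, in the style of the table.b1 example. It assumes the MPV package is installed; the choice of which terms to test is illustrative only.

```r
library(MPV)  # assumption: the MPV package is installed
data(table.b17)
satisfaction.lm <- lm(Satisfaction ~ Age + Severity + Surgical.Medical + Anxiety,
                      data = table.b17)
summary(satisfaction.lm)
# partial F-test for Surgical.Medical and Anxiety (illustrative choice):
reduced.lm <- lm(Satisfaction ~ Age + Severity, data = table.b17)
anova(reduced.lm, satisfaction.lm)
```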
Table B18
Description
The table.b18
data frame has 16 observations on 9 variables.
Usage
data(table.b18)
Format
This data frame contains the following columns:
- y
numeric vector
- x1
numeric vector
- x2
numeric vector
- x3
numeric vector
- x4
numeric vector
- x5
numeric vector
- x6
numeric vector
- x7
numeric vector
- x8
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Table B19
Description
The table.b19
data frame has 32 observations on 11 variables.
Usage
data(table.b19)
Format
This data frame contains the following columns:
- y
numeric vector
- x1
numeric vector
- x2
numeric vector
- x3
numeric vector
- x4
numeric vector
- x5
numeric vector
- x6
numeric vector
- x7
numeric vector
- x8
numeric vector
- x9
numeric vector
- x10
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Table B2
Description
The table.b2
data frame has 29 rows and 6 columns.
Usage
data(table.b2)
Format
This data frame contains the following columns:
- y
a numeric vector
- x1
a numeric vector
- x2
a numeric vector
- x3
a numeric vector
- x4
a numeric vector
- x5
a numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
Examples
data(table.b2)
Table B20
Description
The table.b20
data frame has 18 observations on 6 variables.
Usage
data(table.b20)
Format
This data frame contains the following columns:
- x1
numeric vector
- x2
numeric vector
- x3
numeric vector
- x4
numeric vector
- x5
numeric vector
- y
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Examples
pairs(table.b20)
Table B22 - Baseball Data
Description
The table.b22
data frame has 30 observations on 12 variables.
Usage
data(table.b22)
Format
This data frame contains the following columns:
- Team
character vector
- Wins
numeric vector
- Batter.Age
numeric vector
- Runs
numeric vector
- HRs
numeric vector
- SLG
numeric vector
- Pitcher.Age
numeric vector
- ERA
numeric vector
- SO
numeric vector
- HRA
numeric vector
- RA.G
numeric vector
- Errors
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Examples
pairs(table.b22[,-1])
Table B23
Description
The table.b23
data frame has 59 observations on 8 variables.
Usage
data(table.b23)
Format
This data frame contains the following columns:
- Player
character vector
- Per
numeric vector
- Lane.Agility.Time..Seconds.
numeric vector
- Shuttle.Run..Seconds.
numeric vector
- Three.Quarter.Sprint..Seconds.
numeric vector
- Standing.Vertical.Leap..Inches.
numeric vector
- Max.Vertical.Leap..Inches.
numeric vector
- Position
character vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
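Examples
A short sketch, assuming the MPV package is installed. It treats Per as the response, which the Format section does not state, and the two predictors are picked only for illustration.

```r
library(MPV)  # assumption: the MPV package is installed
data(table.b23)
# Distribution of Per within each position
boxplot(Per ~ factor(Position), data = table.b23,
        xlab = "Position", ylab = "Per")
# Assumption: Per is the response; predictors chosen for illustration
per.lm <- lm(Per ~ Lane.Agility.Time..Seconds. + Max.Vertical.Leap..Inches.,
             data = table.b23)
summary(per.lm)
```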
Table B24 - Rental Data
Description
The table.b24
data frame has 51 observations on 6 variables.
Usage
data(table.b24)
Format
This data frame contains the following columns:
- City
character vector
- Population
numeric vector
- X95th.Percentile.Income
numeric vector
- Median.Sale.Price
numeric vector
- Median.Price.sqft
numeric vector
- Rental.Price
numeric vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
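Examples
A minimal sketch relating rental price to two of the other variables, assuming the MPV package is installed. The choice of Rental.Price as the response and of these predictors is an assumption made for illustration.

```r
library(MPV)  # assumption: the MPV package is installed
data(table.b24)
plot(Rental.Price ~ Median.Sale.Price, data = table.b24)
# Assumption: Rental.Price is the response of interest
rent.lm <- lm(Rental.Price ~ Median.Sale.Price + Population,
              data = table.b24)
summary(rent.lm)
```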
Table B25 - Golfing Data
Description
The table.b25
data frame has 50 observations on 6 variables.
Usage
data(table.b25)
Format
This data frame contains the following columns:
- Player
character vector
- Average.Score
numeric vector
- SG..Off.the.Tee
character vector
- SG..Approach.to.Green
character vector
- SG..Around.the.Green
character vector
- SG..Putting
character vector
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2021) Introduction to Linear Regression Analysis. 6th Edition, John Wiley and Sons.
Table B3
Description
The table.b3
data frame has observations on gasoline
mileage performance for 32 different automobiles.
Usage
data(table.b3)
Format
This data frame contains the following columns:
- y
Miles/gallon
- x1
Displacement (cubic in)
- x2
Horsepower (hp)
- x3
Torque (ft-lb)
- x4
Compression ratio
- x5
Rear axle ratio
- x6
Carburetor (barrels)
- x7
No. of transmission speeds
- x8
Overall length (in)
- x9
Width (in)
- x10
Weight (lb)
- x11
Type of transmission (1=automatic, 0=manual)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Motor Trend, 1975
Examples
data(table.b3)
attach(table.b3)
y.lm <- lm(y ~ x1 + x6)
summary(y.lm)
# testing for the significance of the regression:
y.null <- lm(y ~ 1)
anova(y.null, y.lm)
# 95% CI for mean gas mileage:
predict(y.lm, newdata=data.frame(x1=275, x6=2), interval="confidence")
# 95% PI for gas mileage:
predict(y.lm, newdata=data.frame(x1=275, x6=2), interval="prediction")
detach(table.b3)
Table B4
Description
The table.b4
data frame has 24 observations on property
valuation.
Usage
data(table.b4)
Format
This data frame contains the following columns:
- y
sale price of the house (in thousands of dollars)
- x1
taxes (in thousands of dollars)
- x2
number of baths
- x3
lot size (in thousands of square feet)
- x4
living space (in thousands of square feet)
- x5
number of garage stalls
- x6
number of rooms
- x7
number of bedrooms
- x8
age of the home (in years)
- x9
number of fireplaces
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Narula, S.C. and Wellington, J.F. (1977) Prediction, Linear Regression and the Minimum Sum of Relative Errors. Technometrics, 19.
Examples
data(table.b4)
attach(table.b4)
y.lm <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)
summary(y.lm)
detach(table.b4)
Data Set for Table B5
Description
The table.b5
data frame has 27 observations on liquefaction.
Usage
data(table.b5)
Format
This data frame contains the following columns:
- y
CO2
- x1
Space time (in min)
- x2
Temperature (in degrees Celsius)
- x3
Percent solvation
- x4
Oil yield (g/100g MAF)
- x5
Coal total
- x6
Solvent total
- x7
Hydrogen consumption
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
(1978) Belle Ayr Liquefaction Runs with Solvent. Industrial Chemical Process Design Development, 17, 3.
Examples
data(table.b5)
attach(table.b5)
y.lm <- lm(y ~ x6 + x7)
summary(y.lm)
detach(table.b5)
Data Set for Table B6
Description
The table.b6
data frame has 28 observations on
a tube-flow reactor.
Usage
data(table.b6)
Format
This data frame contains the following columns:
- y
NbOCl3 concentration (g-mol/l)
- x1
COCl2 concentration (g-mol/l)
- x2
Space time (s)
- x3
Molar density (g-mol/l)
- x4
Mole fraction CO2
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
(1972) Kinetics of Chlorination of Niobium oxychloride by Phosgene in a Tube-Flow Reactor. Industrial and Engineering Chemistry, Process Design Development, 11(2).
Examples
data(table.b6)
# Partial Solution to Problem 3.9
attach(table.b6)
y.lm <- lm(y ~ x1 + x4)
summary(y.lm)
detach(table.b6)
Data Set for Table B7
Description
The table.b7
data frame has 16 observations on
oil extraction from peanuts.
Usage
data(table.b7)
Format
This data frame contains the following columns:
- x1
CO2 pressure (bar)
- x2
CO2 temperature (in degrees Celsius)
- x3
peanut moisture (percent by weight)
- x4
CO2 flow rate (L/min)
- x5
peanut particle size (mm)
- y
total oil yield
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Kilgo, M.B. An Application of Fractional Experimental Designs. Quality Engineering, 1, 19-23.
Examples
data(table.b7)
attach(table.b7)
# partial solution to Problem 3.11:
peanuts.lm <- lm(y ~ x1 + x2 + x3 + x4 + x5)
summary(peanuts.lm)
detach(table.b7)
Table B8
Description
The table.b8
data frame has 36 observations on Clathrate
formation.
Usage
data(table.b8)
Format
This data frame contains the following columns:
- x1
Amount of surfactant (mass percentage)
- x2
Time (min)
- y
Clathrate formation (mass percentage)
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Tanii, T., Minemoto, M., Nakazawa, K., and Ando, Y. Study on a Cool Storage System Using HCFC-141b Clathrate. Canadian Journal of Chemical Engineering, 75, 353-360.
Examples
data(table.b8)
attach(table.b8)
clathrate.lm <- lm(y ~ x1 + x2)
summary(clathrate.lm)
detach(table.b8)
Data Set for Table B9
Description
The table.b9
data frame has 62 observations on an
experimental pressure drop.
Usage
data(table.b9)
Format
This data frame contains the following columns:
- x1
Superficial fluid velocity of the gas (cm/s)
- x2
Kinematic viscosity
- x3
Mesh opening (cm)
- x4
Dimensionless number relating superficial fluid velocity of the gas to the superficial fluid velocity of the liquid
- y
Dimensionless factor for the pressure drop through a bubble cap
Source
Montgomery, D.C., Peck, E.A., and Vining, C.G. (2001) Introduction to Linear Regression Analysis. 3rd Edition, John Wiley and Sons.
References
Liu, C.H., Kan, M., and Chen, B.H. A Correlation of Two-Phase Pressure Drops in Screen-Plate Bubble Column. Canadian Journal of Chemical Engineering, 71, 460-463.
Examples
data(table.b9)
attach(table.b9)
# Partial Solution to Problem 3.13:
y.lm <- lm(y ~ x1 + x2 + x3 + x4)
summary(y.lm)
detach(table.b9)
Target Image
Description
The tarimage
object is a list containing an image and its coordinates.
Most of the values in the image matrix are 0, but there are small regions of 1's.
Usage
data(tarimage)
Format
This list contains the following elements:
- x
a numeric vector having 101 elements.
- y
a numeric vector having 101 elements.
- xy
a numeric matrix having 101 rows and 101 columns.
Examples
with(tarimage, image(x, y, xy))
Graphical t Test for Regression
Description
This function analyzes regression data graphically. It allows visualization of the usual t-tests for individual regression coefficients.
Usage
tplot(X, y, plotIt=TRUE, type="hist", includeIntercept=TRUE)
Arguments
X |
The design matrix. |
y |
A numeric vector containing the response. |
plotIt |
Logical: if TRUE, a graph is drawn. |
type |
"QQ" or "hist" |
includeIntercept |
Logical: if TRUE, the intercept effect is plotted; otherwise, it is omitted from the plot. |
Value
A QQ-plot or a histogram and rugplot, or a list if plotIt=FALSE
Author(s)
W. John Braun
Examples
# Jojoba oil data set
X <- p4.18[,-4]
y <- p4.18[,4]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients in the Jojoba Oil Regression")
# Simulated data set where none of the predictors are in the true model:
set.seed(4571)
Z <- matrix(rnorm(400), ncol=10)
A <- matrix(rnorm(81), ncol=9)
simdata <- data.frame(Z[,1], crossprod(t(Z[,-1]),A))
names(simdata) <- c("y", paste("x", 1:9, sep=""))
X <- simdata[,-1]
y <- simdata[,1]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients for the Simulated Data Set")
# NFL Data set:
X <- table.b1[,-1]
y <- table.b1[,1]
tplot(X, y, type="hist", includeIntercept=FALSE)
title("Tests for Individual Coefficients for the NFL Data Set")
# Simulated Data set where x8 is the only predictor in the true model:
X <- pathoeg[,-10]
y <- pathoeg[,10]
par(mfrow=c(2,2))
tplot(X, y)
tplot(X, y, type="QQ")
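For reference, the usual t-tests for individual coefficients can also be computed numerically with base R. The sketch below does not call tplot and makes no claim about its internals; the simulated data, the seed, and all object names are hypothetical.

```r
# Simulated data (hypothetical): only x1 is in the true model
set.seed(1)
n <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3, data = dat)
# t statistic for each coefficient: estimate divided by its standard error
coefs <- summary(fit)$coefficients
tstats <- coefs[, "Estimate"] / coefs[, "Std. Error"]
# Two-sided p-values from the t distribution on the residual degrees of freedom
pvals <- 2 * pt(abs(tstats), df = df.residual(fit), lower.tail = FALSE)
cbind(t = tstats, p = pvals)
```

These values match the "t value" and "Pr(>|t|)" columns printed by summary(fit).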
Sample of Loblolly Pine Data
Description
A random sample of observations taken from the 'Loblolly' data frame, one per Seed.
Usage
data("tree.sample")
Format
A data frame with 12 observations on the following 2 variables.
height
tree heights (ft)
age
tree ages (yr)
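Examples
A minimal sketch of a straight-line fit of height on age, assuming the MPV package is installed. The linear form is an illustrative assumption, not a model recommended by the source.

```r
library(MPV)  # assumption: the MPV package is installed
data("tree.sample")
plot(height ~ age, data = tree.sample,
     xlab = "Age (yr)", ylab = "Height (ft)")
# Simple linear regression of height on age (illustrative model)
tree.lm <- lm(height ~ age, data = tree.sample)
abline(tree.lm)
summary(tree.lm)
```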
Measurements of the Widths of Book Covers
Description
Measurements in centimeters of the widths of a random collection of books.
Usage
widths
Format
A numeric vector of length 24.
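Examples
A short sketch summarizing the measurements, assuming the MPV package is installed. The t-based interval assumes the widths are roughly normally distributed, which the source does not claim.

```r
library(MPV)  # assumption: the MPV package is installed
hist(widths, xlab = "Width (cm)", main = "Book Cover Widths")
mean(widths)
sd(widths)
# 95% confidence interval for the mean width (assumes approximate normality)
width.ci <- t.test(widths)$conf.int
width.ci
```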
Winnipeg Wind Speed
Description
The windWin80
data frame has 366 observations on midnight and noon wind speeds
at the Winnipeg International Airport for the year 1980.
Usage
data(windWin80)
Format
This data frame contains the following columns:
- h0
a numeric vector containing the wind speeds at midnight.
- h12
a numeric vector containing the wind speeds at the following noon.
Examples
data(windWin80)
ts.plot(windWin80$h12^2)
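The midnight and noon readings can also be compared directly. The sketch below assumes the MPV package is installed; the paired t-test ignores any serial dependence between days, so it should be read as a rough illustration only.

```r
library(MPV)  # assumption: the MPV package is installed
data(windWin80)
# Noon wind speed against the preceding midnight wind speed
plot(h12 ~ h0, data = windWin80,
     xlab = "Midnight wind speed", ylab = "Noon wind speed")
# Daily difference between noon and midnight readings
wind.diff <- with(windWin80, h12 - h0)
summary(wind.diff)
# Paired comparison (ignores day-to-day dependence)
t.test(windWin80$h12, windWin80$h0, paired = TRUE)
```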
Weather Observations for Three Stations in Northwestern Ontario
Description
Daily observations taken from 2012 through 2021 on temperature, rain, snow and wind for Fort Frances, Kenora and Dryden, Ontario.
Usage
wxNWO
Format
A data frame with 10959 observations on the following 31 variables.
Longitude
numeric
Latitude
numeric
Station.Name
character
Climate.ID
numeric
Date.Time
numeric
Year
numeric
Month
numeric
Day
numeric
Data.Quality
numeric
Max.Temp
numeric
Max.Temp.Flag
numeric
Min.Temp
numeric
Min.Temp.Flag
numeric
Mean.Temp
numeric
Mean.Temp.Flag
numeric
Heat.Deg.Days
numeric
Heat.Deg.Days.Flag
numeric
Cool.Deg.Days
numeric
Cool.Deg.Days.Flag
numeric
Total.Rain
numeric
Total.Rain.Flag
numeric
Total.Snow
numeric
Total.Snow.Flag
numeric
Total.Precip
numeric
Total.Precip.Flag
numeric
Snow.on.Ground
numeric
Snow.on.Ground.Flag
numeric
Dir.of.Max.Gust
numeric
Dir.of.Max.Gust.Flag
numeric
Speed.of.Max.Gust
numeric
Speed.of.Max.Gust.Flag
numeric
Source
Environment Canada
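Examples
A minimal sketch of a monthly summary by station, assuming the MPV package is installed and that Month is coded 1 to 12 as the Format section suggests. No particular station names are assumed.

```r
library(MPV)  # assumption: the MPV package is installed
# Mean daily maximum temperature by station and month, ignoring missing values
monthly.max <- with(wxNWO,
                    tapply(Max.Temp, list(Station.Name, Month), mean,
                           na.rm = TRUE))
round(monthly.max, 1)
# Seasonal pattern for each station
matplot(t(monthly.max), type = "l", xlab = "Month",
        ylab = "Mean daily maximum temperature")
```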