Type: Package
Title: Facilitates Analysis of CDC NHANES Data
Version: 1.1.0
Date: 2016-11-28
URL: http://github.com/silentspringinstitute/RNHANES
BugReports: https://github.com/silentspringinstitute/RNHANES/issues
Description: Tools for downloading and analyzing CDC NHANES data, with a focus on analytical laboratory data.
License: Apache License 2.0 | file LICENSE
LazyData: TRUE
Depends: R (≥ 2.10)
Imports: foreign, survey, rvest, xml2, methods, dplyr
Suggests: testthat, knitr, rmarkdown
VignetteBuilder: knitr
RoxygenNote: 5.0.1
NeedsCompilation: no
Packaged: 2016-11-28 16:16:35 UTC; susmann
Author: Herb Susmann [cre, aut], Silent Spring Institute [cph]
Maintainer: Herb Susmann <susmann@silentspring.org>
Repository: CRAN
Date/Publication: 2016-11-29 02:45:46
RNHANES simplifies downloading and analyzing NHANES data.
Description
RNHANES simplifies downloading and analyzing NHANES data.
Translates cycle years into the correct demography filename suffix, e.g. '2001-2002' returns 'B'
Description
Translates cycle years into the correct demography filename suffix, e.g. '2001-2002' returns 'B'
Usage
demography_filename(year)
Arguments
year: NHANES cycle, e.g. "2001-2002"
Value
suffix character e.g. "B"
Download an NHANES data file from a given cycle
Description
Download an NHANES data file from a given cycle
Usage
download_nhanes_file(file_name, year, destination = tempdir(), cache = TRUE)
Arguments
file_name: file name
year: NHANES cycle
destination: directory to download the file into
cache: whether to cache the file
Value
path to the downloaded file
Returns the NHANES file suffix for the given year
Description
Returns the NHANES file suffix for the given year
Usage
file_suffix(year)
Arguments
year: NHANES cycle year (e.g. "2001-2002")
Value
suffix character (e.g. "B" or "C")
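The relationship between cycle and suffix is regular enough to sketch in a few lines of base R. The helper `cycle_suffix` below is hypothetical and written only for illustration; it is not the package's implementation. It assumes two-year cycles starting in 1999 and that the 1999-2000 cycle carries no suffix letter, following the NHANES file naming convention.

```r
# Hypothetical helper (not part of RNHANES): map a cycle string such as
# "2001-2002" to its file suffix letter.
cycle_suffix <- function(year) {
  start <- as.integer(substr(year, 1, 4))   # first year of the cycle
  index <- (start - 1999L) %/% 2L + 1L      # 1999-2000 -> 1, 2001-2002 -> 2, ...
  # Assumption: the first cycle has no suffix; later cycles use B, C, D, ...
  ifelse(index == 1L, "", LETTERS[index])
}

suffixes <- cycle_suffix(c("2001-2002", "2003-2004", "2011-2012"))
print(suffixes)
```

In the package itself, file_suffix(year) should be preferred over a hand-rolled mapping like this one.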
Download an NHANES description file
Description
Download an NHANES description file
Usage
load_nhanes_description(file_name, year, destination = tempdir(),
cache = FALSE)
Arguments
file_name: file name
year: NHANES cycle
destination: directory to download the file into
cache: whether to cache the file
Value
data frame containing the file description
Apply an analysis function to NHANES weighted survey data
Description
Apply an analysis function to each requested variable in NHANES weighted survey data
Usage
nhanes_analyze(analysis_fun, nhanes_data, column, comment_column = "",
weights_column = "", filter = NULL)
Arguments
analysis_fun: function to use to analyze each variable
nhanes_data: data frame containing NHANES data
column: column name of the variable to analyze
comment_column: comment column name of the variable
weights_column: name of the weights column
filter: logical expression used to subset the data
Value
a data frame
List the valid NHANES cycle years
Description
List the valid NHANES cycle years
Usage
nhanes_cycle_years()
Value
vector of NHANES cycle years
List the NHANES data files
Description
List the NHANES data files
Usage
nhanes_data_files(components = "all", destination = tempfile(),
cache = TRUE)
Arguments
components: one of "all", "demographics", "dietary", "examination", "laboratory", "questionnaire"
destination: where to save the file lists
cache: whether to cache the downloaded file lists so they don't have to be re-downloaded every time
Value
data frame of NHANES data files available to download
Examples
## Not run:
# Download a data frame of all the NHANES data files
files <- nhanes_data_files()
# Download a data frame of just the laboratory files
lab_files <- nhanes_data_files(components = "laboratory")
## End(Not run)
Compute detection frequencies of NHANES data
Description
Compute detection frequencies of NHANES data
Usage
nhanes_detection_frequency(nhanes_data, column, comment_column,
weights_column = "", filter = NULL)
Arguments
nhanes_data: data frame containing NHANES data
column: column names of the variables to compute detection frequencies for
comment_column: comment column names of the variables to compute detection frequencies for
weights_column: sample weight column
filter: logical expression used to subset the data
Value
named vector of detection frequencies
Examples
## Not run:
dat <- nhanes_load_data("UHG_G", "2011-2012", demographics = TRUE)
# Compute detection frequency
nhanes_detection_frequency(dat, c("URXUHG"), c("URDUHGLC"))
## End(Not run)
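To make the quantity concrete, here is a minimal base R sketch of a weighted detection frequency on made-up data. It is not the package's implementation, which works through the survey package. It assumes the NHANES convention that a comment code of 0 means the measurement was at or above the limit of detection and 1 means it was below; the data frame `toy` and its values are invented for illustration.

```r
# Made-up data: comment codes and sample weights for five participants.
toy <- data.frame(
  URDUHGLC = c(0, 0, 1, 0, 1),                 # 0 = detected, 1 = below LOD (assumed coding)
  WTSA2YR  = c(1000, 2000, 1500, 500, 1000)    # sample weights
)

detected <- toy$URDUHGLC == 0

# Weighted detection frequency: share of the total weight carried by detects.
weighted_df <- sum(toy$WTSA2YR[detected]) / sum(toy$WTSA2YR)

# Unweighted detection frequency, for comparison.
unweighted_df <- mean(detected)

print(c(weighted = weighted_df, unweighted = unweighted_df))
```

The two numbers differ whenever detects and non-detects carry different weights, which is why the weights column matters for population-level estimates.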
Plot a weighted histogram of an NHANES variable
Description
Plot a weighted histogram of an NHANES variable
Usage
nhanes_hist(nhanes_data, column, comment_column, weights_column = "",
filter = "", transform = "", ...)
Arguments
nhanes_data: data frame containing NHANES data
column: column name of the variable to plot
comment_column: comment column of the variable to plot
weights_column: name of the weights column
filter: logical expression used to subset the data
transform: transformation to apply to the column. Accepts any function name, for example: "log"
...: parameters passed through to the svyhist function
Value
a data frame
Examples
## Not run:
dat <- nhanes_load_data("PFC_G", "2011-2012", demographics = TRUE)
nhanes_hist(dat, "LBXPFOA")
## End(Not run)
Download NHANES data files.
Description
Download NHANES data files.
Usage
nhanes_load_data(file_name, year, destination = tempdir(),
demographics = FALSE, cache = TRUE, recode = FALSE,
recode_data = FALSE, recode_demographics = FALSE,
allow_duplicate_files = FALSE)
Arguments
file_name: NHANES file name (e.g. "EPH") or a vector of file names (e.g. c("EPH", "GHB"))
year: NHANES cycle year (e.g. "2007-2008") or a vector of cycle years
destination: directory to download the files to
demographics: whether to include demographics data in the dataset
cache: whether to cache the file to disk
recode: whether to recode the data and demographics (overrides other parameters)
recode_data: whether to recode just the data
recode_demographics: whether to recode just the demographics
allow_duplicate_files: whether to keep duplicate file name/cycle year pairs in a request. By default (FALSE), duplicates are removed.
Details
If you supply vectors for both file_name and year, then the vectors are paired and each file_name/year pair is downloaded. For example, file_name = c("EPH", "GHB"), year = c("2009-2010", "2011-2012") will download "EPH_F.XPT" and "GHB_G.XPT". In other words, the function does not download every possible combination of file_name and year.
You can specify file names in several formats. In order of specificity:
You can supply the complete filename: "EPH_F.XPT"
You can supply the filename without an extension: "EPH_F"
You can supply the filename without a suffix: "EPH", year = "2009-2010"
If you are loading the same file across multiple years, you must supply the filename without a suffix so that the correct suffix for each year can be used.
This function returns either a list or a data frame. If you load multiple files, the return value will always be a list. This is because the columns may not match in between files. If you load one file, the result will be a data frame.
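The difference between pairing the two vectors and crossing them can be shown in base R without downloading anything. The sketch below only builds the two sets of requests; the variable names `pairs` and `combos` are made up for illustration.

```r
file_name <- c("EPH", "GHB")
year      <- c("2009-2010", "2011-2012")

# Paired, element by element: the behaviour described above (2 requests).
pairs <- data.frame(file_name, year, stringsAsFactors = FALSE)

# Every combination: what the function does NOT do (4 requests).
combos <- expand.grid(file_name = file_name, year = year,
                      stringsAsFactors = FALSE)

print(pairs)
print(nrow(combos))
```

To load the same file for several cycles, repeat the file name in file_name so that each element lines up with the cycle you want.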
Value
If file_name or year is a vector with more than one element, returns a list containing a data frame for each file_name. If file_name and year are both singletons, a data frame is returned.
Examples
## Not run:
nhanes_load_data("UHG", "2011-2012")
# Load data with demographics
nhanes_load_data("UHG", "2011-2012", demographics = TRUE)
# Download to /tmp directory and overwrite the file if it already exists
nhanes_load_data("HDL_E", "2007-2008", destination = "/tmp", cache = FALSE)
## End(Not run)
Download NHANES demography files for a specific cycle.
Description
Download NHANES demography files for a specific cycle.
Usage
nhanes_load_demography_data(year, destination = tempdir(), cache = FALSE)
Arguments
year: NHANES cycle year (e.g. "2011-2012")
destination: directory to download the file to
cache: whether to load the file from disk if it already exists
Examples
## Not run:
nhanes_load_demography_data("2011-2012")
## End(Not run)
Compute quantiles from NHANES weighted survey data
Description
Compute quantiles from NHANES weighted survey data
Usage
nhanes_quantile(nhanes_data, column, comment_column = "",
weights_column = "", quantiles = seq(0, 1, 0.25), filter = NULL)
Arguments
nhanes_data: data frame containing NHANES data
column: column name of the variable to compute quantiles for
comment_column: comment column name of the variable for checking if computed quantiles are below the LOD
weights_column: name of the weights column
quantiles: numeric vector of quantiles to compute (a single number is also accepted)
filter: logical expression used to subset the data
Value
a data frame
Examples
## Not run:
dat <- nhanes_load_data("UHG_G", "2011-2012", demographics = TRUE)
# Compute the 50th, 95th, and 99th percentiles
nhanes_quantile(dat, "URXUHG", "URDUHGLC", "WTSA2YR", c(0.5, 0.95, 0.99))
## End(Not run)
Compute the sample size of NHANES data
Description
Compute the sample size of NHANES data
Usage
nhanes_sample_size(nhanes_data, column, comment_column = "",
weights_column = "", filter = NULL)
Arguments
nhanes_data: data frame containing NHANES data
column: column name of the variable to compute the sample size for
comment_column: comment column name of the variable
weights_column: name of the weights column
filter: logical expression used to subset the data
Value
a data frame
Examples
## Not run:
dat <- nhanes_load_data("UHG_G", "2011-2012", demographics = TRUE)
nhanes_sample_size(dat, "URXUHG", "URDUHGLC")
## End(Not run)
Search the results from nhanes_variables or nhanes_data_files
Description
Search the results from nhanes_variables or nhanes_data_files
Usage
nhanes_search(nhanes_data, query, ..., fuzzy = FALSE, ignore_case = TRUE,
max_distance = 0.2)
Arguments
nhanes_data: NHANES variable list from the nhanes_variables function, or data file list from nhanes_data_files
query: regular expression search query
...: additional arguments to pass to dplyr::filter
fuzzy: whether to use fuzzy string matching for the search (based on edit distances)
ignore_case: whether to ignore case when matching the search query
max_distance: parameter for tuning fuzzy string matching, between 0 and 1
Value
data frame filtered by search query
Examples
## Not run:
nhanes_files <- nhanes_data_files()
# Search for data files about pesticides
nhanes_search(nhanes_files, "pesticides")
## End(Not run)
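The difference between a regular expression search and a fuzzy search can be illustrated with base R's grepl and agrepl on a made-up file list. This is only a sketch of the two matching styles and does not claim to reproduce how nhanes_search is implemented; the data frame `toy_files` and its column names are hypothetical.

```r
# Made-up file list; column names are hypothetical.
toy_files <- data.frame(
  data_file_name = c("PSTPOL_G", "UHG_G", "HDL_E"),
  description    = c("Pesticides - Organochlorine",
                     "Urinary Mercury",
                     "Cholesterol - HDL"),
  stringsAsFactors = FALSE
)

query <- "pesticdes"   # note the typo

# Exact (regular expression) matching, ignoring case: the typo finds nothing.
exact <- grepl(query, toy_files$description, ignore.case = TRUE)

# Fuzzy matching based on edit distance tolerates the missing letter.
fuzzy <- agrepl(query, toy_files$description,
                max.distance = 0.2, ignore.case = TRUE)

print(toy_files[fuzzy, ])
```

A larger maximum distance finds more near matches but also returns more unrelated rows.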
Apply a function from the survey package to NHANES data
Description
Apply a function from the survey package to NHANES data
Usage
nhanes_survey(survey_fun, nhanes_data, column, comment_column = "",
weights_column = "", filter = NULL, analyze = "values",
callback = NULL, ...)
Arguments
survey_fun: the survey package function (e.g. svyquantile or svymean)
nhanes_data: data frame containing NHANES data
column: column name of the variable to analyze
comment_column: comment column name of the variable
weights_column: name of the weights column
filter: logical expression used to subset the data
analyze: one of "values" or "comments", whether to apply the survey function to the value or comment column.
callback: optional function to execute on each row of the data frame
...: other arguments to pass to the survey function
Details
This function provides a generic way to apply any function from the survey package to NHANES data. RNHANES provides specific wrappers for computing quantiles (nhanes_quantile) and detection frequencies (nhanes_detection_frequency), and this function provides a general way to use any survey function.
Value
a data frame
Examples
## Not run:
library(survey)
nhanes_data <- nhanes_load_data("EPH", "2011-2012", demographics = TRUE)
# Compute the mean of triclosan using the svymean function
nhanes_survey(svymean, nhanes_data, "URXTRS", "URDTRSLC", na.rm = TRUE)
# Compute the variance using svyvar
nhanes_survey(svyvar, nhanes_data, "URXTRS", "URDTRSLC", na.rm = TRUE)
## End(Not run)
Build survey objects for NHANES data
Description
Build survey objects for NHANES data
Usage
nhanes_survey_design(nhanes_data, weights_column = "")
Arguments
nhanes_data: data frame containing NHANES data
weights_column: name of the weights column
Value
a survey design object
Examples
## Not run:
dat <- nhanes_load_data("UHG_G", "2011-2012", demographics = TRUE)
design <- nhanes_survey_design(dat, "WTSA2YR")
svymean(~RIDAGEYR, design)
svyglm(URXUHG ~ RIDAGEYR + RIAGENDR, design)
## End(Not run)
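For readers who want to see what a survey design object looks like without downloading data, the sketch below builds one directly with survey::svydesign on made-up data. It assumes the usual NHANES design variables, SDMVPSU (primary sampling unit) and SDMVSTRA (stratum), and is not a statement of how nhanes_survey_design is implemented. The data frame `toy` and all of its values are invented.

```r
library(survey)

# Made-up data: two strata, two PSUs per stratum, two participants per PSU.
toy <- data.frame(
  SDMVSTRA = c(1, 1, 1, 1, 2, 2, 2, 2),
  SDMVPSU  = c(1, 1, 2, 2, 1, 1, 2, 2),
  WTSA2YR  = c(100, 200, 150, 250, 300, 100, 200, 150),
  RIDAGEYR = c(25, 40, 33, 61, 18, 52, 47, 70)
)

# PSU numbers repeat across strata, so nest = TRUE is needed.
design <- svydesign(ids = ~SDMVPSU, strata = ~SDMVSTRA,
                    weights = ~WTSA2YR, nest = TRUE, data = toy)

# Weighted mean age with a design-based standard error.
est <- svymean(~RIDAGEYR, design)
print(est)
```

The point estimate equals the ordinary weighted mean of RIDAGEYR; the design information affects the standard error.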
Load the NHANES comprehensive variable list
Description
Load the NHANES comprehensive variable list
Usage
nhanes_variables(components = "all", destination = tempfile(),
cache = TRUE)
Arguments
components: one of "all", "demographics", "dietary", "examination", "laboratory", "questionnaire"
destination: where to save the variable list
cache: whether to cache the downloaded variable list so it doesn't have to be re-downloaded every time
Value
data frame containing the NHANES variable list
Examples
## Not run:
# Download the comprehensive NHANES variable list
variables <- nhanes_variables()
# Download the variable list and cache it in a specific file
variables <- nhanes_variables(destination = "./nhanes_data")
## End(Not run)
Extract variance/covariance matrix from parameters of svymean
Description
Extract variance/covariance matrix from parameters of svymean
Usage
nhanes_vcov(nhanes_data, columns, weights_column = "", filter = "")
Arguments
nhanes_data: data frame containing NHANES data
columns: columns to include in the svymean call
weights_column: name of the weights column
filter: logical expression used to subset the data
Value
a data frame
Examples
## Not run:
dat <- nhanes_load_data("PFC_G", "2011-2012", demographics = TRUE)
nhanes_vcov(dat, c("LBXPFOA", "LBXPFOS"))
## End(Not run)
Processes a file name to make sure it is valid and has the correct suffix and extension
Description
Processes a file name to make sure it is valid and has the correct suffix and extension. File names that already have an extension (e.g. ".XPT") are not altered.
Usage
process_file_name(file_name, year, extension = ".XPT")
Arguments
file_name: name of the file
year: NHANES cycle year
extension: file extension
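The normalization described above can be sketched in base R as follows. The helper `normalize_file_name` is hypothetical and is not the package's code: it takes the suffix letter directly rather than a cycle year, and it assumes that a suffix is a single capital letter after an underscore.

```r
# Hypothetical helper (not part of RNHANES).
normalize_file_name <- function(file_name, suffix, extension = ".XPT") {
  has_extension <- grepl("\\.[A-Za-z]+$", file_name)
  has_suffix    <- grepl("_[A-Z]$", file_name)

  if (has_extension) {
    # Names with an extension are returned unaltered.
    file_name
  } else if (has_suffix || suffix == "") {
    # Suffix already present (or none needed): only add the extension.
    paste0(file_name, extension)
  } else {
    # Add both the cycle suffix and the extension.
    paste0(file_name, "_", suffix, extension)
  }
}

results <- c(
  normalize_file_name("EPH_F.XPT", "F"),
  normalize_file_name("EPH_F", "F"),
  normalize_file_name("EPH", "F")
)
print(results)
```

All three input forms resolve to the same file name, which is why any of them can be passed to the loading functions.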
Check that the year is in the correct format
Description
Check that the year is in the correct format. For example, '2001-2002' is correct and returns TRUE, while '2001' is not correct and returns FALSE.
Usage
validate_year(year, throw_error = TRUE)
Arguments
year: the year or years to validate
throw_error: whether to throw an error if the year is invalid
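A format check of this kind can be sketched with a regular expression in base R. The helper `is_cycle_format` below is hypothetical and only tests the shape of the string (two four-digit years, the second one year after the first). The package function may apply further checks, for example against the cycles returned by nhanes_cycle_years(), and can throw an error instead of returning FALSE.

```r
# Hypothetical helper (not part of RNHANES): TRUE for strings shaped like
# "2001-2002", FALSE otherwise. Vectorized over `year`.
is_cycle_format <- function(year) {
  well_formed <- grepl("^[0-9]{4}-[0-9]{4}$", year)
  start <- suppressWarnings(as.integer(substr(year, 1, 4)))
  end   <- suppressWarnings(as.integer(substr(year, 6, 9)))
  well_formed & !is.na(start) & !is.na(end) & end == start + 1L
}

checks <- is_cycle_format(c("2001-2002", "2001", "2001-2003"))
print(checks)
```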