Attention: Some changes to functions in the current version of madshapR may require updates to existing code.
previous version (1.1.0 and older) | current version (2.0.0) |
---|---|
madshapR_DEMO | madshapR_examples |
previous version (1.1.0 and older) | current version (2.0.0) |
---|---|
dataset_evaluate(as_data_dict_mlstr) | dataset_evaluate(is_data_dict_mlstr) |
data_dict_evaluate(as_data_dict_mlstr) | data_dict_evaluate(is_data_dict_mlstr) |
dossier_evaluate(as_data_dict_mlstr) | dossier_evaluate(is_data_dict_mlstr) |
In dataset_evaluate(), data_dict_evaluate() and dossier_evaluate(), the columns generated in the outputs have been renamed as follows:
previous version (1.1.0 and older) | current version (2.0.0) |
---|---|
index | Index |
name | Variable name |
label | Variable label |
valueType | Data dictionary valueType |
Categories::label | Categories in data dictionary |
Categories::missing | Non-valid categories |
In dataset_summarize() and dossier_summarize(), the columns generated in the outputs have been renamed as follows:
previous version (1.1.0 and older) | current version (2.0.0) |
---|---|
index in data dict.name | Index |
name | Variable name |
label | Variable label |
Estimated dataset valueType | Suggested valueType |
Actual dataset valueType | Dataset valueType |
Total number of observations | Number of rows |
Nb. distinct values | Number of distinct values |
Nb. valid values | Number of valid values |
Nb. non-valid values | Number of non-valid values |
Nb. NA | Number of empty values |
% total Valid values | % Valid values |
% Non-valid values | % Non-valid values |
% NA | % Empty values |
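Scripts that select summary columns by their old names can be adapted with a small renaming step. The following is a minimal sketch in base R, assuming a table that still carries pre-2.0.0 column names; `rename_to_v2()`, `renamed_columns` and the sample data are hypothetical and not part of madshapR, and the mapping covers only a subset of the table above.

```r
# Hypothetical summary table carrying a few of the pre-2.0.0 column names.
old_summary <- data.frame(
  "name" = c("age", "sex"),
  "Nb. distinct values" = c(40L, 2L),
  "Nb. NA" = c(3L, 0L),
  "% NA" = c(7.5, 0),
  check.names = FALSE
)

# Old name -> new name, copied from the renaming table (subset only).
renamed_columns <- c(
  "name" = "Variable name",
  "label" = "Variable label",
  "Nb. distinct values" = "Number of distinct values",
  "Nb. valid values" = "Number of valid values",
  "Nb. non-valid values" = "Number of non-valid values",
  "Nb. NA" = "Number of empty values",
  "% NA" = "% Empty values"
)

# Rename only the columns found in the mapping; leave the others untouched.
rename_to_v2 <- function(tbl, mapping = renamed_columns) {
  hits <- names(tbl) %in% names(mapping)
  names(tbl)[hits] <- unname(mapping[names(tbl)[hits]])
  tbl
}

new_summary <- rename_to_v2(old_summary)
names(new_summary)
```

Columns absent from the mapping keep their names, so the helper can be applied to outputs that were already produced with version 2.0.0 without changing them.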
https://github.com/maelstrom-research/madshapR/issues/123
https://github.com/maelstrom-research/madshapR/issues/112
https://github.com/maelstrom-research/madshapR/issues/75
https://github.com/maelstrom-research/madshapR/issues/87
https://github.com/maelstrom-research/madshapR/issues/82
https://github.com/maelstrom-research/madshapR/issues/81
https://github.com/maelstrom-research/madshapR/issues/76
https://github.com/maelstrom-research/madshapR/issues/116
https://github.com/maelstrom-research/madshapR/issues/115
https://github.com/maelstrom-research/madshapR/issues/109
An issue in data_dict_pivot_longer() when the ‘Source’ or ‘Target’ column was not present has been corrected.
https://github.com/maelstrom-research/madshapR/issues/86
The SPSS format that the haven package uses to produce labelled variables defines integers differently from madshapR, which was causing errors. The difference has been taken into account.
https://github.com/maelstrom-research/madshapR/issues/83
dataset_preprocess() now handles grouped datasets, using the parameter “group_by”.
Users can now define groups in summaries and visual reports using a variable that is not categorical or has empty values.
https://github.com/maelstrom-research/madshapR/issues/47
https://github.com/maelstrom-research/madshapR/issues/114
https://github.com/maelstrom-research/madshapR/issues/113
https://github.com/maelstrom-research/madshapR/issues/110
https://github.com/maelstrom-research/madshapR/issues/105
https://github.com/maelstrom-research/madshapR/issues/126
https://github.com/maelstrom-research/madshapR/issues/104
https://github.com/maelstrom-research/madshapR/issues/98
https://github.com/maelstrom-research/madshapR/issues/97
https://github.com/maelstrom-research/madshapR/issues/96
https://github.com/maelstrom-research/madshapR/issues/95
https://github.com/maelstrom-research/madshapR/issues/94
https://github.com/maelstrom-research/madshapR/issues/93
https://github.com/maelstrom-research/madshapR/issues/92
https://github.com/maelstrom-research/madshapR/issues/91
https://github.com/maelstrom-research/madshapR/issues/90
https://github.com/maelstrom-research/madshapR/issues/89
https://github.com/maelstrom-research/madshapR/issues/88
https://github.com/maelstrom-research/madshapR/issues/85
https://github.com/maelstrom-research/madshapR/issues/80
https://github.com/maelstrom-research/madshapR/issues/79
https://github.com/maelstrom-research/madshapR/issues/108
https://github.com/maelstrom-research/madshapR/issues/107
https://github.com/maelstrom-research/madshapR/issues/106
https://github.com/maelstrom-research/madshapR/issues/100
https://github.com/maelstrom-research/madshapR/issues/84
https://github.com/maelstrom-research/madshapR/issues/64
typeof_convert_to_valueType() converts typeof (and class if any) into its corresponding valueType.
valueType_convert_to_typeof() converts valueType into its corresponding typeof and class in R representation.
data_dict_update() updates a data dictionary from a dataset.
data_dict_trim_labels() adds shortened labels to a data dictionary.
first_label_get() gets the first label from a data dictionary.
has_categories() tests if a dataset has categorical variables.
https://github.com/maelstrom-research/madshapR/issues/63
variable_visualize() when the column was empty after internally removing stopwords.
https://github.com/maelstrom-research/Rmonize/issues/53
https://github.com/maelstrom-research/Rmonize/issues/49
dataset_evaluate()
https://github.com/maelstrom-research/madshapR/issues/66
https://github.com/maelstrom-research/madshapR/issues/62
data_dict_apply(), valueType_guess() and valueType_adjust() have been corrected to be more consistent in the usage of these functions.
https://github.com/maelstrom-research/madshapR/issues/61
https://github.com/maelstrom-research/madshapR/issues/60
data_dict_summarize() and dataset_evaluate() can generate tibbles whose cells hold more characters than Excel accepts in a cell. These functions now truncate the cells in the tibbles to a maximum of 10000 characters.
https://github.com/maelstrom-research/madshapR/issues/59
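The truncation described above can be illustrated in base R. This is a sketch of the general idea only, not the code used inside madshapR; `truncate_cells()` and the sample `report` are hypothetical, and the 10000-character cap is the only value taken from the note above.

```r
# Hypothetical helper, not part of madshapR: cap every character cell of a
# table at max_chars characters and leave other column types unchanged.
truncate_cells <- function(tbl, max_chars = 10000L) {
  tbl[] <- lapply(tbl, function(column) {
    if (is.character(column)) substr(column, 1L, max_chars) else column
  })
  tbl
}

# Sample table with one cell longer than the cap.
report <- data.frame(
  name = c("short", "long"),
  comment = c("ok", strrep("x", 12000L)),
  n = c(1L, 2L),
  stringsAsFactors = FALSE
)

truncated <- truncate_cells(report)
nchar(truncated$comment)
```

Cells shorter than the cap and missing values pass through unchanged, so the step is safe to apply to a whole report before writing it to Excel.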
An issue in dataset_cat_as_labels() when the values found in the dataset are not in the data dictionary, the valueType is text, and the dataType is “integer” has been fixed.
https://github.com/maelstrom-research/madshapR/issues/58
https://github.com/maelstrom-research/madshapR/issues/57
dataset_evaluate() has been corrected in the package fabR.
https://github.com/maelstrom-research/madshapR/issues/46
To avoid confusion with help(function), the function madshapR_help() has been renamed madshapR_website().
Some of the tests were made with another package (Rmonize) which has “madshapR” as a dependency.
In visual reports, avoid confusing changes in the color scheme.
Histograms for date variables display valid ranges.
In reports, change % NA to a proportion.
The dossier_visualize() report shows variable labels in the same language.
In visual reports, the bar plot only appears when there are multiple missing value types; otherwise only the pie chart is shown.
In reports, all of the percentages are now included under “Other values (non categorical)”, which gives a single value.
Suppress the overwrite parameter in dataset_visualize().
In dataset_summary(), minor issue (consistency in column names and content).
variable_visualize() when valueType_guess = TRUE.
Enhance the function check_data_dict_valueType(), which was too slow.
valueType_adjust() now works with an empty column (all NAs).
col_id() function, which is a shortcut for calling the attribute madshapR::col_id of a dataset.
as_category(), is_category(), drop_category() functions, which coerce a vector into a categorical object, typically a column in a dataset that needs to be coerced into a categorical variable (the data dictionary is updated accordingly).
DEMO_files into madshapR_DEMO for consistency across our other packages.
Addition of NEWS.md; for the development version, use “(development version)”.
Some improvements in the documentation of the package have been made.
Internal calls of libraries (using ::) have been replaced by proper imports in the function declarations.
The get functions in fabR were changed in its last release. The functions using them as dependencies (check_xxx()) have been updated accordingly.
DEMO files no longer include harmonization files, which are now in the package harmonizR.
New Imports: haven, lifecycle
No longer in Imports: xfun
These functions are imported from fabR
bookdown_template() replaces the deprecated function bookdown_template().
bookdown_render(), which renders a collection of Rmd files into a docs/index.html website.
bookdown_open(), which opens a docs/index.html document once the bookdown is rendered.
This separation into 3 functions will allow future developments, such as rendering as ppt or pdf.
Due to the development of another package (see fabR), the function open_visual_report() has been deprecated in favor of bookdown_open(), imported from the fabR package.
This package is a collection of wrapper functions used in data pipelines.
This is still a work in progress, so please let us know if you used a function before and it is no longer working.
madshapR_help() Call the help center for full documentation.
These functions allow you to create, extract, transform and apply metadata to a dataset.
data_dict_collapse(), data_dict_expand(), data_dict_filter(), data_dict_group_by(), data_dict_group_split(), data_dict_list_nest(), data_dict_pivot_longer(), data_dict_pivot_wider(), data_dict_ungroup()
data_dict_match_dataset(), data_dict_apply(), data_dict_extract()
as_data_dict(), as_data_dict_mlstr(), as_data_dict_shape(), is_data_dict(), is_data_dict_mlstr(), is_data_dict_shape()
as_taxonomy(), is_taxonomy()
These functions allow you to create, extract and transform data/metadata from a dataset. A dossier is a list of datasets.
as_dataset(), as_dossier()
is_dataset(), is_dossier()
data_extract(), dossier_create(), dataset_zap_data_dict(), dataset_cat_as_labels()
These functions allow users to work with, extract or assign a data type (valueType) to values and/or a dataset.
as_valueType(), is_valueType(), valueType_adjust(), valueType_guess(), valueType_self_adjust(), valueType_of()
These helper functions evaluate the content of a dataset and/or data dictionary to extract irregularities or potential errors. This information is stored in a tibble that can be used to assess inputs.
check_data_dict_categories(), check_data_dict_missing_categories(), check_data_dict_taxonomy(), check_data_dict_variables(), check_data_dict_valueType(), check_dataset_categories(), check_dataset_valueType(), check_dataset_variables(), check_name_standards()
These helper functions evaluate the content of a dataset and/or data dictionary to extract summary statistics and elements such as missing values, NA, category names, etc. This information is stored in a tibble that can be used to summarize inputs.
dataset_preprocess(), summary_variables(), summary_variables_categorical(), summary_variables_date(), summary_variables_numeric(), summary_variables_text()
read_csv_any_formats() The csv file is read twice to detect the number of lines to use in attributing the column type (guess_max parameter of read_csv). This avoids common errors when reading csv files.
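The two-pass idea can be sketched with readr as follows. This is an illustration under stated assumptions, not the implementation of read_csv_any_formats(): `read_csv_full_guess()` is a hypothetical helper, and it covers only the guess_max step described above.

```r
library(readr)

# Hypothetical helper, not the madshapR implementation.
read_csv_full_guess <- function(path) {
  # First pass: count the lines so that every row can inform the type guess.
  n_lines <- length(readLines(path, warn = FALSE))
  # Second pass: parse the file, letting readr inspect all rows.
  read_csv(path, guess_max = n_lines)
}

# Example file: the "code" column is empty for the first 1500 rows and only
# holds text in the last row, beyond readr's default guessing window.
path <- tempfile(fileext = ".csv")
writeLines(c("id,code", paste0(1:1500, ","), "1501,A12"), path)

full <- read_csv_full_guess(path)
class(full$code)
```

Reading the file once just to count lines costs extra time on large files; the trade-off is that a column whose informative values appear late is not typed from its empty leading rows alone.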
read_excel_allsheets() The Excel file is read and the values are placed in a list of tibbles, with each sheet in a separate element in the list. If the Excel file has only one sheet, the output is a single tibble.
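The behavior described above can be approximated with the readxl package. The sketch below is an assumption about the general approach, not the code of read_excel_allsheets(); `read_all_sheets()` is a hypothetical helper, and the input is the example workbook shipped with readxl.

```r
library(readxl)

# Hypothetical helper, not the madshapR implementation.
read_all_sheets <- function(path) {
  sheets <- excel_sheets(path)
  # One tibble per sheet, named after the sheet it came from.
  tables <- lapply(sheets, function(sheet) read_excel(path, sheet = sheet))
  names(tables) <- sheets
  # A single-sheet workbook is returned as a tibble rather than a list.
  if (length(tables) == 1L) tables[[1L]] else tables
}

workbook <- readxl_example("datasets.xlsx")
all_sheets <- read_all_sheets(workbook)
names(all_sheets)
```

Returning a bare tibble for single-sheet files is convenient interactively, but callers that loop over the result need to handle both shapes.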
write_excel_allsheets() Write all Excel sheets using xlsx::write.xlsx() recursively.
plot_bar(), plot_box(), plot_date(), plot_density(), plot_histogram(), plot_main_word(), plot_pie_valid_value(), summary_category(), summary_numerical(), summary_text()
data_dict_evaluate()
dataset_evaluate()
dossier_evaluate()
dataset_summarize()
dossier_summarize()
dataset_visualize()
variable_visualize()
open_visual_report()