Common workflow operations

aggregate()

Aggregate data based on a given grouping.

transform()

Transform observation variables.

normalize()

Normalize observation variables.

variable_select()

Select observation variables.

variable_importance()

Measure variable importance.

drop_na_rows()

Drop rows that are NA in all specified variables.
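The rule described above (a row is dropped only when every one of the specified variables is missing) can be sketched as follows. This is a minimal illustration in plain Python, not the package's implementation; the helper names `is_na` and `drop_all_na_rows` and the list-of-dicts data layout are assumptions made for the example.

```python
import math


def is_na(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_all_na_rows(rows, variables):
    """Keep rows that have at least one non-missing value among `variables`."""
    return [row for row in rows if not all(is_na(row.get(v)) for v in variables)]


rows = [
    {"id": 1, "x": 1.0, "y": None},          # kept: x is present
    {"id": 2, "x": None, "y": float("nan")},  # dropped: x and y both missing
    {"id": 3, "x": None, "y": 2.5},           # kept: y is present
]
kept = drop_all_na_rows(rows, ["x", "y"])
```

A row with a missing value in only some of the specified variables is kept, which is what distinguishes this rule from dropping any row that contains a missing value.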

mark_outlier_rows()

Mark outlier rows.

Variable selection methods

correlation_threshold()

Remove redundant variables.
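One common way to detect redundant variables is to flag any variable whose absolute Pearson correlation with an already-kept variable exceeds a cutoff. The sketch below assumes that greedy strategy and a hypothetical `cutoff` parameter; the package may use a different selection rule, so treat it as an illustration of the idea only.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def redundant_variables(columns, cutoff=0.9):
    """Walk variables in order; flag one if it correlates strongly with a kept variable."""
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > cutoff for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return dropped


columns = {
    "area": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.1, 3.9, 6.2, 8.0],  # nearly proportional to area
    "intensity": [5.0, 1.0, 4.0, 2.0],
}
to_remove = redundant_variables(columns)
```

The result of a greedy pass depends on the order in which variables are visited, so a different ordering can keep a different member of each correlated group.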

drop_na_columns()

Remove variables with NA values.

variance_threshold()

Remove variables with near-zero variance.
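As a rough illustration, near-zero variance can be detected by comparing each variable's variance to a small threshold. The function name `near_zero_variance` and the `threshold` default below are assumptions for this sketch; the package may instead rely on frequency-based criteria, which this page does not specify.

```python
from statistics import pvariance


def near_zero_variance(columns, threshold=1e-8):
    """Return names of variables whose population variance is at or below `threshold`."""
    return [name for name, values in columns.items() if pvariance(values) <= threshold]


columns = {
    "constant": [3.0, 3.0, 3.0, 3.0],
    "varying": [1.0, 2.0, 3.0, 4.0],
}
to_remove = near_zero_variance(columns)
```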

Aggregation methods

covariance()

Compute the covariance matrix and vectorize it.
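To make "vectorize" concrete: because a covariance matrix is symmetric, its upper triangle (diagonal included) carries all of its information and can be flattened into a single profile. The sketch below assumes that layout, the sample (n - 1) normalization, and a hypothetical `a__b` naming scheme for the resulting entries; none of these details are confirmed by this page.

```python
def covariance_vector(columns):
    """Flatten the upper triangle of the sample covariance matrix into a dict."""
    names = list(columns)
    n = len(columns[names[0]])
    means = {name: sum(values) / n for name, values in columns.items()}
    vector = {}
    for i, a in enumerate(names):
        for b in names[i:]:  # upper triangle, diagonal included
            cov = sum(
                (x - means[a]) * (y - means[b])
                for x, y in zip(columns[a], columns[b])
            ) / (n - 1)
            vector[f"{a}__{b}"] = cov
    return vector


columns = {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}
vec = covariance_vector(columns)
```

For p variables this yields p(p + 1)/2 entries, so the aggregated profile grows quadratically with the number of variables.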

Transformation methods

generalized_log()

Apply a generalized log transform to the data.
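One widely used form of the generalized log is log((x + sqrt(x^2 + offset^2)) / 2). The sketch below assumes that form and a hypothetical `offset` parameter; whether the package uses exactly this expression or this default is not stated on this page.

```python
import math


def generalized_log(x, offset=1.0):
    """Generalized log: close to log(x) for large x, but defined for x <= 0."""
    return math.log((x + math.sqrt(x * x + offset * offset)) / 2.0)


at_zero = generalized_log(0.0, offset=2.0)  # log(2 / 2), which is 0
negative = generalized_log(-5.0)            # finite, unlike log(-5)
large = generalized_log(1e6)                # very close to log(1e6)
```

Unlike a plain logarithm, this transform accepts zero and negative inputs as long as `offset` is nonzero, which makes it convenient for measurements that can dip below zero after background correction.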

husk()

Husk data.

sparse_random_projection()

Reduce the dimensionality of a population using sparse random projection.

spherize()

Spherize data.

Variable importance methods

replicate_correlation()

Measure replicate correlation of variables.

svd_entropy()

Measure variable importance based on data entropy.

Higher-order profiling methods

extract_subpopulations()

Extract subpopulations.

Meta operations

stratify()

Stratify operations.

Internal

count_na_rows()

Count the number of NAs per variable.

generate_component_matrix()

Generate a sparse component matrix for sparse random projection.
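A sparse random projection multiplies the data by a matrix whose entries are mostly zero, with the remaining entries set to a positive or negative constant at random. The sketch below assumes the commonly used scaling sqrt(1 / (density * n_components)); the function names, parameters, and dense list-of-lists storage are hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def sparse_component_matrix(n_features, n_components, density, seed=0):
    """Build an n_features x n_components matrix with entries in {-s, 0, +s}."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 / (density * n_components))
    matrix = []
    for _ in range(n_features):
        row = []
        for _ in range(n_components):
            u = rng.random()
            if u < density / 2:
                row.append(scale)    # +s with probability density / 2
            elif u < density:
                row.append(-scale)   # -s with probability density / 2
            else:
                row.append(0.0)      # 0 with probability 1 - density
        matrix.append(row)
    return matrix


def project(rows, matrix):
    """Multiply each data row by the component matrix."""
    return [
        [sum(x * m for x, m in zip(row, col)) for col in zip(*matrix)]
        for row in rows
    ]


matrix = sparse_component_matrix(n_features=10, n_components=3, density=0.5)
reduced = project([[1.0] * 10, [0.5] * 10], matrix)
```

Fixing the seed makes the component matrix reproducible, so the same projection can be applied to several populations and the reduced profiles remain comparable.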

find_significant_pcs()

Find significant principal components (PCs) given the eigenvalues.