| Title: | Partial Least-Squares Algorithm for Categorical and Scalar Functional Data |
|---|---|
| Description: | Performs the Partial Least-Squares ('PLS') algorithm for functional data through the concept of active area integration. This approach builds upon the basis expansion methods for functional 'PLS' regression described in Aguilera et al. (2010) <doi:10.1016/j.chemolab.2010.09.007>. The package seamlessly handles both Scalar Functional Data ('SFD') and Categorical Functional Data ('CFD'), providing interpretable regression curves even for discrete state changes. It was developed during a PhD thesis between 'DECATHLON' and French research institute 'INRIA' 2022-2026. The 'SmoothPLS' method does not directly decompose the data into a basis; rather, it assumes the data is known as precisely as desired, and for every 'PLS' component, the weight functions are decomposed into the basis. For both single-state and multi-state 'CFD' as well as 'SFD', the algorithm is implemented for a scalar response. To provide a baseline, a naive 'PLS' method on time-value functions and standard Functional 'PLS' are also implemented. |
| Authors: | Francois Bassac [aut, cre] |
| Maintainer: | Francois Bassac <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.5 |
| Built: | 2026-05-11 10:49:07 UTC |
| Source: | https://github.com/francoisbassac/smoothpls |
This function assembles the metrics of all the bases in the basis list. Only the needed bases are assembled, in particular when length(curves_to_keep) != N_states.
assemble_basis_metric(basis_list, curves_to_keep = NULL)
basis_list |
a list of basis fd object |
curves_to_keep |
a list of the states curves to keep |
a matrix of the metric to consider
Francois Bassac
basis1 = fda::create.bspline.basis(c(0,100), nbasis=10, norder=4)
basis2 = fda::create.bspline.basis(c(0,100), nbasis=15, norder=1)
basis3 = fda::create.fourier.basis(c(0,100), nbasis=7)
assemble_basis_metric(list(basis1, basis2, basis3), list(1,2,4))
assemble_basis_metric(list(basis1, basis2, basis3), list(1,2))
This function checks the integrity of the input for funcPLS. It returns a list of (basis_list, regul_time_list, curve_type_list, id_col_list, time_col_list)
assert_funcPLS_inputs(df_list, Y, basis_obj, regul_time_obj, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time")
df_list |
a list of dataframes (id, time, value_or_state) |
Y |
a numeric vector of the response |
basis_obj |
a list of basis object or a basis object |
regul_time_obj |
a vector of time regularization values or a list of vectors |
curve_type_obj |
a character "cat" or 'num' or a list of those values |
id_col_obj |
a character of the id column for all the curves or a list of id column character |
time_col_obj |
a character of the time column for all the curves or a list of time column character |
a list of (basis_list, regul_time_list, curve_type_list, id_col_list, time_col_list)
Francois Bassac
This function checks the inputs of the naivePLS function.
assert_multivariate_naivePLS_inputs(df_list, Y, regul_time_obj = NULL, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time")
df_list |
a list of dataframe (id, time, value_or_state) |
Y |
a numeric vector |
regul_time_obj |
a list of time regularisation values |
curve_type_obj |
a list of curve type 'cat' or 'num' |
id_col_obj |
a list of the names of the id columns |
time_col_obj |
a list of the names of the time columns |
a list
Francois Bassac
This function checks the integrity of the inputs for multivariate_fpls. It returns a list of (basis_list, regul_time_list, curve_type_list, orth_list, id_col_list, time_col_list).
assert_multivariate_smoothPLS_inputs(df_list, Y, basis_obj, regul_time_obj = NULL, curve_type_obj = NULL, orth_obj = list(TRUE), id_col_obj = "id", time_col_obj = "time")
df_list |
a list of dataframes (id, time, value_or_state) |
Y |
a numeric vector of the response |
basis_obj |
a list of basis object or a basis object |
regul_time_obj |
a vector of time regularization values or a list of vectors |
curve_type_obj |
a character "cat" or 'num' or a list of those values |
orth_obj |
a boolean, a list or a vector of boolean to orthonormalize or not a basis |
id_col_obj |
a character of the id column for all the curves or a list of id column character |
time_col_obj |
a character of the time column for all the curves or a list of time column character |
a list of (basis_list, regul_time_list, curve_type_list, orth_list, id_col_list, time_col_list)
Francois Bassac
beta_1_real_func
beta_1_real_func(t, end_time = 100, drop = NULL)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default NULL |
a value
Francois Bassac
beta_1_real_func(0)
beta_1_real_func(10)
beta_1_real_func(10:90)
plot(x=0:100, y=beta_1_real_func(0:100, 100), type='l', main="Beta_1")
beta_2_real_func
beta_2_real_func(t, end_time = 100, drop = 3 * 100/5)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default 3*100/5 |
a value
Francois Bassac
beta_2_real_func(0)
beta_2_real_func(10)
beta_2_real_func(10:90)
plot(x=0:100, y=beta_2_real_func(0:100, 100), type='l', main="Beta_2")
beta_3_real_func
beta_3_real_func(t, end_time = 100, drop = 27 * 100/100)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default 27 |
a value
Francois Bassac
beta_3_real_func(0)
beta_3_real_func(10)
beta_3_real_func(10:90)
plot(x=0:100, y=beta_3_real_func(0:100, 100), type='l', main="Beta_3")
beta_4_real_func
beta_4_real_func(t, end_time = 100, drop = NULL)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default NULL |
a value
Francois Bassac
beta_4_real_func(0)
beta_4_real_func(10)
beta_4_real_func(10:90)
plot(x=0:100, y=beta_4_real_func(0:100, 100), type='l', main="Beta_4")
beta_5_real_func
beta_5_real_func(t, end_time = 100, drop = 3 * 100/5)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default 3*100/5 |
a value
Francois Bassac
beta_5_real_func(0)
beta_5_real_func(10)
beta_5_real_func(10:90)
plot(x=0:100, y=beta_5_real_func(0:100, 100), type='l', main="Beta_5")
Constant function equal to 1. The constant value can be adjusted through the drop input.
beta_6_real_func(t, end_time = 100, drop = 1)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default 1 |
a value
Francois Bassac
beta_6_real_func(0)
beta_6_real_func(10)
beta_6_real_func(10:90)
plot(x=0:100, y=beta_6_real_func(0:100, 100), type='l', main="Beta_6")
Triangular function with its vertex at (x=drop, y=drop) and slopes of 1 and -1.
beta_7_real_func(t, end_time = 100, drop = 3 * end_time/5)
t |
evaluation time |
end_time |
end time; default 100 |
drop |
particular point of the curve, default 3*end_time/5 |
a value
Francois Bassac
beta_7_real_func(0)
beta_7_real_func(10)
beta_7_real_func(10:90)
plot(x=0:100, y=beta_7_real_func(0:100, 100), type='l', main="Beta_7")
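The triangular shape described for beta_7_real_func can be illustrated with a short base R sketch. The function name `triangle_sketch` is hypothetical and this is not the package's source; it assumes the curve rises with slope 1 up to t = drop and then falls with slope -1.

```r
# Hypothetical re-implementation for illustration only.
# Assumption: slope +1 before t = drop, slope -1 after, vertex at (drop, drop).
triangle_sketch <- function(t, end_time = 100, drop = 3 * end_time / 5) {
  drop - abs(t - drop)
}

triangle_sketch(c(0, 60, 100))
```

With the default drop of 60, the sketch returns 0 at t = 0, 60 at the vertex, and 20 at t = 100.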
beta_list_generation
beta_list_generation(N_states = 3)
N_states |
an integer, the number of states wanted, default 3 |
a list of functions
Francois Bassac
beta_list = beta_list_generation()
beta_list_2 = beta_list_generation(6)
Returns the block-diagonal matrix whose diagonal blocks are A and B:
( A, 0 )
( 0, B )
block_diag(A, B)
A |
a matrix |
B |
a matrix |
a matrix
Francois Bassac
A = matrix(c(1, 2, 3, 4), 2)
B = matrix(c(5, 6, 7, 8, 9, 10), 2)
C = block_diag(A, B)
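As a rough illustration of the block layout, here is a minimal base R sketch. The name `block_diag_sketch` is hypothetical; the package's own block_diag may be implemented differently.

```r
# Hypothetical sketch: place A in the top-left and B in the bottom-right
# of a zero matrix.
block_diag_sketch <- function(A, B) {
  out <- matrix(0, nrow(A) + nrow(B), ncol(A) + ncol(B))
  out[seq_len(nrow(A)), seq_len(ncol(A))] <- A
  out[nrow(A) + seq_len(nrow(B)), ncol(A) + seq_len(ncol(B))] <- B
  out
}

A <- matrix(c(1, 2, 3, 4), 2)          # 2 x 2
B <- matrix(c(5, 6, 7, 8, 9, 10), 2)   # 2 x 3
C <- block_diag_sketch(A, B)
dim(C)                                 # 4 rows, 5 columns
```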
build_df_per_state
build_df_per_state(data_list, id_col = "id", time_col = "time")
data_list |
a list containing the dataframe of the indicator function of each state. |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a list of the ordered states in the indicator function form.
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
si_df = state_indicator(df, id_col='id', time_col='time')
split_df = split_in_state_df(si_df, id_col='id', time_col='time')
This function preprocesses the inputs to format them to the right number of curves. Warning: the "right" number of curves takes into account the number of different states of the CFDs.
build_new_data_list(df_list, N_curves, orth_basis_list = NULL, basis_list = NULL, curve_type_list, id_col_list, time_col_list, regul_time_list)
df_list |
a list of dataframes (id, time, value_or_state) |
N_curves |
an integer, the number of curves |
orth_basis_list |
a list of orthogonalized basis fd list |
basis_list |
a list of basis fd functions |
curve_type_list |
a list of the curve type of each curve |
id_col_list |
a list of the id column name for each curve |
time_col_list |
a list of the time column name for each curve |
regul_time_list |
a list of the time regularization vector for each curve |
a list
Francois Bassac
This function builds the smooth PLS regression functions.
build_reg_curve_spls(plsr_model, curves_names_list, v_i_list, nb_comp_pls_opt = NULL, print_steps = TRUE)
plsr_model |
a pls model |
curves_names_list |
a list of the names of the different curves |
v_i_list |
a list of the v functions |
nb_comp_pls_opt |
an integer, the number of components to take into account. If NULL (default), all the components are used |
print_steps |
a boolean to print steps |
a list of fd object
Francois Bassac
This function builds some intermediate functions of the smooth PLS algorithm. It also evaluates the coefficients gamma_ij.
build_spls_functions(curves_names_list, new_basis_list, new_orth_basis_list, d_i, u_i)
curves_names_list |
a list of the names of the different curves |
new_basis_list |
a list of the initial basis fd object for all curves |
new_orth_basis_list |
a list of the orthogonalized basis as fd list for all curves |
d_i |
the pls coefficient such as X = d_i t_i : plsr_model$loadings |
u_i |
the pls coefficient such as t_i = X u_i : plsr_model$loading.weights |
a list Lambda = sum d_i t_i
Francois Bassac
build_u_ki_list
build_u_ki_list(N_states, nbComp, ms_pls_models)
N_states |
an integer, the number of different states |
nbComp |
an integer, the maximum number of components |
ms_pls_models |
a list of the intermediate pls models used to evaluate the real multi-state pls components. |
a list of the u_i^k
Francois Bassac
This function applies all the steps needed to go from a categorical functional data with several states to a list with one dataframe per state indicator (in ascending order), with duplicated states removed.
cat_data_to_indicator(data, id_col = "id", time_col = "time")
data |
a multistates dataframe ('id', 'time', 'states') |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a list of the ordered states in the indicator function form.
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
df_list = cat_data_to_indicator(df)
This function takes the df_new output from regularize_time_series and converts it into wide format.
convert_to_wide_format(df_new, id_col = "id", time_col = "time")
df_new |
a regularized dataframe |
id_col |
col_name of df_new for the id |
time_col |
col_name of df_new for the time |
the dataframe in wide format
Francois Bassac
id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
id_df_new = regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'cat')
convert_to_wide_format(id_df_new)
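To show what a long-to-wide conversion looks like in general, the sketch below uses base R's `reshape` on a small hand-made dataframe. The column names and the output layout are assumptions for illustration; the exact format returned by convert_to_wide_format is not documented here and may differ.

```r
# Hypothetical long-format data: one row per (id, time) pair.
long_df <- data.frame(
  id    = rep(1:2, each = 3),
  time  = rep(c(0, 10, 20), times = 2),
  state = c(0, 1, 1, 1, 0, 1)
)

# One row per id, one "state.<time>" column per time point.
wide_df <- reshape(long_df, idvar = "id", timevar = "time", direction = "wide")
wide_df
```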
create_bspline_basis
create_bspline_basis(start, end, nbasis = 10, norder = 4)
start |
start time |
end |
end time |
nbasis |
number of basis functions, default 10 |
norder |
order of the basis function, default cubic splines 4 |
a basis fd object
Francois Bassac
b0 = create_bspline_basis(0, 10, 10, 4)
plot(b0)
b1 = create_bspline_basis(0, 10, 10, 2)
plot(b1)
b2 = create_bspline_basis(0, 10, 10, 1)
plot(b2)
This function determines the curve names based on the curve's position in the data and on its states if it is a categorical functional data.
determine_curve_name(curve_number, curve_type = NULL, states_names = NULL)
curve_number |
an integer, the position of the curve in the data |
curve_type |
a character, 'cat' or 'num' |
states_names |
a character or list of character with the states of the CFD |
a vector of names.
Francois Bassac
This function determines the next state based on the current state and the transition matrix.
determine_next_state(current_state, transition_df)
current_state |
a value of the current state |
transition_df |
a dataframe of the transition matrix |
a value for the next state
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
determine_next_state(1, transition_df)
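Drawing the next state of a Markov-type process from a transition matrix can be sketched in base R as follows. The matrix `P`, its row/column convention, and the name `next_state_sketch` are assumptions made for this illustration; the layout of the package's transition_df may differ.

```r
set.seed(1)

# Hypothetical 3-state transition matrix: row = current state, column = next state.
P <- matrix(c(0.0, 0.7, 0.3,
              0.5, 0.0, 0.5,
              0.2, 0.8, 0.0), nrow = 3, byrow = TRUE)

# Sample the next state with the probabilities of the current state's row.
next_state_sketch <- function(current_state, P) {
  sample(seq_len(ncol(P)), size = 1, prob = P[current_state, ])
}

next_state_sketch(1, P)
```

Because the diagonal of this example matrix is zero, a draw never returns the current state.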
This function returns the min and max values of a list of functions and fd objects on regul_time.
eval_max_min_y(f_list, regul_time)
f_list |
a list of functions and fd objects |
regul_time |
a vector of time evaluation points. |
a vector
Francois Bassac
For either R functions or fd objects, this function evaluates the distance between the real curve and each curve (function or fd object) in fun_fd_list.
evaluate_curves_distances(real_f, regul_time, fun_fd_list = NULL)
real_f |
a function or fd object, the reference curve to compare against |
regul_time |
a vector of time regularization values |
fun_fd_list |
a list of functions or fd objects, or a single function or fd object |
No return value, called for side effects (prints distances to the console).
Francois Bassac
This function evaluates int_0^T w_i(t) p_j(t) dt for all j. It returns a list whose element i has length (i-1).
evaluate_gamma_ij(w_i_list, p_i_list)
w_i_list |
a list of fd functions w_i(t) |
p_i_list |
a list of fd functions p_i(t) |
a list of list of the gamma_ij values
Francois Bassac
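The triangular structure of the gamma_ij list (element i holds i-1 values) can be sketched with plain R functions and `stats::integrate`. This is only an illustration: the name `gamma_ij_sketch`, the use of ordinary functions instead of fd objects, and the integration bounds are assumptions.

```r
# Hypothetical sketch: gamma[[i]][j] = integral of w_i(t) * p_j(t) for j < i.
gamma_ij_sketch <- function(w_list, p_list, lower = 0, upper = 1) {
  lapply(seq_along(w_list), function(i) {
    vapply(seq_len(i - 1), function(j) {
      stats::integrate(function(t) w_list[[i]](t) * p_list[[j]](t),
                       lower, upper)$value
    }, numeric(1))
  })
}

# Toy functions on [0, 1]; each must be vectorized for integrate().
w_list <- list(function(t) rep(1, length(t)), function(t) t, function(t) t^2)
p_list <- list(function(t) rep(1, length(t)), function(t) t, function(t) t^2)

gamma <- gamma_ij_sketch(w_list, p_list)
lengths(gamma)  # element i has length i - 1
```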
Evaluates the integral of func over the active intervals (state 1) of a categorical functional data (states 0 or 1).
evaluate_id_func_integral(id_df, func, id_col = "id", time_col = "time", rel_tol = .Machine$double.eps^0.5, subdivisions = 1000L, ...)
id_df |
Dataframe for a single individual with at least columns (id, time, state). |
func |
The R function to integrate. |
id_col |
Character, name of the id column, default 'id'. |
time_col |
Character, name of the time column, default 'time'. |
rel_tol |
Relative tolerance for stats::integrate, default .Machine$double.eps^0.5 (about 1.5e-8). |
subdivisions |
Max number of subdivisions for integrate, default 1000. |
... |
Additional arguments (ignored to prevent passing unused params to func). |
A dataframe with the id and the calculated integral value.
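The idea of integrating only over active intervals can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the name `active_integral_sketch` is hypothetical, and the state observed at one time point is assumed to hold until the next time point.

```r
# Hypothetical sketch of "active area" integration for one individual.
# Assumption: the state at time[k] holds on [time[k], time[k + 1]).
active_integral_sketch <- function(id_df, func) {
  n      <- nrow(id_df)
  starts <- id_df$time[-n]
  ends   <- id_df$time[-1]
  active <- id_df$state[-n] == 1
  pieces <- mapply(function(a, b) stats::integrate(func, a, b)$value,
                   starts[active], ends[active])
  # unlist() keeps sum() valid when there is no active interval.
  data.frame(id = id_df$id[1], integral = sum(unlist(pieces)))
}

id_df <- data.frame(id = rep(1, 5), time = seq(0, 40, 10),
                    state = c(0, 1, 1, 0, 1))
active_integral_sketch(id_df, function(t) t)
```

Under this convention the individual is active on [10, 30], so the sketch integrates t over that range and returns 400.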
This function evaluates the integral int( X(t) func(t) )dt for a (0, 1) state functional data. It works ONLY for a one-state CFD!
evaluate_id_func_integral_deprecated(id_df, func, mode = 1, id_col = "id", time_col = "time", nb_pt = 10, subdivisions = 100)
id_df |
a single id dataframe of at least named columns (id, time) |
func |
the function to integrate |
mode |
select the integration mode 1 for R function integrate, 2 for pracma::trapz. default value : 1 |
id_col |
col_name of df for the id |
time_col |
col_name of df for the time |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
a dataframe with the id and the integral value.
Francois Bassac
id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
evaluate_id_func_integral(id_df, function(t){t})
This function evaluates the Lambda matrix such that, per column, Lambda_i = int_0^T X(t) phi_i(t) dt. The curve_type input must match the type of data: 'cat' for Categorical Functional Data, 'num' for Scalar Functional Data.
evaluate_lambda(df, basis, curve_type = NULL, int_mode = 1, id_col = "id", time_col = "time", nb_pt = 10, subdivisions = 100, regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1), parallel = TRUE)
df |
dataframe X(t) |
basis |
basis fd object |
curve_type |
a character, 'cat' for Categorical FD, 'num' for Scalar FD |
int_mode |
integration mode, 1 for integrate, 2 for pracma::trapz |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
regul_time |
a vector of time regularization values, default: the basis rangeval in steps of 1 |
parallel |
a boolean to use parallelization, default TRUE |
a matrix
Francois Bassac
df = generate_X_df(nind=100, start=0, end=100, curve_type = 'cat', lambda_0=0.2, lambda_1=0.1, prob_start=0.5)
basis = create_bspline_basis(0, 100, 10, 4)
Lambda = evaluate_lambda(df, basis, curve_type = 'cat')
df = generate_X_df(nind=100, start=0, end=100, curve_type = 'num')
basis = create_bspline_basis(0, 100, 10, 4)
Lambda = evaluate_lambda(df, basis, curve_type = 'num', int_mode = 2)
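For scalar functional data, the entries Lambda[n, i] = int X_n(t) phi_i(t) dt can be approximated with a trapezoid rule. The sketch below is a base R illustration only: `trapz_sketch` and `lambda_sketch` are hypothetical names, the `value` column name is an assumption, and plain R functions stand in for an fda basis.

```r
# Trapezoid rule on sampled points (x, y).
trapz_sketch <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)]) / 2)

# Hypothetical sketch: one row per individual, one column per basis function.
lambda_sketch <- function(df, phi_list) {
  ids <- unique(df$id)
  Lambda <- matrix(NA_real_, length(ids), length(phi_list))
  for (n in seq_along(ids)) {
    d <- df[df$id == ids[n], ]
    for (i in seq_along(phi_list)) {
      Lambda[n, i] <- trapz_sketch(d$time, d$value * phi_list[[i]](d$time))
    }
  }
  Lambda
}

# Two toy curves on [0, 1]: X_1(t) = 1 and X_2(t) = t.
time <- seq(0, 1, by = 0.01)
df <- data.frame(id    = rep(1:2, each = length(time)),
                 time  = rep(time, 2),
                 value = c(rep(1, length(time)), time))
phi_list <- list(function(t) rep(1, length(t)), function(t) t)

Lambda <- lambda_sketch(df, phi_list)
dim(Lambda)  # nind rows, nbasis columns
```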
This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt using parallel processing with chunking.
evaluate_lambda_CFD(df, basis, int_mode = 1, id_col = "id", time_col = "time", nb_pt = 10, subdivisions = 100, parallel = TRUE)
df |
dataframe X(t) |
basis |
basis fd object or a list of fd functions |
int_mode |
integration mode, 1 for integrate, 2 for pracma::trapz |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
parallel |
boolean, if TRUE uses max(nb_cores - 2, 1) cores, default TRUE. |
a matrix of dimension nbasis columns and nind rows
Francois Bassac
This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt using parallel processing.
evaluate_lambda_CFD_para_v1(df, basis, int_mode = 1, id_col = "id", time_col = "time", nb_pt = 10, subdivisions = 100, parallel = TRUE)
df |
dataframe X(t) |
basis |
basis fd object or a list of fd functions |
int_mode |
integration mode, 1 for integrate, 2 for pracma::trapz |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
parallel |
a boolean to enable parallel processing, default TRUE |
a matrix of dimension nbasis columns and nind rows
Francois Bassac
This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt for step > 1 using parallel chunking.
evaluate_lambda_SFD(df, basis, regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1), int_mode = 1, id_col = "id", time_col = "time", subdivisions = 100, parallel = TRUE)
df |
dataframe X(t) |
basis |
basis fd object |
regul_time |
a vector of time regularization values default basis rangeval per 1 |
int_mode |
int, integration mode, 1 for integrate, 2 for pracma::trapz |
id_col |
default name of the id column |
time_col |
default name of the time column |
subdivisions |
default parameter of R function integrate; default value : 100 |
parallel |
boolean, if TRUE uses parallel processing. Default TRUE. |
a matrix of dimension nbasis columns and nind rows
Francois Bassac
This function evaluates the Lambda matrix Lambda_ij = int X_j(t) phi_i(t) dt for step > 1
evaluate_lambda_SFD_para_v1(df, basis, regul_time = seq(basis$rangeval[1], basis$rangeval[2], 1), int_mode = 1, id_col = "id", time_col = "time", subdivisions = 100, parallel = TRUE)
df |
dataframe X(t) |
basis |
basis fd object |
regul_time |
a vector of time regularization values default basis rangeval per 1 |
int_mode |
int, integration mode, 1 for integrate, 2 for pracma::trapz |
id_col |
default name of the id column |
time_col |
default name of the time column |
subdivisions |
default parameter of R function integrate; default value : 100 |
parallel |
a boolean to enable parallel processing, default TRUE |
a matrix of dimension nbasis columns and nind rows
Francois Bassac
This function evaluates the metric of a certain basis. The metric is the inprod of the basis functions.
evaluate_metric(basis)
basis |
basis to evaluate the metric |
a matrix of dimension nbasis X nbasis
Francois Bassac
basis = create_bspline_basis(start=0, end=10, nbasis=10, norder=4)
metric = evaluate_metric(basis)
basis1 = create_bspline_basis(start=0, end=20, nbasis=10, norder=1)
metric1 = evaluate_metric(basis1)
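The metric of a basis is the matrix of pairwise inner products of its functions. The sketch below computes such a Gram matrix in base R, using plain functions and `stats::integrate` in place of an fda basis; `gram_sketch` is a hypothetical name and not part of the package.

```r
# Hypothetical sketch: G[i, j] = integral of phi_i(t) * phi_j(t) on [lower, upper].
gram_sketch <- function(phi_list, lower, upper) {
  K <- length(phi_list)
  G <- matrix(0, K, K)
  for (i in seq_len(K)) {
    for (j in seq_len(K)) {
      G[i, j] <- stats::integrate(
        function(t) phi_list[[i]](t) * phi_list[[j]](t), lower, upper
      )$value
    }
  }
  G
}

# Monomials 1, t, t^2 on [0, 1]: the Gram matrix has entries 1 / (i + j - 1).
phi_list <- list(function(t) rep(1, length(t)), function(t) t, function(t) t^2)
G <- gram_sketch(phi_list, 0, 1)
dim(G)  # nbasis x nbasis
```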
This function evaluates the regression function delta(t) from the PLS model and the functions v_i(t).
evaluate_reg_curve_SPLS_uni(plsr_model, v_i_list, nb_comp = NULL)
plsr_model |
a model from pls package |
v_i_list |
a list of fd functions v_i(t) |
nb_comp |
a value, the number of components to take into account, default NULL. If NULL, delta(t) uses every available component. |
a fd function
Francois Bassac
This function evaluates the PRESS, RMSE, MAE, R2 and the percentage of variance explained between Y and Y_hat.
evaluate_results(Y, Y_hat)
Y |
a vector of real values |
Y_hat |
a vector of modeled values |
a dataframe
Francois Bassac
evaluate_results(c(1,2,3,4,5), c(0.9, 2.2, 4, 5.5, 5))
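The usual textbook definitions of these metrics can be sketched in base R. The formulas below are assumptions (sum of squared residuals for PRESS, 1 - SSE/SST for R2) and `results_sketch` is a hypothetical name; the package may define or label these quantities differently.

```r
# Hypothetical sketch using common definitions of the error metrics.
results_sketch <- function(Y, Y_hat) {
  res <- Y - Y_hat
  data.frame(
    PRESS = sum(res^2),                              # sum of squared residuals
    RMSE  = sqrt(mean(res^2)),                       # root mean squared error
    MAE   = mean(abs(res)),                          # mean absolute error
    R2    = 1 - sum(res^2) / sum((Y - mean(Y))^2)    # coefficient of determination
  )
}

results_sketch(c(1, 2, 3, 4, 5), c(0.9, 2.2, 4, 5.5, 5))
```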
This function evaluates the functions v_i(t) from the gamma_ij coefficients and the functions w_i(t). The construction is recursive.
evaluate_V_i_function(w_i_list, gamma_ij_list)
w_i_list |
a list of fd functions w_i(t) |
gamma_ij_list |
a list of list of gamma_ij values |
a list of fd functions
Francois Bassac
This function returns the percentage of variance of Y explained by Y_hat.
evaluate_variance_explained(Y, Y_hat)
Y |
a reference value |
Y_hat |
a modeled value Y_hat = model(X) |
a value in %
Francois Bassac
evaluate_variance_explained(c(1,2,3,4,5,6,7,8,9,10), c(0.9, 1.1, 1.9, 2.4, 5.3, 6.01, 7.45, 9.12, 9.04, 11.6))
This function transforms, if necessary, the input basis into a list of fd functions. If basis is a basis object from fda, the output fd_list is the list of the different basis functions as fd functions. If basis is already a list of fd functions, nothing changes.
from_basis_to_fdlist(basis)
basis |
a basis fd object or a list of fd functions. |
a list of fd functions
Francois Bassac
basis = create_bspline_basis(start = 0, end = 10, nbasis = 5, norder = 4)
plot(basis)
basis_list = from_basis_to_fdlist(basis)
plot(basis_list[[1]], col = 1)
for(i in 2:length(basis_list)){
  plot(basis_list[[i]], add=TRUE, col = i)
}
basis_list_2 = from_basis_to_fdlist(basis_list)
This function transforms an fd object into a function. It requires either the fd object OR the coefficients and the basis object.
from_fd_to_func(fd_obj = NULL, coef = NULL, basisobj = NULL)
fd_obj |
a fd object to transform into a function |
coef |
the coefficient of an fd object to transform into a function |
basisobj |
the basis object of an fd object to transform into a function |
a function
Francois Bassac
basis = create_bspline_basis(0, 100, 10, 4)
coef = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
func_from_fd = from_fd_to_func(coef = coef, basisobj = basis)
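Turning a set of coefficients and basis functions into an ordinary R function amounts to building a closure that evaluates the linear combination. The sketch below shows this with plain R functions standing in for an fda basis; `linear_combination_sketch` is a hypothetical name, and the package presumably evaluates fd objects through fda instead.

```r
# Hypothetical sketch: return f(t) = sum_k coef[k] * phi_k(t) as a closure.
linear_combination_sketch <- function(coef, phi_list) {
  stopifnot(length(coef) == length(phi_list))
  function(t) {
    # One column per basis function, one row per evaluation point.
    values <- vapply(phi_list, function(phi) phi(t), numeric(length(t)))
    as.vector(matrix(values, nrow = length(t)) %*% coef)
  }
}

phi_list <- list(function(t) rep(1, length(t)), function(t) t, function(t) t^2)
f <- linear_combination_sketch(c(1, 2, 3), phi_list)
f(c(0, 1, 2))  # evaluates 1 + 2 t + 3 t^2
```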
This function performs the Multivariate Functional PLS as a matrix problem.
funcPLS(df_list, Y, basis_obj, regul_time_obj, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time", print_steps = FALSE, plot_rmsep = TRUE, print_nbComp = TRUE, plot_reg_curves = FALSE, jackknife = TRUE, validation = "LOO")
df_list |
a list of dataframes (id, time, value_or_state) |
Y |
a numeric vector of the response |
basis_obj |
a basis fd obj or a list of basis fd obj. If basis fd obj, the same basis is used for all the curves |
regul_time_obj |
a vector of time regularization values or a list of vectors |
curve_type_obj |
a character "cat" or 'num' or a list of those values |
id_col_obj |
a character of the id column for all the curves or a list of id column character |
time_col_obj |
a character of the time column for all the curves or a list of time column character |
print_steps |
a boolean to cat the current step |
plot_rmsep |
a boolean to plot the plsr RMSEP |
print_nbComp |
a boolean to cat the optimal number of components |
plot_reg_curves |
a boolean to directly plot the beta regression curves |
jackknife |
a plsr input, default = TRUE |
validation |
a plsr input, default = 'LOO' |
a list ("curve_names", "alphas", "metric", "root_metric", "trans_alphas", "mfpls_mfd", "nb_comp_pls_opt", "beta_0", "beta_pls_list")
Francois Bassac
This function performs the prediction on df_predict_list using the delta_list.
funcPLS_predict(df_predict_list, delta_list, curve_type_obj = NULL, regul_time_obj = NULL, id_col_obj = "id", time_col_obj = "time", int_mode = 1, nb_pt = 10, subdivisions = 100, parallel = TRUE)
df_predict_list |
a list of dataframe (id, time, state_or_value) |
delta_list |
a list of regression objects (Intercept, fd, etc) |
curve_type_obj |
a list of the curves types 'cat' or 'num' |
regul_time_obj |
a list of time regularization values |
id_col_obj |
a list of characters of the names of the id columns |
time_col_obj |
a list of characters of the names of the time columns |
int_mode |
an integer for the integration mode, 1 for integrate, 2 for pracma::trapz, default 1 |
nb_pt |
an integer, the number of intermediate points for integration mode 2, default 10 |
subdivisions |
an integer, the number of subdivisions for integration mode 1, default 100 |
parallel |
a boolean to use parallelization, default TRUE |
a numeric vector
Francois Bassac
This function generates probabilities whose sum is 1.
generate_probabilities(N_proba)
N_proba |
an integer, the number of values requested |
a vector of the probabilities
Francois Bassac
generate_probabilities(3)
generate_probabilities(5)
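One simple way to obtain probabilities that sum to 1 is to rescale positive random weights. This is only a sketch of the idea; `sketch_probabilities` is a hypothetical helper, and the package function may draw its values differently.

```r
# Hypothetical helper, not the package function.
sketch_probabilities <- function(N_proba) {
  w <- runif(N_proba)  # positive random weights
  w / sum(w)           # rescale so the values sum to 1
}

set.seed(123)
p <- sketch_probabilities(3)
```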
This function generates synthetic data for nind individuals X(t). For curve_type = 'cat', each curve switches between states 0 and 1, with the time spent in each state drawn from two different exponential laws. For curve_type = 'num', each curve is a noisy cosine.
generate_X_df( nind = 100, start = 0, end = 100, curve_type = "cat", lambda_0 = 0.2, lambda_1 = 0.1, prob_start = 0.5, noise_sd = 0.1, seed = 123 )
nind |
number of individuals, default 100 |
start |
first time, default 0 |
end |
last time, default 100 |
curve_type |
character, type of synthetic data wanted, must be 'cat' or 'num', default 'cat'. |
lambda_0 |
lambda parameter for exponential law for state 0, default 0.2 |
lambda_1 |
lambda parameter for exponential law for state 1, default 0.1 |
prob_start |
Start state 1 probability, binomial law, default 0.5 |
noise_sd |
noise added to the signal, default 0.1 |
seed |
seed for reproducibility, default 123 |
the dataframe of the individuals
Francois Bassac
generate_X_df()
This function generates synthetic data for nind individuals X(t), each in state 0 or 1, using two different exponential laws.
generate_X_df_CFD( nind = 500, start = 0, end = 100, lambda_0 = 0.2, lambda_1 = 0.1, prob_start = 0.5, seed = 123 )
nind |
number of individuals, default 500 |
start |
first time, default 0 |
end |
last time, default 100 |
lambda_0 |
lambda parameter for exponential law for state 0, default 0.2 |
lambda_1 |
lambda parameter for exponential law for state 1, default 0.1 |
prob_start |
Start state 1 probability, binomial law, default 0.5 |
seed |
an integer, the random seed, default 123 |
the dataframe of the individuals
Francois Bassac
generate_X_df_CFD()
generate_X_df_CFD(10, 13, 60, 0.21, 0.13, 0.7)
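The two-state mechanism described above can be sketched in base R as an alternation of exponential waiting times. This is an illustration under assumptions: `simulate_two_state` is a hypothetical helper, and the (id, time, state) layout with a closing row at `end` is assumed rather than taken from the package source.

```r
# Hypothetical helper: one individual's 0/1 path with exponential waiting times.
simulate_two_state <- function(id, start = 0, end = 100,
                               lambda_0 = 0.2, lambda_1 = 0.1, prob_start = 0.5) {
  state <- rbinom(1, 1, prob_start)  # initial state, 1 with probability prob_start
  t <- start
  times <- t
  states <- state
  repeat {
    rate <- if (state == 0) lambda_0 else lambda_1
    t <- t + rexp(1, rate)           # time spent in the current state
    if (t >= end) break
    state <- 1 - state               # switch to the other state
    times <- c(times, t)
    states <- c(states, state)
  }
  # Close the path at the end time with the last state.
  data.frame(id = id, time = c(times, end), state = c(states, state))
}

set.seed(123)
df_sketch <- do.call(rbind, lapply(1:3, simulate_two_state))
```

With rate parameters 0.2 and 0.1, the mean time spent in state 0 is 5 and in state 1 is 10, so state 1 is occupied about twice as long on average.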
This function generates a multistates CFD.
generate_X_df_multistates( nind = 100, N_states = 3, start = 0, end = 100, lambdas, transition_df, seed = 123 )
nind |
an integer, the number of individuals, default 100 |
N_states |
an integer, the number of states wanted, default 3 |
start |
a value of starting time, default 0 |
end |
a value of ending time, default 100 |
lambdas |
a vector of N_states lambda values from lambda_determination(N_states) |
transition_df |
a dataframe with the transition matrix from transfer_probabilities(N_states) |
seed |
an integer, the random seed, default 123 |
a dataframe of a multistates Categorical Functional Data
Francois Bassac
N_states = 4
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
This function generates synthetic data for nind individuals X, which are noisy cosines.
generate_X_df_SFD(nind = 100, start = 0, end = 100, noise_sd = 0.1, seed = 123)
nind |
number of individuals, default 100 |
start |
first time, default 0 |
end |
last time, default 100 |
noise_sd |
noise added to the signal, default 0.1 |
seed |
seed for reproducibility, default 123 |
a dataframe
Francois Bassac
generate_X_df_SFD(nind = 100, start = 0, end = 100, noise_sd = 0.1, seed = 123)
This function generates test data for nind individuals X, which are noisy cosines.
generate_X_df_SFD_data( nind = 100, start = 0, end = 100, noise_sd = 0.1, seed = 123 )
nind |
number of individuals, default 100 |
start |
first time, default 0 |
end |
last time, default 100 |
noise_sd |
noise added to the signal, default 0.1 |
seed |
seed for reproducibility, default 123 |
a dataframe
Francois Bassac
generate_X_df_SFD(nind = 100, start = 0, end = 100, noise_sd = 0.1, seed = 123)
generate_X_df_test
generate_X_df_test( TTRatio = 0.2, nind = 500, start = 0, end = 100, curve_type = "cat", lambda_0 = 0.2, lambda_1 = 0.1, prob_start = 0.5, seed = 123 )
TTRatio |
Train Test ratio, default 0.2 |
nind |
number of individuals, default 500 |
start |
first time, default 0 |
end |
last time, default 100 |
curve_type |
character, type of synthetic data wanted, must be 'cat' or 'num', default 'cat'. |
lambda_0 |
lambda parameter for exponential law for state 0, default 0.2 |
lambda_1 |
lambda parameter for exponential law for state 1, default 0.1 |
prob_start |
Start state 1 probability, binomial law, default 0.5 |
seed |
an integer, the random seed, default 123 |
a dataframe of the test data
Francois Bassac
generate_X_df_test()
generate_X_df_test(TTRatio = 0.4, nind=8, start=0, end=10, curve_type = 'num')
This function generates Y_df based on df and beta_func through the link Y = beta_0 + int(X(t)*beta(t))dt. It also generates the noised values of Y. NotS_ratio is the noise-over-total-signal ratio: a value of 0.2 means that the noise represents 20% of the total variance.
generate_Y_df( df, curve_type = NULL, beta_real_func_or_list, beta_0_real = 5.4321, NotS_ratio = 0.2, seed = 123, id_col = "id", time_col = "time", int_mode = 1, nb_pt = 10, subdivisions = 100, parallel = FALSE )
df |
the X(t) dataframe to evaluate Y on |
curve_type |
a character, the type of data, must be 'cat' or 'num', default NULL. |
beta_real_func_or_list |
a function or a list of functions beta(t), function used for the Y evaluation |
beta_0_real |
the intercept, default 5.4321 |
NotS_ratio |
the Noise over total Signal ratio, default 0.2 |
seed |
an integer, the seed for reproducibility, default 123 |
id_col |
a character of the id column, default 'id' |
time_col |
a character of the time column, default 'time' |
int_mode |
integration mode, 1 for integrate, 2 for pracma::trapz |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
parallel |
a boolean to use parallelization, default FALSE |
a dataframe of Y real and noised values
Francois Bassac
df = generate_X_df(nind=100, curve_type = 'cat')
beta_real_func<-function(t, end_time=100, drop = NULL){
  return(beta_real = sin(t*2*pi/end_time + pi) * exp (1.5*t/end_time))
}
beta_0_real=5.4321
Y_df = generate_Y_df(df, curve_type = 'cat', beta_real_func, beta_0_real, NotS_ratio=0.2)
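The meaning of NotS_ratio can be made concrete with a short calculation. If the noise must account for a share r of the total variance, then the noise variance is r / (1 - r) times the variance of the noiseless response. The sketch below illustrates that relation on stand-in values; whether the package computes the noise in exactly this way is an assumption.

```r
set.seed(123)
Y_real <- rnorm(1000, mean = 5, sd = 2)  # stand-in for the noiseless responses
NotS_ratio <- 0.2

# Noise standard deviation such that noise / (signal + noise) = NotS_ratio.
sd_noise <- sqrt(var(Y_real) * NotS_ratio / (1 - NotS_ratio))
Y_noised <- Y_real + rnorm(length(Y_real), sd = sd_noise)

# Theoretical share of the noise in the total variance.
noise_share <- sd_noise^2 / (var(Y_real) + sd_noise^2)
```

Here `noise_share` equals NotS_ratio by construction; the empirical share computed from `Y_noised` fluctuates around that value.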
This function generates Y_df based on df and beta_func through the link Y = beta_0 + int(X(t)*beta(t))dt. It also generates the noised values of Y. NotS_ratio is the noise-over-total-signal ratio: a value of 0.2 means that the noise represents 20% of the total variance.
generate_Y_df_CFD( df, beta_real_func_or_list, beta_0_real = 5.4321, NotS_ratio = 0.2, id_col = "id", time_col = "time", int_mode = 1, nb_pt = 10, subdivisions = 100, seed = 123, parallel = FALSE )
df |
the X(t) dataframe to evaluate Y on |
beta_real_func_or_list |
a function beta(t), or a list of function used for the Y evaluation |
beta_0_real |
the intercept, default 5.4321 |
NotS_ratio |
the Noise over total Signal ratio, default 0.2 |
id_col |
a character of the id column, default 'id' |
time_col |
a character of the time column, default 'time' |
int_mode |
integration mode, 1 for integrate, 2 for pracma::trapz |
nb_pt |
number of points for the integration, default value : 10 |
subdivisions |
default parameter of R function integrate; default value : 100 |
seed |
an integer, the random seed, default 123 |
parallel |
a boolean to enable parallel processing, default FALSE |
a dataframe of Y real and noised values
Francois Bassac
df = generate_X_df(nind=100, curve_type='cat')
beta_real_func<-function(t, end_time=100, drop = NULL){
  return(beta_real = sin(t*2*pi/end_time + pi) * exp (1.5*t/end_time))
}
beta_0_real=5.4321
Y_df = generate_Y_df_CFD(df, beta_real_func, beta_0_real)
This function generates Y_df based on df and beta_func through the link Y = beta_0 + int(X(t)*beta(t))dt. It also generates the noised values of Y. NotS_ratio is the noise-over-total-signal ratio: a value of 0.2 means that the noise represents 20% of the total variance.
generate_Y_df_SFD( df, beta_real_func, beta_0_real = 5.4321, NotS_ratio = 0.2, id_col = "id", time_col = "time", seed = 123 )
df |
the X(t) dataframe to evaluate Y on |
beta_real_func |
a function beta(t), function used for the Y evaluation |
beta_0_real |
the intercept, default 5.4321 |
NotS_ratio |
the Noise over total Signal ratio, default 0.2 |
id_col |
a character of the id column, default 'id' |
time_col |
a character of the time column, default 'time' |
seed |
an integer, the random seed, default 123 |
a dataframe
Francois Bassac
Orthonormalizes a set of basis functions with the Gram-Schmidt algorithm.
gram_schmidt_orthonormalize(basis, output_type = "fdlist", tol = 1e-12)
basis |
A basis object from fda package or a list of fd functions. |
output_type |
A character to choose the output format. "fdlist" (default) or "funlist" for R functions. |
tol |
a float, tolerance parameter, default 1e-12 |
A list of orthonormalized functions fd or func(t)
Francois Bassac
start = 0
end = 10
basis = create_bspline_basis(start, end, nbasis = 10, norder = 4)
basis_orth = gram_schmidt_orthonormalize(basis, "fdlist")
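For readers unfamiliar with the procedure, here is a minimal base-R sketch of classical Gram-Schmidt applied to plain R functions, with the L2 inner product computed by numerical integration. The helpers `inner` and `gram_schmidt_sketch` are hypothetical and work on ordinary functions rather than fda objects; they are not the package implementation.

```r
# L2 inner product on [a, b] by numerical integration.
inner <- function(f, g, a = 0, b = 1) {
  integrate(function(t) f(t) * g(t), a, b)$value
}

# Hypothetical helper: classical Gram-Schmidt on a list of vectorized R functions.
gram_schmidt_sketch <- function(funs, a = 0, b = 1, tol = 1e-12) {
  ortho <- list()
  for (f in funs) {
    # Projection coefficients of f on the functions already orthonormalized.
    coefs <- vapply(ortho, function(q) inner(f, q, a, b), numeric(1))
    prev <- ortho
    # Residual of f after removing its projections.
    g <- local({
      f0 <- f; cf <- coefs; qs <- prev
      function(t) {
        out <- f0(t)
        for (k in seq_along(qs)) out <- out - cf[k] * qs[[k]](t)
        out
      }
    })
    nrm <- sqrt(inner(g, g, a, b))
    if (nrm > tol) {  # skip functions that are numerically dependent
      ortho[[length(ortho) + 1]] <- local({
        g0 <- g; n0 <- nrm
        function(t) g0(t) / n0
      })
    }
  }
  ortho
}

funs <- list(function(t) rep(1, length(t)), function(t) t, function(t) t^2)
q <- gram_schmidt_sketch(funs)

# Gram matrix of the result; it should be close to the identity.
gram <- outer(seq_along(q), seq_along(q),
              Vectorize(function(i, j) inner(q[[i]], q[[j]])))
```

The `tol` argument plays the same role as in the documented function: a residual whose norm falls below it is treated as zero and dropped.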
This function prints some information on how to use the package.
help_smoothPLS()
No return value, called for side effects (prints instructions to the console).
Francois Bassac
This function randomly determines the initial state of a categorical functional data, with uniform probability across the states.
initial_state_determination(N_states)
N_states |
an integer, the number of states considered |
a value of the designated state
Francois Bassac
initial_state_determination(2)
initial_state_determination(7)
Checks whether a basis is orthogonal.
is_orthogonal(basis, tol = 1e-10)
basis |
A basis object from fda package or a list of fd functions. |
tol |
a float, tolerance parameter, default 1e-10 |
A boolean, TRUE if orthogonal, FALSE if not
Francois Bassac
Checks whether a basis is orthonormal.
is_orthonormal(basis, tol = 1e-10)
basis |
A basis object from fda package or a list of fd functions. |
tol |
a float, tolerance parameter, default 1e-10 |
A boolean, TRUE if orthonormal, FALSE if not
Francois Bassac
This function determines a number of lambda parameters for exponential laws.
lambda_determination(N_states, lambda_values = c(0.05, 0.25))
N_states |
an integer, the number of states |
lambda_values |
a vector of min and max values authorized for lambda |
a numeric vector with the lambda values
Francois Bassac
lambda_determination(3)
lambda_determination(7)
This function evaluates the MAE error based on the values.
mae_values(y, y_hat)
y |
a vector of real values |
y_hat |
a vector of predicted values |
a value
Francois Bassac
y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
mae_values(y, y_hat)
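As a reference point, the mean absolute error in its usual definition is the average of the absolute differences between observed and predicted values. The base-R line below applies that textbook formula to the same vectors; it is assumed, not verified, that mae_values uses this definition.

```r
y <- c(1, 2, 4, 6, 8, 10)
y_hat <- c(2, 3, 5, 7, 9, 11)

# Every prediction is off by exactly 1, so the mean absolute error is 1.
mae_sketch <- mean(abs(y - y_hat))
```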
This function builds the alphas matrix by column-binding the alphas of each curve. It returns the alphas matrix and a vector containing all the curve names.
multivariate_alpha_building( df_list, basis_list, curve_type_list, regul_time_list, id_col_list, time_col_list, print_steps = FALSE )
df_list |
a list of data frames (id, time, state_or_value) |
basis_list |
a list of basis |
curve_type_list |
a list of the curve types, 'cat' or 'num' |
regul_time_list |
a list of regul_time |
id_col_list |
a list of the characters for the id column |
time_col_list |
a list of the characters for the time column |
print_steps |
a boolean to print the current step, default FALSE |
a list of the alphas matrix and the curve_names vector and the new_basis_list
Francois Bassac
This function assembles the metrics of all the bases of the basis list for the multivariate use case. It handles the categorical case by repeating the basis for each state.
multivariate_assemble_basis_metric( df_list, basis_list, curve_list, id_col_list, time_col_list )
df_list |
a list of dataframe (id, time, value_or_state) |
basis_list |
a list of basis fd object |
curve_list |
a list of the curves type 'cat' or 'num' |
id_col_list |
a list of the character of the id columns |
time_col_list |
a list of the character of the time columns |
a matrix of the metric to consider
Francois Bassac
This function performs the naive PLS method for Categorical functional data, Scalar functional data and multivariate data.
naivePLS( df_list, Y, regul_time_obj = NULL, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time", print_steps = FALSE, plot_rmsep = TRUE, print_nbComp = TRUE, plot_reg_curves = FALSE, validation = "LOO", jackknife = TRUE )
df_list |
a list of dataframe (id, time, value_or_state) |
Y |
a numeric vector for the scalar response |
regul_time_obj |
a list of time regularization values |
curve_type_obj |
a list of the curve types 'cat' or 'num' |
id_col_obj |
a list of character of the names of the id columns |
time_col_obj |
a list of character of the names of the time columns |
print_steps |
a boolean to print the different steps, default FALSE |
plot_rmsep |
a boolean to plot the RMSEP, default TRUE |
print_nbComp |
a boolean to print the optimal number of components, default TRUE |
plot_reg_curves |
a boolean to plot the regression curves, default FALSE |
validation |
a character, pls::plsr input, default 'LOO' |
jackknife |
a boolean, pls::plsr input, default TRUE |
a list of ("plsr_model", "nbCP_opti", "curves_names", "opti_reg_coef", "reg_obj")
Francois Bassac
This function formats the list of given dataframes into a time-regularized matrix usable by the naivePLS function or for prediction.
naivePLS_formatting( df_list, regul_time_obj = NULL, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time" )
df_list |
a list of dataframe of 3 columns (id, time, state_or_value) |
regul_time_obj |
a list of vector of time regularization values |
curve_type_obj |
a list of character of the type of curves |
id_col_obj |
a list of character for the id column names |
time_col_obj |
a list of character for the time column names |
a list
Francois Bassac
This function uses df_predict_list to make a prediction from a naivePLS object.
naivePLS_predict( naive_pls_obj, df_predict_list, regul_time_obj = NULL, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time" )
naive_pls_obj |
a list of a naivePLS object |
df_predict_list |
a list of dataframe (id, time, state_or_value) |
regul_time_obj |
a list of time regularization values |
curve_type_obj |
a list of curves types 'cat' or 'num' |
id_col_obj |
a list of id column names |
time_col_obj |
a list of time column names |
a numeric vector
Francois Bassac
This function evaluates the number of individuals for a test set.
number_of_test_id(TTRatio = 0.2, nind = 100)
TTRatio |
Train Test ratio, default 0.2 |
nind |
number of individuals, default 100 |
int, the number of id for test set
Francois Bassac
number_of_test_id(TTRatio = 0.2, nind=100)
This function creates a list that repeats the same object N_rep times, for example one copy of a basis for each state or curve sharing that basis.
obj_list_creation(N_rep, obj)
N_rep |
a value of the number of states sharing the same basis |
obj |
an object |
a list of N_rep appended obj
Francois Bassac
N_rep = 4
start = 1
end = 51
nbasis = 13
norder = 3
basis = create_bspline_basis(start, end, nbasis, norder)
basis_list = obj_list_creation(N_rep, basis)
obj_list_creation(N_rep, 0:100)
This function orthonormalizes each basis of a list when needed.
orthonormalize_basis_list(basis_list, orth_list, tol = 1e-09)
basis_list |
a list of basis to orthonormalize |
orth_list |
a list of boolean per basis if orthonormalization is needed |
tol |
a float, tolerance parameter. |
a list of list of fd object : the orthonormalized functions.
Francois Bassac
This function evaluates the p_i(t) (X_i regression functions) and w_i(t)
such as .
The evaluation is done for all the components.
p_w_building( coefficient, N_curves_processed, new_basis_list, new_orth_basis_list, curves_names_list )
coefficient |
a table of the coefficient to use |
N_curves_processed |
an integer, the real number of curves (1 SFD = 1 curve, 1 CFD = 1 curve per state) |
new_basis_list |
a list of processed basis list (1 basis per curve_processed) |
new_orth_basis_list |
a list of orthonormalized basis list (1 basis per curve_processed) |
curves_names_list |
a list of curve name (1 name per curve_processed) |
a list of fd objects
Francois Bassac
This function plots only some individuals: the first n_ind_to_plot individuals of df_to_plot. It works for both single-state and multi-state data (numerical states 1, 2, 3, etc.).
plot_CFD_individuals( df_to_plot, n_ind_to_plot = 5, id_col = "id", time_col = "time", by_cfda = FALSE )
df_to_plot |
dataframe whose individuals will be plotted |
n_ind_to_plot |
number of the first individuals to plot, default 5 |
id_col |
col_name of df_to_plot for the id, not character, default 'id' |
time_col |
col_name of df_to_plot for the time, not character, default 'time' |
by_cfda |
a boolean to use cfda package function plotData, default FALSE |
a ggplot
Francois Bassac
df = generate_X_df()
plot_CFD_individuals(df, 5)
plot_CFD_individuals(df, 5, by_cfda = TRUE)
This function plots on the same figure the fd curves from the fd_list by evaluating them on the given regul_time
plot_fd_list(fd_list, curves_names, regul_time)
fd_list |
a list of fd objects |
curves_names |
a list of the curves names |
regul_time |
a numeric vector |
a plot
Francois Bassac
This function plots some histograms for train_results and test_results
plot_model_metrics_base( train_results, test_results, models_to_plot = c("FPLS", "SmoothPLS", "NaivePLS"), n_digits = 3 )
train_results |
a dataframe of train results |
test_results |
a dataframe of test results |
models_to_plot |
a list of characters of the models to plot, default c("FPLS", "SmoothPLS", "NaivePLS") |
n_digits |
an integer for the number of significant digits to print, default 3 |
a plot
Francois Bassac
plot_real_and_smoothed_data_ind
plot_real_and_smoothed_data_ind( df_wide, df_fd, time_seq = 0:100, id = 1, col_list = c("blue", "red", "green", "yellow") )
df_wide |
a wide dataframe, output of convert_to_wide_format() |
df_fd |
a list of functional data from df_wide |
time_seq |
a vector to plot on, default 0:100 |
id |
a value of the id to plot |
col_list |
a list of color to separate the states |
No return value, called for side effects (generates a plot).
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
df_processed = cat_data_to_indicator(df)
df_regul = list()
df_wide = list()
for(name in names(df_processed)){
  print(paste0(name, " regularisation"))
  df_regul[[name]] = regularize_time_series(df_processed[[name]], time_seq = c(0:100), curve_type = 'cat', id_col='id', time_col='time')
  df_wide[[name]] = convert_to_wide_format(df_regul[[name]], id_col='id', time_col='time')
}
basis = create_bspline_basis(0, 100, 10, 4)
df_fd = list()
for(name in names(df_wide)){
  print(paste0(name, " fd transformation"))
  df_fd[[name]] = fda::Data2fd(argvals = c(0:100), y = t(df_wide[[name]][, -c(1)]), basis)
}
plot_real_and_smoothed_data_ind(df_wide, df_fd, c(0:100), id=1)
This function evaluates the leave-one-out ('LOO') PRESS error of an lm or glm model.
press_model(model)
model |
a model from lm or glm |
a value
Francois Bassac
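For a linear model fitted with lm, the leave-one-out PRESS does not require refitting: each leave-one-out residual equals the ordinary residual divided by one minus its leverage. The sketch below checks that identity against an explicit loop on simulated data. It illustrates the standard result for lm only; how press_model computes the value internally, in particular for glm, is not assumed here.

```r
set.seed(1)
d <- data.frame(x = rnorm(30))
d$y <- 2 + 3 * d$x + rnorm(30, sd = 0.5)
fit <- lm(y ~ x, data = d)

# Shortcut: leave-one-out residual = residual / (1 - leverage).
press_shortcut <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Explicit leave-one-out loop, for comparison.
press_loop <- sum(vapply(seq_len(nrow(d)), function(i) {
  fit_i <- lm(y ~ x, data = d[-i, ])
  (d$y[i] - predict(fit_i, newdata = d[i, ]))^2
}, numeric(1)))
```

Both quantities agree up to floating-point error, while the shortcut needs a single fit instead of one per observation.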
This function evaluates the press error using values y and y_hat
press_values(y, y_hat)
y |
a vector of real values |
y_hat |
a vector of predicted values |
a value
Francois Bassac
y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
press_values(y, y_hat)
This function evaluates the R_2 using values y and y_hat
r_squared_values(y, y_hat)
y |
a vector of real values |
y_hat |
a vector of predicted values |
a value
Francois Bassac
y = c(1, 2, 4, 6, 8, 10)
y_hat = c(2, 3, 5, 7, 9, 11)
r_squared_values(y, y_hat)
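For orientation, the textbook versions of these two quantities are the sum of squared prediction errors and one minus the ratio of that sum to the total sum of squares. The base-R lines below apply those formulas to the example vectors; it is an assumption that press_values and r_squared_values follow exactly these definitions.

```r
y <- c(1, 2, 4, 6, 8, 10)
y_hat <- c(2, 3, 5, 7, 9, 11)

# Sum of squared errors: six errors of size 1.
press_sketch <- sum((y - y_hat)^2)

# Coefficient of determination as 1 - SS_res / SS_tot.
r2_sketch <- 1 - press_sketch / sum((y - mean(y))^2)
```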
This function evaluates the beta regression functions of the multivariate functional PLS.
reg_curve_funcPLS_evaluation( mfpls_mfd, nb_comp_pls_opt, root_metric, new_basis_list, curve_names, print_steps = FALSE )
mfpls_mfd |
a MFPLS model |
nb_comp_pls_opt |
the optimal number of components |
root_metric |
the root metric |
new_basis_list |
a list of basis fd objects |
curve_names |
a list of curves to keep |
print_steps |
a boolean to cat or not the steps |
a list of the betas per state
Francois Bassac
This function regularizes the data on a new time sequence time_seq according to curve_type. For curve_type = 'cat', the state stays the same between two observation times. For curve_type = 'num', the intermediate values are linearly interpolated.
regularize_time_series( df, time_seq = 0:100, curve_type = NULL, id_col = "id", time_col = "time" )
df |
dataframe with one or more different ids |
time_seq |
New time sequence where we want to regularize |
curve_type |
A string giving the type of the curve, 'cat' for a categorical functional data, 'num' for a scalar functional data, default : NULL |
id_col |
col_name of df for the id |
time_col |
col_name of df for the time |
a dataframe with the regularized data on time_seq
Francois Bassac
id_df = data.frame(id=rep(1,5), time=seq(0, 40, 10), state=c(0, 1, 1, 0, 1))
regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'cat')
regularize_time_series(id_df, time_seq = seq(0, 40, 2), curve_type = 'num')
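The difference between the two curve types can be reproduced with base R's approx(): a step function that holds the last observed state for 'cat', and linear interpolation for 'num'. This is a sketch of the described behaviour on the same toy data, not the package's implementation.

```r
id_df <- data.frame(id = rep(1, 5), time = seq(0, 40, 10), state = c(0, 1, 1, 0, 1))
time_seq <- seq(0, 40, 2)

# 'cat': hold the last observed state until the next observation.
cat_values <- approx(id_df$time, id_df$state, xout = time_seq,
                     method = "constant", f = 0)$y

# 'num': linear interpolation between observations.
num_values <- approx(id_df$time, id_df$state, xout = time_seq,
                     method = "linear")$y
```

At time 8, for instance, the categorical version still reports state 0, whereas the numerical version has already climbed to 0.8 on its way to the value observed at time 10.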
This function regularizes Categorical Functional Data (CFD) on a new time sequence time_seq.
regularize_time_series_CFD( df, time_seq = 0:100, id_col = "id", time_col = "time" )
df |
dataframe with one or more different ids |
time_seq |
New time sequence where we want to regularize |
id_col |
col_name of df for the id |
time_col |
col_name of df for the time |
a dataframe with the regularized data on time_seq
Francois Bassac
This function regularizes a Scalar Functional Data SFD on the time_seq input. This function uses linear interpolation.
regularize_time_series_SFD( df, time_seq = 0:100, id_col = "id", time_col = "time" )
df |
dataframe with one or more different ids |
time_seq |
New time sequence where we want to regularize |
id_col |
col_name of df for the id |
time_col |
col_name of df for the time |
a regularized dataframe
Francois Bassac
This function removes the duplicated states and keeps the last line. It works with both (0,1) and categorical states in the state column.
remove_duplicate_states(data, id_col = "id", time_col = "time")
data |
a single- or multi-state dataframe. |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a dataframe with no duplicated states.
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
remove_duplicate_states(df)
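One plausible reading of "removes the duplicated states and keeps the last line" is: keep the first row of each run of identical consecutive states, plus the closing row of the individual. The sketch below implements that reading for a single individual. The helper `drop_repeated_states` and the column name `state` are assumptions for illustration, and the package may resolve duplicates differently.

```r
# Hypothetical helper for a single individual, rows sorted by time.
drop_repeated_states <- function(df, state_col = "state") {
  s <- df[[state_col]]
  n <- nrow(df)
  changed <- c(TRUE, s[-1] != s[-n])  # TRUE where the state differs from the previous row
  changed[n] <- TRUE                  # always keep the closing row
  df[changed, , drop = FALSE]
}

toy <- data.frame(id = 1, time = c(0, 5, 12, 20, 30), state = c(1, 1, 2, 2, 2))
cleaned <- drop_repeated_states(toy)
```

On the toy data the rows at times 0, 12 and 30 remain: the two state changes and the final observation.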
This function returns the functions of fd_list, excluding those specified in the numeric vector toDrop.
select_from_fd_list(fd_list, toDrop = NULL)
fd_list |
a list of fd object |
toDrop |
a numeric vector |
a list of fd objects
Francois Bassac
This function performs the smooth PLS algorithm for both the univariate case (a Scalar Functional Data or a Categorical Functional Data) and the multivariate case (a mix of functional data of different natures, SFD or CFD). For any input, if the same value applies to all the curves, it can be given once instead of as a list with one value per curve.
smoothPLS( df_list, Y, basis_obj, regul_time_obj = NULL, curve_type_obj, orth_obj = TRUE, id_col_obj = "id", time_col_obj = "time", int_mode = 1, print_steps = FALSE, plot_rmsep = TRUE, print_nbComp = TRUE, plot_reg_curves = FALSE, jackknife = TRUE, validation = "LOO", parallel = TRUE )
df_list |
a list of dataframe (id, time, value_or_state) |
Y |
a vector of the scalar response |
basis_obj |
a basis fd object or a list of basis fd object |
regul_time_obj |
a vector for time regularization values or a list |
curve_type_obj |
a list or vector of the different curve types, 'cat' or 'num'. |
orth_obj |
a list or a vector of booleans if the orthonormalization is needed |
id_col_obj |
a list or a vector of the id column name |
time_col_obj |
a list or a vector of time column name |
int_mode |
an integer for the integration method: 1 for integrate, 2 for pracma::trapz |
print_steps |
a boolean to print the algorithm steps |
plot_rmsep |
a boolean to plot the pls model RMSEP |
print_nbComp |
a boolean to print the optimal number of components |
plot_reg_curves |
a boolean to plot the regressions curves |
jackknife |
a boolean for the jackknife input of pls::plsr(), default TRUE |
validation |
a character for the validation input of pls::plsr(), default 'LOO' |
parallel |
a boolean to use parallelization, default TRUE |
a list of the plsr_model and the regression curves (and intercept).
Francois Bassac
Predicts the response variable for Categorical Functional Data (CFD) by integrating the regression coefficient function over active state intervals.
smoothPLS_CFD_predict( df_predict, delta_spls, id_col = "id", time_col = "time", subdivisions = 100, parallel = TRUE, ... )
df_predict |
Dataframe containing columns for id, time, and state. |
delta_spls |
A list containing the scalar intercept and the functional regression coefficient (fd object). |
id_col |
Character, name of the id column, default 'id'. |
time_col |
Character, name of the time column, default 'time'. |
subdivisions |
integer, maximum number of sub-intervals for integration, default 100 |
parallel |
a boolean to enable parallel processing, default TRUE. |
... |
Additional parameters passed to evaluate_id_func_integral (e.g., rel_tol, subdivisions). |
A numeric vector of predicted values for each individual.
Francois Bassac
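As a rough illustration of the integration step that smoothPLS_CFD_predict describes, the following base-R sketch computes a prediction for one individual as the intercept plus the integral of a coefficient function over the intervals where the state is active. The function `delta_fun`, the intercept and the interval table are hypothetical toy values, and `stats::integrate` is used here as a stand-in for the package's internal integration routine, which is not shown in this manual.

```r
# Toy coefficient function delta(t) = 2 * t (hypothetical, for illustration only)
delta_fun <- function(t) 2 * t
intercept <- 1

# Intervals during which the state is active for one individual (hypothetical)
active <- data.frame(start = c(0, 4), end = c(1, 5))

# Integrate delta over each active interval, then sum the contributions
contrib <- mapply(
  function(a, b) {
    stats::integrate(delta_fun, lower = a, upper = b, subdivisions = 100)$value
  },
  active$start, active$end
)

# Prediction = intercept + total integral over the active area
y_hat <- intercept + sum(contrib)
print(y_hat)
```

In this toy case the two integrals are 1 and 9, so the prediction is approximately 11.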
Predicts the response variable for Categorical Functional Data (CFD) by integrating the regression coefficient function over active state intervals.
smoothPLS_CFD_predict_para_v1( df_predict, delta_spls, id_col = "id", time_col = "time", subdivisions = 100, parallel = TRUE, ... )
df_predict |
Dataframe containing columns for id, time, and state. |
delta_spls |
A list containing the scalar intercept and the functional regression coefficient (fd object). |
id_col |
Character, name of the id column, default 'id'. |
time_col |
Character, name of the time column, default 'time'. |
subdivisions |
integer, maximum number of sub-intervals for integration, default 100 |
parallel |
a boolean to enable parallel processing, default TRUE. |
... |
Additional parameters passed to evaluate_id_func_integral (e.g., rel_tol, subdivisions). |
A numeric vector of predicted values for each individual.
Francois Bassac
This function uses the list of regression functions to make a prediction.
smoothPLS_predict( df_predict_list, delta_list, curve_type_obj = NULL, id_col_obj = "id", time_col_obj = "time", regul_time_obj = NULL, int_mode = 1, nb_pt = 10, subdivisions = 100, parallel = TRUE )
df_predict_list |
a list of dataframes (id, time, value_or_state) |
delta_list |
a list of regression objects (intercept, delta_1_fd, delta_2_fd, etc) |
curve_type_obj |
a list of characters giving the curve types, 'cat' or 'num' |
id_col_obj |
a list of characters giving the name of the id column, default 'id' |
time_col_obj |
a list of characters giving the name of the time column, default 'time' |
regul_time_obj |
a list of the time regularization values |
int_mode |
an integer for the integration mode, 1 for integrate, 2 for pracma::trapz |
nb_pt |
an integer, number of intermediate points for pracma::trapz, default 10 |
subdivisions |
an integer, number of subdivisions in the integrate function, default 100 |
parallel |
a boolean to use parallelization, default TRUE |
a numeric vector of the prediction
Francois Bassac
This function makes a prediction based on a dataframe and a list made of the intercept and the regression curve. The input curve_type is needed to select the appropriate way of evaluating the integrals.
smoothPLS_predict_uni( df_predict, delta_list, curve_type = NULL, int_mode = 1, id_col = "id", time_col = "time", nb_pt = 10, subdivisions = 100, regul_time = seq(delta_list[[2]]$basis$rangeval[1], delta_list[[2]]$rangeval[2], 1), parallel = TRUE )
df_predict |
a dataframe ('id', 'time', 'state or value') to predict from |
delta_list |
a list of delta_spls : list(intercept, delta_fd) |
curve_type |
a character, 'cat' for Categorical FD, 'num' for Scalar FD |
int_mode |
an integer for the integration mode (1 for integrate, 2 for pracma::trapz), default 1 |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
nb_pt |
number of points for the integration, default 10 |
subdivisions |
maximum number of subintervals passed to the R function integrate, default 100 |
regul_time |
a vector of time regularization values; by default, the delta_fd basis range in steps of 1. Required for curve_type = 'num'. |
parallel |
a boolean to use parallelization, default TRUE |
a vector of predicted values
Francois Bassac
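The two integration modes offered by int_mode can be compared on a toy integrand. The sketch below is only an illustration under stated assumptions: the integrand `f` is hypothetical, the trapezoidal rule is written by hand in base R as a stand-in for pracma::trapz, and treating nb_pt as the number of interior grid points is an assumption, not something this manual confirms.

```r
# Toy integrand (hypothetical): f(t) = t^2 on [0, 3], exact integral = 9
f <- function(t) t^2

# Mode 1: adaptive quadrature with stats::integrate
val_integrate <- stats::integrate(f, lower = 0, upper = 3,
                                  subdivisions = 100)$value

# Mode 2 stand-in: hand-written trapezoidal rule (plays the role of pracma::trapz)
trapz_base <- function(x, y) {
  n <- length(x)
  sum(diff(x) * (y[-n] + y[-1]) / 2)
}

# Assumption: nb_pt interior points plus the two endpoints
nb_pt <- 10
grid <- seq(0, 3, length.out = nb_pt + 2)
val_trapz <- trapz_base(grid, f(grid))

print(c(integrate = val_integrate, trapezoid = val_trapz))
```

On a coarse grid the trapezoidal rule slightly overestimates the integral of a convex function, which is the usual trade-off against the adaptive but costlier integrate call.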
Predicts the response Y for Scalar Functional Data using the analytic L2 inner product. This implementation follows the theory from Chapter 7.
smoothPLS_SFD_predict( df_predict, delta_spls, basis_obj = NULL, id_col = "id", time_col = "time", parallel = TRUE, ... )
df_predict |
Dataframe with columns (id, time, value). |
delta_spls |
List containing (intercept, delta_fd_object). |
basis_obj |
Optional basis for signal reconstruction. If NULL, uses the basis from delta_fd. |
id_col |
Character, name of id column. |
time_col |
Character, name of time column. |
parallel |
a boolean to enable parallel processing, default TRUE. |
... |
Additional arguments for Data2fd or inprod. |
A numeric vector of predicted values
Francois Bassac
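The analytic L2 inner product mentioned for smoothPLS_SFD_predict can be illustrated with a small basis where the Gram matrix is known in closed form. This is a minimal sketch, not the package's implementation: the monomial basis, the coefficient vectors `a` and `b`, and the intercept are all hypothetical, and base R is used instead of the fda functions (Data2fd, inprod) named in the arguments above.

```r
# Monomial basis {1, t} on [0, 1] (hypothetical choice for illustration)
# Gram matrix G[i, j] = integral over [0, 1] of phi_i(t) * phi_j(t)
G <- matrix(c(1,   1/2,
              1/2, 1/3), nrow = 2, byrow = TRUE)

a <- c(0, 1)      # basis coefficients of the curve x(t) = t
b <- c(2, 0)      # basis coefficients of the coefficient function delta(t) = 2
intercept <- 0.5  # hypothetical intercept

# Analytic inner product <x, delta> = a' G b, then add the intercept
inner <- as.numeric(t(a) %*% G %*% b)
y_hat <- intercept + inner

# Numerical cross-check of the same integral
check <- stats::integrate(function(t) t * 2, lower = 0, upper = 1)$value
print(c(analytic = inner, numeric = check, prediction = y_hat))
```

Once both the curve and the coefficient function are expanded in a basis, the prediction reduces to a matrix product, so no numerical quadrature is needed at prediction time.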
Predicts the response Y for Scalar Functional Data using the analytic L2 inner product. This implementation follows the theory from Chapter 7.
smoothPLS_SFD_predict_para_v1( df_predict, delta_spls, basis_obj = NULL, id_col = "id", time_col = "time", parallel = TRUE, ... )
df_predict |
Dataframe with columns (id, time, value). |
delta_spls |
List containing (intercept, delta_fd_object). |
basis_obj |
Optional basis for signal reconstruction. If NULL, uses the basis from delta_fd. |
id_col |
Character, name of id column. |
time_col |
Character, name of time column. |
parallel |
a boolean to enable parallel processing, default TRUE. |
... |
Additional arguments for Data2fd or inprod. |
A numeric vector of predicted values
Francois Bassac
This function transforms a categorical functional data set, given with its indicator functions, into a list of dataframes, one per distinct state. This function also works with character states.
split_in_state_df(data, id_col = "id", time_col = "time")
data |
a dataframe containing the indicator functions, output of state_indicator() |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a list containing the dataframe of the indicator function of each state.
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
si_df = state_indicator(df, id_col='id', time_col='time')
split_df = split_in_state_df(si_df, id_col='id', time_col='time')
This function takes a categorical functional curve as input and transforms it into as many indicator curves as there are states. It returns a dataframe. It works even on dataframes that do not respect the time condition (same start and end for every individual). The states are sorted in ascending order (if numeric), and the output column for state 'X' is named 'state_X'. This function also works with character states. For the different lists, the i-th element corresponds to the i-th state in this sorted order.
state_indicator(data, id_col = "id", time_col = "time")
data |
a multistates dataframe ('id', 'time', 'states') |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a dataframe with columns ('id', 'time', and one 'state_X' column per state)
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
si_df = state_indicator(df, id_col='id', time_col='time')
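To make the shape of the output concrete, the following base-R sketch builds indicator columns by hand for a tiny dataframe. It is only an illustration of the idea (one 0/1 column per state, sorted and named 'state_X'); the toy data are hypothetical and the code does not reproduce the internals of state_indicator.

```r
# Toy categorical functional data (hypothetical): two individuals, two states
df <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 3, 10, 0, 10),
  state = c(1, 2, 2, 2, 1)
)

# Sort the distinct states in ascending order
states <- sort(unique(df$state))

# One 0/1 indicator column per state
ind <- sapply(states, function(s) as.integer(df$state == s))
colnames(ind) <- paste0("state_", states)

# Assemble the result: id, time, then one column per state
si <- cbind(df[, c("id", "time")], ind)
print(si)
```

Each row has exactly one indicator equal to 1, since an individual is in exactly one state at any given time.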
This function takes a categorical functional curve as input and transforms it into as many indicator curves as there are states. It returns a dataframe. It works even on dataframes that do not respect the time condition (same start and end for every individual). The states are sorted in ascending order (if numeric), and the output column for state 'X' is named 'state_X'. This function also works with character states. For the different lists, the i-th element corresponds to the i-th state in this sorted order.
state_indicator_old(data, id_col = "id", time_col = "time")
data |
a multistates dataframe ('id', 'time', 'states') |
id_col |
a character for the id column, default 'id' |
time_col |
a character for the time column, default 'time' |
a dataframe with columns ('id', 'time', and one 'state_X' column per state)
Francois Bassac
N_states = 3
lambdas = lambda_determination(N_states)
transition_df = transfer_probabilities(N_states)
df = generate_X_df_multistates(nind = 100, N_states, start=0, end=100, lambdas, transition_df)
si_df = state_indicator_old(df, id_col='id', time_col='time')
General function to test basis functions
test_basis_properties(basis, name = "Base", tol = 1e-10)
basis |
Basis to test, basis object or list of fd functions |
name |
Character, name of the basis |
tol |
Float, precision, default 1e-10 |
a boolean
Francois Bassac
start = 0
end = 10
basis = create_bspline_basis(start, end, nbasis=10, norder=4)
test_basis_properties(basis, "cubic splines", tol = 1e-3)
basis_f = fda::create.fourier.basis(rangeval=c(start, end), nbasis=5)
test_basis_properties(basis_f, "Fourier", tol = 1e-3)
This function gives the transition probabilities between states, read from row to column.
transfer_probabilities(N_states)
N_states |
an integer, the number of states considered. |
a dataframe containing the transition probabilities
Francois Bassac
transfer_probabilities(3)
transfer_probabilities(5)
Builds the transition matrix between two bases.
transition_matrix(basis1, basis2)
basis1 |
First basis, basis obj or list of fd functions |
basis2 |
Second basis, basis obj or list of fd functions |
Transition matrix P such that basis1 = P * basis2
Francois Bassac
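The relation basis1 = P * basis2 can be checked on a toy pair of bases. The sketch below is one possible way to recover P, by least squares on grid evaluations in base R; the two bases are hypothetical, and the package may well compute P differently (for instance through inner products), which this manual does not specify.

```r
# Two hypothetical bases on [0, 1], each given as a list of functions
basis2 <- list(function(t) rep(1, length(t)), function(t) t)
basis1 <- list(function(t) 1 + t, function(t) 1 - t)

# Evaluate each basis on a grid: one row per basis function
grid <- seq(0, 1, length.out = 50)
B1 <- t(sapply(basis1, function(f) f(grid)))
B2 <- t(sapply(basis2, function(f) f(grid)))

# Least-squares solution of B1 = P %*% B2
P <- B1 %*% t(B2) %*% solve(B2 %*% t(B2))
print(round(P, 6))
```

Here basis1 is an exact linear combination of basis2, so the recovered matrix has rows (1, 1) and (1, -1) up to rounding error.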
This function builds the alpha matrix for a dataframe holding a curve of type curve_type, given a basis and a regul_time vector.
univariate_alpha_building( df, basis, curve_type = NULL, regul_time, id_col = "id", time_col = "time" )
df |
a dataframe (id, time, value_or_state) |
basis |
a basis fd object |
curve_type |
a character, 'cat' or 'num' |
regul_time |
a numeric vector for time regularization |
id_col |
a character for the id column name |
time_col |
a character for the time column name |
a matrix
Francois Bassac