--- title: "first-order" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{first-order} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r options, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, warning = FALSE, fig.align="center", fig.width = 5, fig.height = 5, comment = "#>" ) ``` ```{r libraries, setup} library(doremi) ``` # FIRST ORDER DIFFERENTIAL EQUATIONS The linear first order differential equation with constant coefficients model is given by the following equation: $$\tau\frac{dy}{dt} + (y-y_{eq}) = ku(t)$$ or $$\frac{dy}{dt} + \gamma (y-y_{eq}) = \gamma ku(t)$$ with the initial condition: $$y(t=0)=y_{0}$$ Where: * $y(t)$ is the signal to be analyzed * $\frac{dy}{dt}$ is its first derivative * $u(t)$ is the excitation term perturbing the dynamics of $y(t)$ And regarding the coefficients: * $\gamma$ is the decay rate.Its inverse, $\tau$ represents the characteristic response time of the solution of the differential equation (i.e. the time it takes to the system to go back to equilibrium) * $k$ is the gain. It indicates the proportionality between the stationary increase of $y$ and the constant excitation increase that caused it. * $yeq$ is the signal equilibrium value. Value reached when the excitation term is 0. It is common to find this equation with the excitation term set to 0, $u(t)=0$: $$\frac{dy}{dt} + \gamma(y-y_{eq}) = 0$$ The dynamics in this case follow and exponential decay of the form $y(t)=(y_0-yeq) e^{-\gamma t} + y_{eq}$ and are provoked either by a previous excitation that is no longer present or by the displacement of the system from its equilibrium position (i.e. an initial condition different from 0) # Two steps analysis procedure The estimation of the fixed effect coefficients of the differential equation is performed in two steps. The first step consists in estimating the first derivative, using one of the three methods proposed. The second step consists in performing the following linear mixed-effects regression: \begin{equation} \dot{y}_{ij} \sim (b_{0}+b_{0j})+ (b_1 + b_{1j})y_{ij}+ (b_2 + b_{2j}) U_{ij}+e_{ij} \label{eq2} \end{equation} where: * i accounts for the time * j accounts for the different individuals * $\dot{y}_{ij}$ is the derivative estimated through one of the derivative estimation methods available in the package (gold, glla and fda) calculated on embedding points (gold, glla) or estimated on the time points provided (fda) * $y_{ij}$ and $U_{ij}$ are the variable and the excitation averaged on embedding points (gold, glla) or the original values provided (fda) * $e_{ij}$ is the error term (residuals) Note that random effects are estimated for the intercept ($b_{0} + b_{0j}$), variable ($b_{1} + b_{1j}$) and excitation terms ($b_{2} + b_{2j}$), so that individuals can have different coefficients (initial condition, damping time, gain and equilibrium value). The fixed effect coefficients estimated from the regression are: * Coefficient $gamma = b_{1}$ * Coefficient $yeqgamma = b_{0}$ * Coefficient $kgamma = b_{2} $ The coefficients of the differential equation can thus be calculated as: * Decay time: $tau = \tau = \frac{1}{b_{1}}$ * Gain: $k = \frac{b_{2} }{b_{1}}$ * Equilibrium value: $yeq = \frac{b_{0} }{b_1}$ The estimation is performed using the function `lmer` if there are several individuals or `lm` if there is a single individual. # First order differential equation with no excitation term ## Simulating data Two functions are available to simulate data in the package: `generate.1order` simulates the solution of the differential equation for a given vector of time, and the parameters tau = $\tau$, y0 = $y(t=0)$,yeq = $y_{eq}$, k = $k$ and the vector of excitation $u(t)$. The function create a data.table with a column t for the time, y for the signal, and exc for the excitation. ```{r} test <- generate.1order(0:100,y0 = 10,tau = 10) plot(test$t,test$y) ``` The function `generate.panel.1order` uses `generate.1order` to generate a panel of `nind` individuals, with measurement noise and inter-individual noise. ### Example 1a - Generating signals with no noise In this example we will generate data for 4 individuals, with a decay time of 10s. That is, the signal follows exactly the differential equation (no measurement noise) with no variation of the decay time, the gain and the equilibrium value across individuals (no interindividual noise). As we are considering no excitation, then, as mentioned, the system should be out of its equilibrium value to observe the dynamics of return to equilibrium. ```{r simulation example1a} res1a <- generate.panel.1order(time = 0:99, y0 = 1, tau = 10, nind = 4) ``` It can be specified to the function whether there is an excitation or not by either setting the input to NULL or directly ignoring it in the input arguments. The function returns a data.table with the class 'doremidata'. As any data.table, it can be visualized using the `str` or `head` functions or entering the variable name. ```{r res1a} res1a ``` Where: * id is the identifier of the individual * time is the time vector introduced as input * signalraw is the signal without noise * signal is the signal with noise (in this case, the values of this column are the same as those of the signalraw column and in the plot, and thus both lines are overlapped in the plot below). Plotting the data with the plot method available in the package for doremidata objects is quite straight forward: ```{r plot res_1a, fig.align="center", fig.height=6, fig.width=7} plot(res1a) ``` ### Example 1b - Changing initial condition, gain and equilibrium value In this example, we show the flexibility of the functions to add changes on the mentioned parameters. For this, we simplify the form of the exitation to a simple step: ```{r changing initial condition res1b} timevec <- 0:49 set.seed(123) res1b <- generate.panel.1order(time = timevec, excitation = as.numeric(timevec > 20), y0 = 3, tau = 5, yeq = 1.5, nind = 4) ``` ```{r changing initial condition plot res1b,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res1b) + ggplot2::scale_y_continuous(limits = c(0, 3)) ``` ### Example 2a - Generating signals with noise The call to the function remains almost the same, with two additional arguments: a noise to signal ratio, set at 0.2, and an interindividual noise set at 40%. We can then visualize the different trajectories for the different individuals: ```{r noise res2a} # Generation of signals with intra and inter-noise set.seed(123) res2a <- generate.panel.1order(time = 0:49, y0 = 1, tau = 5, nind = 6, internoise = 0.4, intranoise = 0.1) ``` ```{r noise plot res2a, fig.width = 7, fig.height = 6, fig.align = "center"} plot(res2a) ``` ## Analyzing data ### Example 3 - Analyzing data when the excitation is unknown, with some inter- and intraindividual noise We will test how the analysis functions fit a decreasing exponential, as is the case when there is no excitation, when it is unknown, or when it is constant over time. ```{r analysis example3} #Simulating data with these hypothesis set.seed(123) data3 <- generate.panel.1order(time = 0:50, y0 = 0.5, tau = 10, nind = 3, internoise = 0.2, intranoise = 0.1) ``` ```{r analysis res3} #Analyzing res3 <- analyze.1order(data = data3, id = "id", time = "time", signal = "signal", dermethod = "fda", derparam = 0.7) ``` Note that the `input` parameter has been omitted when calling the function. The result display the summary of the fixed effect coefficients: ```{r} res3 ``` The data.table `result_id` gives the coefficients estimated for each individual: ```{r} res3$resultid ``` It is possible to use the `plot` function directly on the result of the analysis: ```{r analysis plot res3,fig.width = 7, fig.height = 6, fig.align = "center"} #Plotting plot(res3) ``` # First order differential equation with an excitation term Analyzing the signals with noise generated above, we can verify that the parameters were the one introduced in the simulation function and that the estimated signals generated match the simulated ones. # Simulating data ## Example 1a - Generating signals with no noise To generate the excitation process perturbing the system we want to simulate, we can use the `generate.excitation` function, which generate a sequence of square pulses of a given length randomly distributed: ```{r} set.seed(123) U <- generate.excitation(nexc = 3, # number of square pulses duration = 10, # pulse duration deltatf = 1, # time spacing between points tmax = 100, # maximum time minspacing = 20) # minimum spacin between square pulses plot(U$t,U$exc) ``` Note that the time vector and excitation vector can be of any form and created by the user. In this example we will use the excitation vector we just generated and the `generate.panel.1order` function to generate the variable following the first order differential equation for 4 individuals with a decay time of 10s. The signal follows exactly the differential equation (no measurement noise) with no variation of the decay time, the gain and the equilibrium value across individuals (no interindividual noise). ```{r excitation term example1a,fig.width = 7, fig.height = 6, fig.align = "center"} res1a <- generate.panel.1order(time = U$t, excitation = U$exc, tau = 10, k = 1, nind = 4) plot(res1a) ``` ## Example 1b - Changing initial condition, gain and equilibrium value Using the same excitation process, but changing the initial condition, the gain the the equilibrium value: ```{r changing initial condition} res1b <- generate.panel.1order(time = U$t, excitation = U$exc, y0 = 5, tau = 10, k = 3, yeq = 2, nind = 4) ``` ```{r plot res1b,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res1b) ``` And it can be observed how the system tends to equilibrium ($t_{eq} = 2$) from the initial condition ($y_0 = 3$) after being perturbed by the excitations. ## Example 2 - Generating signals with noise The call to the function remains almost the same, this time we have added a Noise to Signal ratio of 2 (the variance of noise is twice the variance of the underlying signal) and a 20% inter-individual noise (that is the parameters gain, equilibrium value, the initial value $y_0$ and $\tau$ of each individual are sampled on a gaussian centered on the parameter value with a standard deviation of 20% its values) to visualize the different trajectories for the different individuals: ```{r example2} # Generation of signals with intra and inter-noise res2a <- generate.panel.1order(time = U$t, excitation = U$exc, tau = 10, k = 1, nind = 4, yeq = 2, y0 = 2, internoise = 0.2, intranoise = 2) ``` ```{r plot res2a,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res2a) ``` # Analyzing data ## Example 3 - Analyzing data from a single individual In this case, we can reuse one of the signals from the previous example ```{r example3} data3 <- res2a[id==1] ``` As the table contains data from a single individual, the `id` parameter has been omitted when calling the function. ```{r res3} #Analyzing res3 <- analyze.1order(data = data3, input = "excitation", time ="time", signal = "signal", verbose=T) res3 ``` ```{r plot res3,fig.width = 6, fig.height = 4, fig.pos = 0.5, fig.align = "center"} #Plotting plot(res3) ``` ## Example 3a - Analyzing data with several individuals and some inter and intra-individual noise Analyzing the data generated in the example 2 above (table res2a), the user must specify the name of the columns containing the id of the participants, the excitation, and the signal. As several methods are available for the estimation of the derivatives, the user needs to specify which method to use (`gold` is the default) and the embedding dimension by modifying the parameter `derparam` (see the package pdf manual for more details). ```{r analysis res3a} res3a <- analyze.1order(data = res2a, id = "id", input ="excitation", time ="time", signal = "signal", dermethod = "gold", derparam = 3) ``` Now let’s take a look at the result. When calling the variable, the fixed effect coefficients of the regression with their associated standard error, the derived coefficient characterizing the signal shape (the decay time tau, the equilibrium value yeq and the gain k), and the R2 are displayed: ```{r analysis res3a print} res3a ``` The signal analyzed was generated with tau = 10, yeq = 0, and k = 1. This is a simplified view of the results. The analysis function supplies an object of class “doremi” that contains in fact several lists. It is possible to explore the full result values by using the function "summary" for doremi objects (see the section on methods created for doremi objects at the end of this vignette). ```{r summary res3a} summary(res3a) ``` The first object of the output contains the original data with some columns added. These columns contain intermediate variables necessary for the mixed-effect regression: ```{r head res3a} head(res3a$data) ``` Where: * signal_rollmean contains the zeroth order derivative of the input signal in embedding points provided by `gold` or `glla`, or the smoothing performed by `fda`. * signal_derivate1 contains the first derivate of signal, calculated by using one of the derivative estimation methods available (`gold`, `glla`, `fda`). * time_derivate contains the values of time in which the derivative was evaluated. * excitation_rollmean contains the roll mean of the excitation signal in embedding points (gold, glla) or in the initial time points provided (`fda`). If we want to visualize the summary of the mixed-effect regression: ```{r components of res3a} res3a$regression ``` Where we have the random and fixed effects and the residuals calculated by the function `lmer` or `lm` depending on if the sample had several or one individual respectively. Beware that, as known in most two-steps procedures, the estimation of derivatives reduces variance, and thus the error terms provided by the regression are not final. The following table contains the fixed effect coefficients (result displayed by default when calling a doremi variable, as mentioned before): ```{r resultmean} res3a$resultmean ``` It can be observed that the decay time is close to the value introduced to the simulation function (6), the equilibrium value close to its true value (0), and the gain close to its true value (1). For each individual we have: ```{r resultid} res3a$resultid ``` Where: * tau is the decay time. * yeq is the equilibrium value. * excitation_k is the gain associated to the excitation "excitation". As we will see later, there is the possibility to consider several excitations, each one of which will have a different effect on the dynamics of the signal studied and thus, a different gain associated to them. If we graphically wish to verify how the estimated signal fits the initial signal, we can once again call the function `plot`, that has been adapted to handle doremi objects as well: ```{r plot res3a,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res3a) ``` Similarly to the `print` function, `plot` applied to a doremi object plots by default the first six individuals contained in the result. If we wish to visualize a single individual or a specific set of individuals, we can specify them by changing the "id" input parameter of the function: ```{r more plot res3a,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res3a, id = 3) plot(res3a, id = c(1,4)) ``` Finally, the results contain the derivative method used and the embedding number/smoothing parameter used for further reference. ## Example 3b - Enhancing the fit by changing the embedding number/ smoothing parameter. As mentioned before, the estimation of derivatives before model fit is a source of error underestimation. It is possible to enhance the quality of the fit in three ways: * By changing the embedding number/smoothing parameter * By changing the order of the derivative * By changing the derivation method In the following example, we will use the function `optimum_param` to find the embedding number that provides the closest $R^2$ to 1. The other ways can be tested "manually" by calling the other functions, for instance, in a simulation study. Let us reuse the data from res2a: ```{r res3b} res3b <- optimum_param (data = res2a, id = "id", input = "excitation", time = "time", signal = "signal", model = "1order", dermethod = "gold", pmin = 3, pmax = 21, pstep = 2) res3b$summary_opt res3b$d ``` And can be seen that from the range provided, an embedding number of `{r} res3b$d` produces the best fit and the coefficients are closer to their true values than the ones estimated in the previous example. The optimum_param function generates an object of class "doremiparam". If we want to graphically see the evolution of the coefficients according to the embedding number, we can easily plot the results with the `plot` function too: ```{r plot res3b,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res3b) ``` ## Example 4 - Analyzing data when the signal is subject to several excitations In this example, we construct a signal having different gain for different excitation sources. For the sake of simplicity, we generate this example signal without noise. ```{r example4} #Simulating data with these hypothesis #Generating the three excitation signals: t <- 0:100 u1 <- as.numeric(t>20 & t<40) u2 <- as.numeric(t>50 & t<70) u3 <- as.numeric(t>80 & t<100) # Arbitrarily choosing a = 1, b = 2 and c = 5 for the first individual et1 <- u1 + 3 * u2 + 5 * u3 y1 <- generate.1order(time = t, excitation = et1, tau = 10, k = 1)$y #as we are using the $y argument of the object generated #Signals for the second individual; # Arbitrarily choosing a = 1, b = 2.5 and c = 4 for the second individual et2 <- u1 + 2.5 * u2 + 4 * u3 y2 <- generate.1order(time = t, excitation = et2, tau = 10, k = 1)$y #Generating a table with the signals data4 <- data.frame(id = rep(c(1, 2), c(length(et1), length(et2))), time = c(t, t), excitation1 = c(u1, u1), excitation2 = c(u2, u2), excitation3 = c(u3, u3), signalcol = c(y1, y2)) ``` We have in data4 a signal, whose dynamics was created by three excitations with different gains: k = 1, 3 and 5 for id = 1, k = 1, 2.5 and 4 for id = 2. We can see in the graph below that each excitation is identical but generates a different change of the signal: ```{r plot example4,fig.width = 7, fig.height = 4, fig.align = "center"} #Plotting signals ggplot2::ggplot( data = data4) + ggplot2::geom_line(ggplot2::aes(time,signalcol, colour = "Signal-no noise"))+ ggplot2::geom_line(ggplot2::aes(time,excitation1,colour = "excitation 1"))+ ggplot2::geom_line(ggplot2::aes(time,excitation2,colour = "excitation 2"))+ ggplot2::geom_line(ggplot2::aes(time,excitation3,colour = "excitation 3"))+ ggplot2::facet_wrap(~id)+ ggplot2::labs(x = "Time (s)", y = "Signal (arb. unit)", colour = "") ``` It is possible in `analyze.1order` to give a vector of column as input. In this case, a gain will be estimated for each excitation column. ```{r res4} #Analyzing signals res4 <- analyze.1order(data = data4, id = "id", input = c("excitation1", "excitation2", "excitation3"), time = "time", signal = "signalcol", dermethod = "fda", derparam = 0.1) #Looking for the calculation of the coefficients of the excitation res4$resultid ``` Here, we recover the estimated gains for each individual in the `resultid` table. They are a good approximation of the coefficients introduced, using this small sample of 2 individuals. ```{r fig.width = 7, fig.height = 4, fig.align = "center"} #Plotting signals plot(res4) ``` ## Example 5 - Analyzing data with missing values in the signal We examine how the analysis function still manages to retrieve an accurate fit from signals with missing measurements: ```{r example5} t <- 0:200 set.seed(123) data5 <- generate.panel.1order(time = t, excitation = as.numeric(t>50 & t<100), tau = 10, k = 1, nind = 6, internoise = 0.4, intranoise = 0.1) ``` ```{r plot data5, fig.width = 7, fig.height = 6, fig.align = "center"} plot(data5) ``` ```{r missing data} #Keeping one third of the rows selected randomly from the full data set set.seed(123) data5rd <- data5[sample(nrow(data5), nrow(data5)/3), ] data5rd <- data5rd[order(id,time)] ``` In the next plot we show the selection of random points made on the full data set, and also represent the signal without noise. The signal with noise will be used to carry out the analysis. ```{r plot missing data,fig.width = 7, fig.height = 6, fig.align = "center"} plot(data5rd) ``` Then analyzing and visualizing the random points selection: ```{r res7} res7 <- analyze.1order(data = data5rd, id = "id", input = "excitation", time ="time", signal = "signal") ``` ```{r plot res7,fig.width = 7, fig.height = 6, fig.align = "center"} plot(res7)+ ggplot2::geom_line(data = data5, ggplot2::aes(time,signalraw, colour = "Original signal")) ``` # C - Applications ## Heart rate during effort tests Using the `cardio` dataset present in this package, we will apply the model to real heart frequency measurements. The data fields are described with detail in the pdf manual of the package, together with the source from which the data was obtained. The data consists in heart rate measurements from 21 individuals carrying out an effort test on a cycle ergometer. Participants pedal on a bicycle with increasing resistance. The resistance load is measured in watts, with higher load forcing the individual to greater effort. His/her heart frequency will then vary according to the effort supplied and the participant’s fitness. According to this, the `cardio` data includes, for each individual, the time since the beginning of the test (seconds), the bicycle load (Watts) and the heart rate (1/min). Here is an example of these data: ```{rf,ig.width = 6, fig.height = 5, fig.align = "center"} ggplot2::ggplot(cardio[id == 2], ggplot2::aes(x = time))+ ggplot2::geom_point(ggplot2::aes(y = hr,color = "heart rate (beat/s)"))+ ggplot2::geom_line(ggplot2::aes(y = load,color = "effort load (W)")) ``` We can perform the analysis using `analyze.1order`: ```{r cardio data} resc1a <- analyze.1order(data = cardio, id = "id", input = "load", time ="time", signal = "hr", dermethod = "glla", derparam = 5) ``` And use the `plot` function to plot the analysis results for all 21 participants. Note that omitting the argument `id` would lead to only the first 6 participants being plotted by default. ```{r plot cardio data,fig.width = 7, fig.height = 16, fig.align = "center"} plot(resc1a, id = 1:21) + ggplot2::facet_wrap(~id,ncol=3,scales="free") ``` As can be seen, the model reproduces reasonably well the variation of the heart rate, and can be used to fully characterize the heart rate dynamics with three simple parameters: the resting heart rate (the equilibrium value), the characteristic time of heart rate change (the decay time), and the increase of heart rate for a given effort (the gain). Such analysis has been used in [(Mongin et al., 2020a)](https://doi.org/10.1080/00273171.2020.1754155) and [Mongin et al., 2020b](https://doi.org/10.1038/s41598-020-69218-1). ## Measuring heart frequency during effort tests by considering several excitations In the previous example, each one of the steps in the resistance load can be considered as a separated perturbation having an effect on heart rate. In this case, the functions will estimate different gains as explained, to account for each one of the contributions of the load to the total heart rate dynamics. Let us first create a new column for each step of effort load: ```{r cardio multiple excitations generate} mydata <- cardio[id %in% 1:10] # create a index indicate which step of the exercise test mydata[,load_idx := data.table::rleid(load),by = id] # transforming to large format, to have one column per workload step mydata_large <- data.table::dcast(id + time + hr ~ paste0("load_",load_idx),data = mydata,value.var = "load") # replacing NAs by 0s load_cols <- paste0("load_",1:max(mydata$load_idx)) mydata_large[,c(load_cols) := lapply(.SD,function(col) data.table::fifelse(is.na(col),0,col)),.SDcols = load_cols] head(mydata_large) ``` To consider each effort step as an independent excitation associated with its own gain, we can use `analyze.1order` and give the vector of the excitation column as `id` argument: ```{r cardio multiple excitations} # analyzing resc1b <- analyze.1order(data = mydata_large, id = "id", input = load_cols, time = "time", signal = "hr", dermethod = "gold", derparam = 5) ``` Here we will estimate a gain per excitation column ```{r} resc1b ``` In the plot, each workload step of the exercise test appear in one different column. ```{r plot cardio mult excitations,fig.width = 7, fig.height = 16, fig.align = "center"} plot(resc1b,id=1:10)+ ggplot2::facet_wrap(~id,ncol=3,scales="free") ``` This analysis has been the main analysis in [Mongin et al.,2020c](https://iopscience.iop.org/article/10.1088/1361-6579/abbb6e) and used in [Mongin et al.,2020d](https://doi.org/10.1109/ESGCO49734.2020.9158156). ## Measurements of response time of individuals when carrying out mental rotation tasks Using the `rotation` dataset present in this package, we will apply the model to a case in which the excitation signal is not clearly defined. In brief, the data is from 17 individuals that carried out mental rotation tasks (identify if two figures, one of which is rotated, are the same or a mirror image) and the response time (in milliseconds) was measured [(Courvoisier et al., 2013)](https://doi.org/10.1016/j.yhbeh.2012.12.007). Each individual was measured every day for 60 days, though there can be missing data. To account for this, the time has been represented by the number of days since the beginning of the experiment (variable `days`). The signal is the mean response time (variable `meanRT`) over the ~200 stimuli done every day. The purpose of the study was to compare the performance between men and women. ```{r mental rotation data} dermethod<- "fda" pmin = 0.1 pmax = 1 pstep = 0.1 restemp <- optimum_param(data = rotation, id = "id", time ="days", signal = "meanRT", dermethod = dermethod, model = "1order", pmin = pmin, pmax = pmax, pstep = pstep) restemp$summary_opt resc2a <- analyze.1order(data = rotation, id = "id", time ="days", signal = "meanRT", dermethod = dermethod, derparam = restemp$d) ``` Plotting analysis results: ```{r plot menta rotation data,fig.width = 7, fig.height = 6, fig.align = "center"} plot(resc2a, id = 1:17) ```