While Granger causality offers insights based on predictive power and SVAR models help estimate effects under specific assumptions, they often don't fully reveal the underlying causal graph structure from time-series data. Granger causality is not inherently structural, and SVAR requires strong prior knowledge about contemporaneous relationships, which may not be available. To address this, specialized causal discovery algorithms have been developed for temporal data, aiming to reconstruct the network of lagged and contemporaneous influences directly from observations.
These methods typically represent causal relationships using a time series graph. In this representation, nodes correspond to variables at specific time points (e.g., $X_t$, $Y_{t-1}$), and directed edges signify causal influences. An edge from $X_{t-k}$ to $Y_t$ indicates that variable $X$ at time $t-k$ exerts a direct causal effect on variable $Y$ at time $t$, relative to the other measured variables in the system. Common underlying assumptions include time homogeneity (causal relationships remain constant over time) and often some form of stationarity, although methods relaxing these constraints are an active area of research.
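Concretely, such a graph is often encoded as a mapping from each variable to its lagged parents. The sketch below shows one plain-Python possibility; the variable names, lags, and coefficients are purely illustrative assumptions, not output of any algorithm.
# Illustrative encoding: child -> list of (parent, lag, coefficient) triples.
# All names and numbers here are hypothetical, chosen only for illustration.
time_series_graph = {
    'X': [('X', -1, 0.7)],                   # X_{t-1} -> X_t (auto-dependency)
    'Y': [('X', -1, 0.5), ('Z', -2, -0.4)],  # X_{t-1} -> Y_t and Z_{t-2} -> Y_t
    'Z': [('Y', -1, 0.3)],                   # Y_{t-1} -> Z_t
}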
A widely used constraint-based algorithm adapted for time series is PCMCI, named for its two stages: a condition-selection stage derived from the Peter-Clark (PC) algorithm and a momentary conditional independence (MCI) test. It adapts the logic of the PC algorithm to effectively manage the auto-dependencies intrinsic to time series and the high dimensionality arising from considering multiple time lags.
PCMCI proceeds in two primary stages:
Parent Candidate Selection (PC-Stage): For each variable $X^i_t$ in the time series, this stage identifies a set of candidate parents $\mathcal{P}(X^i_t)$ among all lagged variables $X^j_{t-\tau}$ (including the variable's own past, i.e., $i = j$) for lags $\tau = 1, \dots, \tau_{\max}$. Similar to the standard PC algorithm, it employs a series of conditional independence (CI) tests, conditioning on progressively larger sets of potential separating variables. This pruning step significantly reduces the search space by eliminating variables that are conditionally independent of the target, given others. The key adaptation for time series is the systematic testing for lagged dependencies.
Momentary Conditional Independence (MCI) Test: Once the candidate parents $\mathcal{P}(X^i_t)$ are identified for each target variable $X^i_t$, the MCI test determines the existence of a direct causal link $X^j_{t-\tau} \to X^i_t$ for each candidate $X^j_{t-\tau}$. It performs a specific conditional independence test:
$$X^j_{t-\tau} \;\perp\; X^i_t \;\big|\; \mathcal{P}(X^i_t) \setminus \{X^j_{t-\tau}\},\; \mathcal{P}(X^j_{t-\tau})$$
The conditioning set is carefully constructed. It includes the identified candidate parents of the target variable $X^i_t$ (excluding the specific potential parent $X^j_{t-\tau}$ being tested) together with the parents of the source variable $X^j_{t-\tau}$. This conditioning helps to account for autocorrelation and potential indirect causal paths mediated through other variables. If the conditional independence hypothesis is rejected (indicating dependence), a directed edge representing the causal link $X^j_{t-\tau} \to X^i_t$ is added to the estimated time series graph.
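To make the test concrete, here is a minimal sketch of a single MCI-style test in the linear-Gaussian case: both the candidate parent and the target are regressed on the conditioning set, and the correlation of the residuals is tested. The lag construction and the data-generating process are illustrative assumptions, not tigramite's internal implementation.
# Minimal sketch of one MCI-style partial-correlation test (illustrative only).
import numpy as np
from scipy import stats

def partial_corr_test(x, y, Z):
    """Test x independent of y given Z via correlation of OLS residuals."""
    Z = np.column_stack([Z, np.ones(len(x))])          # add intercept column
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residual of x given Z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual of y given Z
    r, _ = stats.pearsonr(rx, ry)
    dof = len(x) - Z.shape[1] - 2                      # degrees of freedom
    t = r * np.sqrt(dof / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), dof)                    # two-sided p-value
    return r, p

rng = np.random.default_rng(0)
T = 500
X = rng.standard_normal(T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = 0.5 * X[t - 1] + 0.3 * Y[t - 1] + rng.standard_normal()

# Test the link X_{t-1} -> Y_t, conditioning on Y_{t-1} (a parent of the target)
# and X_{t-2} (standing in for parents of the source), in the spirit of MCI.
x_src = X[1:-1]                         # X_{t-1}
y_tgt = Y[2:]                           # Y_t
Z = np.column_stack([Y[1:-1], X[:-2]])  # Y_{t-1} and X_{t-2}
r, p = partial_corr_test(x_src, y_tgt, Z)
print(f"partial correlation = {r:.3f}, p-value = {p:.4g}")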
PCMCI offers a robust approach to handling the high dimensionality and autocorrelation challenges common in multivariate time series. By first narrowing down potential causal influences (PC-stage) and then applying the targeted MCI test, it manages the complexity of conditional independence testing in high-dimensional temporal settings.
When applying PCMCI, several practical aspects require careful consideration, including the maximum lag $\tau_{\max}$, the significance levels used in the PC and MCI stages, and, in particular, the choice of conditional independence test. The CI test must match the characteristics of the data: for linear relationships with Gaussian noise, a partial correlation test (ParCorr) is often suitable. For non-linear relationships or non-Gaussian noise, kernel-based tests (like GPDC) or mutual information-based tests (like CMIknn) might be more appropriate. Using an inappropriate test can lead to significant errors in the discovered graph structure, as the short example below shows.
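As a quick illustration of why the test choice matters, consider a purely quadratic dependence: a linear correlation test sees almost no signal, while even a crude non-linear check (here, correlating against the squared values as a stand-in for a proper non-linear CI test) detects it clearly. This is a self-contained sketch, not a tigramite API call.
# Illustrative only: a linear test can miss a purely non-linear dependence.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = x**2 + 0.1 * rng.standard_normal(2000)   # Y depends on X, but non-linearly

r_lin, p_lin = stats.pearsonr(x, y)          # what a linear test measures
r_sq, p_sq = stats.pearsonr(x**2, y)         # crude non-linear feature check

print(f"linear correlation:      r={r_lin:+.3f}, p={p_lin:.3f}")  # near zero
print(f"quadratic-feature check: r={r_sq:+.3f}, p={p_sq:.3g}")    # strongly significant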
The following diagram illustrates a potential causal structure among three time series variables (X, Y, Z) that could be discovered by an algorithm like PCMCI.
A time series graph showing auto-dependencies within each variable (e.g., $X_{t-1} \to X_t$) and lagged causal influences between different variables across time steps (e.g., $X_{t-1} \to Y_t$, $Z_{t-2} \to Y_t$, $Y_{t-1} \to Z_t$).
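To ground this structure, the following sketch simulates data from exactly these links with hypothetical linear coefficients and Gaussian noise. Data generated this way provides a known ground truth for experimenting with discovery algorithms, and the resulting array matches the shape assumed by the tigramite workflow shown later.
# Simulate the diagram's structure (coefficients are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(42)
T = 1000
X, Y, Z = np.zeros(T), np.zeros(T), np.zeros(T)
for t in range(2, T):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal()                   # X_{t-1} -> X_t
    Y[t] = 0.5 * X[t - 1] - 0.4 * Z[t - 2] + rng.standard_normal()  # X_{t-1}, Z_{t-2} -> Y_t
    Z[t] = 0.3 * Y[t - 1] + rng.standard_normal()                   # Y_{t-1} -> Z_t

data = np.column_stack([X, Y, Z])  # shape (time_steps, num_variables)
var_names = ['X', 'Y', 'Z']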
Beyond constraint-based methods, score-based algorithms can also be adapted for time series discovery. For instance, variations of Greedy Equivalence Search (GES) tailored for temporal data exist. Furthermore, approaches that leverage functional causal models offer alternative perspectives: VAR-LiNGAM extends the Linear Non-Gaussian Acyclic Model (LiNGAM) to vector autoregressive processes and identifies causal structure by exploiting non-Gaussian noise, while TiMINo (Time series Models with Independent Noise) permits more general functional forms under an independent-noise assumption. More recent developments include continuous-optimization methods such as DYNOTEARS, which extends the NOTEARS acyclicity characterization to linear dynamic (SVAR) models, as well as neural-network-based approaches for potentially non-linear dynamics.
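As one concrete option, the standalone lingam Python package provides a VAR-LiNGAM implementation. The sketch below shows a minimal usage pattern, assuming that package is installed and reusing the simulated data from above; exact argument and attribute names may differ between package versions.
# Sketch: estimating lagged and contemporaneous structure with VAR-LiNGAM.
# Assumes the separate 'lingam' package (pip install lingam) and 'data' from
# the simulation sketch above.
import lingam

model = lingam.VARLiNGAM(lags=2)   # up to 2 lags, covering the Z_{t-2} -> Y_t link
model.fit(data)

# adjacency_matrices_ stacks B_0 (contemporaneous effects), then B_1, B_2, ...
for lag, B in enumerate(model.adjacency_matrices_):
    print(f"Lag {lag} adjacency matrix:\n{B.round(2)}")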
Unobserved confounding remains a major hurdle in time series causal discovery, just as in static settings. Time series adaptations of the Fast Causal Inference (FCI) algorithm, such as tsFCI, have been developed to identify causal relationships that hold even in the presence of latent variables. The output graphs often contain additional edge types (e.g., bidirected edges or partially oriented edges) to explicitly represent uncertainty about causal direction or the potential presence of unobserved common causes.
Evaluating the performance of time series causal discovery algorithms is typically done using synthetic data generated from known causal structures. Standard metrics include the structural Hamming distance (SHD) to measure the difference between the estimated and true graphs, as well as precision (fraction of discovered edges that are true) and recall (fraction of true edges that are discovered). In practical applications, the utility of a discovered graph might be assessed by its ability to improve downstream tasks like forecasting accuracy or the effectiveness of interventions designed based on the inferred structure.
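Under a simple encoding assumption (a binary array where entry [i, j, tau] = 1 means an edge from variable i at lag tau to variable j), these metrics take only a few lines to compute. This is an illustrative sketch, not tigramite's evaluation API.
# Sketch: precision, recall, and SHD for binary time-series adjacency arrays.
import numpy as np

def graph_metrics(true_graph, est_graph):
    """Both arrays are binary with shape (N, N, tau_max + 1); 1 = edge present."""
    tp = np.sum((est_graph == 1) & (true_graph == 1))  # correctly found edges
    fp = np.sum((est_graph == 1) & (true_graph == 0))  # spurious edges
    fn = np.sum((est_graph == 0) & (true_graph == 1))  # missed edges
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    shd = fp + fn  # for fixed-direction lagged edges, SHD reduces to fp + fn
    return precision, recall, shd

# Hypothetical 2-variable, max-lag-1 example
true_g = np.zeros((2, 2, 2), dtype=int)
true_g[0, 1, 1] = 1                      # true edge: X^0_{t-1} -> X^1_t
est_g = np.zeros((2, 2, 2), dtype=int)
est_g[0, 1, 1] = 1                       # found the true edge...
est_g[1, 0, 1] = 1                       # ...plus one spurious edge
print(graph_metrics(true_g, est_g))      # (0.5, 1.0, 1)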
Specialized software libraries facilitate the application of these advanced methods. The tigramite library in Python is a comprehensive framework implementing PCMCI, various conditional independence tests suitable for time series, and related analysis tools.
# Python workflow using tigramite (illustrative)
import numpy as np
import tigramite
from tigramite import data_processing as pp
from tigramite.pcmci import PCMCI
from tigramite.independence_tests import ParCorr, GPDC # Example CI tests
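# Note: in tigramite version 5 and later, the tests live in submodules, e.g.
# from tigramite.independence_tests.parcorr import ParCorr
# from tigramite.independence_tests.gpdc import GPDC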
# Assume 'data' is a NumPy array shaped (time_steps, num_variables)
# Assume 'var_names' is a list like ['X', 'Y', 'Z']
# Example Data (replace with actual time series data)
# data = np.random.randn(100, 3)
# 1. Prepare data using tigramite's DataFrame
# Handles masking missing values, etc.
dataframe = pp.DataFrame(data, var_names=var_names)
# 2. Select a Conditional Independence Test
# Use ParCorr for linear-Gaussian assumption
# Use GPDC (Gaussian Process regression with Distance Correlation) for non-linear cases
# cond_ind_test = ParCorr()
cond_ind_test = GPDC(significance='analytic') # Example for non-linear
# 3. Initialize the PCMCI algorithm instance
pcmci = PCMCI(
dataframe=dataframe,
cond_ind_test=cond_ind_test,
verbosity=1) # Set verbosity level for output
# 4. Execute the PCMCI algorithm
# Specify max lag (tau_max) and significance level for PC stage (pc_alpha)
# alpha_level for MCI test is usually set separately if needed
results = pcmci.run_pcmci(tau_max=5, pc_alpha=0.05, alpha_level=0.01)
# 5. Process and interpret the results
# Results dictionary contains p-values, link strengths, and the graph array
print("Significant links (Parent --> Child):")
pcmci.print_significant_links(
p_matrix=results['p_matrix'],
val_matrix=results['val_matrix'],
alpha_level=0.01) # Print links significant at MCI alpha level
# The graph structure can be visualized with tigramite's plotting module
# (exact keyword names vary slightly between tigramite versions):
# import matplotlib.pyplot as plt
# from tigramite import plotting as tp
# tp.plot_graph(
#     graph=results['graph'],
#     val_matrix=results['val_matrix'],
#     var_names=var_names,
#     link_colorbar_label='cross-MCI')
# plt.show()  # Display the plot
This code illustrates the standard workflow: data preparation within the tigramite framework, selection of an appropriate CI test based on data characteristics, configuration and execution of the PCMCI algorithm specifying parameters like tau_max and significance levels, and finally, analysis and visualization of the discovered causal graph. Careful tuning of algorithm parameters and validation of the underlying assumptions are necessary steps for obtaining reliable causal insights from temporal data.
These discovery methods provide essential tools for moving beyond correlational analysis in time series. By aiming to uncover the underlying data generating processes, they support more robust forecasting, simulation of interventions, and a deeper understanding of dynamic systems. Nonetheless, like all causal inference techniques, their validity rests on assumptions that must be carefully evaluated in the context of the specific application.