Revisiting causality using stochastics: 2. Applications

 

Revisiting causality using stochastics: 

2. Applications

Abstract

In a companion paper, we develop the theoretical background of a stochastic approach to causality with the objective of formulating necessary conditions that are operationally useful in identifying or falsifying causality claims. Starting from the idea of stochastic causal systems, the approach extends it to the more general concept of hen-or-egg causality, which includes as special cases the classic causal, and the potentially causal and anti-causal systems. The framework developed is applicable to large-scale open systems, which are neither controllable nor repeatable. In this paper, we illustrate and showcase the proposed framework in a number of case studies. Some of them are controlled synthetic examples and are conducted as a proof of applicability of the theoretical concept, to test the methodology with a priori known system properties. Others are real-world studies on interesting scientific problems in geophysics, and in particular hydrology and climatology.

Inline Graphic

(Deep doubts, deep wisdom; shallow doubts, shallow wisdom – Chinese proverb)

1. Introduction

The companion paper [1] studies theoretically the identifica­tion and characterization of a causal link between two stochastic processes x(t) and y(t), which represent two natural processes. Our proposal relies on the relationship

1.1
where v(t) is another stochastic process (not necessarily white noise), assumed uncorrelated to x(t), which represents the part of the process that is not explained by the causal link. The function g(h) is the impulse response function (IRF) of the system (x(t), y(t)); to see this, we set v(t) ≡ 0 and x(t) = δ(t) (the Dirac delta function, representing an impulse of infinite amplitude at t = 0 and attaining the value of 0 for t ≠ 0), and we readily get y(t) = g(t).

For any two processes x(t) and y(t), equation (1.1) has infinitely many solutions in terms of the function g(h) and the process v(t). An obvious and trivial one is g(h) ≡ 0, y(t) ≡ v(t). The sought solution is the one that corresponds to the minimum variance of v(t), called the least-squares solution. Assuming that this has been determined for the system (x(t), y(t)), we call that system

1.

potentially causal if g(h) = 0 for any h < 0, while the explained variance is non-negligible;

2.

potentially anti-causal if g(h) = 0 for any h > 0, while the explained variance is non-negligible (this means that the system (y(t), x(t)) is potentially causal);

3.

potentially hen-or-egg (HOE) causal if g(h) ≠ 0 for some h > 0 and some h < 0, while the explained variance is non-negligible;

4.

non-causal if the explained variance is negligible.

In the above, we use the adverb ‘potentially’ to highlight the fact that the conditions tested provide necessary but not sufficient conditions for causality. In a potentially causal (or anti-causal) system, the time order is explicitly reflected in the definition. In a potentially HOE causal system, the time order needs to be clarified by defining the principal direction. The most natural indices for this are (a) the time lag hc maximizing the absolute value of cross-covariance; (b) the mean (time average) of the function g(h) and (c) the median h1/2 of the function g (h).

Assuming that processes x(t) and y(t) are positively correlated (i.e. an increase in x(t) would result in an increase in y(t); if not we multiply one of the two by −1), we seek an optimal solution for the IRF by minimizing the variance of v(t). We also set additional desiderata for

(a)

an adequate time span

of h;

(b)

a non-negative g(h) ≥ 0 for all

; and

(c)

a smooth g(h) with the smoothness expressed as a constraint E ≤ E0, where E is determined in terms of the second derivative of g(h) and E0 > 0.

The proposed theoretical framework is formulated in terms of natural, i.e. continuous time. On the other hand, as the estimation of the IRF relies on data in an inductive manner, and data are only available in discrete time, conversion of the continuous to a discrete time framework is necessary for the application. This results in
1.2
where the sequence gj is determined from the function g(h) in the companion paper. Furthermore, any dataset is finite and allows only a finite number of gj terms to be estimated. Therefore, in the applications, the summation limits ± ∞ in equation (1.2) are replaced by ± J, apparently assuming that gj = 0 for |j| > J, where, in order to identify gj from data, J should be chosen much lower than the length of the dataset.

If we exclude the non-negativity constraint, then the problem of identifying the IRF has an analytical solution (equation (3.33) in the companion paper [1]). However, a numerical solution is always possible, simple and fast. The theoretical framework is illustrated, tested and explored in a number of case studies which are presented below. The majority of these are synthetic, with a priori known system properties, and they serve to test the methodology developed. The remaining are real-world case studies that serve to illustrate the usefulness of the method.

Apparently, the estimation of the IRF from data involves uncertainty. As explained in the companion paper [1], the method of choice for the uncertainty assessment is Monte Carlo simulation, because the complexity of the calculations for optimizing the IRF fitting does not allow analytical solutions. To make the study short, this task was kept out of its scope. However, a preliminary investigation and some first results are provided in the electronic supplementary material, section SI2.1, where it is shown that (i) the uncertainty in the IRF ordinates per se can be large if no constraints are used to determine it, (ii) the use of constraints decreases the uncertainty and (iii) the uncertainty in the key characteristics related to causality (time directionality, time lags, explained variance) is small, irrespective of the constraints used.

Increased estimation uncertainty may also result in spurious causality claims. As high autocorrelation increases uncertainty in the long term, this could be a major case leading to false identification of potential causality. This is illustrated in the electronic supplementary material (Section SI2.2) by means of a synthetic example, in which the processes xτ and yτ are, by construction, independent of one another, but with high autocorrelation. Techniques to handle such situations and avoid false conclusions are also discussed there.

As also explained in the companion paper [1], the proposed method for the determination of the ordinates gj based on the minimization of the variance of the error process, is, by its construction, non-parametric. A well-known weakness of determining numerous ordinates is that it is an over-parametrized problem. Alternative techniques may overcome this problem using a parametric model (such as a Box–Jenkins model or an autoregressive moving average exogenous—ARMAX—model [2,3]). In our non-parametric approach, the over-parametrization problem can also be tackled—and here lies the usefulness of imposing constraints (a)–(c) discussed above. For comparison, an additional parametric method, formulated in terms of parametrizing the IRF per se in continuous time is also discussed and compared to the proposed non-parametric method in the electronic supplementary material, section SI2.3. This method is also applied in one of the case studies in the electronic supplementary material, section SI2.4.

2. Case studies

Thirty case studies have been conducted, whose results are summarized in table 1. In all of them, we started by assuming a potentially HOE causal model with a rather small number of weights gj, namely 41 (i.e. J = 20). Depending on the results of the estimation procedure, the system is deemed potentially HOE causal if we have gj > 0 for both some positive and some negative lags j, and potentially causal if gj = 0 for all j < 0 (or anti-causal if this happens for all j > 0). If the explained variance ratio is close to 0, the system would be deemed non-causal, but this case did not appear in the case studies. Furthermore, if it happens that at the edge of the window the IRF vanishes off (g±J = 0), we have a strong indication that the chosen J = 20 (defining the window size) is sufficient to recover the system dynamics in terms of causality. The opposite case would mean that a larger time window with a greater J is required. The optimal J in this case is not investigated as our scope here is not to construct an optimal model for a specific system, but only to test the proposed methodology and seek some insights within it.

Table 1.

Summary indices for the results of all case studies elaborated. The hc is the time lag maximizing the cross-covariance cyx(h), or equivalently the cross-correlation ryx(h):=cyx(h)/cxx(0)cyy(0); μh is the mean (time average) of the function g(h); h1/2 is the median of the function g(h); e is the explained variance ratio; and ϵ is the roughness ratio. In parentheses are the true values for the cases that they are known.

aThe roughness was calculated without considering the second derivative at zero.

bWe have not defined the median in cases that include negative values of gj.

The majority of the case studies are synthetic (#1 to #18) and are conducted as a proof of concept, i.e. to test the methodology developed with a priori known system properties. The remaining are real-world case studies divided in two subcategories. Namely, studies #19 to #22 deal with the precipitation–runoff system which is well understood in hydrology. Here we try to investigate whether our framework is consistent with the known fact that precipitation at a specific location is the cause and runoff at the associated location the effect. In other words, here we know a priori that precipitation causes runoff and we test our methodology (whether it can capture this known fact) rather than try to find the actual causality direction. On the contrary, studies #23 – #30 deal with systems that are much more complex and not well understood. Namely, in studies #23 – #28, we investigate the links between atmospheric temperature and CO2 concentration (cf. [4]), and in studies #29 – #30, we investigate the links between atmospheric temperature and El-Niño Southern Oscillation (ENSO; cf. [5]).

Notice that for each case, each of the directions x → y and y → x are investigated separately as there is no symmetry (or anti-symmetry) in the produced IRFs in the two directions and, hence in the quantified measures, which are summarized in table 1. When we refer to direction y → x we mean that we interchange the time series x and y and still estimate the IRF in the same way, as described in our equations (e.g. equation (1.1)), in which the direction x → y is assumed.

The details of the case studies are given in the following subsections.

(a) Synthetic examples

All synthetic examples are based on the same input series xτ, which was constructed by the methodology in Koutsoyiannis [6] whose software is available online as electronic supplementary material of that paper.

In particular, to generate xτ, we use the moving average scheme

2.1
where I = 1024, wτ is white noise, assumed standard Gaussian, and the sequence of coefficients ai is assumed time symmetric, i.e. ai = ai, and is determined assuming a filtered Hurst–Kolmogorov process with a generalized Cauchy-type climacogram (FHK-C [7,8]):
2.2

The term climacogram denotes the variance γ(k) of a stochastic process averaged at time scale k, as a function of k; by specifying it, all second-order properties of the process (autocovariance, power spectrum and variogram) are also uniquely specified. In the last equation α and λ are scale parameters with dimensions of [t] and [x], respectively, while M (fractal parameter) and H (Hurst parameter) are dimensionless parameters determining the dependence structure at a local level (smoothness or fractality) and the global level (long-range dependence). The parameter values are chosen as λ = 1, α = 20, H = M = 0.85. The chosen values of H and M suggest, respectively, (long-term) persistence and (short-term) smoothness of the process xτ. We deliberately choose a high value of Hurst parameter H, corresponding to a process with long-range dependence, in order to make the case study more challenging and insightful, additionally noting that a high H implies high uncertainty, also in the estimation process. The resulting autocorrelations are high, but not prohibitively high in the sense described in Section SI2.2 of the electronic supplementary material of this paper.

We recall [7,9] that the discrete time autocovariance cη for integer time lag η is the second discrete derivative of the climacogram multiplied by the square of the time scale. Specifically, assuming a discretization time step D = 1, the autocovariance is

2.3
The series of coefficients ai is calculated from that of cη as described in detail in Koutsoyiannis [6].

The system output yτ is calculated in our synthetic case studies from an equation similar to (2.1) but now replacing the white noise wτ with the input xτ, and the latter with the output yτ, while also adding some noise uτ:

2.4
where the integers IL and IH differ in the various applications as specified below. The noise uτ is assumed Gaussian with standard deviation 0.5, except in one case marked as ‘pure’ (applications #1 and #2) where no noise is added. The length of the generated series is 8000 in all synthetic case studies. Here, we assume that only these synthetic series are known, while the generation equation (2.4) is unknown, and we will try to recover (or approximate) it from the data by using equation (1.2).

We note that, since the sequence of ai was determined assuming (long-term) persistence (H = 0.85), all ai are non-negative. Hence, in case where IL, IH ≤ J, if our method works well, we expect to find that the sequence of gi (which we assume unknown and try to estimate by our method) is identical to the sequence of ai. This, however, will apparently not be the case if IL, IH > J.

In case studies #1 and #2, we use IL = IH = 20 without noise, thus building a symmetric HOE causal system without an error term. In this case, assuming J = 20 (41 weights) without using the constraints for roughness and non-negativity (i.e. using the analytical solution of equation (3.33) in the companion paper, with λ = 0), we fully recover the system dynamics in the direction x → y (case study #1), as shown in figure 1a. As seen in table 1, the variance is fully explained by that dynamics (e = 1).

Figure 1.

Figure 1. IRFs for synthetic applications #1 (a) and #2 – #3 (b) representing a HOE causal (symmetric) system without an error term. By construction, the true IRF has 41 non-zero weights (IL = IH = 20) and the system dynamics does not contain a random term. No constraints were used in the IRF estimation. For the estimated IRF, the number of weights is 2J + 1 with J = 20 (applications #1 – #2) or J = 10 (application #3). (Online version in colour.)

If we reverse the direction, i.e. y → x, without using any constraint (case study #2), the explained variance ratio remains very high, e = 0.99 (table 1), but the IRF, depicted in figure 1b, becomes very rough. Clearly, the time symmetry is captured, but apart from this, the shape of the IFR looks like representing a random pattern alternating between positive and negative parts. Does such an alternating random pattern suggest causality?

One may claim that the number of weights (41) is too high and regard the random shape as an artefact of non-parsimonious modelling (even though the data size is quite large, 8000, which should support the estimation of 41 parameters). For this reason, we have also tried to recover the dynamics with fewer (21) weights (J = 10). The resulting IRF is also plotted in figure 1b and again has a similar shape, with a symmetric, yet random pattern alternating between positive and negative parts. The explained variance ratio remains almost equally high, e = 0.99 (table 1, case study #3), which certainly suggests that the solution with 41 weights could become more parsimonious without sacrificing predictability. But does this high predictability of xτ from yτ suggest causality, given the rough and alternating pattern of weights?

To decrease the roughness, we have included case study #4 with direction y → x, 21 weights and a roughness constraint. The solution is depicted in figure 2a, where a more logical pattern of the IRF is seen. This was achieved almost with a negligible cost in predictability (e = 0.98 in case study #4 versus 0.99 in case study #3; table 1). Yet again we have an alternation of negative and positive parts. Notably, g0 = 3.2, while g±1 = −0.4. The alternation of signs for a change of lag of just 1 is fully understandable in terms of improving predictability, but does it have any meaning in establishing causality?

Figure 2.

Figure 2. IRFs for synthetic applications #4 ((a) J = 10, roughness constraint) and #5 ((b) J = 20, non-negativity constraint) representing a HOE causal (symmetric) system without an error term, the same as in figure 1. (Online version in colour.)

Our final application with the same dataset and direction y → x is case study #5 in which we enable the non-negativity constraint (with or without the roughness constraint). Here we get a reasonable solution (figure 2b), with only one non-zero weight at time lag zero. The resulting explained variance ratio is somewhat smaller, e = 0.91, and is fully due to the high lag zero cross-correlation of xτ and yτ (ryx(0) = 0.94). All characteristic time lags are equal to 0, reflecting the full temporal symmetry, as also happens with the direction x → y (case study #1). Yet in this fully symmetric case, even if we did not know that yτ was constructed from xτ, we would conclude that there is a preferential causality direction, x → y, as this results in higher explained variance and a more consistent and prolonged IRF (a bigger time span

).

In case studies #6 and #7, we use IL = 0, IH = 20, thus making a typical causal system (including an error term). If we do not use any constraint, we get the solutions shown in figure 3 ((a) for x → y, (b) for y → x). In both cases, the time directionality x → y is captured (table 1) but the rough shape in the direction x → y (case study 6) and the alternating positive and negative IRF values in both directions do not have any relationship with the true system dynamics. Interestingly, the explained variance ratio is higher in the direction y → x (e = 0.97) than in x → y (e = 0.94), which could not be expected. While causality can be inferred from characteristic time lags, the IRFs do not help in understanding causality. At this stage, the results seem rather puzzling but things will become clearer with the next case studies.

Figure 3.

Figure 3. IRFs for synthetic applications #6 (a) and #7 (b) representing a causal system. By construction, the IRF has 21 non-zero weights (IL = 0, IH = 20) and a random term in the system dynamics. No constraints were used in the estimation of IRF. (Online version in colour.)

In case studies #8 and #9, we use the same time series xτ and yτ as in applications #6 and #7. Here, we additionally enrol the non-negativity constraint for the IRF estimation, but not the roughness one. As seen in figure 4a and table 1, the estimated IRF reproduces the zero gj for time lags j < 0, thus capturing the direction and the correct time lags, but the IRF is far from the true system dynamics because it is very rough. If we reverse the direction, i.e. y → x, the system becomes anti-causal (zero gj for time lags j > 0), thus again recovering the correct causality direction. In both cases, the explained variance ratio is very high, e = 0.94, while, as seen in figure 4b, case #9 results in a rather smooth IRF. Yet the clearly positive lags for x → y and negative ones for y → x do not leave any doubt that the causality direction is the former.

Figure 4.

Figure 4. IRFs for synthetic applications #8 (a) and #9 (b) representing a causal system, the same as that of figure 3. Only the non-negativity constraint was used in the estimation of IRF. (Online version in colour.)

Case studies #10 and #11 again use the same time series xτ and yτ as in #6 – #9, representing the same typical causal system. The difference is that we now use both constraints, non-negativity and small roughness. As seen in figure 5a, now the system dynamics is fully recovered, while, as seen in table 1, this has been done at virtually no cost in terms of explained variance ratio, which is as high as in case #8 (e = 0.94). If we investigate the reverse direction, y → x (figure 5b), the system correctly appears as anti-causal (zero gj for time lags j > 0), thus again recovering the correct causality direction.

Figure 5.

Figure 5. IRFs for synthetic applications #10 (a) and #11 (b) representing the same causal system as that in figure 4, but using both the non-negativity and the roughness constraint in the IRF estimation. (Online version in colour.)

The above illustrations of the IRF behaviour with and without the constraints prove that both constraints are essential in studying causality. Therefore, in what follows, we always use both of them.

Another interesting question to investigate is this. What happens if the system is nonlinear in nature, while our entire framework is based on a linear relationship? To study this question, we exponentiate the time series yτ of case studies #8 – #11, while we leave xτ unchanged and we apply the linear framework again. In other words, now the actual system is

2.5
and we try to approximate it in a linear fashion and estimate it as
2.6

Now we should expect a larger variance of vτ because of the disagreement between the actual system and the model used for causality detection.

As seen in figure 6a, referring to the direction x → y, our framework correctly detected that we have a potentially causal system. The characteristic time lags do not show a noteworthy change in comparison with case #10, despite the fact that, as seen in table 1, the explained variance ratio has been substantially reduced from 0.94 to 0.32. If we change the direction to y → x (figure 6b), the ratio increases to 0.43, and the causality appears as HOE type, but with the correct principal direction, x → y (anti-causal at y → x). Therefore, here we locate a potential problem of the methodology, as the actual causality is not HOE.

Figure 6.

Figure 6. IRFs for synthetic applications #12 (a) and #13 (b) representing the causal system of figure 5, but with exponentiating the output of the latter. (Online version in colour.)

It is not too difficult to resolve this ambiguity: if we produce a scatter plot of yτ versus xτ (figure 7b), it becomes evident that the relationship of xτ and yτ is not linear; for comparison, figure 7a shows a similar plot for case studies #8 – #11, which is typical for linear relationships. Once we are aware of the nonlinearity, we will perform a nonlinear transform on xτ, yτ or both and reapply the methodology on the transformed series. In this particular case, if we apply the logarithmic transformation on yτ, we will switch to cases #8 – #11 and we will fully recover the system dynamics. Additional details on the application of this technique are given in §2b.

Figure 7.

Figure 7. Scatter plots of synchronous yτ versus xτ for applications #8 – #11 (a) and #12 – #13 (b). (Online version in colour.)

In the next case studies, #14 and #15, we have a causal model with long-range cross-dependence, constructed by using IL = 0, IH = 1024 in equation (2.4). However, we keep using J = 20 in our causality detection framework and thus we do not expect to fully recover the true long-term system dynamics. We only test whether we can correctly detect causality and its direction. Using only the roughness constraint, we find the IRF shown in figure 8a. Notice that the true (theoretical) IRF curve exceeds the horizontal axis span by far, going up to lag 1024, while the plotted area coincides with the time window of the sought IRF estimate (i.e. it goes up to lag 20 only). As seen in table 1 the true characteristic lags μh, h1/2 are of the order of 200 while with our chosen time window we can estimate lags up to 20—and most likely of the order of 10. Indeed, the methodology captured the mono-directional causality yet the estimated characteristic time lags are, as expected, too small compared to the true ones of table 1. Interestingly, figure 8a shows increasing IRF magnitude beyond the estimated average lag time. This may seem inconsistent, as the true IRF does not contain an increasing branch. However, this is a reasonable behaviour reflecting the fact that the chosen time window (J = 20) is too narrow to contain the true IRF. It is thus reasonable for our framework to increase the power of the most distant components (closer to lag 20) as a replacement of the power of even more distant ones.

Figure 8.

Figure 8. IRFs for synthetic applications #14 (a) and #15 (b) representing a causal system. By construction, the IRF has 1025 non-zero weights (IH = 1024) and a random term in the system dynamics. (Online version in colour.)

It is most important that the estimated IRF reproduces the zero gj for time lags j < 0, thus capturing the correct causality direction. If we reverse the direction, i.e. y → x, the system becomes anti-causal (zero gj for time lags j > 0), thus again recovering the correct causality direction. In both cases, the explained variance ratio is high, e = 0.57 and 0.50, for x → y and y → x, respectively. Notably, in the latter case, if we considered only the correlation with just one term, with time lag –8, the explained variance ratio would be virtually the same. Overall, the results do not leave any doubt that the true causality direction is x → y.

In our final synthetic applications, #16 – #18, we consider a symmetric HOE causal system. This is similar to that in the first applications #1 and #2, except that we now use long-range cross-dependence with IL = IH = 1024 and include a random term uτ. Applications #16 and #17, depicted in figure 9a, examine the causal direction x → y and they only differ in the roughness term in optimization of IRF. Namely, in application #16, the roughness was calculated without considering the second derivative at zero, in agreement with the fact that in the true system the second derivative is not defined at zero due to discontinuity of the first derivative. In application #17, all roughness terms are included and, as a result, a smoother IRF curve is produced. In both cases, our methodology captures the time symmetry of the system. The increasing gj as the j approaches ±20 give an alert that our time window is too narrow to capture the long-range cross-dependence—something similar with what we observed in application #14. If we reverse the direction, i.e. y → x (application #18), again the time symmetry of the system is confirmed, but now the explained variance ratio (table 1) decreases from e = 0.71 (for x → y) to 0.57 (for y → x). Notably, in the latter case, if we considered only the synchronous (lag 0) cross-correlation, the explained variance ratio would be virtually the same. For this reason, we will again choose as preferential the direction x → y and try a wider time window to determine the IRF.

Figure 9.

Figure 9. IRFs for synthetic applications #16 and #17 (a) and #18 (b) representing a symmetric HOE causal system. By construction, the IRF has 2049 non-zero weights (IL = IH = 1024). In estimation 1’, the roughness was calculated without considering the second derivative at zero (application #16), while in ‘estimation 2’, all roughness terms are included (application #17). (Online version in colour.)

(b) Precipitation–runoff

At the global scale, the hydrological cycle is obviously a family of processes that act in a cyclical manner [10], precipitation–runoff–precipitation– … and therefore can be thought of as a HOE case of causality. However, if we specify a particular location on Earth, the situation is different and it is well known that at a local scale runoff is caused by past precipitation upstream in the drainage basin in a mono-directional fashion. We thus expect that precipitation (P) and runoff (R) data should reflect this pattern.1

To explore this, we use rainfall and streamflow data from the database of the US Geological Survey for the site USGS 01603000 North Branch Potomac River Near Cumberland, MD (39 °37′18.5′′ N, 78 °46′24.3′′ W, catchment area 2271 km2). The data series are for the period 1 October 2013 to 25 February 2021 for a time step of 15 min (a part of these data was also used for a similar purpose in [15]). The discharge data were converted to metric units (from cubic feet per second to m3/s and the precipitation data from inches to millimetres). Both series have a small percentage of missing values, which were left unfilled. The series were aggregated to a time scale of 3 h in order for our time window of length 40 to represent a period of a few days (120 h or 5 d).

Indeed, figure 10a and table 1 (case study #19) suggest a causal system with direction P → R, a time span greater than 20 time units (60 h) and characteristic time lags greater than 10 time units (30 h). The increasing gj beyond the time lag j = 17 is clearly an artefact produced by the fact that the time span of the causal relationship exceeds the size of our time window. Had the latter been large enough, we would expect to see a shape like a unit hydrograph, with a monotonically decreasing limb after the peak at time lag 6 (see below). If we reverse the direction, i.e. R → P (figure 10b and table 1, case study #20), we just confirm that the true causality direction is the opposite, i.e. P → R. The explained variance ratio in the direction R → P is very low, e = 0.04, not greater than implied by merely the maximum cross-correlation value. In the correct direction, P → R, this increases by a factor of 4 (e = 0.17). Yet the latter value would still be too low if our aim were to capture the system dynamics. One reason for this low value is the narrow time window, as already discussed. A second reason becomes obvious from the scatter plot of runoff versus precipitation in figure 11a. The scatter plot does not suggest linearity and we may expect a larger e if we transform the two variables.

Figure 10.

Figure 10. IRFs for precipitation–runoff case studies #19 (a) and #20 (b). (Online version in colour.)

Figure 11.

Figure 11. Scatter plots of lagged runoff (R) versus precipitation (P), where the time lag is the one that maximizes cross-correlation. (a) untransformed variables, applications #19 – #20; (b) transformed variables, applications #21 – #22. (Online version in colour.)

We choose to apply to each of the variables a single-parameter (c) transformation suggested in an entropy maximization framework [16], namely

2.7
where the two parameters were optimized simultaneously with the IRF and were found to be cP = 0.0392 mm h−1, cR = 0.001 m3 s−1. Practically, the low values of the parameters mean that the transformations made are almost pure logarithmic. Yet the transformation in equation (2.7) is more advantageous than the pure logarithmic, because of its property of keeping the smallest values for P or R < c virtually unchanged and the zero values precisely unchanged. The scatter plot of the transformed variables is presented in figure 11b. While a linear arrangement of the points is more visible in the transformed variables rather than in the untransformed ones (figure 11a), there was no improvement (actually there was slight worsening) in the achieved maximum cross-correlation (figure 11 and table 1). Yet the transformation resulted in a substantial improvement in the explained variance ratio: from e = 0.17, it increased fourfold, to e = 0.68 for the direction P → R. In the opposite direction, there was no change. This means that a cross-correlation at a single time lag is a very poor representation of (potential) causality and justifies our framework which is based on an appropriate range of time lags.

The IRFs of the transformed processes are shown in figure 12, which is not essentially different from figure 10. This means that, while the transformation offers explanatory power, our methodology, even applied to the untransformed processes, is able to reveal the causal relationship (recall also case studies #12 – #13 and see theoretical explanation in Section SI1.2 of the electronic supplementary material of the companion paper [1]).

Figure 12.

Figure 12. IRFs for transformed precipitation–runoff case studies #21 (a) and #22 (b); for the IRFs of the original dataset (figure 10). (Online version in colour.)

To show that the increasing limb of gj beyond the time lag j = 17 in applications #19 and #21 is an artefact of the small time window, we repeated the calculations with a time window twice as long, i.e. J = 40 instead of J = 20. The results are shown in figure 13. Where there were increasing limbs in applications #19 and #21, now there is continuation of the decreasing limbs, as expected. As a result, the characteristic lags increased: in 3 h time units, the mean μh became 17.35 and 19.10 for untransformed and transformed variables, respectively (from 10.24 and 10.84, respectively), and the median h1/2 increased to 15.64 and 18.54 (from 9.83 and 10.72, respectively). The explained variance increased to 0.26 and 0.71 for untransformed and transformed variables, respectively (from 0.17 and 0.68, respectively). However, again there appear (smaller) increasing limbs close to the right end of the wider window. This means that even J = 40 is still not enough to recover the dynamics and we should choose an even higher value of J. But as we have repeatedly stated, the scope of the study is not to provide a model of the process but to explore causality.

Figure 13.

Figure 13. IRFs for precipitation–runoff case studies similar to #19 ((a) untransformed variables) and #21 ((b) transformed variables) but with a time window twice as long, i.e. J = 40 instead of J = 20; results are plotted only for non-negative lags. (Online version in colour.)

(c) Atmospheric temperature and carbon dioxide concentration

The problem related to the causal relationship between atmospheric temperature (T) and concentration of carbon dioxide ([CO2]) is regarded by many as part of a ‘settled science’ yet it remains challenging and still debated. For example, the study by Koutsoyiannis & Kundzewicz [4] concluded, making use of the HOE causality concept and based on the analysis of modern measurements of T and CO2, that the principal causality direction is

, despite the common conviction that the opposite is true. In addition, using palaeoclimatic proxy data from Vostok ice cores, Koutsoyiannis [15] and Koutsoyiannis & Kundzewicz [4] found a time lag of CO2 from T of a thousand years. Here we re-examine both modern and paleo datasets with our proposed causality detection methodology.

As modern observations for global temperature, we use the satellite dataset developed at the University of Alabama in Huntsville (UAH). The temperature of three broad levels of the troposphere is inferred from satellite measurements of the oxygen radiance in the microwave band, using advanced (passive) microwave sounding units on National Oceanic and Atmospheric Administration (NOAA) and NASA satellites [17,18]. The dataset begins in 1979 and continues to date. It is publicly available at a monthly scale in the form of time series of ‘anomalies’ (defined as differences from long-term means) for several parts of the Earth, as well as in map form. Here we use only the global average (noting that Liang [19] disputes the use of averages as he claims that generally they conceal regional patterns of change) on the monthly scale for the lowest level, referred to as the lower troposphere. For the CO2 concentration, we use the most famous dataset, that of the Mauna Loa Observatory [20]. The Observatory, located on the north flank of Mauna Loa Volcano, on the Big Island of Hawaii, USA, at an elevation of 3397 m above sea level, is a premier atmospheric research facility that has been continuously monitoring and collecting data related to the atmosphere since 1958. Here we examine the common 42-year period of the two datasets (1979 to May 2021) at a monthly scale.

Both datasets were also used by Koutsoyiannis & Kundzewicz [4] who also examined a ground-based temperature data series (CRUTEM.4.6.0.0 global T2 m land temperature) and three additional CO2 series (Barrow, Alaska, USA; South Pole; global average). The results did not substantially differ for the different pairs of T – [CO2] series and therefore here we limit our current analysis to one pair. We additionally note that we also examined the temperature data of the other two satellite levels for the troposphere and the results were very similar to those reported here for the lower troposphere.

Furthermore, in our analysis, we follow the data pre-processing justified in Koutsoyiannis & Kundzewicz [4]; namely

1.

We take the logarithms of [CO2], based on Arrhenius’s [21] rule that when [CO2] increases in geometric progression, T will increase nearly in arithmetic progression.

2.

We take the difference of each monthly value from that of the same month of the previous year. This diminishes the effect of seasonality while eliminating possible artificial effects of the convention of giving the temperature data as ‘anomalies’ (departures from changing monthly means) rather than actual values.

We note that, since we are differencing the process, taking the average of 12 consecutive monthly differences (which is a more established process) is equivalent to taking a difference for a time step of 12 months (notice that x2 − x1 + x3 − x2 + ··· + x13 − x12 = x13 − x1). We further note that Koutsoyiannis and Kundzewicz also examined the non-differenced series and showed that they give spurious results due to the continuously rising [CO2] in the time window of modern observations, which results in autocorrelation values that are virtually 1 for all lags. Here the case of spurious results due to very high autocorrelation is explained more thoroughly—see Section SI2.2 in electronic supplementary material, and notice the similarity of figure SI2.3 with figure 9 in Koutsoyiannis & Kundzewicz [4].

Hence the examined processes are ΔT and

. Figure 14 gives the obtained IRFs in the directions (a) and (b). Impressively, the results are not different from those in the precipitation–runoff case study. Clearly, the results in figure 14 suggest a (mono-directional) potentially causal system with T as the cause and [CO2] as the effect. Hence, the common perception that increasing [CO2] causes increased T can be excluded as it violates the necessary condition for this causality direction. Even the possibility of HOE causality, supported by Koutsoyiannis & Kundzewicz [4], is not confirmed by the new methodological framework. The causality direction

is further supported by all numerical indices given in table 1, as well as the graphical comparison of modelled and empirical cross-correlation functions depicted in figure 15:

All characteristic time lags (hc, μh, h1/2) are positive in the direction

(ranging from 5 to about 8 months) and negative in the direction

.

The explained variance ratio is greater in the direction

(e = 0.31) than in the direction

(e = 0.23).

In the direction

, the cross-correlation function, reconstructed from the IRF and the autocorrelation function using the discretized version of equation (3.8) in the companion paper [1]), agrees impressively well with the empirical cross-correlation function. In the direction

, the proximity of the two is much lower.

Figure 14.

Figure 14. IRFs for of temperature–CO2 concentration based on the modern time series. (a) Case study #23 (

); (b) case study #24 (

). (Online version in colour.)

Figure 15.

Figure 15. (a) Autocorrelation function of (i) ΔT and (ii)

. (b) Cross-correlation functions, empirical and reconstructed (marked as modelled’) from the IRF and the autocorrelation functions in the upper panels, using the discretized version of equation (3.8) in the companion paper, for case studies (i) #23 and (ii) #24. (Online version in colour.)

Remarkably, however, the explained variance ratio of e = 0.31 is low and suggests that the two processes have a behaviour that is much more complex and affected by additional geophysical processes. However, insofar the relationship of these two processes is concerned, this explained variance ratio is adequate to detect the main characteristics, i.e. direction and time lags. Indications for this adequacy are provided by the precipitation–runoff case study (§2b), in which e = 0.17, as well in the additional controlled (synthetic) case study that is provided in electronic supplementary material (section SI2.1, e = 1/3).

Having gathered strong indications that in the recent decades the increase of temperature is potentially the cause of the increased CO2 concentration, while the opposite is not probable, it is interesting to examine if this is also the case for longer periods. To this aim, we use the datasets from the Vostok ice cores [2224] which were originally given for an irregular time step and were regularized in the study of Koutsoyiannis [15] for a time resolution of 1000 years.

We study again the processes ΔT and

, where the differences are taken for 1 time step (1000 years). Figure 16 gives the obtained IRFs in the directions (a) and (b). Here, the results support a HOE causality. Nonetheless, again the principal direction is with a time lag of the order of 1000 years (0.79 to 1.11 time steps for ; –0.56 to –0.82 time steps for

; table 1, case studies #25 – #26). The explained variance ratio is small, e = 0.17 in both directions.

Figure 16.

Figure 16. IRFs for of temperature–CO2 concentration based on the proxy time series from the Vostok ice cores. (a) Case study #25 (

); (b) case study #26 (

). (Online version in colour.)

As the proxy datasets are free of monotonic trends and produce reasonable empirical autocorrelation functions (see [15]), here we could also apply our framework for the non-differenced processes. We initially note that, if xτ is a differenced process (where the differences are taken for 1 time step) and Xτ the non-differenced (original) one, then the two are related by

2.8
and likewise for y(t) and v(t). In this case from equation (1.2), it follows
2.9

This is identical to equation (1.2) which means that the original and differenced processes are related with the same potential causality equation involving the same IRF. Furthermore, as a result of aggregation, and the resulting smoothing of the noise Vτ, it is expected that the explained variance ratio should be higher after aggregation.

We test if this happens in the case of the proxy T and [CO2] data. Figure 17 gives the obtained IRFs in the directions

(a) and (b). The shapes of the curves are quite similar to those of figure 16. Again, we find HOE causality with principal direction is and with a time lag of the order of 1000 years (1.74 to 1.87 time steps for ; –1.03 to –1.10 time steps for ; table 1, case studies #27 – #28). The explained variance ratio increased substantially, from e = 0.17 to e = 0.86 and is greater in the direction

, thus confirming the latter as the principal causality direction.

Figure 17.

Figure 17. IRFs for of temperature–CO2 concentration based on the proxy time series from the Vostok ice cores as in figure 16 but for the aggregated (non-differenced) processes. (a) Case study #27 (

); (b) case study #28 (

). (Online version in colour.)

(d) Atmospheric temperature and El Niño–Southern Oscillation

For a second case study related to climate, we examine the El Niño–Southern Oscillation (ENSO) in relation to the global temperature. The ENSO is associated with irregular, anti-persistent (at the time scale of a few years, else known as quasi-periodic), variation of sea surface temperature and air pressure over the tropical Pacific Ocean. It is broadly recognized as the principal climate variability mode [25]. There exists a plethora of indices related to ENSO, among which here we use the Southern oscillation index (SOI) of the USA's NOAA (see also [26]).

In a recent paper, Kundzewicz et al. [5] examined several ENSO indices, as well as indices of similar phenomena in the Pacific and in the Atlantic, and found that they meaningfully influence the global mean annual temperature. Here, we examine a single pair of processes, the UAH temperature, also used in case studies #23 – #28 (§2c), and the SOI. While Kundzewicz et al. [5] processed the residuals from 5-year running averages to remove the influence of large-scale fluctuations, here we use the data without averaging. However, in both time series, we take the difference of each monthly value from that of the same month of the previous year, a technique already described in §2c.

The results are depicted in figure 18, which suggests a clear case of a potentially causal system in the direction SOI → T. As seen in table 1, the characteristic lags are close to six months and the explained variance is 0.39. We note that the technique used by Kundzewicz et al. [5] (residuals from 5-year running averages) explained a higher percentage of the variance (greater than 0.60), which means that there is a margin for improvement of the results obtained here. However, as already mentioned, the scope of the current study is the exploration of potential causality, rather than the building of a reliable model.

Figure 18.

Figure 18. IRFs for of SOI and temperature (T). (a) Case study #29 (ΔSOI → ΔT); (b) case study #30 (ΔT → ΔSOI). (Online version in colour.)

3. Discussion and conclusion

The theoretical methodology proposed in the companion paper [1] was applied to a range of case studies, both synthetic (artificial) and real world. Since the system dynamics in the artificial cases is known, this enables the method to be tested and validated. We showed that the method does not introduce any spurious potential causation when none is present in the system dynamics, except in cases of very high autocorrelation (see Section SI2.2 in electronic supplementary material), which, nonetheless, is easily handled by studying the changes in the time series (differenced time series) instead of the original time series. Further, the requirement that the IRF coefficients be non-negative is, on its own, sufficient to enable the correct direction of causality to be inferred. (Recall from the companion paper [1] that we do not subsume oscillatory nonlinear systems, in which the sign of g(h) could alternate, under the causality notion, which accords with Cox's [27] conditions for causality and in particular the monotonic relation of the cause with the effect.) When the roughness condition is added, this enables the correct system dynamics to be recovered.

We showed how to deal with the presence of nonlinearity (in which case a nonlinear transform of the potential cause should be carried out) and of long-term persistence in the time series (which causes rising limbs in the IRF at the edges of its domain).

In addition to causality studies of synthetic examples, the theoretical framework was also applied to three real-world geophysical examples. In the first case, the causing of runoff by rainfall over a catchment is well established. The proposed framework indicates the correct direction of causation, even while suggesting the need for a nonlinear transformation and the need to extend the size of the domain of the IRF. The framework was further validated in its ability to detect the clear causal connection between SOI and global mean annual temperature.

The remaining real-world case study led to an important side product of the current research. This is the surprising finding that, while in general, the causal relationship of atmospheric T and CO2 concentration, as obtained by proxy data, appears to be of HOE type with principal direction

, in the recent decades, the more accurate modern data support a conclusion that this principal direction has become exclusive. In other words, it is the increase of temperature that caused increased CO2 concentration. Though this conclusion may sound counterintuitive at first glance, because it contradicts common perception (and for this reason we have assessed the case with an alternative parametric methodology in the electronic supplementary material, section SI2.4, with results confirming those presented here), in fact it is reasonable. The temperature increase began at the end of the Little Ice Period, in the early nineteenth century, when human CO2 emissions were negligible; hence other factors, such as the solar activity (measured by sunspot numbers), as well as internal long-range mechanisms of the complex climatic systems had to play their roles.

A possible physical mechanism for the [CO2] increase, as a result of temperature increase, was proposed by Koutsoyiannis & Kundzewicz [4] and involves biochemical reactions, as, at higher temperatures, soil respiration and hence natural CO2 emissions, are increasing. In addition, as pointed out by Liu et al. [28], the influence of El Niño on climate is accompanied by large changes to the carbon cycle, with the pantropical biosphere releasing much more carbon into the atmosphere during large El Niño occurrences. Noticeably, in a very recent paper, Goulet Coulombe & Göbel [29] seem to confirm the finding by Koutsoyiannis & Kundzewicz [4], yet they deem it an ‘apparently counterintuitive finding that GMTA [global mean surface temperature anomalies] explains a larger portion of the forecast error variance of CO2 than vice versa’. To ‘resolve’ it, they ‘explore a last avenue, that of using annual CO2 emissions'. However, using anthropogenic CO2 emissions, which contribute only a small portion (3.8%) to the global carbon cycle [30], as a principal variable is definitely less meaningful than using the atmospheric CO2 concentration.

We believe that counterintuitive results, such as those about the causal link between temperature and CO2 concentration conveyed in this paper, can indeed open up avenues of research. However, these avenues of research might not resolve the issue in a way compatible with what intuition dictates. In the history of science, such avenues were often created when established ideas were overturned by new findings.

By letting the geophysical records speak for themselves, with the help of our original methodology, we discovered a regularity that apparently contradicts common opinion. Our innovative findings should be given considerable attention as well as careful and critical scrutiny in the form of public discussion by the scientific community, which will undoubtedly improve understanding. If the methodology we proposed in the companion paper [1] stands up to scrutiny, then our novel, high-impact results, i.e. those of cases #23 – #28 in the present paper, will have to be taken seriously and interpreted. Further research on the regularities of the causal behaviour of the climate system reported herein, being of considerable importance and relevance, is urgently needed.

Data accessibility

All data used in the paper are available online for free. The rainfall and runoff data were obtained from this U.S. Geological Survey database, publicly available at https://nwis.waterdata.usgs.gov/md/nwis/uv/?site_no=01603000&agency_cd=USGS (accessed June 2021). The temperature time series and the Mauna Loa CO2 time series are readily available on monthly scale from http://climexp.knmi.nl (accessed June 2021). The palaeoclimatic data of Vostok CO2 were retrieved from http://cdiac.ess-dive.lbl.gov/ftp/trends/co2/vostok.icecore.co2 (dated January 2003, accessed September 2018) and the temperature data from http://cdiac.ess-dive.lbl.gov/ftp/trends/temp/vostok/vostok.1999.temp.dat (dated January 2000, accessed September 2018). The SOI data are provided by National Centers for Environmental Information (NCEI) of the USA's NOAA https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/.

Comments

Popular posts from this blog

AN INTERESTING CONCLUSION… SOLAR FARMS WILL BECOME THUNDERSTORM and TORONADO INCUBATORS and MAGNETS.

IT’S A LITTLE LONG BUT DEFINITELY WORTH THE READ

Fact Checking The Claim Of 97% Consensus On Anthropogenic Climate Change