10.5 Top-down approaches

Top-down approaches first generate forecasts for the “Total” series \(y_t\) at the top of the aggregation structure and then disaggregate these down the hierarchy. Let \(p_1,\dots,p_{m}\) be a set of proportions that dictate how the base forecasts of the “Total” series are distributed to give revised forecasts for each series at the bottom level of the structure. For example, for the hierarchy of Figure 10.1, using proportions \(p_1,\dots,p_{5}\) we get \[ \tilde{y}_{\text{AA},h}=p_1\hat{y}_h,~~~\tilde{y}_{\text{AB},h}=p_2\hat{y}_h,~~~\tilde{y}_{\text{AC},h}=p_3\hat{y}_h,~~~\tilde{y}_{\text{BA},h}=p_4\hat{y}_h~~~\text{and}~~~\tilde{y}_{\text{BB},h}=p_5\hat{y}_h. \] Using matrix notation, we can stack the proportions in an \(m\)-dimensional column vector \(\bm{p}=(p_1,\ldots,p_{m})'\) (see note 1 at the end of this section) and write \[\tilde{\bm{y}}_{K,h}=\bm{p}\hat{y}_h.\] Once the bottom-level \(h\)-step-ahead forecasts have been generated, they can be aggregated to give coherent forecasts for the rest of the series. In general, using the summing matrix \(\bm{S}\) and a specified set of proportions, top-down approaches can be represented as \[\tilde{\bm{y}}_h=\bm{S}\bm{p}\hat{y}_h.\] Note that, because the proportions sum to one, the top-level coherent forecasts of every top-down approach are equal to the top-level base forecasts, i.e., \(\tilde{y}_{h}=\hat{y}_{h}\).
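As a rough illustration of the matrix representation above, the following base R sketch builds the summing matrix for the hierarchy of Figure 10.1 and disaggregates a single top-level forecast. The proportions and the base forecast of 200 are made-up numbers, and the object names (`S`, `p`, `y_hat_total`, `y_tilde`) are chosen here for illustration only.

```r
# Hierarchy of Figure 10.1: Total -> A, B; A -> AA, AB, AC; B -> BA, BB.
bottom <- c("AA", "AB", "AC", "BA", "BB")

# Summing matrix S: one row per series, one column per bottom-level series.
S <- rbind(
  c(1, 1, 1, 1, 1),  # Total
  c(1, 1, 1, 0, 0),  # A
  c(0, 0, 0, 1, 1),  # B
  diag(5)            # AA, AB, AC, BA, BB
)
dimnames(S) <- list(c("Total", "A", "B", bottom), bottom)

# Assumed (made-up) proportions; they must sum to one.
p <- c(0.30, 0.20, 0.10, 0.25, 0.15)

# Assumed (made-up) h-step-ahead base forecast of the Total series.
y_hat_total <- 200

# Coherent forecasts for all eight series: S p y_hat.
y_tilde <- drop(S %*% p) * y_hat_total
print(y_tilde)
```

The first element of `y_tilde` equals `y_hat_total`, which reflects the fact that the top-level coherent forecast is the top-level base forecast.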

The most common top-down approaches specify the proportions from the historical proportions of the data. The two most common versions follow. These performed well in the study of Gross and Sohl (1990), hence the “gs” in the method names "tdgsa" and "tdgsf" used below.

Average historical proportions

\[ p_j=\frac{1}{T}\sum_{t=1}^{T}\frac{y_{j,t}}{{y_t}} \] for \(j=1,\dots,m\). Each proportion \(p_j\) reflects the average of the historical proportions of the bottom-level series \(y_{j,t}\) over the period \(t=1,\dots,T\) relative to the total aggregate \(y_t\).

This approach is implemented in the forecast() function for hierarchical time series (hts package) by setting

forecast(..., method = "tdgsa", ...).
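To make the formula for average historical proportions concrete, here is a minimal base R sketch. The four-period history in `y_bottom` is invented for illustration, and the names `y_bottom`, `y_total` and `p_avg_hist` are not part of any package.

```r
# Made-up history: rows are periods t = 1..4, columns are bottom-level series.
y_bottom <- cbind(
  AA = c(10, 12, 11, 13),
  AB = c(20, 18, 22, 21),
  AC = c( 5,  6,  4,  5),
  BA = c(30, 33, 29, 31),
  BB = c(15, 14, 16, 17)
)

# The "Total" series y_t is the sum of the bottom-level series in each period.
y_total <- rowSums(y_bottom)

# Share of each series in each period, y_{j,t} / y_t.
# Dividing a matrix by a vector of length nrow recycles down the columns,
# so row t is divided by y_total[t].
shares <- y_bottom / y_total

# p_j is the average of those per-period shares over t = 1..T.
p_avg_hist <- colMeans(shares)
print(round(p_avg_hist, 4))
```

Because the shares in every period sum to one, the resulting proportions also sum to one.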

Proportions of the historical averages

\[ p_j={\sum_{t=1}^{T}\frac{y_{j,t}}{T}}\Big/{\sum_{t=1}^{T}\frac{y_t}{T}} \] for \(j=1,\dots,m\). Each proportion \(p_j\) captures the average historical value of the bottom-level series \(y_{j,t}\) relative to the average value of the total aggregate \(y_t\).

This approach is implemented in the forecast() function for hierarchical time series (hts package) by setting

forecast(..., method = "tdgsf", ...).
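The proportions of the historical averages can be sketched in base R in the same spirit. The data below are invented, and `y_bottom`, `y_total` and `p_hist_avg` are illustrative names only.

```r
# Made-up history: rows are periods t = 1..4, columns are bottom-level series.
y_bottom <- cbind(
  AA = c(10, 12, 11, 13),
  AB = c(20, 18, 22, 21),
  AC = c( 5,  6,  4,  5),
  BA = c(30, 33, 29, 31),
  BB = c(15, 14, 16, 17)
)
y_total <- rowSums(y_bottom)

# p_j is the average of series j divided by the average of the Total.
p_hist_avg <- colMeans(y_bottom) / mean(y_total)

# The factor 1/T cancels, so this is the same as a ratio of sums.
p_ratio_of_sums <- colSums(y_bottom) / sum(y_total)

print(round(p_hist_avg, 4))
print(all.equal(p_hist_avg, p_ratio_of_sums))
```

Unlike the average of per-period shares, this version weights periods with a larger total more heavily, so the two sets of proportions generally differ slightly.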

The greatest attribute of such top-down approaches is their simplicity. One only needs to model and forecast the most aggregated top-level series. In general, these approaches seem to produce quite reliable forecasts for the aggregate levels, and they are very useful with low-count data. On the other hand, their greatest disadvantage is the loss of information due to aggregation. With such top-down approaches we cannot capture and take advantage of individual series characteristics such as time dynamics and special events.

Forecast proportions

An alternative approach that improves on the historical and static nature of the proportions specified above is to use forecast proportions, introduced in Athanasopoulos, Ahmed, and Hyndman (2009).

To demonstrate the intuition of this method, consider a one-level hierarchy. We first generate \(h\)-step-ahead base forecasts for all the series. At level 1 we calculate the proportion of each \(h\)-step-ahead base forecast to the aggregate of all the \(h\)-step-ahead base forecasts at this level. We refer to these as the forecast proportions, and we use them to disaggregate the top-level \(h\)-step-ahead forecast and generate coherent forecasts for the whole hierarchy.

For a \(K\)-level hierarchy, this process is repeated for each node, going from the top to the very bottom level. Applying this process leads to the following general rule for obtaining the forecast proportions: \[ p_j=\prod^{K-1}_{\ell=0}\frac{\hat{y}_{j,h}^{(\ell)}}{\hat{S}_{j,h}^{(\ell+1)}} \] for \(j=1,2,\dots,m\). These forecast proportions disaggregate the \(h\)-step-ahead base forecast of the “Total” series into \(h\)-step-ahead coherent forecasts of the bottom-level series. Here \(\hat{y}_{j,h}^{(\ell)}\) is the \(h\)-step-ahead base forecast of the series corresponding to the node that is \(\ell\) levels above node \(j\), so that \(\hat{y}_{j,h}^{(0)}=\hat{y}_{j,h}\), and \(\hat{S}_{j,h}^{(\ell)}\) is the sum of the \(h\)-step-ahead base forecasts of the nodes that lie directly below the node \(\ell\) levels above node \(j\).

We will use the hierarchy of Figure 10.1 to explain this notation and to demonstrate how this general rule is reached. Assume we have generated base forecasts for each series in the hierarchy. Recall that for the top-level “Total” series, \(\tilde{y}_{h}=\hat{y}_{h}\), for any top-down approach. Here are some examples using the above notation:

  • \(\hat{y}_{\text{A},h}^{(1)}=\hat{y}_{\text{B},h}^{(1)}=\hat{y}_{h}= \tilde{y}_{h}\)
  • \(\hat{y}_{\text{AA},h}^{(1)}=\hat{y}_{\text{AB},h}^{(1)}=\hat{y}_{\text{AC},h}^{(1)}= \hat{y}_{\text{A},h}\)
  • \(\hat{y}_{\text{AA},h}^{(2)}=\hat{y}_{\text{AB},h}^{(2)}= \hat{y}_{\text{AC},h}^{(2)}=\hat{y}_{\text{BA},h}^{(2)}= \hat{y}_{\text{BB},h}^{(2)}=\hat{y}_{h}= \tilde{y}_{h}\)
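  • \(\hat{S}_{\text{AA},h}^{(1)} = \hat{S}_{\text{AB},h}^{(1)}= \hat{S}_{\text{AC},h}^{(1)}= \hat{y}_{\text{AA},h}+\hat{y}_{\text{AB},h}+\hat{y}_{\text{AC},h}\)
  • \(\hat{S}_{\text{AA},h}^{(2)} = \hat{S}_{\text{AB},h}^{(2)}= \hat{S}_{\text{AC},h}^{(2)}= \hat{S}_{\text{A},h}^{(1)} = \hat{S}_{\text{B},h}^{(1)}= \hat{S}_{h}= \hat{y}_{\text{A},h}+\hat{y}_{\text{B},h}\)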

Moving down the leftmost branch of the hierarchy, the coherent forecasts are given by \[ \tilde{y}_{\text{A},h} = \Bigg(\frac{\hat{y}_{\text{A},h}}{\hat{S}_{\text{A},h}^{(1)}}\Bigg) \tilde{y}_{h} = \Bigg(\frac{\hat{y}_{\text{AA},h}^{(1)}}{\hat{S}_{\text{AA},h}^{(2)}}\Bigg) \tilde{y}_{h} \] and \[ \tilde{y}_{\text{AA},h} = \Bigg(\frac{\hat{y}_{\text{AA},h}}{\hat{S}_{\text{AA},h}^{(1)}}\Bigg) \tilde{y}_{\text{A},h} =\Bigg(\frac{\hat{y}_{\text{AA},h}}{\hat{S}_{\text{AA},h}^{(1)}}\Bigg) \Bigg(\frac{\hat{y}_{\text{AA},h}^{(1)}}{\hat{S}_{\text{AA},h}^{(2)}}\Bigg)\tilde{y}_{h}. \] Consequently, \[ p_1=\Bigg(\frac{\hat{y}_{\text{AA},h}}{\hat{S}_{\text{AA},h}^{(1)}}\Bigg) \Bigg(\frac{\hat{y}_{\text{AA},h}^{(1)}}{\hat{S}_{\text{AA},h}^{(2)}}\Bigg). \] The other proportions can be obtained similarly. The greatest disadvantage of the top-down forecast proportions approach, which it shares with every top-down approach, is that it does not produce unbiased revised forecasts even if the base forecasts are unbiased, as shown by Hyndman et al. (2011).

This approach is implemented in the forecast() function for hierarchical time series (hts package) by setting

forecast(..., method = "tdfp", ...).
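The forecast proportions for the hierarchy of Figure 10.1 can be traced by hand with a short base R sketch. This is only an illustration of the general rule, not the package's implementation: the base forecasts in `y_hat` are made-up (and deliberately incoherent), and `parent`, `children_sum` and `forecast_proportion()` are hypothetical helpers introduced here.

```r
# Made-up h-step-ahead base forecasts for every node of Figure 10.1.
y_hat <- c(Total = 100, A = 55, B = 50,
           AA = 20, AB = 25, AC = 15, BA = 30, BB = 18)

# parent[node] names the node directly above each non-top node.
parent <- c(A = "Total", B = "Total",
            AA = "A", AB = "A", AC = "A", BA = "B", BB = "B")

# For each parent, the sum of the base forecasts of its direct children
# (the S-hat terms in the general rule).
children_sum <- sapply(split(y_hat[names(parent)], parent), sum)

# Multiply the ratios y-hat / S-hat while walking from a node up to the Total.
forecast_proportion <- function(node) {
  p <- 1
  while (node != "Total") {
    up <- parent[[node]]
    p <- p * y_hat[[node]] / children_sum[[up]]
    node <- up
  }
  p
}

bottom <- c("AA", "AB", "AC", "BA", "BB")
p <- sapply(bottom, forecast_proportion)

# Coherent bottom-level forecasts: proportions times the Total base forecast.
y_tilde_bottom <- p * y_hat[["Total"]]
print(round(p, 4))
print(round(y_tilde_bottom, 2))
```

For the series AA this reproduces \(p_1\) as the product of 20/60 and 55/105. The proportions sum to one, so the bottom-level coherent forecasts add up to the base forecast of the Total.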

References

Athanasopoulos, George, Roman A Ahmed, and Rob J Hyndman. 2009. “Hierarchical Forecasts for Australian Domestic Tourism.” International Journal of Forecasting 25 (January): 146–66. https://doi.org/10.1016/j.ijforecast.2008.07.004.

Gross, C W, and J E Sohl. 1990. “Disaggregation Methods to Expedite Product Line Forecasting.” Journal of Forecasting 9: 233–54.

Hyndman, Rob J, Roman A Ahmed, George Athanasopoulos, and Han Lin Shang. 2011. “Optimal Combination Forecasts for Hierarchical Time Series.” Computational Statistics and Data Analysis 55 (September): 2579–89. https://doi.org/10.1016/j.csda.2011.03.006.


  1. \(A'\) denotes the transpose of \(A\).