---
layout: docs
docid: "diagnostics"
title: "Diagnostics"
permalink: /docs/diagnostics.html
subsections:
  - title: Cross validation
    id: cross-validation
  - title: Parallelizing cross validation
    id: parallelizing-cross-validation
  - title: Hyperparameter tuning
    id: hyperparameter-tuning
---
<a id="cross-validation"> </a>

### Cross validation
Prophet includes functionality for time series cross validation to measure forecast error using historical data. This is done by selecting cutoff points in the history, and for each of them fitting the model using data only up to that cutoff point. We can then compare the forecasted values to the actual values. This figure illustrates a simulated historical forecast on the Peyton Manning dataset, where the model was fit to an initial history of 5 years, and a forecast was made on a one year horizon.
 |  | |||
[The Prophet paper](https://peerj.com/preprints/3190.pdf) gives further description of simulated historical forecasts.

This cross validation procedure can be done automatically for a range of historical cutoffs using the `cross_validation` function. We specify the forecast horizon (`horizon`), and then optionally the size of the initial training period (`initial`) and the spacing between cutoff dates (`period`). By default, the initial training period is set to three times the horizon, and cutoffs are made every half a horizon.

The output of `cross_validation` is a dataframe with the true values `y` and the out-of-sample forecast values `yhat`, at each simulated forecast date and for each cutoff date. In particular, a forecast is made for every observed point between `cutoff` and `cutoff + horizon`. This dataframe can then be used to compute error measures of `yhat` vs. `y`.

Here we do cross-validation to assess prediction performance on a horizon of 365 days, starting with 730 days of training data in the first cutoff and then making predictions every 180 days. On this 8 year time series, this corresponds to 11 total forecasts.
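As a rough check on that count: the number of cutoffs is the number of `period`-spaced points that fit between the end of the initial window and `horizon` days before the end of the history. A minimal sketch, where the ~2,963-day length of the Peyton Manning history is an assumption, not a value Prophet reports:

```python
# Python
# Back-of-the-envelope count of simulated forecasts.
total_days = 2963  # assumed length of the Peyton Manning history
initial, period, horizon = 730, 180, 365

# Cutoffs run from `initial` up to `total_days - horizon`, spaced `period` apart.
n_forecasts = (total_days - horizon - initial) // period + 1
print(n_forecasts)  # 11
```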
```R
# R
df.cv <- cross_validation(m, initial = 730, period = 180, horizon = 365, units = 'days')
head(df.cv)
```
```python
# Python
from prophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon='365 days')
```
```python
# Python
df_cv.head()
```
|   | ds | yhat | yhat_lower | yhat_upper | y | cutoff |
|---|----|------|------------|------------|---|--------|
| 0 | 2010-02-16 | 8.959678 | 8.470035 | 9.451618 | 8.242493 | 2010-02-15 |
| 1 | 2010-02-17 | 8.726195 | 8.236734 | 9.219616 | 8.008033 | 2010-02-15 |
| 2 | 2010-02-18 | 8.610011 | 8.104834 | 9.125484 | 8.045268 | 2010-02-15 |
| 3 | 2010-02-19 | 8.532004 | 7.985031 | 9.041575 | 7.928766 | 2010-02-15 |
| 4 | 2010-02-20 | 8.274090 | 7.779034 | 8.745627 | 7.745003 | 2010-02-15 |
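Since `df_cv` is an ordinary dataframe, simple error summaries can be computed from it directly, before reaching for the built-in utilities described below. A minimal sketch, computing the mean absolute error of each simulated forecast:

```python
# Python
# One MAE value per cutoff, i.e. per simulated historical forecast.
mae_by_cutoff = (
    df_cv.assign(abs_err=(df_cv['y'] - df_cv['yhat']).abs())
         .groupby('cutoff')['abs_err']
         .mean()
)
print(mae_by_cutoff)
```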
In R, the argument `units` must be a type accepted by `as.difftime`, which is weeks or shorter. In Python, the string for `initial`, `period`, and `horizon` should be in the format used by Pandas Timedelta, which accepts units of days or shorter.
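For example, on the Python side any string that `pd.Timedelta` can parse is accepted; a small sketch of what is and isn't valid:

```python
# Python
import pandas as pd

# Units of days or shorter are fine, and equivalent spellings are interchangeable:
assert pd.Timedelta('365 days') == pd.Timedelta('8760 hours')
# Timedelta has no month or year units, so e.g. horizon='1 year' would raise
# an error; use horizon='365 days' instead.
```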
Custom cutoffs can also be supplied as a list of dates to the `cutoffs` keyword in the `cross_validation` function in Python and R. For example, three cutoffs six months apart would be passed to the `cutoffs` argument in a date format like:
```R
# R
cutoffs <- as.Date(c('2013-02-15', '2013-08-15', '2014-02-15'))
df.cv2 <- cross_validation(m, cutoffs = cutoffs, horizon = 365, units = 'days')
```
```python
# Python
import pandas as pd

cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')
```
The `performance_metrics` utility can be used to compute some useful statistics of the prediction performance (`yhat`, `yhat_lower`, and `yhat_upper` compared to `y`), as a function of the distance from the cutoff (how far into the future the prediction was). The statistics computed are mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), median absolute percent error (MDAPE), symmetric mean absolute percent error (SMAPE), and coverage of the `yhat_lower` and `yhat_upper` estimates. These are computed on a rolling window of the predictions in `df_cv` after sorting by horizon (`ds` minus `cutoff`). By default 10% of the predictions will be included in each window, but this can be changed with the `rolling_window` argument.
```R
# R
df.p <- performance_metrics(df.cv)
head(df.p)
```
```python
# Python
from prophet.diagnostics import performance_metrics
df_p = performance_metrics(df_cv)
df_p.head()
```
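To make the rolling-window computation concrete, here is a rough sketch of how one of these metrics (MAE) could be approximated by hand from `df_cv`; the windowing details of the real `performance_metrics` are simplified here, so prefer the built-in utility in practice:

```python
# Python
# Approximate a rolling MAE as a function of horizon (ds minus cutoff).
cv = df_cv.copy()
cv['horizon'] = cv['ds'] - cv['cutoff']
cv = cv.sort_values('horizon')
cv['abs_err'] = (cv['y'] - cv['yhat']).abs()

w = max(int(0.1 * len(cv)), 1)  # default window: 10% of the predictions
rolling_mae = cv['abs_err'].rolling(window=w).mean()
```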
|   | horizon | mse | rmse | mae | mape | mdape | smape | coverage |
|---|---------|-----|------|-----|------|-------|-------|----------|
| 0 | 37 days | 0.493764 | 0.702683 | 0.504754 | 0.058485 | 0.049922 | 0.058774 | 0.674052 |
| 1 | 38 days | 0.499522 | 0.706769 | 0.509723 | 0.059060 | 0.049389 | 0.059409 | 0.672910 |
| 2 | 39 days | 0.521614 | 0.722229 | 0.515793 | 0.059657 | 0.049540 | 0.060131 | 0.670169 |
| 3 | 40 days | 0.528760 | 0.727159 | 0.518634 | 0.059961 | 0.049232 | 0.060504 | 0.671311 |
| 4 | 41 days | 0.536078 | 0.732174 | 0.519585 | 0.060036 | 0.049389 | 0.060641 | 0.678849 |
Cross validation performance metrics can be visualized with `plot_cross_validation_metric`, here shown for MAPE. Dots show the absolute percent error for each prediction in `df_cv`. The blue line shows the MAPE, where the mean is taken over a rolling window of the dots. We see for this forecast that errors around 5% are typical for predictions one month into the future, and that errors increase up to around 11% for predictions that are a year out.
```R
# R
plot_cross_validation_metric(df.cv, metric = 'mape')
```
```python
# Python
from prophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')
```
 |  | |||
The size of the rolling window in the figure can be changed with the optional argument `rolling_window`, which specifies the proportion of forecasts to use in each rolling window. The default is 0.1, corresponding to 10% of rows from `df_cv` included in each window; increasing this will lead to a smoother average curve in the figure. The `initial` period should be long enough to capture all of the components of the model, in particular seasonalities and extra regressors: at least a year for yearly seasonality, at least a week for weekly seasonality, etc.
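For instance, a smoother curve can be requested by widening the window; a sketch reusing `df_cv` from above:

```python
# Python
# Average over 20% of the predictions per window instead of the default 10%.
fig = plot_cross_validation_metric(df_cv, metric='mape', rolling_window=0.2)
```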
<a id="parallelizing-cross-validation"> </a> | <a id="parallelizing-cross-validation"> </a> | |||
### Parallelizing cross validation | ### Parallelizing cross validation | |||
Cross-validation can also be run in parallel mode in Python, by setting the `parallel` keyword. Four modes are supported:
* `parallel=None` (Default, no parallelization)
* `parallel="processes"`
* `parallel="threads"`
* `parallel="dask"`
For problems that aren't too big, we recommend using `parallel="processes"`. It will achieve the highest performance when the parallel cross validation can be done on a single machine. For large problems, a [Dask](https://dask.org) cluster can be used to do the cross validation on many machines. You will need to [install Dask](https://docs.dask.org/en/latest/install.html) separately, as it will not be installed with `prophet`.
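Single-machine parallelism needs no extra dependencies; a sketch using the same model `m` as above:

```python
# Python
# Parallelize over cutoffs with a process pool on one machine.
df_cv = cross_validation(m, initial='730 days', period='180 days',
                         horizon='365 days', parallel="processes")
```

The Dask-based mode, shown next, distributes the same work across a cluster.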
```python
from dask.distributed import Client

client = Client()  # connect to the cluster
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon='365 days',
                         parallel="dask")
```

<a id="hyperparameter-tuning"> </a>

### Hyperparameter tuning

Cross-validation can also be used to tune hyperparameters of the model, such as `changepoint_prior_scale` and `seasonality_prior_scale`. Here a 4x4 grid of those two parameters is evaluated, with parallelization over cutoffs; each combination is scored on RMSE averaged over a 30-day horizon.

```python
# Python
import itertools
import numpy as np
import pandas as pd
from prophet import Prophet

param_grid = {
    'changepoint_prior_scale': [0.001, 0.01, 0.1, 0.5],
    'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0],
}

# Generate all combinations of parameters
all_params = [dict(zip(param_grid.keys(), v)) for v in itertools.product(*param_grid.values())]
rmses = []  # Store the RMSEs for each params here

# Use cross validation to evaluate all parameters
for params in all_params:
    m = Prophet(**params).fit(df)  # Fit model with given params
    df_cv = cross_validation(m, cutoffs=cutoffs, horizon='30 days', parallel="processes")
    df_p = performance_metrics(df_cv, rolling_window=1)
    rmses.append(df_p['rmse'].values[0])

# Find the best parameters
tuning_results = pd.DataFrame(all_params)
tuning_results['rmse'] = rmses
print(tuning_results)
```
        changepoint_prior_scale  seasonality_prior_scale      rmse
    0                     0.001                     0.01  0.757694
    1                     0.001                     0.10  0.743399
    2                     0.001                     1.00  0.753387
    3                     0.001                    10.00  0.762890
    4                     0.010                     0.01  0.542315
    5                     0.010                     0.10  0.535546
    6                     0.010                     1.00  0.527008
    7                     0.010                    10.00  0.541544
    8                     0.100                     0.01  0.524835
    9                     0.100                     0.10  0.516061
    10                    0.100                     1.00  0.521406
    11                    0.100                    10.00  0.518580
    12                    0.500                     0.01  0.532140
    13                    0.500                     0.10  0.524668
    14                    0.500                     1.00  0.521130
    15                    0.500                    10.00  0.522980
```python
# Python
best_params = all_params[np.argmin(rmses)]
print(best_params)
```
    {'changepoint_prior_scale': 0.1, 'seasonality_prior_scale': 0.1}
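Once the best combination has been identified, one would typically refit the final model with it. A minimal sketch, assuming `df` is the training dataframe used when fitting `m` earlier:

```python
# Python
from prophet import Prophet

# Refit on the full history with the tuned hyperparameters.
m = Prophet(**best_params).fit(df)
```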
Alternatively, parallelization could be done across parameter combinations by parallelizing the loop above.
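One way to do that is with a process pool over the parameter grid. This is a sketch under the same assumptions as the loop above (`df`, `cutoffs`, and `all_params` already defined); the helper `evaluate_params` is illustrative, not part of Prophet, and each worker runs its cross validation serially since the workers are themselves separate processes:

```python
# Python
from concurrent.futures import ProcessPoolExecutor

from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

def evaluate_params(params):
    # Fit and cross-validate a single parameter combination.
    m = Prophet(**params).fit(df)
    df_cv = cross_validation(m, cutoffs=cutoffs, horizon='30 days')
    df_p = performance_metrics(df_cv, rolling_window=1)
    return df_p['rmse'].values[0]

with ProcessPoolExecutor() as pool:
    rmses = list(pool.map(evaluate_params, all_params))
```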
The Prophet model has a number of input parameters that one might consider tuning. Here are some general recommendations for hyperparameter tuning that may be a good starting place.
**Parameters that can be tuned**
- `changepoint_prior_scale`: This is probably the most impactful parameter. It determines the flexibility of the trend, and in particular how much the trend changes at the trend changepoints. As described in this documentation, if it is too small, the trend will be underfit and variance that should have been modeled with trend changes will instead end up being handled with the noise term. If it is too large, the trend will overfit and in the most extreme case you can end up with the trend capturing yearly seasonality. The default of 0.05 works for many time series, but this could be tuned; a range of [0.001, 0.5] would likely be about right. Parameters like this (regularization penalties; this is effectively a lasso penalty) are often tuned on a log scale.
- `seasonality_prior_scale`: This parameter controls the flexibility of the seasonality. Similarly, a large value allows the seasonality to fit large fluctuations, a small value shrinks the magnitude of the seasonality. The default is 10.0, which applies basically no regularization. That is because we very rarely see overfitting here (there's inherent regularization with the fact that it is being modeled with a truncated Fourier series, so it's essentially low-pass filtered). A reasonable range for tuning it would probably be [0.01, 10]; when set to 0.01 you should find that the magnitude of seasonality is forced to be very small. This likely also makes sense on a log scale, since it is effectively an L2 penalty like in ridge regression (see the log-scale grid sketch after this list).
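Since both prior scales are typically tuned on a log scale, candidate values can be generated with `np.logspace` rather than written out by hand; a sketch using the ranges suggested above:

```python
# Python
import numpy as np

# Four log-spaced candidates per parameter, spanning the suggested ranges.
param_grid = {
    'changepoint_prior_scale': np.logspace(np.log10(0.001), np.log10(0.5), num=4).tolist(),
    'seasonality_prior_scale': np.logspace(np.log10(0.01), np.log10(10.0), num=4).tolist(),
}
```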