I am comparing SARIMAX fitting results between R (3.3.1) forecast package (7.3) and Python's (3.5.2) statsmodels (0.8).

The R-code is:

    library(forecast)
    data("AirPassengers")
    Arima(AirPassengers, order=c(2,1,1), seasonal=list(order=c(0,1,0), 
    period=12))$aic
    [1] 1017.848

The Python code is:

    from statsmodels.tsa.statespace import sarimax
    import pandas as pd

    # Monthly airline passenger counts, 1949-01 through 1960-12 (144 values)
    AirlinePassengers = pd.Series(
             [112,118,132,129,121,135,148,148,136,119,104,118,115,126,
              141,135,125,149,170,170,158,133,114,140,145,150,178,163,
              172,178,199,199,184,162,146,166,171,180,193,181,183,218,
              230,242,209,191,172,194,196,196,236,235,229,243,264,272,
              237,211,180,201,204,188,235,227,234,264,302,293,259,229,
              203,229,242,233,267,269,270,315,364,347,312,274,237,278,
              284,277,317,313,318,374,413,405,355,306,271,306,315,301,
              356,348,355,422,465,467,404,347,305,336,340,318,362,348,
              363,435,491,505,404,359,310,337,360,342,406,396,420,472,
              548,559,463,407,362,405,417,391,419,461,472,535,622,606,
              508,461,390,432])
    # Month-end index ending 1960-12-31
    AirlinePassengers.index = pd.date_range(end='1960-12-31',
                              periods=len(AirlinePassengers), freq='M')
    print(sarimax.SARIMAX(AirlinePassengers,order=(2,1,1),
          seasonal_order=(0,1,0,12)).fit().aic)

Which throws an error: ValueError: Non-stationary starting autoregressive parameters found with enforce_stationarity set to True.

If I set enforce_stationarity (and enforce_invertibility, which is also required) to False, the model fit works but AIC is very poor (>1400).

Using some other model parameters for the same data, e.g., ARIMA(0,1,1)(0,0,1)[12] I can get identical results from R and Python with stationarity and invertibility checks enabled in Python.

My main question is: What explains the difference in behavior with some model parameters? Are statsmodels' stationarity and invertibility checks different from those in forecast's Arima, and is one of them somehow "more correct"?

I also found a pull request related to fixing an invertibility calculation bug in statsmodels: https://github.com/statsmodels/statsmodels/pull/3506

After re-installing statsmodels with the latest source code from GitHub, I still get the same error with the code above. With enforce_stationarity=False and enforce_invertibility=False, however, I get an AIC of around 1010, which is lower than in the R case, but the model parameters are also vastly different.
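For reference, the condition named in the error message can be checked by hand in the AR(2) case. The sketch below is not statsmodels' own check; `ar2_is_stationary` is a hypothetical helper that applies the textbook condition that both roots of lambda^2 - phi1*lambda - phi2 = 0 lie strictly inside the unit circle.

```python
import cmath


def ar2_is_stationary(phi1, phi2):
    """Textbook stationarity test for y_t = phi1*y_{t-1} + phi2*y_{t-2} + e_t.

    Hypothetical helper for illustration only; not part of statsmodels.
    """
    # Roots of the characteristic equation lambda^2 - phi1*lambda - phi2 = 0
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    roots = ((phi1 + disc) / 2.0, (phi1 - disc) / 2.0)
    # Stationary only if every root has modulus strictly below 1
    return all(abs(r) < 1.0 for r in roots)


print(ar2_is_stationary(0.5, 0.2))   # True
print(ar2_is_stationary(1.2, 0.1))   # False
```

A coefficient pair like (1.2, 0.1) fails the test, so an automatically generated starting guess of that kind would plausibly be rejected when stationarity is enforced.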

This is an issue related to the starting value generation procedure and is most easily solved by supplying manual starting parameters: model.fit(start_params=[0, 0, 0, 1]). For more information, see: groups.google.com/forum/#!topic/pystatsmodels/S_Fo53F25Rk – cuckoops Sep 7, 2017 at 6:15

I have the same error on this model with exogenous regressors, but not without. I am trying to pass the starting parameters of the latter to .fit(), but I get an index out of bounds error. Do you know how to choose the starting parameters? How many should there be? modx = sm.tsa.statespace.SARIMAX(air.visit_mean[:450], trend='c', exog=hol.holiday_flg[:450], order=(2,0,2), seasonal_order=(0,1,1,7)); resultsx = modx.fit(start_params=mod.start_params) – aless80 Dec 8, 2017 at 3:06

It looks like the number of starting parameters is AR + MA in order times the number of exogenous series – aless80 Dec 8, 2017 at 3:19
