Getting "Perfect separation detected, results not available" while building the Logistic Regression model


As part of my assignment, I am building a logistic regression model, but I get the error "Perfect separation detected, results not available" when fitting it.

**X_train :-**
      year     amt_spnt      rank
1   -1.723034   -0.418500   0.272727
2   0.716660    2.088507    -0.636364
3   1.174102    -0.558333   -1.545455
4   -0.503187   -1.297451   1.181818
5   1.326583    -0.628250   -1.545455
**y_train :-** 
1    0
2    1
3    1
4    0
5    1
Name: result, dtype: int64
**Logistic Model code:-** 
import statsmodels.api as sm
logm1 = sm.GLM(y_train,(sm.add_constant(X_train)), family = sm.families.Binomial())
logm1.fit().summary()
**Dataset before and after scaling**

This is a model specification issue: because of perfect separation, the maximum-likelihood fit cannot converge. Perfect separation means that one or more of your independent variables can perfectly distinguish the cases where the dependent variable is 0 from those where it is 1. See the following example:

Y 0 0 0 0 0 0 1 1 1 1

X 1 2 3 4 4 4 5 6 7 8

If X <= 4, Y = 0

If X > 4, Y = 1
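To see why the fit fails on data like this, here is a minimal sketch in plain Python. It evaluates the logistic log-likelihood for the X and Y above at steeper and steeper slopes. The cut point 4.5 and the helper names `log_sigmoid` and `log_likelihood` are assumptions made for illustration; they are not part of statsmodels.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(slope, x, y, cut=4.5):
    """Log-likelihood of the model P(Y=1) = sigmoid(slope * (x - cut)).

    cut=4.5 is an assumed threshold halfway between the largest X with
    Y=0 (4) and the smallest X with Y=1 (5).
    """
    total = 0.0
    for xi, yi in zip(x, y):
        z = slope * (xi - cut)
        total += log_sigmoid(z) if yi == 1 else log_sigmoid(-z)
    return total

# Example data from the answer above
Y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
X = [1, 2, 3, 4, 4, 4, 5, 6, 7, 8]

slopes = [1, 2, 5, 10, 20]
values = [log_likelihood(b, X, Y) for b in slopes]
for b, v in zip(slopes, values):
    print(f"slope={b:>2}  log-likelihood={v:.6f}")
```

The log-likelihood keeps rising toward 0 as the slope grows, so there is no finite coefficient that maximizes it. That is the situation the "Perfect separation detected" error reports.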

The short answer to your question is to find such a variable among your independent variables and remove it from your model.
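One way to look for such a variable is to check, column by column, whether a single threshold splits the two classes. The sketch below does this for the five training rows shown in the question. The helper `separates_perfectly` is a hypothetical name, and the check only covers single variables, not separation caused by a combination of variables.

```python
def separates_perfectly(x, y):
    """Return True if one threshold on x splits y == 0 from y == 1."""
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        return False  # only one class present, nothing to separate
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# The five X_train rows and y_train values copied from the question
X_train = {
    "year":     [-1.723034, 0.716660, 1.174102, -0.503187, 1.326583],
    "amt_spnt": [-0.418500, 2.088507, -0.558333, -1.297451, -0.628250],
    "rank":     [0.272727, -0.636364, -1.545455, 1.181818, -1.545455],
}
y_train = [0, 1, 1, 0, 1]

results = {name: separates_perfectly(col, y_train)
           for name, col in X_train.items()}
for name, flag in results.items():
    print(f"{name}: {flag}")
```

On these five rows, the check flags `year` and `rank` but not `amt_spnt`. A result on so few rows is only a hint; the full dataset may behave differently.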

Thank you for your response. But I can't see such a feature in my dataset. I have edited my question to include the full dataset (10 rows in total) before and after scaling. Could you please help me find out whether any variable is causing the issue? – Upendra Dama Apr 12, 2020 at 16:45

Hi, I simplified what a perfect separation issue is. Your data does not seem to have the issue I describe above; rather, it has a quasi-complete separation issue, which is caused by a combination of independent variables. I do not often use `statsmodels` for modeling, so I tried fitting the model in other software, and it turns out that "year" is the variable that causes the perfect separation issue. After I removed "year", the model still did not converge, and this time it is "rank" that causes the perfect separation issue. – Neo Apr 12, 2020 at 19:40

In statistics, this usually happens because your sample size is small and one IV, or a combination of IVs, can almost perfectly predict the DV. There are usually three ways to address this: 1. increase the sample size so that one IV or a combination of IVs is less likely to predict the DV perfectly; 2. delete the IVs that cause perfect separation, in this case "year" and "rank"; 3. recode the IVs that cause perfect separation. It would be helpful if you could give a little background on what the DV and IVs are and how they are measured. – Neo Apr 12, 2020 at 19:40
