Statsmodels Linear Regression - Patsy Formula To Include All Predictors In Model
Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and
Solution 1:
I haven't found an equivalent of R's "." in the patsy documentation either. But what patsy lacks in conciseness, Python makes up for with strong string manipulation. So you can build a formula involving all the predictor columns in DF using
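# Index.difference drops "y" and returns the remaining column names, sorted
# (subtracting a list from DF.columns is no longer a set difference in current pandas)
all_columns = "+".join(DF.columns.difference(["y"]))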
This gives x1+x2+x3 in your case. Finally, you can create a formula string with y as the response and pass it to any fitting procedure:
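import statsmodels.formula.api as smf
my_formula = "y~" + all_columns
# ols is used here as the fitting procedure; any formula-based statsmodels model takes the same string
result = smf.ols(formula=my_formula, data=DF).fit()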
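Putting those steps together, here is a minimal end-to-end sketch. The sample data is made up for illustration, and it assumes statsmodels.formula.api.ols as the fitting procedure and Index.difference to drop the response column (which also sorts the remaining names).

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up sample data: y is the response, x1..x3 are the predictors.
DF = pd.DataFrame({
    "y":  [1.0, 2.1, 2.9, 4.2, 5.1, 5.8],
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [1, 0, 1, 1, 0, 0],
})

# Every column except the response, joined into the right-hand side.
all_columns = "+".join(DF.columns.difference(["y"]))
my_formula = "y~" + all_columns

# Fit an ordinary least squares model from the generated formula.
result = smf.ols(formula=my_formula, data=DF).fit()

print(my_formula)
print(result.params)
```

The fitted parameters include an Intercept term because patsy adds one by default.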
Solution 2:
No, this doesn't exist in patsy yet, unfortunately. See this issue.
Solution 3:
As this is still not included in patsy, I wrote a small function that I call when I need to run statsmodels models with all columns (optionally with exceptions):
def ols_formula(df, dependent_var, *excluded_cols):
    '''
    Generates the R style formula for statsmodels (patsy) given
    the dataframe, dependent variable and optional excluded columns
    as strings
    '''
    df_columns = list(df.columns.values)
    df_columns.remove(dependent_var)
    for col in excluded_cols:
        df_columns.remove(col)
    return dependent_var + ' ~ ' + ' + '.join(df_columns)
For example, for a dataframe called df with columns y, x1, x2, x3, running ols_formula(df, 'y', 'x3') returns 'y ~ x1 + x2'.
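One caveat with any generated formula: a column name that is not a valid Python identifier (a space, a leading digit, a keyword such as class) will not parse as a plain term. Patsy provides Q("name") for quoting such names. The variant below is only a sketch of that idea; the helper name quoted_formula is hypothetical, it takes any iterable of column names (for example df.columns), and it assumes the names contain no double quotes or backslashes.

```python
import keyword


def quoted_formula(columns, dependent_var, *excluded_cols):
    """Build 'y ~ a + b' from column names, wrapping awkward names in Q("...")."""

    def term(name):
        name = str(name)
        # Plain identifiers that are not Python keywords can be used as-is.
        if name.isidentifier() and not keyword.iskeyword(name):
            return name
        # Anything else is wrapped in patsy's Q() quoting helper.
        # Assumption: the name has no double quotes or backslashes in it.
        return 'Q("' + name + '")'

    skip = {dependent_var, *excluded_cols}
    predictors = [term(c) for c in columns if c not in skip]
    return term(dependent_var) + " ~ " + " + ".join(predictors)


# Hypothetical column names, including one with a space and one keyword.
columns = ["y", "x 1", "x2", "class"]
formula = quoted_formula(columns, "y", "x2")
print(formula)
```

Unlike list.remove, this version silently ignores excluded names that are not present, which may or may not be what you want.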