Features != Information

There are two main univariate feature selection tools in sklearn: SelectPercentile and SelectKBest. The difference is apparent from the names: SelectPercentile keeps the top X% of features, ranked by a univariate score (X is a parameter), and SelectKBest keeps the K highest-scoring features (K is a parameter).
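A minimal sketch of both selectors, to make the difference concrete. The data is synthetic and made up for illustration (10 features, only the first two related to the label), and f_classif is just one possible scoring function.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

# Synthetic data (an assumption for illustration): 100 samples, 10 features,
# and a label that depends only on the first two features.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Keep the K = 2 features with the highest univariate F-scores.
k_best = SelectKBest(score_func=f_classif, k=2)
X_k = k_best.fit_transform(X, y)

# Keep the top 20% of features; with 10 features that is also 2.
top_pct = SelectPercentile(score_func=f_classif, percentile=20)
X_pct = top_pct.fit_transform(X, y)

print(X_k.shape, X_pct.shape)
print(k_best.get_support(indices=True))  # column indices that were kept
```

With 10 features, k=2 and percentile=20 keep the same number of columns; the two tools differ only in whether the cutoff is given as a count or as a fraction.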

high bias
pays little attention to the data; oversimplified; high error on the training set
high variance
pays too much attention to the data (does not generalize well); overfits
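
A rough sketch of the two failure modes, assuming a made-up noisy sine dataset and polynomial fits of different degree (neither is from the notes). A degree-0 fit stands in for high bias, a degree-12 fit on 20 points stands in for high variance.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.RandomState(0)

def make_data(n):
    # Hypothetical data: a sine curve plus Gaussian noise.
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(model, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (0, 3, 12):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(degree, "train:", results[degree][0], "test:", results[degree][1])
```

Training error can only go down as the degree goes up, since each higher-degree fit contains the lower ones. The flexible fit is the one that tends to show a gap between training and test error, which is the "does not generalize well" symptom.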

Regularization in Regression
method for automatically penalizing extra features
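-Lasso Regression: minimize SSE + γ|β|
 (SSE: sum of squared errors; γ: penalty strength; |β|: sum of the absolute values of the coefficients)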

For a regression of the form y = m1*x1 + m2*x2 + m3*x3 + m4*x4 + b:
m1-m4: coefficients of regression (the β in the Lasso penalty)
x1-x4: features
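
A small worked sketch of the quantity Lasso minimizes, SSE + γ|β|. The function name lasso_objective and the toy numbers are assumptions for illustration; they show how the penalty can make a model with an extra small coefficient score worse even though its SSE is lower.

```python
import numpy as np

def lasso_objective(X, y, beta, intercept, gamma):
    # SSE + gamma * |beta|, where |beta| is the sum of absolute coefficients.
    residuals = y - (X @ beta + intercept)
    sse = float(np.sum(residuals ** 2))
    penalty = gamma * float(np.sum(np.abs(beta)))
    return sse + penalty

# Toy data (hypothetical): y is almost entirely explained by the first feature.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 0.1, 2.1])

one_feature = np.array([2.0, 0.0])   # ignores the second feature
two_features = np.array([2.0, 0.1])  # uses both, fits the data exactly

for gamma in (0.0, 1.0):
    a = lasso_objective(X, y, one_feature, 0.0, gamma)
    b = lasso_objective(X, y, two_features, 0.0, gamma)
    print("gamma =", gamma, "one feature:", a, "two features:", b)
```

With γ = 0 the two-feature model wins on SSE alone; with γ = 1 the extra coefficient costs more than the error it removes, so the one-feature model has the lower objective.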

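from sklearn.linear_model import Lasso

# GetMyData() is a placeholder for your own data-loading function
features, labels = GetMyData()
regression = Lasso()
# fit needs both the features and the labels
regression.fit(features, labels)
# predict expects a 2D array: one row per sample (here one sample, two features)
regression.predict([[2, 4]])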
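
A self-contained version of the Lasso snippet, with synthetic data standing in for GetMyData() (the data, the four-feature setup, and alpha=0.5 are all assumptions for illustration). In sklearn the penalty strength is the alpha parameter, which plays the role of γ.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in for GetMyData(): four features, but the label
# depends only on the first two.
rng = np.random.RandomState(42)
features = rng.normal(size=(200, 4))
labels = (3.0 * features[:, 0] - 2.0 * features[:, 1]
          + rng.normal(scale=0.1, size=200))

# alpha is sklearn's name for the penalty strength.
regression = Lasso(alpha=0.5)
regression.fit(features, labels)

# Coefficients of the unused features should be driven to exactly zero.
print(regression.coef_)

# One sample with four feature values, passed as a 2D array.
prediction = regression.predict([[2, 4, 0, 0]])
print(prediction)
```

Inspecting coef_ is a quick way to see which features the penalty discarded. Note that sklearn scales the error term by the number of samples, so alpha is not numerically identical to γ in the formula above, only analogous.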