MISE

Gaussian Kernel and Bandwidth
Mean Integrated Squared Error:
MISE = E[ ||Pn - P||^2 ] = E[ ∫ (Pn(x) - P(x))^2 dx ]

We can use MISE, or its asymptotic approximation AMISE, to select the optimal bandwidth.
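MISE is defined against the unknown density P, so it cannot be computed from data directly; a common data-driven stand-in is cross-validated likelihood. A minimal sketch with scikit-learn's KernelDensity (the sample data and bandwidth grid here are illustrative assumptions):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Illustrative sample; substitute your own observations.
data = np.random.normal(size=(200, 1))

# Search a grid of bandwidths for the Gaussian kernel,
# scoring each by cross-validated log-likelihood.
grid = GridSearchCV(KernelDensity(kernel='gaussian'),
                    {'bandwidth': np.logspace(-1, 1, 20)},
                    cv=5)
grid.fit(data)
print(grid.best_params_['bandwidth'])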

import numpy as np

def MahalanobisDist(x, y):
    # x and y are paired 1-D coordinate arrays of one 2-D dataset.
    covariance_xy = np.cov(x, y, rowvar=False)
    inv_covariance_xy = np.linalg.inv(covariance_xy)

    # Center each coordinate on its sample mean.
    xy_mean = np.mean(x), np.mean(y)
    x_diff = np.array([x_i - xy_mean[0] for x_i in x])
    y_diff = np.array([y_i - xy_mean[1] for y_i in y])
    diff_xy = np.transpose([x_diff, y_diff])

    # Distance of each point from the mean: sqrt(d^T S^-1 d).
    md = []
    for diff in diff_xy:
        md.append(np.sqrt(diff @ inv_covariance_xy @ diff))
    return md
 
md = MahalanobisDist(x, y)  # x, y: the two coordinate arrays
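As a cross-check, SciPy provides a per-point Mahalanobis distance; a minimal sketch, assuming the same paired arrays x and y as above:

from scipy.spatial.distance import mahalanobis

# Stack the coordinates into points and invert their covariance.
points = np.column_stack((x, y))
center = points.mean(axis=0)
VI = np.linalg.inv(np.cov(x, y))

# scipy.spatial.distance.mahalanobis takes two points and VI.
md_scipy = [mahalanobis(p, center, VI) for p in points]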

purpose of the model => problem formulation
problem formulation => choice of loss/risk
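For example, a regression formulation naturally pairs with a squared-error risk, while a probabilistic classification formulation pairs with log loss; a minimal illustration with sklearn.metrics (the numbers are made up):

from sklearn.metrics import mean_squared_error, log_loss

# Regression formulation -> squared-error loss.
print(mean_squared_error([1.0, 2.0], [1.1, 1.8]))

# Probabilistic classification formulation -> log loss.
print(log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]]))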

Identification
natural sciences, economics, medicine, some engineering

Prediction/Generalization
statistics/machine learning, complex phenomena, general applications

Models
logistic regression, support vector machine, random forest

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def init_models(X_train, y_train):
    # Candidate classifiers; probability=True lets SVC emit class probabilities.
    models = [LogisticRegression(),
              RandomForestClassifier(),
              SVC(probability=True)]

    # Fit every model on the same training data.
    for model in models:
        model.fit(X_train, y_train)

    return models

models = init_models(X_train, y_train)
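A usage sketch for comparing the fitted classifiers, assuming a held-out X_test/y_test from the same split as X_train/y_train:

# Report held-out accuracy for each fitted model.
for model in models:
    print(type(model).__name__, model.score(X_test, y_test))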

Learning Curves
Plot of the model performance:
the risk, cost, or score vs.
the size of the training and test sets
Classifiers: score or 1 - score (error rate)
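A minimal sketch of such a plot with scikit-learn's learning_curve, assuming the X_train/y_train from above (the estimator and sizes are illustrative):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.linear_model import LogisticRegression

# Score vs. training-set size; cross-validation supplies the test-side curve.
sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(), X_train, y_train,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

plt.plot(sizes, train_scores.mean(axis=1), label='training score')
plt.plot(sizes, test_scores.mean(axis=1), label='cross-validation score')
plt.xlabel('training set size')
plt.ylabel('score')
plt.legend()
plt.show()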