r^2 of a regression – Software engineer tech blog

r^2
r^2 answers the question: how much of the change in my output (y) is explained by the change in my input (x)?
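To make the definition concrete, here is a minimal sketch that computes r^2 by hand as 1 - SSE/SST, where SSE is the sum of squared errors of the predictions and SST is the total squared variation of y around its mean. The function name `r_squared` and the sample numbers are made up for illustration; they are not from the original notes.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Return 1 - SSE/SST, the usual definition of r^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)         # error the model leaves unexplained
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total variation of y around its mean
    return 1.0 - sse / sst

# Made-up numbers, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                      # predictions match y exactly
baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # always predicting the mean of y
print(perfect, baseline)
```

A model that predicts every point exactly scores 1.0, and a model that always predicts the mean of y scores 0.0, which is why values closer to 1.0 indicate a better fit.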

0.0 < r^2 < 1.0

classification & regression

property                       supervised classification    regression
output type                    discrete (class labels)      continuous (number)
what are you trying to find?   decision boundary            best fit line
evaluation                     accuracy                     "sum of squared error" or r^2

Regression, multi-variate: age, IQ, education -> net worth

Multi-variate regression
y = 5*x1 + 2.5*x2 - 200

y = House Price
y = x1 - 10*x2 + 500
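As a rough illustration of multi-variate regression, the sketch below generates synthetic data from an equation of the form y = 5*x1 + 2.5*x2 - 200 and recovers the two slopes and the intercept with ordinary least squares. It uses NumPy's `np.linalg.lstsq` rather than scikit-learn to stay dependency-light; the data, the random seed, and the variable names are assumptions made for the example.

```python
import numpy as np

# Synthetic inputs (assumed for illustration, not real data).
rng = np.random.RandomState(42)
x1 = rng.uniform(0, 100, size=200)
x2 = rng.uniform(0, 100, size=200)
y = 5 * x1 + 2.5 * x2 - 200   # noise-free target

# Design matrix: one column per input plus a column of ones for the intercept.
A = np.column_stack([x1, x2, np.ones_like(x1)])
solution, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
slope_x1, slope_x2, intercept = solution
print(slope_x1, slope_x2, intercept)
```

Because the target here has no noise, the fitted values should land very close to 5, 2.5, and -200. With real data the fit would only approximate the underlying relationship.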

import sys
import pickle
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit
# pickle files must be opened in binary mode
with open("../final_project/final_project_dataset_modified.pkl", "rb") as f:
	dictionary = pickle.load(f)

features_list = ["bonus", "salary"]
data = featureFormat( dictionary, features_list, remove_any_zeroes=True)
target, features = targetFeatureSplit( data )

from sklearn.model_selection import train_test_split
feature_train, feature_test, target_train, target_test = train_test_split(features, target, test_size=0.5, random_state=42)
train_color = "b"
test_color = "r"  # a different color so test points stand out from training points

import matplotlib.pyplot as plt
for feature, target in zip(feature_test, target_test):
	plt.scatter( feature, target, color=test_color )
for feature, target in zip(feature_train, target_train):
	plt.scatter( feature, target, color=train_color )

# re-plot one point from each set with a label so the legend shows both
plt.scatter(feature_test[0], target_test[0], color=test_color, label="test")
plt.scatter(feature_train[0], target_train[0], color=train_color, label="train")

try:
	plt.plot( feature_test, reg.predict(feature_test) )
except NameError:
	pass
plt.xlabel(features_list[1])
plt.ylabel(features_list[0])
plt.legend()
plt.show()
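The script above draws a regression line only if a fitted model named reg already exists. A minimal sketch of that missing step, assuming scikit-learn's `LinearRegression`, could look like the following. The synthetic features and target stand in for the salary and bonus data, which is not reproduced here, so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the salary -> bonus data (not the real dataset).
rng = np.random.RandomState(42)
features = rng.uniform(0, 1000, size=(100, 1))
target = 3.0 * features[:, 0] + rng.normal(0, 50, size=100)

feature_train, feature_test, target_train, target_test = train_test_split(
    features, target, test_size=0.5, random_state=42)

# Fit on the training half only.
reg = LinearRegression()
reg.fit(feature_train, target_train)

slope = reg.coef_[0]
intercept = reg.intercept_
r2_train = reg.score(feature_train, target_train)  # r^2 on the data used for fitting
r2_test = reg.score(feature_test, target_test)     # r^2 on held-out data

print("slope:", slope, "intercept:", intercept)
print("r^2 on training data:", r2_train)
print("r^2 on test data:", r2_test)
```

The r^2 on the test half is the more honest measure of fit. On held-out data `score` can even fall below 0.0 when the model does worse than always predicting the mean.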