PCA

Principal Component Analysis – PCA
Dimensionality of the data: 2

New coordinate system for the data:
origin at x = 2, y = 3
principal-axis direction: Δx = 1, Δy = 2
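
A minimal sketch (using the origin and axis direction above; the data point itself is made up) of expressing a point in this new coordinate system:

import numpy as np

# From the notes above: new origin and (unnormalized) principal-axis direction
origin = np.array([2.0, 3.0])
direction = np.array([1.0, 2.0])
unit_axis = direction / np.linalg.norm(direction)  # unit vector along the new axis

point = np.array([4.0, 7.0])                       # hypothetical data point
coordinate = np.dot(point - origin, unit_axis)     # position of the point along the new axis
print(coordinate)                                  # ≈ 4.47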

Composite feature example: square footage + number of rooms -> size (see sketch below)
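
A minimal sketch of this idea with made-up house data; PCA compresses the two correlated features into a single "size" component:

import numpy as np
from sklearn.decomposition import PCA

# Hypothetical house data: [square footage, number of rooms]
houses = np.array([
    [1000, 3],
    [1500, 4],
    [2000, 5],
    [2500, 6],
    [3000, 8],
])

pca = PCA(n_components=1)
size = pca.fit_transform(houses)      # one "size" number per house
print(pca.explained_variance_ratio_)  # most of the variance survives in one component
print(size.ravel())

# In practice the features would usually be scaled first (square footage dominates
# the raw numbers); that step is skipped here to keep the sketch short.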

How to determine the principal component
variance (in machine learning) – the willingness/flexibility of an algorithm to learn
variance (technical term in statistics) – roughly the "spread" of a data distribution (similar to standard deviation)
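
A quick numeric illustration of the statistical sense, with arbitrary sample values:

import numpy as np

sample = np.array([1.0, 2.0, 2.0, 3.0, 10.0])  # arbitrary values; the 10 stretches the spread
print(np.var(sample))  # variance: mean squared distance from the sample mean
print(np.std(sample))  # standard deviation: square root of the variance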

– the principal component is the direction of maximum variance; projecting onto it retains the most information (i.e. minimizes information loss)
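
A minimal sketch, using made-up correlated data, showing that the first component sklearn finds is the direction along which the projected data has the largest variance:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=200)
points = np.column_stack([x, 2 * x + 0.3 * rng.normal(size=200)])  # strong linear trend + noise

pca = PCA(n_components=2).fit(points)
first_pc = pca.components_[0]                    # unit vector: direction of maximum variance

projected = (points - points.mean(axis=0)) @ first_pc
print(projected.var())                           # variance kept by projecting onto the first PC
print(pca.explained_variance_ratio_)             # share of total variance per component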

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# data: N x 2 array of features, here [bonus, long-term incentive] (loaded elsewhere)

def doPCA(data):
    pca = PCA(n_components=2)
    pca.fit(data)
    return pca

pca = doPCA(data)
print(pca.explained_variance_ratio_)  # fraction of the total variance explained by each component
first_pc = pca.components_[0]
second_pc = pca.components_[1]

transformed_data = pca.transform(data)
for ii, jj in zip(transformed_data, data):
    # projections of each point onto the first (red) and second (cyan) principal components
    plt.scatter(first_pc[0] * ii[0], first_pc[1] * ii[0], color="r")
    plt.scatter(second_pc[0] * ii[1], second_pc[1] * ii[1], color="c")
    # original data point (blue)
    plt.scatter(jj[0], jj[1], color="b")

plt.xlabel("bonus")
plt.ylabel("long-term incentive")
plt.show()
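
A possible follow-up sketch (not part of the original notes) that quantifies the information loss mentioned above: keep only the first component, reconstruct the points, and measure how far the reconstructions fall from the originals.

import numpy as np
from sklearn.decomposition import PCA

# Assumes the same N x 2 `data` array of [bonus, long-term incentive] as above
pca1 = PCA(n_components=1).fit(data)
reconstructed = pca1.inverse_transform(pca1.transform(data))

# Mean distance between each original point and its one-component reconstruction:
# this is the information lost by dropping the second principal component.
loss = np.mean(np.linalg.norm(np.asarray(data) - reconstructed, axis=1))
print(pca1.explained_variance_ratio_, loss)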