Principal Component Analysis – PCA
Dimensionality of data: 2
New coordinate system for the data:
origin at x = 2, y = 3
new axis direction Δx = 1, Δy = 2
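A minimal sketch of how PCA would recover such a coordinate system, assuming x = 2, y = 3 is the center of the data and Δx = 1, Δy = 2 is the direction of the new major axis (the synthetic points and numbers below are only for illustration):

import numpy as np
from sklearn.decomposition import PCA

# synthetic points scattered along the line through (2, 3) with direction (1, 2)
rng = np.random.RandomState(42)
t = rng.normal(size=200)                     # position along the new major axis
noise = 0.05 * rng.normal(size=200)          # small spread off the axis
axis = np.array([1.0, 2.0]) / np.sqrt(5.0)   # unit vector for (Δx, Δy) = (1, 2)
perp = np.array([-2.0, 1.0]) / np.sqrt(5.0)  # perpendicular direction
points = np.array([2.0, 3.0]) + np.outer(t, axis) + np.outer(noise, perp)

pca = PCA(n_components=2).fit(points)
print(pca.mean_)           # ~ [2, 3]: the new origin
print(pca.components_[0])  # ~ [0.447, 0.894]: the (1, 2) direction as a unit vector (up to sign)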
Composite feature: square footage + number of rooms -> size
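Sketch of building that composite feature: PCA with n_components=1 collapses the two correlated inputs into a single "size" score per house (the values below are made up):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# made-up [square footage, number of rooms] values
houses = np.array([[1100, 3],
                   [1400, 3],
                   [1800, 4],
                   [2400, 5],
                   [3000, 6]])

scaled = StandardScaler().fit_transform(houses)   # put both features on the same scale
size = PCA(n_components=1).fit_transform(scaled)  # one composite "size" feature per house
print(size.ravel())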
How to determine the principal component
variance (machine-learning sense) – the willingness/flexibility of an algorithm to learn
variance (technical term in statistics) – roughly the "spread" of a data distribution (similar to standard deviation)
– the principal component is the direction of maximum variance; projecting onto it retains the most spread and so minimizes information loss
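A quick numerical check of that idea: the first principal component is the 1-D direction whose projection keeps the most variance, so it loses less information than any other single direction (the correlated sample below is illustrative):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# correlated 2-D sample whose spread lies mostly along one diagonal direction
sample = rng.multivariate_normal(mean=[0.0, 0.0],
                                 cov=[[3.0, 2.0], [2.0, 2.0]],
                                 size=500)

pca = PCA(n_components=2).fit(sample)
first_pc = pca.components_[0]

proj_pc = sample @ first_pc             # projection onto the first principal component
proj_x = sample @ np.array([1.0, 0.0])  # projection onto the plain x-axis, for comparison

print(np.var(proj_pc))                # the largest variance any 1-D projection can keep
print(np.var(proj_x))                 # smaller: this projection loses more information
print(pca.explained_variance_ratio_)  # share of total variance kept by each component

The code below applies the same idea to the bonus / long-term incentive data and plots each point's projections onto the two principal components: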
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def doPCA():
    pca = PCA(n_components=2)
    pca.fit(data)   # data: list of [bonus, long-term incentive] pairs prepared earlier
    return pca

pca = doPCA()
print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
first_pc = pca.components_[0]
second_pc = pca.components_[1]

transformed_data = pca.transform(data)
for ii, jj in zip(transformed_data, data):
    # projections onto the first (red) and second (cyan) principal components,
    # plus the original point (blue)
    plt.scatter(first_pc[0] * ii[0], first_pc[1] * ii[0], color="r")
    plt.scatter(second_pc[0] * ii[1], second_pc[1] * ii[1], color="c")
    plt.scatter(jj[0], jj[1], color="b")

plt.xlabel("bonus")
plt.ylabel("long-term incentive")
plt.show()