When to use PCA
-> latent features driving the patterns in data
-> dimensionality reduction
-> visualize high-dimensional data, reduce noise
-> make other algorithms (regression, classification) work better with fewer inputs
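A minimal sketch of these ideas on toy data (not from the lecture): two measured features driven by one latent direction, which PCA recovers as the first principal component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# x2 is essentially a scaled copy of x1 plus a little noise,
# so one latent feature drives the patterns in both columns
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.1, size=200)
X = np.column_stack([x1, x2])

pca = PCA(n_components=2).fit(X)
# the first component captures nearly all the variance
print(pca.explained_variance_ratio_)

# dimensionality reduction: project onto the dominant component
X_reduced = PCA(n_components=1).fit_transform(X)
print(X_reduced.shape)
```

Feeding `X_reduced` (instead of `X`) into a downstream regressor or classifier is the "fewer inputs" use case from the list above.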
PCA for facial recognition
from time import time
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

n_components = 150
print("Extracting the top %d eigenfaces from %d faces" % (n_components, X_train.shape[0]))
t0 = time()
# RandomizedPCA in older scikit-learn; now PCA with svd_solver='randomized'
pca = PCA(n_components=n_components, whiten=True, svd_solver='randomized').fit(X_train)
print("done in %0.3fs" % (time() - t0))
# h, w are the image height and width from the face dataset
eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))

print("Fitting the classifier to the training set")
http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html