Dimensionality reduction
locally linear embedding
Isomap
cluster by affinity
do EM/k-means succeed
in finding the 2 clusters?
Affinity matrix (sketch below)
dimensionality for large environments
supervised vs. unsupervised learning
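A minimal sketch of clustering by affinity: build an RBF affinity matrix and split the points by the sign of the Fiedler vector of the normalized Laplacian. The toy data and gamma below are illustrative choices, not from the lecture.

```python
# Sketch: affinity matrix + spectral split into 2 clusters (illustrative data).
import numpy as np

x = np.array([1.0, 1.2, 0.8, 8.0, 8.3, 7.9]).reshape(-1, 1)

# Affinity matrix: A[i, j] = exp(-gamma * ||x_i - x_j||^2)
gamma = 1.0
sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
A = np.exp(-gamma * sq_dists)

# Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(len(x)) - D_inv_sqrt @ A @ D_inv_sqrt

# For 2 clusters, the sign of the second-smallest eigenvector splits the points
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels)   # e.g. [0 0 0 1 1 1] (labels may be flipped)
```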
Maximum likelihood
data: 3, 4, 5, 6, 7   (M = 5)
μ = 5, σ² = 2
data: 3, 9, 9, 3
μ = 6, σ² = 9
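A quick check of the two maximum-likelihood fits above:

```python
# ML estimates of a Gaussian: mean and variance both divide by M, not M-1.
def ml_gaussian(xs):
    M = len(xs)
    mu = sum(xs) / M
    sigma2 = sum((x - mu) ** 2 for x in xs) / M
    return mu, sigma2

print(ml_gaussian([3, 4, 5, 6, 7]))  # (5.0, 2.0)
print(ml_gaussian([3, 9, 9, 3]))     # (6.0, 9.0)
```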
Gaussians
– functional form
– fit from data
– multivariate Gaussians
Expectation maximization
P(x) = Σi=1..k P(C=i) · p(x | C=i)
parameters: πi, μi, Σi
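A minimal 1-D EM sketch for a mixture of k Gaussians; the synthetic data, k = 2, and the iteration count are illustrative choices.

```python
import numpy as np

def em_gmm(x, k=2, iters=50):
    n = len(x)
    pi = np.full(k, 1.0 / k)                    # mixing weights P(C=i)
    mu = np.random.choice(x, k, replace=False)  # initial means
    var = np.full(k, np.var(x))                 # initial variances
    for _ in range(iters):
        # E-step: responsibilities r[j, i] = P(C=i | x_j)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi_i, mu_i, var_i from the weighted data
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

x = np.concatenate([np.random.normal(0, 1, 200), np.random.normal(6, 1, 200)])
print(em_gmm(x, k=2))
```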
EM versus k-means
minimize: −Σj log p(xj | μ1…k, Σ1…k) + cost · k
guess initial k
run EM
remove unnecessary clusters (sketch below)
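A sketch of the "minimize −Σj log p(xj) + cost · k" idea, using sklearn's GaussianMixture as the EM step; the data and the per-cluster cost are illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

x = np.concatenate([np.random.normal(0, 1, 200),
                    np.random.normal(6, 1, 200)]).reshape(-1, 1)
cost_per_cluster = 10.0

scores = {}
for k in range(1, 6):
    gm = GaussianMixture(n_components=k).fit(x)
    neg_log_lik = -gm.score(x) * len(x)          # score() is the mean log-likelihood
    scores[k] = neg_log_lik + cost_per_cluster * k

print(min(scores, key=scores.get))               # typically 2 for this data
```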
clustering
– k-means, EM
Unsupervised learning
- find structure in the data
- density estimation
- clustering
- dimensionality reduction
- blind source separation
K-means clustering (sketch below)
– need to know k
– local minima
– high dimensionality
– lack of a mathematical basis
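A minimal k-means sketch; the data, k, and the random initialization are illustrative, and the result depends on the init (local minima).

```python
# Assumes no cluster goes empty during the iterations.
import numpy as np

def kmeans(x, k, iters=100):
    centers = x[np.random.choice(len(x), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([x[labels == i].mean(axis=0) for i in range(k)])
    return labels, centers

x = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(x, k=2)
print(centers)   # roughly [0, 0] and [5, 5], in some order
```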
Gaussian Learning
parameters of a Gaussian
f(x | μ, σ²) = 1/√(2πσ²) · exp(−(x−μ)² / 2σ²)
μ = (1/M) Σj=1..M xj,   σ² = (1/M) Σj=1..M (xj − μ)²
Data x1 … xM:  p(x1 … xM | μ, σ²) = Πi f(xi | μ, σ²) = (1/(2πσ²))^(M/2) exp(−Σi (xi−μ)² / 2σ²)
log-likelihood: (M/2) log(1/(2πσ²)) − (1/(2σ²)) Σi=1..M (xi−μ)²
Gradient
L = Σj (yj − w1xj − w0)² -> min
∂L/∂w1 = −2 Σj (yj − w1xj − w0) xj
∂L/∂w0 = −2 Σj (yj − w1xj − w0)
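A sketch of gradient descent on L using the derivatives above; the learning rate, step count, and data are illustrative.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        d_w1 = -2 * sum((y - w1 * x - w0) * x for x, y in zip(xs, ys))
        d_w0 = -2 * sum((y - w1 * x - w0) for x, y in zip(xs, ys))
        w1 -= lr * d_w1
        w0 -= lr * d_w0
    return w0, w1

print(fit_line([0, 1, 2, 3], [3, 2, 1, 0]))   # approaches (3.0, -1.0)
```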
Perceptron algorithm
Linear separator
f(x) = 1 if w1x + w0 ≥ 0
     = 0 if w1x + w0 < 0
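A minimal perceptron sketch for the 1-D rule above (predict 1 iff w1·x + w0 ≥ 0); the training data and learning rate are illustrative.

```python
def train_perceptron(data, lr=0.1, epochs=100):
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if w1 * x + w0 >= 0 else 0
            # update only on mistakes, nudging the separator toward the example
            w0 += lr * (y - pred)
            w1 += lr * (y - pred) * x
    return w0, w1

data = [(0, 0), (1, 0), (3, 1), (4, 1)]   # linearly separable around x ≈ 2
print(train_perceptron(data))
```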
Linear function
Linear Method
-regression vs classification
-exact solution vs iterative solution
-smoothing
-non-linear problems
Supervised Learning
-> parametric vs. non-parametric
KNN definition
learning: memorize all data; label a new example by the majority vote of its k nearest neighbors
Problems of KNN
- very large data sets -> k-d trees (sketch below)
- very large feature spaces
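A k-nearest-neighbour sketch: "learning" just stores the data, and a k-d tree speeds up the neighbour search. The data, k, and query points are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])

tree = cKDTree(train_x)                    # built once at "training" time

def knn_predict(query, k=3):
    _, idx = tree.query(query, k=k)        # indices of the k nearest stored points
    return np.bincount(train_y[idx]).argmax()   # majority label

print(knn_predict([0.5, 0.5]))   # 0
print(knn_predict([5.5, 5.5]))   # 1
```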
Minimize quadratic loss
min L = Σi (yi − w1xi − w0)²
∂L/∂w0 = 0  =>  w0 = (1/M) Σyi − (w1/M) Σxi
∂L/∂w1 = 0  =>  Σxiyi − (1/M) Σyi Σxi + (w1/M) (Σxi)² = w1 Σxi²
f(x)= w1X + w0
w0 = 3
w1 = -1
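A sketch of the closed-form fit implied by the normal equations above; the four data points are an illustrative set chosen to give w0 = 3, w1 = −1 (the data behind the notes' example isn't recorded here).

```python
def fit_closed_form(xs, ys):
    M = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    w1 = (sxy - sx * sy / M) / (sxx - sx * sx / M)
    w0 = (sy - w1 * sx) / M
    return w0, w1

print(fit_closed_form([0, 1, 2, 3], [3, 2, 1, 0]))   # (3.0, -1.0)
```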
Regularization
loss = loss(data)+loss(parameters)
Σj (yj − w1xj − w0)² + Σi |wi|^p
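A sketch of the regularized loss: data loss plus a penalty on the weights; p = 2 gives ridge-style shrinkage, p = 1 lasso-style sparsity. The weight lam is an illustrative addition not in the notes.

```python
def regularized_loss(w0, w1, xs, ys, p=2, lam=0.1):
    data_loss = sum((y - w1 * x - w0) ** 2 for x, y in zip(xs, ys))
    param_loss = lam * abs(w1) ** p          # the bias w0 is usually left unpenalized
    return data_loss + param_loss

print(regularized_loss(3.0, -1.0, [0, 1, 2, 3], [3, 2, 1, 0]))   # 0.0 data loss + 0.1 penalty
```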
advanced spam filters
-known spamming IP?
-have you emailed this person before?
-have other people received the same message?
-is the email header consistent?
-all caps?
-do inline URLs point to where they say?
-are you addressed by name?
Digit recognition
-input vector = pixel values
16 x 16
Overfitting prevention
-Occam's razor
-cross-validation (sketch below)
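A k-fold cross-validation sketch for picking model complexity; the fold count, the polynomial fit, and the noise level are illustrative choices.

```python
import numpy as np

def cross_val_error(xs, ys, fit, error, folds=5):
    idx = np.random.permutation(len(xs))
    chunks = np.array_split(idx, folds)
    errs = []
    for i in range(folds):
        test = chunks[i]
        train = np.concatenate([chunks[j] for j in range(folds) if j != i])
        model = fit(xs[train], ys[train])               # train on k-1 folds
        errs.append(error(model, xs[test], ys[test]))   # evaluate on the held-out fold
    return np.mean(errs)

xs = np.linspace(0, 1, 40)
ys = 3 - xs + 0.1 * np.random.randn(40)
fit = lambda x, y: np.polyfit(x, y, deg=1)
error = lambda w, x, y: np.mean((np.polyval(w, x) - y) ** 2)
print(cross_val_error(xs, ys, fit, error))
```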
Supervised learning
-> classification: yi ∈ {0, 1}
-> regression: yi ∈ ℝ
f(x) = w1X + w0
w0 = 3, w1= -1
Linear Regression
Data: f(x) = w1x + w0 (scalar case), f(x) = w·x + w0 (vector case)
y = f(x)
Loss = Σj (yj − w1xj − w0)²
Bayes Network
-offer is secret click sports
p(“secret”|spam) = 1/3
Dictionary has 12 words -> number of parameters:
p(spam): 1
p(wi|spam): 11 (a 12-word distribution has 11 free parameters)
p(wi|ham): 11
message m=”sports”
p(spam|m) = 0.1667 or 3/18
= p(m|spam) p(spam) / [p(m|spam) p(spam) + p(m|ham) p(ham)]
m = “secret is secret”
p(spam | m) = 25 /26
Laplace smoothing
ML: p(x) = count(x) / N
LS(k): p(x) = (count(x) + k) / (N + k·|x|),  |x| = number of possible values
k = 1: 1 message, 1 spam -> p(spam) = 2/3
10 messages, 6 spam -> p(spam) = 7/12
100 messages, 60 spam -> p(spam) = 61/102
k = 1, p(spam) = 2/5 p(ham) = 3/5 p(“today”|spam) = 1/21 p(“today”|ham) = 3/27
M = “today is secret” P(spam|m)= 0.4858
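A sketch of the naive Bayes spam filter with Laplace smoothing k, trained on the eight SPAM/HAM messages listed later in these notes; with k = 1 it reproduces P(spam | "today is secret") ≈ 0.4858, and k = 0 gives the unsmoothed answers above.

```python
spam = ["offer is secret", "click secret link", "secret sports link"]
ham = ["play sports today", "went play sports", "secret sports event",
       "sports is today", "sports costs money"]

def train(spam, ham, k=1):
    vocab = set(" ".join(spam + ham).split())            # 12 distinct words
    def word_probs(msgs):
        words = " ".join(msgs).split()
        return {w: (words.count(w) + k) / (len(words) + k * len(vocab)) for w in vocab}
    p_spam = (len(spam) + k) / (len(spam) + len(ham) + 2 * k)   # the class has 2 values
    return p_spam, word_probs(spam), word_probs(ham)

def p_spam_given(msg, p_spam, p_w_spam, p_w_ham):
    s, h = p_spam, 1 - p_spam
    for w in msg.split():
        s *= p_w_spam[w]
        h *= p_w_ham[w]
    return s / (s + h)

print(p_spam_given("today is secret", *train(spam, ham, k=1)))  # ~0.4858
print(p_spam_given("sports", *train(spam, ham, k=0)))           # 1/6 ≈ 0.1667
```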
Summary: Naive Bayes
Supervised learning
x11, x12, x13 … x1n -> y1
x21, x22, x23 … x2n -> y2
…
xm1, xm2, xm3 … xmn -> ym
goal: find f with f(xm) = ym
OCCAM’S RAZOR
everything else being equal,
choose the less complex hypothesis
trade-off: fit <-> low complexity
generalization error and overfitting
SPAM: offer is secret; click secret link; secret sports link
HAM: play sports today; went play sports; secret sports event; sports is today; sports costs money
P(spam) = 3/8
Maximum likelihood
data: s s s h h h h h,  p(s) = π
p(yi) = { π if yi = s;  1−π if yi = h }
p(yi) = π^yi · (1−π)^(1−yi)
p(data) = Πi p(yi) = π^count(yi=1) · (1−π)^count(yi=0) = π³ · (1−π)⁵
maximizing gives π = count(yi=1)/N = 3/8
ML solutions:
P(“secret” | spam) = 1/3
P(“secret” | ham) = 1/15
Machine learning
-> Bayes networks = reason with known models
-> Machine learning = learn models from data
Supervised Learning
Unsupervised Learning
Famous for using machine learning
-Google: web mining
-Netflix: DVD recommendations
-Amazon: product placement
Machine Learning
what?
->parameters, structure, hidden concepts
what from?
->supervised, unsupervised, reinforcement
what for?
-> prediction, diagnostics, summarization, …
How?
-> passive, active, online, offline
output?
-> classification, regression
Details
->generative, discriminative
Approximate Inference Sampling
P(B|+a)
Likelihood weighting
Inconsistent
Gibbs sampling
Markov chain Monte Carlo (MCMC)
+c +s -r -w
+c -s -r -w
+c -s +r -w
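A sketch of likelihood weighting on the cloudy/sprinkler/rain/wet-grass network the samples above come from; the CPT numbers are assumed textbook values, not taken from these notes.

```python
import random

P_C = 0.5
P_S = {True: 0.1, False: 0.5}                 # P(+s | C)
P_R = {True: 0.8, False: 0.2}                 # P(+r | C)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(+w | S, R)

def weighted_sample():
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    weight = P_W[(s, r)]        # evidence W = +w is fixed; weight by its likelihood
    return r, weight

num = den = 0.0
for _ in range(100_000):
    r, w = weighted_sample()
    num += w * r
    den += w
print(num / den)                # estimate of P(+r | +w), roughly 0.71 with these CPTs
```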
P(A) = 0.5, P(B|A)=0.2, P(B|¬A)=0.8
P(¬A)=1-P(A)=0.5
P(A|B)=0.2
P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|¬A)P(¬A)]
= 0.1 / (0.1 + 0.8·0.5) = 0.1 / 0.5 = 0.2
Simple Bayes Net
P(A) = 0.5,  ∀i: P(Xi|A) = 0.2, P(Xi|¬A) = 0.6
P(A | X1, X2, ¬X3) ∝ P(¬X3|A) P(X2|A) P(X1|A) P(A)
P(¬A | X1, X2, ¬X3) ∝ P(¬X3|¬A) P(X2|¬A) P(X1|¬A) P(¬A)
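A quick check of the query above, normalizing the two unnormalized products from the stated numbers:

```python
p_a, p_x_a, p_x_not_a = 0.5, 0.2, 0.6

# P(X1|A) P(X2|A) P(¬X3|A) P(A), and the same with ¬A
num_a     = p_x_a * p_x_a * (1 - p_x_a) * p_a
num_not_a = p_x_not_a * p_x_not_a * (1 - p_x_not_a) * (1 - p_a)

print(num_a / (num_a + num_not_a))   # 0.016 / 0.088 ≈ 0.1818
```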