Computational Learning Theory

Mondrian Composition
https://www.khanacademy.org/humanities/ap-art-history/later-europe-and-americas/modernity-ap/a/mondrian-composition
Colored Voronoi Diagram

support vector machines (SVMs), perceptron
Nearest neighbor 1-NN
decision trees

-defining learning problems
-showing specific algorithms work
-showing these problems are fundamentally hard

Theory of computing analyzes how algorithms use resources: time, space; e.g. O(n log n), O(n^2)

Inductive learning
1.probability of successful training
2.number of examples to train on
3.complexity of hypothesis class
4.accuracy to which target concept is approximated
5.manner in which training examples presented
6.manner in which training examples selected

computational complexity
– how much computational effort is needed for a learner to converge?
sample complexity – batch
– how many training examples are needed for a learner to create a successful hypothesis?
mistake bounds – online
– how many misclassifications can a learner make over an infinite run?

true hypothesis: c ∈ H
training set: S ⊆ X
candidate hypothesis: h ∈ H
consistent learner: produces h with h(x) = c(x) for all x ∈ S
version space: VS(S) = {h ∈ H : h consistent with S} – hypotheses consistent with examples
error_D(h) = Pr_{x~D}[c(x) ≠ h(x)]
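The version space definition can be made concrete with a tiny sketch: enumerate a small hypothesis class H and keep exactly the hypotheses consistent with the training sample S. The threshold hypotheses and the sample below are illustrative, not from the notes.

```python
# Minimal version-space sketch: H is a small, enumerable hypothesis class;
# VS(S) keeps every h in H that agrees with the label on all of S.

def version_space(H, S):
    """Return names of hypotheses in H consistent with every (x, c(x)) in S."""
    return [name for name, h in H.items()
            if all(h(x) == label for x, label in S)]

# Illustrative H: threshold classifiers h_t(x) = 1 if x >= t else 0
H = {f"t={t}": (lambda x, t=t: 1 if x >= t else 0) for t in range(5)}
S = [(1, 0), (3, 1)]          # labeled training sample S ⊆ X
print(version_space(H, S))    # hypotheses still consistent with the data
```

As more labeled examples arrive, the version space only shrinks.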

Support Vector Machines

y = w^T x + b
y: label; w, b: parameters of the plane
w^T x + b = 1
w^T x + b = 0
w^T x + b = -1
y ∈ {-1, +1}

w^T x1 + b = 1
w^T x2 + b = -1
w^T (x1 - x2) / ||w|| = 2 / ||w||  <- margin

max 2/||w|| while classifying everything correctly:
yi(w^T xi + b) >= 1
equivalently min 1/2 ||w||^2 (quadratic programming)
W(α) = Σi αi – 1/2 Σij αi αj yi yj xi^T xj
s.t. αi >= 0, Σi αi yi = 0
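The margin and the constraints above can be checked numerically. A minimal sketch, with a hand-picked separating plane and made-up 2-D data (not a QP solver):

```python
import math

# Check the SVM constraints y_i (w·x_i + b) >= 1 and compute the margin
# 2/||w|| for a fixed separating plane. Data and plane are illustrative.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin(w):
    return 2.0 / math.sqrt(dot(w, w))

w, b = [1.0, 0.0], 0.0            # separating plane: x1 = 0
data = [([1.0, 2.0], +1), ([2.0, -1.0], +1), ([-1.0, 0.5], -1)]

assert all(y * (dot(w, x) + b) >= 1 for x, y in data)  # all classified correctly
print(margin(w))                  # 2/||w||
```

Maximizing 2/||w|| is what the QP does; here we only verify a candidate plane is feasible.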

SVMs: Linearly Separable
– margins: generalization vs. overfitting
– big is better
– optimization problem for finding max margins: QPs
– support vectors

Ensemble learning boosting 

spam email {+, -}
sample rules: body “manly” +, from spouse -, short +, just URLs +, just image +, “p0rn” +, “make money easy” +

1. Learn over a subset of data -> rule: uniformly at random, pick data and apply a learner
2. Combine: complex rule -> mean

Boosting
“hardest” examples
weighted mean

Error: mismatches
Pr_D[h(x) ≠ c(x)]
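The "hardest examples" / weighted-mean idea can be sketched as one round of AdaBoost-style reweighting (AdaBoost is my assumed boosting variant; the weak hypothesis's hits and misses below are made up):

```python
import math

# One AdaBoost round: examples the weak hypothesis got wrong are up-weighted
# so the next learner focuses on the "hardest" examples.

def boost_round(weights, correct):
    """correct[i] is True where h(x_i) == c(x_i). Returns (alpha, new weights)."""
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)                    # hypothesis weight
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    z = sum(new)
    return alpha, [w / z for w in new]                          # renormalize

weights = [0.25] * 4                  # uniform initial distribution D_1
alpha, weights = boost_round(weights, [True, True, True, False])
print(alpha, weights)                 # misclassified example now carries weight 1/2
```

After the update, the misclassified examples always hold exactly half the total weight, which is what forces the next weak learner to do better than chance on them.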

Instance Based Learning

Before
1, 2, … n => f(x)

Now
1, 2, … n
f(x) = look up(x)

+ remember, fast, simple
– generalization, overfit

distance stands in for similarity

K-NN
Given: Training data D = {xi, yi}
distance metric d(q, x)
number of neighbors k
query point q
– NN = {i : d(q, xi) among the k smallest}
– Return
– classification: plurality vote
– regression: mean
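The k-NN procedure above fits in a few lines; a minimal sketch with Euclidean distance as the metric and made-up data:

```python
import math
from collections import Counter

# k-NN as in the notes: find the k points with smallest d(q, x_i), then
# return the plurality label (classification) or the mean (regression).

def knn(D, q, k, d=lambda a, b: math.dist(a, b), regression=False):
    neighbors = sorted(D, key=lambda xy: d(q, xy[0]))[:k]   # k smallest d(q, x_i)
    ys = [y for _, y in neighbors]
    if regression:
        return sum(ys) / k                                  # mean of neighbor outputs
    return Counter(ys).most_common(1)[0][0]                 # plurality vote

D = [((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((6, 5), 'b')]
print(knn(D, (1, 1), 3))   # two of the three nearest neighbors are 'a'
```

Note there is no training step at all: the work (the lookup) happens at query time.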

Preference Bias
+ locality -> near points are similar
+ smoothness -> averaging
+ all features matter equally

Curse of dimensionality
As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially.
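The exponential growth can be made concrete with a back-of-the-envelope count (my illustration, not from the notes): covering the unit hypercube [0,1]^d with a grid of spacing 0.1 needs 10^d points.

```python
# Points needed to cover [0,1]^d at a fixed grid spacing: the count is
# (1/spacing)^d, i.e. exponential in the dimension d.

def points_needed(d, spacing=0.1):
    per_axis = int(round(1 / spacing))
    return per_axis ** d

for d in (1, 2, 3, 10):
    print(d, points_needed(d))   # 10, 100, 1000, 10_000_000_000
```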

Neural Networks

Neural Networks
cell body, neuron, axon, synapses, spike trains
computational unit
Artificial Neural networks

x1, x2, x3 -> Θ(unit: Perceptron) -> y
Σi xi・wi (activation) >= Θ -> firing
output: yes -> y = 1, no -> y = 0

e.g.
x1 = 1, x2 = 0, x3 = -1.5, w1 = 1/2, w2 = 3/5, w3 = 1
1*(1/2) + 0*(3/5) + (-1.5)*1 = -1
Θ = 0
output should be y = 0 (activation -1 < Θ)
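The worked example above can be run directly; a minimal perceptron-unit sketch:

```python
# Perceptron unit: fire (y = 1) when the activation Σ x_i w_i reaches the
# threshold Θ, otherwise output y = 0.

def perceptron(x, w, theta):
    activation = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if activation >= theta else 0

x = [1, 0, -1.5]
w = [1/2, 3/5, 1]
print(perceptron(x, w, theta=0))   # activation = -1 < 0, so y = 0
```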

How powerful is a perceptron unit?

Regression & Classification

Regression
supervised learning: take examples of inputs and outputs. Now, given a new input, predict its output.

Mapping continuous inputs to outputs.
discrete, continuous

child height, parent height
2/3 < 1 -> regression to the mean
Reinforcement learning

Regression in machine learning
Finding the best constant function: f(x) = c
E(c) = Σi (yi - c)^2  <- LOSS, ERROR

Order of polynomial:
k = 0: constant
k = 1: line
k = 2: parabola
f(x) = c0 + c1x + c2x^2 + ... + ckx^k  (polynomial regression)

Errors
Training data has errors: not modeling f, but f + ε
Where do errors come from? sensor error

Cross validation
Fundamental assumption: use a model that is complex enough to fit the data without causing problems on the test set
-training error
-cross validation error

-> scalar input, continuous
-> vector input, continuous
include more input features (size, distance from zoo)
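The "best constant function" claim above can be checked numerically: under squared error E(c) = Σ (yi - c)^2, the minimizing constant is the mean of the yi. The data below is made up.

```python
# Best constant regression: the mean minimizes the sum of squared errors.

def sq_error(ys, c):
    return sum((y - c) ** 2 for y in ys)

ys = [2.0, 4.0, 9.0]
c_best = sum(ys) / len(ys)                       # mean of the outputs
assert all(sq_error(ys, c_best) <= sq_error(ys, c)
           for c in [0.0, 3.0, 5.5, 10.0])       # beats other constants
print(c_best)   # 5.0
```

Higher-order polynomial regression generalizes this: each extra power of x adds one coefficient to fit.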

predict credit score
job? age? assets?
-> distance, vector or scalar

ID3

LOOP:
-A <- best attribute
-Assign A as decision attribute for Node
-For each value of A, create a descendant of Node
-Sort training examples to leaves
-If examples perfectly classified, stop; else iterate over leaves

gain(S,A) = entropy(S) - Σv |Sv| / |S| entropy(Sv)
entropy(S) = -Σv P(v) logP(v)

ID3: Bias (INDUCTIVE BIAS)
Restriction bias: H
Preference bias:
-good splits at top
-correct over incorrect
-shorter trees

Decision trees: other considerations
- continuous attributes? e.g. age, weight, distance
When do we stop?
-> everything classified correctly!
-> no more attributes!
-> no overfitting
Regression
splitting? variance?
output average, local linear fit
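The entropy and gain formulas from the ID3 loop can be computed directly; a minimal sketch with a made-up binary-attribute sample:

```python
import math
from collections import Counter

# ID3 quantities: entropy(S) = -Σ p(v) log2 p(v), and
# gain(S, A) = entropy(S) - Σ_v |S_v|/|S| * entropy(S_v).

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr):
    """examples: list of (attribute_dict, label) pairs."""
    labels = [y for _, y in examples]
    by_value = {}
    for x, y in examples:
        by_value.setdefault(x[attr], []).append(y)   # partition S into S_v
    weighted = sum(len(sub) / len(examples) * entropy(sub)
                   for sub in by_value.values())
    return entropy(labels) - weighted

S = [({'raining': 'yes'}, '+'), ({'raining': 'yes'}, '+'),
     ({'raining': 'no'}, '-'), ({'raining': 'no'}, '-')]
print(gain(S, 'raining'))   # 1.0: this attribute splits the data perfectly
```

"Best attribute" in the loop means exactly this: the attribute with the highest gain.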

Decision Trees

Supervised Learning
classification: true or false
regression

Credit history: lend money? -> classification: binary task

classification learning
– instances (inputs)
– concept: function -> T, F
– target concept -> the actual answer
– hypothesis class -> set of all functions considered
– sample (training set)
– candidate: a concept that might be the target concept
– testing set

Decision Tree
type: italian, french, thai
atmosphere: fancy, hole-in-the-wall, casual
occupied
hot date?
cost, hungry, raining

node: attribute, edges: values

representation vs algorithm

Decision Trees: Learning
1. Pick best attribute
Best ~ splits the data
2. Ask question
3. Follow the answer path
4. Go to 1
until you get an answer

Decision trees: Expressiveness
Boolean
A and B, A or B, A xor B

n-OR: any, n-XOR: parity (odd)

XOR is hard. n attributes (boolean): O(n!); how many trees? Output is boolean.

Truth table
a1, a2, a3, … an | output
t, t, t, … t | t
f, t, t, … t | t
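The "how many trees?" question has a concrete count (a standard fact, my addition): the truth table has 2^n rows, and each row's output can be T or F independently, so there are 2^(2^n) distinct boolean functions over n attributes.

```python
# Number of boolean functions of n boolean attributes: one T/F choice
# per truth-table row, and 2^n rows, giving 2^(2^n) functions.

def num_boolean_functions(n):
    rows = 2 ** n          # truth-table rows
    return 2 ** rows       # one output choice per row

for n in (1, 2, 3):
    print(n, num_boolean_functions(n))   # 4, 16, 256
```

The doubly-exponential count is why the hypothesis space of decision trees is so large.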

Philosophy of Machine Learning

Theoretical, Practical
What is machine learning? Not just proving theorems:
computational statistics, and the
broader notion of building computational artifacts that learn over time based on experience.

-supervised learning
-unsupervised learning
-reinforcement learning

1:1, 2:4, 3:9, 4:16, 5:25, 6:36
output <- input^2
induction and deduction
supervised learning = approximation
unsupervised learning = description
pixels -> function approximator -> labels
Reinforcement learning

Optimization
supervised learning: labels data well
reinforcement learning: behavior scores well
unsupervised learning: clusters data well

Localization Tools

project preparation tools
project execution tools
quality assurance tools

translation management, terminology management, translation memory management

In-Context Exact Match (ICE): 100% with context
Exact Match: 100%
Fuzzy Match: 99% or lower
No Match: all new words
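The match categories above can be sketched with a rough similarity score. This uses the stdlib difflib ratio as a stand-in for a real translation-memory scoring algorithm (real TM tools use their own scoring; the segments below are made up):

```python
import difflib

# Rough TM matching sketch: score a new segment against a TM entry, then
# bucket the score into the match categories from the notes.

def match_score(segment, tm_entry):
    """Similarity as a 0-100 percentage (difflib stand-in for TM scoring)."""
    return round(100 * difflib.SequenceMatcher(None, segment, tm_entry).ratio())

def match_category(score, in_context=False):
    if score == 100:
        return "ICE" if in_context else "Exact"
    return "Fuzzy" if score > 0 else "No Match"

s = match_score("Save the file", "Save the files")
print(s, match_category(s))   # high fuzzy score, category "Fuzzy"
```

An ICE match needs more than string equality: the surrounding segments must also match, which is why it ranks above a plain Exact match.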