-consider each object a cluster (n objects)
-define intercluster distance as the distance between the closest two points in the two clusters
-merge two closest clusters
-repeat n-k times to make n clusters
Practical Matters
-mimic does well with structure
-representing PΘ Vo
-local optima
-probability theory
-time complexity
Clustering & EM
unsupervised learning
– supervised learning use labeled training data to generalize labels to new instances
– unsupervised learning make sense out of unlabeled data
Basic clustering problem
given: set of objects X
inter-object distances D(x,y) = D(y,x) x, yeX
output: partition pd(x) = pd(y)
if x &y in same cluster
Finding Dependency Trees
Dkl(P||P^π)= Σp[lgp – lg^p]
= -h(p)+Σh(x1π(xi))
Jπ=Σh(xi|π(x))
J’ = ΣI(xi:π(xi))
Generate samples from p(x)Θ
set Θt+1 to rfth percentile
retain only those samples s.b. f(x)>= Θt
estimate p(x)Θbth
repeat
Genetic Algorithms
Population of individuals
Mutation – local search
Cross over – population holds information
Generations – iterations of improvement
GA Skeleton
P0 = initial population of size K
repeat until converged
compute fitness of all exP+
select “most fit” individuals (top half, weighted prab)
pair up individuals, replacing “least fit” individuals
vra crossover/mutiation
Randomized Optimization: mimic
– only points, no structure
– unclear probability distribution
minic: estimating distinations
p(x) = p(x1|x2…xn)p(x2|x3…xn)…p(xn)
x=[x1, x2, x3…xn]
Optimize Me
x = {1,…, 100}
f(x) = (x mod 6)2 mod 7 – sin(x)
x = 11
X = R
f(x) = -x4 + 1000×3 – 20×2 + 4x -6
optimization approaches
– generate & test: small input space, complex function
– calculus: function has derivative
– Newtons method
what if assumptions don’t hold?
big input space, complex function, no derivative (or hard to find)
possibly many local oftimg
Randomize optimization
Hill climbing
Guess xe x
repeat
let n = argmax f(n)
Un N(x)
if f(x) > f(x):x = n
else: stop
Thinking of a 5-bit sequence
f(x) = #correct bit
x: 5-bit sequences
un-supervised learning
PZTTMNIIAOOI
(Randomized optimization)
optimization
input space x
objective function(fitness function) f:x -> r
goal: find xeX s.t. f(x*) = max f(x)
Find the best:
-factory, chemical, process control
-route finding
-root finding
-neural networks
x is weights
minimize error
-decision trees
Alice and Bob
from urllib import urlopen, urlencode
import json
base = "http://hgehoge.com"
Alice = base + "/alice"
Bob = base + "bob"
def check_output(output):
data = output.read()
if output.getcode() != 200:
raise Exception(data)
data = json.loads(data)
return data
def get_pg():
output = urlopen(base)
data = check_output(output)
return data
def initialize(person):
data = {'type':'init'}
output = urlopen(person, urlencode(data))
data = check_output(output)
return data
def send_key(person, token, public, name):
data = {'type':'key',
'token':token,
'public': public,
'name':name}
output = urlopen(person, urlencode(data))
data = check_output(output)
return data
daf recieve_msg(person, token):
data = {'type':'msg',
'token':token}
output = urlopen(person, urlencode(data))
data = check_output(output)
return data
def send_msg(person, token, cipher, iv):
data = {'type':'msg',
'token':token,
'message':cipher,
'iv':iv}
output = urlopen(person, urlencode(data))
data = check_output(output)
return data
a≡b (mod c)
a≡b (mod c) は a-bがcで割り切れること。
10≡0 (mod 5)
14≡5 (mod 3)
13≡-3 (mod 2)
from Crypto.Util.number import inverse def message_value(message): message = bits_to_string(pad_to_block(convert_to_bits(message), 8)) return cutchoose.bill_value(message) def remove_nonce(bill, nonce): big_nonce = pow(nonce, cutchoose.BANK_PUBLIC_KEY[0], cutchoose.BANK_PUBLIC_KEY[1]) nonce_inverse = inverse(big_nonce, cutchoose.BANK_PUBLIC_KEY[1]) message = (bill * nonce_inverse) % cutchoose.BANK_PUBLIC_KEY[1] return message def _verify(bills, nonces, value): for bill, nonce in zip(bills, nonces): message = remove_nonce(Bill, nonce) test_value = message_value(message) if test_value != value: return False
Cut and choose
import cutchoose from unit6_util import string_to_bits, bits_to_int, pad_to_block, bits_to_string, convert_to_bit def _verify(bills, nonces, value): return False cutchoose.verify = _verify def test(): bills = cutchoose.generate_bills(50) i = cutchoose.pick_and_sign_bills(bills) nonces = cutchoose.send_nonces(i) signed = cutchoose.verify_bills_and_return_signed(nonces, 50) assert signed is not None assert bills[i] == pow(signed, cutchoose.BANK_PUBLIC_KEY[0], cutchoose.BANK_PUBLIC_KEY[1]) bills = cutchoose.cheat_generate_bills(50, 100) i = cutchoose.pick_and_sign_bills(bills) nonces = cutchoose.send_nonces(i) signed = cutchoose.verify_bills_and_return_signed(nonces, 50) assert signed is None
beast attack
import urllib
import json
import base64
BLOCK_SIZE = 128
site = "http://cs387.udacity-extras.appspot.com/beast"
def unencode_json(txt):
d = json.loads(txt)
return dict((str(k),
base64.urlsafe_b64decode(str(v)))
for k, v in d.iteritems())
def _send(attack=None, token=None):
data = {}
if attack is not None:
data["attack"] = base64.urlsafe_b64encode(attack)
if token is not None:
data["token"] = base64.urlsafe_b64encode(token)
json = urllib.urlopen(site, urllib.urlencode(data)).read()
json = unencode_json(json)
return json
_TOKEN = None
def send(attack=None):
global _TOKEN
json = _send(attack, _TOKEN)
_TOKEN = json["token"]
return json["message"]