negative correlation

Correlation measures how strongly two variables are related to each other.
It is used constantly in statistics.
x= 3, 4, 5
y= 8, 5, 2

r = Σi[(xi - x̄)(yi - ȳ)] / √( Σi(xi - x̄)² · Σi(yi - ȳ)² )
x̄ = 1/N Σi xi
ȳ = 1/N Σi yi

x - x̄ = -1, 0, 1
y - ȳ = 3, 0, -3

r = -6 / √(2 · 18)
  = -1

x= 3, 4, 5
y= 8, 5, 8
r= 0

x= 3, 4, 5    x̄ = 4
y= 8, 3, 7    ȳ = 6

x - x̄ = -1, 0, 1
y - ȳ = 2, -3, 1

r = -1 / √(2 · 14)
  = -0.189
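
The three worked examples above can be checked with a short script. This is only a sketch in Python 3: the function name pearson_r is made up for illustration, and the first data set uses y = 8, 5, 2, the values whose deviations from the mean are 3, 0, -3.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r of two equal-length samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [3, 4, 5]
print(round(pearson_r(x, [8, 5, 2]), 3))  # perfectly linear, decreasing
print(round(pearson_r(x, [8, 5, 8]), 3))  # no linear trend
print(round(pearson_r(x, [8, 3, 7]), 3))  # weak negative trend
```

The function is undefined when either sample has zero variance, because the denominator is then 0.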

Correlation

r = 1: perfect positive linear relationship
r = 0: no linear relationship within the data
r = -1: perfect negative linear relationship

y = bx + a
b = 4
a = -3
y = 4x - 3
r is positive, because the slope b is positive
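
The link between the sign of the slope and the sign of r can be sketched in code. The least-squares slope b and r share the numerator Σ(xi - x̄)(yi - ȳ), and both denominators are positive, so the two always have the same sign. The helper name fit_line_and_r and the sample points are assumptions made for this illustration.

```python
from math import sqrt

def fit_line_and_r(xs, ys):
    """Return (a, b, r) for the least-squares line y = b*x + a."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    r = sxy / sqrt(sxx * syy)
    return a, b, r

xs = [3, 4, 5]
ys = [4 * x - 3 for x in xs]   # points lying exactly on y = 4x - 3
a, b, r = fit_line_and_r(xs, ys)
print(a, b, r)
```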

correlation coefficient
its value always lies between -1 and 1

r = Σi[(xi - x̄)(yi - ȳ)] / √( Σi(xi - x̄)² · Σi(yi - ȳ)² )
x̄ = 1/N Σi xi
ȳ = 1/N Σi yi

For example:
x= 3, 4, 5
y= 7, 8, 9

r = 2 / √(2 · 2) = 1
We know r always lies between -1 and 1.

x= 3, 4, 5
y= 2, 5, 8
x - x̄ = -1, 0, 1
y - ȳ = -3, 0, 3

r = 6 / √(2 · 18)
  = 1

Hypothesis Testing

sample -> statistic -> information -> statistic -> decision

Claim: weight loss, 90% guaranteed
Sample: Yes 11, No 4 (n = 15)
H0: p = 0.9
H1: p < 0.9
critical regions
Null hypothesis

[python]
from math import sqrt

def mean(l):
    return float(sum(l))/len(l)

def var(l):
    m = mean(l)
    return sum([(x-m)**2 for x in l])/len(l)

def factor(l):
    # standard score for a 95% confidence interval
    return 1.96

def conf(l):
    # half-width of the confidence interval for the mean
    return factor(l) * sqrt(var(l)/len(l))

def test(l, h):
    # True if the hypothesized mean h lies within the confidence interval
    m = mean(l)
    c = conf(l)
    return abs(h-m) <= c

l = [199, 200, 201, 202, 203, 204]
print(mean(l))
print(conf(l))
[/python]

95% confidence
candidate A 55%
candidate B 45%
1.96 √(p(1-p)/n) = 1.96 √(0.55 · 0.45 / 100) = 0.0975, i.e. 9.75 percentage points
Candidate A: 55% +/- 9.75%
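
Returning to the weight-loss claim (11 successes out of 15, H0: p = 0.9, H1: p < 0.9), one way to carry out the test is an exact binomial tail probability. The sketch below is written for Python 3.8 or later (it uses math.comb). The helper name binom_cdf and the 5% significance level are assumptions, not part of the notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, successes, p0 = 15, 11, 0.9
p_value = binom_cdf(successes, n, p0)   # left tail, because H1 is p < 0.9
alpha = 0.05                            # assumed significance level
print(round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

With these numbers the left-tail probability comes out near 0.056, just above 0.05, so the decision is sensitive to the chosen level.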

Estimation probability

Confidence Intervals

Party A polls at 60% +/- 3%, so the confidence interval is 57% to 63%.
Most often a confidence interval is quoted at the 95% level.

If we increase the sample size N, the confidence interval shrinks.

P=0.5, μ=0.5, σ^2=0.25
n     mean(ΣXi)   Var(ΣXi)   Var(1/n ΣXi)   std dev   CI
1     0.5         0.25       0.25           0.5       0.98
2     1           0.5        0.125          0.35      0.69
10    5           2.5        0.025          0.16      0.31
1.96 is the "magic number": CI = 1.96 × std dev for 95% confidence
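
The CI column can be reproduced with a few lines of Python 3. This is only a sketch: the names Z_95 and ci_half_width are made up here, and the last line reuses the earlier poll figures (55% support, n = 100).

```python
from math import sqrt

Z_95 = 1.96  # standard score for a two-sided 95% interval

def ci_half_width(p, n, z=Z_95):
    """Half-width z * sqrt(p * (1 - p) / n) of the interval for a proportion."""
    return z * sqrt(p * (1 - p) / n)

# Rows of the table: P = 0.5 with n = 1, 2, 10
for n in (1, 2, 10):
    print(n, round(ci_half_width(0.5, n), 2))

# Poll example: 55% support in a sample of 100, in percentage points
print(round(100 * ci_half_width(0.55, 100), 2))
```

Because n sits under the square root, halving the interval takes four times as many samples.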

π 3.14
e 2.718

calculate mean

# remove outliers
# extract data between lower and upper quartile

# fit Gaussian using MLE

# compute x that corresponds to standard score z
return x

import random
from math import sqrt

def mean(data):
	return sum(data)/len(data)

def variance(data):
	mu=mean(data)
	return sum([(x-mu)**2 for x in data])/len(data)

def stddev(data):
	return sqrt(variance(data))

weight=[80.,85,200,85,69,65,68,66,85,72,85,82,65,105,75,80,
    70,74,72,70,80,60,80,75,80,78,63,88.65,90,89,91,1.00E+22,
    75,75,90,80,75,-1.00E+22,-1.00E+22,-1.00E+22,86.54,67,70,92,70,76,81,93,
    70,85,75,76,79,89,80,73.6,80,80,120,80,70,110,65,80,
    250,80,85,81,80,85,80,90,85,85,82,83,80,160,75,75,
    80,85,90,80,89,70,90,100,70,80,77,95,120,250,60]

print(mean(weight))

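def calculate_weight(data, z):
	# remove outliers: keep only the data between the lower and upper quartile
	data = sorted(data)  # sort a copy so the caller's list is not modified
	lowerq = (len(data)-3)//4  # integer division so the index stays an int
	upperq = lowerq * 3 + 3
	newdata = [data[i] for i in range(lowerq, upperq)]

	# fit Gaussian using MLE
	mu = mean(newdata)
	sigma = stddev(newdata)

	# compute x that corresponds to standard score z
	x = mu + z * sigma
	return x

print(calculate_weight(weight, -2.))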

central limit theorem

fair coin: Xi in (0,1), P(ΣXi = k) = n! / ((n-k)! k!) · (1/2)^n
The coefficients n! / ((n-k)! k!) are the entries of Pascal's triangle.
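
The connection to Pascal's triangle can be made concrete with a small sketch in Python 3. Row n of the triangle holds the counts of ways to get k heads in n flips, and dividing by 2^n turns the counts into probabilities for a fair coin. The function names are made up for illustration.

```python
def pascal_row(n):
    """Row n of Pascal's triangle: the binomial coefficients C(n, 0..n)."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
    return row

def coin_distribution(n):
    """P(k heads in n fair flips) for k = 0..n."""
    return [c / 2 ** n for c in pascal_row(n)]

print(pascal_row(4))
print(coin_distribution(4))
```

As n grows, the shape of this distribution approaches the bell curve, which is what the central limit theorem predicts.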

flip a coin 1000 times
mean
standard deviation

import random
from math import sqrt

def mean(data):
	return float(sum(data))/len(data)

def variance(data):
	mu=mean(data)
	return sum([(float(x)-mu)**2 for x in data])/len(data)

def stddev(data):
	return sqrt(variance(data))

def flip(N):
    return [random.random() > 0.5 for x in range(N)]

N=1000
f=flip(N)

print(mean(f))
print(stddev(f))
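
To see the central limit theorem at work, the 1000-flip experiment can be repeated many times and the spread of the resulting means inspected. The sketch below is Python 3; the seed, the number of trials, and the helper name are arbitrary choices for this illustration. The means should cluster around 0.5 with a standard deviation close to 0.5 / √N.

```python
import random
from math import sqrt

def sample_mean_of_flips(n, rng):
    """Mean of n fair coin flips (1 = heads, 0 = tails)."""
    return sum(rng.random() > 0.5 for _ in range(n)) / n

rng = random.Random(0)          # fixed seed, chosen arbitrarily
N = 1000                        # flips per experiment
TRIALS = 1000                   # number of repeated experiments (assumed)
means = [sample_mean_of_flips(N, rng) for _ in range(TRIALS)]

mu = sum(means) / TRIALS
sigma = sqrt(sum((m - mu) ** 2 for m in means) / TRIALS)

print(round(mu, 3))                 # should be near 0.5
print(round(sigma, 4))              # should be near 0.5 / sqrt(N)
print(round(0.5 / sqrt(N), 4))      # theoretical value
```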