## negative correlation

A measure of how strongly two variables are related to each other. Correlation is used all the time in statistics.
x= 3, 4, 5
y= 8, 5, 2

r = Σi[(xi - x̄)(yi - ȳ)] / √(Σi(xi - x̄)² · Σi(yi - ȳ)²)
x̄ = 1/N Σ xi
ȳ = 1/N Σ yi

x - x̄: -1, 0, 1
y - ȳ: 3, 0, -3

r = -6 / √(2 · 18) = -1

x= 3, 4, 5
y= 8, 5, 8
r= 0

x= 3, 4, 5   x̄ = 4
y= 8, 3, 7   ȳ = 6

x - x̄: -1, 0, 1
y - ȳ: 2, -3, 1

r = -1 / √(2 · 14) ≈ -0.189

## Correlation

r = 1: perfect positive linear relationship
r = 0: no linear relationship in the data
r = -1: perfect negative linear relationship

For a line y = bx + a with b = 4 and a = -3:
y = 4x - 3
The slope b is positive, so r is positive.

correlation coefficient:
a value between -1 and 1

r = Σi[(xi - x̄)(yi - ȳ)] / √(Σi(xi - x̄)² · Σi(yi - ȳ)²)
x̄ = 1/N Σ xi
ȳ = 1/N Σ yi

For example, if
x = 3, 4, 5
y = 7, 8, 9

then r = 2 / √(2 · 2) = 1
In general, r always lies between -1 and 1.

x= 3, 4, 5
y= 2, 5, 8
x - x̄: -1, 0, 1
y - ȳ: -3, 0, 3

r = 6 / √(2 · 18) = 1
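The hand calculations above can be checked with a small helper (the function name `correlation` is just a choice for this sketch):

```python
from math import sqrt

def correlation(x, y):
    # Pearson correlation coefficient r
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sqrt(sum((xi - mx)**2 for xi in x) * sum((yi - my)**2 for yi in y))
    return num / den

print(correlation([3, 4, 5], [2, 5, 8]))   # 1.0
print(correlation([3, 4, 5], [8, 3, 7]))   # ≈ -0.189
```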

## Hypothesis Testing

sample -> statistic -> information -> statistic -> decision

weight loss 90% guaranteed
Yes 11, No 4
H0: p = 0.9
H1: p < 0.9
critical regions, null hypothesis

```
from math import sqrt

def mean(l):
    return float(sum(l)) / len(l)

def var(l):
    m = mean(l)
    return sum([(x - m)**2 for x in l]) / len(l)

def factor(l):
    # z value for a 95% confidence interval
    return 1.96

def conf(l):
    return factor(l) * sqrt(var(l) / len(l))

def test(l, h):
    # is the hypothesized mean h inside the 95% confidence interval?
    m = mean(l)
    c = conf(l)
    return abs(h - m) <= c

l = [199, 200, 201, 202, 203, 204]
print(mean(l))
print(conf(l))
```

95% confidence, polling example:
candidate A 55, candidate B 45 (n = 100)
1.96 √(p(1-p)/n) = 1.96 √(0.55 · 0.45 / 100) ≈ 0.0975
Candidate A: 55% ± 9.75%
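The margin arithmetic for the 55/45 poll can be checked directly (p and n taken from that example):

```python
from math import sqrt

# 55 of 100 respondents favor candidate A
p, n = 0.55, 100
margin = 1.96 * sqrt(p * (1 - p) / n)
print(margin)  # ≈ 0.0975, i.e. 9.75 percentage points
```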

## magic number

mean ± 1.96 σ/√N covers the true mean 95% of the time (normal sampling distribution)
μ = 1/N Σ Xi

P(μ - 1.96 σ/√N ≤ true mean ≤ μ + 1.96 σ/√N) ≈ 0.95
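Where the magic number 1.96 comes from can be checked with the standard normal: for Z ~ N(0, 1), P(|Z| < z) = erf(z/√2), and z = 1.96 gives 95% coverage.

```python
from math import erf, sqrt

# probability that a standard normal falls within ±1.96
z = 1.96
coverage = erf(z / sqrt(2))
print(coverage)  # ≈ 0.95
```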

## Estimation probability

Confidence Intervals

60% for party A ± 3% -> confidence interval 57% to 63%.
Most often a 95% confidence interval is used.

If we increase the sample size N, the confidence interval shrinks.

p = 0.5, μ = 0.5, σ² = 0.25

| n  | mean(ΣXi) | Var(ΣXi) | Var(1/n ΣXi) | std dev | CI (±1.96σ) |
|----|-----------|----------|--------------|---------|-------------|
| 1  | 0.5       | 0.25     | 0.25         | 0.5     | 0.98        |
| 2  | 1         | 0.5      | 0.125        | 0.35    | 0.69        |
| 10 | 5         | 2.5      | 0.025        | 0.16    | 0.31        |

1.96 is the "magic number" (z for 95% coverage)
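The rows of the table follow from Var(ΣXi) = n·p(1-p) and Var(1/n ΣXi) = p(1-p)/n; a short loop reproduces them:

```python
from math import sqrt

p = 0.5
var_x = p * (1 - p)  # 0.25 for a single fair-coin flip

for n in [1, 2, 10]:
    mean_sum = n * p        # mean(ΣXi)
    var_sum = n * var_x     # Var(ΣXi)
    var_mean = var_x / n    # Var(1/n ΣXi)
    sd = sqrt(var_mean)
    ci = 1.96 * sd
    print(n, mean_sum, var_sum, round(var_mean, 3), round(sd, 2), round(ci, 2))
```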

π 3.14
e 2.718

## calculate mean

Steps:
- remove outliers: extract the data between the lower and upper quartile
- fit a Gaussian using MLE
- compute the x that corresponds to standard score z, and return x

```
from math import sqrt

def mean(data):
    return sum(data) / len(data)

def variance(data):
    mu = mean(data)
    return sum([(x - mu)**2 for x in data]) / len(data)

def stddev(data):
    return sqrt(variance(data))

weight = [80., 85, 200, 85, 69, 65, 68, 66, 85, 72, 85, 82, 65, 105, 75, 80,
          70, 74, 72, 70, 80, 60, 80, 75, 80, 78, 63, 88.65, 90, 89, 91, 1.00E+22,
          75, 75, 90, 80, 75, -1.00E+22, -1.00E+22, -1.00E+22, 86.54, 67, 70, 92, 70, 76, 81, 93,
          70, 85, 75, 76, 79, 89, 80, 73.6, 80, 80, 120, 80, 70, 110, 65, 80,
          250, 80, 85, 81, 80, 85, 80, 90, 85, 85, 82, 83, 80, 160, 75, 75,
          80, 85, 90, 80, 89, 70, 90, 100, 70, 80, 77, 95, 120, 250, 60]

print(mean(weight))

def calculate_weight(data, z):
    # keep only the data between the lower and upper quartile (drops outliers)
    data = sorted(data)
    lowerq = (len(data) - 3) // 4
    upperq = lowerq * 3 + 3
    newdata = [data[i] for i in range(lowerq, upperq)]

    # fit a Gaussian by MLE (sample mean and variance)
    mu = mean(newdata)
    sigma = stddev(newdata)

    # the x that corresponds to standard score z
    x = mu + z * sigma
    return x

print(calculate_weight(weight, -2.))
```

## Normal Distribution

Binomial Distribution
->
Central Limit Theorem
->
Normal Distribution

f(x) = 1/√(2πσ²) · e^(-(x-μ)²/(2σ²))

Manipulating Normals (X ~ N(μ, σ²)):
aX + b ~ N(aμ + b, a²σ²)
X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²) for independent X, Y
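The two rules can be written as tiny helpers that map (mean, variance) to the transformed parameters (the function names are just for this sketch):

```python
def scale_shift(mu, var, a, b):
    # if X ~ N(mu, var), then aX + b ~ N(a*mu + b, a**2 * var)
    return a * mu + b, a**2 * var

def add_independent(mu1, var1, mu2, var2):
    # if X, Y are independent normals, X + Y ~ N(mu1 + mu2, var1 + var2)
    return mu1 + mu2, var1 + var2

print(scale_shift(0, 1, 2, 3))      # (3, 4)
print(add_independent(0, 1, 5, 2))  # (5, 3)
```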

## central limit theorem

coin: Xi ∈ {0, 1}, P(Σi Xi = k) = n! / ((n-k)! · k!) · (1/2)^n
Pascal Triangle

flip a coin 1000 times
mean
standard deviation

```
import random
from math import sqrt

def mean(data):
    return float(sum(data)) / len(data)

def variance(data):
    mu = mean(data)
    return sum([(float(x) - mu)**2 for x in data]) / len(data)

def stddev(data):
    return sqrt(variance(data))

def flip(N):
    # True (1) = heads, False (0) = tails
    return [random.random() > 0.5 for x in range(N)]

N = 1000
f = flip(N)

print(mean(f))
print(stddev(f))
```
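For a fair coin the theoretical values are mean p = 0.5 and per-flip std dev √(p(1-p)) = 0.5; a seeded variant of the simulation (larger N, fixed seed so the run is reproducible) should land close to both:

```python
import random
from math import sqrt

random.seed(0)  # deterministic run

N = 10000
flips = [1 if random.random() > 0.5 else 0 for _ in range(N)]

m = sum(flips) / N
s = sqrt(sum((x - m)**2 for x in flips) / N)

# theory: mean 0.5, std dev 0.5
print(m, s)
```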

## Arrangements

n! / ((n-k)! · k!)

P(heads) = 0.5
flip a coin 5 times
P(#heads = 1) = ?
arrangements: 5! / (4! · 1!) = 5; total outcomes: 2^5 = 32
P = 5/32 = 0.15625

P(heads) = 0.8
flip a coin 3 times: P(#heads = 1) = 3 · 0.8 · (0.2)² = 0.096

flip a coin 5 times: P(#heads = 4) = 5 · (0.8)⁴ · (0.2)¹ = 0.4096
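All three of these calculations are instances of the binomial formula arrangements · p^k · (1-p)^(n-k); a small function (the name `binomial` is just a choice here) verifies them:

```python
from math import factorial

def binomial(n, k, p):
    # P(exactly k heads in n flips), with P(heads) = p
    arrangements = factorial(n) // (factorial(n - k) * factorial(k))
    return arrangements * p**k * (1 - p)**(n - k)

print(binomial(5, 1, 0.5))  # 0.15625
print(binomial(3, 1, 0.8))  # ≈ 0.096
print(binomial(5, 4, 0.8))  # ≈ 0.4096
```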

## compute mean with outliers

outlier: 外れ値

real age -> age entered in the database
a typo during entry creates an outlier

remove the typos from the target data

lower quartile
upper quartile
interquartile range

percentile
data: 20, 21, 22, 24, 211
trim the upper 20%: the outlier 211 is removed
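A minimal sketch of this trimming idea (the helper name `trimmed_mean` and the one-sided cut are assumptions for illustration): drop the top fraction of sorted values, then average what remains.

```python
def trimmed_mean(data, upper_fraction):
    # drop the top upper_fraction of values, then average the rest
    data = sorted(data)
    keep = len(data) - int(len(data) * upper_fraction)
    kept = data[:keep]
    return sum(kept) / len(kept)

ages = [20, 21, 22, 24, 211]
print(trimmed_mean(ages, 0.2))  # 21.75 (the outlier 211 is dropped)
```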