Android Design

Working with density-independent pixels
1 px / 1 dp = 160 dpi / 160 dpi (at mdpi, 1 dp = 1 px)
2 px / 1 dp = 320 dpi / 160 dpi (at xhdpi, 1 dp = 2 px)
※ dpi = dots per inch; px = dp × (dpi / 160)
7-inch Nexus 7: 1280x800 px (≈213 dpi) => 960x600 dp
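The px/dp relationship above can be coded directly. This is a minimal sketch; the helper names and the ≈213 dpi figure (inferred from 1280 px ≈ 960 dp) are assumptions:

```python
# Hypothetical helpers for the dp formula: px = dp * (dpi / 160).
def dp_to_px(dp: float, dpi: float) -> float:
    return dp * dpi / 160

def px_to_dp(px: float, dpi: float) -> float:
    return px * 160 / dpi

# Nexus 7 (~213 dpi): 1280x800 px maps to roughly 960x600 dp.
print(px_to_dp(1280, 213.3), px_to_dp(800, 213.3))
```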

Density buckets


	<!-- State-list drawable; drawable names other than box_default are assumed -->
	<selector xmlns:android="http://schemas.android.com/apk/res/android">
		<item android:state_pressed="true"
			android:drawable="@drawable/box_pressed"/>
		<item android:state_checked="true"
			android:drawable="@drawable/box_checked"/>
		<item android:drawable="@drawable/box_default"/>
	</selector>


content, padding, margins
FrameLayout, LinearLayout, RelativeLayout, GridLayout, ScrollView, ListView, ViewPager



	<!-- Base application theme. -->
	<style name="AppTheme" parent="Theme.AppCompat.Light.DarkActionBar">
		<!-- Customize theme here -->
	</style>

	<style name="MyStyle">
		<item name="android:textColor">#FF255F26</item>
	</style>

	<style name="AnotherStyle">
		<item name="android:textColor">#1D175F</item>
		<item name="android:textStyle">bold</item>
	</style>

Time Series Forecasting

Time series forecasting can predict values for business situations such as:
– Monthly beach bike rentals
– A stock’s daily closing value
– Annual sheep population

Average Method
The best predictor of what will happen tomorrow is the average of everything that has happened up until now.

Moving Average Method
Predicts using the average of only the most recent k observations, smoothing out short-term fluctuations.

Naive Method
The forecast is simply the last observed value; if there is not enough data to create a predictive model, the Naive method can supply forecasts for the near future.

Seasonal Naive Method
The forecast is the value from the same season one cycle ago; assumes that the magnitude of the seasonal pattern will remain constant.

Exponential Smoothing Model
A weighted average of past observations, with weights decaying exponentially into the past.
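The methods above can each be sketched in a few lines; the `rentals` series below is made-up monthly data, and each function produces a one-step-ahead forecast:

```python
# One-step-ahead sketches of each forecasting method.
def average_forecast(y):
    return sum(y) / len(y)                       # mean of the whole history

def moving_average_forecast(y, window):
    return sum(y[-window:]) / window             # mean of the last `window` obs

def naive_forecast(y):
    return y[-1]                                 # just the last observed value

def seasonal_naive_forecast(y, season):
    return y[-season]                            # the value one season ago

def exp_smoothing_forecast(y, alpha):
    level = y[0]
    for obs in y[1:]:
        # weighted average: recent observations get exponentially more weight
        level = alpha * obs + (1 - alpha) * level
    return level

rentals = [10, 12, 14, 20, 11, 13, 15, 21]       # hypothetical monthly rentals
print(average_forecast(rentals))                 # 14.5
print(naive_forecast(rentals))                   # 21
print(seasonal_naive_forecast(rentals, season=4))  # 11
```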

Bayesian Inference

Representing and reasoning with probabilities
Bayesian Networks

Joint Distribution

Storm, lightning, thunder (joint distribution):
S, L   P(S,L)   P(S,L,Thunder=T), P(S,L,Thunder=F)
T, T   .25      .20, .05
T, F   .40      .04, .36
F, T   .05      .04, .01
F, F   .30      .03, .27
Random day, 2pm, look outside (summer):
Pr(¬storm) = .05 + .30 = 0.35
Pr(lightning|storm) = .25/.65 ≈ .3846
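A sketch of marginalization and conditioning, with the eight joint probabilities over (storm, lightning, thunder) taken from the table:

```python
# Joint distribution keyed by (storm, lightning, thunder).
P = {
    (True,  True,  True):  .20, (True,  True,  False): .05,
    (True,  False, True):  .04, (True,  False, False): .36,
    (False, True,  True):  .04, (False, True,  False): .01,
    (False, False, True):  .03, (False, False, False): .27,
}

def marginal(storm=None, lightning=None, thunder=None):
    # Sum every entry consistent with the variables that were fixed.
    return sum(p for (s, l, t), p in P.items()
               if (storm is None or s == storm)
               and (lightning is None or l == lightning)
               and (thunder is None or t == thunder))

print(marginal(storm=False))                               # Pr(¬storm) ≈ 0.35
print(marginal(storm=True, lightning=True) / marginal(storm=True))
# Pr(lightning|storm) = .25/.65 ≈ 0.3846
```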

X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
P(X=x|Y=y, Z=z) = P(X=x|Z=z)
more compactly written as
P(X|Y,Z) = P(X|Z)

Two uses of a distribution: the probability of a value, and generating values
– simulation of a complex process
– approximate inference

P(x) = Σy P(x,y)
P(x,y) = P(x)·P(y|x)
P(y|x) = P(x|y)·P(y)/P(x)

Bayesian Learning

Learn the best hypothesis given data
+ some domain knowledge

Learn the most probable hypothesis given data
+ domain knowledge

Pr(h|D) = Pr(D|h)*Pr(h) / Pr(D) … Bayes’ rule
Pr(a,b) = Pr(a|b)P(b)
Pr(b,a) = Pr(b|a)P(a)

Bayesian Learning
For each h ∈ H
	calculate Pr(h|D) = Pr(D|h)·Pr(h)/Pr(D)
h_MAP = argmax_h Pr(h|D)
h_ML  = argmax_h Pr(D|h)
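The argmax loop above can be brute-forced on a made-up hypothesis class; here each h is a candidate coin bias, D is a sequence of flips, and the priors are assumptions chosen to make MAP and ML disagree:

```python
# Brute-force MAP/ML over a tiny hypothesis class.
priors = {0.3: 0.7, 0.5: 0.2, 0.8: 0.1}   # assumed Pr(h) for three biases
D = [1, 1, 0, 1]                          # observed flips (1 = heads)

def likelihood(h, data):                  # Pr(D|h) for i.i.d. flips
    p = 1.0
    for x in data:
        p *= h if x == 1 else (1 - h)
    return p

# h_MAP maximizes Pr(D|h)·Pr(h); Pr(D) is the same for every h, so drop it.
h_map = max(priors, key=lambda h: likelihood(h, D) * priors[h])
# h_ML maximizes Pr(D|h) alone (i.e. assumes a uniform prior).
h_ml = max(priors, key=lambda h: likelihood(h, D))
print(h_map, h_ml)                        # the strong prior pulls MAP toward 0.3
```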

VC Dimensions

Infinite Hypothesis Spaces
m ≥ 1/ε·(ln|H| + ln(1/δ))
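The bound can be evaluated directly. A sketch, writing the confidence parameter as delta and assuming a finite hypothesis class and a consistent learner:

```python
import math

# m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice for a consistent
# learner over a finite hypothesis class of size h_size.
def sample_complexity(h_size, eps, delta):
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

print(sample_complexity(h_size=1000, eps=0.1, delta=0.05))
```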

Some hypothesis spaces are infinite:
– linear separators
– artificial neural networks
– decision trees (continuous input)

H: h(x) = x ≥ θ

Track all hypotheses (only non-negative integers need to be tracked); keep the version space

X = ℝ
H = {h(x) = x ∈ [a, b]}
parameterized by a, b ∈ ℝ
VC dimension = 2
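The VC = 2 claim for intervals can be checked by brute force. A sketch, assuming the test points are at least one unit apart so that trying interval endpoints half a unit above and below each point covers every case:

```python
from itertools import product

# A point set is shattered if every labeling is realized by some interval [a, b].
def shatterable(points):
    points = sorted(points)
    candidates = [p + d for p in points for d in (-0.5, 0.5)]
    for labels in product([0, 1], repeat=len(points)):
        if not any(all((a <= x <= b) == bool(y)
                       for x, y in zip(points, labels))
                   for a in candidates for b in candidates):
            return False        # found a labeling no interval can produce
    return True

print(shatterable([1, 2]))      # True: any pair of points can be shattered
print(shatterable([1, 2, 3]))   # False: +,-,+ is impossible, so VC = 2
```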

Computational Learning Theory

Mondrian Composition
Colored Voronoi Diagram

support vector machines SVMs perceptron
Nearest neighbor 1-NN
decision trees

- defining learning problems
- showing specific algorithms work
- showing these problems are fundamentally hard

The theory of computing analyzes how algorithms use resources such as time and space: O(n log n), O(n^2)

Inductive learning
1.probability of successful training
2.number of examples to train on
3.complexity of hypothesis class
4.accuracy to which target concept is approximated
5.manner in which training examples are presented
6.manner in which training examples are selected

computational complexity
– how much computational effort is needed for a learner to converge?
sample complexity – batch
– how many training examples are needed for a learner to create a successful hypothesis?
mistake bounds – online
– how many misclassifications can a learner make over an infinite run?

true hypothesis: c ∈ H
training set: S ⊆ X
candidate hypothesis: h ∈ H
consistent learner: produces h with c(x) = h(x) for all x ∈ S
version space: VS(S) = {h ∈ H : h consistent with S}, the hypotheses consistent with the examples
true error: error_D(h) = Pr_{x~D}[c(x) ≠ h(x)]

Support Vector Machines

y = w^T·x + b
y: the label; w, b: the parameters of the plane
w^T·x + b = 1   (positive margin boundary)
w^T·x + b = 0   (decision boundary)
w^T·x + b = -1  (negative margin boundary)
y ∈ {-1, +1}

w^T·x1 + b = 1
w^T·x2 + b = -1
w^T·(x1 - x2)/||w|| = 2/||w|| … the margin

max 2/||w|| while classifying everything correctly:
yi(w^T·xi + b) ≥ 1
equivalently min 1/2·||w||^2, a quadratic programming problem
W(α) = Σi αi - 1/2 Σi,j αi·αj·yi·yj·xi^T·xj
s.t. αi ≥ 0, Σi αi·yi = 0
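A sketch checking the margin algebra on a toy 2-D problem; the data, w, and b are made up, chosen so the max-margin separator between the two points is known by symmetry:

```python
import math

# Toy data: two points that are both support vectors. By symmetry the
# max-margin separator between (0,0) and (2,2) has w = (1/2, 1/2), b = -1.
X = [(0.0, 0.0), (2.0, 2.0)]
Y = [-1, 1]
w = (0.5, 0.5)
b = -1.0

def activation(w, x, b):                 # w^T x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Every point satisfies the constraint yi(w^T xi + b) >= 1 ...
assert all(y * activation(w, x, b) >= 1 for x, y in zip(X, Y))

# ... and the margin is 2/||w||: here 2*sqrt(2), exactly the distance
# between the two support vectors.
margin = 2 / math.sqrt(sum(wi * wi for wi in w))
print(margin)
```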

SVMs: the linearly separable case
– margins: generalization vs. overfitting
– big is better
– optimization problem for finding max margins: QPs
– support vectors

Ensemble learning boosting 

spam email {+, -}
sample rules: body “manly” +, from spouse -, short +, just URLs +, just image +, “pΘrn” +, “make money easy” +

1. Learn over a subset of the data -> a rule; pick the subset uniformly at random and apply a learner
2. Combine the simple rules into a complex rule -> e.g. the mean

Boosting: focus on the "hardest" examples, combine with a weighted mean
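The two steps, plus the reweighting toward the "hardest" examples, can be sketched as a minimal AdaBoost over threshold stumps. The 1-D data and the stump set are made up; the point of the data is that no single threshold classifies it:

```python
import math

X = [1, 2, 3, 4, 5, 6]
Y = [1, 1, -1, -1, 1, 1]               # "++--++": no single threshold works

def stumps():
    # All 1-D threshold classifiers (both orientations) over the gaps.
    for t in [1.5, 2.5, 3.5, 4.5, 5.5]:
        yield lambda x, t=t: 1 if x <= t else -1
        yield lambda x, t=t: -1 if x <= t else 1

def weighted_error(h, d):
    return sum(w for w, x, y in zip(d, X, Y) if h(x) != y)

def boost_round(d):
    # 1. learn over the (weighted) data: best stump under weights d
    h = min(stumps(), key=lambda h: weighted_error(h, d))
    eps = weighted_error(h, d)
    alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
    # 2. make the "hardest" examples (the mistakes) heavier for the next round
    d = [w * math.exp(-alpha * y * h(x)) for w, x, y in zip(d, X, Y)]
    z = sum(d)
    return h, alpha, [w / z for w in d]

d = [1 / len(X)] * len(X)              # example weights start uniform
ensemble = []
for _ in range(3):
    h, alpha, d = boost_round(d)
    ensemble.append((alpha, h))

# final hypothesis: weighted vote (the "weighted mean") of the weak stumps
H = lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
print([H(x) for x in X])
```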

Error: mismatches
PrD[h(x) ≠ c(x)]

Instance Based Learning

1, 2, … n => f(x)

1, 2, … n
f(x) = lookup(x)

+ remember, fast, simple
– generalization, overfit

distance stands for similarity

Given: training data D = {xi, yi}
	distance metric d(q, x)
	number of neighbors k
	query point q
– NN = {i : d(q, xi) among the k smallest}
– Return:
	– classification: plurality vote
	– regression: mean
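The procedure above as a sketch, assuming Euclidean distance as the metric d and made-up training data:

```python
import math
from collections import Counter

def euclidean(q, x):
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))

def knn(D, q, k, d=euclidean, regression=False):
    # NN = the k training points with the smallest distance to the query q
    nn = sorted(D, key=lambda pair: d(q, pair[0]))[:k]
    ys = [y for _, y in nn]
    if regression:
        return sum(ys) / k                      # mean of the neighbors' labels
    return Counter(ys).most_common(1)[0][0]     # plurality vote

# hypothetical data: points labeled "a" cluster near the origin
D = [((1, 1), "a"), ((1, 2), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn(D, q=(0, 0), k=3))                    # "a" wins the vote 2-1
```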

Preference Bias
+ locality -> near points are similar
+ smoothness -> averaging
+ all features matter equally

Curse of dimensionality
As the number of features or dimensions grows, the amount of data we need to generalize accurately grows exponentially

Neural Networks

Biological: cell body, neuron, axon, synapses, spike trains
The neuron as a computational unit
Artificial neural networks

x1, x2, x3 -> θ (unit: perceptron) -> y
Σi xi·wi (activation) ≥ θ → firing
output: yes → y = 1, no → y = 0

x1 = 1, x2 = 0, x3 = -1.5, w1 = 1/2, w2 = 3/5, w3 = 1
1·(1/2) + 0·(3/5) + (-1.5)·1 = -1
θ = 0
-1 < 0, so the output is y = 0
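The worked example as code; the `perceptron` helper is hypothetical:

```python
# Hypothetical helper for the unit above: fire (y = 1) iff sum_i xi*wi >= theta.
def perceptron(xs, ws, theta):
    activation = sum(x * w for x, w in zip(xs, ws))
    return 1 if activation >= theta else 0

# activation = 1*(1/2) + 0*(3/5) + (-1.5)*1 = -1, below theta = 0: no firing
print(perceptron([1, 0, -1.5], [0.5, 0.6, 1.0], theta=0))   # 0
```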

How powerful is a perceptron unit?