Add a remote repository

[vagrant@localhost git]$ git remote
fatal: Not a git repository (or any of the parent directories): .git
[vagrant@localhost git]$ git --version
git version 2.2.1

create a new directory with the name my-travel-plans
use git init to turn the my-travel-plans directory into a Git repository
create a README.md file
create index.html
create app.css
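
The steps above can be sketched as the following shell session. This is a minimal sketch, not the only way to do it: the files are created empty with touch, and placing app.css inside a css directory is an assumption made here to match the css/app.css path referenced by index.html.

```shell
# Create the project directory and move into it.
mkdir -p my-travel-plans
cd my-travel-plans

# Turn the directory into a Git repository (this creates the hidden .git directory).
git init -q

# Create the empty project files. Putting app.css under css/ is an assumption
# made to match the stylesheet path used in index.html.
mkdir -p css
touch README.md index.html css/app.css

# The new files are listed as untracked until they are added and committed.
git status --short
```

Once the repository exists, running git remote inside my-travel-plans no longer fails with the "Not a git repository" error shown at the top of these notes.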

README.md

# Travel Destinations
A simple app to keep track of destinations I'd like to visit.

index.html

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Travels</title>
  <meta name="description" content="">
  <link rel="stylesheet" href="css/app.css">
</head>
<body>

  <div class="container">
    <div class="destination-container">
      <div class="destination" id="florida">
        <h2>Florida</h2>
      </div>

      <div class="destination" id="paris">
        <h2>Paris</h2>
      </div>
    </div>
  </div>
</body>
</html>

app.css

html {
  box-sizing: border-box;
  height: 100%;
}

*,
*::before,
*::after {
  box-sizing: inherit;
}

body {
  display: flex;
  margin: 0;
  height: 100%;
}

.container {
  margin: auto;
  padding: 1em;
  width: 80%;
}

.destination-container {
  display: flex;
  flex-flow: wrap;
  justify-content: center;
}

.destination {
  background: #03a9f4;
  box-shadow: 0 1px 9px 0 rgba(0, 0, 0, 0.4);
  color: white;
  margin: 0.5em;
  min-height: 200px;
  flex: 0 1 200px;
  display: flex;
  justify-content: center;
  align-items: center;
  text-align: center;
}

h2 {
  margin: 0;
  transform: rotate(-45deg);
  text-shadow: 0 0 5px #01579b;
}

#florida {
  background-color: #03a9f4;
}

#paris {
  background-color: #d32f2f;
}

Git intro

creating repositories with git init and git clone
reviewing repos with git status
using git log and git show to review past commits
staging changes with git add
committing them to the repo with git commit
branching, merging branches together, and resolving merge conflicts
undoing things in Git:
git commit --amend to undo the most recent commit or to change the wording of the commit message
git reset

If you’re comfortable with all of these, then you’ll be good to go.
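
As an illustration of the amend step in the list above, here is a minimal sketch that fixes a misspelled commit message. The scratch directory, the user name and email, and the commit messages are all made up for this example.

```shell
# Work in a throwaway directory so no existing repository is touched.
scratch=$(mktemp -d)
git init -q "$scratch/amend-demo"
cd "$scratch/amend-demo"

# Hypothetical identity, set only for this scratch repository.
git config user.name "Example User"
git config user.email "user@example.com"

# Make a first commit whose message contains a typo.
echo "# Travel Destinations" > README.md
git add README.md
git commit -q -m "Add REAMDE"

# Replace the most recent commit with one that has the corrected message.
git commit -q --amend -m "Add README"

# The history still contains a single commit, now with the new message.
git log --oneline
```

Because amending replaces the commit instead of adding a new one, it is safest on commits that have not been shared with anyone else yet.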

It’s incredibly helpful to make all of your commits on descriptively named topic branches. Branches help isolate unrelated changes from each other.
So when you’re collaborating with other developers, make sure to create a new branch whose name describes the changes it contains.
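
A topic branch along these lines could be created as in the sketch below. The branch name add-paris-destination, the file contents, and the identity settings are assumptions chosen for illustration. It uses git checkout -b because the Git version shown in these notes (2.2.1) predates git switch.

```shell
# Work in a throwaway directory so no existing repository is touched.
scratch=$(mktemp -d)
git init -q "$scratch/branch-demo"
cd "$scratch/branch-demo"

# Hypothetical identity, set only for this scratch repository.
git config user.name "Example User"
git config user.email "user@example.com"

# A first commit on the default branch.
echo "<h2>Florida</h2>" > index.html
git add index.html
git commit -q -m "Add Florida destination"

# Create a descriptively named topic branch and switch to it in one step.
git checkout -q -b add-paris-destination

# Commit the unrelated change on the topic branch only.
echo "<h2>Paris</h2>" >> index.html
git commit -q -a -m "Add Paris destination"

# List the branches; the current one is marked with an asterisk.
git branch
```

The default branch is left untouched by the second commit, which is what keeps unrelated changes isolated from each other.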


git remote
git push
git pull

Git is a distributed version control system, which means there is no single main repository of information. Each developer has a copy of the repository. So you can have a copy of the repository (which includes the published commits and version history), and your friend can have a copy of the same repository. Each copy contains the same information as the others; no one repository is the main one.

The way we interact with and control a remote repository is through the git remote command:

$ git remote
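
To make the command more concrete, here is a minimal sketch that registers a remote and pushes to it. A bare repository on the local disk stands in for a hosted remote; its path, the short name origin, and the identity settings are assumptions for this example, not something these notes prescribe.

```shell
# Work in a throwaway directory so nothing outside it is touched.
scratch=$(mktemp -d)

# A bare repository stands in for a hosted remote (hypothetical local path).
git init -q --bare "$scratch/remote.git"

# A local repository with one commit.
git init -q "$scratch/my-travel-plans"
cd "$scratch/my-travel-plans"
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Travel Destinations" > README.md
git add README.md
git commit -q -m "Add README"

# Before any remote is configured, this prints nothing.
git remote

# Register the remote under the conventional short name "origin".
git remote add origin "$scratch/remote.git"

# List the configured remotes together with their fetch and push URLs.
git remote -v

# Publish the current branch to the remote under the same branch name.
git push -q origin HEAD
```

With a hosted service, the local path would be replaced by the URL that the service provides for the repository.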

Alpha and Jitter

'''(r)
ggplot(aes(x = age, y = friends_initiated), data = pf) +
  geom_point(alpha = 1/10, position = 'jitter')
'''
library(dplyr)  # provides group_by(), summarise(), arrange() and n()
age_groups <- group_by(pf, age)
pf.fc_by_age <- summarise(age_groups,
	friend_count_mean = mean(friend_count),
	friend_count_median = median(friend_count),
	n = n())
pf.fc_by_age <- arrange(pf.fc_by_age, age)
head(pf.fc_by_age)

Explore Variables

Scatterplots

'''(r)
library(ggplot2)
pf <- read.csv('pseudo_facebook.tsv', sep = '\t')

qplot(x = age, y = friend_count, data = pf)
qplot(age, friend_count, data = pf)
'''
'''(r)
qplot(x = age, y = friend_count, data = pf)

ggplot(aes(x = age, y= friend_count), data = pf) + geom_point()

summary(pf$age)
'''
'''(r)
ggplot(aes(x = age, y = friend_count),data = pf)+
	geom_point(alpha = 1/20) + xlim(13, 90)
'''

Histogram of Users’ Birthdays

'''(r)
install.packages('ggplot2')
library(ggplot2)

names(pf)
qplot(x = dob_day, data = pf)
'''
'''(r)
qplot(x = friend_count, data = pf)
'''
'''(r)
qplot(x = friend_count, data = pf, xlim = c(0, 1000))

qplot(x = friend_count, data = pf) +
	scale_x_continuous(limits = c(0, 1000))
'''

R Markdown Documents

'''{r}
# the hash or pound symbol inside the block creates
# a comment. These three lines are not code and cannot be
# run.
x <- 1:10
mean(x)
'''
a <- c(1,2,5.3,6,-2,4) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector
reddit <- read.csv('reddit.csv')

table(reddit$employment)

str(reddit)
levels(reddit$age.range)

library(ggplot2)
qplot(data = reddit, x = age.range)

R

the leading tool
many packages
active community

install.packages("swirl")
library(swirl)
swirl()
> ?mean
> x <- c(0:10, 50)
> x
 [1]  0  1  2  3  4  5  6  7  8  9 10 50
> xm <- mean(x)
> xm
[1] 8.75
> c(xm, mean(x, trim = 0.10))
[1] 8.75 5.50
> subset(statesInfo, state.region == 1)
               X state.abb state.area state.region population income illiteracy life.exp murder
7    Connecticut        CT       5009            1       3100   5348        1.1    72.48    3.1
19         Maine        ME      33215            1       1058   3694        0.7    70.39    2.7
21 Massachusetts        MA       8257            1       5814   4755        1.1    71.83    3.3
29 New Hampshire        NH       9304            1        812   4281        0.7    71.23    3.3
30    New Jersey        NJ       7836            1       7333   5237        1.1    70.93    5.2
32      New York        NY      49576            1      18076   4903        1.4    70.55   10.9
38  Pennsylvania        PA      45333            1      11860   4449        1.0    70.43    6.1
39  Rhode Island        RI       1214            1        931   4558        1.3    71.90    2.4
45       Vermont        VT       9609            1        472   3907        0.6    71.64    5.5
   highSchoolGrad frost  area
7            56.0   139  4862
19           54.7   161 30920
21           58.5   103  7826
29           57.6   174  9027
30           52.5   115  7521
32           52.7    82 47831
38           50.2   126 44966
39           46.4   127  1049
45           57.1   168  9267
Title
========================================================
This is an R Markdown document or RMD. Markdown is a simple formatting syntax for authoring web pages (click the **Help** toolbar button for more details on using R Markdown).

When you click the **Knit HTML** button a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document.

Why learn EDA?

So what’s getting ubiquitous and cheap?
Data.
And what is complementary to data?
Analysis.
-Hal Varian

Netflix Prize Competition
EDA: exploratory data analysis

Netflix Prize Dataset Visualization

Television Size Over the Years

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10000, n_features=10, n_classes=2, n_informative=5)
Xtrain = X[:9000]
Xtest = X[9000:]
ytrain = y[:9000]
ytest = y[9000:]

clf = LogisticRegression()
clf.fit(Xtrain, ytrain)

500 TB of data

Facebook processes more than 500 TB of data daily
https://www.cnet.com/news/facebook-processes-more-than-500-tb-of-data-daily/

One of Facebook’s tools, Presto (mainly used for ad hoc analysis), processes over 1 petabyte of data per day.

Google Trends
Chicken, Music, Movies
https://trends.google.com/trends/explore?date=all&q=chicken,music,movies