
How are Google, Amazon, Facebook, Apple, and Microsoft approaching machine learning? Understanding that is the first step.


Next, let's look at Waymo's software engineering job postings.

Requirements for a Data Scientist. A PhD... seriously?
We’d like you to have:
PhD in statistics, math, or other quantitative area
Progressive statistics background either in academia or industry
Data science and system evaluation experience
Willingness to understand a complex system and its various components
Experience with tools for manipulating big data
Experience with R/Python and statistical libraries

Next, the Deep Learning software engineer.
PhD in Computer Science, Machine Learning, Robotics, similar technical field of study, or equivalent practical experience
Experience in applied Machine Learning including data collection, analysis and feature engineering
Experience in Deep Learning research
Experience with TensorFlow
Experience programming in Python/C++

I see. Computer science is fine, but Machine Learning and Robotics look like must-haves.


So, start with machine learning and its applications.

machine learning annotation

Annotation is critical to AI development and operation.
It is important to create a large amount of accurate annotation data.
The annotation process itself can be analyzed and improved with machine learning.

Annotation sits upstream in the pipeline. If the annotations contain many errors, they can have a fatal effect on every subsequent process, including model training and evaluation (in many cases, the evaluation data is also produced by annotation).

Why is annotation important?
It unifies what is to be read from the data.
Because it sits upstream in the AI pipeline, it has a decisive impact on later processes such as model training and evaluation.
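
One rough way to catch annotation errors before they reach training is to have two annotators label the same items and measure how often they agree. The sketch below computes raw agreement and Cohen's kappa in plain Python. The labels are made-up toy data and the function name `cohen_kappa` is my own, so treat it as an illustration rather than a recommended workflow.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Return (observed agreement, Cohen's kappa) for two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which both annotators chose the same label
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return observed, 1.0  # both annotators used one identical label throughout
    return observed, (observed - expected) / (1 - expected)

# Toy data (invented for illustration): two annotators labelling six images
annotator_1 = ["cat", "cat", "dog", "dog", "cat", "dog"]
annotator_2 = ["cat", "dog", "dog", "dog", "cat", "dog"]
agreement, kappa = cohen_kappa(annotator_1, annotator_2)
print("agreement:", round(agreement, 3), "kappa:", round(kappa, 3))
```

A kappa well below the raw agreement suggests that much of the agreement could be chance, which is a hint to tighten the annotation guidelines.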


difference between deep learning and machine learning

The other day, while talking about a deep learning service, I said "machine learning" and was corrected: "that's deep learning, not machine learning." So, here is the difference between deep learning and machine learning.

Deep learning is a further development of machine learning. The major difference from conventional machine learning is the framework used to analyze the data: a "neural network," modeled loosely on human neurons, which makes the computer's analysis and learning of data more powerful.

Both "machine learning" and "deep learning" are AI techniques, but they differ in how far the improvement of the model is automated. In particular, deep learning has evolved so that the system automatically finds which features to look at when distinguishing the objects being analyzed.
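
To make "neural network" a little more concrete, here is a minimal sketch of a network with two hidden units and one output unit that computes XOR. The weights and thresholds are hand-picked by me to keep the example short. In real deep learning they would be learned from data, which is exactly the automation described above.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0

def tiny_network(x1, x2):
    """Two hidden units feeding one output unit; computes XOR."""
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1 fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2 fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # output fires for "or" but not "and"

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", tiny_network(x1, x2))
```

The hidden units act as intermediate features ("or" and "and") that the output combines. A deep network stacks many such layers and finds the useful features itself.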


Tech event @ Roppongi Hills 18F, Mercari


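1. They watch Apple, Amazon, and Google
2. They read conference papers, turn them into models, and try them out
3. Mercari uses AWS
4. It really is true that knowing machine learning is taken for granted
5. The ML team is about half Japanese, with Indians the next largest group
6. People who can do everything are strong (obviously...)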


Let's try Amazon ML


By the way, what is Amazon Redshift...? So it can store data.

modeling process



Try real-time prediction



AWS machine learning

Conceptual diagram of an Amazon Machine Learning model

What is Amazon Machine Learning
Amazon ML can be used to make predictions for a variety of purposes. For example, you could build a model in Amazon ML that will predict whether a given customer is likely to respond to a marketing offer. Amazon ML creates models from supervised data sets. This means that the model is based on a set of previous observations. This set of observations consists of features or attributes as well as the target outcome. In the marketing offer example, the features might include the age, profession, and gender of the customer. The target outcome (also called the target variable) would be whether that particular customer responded to the marketing offer or not.

The process of creating a model from a set of known observations is called training. Once you have trained a model in Amazon ML, you can then use the model to predict outcomes from a set of attributes that matches the attributes used to train the model. Amazon ML scales so that you can make thousands of predictions concurrently. This is important, as today machine learning is often used to provide predictions in near real-time. In this lab, you will be using a machine learning model to predict which restaurants a customer is likely to favor based on the results of a search query.
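
The train-then-predict cycle described above can be tried locally. The sketch below does not use Amazon ML's own API. It swaps in scikit-learn's `DecisionTreeClassifier` as a stand-in, and the customer rows, the numeric codes for profession and gender, and the labels are all invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Toy observations, invented for illustration.
# Features: [age, profession_code, gender_code]; target: 1 = responded to the offer
features = [
    [25, 0, 0],
    [32, 1, 1],
    [47, 2, 0],
    [51, 1, 1],
    [38, 0, 1],
    [29, 2, 0],
]
responded = [0, 1, 1, 1, 0, 0]

# Training: build a model from the known observations
model = DecisionTreeClassifier(random_state=0)
model.fit(features, responded)

# Prediction: the new row must use the same attributes, in the same order
new_customer = [[45, 1, 0]]
prediction = model.predict(new_customer)[0]
print("predicted response:", prediction)
```

The important point is the shape of the data: every row pairs a set of attributes with a known target outcome, and prediction reuses exactly the same attributes.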

Upload the data set to an S3 bucket.
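
The upload can also be scripted instead of done through the console. This is a minimal sketch that assumes a boto3 S3 client is passed in. The helper name `upload_dataset` is hypothetical, while `upload_file(Filename, Bucket, Key)` is the call the boto3 S3 client provides.

```python
def upload_dataset(s3_client, local_path, bucket, key):
    """Upload a local file to s3://bucket/key and return that URI."""
    # boto3's S3 client exposes upload_file(Filename, Bucket, Key)
    s3_client.upload_file(local_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)
```

With boto3 installed and AWS credentials configured, a call would look something like `upload_dataset(boto3.client("s3"), "restaurants.csv", "my-ml-bucket", "data/restaurants.csv")`, where the file, bucket, and key names are placeholders.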


Select Machine Learning.



Classification: Identifying which category an object belongs to.
Algorithms: SVM, nearest neighbors, random forest, …
Applications: Spam detection, image recognition.
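
A minimal classification sketch with scikit-learn's nearest-neighbors classifier. The points and classes are toy data invented for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points (invented): class 0 near the origin, class 1 near (5, 5)
points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
classes = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(points, classes)

# Each new point is assigned the majority class of its 3 nearest neighbours
predicted = knn.predict([[0.5, 0.5], [5.5, 5.5]])
print(predicted)
```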

Regression: Predicting a continuous-valued attribute associated with an object.
Algorithms: SVR, ridge regression, Lasso, …
Applications: Drug response, stock prices.
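
A minimal regression sketch using ridge regression. The data is a made-up straight line, and the `alpha` value is an arbitrary choice for illustration.

```python
from sklearn.linear_model import Ridge

# Toy data (invented): y is exactly 2 * x + 1
x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

# alpha sets the strength of the L2 penalty; a small value stays close to least squares
reg = Ridge(alpha=0.01)
reg.fit(x, y)

# Predict a continuous value for an input the model has not seen
estimate = reg.predict([[5.0]])[0]
print("prediction for x=5:", round(estimate, 2))
```

Unlike classification, the output is a number on a continuous scale, here close to the 11 that the underlying line would give.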

Clustering: Automatic grouping of similar objects into sets.
Algorithms: k-Means, spectral clustering, mean-shift, …
Applications: Customer segmentation, grouping experiment outcomes.
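
A minimal clustering sketch with k-Means. No labels are given to the algorithm, and the points are invented toy data.

```python
from sklearn.cluster import KMeans

# Toy 2-D points (invented): two visibly separate groups, no labels given
points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

# Cluster ids are arbitrary (0 or 1); only the grouping is meaningful
print(cluster_ids)
```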

Dimensionality reduction: Reducing the number of random variables to consider.
Algorithms: PCA, feature selection, non-negative matrix factorization.
Applications: Visualization, increased efficiency.
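
A minimal dimensionality reduction sketch with PCA. The samples are invented so that three features carry almost the same information as one.

```python
from sklearn.decomposition import PCA

# Toy data (invented): 3 features, but the points lie almost on a straight line
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 4.1, 6.0],
    [3.0, 6.0, 9.1],
    [4.0, 8.1, 12.0],
    [5.0, 10.0, 15.0],
]

pca = PCA(n_components=1)
reduced = pca.fit_transform(samples)  # shape (5, 1): one value per sample

print(reduced.shape)
print("variance kept:", pca.explained_variance_ratio_[0])
```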

Model selection: Comparing, validating and choosing parameters and models.
Modules: grid search, cross validation, metrics.
Goal: Improved accuracy via parameter tuning.
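
A minimal model selection sketch that combines grid search with cross validation. It uses the Iris data set bundled with scikit-learn, and the candidate parameter values are arbitrary choices of mine.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Iris ships with scikit-learn, so nothing is downloaded
X, y = load_iris(return_X_y=True)

# Candidate values are arbitrary choices for illustration
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# 5-fold cross validation for every combination in the grid
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)
```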

Preprocessing: Feature extraction and normalization.
Modules: preprocessing, feature extraction.
Application: Transforming input data such as text for use with machine learning algorithms.
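
A minimal preprocessing sketch showing both halves: feature extraction from text and normalization of a numeric column. The texts and heights are invented toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Feature extraction: turn short texts (invented) into word-count vectors
texts = ["cheap pills buy now", "meeting at noon", "buy cheap now"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)  # sparse matrix, one row per text
print(sorted(vectorizer.vocabulary_))     # the words that became columns

# Normalization: rescale a numeric column to zero mean and unit variance
heights_cm = [[150.0], [160.0], [170.0], [180.0]]
scaled = StandardScaler().fit_transform(heights_cm)
print(scaled.ravel())
```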

So Classification, Regression, and Clustering are, I think, fairly common kinds of modeling.




-Simple and efficient tools for data mining and data analysis
-Accessible to everybody, and reusable in various contexts
-Built on NumPy, SciPy, and matplotlib
-Open source, commercially usable – BSD license

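You don't see the BSD license very often.
BSD license: Drafted by the University of California and adopted by software such as BSD, the software collection developed by the Computer Systems Research Group, a research group at the university's Berkeley campus. It is a license whose conditions for redistribution are an explicit "no warranty" statement and the display of the copyright notice and the license text itself. As long as these conditions are met, object code produced by copying or modifying BSD-licensed source code can be distributed without publishing the source code.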


sklearn import pandas

import pandas as pd
from sklearn import svm, metrics

# XOR truth table: columns 0 and 1 are the inputs, column 2 is the label
xor_input = [
	[0, 0, 0],
	[0, 1, 1],
	[1, 0, 1],
	[1, 1, 0]
]

xor_df = pd.DataFrame(xor_input)
xor_data = xor_df.iloc[:, 0:2]   # input columns (.ix has been removed from pandas)
xor_label = xor_df.iloc[:, 2]    # label column

# Train an SVM classifier, then predict on the same four rows
clf = svm.SVC()
clf.fit(xor_data, xor_label)
pre = clf.predict(xor_data)

# Accuracy measured on the training data itself
ac_score = metrics.accuracy_score(xor_label, pre)
print("Accuracy =", ac_score)

[vagrant@localhost python]$ python3
/home/vagrant/.pyenv/versions/3.5.2/lib/python3.5/importlib/ RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Accuracy = 1.0
