Questioning, Modeling, Validating => Answers!
e.g.
“Can we predict the next time a person will tweet?”
=> time of day
regression estimator, hypothesis test, classification
r(Δt) = time of next tweet, where Δt is the time since the last tweet
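One way to read the model above is as a regression from the previous gap between tweets to the next one. The sketch below is a minimal illustration under that assumption. The helper `fit_line` and the sample gaps are made up for illustration and do not come from the tweet data set.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up gaps between consecutive tweets, in minutes (illustration only).
gaps = [5.0, 12.0, 7.0, 30.0, 9.0, 15.0]

# Pair each gap with the one that follows it: x = previous gap, y = next gap.
prev_gaps = gaps[:-1]
next_gaps = gaps[1:]

slope, intercept = fit_line(prev_gaps, next_gaps)


def r(delta_t):
    """Predicted minutes until the next tweet, given the last gap delta_t."""
    return slope * delta_t + intercept
```

A straight line is only the simplest possible estimator here. Whether it fits real tweet gaps at all is exactly what the validation step would have to check.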
Prepare data for histogram
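import pandas

# Load the tweets; read_json is available at the top level of pandas.
tweetsDF = pandas.read_json("new_gruber_tweets.json")
# .ix was removed from pandas; .loc selects all rows and the named columns.
createdDF = tweetsDF.loc[:, ["created_at"]]
createdTextDF = tweetsDF.loc[:, ["created_at", "text"]]
createdTextVals = createdTextDF.values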
Collect the "created_at" attribute of each tweet in tweetsDF
tweetTimes = []
for i, row in createdDF.iterrows():
    tweetTimes.append(row["created_at"])
tweetTimes.sort()
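The histogram step refers to `timeToNextSeries`, which is not defined in this section. A plausible derivation, sketched here as an assumption rather than the original code, takes the difference between consecutive sorted timestamps. The sample timestamps and the name `timeToNext` are hypothetical.

```python
from datetime import datetime

# Hypothetical timestamps standing in for the sorted tweetTimes list.
tweetTimes = [
    datetime(2013, 1, 1, 9, 0, 0),
    datetime(2013, 1, 1, 9, 5, 0),
    datetime(2013, 1, 1, 9, 35, 0),
    datetime(2013, 1, 1, 11, 35, 0),
]
tweetTimes.sort()

# Gap in seconds between each tweet and the one that follows it.
timeToNext = [
    (later - earlier).total_seconds()
    for earlier, later in zip(tweetTimes, tweetTimes[1:])
]
```

Wrapping the list as `pandas.Series(timeToNext)` would give an object with the `.hist` method used in the histogram step.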
Create initial histogram
timeToNextSeries.hist(bins=30, density=True)