Questioning, Modeling, Validating => Answers!
e.g.
“Can we predict the next time a person will tweet?”
=> time of day
regression estimator, hypothesis test, classification
r(Δt) = time of next tweet, where Δt is the time since the last tweet
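As a rough illustration of the regression idea, the sketch below fits a straight line that maps each gap Δt to the gap that follows it. The gap values are invented, and both the linear form of r and the reading of Δt as "the previous gap" are assumptions; the slides do not say which estimator was used.

```python
import numpy as np

# Invented gaps between consecutive tweets, in minutes (illustration only).
gaps = np.array([5.0, 12.0, 7.0, 30.0, 18.0, 9.0, 22.0])

# Pair each gap with the one that follows it: x = previous gap, y = next gap.
x = gaps[:-1]
y = gaps[1:]

# Least-squares line y ~ slope * x + intercept (an assumed form for r).
slope, intercept = np.polyfit(x, y, 1)


def r(delta_t):
    """Predicted time until the next tweet, given the previous gap delta_t."""
    return slope * delta_t + intercept
```

Calling `r(10.0)` then gives the fitted estimate of the wait after a 10-minute gap; with real data the quality of that estimate is what the validation step would have to check.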
Prepare data for histogram
import pandas

# Load the tweets and keep the columns of interest
tweetsDF = pandas.read_json("new_gruber_tweets.json")
createdDF = tweetsDF.loc[:, ["created_at"]]
createdTextDF = tweetsDF.loc[:, ["created_at", "text"]]
createdTextVals = createdTextDF.values

# Collect the "created_at" attribute of each row in tweetsDF
tweetTimes = []
for i, row in createdDF.iterrows():
    tweetTimes.append(row["created_at"])
tweetTimes.sort()
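The histogram step uses a `timeToNextSeries` that is not defined in the cell above. One way it might be derived from the sorted timestamps is sketched here; the sample timestamps are made up, and measuring the gaps in seconds is an assumption.

```python
import pandas

# Hypothetical stand-in for the sorted tweetTimes list built above.
tweetTimes = [
    pandas.Timestamp("2013-01-01 09:00:00"),
    pandas.Timestamp("2013-01-01 09:05:00"),
    pandas.Timestamp("2013-01-01 09:35:00"),
    pandas.Timestamp("2013-01-01 11:35:00"),
]

# Gap between each tweet and the one before it. The first tweet has no
# predecessor, so its NaT entry is dropped before converting to seconds.
timesSeries = pandas.Series(tweetTimes)
timeToNextSeries = timesSeries.diff().dropna().dt.total_seconds()
```

Because the timestamps are already sorted, every gap is non-negative, which is what a histogram of waiting times expects.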
Create initial histogram
# Normalize the histogram so the bars integrate to 1
timeToNextSeries.hist(bins=30, density=True)
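A normalized histogram scales the bar heights so that the total bar area is 1, which lets it be read as an estimated probability density. The property can be checked directly with `numpy.histogram`; the gap values below are invented for illustration.

```python
import numpy as np

# Invented inter-tweet gaps in seconds (illustration only).
gaps = np.array([300.0, 1800.0, 7200.0, 600.0, 900.0, 450.0, 3600.0, 1200.0])

# density=True returns heights normalized by count and bin width.
heights, edges = np.histogram(gaps, bins=4, density=True)

# Total area = sum of (height * bin width) over all bins.
area = np.sum(heights * np.diff(edges))
```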