fix index content

procedure return a list of strings that break the source string up by the characters in the splitlist.

def split_string(source,splitlist):
  output = []
  atsplit = True
  for char in source:
    if char in splitlist:
      atsplit = True:
    else:
      if atsplit:
        output.append(char)
        atsplit = False
      else:
        output[-1] = output[-1] + char
    return output

cut duplicated url on index list

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if not url in entry[1]:
            entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

One way search engines rank pages is to count the number of times a searcher clicks on a returned link. This indicates that the person doing the query thought this was a useful link for the query, so it should be higher in the rankings next time.

def record_user_click(index,keyword,url):
  urls = lookup(index, keyword)
  if urls:
    for entry in urls:
      if entry[0] == url:
        entry[1] = entry[1] + 1
      
def add_to_index(index,keyword,url):
  for entry in index:
    for entry[0] == keyword:
      for elment in entry[1]:
        if element[0] = url:
          return
      entry[1].append([url,0])
      return
  index.append([keyword,[[url,0]]])

How program run fast making things fast. It depends algorithm analysis that most programmer spend time in procedure, well defined sequence of steps that can be executed mechanically. So gatta think about time and memory said cost of computer.

time consuming

import time

def time_execution(code):
  start = time.clock()
  result = eval(code)
  run_time = time.clock() - start
  return result, run_time

def spin_loop(n):
  i = 0
  while i < n:
    i = i + 1
    
print time_execution('2 ** 10')


Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.8.2] on linux
(1024, 1.6000000000002124e-05)

print time_execution('spin_loop(10000000)')
(None, 0.33208099999999996)

Creating an Empty Hash Table, make_hashtable, that takes as input a number, nbuckets, and returns an empty hash table with nbuckets empty buckets.

def make_hashtable(nbuckets):
  i = 0
  table = []
  while i < nbuckets:
    table.append([])
    i = i + 1
  return table

for loop, we do not need variable like i.

def make_hashtable(nbuckets):
  i = 0
  table = []
  for unused < range(0, nbuckets):
    tabale.append([])
  return table

Update hashtable, if key is already in the table, change the value to the new value.

def hashtable_update(htable,key,value):
    bucket = hashtable_get_bucket(htable, key)
    for entry in hashtable
        if entry[0] == key:
          entry[1] = value
    bucket.append([key, value])

Indexing

If the keyword is already in the index, add the url to the list of urls associated with that keyword. If the keyword is not in the index, add an entry to the index: [keyword,[url]]

index = []

def add_to_index(index,keyword,url):
    for entry in index:
      if entry[0] == keyword:
        entry[1].append(url)
        return
    index.append([keyword,[url]])

Return a list of the urls associated with the keyword. If the keyword is not in the index, the procedure return an empty list.

def lookup(index,keyword):
    for entry in index:
      if entry[0] == keyword:
        return entry[1]
    return[]

split()

quote = "In Washington, it's dog eat dog. In academia...."
quote.split()

update the index to include all of the word occurences found in the page content by adding the url to the word’s associated url list.

def add_to_index(index,keyword,url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    index.append([keyword,[url]])

def add_page_to_index(index,url,content):
    words = content.split()
    for word in words:
    add_to_index(index, word, url)

python code getting url from internet

def get_page(url):
  try:
    import urllib
    return urllib.urlopen(url).read()
  except:
    return ""

The meaning of Latency and bandwidth is close but different.Latency is time it takes message to get from source to destination. Bandwidth is amount of information that can be transmitted per unit time. It is said mbps, bit per second.
internet-speed-test

Define traceroute

[vagrant@localhost ~]$ traceroute www.whitehouse.gov
traceroute to www.whitehouse.gov (184.26.234.167), 30 hops max, 60 byte packets
 1  10.0.2.2 (10.0.2.2)  0.474 ms  0.397 ms  0.302 ms
 2  10.0.2.2 (10.0.2.2)  88.120 ms  87.848 ms  87.469 ms

comparing traceroute in windows

PS C:\Users\hoge> tracert www.udacity.com

apollo-mesos-elb-berlioz2-prod-885022263.us-west-2.elb.amazonaws.com [52.40.117.247] へのルートをトレー
経由するホップ数は最大 30 です:

  1    94 ms     6 ms     6 ms  192.168.179.1
  2     *        *        *     要求がタイムアウトしました。
  3     *        *        *     要求がタイムアウトしました。
  4    65 ms    57 ms    78 ms  172.25.21.178
  5    61 ms    64 ms    65 ms  172.25.153.3
  6    74 ms    91 ms    63 ms  172.25.153.46
  7    65 ms    65 ms    67 ms  chyISN001.bb.kddi.ne.jp [27.85.176.181]
  8    87 ms    69 ms    71 ms  obpBBAC03.bb.kddi.ne.jp [27.90.191.254]
  9    59 ms    94 ms    59 ms  obpjbb205.int-gw.kddi.ne.jp [125.53.98.93]
 10   248 ms   238 ms   234 ms  pajbb001.int-gw.kddi.ne.jp [203.181.100.194]
 11   201 ms   228 ms   174 ms  ix-pa9.int-gw.kddi.ne.jp [111.87.3.74]
 12   190 ms   189 ms   178 ms  72.21.221.125
 13   210 ms   194 ms   202 ms  54.240.243.50
 14   197 ms   179 ms   179 ms  54.240.243.55
 15   204 ms   197 ms   204 ms  205.251.232.116
 16   264 ms   247 ms   197 ms  52.93.12.76
 17   209 ms   196 ms   196 ms  52.93.12.73
 18   244 ms   202 ms   217 ms  52.93.12.144
 19   203 ms   196 ms   195 ms  52.93.12.165
 20   208 ms   196 ms   197 ms  52.93.13.67
 21     *        *        *     要求がタイムアウトしました。
 22     *        *        *     要求がタイムアウトしました。
 23     *        *        *     要求がタイムアウトしました。
 24     *        *        *     要求がタイムアウトしました。
 25     *        *        *     要求がタイムアウトしました。
 26     *        *        *     要求がタイムアウトしました。
 27     *        *        *     要求がタイムアウトしました。
 28     *        *        *     要求がタイムアウトしました。
 29     *        *        *     要求がタイムアウトしました。
 30     *        *        *     要求がタイムアウトしました。

トレースを完了しました。
PS C:\Users\hoge> tracert www.mit.edu

e9566.dscb.akamaiedge.net [23.9.161.192] へのルートをトレースしています
経由するホップ数は最大 30 です:

  1  2662 ms     3 ms     7 ms  192.168.179.1
  2     *        *        *     要求がタイムアウトしました。
  3     *        *        *     要求がタイムアウトしました。
  4    71 ms    62 ms    83 ms  172.25.21.178
  5    86 ms    72 ms    63 ms  172.25.153.3
  6    74 ms    58 ms    79 ms  172.25.153.46
  7    72 ms    57 ms    79 ms  chyISN001.bb.kddi.ne.jp [27.85.176.181]
  8    84 ms    77 ms    79 ms  27.85.132.197
  9    83 ms    78 ms    78 ms  27.85.134.50
 10    89 ms    79 ms    79 ms  210.132.124.62
 11    77 ms    63 ms    73 ms  a23-9-161-192.deploy.static.akamaitechnologies.com [23.9.161.192]

the list of number

product_list, that takes as input a list of numbers, and returns a number that is the result of multiplying all those numbers together.

def product_list(p):
 result = 1
 for i in p
    result =  result * i
    return result

A greatest, that takes as input a list of positive numbers, and returns the greatest number in that list. If the input list is empty, the output should be 0.

def greatest(p):
  biggest = 0
  for i in p:
    if i > biggeset:
      biggest = i
  return biggest

total_enrollment, that takes as an input a list of elements, where each element is a list containing three elements: a university name, the total number of students enrolled, and the annual tuition fees.
The procedure return two numbers, giving the total number of students enrolled at all of the universities in the list, and the total tuition fees (which is the sum of the number of students enrolled times the
# tuition fees for each university).

usa_univs = [ ['California Institute of Technology',2175,37704],
              ['Harvard',19627,39849],
              ['Massachusetts Institute of Technology',10566,40732],
              ['Princeton',7802,37000],
              ['Rice',5879,35551],
              ['Stanford',19535,40569],
              ['Yale',11701,40500]  ]

def total_enrollment(p):
    total_student = 0
    total_tuition = 0
    for i in p:
        total_sutudent = i[1] + total_sudent
        total_tuition = i[1] * i[2] + total tuition
    return total_student, total_tuition

check_sudoku, that takes as input a square list of lists representing an n x n sudoku puzzle solution and returns the boolean True if the input is a valid sudoku square and returns the boolean False otherwise.

def check_sudoku(p):
    n = len(p)
    digit = 1
    while digit <= n:
      i = 0
      while i < n:
        row_count = 0
        col_count = 0
        j = 0
        while j < 0
          if p[i][j] == digit:
            row_count = row_count + 1
          if p[j][i] = digit:
            col_count = col_count + 1
          j = j + 1
        if row_count != 1 or col_count != 1:
          return False
        i = i + 1
      digit = digit + 1
    return True

divided by number by bumbers showing dicimal

def list_mean(p):
    return float(sum(p)) / len(p)

better data structure indexing keyword, thinking data structure is the most important in computer science.

[[<keyword>,[<url>,<url>]]
 [<keyword>,[<url>,<url>]]
 ...]

web crawler

needs the list to crawl and the list crawled.

Basic concept of crawling is here.

start with tocrawl = [seed]
 crawled = []
 while there more pages tocrawl:
    pick a page from tocrawl
    add that page to crawled
    add all the link target on
     this page to tocrawl
    return crawled

this is a procedure of get links.

def get_all_links(page);
  links = []
  while True:
    url, endpos = get_next_target(page)
    if url:
      links.append(url)
      page = page[endpos:]
    else:
      break
  return links

starting from the seedpage. The important point is web crawler follow last page of link using python pop.

 def crawl_web(seed):
   tocrawl = [seed]
   crawled = []
   while tocrawl:
     page = tocrawl.pop()
def union(p,q):
    for e in q:
        if e not in p:
            p.append(e)


def get_all_links(page):
    links = []
    while True:
        url,endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
        union(tocrawl, get_all_links(get_page(page)))
        crawled.append(page)

NextDay Procedure

simple nextDay procedure, that assuming every month has 30 days.

For example:
### nextDay(1999, 12, 30) => (2000, 1, 1)
### nextDay(2013, 1, 30) => (2013, 2, 1)
### nextDay(2012, 12, 30) => (2013, 1, 1) (even though December really has 31 days)

def nextDay(year, month, day):
    """warning: this version incorrectly assumues all months have 30days!"""
    if month == 12 and day == 30:
        return year + 1, 1, 1
    elif month < 12:
        return year, month + 1, 1 
    else:
        return year, month, day + 1

day between date

def nextDay(year, month, day):
    if day < 30:
        return year, month, day + 1
    else:
        if month == 12:
            return year + 1, 1, 1
        else:
            return year, month + 1, 1
            
def deteIsBefore(year1, month1, day1, year2, month2, day2):
  if year1 < year2:
    return True
  if year1 == year2:
    if month1 < month2:
        return True
    if month1 == month2:
        return day1 < day2
    return False
        
def daysBetweenDates(year1, month1, day1, year2, month2, day2):    days = 0
    while dateIsBefoer(year1, month1, day1,year2, month2, day2):
        year1, month1, day1 = nextDayMonth(year1, month1, day1)
        days += 1
    return days

how_many_days, representing a month, and returns the number of days in that month.

days_in_month = [31,28,31,30,31,30,31,31,30,31,30,31]

def how_many_days(month_number):
    return days_in_month[month_number - 1]

capital of Indiaby accessing the list

countries = [['China','Beijing',1350],
             ['India','Delhi',1210],
             ['Romania','Bucharest',21],
             ['United States','Washington',307]]
print(countries[1][1])

replace spy code name with list

spy = [0,0,7]

def replace_spy(p):
    p[2] = p[2] + 1
    
replace_spy(spy)
print(spy)

List operation

p = [1, 2]
q = [3, 4]
p.append(q)
q[1] = 5
print(p)

2GB memory that means "2 ** 30 * 2 * 8(bit)", 1 bit have a light switch , so kind of this shows 17 billion light switch

print 2 ** 10
print 2 ** 20
print 2 ** 30
print 2 ** 40

Memory Hierarchy
Look at the latency-distance, speed of light is about 300,000km, CPU Register is 0.12m, and DRAM is 3.6m, harddrive is 2.98km.
So, if you writing a program, you have to think about the latency-distance of each memory that, harddrive is much much far than register and DRAM.

sum list

def sum_list(p):
  result = 0
  for e in p:
      result = result + e
  return result
print(sum_list([1, 4, 7]))

define first character U by for loop

def measure_udacity(p):
    count = 0
    for e in p:
        if p[0] == 'U':
            count = count + 1
    return count

find element in the list procedure

def find_element(p, t)
    i = 0
    for i < len(p):
        if p[i] == t:
          return i
        i = i + 1
    return -1

find_element, using index that takes as its inputs a list and a value of any type, and returns the index of the first element in the input list that matches the value.

def find_elment(p, t):
  if t in p:
    return p.index(t)
  else:
        return -1

union that takes as inputs two lists. It should modify the first input list to be the set union of the two lists, assume the first list is a set, that is, it contains no repeated elements.

def union(p, q):
    for e in q:
      if e not in p:
        p.append(e)

speed of light

speed_of_light = 299792458
billionth = 1.0 /1000000000
nanostick = speed_of_light * billion
print nanostick

variable changes with faster processer, even in the same expression, divede by cycles_per_second.

speed_of_light = 299792458
cycles_per_second = 2700000000. # 2.7GHz

cycle_distance = speed_of_light / cycles_per_second

cycles_per_second = 2800000000. # 2.8GHz
print(cycle_distance)

cycle_distance = speed_of_light / cycles_per_second
print(cycle_distance)

Finding Strings in strings

pythagoras = 'There is geometry in the humming of the string, there is music in the spacing of the spheres'
print(pythagoras.find('string'))
print(pythagoras[40:])
print(pythagoras.find('algebra'))

danton = "De l'audance, encore de l'audace, toujours de l'audace"
print(danton.find('audace'))
print(danton.find('audace', 0))
print(danton.find('audace', 5))
print(danton.find('audace', 6))
print(danton[6:])

link procedure

def get_next_target(page):
  start_link = page.find('<a href=')
  start_quote = page.find('"', start_link)
  end_quote = page.find('"', start_quote + 1)
  url = page&#91;start_quote + 1:end_quote&#93;
  return url, end_quote
  
print(get_next_target('this is a <a href="www.yahoo.co.jp">link</a>'))

How to crawler get a link from webpages.

def print_all_links(page):
  while True:
    url, endpos = get_next_target(page)
    if url:
      print(url)
      page = page[endpos:]
    else:
      break
  
print_all_links(get_page('http://yahoo.co.jp'))

find last position

def find_last(s, t):
  last_pos = -1
  while True:
    pos = s.find(t, last_pos)
    if pos == -1:
      return last_pos
      last_pos = pos

Python基礎

python3とpython2では記法が異なっている点があるので注意が必要です。

print ("hello world!")

変数

msg = "hello world"
print (msg)

整数と小数の演算は小数、整数同士の割り算は、小数点以下切り捨てとなります。

繰り返し処理

print (u"無駄"*10)

\\, \*なども

改行表現

print ("""<html>
<body>
</body>
</html>""")

整数値と文字列はpythonでは明示する

print (5 + int("5"))
print ("i am " + str(20) + "years old.")

配列、存在チェック

sales = [200, 100, 342, 1230, 122]
print (100 in sales)

ソート、reverse

sales = [52, 100, 80, 45]
sales.sort()
print (sales)

タプル:変更不可

a = (2, 5, 8)
print (a * 3)

セット:重複を許さない

a = set([1, 2, 3, 4])
print (a)

差集合

a = set([1, 2, 3, 4])
b = set([4, 5, 6, 7])
print (b - a)

辞書

sales = {"yamada": 200, "yoshida": 300, "sakura": 240}
print (sales)

key、value、items(一覧)

sales = {"yamada": 200, "yoshida": 300, "sakura": 240}
print (sales.values())

文字列へのデータ組み込み

a = 10
b = 123.345
c = "sakaki"
d = {"yamada":200, "yoshimoto": 300}
print ("age: %d" % a)

条件分岐

score = 70
if score > 60:
    print ("ok")

条件分岐2

score = 55
if score > 60:
    print ("ok")
elif score > 50:
    print ("soso")
else:
    print ("NG!")

forループ

sales = [13, 235, 312, 2232]
for sale in sales:
    print (sale)

繰り返し処理 ※インデントに注意

for i in range(10):
    print(i)

空処理

def hello2():
    pass

python module:
https://docs.python.org/3/library/index.html