Python – Page 34 – ソフトウェアエンジニアの技術ブログ：Software engineer tech blog

python time alert

time alert program which call you tube video.

import time
import webbrowser

total_breaks = 3
break_count = 0

print("this program start on"+time.ctime())
while(break_count < total_breaks):
    time.sleep(2*60*60)
    webbrowser.open("https://www.youtube.com/watch?v=PDSkFeMVNFs")
    break_count = break_count + 1

get a file from computer and rename file name.In this case, remove the number of file name.

import os
def rename_files():
    #(1)get file names from a folder
    file_list = os.listdir(r"C:\Users\prank")
    #print(file_list)
    save_path = os.getcwd()
    print(save_path)
    os.chdir(r"C:\Users\prank")
    #(2)for each file, rename filename
    for file_name in file_list:
       os.rename(file_name, file_name.translate(None, "0123456789"))
    os.chdir(save_path)

rename_files()

crawling page rank

def crawl_web(seed): # returns index, graph of outlinks
    tocrawl = [seed]
    crawled = []
    graph = {}  # <url>:[list of pages it links to]
    index = {} 
    while tocrawl: 
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            outlinks = get_all_links(content)
            graph[page] = outlinks
            union(tocrawl, outlinks)
            crawled.append(page)
    return index, graph

def ancestors(genealogy, person):
  if person in genealogy:
    parents = genealogy[person]
    result = parents
    for parent in parents:
      result = result + ancestors(genealogy, parent)
    return result
    
  return []

Blaise Pascal Triangle

def make_next_row(row):
  result = []
  prev = 0
  for e in row:
    result.append(e + prev)
    prev = e
  result.append(prev)
  return result

def triangle(n):
  result = []
  current = [1]
  for unused in range(0, n):
    result.append(current)
    current = make_next_row(current)
    
  return result

fibonacci

-base cases
fibonacci(0)=0
fibonacci(1)=1

-recursive case
fibonacci(n) = fibonacci(n-1)+fibonacci(n-2)
n > 1

A procedure, fibonacci, that takes a natural number as its input, and returns the value of that fibonacci number.

def fibonacci(n):
    if n == 0:
      return 0
    if n == 1:
      return n
    return fibonacci(n-1) + fibonacci(n-2)

fibonacci has interesting feature below
fibo(x) = fibo(x-n-1)

More faster fibonacci procedure using for loop.

def fibonacci(n):
    current = 0
    after = 1
    for i in range(0, n):
      current, after = after, current + after
    return current

mass_of_earth = 5.9722 * 10**24
mass_of_rabbit = 2

n = 1
while fibonacci(n) * mass_of_rabbit < mass_of_earth:
  n = n + 1
print n, fibonacci(n)

- Ranking web pages as popularity
popularity(p) = # of people who are friends with p
popularity(p) = Σ popularity(f), f E friends of p

def popularity(p):
score = 0
for f in friends(p):
score = score + popularity(f)
return score

Defining Procedures Recursively

factorial(n) = n*(n-1)*(n-2)*…1

factorial(n)=1
factorial(n)=n*factorial(n-1)

A procedure, factorial, that takes a natural number as its input, and returns the number of ways to arrange the input number of items.

def factorial(n):
 if n == 0:
   return 1
 else:
   return n * factorial(n-1)

Palindromes:match the first and last string, such as level.
A procedure is_palindrome, that takes as input a string, and returns a Boolean indicating if the input string is a palindrome.

def is_palindrome(s):
  if s == "":
    return True
  else:
    if s[1] == s[-1]:
        return is_palindrome(s[1:-1])
    else:
        return false

Another way, using for loop, to write palindrome procedure.

def iter_palindrome(s):
  for i in range(0, len(s)/2):
    if s[i] != s[-(i + 1)]:
      return False
  return True

String: sequence of characters, immutable
ex. ‘hello’, s[i] is i th character in S
List: list of elements, mutable
ex. [‘alpha’, 23], p[i] is i th element of p
Dictionary: set of pairs, mutable
ex. {‘hydrogen’: 1, ‘helium’: 23, d[k], k is whatever the key value
ex. d[k] = v, is similar to the update.

Here is a example to use dictionary

elements = {'hydrogen':1, 'helium':2, 'carbon':6 }
elements['lithimu'] = 3
elements['nitrogen'] = 8
elements['nitrogen'] = 7
print elements['nitrogen']

population = {'Shanghai':17.8, 'Istanbul':13.3, 'Karachi':13.0, 'Mumbai':12.5 }

another example

elements = {}
elements['H'] = {'name': 'Hydrogen', 'number': 1, 'weight': 1.00794}
elements['He'] = {'name': 'Helium', 'number': 2, 'weight': 4.002602, 'noble gas': True}
print elements['He']['name']

fix index content

procedure return a list of strings that break the source string up by the characters in the splitlist.

def split_string(source,splitlist):
  output = []
  atsplit = True
  for char in source:
    if char in splitlist:
      atsplit = True:
    else:
      if atsplit:
        output.append(char)
        atsplit = False
      else:
        output[-1] = output[-1] + char
    return output

cut duplicated url on index list

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if not url in entry[1]:
            entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

One way search engines rank pages is to count the number of times a searcher clicks on a returned link. This indicates that the person doing the query thought this was a useful link for the query, so it should be higher in the rankings next time.

def record_user_click(index,keyword,url):
  urls = lookup(index, keyword)
  if urls:
    for entry in urls:
      if entry[0] == url:
        entry[1] = entry[1] + 1
      
def add_to_index(index,keyword,url):
  for entry in index:
    for entry[0] == keyword:
      for elment in entry[1]:
        if element[0] = url:
          return
      entry[1].append([url,0])
      return
  index.append([keyword,[[url,0]]])

How program run fast making things fast. It depends algorithm analysis that most programmer spend time in procedure, well defined sequence of steps that can be executed mechanically. So gatta think about time and memory said cost of computer.

time consuming

import time

def time_execution(code):
  start = time.clock()
  result = eval(code)
  run_time = time.clock() - start
  return result, run_time

def spin_loop(n):
  i = 0
  while i < n:
    i = i + 1
    
print time_execution('2 ** 10')


Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.8.2] on linux
(1024, 1.6000000000002124e-05)

print time_execution('spin_loop(10000000)')
(None, 0.33208099999999996)

Creating an Empty Hash Table, make_hashtable, that takes as input a number, nbuckets, and returns an empty hash table with nbuckets empty buckets.

def make_hashtable(nbuckets):
  i = 0
  table = []
  while i < nbuckets:
    table.append([])
    i = i + 1
  return table

for loop, we do not need variable like i.

def make_hashtable(nbuckets):
  i = 0
  table = []
  for unused < range(0, nbuckets):
    tabale.append([])
  return table

Update hashtable, if key is already in the table, change the value to the new value.

def hashtable_update(htable,key,value):
    bucket = hashtable_get_bucket(htable, key)
    for entry in hashtable
        if entry[0] == key:
          entry[1] = value
    bucket.append([key, value])

Indexing

If the keyword is already in the index, add the url to the list of urls associated with that keyword. If the keyword is not in the index, add an entry to the index: [keyword,[url]]

index = []

def add_to_index(index,keyword,url):
    for entry in index:
      if entry[0] == keyword:
        entry[1].append(url)
        return
    index.append([keyword,[url]])

Return a list of the urls associated with the keyword. If the keyword is not in the index, the procedure return an empty list.

def lookup(index,keyword):
    for entry in index:
      if entry[0] == keyword:
        return entry[1]
    return[]

split()

quote = "In Washington, it's dog eat dog. In academia...."
quote.split()

update the index to include all of the word occurences found in the page content by adding the url to the word’s associated url list.

def add_to_index(index,keyword,url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    index.append([keyword,[url]])

def add_page_to_index(index,url,content):
    words = content.split()
    for word in words:
    add_to_index(index, word, url)

python code getting url from internet

def get_page(url):
  try:
    import urllib
    return urllib.urlopen(url).read()
  except:
    return ""

The meaning of Latency and bandwidth is close but different.Latency is time it takes message to get from source to destination. Bandwidth is amount of information that can be transmitted per unit time. It is said mbps, bit per second.
internet-speed-test

Define traceroute

[vagrant@localhost ~]$ traceroute www.whitehouse.gov
traceroute to www.whitehouse.gov (184.26.234.167), 30 hops max, 60 byte packets
 1  10.0.2.2 (10.0.2.2)  0.474 ms  0.397 ms  0.302 ms
 2  10.0.2.2 (10.0.2.2)  88.120 ms  87.848 ms  87.469 ms

comparing traceroute in windows

PS C:\Users\hoge> tracert www.udacity.com

apollo-mesos-elb-berlioz2-prod-885022263.us-west-2.elb.amazonaws.com [52.40.117.247] へのルートをトレー
経由するホップ数は最大 30 です:

  1    94 ms     6 ms     6 ms  192.168.179.1
  2     *        *        *     要求がタイムアウトしました。
  3     *        *        *     要求がタイムアウトしました。
  4    65 ms    57 ms    78 ms  172.25.21.178
  5    61 ms    64 ms    65 ms  172.25.153.3
  6    74 ms    91 ms    63 ms  172.25.153.46
  7    65 ms    65 ms    67 ms  chyISN001.bb.kddi.ne.jp [27.85.176.181]
  8    87 ms    69 ms    71 ms  obpBBAC03.bb.kddi.ne.jp [27.90.191.254]
  9    59 ms    94 ms    59 ms  obpjbb205.int-gw.kddi.ne.jp [125.53.98.93]
 10   248 ms   238 ms   234 ms  pajbb001.int-gw.kddi.ne.jp [203.181.100.194]
 11   201 ms   228 ms   174 ms  ix-pa9.int-gw.kddi.ne.jp [111.87.3.74]
 12   190 ms   189 ms   178 ms  72.21.221.125
 13   210 ms   194 ms   202 ms  54.240.243.50
 14   197 ms   179 ms   179 ms  54.240.243.55
 15   204 ms   197 ms   204 ms  205.251.232.116
 16   264 ms   247 ms   197 ms  52.93.12.76
 17   209 ms   196 ms   196 ms  52.93.12.73
 18   244 ms   202 ms   217 ms  52.93.12.144
 19   203 ms   196 ms   195 ms  52.93.12.165
 20   208 ms   196 ms   197 ms  52.93.13.67
 21     *        *        *     要求がタイムアウトしました。
 22     *        *        *     要求がタイムアウトしました。
 23     *        *        *     要求がタイムアウトしました。
 24     *        *        *     要求がタイムアウトしました。
 25     *        *        *     要求がタイムアウトしました。
 26     *        *        *     要求がタイムアウトしました。
 27     *        *        *     要求がタイムアウトしました。
 28     *        *        *     要求がタイムアウトしました。
 29     *        *        *     要求がタイムアウトしました。
 30     *        *        *     要求がタイムアウトしました。

トレースを完了しました。
PS C:\Users\hoge> tracert www.mit.edu

e9566.dscb.akamaiedge.net [23.9.161.192] へのルートをトレースしています
経由するホップ数は最大 30 です:

  1  2662 ms     3 ms     7 ms  192.168.179.1
  2     *        *        *     要求がタイムアウトしました。
  3     *        *        *     要求がタイムアウトしました。
  4    71 ms    62 ms    83 ms  172.25.21.178
  5    86 ms    72 ms    63 ms  172.25.153.3
  6    74 ms    58 ms    79 ms  172.25.153.46
  7    72 ms    57 ms    79 ms  chyISN001.bb.kddi.ne.jp [27.85.176.181]
  8    84 ms    77 ms    79 ms  27.85.132.197
  9    83 ms    78 ms    78 ms  27.85.134.50
 10    89 ms    79 ms    79 ms  210.132.124.62
 11    77 ms    63 ms    73 ms  a23-9-161-192.deploy.static.akamaitechnologies.com [23.9.161.192]

the list of number

product_list, that takes as input a list of numbers, and returns a number that is the result of multiplying all those numbers together.

def product_list(p):
 result = 1
 for i in p
    result =  result * i
    return result

A greatest, that takes as input a list of positive numbers, and returns the greatest number in that list. If the input list is empty, the output should be 0.

def greatest(p):
  biggest = 0
  for i in p:
    if i > biggeset:
      biggest = i
  return biggest

total_enrollment, that takes as an input a list of elements, where each element is a list containing three elements: a university name, the total number of students enrolled, and the annual tuition fees.
The procedure return two numbers, giving the total number of students enrolled at all of the universities in the list, and the total tuition fees (which is the sum of the number of students enrolled times the
# tuition fees for each university).

usa_univs = [ ['California Institute of Technology',2175,37704],
              ['Harvard',19627,39849],
              ['Massachusetts Institute of Technology',10566,40732],
              ['Princeton',7802,37000],
              ['Rice',5879,35551],
              ['Stanford',19535,40569],
              ['Yale',11701,40500]  ]

def total_enrollment(p):
    total_student = 0
    total_tuition = 0
    for i in p:
        total_sutudent = i[1] + total_sudent
        total_tuition = i[1] * i[2] + total tuition
    return total_student, total_tuition

check_sudoku, that takes as input a square list of lists representing an n x n sudoku puzzle solution and returns the boolean True if the input is a valid sudoku square and returns the boolean False otherwise.

def check_sudoku(p):
    n = len(p)
    digit = 1
    while digit <= n:
      i = 0
      while i < n:
        row_count = 0
        col_count = 0
        j = 0
        while j < 0
          if p[i][j] == digit:
            row_count = row_count + 1
          if p[j][i] = digit:
            col_count = col_count + 1
          j = j + 1
        if row_count != 1 or col_count != 1:
          return False
        i = i + 1
      digit = digit + 1
    return True

divided by number by bumbers showing dicimal

def list_mean(p):
    return float(sum(p)) / len(p)

better data structure indexing keyword, thinking data structure is the most important in computer science.

[[<keyword>,[<url>,<url>]]
 [<keyword>,[<url>,<url>]]
 ...]

web crawler

needs the list to crawl and the list crawled.

Basic concept of crawling is here.

start with tocrawl = [seed]
 crawled = []
 while there more pages tocrawl:
    pick a page from tocrawl
    add that page to crawled
    add all the link target on
     this page to tocrawl
    return crawled

this is a procedure of get links.

def get_all_links(page);
  links = []
  while True:
    url, endpos = get_next_target(page)
    if url:
      links.append(url)
      page = page[endpos:]
    else:
      break
  return links

starting from the seedpage. The important point is web crawler follow last page of link using python pop.

 def crawl_web(seed):
   tocrawl = [seed]
   crawled = []
   while tocrawl:
     page = tocrawl.pop()

def union(p,q):
    for e in q:
        if e not in p:
            p.append(e)


def get_all_links(page):
    links = []
    while True:
        url,endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
        union(tocrawl, get_all_links(get_page(page)))
        crawled.append(page)

NextDay Procedure

simple nextDay procedure, that assuming every month has 30 days.

For example:
### nextDay(1999, 12, 30) => (2000, 1, 1)
### nextDay(2013, 1, 30) => (2013, 2, 1)
### nextDay(2012, 12, 30) => (2013, 1, 1) (even though December really has 31 days)

def nextDay(year, month, day):
    """warning: this version incorrectly assumues all months have 30days!"""
    if month == 12 and day == 30:
        return year + 1, 1, 1
    elif month < 12:
        return year, month + 1, 1 
    else:
        return year, month, day + 1

day between date

def nextDay(year, month, day):
    if day < 30:
        return year, month, day + 1
    else:
        if month == 12:
            return year + 1, 1, 1
        else:
            return year, month + 1, 1
            
def deteIsBefore(year1, month1, day1, year2, month2, day2):
  if year1 < year2:
    return True
  if year1 == year2:
    if month1 < month2:
        return True
    if month1 == month2:
        return day1 < day2
    return False
        
def daysBetweenDates(year1, month1, day1, year2, month2, day2):    days = 0
    while dateIsBefoer(year1, month1, day1,year2, month2, day2):
        year1, month1, day1 = nextDayMonth(year1, month1, day1)
        days += 1
    return days

how_many_days, representing a month, and returns the number of days in that month.

days_in_month = [31,28,31,30,31,30,31,31,30,31,30,31]

def how_many_days(month_number):
    return days_in_month[month_number - 1]

capital of Indiaby accessing the list

countries = [['China','Beijing',1350],
             ['India','Delhi',1210],
             ['Romania','Bucharest',21],
             ['United States','Washington',307]]
print(countries[1][1])

replace spy code name with list

spy = [0,0,7]

def replace_spy(p):
    p[2] = p[2] + 1
    
replace_spy(spy)
print(spy)

List operation

p = [1, 2]
q = [3, 4]
p.append(q)
q[1] = 5
print(p)

2GB memory that means "2 ** 30 * 2 * 8(bit)", 1 bit have a light switch , so kind of this shows 17 billion light switch

print 2 ** 10
print 2 ** 20
print 2 ** 30
print 2 ** 40

Memory Hierarchy
Look at the latency-distance, speed of light is about 300,000km, CPU Register is 0.12m, and DRAM is 3.6m, harddrive is 2.98km.
So, if you writing a program, you have to think about the latency-distance of each memory that, harddrive is much much far than register and DRAM.

sum list

def sum_list(p):
  result = 0
  for e in p:
      result = result + e
  return result
print(sum_list([1, 4, 7]))

define first character U by for loop

def measure_udacity(p):
    count = 0
    for e in p:
        if p[0] == 'U':
            count = count + 1
    return count

find element in the list procedure

def find_element(p, t)
    i = 0
    for i < len(p):
        if p[i] == t:
          return i
        i = i + 1
    return -1

find_element, using index that takes as its inputs a list and a value of any type, and returns the index of the first element in the input list that matches the value.

def find_elment(p, t):
  if t in p:
    return p.index(t)
  else:
        return -1

union that takes as inputs two lists. It should modify the first input list to be the set union of the two lists, assume the first list is a set, that is, it contains no repeated elements.

def union(p, q):
    for e in q:
      if e not in p:
        p.append(e)