
Video Processing

1. Relationship between Images and Videos
2. Persistence of vision in playing (and computing) Videos
3. Extend filtering and processing of Images to Videos
4. Tracking points in Videos

Digital Image
numeric representation in two-dimensions (x and y)
referred to as I(x, y) as a continuous function, or I(i, j) in discrete form
Image resolution
expressed as the width and height of the image
Each pixel (picture element) contains light intensities for each value of x and y of I(x,y)

Video Resolution
– expressed as the width and height of the image
– usually in aspect ratios of 4:3, 16:9, etc.

Foundational observation of why we perceive video
Rationale behind the invention of video cameras
Muybridge (1830-1904) used stop-action photographs to study animal motion

Processing/ Filtering Video
– same as with images, just over a video volume
– Can filter in 3D
(x,y,t)
– Motion information is used in video compression
apply to xt- and yt-space
if all pixels differ from one frame to the next, then it may be a drastic motion change
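
As a rough illustration of the frame-to-frame comparison above, the sketch below treats a video as a NumPy volume indexed (t, y, x) and measures how many pixels change between consecutive frames. The helper name `changed_fraction` and the tolerance of 10 gray levels are assumptions made for this example, not values from the notes.

```python
import numpy as np

def changed_fraction(frame_a, frame_b, tol=10):
    """Fraction of pixels whose intensity differs by more than tol."""
    # Cast to a signed type so the subtraction cannot wrap around for uint8 frames.
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return float(np.mean(diff > tol))

# Synthetic video volume: 3 grayscale frames of 4x4 pixels, indexed (t, y, x).
video = np.zeros((3, 4, 4), dtype=np.uint8)
video[1, 0, 0] = 200   # frame 1: a single pixel changes (small motion)
video[2] = 255         # frame 2: every pixel changes (drastic change)

small = changed_fraction(video[0], video[1])
drastic = changed_fraction(video[1], video[2])
print(small, drastic)
```

A fraction close to 1 suggests a drastic change between the two frames, while a small fraction suggests local motion only.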

same as in images
leverage the fact that features found in one frame may be visible in the next

Photosynth

1. Going beyond panoramas
2. Photo Tourism
3. Photo maps, street views, etc.

Photo Tourism => Photo Synth
– Snavely, Seitz, Szeliski, "Photo Tourism: Exploring Photo Collections in 3D", ACM SIGGRAPH, 2006
– photosynth.net Technology preview (2008-2013)

Scene reconstruction
-position, orientation, and focal length of cameras
-3D position of feature points

Feature detection -> Pairwise feature matching -> Correspondence estimation -> Incremental structure from motion

Structure from motion

Stereo

1. Geometry (Depth structure) in a Scene
2. Stereo
3. Parallax
4. Compute depth from a stereo image pair

Depth (of a scene)
3D scene -> illumination -> optics -> sensor -> processing -> display -> user

Above all, we are interested in capturing a 3D scene with geometry
A scene point (Xo, Yo, Zo) projects to the image point (Xi, Yi), with focal length f:
Xi = f Xo / Zo, Yi = f Yo / Zo

Fundamental ambiguity: any points along the same ray map to the same point in the image
Perspective: vanishing lines/points
Depth Cues
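
The ambiguity can be checked numerically. This is a minimal sketch assuming the usual pinhole model, where the image coordinates are the scene coordinates divided by depth and scaled by the focal length f; the function name `project` and the sample points are made up for illustration.

```python
import numpy as np

def project(point, f=1.0):
    """Pinhole projection of a 3D point (Xo, Yo, Zo) to image coordinates (Xi, Yi)."""
    Xo, Yo, Zo = point
    return np.array([f * Xo / Zo, f * Yo / Zo])

near = np.array([1.0, 2.0, 4.0])
far = 3.0 * near   # a second point farther along the same ray through the camera center

p_near = project(near)
p_far = project(far)
print(p_near, p_far)
```

Both points land on the same image location, which is why a single image cannot recover depth without additional cues.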

Trimensional
3D scanner for iPhone

Depth Cues
shape from structured light

Shape from x
– perspective, shading, motion, focus, occlusions, objects

"""Make an Anaglyph Image."""

import numpy as np
import cv2

def make_anaglyph(img_left, img_right):
    """Stack two grayscale views into one BGR image: right -> blue and green, left -> red."""
    return np.dstack([img_right, img_right, img_left])

def test_run():
    """Driver function called by Test Run."""
    # Flag 0 loads each image as single-channel grayscale
    img_left = cv2.imread("flowers-left.png", 0)
    img_right = cv2.imread("flowers-right.png", 0)
    cv2.imshow("Left image", img_left)
    cv2.imshow("Right image", img_right)

    img_ana = make_anaglyph(img_left, img_right)
    cv2.imshow("Anaglyph image", img_ana)
    cv2.waitKey(0)  # keep the windows open until a key is pressed

High Dynamic Range

1. Dynamic Range
2. Digital cameras do not encode Dynamic Range very well
3. Image Acquisition Pipeline for capturing scene radiance to pixel values
4. Linear and non-linear aspects inherent in the Image Acquisition pipeline
5. Camera Calibration
6. Pixel Values from different Exposure Images are used to render a Radiance map of scene
7. Tone mapping

Dynamic range in Real World
Inside, no lights long exposure
Inside, Incandescent light

Luminance: a photometric measure of the luminous intensity per unit area of light traveling in a given direction, measured in candela per square meter (cd/m^2)

Human static contrast ratio 100:1 (10^2:1) -> about 6.5 f-stops
Human dynamic contrast ratio 1,000,000:1 (10^6:1) -> about 20 f-stops
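
The conversion between a contrast ratio and f-stops follows from each stop being a doubling of light, so the number of stops is the base-2 logarithm of the ratio. A small sketch (the function name `contrast_to_stops` is chosen here for illustration):

```python
import math

def contrast_to_stops(ratio):
    """Number of f-stops (doublings of light) spanned by a contrast ratio."""
    return math.log2(ratio)

static_stops = contrast_to_stops(10 ** 2)    # 100:1
dynamic_stops = contrast_to_stops(10 ** 6)   # 1,000,000:1
print(round(static_stops, 1), round(dynamic_stops, 1))
```

The results, roughly 6.6 and 19.9 stops, are in line with the rounded figures quoted above.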

3D scene -> scene radiance -> lens optics -> sensor irradiance -> shutter -> sensor exposure

Generate a Panorama

1. generate a panorama
2. Image Re-projection
3. Homography from a pair of images
4. Computing inliers and outliers
5. Details of constructing panoramas

5 steps to make a panorama
– capture images
– detection and matching
– warping -> aligning images
– blending, fading, cutting
– cropping (optional)

Align images: Translate
A Bundle of Rays Contains all views
View 1, View 2 -> Synthetic
Possible to generate any synthetic camera view as long as it has the same center of projection

Image Re-Projection
To relate two images from the same camera center and map a pixel from PP1 to PP2
– cast a ray through each pixel in PP1
– Draw the pixel where that ray intersects PP2

Recall: Image warping
translation, scale, rotation, affine, perspective

Computing Homography
(x, y) -> (wx’/w, wy’/w) = (x’, y’)
To compute the homography H from pairs of corresponding points in two images, we need to set up a system of equations

Set up a system of linear equations
Ah = b
where the vector of unknowns is
h = [a, b, c, d, e, f, g, h]^T

Need at least 8 equations (4 point pairs, 2 equations each), but the more the better
Solve for h. If over-constrained, solve using least squares
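
One way to set up and solve this system is sketched below with NumPy. It assumes the ninth entry of H is fixed to 1, so that the eight unknowns a..h remain, and it uses `np.linalg.lstsq` for the least-squares solve. The function names and the test homography `H_true` are illustrative, not taken from the lecture.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography mapping src points to dst points (last entry fixed to 1)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        # Each correspondence contributes two linear equations in (a, ..., h).
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, _, _, _ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Apply H to an (N, 2) array of points, dividing by w."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Synthetic check: project five points with a known H, then recover it.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.2, 0.9, 3.0],
                   [0.001, 0.002, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 25]], dtype=float)
dst = apply_homography(H_true, src)
H_est = fit_homography(src, dst)
print(np.round(H_est, 3))
```

With five point pairs there are ten equations for eight unknowns, so the system is over-constrained and the least-squares solution is used.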

Warp into a shared coordinate space

Random sample consensus (RANSAC)
Select one match, count inliers
Find “average” translation vector
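
A minimal sketch of this translation-only RANSAC loop follows. The threshold `eps`, the iteration count, and the synthetic matches are assumptions for the example; real matches would come from a feature detector and matcher.

```python
import numpy as np

def ransac_translation(src, dst, eps=2.0, n_iters=50, seed=0):
    """Estimate a translation between matched points, ignoring outliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))                    # select one match
        t = dst[i] - src[i]                           # translation it implies
        err = np.sum((src + t - dst) ** 2, axis=1)    # SSD for every match
        inliers = err < eps                           # count inliers
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # "Average" translation over the largest inlier set.
    t_avg = np.mean(dst[best_inliers] - src[best_inliers], axis=0)
    return t_avg, best_inliers

# 20 synthetic matches related by a shift of (30, -5); the first five are corrupted.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(20, 2))
dst = src + np.array([30.0, -5.0])
dst[:5] += rng.uniform(40, 80, size=(5, 2))

t, inliers = ransac_translation(src, dst)
print(t, inliers.sum())
```

The same loop structure carries over to homographies by sampling four matches per iteration instead of one.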

1. select four feature pairs
2. compute homography H(exact)
3. compute inliers where SSD(p_i’, H p_i) < ε

"""Building a (crude) panorama from two images."""

import numpy as np
import cv2

# Read images
img1 = cv2.imread("einstein.png")  # left image
img2 = cv2.imread("davinci.png")  # right image
print("Image 1 size: {}x{}".format(img1.shape[1], img1.shape[0]))
print("Image 2 size: {}x{}".format(img2.shape[1], img2.shape[0]))
cv2.imshow("Image 1", img1)
cv2.imshow("Image 2", img2)

# Convert to grayscale
img1_gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
img2_gray = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

# Initialize ORB detector object
orb = cv2.ORB_create()  # cv2.ORB() in OpenCV 2.4; cv2.SIFT_create() also works in OpenCV 4.4+

Image Morphing

1. Image Warping
2. Forward and inverse warping
3. warping using a mesh
4. image morphing
5. Feature-based image morphing

Transformation: lines remain lines
Warping: points are mapped to points
A mathematical function for warping maps from one plane to another plane

Distorted through simulation of optical aberrations
Projected onto a curved or mirrored surface

Consider a source image S and a target image T
S has pixel coordinates (u, v)
T has pixel coordinates (x, y)
Forward: (x, y) = [X(u, v), Y(u, v)]
Inverse: (u, v) = [U(x, y), V(x, y)]
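
Inverse warping is the direction usually implemented, because every target pixel receives exactly one value and no holes appear. The sketch below uses nearest-neighbor lookup and a simple shift as the warp; the function `inverse_warp` and its `inverse_map` callback are names invented for this example.

```python
import numpy as np

def inverse_warp(src, inverse_map, out_shape):
    """Fill each target pixel (x, y) from the source location (u, v) = inverse_map(x, y)."""
    h, w = out_shape
    out = np.zeros(out_shape, dtype=src.dtype)
    for y in range(h):
        for x in range(w):
            u, v = inverse_map(x, y)
            u, v = int(round(u)), int(round(v))   # nearest-neighbor lookup
            if 0 <= v < src.shape[0] and 0 <= u < src.shape[1]:
                out[y, x] = src[v, u]             # pixels mapping outside S stay 0
    return out

src = np.arange(16, dtype=np.uint8).reshape(4, 4)
# The target is the source shifted right by one pixel, so the inverse map shifts left.
shifted = inverse_warp(src, lambda x, y: (x - 1, y), (4, 4))
print(shifted)
```

Replacing the rounding step with bilinear interpolation would give smoother results for non-integer source locations.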

Use a sparse set of corresponding points and interpolate with a displacement field
– triangulate the set of points on source
– use the affine model for each triangle
– triangulate target with displaced points
– use inverse mapping
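
For the per-triangle affine model, three vertex correspondences determine the six affine parameters exactly. A minimal sketch with NumPy (the function name and the sample triangles are illustrative assumptions):

```python
import numpy as np

def affine_from_triangle(src_tri, dst_tri):
    """2x3 affine matrix mapping the three src vertices onto the three dst vertices."""
    # Rows of src_h are (x, y, 1), one per vertex.
    src_h = np.hstack([np.asarray(src_tri, float), np.ones((3, 1))])
    # Solve src_h @ M.T = dst_tri for the six affine parameters.
    M_T = np.linalg.solve(src_h, np.asarray(dst_tri, float))
    return M_T.T

src_tri = [(0, 0), (1, 0), (0, 1)]
dst_tri = [(2, 3), (4, 3), (2, 6)]   # source scaled by (2, 3) and shifted by (2, 3)
M = affine_from_triangle(src_tri, dst_tri)
print(M)
```

OpenCV offers `cv2.getAffineTransform` for the same computation; for inverse mapping, swap the two triangles so the matrix maps target vertices back to the source.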

Animation that changes (or morphs) one image or shape into another through a seamless transition
Widely used in movies
Image morphing approaches
Quadrilateral mesh displaced with variational interpolation
Corresponding features/ points
Corresponding orientated line segments (specifies translation, rotation, scaling)

Feature-based morphing

Homogeneous Coordinates

represent coordinates in 2 dimensions with 3-vector
Add a 3rd coordinate to every 2D point
(x, y, w)-> (x/w, y/w)
(x, y, 0) => infinity

p’ = Mp
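
A short sketch of why the third coordinate is useful: with homogeneous coordinates a translation, which is not a linear map in 2D, becomes a 3x3 matrix product. The helper names are chosen for illustration.

```python
import numpy as np

def to_homogeneous(p):
    """(x, y) -> (x, y, 1)."""
    return np.append(np.asarray(p, float), 1.0)

def from_homogeneous(p):
    """(x, y, w) -> (x/w, y/w); w must be non-zero."""
    return p[:2] / p[2]

p = to_homogeneous((3, 4))
same_point = from_homogeneous(2.5 * p)   # scaling all three coordinates changes nothing

# Translation by (100, 50) written as a matrix M, so that p' = Mp.
M = np.array([[1, 0, 100],
              [0, 1, 50],
              [0, 0, 1]], dtype=float)
p_moved = from_homogeneous(M @ p)
print(same_point, p_moved)
```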

Projective Transformations
– combination of affine transformations and projective warps

Properties
– origin does not necessarily map to origin
– lines map to lines

import cv2
import numpy as np

img = cv2.imread('tech.png')
height, width = img.shape[:2]
cv2.imshow("Original", img)

# 2x3 affine matrix: shift 100 px right and 50 px down
M_trans = np.float32(
    [[1, 0, 100],
     [0, 1, 50]])
print("Translation matrix:")
print(M_trans)
img_trans = cv2.warpAffine(img, M_trans, (width, height))
cv2.imshow("translation", img_trans)
cv2.waitKey(0)  # keep the windows open until a key is pressed

Image Transformation

1. Transform the image
2. Rigid Transformations, translation, rotation
3. Affine/projective transformation
4. Degree of freedom for different transformations

image filtering: change range of image
g(x) = T(f(x))

image warping: change domain of image
g(x) = f(T(x))
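
The difference between the two is easy to see on a 1D "image". In this sketch the specific choices of T (doubling intensities for filtering, shifting positions by 2 with wrap-around for warping) are arbitrary examples.

```python
import numpy as np

f = np.array([10, 20, 30, 40, 50, 60])   # a 1D "image" f(x)
x = np.arange(len(f))

# Filtering changes the range: pixels stay in place and get new values.
g_filter = 2 * f                  # g(x) = T(f(x)) with T(v) = 2v

# Warping changes the domain: values are kept but move to new positions.
g_warp = f[(x + 2) % len(f)]      # g(x) = f(T(x)) with T(x) = x + 2, wrapping at the border

print(g_filter)
print(g_warp)
```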

Parametric Global Warping
origin(0,0)

rotate Θ

Transformation T
p’ = T(p)

A global and parametric T
– same for any p
– a few numbers(parameters)

As a matrix transform
p’ = Mp

Image Scaling(2D)
– multiply each component by a scalar
– uniform scaling: same scalar for x and y
– non-uniform scaling: different scalars for x and y

2D Rotation
(x’, y’), (x, y)
x’ = x cos(Θ) - y sin(Θ)
y’ = x sin(Θ) + y cos(Θ)
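
Written as a matrix, the standard counter-clockwise rotation is x’ = x cos(Θ) - y sin(Θ), y’ = x sin(Θ) + y cos(Θ). A quick numerical check (the function name is illustrative):

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 matrix rotating points counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R = rotation_matrix(np.pi / 2)        # 90 degrees
p_rot = R @ np.array([1.0, 0.0])      # the x axis rotates onto the y axis
print(np.round(p_rot, 6))
```

Rotation matrices are orthogonal, so the inverse rotation is simply the transpose of R.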

Harris Detector: some property

Invariant to Rotation?
Invariance to image intensity?
Invariant to image scale?

Ellipse rotates but its shape (i.e. eigenvalues) remains the same
Corner response R is invariant to image rotation

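Mostly invariant to additive and multiplicative intensity changes
– intensity shift (I -> I + b): only derivatives are used, so R is unchanged
– intensity scale (I -> a I): R is scaled too, so a fixed threshold may pick different corners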
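
The effect of intensity changes on the derivatives can be verified directly. This sketch uses `np.gradient` as a stand-in for the derivative filters and a small synthetic image; the shift of 50 and the scale of 3 are arbitrary.

```python
import numpy as np

img = np.outer(np.arange(5.0), np.arange(5.0)) ** 2   # small synthetic image

gy, gx = np.gradient(img)
gy_shift, gx_shift = np.gradient(img + 50.0)    # additive shift b = 50
gy_scale, gx_scale = np.gradient(3.0 * img)     # multiplicative scale a = 3

# An additive shift leaves the derivatives untouched...
shift_unchanged = np.allclose(gx, gx_shift) and np.allclose(gy, gy_shift)
# ...while a multiplicative change scales them by the same factor.
scale_multiplies = np.allclose(gx_scale, 3.0 * gx) and np.allclose(gy_scale, 3.0 * gy)
print(shift_unchanged, scale_multiplies)
```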

Invariant to image scale?
Not invariant to image scale,
but can we do something about this?

Consider regions(e.g. circles) of different size around a point
A region which is “scale invariant”
Not affected by the size but will be the same for “corresponding regions”
At a point, compute the scale invariant function over different size neighborhoods
Choose the scale for each image at which the function is a maximum

SIFT (scale-invariant feature transform)
Specific suggestion: use a pyramid to find maximum values (remember edge detection), then eliminate “edges” and pick only corners

Harris-Laplacian
Find local maximum of
– Harris corner detector in space (image coordinates)
– Laplacian in scale

Feature Detection

1. Harris Corner Detector Algorithm
2. SIFT

Harris Corner Response Function
R > 0, λ1, λ2 large

1. Compute horizontal and vertical derivatives of the image (convolve with derivative of Gaussians)
2. Compute outer products of gradients M
3. Convolve with larger Gaussian
4. Compute scalar interest measure R
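
The four steps can be sketched in plain NumPy. This version makes two simplifications that are assumptions of the example, not part of the notes: `np.gradient` replaces the derivative-of-Gaussian filters, and a 3x3 box window replaces the larger Gaussian. It uses the common response measure R = det(M) - k trace(M)^2 with k = 0.04.

```python
import numpy as np

def box_sum(a, r=1):
    """Sum of a over a (2r+1)x(2r+1) window around each pixel (zero-padded)."""
    padded = np.pad(a, r, mode="constant")
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, k=0.04, r=1):
    """Corner response R = det(M) - k * trace(M)^2 at every pixel."""
    Iy, Ix = np.gradient(img.astype(float))   # vertical and horizontal derivatives
    Sxx = box_sum(Ix * Ix, r)                 # windowed entries of the matrix M
    Syy = box_sum(Iy * Iy, r)
    Sxy = box_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0                         # white square on a black background
R = harris_response(img)
print(R[5, 5], R[5, 10], R[10, 10])           # corner, edge midpoint, flat interior
```

On this test image R is positive at the corner of the square, negative along the edge, and zero in the flat interior, matching the usual interpretation of the response.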

Harris Detector Algorithm(Preview)
-compute Gaussian derivatives at each pixel
-compute second moment matrix M in a Gaussian window around each pixel
-compute corner response function R
-Threshold R
-Find local maxima of response function(non-maximum suppression)

"""Harris Corner Detection"""

import numpy as np
import cv2

def find_corners(img):
    """Find corners in an image using Harris corner detection method."""

    # Convert to grayscale, if necessary
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if len(img.shape) == 3 else img

    # Compute Harris response (block size 2, Sobel aperture 3, k = 0.04)
    h_response = cv2.cornerHarris(img_gray, 2, 3, 0.04)

    # Display the Harris response, scaled to 0..255
    h_min, h_max, _, _ = cv2.minMaxLoc(h_response)
    cv2.imshow("Harris response", np.uint8((h_response - h_min) * (255.0 / (h_max - h_min))))

    # Keep only responses above 1% of the maximum; everything else becomes zero
    h_thresh = 0.01 * h_max
    _, h_selected = cv2.threshold(h_response, h_thresh, 1, cv2.THRESH_TOZERO)

    # Non-maximum suppression over a square neighborhood
    nhood_size = 5
    nhood_r = nhood_size // 2
    corners = []
    for y in range(h_selected.shape[0]):
        for x in range(h_selected.shape[1]):
            h_value = h_selected.item(y, x)
            if h_value <= 0:
                continue  # below threshold, not a candidate
            # Clamp the window at the image border so the slice is never empty
            y0, x0 = max(y - nhood_r, 0), max(x - nhood_r, 0)
            nhood = h_selected[y0:(y + nhood_r + 1), x0:(x + nhood_r + 1)]
            if h_value == np.amax(nhood):
                corners.append((x, y, h_value))
                nhood[:] = 0  # clear the neighborhood (nhood is a view into h_selected)
                h_selected[y, x] = h_value  # retain the maximum itself

    # Display the suppressed response once all pixels have been visited
    h_scaled = (h_selected - h_thresh) * (255.0 / (h_max - h_thresh))
    h_suppressed = np.uint8(np.clip(h_scaled, 0, 255))
    cv2.imshow("Suppressed Harris response", h_suppressed)
    return corners

def test():
    """Test find_corners() with sample input."""

    # Read image
    img = cv2.imread("octagon.png")
    cv2.imshow("image", img)

    corners = find_corners(img)
    print("\n".join("{} {}".format(corner[0], corner[1]) for corner in corners))

    # Mark each corner with a red dot inside a green circle
    img_out = img.copy()
    for (x, y, resp) in corners:
        cv2.circle(img_out, (x, y), 1, (0, 0, 255), -1)
        cv2.circle(img_out, (x, y), 5, (0, 255, 0), 1)
    cv2.imshow("Output", img_out)
    cv2.waitKey(0)  # keep the windows open until a key is pressed

if __name__ == "__main__":
    test()