OS structuring

SPIN and Exokernel approaches to OS extensibility
L3: Microkernel-based approach to OS extensibility

OS service examples
process management, memory management, interprocess communication (IPC), file system, access to I/O devices, access to the network

Why is the structure of the OS important?
-protection, performance, flexibility, scalability, agility, responsiveness

Protection: within and across users + the OS itself
Performance: time taken to perform the services
Flexibility: extensibility => not one size fits all
Scalability: performance scales up as hardware resources increase
Agility: adapting to changes in application needs and/or resource availability
Responsiveness: reacting to external events

Commercial OS
– Linux, MacOS, Windows

Managing the CPU and Memory

OS abstraction

Resource needs of applications
– CPU, memory, peripheral devices
At app launch time
– the OS loader allocates memory for the app: stack, heap, global data
App asks for additional resources at runtime

Processor related OS abstractions
– program => static image loaded into memory
– process => a program in execution
Process = program + state => evolves as the program executes
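A rough sketch (not from the lecture) of what "program + state" could look like as a kernel data structure; all field names here are made up for illustration:

/* Illustrative only: a toy process control block capturing "program + state". */
#include <stdint.h>

typedef enum { READY, RUNNING, BLOCKED } proc_state_t;

typedef struct pcb {
    int          pid;        /* process identifier */
    proc_state_t state;      /* scheduling state, evolves as the program executes */
    uintptr_t    pc;         /* saved program counter */
    uintptr_t    sp;         /* saved stack pointer */
    uintptr_t    code_base;  /* static program image loaded into memory */
    uintptr_t    heap_brk;   /* current end of heap (grows at runtime) */
    struct pcb  *next;       /* ready-queue link */
} pcb_t;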

Advanced Operating Systems

variety of platforms — cell phones, multi-core, parallel systems, distributed systems, and cloud computing.

Digging deeper into the power of abstraction
Google Earth
<-> series of abstractions <->
Electrons, transistors, logic gates, sequential + combinational logic elements, machine organization (datapath + control), instruction set architecture, system software (OS, compilers, etc.), applications

Hardware continuum
smartphone, tablet, laptop, desktop, server, cloud

Internal organization
– same for all manifestations
– CPU, memory, bus, device controllers, network

System bus (higher speed) and I/O bus (lower speed) connected by a bridge; frame buffer on the system bus

OS is a resource manager
OS provides a consistent interface to the hardware resources
OS schedules applications on the cpu

OS provides protected access to hardware resources
– arbitrates among competing requests

Cloud Deployment Models

Public
-third party customers/tenants
Private
-leverage technology internally
Hybrid(Public + Private)
-failover, dealing with spikes, testing
Community
-used by certain type of users

On-premises
Infrastructure (IaaS)
Platform (PaaS)
Software (SaaS)

1. “fungible” resources
2. elastic, dynamic resource allocations
3. scale: management at scale, scalable resources
4. dealing with failures
5. multi-tenancy: performance & isolation
6. security

Cloud-enabling Technologies
-virtualization
-Resource provisioning (scheduling): Mesos, YARN…

Storage
-distributed FS(“append only”)
-NoSQL, distributed in-memory caches…

Software-defined… networking, storage, datacenters…

“the cloud as a big data engine”
-data storage layer
-data processing layer
-caching layer
-language front-ends

Datacenter Technologies

Internet service == any type of service provided via web interface

-presentation == static content
-business logic == dynamic content
-database tier == data store

-not necessarily separate processes on separate machines
-many available open source and proprietary technologies

…in multi process configurations ->
some form of IPC used, including RPC/RMI, shared memory …
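A minimal sketch of one such IPC mechanism, POSIX shared memory, on the side that creates the segment; the segment name and message are made up:

/* Sketch: one process creates a shared-memory segment that another process can map. */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const char *name = "/req_queue";                  /* hypothetical segment name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);  /* create/open the segment */
    if (fd < 0) return 1;
    ftruncate(fd, 4096);                              /* size the segment */
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) return 1;
    strcpy(buf, "GET /index.html");                   /* another process mapping "/req_queue" sees this */
    munmap(buf, 4096);
    close(fd);
    return 0;
}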

For scale: multi-process, multi-node
=> “scale out” architecture

1. “Boss-worker”: front-end distributes requests to nodes
2. “All Equal”: all nodes execute any possible step in request processing, for any request
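A minimal pthreads sketch of the "boss-worker" pattern, assuming integer request IDs and a fixed request count; the handler is just a printf:

/* Sketch: boss enqueues requests, worker threads dequeue and handle them. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define NREQS    16   /* total enqueued items, including one -1 "stop" pill per worker */

static int queue[NREQS];
static int head = 0, tail = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail)
            pthread_cond_wait(&nonempty, &lock);
        int req = queue[head++ % NREQS];
        pthread_mutex_unlock(&lock);
        if (req < 0) break;                           /* stop pill: shut down */
        printf("worker %ld handling request %d\n", id, req);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int r = 0; r < NREQS; r++) {                 /* boss: distribute requests */
        pthread_mutex_lock(&lock);
        queue[tail++ % NREQS] = (r < NREQS - NWORKERS) ? r : -1;
        pthread_cond_signal(&nonempty);
        pthread_mutex_unlock(&lock);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}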

Functionally heterogeneous…
-different nodes, different tasks/requests
-data doesn’t have to be uniformly accessible everywhere

Traditional Approach:
– buy and configure resources
=> determine capacity based on expected (peak) demand
– when demand exceeds capacity:
dropped requests
lost opportunity

– on-demand elastic resources and services
– fine-grained pricing based on usage
– professionally managed and hosted
– API-based access

shared resources
– infrastructure and software/services
APIs for access & configuration
– web-based, libraries, command line…

Law of large numbers
– per customer there is large variation in resource needs
– average across many customers is roughly constant
Economies of Scale
– unit cost of providing resources or service drops at “bulk”

Distributed Shared Memory

-must decide placement
place memory (pages) close to relevant processes
-must decide migration
when to copy memory(pages) from remote to local
-must decide sharing rules

Client
-send requests to file service

Caching
-improve performance (seen by client) and scalability

Servers
-own and manage state(files)
-provide service(file access)

Each node…
-“owns” state => memory
-provides service
memory reads/writes
from any node
consistency protocols

permits scaling beyond single machine memory limits
– more shared memory at lower cost
– slower overall memory access
– commodity interconnect technologies support this(RDMA)

Hardware vs Software DSM
hardware supported
– relies on interconnect
– os manages larger physical memory
– NICs translate remote memory accesses to messages
– NICs involved in all aspects of memory management, support atomics
Software-supported
– everything done by software
– os or language runtime

Application access algorithm
-single reader / single writer (SRSW)
-multiple readers / single writer (MRSW)
-multiple readers / multiple writers (MRMW)

Performance considerations
DSM performance metric == access latency
Achieving low latency through…
Migration
-makes sense for SRSW
-requires data movement
Replication (caching)
-more general
-requires consistency management

DSM Design: Consistency management
DSM ~ shared memory in SMPs
In SMPs:
– write-invalidate
– write-update

DSM Design: Consistency management
Push invalidations when data is written to…
Pull modification info periodically…

if MRMW…
– need local caches for performance
– home node drives coherence
– all nodes responsible for part of distributed memory management

“Home” node
– keeps state: pages accessed, modifications, caching enabled/disabled, locked…
– current “owner”
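A toy sketch of the bookkeeping a home node could do for write-invalidate; in a real DSM the invalidation would be a message to the remote node, here it is just a stub, and all names are invented:

/* Sketch: per-page metadata kept by the "home" node, plus write-invalidate handling. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 8

struct page_meta {
    int  owner;                /* node currently allowed to write */
    bool copyset[MAX_NODES];   /* which nodes hold a cached copy */
    bool locked;               /* set while a coherence action is in flight */
};

static void send_invalidate(int node, int page) {     /* stand-in for a real message */
    printf("-> invalidate page %d at node %d\n", page, node);
}

/* Before 'writer' modifies 'page', invalidate every other cached copy and pass ownership. */
static void handle_write_request(struct page_meta *m, int page, int writer) {
    m->locked = true;
    for (int n = 0; n < MAX_NODES; n++) {
        if (m->copyset[n] && n != writer) {
            send_invalidate(n, page);
            m->copyset[n] = false;
        }
    }
    m->owner = writer;
    m->copyset[writer] = true;
    m->locked = false;
}

int main(void) {
    struct page_meta m = { .owner = 0 };
    m.copyset[0] = m.copyset[2] = m.copyset[5] = true; /* nodes currently caching the page */
    handle_write_request(&m, 42, 2);                   /* node 2 wants to write page 42 */
    printf("new owner: node %d\n", m.owner);
    return 0;
}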

Consistency model == agreement between memory(state) and upper software layers

“memory behaves correctly if and only if software follows specific rules”

Replication vs. Partitioning

Replication == each machine holds all files
load balancing, availability, fault tolerance
writes become more complex
-> synchronously to all
-> or, write to one, then propagate to others
replicas must be reconciled
Partitioning == each machine has a subset of files
availability vs. single-server DFS
scalability with file system size
single-file writes are simpler
on failure, a portion of the data is lost; load balancing is harder; if not balanced, hot spots possible

NFSv3 == stateless, NFSv4 == stateful
Caching
session-based (non-concurrent)
periodic updates
– default: 3 sec for files; 30 sec for directories
NFSv4 => delegation to the client for a period of time (avoids ‘update checks’)
Locking
lease-based
NFSv4 => also “share reservation” – reader/writer lock
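A rough sketch of the periodic-update idea on the client side: trust the cached copy inside the freshness window (3 s for files, 30 s for directories), otherwise go back to the server. The struct and function are invented; a real NFS client does this in the kernel:

/* Sketch: client-side freshness check in the spirit of NFS periodic updates. */
#include <stdbool.h>
#include <time.h>

#define FILE_TTL_SEC 3    /* default re-validation interval for files */
#define DIR_TTL_SEC  30   /* default re-validation interval for directories */

struct cached_entry {
    bool   is_dir;
    time_t last_validated;   /* when attributes were last checked with the server */
};

/* True if the cached data may be used without contacting the server. */
static bool cache_is_fresh(const struct cached_entry *e, time_t now) {
    int ttl = e->is_dir ? DIR_TTL_SEC : FILE_TTL_SEC;
    return (now - e->last_validated) < ttl;
}

int main(void) {
    struct cached_entry e = { .is_dir = false, .last_validated = time(NULL) };
    return cache_is_fresh(&e, time(NULL)) ? 0 : 1;     /* fresh right after validation */
}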

Access Pattern (workload) analysis
-33% of all file accesses are writes

Distributed File Systems

DFS design and implementation
Network File System (NFS)
“Caching in the Sprite Network File System” by Nelson et al.

-Accessed via well-defined interface
– access via VFS
-Focus on consistent state
-mixed distribution models possible

client application machine
(file-system interface, vfs interface, local file system)
file server machine

Remote File Service: Extremes
Upload/Download
– like FTP, SVN…
local reads/ writes at client
entire file download/upload even for small access
True Remote File Access
– every access to remote file, nothing done locally

file access centralized, easy to reason about consistency
every file operation pays network cost

A More Practical Remote File Access (with Caching)
1. allow clients to store parts of files locally
low latency on file operations; server load reduced => more scalable
2. force clients to interact w/server (frequently)
server has insight into what clients are doing; server has control over which accesses can be permitted => easier to maintain consistency

however, the server is more complex and requires different file sharing semantics

Stateless vs. stateful file server

Caching State in a DFS (optimization)
-clients locally maintain a portion of the state (file blocks)
-clients locally perform operations on cached state (open/read/write)

File Caching (where?)
-in client memory
-on client storage device (HDD/SSD…)
-in buffer cache in memory on the server (usefulness depends on client load, request interleaving…)

File sharing semantics in DFS
UNIX semantics => every write visible immediately
Session semantics(between open-close => session)
– write back on close(), update on open()
– easy to reason, but may be insufficient

Periodic Updates
– client writes back periodically
– server invalidates periodically
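A toy illustration of session semantics, with a local struct standing in for the remote server copy; everything here is made up to show the fetch-on-open / write-back-on-close behavior:

/* Sketch: session semantics -- fetch whole file on open(), write back on close(). */
#include <stdio.h>
#include <string.h>

struct server_file { char data[256]; };                /* stand-in for the server's copy */
struct session     { char data[256]; struct server_file *srv; };

static void session_open(struct session *s, struct server_file *srv) {
    s->srv = srv;
    memcpy(s->data, srv->data, sizeof s->data);        /* fetch contents at open() */
}

static void session_write(struct session *s, const char *text) {
    strncpy(s->data, text, sizeof s->data - 1);        /* writes stay local to the session */
}

static void session_close(struct session *s) {
    memcpy(s->srv->data, s->data, sizeof s->data);     /* writes become visible at close() */
}

int main(void) {
    struct server_file f = { "old contents" };
    struct session a, b;
    session_open(&a, &f);
    session_open(&b, &f);
    session_write(&a, "new contents");
    printf("b still sees: %s\n", b.data);              /* a's write is not visible yet */
    session_close(&a);
    printf("server now has: %s\n", f.data);            /* visible after a closes */
    return 0;
}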

Java RMI

Java Remote Method Invocation (RMI)
– among address spaces in JVM(s)
– matches Java OO semantics
– IDL == Java (language-specific)

RMI Runtime
– Remote Reference Layer
unicast, broadcast, return-first-response, return-if-all-match
-Transport
TCP, UDP, shared memory

SunRPC Binding

CLIENT* clnt_create(char* host, unsigned long prog,
	unsigned long vers, char* proto);

// for the square example
CLIENT* clnt_handle;
clnt_handle = clnt_create(rpc_host_name, SQUARE_PROG, SQUARE_VERS, "tcp");

CLIENT type
– client handle
– status, error, authentication…
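Continuing the square example, the actual call goes through the stub that rpcgen generates from the interface (.x) file. The stub name square_proc_1 and the types square_in / square_out below are assumptions about how that file is written:

/* Sketch of the client-side call, assuming the rpcgen-generated header declares
 * square_in, square_out, and square_proc_1().                                  */
square_in  arg;
square_out *result;

arg.arg1 = 5;                                  /* hypothetical field: number to square */
result = square_proc_1(&arg, clnt_handle);     /* marshals arg, sends the request, waits */
if (result == NULL)
    clnt_perror(clnt_handle, "call failed");   /* report the RPC error on this handle */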

XDR Data Types
Default Types
-char, byte, int, float…
Additional XDR types
-const (#define), hyper (64-bit integer), quadruple (128-bit float), opaque (~ C byte)
– uninterpreted binary data

Fixed length array
– e.g., int data[80]
Variable length array
– e.g., int data<80> => translated into a data structure with “len” and “val” fields

except for strings
– string line<80> => C pointer to char
– stored in memory as a normal null-terminated string
– encoded (for transmission) as a pair of length and data
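For instance, a member declared as int data<80> in the interface file typically comes out of rpcgen looking roughly like this (the _len/_val field names are the usual convention; u_int comes from the rpc headers):

/* What "int data<80>;" inside a struct becomes after rpcgen, roughly: */
struct example {
    struct {
        u_int  data_len;   /* how many elements are actually present */
        int   *data_val;   /* pointer to the elements themselves */
    } data;
};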

XDR Routines
marshalling/unmarshalling
-found in square_xdr.c
Clean-up
-xdr_free()
-user-defined _freeresult procedure
-e.g., square_prog_1_freeresult
-called after the result is returned
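On the client side, freeing a decoded result might look like the fragment below; xdr_square_out and result are assumed to come from the square example's generated code:

/* Releasing memory the XDR layer allocated while decoding a result. */
xdr_free((xdrproc_t)xdr_square_out, (char *)result);

/* Server side: rpcgen also emits a helper for the same purpose, e.g.
 *   square_prog_1_freeresult(transp, (xdrproc_t)xdr_square_out, (caddr_t)result);  */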

RPC header
-service procedure ID, version number, request ID…
Actual data
– arguments or results
– encoded into a byte stream depending on data type

XDR IDL + the encoding
– i.e., the binary representation of data “on-the-wire”
XDR Encoding Rules
– all data types are encoded in multiples of 4 bytes
– big endian is the transmission standard
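To make the two rules concrete, a small sketch that hand-encodes a string the way XDR puts it on the wire: a 4-byte big-endian length word, then the bytes, padded with zeros to a multiple of 4:

/* Sketch: encode a string following the XDR rules (big-endian length, 4-byte alignment).
 * Returns the number of bytes written to out.                                            */
#include <stdint.h>
#include <string.h>

size_t xdr_encode_string(const char *s, uint8_t *out) {
    uint32_t len = (uint32_t)strlen(s);
    out[0] = (uint8_t)(len >> 24);           /* length, most significant byte first */
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)(len);
    memcpy(out + 4, s, len);                 /* the characters themselves */
    size_t padded = (len + 3) & ~(size_t)3;  /* round up to a multiple of 4 */
    memset(out + 4 + len, 0, padded - len);  /* zero-fill the padding */
    return 4 + padded;                       /* header word + padded data */
}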