Starting today, 2019-11-01, I am going to study the CMU 15-440 Distributed Systems course on my own. I have prepared for the challenges mentally and physically. The principle is to be curious about new technologies and learn something new. Most importantly, have fun!
Resources
Course website: https://www.synergylabs.org/courses/15-440/syllabus.html
Day 0 Intro
What is a distributed system?
A collection of independent computers that appears to its users as a single coherent system.
Characteristics of a DS?
- Presents a single-system image
- Easily expandable
- Continuous availability
- Supported by middleware
Example: Domain Name System (DNS)
Decentralized - admins update their own domains without coordinating with other domains.
Scalable - used for hundreds of millions of domains
Robust - handles load and failures well
Day 1 Network
Basic idea
Network is built on top of many links.
Network resources need to be shared. In a switched network, party “A” gets the resources some of the time and party “B” gets them at other times. So what is the mechanism for sharing resources? The answer is packet switching: the source sends information as self-contained packets, each carrying an address, and each packet travels independently to the destination host.
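To make packet switching concrete for myself, here is a minimal Go sketch. The Packet type, its fields, and the Packetize/Reassemble helpers are names I made up for illustration, not anything from the course. It assumes no packet is lost or duplicated and only shows that self-addressed packets can arrive in any order and still be put back together.

```go
package main

import "fmt"

// Packet is a hypothetical self-contained unit: it carries its own
// destination address plus a sequence number for reassembly.
type Packet struct {
	Dst     string
	Seq     int
	Payload []byte
}

// Packetize splits msg into packets of at most size bytes each.
func Packetize(dst string, msg []byte, size int) []Packet {
	if size <= 0 {
		return nil
	}
	var pkts []Packet
	for seq := 0; len(msg) > 0; seq++ {
		n := size
		if len(msg) < n {
			n = len(msg)
		}
		pkts = append(pkts, Packet{Dst: dst, Seq: seq, Payload: msg[:n]})
		msg = msg[n:]
	}
	return pkts
}

// Reassemble rebuilds the message regardless of arrival order,
// assuming every packet arrived exactly once.
func Reassemble(pkts []Packet) []byte {
	ordered := make([][]byte, len(pkts))
	for _, p := range pkts {
		ordered[p.Seq] = p.Payload
	}
	var out []byte
	for _, b := range ordered {
		out = append(out, b...)
	}
	return out
}

func main() {
	pkts := Packetize("10.0.0.2", []byte("hello, distributed world"), 8)
	// Simulate independent travel: two packets swap places on the way.
	pkts[0], pkts[2] = pkts[2], pkts[0]
	fmt.Printf("%d packets, reassembled: %s\n", len(pkts), Reassemble(pkts))
}
```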
What if Network is Overloaded?
Solution: Buffering and Congestion Control
(TODO) Figure out how congestion control works. What if the buffer overflows? Packets are dropped.
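A small Go sketch of the buffer-overflow case, assuming the simplest policy I can think of (drop the newest packet when the buffer is full). The Router type and its fields are hypothetical, and this does not model congestion control at all; it only shows where the drops come from.

```go
package main

import "fmt"

// Router is a hypothetical output port with a fixed-size packet buffer.
type Router struct {
	capacity int
	queue    []int // buffered packet IDs
	Dropped  int   // packets discarded because the buffer was full
}

// NewRouter creates a router whose buffer holds at most capacity packets.
func NewRouter(capacity int) *Router {
	return &Router{capacity: capacity}
}

// Enqueue buffers the packet if there is room; otherwise it drops it.
func (r *Router) Enqueue(pkt int) bool {
	if len(r.queue) >= r.capacity {
		r.Dropped++
		return false
	}
	r.queue = append(r.queue, pkt)
	return true
}

func main() {
	r := NewRouter(4)
	// A burst of 10 packets arrives while nothing drains the buffer.
	for pkt := 0; pkt < 10; pkt++ {
		r.Enqueue(pkt)
	}
	fmt.Printf("buffered=%d dropped=%d\n", len(r.queue), r.Dropped)
}
```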
Model of Communication channel
- Latency. How long does it take for the first bit to reach the destination?
- Capacity. How many bits/sec can we push through?
- Jitter. How much variation is there in latency?
- Loss/Reliability. Can the channel drop packets?
- Packet reordering. Can packets arrive out of order?
Back-of-the-Envelope Bandwidth Calculation
- Cross country latency
Distance/speed = 5*10^6 m / 2*10^8 m/s = 25*10^-3 s = 25ms. That gives a 50ms round-trip time (RTT) for one bit.
- Link speed (capacity) 100Mbps
- Packet size = 1250 bytes = 10kbits
- 1 packet takes 10k/100M = 0.1ms to transmit and 25ms to get there. ACKs are small, so about 0ms to transmit and 25ms to get back.
- Effective bandwidth = 10kbits/50.1ms ≈ 200kbits/sec
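The arithmetic above as a small Go sketch. The function name and parameters are mine, and it assumes the sender transmits one packet and then waits for its ACK before sending the next, which is what the 50.1ms figure implies.

```go
package main

import "fmt"

// EffectiveBandwidth returns bits/sec for a sender that transmits one
// packet and then waits for its ACK before sending the next one.
// The ACK's own transmit time is treated as 0, as in the notes.
func EffectiveBandwidth(distanceM, speedMps, linkBps, packetBits float64) float64 {
	oneWay := distanceM / speedMps   // propagation delay in seconds
	transmit := packetBits / linkBps // time to push the packet onto the link
	total := transmit + 2*oneWay     // send the packet, then wait for the ACK
	return packetBits / total
}

func main() {
	// 5*10^6 m cross country, 2*10^8 m/s, 100Mbps link, 10kbit packet.
	bw := EffectiveBandwidth(5e6, 2e8, 100e6, 10e3)
	fmt.Printf("effective bandwidth: %.1f kbits/sec\n", bw/1e3)
}
```

So a 100Mbps link delivers only about 200kbits/sec when latency dominates and just one packet is in flight at a time.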
Internet
An inter-net: a network of networks.
Networks are connected using routers that support communication in a hierarchical fashion. Other special devices are often needed at the boundaries for security and accounting.
How to get an IP address: from the Regional Internet Registries (RIRs): ARIN (North America), APNIC (Asia-Pacific), RIPE NCC (Europe, Middle East, Central Asia), LACNIC (Latin America and the Caribbean), AFRINIC (Africa).
Classical Synchronization
Concurrency
Concurrency is key in DS. Today, we start with threads on a single node. We will extend them to multiple machines.
The problem is races on shared state. The solution is mutual exclusion: guarantee that only a single thread/process is inside a critical section (CS) at a time, avoiding races.
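A minimal Go sketch of mutual exclusion using sync.Mutex from the standard library. The Counter type and RunWorkers helper are my own illustration: many goroutines increment a shared counter, and the lock makes sure only one of them is in the critical section at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter guards its value with a mutex so that only one goroutine
// is inside the critical section at a time.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc increments the counter inside the critical section.
func (c *Counter) Inc() {
	c.mu.Lock()
	c.n++ // critical section
	c.mu.Unlock()
}

// Value reads the counter under the same lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// RunWorkers starts `workers` goroutines that each increment a shared
// counter `perWorker` times, then returns the final value.
func RunWorkers(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(RunWorkers(100, 1000))
}
```

Without the Lock/Unlock pair, the increments could interleave and the final count could come out lower than workers*perWorker.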
Concurrency model for Go
- Set up a mini client/server within programs
- Channels are used for passing information around, synchronizing goroutines, and providing a pointer to a return location.
- Goroutines are independently executing functions, launched by “go”. Each has an independent call stack and is very inexpensive, so running 1000s of them is practical.
- Concept: Instead of communicating by sharing memory, share memory by communicating.
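The points above can be sketched as a mini client/server inside one program. This is my own minimal example, not the course's code: the Request type, Server, and Add are hypothetical names. Each request carries its own reply channel, which is how I read "pointer to return location".

```go
package main

import "fmt"

// Request carries its own reply channel, so the server knows
// where to send the answer back.
type Request struct {
	A, B  int
	Reply chan int
}

// Server handles requests until the request channel is closed.
// Clients never touch its state directly; they only send messages.
func Server(reqs <-chan Request) {
	for r := range reqs {
		r.Reply <- r.A + r.B
	}
}

// Add is a client-side helper: send a request, then wait for the answer.
func Add(reqs chan<- Request, a, b int) int {
	reply := make(chan int)
	reqs <- Request{A: a, B: b, Reply: reply}
	return <-reply
}

func main() {
	reqs := make(chan Request)
	go Server(reqs) // launch the server as a goroutine
	fmt.Println(Add(reqs, 2, 3))
	close(reqs) // lets the server loop exit
}
```

No locks appear here: the client and server share data only by sending it over channels.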
What is Concurrency
Concurrency is the composition of independently executing computations.
Concurrency vs Parallelism
Concurrency is not parallelism, although it enables parallelism. With 1 processor, a program can still be concurrent but not parallel.
A well-written concurrent program may run well on a multiprocessor platform.
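A small Go sketch of this distinction, under my own assumptions. SumConcurrently is a made-up helper that splits a sum across goroutines. runtime.GOMAXPROCS limits how many OS threads run Go code at once, so setting it to 1 keeps the program concurrent but not parallel. The result is the same either way.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// SumConcurrently splits nums into at most `parts` chunks (parts must
// be positive) and sums each chunk in its own goroutine. The structure
// is concurrent no matter how many processors actually execute it.
func SumConcurrently(nums []int, parts int) int {
	results := make(chan int, parts)
	var wg sync.WaitGroup
	chunk := (len(nums) + parts - 1) / parts
	for start := 0; start < len(nums); start += chunk {
		end := start + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(part []int) {
			defer wg.Done()
			s := 0
			for _, v := range part {
				s += v
			}
			results <- s
		}(nums[start:end])
	}
	wg.Wait()
	close(results)
	total := 0
	for s := range results {
		total += s
	}
	return total
}

func main() {
	nums := make([]int, 1000)
	for i := range nums {
		nums[i] = i + 1
	}
	// Same concurrent program, first on one processor, then on all of them.
	for _, procs := range []int{1, runtime.NumCPU()} {
		runtime.GOMAXPROCS(procs)
		fmt.Println(procs, SumConcurrently(nums, 4))
	}
}
```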
Go channels have some limitations: a channel's buffer size is bounded (it is fixed when the channel is created).
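A quick Go sketch of the bounded size. TrySend is my own helper name. It uses select with a default case, so a send that would block on a full buffer fails immediately instead of waiting.

```go
package main

import "fmt"

// TrySend attempts a non-blocking send and reports whether it succeeded.
func TrySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default: // the send would block, e.g. because the buffer is full
		return false
	}
}

func main() {
	ch := make(chan int, 2) // capacity is fixed at creation
	fmt.Println(TrySend(ch, 1), TrySend(ch, 2), TrySend(ch, 3))
	fmt.Println(len(ch), cap(ch))
	<-ch // receiving frees one slot
	fmt.Println(TrySend(ch, 3))
}
```

A plain `ch <- v` on the full channel would block the sending goroutine until a receiver makes room.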