An Availability Set is a logical grouping of VMs that provides redundancy and availability
“Availability Set” refers to two or more Virtual Machines deployed across different Fault Domains to avoid a single point of failure.
There is no cost for the Availability Set itself; you pay only for each VM instance that you add to it
When two or more VMs are provisioned within an Availability Set, the SLA is 99.95%
Concepts
Fault Domain:
A collection of servers that share common resources such as power and network connectivity
In practice, a fault domain is a rack of servers that share subsystems such as network, power, and cooling
Update Domain:
A group of VMs and underlying hardware that can be rebooted together
Think of an update domain as “a server” (hardware + hardware manager + service fabric + VMs) in a rack
For planned maintenance (a hardware- or software-level patch/update), update domains are shut down one at a time
Each virtual machine in the Availability Set is assigned an Update domain and Fault domain by the Azure platform
VMs in an Availability Set are automatically distributed across multiple fault domains
You can’t add an existing VM to an Availability Set after the VM has been created; a VM can only be placed in an Availability Set when it is provisioned. So, plan accordingly from the beginning
Availability Sets ensure that the Azure virtual machines are deployed across multiple isolated hardware nodes in a cluster
An Availability Set prevents a single point of failure
If a server in a rack (update domain) goes down or a whole rack (fault domain) goes down, only the VMs on that hardware are affected; the rest keep running because the Availability Set spreads VMs across “multiple fault domains” (“multiple racks”)
For planned maintenance (i.e. patching by Microsoft), update domains (servers) are rebooted one at a time, and since VMs are spread across multiple update domains, only a subset of VMs is down at any moment
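The spreading described above can be illustrated with a small simulation. This is only a sketch: the round-robin rule, the domain counts (3 fault domains, 5 update domains), and the names place_vms and survivors are assumptions made for illustration, not the Azure platform's actual placement logic or API.

```python
# Assumed domain counts for illustration; real values depend on region and configuration.
FAULT_DOMAINS = 3
UPDATE_DOMAINS = 5


def place_vms(vm_names, fault_domains=FAULT_DOMAINS, update_domains=UPDATE_DOMAINS):
    """Hypothetical round-robin placement: VM i -> (i % fault_domains, i % update_domains)."""
    return {
        name: (index % fault_domains, index % update_domains)
        for index, name in enumerate(vm_names)
    }


def survivors(placement, failed_fault_domain=None, rebooting_update_domain=None):
    """Return the sorted names of VMs that are outside the affected domain."""
    return sorted(
        name
        for name, (fd, ud) in placement.items()
        if fd != failed_fault_domain and ud != rebooting_update_domain
    )


placement = place_vms(["vm0", "vm1", "vm2", "vm3", "vm4", "vm5"])
for name, (fd, ud) in placement.items():
    print(f"{name}: fault domain {fd}, update domain {ud}")

# A whole rack (fault domain 0) fails: only the VMs placed there go down.
print("rack failure, still running:", survivors(placement, failed_fault_domain=0))

# Planned maintenance reboots update domain 1: only the VMs placed there go down.
print("maintenance, still running:", survivors(placement, rebooting_update_domain=1))
```

With six VMs under these assumptions, losing fault domain 0 takes down vm0 and vm3, and rebooting update domain 1 takes down only vm1; in both cases most of the set stays up.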
SLA
An Availability Set with two or more VMs gives a 99.95% SLA
The reason the SLA is not five 9’s (99.999%) is that an Availability Set does not protect your VMs against a datacenter-level or region-level failure
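To put these percentages in perspective, the arithmetic below converts an SLA figure into the downtime it still permits. It is a rough sketch that assumes a 30-day month and a 365-day year; the function name allowed_downtime_minutes is made up for this example.

```python
HOURS_PER_MONTH = 30 * 24   # assumed 30-day month
HOURS_PER_YEAR = 365 * 24   # assumed non-leap year


def allowed_downtime_minutes(sla_percent, period_hours):
    """Minutes of downtime that still satisfy the given SLA percentage over the period."""
    return (100 - sla_percent) / 100 * period_hours * 60


for sla in (99.95, 99.999):
    per_month = allowed_downtime_minutes(sla, HOURS_PER_MONTH)
    per_year = allowed_downtime_minutes(sla, HOURS_PER_YEAR)
    print(f"{sla}%: {per_month:.2f} min/month, {per_year:.2f} min/year")
```

Under these assumptions, 99.95% allows about 21.6 minutes of downtime per month (about 4.4 hours per year), while five 9’s would allow only about 5.3 minutes per year.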