SAP HANA High Availability - Introduction



When we talk about high availability, most of us think only of HANA system replication.
That view is incomplete, especially in the case of SAP HANA. To see why, let's look at the concept of high availability.
High availability is a method, or a set of techniques, that allows us to achieve business continuity. In practice, this means the system must remain available so that business operations can continue without interruption.

SAP HANA is designed to support high availability. It helps protect our system against faults, software errors and disasters.

The following high-level concepts help us achieve high availability in SAP HANA:

  • Fault tolerance - The ability of a system to suffer a fault but continue to operate. It is achieved by eliminating single points of failure.
  • Fault resilience - The ability to recover quickly from an outage.
  • Fault recovery - The process of recovering and resuming operations after an outage caused by a fault.
  • Disaster recovery - The process of recovering operations after an outage caused by a prolonged data center or site failure.

Note: Fault tolerance and high availability complement each other in the field of IT resilience, so you may find differing definitions elsewhere. In this blog, we use the definitions given above.

SAP HANA supports the following defense mechanisms against failure-related outages:

  • Hardware Redundancy - SAP HANA appliance vendors offer multiple layers of redundant hardware, software and network components, such as redundant power supplies and fans, enterprise-grade error-correcting (ECC) memory, fully redundant network switches and routers, and uninterruptible power supplies (UPS). Disk storage systems use batteries to guarantee that writes complete even during a power failure, and use striping and mirroring to provide redundancy and automatic recovery from disk failures. Generally, these redundancy solutions are transparent to SAP HANA's operation, but they form part of the defense against system outages caused by single component failures.
  • Software - SAP HANA runs on SUSE Linux Enterprise Server for SAP Applications (version 11 in early HANA releases; later releases also support Red Hat Enterprise Linux for SAP Solutions), which includes security pre-configurations such as minimal network services. In addition, the SAP HANA system software includes a watchdog function that automatically restarts configured services (index server, name server, and so on) when it detects that one of them has stopped, for example because it was killed or crashed.
  • Persistence - SAP HANA persists transaction logs, savepoints and snapshots to support system restart and recovery from host failures, with minimal delay and without loss of data.
  • Standby and Failover - Separate, dedicated standby hosts are used for failover if an active (primary) host fails. This improves availability by significantly reducing the time needed to recover from an outage.
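To make the persistence idea more concrete, here is a minimal toy sketch of how savepoints and a transaction (redo) log work together so that a restart loses no committed data. It is a hypothetical illustration, not SAP HANA's implementation: the class `ToyDatabase` and its methods are invented for this example, and the "persistent" savepoint and log are simply attributes that survive the simulated `crash()`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical model of savepoint + redo-log recovery (not SAP HANA code)."""

    memory: dict = field(default_factory=dict)     # volatile in-memory data
    savepoint: dict = field(default_factory=dict)  # last persisted copy of the data (assumed durable)
    redo_log: list = field(default_factory=list)   # committed changes since the last savepoint (assumed durable)

    def commit(self, key, value):
        # Write the change to the log before applying it in memory.
        self.redo_log.append((key, value))
        self.memory[key] = value

    def write_savepoint(self):
        # Persist a consistent copy of the data; log entries before it are no longer needed.
        self.savepoint = dict(self.memory)
        self.redo_log.clear()

    def crash(self):
        # A crash loses everything that was held only in memory.
        self.memory = {}

    def restart(self):
        # Recovery: load the last savepoint, then replay the log written after it.
        self.memory = dict(self.savepoint)
        for key, value in self.redo_log:
            self.memory[key] = value


db = ToyDatabase()
db.commit("order-1", "open")
db.write_savepoint()
db.commit("order-1", "shipped")
db.commit("order-2", "open")
db.crash()
db.restart()
print(db.memory)  # {'order-1': 'shipped', 'order-2': 'open'}
```

The savepoint alone would only restore `order-1` as `open`; replaying the log on top of it brings back both changes committed after the savepoint, which is why the two are persisted together.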

SAP HANA supports the following measures for recovering from failures:

  • Disaster recovery support:
    • Backups: Periodic saving of database copies in a safe place.
    • Storage replication: Continuous replication (mirroring) between primary storage and backup storage over a network (can be synchronous).
    • System replication: Continuous update of secondary systems by primary system, including in-memory table loading.
  • Fault recovery support:
    • Service auto-restart: Automatic restart of stopped services on host (watchdog).
    • Host auto-failover: Automatic failover from a crashed host to a standby host in the same system.
    • System replication: Continuous update of secondary systems by primary system, including in-memory table loading and read-only access on the secondary.
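Host auto-failover can be pictured with a small toy model: when an active (worker) host stops responding, a free standby host in the same system takes over its role and services. This is a hypothetical sketch only; `Host`, `ToyCluster`, the host names and the role strings are invented for illustration. A real host auto-failover also has to hand over access to the failed host's data and log volumes, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    role: str             # hypothetical roles: "worker" (active), "standby" or "failed"
    alive: bool = True
    services: tuple = ()  # services currently assigned to this host


class ToyCluster:
    """Hypothetical model of host auto-failover within one system (not SAP HANA code)."""

    def __init__(self, hosts):
        self.hosts = {h.name: h for h in hosts}

    def fail(self, name):
        # Simulate a host crash.
        self.hosts[name].alive = False

    def check_and_failover(self):
        """Move the services of every dead worker host to a live standby host."""
        events = []
        for host in list(self.hosts.values()):
            if host.role != "worker" or host.alive:
                continue
            standby = next(
                (h for h in self.hosts.values() if h.role == "standby" and h.alive),
                None,
            )
            if standby is None:
                events.append(f"no standby available for {host.name}")
                continue
            # The standby takes over the failed host's role and services.
            standby.role, standby.services = "worker", host.services
            host.role, host.services = "failed", ()
            events.append(f"{host.name} -> {standby.name}")
        return events


cluster = ToyCluster([
    Host("hana01", "worker", services=("indexserver",)),
    Host("hana02", "worker", services=("indexserver",)),
    Host("hana03", "standby"),
])
cluster.fail("hana02")
events = cluster.check_and_failover()
print(events)                            # ['hana02 -> hana03']
print(cluster.hosts["hana03"].services)  # ('indexserver',)
```

Because the standby is idle until needed, the recovery time is roughly the time to detect the failure and start the services on the standby, rather than the time to repair the crashed host.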

In our next blog, we'll cover HANA system replication in detail.
