Presentation on the topic: "EMC – MISSION-CRITICAL BUSINESS CONTINUITY FOR SAP". Presentation transcript:
1 EMC – MISSION-CRITICAL BUSINESS CONTINUITY FOR SAP
Note to Presenter: This presentation supports the solution described in the white paper: EMC Mission-Critical Business Continuity for SAP - EMC VPLEX, Symmetrix VMAX, VNX, VMware vSphere HA, Brocade Networking, Oracle RAC, SUSE Linux Enterprise. It is intended for SAP Basis Administrators, Oracle DBAs, storage administrators, IT architects, and technical managers responsible for designing, creating, and managing mission-critical SAP applications in 24/7 landscapes.
Welcome! Today we will discuss an EMC solution that addresses mission-critical business continuity for SAP applications. As you can see from the list of components, the solution combines technologies from EMC, VMware, Brocade, Oracle, and SUSE. The application platform for which the solution is designed is SAP ERP.
EMC VPLEX, EMC Symmetrix VMAX, EMC VNX, VMware vSphere HA, Brocade networking, Oracle RAC, SUSE Linux Enterprise
EMC Solutions Group
2 Overview
Solution overview and architecture; Solution components and configuration (EMC VPLEX Metro, VMware vSphere, SAP system architecture, Oracle database, Brocade networking, EMC storage); Testing and validation; Summary and conclusion
The organization of the presentation is straightforward: a general overview of the solution, including the challenges it addresses and how it addresses them; a review of the overall architecture of the solution; an introduction to each of the enabling technologies in turn, and how they are configured for the solution; a summary of the testing that EMC carried out to validate the solution; and a brief summary of the main points from the presentation and of the business benefits of the solution.
3 Mission-critical business continuity for SAP
Elimination of single points of failure at all layers of the environment; Introduction of active/active data centers with near-zero RPOs and RTOs
When designing a business continuity and high availability strategy, businesses must consider a range of challenges. Recovery point objectives (RPOs) and recovery time objectives (RTOs) are key metrics and answer two fundamental questions that businesses must address: How much data can we afford to lose (RPO)? How fast do we need the system or application to recover (RTO)? For mission-critical applications, minimizing RPO and RTO is a key challenge.
The other main challenges include: eliminating single points of failure (technology, people, processes); maximizing resource utilization; reducing infrastructure costs; and managing the complexity of integrating, maintaining, and testing multiple point solutions.
The solution we’re presenting here addresses all these challenges for SAP ERP applications. It demonstrates an innovative, active/active deployment model for data centers up to 100 km apart. This transforms the traditional active/passive disaster recovery model into a highly available business continuity solution, with 24/7 application availability, no single points of failure, and near-zero RTOs and RPOs. The solution is also fully automated.
The solution scenario consists of two geographically separate data centers, with SAP ERP running on VMware virtual machines and Oracle Database running on physical servers at the two sites. EMC VMAX and VNX arrays provide the physical storage for the environment, and EMC VPLEX Metro provides distributed storage federation across the two sites.
At a high level, the solution eliminates single points of failure at all layers in the environment, including storage, database, application, and network, and it provides active/active data centers that support near-zero RPOs and RTOs and mission-critical business continuity. Additional benefits are also identified on this slide. All of these benefits can also deliver reduced costs for the business.
Active/active data centers; Near-zero RPOs and RTOs; Uninterrupted application availability; No single point of failure; Simplified management for high availability; Fully automatic failure handling and load balancing; Maintenance without downtime; Simplified deployment of Oracle RAC on Extended Distance Clusters; Higher infrastructure utilization
4 Challenge and solution
Challenge: single points of failure in SAP; Solution: high availability and business continuity
SAP implementations – the challenge and the solution
Traditional SAP implementations have several single points of failure (SPOFs), including: Central Services, enqueue server, message server, database server, single-site deployment, and local disk storage.
The diagram on the left illustrates these single points of failure, and the diagram on the right illustrates the solution components that address them, though this is not an exact one-to-one mapping. For example, VMware vSphere virtualizes the SAP application components and eliminates these as single points of failure, and VPLEX Metro virtualizes the storage layer and enables an active/active data center distributed across two geographically separate sites.
Overall, the architecture and components of the solution create an active/active clustered solution for the entire SAP stack. This enhances reliability and availability while simplifying the deployment and management of the environment.
Note that, in this solution, the SAP enqueue and message servers are implemented as services within the ASCS instance.
5 Elimination of single points of failure
This slide illustrates the high-availability solutions implemented at each layer of the environment to provide mission-critical high availability.
a) Independent physical server with local storage
The EMC validation team initially installed and validated the environment without any high-availability or business continuity protection schemes. The SAP application and database components resided on independent physical servers with local storage. Each single point of failure was then mitigated by using fault-tolerant components and high-availability clustering technologies.
b) Storage layer HA
All the storage required by the servers in the environment was moved to enterprise-class EMC storage arrays: a Symmetrix VMAX at Site A and a VNX5700 at Site B. In addition, Brocade 8510 Backbones were deployed to provide a redundant SAN fabric for storage access. This takes advantage of the proven five 9s uptime provided by the arrays and the SAN Backbones, including their advanced manageability and business continuity features.
c) Database HA
At the database layer, the backend database server was converted from an Oracle single instance database to a four-node Oracle RAC database on Oracle ASM. This eliminates the database server as a single point of failure.
d) SAP application HA
The SAP application servers were fully virtualized using VMware ESXi 5.0. Each of the SAP virtual machines was deployed using SUSE Linux Enterprise Server for SAP Applications as the guest operating system. SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server (ERS) were also deployed to protect the SAP message server and enqueue server. This eliminates the ASCS as a single point of failure.
e) Data center HA
The high-availability cluster configuration implemented thus far protects SAP within the data center. For high availability between the two data centers, the solution uses EMC VPLEX Metro storage virtualization technology. VPLEX Metro’s unique active/active clustering technology allows read/write access to distributed volumes across synchronous distances, enabling users at both locations to access the same information at the same time. This solution combines VPLEX Metro with SUSE Linux Enterprise HAE (at the operating system layer) and Oracle RAC (at the database layer) to remove the data center as a single point of failure and provide a robust business continuity strategy for mission-critical applications.
f) Network HA
We will look at this in a later slide.
6 Solution components
Mission-critical business continuity for SAP ERP is delivered by a combination of technologies from EMC, VMware, Oracle, SUSE, and Brocade.
EMC VPLEX Metro; EMC VPLEX Witness; EMC Symmetrix VMAX and EMC VNX; Oracle RAC on Extended Distance Clusters; VMware vSphere; VMware vSphere High Availability; SUSE Linux Enterprise Server for SAP Applications with SUSE Linux Enterprise High Availability Extension; SAP Enqueue Replication Server; Brocade MLXe core routers; Brocade DCX 8510 Backbones
These are the main technologies used by the solution:
EMC VPLEX Metro is the primary enabling technology. It is a SAN-based storage federation solution that delivers both local and distributed storage federation. In the context of this solution, it is the technology that provides the virtual storage layer that enables an active/active Metro data center.
EMC VPLEX Witness is a high availability component that supports continuous application availability, even in the event of disruption at one of the data centers.
EMC VMAX and EMC VNX arrays provide the enterprise-class storage platform for the solution, with proven five 9s availability, Fully Automated Storage Tiering (FAST), and a choice of replication technologies.
The solution is designed for an SAP ERP system with SAP services on virtual machines and the database on physical servers.
Oracle Database 11g provides the database platform for the solution. A single instance database was migrated to Oracle RAC on Extended Distance Clusters to remove single points of failure at the database layer, across distance.
VMware vSphere virtualizes the SAP application components and eliminates these as single points of failure. VMware High Availability (HA) protects the virtual machines in the case of physical server and OS failures.
SUSE Linux Enterprise Server for SAP Applications, with SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server, protects the SAP central services across two cluster nodes to eliminate these services as single points of failure.
Brocade Ethernet fabrics and MLXe core routers provide seamless networking and Layer 2 extension between sites.
Brocade DCX 8510 Backbones provide redundant SAN infrastructure, including fabric extension.
7 Solution architecture
This diagram illustrates the physical architecture of all layers of the solution, including the network components.
In each data center, an Ethernet fabric was built using Brocade virtual cluster switch (VCS) technology, which delivers a self-healing and resilient access layer with all links forwarding. Virtual Link Aggregation Groups (vLAGs) connect the VCS fabrics to the Brocade MLXe core routers that extend the Layer 2 network across the two data centers.
VPLEX Witness
This diagram also shows the VPLEX Witness component that the solution uses to monitor connectivity between the two VPLEX clusters and to ensure continued availability in the event of an inter-cluster network partition failure or a cluster failure. VPLEX Witness is deployed on a virtual machine in a third, separate failure domain.
8 Data protection layers
Before we move on to see how each of the enabling technologies was configured for the solution, this slide briefly summarizes the HA layers that the solution uses to eliminate single points of failure.
The table at the center of the diagram summarizes the components deployed to provide local high availability.
VPLEX Metro then extends this local high availability with a clustering architecture that breaks the boundaries of the data center and allows servers at multiple data centers to have read/write access to shared block storage devices.
An even higher degree of resilience is then achieved by using VPLEX Witness and a VPLEX Cross-Cluster Connect configuration, both of which we will discuss later in the presentation.
9 VPLEX Metro – Introduction
Site A; VPLEX Cross-Cluster Connect; Site B; SAN-based storage federation; Active/active data centers; Approx. 100 km distance; Workload balancing; Near-zero RPO/RTO; Data center migration
Note to Presenter: This slide contains animation. The first click reveals the VPLEX Witness components; the second click reveals the Cross-Cluster Connect configuration.
VPLEX
EMC VPLEX is a storage virtualization solution for both EMC and non-EMC storage arrays. EMC offers VPLEX in three configurations: VPLEX Local, which enables storage virtualization within a data center; VPLEX Metro, which enables storage virtualization across synchronous distances; and VPLEX Geo, which enables storage virtualization across asynchronous distances.
VPLEX Metro
This solution is based on VPLEX Metro. VPLEX Metro uses a unique clustering architecture, called AccessAnywhere, to help customers break the boundaries of the data center by enabling the same data to exist in two separate geographical locations, and to be accessed and updated at both locations at the same time. The two data centers can be up to 100 km apart, or have a round-trip time of up to 5 ms. This architecture delivers active/active, block-level access to data on two sites within synchronous distances, and supports workload balancing, near-zero RPOs and RTOs, and non-disruptive data center migration.
Note to Presenter: Click now to display VPLEX Witness.
VPLEX High Availability
VPLEX Metro enables application and data mobility and, when configured with VPLEX Witness, provides a high-availability infrastructure for clustered applications such as Oracle RAC. VPLEX Witness is an optional external server that is installed as a virtual machine in a separate failure domain from the VPLEX clusters. It connects to both VPLEX clusters using a VPN over the management IP network. By reconciling its own observations with information reported periodically by the clusters, the Witness enables the clusters to distinguish between inter-cluster network partition failures and cluster failures and to automatically resume I/O at the appropriate site.
VPLEX Metro enables you to build an extended or stretch cluster as if it were a local cluster, and removes the data center as a single point of failure. Moreover, as the data and applications are active at both sites, the solution provides a simple business continuity strategy.
Note to Presenter: Click now to display VPLEX Cross-Cluster Connect.
An even higher degree of availability can be achieved by using a VPLEX Cross-Cluster Connect configuration. In this case, each host is connected to the VPLEX clusters at both sites. This ensures that, in the unlikely event of a full VPLEX cluster failure, the host has an alternate path to the remaining VPLEX cluster.
Site C; VPLEX Witness; VPLEX High Availability; VPLEX Cross-Cluster Connect; AccessAnywhere; Active; Active
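The Witness arbitration described in the notes above can be sketched conceptually in Python. This is an illustrative model only, not EMC's implementation: the function names, the boolean inputs, and the fallback to the preferred cluster when the Witness sees neither cluster are all assumptions made for the example.

```python
from enum import Enum


class Failure(Enum):
    NONE = "no failure"
    PARTITION = "inter-cluster network partition"
    CLUSTER_1_DOWN = "cluster-1 failure"
    CLUSTER_2_DOWN = "cluster-2 failure"
    UNKNOWN = "witness cannot see either cluster"


def classify(witness_sees_c1, witness_sees_c2, inter_cluster_link_up):
    """Combine the witness's own view with the link state the clusters report.

    All three inputs are hypothetical booleans invented for this sketch.
    """
    if inter_cluster_link_up:
        return Failure.NONE
    if witness_sees_c1 and witness_sees_c2:
        # Both clusters are alive but cannot reach each other.
        return Failure.PARTITION
    if witness_sees_c1:
        return Failure.CLUSTER_2_DOWN
    if witness_sees_c2:
        return Failure.CLUSTER_1_DOWN
    return Failure.UNKNOWN


def resume_io_at(failure, preferred="cluster-1"):
    """Return the set of clusters that keep serving I/O."""
    if failure is Failure.NONE:
        return {"cluster-1", "cluster-2"}
    if failure is Failure.CLUSTER_1_DOWN:
        return {"cluster-2"}
    if failure is Failure.CLUSTER_2_DOWN:
        return {"cluster-1"}
    # Partition, or no guidance from the witness: assume the preferred
    # cluster continues (assumption of this sketch).
    return {preferred}
```

The point of the third observer is visible in the `CLUSTER_1_DOWN` branch: without it, the surviving cluster could not tell a failed peer from a broken link.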
10 VPLEX Metro configuration
VPLEX logical structures: Consistency group; Virtual volume; Distributed device; Device; Extent; Storage volume
VPLEX encapsulates traditional physical storage array devices and applies layers of logical abstraction to these exported LUNs. This slide provides an overview of VPLEX logical storage structures and how these are configured for the solution.
Starting at the bottom of the storage structure hierarchy: A storage volume is a LUN exported from an array and encapsulated by VPLEX. An extent is the mechanism VPLEX uses to divide storage volumes and may use all or part of the capacity of the underlying storage volume. A device encapsulates an extent or combines multiple extents or other devices into one large device with a specific RAID type. For the solution, there is a one-to-one mapping between storage volumes, extents, and devices at each site. The devices encapsulated at Site A are virtually provisioned thin devices, while the devices encapsulated at Site B are traditional LUNs.
Next in the hierarchy are distributed devices. These encapsulate other devices from two separate VPLEX clusters. At the top layer of the storage structure are virtual volumes. These are created from a top-level device, which can be either a device or a distributed device. Virtual volumes are the elements that VPLEX exposes to hosts. To create distributed devices for the solution, all cluster-1 devices are mirrored remotely on cluster-2, in a distributed RAID 1 configuration. These distributed devices are encapsulated by virtual volumes, which are then presented to the hosts through storage views. The storage views define which hosts access which virtual volumes on which VPLEX ports.
Next are consistency groups, which aggregate virtual volumes so that the same properties can be applied to them all. VPLEX Metro uses synchronous (as opposed to asynchronous) consistency groups. With synchronous consistency groups, clusters can be separated by up to 5 ms of latency. In this case, VPLEX Metro sends writes to the back-end storage volumes, and acknowledges a write to the application only when the back-end storage volumes in both clusters acknowledge the write.
For the solution, a single consistency group contains all the virtual volumes that hold the Oracle database binaries, the ASM disk groups, and the OCR and voting files. A detach rule is defined for the consistency group to specify cluster-1 (or Site A) as the preferred cluster.
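The synchronous write rule described above (acknowledge to the application only when the back-end storage in both clusters has acknowledged) can be sketched as a toy Python model of a distributed RAID 1 device. The class names `Leg` and `DistributedDevice` are invented for illustration and do not correspond to any VPLEX API; detach handling after a failed leg is deliberately not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Leg:
    """One mirror leg: a back-end storage volume at one cluster."""
    cluster: str
    blocks: dict = field(default_factory=dict)
    online: bool = True

    def write(self, lba, data):
        # A leg acknowledges only if it is reachable and stored the block.
        if not self.online:
            return False
        self.blocks[lba] = data
        return True


@dataclass
class DistributedDevice:
    """Distributed RAID 1: every write goes to all legs."""
    legs: list

    def write(self, lba, data):
        # Send the write to every leg, then acknowledge to the host
        # only when all legs have acknowledged it.
        acks = [leg.write(lba, data) for leg in self.legs]
        return all(acks)
```

For example, with one leg at cluster-1 and one at cluster-2, `write()` returns `True` while both legs are online and `False` as soon as either leg is offline, which is where a real system would consult its detach rule.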
11 VMware virtualization components
This slide provides an overview of the virtualization platform for the solution.
As you can see, the SAP application servers are fully virtualized using VMware vSphere 5.0, and VPLEX Witness is also deployed on a virtual machine.
VMware vMotion and VMware Storage vMotion are implemented as part of the solution, as are VMware High Availability and the vSphere Distributed Resource Scheduler.
The last two items here are EMC plug-ins for vSphere: PowerPath/VE works as a multipathing plug-in that provides enhanced path management capabilities to ESXi hosts. VSI is a vSphere plug-in that provides a single management interface for managing EMC storage within the vSphere environment.
vSphere 5.0; vMotion; Storage vMotion; VMware HA; DRS (Distributed Resource Scheduler); EMC PowerPath/VE; EMC VSI (Virtual Storage Integrator)
12 VMware vSphere with VPLEX Metro
Cross-Cluster Connect
We’ll take a look now at VMware deployments on VPLEX Metro in general and then discuss the particular configuration used for this solution.
VPLEX Metro delivers concurrent access to the same set of devices at two physically separate locations. This provides an active/active infrastructure that enables geographically stretched clusters based on VMware vSphere. The use of Brocade vLAG technology enables extension of VLANs, and hence subnets, across the two physical data centers.
So what can we achieve by deploying vSphere features and components together with VPLEX Metro?
vMotion: Provides the ability to live migrate virtual machines between the two sites in anticipation of planned events such as hardware maintenance.
Storage vMotion: Provides the ability to migrate a virtual machine’s storage without any interruption in the availability of the virtual machine. This allows the relocation of live virtual machines to new datastores.
VMware DRS: Provides automatic load distribution and virtual machine placement across the two sites through the use of DRS groups and affinity rules.
VMware HA: VMware HA is a host failover clustering technology that leverages multiple ESXi hosts, configured as a cluster, to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines. It protects against server failure by restarting VMs on other ESXi servers within the cluster, and it protects against application failure by monitoring VMs and resetting them in the event of guest OS failure. Combining VPLEX Metro HA with VMware HA provides automatic application restart for any site-level disaster.
VPLEX Metro HA Cross-Cluster Connect: Protection of the VMware HA cluster can be further increased by adding a cross-cluster connect between the local VMware ESXi servers and the VPLEX cluster on the remote site, as shown in this slide. Cross-connecting vSphere environments to VPLEX clusters protects against local data unavailability events (which VMware vSphere 5.0 does not recognize) and ensures that failed virtual machines automatically move to the surviving site. This solution uses VPLEX Metro HA with Cross-Cluster Connect to maximize the availability of the VMware virtual machines.
Note: VPLEX Cross-Cluster Connect is available for up to 1 ms of distance-induced latency.
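The two latency limits quoted in this presentation (up to 5 ms round-trip time for VPLEX Metro, up to 1 ms of distance-induced latency for Cross-Cluster Connect) can be combined into a small planning check. This is a sketch only: it assumes both limits are compared against one measured latency figure, and the function name is hypothetical.

```python
# Limits as quoted in the speaker notes of this presentation.
METRO_MAX_RTT_MS = 5.0
CROSS_CONNECT_MAX_LATENCY_MS = 1.0


def supported_configurations(latency_ms):
    """List the configurations whose quoted latency limit is met.

    Assumption of this sketch: one measured inter-site latency value
    is compared against both limits.
    """
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    configs = []
    if latency_ms <= METRO_MAX_RTT_MS:
        configs.append("VPLEX Metro")
    if latency_ms <= CROSS_CONNECT_MAX_LATENCY_MS:
        configs.append("VPLEX Metro with Cross-Cluster Connect")
    return configs
```

A link measured at 3 ms would therefore qualify for VPLEX Metro but not for the cross-connected variant.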
13 VMware stretched cluster configuration
The screenshots in this slide illustrate the configuration of the VMware stretched cluster for the solution.
A single vSphere cluster is stretched between Site A and Site B by using a distributed VPLEX virtual volume with VMware HA and VMware DRS. There are four hosts in the cluster, two at each site. VPLEX Metro HA Cross-Cluster Connect provides increased resilience to the configuration.
The first screenshot is from vCenter and shows the configuration of the vSphere cluster, with its four hosts and with vSphere DRS and HA enabled.
The second screenshot shows the configuration of the datastore (EXT_SAP_VPLEX_DS01) created for the solution. This datastore was created on a 1 TB VPLEX distributed volume and presented to the ESXi hosts in the stretch cluster. All virtual machines were migrated to this datastore, using Storage vMotion, either because they needed to share virtual disks or because they needed to be able to vMotion between sites.
vCenter screenshots
14 VMware HA and DRS configuration
We’ve seen that both vSphere HA and DRS were enabled for the VMware stretched cluster. The first screenshot on this slide shows these options being configured for the cluster.
vSphere HA configuration
VM Monitoring was configured to restart individual virtual machines if their heartbeat is not received within 60 seconds.
The VM Restart Priority option for the four SAP VMs was set to High, as shown in the second screenshot. This ensures that these VMs are powered on first in the event of an outage. This screen also shows the Host Isolation Response setting, which was left at the default value of ‘Leave powered on’.
The next screenshot shows the datastores used for heartbeating. As vSphere HA requires at least two datastores to implement heartbeating, a second datastore was created on a 20 GB VPLEX distributed volume and presented to all the ESXi hosts.
vSphere DRS
The final screenshot shows the DRS rule configured for the solution. This is a VM-VM anti-affinity rule (rule type ‘Separate Virtual Machines’), which specifies that the ASCS (SAPASCS1) and ERS (SAPASCS2) virtual machines should always be kept on separate hosts.
HA restart priority for SAP VMs; HA and DRS enabled for the VMware stretched cluster; HA heartbeat datastores; DRS VM-VM affinity rule
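The two placement policies on this slide, keeping SAPASCS1 and SAPASCS2 on separate hosts and powering on High-priority VMs first, can be illustrated with a short Python sketch. This models the intent of the settings only; it is not the vSphere API, and the host names and the non-SAP VM name used in the example are hypothetical.

```python
# Restart priority levels, highest first.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def violates_separation(placement, group):
    """True if two or more VMs in `group` share a host.

    `placement` maps VM name -> host name (host names are hypothetical).
    """
    hosts = [placement[vm] for vm in group if vm in placement]
    return len(hosts) != len(set(hosts))


def restart_order(priorities):
    """Order VMs for power-on: higher priority first, then by name."""
    return sorted(priorities, key=lambda vm: (PRIORITY_RANK[priorities[vm]], vm))
```

For instance, `violates_separation({"SAPASCS1": "esxi-a1", "SAPASCS2": "esxi-a1"}, {"SAPASCS1", "SAPASCS2"})` returns `True`, which is the placement the separation rule exists to prevent.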
15 EMC Virtual Storage Integrator and VPLEX
EMC Virtual Storage Integrator (VSI) for VMware vSphere is a plug-in to the VMware vSphere client that provides a single management interface for managing EMC storage within the vSphere environment. It provides enhanced visibility into VPLEX directly from the vCenter GUI. The Storage Viewer and Path Management features are accessible through the EMC VSI tab.
In the solution, VPLEX distributed volumes host the EXT_SAP_VPLEX_DS01 VMFS datastore, and Storage Viewer provides details of the datastore’s virtual volumes, storage volumes, and paths. The screenshot here also shows that the LUNs that make up this datastore are four 256 GB distributed RAID 1 VPLEX Metro volumes that are accessible via PowerPath.
EMC VSI tab in the vCenter GUI
16 SAP system architecture
SAP application software: SAP Enhancement Package 4 for SAP ERP 6.0 IDES; SAP NetWeaver Application Server for ABAP 7.01; SAP Enqueue Replication Server
Operating system: SUSE Linux Enterprise Server (SLES) for SAP Applications 11 SP1; SUSE Linux Enterprise High Availability Extension
Virtualization: SAP services on VMware virtual machines; Oracle RAC database on physical servers
This slide illustrates the SAP system architecture for the solution.
The SAP application layer is based on SAP ERP 604 and SAP NetWeaver 7.01. The SAP ASCS instance, ERS instance, and Dialog Instances are virtualized on VMware ESXi servers.
Each of the SAP VMs is deployed using SUSE Linux Enterprise Server for SAP Applications as the guest operating system. In addition, SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server are deployed to protect the SAP message server and enqueue server.
17 SAP system architecture – design considerations
Enqueue and message servers decoupled from the Central Instance and implemented as services within the ASCS instance; ERS installed as part of the HA architecture to prevent the loss of application locks; Two dialog instances for redundant work processes such as dialog, background, update, and spool; ASCS instance installed with a virtual hostname to decouple it from the VM hostname; ERS instance installed with a different instance number to avoid confusion when both ASCS and ERS are under cluster control
The solution implements a high-availability SAP system architecture, with these features: The enqueue and message servers are decoupled from the Central Instance and implemented as services within the ASCS instance. SAP ERS is installed as part of the HA architecture to provide zero application lock loss and further protect the enqueue server. Two dialog instances are installed to provide redundant work processes such as dialog, background, update, and spool.
The SAP system deployed for the solution also implements several key design features: The ASCS instance is installed with a virtual hostname to decouple it from the VM hostname. The ERS instance is installed with a different instance number to avoid future confusion when both ASCS and ERS are under cluster control.
18 SAP system architecture – design considerations (cont.)
SAP update processes configured on additional application server instances; Instance profiles for ASCS, ERS, start, and dialog updated with ERS configurations; Shared SAP file systems stored on Oracle ACFS and mounted as NFS shares on the SAP VMs, presented as a highly available NFS resource managed by Oracle Clusterware; Storage for the entire SAP environment encapsulated, virtualized, distributed across two sites, and made available to the SAP servers through VPLEX Metro
The SAP system deployed for the solution also implements several key design features:
SAP update processes are configured on the additional application server instances.
The SAP ASCS, ERS, start, and dialog instance profiles are updated with ERS configurations.
SAP shared file systems are stored on Oracle ACFS and mounted as NFS shares on the SAP VMs. These shared file systems are presented as a highly available NFS resource that is managed by Oracle Clusterware.
The storage for the entire SAP environment is encapsulated and virtualized for the solution. The storage is distributed across the two sites and made available to the SAP servers through VPLEX Metro.
19 SUSE Linux Enterprise HAE configuration
SLES HAE protects the enqueue and message servers across two cluster nodes built on VMware VMs; VMware High Availability protects the VMs; Resource agents (virtual IP address, master/slave, and SAPInstance) monitor and control resource availability; The SAPInstance agent controls the ASCS and ERS instances and is configured as a master/slave resource so that ASCS and ERS are never started on the same node; A VMDK partition is used as the SBD STONITH device and configured with the multi-writer option so that multiple VMs have simultaneous write access
This slide shows how the solution uses SUSE Linux Enterprise High Availability Extension to protect the central services (message server and enqueue server) across two cluster nodes built on VMware virtual machines, with VMware HA protecting the virtual machines.
The key components of SUSE Linux Enterprise HAE that are implemented in this solution include: OpenAIS/Corosync, which acts as a high-availability cluster manager that supports multinode failover; resource agents that monitor and control the availability of resources (the resource agents implemented are Virtual IP address, master/slave, and SAPInstance); and a high-availability GUI and various command line tools.
The SUSE HAE system deployed for the solution also implements several key design features:
The SBD STONITH device for the solution uses a partition of a virtual disk. This means that both cluster nodes must have simultaneous access to this disk. The virtual disk is stored in the same datastore as the SAP virtual machines. This is provisioned and protected by VPLEX and is available on both sites.
By default, VMFS prevents multiple virtual machines from accessing and writing to the same VMDK. However, sharing was enabled by configuring the multi-writer option.
The SAPInstance resource agent controls the ASCS instance and ERS instance and is configured as a master/slave resource. In the event of a failure, the slave is promoted to the role of master and starts the SAP ASCS instance. Similarly, the master is demoted to the role of slave and starts the ERS instance. This master/slave mode ensures that an ASCS instance is never started on the same node as the ERS.
Note: Corosync token parameter configuration
In the Corosync configuration file, corosync.conf, the token timeout specifies the time (in milliseconds) after which a token loss is declared if a token is not received. This timeout corresponds to the time spent detecting the failure of a processor in the current configuration. For this solution, the value of this parameter is set to 10,000 ms in order to cope with the switchover of the underlying layers without unnecessary cluster service failover.
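The master/slave role swap and the token timeout described above can be modeled in a few lines of Python. This is a conceptual sketch, not the SAPInstance resource agent or Corosync itself: the function names are invented, and using the VM names SAPASCS1 and SAPASCS2 as cluster node names is an assumption.

```python
# Token timeout from the notes: 10,000 ms.
TOKEN_TIMEOUT_MS = 10_000

# Which SAP instance each role runs, as described in the notes.
INSTANCE_FOR_ROLE = {"master": "ASCS", "slave": "ERS"}


def token_lost(ms_since_last_token, timeout_ms=TOKEN_TIMEOUT_MS):
    """Declare token loss only after the timeout has elapsed."""
    return ms_since_last_token > timeout_ms


def swap_roles(roles):
    """Promote the slave and demote the master (node -> role mapping)."""
    flipped = {"master": "slave", "slave": "master"}
    return {node: flipped[role] for node, role in roles.items()}


def running_instances(roles):
    """Map each node to the SAP instance its role starts."""
    return {node: INSTANCE_FOR_ROLE[role] for node, role in roles.items()}
```

Because each node holds exactly one role and each role maps to exactly one instance, ASCS and ERS cannot end up on the same node in this model, which mirrors the guarantee the master/slave configuration is meant to give.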
20 Oracle database architecture

Oracle components:
- Oracle Database 11g Release 2 Enterprise Edition
- Oracle ASM
- Oracle ACFS
- Oracle Clusterware

Single-instance database migrated to a four-node physical RAC cluster on ASM

Oracle Extended RAC over VPLEX:
- Simplified management
- Host connections only to the local VPLEX cluster
- I/O sent only once from the hosts to the local cluster; no duplicate writes required
- No need to deploy an Oracle voting disk and Clusterware at a third site
- Eliminates the costly host CPU cycles that host-based mirroring consumes
- Protection of multiple databases and/or applications as a unit

Oracle components and configuration
Oracle Database 11g Release 2 provides the underlying database for the SAP applications. At each data center, the database originated as a physical single instance. However, to eliminate the database server as a single point of failure, the single-instance database was migrated to a four-node physical Oracle RAC cluster with the Oracle database residing on ASM. The Oracle database files and SAP ERP application files reside on Oracle ASM Cluster File System (ACFS). An Oracle RAC on Extended Distance Clusters architecture is deployed to allow servers in the cluster to reside in physically separate locations and to remove the data center as a single point of failure.

Why Oracle Extended RAC over VPLEX
Oracle RAC is normally run in a local data center because of the potential impact of distance-induced latency and the relative complexity and overhead of extending Oracle RAC across data centers with host-based mirroring using Oracle ASM. With EMC VPLEX Metro, however, an Oracle Extended RAC deployment becomes, from the Oracle DBA perspective, a standard Oracle RAC install and configuration.

The main benefits of deploying Oracle Extended RAC with VPLEX include:
- VPLEX simplifies management of Extended Oracle RAC, as cross-site high availability is built in at the infrastructure level. To the Oracle DBA, installation, configuration, and maintenance are exactly the same as for a single-site implementation of Oracle RAC.
- VPLEX eliminates the need for host-based mirroring of ASM disks and the host CPU cycles that this consumes. With VPLEX, ASM disk groups are configured with external redundancy and are protected by VPLEX distributed mirroring.
- Hosts need to connect to their local VPLEX cluster only, and I/O is sent only once from that node. However, hosts have full read-write access to the same database at both sites.
- There is no need to deploy an Oracle voting disk on a third site to act as a quorum device at the application level.
- VPLEX enables you to create consistency groups that protect multiple databases and/or applications as a unit.
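The "I/O is sent only once" benefit can be illustrated with a small Python sketch. It assumes the standard ASM redundancy levels (external: no ASM mirroring, normal: two-way, high: three-way); the function name host_writes is hypothetical and the figures are a simplified count, not a measurement from the solution.

```python
# Number of copies the host itself writes per logical write, by ASM redundancy level.
ASM_COPIES = {"external": 1, "normal": 2, "high": 3}

def host_writes(logical_writes: int, asm_redundancy: str) -> int:
    """Return how many writes the host issues for a given number of logical writes."""
    return logical_writes * ASM_COPIES[asm_redundancy]

# Host-based mirroring across two sites (ASM normal redundancy): each write is sent twice.
print(host_writes(1000, "normal"))
# VPLEX distributed mirroring (ASM external redundancy): each write is sent once.
print(host_writes(1000, "external"))
```

With external redundancy, the second copy is created by VPLEX distributed mirroring rather than by the database host.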
21 Oracle database configuration

- Four ACFS volumes mounted across the RAC cluster
- USRSAPTRANS, ASCS00, and SAPMNT exported as NFS shares to the SAP servers
- Shared file systems provided as a highly available NFS resource managed by Oracle Clusterware
- ASM disk groups configured to match the existing single-instance layout

The diagram in this slide provides a logical representation of the solution's deployment of Oracle Extended RAC on VPLEX Metro.

This solution uses four ACFS volumes mounted across the Oracle RAC cluster. Three of the ACFS volumes (SAPMNT, USRSAPTRANS, and ASCS00) are then exported as NFS shares to the SAP servers, using a virtual IP address and a highly available NFS resource under the control of Oracle Clusterware.

The ASM disk groups for the solution are configured to reflect the existing single-instance Oracle database layout. The lowermost table in the slide shows the ASM disk groups and their configuration.

ACFS volume     Mount point
SAP_O_HOME      /oracle/VSE/112
SAPMNT          /sapmnt/VSE
USRSAPTRANS     /usr/sap/trans
ASCS00          /usr/sap/VSE/ASCS00

ASM disk group   Number of disks   Disk group size (GB)   Redundancy
OCR              5                 40                     Normal
EA_SAP_ACFS      4                 64                     External
EA_SAP_DATA      16                2,048
EA_SAP_REDO
EA_SAP_REDOM
EA_SAP_FRA                         256
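The volume layout above can be captured as a small data structure, sketched here in Python. The volume names and mount points come from the slide; the variable and function names are assumptions made for illustration, and no NFS export options are implied.

```python
# ACFS volumes and their mount points, as listed in the slide.
ACFS_VOLUMES = {
    "SAP_O_HOME": "/oracle/VSE/112",
    "SAPMNT": "/sapmnt/VSE",
    "USRSAPTRANS": "/usr/sap/trans",
    "ASCS00": "/usr/sap/VSE/ASCS00",
}

# The three volumes that are exported to the SAP servers over NFS.
NFS_EXPORTED = ("SAPMNT", "USRSAPTRANS", "ASCS00")

def nfs_export_paths() -> list:
    """Return the mount points of the ACFS volumes that are shared over NFS."""
    return [ACFS_VOLUMES[name] for name in NFS_EXPORTED]

for path in nfs_export_paths():
    print(path)
```

SAP_O_HOME is deliberately absent from the export list, because only three of the four volumes are shared with the SAP servers.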
22 Brocade network infrastructure

This slide shows the IP and SAN networks deployed for the solution in the two data centers, and the Layer 2 extension between the data centers. These networks are created using Brocade networking technologies.

IP network
In each data center, the IP network is built using two Brocade VDX 6720 switches, which are deployed in a virtual cluster switch (VCS) configuration. All servers are connected to the network using redundant 10 GbE connections provided by Brocade 1020 CNAs.

The two VDX switches at each site are connected to a Brocade MLX Series router using a Virtual Link Aggregation Group (vLAG). The MLX Series routers extend the Layer 2 network between the two data centers. All traffic between Site A and Site B is routed through the MLX routers using multiple ports configured as a LAG.

Oracle RAC relies on a highly available virtual IP for private network communication. For the solution, a separate VLAN (VLAN 10) is used for this interconnect, while VLAN 20 handles all public traffic.

SAN network
The SAN in each data center is built with Brocade DCX 8510 Backbones. All servers are connected to the SAN using redundant 8 Gb connections that are provided by Brocade 825 HBAs.

The VPLEX-to-VPLEX connection between the data centers uses multiple FC connections between the DCX 8510 Backbones. These are used in active/active mode with failover.
23 EMC storage layout

- Site A: EMC Symmetrix VMAX, Virtual Provisioning
- Site B: EMC VNX5700, traditional RAID groups and LUNs

The storage at each site is provided by enterprise-class EMC storage arrays: a Symmetrix VMAX at Site A and a VNX5700 at Site B.

Both the VMAX and the VNX provide proven five-nines (99.999%) availability. Both also support EMC FAST (Fully Automated Storage Tiering) technology on a range of drive types, and both are powered by Intel Xeon processors.

VPLEX virtualizes storage on heterogeneous arrays, in this case a VMAX and a VNX. However, it is still important to follow best practices for whichever storage arrays you are using.

For the VMAX in the solution:
- VPLEX Metro, Oracle Extended RAC, and SAP volumes are laid out using EMC Virtual Provisioning. This configuration places the Oracle data files and log files in separate thin pools and allows each to use distinct RAID protection. The data files reside in a RAID 5 protected pool and the redo logs in a RAID 1 protected pool. You can see this layout in the first diagram in the slide.
- Storage is not pre-allocated to any of the devices, except for the Oracle redo log devices, as recommended by EMC.

For the VNX5700 in the solution:
- VPLEX Metro, Oracle Extended RAC, and SAP volumes are laid out using traditional RAID groups and LUNs. This configuration places the Oracle data files and log files in separate RAID groups and allows each to use distinct RAID protection. The data files reside in a RAID 5 protected RAID group and the redo logs in a RAID 10 protected RAID group. The FRA disk group resides on NL-SAS drives with RAID 6 protection. You can see this layout in the right-hand diagram in the slide.

Similar EMC best practices apply to both Virtual Provisioning and traditional provisioning methods, and the same ASM disk groups are created on the VNX and the VMAX. In addition, the LUNs created on the VNX match the number and size of the thin devices created on the VMAX.
24 Tests and validation

Tests:
- SAP enqueue service process failure
- SAP ASCS instance virtual machine failure
- Oracle RAC node failure
- Site failure (VPLEX cluster, ESXi servers, network, RAC nodes)
- VPLEX cluster isolation

Expected behavior: The application continues without interruption.

The EMC validation team initially installed and validated the solution environment without any high-availability or business continuity protection schemes. They then transformed the environment into the mission-critical business continuity solution described in this presentation.

To validate the solution, and to demonstrate the elimination of all single points of failure, the validation team carried out the tests listed in this slide.

The result was the same for each test: the application continued without interruption.
25 SAP enqueue service process failure

1. The SAPInstance resource agent detects and reports the failure.
2. The master/slave resource agent promotes SAPASCS1 to master (this node then hosts the ASCS services).
3. The master/slave resource agent starts the ERS on SAPASCS2 as soon as that node rejoins the cluster.
4. The replicated lock table is restored.

Result:
- The application continues without interruption.
- No administrative intervention is required.

This slide summarizes the "SAP Enqueue Service Process Failure" test carried out by the EMC validation team. This test validates how the system behaves in the event of an SAP enqueue server process failure.

Failure simulation
To test this type of failure, the enqueue service process on the active ASCS node was terminated by running the kill command.

System behavior
When the enqueue process fails, the failure is reported, as shown in the uppermost screen segment in the slide. Then, what was previously the slave node (SAPASCS1) is promoted to become the master node and host the ASCS services, as shown in the main screenshot. When SAPASCS2 (the failed node) rejoins the cluster, the ERS is restarted on that node. Finally, the replicated lock table is restored.

Result
This test demonstrates that the application continues without interruption if the enqueue process fails and that no administrative intervention is required to deal with the failure.
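The role swap in this test can be modeled with a few lines of Python. This is a simplified sketch of the sequence described in the slide, not the actual SUSE Linux Enterprise HAE resource agent logic; the class and method names are hypothetical.

```python
class AscsErsPair:
    """Toy model of a two-node master/slave pair running ASCS and ERS."""

    def __init__(self, master: str, slave: str) -> None:
        # The master hosts the ASCS instance, the slave hosts the ERS.
        self.services = {master: "ASCS", slave: "ERS"}

    def fail(self, node: str) -> None:
        """Mark a node as failed; promote the other node if ASCS was lost."""
        lost = self.services[node]
        self.services[node] = None
        if lost == "ASCS":
            survivor = next(n for n in self.services if n != node)
            # Promotion: the former slave now hosts the ASCS instance.
            self.services[survivor] = "ASCS"

    def rejoin(self, node: str) -> None:
        """A returning node takes the slave role and starts the ERS."""
        if self.services[node] is None:
            self.services[node] = "ERS"

pair = AscsErsPair(master="SAPASCS2", slave="SAPASCS1")
pair.fail("SAPASCS2")    # enqueue failure on the active ASCS node
pair.rejoin("SAPASCS2")  # node is back in the cluster
print(pair.services)
```

At every step in this model, ASCS and ERS are hosted on different nodes, which is the property the master/slave mode is meant to guarantee.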
26 SAP ASCS instance VM failure

1. SAPASCS2 is no longer available through the vSphere Client.
2. The SAPInstance resource agent detects and reports the failure.
3. vSphere HA (VMHA) restarts the failed VM on the surviving ESXi host.
4. The master/slave resource agent promotes SAPASCS1 to master (this node then hosts the ASCS services) and starts the ERS on SAPASCS2 as soon as that node rejoins the cluster.
5. The replicated lock table is restored.

Result:
- The application continues without interruption.
- No administrative intervention is required.

This slide summarizes the "SAP ASCS Instance VM Failure" test carried out by the EMC validation team. This test validates how the system behaves if the VM on which the ASCS instance is running fails.

Failure simulation
To simulate this type of failure, the ESXi server hosting the ASCS instance VM was powered off via DRAC. The server was then rebooted without entering maintenance mode.

System behavior
The system responded to the failure as shown in steps 1 to 5 in the slide.

Result
This test demonstrates that the application continues without interruption if the ASCS instance VM fails and that no administrative intervention is required to deal with the failure.
27 Oracle RAC node failure

1. The RAC node goes offline; instance VSE003 is no longer available.
2. The SAP instance work process connects to another RAC node.

Result:
- End users experience a longer transaction response time while the dialog instance work process connects to another RAC node.
- Uncommitted transactions are rolled back at the database level to preserve data consistency. The end user receives a system error message and must restart the transaction.
- No administrative intervention is required.

This slide summarizes the "Oracle RAC Node Failure" test carried out by the EMC validation team. This test validates how the system behaves in the event of an unexpected RAC node failure.

Failure simulation
To test this type of failure, the server was rebooted so that the Oracle RAC node running on it went offline.

System behavior
When the RAC node went offline, instance VSE003 became unavailable. The SAP instance work process then automatically connected to another RAC node, as illustrated by the screenshots in the slide.

Result
The test results are also summarized on the slide.
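The reconnect behavior can be pictured as choosing the next available instance from a list. The Python sketch below is purely illustrative: it does not reproduce the actual SAP or Oracle reconnect mechanism, the function name reconnect_target is hypothetical, and the instance names other than VSE003 are assumed for the example.

```python
from typing import Iterable, Optional, Set

def reconnect_target(instances: Iterable[str], available: Set[str],
                     current: str) -> Optional[str]:
    """Return the first available instance other than the current one, or None."""
    for name in instances:
        if name != current and name in available:
            return name
    return None

# Assumed instance names; only VSE003 appears in the test description.
instances = ["VSE001", "VSE002", "VSE003", "VSE004"]
# VSE003 went offline with its RAC node.
available = {"VSE001", "VSE002", "VSE004"}

print(reconnect_target(instances, available, current="VSE003"))
```

Any transaction that was in flight on the failed instance is rolled back by the database, so the end user has to restart it after the work process has reconnected.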
28 State of the environment before the site failure

- All RAC nodes are running.
- The VPLEX clusters are available at both sites.
- The ESXi servers are available at both sites.
- The SAP virtual machines at Site A and Site B are active.

The next two tests relate to a complete site failure and isolation of a VPLEX cluster. This slide shows the status of the environment before these tests were carried out.
29 Site failure

1. VPLEX Witness overrides the consistency group's detach rule so that VPLEX remains available at Site B.
2. The RAC nodes at Site B remain available.
3. VMHA restarts SAPASCS1 and SAPDI1 at Site B.
4. SLE HAE detects the failure of SAPASCS1 and restarts the ERS as soon as that node rejoins the cluster.
5. End-user sessions on SAPDI1 are interrupted. However, users can log in again once SAPDI1 has restarted at Site B. During the restart, new users are directed to SAPDI2.

This slide summarizes the "Site Failure" test carried out by the EMC validation team. This test validates how the system behaves in the event of a complete site failure.

Failure simulation
To test this failure scenario, the validation team simulated a complete failure of Site A, including VPLEX cluster, ESXi server, network, and Oracle RAC node components. The VPLEX Witness remained available on Site C, and on Site B, VPLEX cluster-2 remained in communication with the VPLEX Witness.

System behavior
Steps 1 to 5 outline how the system responds to the failure, and the diagram to the left illustrates the status of the environment after the site failure:
1. When Site A fails, VPLEX Witness ensures that the consistency group's detach rule, which defines cluster-1 as the preferred cluster, is overridden and that the storage served by VPLEX cluster-2 on Site B remains available.
2. RAC nodes sse-ea-erac-n03 and sse-ea-erac-n04 on Site B remain available.
3. When the ESXi servers on Site A fail, VMHA restarts SAPASCS1 and SAPDI1 on Site B, with SAPASCS1 restarted on a different ESXi host from SAPASCS2.
4. SUSE Linux Enterprise HAE detects the failure of cluster node SAPASCS1. Because the ERS was running on this node, the cluster takes no action except to restart the ERS when SAPASCS1 rejoins the cluster. The lock table is preserved and remains operational throughout.
5. End users on SAPDI1 lose their sessions due to the ESXi server failure. During the restart process, new users are directed to SAPDI2. When SAPDI1 restarts on Site B, users can log in to SAPDI1 again.

Result
The overall result is that the application continues without interruption even in the event of a complete site failure.
30 VPLEX cluster isolation

1. VPLEX Witness overrides the consistency group's detach rule so that VPLEX remains available at Site B.
2. The RAC nodes at Site B remain available.
3. The RAC nodes at Site A are ejected.
4. The ESXi servers at Site A remain available.
5. Virtual machines SAPASCS1 and SAPDI1 remain active thanks to VPLEX Metro HA Cross-Cluster Connect.

This slide summarizes the "VPLEX Cluster Isolation" test carried out by the EMC validation team. This test validates how the system behaves in the event of isolation of one of the VPLEX clusters.

Failure simulation
To test this failure scenario, the validation team simulated isolation of the preferred cluster on Site A, with both the external management IP network and the VPLEX WAN communications network partitioned. The LAG network remained available. VPLEX Witness remained available on Site C. On Site B, VPLEX cluster-2 remained in communication with VPLEX Witness.

System behavior
Steps 1 to 5 outline how the system responds to the failure, and the diagram to the left illustrates the status of the environment after cluster isolation:
1. When the VPLEX cluster on Site A becomes isolated, VPLEX Witness ensures that the consistency group's detach rule, which defines cluster-1 as the preferred cluster, is overridden and that the storage served by VPLEX cluster-2 on Site B remains available.
2. RAC nodes sse-ea-erac-n03 and sse-ea-erac-n04 on Site B remain available.
3. RAC nodes sse-ea-erac-n01 and sse-ea-erac-n02 on Site A are ejected.
4. The ESXi servers on Site A remain available.
5. Virtual machines SAPASCS1 and SAPDI1 remain active due to the use of VPLEX Metro HA Cross-Cluster Connect.

Result
The overall result is that the application continues without interruption even if the preferred VPLEX cluster is isolated.
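The interplay between the detach rule and VPLEX Witness in the site failure and cluster isolation tests can be summarized in a small decision function. The Python sketch below is a deliberately simplified model based only on the behavior described in these two tests; the function and parameter names are hypothetical, and real VPLEX behavior covers more cases.

```python
from typing import Dict, Set

def clusters_serving_io(preferred: str, reachable: Dict[str, bool],
                        wan_link_up: bool) -> Set[str]:
    """Return the VPLEX clusters that keep serving I/O (simplified model).

    preferred   -- cluster named by the consistency group's detach rule
    reachable   -- cluster name -> True if it is running and can reach VPLEX Witness
    wan_link_up -- True if the inter-cluster link is healthy
    """
    alive = {name for name, ok in reachable.items() if ok}
    if len(alive) == 2:
        # Both clusters are healthy: active/active if they are linked; otherwise
        # the detach rule applies and only the preferred cluster continues.
        return alive if wan_link_up else {preferred}
    # At most one cluster is left: Witness lets the survivor continue even if
    # it is not the preferred cluster, overriding the detach rule.
    return alive

# Site A (cluster-1, the preferred cluster) has failed or is isolated.
print(clusters_serving_io("cluster-1",
                          {"cluster-1": False, "cluster-2": True},
                          wan_link_up=False))
```

In both tests the outcome is the same in this model: cluster-2 on Site B continues to serve storage although cluster-1 is the preferred cluster.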
31 Tests and validation

Tests:
- SAP enqueue service process failure
- SAP ASCS instance virtual machine failure
- Oracle RAC node failure
- Site failure (VPLEX cluster, ESXi servers, network, RAC nodes)
- VPLEX cluster isolation

Observed behavior: The application continues without interruption.

The EMC validation team initially installed and validated the solution environment without any high-availability or business continuity protection schemes. They then transformed the environment into the mission-critical business continuity solution described in this presentation.

To validate the solution, and to demonstrate the elimination of all single points of failure, the validation team carried out the tests listed in this slide.

The result was the same for each test: the application continued without interruption.
32 Summary and conclusion

A solution that combines technologies from EMC, SAP, VMware, Oracle, SUSE, and Brocade. Goals:
- Eliminate single points of failure at all layers of the environment
- Provide active/active data centers with near-zero RPOs and RTOs
- Enable mission-critical business continuity for SAP applications

- Active/active data centers
- Near-zero RPOs and RTOs
- Uninterrupted application availability
- No single point of failure
- Simplified high-availability management
- Fully automated failure handling and load balancing
- Maintenance without downtime
- Simplified deployment of Oracle RAC on Extended Distance Clusters
- Increased infrastructure utilization

This solution demonstrates the transformation of a traditional active/passive SAP deployment to a highly available business continuity solution with active/active data centers and always-on application availability.

The solution combines EMC, VMware, Oracle, SUSE, and Brocade high-availability components to:
- Eliminate single points of failure at all layers in the environment
- Provide active/active data centers that support near-zero RPOs and RTOs
- Enable mission-critical business continuity for SAP applications

Each single point of failure was identified and mitigated by using fault-tolerant components and high-availability clustering technologies. Resource utilization was increased by enabling active/active data access. And failure handling was fully automated to eliminate the final and often most unpredictable single point of failure from the architecture: people and processes.

In addition, the use of management and monitoring tools such as the vSphere Client, EMC Virtual Storage Integrator, and the VPLEX performance tools simplifies operational management and allows monitoring and mapping of the infrastructure stack.

The testing performed by the EMC validation team demonstrates how using the solution design principles and components eliminated single points of failure at the local level and created an active/active data center that enables mission-critical business continuity for SAP. The components involved here are: EMC VPLEX Metro, EMC VPLEX Witness, EMC Symmetrix VMAX, VMware vSphere HA, Oracle RAC, SUSE Linux Enterprise HAE, and Brocade networking.

The testing also demonstrates how VPLEX Metro, combined with SUSE Linux Enterprise HAE, Oracle Extended RAC, and Brocade networking, extends this high availability beyond the boundaries of the data center and allows servers at multiple data centers to have read/write access to shared block storage devices. VPLEX Witness and Cross-Cluster Connect provide an even higher level of resilience.

Together, these technologies enable transformation of a traditional active/passive data center deployment to a mission-critical business continuity solution with active/active data centers, 24/7 application availability, no single points of failure, and near-zero RTOs and RPOs.