Towards a Web-scale Data Management Ecosystem Demonstrated by SAP HANA

Stefan Bäuerle, Jonathan Dees, Franz Faerber, Wolfgang Lehner

Agenda
- Motivation & requirements
- Different processing engines and integration
- Scale-out edition engine

Application requirements for a modern DBMS
Applications differ in their:
- data types
- consumption models
- data models
- notions of consistency
- application and query languages
- levels of scaling
- hardware capabilities

HANA Platform

HANA System

Beyond relational data processing (1/3)
Integrate as deeply as possible into the engine.
- Bringing OLAP and OLTP together
  - Proven: works in thousands of customer systems
  - Simplicity: get rid of extracts, loads, and redundancy; one system
  - OLAP dominates OLTP in real-world systems: optimize accordingly
- Data mining and prediction
  - Examples: basket analysis, different forecasting algorithms, …
  - Easy interaction with R and SAS
- Unstructured data
  - Text search support for more than 30 languages, including stemming, part-of-speech tagging, noun extraction, …
  - Classification, clustering, named entity recognition, sentiment analysis
- Planning extensions
  - Planning: define and align business figures for the foreseeable future
  - Data-heavy operators such as disaggregation or logical snapshots
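Basket analysis is listed above as a data-mining example. The sketch below shows the core counting step behind it (support and confidence of item-pair rules, as in the first passes of Apriori-style association mining). It is a plain-Python illustration under assumed sample data; the function name `pair_rules`, the threshold, and the baskets are hypothetical and do not reflect HANA's own mining library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample baskets; not data from the presentation.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b.

    support    = fraction of baskets containing both items
    confidence = P(b in basket | a in basket)
    Only pairs with support >= min_support are kept.
    """
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = {}
    for (x, y), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Emit the rule in both directions with its own confidence.
        rules[(x, y)] = (support, count / item_counts[x])
        rules[(y, x)] = (support, count / item_counts[y])
    return rules
```

With these baskets, every basket containing milk also contains butter, so the rule milk -> butter gets confidence 1.0, while bread -> jam falls below the support threshold.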

- Graph processing
  - Real-world business data often resembles graphs
  - Modeling data as a graph enables more explicit and more efficient operators
  - Distance, siblings, shortest path, reachability, transitive closure, …
- Hierarchy processing
  - Hierarchies are a special type of general graph
  - Used by almost every business application
  - Support for time-dependent and versioned hierarchies
  - Extended graph operators: level, neighbor, is_ancestor, …
- Geospatial processing & time series
  - Native relational data types
  - Existing compression techniques plus powerful specializations for sensor data
  - Spatial: WithinDistance, Contains, Area, …
  - Time series: group by time interval, interpolate missing values, …
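To make the hierarchy operators above concrete, here is a minimal sketch over a parent-child table, the usual relational encoding of a business hierarchy. The functions `level`, `is_ancestor`, and `siblings` are hypothetical Python stand-ins that mirror the operator names on the slide; they are not HANA's hierarchy function syntax, and the org-chart data is invented.

```python
# Hypothetical parent-child hierarchy (child -> parent); None marks the root.
PARENT = {
    "Company": None,
    "Sales": "Company",
    "Sales EMEA": "Sales",
    "Sales APJ": "Sales",
    "R&D": "Company",
}

def level(node, parent=PARENT):
    """Depth of node, with the root at level 0."""
    depth = 0
    while parent[node] is not None:
        node = parent[node]
        depth += 1
    return depth

def is_ancestor(a, b, parent=PARENT):
    """True if a is a proper ancestor of b (walks up from b to the root)."""
    node = parent[b]
    while node is not None:
        if node == a:
            return True
        node = parent[node]
    return False

def siblings(node, parent=PARENT):
    """Other nodes that share node's parent, sorted by name."""
    return sorted(n for n, p in parent.items() if p == parent[node] and n != node)
```

In an engine, such operators can use precomputed structures (for example pre-order numbering) instead of walking the parent chain per call; the loop here only shows the semantics.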

- Scientific processing
  - Bring prominent operators into the engine
  - Simplifies and speeds up operations in the scientific and financial areas
  - Matrix operators: Eigenvalue, Multiply, …
  - Financial operators: Interest Rates, GarmanKohlagenProcess, …
- NoSQL processing
  - Document-based models: XML, JSON, …
  - Key-value stores
  - Flexible schema, in HANA via a specific flexible table type
- Massive scale-out
  - Conventional business applications fit on a single box, but a new kind of application requires massive scale-out
  - Deep and seamless integration with the Hadoop ecosystem
  - Scale-out and single-box applications act as one system
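The flexible-schema bullet can be illustrated with a toy table whose column list grows when an inserted row carries a column it has not seen before. This is a conceptual sketch under that assumption only: the class `FlexibleTable` and its methods are hypothetical and do not model how HANA's flexible table type stores or indexes data.

```python
class FlexibleTable:
    """Toy table whose schema grows when inserted rows carry new columns.

    Conceptual sketch of schema flexibility only; not HANA's storage model.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def insert(self, **values):
        # Append any previously unknown column names to the schema.
        for name in values:
            if name not in self.columns:
                self.columns.append(name)
        self.rows.append(dict(values))

    def select(self, *columns):
        # Columns a row never set read as None (similar to SQL NULL).
        return [tuple(row.get(c) for c in columns) for row in self.rows]
```

For example, inserting `id=2, name="B", color="red"` into a table created with `["id", "name"]` adds a `color` column, and earlier rows report `None` for it.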

Application integration (examples)
- Currency conversion
- Hierarchy handling
- Aging / dynamic tiering
- Dictionary maintenance
- Graph optimizations
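Currency conversion is the first application-integration example. Its core step is picking the exchange rate valid on a transaction date, i.e. the latest rate whose valid-from date is on or before that date. The sketch below assumes a simple in-memory rate table; the `RATES` layout, the `convert` function, and the rates themselves are hypothetical and do not describe HANA's built-in conversion tables or parameters.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical EUR->USD rates, each valid from its date until the next entry.
RATES = {
    ("EUR", "USD"): [
        (date(2015, 1, 1), 1.20),
        (date(2015, 3, 1), 1.08),
        (date(2015, 6, 1), 1.12),
    ],
}

def convert(amount, source, target, on, rates=RATES):
    """Convert amount using the latest rate whose valid-from date is <= on."""
    if source == target:
        return amount
    table = rates[(source, target)]
    valid_from = [d for d, _ in table]
    # bisect_right finds the first entry strictly after `on`; step back one.
    i = bisect_right(valid_from, on) - 1
    if i < 0:
        raise LookupError(f"no {source}->{target} rate on or before {on}")
    return amount * table[i][1]
```

Doing this lookup inside the engine, next to the data, avoids shipping every row to the application just to join it with a rate table.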

HANA Data Platform Dynamic Tiering
HANA Dynamic Tiering
- Declare a table to use disk storage
- Cost-efficient for big data
- Optimized disk-based processing powered by IQ
- New warm option besides hot (in-memory) and cold (near-line storage)

    CREATE TABLE "demo"."SalesOrders_WARM" (
      ID INTEGER NOT NULL,
      CustomerID INTEGER NOT NULL,
      OrderDate DATE NOT NULL,
      …,
      PRIMARY KEY (ID)
    ) USING EXTENDED STORAGE;
    INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );

HANA Dynamic Tiering
- Native big data solution: real-time insights across ALL enterprise data
- Preferred for structured/transactional cases
- Manage data cost-effectively, yet with the desired performance based on SLAs
- Terabytes to petabytes
- Application-defined temperature
- Single database experience: update and query all data seamlessly via HANA tables
- Centralized operational control
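"Application-defined temperature" means the application, not the engine, decides which data is hot and which is warm, while queries still see one logical table. The sketch below assumes a simple age-based rule (orders older than a hypothetical `warm_after` window go to the warm tier); the class `TieredTable` and its fields are invented for illustration and say nothing about how dynamic tiering moves or stores data internally.

```python
from datetime import date, timedelta

class TieredTable:
    """Toy hot/warm table in which an application-defined rule picks the tier.

    Assumed rule for illustration: rows whose order_date is older than
    `warm_after` days before `today` are warm; all other rows are hot.
    """

    def __init__(self, today, warm_after=365):
        self.cutoff = today - timedelta(days=warm_after)
        self.hot = []   # stands in for in-memory storage
        self.warm = []  # stands in for disk-based extended storage

    def insert(self, row):
        tier = self.warm if row["order_date"] < self.cutoff else self.hot
        tier.append(row)

    def query(self, predicate):
        # One logical table: queries read both tiers transparently.
        return [r for r in self.hot + self.warm if predicate(r)]

# Usage: one old order lands in the warm tier, one recent order in the hot tier.
orders = TieredTable(today=date(2015, 6, 1))
orders.insert({"id": 1, "order_date": date(2013, 1, 15)})
orders.insert({"id": 2, "order_date": date(2015, 5, 20)})
```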

SAP HANA Massive Scale Out Edition (Velocity)
Motivation: an engine for massive scale-out and big data
Key features:
- Scales to thousands of nodes
- Different data freshness and consistency levels
- Efficient fail-safety design
- First-class citizen within Hadoop (Spark)
- Supports a variety of hardware and operating systems
- Extreme query performance by compiling SQL to native code
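The last feature, compiling SQL to native code, replaces per-row interpretation of a query plan with code generated once per query. The sketch below shows that idea on a WHERE-clause predicate: `interpret` walks a condition list for every row, while `compile_predicate` generates source once and compiles it. The condition format and both function names are hypothetical, and the "native" target here is Python bytecode rather than C and machine code as described for Velocity.

```python
import operator

# Hypothetical predicate representation: (op, column, constant) triples joined by AND.
OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def interpret(conditions, row):
    """Interpreted evaluation: dispatches on each condition for every row."""
    return all(OPS[op](row[col], const) for op, col, const in conditions)

def compile_predicate(conditions):
    """Generate source for the predicate once, then compile it.

    Stands in for SQL -> C -> executable; the target here is Python bytecode.
    """
    py_op = {"<": "<", ">": ">", "=": "=="}
    clauses = [f"row[{col!r}] {py_op[op]} {const!r}" for op, col, const in conditions]
    source = "def pred(row):\n    return " + (" and ".join(clauses) or "True")
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["pred"]
```

For `[(">", "amount", 100), ("=", "region", "EMEA")]` the generated function body is `return row['amount'] > 100 and row['region'] == 'EMEA'`, so no operator lookup happens per row.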

SAP HANA SOE (Velocity) and Hadoop (1/2)
Hadoop ecosystem components:
- Ambari: cluster management
- ZooKeeper: coordination
- Pig: scripting
- MLlib: machine learning
- Hive: SQL
- Spark SQL
- YARN: processing
- HDFS: distributed file system
- HBase: database
- Spark: processing

Steps
- Stage 1: integration with Spark (2015)
- Stage 2: independent execution cluster
Benefits
- Integration of SAP data with data lakes
- HANA features add value to Hadoop (e.g. SQL extensions such as time series, hierarchies, …)
- Performance
- Holistic data platform

Architecture to Support Different Data Freshness Levels
Options:
- Read your own writes
- Up-to-date data vs. data of a certain age
- Separate component for transactions
[Figure: connections 1…n (session data) feed a distributed query processor (DQP) and query engines 1-3; a distributed transaction component (DTX) with a transaction broker and a version table; storage nodes 1…n holding subsets of tables A, B, C, D; a distributed log; checkpoint storage]
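The freshness options can be illustrated with a multi-version store in which every committed write gets a commit timestamp and each reader chooses the snapshot it reads at. The sketch below is a hypothetical model: `VersionedStore`, the mode names, and the `replica_ts`/`session_ts` parameters are invented for illustration and are not the Velocity transaction broker's interface.

```python
class VersionedStore:
    """Toy multi-version key-value store illustrating freshness levels.

    Assumed model: each committed write receives a monotonically increasing
    commit timestamp; a reader picks the snapshot timestamp it reads at.
    """

    def __init__(self):
        self.clock = 0
        self.versions = {}  # key -> list of (commit_ts, value), ascending

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock  # the writing session remembers its commit timestamp

    def read_at(self, key, snapshot_ts):
        """Latest value committed at or before snapshot_ts (None if absent)."""
        result = None
        for ts, value in self.versions.get(key, []):
            if ts <= snapshot_ts:
                result = value
        return result

    def read(self, key, mode, replica_ts=0, session_ts=0):
        if mode == "up-to-date":
            # Latest committed state.
            return self.read_at(key, self.clock)
        if mode == "read-your-own-writes":
            # Never read older than the session's own last commit, even if
            # the replica it is routed to lags behind.
            return self.read_at(key, max(replica_ts, session_ts))
        if mode == "certain-age":
            # Accept whatever the (possibly stale) replica has applied.
            return self.read_at(key, replica_ts)
        raise ValueError(f"unknown mode: {mode}")
```

The weaker modes let a query run on any replica that is fresh enough, which is what makes them cheaper at scale than always reading the latest committed state.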

SAP HANA scale out integration

Conclusion
- Today's applications have a multidimensional set of specialized requirements
- Gains from moving these requirements into a (single) DBMS:
  - Simplified and more explicit data modeling and processing for applications
  - Increased performance
  - No complicated data transfer between specialized engines
- Powerful orchestration is required
- Web-scale processing is key to supporting new applications
SAP HANA strives to answer all these requirements in a single data management platform.

SAP HANA Massive scale out edition (Project Velocity)
- Scales to thousands of nodes
  - Support for massive distribution and failure tolerance
  - ACID properties on a large landscape
- Can run on small devices
  - Low footprint allows it to run on small commodity hardware and small devices
- Integration into the Hadoop infrastructure (Spark)
  - Access via standard Hadoop mechanisms (e.g. map & reduce)
  - Deep integration into the Spark execution framework
- Extreme performance with SQL compilation
  - Compiles SQL into C code, which is compiled into an executable in real time
- Support for IoT and semi-structured data
  - Special data types for IoT (time series data)
  - Support for document-style data in a massive scale environment

Notes: Heavily modified version of a slide from the Stanford EE Computer Systems Colloquium v1.0 (Chris Hallenbeck, Richard Pledereder). The topics under High Performance (compression, parallelization and scanning) receive major attention in this section. Column store is emphasized, although row store is mentioned. ACID (Atomicity, Consistency, Isolation, Durability).
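The slide lists map & reduce as one of the standard Hadoop access mechanisms. The sketch below shows that pattern in plain Python for the equivalent of `SELECT region, SUM(amount) ... GROUP BY region`: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. It does not use Hadoop's or Spark's APIs; all function names and the record layout are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (key, value) pairs; here one (region, amount) pair per record."""
    yield record["region"], record["amount"]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate one group; here a SUM."""
    return key, sum(values)

def map_reduce(records):
    # In a real cluster each input split is mapped on a different node and
    # each key group is reduced on the node that owns it.
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```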

General: embrace Hadoop as a technology
Goal: get our own engine on Hadoop (Velocity: HANA Scale-Out Extension)
Steps
- First step: integrated with Spark (Q3 2015)
- Mid term: independent execution cluster
Benefits
- Holistic data platform
- Integration of SAP data with data lakes
- HANA features on Hadoop (e.g. time series)
- Value-added abilities on Hadoop data
- Performance

Architecture to Support Different Data Freshness Levels
[Figure: Velocity architecture with a distributed query processor and workers; a distributed transaction manager; Velocity (OLTP) and Velocity (OLAP) instances; a distributed log; a distributed filesystem (for checkpoints …); storage for text, document, graph, and time series data]
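The architecture pairs a distributed log with a distributed filesystem for checkpoints. A common way such components are used is recovery by loading the last checkpoint and replaying only the log entries written after it. The sketch below shows that generic technique under assumed data shapes (a checkpoint as `(lsn, snapshot)` and log entries as `(lsn, key, value)`); it is not a description of Velocity's documented recovery procedure.

```python
def recover(checkpoint, log):
    """Rebuild state from the last checkpoint plus the log tail.

    checkpoint: (lsn, snapshot_dict) as written to checkpoint storage.
    log: iterable of (lsn, key, value) from the distributed log;
         value None means the key was deleted.
    Entries with lsn <= checkpoint lsn are already in the snapshot and are
    skipped; newer entries are replayed in lsn order.
    """
    ckpt_lsn, snapshot = checkpoint
    state = dict(snapshot)  # copy so the checkpoint itself stays unchanged
    for lsn, key, value in sorted(log, key=lambda entry: entry[0]):
        if lsn <= ckpt_lsn:
            continue
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state
```

Frequent checkpoints keep the replayed log tail short, which shortens recovery after a node failure.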

Thank you
