Towards a Web-scale Data Management Ecosystem Demonstrated by SAP HANA


1 Towards a Web-scale Data Management Ecosystem Demonstrated by SAP HANA
Stefan Bäuerle, Jonathan Dees, Franz Faerber, Wolfgang Lehner

2 Agenda
- Motivation & Requirements
- Different Processing Engines and Integration
- Scale out edition engine

3 Application requirements for a modern DBMS
Different:
- data types
- consumption models
- data models
- notions of consistency
- application and query language
- levels of scaling
- hardware capabilities

4 HANA Platform

5 HANA System

6 Beyond relational data processing (1/3)
Integrate as deep as possible into the engine
- Bringing OLAP and OLTP together
  - Proven: works in thousands of customer systems
  - Simplicity: get rid of extracts, loads and redundancy, one system
  - OLAP dominates OLTP in real world systems: optimize accordingly
- Data mining and prediction
  - Examples: basket analysis, different forecasting algorithms, …
  - Easy interaction with R and SAS
- Unstructured data
  - Text search support for more than 30 languages, including stemming, part-of-speech tagging, noun extraction, …
  - Classification, clustering, named entity recognition, sentiment analysis
- Planning extensions
  - Planning: define and align business figures for the foreseeable future
  - Data heavy operators like disaggregation or logical snapshots
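
The disaggregation operator named under planning can be illustrated with a minimal Python sketch. It assumes the common proportional variant: a planned total is split across members according to reference values such as last year's figures. The function name `disaggregate`, the regions, and the numbers are hypothetical and are not taken from HANA.

```python
from fractions import Fraction

def disaggregate(total, reference):
    """Split `total` across the keys of `reference`, proportional to its values."""
    weight_sum = sum(reference.values())
    if weight_sum == 0:
        # Assumption: without reference data, fall back to an equal split.
        share = Fraction(total, len(reference))
        return {key: share for key in reference}
    return {
        key: Fraction(total) * Fraction(weight, weight_sum)
        for key, weight in reference.items()
    }

# Hypothetical reference data: last year's sales per region.
last_year = {"North": 300, "South": 100, "West": 100}

# Distribute a planned total of 1000 using last year's proportions.
plan = disaggregate(1000, last_year)
```

Exact fractions are used so that the parts always sum back to the planned total, which floating-point shares would not guarantee.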

7 Beyond relational data processing (2/3)
- Graph processing
  - Real world business data often resembles graphs
  - Model as graph: More explicit and more efficient operators
  - Distance, siblings, shortest path, reachability, transitive closure, …
- Hierarchy processing
  - Special type of general graphs
  - Used by almost every business application
  - Support for time dependent and versioned hierarchies
  - Extended graph operators: level, neighbor, is_ancestor, …
- Geospatial processing & Time series
  - Native relational data types
  - Existing compression techniques + powerful specializations for sensor data
  - Spatial: WithinDistance, Contains, Area, …
  - Time series: Group by time interval, Interpolate Missing Values, …
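
To make the graph and hierarchy operators listed on this slide concrete, here is a minimal Python sketch of distance, reachability, and is_ancestor based on breadth-first search. It only illustrates what the operators compute; the function names, the adjacency-dict representation, and the org chart are assumptions, not HANA's API.

```python
from collections import deque

def distances_from(graph, start):
    """Breadth-first search: hop distance from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def is_reachable(graph, source, target):
    """True if some directed path leads from `source` to `target`."""
    return target in distances_from(graph, source)

def is_ancestor(hierarchy, ancestor, node):
    """In a parent -> children hierarchy, an ancestor is a different node that reaches `node`."""
    return ancestor != node and is_reachable(hierarchy, ancestor, node)

# Hypothetical org hierarchy stored as parent -> children edges.
org = {"CEO": ["CFO", "CTO"], "CTO": ["Dev Lead"], "Dev Lead": ["Engineer"]}

# In a hierarchy, the distance from the root doubles as the "level" of a node.
levels = distances_from(org, "CEO")
```

The set of keys returned by `distances_from` for one start node is that node's row of the transitive closure, which is why a single traversal can serve several of the listed operators.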

8 Beyond relational data processing (3/3)
- Scientific processing
  - Bring prominent operators into the engine
  - Simplifies and speeds up operations in scientific and financial areas
  - Matrix operators: Eigenvalue, Multiply, …
  - Financial operators: Interest Rates, GarmanKohlagenProcess, …
- NoSQL processing
  - Document based models, XML, JSON, …
  - Key value stores
  - Flexible Schema, in HANA via specific flexible table type
- Massive scale out
  - Conventional business applications fit on a single box, but a new kind of application requires massive scale out
  - Deep and seamless integration with the Hadoop system
  - Scale out and single box application act as one system

9 Application integration (examples)
- Currency conversion
- Hierarchy handling
- Aging / dynamic tiering
- Dictionary maintenance
- Graph optimizations

10 HANA Data Platform Dynamic Tiering
HANA Dynamic Tiering
- Declare table to use disk storage
- Cost efficient for big data
- Optimized disk based processing powered by IQ
- New warm option beside Hot (in-memory) and Cold (Near-Line Storage)

    CREATE TABLE "demo"."SalesOrders_WARM" (
        ID Integer NOT NULL,
        CustomerID Integer NOT NULL,
        OrderDate date NOT NULL,
        …,
        PRIMARY KEY (ID)
    ) USING EXTENDED STORAGE;
    INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );

HANA Dynamic Tiering
- Native Big Data solution: real-time insights, ALL enterprise data
- Preferred for structured/transactional cases
- Manage data cost effectively, yet with desired performance based on SLAs
- Terabytes to Petabytes
- Application defined temperature
- Single Database experience
- Update & query all data seamlessly via HANA tables
- Centralized operational control
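
"Application defined temperature" means that the application, not the database, decides which rows are hot and which are warm. A minimal Python sketch of such a rule follows, assuming an age-based policy. The one-year threshold and the hot table name are hypothetical; only "demo"."SalesOrders_WARM" appears on the slide.

```python
from datetime import date, timedelta

# Assumption: orders newer than one year count as hot; the slides give no threshold.
HOT_WINDOW = timedelta(days=365)

WARM_TABLE = '"demo"."SalesOrders_WARM"'  # table name from the slide
HOT_TABLE = '"demo"."SalesOrders"'        # hypothetical in-memory counterpart

def target_table(order_date, today):
    """Pick the table an order belongs in, based on its age."""
    if today - order_date <= HOT_WINDOW:
        return HOT_TABLE
    return WARM_TABLE

def split_by_temperature(orders, today):
    """Group (order_id, order_date) pairs by the table they should live in."""
    placement = {HOT_TABLE: [], WARM_TABLE: []}
    for order_id, order_date in orders:
        placement[target_table(order_date, today)].append(order_id)
    return placement

# Hypothetical orders, evaluated as of 1 June 2015.
orders = [(1, date(2015, 5, 20)), (2, date(2012, 3, 1)), (3, date(2014, 11, 2))]
placement = split_by_temperature(orders, today=date(2015, 6, 1))
```

Because both tables are reachable through the same database, a rule like this only governs where a row is stored; queries can still combine hot and warm data.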

11 HANA Data Platform BigData | Vision
[Diagram: HANA Data Management Platform]
- HANA native BigData: Dynamic Tiering, Smart Data Streaming, NoSQL | Graph | Geo | TimeSeries
- HANA & Hadoop: SDA → Hive | Spark, MapReduce | HDFS, Admin & Monitoring, User Mgmt / Security
- Hadoop Extension: Velocity Engine, integrated with HANA and Hadoop
- Services: Information Management | Text | Search | Graph | Geospatial | Predictive
- Components: SAP HANA In-Memory, HANA Dynamic Tiering, Hadoop, HANA Scale Out, Smart Data Streaming
- Labels: 0.1 sec, Instant Results, Warm Data, Raw Data, Infinite Storage
- Cross-cutting: Administration | Monitoring | Operations | User Management | Security

12 SAP HANA Massive Scale Out Edition (Velocity)
Motivation: Engine for massive scale out and big data
Key Features:
- Scale to thousands of nodes
- Different data freshness and consistency levels
- Efficient fail safety design
- First class citizen within Hadoop (Spark)
- Support variety of hardware and operating systems
- Extreme query performance by compiling SQL to native code
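
The last feature, query compilation, replaces per-row interpretation of a query plan with code generated once per query. The Python sketch below is only an analogy: it generates Python source for a simple filter and compiles it to bytecode with `exec`, whereas the engine described here targets native code. The function names and the sample rows are hypothetical.

```python
# Map SQL comparison operators to their Python spelling; anything else is rejected.
ALLOWED_OPS = {"=": "==", "<": "<", ">": ">", "<=": "<=", ">=": ">="}

def generate_filter_source(column, op, value):
    """Emit source code for a filter specialized to one predicate."""
    return (
        "def compiled_filter(rows):\n"
        f"    return [row for row in rows if row[{column!r}] {ALLOWED_OPS[op]} {value!r}]\n"
    )

def compile_filter(column, op, value):
    """Generate the source once, compile it, and return the resulting function."""
    namespace = {}
    exec(generate_filter_source(column, op, value), namespace)
    return namespace["compiled_filter"]

# Roughly: SELECT * FROM orders WHERE amount > 100
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 250}, {"id": 3, "amount": 900}]
big_orders = compile_filter("amount", ">", 100)
result = big_orders(rows)
```

The generated function contains the column name, operator, and constant directly, so nothing about the predicate has to be looked up again while scanning rows. That specialization is the source of the speedup in compiled query engines.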

13 SAP HANA SOE (Velocity) and Hadoop (1/2)
[Diagram: Hadoop Ecosystem]
- Ambari: Cluster Management
- Zookeeper: Coordination
- Pig: Scripting
- MLlib: Machine Learning
- Hive: SQL
- SparkSQL
- YARN: Processing
- Spark: Processing
- HBase: Database
- HDFS: Distributed File System

14 SAP HANA SOE (Velocity) and Hadoop (2/2)
Steps
- Stage 1: Integration with Spark (2015)
- Stage 2: Independent execution cluster
Benefits
- Integration of SAP data with data lakes
- HANA features add value to Hadoop (e.g. SQL extensions like time series, hierarchies, …)
- Performance
- Holistic data platform

15 Architecture to Support Different Data Freshness Levels
Options
- Read your own writes
- Up-to-date data vs. data of a certain age
- Separate component for transactions
[Diagram labels: DTX; Transaction Broker; Version Table (A, B, C); Query Engine 1, Query Engine 2, Query Engine 3; R; Storage 1, Storage 2, …, Storage n; (A, D); (A, C, D); Distributed Log; DQP; Storage (checkpoints); Connection 1, …, Connection n (Session data)]
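
The freshness options above can be read as constraints on which query engine may serve a read. The following Python sketch assumes each engine tracks how far it has applied the distributed log and each session remembers the log position of its last write. That bookkeeping, along with every name in the code, is an assumption for illustration; the slide only names the components.

```python
from dataclasses import dataclass

@dataclass
class QueryEngine:
    name: str
    applied_position: int  # assumed: last distributed-log position this engine has applied

def eligible_engines(engines, log_head, last_write_position=None, max_lag=None):
    """Return the names of engines that are fresh enough for the requested level."""
    result = []
    for engine in engines:
        if last_write_position is not None and engine.applied_position < last_write_position:
            continue  # this engine would not yet show the session's own writes
        if max_lag is not None and log_head - engine.applied_position > max_lag:
            continue  # this engine's data is older than the tolerated age
        result.append(engine.name)
    return result

# Hypothetical cluster state: the log head is at position 100.
engines = [QueryEngine("qe1", 100), QueryEngine("qe2", 97), QueryEngine("qe3", 80)]

own_writes = eligible_engines(engines, log_head=100, last_write_position=95)
up_to_date = eligible_engines(engines, log_head=100, max_lag=0)
certain_age = eligible_engines(engines, log_head=100, max_lag=30)
```

Relaxing the freshness requirement widens the set of engines that can answer, which is the trade-off between up-to-date data and data of a certain age.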

16 SAP HANA scale out integration

17 Conclusion
- Today's applications have a multidimensional set of specialized requirements
- Gains from moving these requirements into a (single) DBMS:
  - Simplified and more explicit data modeling and processing for applications
  - Increased performance
  - No complicated data transfer between specialized engines
- Powerful orchestration required
- Web-scale processing is key to support new applications
- SAP HANA strives to answer all these requirements in a single data management platform.

18 SAP HANA Massive scale out edition (Project Velocity)
- Scales to thousands of nodes
  - Support of massive distribution and failure tolerance
  - ACID properties on large landscape
- Can run on small devices
  - Low footprint allows it to run on small commodity hardware and small devices
- Integration into Hadoop infrastructure (Spark)
  - Access via standard Hadoop mechanisms (i.e. map & reduce)
  - Deep integration into Spark execution framework
- Extreme performance with SQL compilation
  - Compile SQL into C code, then compile that into an executable at runtime
- Support for IoT and semi structured data
  - Special data types for IoT (time series data)
  - Support of document style data in a massive scale environment
Notes: Big modification of slide in Stanford EE Computer Systems Colloquium v1.0 (Chris Hallenbeck, Richard Pledereder). The topics under High Performance (compression, parallelization and scanning) receive major attention in this section. Column store is emphasized, although row store is mentioned. ACID (Atomicity, Consistency, Isolation, Durability)

19 SAP HANA SOE (Velocity) and Hadoop (2/2)
General: Embrace Hadoop as technology
Goal: Get our own Engine on Hadoop
- Velocity → HANA Scale-Out Extension
Steps
- First step: Integrated with Spark (Q3 2015)
- Mid Term: independent execution cluster
Benefits
- Holistic data platform
- Integration of SAP data with data lakes
- HANA features on Hadoop (e.g. time series)
- Value added abilities on Hadoop data
- Performance

20 Architecture to Support Different Data Freshness Levels
[Diagram labels]
- Distributed query processor
- Workers: Velocity (OLTP), Velocity (OLTP), Velocity (OLAP), Velocity (OLAP)
- Distributed transaction manager
- Distributed log
- Distributed filesystem (for checkpoints …)
- Storage: Text, Document, Graph, Time series

21 Thank you
