6 Beyond relational data processing (1/3)
Integrate as deep as possible into the engine
- Bringing OLAP and OLTP together
  - Proven: works in thousands of customer systems
  - Simplicity: get rid of extracts, loads and redundancy; one system
  - OLAP dominates OLTP in real-world systems: optimize accordingly
- Data mining and prediction
  - Examples: basket analysis, different forecasting algorithms, …
  - Easy interaction with R and SAS
- Unstructured data
  - Text search support for more than 30 languages, including:
    - Stemming, part-of-speech tagging, noun extraction, …
    - Classification, clustering, named entity recognition, sentiment analysis
- Planning extensions
  - Planning: define and align business figures for the foreseeable future
  - Data-heavy operators like disaggregation or logical snapshots
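As an illustration of the basket analysis mentioned above, the following is a minimal sketch of pair support counting, written in plain Python rather than as an in-engine operator. The function name pair_support and the sample baskets are assumptions made for this example and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives every unordered pair one canonical key
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical sample data: each inner list is one purchase.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "milk"],
]
support = pair_support(baskets)
```

Pairs with high support are candidates for association rules; a real implementation would also prune infrequent items before enumerating pairs.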
7 Beyond relational data processing (2/3)
- Graph processing
  - Real-world business data often resembles graphs
  - Model as graph: more explicit and more efficient operators
  - Distance, siblings, shortest path, reachability, transitive closure, …
- Hierarchy processing
  - Special type of general graphs
  - Used by almost every business application
  - Support for time-dependent and versioned hierarchies
  - Extended graph operators: level, neighbor, is_ancestor, …
- Geospatial processing & time series
  - Native relational data types
  - Existing compression techniques + powerful specializations for sensor data
  - Spatial: WithinDistance, Contains, Area, …
  - Time series: Group by time interval, Interpolate Missing Values, …
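To make the hierarchy operators named above more concrete, here is a minimal sketch of what level and is_ancestor compute on a parent-child hierarchy. It is an assumption-laden illustration in Python, not HANA's implementation: the hierarchy is a plain dictionary mapping each node to its parent, the sample organization is invented, and the hierarchy is assumed to be acyclic.

```python
def level(parent, node):
    """Depth of a node: 0 for a root, otherwise the parent's level + 1."""
    depth = 0
    while parent.get(node) is not None:
        node = parent[node]
        depth += 1
    return depth

def is_ancestor(parent, ancestor, node):
    """True if `ancestor` lies strictly above `node` on the path to the root."""
    current = parent.get(node)
    while current is not None:
        if current == ancestor:
            return True
        current = parent.get(current)
    return False

# Hypothetical organizational hierarchy: node -> parent (None marks the root).
parent = {
    "Company": None,
    "Sales": "Company",
    "EMEA": "Sales",
    "R&D": "Company",
}
emea_level = level(parent, "EMEA")
```

Both functions walk the parent chain, so their cost grows with the depth of the hierarchy; an engine-level operator would typically rely on a precomputed index instead.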
8 Beyond relational data processing (3/3)
- Scientific processing
  - Bring prominent operators into the engine
  - Simplifies and speeds up operations in scientific and financial areas
  - Matrix operators: Eigenvalue, Multiply, …
  - Financial operators: Interest Rates, GarmanKohlagenProcess, …
- NoSQL processing
  - Document-based models, XML, JSON, …
  - Key-value stores
  - Flexible schema, in HANA via a specific flexible table type
- Massive scale-out
  - Conventional business applications fit on a single box, but there is a new kind of application requiring massive scale-out
  - Deep and seamless integration with the Hadoop system
  - Scale-out and single-box applications act as one system
10 HANA Data Platform Dynamic Tiering
HANA Dynamic Tiering
- Declare table to use disk storage
- Cost-efficient for big data
- Optimized disk-based processing powered by IQ
- New warm option beside
  - Hot (in-memory)
  - Cold (near-line storage)

    CREATE TABLE "demo"."SalesOrders_WARM" (
      ID Integer NOT NULL,
      CustomerID Integer NOT NULL,
      OrderDate date NOT NULL,
      …,
      PRIMARY KEY (id)
    ) USING EXTENDED STORAGE;
    INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );

HANA Dynamic Tiering
- Native Big Data solution – real-time insights – ALL enterprise data
- Preferred for structured/transactional cases
- Manage data cost-effectively, yet with desired performance based on SLAs
  - Terabytes to petabytes
  - Application-defined temperature
- Single database experience
  - Update & query all data seamlessly via HANA tables
  - Centralized operational control
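"Application-defined temperature" means the application, not the database, decides which rows are hot and which are warm. The sketch below shows one way an application might route rows by age. It is only an illustration: the hot table name, the 90-day threshold and the function target_table are assumptions for this example; only the warm table name comes from the slide.

```python
from datetime import date, timedelta

HOT_TABLE = '"demo"."SalesOrders"'        # hypothetical in-memory table
WARM_TABLE = '"demo"."SalesOrders_WARM"'  # extended-storage table from the slide

def target_table(order_date, today, hot_days=90):
    """Pick the table for a row: recent orders stay hot, older ones go warm."""
    cutoff = today - timedelta(days=hot_days)
    return HOT_TABLE if order_date >= cutoff else WARM_TABLE

today = date(2015, 6, 30)
recent = target_table(date(2015, 6, 1), today)   # within the last 90 days
old = target_table(date(2014, 1, 15), today)     # older than the cutoff
```

Because both tables are ordinary HANA tables from the application's point of view, the returned name can be used directly in INSERT or SELECT statements.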
11 HANA Data Platform BigData | Vision
HANA Data Management Platform
- HANA native BigData
  - Dynamic Tiering
  - Smart Data Streaming
  - NoSQL | Graph | Geo | TimeSeries
- HANA & Hadoop
  - SDA: Hive | Spark
  - MapReduce | HDFS
  - Admin & Monitoring
  - User Mgmt / Security
- Hadoop Extension
  - Velocity Engine
  - Integrated with HANA and Hadoop
[Diagram labels: Information Management | Text | Search | Graph | Geospatial | Predictive; SAP HANA In-Memory (0.1 sec, instant results); HANA Dynamic Tiering (warm data); Hadoop (∞, infinite storage, raw data); HANA Scale Out; Smart Data Streaming; Administration | Monitoring | Operations | User Management | Security]
12 SAP HANA Massive Scale Out Edition (Velocity)
- Motivation:
  - Engine for massive scale-out and big data
- Key features:
  - Scale to thousands of nodes
  - Different data freshness and consistency levels
  - Efficient fail-safety design
  - First-class citizen within Hadoop (Spark)
  - Support for a variety of hardware and operating systems
  - Extreme query performance by compiling SQL to native code
13 SAP HANA SOE (Velocity) and Hadoop (1/2)
Hadoop Ecosystem
- Ambari: cluster management
- ZooKeeper: coordination
- Pig: scripting
- MLlib: machine learning
- Hive: SQL
- SparkSQL
- YARN: processing
- Spark: processing
- HBase: database
- HDFS: distributed file system
14 SAP HANA SOE (Velocity) and Hadoop (2/2)
- Steps
  - Stage 1: Integration with Spark (2015)
  - Stage 2: Independent execution cluster
- Benefits
  - Integration of SAP data with data lakes
  - HANA features add value to Hadoop (e.g. SQL extensions like time series, hierarchies, …)
  - Performance
  - Holistic data platform
15 Architecture to Support Different Data Freshness Levels
- Options
  - Read your own writes
  - Up-to-date data vs. data of a certain age
- Separate component for transactions
[Architecture diagram labels: Connection 1 … Connection n (session data); DQP; Query Engine 1, Query Engine 2, Query Engine 3; DTX with Transaction Broker and Version Table; Distributed Log; Storage 1, Storage 2, … Storage n; Storage (checkpoints)]
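The freshness options above can be read as constraints on which query engine may serve a request. The following sketch is one possible interpretation, not the Velocity design: it assumes each query engine replays the distributed log up to some version, that the version table knows the latest committed version, and that a session remembers the version of its last write. All class, field and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Engine:
    name: str
    applied_version: int  # highest log version this engine has replayed
    applied_at: float     # commit time (seconds) of that version

def eligible_engines(engines, latest_version, now,
                     session_last_write=None, max_age=None):
    """Names of engines that satisfy the requested freshness level.

    No option set      -> up-to-date data only (engine at the latest version)
    session_last_write -> read your own writes
    max_age            -> data of a certain age is acceptable
    """
    result = []
    for e in engines:
        current = e.applied_version >= latest_version
        if session_last_write is None and max_age is None:
            ok = current
        else:
            ok = True
            if session_last_write is not None:
                ok = ok and e.applied_version >= session_last_write
            if max_age is not None:
                # Conservative staleness bound: time since the engine's
                # last applied commit, unless it is fully caught up.
                ok = ok and (current or now - e.applied_at <= max_age)
        if ok:
            result.append(e.name)
    return result

engines = [
    Engine("qe1", applied_version=12, applied_at=100.0),
    Engine("qe2", applied_version=10, applied_at=95.0),
    Engine("qe3", applied_version=7, applied_at=60.0),
]
up_to_date = eligible_engines(engines, latest_version=12, now=101.0)
own_writes = eligible_engines(engines, 12, 101.0, session_last_write=10)
aged = eligible_engines(engines, 12, 101.0, max_age=10.0)
```

Relaxing the freshness requirement widens the set of eligible engines, which is what allows read load to spread across replicas that lag behind the log.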
17 Conclusion
- Today's applications have a multidimensional set of specialized requirements
- Gains from moving these requirements into a (single) DBMS:
  - Simplified and more explicit data modeling and processing for applications
  - Increased performance
  - No complicated data transfer between specialized engines
- Powerful orchestration required
- Web-scale processing is key to supporting new applications
SAP HANA strives to answer all these requirements in a single data management platform.
18 SAP HANA Massive scale out edition (Project Velocity)
- Scales to thousands of nodes
  - Support of massive distribution and failure tolerance
  - ACID properties on large landscapes
- Can run on small devices
  - Low footprint allows it to run on small commodity hardware and small devices
- Integration into Hadoop infrastructure (Spark)
  - Access via standard Hadoop mechanisms (i.e. map & reduce)
  - Deep integration into the Spark execution framework
- Extreme performance with SQL compilation
  - Compile SQL into C code, with real-time compilation into an executable
- Support for IoT and semi-structured data
  - Special data types for IoT (time series data)
  - Support of document-style data in a massive-scale environment
Notes:
- Big modification of slide in Stanford EE Computer Systems Colloquium v1.0 (Chris Hallenbeck, Richard Pledereder)
- The topics under High Performance (compression, parallelization and scanning) receive major attention in this section.
- Column store is emphasized, although row store is mentioned.
- ACID (Atomicity, Consistency, Isolation, Durability)
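The idea behind SQL compilation is to generate code specialized for one query instead of interpreting a generic plan. The sketch below illustrates the principle only: it generates and compiles Python source where the slide describes C code, and it handles just a single comparison predicate. The function compile_filter and the sample rows are assumptions for this example.

```python
def compile_filter(column, op, constant):
    """Generate and compile a scan specialized for `column op constant`."""
    if op not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    py_op = "==" if op == "=" else op  # SQL equality -> Python equality
    # Column name and constant are baked into the generated source,
    # so the compiled function carries no per-row interpretation logic.
    source = (
        "def scan(rows):\n"
        f"    return [row for row in rows if row[{column!r}] {py_op} {constant!r}]\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["scan"], source

# Roughly: SELECT * FROM orders WHERE amount > 100
scan, generated_source = compile_filter("amount", ">", 100)
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}]
matches = scan(rows)
```

Printing generated_source shows the specialized function; a native-code engine would hand the equivalent C source to a compiler and load the resulting executable code.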
19 SAP HANA SOE (Velocity) and Hadoop (2/2)
- General: Embrace Hadoop as technology
- Goal: Get our own engine on Hadoop
  - Velocity HANA Scale-Out Extension
- Steps
  - First step: Integrated with Spark (Q3 2015)
  - Mid term: Independent execution cluster
- Benefits
  - Holistic data platform
  - Integration of SAP data with data lakes
  - HANA features on Hadoop (e.g. time series)
  - Value-added abilities on Hadoop data
  - Performance
20 Architecture to Support Different Data Freshness Levels
[Architecture diagram labels:]
- Distributed query processor
- Workers: Velocity (OLTP), Velocity (OLTP), Velocity (OLAP), Velocity (OLAP)
- Distributed transaction manager
- Distributed log
- Distributed filesystem (for checkpoints, …)
- Storage: Text, Document, Graph, Time series