Oracle Data Warehouse: Using Big Data to Open New Horizons for the Data Warehouse. Alfred Schlaucher, Detlef Schroeder


2 Topics
Big Data: buzzword, or a new dimension with new possibilities? Oracle's technology for storing unstructured and semi-structured mass data; the Cloudera framework; "connectors" into the new world: Oracle Loader for Hadoop and HDFS; Big Data Appliance; discovering new analysis horizons with Oracle R Enterprise; Big Data analyses with Endeca

3 Oracle Hadoop Loader Utilities
Oracle Loader for Hadoop, Oracle Direct Connector for Hadoop Distributed File System, Oracle R Connector for Hadoop

4 Oracle Loader for Hadoop
A MapReduce program, invoked natively. Reading from HDFS in these formats: Hive tables, delimited text files, simple delimited text files, Avro* format (binary record files). Writing: to files in Data Pump format or CSV format, or directly into the database via OCI (direct path load) or JDBC (conventional path load). Can parallelize the load via a partition function and sort the data.
*Avro is a remote procedure call and serialization framework developed as part of Apache's Hadoop project. It uses JSON to define data types and protocols and serializes the actual data in a compact binary format. Its main use is in Hadoop, where it serves both as a serialization format for persisting data and as a wire format for communication between Hadoop nodes and between Hadoop services and client programs.
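The idea of parallelizing via a partition function plus sorting can be illustrated with a small sketch. It is plain Python, not the Oracle Loader for Hadoop API; the function name `partition_and_sort` and the CRC32-based hash are assumptions chosen for illustration. Records with the same key always land in the same partition, and each partition is sorted, so partitions could be processed independently.

```python
import zlib
from collections import defaultdict


def partition_and_sort(records, key, num_partitions):
    """Hash-partition records by key, then sort each partition by that key.

    Hypothetical illustration only; not the Oracle Loader for Hadoop API.
    """
    partitions = defaultdict(list)
    for rec in records:
        # crc32 yields the same partition on every run (Python's str hash is salted)
        p = zlib.crc32(str(key(rec)).encode("utf-8")) % num_partitions
        partitions[p].append(rec)
    # Partitions are independent of each other, so they could be loaded in parallel
    return {p: sorted(recs, key=key) for p, recs in partitions.items()}


rows = [(75, "Schrott"), (12, "super"), (75, "klasse"), (3, "Müll")]
for p, recs in sorted(partition_and_sort(rows, key=lambda r: r[0], num_partitions=2).items()):
    print(p, recs)
```

Sorting happens per partition rather than globally, which keeps each unit of work small; a real loader would choose the partitioning to match the target table.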

5 Oracle Direct Connector for Hadoop Distributed File System (HDFS)
An extension to external tables: the hdfs_stream routine acts as a preprocessor, reads directly from HDFS, and is invoked from inside the database, e.g.: INSERT INTO target_table SELECT * FROM Ext_Tab_HDFS; Input formats: CSV, Data Pump.
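To make the preprocessor mechanism concrete, here is a minimal stand-in written in Python. It is not hdfs_stream: it reads local files instead of HDFS, and the location-file format (one path per line) and the function names are assumptions. What it does show is the core contract of a preprocessor: it writes the records to standard output, where the external table reads them.

```python
import sys
from typing import Iterable, List, TextIO


def read_location_file(path: str) -> List[str]:
    """Return the data-file paths listed in a location file.

    Hypothetical format: one path per line; blank lines and '#' comments are skipped.
    """
    with open(path, encoding="utf-8") as f:
        stripped = (line.strip() for line in f)
        return [s for s in stripped if s and not s.startswith("#")]


def stream_files(paths: Iterable[str], out: TextIO) -> None:
    """Copy each listed file to `out` line by line."""
    for p in paths:
        with open(p, encoding="utf-8") as f:
            for line in f:
                # Ensure a file without a final newline does not merge with the next one
                out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # The preprocessor gets the location file as its first argument
    # and must write the records to stdout.
    stream_files(read_location_file(sys.argv[1]), sys.stdout)
```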

6 Oracle R Connector for Hadoop
Extension package (library) for the R engine: library(ORCH). Useful for extracting data directly from HDFS and loading it into the memory of the R engine. Making connections: orch.connect, orch.disconnect, orch.reconnect, orch.which. Copying data: hdfs.upload, hdfs.download, hdfs.get, hdfs.push, hdfs.put, hdfs.pull. Exploring files: hdfs.attach, hdfs.cd, hdfs.exists, hdfs.ls, hdfs.mkdir, hdfs.parts, hdfs.pwd, hdfs.rm, hdfs.rmdir, hdfs.sample, hdfs.size. Executing scripts: hadoop.exec, hadoop.run.

7 Big Data Connectors: Options
[Diagram: HDFS cluster machines and the Oracle 11.2 server machine. Oracle R Connector for Hadoop and Oracle R Enterprise (Advanced Analytics) as R packages in the R environment. Oracle Direct Connector for HDFS (ODCH): external table with preprocessor hdfs_stream loading a target table; inputs CSV and Data Pump. Oracle Loader for Hadoop: MapReduce job framework with LoaderMap, partitioned and sorted, parallel execution; sources Hive table and CSV; offline mode (CSV, Data Pump) and online mode (OCI direct path, JDBC conventional path).]

8 Big Data Connectors with Oracle R Connector
[Diagram: the architecture from slide 7, with mails stored in HDFS as the data source.]

9 Big Data Connectors: The Demo Scenario with Oracle Direct Connector for HDFS (ODCH)
[Diagram: mails and filter terms in HDFS on the cluster machines; Oracle Loader for Hadoop (MapReduce job framework with Util, Mapper, Reducer and Jobs components, LoaderMap); location files lf1.hdfsm, lf2.hdfsm, lf3.hdfsm; ODCH external table with preprocessor hdfs_stream loading a target table in Oracle 11.2 on the server machine; Oracle R Enterprise (Advanced Analytics) in the R environment.]


11 The Example Scenario
Mails, blogs, texts: ratings of and comments on products on the market

12 SERVICE GmbH: A Cross-Segment Service Provider
Home improvement stores, brokering of tradesman services, financial services. Mails, blogs, texts: ratings of and comments on products on the market

13 Known Data
D_KUNDE: KUNDEN_ID, KUNDENNR, GESCHLECHT, VORNAME, NACHNAME, TITEL, ANREDE, GEBDAT, BRANCHE, WOHNART, KUNDENART, BILDUNG, ANZ_KINDER, EINKOMMENSGRUPPE, ORTNR, BERUFSGRUPPE, STATUS, STRASSE, TELEFON, TELEFAX, KONTAKTPERSON, FIRMENRABATT, BERUFSGRUPPEN_NR, BILDUNGS_NR, EINKOMMENS_NR, WOHNART_NR, HAUSNUMMER, PLZ, ORT, KUNDENKARTE, ZAHLUNGSZIEL_TAGE, TOTAL, TOTAL_NR
D_ARTIKEL: ARTIKEL_ID, ARTIKEL_NAME, GRUPPE_NR, GRUPPE_NAME, SPARTE_NR, SPARTE_NAME
D_ZEIT: ZEIT_ID, DATUM_ID, TAG_DES_MONATS, TAG_DES_JAHRES, WOCHE_DES_JAHRES, MONATS_NUMMER, MONAT_DESC, QUARTALS_NUMMER, JAHR_NUMMER
D_REGION: REGION_ID, ORTNR, ORT, KREISNR, KREIS, LANDNR, LAND, REGIONNR, REGION
D_VERTRIEBSKANAL: KANAL_ID, VERTRIEBSKANAL, KANALBESCHREIBUNG, VERANTWORTLICH, KLASSE
F_UMSATZ (fact table): ARTIKEL_ID, KUNDEN_ID, ZEIT_ID, REGION_ID, KANAL_ID (foreign keys to the dimension tables); UMSATZ, MENGE, UMSATZ_GESAMT
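The known data forms a star: the fact table F_UMSATZ is joined to a dimension through its foreign key. A minimal sketch using Python's built-in sqlite3, with only a subset of the columns; the article names and values are invented sample rows for illustration.

```python
import sqlite3


def revenue_by_division(conn):
    """Total UMSATZ per SPARTE_NAME, joining the fact table to D_ARTIKEL."""
    sql = """
        SELECT a.SPARTE_NAME, SUM(f.UMSATZ) AS umsatz
        FROM F_UMSATZ f
        JOIN D_ARTIKEL a ON a.ARTIKEL_ID = f.ARTIKEL_ID
        GROUP BY a.SPARTE_NAME
        ORDER BY a.SPARTE_NAME
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE D_ARTIKEL (ARTIKEL_ID INTEGER PRIMARY KEY, ARTIKEL_NAME TEXT,
                            SPARTE_NR INTEGER, SPARTE_NAME TEXT);
    CREATE TABLE F_UMSATZ (ARTIKEL_ID INTEGER REFERENCES D_ARTIKEL(ARTIKEL_ID),
                           KUNDEN_ID INTEGER, ZEIT_ID INTEGER, REGION_ID INTEGER,
                           KANAL_ID INTEGER, UMSATZ REAL, MENGE INTEGER);
""")
# Invented sample rows, for illustration only
conn.executemany("INSERT INTO D_ARTIKEL VALUES (?, ?, ?, ?)",
                 [(1, "Bohrmaschine", 10, "Werkzeug"), (2, "Farbe", 20, "Baustoffe")])
conn.executemany("INSERT INTO F_UMSATZ VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(1, 100, 1, 1, 1, 99.0, 1),
                  (1, 101, 1, 2, 1, 198.0, 2),
                  (2, 100, 2, 1, 2, 30.0, 3)])
print(revenue_by_division(conn))
```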

14 The Use Case: Mails and the DWH
New information, products, feedback system; sales figures, product lists
Notes: The interest in big data has reached new highs; everywhere you turn, there is no escaping the buzz. In this presentation we look at some of the use cases for big data. But first, why is there so much interest in big data now? Two trends are driving it. First, far more data is being generated online today. On the one hand, there is a greater volume of human-generated data, from social media to photographs and more. On the other hand, there is also much more machine-generated data: sensors are now cheap and small enough to go anywhere. Think of smart meters, cell phones, security cameras, and consumer products that phone home. Although there is probably more human-generated data at the moment, that will certainly change in the next few years. The second trend is the falling cost of hardware and the emergence of open source tools to store and process all this data. One could argue that much of this "new data" has been available for years, but it was never cost-effective to acquire and analyze all of it. Today it is economically feasible. Analyzing this information can give insight into customers, target markets, and so on, and it is this potential to gain new insights, and in turn to uncover new business cases and opportunities to improve the way you run your business, that makes big data pay off.
NEW WAYS TO GENERATE DATA: we are finding new ways to monitor activities and processes, building new data streams. HIGH-VELOCITY DATA FLOWS: these new data streams are increasingly generated in real time and need to be analyzed in real time to gain maximum benefit. VAST DATA POOLS: new sources introduce a wide variety of schema-less data streams that need to be mined and analyzed to gain greater insight. ECONOMICS OF ANALYTICS: the total cost of acquiring, organizing, and analyzing massive, varied, and complex data sets is declining. Sensors are cheap and small enough to go anywhere; growing digital ecosystem; new information

15 The Use Case: Mails and the DWH
New information, products, feedback system; sales figures, product lists

16 The Use Case: Mails and the DWH, Filters / Search Strings
Mails, DWH: counting / statistics / mining

17 Mails, blogs, etc.

18 New Information
New information about perception and impact: How is the quality of purchased items rated? What does "rating" mean? Is there a classification system? What is good, what is bad? How can "rating" be measured? Can terms be classified? When is a term positive, when negative? Which products help shape the company's image? Sentiment legend: 1 saustark (awesome), 2 sehr gut (very good), 3 hervorragend (excellent), 4 klasse (great), 5 super (super), 6 super Sache (great stuff), 7 geil (cool), 8 affengeil (totally cool), 9 passgenau (perfect fit), 10 eine Zumutung (unacceptable), 11 Schrott (junk), 12 Katzenjammer (misery), 13 Müll (rubbish), 14 Sch.... (crap)
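The sentiment legend is essentially a lookup table from term to score. A minimal Python sketch of it; reading the list as running from most positive (1) to most negative (14), and placing the positive/negative cut-off at 10, are assumptions, since the slide does not state them explicitly.

```python
# Sentiment legend from the slide; lower score = more positive (assumed reading)
STIMMUNGSLEGENDE = {
    "saustark": 1, "sehr gut": 2, "hervorragend": 3, "klasse": 4,
    "super": 5, "super Sache": 6, "geil": 7, "affengeil": 8, "passgenau": 9,
    "eine Zumutung": 10, "Schrott": 11, "Katzenjammer": 12, "Müll": 13,
    "Sch....": 14,
}


def score(term: str) -> int:
    """Return the legend score of a term (KeyError if it is not on the scale)."""
    return STIMMUNGSLEGENDE[term]


def polarity(term: str, first_negative: int = 10) -> str:
    """Classify a term; assumption: scores 1-9 positive, 10-14 negative."""
    return "positive" if score(term) < first_negative else "negative"


for t in ("klasse", "eine Zumutung", "Müll"):
    print(t, score(t), polarity(t))
```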

19 Notification and Rating Mails about Products
Sample mail: Produkt_Nr: 75 -> 17 Falsche_Beratung -> Das ist Mieskram . (product number 75; standardized message 17, "wrong advice"; free-form comment "This is lousy stuff.") Structure: product (number, name); standardized message text; free-form field for comments. Sentiment legend: 1 saustark, 2 sehr gut, 3 hervorragend, 4 klasse, 5 super, 6 super Sache, 7 geil, 8 affengeil, 9 passgenau, 10 eine Zumutung, 11 Schrott, 12 Katzenjammer, 13 Müll, 14 Sch.... Sample notifications: Ja Verpackung besch. (yes, packaging damaged), Ja Verpackung falsch (yes, wrong packaging), Ja Meldung (yes, notification)
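The sample mail follows a recognizable pattern: a product number, a numbered standard message, and a free-form comment, separated by "->". A minimal Python parser for that pattern; the format is inferred from this single example, and the `Meldung` type and its field names (borrowed where possible from the later Bewertung table columns) are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Meldung(NamedTuple):
    produkt_nr: int  # product number
    fehler_nr: int   # number of the standardized message
    meldung: str     # standardized message text
    kommentar: str   # free-form comment


# Pattern inferred from the sample "Produkt_Nr: 75 -> 17 Falsche_Beratung -> ..."
MAIL_PATTERN = re.compile(r"Produkt_Nr:\s*(\d+)\s*->\s*(\d+)\s+(\S+)\s*->\s*(.*)")


def parse_mail(line: str) -> Optional[Meldung]:
    """Split one mail line into its fields; None if it does not fit the pattern."""
    m = MAIL_PATTERN.match(line.strip())
    if m is None:
        return None
    return Meldung(int(m.group(1)), int(m.group(2)), m.group(3), m.group(4).strip())


print(parse_mail("Produkt_Nr: 75 -> 17 Falsche_Beratung -> Das ist Mieskram ."))
```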

20 Step 1: Load the Data into the Hadoop Distributed File System
hadoop fs -put mails input

21 Display the Data

22 Step 2: Provide the Filters
"Fine" version and "street" (colloquial) version. Search order either: gut, sehr gut; or: sehr gut, gut.
hadoop fs -put filter.txt input
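Why the search order matters: "sehr gut" contains "gut", so counting each term independently finds "gut" inside every "sehr gut". A minimal Python sketch contrasting naive counting with one longest-first scan; preferring the longer term is an assumption about the intended behavior, and the sketch does no word-boundary handling.

```python
import re
from collections import Counter


def count_naive(text, terms):
    """Count each term independently; 'gut' is also found inside 'sehr gut'."""
    return Counter({t: text.count(t) for t in terms})


def count_longest_first(text, terms):
    """Scan once, trying longer terms first, so each occurrence is counted once.

    Python's re alternation takes the first alternative that matches at a
    position, so sorting by length puts 'sehr gut' ahead of 'gut'.
    """
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return Counter(m.group(0) for m in pattern.finditer(text))


text = "Die Beratung war sehr gut, die Lieferung war gut."
print(count_naive(text, ["gut", "sehr gut"]))
print(count_longest_first(text, ["gut", "sehr gut"]))
```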

23 Choosing the Filter Criteria
Choice of terms depends on the social group and the language it predominantly uses; statistical frequency of use of particular words; weighting of terms on a positive/negative scale (positive, negative, strong, weak); sociological and linguistic analyses

24 The Mails to Be Analyzed in HDFS

25 Step 3: Invoking the Hadoop Loader (MapReduce Steps)
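The MapReduce steps can be simulated in a few lines of plain Python: the mapper emits a ((product number, term), 1) pair per filter-term hit, the shuffle groups the pairs by key, and the reducer sums them. This is only a local simulation under assumptions: the actual job runs on Hadoop, and the sample mails below (including message code 3 "Lob") are invented.

```python
import re
from collections import defaultdict
from itertools import chain

PRODUCT = re.compile(r"Produkt_Nr:\s*(\d+)")


def mapper(line, pattern):
    """Map phase: emit ((product_nr, term), 1) for each filter-term hit in one mail."""
    m = PRODUCT.search(line)
    if m is None:
        return  # not a rating mail: emit nothing
    for hit in pattern.finditer(line):
        yield (int(m.group(1)), hit.group(0)), 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: sum the hits per (product_nr, term)."""
    return key, sum(values)


def run_job(mails, terms):
    ordered = sorted(terms, key=len, reverse=True)  # longer terms first
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    mapped = chain.from_iterable(mapper(line, pattern) for line in mails)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())


mails = [
    "Produkt_Nr: 75 -> 17 Falsche_Beratung -> Das ist Schrott .",
    "Produkt_Nr: 75 -> 3 Lob -> super Sache, wirklich super",
    "Produkt_Nr: 12 -> 3 Lob -> sehr gut",
]
print(run_job(mails, ["super", "super Sache", "Schrott", "sehr gut"]))
```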

26

27 Step 4: External Table with Direct Access to HDFS


29 Further Steps: Tables Bewertung, D_Artikel, Nutzbarkeit
CREATE TABLE Bewertung (Produkt_Nr NUMBER, Fehler_Nr NUMBER, Bewertung VARCHAR2(50), Treffer NUMBER); INSERT INTO Bewertung SELECT * FROM mail_wert;
Bewertung (rating): PRODUKT_NR, FEHLER_NR, BEWERTUNG, TREFFER
D_Artikel: ARTIKEL_ID, ARTIKEL_NAME, GRUPPE_NR, GRUPPE_NAME, SPARTE_NR, SPARTE_NAME
Nutzbarkeit (usability): NUTZ_NR, NUTZ_WERT

30 Rating Statistics
For each article, compute the mean of all ratings on a scale from
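One way to compute the mean per article from rows shaped like the Bewertung table (PRODUKT_NR, FEHLER_NR, BEWERTUNG, TREFFER) is a hit-weighted average of the legend scores. A minimal Python sketch; treating each hit as one rating and using the 1 to 14 legend from slide 18 as the scale are assumptions, and the `mean_rating` function is hypothetical.

```python
from collections import defaultdict

# Legend scores from slide 18 (assumed to be the rating scale)
SCORES = {
    "saustark": 1, "sehr gut": 2, "hervorragend": 3, "klasse": 4,
    "super": 5, "super Sache": 6, "geil": 7, "affengeil": 8, "passgenau": 9,
    "eine Zumutung": 10, "Schrott": 11, "Katzenjammer": 12, "Müll": 13,
    "Sch....": 14,
}


def mean_rating(rows):
    """rows: (produkt_nr, fehler_nr, bewertung, treffer) tuples.

    Returns the hit-weighted mean score per product; terms that are not
    on the scale are ignored.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for produkt_nr, _fehler_nr, bewertung, treffer in rows:
        s = SCORES.get(bewertung)
        if s is None:
            continue
        total[produkt_nr] += s * treffer
        hits[produkt_nr] += treffer
    return {p: total[p] / hits[p] for p in total if hits[p] > 0}


print(mean_rating([(75, 17, "Schrott", 2), (75, 17, "super", 1), (12, 3, "sehr gut", 3)]))
```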

31 Further Questions
What share of total revenue do products with a rating below 11 account for? How are products with a rating below 11 distributed across the regions? How are products with a rating below 11 distributed across the sales channels? (Scale: positive / negative, strong / weak)
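These questions map onto straightforward SQL over the fact table. A minimal sketch with Python's sqlite3, assuming a hypothetical table ARTIKEL_BEWERTUNG (PRODUKT_NR, MITTELWERT) that holds each product's mean rating and assuming PRODUKT_NR corresponds to ARTIKEL_ID; the sample rows are invented, and the threshold 11 is taken literally from the slide.

```python
import sqlite3

ALLOWED_DIMENSIONS = {"REGION_ID", "KANAL_ID"}


def share_below(conn, threshold=11):
    """Fraction of total UMSATZ from products whose mean rating is below threshold.

    Unrated products still count toward the total (LEFT JOIN).
    """
    sql = """
        SELECT SUM(CASE WHEN b.MITTELWERT < ? THEN f.UMSATZ ELSE 0.0 END) / SUM(f.UMSATZ)
        FROM F_UMSATZ f
        LEFT JOIN ARTIKEL_BEWERTUNG b ON b.PRODUKT_NR = f.ARTIKEL_ID
    """
    return conn.execute(sql, (threshold,)).fetchone()[0]


def breakdown_below(conn, dimension, threshold=11):
    """UMSATZ of products rated below threshold, grouped by REGION_ID or KANAL_ID."""
    if dimension not in ALLOWED_DIMENSIONS:  # column names cannot be bound as parameters
        raise ValueError(f"unsupported dimension: {dimension}")
    sql = f"""
        SELECT f.{dimension}, SUM(f.UMSATZ)
        FROM F_UMSATZ f
        JOIN ARTIKEL_BEWERTUNG b ON b.PRODUKT_NR = f.ARTIKEL_ID
        WHERE b.MITTELWERT < ?
        GROUP BY f.{dimension}
        ORDER BY f.{dimension}
    """
    return conn.execute(sql, (threshold,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE F_UMSATZ (ARTIKEL_ID INTEGER, REGION_ID INTEGER, KANAL_ID INTEGER, UMSATZ REAL);
    CREATE TABLE ARTIKEL_BEWERTUNG (PRODUKT_NR INTEGER PRIMARY KEY, MITTELWERT REAL);
""")
# Invented sample rows, for illustration only
conn.executemany("INSERT INTO F_UMSATZ VALUES (?, ?, ?, ?)",
                 [(75, 1, 1, 300.0), (75, 2, 2, 100.0), (12, 1, 2, 600.0), (12, 2, 2, 200.0)])
conn.executemany("INSERT INTO ARTIKEL_BEWERTUNG VALUES (?, ?)", [(75, 12.0), (12, 2.0)])
print(share_below(conn))
print(breakdown_below(conn, "REGION_ID"))
print(breakdown_below(conn, "KANAL_ID"))
```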

32

33

