Published by: Harm Bockrath. Modified over 10 years ago.
1
Prof. Dr. Stefan Edlich: NoSQL in the Cloud
2
nosqlberlin.de nosqlfrankfurt.de nosql powerdays
6
NoSQL is specialization!
Big data; massive write performance; fast KV access; write availability; flexible schema (migration) + flexible datatypes; easier maintainability, administration and operations; no single point of failure; programmer ease of use
7
Theory?! Map/Reduce; Map/Reduce successors!
ACID / BASE & CAP: P (a network partition) practically never occurs! Consistent hashing: the basis of scalable K/V stores. MVCC: non-blocking advantages. Vector clocks: [122:1] [147:2|122:1] [97:3|147:2|122:1]
8
Interfaces: REST, language API, Thrift/Avro, Map/Reduce, Get/Put. Data model: column family, document DB, key/value, graph. Persistence: disk, full memory, hybrid, pluggable.
9
Google Protocol Buffers
=>
10
Apache Avro! JSON; binary data transfer; automatic RPC generation
No code generation; client and server exchange schemas when they change. Definitely worth evaluating!
11
Data models
12
Column family, document DBs, key/value DBs, graph DBs, others
Key/value DBs: Voldemort, Chordless, Scalaris, Dynamo / Dynomite. Others: db4o, Versant, Objectivity, Gemstone, Progress, Mark Logic, EMC Momentum, Tamino, GigaSpaces, Hazelcast, Terracotta, …
13
HBase Cassandra SimpleDB
14
+ scaling = new node; + community; + API; - replication; - setup, tuning, maintenance
+ configuration (r, w); - documentation; - queries; + stress-free SaaS solution; + transparent scaling; - UTF-8 strings; - data resides at Amazon; +- no tuning / config
15
Document Databases
16
any JS-Client no Middleware! DB+WebServer +evolving App
17
2nd round: += 6.5 million $
18
Not normalized (duplicates, delete orphans, ...)
(Crash-prone for a configurable time window) (journaling); eventually consistent; real scaling only via sharding; - (not yet kill -9 safe)
19
EC2 Node 66 GB EC2 Node 66 GB 67 GB Index Data
11 hours + 1 day off
20
+ not normalized; + schema agility; + excellent documentation; + speed (memory-mapped files); + installation + save = 28 sec!; + arbitrary indexes; + MapReduce; + rich query language; + GridFS (instead of HDFS); + simple replication (master-slave / replica sets)
21
db.system.indexes.find();
db.friends.getIndexes();
db.friends.ensureIndex({friend: 1});
db.friends.ensureIndex({friend: 1, zip: 1}); // compound index
db.friends.find({friend: "Mario", zip: "13755"}).explain();
Queries: age: {$gt: 10}, food: {$all: ["pizza", "noodles"]}
$gt, $lt, $lte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, $or, $elem, $elemMatch, regexp, ...
NoSQL query lock-in?!
22
Evolving schema
Migration architecture patterns: A) Blacklist rename: try { ... } catch (FirstException | SecondException ex) { /* newName = BlackList.checkName(OldName) */ }
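The blacklist rename of pattern A can be sketched in a few lines. This is only an illustration under assumptions: the mapping `BLACKLIST`, the helper `check_name`, the function `read_field` and the field names are hypothetical and do not come from the slides. The idea shown is that old code keeps asking for a retired field name, the failed lookup is caught, and the blacklist supplies the new name.

```python
# Hypothetical blacklist: retired field name -> current field name.
# The entries are invented for illustration only.
BLACKLIST = {"surname": "last_name", "plz": "zip"}


def check_name(old_name):
    """Return the current name for a retired field name (KeyError if unknown)."""
    return BLACKLIST[old_name]


def read_field(document, name):
    """Read a field, retrying under the new name if the old one is gone."""
    try:
        return document[name]
    except KeyError:
        # The field may have been renamed: ask the blacklist for the new name.
        new_name = check_name(name)
        return document[new_name]


# A document that was already stored under the new schema.
doc = {"last_name": "Edlich", "zip": "13755"}
print(read_field(doc, "surname"))    # old code path, resolved via blacklist
print(read_field(doc, "last_name"))  # new code path, direct hit
```

The document itself is never rewritten here, which keeps old and new clients working side by side at the cost of one extra lookup for retired names.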
23
B) "Rails" migration: new name, new name, new name, new name, old name
(not if the data is replicated too often)
24
Duplicates = space; freshness of the data
"Pre-joined" data! "Pre-computed"; move growing data out, or pre-space it
25
Into the cloud…
26
64 bit [extra | double | quadruple] Large
Clients; mongos (router); config servers; replica sets: shard A, shard B, shard C; RAM+, DISK+; possible arbiter: micro
27
Lessons learned… RAID configurations (00, 01, 10, 03, 05, …)
Journaling file systems (ext4, xfs, …); (security) ports, file descriptors, snapshots, …; Postgres & Cassandra
28
K/V stores: + very fast (> /sec); + configurable disk sync; + API for custom integrations; + simple replication; + hash, list, set, sorted set, messages; + installation: UNIX 38 sec, Windows 18 sec; - cloud cluster only from version 3.* on; mapping data structures ->
29
Sorted Set
30
memcached API
31
Simple dynamic scaling (up & down)
Scales linearly; bulletproofed by Zynga.com; limited membase protocol; Membase TAP (protocol interception); Code-Node:
32
Membase in the cloud: ready-made RightScale & AMI templates
Open various ports; use a DNS entry and no changing IPs; specify the master node, which sets the quota for the nodes that inherit from it; backups for EBS
33
GraphDBs Property Graph
34
Players: Neo4j, OrientDB, Sones, Infinite-Graph, MS-Trinity, HyperGraphDB, Inf-Grid, Dex, VertexDB, Filament, Allegro Graph, Bigdata, Open-Link-Virtuoso, FlockDB
35
Graph DBs in the cloud: > N billion nodes? Sharding!
But usually no "predictable lookup" is possible, only with domain-specific knowledge; balanced DBs without sweet spots are hardly possible. Access patterns + heuristics (insert sharding / runtime sharding) => partitioning algorithms. (HA) Neo4j: cache sharding! Multi-master cluster for consistent routing.
36
> 220 DBs: consulting can be quite frustrating…
37
Data, Transactions, Performance, Queries, Architecture, other Non-Functional Requirements
38
Analyse your data: domain data, log data, event data, message data, critical data, business data, meta data, temp data, session data, geo data, etc.
Data / storage model: relational, column-oriented, doc-alike, graphs, objects, etc. What types / type system? Data navigation, data amount, data complexity (deep XML?)
ACID vs. BASE vs. mixture? CAP decisions
Performance dimension analysis: latency, request behaviour, throughput; scale-up vs. scale-out
Query requirements: typical queries, tools, ad-hoc queries, SQL / LINQ needed, Map/Reduce? …
Distribution architecture: local, parallel, distributed / grid, service, cloud, mobile, p2p, …
Data access patterns: read / write distribution, random / sequential, access design patterns
Non-functional requirements: replication, refactoring frequency, DB support, qualification / simplicity, company restrictions, DB diversity (allowed?), security, safety / backup & restore, crash resistance, licence…
39
NoSQL: Conclusion
40
Definitely assume RAM & SSD!
RethinkDB; Gustavo Alonso; lots of > 1 PT RAM DBs in California! SAP strategy? Service, RAM, cloud, mobile
41
The DaaS era. Amazon: only 90 million $ instead of 225 million $. For MongoDB alone, well over 100 "Database-as-a-Service" providers! Amazon: SimpleDB, Hadoop, etc.
42
Many clever hybrid solutions!
CouchBase, Hadoop+MySQL
43
Availability Ad Hoc Query OLAP Database-aaS => best Mix!
44
Non-critical data (view, domain, master, meta, log, …): by Couch, MongoDB, Redis, Membase, …
Critical data (payment data, personal data, …): by classic RDBMS, Vertica, VoltDB, Database.com, GenieDB, …
Management / analytics: Hadoop*, BI, OLAP
Dwight Merriman (10gen)
45
Links nosql-database.org nosqltapes.com mynosql.com .com
46
Thanks for listening! Discussion!
47
Functional (graph) decomposition? Or…
Defensive patent; group by; ancient, from Lisp; use case: aggregate; pi -> > 1000 cluster
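The remark that Map/Reduce is essentially an old "group by" idea from Lisp can be made concrete with a minimal single-process sketch. It only assumes the usual three phases (map, group by key, reduce). The function names `map_reduce`, `word_mapper` and `add`, and the sample lines, are illustrative and not taken from the slides. A real cluster would distribute the map and reduce phases across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer):
    """Run map, group-by-key (shuffle) and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: this is the "group by" step.
            groups[key].append(value)
    # Reduce phase: fold the values of each group into one aggregate.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def add(a, b):
    return a + b


counts = map_reduce(["NoSQL in the cloud", "the cloud scales"], word_mapper, add)
print(counts)
```

Because every group is reduced independently, the reduce calls for different keys can run in parallel, which is what makes the pattern attractive for large clusters.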
48
"A giant step back! Incompatible, missing features, not new, …" (Stonebraker)
Programming is great! Programming is annoying! Wonderfully parallelizable. Only `large data indexing`. Strong competition: Stratosphere (TUB), ePic, SwissBox, etc.
49
Parallelization Contracts: Cross, Map, Match, CoGroup, Reduce, graph ops, and many more…
=> compile, analyze, optimize on a breathing (elastic) cloud!
50
Eventually Consistent
ACID, WATER, BASE (basically available, soft state, eventually consistent); Amazon Dynamo; MySQL replication
51
Consistency Models © Wilfried Springer NoSQL Rollercoaster
52
CAP theorem: pick 2!
Availability: system is always 'on'. Partition tolerance: clients find replicas. Consistency: ACID / isolation, clients see equal data. Classic DBs vs. NoSQL.
53
"Don't throw C away so easy! It's complex."
What you really have is:
1. Application errors
2. Repeatable DBMS errors
3. Unrepeatable DBMS errors
4. Operating system errors
5. Hardware failure in cluster
6. Network partition in local cluster
7. A disaster
8. WAN failure
54
6 = network partition is rare
3, 4, 5, 6 mostly affect a single node; algorithms can help! "Give up P rather than sacrificing C. Use VoltDB or NimbusDB."
55
Consistent Hashing
KEY   NODE   REPLICAS
2     M      N, O
8     N      O, P
10    O      P, Q
17    P      Q, R
22    Q      R, M
26    R      M, N
Ranges: M:[0,5) N:[5,10) O:[10,15) P:[15,20) Q:[20,25) R:[25,30)
Well known since 1997 in web servers
Fail-safe, easily extensible, well distributed / vnodes
W = 2*W, R = 1*R
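The ring from this slide can be reproduced with a short lookup sketch. It assumes that each node owns the half-open range starting at its position, that the ring has size 30, and that the two replicas are simply the next two nodes clockwise, which matches the table. The names `RING`, `RING_SIZE` and `lookup` are illustrative. A real store would derive positions by hashing keys and node names, typically with several virtual nodes per server.

```python
from bisect import bisect_right

RING_SIZE = 30
# (start of range, node) pairs, sorted by start, as drawn on the slide.
RING = [(0, "M"), (5, "N"), (10, "O"), (15, "P"), (20, "Q"), (25, "R")]
STARTS = [start for start, _ in RING]
NODES = [node for _, node in RING]


def lookup(key_hash, replicas=2):
    """Return (primary node, list of replica nodes) for a hashed key."""
    position = key_hash % RING_SIZE
    # Find the last range start that is <= position.
    index = bisect_right(STARTS, position) - 1
    # Walk clockwise from the primary to collect the replica nodes.
    owners = [NODES[(index + i) % len(NODES)] for i in range(replicas + 1)]
    return owners[0], owners[1:]


for key in (2, 8, 10, 17, 22, 26):
    print(key, lookup(key))
```

Adding a node only splits one existing range, so only the keys in that range move. This is the "easily extensible" property named on the slide.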
56
MVCC: Multi Version Concurrency Control
Long since built into all relational DBs. Pessimistic locking?
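Why MVCC lets readers proceed without blocking can be shown with a toy in-memory store. This is a single-threaded sketch under assumptions, not how any particular database implements it. The class `MVCCStore` and its methods are hypothetical, every write commits immediately, and write-write conflict detection is left out.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers use snapshots."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit timestamp, value), ascending
        self._clock = 0      # logical commit counter

    def begin(self):
        """Start a read transaction and return its snapshot timestamp."""
        return self._clock

    def write(self, key, value):
        """Commit a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot):
        """Return the newest version that was committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot:
                return value
        return None


store = MVCCStore()
store.write("x", "old")
snapshot = store.begin()      # a reader starts here
store.write("x", "new")       # a later write does not disturb that reader
print(store.read("x", snapshot))
print(store.read("x", store.begin()))
```

The reader that started before the second write keeps seeing its own consistent version, so no lock is needed between readers and writers.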
58
Vector Clocks => Anna, Paul, Laura: "laufen" (running) A:1 L:1; "surfen" (surfing) P:1 L:1
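The two clocks on this slide can be compared with a small sketch. It reads the slide as an assumption: one value ("running") carries the clock A:1 L:1 and the other ("surfing") carries P:1 L:1. The helper names `increment`, `descends`, `compare` and `merge` are illustrative and not taken from the slides.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify clock a relative to b: equal, after, before or concurrent."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


def merge(a, b):
    """Combine two clocks by taking the maximum counter per node."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


running = increment(increment({}, "A"), "L")   # {"A": 1, "L": 1}
surfing = increment(increment({}, "P"), "L")   # {"P": 1, "L": 1}
print(compare(running, surfing))
print(merge(running, surfing))
```

Neither clock descends from the other, so the two values are concurrent and a client or the store has to reconcile them. The merged clock then dominates both.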