Prof. Dr. Stefan Edlich: NoSQL in the Cloud
nosqlberlin.de nosqlfrankfurt.de nosql powerdays
http://nosql-database.org
NoSQL is specialization! Big Data; massive write performance; fast KV access; write availability; flexible schema (migration) + flexible datatypes; easier maintainability, administration and operations; no single point of failure; programmer ease of use
Theory?! Map/Reduce: Map/Reduce successors! ACID / BASE & CAP: P practically never occurs! Consistent Hashing: the basis of scalable K/V stores. MVCC: non-blocking advantages. Vector Clocks: [122:1] [147:2|122:1] [97:3|147:2|122:1]
Interfaces: REST, language API, Thrift/Avro, Map/Reduce, Get/Put. Data model: Column Family, DocumentDB, Key/Value, Graph. Persistence: disk, full memory, hybrid, pluggable
Google Protocol Buffers =>
Apache Avro! JSON; binary data transfer; automatic RPC generation; no code generation; client + server exchange the schema when it changes. Definitely evaluate it!
Data Models
Column Family; DocumentDBs; Key/ValueDBs; GraphDBs; others. Key/ValueDBs: Voldemort, Chordless, Scalaris, Dynamo / Dynomite. Others: db4o, Versant, Objectivity, Gemstone, Progress, Mark Logic, EMC Momentum, Tamino, GigaSpaces, Hazelcast, Terracotta, …
HBase Cassandra SimpleDB
+ scaling = new node; + community; + API; - replication; - setup, optimization, maintenance; + configuration (r, w); - documentation; - queries; + stress-free SaaS solution; + transparent scaling; - UTF-8 string; - data resides at Amazon; +- no tuning / config
Document Databases
any JS client; no middleware! DB + web server; + evolving app
2nd round += $6.5 million http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
not normalized (duplicates, delete orphans, ...); (crash-prone for a configurable time window) (journaling); eventually consistent; real scaling only via sharding; - (not yet kill -9 safe)
EC2 node, 66 GB | EC2 node, 66 GB | 67 GB index + data. 11 hours + 1 day off. http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
+ not normalized; + schema agility; + excellent documentation; + speed (memory-mapped files); + installation + save = 28 sec! + arbitrary indexes; + MapReduce; + rich query language; + GridFS (instead of HDFS); + simple replication (master-slave / replica sets)
db.system.indexes.find();
db.friends.getIndexes();
db.friends.ensureIndex({friend: 1});
db.friends.ensureIndex({friend: 1, zip: 1}); // compound
db.friends.find({friend: "Mario", zip: "13755"}).explain();
Queries:
age: {$gt: 10}
food: {$all: ["pizza", "noodles"]}
$gt, $lt, $lte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, $or, $elem, $elemMatch, regexp, ...
NoSQL query lock-in?!
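To make the operator semantics above more concrete, here is a minimal pure-Python sketch of how a few of the listed operators ($gt, $lt, $ne, $in, $all) select documents. The `matches` helper and the second sample document are hypothetical illustrations, not MongoDB's implementation, and only a small subset of the query language is covered.

```python
# Hypothetical helper: evaluates a small subset of MongoDB-style query
# operators against plain Python dicts. Not MongoDB's implementation.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$all": lambda value, arg: isinstance(value, list)
    and all(item in value for item in arg),
}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 10}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality, e.g. {"friend": "Mario"}
            return False
    return True


# Sample data (invented for this sketch)
friends = [
    {"friend": "Mario", "zip": "13755", "age": 12, "food": ["pizza", "noodles"]},
    {"friend": "Luigi", "zip": "10115", "age": 9, "food": ["pizza"]},
]

older = [d["friend"] for d in friends if matches(d, {"age": {"$gt": 10}})]
both = [d["friend"] for d in friends
        if matches(d, {"food": {"$all": ["pizza", "noodles"]}})]
```

Because the query is just a nested data structure, it is easy to build programmatically, but it is also specific to one product, which is the lock-in question the slide raises.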
Evolving schema: migration architecture patterns: A) Blacklist rename: try { ... } catch (FirstException | SecondException ex) { // newName = BlackList.checkName(OldName) }
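Pattern A can be sketched as a lazy rename on read: when a field is missing under its current name, the reader consults a table of retired names instead of migrating all stored documents up front. The `RENAMED_FIELDS` table, the `read_field` helper, and the field names are hypothetical, chosen only for illustration; the slide itself only names `BlackList.checkName`.

```python
# Hypothetical table of retired field names -> current field names.
RENAMED_FIELDS = {"surname": "last_name"}


def read_field(document, name):
    """Read a field by its current name, falling back to retired names."""
    try:
        return document[name]
    except KeyError:
        # Look for an old name that was renamed to the requested one.
        for old_name, new_name in RENAMED_FIELDS.items():
            if new_name == name and old_name in document:
                return document[old_name]
        raise  # the field is really missing


old_doc = {"surname": "Example"}      # written before the rename
new_doc = {"last_name": "Example"}    # written after the rename
```

Both `read_field(old_doc, "last_name")` and `read_field(new_doc, "last_name")` return the stored value, so old and new documents can coexist while the schema evolves.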
B) "Rails" migration: new name, new name, new name, new name, old name (not if replicated too often)
Duplicates = space; data freshness; "pre-joined" data! "Pre-computed"; move growing data out, or pre-space it
Into the cloud…
Diagram: clients; mongos router; config servers; replica set; Shard A, Shard B, Shard C (RAM+, DISK+); possible arbiter (micro); instances: 64 bit [extra | double | quadruple] Large
Lessons learned… RAID configurations (00, 01, 10, 03, 05, …); journaling file systems (ext4, xfs, …); (security) ports, file descriptors, snapshots, …; www.mongodb.org/display/DOCS/Amazon+EC2; Postgres & Cassandra
K/V stores: + very fast, > 100,000 /sec; + configurable disk sync; + API for custom bindings; + simple replication; + hash, list, set, sorted set, messages; + installation: UNIX 38 sec, Windows 18 sec; - cloud cluster only from version 3.*; mapping data structures ->
Sorted Set
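As a sketch of what mapping data onto a sorted set can look like, the class below keeps unique members ordered by a numeric score, in the spirit of a leaderboard. It is a plain-Python illustration with hypothetical method names (`add`, `range_by_rank`, `rank`) and invented sample data, not the API of any particular K/V store.

```python
import bisect


class SortedSet:
    """Unique members, each with a score; ordering is by (score, member)."""

    def __init__(self):
        self._scores = {}   # member -> score
        self._order = []    # sorted list of (score, member)

    def add(self, member, score):
        # Re-adding a member updates its score and its position.
        if member in self._scores:
            self._order.remove((self._scores[member], member))
        self._scores[member] = score
        bisect.insort(self._order, (score, member))

    def range_by_rank(self, start, stop):
        """Members with rank start..stop (inclusive), lowest score first."""
        return [member for _, member in self._order[start:stop + 1]]

    def rank(self, member):
        """Zero-based position of a member in score order."""
        return self._order.index((self._scores[member], member))


board = SortedSet()
board.add("anna", 120)
board.add("paul", 95)
board.add("laura", 147)
board.add("paul", 130)   # update paul's score
```

The point of the structure is that ranking queries such as "top N" become a range read instead of a sort at query time.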
memcached API
simple dynamic scaling (up & down); scales linearly; proven bulletproof by Zynga.com; limited membase protocol; Membase TAP (protocol interception); Code-Node:
Membase in the cloud: ready-made RightScale & AMI templates; open various ports; use a DNS entry and no changing IPs; specify the master node, which sets the quota for its heirs; backups for EBS
GraphDBs Property Graph
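A property graph attaches key/value properties to both nodes and relationships. The sketch below models that with plain dictionaries and tuples; the class, its method names, and the sample data are hypothetical and do not reflect the API of any particular graph database.

```python
class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source id, relationship type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, rel_type, target, **properties):
        self.edges.append((source, rel_type, target, properties))

    def neighbors(self, node_id, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [t for s, r, t, _ in self.edges if s == node_id and r == rel_type]


g = PropertyGraph()
g.add_node("anna", age=31)
g.add_node("paul", age=28)
g.add_node("laura", age=35)
g.add_edge("anna", "KNOWS", "paul", since=2009)
g.add_edge("anna", "KNOWS", "laura", since=2010)
g.add_edge("paul", "KNOWS", "laura", since=2011)
```

A traversal such as `g.neighbors("anna", "KNOWS")` walks relationships directly rather than joining tables, which is the access pattern graph databases optimize for.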
Players: Neo4j, OrientDB, Sones, Infinite-Graph, MS-Trinity, HyperGraphDB, Inf-Grid, Dex, VertexDB, Filament, Allegro Graph, Bigdata, Open-Link-Virtuoso, FlockDB
Graph DBs in the cloud: > N billion nodes? Sharding! But usually no "predictable lookup" is possible, only with domain-specific knowledge; balanced DBs without sweet spots are hardly possible; access patterns + heuristics (insert sharding / runtime sharding) => partitioning algorithms; (HA) Neo4j cache sharding! Multi-master cluster for consistent routing
> 220 DBs: consulting can be quite frustrating…
Data; Transactions; Performance; Queries; Architecture; other non-functional requirements
Data- / Storage-Model: analyse your data: domain data, log data, event data, message data, critical data, business data, meta-data, temp data, session data, geo data, etc. Storage model: relational, column-oriented, doc-alike, graphs, objects, etc. What types / type system? Data navigation, data amount, data complexity (deep XML?)
ACID vs. BASE vs. mixture? CAP decisions
Performance: dimension analysis: latency, request behaviour, throughput; scale-up vs. scale-out
Query requirements: typical queries, tools, ad-hoc queries, SQL / LINQ needed, Map/Reduce? …
Distribution architecture: local, parallel, distributed / grid, service, cloud, mobile, p2p, …
Data access patterns: read / write distribution, random / sequential, access design patterns
Non-functional requirements: replication, refactoring frequency, DB support, qualification / simplicity, company restrictions, DB diversity (allowed?), security, safety / backup & restore, crash resistance, licence…
NoSQL: CONCLUSION
Definitely assume RAM & SSD! RethinkDB; Gustavo Alonso; lots of >1 PT RAM DBs in California! SAP strategy? Service, RAM, Cloud, Mobile
The DaaS era: Amazon: only $90 million instead of $225 million. For MongoDB alone, well over 100 "Database-as-a-Service" providers! Amazon: SimpleDB, Hadoop, etc.
Many clever hybrid solutions! CouchBase, Hadoop+MySQL
Availability Ad Hoc Query OLAP Database-aaS => best Mix!
Non-critical data (view, domain, master, meta, log, …): management by Couch, MongoDB, Redis, Membase, … Critical data (payment data, personal data, …): management by classic RDBMS, Vertica, VoltDB, Database.com, GenieDB, … Analytics: Hadoop*, BI, OLAP. Dwight Merriman (10gen)
Links: nosql-database.org, nosqltapes.com, mynosql.com
Thanks for listening! http://edlich.de Discussion!
Functional (graph) decomposition? Or… protective patent; Group By; ancient, Lisp; use case: aggregates; pi -> 10^15 -> 1000 cluster
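Assuming this slide refers to Map/Reduce from the theory overview, the Group By / aggregate use case can be sketched in three steps: map each record to (key, value) pairs, group the values by key, then reduce each group. The function names and sample records are hypothetical; a real framework distributes these steps across a cluster, while this sketch runs them in one process.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    # Map phase: every record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)   # "shuffle": group values by key
    # Reduce phase: fold each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    return sum(values)


lines = ["NoSQL in the cloud", "the cloud scales"]
counts = map_reduce(lines, word_mapper, sum_reducer)
```

The map and reduce functions share no state, which is why the two phases parallelize so well and why the pattern looks familiar from Lisp-style functional programming.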
"A giant step back! Incompatible, missing features, not new, …" (Stonebraker). Programming: great! Programming: annoying! Wonderfully parallelizable; only `large data indexing`. Strong competition: Stratosphere (TUB), ePic, SwissBox, etc.
=> Parallelization Contracts: Map, Cross, Match, CoGroup, Reduce, graph ops, and many more… => compile, analyze, optimize on a "breathing" (elastic) cloud!
Eventually consistent: ACID; WATER; BASE (basically available, soft state, eventually consistent); Amazon Dynamo; MySQL replication
Consistency Models © Wilfried Springer NoSQL Rollercoaster
CAP theorem: pick 2! Availability: system is always 'on'. Partition tolerance: clients find replicas. Consistency: clients see equal data (ACID / isolation). Classics; NoSQL
"Don't throw C away so easy! It's complex." What you really have is:
1. Application errors
2. Repeatable DBMS errors
3. Unrepeatable DBMS errors
4. Operating system errors
5. Hardware failure in cluster
6. Network partition in local cluster
7. A disaster
8. WAN failure
http://cacm.acm.org/blogs/blog-cacm/83396-errors-in-database-systems-eventual-consistency-and-the-cap-theorem/fulltext http://news.ycombinator.com/item?id=1817731
6 = network partition is rare; 3, 4, 5, 6 mostly affect a single node; algorithms can help! "Give up P rather than sacrificing C. Use VoltDB or NimbusDB"
Consistent Hashing
fail-safe; easily extensible; well distributed / vnodes; W = 2*W, R = 1*R
Well known since 1997 in web servers
Ranges: M:[0,5) N:[5,10) O:[10,15) P:[15,20) Q:[20,25) R:[25,30)
Hash  Node  Replicas
2     M     N, O
8     N     O, P
10    O     P, Q
17    P     Q, R
22    Q     R, M
26    R     M, N
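The ring on this slide can be sketched in a few lines: each node owns one half-open range of positions, and the two replicas live on the next two nodes clockwise. The ranges and node names are taken from the slide; the function names, the use of MD5, and the ring size of 30 as a modulus are assumptions made for this illustration.

```python
import bisect
import hashlib

RING_SIZE = 30
# Range starts from the slide: M:[0,5), N:[5,10), ..., R:[25,30)
STARTS = [0, 5, 10, 15, 20, 25]
NODES = ["M", "N", "O", "P", "Q", "R"]
REPLICA_COUNT = 2


def owner_and_replicas(position):
    """Return (primary node, replica nodes) for a position on the ring."""
    position %= RING_SIZE
    # Index of the last range start that is <= position.
    i = bisect.bisect_right(STARTS, position) - 1
    primary = NODES[i]
    # Replicas are the next nodes clockwise, wrapping around the ring.
    replicas = [NODES[(i + k) % len(NODES)] for k in range(1, REPLICA_COUNT + 1)]
    return primary, replicas


def position_of(key):
    """Hash an arbitrary string key onto the ring (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE
```

For example, `owner_and_replicas(22)` yields node Q with replicas R and M, as in the table. Adding a node only splits one existing range, so only the keys in that range move, which is what makes the scheme easy to extend.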
MVCC: Multi Version Concurrency Control. Long since built into all relational DBs. Pessimistic locking?
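The non-blocking idea behind MVCC can be sketched as follows: writers never overwrite a value but append a new version, and a reader only sees versions committed up to the moment its snapshot was taken. The class and method names are hypothetical, and the sketch leaves out write conflicts, transactions, and garbage collection of old versions.

```python
class MVCCStore:
    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """A snapshot is simply the current commit timestamp."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version that is visible in the given snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("x", "a")
reader_snapshot = store.snapshot()   # a reader starts here
store.write("x", "b")                # a later write does not disturb it
```

The reader holding `reader_snapshot` still reads "a" while new readers see "b", so neither side has to wait for a lock.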
Vector Clocks => Anna, Paul, Laura; running: A:1 L:1; surfing: P:1 L:1
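The conflict on this slide can be detected mechanically. The sketch below assumes one reading of the diagram: Laura writes first, then Anna and Paul each update Laura's version independently, producing the clocks A:1 L:1 and P:1 L:1. The function names are hypothetical; only the clock values come from the slide.

```python
def increment(clock, node):
    """Return a copy of the clock with the node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Return 'equal', 'before', 'after' or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


laura = increment({}, "L")        # {"L": 1}
running = increment(laura, "A")   # Anna's update: {"L": 1, "A": 1}
surfing = increment(laura, "P")   # Paul's update: {"L": 1, "P": 1}
```

Neither of the two updated clocks dominates the other, so `compare(running, surfing)` reports them as concurrent and the application has to resolve the conflict, after which `merge` gives the clock for the reconciled version.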