VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.

1 VL Netzwerke, WS 2007/08. Edda Klipp, Max Planck Institute for Molecular Genetics and Humboldt University Berlin, Theoretical Biophysics. Networks in Metabolism and Signaling, Lecture 2: Basic Principles of Graph Theory and Random Networks

2 Basic Principles of Graph Theory
Literature:
J. Sedláček (1968) Einführung in die Graphentheorie. Teubner Verlagsgesellschaft, Leipzig.
Albert & Barabási (2002) Statistical mechanics of complex networks. Rev Mod Phys, 74, 47-97.
Barabási & Oltvai (2004) Network biology: understanding the cell's functional organization. Nature Reviews Genetics, 5, 101-113.

3 Classical Examples
The problem of the ferryman, the goat, the wolf and the hay (Fährmann, Ziege, Wolf, Heu; abbreviated F, Z, W, H).
The vertices of the state graph, presumably the admissible sets remaining on the starting bank:
(F,Z,W,H) (W,H) (F,W,H) (W) (H) (F,Z,W) (F,Z,H) (F,Z) (Z) (0)
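The ten states above can be treated as the vertices of a graph and searched with breadth-first search. The following is a minimal sketch, not the lecture's own code. It assumes the usual rules of the puzzle, which the slide does not spell out: the boat carries the ferryman plus at most one item, and without the ferryman the wolf must not stay with the goat, nor the goat with the hay. The names `is_safe`, `neighbours` and `solve` are hypothetical.

```python
from collections import deque

ITEMS = frozenset("FZWH")  # F ferryman, Z goat, W wolf, H hay
FORBIDDEN = [frozenset("ZW"), frozenset("ZH")]  # assumed rules of the puzzle


def is_safe(left):
    """A state is the set on the starting bank; both banks must be safe."""
    for bank in (left, ITEMS - left):
        if "F" not in bank and any(pair <= bank for pair in FORBIDDEN):
            return False
    return True


def neighbours(left):
    """Yield every safe state reachable by one crossing."""
    ferryman_on_left = "F" in left
    here = left if ferryman_on_left else ITEMS - left
    for cargo in [None] + sorted(here - {"F"}):
        moved = {"F"} | ({cargo} if cargo else set())
        new_left = frozenset(left - moved if ferryman_on_left else left | moved)
        if is_safe(new_left):
            yield new_left


def solve():
    """Breadth-first search from (F,Z,W,H) to the empty starting bank."""
    start, goal = ITEMS, frozenset()
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbours(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    for step in solve():
        print("(" + ",".join(sorted(step)) + ")" if step else "(0)")
```

Under these assumptions the shortest solution visits 8 states, i.e. it needs 7 crossings.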

4 The Bridges of Königsberg
In the centre of the Prussian city of Königsberg (today Kaliningrad), the river Pregel forms an island where two of its arms meet. In the 18th century, 7 bridges connected the river banks with the island. The question is whether there is a round trip that crosses all 7 bridges exactly once and returns to the starting point.
History: The problem of the Königsberg bridges goes back to Leonhard Euler. In 1736 he proved that no such round trip can exist. He considered the general case with an arbitrary number of islands and bridges and showed that a round trip of this kind is possible exactly when no bank has an odd number of bridges. If exactly two banks have an odd number of bridges, there is a path that starts and ends at these two banks and crosses every bridge exactly once. If, as in Königsberg, more than two regions are reached by an odd number of bridges, no path can cross every bridge exactly once.
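Euler's criterion reduces to counting the land masses with an odd number of bridges. The sketch below encodes the seven bridges as a multigraph edge list. The labels `island`, `north`, `south` and `east` are hypothetical names for the four land masses, and the function assumes the graph is connected.

```python
from collections import Counter

# Hypothetical labels for the four land masses; parallel bridges are
# simply listed twice (multigraph edge list).
BRIDGES = [
    ("island", "north"), ("island", "north"),
    ("island", "south"), ("island", "south"),
    ("island", "east"),
    ("north", "east"),
    ("south", "east"),
]


def degrees(edges):
    """Number of bridge ends at each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def euler_classification(edges):
    """Classify a connected multigraph by its number of odd-degree vertices."""
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    if odd == 0:
        return "closed tour"   # round trip crossing every edge once
    if odd == 2:
        return "open path"     # starts and ends at the two odd vertices
    return "none"


if __name__ == "__main__":
    print(dict(degrees(BRIDGES)))
    print(euler_classification(BRIDGES))
```

All four land masses have odd degree (5, 3, 3, 3), so neither a round trip nor an open path exists.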

5 Graphs: Definitions
A graph is a tuple (V,E) with V a set of n vertices and E a set of m edges: G=(V,E).
Example: proteins are vertices, interactions are edges.
German terms: vertex = Knoten, edge = Kante, tuple = Tupel (geordnete Menge, i.e. ordered set), set = Menge
[Figure: graph with vertices A, B and C; labels "edge" and "vertex, node"]

6 Graphs: Completeness
Edge AB has the end vertices A and B. Vertex A is incident with edge AB.
Let E_0 be the set of all two-element subsets of V. A graph is complete if E = E_0.
If G_1=(V_1,E_1) and G_2=(V_2,E_2) satisfy $V_1 = V_2$, $E_1 \cap E_2 = \emptyset$ and $E_1 \cup E_2 = E_0$, then G_1 and G_2 are complementary.
[Figure: graph with vertices A, B and C, labels "edge" and "vertex"; example graphs a) to d)]

7 Graph Types
Undirected graphs: [figure: vertices A and B joined by an undirected edge]
Directed graphs (digraphs): directed edge (i,j) ∈ E with i denoting the head and j denoting the tail of the edge. [figure: A and B joined by a directed edge]
Extension: directed edge (i,j,s) ∈ E with s ∈ {+1,-1} to represent activating or inhibitory influences. [figure: A and B joined by a signed directed edge]

8 Graph Types: Bipartite Graphs
The set of graph vertices is decomposed into two disjoint sets such that no two vertices within the same set are adjacent.
Bipartite graphs are used when a graph must represent two distinct classes of nodes, such as metabolites (blue circles) and reactions (yellow boxes).
[Figure: generic example with metabolites A, B, C, D and reaction R1; biochemical example with ATP, Fruc-6-P, Fruc-1,6-P2, ADP and reaction R1]

9 Graph Representation: Adjacency Matrix
   A B C D E F G
A  0 1 0 0 0 0 0
B  0 0 1 1 0 0 0
C  0 0 0 0 0 0 0
D  0 0 1 0 0 1 0
E  0 0 0 1 0 0 0
F  0 0 0 0 1 0 1
G  0 0 0 0 0 0 0
Adjacency matrix A: non-zero entries represent edges
- square
- each graph has a unique adjacency matrix (for a fixed vertex order)
- each adjacency matrix defines a unique graph
Bipartite graphs: sub-matrices for the two classes of nodes
Alternative formats: edge lists, vertex lists
German term: adjacency matrix = Adjazenzmatrix
[Figure: example graph with vertices A to G]
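The correspondence between an edge list and the adjacency matrix can be sketched in a few lines. The edge list below is read off the matrix on this slide under the assumption that rows are source vertices and columns are target vertices. The function names are hypothetical.

```python
NODES = ["A", "B", "C", "D", "E", "F", "G"]

# Directed edges read off the slide's matrix (row = source, column = target;
# this orientation is an assumption).
EDGES = [
    ("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
    ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G"),
]


def adjacency_matrix(nodes, edges):
    """Build a 0/1 matrix with entry [i][j] = 1 for an edge i -> j."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def edge_list(nodes, matrix):
    """Recover the edge list from the non-zero entries of the matrix."""
    return [
        (nodes[i], nodes[j])
        for i, row in enumerate(matrix)
        for j, entry in enumerate(row)
        if entry
    ]


if __name__ == "__main__":
    for name, row in zip(NODES, adjacency_matrix(NODES, EDGES)):
        print(name, "".join(str(entry) for entry in row))
```

Converting to the matrix and back returns the same set of edges, which is the one-to-one assignment the slide refers to.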

10 Graph Theoretical Measures: Degree
[Adjacency matrix and figure of the example graph with vertices A to G, as on slide 9]
Degree k: number of edges to which a vertex is connected.
For directed graphs: in-degree = edges ending at a vertex; out-degree = edges starting at a vertex.
Vertices with degree 0: isolated.
German term: degree = Knotengrad
Let G be a finite graph, v the number of nodes, k the number of edges and s_1, s_2, …, s_v the degrees of the individual nodes. Then
$\sum_{i=1}^{v} s_i = 2k$
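In-degrees, out-degrees and the degree-sum relation can be checked directly on the example matrix. This is a small sketch; it assumes that rows are sources and columns are targets, and the function names are hypothetical.

```python
# Adjacency matrix of the example graph (vertex order A..G); rows are taken
# as sources and columns as targets, which is an assumption.
MATRIX = [
    [0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
]


def out_degrees(matrix):
    """Row sums: edges starting at each vertex."""
    return [sum(row) for row in matrix]


def in_degrees(matrix):
    """Column sums: edges ending at each vertex."""
    return [sum(column) for column in zip(*matrix)]


def total_degrees(matrix):
    """Degree of each vertex when edge directions are ignored."""
    return [i + o for i, o in zip(in_degrees(matrix), out_degrees(matrix))]


if __name__ == "__main__":
    n_edges = sum(map(sum, MATRIX))
    print("out:", out_degrees(MATRIX))
    print("in: ", in_degrees(MATRIX))
    print("sum of degrees:", sum(total_degrees(MATRIX)), "= 2 *", n_edges)
```

Every edge contributes one end to each of two vertices, so the degrees sum to twice the number of edges (here 16 = 2 · 8).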

11 Graph Theoretical Measures: Degree
Global connectivity properties of a graph: average degree ⟨k⟩ and degree distribution P(k).
k_in       0    1    2
P(k_in)    1/7  4/7  2/7
k_out      0    1    2
P(k_out)   2/7  2/7  3/7
⟨k_in⟩ = (4·1 + 2·2)/7 = 8/7 ≈ 1.14, and likewise ⟨k_out⟩ = (2·1 + 3·2)/7 = 8/7
Degree distributions make it possible to distinguish between different types of networks.
[Figure: example graph with vertices A to G]
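A degree distribution is simply a normalised histogram of the degree sequence. The sketch below uses the in- and out-degree sequences read off the example adjacency matrix (rows taken as sources, which is an assumption) and exact fractions so the results can be compared with the table. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Degree sequences for vertices A..G, read off the example adjacency matrix
# with rows as sources and columns as targets (an assumption).
IN_DEGREES = [0, 1, 2, 2, 1, 1, 1]
OUT_DEGREES = [1, 2, 0, 2, 1, 2, 0]


def degree_distribution(degrees):
    """P(k): fraction of vertices that have degree k."""
    counts = Counter(degrees)
    return {k: Fraction(c, len(degrees)) for k, c in sorted(counts.items())}


def average_degree(degrees):
    """Mean of the degree sequence."""
    return Fraction(sum(degrees), len(degrees))


if __name__ == "__main__":
    print("P(k_in): ", degree_distribution(IN_DEGREES))
    print("P(k_out):", degree_distribution(OUT_DEGREES))
    print("<k_in> =", average_degree(IN_DEGREES))
```

Both averages equal 8/7, because every directed edge adds one to exactly one in-degree and one out-degree.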

12 Aside: Discrete Probability Distributions
Binomial distribution: $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$
Sum of all probabilities: $\sum_{k=0}^{n} P(k) = 1$
E(X) = np, expected value (cf. the mean over very many repetitions)
Var(X) = np(1-p), variance
Properties of a sample: if the desired outcome of a trial has probability p and the number of trials is n, the binomial distribution gives the probability of obtaining k successes in total. P(k) is, for example, the probability of drawing k black balls in n draws from an urn of balls.
[Figure: P(k) plotted against k for p = 1/2]
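The stated properties of the binomial distribution can be verified numerically. This is a minimal sketch; the choice n = 20 and the helper name `binomial_pmf` are illustrative assumptions, with p = 1/2 as in the figure.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.5  # illustrative values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                     # should equal 1
mean = sum(k * q for k, q in enumerate(pmf))         # should equal n*p
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))  # n*p*(1-p)

if __name__ == "__main__":
    print(f"sum = {total:.6f}, E(X) = {mean:.6f}, Var(X) = {variance:.6f}")
```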

13 Aside: Discrete Probability Distributions
Poisson distribution: $P(k) = \frac{\lambda^k}{k!} e^{-\lambda}$
Properties of a sample: as before, but for a very small probability of the individual events, e.g. because n is very large.
λ: event rate (e.g. the error rate in DNA replication)
E(X) = λ, expected value (cf. the mean over very many repetitions)
Var(X) = λ, variance
Exponential distribution: $f(x) = \lambda e^{-\lambda x}$ for x ≥ 0
E(X) = 1/λ
Var(X) = 1/λ²
[Figure: plot with horizontal axis from 0 to 100 and vertical axis from 0.0 to 0.8]
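The Poisson distribution arises from the binomial distribution when n grows while the product np = λ stays fixed. The sketch below measures the largest pointwise gap between the two for increasing n. The value λ = 3 and the function names are illustrative assumptions.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events at rate lam."""
    return lam**k * exp(-lam) / factorial(k)


def max_gap(n, lam, k_max=20):
    """Largest |binomial - Poisson| over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for n in (50, 500, 5000):
        print(n, max_gap(n, 3.0))
```

The gap shrinks as n increases, which is the sense in which the Poisson distribution describes many trials with a very small individual probability.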

14 Degree Distributions
Degree distribution of the World Wide Web from two different measurements (one plot symbol each): the 325 729-node sample of Albert et al. (1999) and the measurements of over 200 million pages by Broder et al. (2000); (a) degree distribution of the outgoing edges; (b) degree distribution of the incoming edges. The data have been binned logarithmically to reduce noise.
Albert & Barabási, 2002, Rev Mod Phys

15 Degree Distributions
The degree distribution of several real networks: (a) Internet at the router level. Data courtesy of Ramesh Govindan; (b) movie actor collaboration network. After Barabási and Albert 1999. Note that if TV series are included as well, which aggregate a large number of actors, an exponential cutoff emerges for large k (Amaral et al., 2000); (c) co-authorship network of high-energy physicists. After Newman (2001a, 2001b); (d) co-authorship network of neuroscientists. After Barabási et al. (2001).
Albert & Barabási, 2002, Rev Mod Phys

16 Degree Distributions
Connectivity distributions P(k) for substrates. a, Archaeoglobus fulgidus (archaeon); b, E. coli (bacterium); c, Caenorhabditis elegans (eukaryote), shown on a log-log plot, counting separately the incoming (In) and outgoing (Out) links for each substrate. k_in (k_out) corresponds to the number of reactions in which a substrate participates as a product (educt). d, The connectivity distribution averaged over all 43 organisms.
Jeong H et al., 2000, Nature

17 Random Graphs
First well-known example: the model of Paul Erdős and Alfréd Rényi.
History: The Erdős number describes the distance in the co-authorship graph relative to the mathematician Paul Erdős. In this graph, authors related by publication are represented as vertices, and an edge exists between two of them if they have written a publication together. Paul Erdős himself has Erdős number 0; all co-authors with whom he published have Erdős number 1. Authors who have published with co-authors of Paul Erdős have Erdős number 2, and so on. If no connection of this kind to a person can be established, that person's Erdős number is infinite (∞). It turns out that the Erdős number of most people is either infinite or surprisingly small. The latter is mainly because Erdős published jointly with more than 500 different scientists and was well versed in many areas of mathematics.

18 Random Graphs
A well-known example: the model of Paul Erdős and Alfréd Rényi.
Start with N nodes.
Connect every pair of nodes with probability p: draw a random number z ∈ [0,1]; if z < p, connect the pair.
This yields a graph with approximately ½ pN(N-1) edges.
Degree distribution: Poisson distribution (approximately, for large N)
Average degree: ⟨k⟩ = ½ pN(N-1) · 2/N = p(N-1) ≈ pN
[Figure: upper-triangular table of the node pairs n1 to n5, with "x" marking connected pairs and "-" unconnected pairs]
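The construction described on this slide translates almost line by line into code. The sketch below is one possible reading of it. The function name `erdos_renyi`, the seed and the values N = 200, p = 0.05 are illustrative assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n_nodes, p, seed=None):
    """Connect each pair of nodes independently with probability p."""
    rng = random.Random(seed)
    edges = []
    for u, v in combinations(range(n_nodes), 2):
        z = rng.random()  # uniform random number in [0, 1)
        if z < p:
            edges.append((u, v))
    return edges


N, P = 200, 0.05  # illustrative values
edges = erdos_renyi(N, P, seed=1)
expected_edges = 0.5 * P * N * (N - 1)
mean_degree = 2 * len(edges) / N  # each edge contributes to two degrees

if __name__ == "__main__":
    print("edges:", len(edges), "expected about", expected_edges)
    print("mean degree:", mean_degree, "expected about", P * (N - 1))
```

The number of edges fluctuates from run to run around ½ pN(N-1), so the comparison is only approximate for any single graph.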

19 Random Graphs: Evolution
The construction of random graphs is called evolution. Starting with a set of isolated vertices, the graph develops by the successive addition of random edges. The graphs obtained at different stages of this process correspond to larger and larger connection probabilities p, eventually yielding a fully connected graph with the maximum number of edges n = N(N-1)/2 for p → 1.

20 Random Networks
Questions:
Are real networks really random?
Do real networks display organizing principles?
Is a typical graph connected? (depending on p)
Does it contain a triangle of connected nodes?
Does its diameter depend on its size?

21 Random Networks: Subgraphs
A graph G_1 consisting of a set V_1 of nodes and a set E_1 of edges is a subgraph of a graph G=(V,E) if all nodes in V_1 are also nodes of V and all edges in E_1 are also edges of E.
A cycle of order k is a closed loop of k edges such that every two consecutive edges, and only those, have a common node. Average degree of a cycle: ⟨k⟩ = 2. Examples: triangle (order 3), rectangle (order 4).
The opposite of cycles are trees, which cannot form closed loops. More precisely, a graph is a tree of order k if it has k nodes and k-1 edges, and none of its subgraphs is a cycle. Average degree of a tree of order k: ⟨k⟩ = 2 - 2/k (→ 2 for large trees).

22 Random Networks: Subgraphs
The threshold probabilities at which different subgraphs appear in a random graph. For p N^{3/2} → 0 the graph consists of isolated nodes and edges. For p ~ N^{-3/2} trees of order 3 appear, while for p ~ N^{-4/3} trees of order 4 appear. At p ~ N^{-1} trees of all orders are present, and at the same time cycles of all orders appear. The probability p ~ N^{-2/3} marks the appearance of complete subgraphs of order 4, and p ~ N^{-1/2} corresponds to complete subgraphs of order 5. As the exponent z in p ~ N^z approaches 0, the graph contains complete subgraphs of increasing order.
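The thresholds listed above all fit one pattern: a subgraph with k nodes and l edges appears around p ~ N^{-k/l}. The slide only lists specific cases, so this general rule is stated here as an assumption taken from the random-graph literature. The sketch below evaluates the exponent for trees, cycles and complete subgraphs; the function names are hypothetical.

```python
from fractions import Fraction


def threshold_exponent(n_nodes, n_edges):
    """Exponent z in p ~ N**z for a subgraph with the given nodes and edges.

    Assumes the rule z = -k/l (k nodes, l edges); the slide itself only
    gives individual values.
    """
    return -Fraction(n_nodes, n_edges)


def tree(k):
    """A tree of order k has k - 1 edges."""
    return threshold_exponent(k, k - 1)


def cycle(k):
    """A cycle of order k has k edges."""
    return threshold_exponent(k, k)


def complete(k):
    """A complete subgraph of order k has k(k-1)/2 edges."""
    return threshold_exponent(k, k * (k - 1) // 2)


if __name__ == "__main__":
    print("trees:   ", [str(tree(k)) for k in (3, 4, 5)])
    print("cycles:  ", [str(cycle(k)) for k in (3, 4, 5)])
    print("complete:", [str(complete(k)) for k in (4, 5, 6)])
```

Under this rule the tree exponents -k/(k-1) approach -1 from below as k grows, cycles of every order share the exponent -1, and the complete-subgraph exponents -2/(k-1) approach 0.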

23 Degree Distribution
The degree distribution that results from the numerical simulation of a random graph. We generated a single random graph with N = 10 000 nodes and connection probability p = 0.0015, and calculated the number of nodes with degree k, X_k. The plot compares X_k/N with the expectation value of the Poisson distribution (13), E(X_k)/N = P(k_i = k), and we can see that the deviation is small.
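The comparison in this figure can be repeated on a smaller scale. The sketch below simulates a random graph and compares X_k/N with the Poisson probabilities. To keep the pure-Python loop fast it uses N = 1000 and p = 0.005 rather than the slide's values; these numbers, the seed and the function names are assumptions for illustration.

```python
import random
from collections import Counter
from itertools import combinations
from math import exp, factorial


def simulate_degrees(n_nodes, p, seed=None):
    """Degree of every node in one random graph with connection probability p."""
    rng = random.Random(seed)
    degree = [0] * n_nodes
    for u, v in combinations(range(n_nodes), 2):
        if rng.random() < p:
            degree[u] += 1
            degree[v] += 1
    return degree


def poisson_pmf(k, lam):
    """Poisson probability of degree k at mean degree lam."""
    return lam**k * exp(-lam) / factorial(k)


N, P = 1000, 0.005  # smaller than on the slide, chosen for speed
degree = simulate_degrees(N, P, seed=42)
counts = Counter(degree)  # counts[k] is X_k
lam = P * (N - 1)

# One row per degree: (k, simulated X_k / N, Poisson P(k)).
rows = [(k, counts[k] / N, poisson_pmf(k, lam)) for k in range(max(degree) + 1)]
l1_gap = sum(abs(simulated - theory) for _, simulated, theory in rows)

if __name__ == "__main__":
    for k, simulated, theory in rows:
        print(f"{k:2d}  {simulated:.4f}  {theory:.4f}")
```

With only 1000 nodes the sampling noise is larger than in the figure, but the simulated frequencies should still follow the Poisson curve closely.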

24 Graph-theoretical Measure: Distance
Path: connection between two vertices u and v without repetition of nodes (i.e. no backtracking, no loops).
Shortest path length l(u,v): local measure for two nodes.
Average shortest path length ⟨l⟩: global network property indicating navigability.
[Figure: example graph with vertices A to G]

25 Graph-theoretical Measure: Distance
Breadth-first search: exploration of all nodes in a graph, starting from those adjacent to the current node.
Dijkstra's algorithm: constructs the shortest-path tree from a source to every other vertex (for N vertices: O(N²)).
[Figure: example graph with vertices A to G]
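For unweighted graphs, breadth-first search already yields shortest path lengths, because nodes are reached in order of increasing distance. The sketch below applies it to the example graph with vertices A to G. Treating the edges as undirected is an assumption, as are the function names.

```python
from collections import deque

# Edges of the example graph, here treated as undirected (an assumption).
EDGES = [
    ("A", "B"), ("B", "C"), ("B", "D"), ("D", "C"),
    ("D", "F"), ("E", "D"), ("F", "E"), ("F", "G"),
]


def undirected_adjacency(edges):
    """Map every node to the set of its neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_shortest_path(adj):
    """Mean of l(u,v) over all ordered pairs of distinct, connected nodes."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


ADJ = undirected_adjacency(EDGES)

if __name__ == "__main__":
    print(bfs_distances(ADJ, "A"))
    print(average_shortest_path(ADJ))
```

For this graph the distances sum to 40 over the 21 unordered pairs, giving an average shortest path length of 40/21 ≈ 1.90.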

26 Graph-theoretical Measure: Diameter
The diameter of a graph is the maximal distance between any pair of its nodes.
Strictly speaking, the diameter of a disconnected graph (i.e., one made up of several isolated clusters) is infinite, but it can be defined as the maximum diameter of its clusters.
Random graphs tend to have small diameters, provided p is not too small.
If ⟨k⟩ = pN < 1, a typical graph is composed of isolated trees and its diameter equals the diameter of a tree.
If ⟨k⟩ > 1, a giant cluster appears. The diameter of the graph equals the diameter of the giant cluster if ⟨k⟩ > 3.5, and is proportional to ln(N)/ln(⟨k⟩).
If ⟨k⟩ > ln(N), almost every graph is totally connected. The diameters of the graphs having the same N and ⟨k⟩ are concentrated on a few values around ln(N)/ln(⟨k⟩).
[Figure: example graph with vertices A to G]
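Both conventions for the diameter mentioned above can be computed from breadth-first distances. The sketch below uses the example graph with vertices A to G, treated as undirected (an assumption). The function names are hypothetical.

```python
from collections import deque
from math import inf

# Example graph with vertices A..G as an undirected neighbour map
# (ignoring edge directions is an assumption).
ADJ = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E", "G"},
    "G": {"F"},
}


def bfs_distances(adj, source):
    """Shortest path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Maximal distance between any two nodes; infinite if disconnected."""
    best = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < len(adj):  # some node was not reached
            return inf
        best = max(best, max(dist.values()))
    return best


def max_cluster_diameter(adj):
    """Alternative convention: largest diameter among the connected clusters."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


if __name__ == "__main__":
    print(diameter(ADJ), max_cluster_diameter(ADJ))
```

In the example the longest shortest path runs from A to G and has length 4.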

27 Graph-theoretical Measures: Clustering
Clustering coefficient C(v) for node v: ratio between the number of edges linking nodes adjacent to v and the total number of possible edges among them (at most k_v(k_v-1)/2 for k_v neighbors).
Example, node D: adjacent nodes B, C, E, F; number of links: 2; possible number of links: 6; C(D) = 1/3.
Idea behind it: in many networks, if node A is connected to B, and B is connected to C, then it is highly probable that A also has a direct link to C.
[Figure: example graph with vertices A to G]
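The definition of C(v) can be sketched as a short function on a neighbour map. The map below encodes the example graph with edge directions ignored, as the worked example for node D does. The function name `clustering` and the convention C(v) = 0 for nodes with fewer than two neighbours are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Undirected neighbour map of the example graph with vertices A..G.
ADJ = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E", "G"},
    "G": {"F"},
}


def clustering(adj, v):
    """Links among the neighbours of v divided by k_v (k_v - 1) / 2."""
    neighbours = sorted(adj[v])
    k = len(neighbours)
    if k < 2:
        return Fraction(0)  # convention assumed for nodes with < 2 neighbours
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return Fraction(2 * links, k * (k - 1))


if __name__ == "__main__":
    print("C(D) =", clustering(ADJ, "D"))
```

For node D the neighbours B, C, E, F share the two links B-C and E-F out of six possible ones, which gives 1/3.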

28 Graph-theoretical Measures: Clustering
Average clustering coefficient ⟨C⟩: tendency of the network to form clusters or groups.
Average clustering coefficient for all nodes with k links, C(k): diversity of cohesiveness of local neighborhoods.
C(A) = 0, C(B) = 1/3, C(C) = 1, C(D) = 1/3, C(E) = 1, C(F) = 1/3, C(G) = 0
⟨C⟩ = 3/7
[Figure: example graph with vertices A to G]
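The average clustering coefficient and the degree-resolved C(k) follow from the per-node values. The sketch below recomputes them for the example graph with exact fractions. The edges are treated as undirected, nodes with fewer than two neighbours get C = 0 by assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Undirected neighbour map of the example graph with vertices A..G.
ADJ = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E", "G"},
    "G": {"F"},
}


def clustering(adj, v):
    """Local clustering coefficient of node v (0 for fewer than 2 neighbours)."""
    neighbours = sorted(adj[v])
    k = len(neighbours)
    if k < 2:
        return Fraction(0)
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return Fraction(2 * links, k * (k - 1))


def average_clustering(adj):
    """Mean of C(v) over all nodes."""
    return sum(clustering(adj, v) for v in adj) / len(adj)


def clustering_by_degree(adj):
    """C(k): mean of C(v) over all nodes that have exactly k links."""
    groups = defaultdict(list)
    for v in adj:
        groups[len(adj[v])].append(clustering(adj, v))
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}


if __name__ == "__main__":
    print("<C> =", average_clustering(ADJ))
    print("C(k) =", clustering_by_degree(ADJ))
```

The per-node values sum to 3, so the average over the seven nodes is 3/7.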

29 Graph-theoretical Measures: Clustering
Complex networks exhibit a large degree of clustering. If we consider a node in a random graph and its nearest neighbors, the probability that two of these neighbors are connected is equal to the probability that two randomly selected nodes are connected. Hence the clustering coefficient of a random graph is C_rand = p ≈ ⟨k⟩/N.
[Figure: example graph with vertices A to G]

30 Graph-theoretical Measures: Clustering
Clustering coefficients as predicted for random networks, compared with clustering coefficients for real networks (WWW, movie actors, co-authorship, E. coli substrate graph, E. coli reaction graph, food webs, word co-occurrence, power grids, …)

