Robust Expert Ranking in Online Communities - Fighting Sybil Attacks

Presentation transcript:

Robust Expert Ranking in Online Communities - Fighting Sybil Attacks
8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing, October 14–17, 2012, Pittsburgh, Pennsylvania, United States
Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
RWTH Aachen University, Advanced Community Information Systems (ACIS)
{rashed|balsoiu|klamma}@dbis.rwth-aachen.de

Advanced Community Information Systems (ACIS)
Responsive Open Community Information Systems
Community Visualization and Simulation, Community Analytics, Community Support, Web Engineering, Web Analytics, Requirements Engineering
Fake multimedia and misbehaviour

Agenda
Introduction and motivation
Related work
Our approach
Expert ranking algorithm
Robustness of the expert ranking algorithm
Evaluation
Conclusions and outlook
First, I will introduce the research background and the problem of fake multimedia.

Introduction
Expert search and ranking refers to finding a group of authoritative users with special skills and knowledge for a specific category.
The task is very important in online collaborative systems.
Problems: openness and misbehaviour; no attention has been paid to the trust and reputation of experts.
Solution: leveraging trust.

Motivation: Examples
1. Manipulating the truth for war propaganda. Published as: British soldiers abusing prisoners in Iraq. Appeared in London's Daily Mirror. Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq. Expert knowledge was used to figure out the fake.
2. Genuine photos with fake metadata. Tidal bores presented as the Indian Ocean tsunami. Published as: 2004 Indian Ocean Tsunami. Proved to be tidal bores at a four-day-long government-sponsored tourist festival in China.
Expert knowledge, analysis and witnesses are needed to identify the fake!

A Case Study: Collaborative Fake Multimedia Detection System
Collaborative activities (rating, tagging and commenting) provide new means of search, retrieval and media authenticity evaluation.
Explicit ratings and tags are used for evaluating the authenticity of multimedia items.
Reliability: not all of the submitted ratings are reliable.
No centralized control mechanism.
Vulnerability to attacks.
Three types of users: honest users, experts, malicious users. E.g. press agencies.

Research Questions and Goals
How can users' expertise be measured in collaborative media sharing and evaluation systems, and how can users be ranked?
What is the implication of trust?
Robustness: how can the robustness of the ranking algorithm be ensured?
Goals: improve multimedia evaluation; reduce the impact of malicious users.

Related Work
Probabilistic models, e.g. [Tu et al. 2010]
Voting models [Macdonald and Ounis 2006], [Macdonald et al. 2008]
Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations; SPEAR algorithm [Noll et al. 2009]; ExpertRank [Jiao et al. 2009]
TREC enterprise track: find the associations between candidates and documents, e.g. [Balog 2006, Balog 2007]
Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]

Our Approach
We discuss the notions of experts and expertise in the context of collaborative fake multimedia detection systems.
Assumptions:
Expert users tend to have many authenticity ratings.
Correctly evaluated media are rated by users of high expertise.
Following expert users provides more benefits.
Expert definition: an expert rates a large number of media files authentically with respect to a topic, is highly trusted by his directly connected users, and should be trustworthy in evaluating multimedia.
Goal: improve media evaluation by increasing the impact of experts.

Expert Ranking Methods
Domain knowledge driven method:
Considers the tags that users assign to media files.
User profile: built by merging the tags the user submitted to the media files in the system.
A similarity coefficient is computed between the candidate profile and the tags assigned to a specific resource.
Used to reorder the users who voted on a media file according to their tag profile.
Domain knowledge independent method:
Uses the connections between users and resources to decide on the expertise of the users.
A modified version of the HITS algorithm.
Mutual reinforcement of users' expertise and media.
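The domain knowledge driven method can be sketched as follows. The slide does not name the similarity coefficient, so cosine similarity over tag counts is an assumption here, and the function names (tag_profile, cosine_similarity, reorder_voters) are hypothetical.

```python
from collections import Counter
import math


def tag_profile(tagged_items):
    """Merge all tags a user submitted into one bag-of-tags profile."""
    profile = Counter()
    for tags in tagged_items:
        profile.update(tags)
    return profile


def cosine_similarity(profile, resource_tags):
    """Cosine similarity between a user profile and a resource's tags.

    Assumption: the slide only says "similarity coefficient"; cosine is
    one common choice, not necessarily the one the authors used.
    """
    resource = Counter(resource_tags)
    dot = sum(profile[t] * resource[t] for t in resource)
    norm = (math.sqrt(sum(v * v for v in profile.values()))
            * math.sqrt(sum(v * v for v in resource.values())))
    return dot / norm if norm else 0.0


def reorder_voters(voter_tags, resource_tags):
    """Sort the users who voted on a resource, most similar profile first."""
    scores = {user: cosine_similarity(tag_profile(items), resource_tags)
              for user, items in voter_tags.items()}
    return sorted(scores, key=lambda user: (-scores[user], user))
```

For example, a user who tagged several files with "tsunami" would be ranked ahead of a user who only tagged sports photos when the voters of a tsunami photo are reordered.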

MHITS: Expert Ranking Algorithm
MHITS is an expert ranking algorithm for online collaborative systems. It is a link-based approach, based on the HITS algorithm.
HITS:
Authorities: pages that are pointed to by good pages.
Hubs: pages that point to good pages.
Reinforcement relation between hubs and authorities: a page has high authority if many pages pointing to it have high hubness, and a page has high hubness if it points to many pages with high authority.
MHITS:
Users act as hubs (correctly evaluated media are rated by them).
Media files act as authorities.
Mutual reinforcement between users and media files.
Local trust values between users are assigned.
Considers the ratings of the users.
The mutual reinforcement relation refers to the fact that the expertise of a user depends on the way she rates, and the authority of a rated resource comes from the way it is rated by users. This means that the authority of a media file is influenced by the ratings users assign to it and by the trust the users receive from their neighbors. At the same time, the expertise of users comes from the authorities that they rate.

MHITS: Expert Ranking Algorithm
The expert ranking network has two types of nodes and two types of edges (rating edges and trust edges): one network for users and ratings (a bipartite graph between the set of users and the set of media files) and one for users only (the trust network). We exploit links between users and media files and also links between users.
Symbol: Description
a(m): authority score
h(u): hubness score
U(m): set of users pointing to media file m
M(u): set of media files to which user u points
r(u): rating of user u for media file m
t(u): average trust of the users directly connected to user u
Coefficient that weights the influence of the two terms, in range [0, 1]
Trust values are in range [0, 1]. Ratings are 0.5 for a fake vote and 1 for an authentic vote.
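The update equations of MHITS are not preserved in this transcript, so the following is only a sketch of one plausible form that is consistent with the symbol table. It assumes that the authority of a media file sums, over its raters, a blend of rating-weighted hubness and average trust (weighted by a coefficient called alpha here), and that the hubness of a user sums the rating-weighted authorities of the files she rated. The names mhits, expert_ranking and alpha are hypothetical.

```python
import math


def _normalize(scores):
    """Scale scores to unit Euclidean length, as in standard HITS."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else scores


def mhits(ratings, trust, alpha=0.5, iterations=50):
    """Sketch of a trust-aware HITS variant (assumed update rules).

    ratings: {user: {media: r}} with r = 0.5 (fake vote) or 1 (authentic vote)
    trust:   {user: t(u)}, average trust from direct neighbours, in [0, 1]
    alpha:   weighting coefficient in [0, 1] between the two terms
    """
    media = sorted({m for rated in ratings.values() for m in rated})
    hub = {u: 1.0 for u in ratings}
    auth = {m: 1.0 for m in media}
    for _ in range(iterations):
        # Authority update (assumption): blend of rating-weighted hubness
        # and the rater's average trust.
        new_auth = {m: 0.0 for m in media}
        for u, rated in ratings.items():
            for m, r in rated.items():
                new_auth[m] += alpha * hub[u] * r + (1 - alpha) * trust.get(u, 0.0)
        auth = _normalize(new_auth)
        # Hub update (assumption): rating-weighted authority of rated media.
        hub = _normalize({u: sum(auth[m] * r for m, r in rated.items())
                          for u, rated in ratings.items()})
    return hub, auth


def expert_ranking(hub):
    """Users ordered by decreasing hubness (expertise)."""
    return sorted(hub, key=lambda u: (-hub[u], u))
```

With alpha = 1 the trust term vanishes and the sketch reduces to a rating-weighted HITS on the bipartite user and media graph.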

Robustness of the MHITS Algorithm
Compromising techniques: Sybil attack [Douc02], reputation theft, whitewashing attack, etc. They compromise the input and the output of the algorithm.
Sybil attack: a fundamental problem in online collaborative systems. A malicious user creates many fake accounts (Sybils) which all reference the user to boost his reputation (the attacker's goal is to be higher up in the rankings).
Countermeasures against the Sybil attack:
SybilGuard [YKGF06]: decentralized protocol; accepts O(√n log n) Sybils per attack edge.
SybilLimit [YGKX08]: decentralized protocol; accepts O(log n) Sybils per attack edge.
SumUp [TMLS09]: centralized protocol; accepts at most 1 + log n Sybil votes per attack edge.
SybilGuard is based on the "social network" among user identities, where an edge between two identities indicates a human-established trust relationship. Malicious users can create many identities but few trust relationships. Thus, there is a disproportionately small "cut" in the graph between the Sybil nodes and the honest nodes. SybilGuard exploits this property to bound the number of identities a malicious user can create.
SybilLimit leverages the same insight as SybilGuard but is an improved version that reduces the number of Sybil nodes accepted per attack edge from O(√n log n) to O(log n) for n honest nodes.
When all nodes vote, SumUp leads to much lower attack capacity than SybilLimit despite the same asymptotic bound per attack edge. First, SumUp's bound of 1 + log n in Theorem 5.1 is a loose upper bound of the actual average capacity. Second, since links pointing to lower-level nodes are not eligible for ticket distribution, many incoming links of an adversarial node have zero tickets and thus are assigned a capacity of one.
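The notion of attack edges, the trust links that cross the cut between Sybil and honest identities, can be made concrete with a small sketch. The function name attack_edges and the toy graph are illustrative assumptions, not part of the presented system.

```python
def attack_edges(trust_edges, sybils):
    """Return the trust edges that connect a Sybil identity to an honest one.

    trust_edges: iterable of (a, b) pairs, treated as undirected links
    sybils:      set of identities controlled by the attacker
    """
    sybils = set(sybils)
    return [(a, b) for a, b in trust_edges if (a in sybils) != (b in sybils)]


# Toy example: three Sybils trust each other heavily, but only one
# trust relationship reaches the honest region.
edges = [("h1", "h2"), ("h2", "s1"), ("s1", "s2"), ("s2", "s3"), ("s1", "s3")]
print(attack_edges(edges, {"s1", "s2", "s3"}))
```

However many Sybil identities and Sybil-to-Sybil edges the attacker adds, the number of attack edges stays small, which is the property the countermeasures above exploit.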

SumUp
SumUp is a Sybil-resilient online content rating system that uses the trust network among users to defend against Sybil attacks. It uses the concept of max-flow.
Centralized approach.
Aims to aggregate votes in a Sybil-resilient manner.
Key idea: an adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object.
New: we integrate SumUp with MHITS.
Java implementation, using our own data structure based on Java sparse arrays.
SumUp steps:
Assign the source node and the number of votes per media file.
Level assignment.
Pruning step.
Capacity assignment.
Max-flow computation: collect votes on each resource.
Leverage user history to penalize adversarial nodes.
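The max-flow step can be illustrated with a simplified sketch. It is not SumUp itself: level assignment, pruning, ticket-based capacity assignment and history-based penalties are omitted, and every trust link simply gets the same capacity (an assumption). The names max_flow, collect_votes and the "_votes" super-source are hypothetical; the flow routine is a standard Edmonds-Karp implementation.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow on a {u: {v: capacity}} map."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] += c
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push flow along it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def collect_votes(trust_edges, voters, collector, edge_capacity=1):
    """Count the votes that can flow to the collector over trust links."""
    capacity = defaultdict(dict)
    for a, b in trust_edges:  # trust links are treated as undirected
        capacity[a][b] = edge_capacity
        capacity[b][a] = edge_capacity
    for voter in voters:
        capacity["_votes"][voter] = 1  # each voter casts at most one vote
    return max_flow(capacity, "_votes", collector)


trust_edges = [("c", "h1"), ("c", "h2"), ("h2", "s1"), ("s1", "s2"), ("s1", "s3")]
print(collect_votes(trust_edges, ["h1", "s1", "s2", "s3"], "c"))  # prints 2
```

In the toy graph the three Sybil voters share a single attack edge of capacity one, so only one of their votes is collected alongside the honest vote from h1.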

Integration of SumUp with MHITS

Evaluation: Experimental Setup
Barabási-Albert model for generating the network
300 users
20 media files (10 known to be fake and 10 known to be authentic)
800 ratings
3000 trust edges
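A trust network of roughly this size can be generated with a preferential attachment process. The sketch below is a plain-Python Barabási-Albert generator, not the authors' Java code; the choice of 10 links per new node is an assumption that yields 2900 edges for 300 users, close to but not exactly the 3000 trust edges reported.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Return the edge list of a Barabási-Albert graph.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # every node appears once per incident edge
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))  # degree-proportional sampling
        targets = list(chosen)
    return edges


trust_network = barabasi_albert(300, 10, seed=42)
print(len(trust_network))  # prints 2900
```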

Ratings Distribution

Evaluation
Evaluation metrics: Precision@K and Spearman's rank correlation coefficient.
Precision@K (P@K) computes, for a given result of ranked users, the fraction of relevant results in the top K results. The higher the precision, the better the performance. We use this metric to compare the results of the expert ranking algorithms that we developed with the ranking of experts obtained by counting the number of fair votes.
Spearman's rank correlation coefficient is a non-parametric measure of statistical dependence between two ranked lists. It is based on the rank order of the scores and not on the score data:
$\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, with $-1 \le \rho_s \le 1$
$d_i$: the difference between the rank of $x_i$ and the rank of $y_i$ (paired items in the two lists)
$n$: the number of data points in the sample (total number of observations)
$\rho_s = 1$ or $\rho_s = -1$: a high degree of correlation between x and y (+1 is a perfect positive correlation, -1 a perfect negative correlation)
$\rho_s = 0$: no correlation, i.e. a lack of monotonic association between the two variables
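Both metrics are short enough to sketch directly. The code assumes rankings without ties, which is the case in which the standard closed-form Spearman expression based on squared rank differences is exact; the function names are illustrative.

```python
def precision_at_k(ranked_users, relevant_users, k):
    """Fraction of the top-k ranked users that are relevant (true experts)."""
    top = ranked_users[:k]
    return sum(1 for user in top if user in relevant_users) / k


def spearman(rank_x, rank_y):
    """Spearman's rank correlation for two rankings without ties.

    rank_x, rank_y: {item: rank} over the same items.
    """
    n = len(rank_x)
    d_squared = sum((rank_x[i] - rank_y[i]) ** 2 for i in rank_x)
    return 1 - 6 * d_squared / (n * (n * n - 1))


baseline = {"u1": 1, "u2": 2, "u3": 3, "u4": 4, "u5": 5}
reversed_ranks = {"u1": 5, "u2": 4, "u3": 3, "u4": 2, "u5": 1}
print(spearman(baseline, baseline))        # prints 1.0
print(spearman(baseline, reversed_ranks))  # prints -1.0
print(precision_at_k(["u1", "u4", "u2"], {"u1", "u2"}, 2))  # prints 0.5
```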

Experimental Results I
No Sybils. Results are compared with the ranking of the users according to the number of fair ratings each of them had in the system.
For this step of the evaluation, I assume that all users in the network behave fairly and rate a random number of media files. So the only way a user can rate a media file wrongly is when the user has no competence in the specific topic. What differs between the two methods is that, besides the reinforcement between users voting fairly and authentic media files, the ranking in the case of MHITS also considers the local trust values the user has in the social network. Since average precision ignores the exact rank of a user, we use Spearman's rank correlation coefficient to get a better view of the efficiency. In Table 6.2, the correlation coefficients for n = 15 are presented. One can see that the MHITS ranking is more highly correlated with the ranking by number of fair ratings, as its value is closer to 1.
Spearman's coefficient (n = 15): HITS 0.87, MHITS 0.93

Experimental Results II
10% Sybils, 4 attack edges.
From the results, we can see that our proposed integration of SumUp with the MHITS algorithm outperforms HITS and MHITS without SumUp, which confirms the effectiveness of our approach. MHITS in combination with SumUp performs better for K = 10, and then for K = 20 the precision decreases even more rapidly than for MHITS alone. We think that this happens because some Sybil users already enter the ranking for K = 20 due to their high local trust values, and therefore the precision decreases.
Spearman's coefficient (n = 20): HITS 0.52, MHITS 0.68, MHITS & SumUp 0.93

Experimental Results III
Precision@K plots: 10% Sybils (one group) and 8 attack edges; 20% Sybils (one group) and 24 attack edges.

Further Evaluation
Number of Sybil votes increased from 3% to 17% of the total number of fair votes: the expertise ranking does not change.
Number of attack edges increased from 9 to 14 and 24, keeping the number of Sybil votes at 17% of the number of fair votes and a constant number of Sybils (50): the precision does not change.
Number of Sybil votes increased from 17% to 50% and then to 100%, keeping the number of attack edges (24) and the number of Sybils constant.
It can be seen that by increasing the number of Sybils, the attack edges or even the votes (up to 50% of the number of fair votes), the ranking of the users does not change dramatically. It can also be seen that Modified HITS with SumUp performs only slightly better than Modified HITS alone. The reason is that the steps additionally performed by SumUp when run together with Modified HITS (pruning of the trust network, assignment of capacity in the network and elimination of the links that possess a high negative history) do not affect the Sybils. The capacity assignment does not reach them, so votes from Sybils do not reach the source node. In this case, the edges connecting Sybils to fair nodes do not accumulate negative history and therefore are not eliminated. On this resulting network, Modified HITS is run again. The Sybils are kept and, due to the high local trust values that they receive from the other Sybil nodes in the group, they get into the top ranks of experts.
K MHITS 20% MHITS & SumUp 50% MHITS&SumUp 100% 12 0.91 0.27 0.33 0.08 15 0.93 0.40 0.06

Conclusions and Future Work
Conclusions:
Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems).
Leveraged trust and showed the implications of trust.
Combined expert ranking with Sybil-resistant algorithms to ensure robustness.
Future work:
Applying the algorithm to real data and to different data sets.
Temporal analysis (time series analysis).
Integrating the domain knowledge driven method.