Presentation on the topic: "Robust Expert Ranking in Online Communities - Fighting Sybil Attacks". Presentation transcript:
1 Robust Expert Ranking in Online Communities - Fighting Sybil Attacks
8th IEEE International Conference on Collaborative Computing: Networking, Applications and Worksharing
October 14–17, 2012, Pittsburgh, Pennsylvania, United States
Khaled A. N. Rashed, Cristina Balasoiu, Ralf Klamma
RWTH Aachen University, Advanced Community Information Systems (ACIS)
2 Advanced Community Information Systems (ACIS)
Responsive Open Community Information Systems
- Community Visualization and Simulation
- Community Analytics
- Community Support
- Web Engineering
- Web Analytics
- Requirements Engineering
- Fake multimedia and misbehaviour
3 Agenda
- Introduction and motivation
- Related work
- Our approach
- Expert ranking algorithm
- Robustness of the expert ranking algorithm
- Evaluation
- Conclusions and outlook
Notes: First I will introduce you to the research background and the problems of fake multimedia.
4 Introduction
- Expert search and ranking refers to finding a group of authoritative users with special skills and knowledge for a specific category.
- The task is very important in online collaborative systems.
- Problems: openness and misbehaviour; no attention has been paid to the trust and reputation of experts.
- Solution: leveraging trust.
5 Motivation: Examples
Manipulating the truth for war propaganda
- Published as: British soldiers abusing prisoners in Iraq
- Proved to be fake by Brigadier Geoff Sheldon, who said the vehicle featured in the photo had never been to Iraq
- Appeared in London's Daily Mirror
Tidal bores presented as Indian Ocean tsunami
- Published as: 2004 Indian Ocean Tsunami
- Proved to be tidal bores at a four-day-long government-sponsored tourist festival in China
Notes: Use of expert knowledge to figure out the fake. 2. Genuine photos with fake metadata. Expert knowledge, analysis and witnesses are needed to identify the fake!
6 A Case Study: Collaborative Fake Multimedia Detection System
Collaborative activities (rating, tagging and commenting)
- Provide new means of search, retrieval and media authenticity evaluation
- Explicit ratings and tags are used for evaluating the authenticity of multimedia items
Reliability: not all of the submitted ratings are reliable
- No centralized control mechanism
- Vulnerability to attacks
Three types of users
- Honest users
- Experts
- Malicious users
Example: press agencies
7 Research Questions and Goals
Research questions
- How can users' expertise be measured in collaborative media sharing and evaluation systems, and how can users be ranked?
- What are the implications of trust?
- Robustness: how can the robustness of the ranking algorithm be ensured?
Goals
- Improve multimedia evaluation
- Reduce the impact of malicious users
8 Related Work
- Probabilistic models, e.g. [Tu et al. 2010]
- Voting models [Macdonald and Ounis 2006], [Macdonald et al. 2008]
- Link-based approaches: PageRank [Brin and Page 1998], HITS [Kleinberg 1999] and their variations; SPEAR algorithm [Noll et al. 2009]; ExpertRank [Jiao et al. 2009]
- TREC enterprise track: find the associations between candidates and documents, e.g. [Balog 2006, Balog 2007]
- Machine learning algorithms, e.g. [Bian and Liu 2008, Li et al. 2009]
9 Our Approach
Assumptions
- Expert users tend to have many authenticity ratings
- Correctly evaluated media are rated by users of high expertise
- Following expert users provides more benefits
Expert definition
- Rates a large number of media files in an authentic way with respect to a topic
- Is highly trusted by the users directly connected to them
- Should be trustworthy in evaluating multimedia
Notes: We discuss the notions of experts and expertise in the context of collaborative fake multimedia detection systems. Here we try to define the expert, and we assume that …. Goal: improve media evaluation (by increasing the impact of experts).
10 Expert Ranking Methods
Domain knowledge driven method
- Considers the tags that users assign to media files
- User profile: merges the tags a user submitted to the media files in the system
- Similarity coefficient between the candidate profile and the tags assigned to a specific resource
- Used to reorder the users who voted on a media file according to their tag profile
Domain knowledge independent method
- Uses the connections between users and resources to decide on the expertise of the users
- A modified version of the HITS algorithm
- Mutual reinforcement of users' expertise and media
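The domain knowledge driven method can be illustrated with a minimal sketch. The slide does not say which similarity coefficient is used, so the Jaccard coefficient below is an assumption, and the names `build_profile`, `jaccard` and `reorder_voters` as well as the sample data are hypothetical.

```python
def build_profile(tags_per_media):
    """Merge all tags a user submitted (across media files) into one tag set."""
    profile = set()
    for tags in tags_per_media.values():
        profile.update(tag.lower() for tag in tags)
    return profile


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| (ASSUMED similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reorder_voters(voters, profiles, resource_tags):
    """Sort voters of a media file by profile similarity, most similar first."""
    return sorted(voters,
                  key=lambda u: jaccard(profiles[u], resource_tags),
                  reverse=True)


# Hypothetical data: tags each user submitted, keyed by media file.
submitted = {
    "alice": {"m1": ["tsunami", "flood"], "m2": ["Tsunami", "wave"]},
    "bob": {"m3": ["football"]},
}
profiles = {user: build_profile(tags) for user, tags in submitted.items()}
resource_tags = {"tsunami", "wave", "ocean"}
ranking = reorder_voters(["bob", "alice"], profiles, resource_tags)
```

Here alice's merged profile overlaps the resource's tags while bob's does not, so alice is placed first among the voters.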
11 MHITS: Expert Ranking Algorithm
MHITS: expert ranking algorithm in online collaborative systems
- Link-based approach, based on the HITS algorithm
HITS
- Authorities: pages that are pointed to by good pages
- Hubs: pages that point to good pages
- Reinforcement between hubs and authorities
MHITS
- Users act as hubs (correctly evaluated media rated by them)
- Media files act as authorities
- Mutual reinforcement between users and media files
- Local trust values between users are assigned
- Considers the ratings of the users
Notes: HITS: the reinforcement relation between hubs and authorities means that a page has high authority if many pages pointing to it have high hubness, and a page has high hubness if many pages it points to have high authority. MHITS: the mutual reinforcement relation refers to the fact that the expertise of a user depends on the way she rates, and the authority of a rated resource comes from the way it is rated by users. This means that the authority of a media file is influenced by the ratings users assign to it and by the trust those users receive from their neighbors; at the same time, the expertise of users comes from the authority of the media files that they rate.
12 MHITS: Expert Ranking Algorithm
Two networks: one for users and ratings, and one for users only (trust network).

Symbol   Description
a(m)     Authority score of media file m
U(m)     Set of users pointing to media file m
h(u)     Hubness score of user u
r(u)     Rating of user u for media file m
t(u)     Average trust of the users directly connected to user u
M(u)     Set of media files to which user u points
         Coefficient that weights the influence of the two terms, in range [0, 1]

Notes: The expert ranking network has two types of nodes and two types of edges (rating edges and trust edges): a bipartite graph between the set of users and the set of media files, and a graph between users. We exploit links between users and media files and also links between users. Trust is in the range [0, 1]. Ratings: 0.5 for a fake vote, 1 for an authentic vote.
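The update equations themselves are not preserved in this transcript, so the following is only a minimal sketch of how the symbols above could fit together. It assumes the authority update a(m) = sum over u in U(m) of [c * h(u) * r(u) + (1 - c) * t(u)] and the hub update h(u) = sum over m in M(u) of a(m), with L2 normalisation after each step. The assumed update form, the function name `mhits`, the parameter name `coeff` and the sample data are all hypothetical and are not taken from the authors' Java implementation.

```python
import math


def mhits(ratings, trust, coeff=0.5, iterations=50):
    """Sketch of a trust-weighted HITS iteration (ASSUMED update rules).

    ratings: {user: {media: r}} with r = 0.5 (fake vote) or 1 (authentic vote)
    trust:   {user: t(u)}, average trust from direct neighbours, in [0, 1]
    coeff:   weight between the rating term and the trust term, in [0, 1]
    """
    media = {m for rated in ratings.values() for m in rated}
    hub = {u: 1.0 for u in ratings}
    auth = {m: 1.0 for m in media}
    for _ in range(iterations):
        # Authority of a media file: ratings weighted by rater hubness, plus trust.
        new_auth = {m: 0.0 for m in media}
        for u, rated in ratings.items():
            for m, r in rated.items():
                new_auth[m] += coeff * hub[u] * r + (1.0 - coeff) * trust.get(u, 0.0)
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {m: v / norm for m, v in new_auth.items()}
        # Hubness (expertise) of a user: sum of the authorities of rated media.
        new_hub = {u: sum(auth[m] for m in rated) for u, rated in ratings.items()}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {u: v / norm for u, v in new_hub.items()}
    return hub, auth


# Hypothetical toy network.
ratings = {
    "alice": {"m1": 1.0, "m2": 1.0, "m3": 0.5},
    "bob": {"m1": 1.0},
    "carol": {"m2": 0.5},
}
trust = {"alice": 0.9, "bob": 0.5, "carol": 0.1}
hub, auth = mhits(ratings, trust)
experts = sorted(hub, key=hub.get, reverse=True)
```

In this toy network alice rates every media file bob rates plus two more, so under the assumed rules her hub score is necessarily the larger of the two.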
13 Robustness of the MHITS Algorithm
Compromising techniques
- Sybil attack [Douc02], reputation theft, whitewashing attack, etc.
- Compromising the input and the output of the algorithm
Sybil attack
- Fundamental problem in online collaborative systems
- A malicious user creates many fake accounts (Sybils) which all reference the user to boost his reputation (the attacker's goal is to be higher up in the rankings)
Countermeasures against Sybil attack

                                  SybilGuard [YKGF06]   SybilLimit [YGKX08]   SumUp [TMLS09]
Protocol type                     Decentralized         Decentralized         Centralized
Accepted Sybils per attack edge   O(√n log n)           O(log n)              1 + log n

Notes: SybilGuard and SybilLimit are decentralized; SumUp is centralized.
SybilGuard is based on the "social network" among user identities, where an edge between two identities indicates a human-established trust relationship. Malicious users can create many identities but few trust relationships. Thus, there is a disproportionately small "cut" in the graph between the Sybil nodes and the honest nodes. SybilGuard exploits this property to bound the number of identities a malicious user can create.
SybilLimit leverages the same insight as SybilGuard but is an improved version that reduces the number of Sybil nodes accepted per attack edge from O(√n log n) to O(log n) for n honest nodes.
When all nodes vote, SumUp leads to a much lower attack capacity than SybilLimit despite the same asymptotic bound per attack edge. First, SumUp's bound of 1 + log n in Theorem 5.1 is a loose upper bound on the actual average capacity. Second, since links pointing to lower-level nodes are not eligible for ticket distribution, many incoming links of an adversarial node have zero tickets and thus are assigned a capacity of one.
14 SumUp
- Centralized approach
- Aims to aggregate votes in a Sybil-resilient manner
- Key idea: an adaptive vote flow technique that appropriately assigns and adjusts link capacities in the trust graph to collect the votes for an object
- New: we integrate SumUp with the MHITS Java implementation, using our own data structure based on Java sparse arrays
SumUp steps
- Assign the source node and the number of votes per media file
- Levels assignment
- Pruning step
- Capacity assignment
- Max-flow computation: collect votes on each resource
- Leverage user history to penalize adversarial nodes
Notes: Countermeasures against Sybil attack. SumUp is a Sybil-resilient online content rating system that uses the trust network among users to defend against Sybil attacks. It uses the concept of max-flow.
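The max-flow step can be sketched as follows. This is not SumUp itself: level assignment, pruning and ticket-based capacity assignment are skipped, and the link capacities are simply given as hypothetical inputs. The sketch only shows why collecting votes as a flow towards the vote collector limits Sybil votes to the capacity of the attack edge. The helper names `max_flow` and `collect_votes`, the node names and the capacities are all assumptions.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a {u: {v: capacity}} map of directed links."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def collect_votes(link_capacity, voters, collector):
    """Count the votes that can reach the collector; each voter casts one vote."""
    capacity = {u: dict(nbrs) for u, nbrs in link_capacity.items()}
    capacity["_voters"] = {v: 1 for v in voters if v != collector}
    return max_flow(capacity, "_voters", collector)


# Hypothetical trust links, directed towards the vote collector "s".
links = {
    "h1": {"s": 2}, "h2": {"s": 2}, "h3": {"h1": 1},
    "attacker": {"h2": 1},  # the single attack edge
    "sybil1": {"attacker": 1}, "sybil2": {"attacker": 1}, "sybil3": {"attacker": 1},
}
honest = ["h1", "h2", "h3"]
sybil_region = ["attacker", "sybil1", "sybil2", "sybil3"]
honest_only = collect_votes(links, honest, "s")
with_sybils = collect_votes(links, honest + sybil_region, "s")
```

All three honest votes are collected, while the four votes from the Sybil region add only one more, because they all have to cross the attack edge of capacity one.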
18 Evaluation
Evaluation metrics:
- Precision@K
- Spearman's rank correlation coefficient
ρs: Spearman's coefficient of rank correlation, -1 ≤ ρs ≤ 1
di: the difference between the rank of xi and the rank of yi
n: the number of data points in the sample (total number of observations)
ρs = -1 or 1: a high degree of correlation between x and y
ρs = 0: a lack of linear association between the two ranked variables
Scale: +1 perfect positive correlation, 0 no correlation, -1 perfect negative correlation
Notes: Precision@K computes, for a given list of ranked users, the fraction of relevant results in the top K results. The higher the precision, the better the performance. We use this metric to compare the results of the expert ranking algorithms that we developed with the ranking of experts obtained by counting the number of fair votes.
Spearman's rank correlation coefficient is a non-parametric measure of statistical dependence between two ranked lists. It is based on the rank order of the scores and not on the score values themselves. d is the difference in rank between paired items in the two lists.
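Both metrics are short enough to sketch directly. For rankings without ties, Spearman's coefficient is given by the standard formula ρs = 1 - 6 Σ di² / (n (n² - 1)), which uses exactly the di and n defined above. The function names and the sample rankings are hypothetical.

```python
def precision_at_k(ranked_users, relevant_users, k):
    """Fraction of the top-k ranked users that are relevant (true experts)."""
    top_k = ranked_users[:k]
    return sum(1 for user in top_k if user in relevant_users) / k


def spearman(rank_x, rank_y):
    """Spearman's rho for two rankings of the same items, assuming no ties.

    rank_x, rank_y: {item: rank} with ranks 1..n
    """
    n = len(rank_x)
    sum_d_squared = sum((rank_x[item] - rank_y[item]) ** 2 for item in rank_x)
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical rankings of five users.
reference = {"u1": 1, "u2": 2, "u3": 3, "u4": 4, "u5": 5}
reversed_ranking = {"u1": 5, "u2": 4, "u3": 3, "u4": 2, "u5": 1}
same = spearman(reference, reference)
opposite = spearman(reference, reversed_ranking)
p_at_2 = precision_at_k(["u1", "u4", "u2", "u3"], {"u1", "u2"}, 2)
```

Identical rankings give ρs = 1 and fully reversed rankings give ρs = -1; in the Precision@K example only one of the top two users is relevant.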
19 Experimental Results I
No Sybils
Results are compared with the ranking of the users according to the number of fair ratings each of them had in the system.

                    HITS   MHITS
Spearman (n = 15)   0.87   0.93

Notes: For this step of the evaluation, I assume that all users in the network behave fairly and rate a random number of media files. So the only way a user can rate a media file wrongly is when the user has no competence in the specific topic.
What differs between the two methods is that, besides the reinforcement between users voting fairly and authentic media files, the ranking in the case of MHITS also considers the local trust values the user has in the social network.
Since average precision ignores the exact rank of a user, we use Spearman's rank correlation coefficient to get a better view of the efficiency. In Table 6.2, the correlation coefficients for n = 15 are presented. One can notice that the result of the MHITS algorithm is more highly correlated with the ranking by number of fair ratings, as its value is closer to 1.
20 Experimental Results II
10% Sybils, 4 attack edges

                    HITS   MHITS   MHITS & SumUp
Spearman (n = 20)   0.52   0.68    0.93

Notes: From the results, we can see that our proposed integration of SumUp into the MHITS algorithm outperforms HITS and MHITS without SumUp, which confirms the effectiveness of our approach.
As can be seen, MHITS in combination with SumUp performs better for K = 10, and then for K = 20 its precision decreases more rapidly than even that of MHITS. We think this happens because some Sybil users already enter the ranking at K = 20 due to their high local trust values, and therefore the precision decreases.
21 Experimental Results III
- 10% Sybils (one group) and 8 attack edges
- 20% Sybils (one group) and 24 attack edges
22 Further evaluation
- 3% to …%: the number of Sybil votes was increased with respect to the total number of fair votes; the expertise ranking does not change
- 9 to 14 and 24: the number of attack edges was increased, keeping the number of Sybil votes at 17% of the number of fair votes and the number of Sybils constant (50); the precision does not change
- 17% to …% and then to 100%: the number of Sybil votes was increased, keeping the number of attack edges (24) and the number of Sybils constant

Table (column alignment not preserved): header cells K, MHITS, 20%, MHITS & SumUp, 50%, MHITS & SumUp, 100%
K = 12: 0.91, 0.27, 0.33, 0.08
K = 15: 0.93, 0.40, 0.06

Notes: It can be noticed that by increasing the number of Sybils, the attack edges or even the votes (up to 50% of the number of fair votes), the ranking of the users does not change dramatically.
It can also be seen that Modified HITS with SumUp performs only slightly better than Modified HITS alone. The reason is that the additional steps performed by SumUp when run together with Modified HITS (pruning of the trust network, assignment of capacity in the network and elimination of the links that possess a high negative history) do not affect the Sybils.
This is because the capacity assignment does not reach them, so votes from Sybils do not reach the source node. In this case, the edges connecting Sybils to fair nodes do not accumulate negative history and therefore are not eliminated. On the resulting network, Modified HITS is run again. The Sybils are kept and, due to the high local trust values that they receive from the other Sybil nodes in the group, they get into the top ranks of experts.
23 Conclusions and Future Work
Conclusions
- Proposed an expertise ranking algorithm for collaborative systems (fake multimedia detection systems)
- Leveraged trust and showed its implications
- Combined expert ranking with Sybil-resistant algorithms to ensure robustness
Future work
- Apply the algorithm to real data and to different data sets
- Temporal analysis (time series analysis)
- Integrate the domain knowledge driven method