Public A Framework for Improving Data Integration with Linked Data Ahmad Assaf † Supervised by: Aline Senart † and Raphaël Troncy ‡ † SAP Research, SAP.

Presentation transcript:

A Framework for Improving Data Integration with Linked Data
Ahmad Assaf †
Supervised by: Aline Senart † and Raphaël Troncy ‡
† SAP Research, SAP Research France SAS
‡ EURECOM, Sophia Antipolis, France
Dec 14, 2012

©2011 SAP AG. All rights reserved.
Background
- MSc Advanced Software Engineering, University of St. Andrews (UK)
- Research interests: Collective Intelligence, Data Integration & Visualization
- Technical background: Web development technologies
- Projects and teams: RUBIX, remix, Panorama; BI, RTI
- Internship: Oct 11 - April 12
  - RUBIX: two-person team (main contributor)
  - remix: collaboration between BI teams in Sophia and Dresden (UI/UX)
  - Panorama: collaboration between RTI Sophia and RTI Paris
- PhD start: May 12

Earlier Research Direction
An Interaction Framework for Business Intelligence
- Presenting Recommendations, Suggestions and Feedback
- Simplicity
- Working with Large Data Sets
- Working with External Data Sources using agreed-upon semantics
- Users' End Goal
- Data Exploration
- Tracing of Users' Interactions
- Data Visualization
- Data Selection
- Data Manipulation
- Data Analysis
- Interactions, Support for Mobility

Projects
RUBIX, remix, Panorama

RUBIX - Problem Definition
Linking External Data
- Distributed sources with heterogeneous data formats and terminologies
- Complex data models
- Different storage models
- Noisiness (duplications, inconsistencies)
Need to find mappings between these internal and external complex data structures (schema matching)
Diagram labels: Sensor Data, Governmental Data, Social Media Feeds; Enterprise Data (ERP, CRM, PRM); Business Intelligence Analysis; Decision Making Process

RUBIX - Proposal
Goal: Allow business users to semi-automatically combine potentially noisy data residing in heterogeneous silos
Proposal
- Provide a novel framework enabling schema matching of internal and external sources
- Develop several matching algorithms to increase accuracy
- Leverage Linked Data to enrich the cells
- Compare schemas on several bases:
  - Column global type and name
  - Cells' rich types retrieved from Linked Data
Implementation
- Google Refine: A tool designed to process, clean and enrich large amounts of data with existing knowledge bases
- Auto Mapping Core: A tool designed by SAP Research, enabling the developer to combine several matching algorithms
- Freebase: An open repository of structured data
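
The two comparison bases named in the proposal (column name, and the rich types of the cells) can be illustrated with a small sketch. This is not the RUBIX implementation: the function names, the use of `difflib` for header similarity, the Jaccard overlap for types, the equal weighting, and the type labels are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity of two column headers, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(types_a: set, types_b: set) -> float:
    """Jaccard overlap of the rich types attached to two columns' cells."""
    if not types_a and not types_b:
        return 0.0
    return len(types_a & types_b) / len(types_a | types_b)


def match_score(col_a: dict, col_b: dict) -> float:
    """Equal-weight average of header similarity and rich-type overlap (assumed weighting)."""
    return (name_similarity(col_a["name"], col_b["name"])
            + type_similarity(col_a["types"], col_b["types"])) / 2


# Hypothetical columns; the type labels are made up, not real Freebase identifiers.
internal = {"name": "City", "types": {"city", "location"}}
external = {"name": "Ville", "types": {"city"}}
unrelated = {"name": "Ticker", "types": {"stock_symbol"}}

print(match_score(internal, external))
print(match_score(internal, unrelated))
```

The headers "City" and "Ville" share almost no characters, so a header-only matcher would score them low; the shared rich type is what lifts the pair above the unrelated column.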

RUBIX - Experiments
- Different languages (header name and cell values)
- Abbreviations
- Codes (IATA, NASDAQ)
- Empty column headers

RUBIX - Experiments Results
- By default, AMC runs a set of string matching algorithms between the columns' headers.
- Extra plugins (matchers) can be configured and added.
- The results of the different matchers can be combined using different methods; our experiments use the default "average method".
[Chart: results of AMC's default matching algorithms]
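
As a rough illustration of what an "average method" could look like, the sketch below averages the scores that several matchers assign to each column pair. It does not use AMC's API; the function names, the data layout, the matcher names, and the rule that an unscored pair counts as 0.0 are assumptions.

```python
def combine_average(results):
    """Combine per-matcher scores by taking the mean score for each column pair.

    `results` maps a matcher name to {(source_col, target_col): score in [0, 1]}.
    A pair that a matcher did not score counts as 0.0 (an assumption).
    """
    pairs = set()
    for scores in results.values():
        pairs.update(scores)
    return {
        pair: sum(scores.get(pair, 0.0) for scores in results.values()) / len(results)
        for pair in pairs
    }


def best_matches(combined):
    """For each source column, keep the target column with the highest combined score."""
    best = {}
    for (source, target), score in combined.items():
        if source not in best or score > best[source][1]:
            best[source] = (target, score)
    return best


# Hypothetical matcher outputs for one source column and two candidate targets.
results = {
    "header_string": {("City", "Ville"): 0.2, ("City", "Town"): 0.25},
    "cell_types": {("City", "Ville"): 1.0, ("City", "Town"): 0.5},
}
combined = combine_average(results)
print(best_matches(combined))
```

Adding a matcher amounts to adding one more entry to `results`, which mirrors the idea of plugging extra matchers into a default set.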

RUBIX - Experiments Results
Matcher configurations compared:
- AMC's default set + Cosine Similarity
- AMC's default set + Cosine Similarity + PPMCC method
- AMC's default set + Cosine Similarity + PPMCC method + Spearman's
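
The three measures added to AMC's default set are standard. The sketch below gives plain-Python versions of cosine similarity, the Pearson product-moment correlation coefficient (PPMCC) and Spearman's rank correlation. The slides do not say how columns are turned into vectors; the example assumes each column is summarised as a numeric vector (for instance, counts per rich type), and the sample values are invented.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Pearson product-moment correlation coefficient (PPMCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    denom = spread_x * spread_y
    return cov / denom if denom else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-type counts for two columns (same type order in both vectors).
col_a = [5, 3, 0, 1]
col_b = [4, 2, 1, 0]
print(cosine(col_a, col_b), pearson(col_a, col_b), spearman(col_a, col_b))
```

Cosine similarity ignores vector length, PPMCC measures linear agreement after centring, and Spearman's only looks at rank order, so the three can disagree on the same pair of columns.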

RUBIX - Publications
1. Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphael Troncy, David Trastour, RUBIX: A Framework for Improving Data Integration with Linked Data, to be published in ICP Series of the ACM Digital Library.
2. Ahmad Assaf, Eldad Louw, Aline Senart, Corentin Follenfant, Raphael Troncy, David Trastour, Improving Schema Matching with Linked Data, in Proceedings of the 1st International Workshop on Open Data (WOD), Nantes, France, May 2012.

Projects
RUBIX, remix, Panorama

remix is a self-service BI tool that enables non-technical business users to compose existing BI artifacts with new structured internal and external data sources.
- It helps business users build insightful reports intuitively and quickly.
- It enables the composition of existing BI artifacts with new data from the enterprise and from external sources.
- It recommends the best course of action by leveraging content and interaction traces.

remix Demo

Data quality
- Data quality involves data management, modeling, analysis, storage and presentation.
- It is an important issue for data-driven applications, and it should be investigated and understood in depth to ensure that the data is fit to be combined and used to infer better business decisions.
- Data quality is subjective and cannot be assessed easily; the actual value of data is mainly realized when it is used.
- Studies have found that most data quality problems are in fact "data misinterpretations" or problems with the data semantics.
- With the rise of the Semantic Web, new data quality principles should be identified.

Our Proposal
Data Quality Principle: Attributes
- Quality of data sources: Accessibility; Authority & Sustainability; License; Trustworthiness & Verifiability; Performance
- Quality of raw data: Accuracy; Referential correspondence; Cleanness; Consistency; Comprehensibility; Completeness; Typing; Provenance; Versatility; Traceability
- Quality of the semantic conversion: Correctness; Granularity; Consistency
- Quality of the linking process: Connectedness; Isomorphism; Directionality
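
One way to work with the principles above is to hold them as a simple lookup structure and use it as a checklist. The sketch below does that. The principle and attribute names come from the slide; the `coverage` function and the idea of scoring a dataset by the fraction of satisfied attributes are assumptions added for illustration, not something the slides propose.

```python
# Principles and attributes as listed on the slide.
PRINCIPLES = {
    "Quality of data sources": [
        "Accessibility", "Authority & Sustainability", "License",
        "Trustworthiness & Verifiability", "Performance",
    ],
    "Quality of raw data": [
        "Accuracy", "Referential correspondence", "Cleanness", "Consistency",
        "Comprehensibility", "Completeness", "Typing", "Provenance",
        "Versatility", "Traceability",
    ],
    "Quality of the semantic conversion": [
        "Correctness", "Granularity", "Consistency",
    ],
    "Quality of the linking process": [
        "Connectedness", "Isomorphism", "Directionality",
    ],
}


def coverage(satisfied):
    """Return, per principle, the fraction of its attributes marked as satisfied.

    `satisfied` is a set of (principle, attribute) pairs. Pairs are used because
    "Consistency" appears under two different principles.
    """
    return {
        principle: sum((principle, attr) in satisfied for attr in attrs) / len(attrs)
        for principle, attrs in PRINCIPLES.items()
    }


# Hypothetical assessment of one dataset.
satisfied = {
    ("Quality of the linking process", "Connectedness"),
    ("Quality of the semantic conversion", "Correctness"),
    ("Quality of the semantic conversion", "Consistency"),
}
for principle, fraction in coverage(satisfied).items():
    print(f"{principle}: {fraction:.2f}")
```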

remix - Results
- Finalist in TechEd Madrid
- Ahmad Assaf and Aline Senart, Data Quality Principles in the Semantic Web, in Proceedings of the International Workshop on Data Quality Management and Semantic Technologies (DQMST 2012), September 2012, Palermo, Italy.

Projects
RUBIX, remix, Panorama

What is Panorama?
Vision: Panorama is a self-service, real-time dashboarding mobile solution for business users. It leverages LAVA design principles as a self-service enabler, together with Analytics on Demand (AoD), HANA Views and the Data Specification Language (DaSL), to easily create and consume powerful analytic computations running at HANA speed.
Key Value Proposition:
- Self-service dashboarding
- On-device authoring
- LAVA
- Automated Storytelling

Future Roadmap
Within the context of Panorama:
- Data modeling and enrichment using external Linked Data sources
- Defining a visualization vocabulary -> recommendation of visualizations
- Machine learning problems and user profiling

Summary
- Participation in 3 projects (RUBIX, remix and Panorama)
- Project remix made it to the finals in TechEd Madrid
- Published 3 papers
- Investigating possible research problems in Panorama

Thank You!
Contact information: Ahmad Assaf, SAP Research, France

