Data Analysis and Data Mining with SAS Software

Presentation transcript:

SAS Enterprise Miner™: Data Analysis and Data Mining with SAS Software. Reinhard Strüby, SAS Institute Heidelberg

Contents: Why? - Data mining and requirements. What? - Definition of data mining. Who? - User groups. How? - Success factors for data mining. How? - The SEMMA process. SAS! - The SAS data mining solution.
Data mining is the technology that drives today's major IT application: Customer Relationship Management. Besides the Web, data mining is one of the hottest IT topics today. In this presentation we want to address the following questions: - Why is data mining of importance today? - What is it? - Who is the target audience for data mining and how do we address their needs? A related question is who would be the decision maker on a data mining sale. - What is needed to make data mining a success? - The SAS solution covers all of this today.

DATA MINING ???
This is the situation today: everybody talks about data mining, but despite the hype in the press very few understand it and know how to use it for business advantage. This is exactly the question we want to address.

The business task: "Know your customers!" Who are they? What do they want? What contacts have there been so far? How can a lasting relationship be established? Which customers might leave us?
We are living in a time of world-wide competition. This means more opportunities, but also more potential problems. In the past IT was mainly focussed on getting the daily job done, that is, mastering the transactions in telco, banking, insurance, retail etc. Nowadays the customer generating all of these transactions becomes the focus of interest. Knowledge about your customer is power and makes you more successful. - Who are they? Customer segmentation. Do I have different customer groups with different needs? Can I differentiate "good" and "bad" customers? - What are they like? Customer profiling. What attributes (e.g. demographics) make a good or bad customer? - What interactions have already taken place? Can I build up a model of customer behaviour from past data? This "training" is needed to generalise findings to new customers. - How to build up a long-term relationship? Depending on the industry it is 4-10 times more expensive to acquire a new customer than to keep an existing customer happy. - Which customers might leave us? Churn management. A major concern, e.g. in the telco industry.

Data mining - why now? Increased competitive pressure. Lower storage costs and higher processing speeds. Data warehouses are often already in place. Hidden information in large files. Data mining methods find patterns. GUI data mining applications. Customer pressure for change. Increase ROI.
The increased competitive pressure is the major business reason for the popularity of data mining. Some IT infrastructure came into play to make data mining more accessible. A first step is the ever decreasing cost of storing large amounts of data and processing it with ever faster CPUs. The larger amounts of data (including the Web) need proper ways of storing and managing them for business advantage: the data warehouse. Traditional reporting on large amounts of data gives consistent results, but we cannot be sure that we are asking the right questions. The question "which customers should we concentrate on in future?" might be more relevant than the traditional "which products in what area generated the most profit in the last year?" Data mining is exactly the technology that looks for interesting patterns in customer behaviour which could be exploited for business advantage. Data mining was a specialist area until very recently. The widespread availability of user-friendly data mining packages in the last few years made a big difference. Data mining is almost always directed towards a business question, e.g. how can I improve my return rate on direct mail? How can I find out which potential customer will turn into a bad debtor?

Data mining definition: Data mining is the process of selecting, exploring, and modelling large amounts of data in order to use previously unknown data patterns for business advantage.
Data mining is defined in different ways depending on who you are talking to, ranging from artificial intelligence to statistical analysis to OLAP. So let's start with a basic definition of data mining. Note that data mining implies some use and acceptance of new methodologies to understand relationships in data. Data mining is a process of steps which we will look at in more detail later on. Implicit in this definition is the fact that some traditional techniques are well suited for data mining activities as well. Key to the definition of data mining is the idea of large data sets; many sample sizes are in the range of 50 gigabytes and up. Also implicit in the definition is the concept of a model. Determining which customer/process attributes are important and then building a predictive model is the key to successful data mining.

Data mining is a process. Data mining involves close cooperation between IT, business departments, and data miners. Data mining is not restricted to particular industries or problems.
It is now generally accepted that data mining is a process of steps (META, IBM) and more insight into this process is needed. The next characteristic is that data mining, similar to data warehousing, involves a cooperation issue. Data warehousing needs the cooperation of IT and business. Data mining builds on top of data warehousing and needs in addition the expertise of "data miners", staff who are trained in analytical methods and who should be part of the data mining team (but not the only group involved!). Data mining can be used anywhere, as we will discuss on the next slides.

DATA MINING - INDUSTRIES
General: Customer Segmentation, Targeted/Cross Marketing, Pricing Analysis, Associations & Demography
Insurance & Health Care: Claim Analysis, Fraudulent Behavior
Banking: Credit Authorization, Credit Card Fraud Detection, Portfolio Analysis, Cash Planning
Telecommunications: Call Behaviour Analysis, Churn Management
Retail/Marketing: Market Basket Analysis, Database Marketing, Category Management
Production and Utilities: Process Management, Demand Patterns, Capacity Planning, Inventory Planning
When relating data mining applications to industries, one can differentiate between applications common to all industries ("general") and those specific to particular industries. The financial sector, telecoms and retail, in that order, are the industries with the highest demand for data mining solutions. Industry-specific data mining applications are often related to the particular operational business of that industry, e.g. banking: financial transactions; insurance: claim analysis; telco: call behaviour analysis. This means that there is a chance to get a foot in the door to own this business-critical data via data mining, with data warehousing as a possible follow-up activity.

IS DATA MINING IMPORTANT?
Postbank N.V.: "50% response on first mailing paid for DM investment"
US West: "Reducing customer churn by any amount is 10 times cheaper than gaining a new customer"
ABN AMRO: "Interest earned on 40% reduction in cash in ATMs"
Neckermann Versand AG: "Increased number of good customers getting credit by 80 a day"
Gloucestershire Constabulary: "For the public, increased crime pattern identification and prevention is priceless"
Data mining is being used in many areas today, and the companies listed here have had very favourable experiences with it. Finally, all of these companies have something in common: they use the SAS System for data mining and they are happy to talk about it.

DATA MINING - Users: Heads of business departments / specialists. Data miners.
Who is the user and who decides on data mining projects? As we will see, these groups are NOT identical. Users first. These are: users in business departments, business analysts or knowledge workers and their department heads; and data miners or quantitative experts, who depending on industry or company may sit in IT, marketing, or planning departments. They are not usually known under that generic name but may have titles such as marketing analyst, financial analyst, actuary or similar. Data miners are more common in the financial industry than in, e.g., retail. Heads of business departments are often initiators of a data mining project and influence the buying decision, although they will rarely buy themselves (IT probably will). Data miners are at best influencers but play an important role as users and as support for business staff.

Users: business department. They know the subject area. They understand the content of the data. They look for information but often have little analytical knowledge. They frequently work as analysts in marketing departments.
Business technologists are individuals who know their industry and selected business processes of their company very well. Moreover, they are familiar with the contents of the data reflecting their business processes, although they might not know where or how the data are stored. Business analysts need to analyze business-critical data beyond the level of standard reporting but are not professional statisticians. They are not familiar with the syntax and options of SAS System quantitative procedures or with statistical terminology. Their focus is on making more money for their organizations through execution of specific business objectives. The business technologist requires a high-level interface to the data mining capabilities and expects to be able to apply the SEMMA methodology without having to be a statistician. The business technologist may also wish to do campaign management.

Users: DATA MINERS. Quantitative experts: statistical/mathematical background or comparable knowledge. Somewhat isolated from business questions. Familiar with algorithms and the data analysis process. Common in the financial sector, otherwise rather rare.
Data miners or quantitative experts may come from IT, marketing, or planning departments. As mentioned, they are fairly common in the financial industry but rarely found in telecoms or retail. They have a statistical or mathematical background and know a lot about analytical techniques: the traditional SAS/STAT user. They probably build models now for tasks such as customer profitability, credit scoring, or financial modelling and deliver these to department heads and finally to upper management, but they are not involved in the decision-making process. If there are no data miners at your prospect, this needs to be compensated for by SAS consulting or partners/QPs in order to exploit the SAS Data Mining Solution to its full potential.

THE DATA MINING MARKET - IN MILL $
Breakdown of figures from the previous page. I have left off non-software-related categories such as data providers (Nielsen etc.) and systems integrators (HW suppliers, big 6 etc.). With these omitted, Meta splits the data mining software providers into 5 different categories: (1) Vertical applications, which provide business functions such as churn management, supply chain optimisation etc. for particular industries. (2) Horizontal applications (cross-industry) such as marketing and sales tracking, customer service orientation. (3) Macro mining: basic toolkits to provide custom data mining solutions. (4) Micro mining: these include Q&R, OLAP and pure statistical solutions. (5) Data visualization, as part of statistics or GIS. With the exception of vertical applications, the SAS System is mentioned in all categories as one of the leading contenders, if not the leading one. And we still have room to grow into the profitable vertical applications area.
Source: META Group, Data Mining Market Trends 1997-1998. * SAS System mentioned in this category.

THIRD GENERATION DATA MINING - Integrated: data warehousing and data mining integrated
[Diagram labels: Corporate OLTP Systems; Data Warehouses; Data Marts; Data Mining; DSS, EIS; OLAP; VSAM; IMS; DB2; Demographic Data; Lifestyle and Behaviour Data; Industry Data; IT; Data Miners; Business Depts.; Mkt Analysts; Exec Mgmt]
The market is now looking for integrated solutions such as the one in the diagram. First generation data mining involved outsourcing data mining to a service provider, with turnaround times of weeks, unsatisfactory support and, most often, results in paper form with no way to integrate them into corporate IT. Second generation data mining already included in-house data mining, but in the form of stand-alone data mining packages which often had serious deficiencies in reading all sorts of corporate data directly. For example, neural network packages often could not even deal with character data and needed flat-file input. A much more desirable situation is the one in the diagram, where data storage and management are done in a data warehouse environment and extracts are made at departmental level as data marts, on which data mining can be pursued easily and painlessly. Functions are clearly separated: IT manages the operational data and the data warehouse, and data miners as well as business staff have access both to the data marts and to the data to be mined. Furthermore, anyone, including executive management, is in a position to consume the results of data mining via customised reporting.

Success factors: Access to all data sources - data warehousing. Scalability: HW / SW. Broad spectrum of data mining methods: concentration on business problems. Implementation strategy.
These are the 4 basic criteria for a successful data mining project. 1. Esoteric discussions on data mining methodology often miss the most important aspect: that ALL available data, no matter what source or format, can be read immediately for use in data mining. This is best done using data warehousing, also covering Web data as a source. 2. Data mining is typically done on A LOT of data: gigabytes, sometimes even terabytes. Data mining solutions therefore need to be scalable to cover this amount of data. Stand-alone PC packages quickly run out of steam. Data mining typically requires a client-server setup. 3. There is quite a discussion on which data mining technique gives the best result. Again this discussion misses an important point: what fits best depends on the business question and on the data, so the more data mining methods are available, the more flexibly the solution can be tailored to the problem. This way we can concentrate on the business problem rather than being forced to twist the data or the problem to fit the technique at hand. 4. Do not forget the user. Key to the success of data mining is its acceptance by the user. The SEMMA methodology has proven to be a practical approach to data mining.

Comparison: OLAP versus data mining. Report writing: user-driven reporting, dimensions known. Example: best-selling product in 1997 in region X? Data mining methodology: data-driven exploration, search for dimensions. Example: which customers should we concentrate on?
Within data warehousing, OLAP is another popular technique for data exploitation. OLAP vendors have tried in the past to position OLAP as a way to do data mining. This is far from the truth. OLAP is reporting on data along known and defined dimensions. A typical question would be: tell me the best-selling product in the year 1966 for Europe. OLAP is a top-down approach: ask an exact question where all dimensions of the question are well defined. Data mining is exactly the opposite. Data mining is a bottom-up strategy: starting from the data, what are the interesting patterns which might lead to a business-relevant question? Another way to put it: data mining is an exploration for the relevant questions, such as: which are the customers we should be concentrating on? OLAP and data mining are not alternatives but work together well. Data mining finds the relevant questions, and OLAP is very well suited to tracking the answer at any point in time.

SAS DATA MINING SOLUTION: Data Mining, IT and Business
[Diagram labels: Identify Problem; Transform Data into Information; Act on Business Question; Measure Results; Data Warehouse, DBMS; Data Mining Processing; EIS, Business Reporting, Graphics]
The "virtuous cycle of data mining" (Berry, Linoff: Data Mining Techniques for Marketing, Sales and Customer Support, Wiley 1997) gives a good picture of how well data mining integrates with business and IT strategies. The first phase is to identify a business opportunity, for example that a sizeable number of profitable customers are leaving a bank. Business problems should be reflected in the corporate database. The data warehouse works as the corporate memory, and we assume that it holds information about the customers who have left. In the next step, data mining, we would like to gain actionable results, such as the profile of those leaving our bank as opposed to those who stay with us. Let's suppose we have identified a major portion of the leavers as, on average, having multiple accounts, often using ATMs of other banks, and often trying to call us on business matters but sometimes having to wait for a response. We would like to act on this information. If we could clearly identify the group of leavers mentioned above, then we could provide them with an additional telephone number with a higher service priority. We would then like to measure the results, following up with standard reporting (EIS systems or the like). It may well be that this measure yields the desired result of reducing customer churn. If so, we have to redefine the business opportunity: look for further ways to reduce churn, or attack a completely new business opportunity such as introducing new financial services for customer groups with different needs. The virtuous cycle of data mining starts again.

SEMMA: Sample, Explore, Manipulate, Model, Assess
[Diagram labels: Sampling?; Visual Exploration; Data Reduction; Grouping, Subsetting; Transform; Neural Networks; Decision Trees; Statistical Techniques; Associations, Sequences; Model Comparison, New Questions]
Within the data mining stage (see the previous slide) SAS Institute offers SEMMA, a refined data mining methodology which has been welcomed by our early adopters of data mining. Data mining might involve many attempts to find patterns, with multiple passes over the data. Taking a random sample of the data is very appropriate to save execution time. If we have found a reasonable explanation for an opportunity such as customer profitability, we would of course score the whole data, that is, score each customer for profitability. One word of caution: when in competition with IBM you might find that they wrongly claim that sampling indicates we cannot deal with larger data sizes. Not true at all: sampling is just efficient, not required. Before any modelling is done it is a good idea to survey the data graphically or statistically for significant features, outliers, groupings and other obvious major characteristics. Exploration of the data gives us a good idea of what we should look for at the modelling stage. Insights gained in exploration should be reflected in the data in a data manipulation stage. Maybe we can concentrate on a set of customer characteristics which have proven to be interesting; certain groups may have been identified with different behaviour, and the group information should be added; missing values may need to be handled, and so on. All the information gained so far can now be used to model customer behaviour sensibly with the appropriate technique. We offer a wide range of data mining techniques to fit any business question. The final step, of most interest to the business user, is the assessment. How well does the model explain customer behaviour? If we have several alternative explanations, which one is the best in terms of associated profits? Results of the assessment should be presented in a business-like and graphical way to meet the needs of the business user.

SAMPLING? Recommended, not a prerequisite: no content is lost. Considerable performance advantages. Model checking: training, testing, and validation samples.
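The split into training, validation and test samples mentioned on this slide can be illustrated with a minimal Python sketch. This is not how SAS Enterprise Miner implements its sampling or partitioning; the function name `partition`, the 60/20/20 fractions and the fixed seed are assumptions made purely for illustration.

```python
import random

def partition(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the records and split them into training, validation and test samples.

    The fractions and the seed are illustrative assumptions, not values from the slides.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    # A seeded generator makes the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains after the first two samples
    return train, valid, test

# Hypothetical customer ids standing in for rows of a data mart.
customers = list(range(1000))
train, valid, test = partition(customers)
print(len(train), len(valid), len(test))
```

Every record ends up in exactly one of the three samples, so the model can be fitted on one, tuned on the second and checked on the third.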

EXPLORATION: Detecting outliers, groups, associations ... Visual exploration: 3-dim. charts, graphical data analysis, GIS. Analytical exploration: cluster analysis, correspondence analysis, PCA, factor analysis, MDS … Which questions should be asked?
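As one small example of analytical exploration, outliers in a single numeric variable can be flagged with the interquartile-range rule (Tukey's fences). The slides do not say which outlier method Enterprise Miner uses, so this technique, the function name `iqr_outliers` and the factor 1.5 are assumptions chosen for illustration only.

```python
import statistics

def iqr_outliers(values, factor=1.5):
    """Return the values lying outside Tukey's fences.

    The fences are q1 - factor * IQR and q3 + factor * IQR, where q1 and q3
    are the first and third quartiles of the data.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly call counts for seven customers.
calls = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(calls))
```

Values flagged this way are candidates for a closer look, not automatically errors; an unusually active customer may be exactly the pattern of interest.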

DATA MANIPULATION: Which variables are essential? Missing values? Variable transformation? Add new information: groups, labels, etc. Which information should I work with?

MODELLING: What form do my data have? ... NNs, statistical methods, tree-based methods, time series

Assessment - Scoring. Evaluation: how good is my model? Explanatory contribution of the variables, outliers. Classification. Lift charts. Generalization to other data.
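The numbers behind a lift chart can be sketched as follows: customers are ranked by model score, and for each cumulative slice of the ranking the response rate is divided by the overall response rate. This is a generic textbook calculation, not the Enterprise Miner assessment node; the function name `cumulative_lift` and the sample data are hypothetical.

```python
def cumulative_lift(scores, outcomes, bins=10):
    """Return the cumulative lift for each of `bins` equal slices of the ranking.

    scores:   model scores, higher means more likely to respond
    outcomes: 1 if the customer actually responded, else 0
    """
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and of equal length")
    if bins < 1 or bins > len(scores):
        raise ValueError("bins must be between 1 and the number of records")
    overall_rate = sum(outcomes) / len(outcomes)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no responders")
    # Rank the actual outcomes by descending model score.
    ranked = [o for _, o in sorted(zip(scores, outcomes),
                                   key=lambda pair: pair[0], reverse=True)]
    lifts = []
    for b in range(1, bins + 1):
        cutoff = (len(ranked) * b) // bins  # number of top-ranked customers in this slice
        rate = sum(ranked[:cutoff]) / cutoff
        lifts.append(rate / overall_rate)
    return lifts

# Hypothetical scored mailing list of ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(cumulative_lift(scores, responded, bins=5))
```

A lift above 1 in the first slices means that mailing only the top-ranked customers reaches responders more efficiently than mailing at random; the last value is always 1 because the final slice covers the whole list.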

SAS Data Mining Solution Currently (Feb 98): Data Warehousing incl. Web Technology. Analytical Solutions: NNA - Production on Win, OS/2 and all major UNIX, ORLANDO I and II; Tree Menue System; Exploration: INSIGHT, SPECTRAVIEW, GIS; Statistics; Time Series Forecasting; Market Research Methods. EIS, Enterprise Reporter, Graphics.
SAS Institute offered solutions for data mining before the term became fashionable. These always consisted of three key ingredients: (1) Solutions for data access and management, now with the Warehouse Administrator and all-new technology for parallel access to data (SPDS) and access to Web data (SAS/IntrNet). (2) The multitude of analytical solutions SAS Institute has always been known for; the latest enhancements for data mining were the SAS Neural Network Application and the Tree Menue System. (3) End-user reporting tools such as the Enterprise Reporter. Together these solutions make up the SAS System for Data Mining, which according to the Meta Group and the Two Crows Corporation is the market leader for data mining in terms of customer usage (see later slides).

New SAS data mining solution: SAS Enterprise Miner™. A unified and fully scalable business solution for data mining. Fills the space between data warehousing and end-user reporting. The GUI provides a user-friendly front end for the SEMMA process.

SAS ENTERPRISE MINER: Benefits for users. IT: data warehouse access, scalability. Business users: intuitive interface and orientation towards business questions. Data miners: analytical depth and flexibility.

SAS ENTERPRISE MINER: Environment. Projects/models in a Win95 hierarchy structure. SEMMA process in process flow diagrams. Existing SAS programs and applications can be integrated easily. All SAS Enterprise Miner functionality, such as the DMDB and all analytical tools, is available exclusively in this data mining solution.
SEMMA is an acronym for a data mining process. Data mining is part of a process: 1) begin by generating ideas and hypotheses, 2) validate ideas based on patterns in the data, 3) transform into actionable segments, 4) measure the results. Note that preceding SEMMA is a data warehousing or data mart strategy and that following SEMMA is the delivery of this information to the "downstream" consumers of the data mining work. The downstream users are the ones who execute the marketing/sales campaign around the results of the data mining activities. The workflow of a campaign includes: 1) list pulling from a scored file, 2) campaign analysis, 3) campaign tracking, 4) campaign profitability, and 5) history. Campaigns are launched as outreach efforts (such as mailings, telemarketing, and web activities) to the list of customers identified from the list pull. The results of the campaign are then passed back to the data mining environment as input for the next pass.

ENTERPRISE MINER User Interface: 3 main windows: Projects, Data Mining Workspace, Tools Palette

ENTERPRISE MINER Project Window. Start: double-click the EM icon. Window of available projects. Mouse control. Pull-down menus: File, Edit, View, Insert, Globals, Options, Help. Toolbar: Up one level, Delete, Properties, Help. Pop-up menu: Open, Rename, Delete, Properties. Projects: Create, Open, Save, Run, Close, Delete.

ENTERPRISE MINER Other Windows. Data Mining Window (DMW): default: open; build, edit, run process flow diagrams. Tools Window: tool palette, covers EM functionality; drag and drop tools onto the DMW. Message Window: default: closed; messages generated when creating/running PFDs.

ENTERPRISE MINER Process Flow Diagrams

ENTERPRISE MINER DM Workspace Window. Toolbar: Open, Save, Cut, Copy, Paste, Undo, Help. Pull-down menu: File, Edit, View, Actions, Globals, Options, Windows, Help. Pop-up menu: Add node, add endpoints, paste, undelete, select all, create subdiagram, refresh, up one level, top level, connect items, move and connect items. Add nodes: drag and drop icons or use the pop-up menu. Connect, cut, delete nodes. PFD logic: tools loosely organized according to SEMMA.

ENTERPRISE MINER Functionality. Data: Input Data Source, Random Sample, Partition, DMDB. Explore/Modify: Transform Data, Filter Outliers, Bar Chart, INSIGHT, Clustering, Variable Selection. Modelling: DM Regression, Neural Networks, Tree Models, Associations. Assessment: Scoring, Assessment. Utilities: Group Processing, Data Replacement, SAS Code Node, Administrator, Nodes Manager, Control Points, Subdiagrams.

Rules for the nodes: The input data source node comes first in the PFD. Sampling follows the input, then any exploration, modification, or modelling. At any position: filter outliers, transform, bar chart. After clustering: filter outliers, transform, bar chart, or modelling. An assessment must be preceded by modelling.
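The ordering rules on this slide can be expressed as a small checker. The sketch below is only an illustration under strong assumptions: it treats a process flow diagram as a simple linear list of node names (real PFDs are graphs), reads "sampling follows the input" as "directly after the input node", and uses hypothetical node names that are not Enterprise Miner identifiers.

```python
# Hypothetical node names; the grouping follows the "Modelling" tools listed in the slides.
MODELLING_NODES = {"regression", "neural_network", "tree", "associations"}

def check_flow(nodes):
    """Return a list of rule violations for a linear sequence of node names."""
    problems = []
    # Rule: the input data source node comes first.
    if not nodes or nodes[0] != "input_data_source":
        problems.append("flow must start with an input data source node")
    # Rule (as interpreted here): a sampling node, if present, directly follows the input.
    if "sample" in nodes and nodes.index("sample") != 1:
        problems.append("sampling must directly follow the input node")
    # Rule: an assessment must be preceded by a modelling node.
    seen_model = False
    for name in nodes:
        if name in MODELLING_NODES:
            seen_model = True
        elif name == "assessment" and not seen_model:
            problems.append("assessment must be preceded by a modelling node")
    return problems

good = ["input_data_source", "sample", "transform", "tree", "assessment"]
bad = ["input_data_source", "assessment"]
print(check_flow(good))
print(check_flow(bad))
```

Nodes such as filter outliers, transform and bar chart are deliberately left unconstrained, since the slide allows them at any position.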

Uniform appearance of the nodes: Tabbed dialogs. Data dialog. Variable dialogs. Notes dialog. Some nodes: browser for results.

SAS ENTERPRISE MINER Flow
[Diagram labels: DBMS, Data Warehouse; Other Data; Sampling: Random, Stratified; DMDB: SAS Enterprise Miner Data Mining Database (Data + Metadata); Graphical; DMINE: Numerical Exploration; DMREG: (Logistic) Regression; NEURAL: Neural Networks; SPLIT: CHAID/CART; Factor, Discrim ...; Assessment, Comparison; Reporting, EIS]

SAS ENTERPRISE MINER System Requirements: Pentium PC. Windows NT 4.0+ or Win 95. 250 MB+ free disk space. CD-ROM drive.

SAS ENTERPRISE MINER Architecture: Client-server solution. Clients: Win 95, Win NT. Servers: Win NT, all major UNIX. Mainframe as data server, later also as compute server. Beta: only Win95, Win NT initially. Unix: AIX, HP-UX, Solaris.

SAS ENTERPRISE MINER Beta: About 100 EM beta applications in the USA. About 60 EM beta testers in Europe.

Summary: SAS Enterprise Miner models data mining as a process. Enables cooperation between IT, business, and data miners. Complete SEMMA implementation. Integration of data warehousing, data mining, and reporting. Competitive advantage through data mining.