Testing the Importance of Cleansing Procedures for

Presentation transcript:

Testing the Importance of Cleansing Procedures for Overlaps in German Administrative Data
Patrycja Scioch (Research Data Centre of the BA at the IAB, Germany)
New Techniques and Technologies for Statistics, 18.-20.2.2009

I would like to present part of a research project that analyses variations in the processing of administrative data and the stability of evaluation results based on these differently processed data. Why is this of interest?

Motivation
- increasing importance of using administrative data for research
- in Germany we have two types of such data:
  - collected for official statistical purposes
  - by-product of administration (e.g. federal employment services)
- administrative data:
  - not collected for research
  - different and independent sources of data
  - merging may cause contradictions in information

In the last couple of years, researchers have increasingly turned to administrative data as the basis for their analyses. In Germany, two types of data relevant for researchers can be identified. The first is data for official statistics, which are collected and processed for administrative needs. In most cases these are survey data, which are available to researchers too. The second kind is data that arise from administrative processes, for example as by-products of the daily business of the employment services. This kind of data is getting more and more attention from research and is also the centre of my studies. Its main characteristics are the following:
- the data are not collected for research, which means that they are not in the condition a researcher would prefer;
- in most cases they are collected from different, independent sources, so merging this information may cause contradictions.

So why is it a good idea to use them? Because producing these data costs nothing extra: they are by-products, so they are cheaper than surveys, and they contain a lot of information. But, and there is always a but, it takes a lot of time to bring them into the desired shape, and not all information of interest is always in one data set. A combination is therefore required, which leads to other problems and makes further research on data quality necessary.

The Integrated Employment Biographies - IEB
- combination of four different sources:
  - Employee History
  - Benefit Recipient History
  - Applicants Pool Data
  - Participants in Measure Dataset
- subsample:
  - 2.2% random sample
  - latest update 2006
- characteristics:
  - daily records
  - split into episodes
  - quality depends on source of information

The data set I use is the Integrated Employment Biographies (IEB), created by the Institute for Employment Research. These are individual data, not aggregated. The IEB combines data from four different and independent sources. It contains information on employment periods and on times when persons receive wage substitution from the Federal Employment Agency. Furthermore, periods of job search are integrated, as well as times of participation in measures. This is a huge data set, with about 65 million individuals and 950 million records. To simplify matters I use a 2.2% random sample. The data records are split into episodes, so that spells do not partially overlap but are exactly parallel. These parallel spells from different sources cause a lot of problems, because their quality depends on the respective source: there may be contradictory statements about the same person at the same point in time, and one does not know which information to believe.
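To illustrate what "split into episodes" means, here is a minimal Python sketch. It is not the IEB production procedure. The function name `split_into_episodes`, the tuple layout `(source, start, end)` and the example dates are assumptions made for illustration only. Every spell of one person is cut at every start date and at every day after an end date, so that any two resulting episodes are either disjoint or exactly parallel.

```python
from datetime import date, timedelta

def split_into_episodes(spells):
    """Split one person's spells so that episodes never partially overlap.

    spells: list of (source, start, end) with inclusive dates (assumed layout).
    Returns a sorted list of (source, start, end) episodes.
    """
    one_day = timedelta(days=1)
    # Every spell start, and every day following a spell end, is a cut point.
    cuts = sorted({s for _, s, _ in spells} | {e + one_day for _, _, e in spells})
    episodes = []
    for source, start, end in spells:
        # Cut points that fall strictly after the start and up to the end.
        inner = [c for c in cuts if start < c <= end]
        bounds = [start] + inner + [end + one_day]
        for left, right in zip(bounds, bounds[1:]):
            episodes.append((source, left, right - one_day))
    return sorted(episodes, key=lambda ep: (ep[1], ep[2], ep[0]))

# Hypothetical example: an employment spell overlapping a benefit spell.
example = [
    ("employment", date(2000, 1, 1), date(2000, 1, 31)),
    ("benefit", date(2000, 1, 20), date(2000, 2, 10)),
]
for episode in split_into_episodes(example):
    print(episode)
```

In this example the overlap from 20 to 31 January ends up as two parallel episodes, one per source, which is exactly the situation where the two sources may contradict each other.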

Literature
- previous findings:
  - concentrate on the analysis of overlaps, qualitative and quantitative (Jaenichen et al. (2005), Bernhard et al. (2006))
  - correction of single variables (Waller (2007), Kruppe et al. (2007))
- evidence:
  - need for data processing in the IEB
  - the way heavily depends on the research question
- open issues:
  - impact on estimates
  - data processing by transformation of the structure of the dataset

To improve the quality of these data, some studies have been done, such as those by Jaenichen et al. and Bernhard et al., who try to identify the most common overlaps and inconsistencies and propose options for dealing with them properly. Waller and Kruppe et al. investigated the impact of single variables. Waller analysed the correction of end dates of training measures, and Kruppe et al. found 60 different definitions of unemployment and analysed their implications for the data. The conclusion of all of these papers is that there is a crucial need to put effort into data processing, and that the way this should be done depends heavily on the underlying research question. What has not been analysed, or only to a minor degree, is the effect of correcting data in different ways on the results of estimations. Another interesting point is to process the data by transforming the data structure. These open issues are the subject of my study.

Identification/Method
- assumptions: dataset → processing → method → result
- within the case: Wunsch/Lechner (2007)
  - evaluation of labour market programmes in West Germany
  - analyses by comparing matching estimates
  - time-dependent employment opportunities as outcome
- 1. step: replication of the data processing and variations of the analysis sample
- 2. step: replication of the evaluation study
- 3. step: analyses of the effects of the variations on the results

How did I arrive at this question? Let's keep it simple and say that research results depend on the underlying data, the processing of the data, and the method used to analyse the data (regression, matching and so on). If I now take the same data and keep the method constant, then differences in the results should be attributable to the processing procedures. To follow up on this idea, I perform a replication study based on the paper by Wunsch/Lechner (2007). They evaluate labour market programmes in West Germany using matching estimations. Matching is the comparison of the employment state, or other outcomes, of two individuals who are the same in their characteristics except that one of them took part in a programme and the other did not. The aim is to draw conclusions about the effectiveness of labour market programmes. In the first step I replicate the processing of the data and then vary these procedures. This creates different analysis samples, which form the basis for the next step, the evaluation. The matching remains the same for each analysis sample. The last step is to analyse the differences in the estimation results and to draw conclusions about the effect of the processing.
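The matching idea described above can be sketched in a few lines of Python. This is a deliberately simple one-to-one nearest-neighbour matching with replacement, not the estimator used by Wunsch/Lechner. The function name `att_nearest_neighbour`, the data layout and all numbers are assumptions for illustration.

```python
def att_nearest_neighbour(participants, non_participants):
    """Average outcome difference between participants and their matches.

    Each unit is (covariates, outcome); covariates is a tuple of numbers.
    Every participant is matched to the non-participant with the smallest
    squared Euclidean distance in the covariates (assumed distance measure).
    """
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    differences = []
    for covariates, outcome in participants:
        match = min(non_participants, key=lambda unit: distance(covariates, unit[0]))
        differences.append(outcome - match[1])
    return sum(differences) / len(differences)

# Hypothetical units: covariates are (age, sex), outcome is 1 if employed.
participants = [((30, 1), 1), ((45, 0), 0)]
non_participants = [((31, 1), 0), ((44, 0), 0), ((60, 1), 1)]

effect = att_nearest_neighbour(participants, non_participants)
print(effect)
```

In the design of the study, a function like this would stay fixed and be applied unchanged to each analysis sample, so that any difference between the estimates can only come from the data processing.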

Approach/Framework
IEB data set → processing (variable) → analysis samples V0, V1, V2 → matching estimator (fixed) → outcomes V0, V1, V2 → comparison: outcome?

A simple illustration of the workflow is shown here, with the base data on the left, which are processed in different ways to obtain the analysis samples. V0 is the result of processing the data as Wunsch/Lechner did, and V1 and V2 are variations I made. By keeping the estimation fixed, three outcomes are obtained, which are compared with each other. Either there are no differences and the processing has no impact, or there are some and they have to be interpreted.

Processing rules
- time windows of two weeks
- multiple possibilities of spells (different sources, overlaps)
- goal: exactly one state for each period
  - sort by duration and priority of source
  - choose the two of highest importance
  - select one final state using further priority rules
- different analysis samples

How exactly are the data processed? The periods are divided into time windows of two weeks. Every two-week window may contain parallel spells from different sources, or even from the same source, and they need not give identical information. It is not easy to say which spell from which source is the right one, and therefore which one to choose. So the aim is to determine one state for each period, that is, for each two-week window. This is done by sorting the spells by their duration within the two weeks and by the priority of the source the information comes from. This priority is defined beforehand and will be explained later on. Then the two most important spells are chosen, and from these the final state for the two-week window is selected following further priority rules. Changing the priority leads to a different selection, and so different analysis samples are obtained.
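The selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study. The names `days_in_window` and `final_state` are hypothetical, each spell is treated separately, and the "further priority rules" are not spelled out in the talk, so the sketch simply assumes that the higher-priority source wins between the two remaining candidates.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # two-week time window

def days_in_window(start, end, win_start):
    """Number of days a spell (inclusive dates) covers inside the window."""
    win_end = win_start + timedelta(days=WINDOW_DAYS - 1)
    overlap = (min(end, win_end) - max(start, win_start)).days + 1
    return max(overlap, 0)

def final_state(spells, win_start, priority):
    """Pick one state for the window starting at win_start.

    spells: list of (source, start, end); priority: sources, highest first.
    """
    rank = {source: position for position, source in enumerate(priority)}
    candidates = []
    for source, start, end in spells:
        duration = days_in_window(start, end, win_start)
        if duration > 0:
            candidates.append((source, duration))
    if not candidates:
        return None
    # Sort by duration (longest first), then by priority of the source.
    candidates.sort(key=lambda c: (-c[1], rank[c[0]]))
    top_two = candidates[:2]
    # Assumed further rule: between the two, the higher-priority source wins.
    return min(top_two, key=lambda c: rank[c[0]])[0]

# Hypothetical spells of one person in the window starting 1 January 2001.
spells = [
    ("employment", date(2001, 1, 1), date(2001, 1, 10)),
    ("benefit", date(2001, 1, 8), date(2001, 1, 14)),
    ("applicant", date(2001, 1, 1), date(2001, 1, 3)),
]
benefit_first = ["programme", "benefit", "employment", "applicant"]
employment_first = ["programme", "employment", "benefit", "applicant"]

print(final_state(spells, date(2001, 1, 1), benefit_first))
print(final_state(spells, date(2001, 1, 1), employment_first))
```

The same window yields a different final state depending on the priority order passed in, which is how varying the priority produces different analysis samples.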

Rules of Priority

Priority   Model V0     Model V1     Model V2
1          Programme    Programme    Employment
2          Benefits     Employment   Programme
3          Employment   Benefits     Benefits
4          Applicants   Applicants   Applicants

Differences:
- Model V1 prefers employment spells to benefit spells compared to V0
- Model V2 downgrades participation in programmes and prefers employment

In this table you can see the priorities in the different models. Column 1 shows the importance of the source: 1 is the highest priority and 4 the lowest. Model V0 in column 2 corresponds to Wunsch/Lechner, where participating in a programme is more important than receiving benefits or being employed. That is because they evaluate labour market programmes and therefore attach great importance to them. Second comes the receipt of benefits, because money is paid, and when money is involved you can assume that the data are correct. Nearly the same argument applies to employment spells. These are relatively reliable, because they are notifications by employers about their employees from the notification procedure for health, pension and unemployment insurance, and are therefore again linked to money. The Applicants Pool Data are not very reliable, because they contain information that is optional and so is not always recorded, or is often recorded with less care. Model V1 in column 3 differs in that the priorities of benefit receipt and times of employment are inverted. This is because both are relatively reliable and no one can say which one is more valid. In the last model, V2, the priority changes with respect to model V1 by downgrading programme participation below employment. This comes from the consideration that these data are recorded in the employment agencies before the programme takes place, and afterwards no one knows whether the unemployed person really participated or not. More often than not this is the case, but you can never say so with certainty. The differences I expect are more employment in model V1 and less participation in V2, relative to V0.

Results before starting the estimation
[Figure: the IEB data set is processed into the analysis samples V0, V1 and V2. For each sample, a table lists state 1, state 2 and the final state per two-week window; the example windows contain the states benefit, employment, applicant and programme. Priority orders: V0: programme, benefit, employment, applicant. V1: programme, employment, benefit, applicant. V2: employment, programme, benefit, applicant.]

Here you can see a simple example. The tables show the two main states (green) within the two-week windows for one person, and in the last column the final state selected from the two. Time windows with exactly the same contents were deleted for simplicity and space.

Descriptive results
- Participants:
  - differences between sample V0 and V1, V2
  - different magnitudes
  - insignificant
- Group of non-participants:
  - significant differences
  - not of practical importance

Estimation results - 1
Effects of programme participation compared to non-participation

Estimation results - 2
Variance in the estimation results

Summary/Prospects
- large insignificant differences during lock-in effect
- smaller at the end of the observation period
=> The effect does not depend on the procedure (only the extent)!
=> Rules are necessary, but time + effort should not exceed benefit!
- creation of a "naive" model
- comparison with other countries

Thank you for your attention! Patrycja.Scioch@iab.de http://fdz.iab.de/

Back-Up
References
Bernhard, S., Dressel, C., Fitzenberger, B. and Schnitzlein, D. (2006): Überschneidungen in der IEBS: Deskriptive Auswertung und Interpretation, FDZ Methodenreport 4/2006, Nürnberg.
Jaenichen, U., Kruppe, T., Stephan, G., Ullrich, B. and Wießner, F. (2005): You can split it if you really want: Korrekturvorschläge für ausgewählte Inkonsistenzen in IEB und MTG, FDZ Datenreport 4/2005, Nürnberg.
Kruppe, T., Müller, E., Wichert, L. and Wilke, R. (2007): On the Definition of Unemployment and its Implementation in Register Data – The Case of Germany, FDZ Methodenreport 3/2007, Nürnberg.
Waller, M. (2007): Do Reported End Dates of Treatments Matter for Evaluation Results?, FDZ Methodenreport 1/2007, Nürnberg.
Wunsch, C. and Lechner, M. (2007): What Did All the Money Do? On the General Ineffectiveness of Recent West German Labour Market Programmes, University of St. Gallen Department of Economics working paper series 2007-19, Department of Economics, University of St. Gallen.