
# Topic 9: Analysis (Alexandra Grand)


Analysis
1. Data Preparation: organizing the data
2. Descriptive Statistics: describing the data
3. Inferential Statistics: testing hypotheses and models

Conclusion Validity / Internal Validity
- Conclusion validity: Is there a relationship between two variables (between cause and effect)? The conclusion is either "there is a relationship" or "there is no relationship". Is that conclusion about the relationship reasonable?
- Internal validity: Assuming that there is a relationship in this study, is the relationship a causal one?
- Example: "The more TV sets there are, the worse the PISA performance." (Presse, "PISA-Sieger: Weiblich und ohne TV"). Is a "third variable" involved?

Threats to conclusion validity
A threat is anything that leads to an incorrect conclusion about a relationship in the observations:
1. Concluding that there is no relationship when in fact there is one ("missing the needle in the haystack"). This is the signal-to-noise ratio problem:
   - "signal": the relationship you are trying to see
   - "noise": factors that make it hard to see the relationship
2. Concluding that there is a relationship when in fact there is not ("seeing things that aren't there").

Threats to conclusion validity
"Finding no relationship when there is one" (conclusion: no relationship; reality: relationship). Threats:
- low reliability of measures
- low reliability of treatment implementation
- random irrelevancies in the setting
- random heterogeneity of respondents
- low statistical power
- violation of assumptions of statistical tests

The first four are "noise"-producing factors: they add variability and thereby lower statistical power.

"Finding a relationship when there is not one" (conclusion: relationship; reality: no relationship). Threats:
- fishing and the error rate problem
- violation of assumptions of statistical tests
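The "fishing and the error rate problem" can be made concrete with a short calculation. The following is a minimal sketch, assuming that all tests are independent and run at the same α-level; the function name `familywise_error_rate` is hypothetical and does not come from the slides.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests tests.

    Assumption (not from the slides): the tests are independent and
    every one of them is run at the same alpha-level.
    """
    return 1 - (1 - alpha) ** n_tests


# A single test at alpha = 0.05 versus "fishing" through many tests.
for n_tests in (1, 5, 20, 45):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests: {rate:.3f}")
```

The more tests are run on the same data, the more likely it becomes that at least one of them "finds" a relationship that is not there.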

Improving Conclusion Validity
- Good statistical power (should be > 0.8). Power = "the odds of saying that there is a relationship when in fact there is one".
- Factors that affect power:
  - sample size: use a larger sample
  - effect size: increase the effect size (e.g. increase the dosage of the program), i.e. increase the signal and decrease the noise
  - α-level: raise the α-level (at the cost of a higher Type I error risk)
- Good reliability: reduces "noise"
- Good implementation
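How sample size, effect size and α-level move the power can be sketched numerically. The example below is an illustration under stated assumptions, not the procedure used in the slides: it assumes a two-sided one-sample z-test with known variance and a standardized effect size, and the function name `z_test_power` is hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float) -> float:
    """Power of a two-sided one-sample z-test (assumed setting).

    effect_size is the true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # mean of the test statistic under HA
    # Probability that the statistic lands in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


power_strict = z_test_power(effect_size=0.5, n=15, alpha=0.01)
power_lenient = z_test_power(effect_size=0.5, n=15, alpha=0.05)
power_larger_n = z_test_power(effect_size=0.5, n=60, alpha=0.05)
print(round(power_strict, 3), round(power_lenient, 3), round(power_larger_n, 3))
```

In this setting the power rises when α is raised from 0.01 to 0.05 and rises again when the sample grows from 15 to 60 cases.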

Statistical Inference Decision Matrix
- Two mutually exclusive hypotheses (H0, HA)
- Decision: which hypothesis to accept and which to reject

| Conclusion | Reality: H0 is true | Reality: HA is true |
|---|---|---|
| accept H0 | right decision: 1-α (e.g. 0.95), confidence level | wrong decision: β (e.g. 0.20), β-error (Type II Error) |
| accept HA | wrong decision: α (e.g. 0.05), α-error (Type I Error), significance level | right decision: 1-β (e.g. 0.80), power |

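Statistical Inference Decision
- Outcome probabilities: 1-α and α when H0 is right; 1-β (power) and β when HA is right.
- What we want: high power and a low Type I error.
- Problem: with sample size and effect size held fixed, the higher the power, the higher the Type I error.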

Practical
1. A child who is "in reality" highly gifted is diagnosed as not highly gifted. Which error is this?
   - α-error (Type I Error)
   - β-error (Type II Error)
2. The result of a study: WU students with a HAK diploma achieve a higher score on the multiple-choice exam in accounting. In reality, however, HAK and non-HAK graduates do not differ in the score achieved. Which error is this?
   - α-error (Type I Error)
   - β-error (Type II Error)

Practical
Mark the correct answer and correct the wrong ones. Raising the α-level from 0.01 to 0.05 ...
- lowers the power. Wrong: the power rises.
- lowers the chance of making a Type I error. Wrong: the chance rises.
- lowers the chance of making a β-error. Correct.
- makes the test more restrictive. Wrong: the test becomes less restrictive.

Analysis: example data set "Arbeitszufriedenheit" (job satisfaction), AZ
- Data set: AZ.sav
- Note: the data were selected arbitrarily from a data set* for illustration purposes. Any results should therefore not be taken too seriously.
- Sample size: n = 15
- Variables:
  - dichotomous: SEX; items for the constructs job satisfaction** (AZ_...), working climate** (BK_...), workload** (AB_...)
  - ordinal: POSITION (position in the company)
  - metric: MITARB (number of employees), NETTO (monthly net income in €)
  - new variable: AZ, "job satisfaction" (assumption: interval-scaled), the sum score of the individual items AZ_...

(*) Böhnisch, B., Grand, A., Rechberger, R., Wimmer, W. (2006). Berufliche Zufriedenheit. Seminararbeit aus Empirische Forschungsmethoden.
(**) Items taken from: Giegler, H. (1985). Rasch-Skalen zur Messung von „Arbeits- und Berufszufriedenheit“, „Betriebsklima“ und „Arbeits- und Berufsbelastung“ auf Seiten der Betroffenen.

1. Data Preparation
- Logging the data
- Developing a database structure: codebook (Kodierungsschema)
- Entering the data into the computer (single entry or double entry)
- Checking the data for accuracy
- Data transformation:
  - missing values
  - item reversals (example: reverse-keyed items such as BK_2: old values 1 "agree", 2 "disagree" become new values 2 "agree", 1 "disagree")
  - recoding variables (example: items AZ_..., AB_..., BK_...: old values 1 "agree", 2 "disagree" become new values 1 "agree", 0 "disagree")
  - scale totals (example: generate the new variable AZ (job satisfaction); to get a total score for AZ, add across the individual items AZ_...)
  - categories
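The three transformations named above (item reversal, recoding, scale total) can be sketched in a few lines. This is a minimal illustration with made-up responses, not the AZ.sav data; the helper names `reverse_item`, `recode_agree` and `prepare` are hypothetical, and only BK_2 is treated as reverse-keyed because it is the only example the slide names.

```python
# Hypothetical raw responses; 1 = "agree", 2 = "disagree".
raw = [
    {"ID": 1, "AZ_1": 1, "AZ_4": 2, "BK_2": 1},
    {"ID": 2, "AZ_1": 2, "AZ_4": 1, "BK_2": 2},
]

ITEM_PREFIXES = ("AZ_", "AB_", "BK_")
REVERSED_ITEMS = ("BK_2",)  # assumption: only BK_2 is reverse-keyed


def reverse_item(value: int) -> int:
    """Item reversal: 1 -> 2 and 2 -> 1."""
    return 3 - value


def recode_agree(value: int) -> int:
    """Recode: 1 ("agree") stays 1, 2 ("disagree") becomes 0."""
    return 1 if value == 1 else 0


def prepare(row: dict) -> dict:
    out = dict(row)
    for name in REVERSED_ITEMS:
        out[name] = reverse_item(out[name])
    for name in list(out):
        if name.startswith(ITEM_PREFIXES):
            out[name] = recode_agree(out[name])
    # Scale total: add across the individual AZ_ items.
    out["AZ"] = sum(v for k, v in out.items() if k.startswith("AZ_"))
    return out


prepared = [prepare(row) for row in raw]
print(prepared)
```

Reversal runs before recoding here so that, after both steps, 1 means the same direction of agreement for every item.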

1. Data Preparation - Codebook
The codebook should include:
- variable name
- variable description
- variable format
- instrument/method of collection
- date collected
- respondent or group
- variable location in database

[Figure: data view of the example data set with the columns ID, SEX, MITARB, NETTO, POSITION, AZ_1, BK_2, AB_3, AZ_4, BK_5]

1. Data Preparation - Checking data for accuracy
- Summarize the data (e.g. in a frequency table) and check them:
  - Are the listed values reasonable ("wild codes", outliers/Ausreißer)?
  - Are there missing values?
- Outlier: either it actually is an extreme value or it is an error in data entry.
- "Wild code": a value outside the set of codes defined for the variable.
- "Missing values": no data exist or the data weren't entered.
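A frequency table is often enough to spot wild codes and missing values. The sketch below uses a made-up SEX column, not the AZ.sav data, and assumes the coding 1 = male, 2 = female with `None` standing for a value that was not entered.

```python
from collections import Counter

# Hypothetical SEX column; assumed coding: 1 = male, 2 = female.
sex = [1, 2, 2, 1, 9, None, 1, 2]
valid_codes = {1, 2}

# Frequency table: how often does each entered value occur?
frequencies = Counter(sex)

# Wild codes: entered values that the codebook does not define.
wild_codes = sorted(
    value for value in frequencies
    if value is not None and value not in valid_codes
)
n_missing = frequencies[None]

print(dict(frequencies))
print("wild codes:", wild_codes, "missing:", n_missing)
```

Whether a flagged value is a genuine extreme case or a typing error still has to be checked against the original questionnaire.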

2. Descriptive Statistics
„quantitative description in a manageable form“ describe basic features of the data, provide simple summaries simple graphics analysis Univariate Analysis - Analysis of one variable at a time Description of a single variable: distribution central tendency (Lagemaß) dispersion (Streuungsmaß) Bivariate Analysis – Analysis of two variables at a time Multivariate Analysis – Analysis of multiple variables at a time

2. Descriptive Statistics - Distribution
Frequency distribution:
- as a table: absolute and relative frequencies (frequency table, crosstab)
- as a graph: absolute and relative frequencies (pie chart, bar chart, boxplot, histogram, stem-and-leaf diagram)

Frequency table: sex (Geschlecht)

| Sex | Absolute frequency | Relative frequency |
|---|---|---|
| male | 8 | 53% |
| female | 7 | 47% |

2. Descriptive Statistics - Distribution
[Figures: pie chart of sex; bar chart of position; histogram of monthly net income; boxplot of number of employees]

2. Descriptive Statistics - Central Tendency
Central tendencies / Lagemaße

| | Mean (Mittelwert) | Median | Mode (Modus) |
|---|---|---|---|
| Computation | "sum of values x_i / number n of values" | "center of the sample" | "most frequently occurring value" |
| Data | metric data | ordinal data, metric data | nominal data, ordinal data, metric data |
| Adequacy | if the distribution is approximately normal; not robust against single extreme values ("outliers") | robust against outliers | robust against outliers |
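The three measures and their different sensitivity to outliers can be tried out with Python's standard `statistics` module. The values below are made up for illustration and are not the MITARB column of AZ.sav.

```python
import statistics

# Hypothetical "number of employees" values (metric data).
employees = [2, 7, 7, 18, 25, 40, 250]

mean = statistics.mean(employees)      # sum of values / number of values
median = statistics.median(employees)  # center of the sorted sample
mode = statistics.mode(employees)      # most frequently occurring value

# Dropping the single extreme value (250) shows which measure is robust.
without_outlier = employees[:-1]
mean_without_outlier = statistics.mean(without_outlier)
median_without_outlier = statistics.median(without_outlier)

print(mean, median, mode)
print(mean_without_outlier, median_without_outlier)
```

For a nominal variable such as SEX only the mode is meaningful, and for an ordinal variable such as POSITION the median and mode are, which matches the "Data" row of the overview.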

2. Descriptive Statistics - Central Tendency / Practical
Compute the mean, median and mode of the variables SEX, MITARB (number of employees) and POSITION. Make sure that each measure is meaningful for the variable it is applied to. Hint: sort the variables MITARB and POSITION in ascending order.

2. Descriptive Statistics - Central Tendency / Practical_Solution

| Variable | Mean | Median | Mode |
|---|---|---|---|
| Number of employees (MITARB) | 48.5 | 18 | 7 |
| Position | - | 2 | 1 |
| Sex | | | |

2. Descriptive Statistics - Dispersion
Dispersions / Streuungsmaße

| | Range (Spannweite) | Variance s² | Standard deviation s |
|---|---|---|---|
| Computation | "highest value minus lowest value" | "average of the sum of the squared deviations": $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ | "square root of the variance": $s = \sqrt{s^2}$ |
| Data | ordinal data, metric data | metric data | metric data |

2. Descriptive Statistics - Dispersion
Dispersions / Streuungsmaße: interquartile range (IQR)
- Computation: "difference between third and first quartile", IQR = Q3 - Q1
  - 1st quartile (Q1): 25% of the cases fall below this value
  - median (Q2): 50% of the cases fall above and below this value
  - 3rd quartile (Q3): 75% of the cases fall below this value
- Data: metric data
- Adequacy: robust against outliers

[Figure: boxplot divided by min, Q1, Q2 = median, Q3 and max into four sections of 25% of the cases each; the IQR spans Q1 to Q3]
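Quartiles and the IQR can be computed with `statistics.quantiles` (Python 3.8 or later). This is a sketch on made-up values; quartile definitions differ between programs, so the default method used here is an assumption and may not reproduce the SPSS figures exactly.

```python
import statistics

# Hypothetical metric values with one extreme case.
data = [2, 7, 7, 18, 25, 40, 250]

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
value_range = max(data) - min(data)

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr, "range:", value_range)
```

The range is dominated by the single extreme value, while the IQR only reflects the middle half of the cases, which is why it counts as robust against outliers.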

2. Descriptive Statistics - Dispersion / Practical_Solution
Compute the variance, standard deviation and range of the variable NETTO (net income): n = 15, mean = 1553.3, min = 200, max = 2800.

Steps (variance):
1. compute the distance between each value and the mean
2. square each discrepancy
3. sum the squares to get the Sum of Squares (SS) value
4. divide the SS by n - 1

| Variable | Variance | Standard deviation | Range (Spannweite) | Min | Max |
|---|---|---|---|---|---|
| Net income (NETTO) | | | 2600 | 200 | 2800 |
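The four variance steps translate directly into code. The sketch below uses five made-up income values rather than the 15 NETTO values of AZ.sav, so its results are not the solution of the exercise.

```python
import math

# Hypothetical net incomes in euros.
incomes = [200, 1200, 1500, 1800, 2800]
n = len(incomes)
mean = sum(incomes) / n

# Step 1: distance between each value and the mean.
deviations = [x - mean for x in incomes]
# Step 2: square each discrepancy.
squared = [d ** 2 for d in deviations]
# Step 3: sum the squares to get the Sum of Squares (SS).
sum_of_squares = sum(squared)
# Step 4: divide the SS by n - 1.
variance = sum_of_squares / (n - 1)

std_dev = math.sqrt(variance)               # square root of the variance
value_range = max(incomes) - min(incomes)   # highest minus lowest value

print(variance, round(std_dev, 1), value_range)
```

Dividing by n - 1 rather than n gives the sample variance, which is also what `statistics.variance` in the standard library computes.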

Correlation
"A correlation is a single number that describes the degree of relationship between two variables"
- The correlation coefficient lies in the range -1 ≤ r ≤ 1.
- The higher the absolute value of r, the stronger the relationship between the variables.
- uncorrelated: r = 0
- positive correlation: r > 0, positive relationship: the higher the x-values, the higher the y-values on average
- negative correlation: r < 0, negative relationship: the higher the x-values, the lower the y-values on average, and vice versa
- exact linear correlation: r = 1 (positive), r = -1 (negative)

Correlation - Example
Is there a relationship between the variable net income (Nettoverdienst) and the variable job satisfaction (Arbeitszufriedenheit)? If yes:
- Which type of relationship?
- How strong is the relationship?
- Is the correlation significant?

Descriptive statistics for net income and job satisfaction:

| Variable | Mean | StDev | Variance | Sum | Min | Max | Range |
|---|---|---|---|---|---|---|---|
| Net income | | | | 23300 | 200 | 2800 | 2600 |
| Job satisfaction | 5.20 | 2.178 | 4.743 | 78 | 1 | 9 | 8 |

Example - Descriptive Statistics
[Figures: boxplot of job satisfaction (AZ); boxplot of monthly net income in €]

Example – 1. Which type of relationship?

Example – 2. How strong is the relationship?
- Product-moment correlation (Pearson): requires that the variables (x, y) are metric and normally distributed
- Calculating the correlation: [SPSS output: correlation AZ/NETTO]
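For readers without SPSS, the product-moment correlation can be computed by hand. The following is a minimal sketch on made-up values, not the AZ/NETTO data; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equally long lists of numbers.

    r = sum of cross-products of deviations
        / sqrt(sum of squares of x * sum of squares of y)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx * dx for dx in dev_x)
    ss_y = sum(dy * dy for dy in dev_y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations.
income = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 5, 4, 5]
r = pearson_r(income, satisfaction)
print(round(r, 3))
```

A positive result means that higher x-values go along with higher y-values on average; the function fails with a division by zero if one of the variables is constant.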

Example - Q-Q Plot
[Figures: Q-Q plot of AZ (job satisfaction); Q-Q plot of monthly net income in €]

Example – 3. Is the correlation significant?
Testing the significance of a correlation
- Null hypothesis: r = 0
- Alternative hypothesis: r ≠ 0
- Steps:
  1. determine the significance level (α-level): α = 0.05
  2. compute the degrees of freedom: df = N - 2 = 13
  3. decide between a one-tailed and a two-tailed test: two-tailed test
  4. look up the critical value

Example – 3. Is the correlation significant?
- [Table excerpt: t distributions for product-moment correlations]
- [SPSS output: correlation AZ/NETTO]
- The correlation is significant: r (0.692) > r_crit (0.514)
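The comparison of r with the critical value can be retraced with the t distribution. This is a sketch under one stated assumption: the two-tailed critical t value for α = 0.05 and df = 13 is taken as 2.160 from a standard t table, since the table on the slide is not reproduced in the transcript.

```python
import math

r = 0.692    # correlation AZ/NETTO reported on the slide
n = 15       # sample size
df = n - 2   # degrees of freedom: 13

# Assumption: two-tailed critical t value for alpha = 0.05 and df = 13,
# as printed in standard t tables.
t_crit = 2.160

# Test statistic for H0: r = 0.
t_stat = r * math.sqrt(df / (1 - r ** 2))

# The same threshold expressed on the scale of r.
r_crit = t_crit / math.sqrt(t_crit ** 2 + df)

significant = abs(t_stat) > t_crit
print(round(t_stat, 2), round(r_crit, 3), significant)
```

Comparing t with its critical value and comparing r with r_crit are two views of the same decision, so both lead to the conclusion stated on the slide.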

Correlation Matrix
- symmetric matrix
- relationships between all possible pairs of variables
- N variables give N*(N-1)/2 unique correlations, e.g. C1,…,C10 give 10*9/2 = 45 unique correlations
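The count of unique correlations can be checked by listing the pairs. The variable names C1 to C10 follow the slide; everything else in this short sketch is illustrative.

```python
from itertools import combinations

variables = [f"C{i}" for i in range(1, 11)]  # C1, ..., C10

# Every unordered pair of different variables is one unique correlation.
pairs = list(combinations(variables, 2))

n = len(variables)
formula_count = n * (n - 1) // 2

print(len(pairs), formula_count)
print(pairs[:3])
```

Because the matrix is symmetric and its diagonal holds each variable's correlation with itself, only the entries on one side of the diagonal carry distinct information.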

Other correlations
- Pearson product-moment correlation (bivariate normal distribution, variables on an interval scale)
- Spearman rank-order correlation (rho) (two ordinal variables)
- Kendall rank-order correlation (tau)
- Point-biserial correlation (one variable is on a continuous interval level and the other is dichotomous)
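Of these alternatives, Spearman's rho is the easiest to sketch: it is the product-moment correlation of the ranks. The example below is an illustration on made-up values, assuming tied values receive the average of their ranks; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson_r(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ordinal position codes and net incomes.
position = [1, 1, 2, 2, 3]
income = [900, 1100, 1500, 1400, 2800]
rho = spearman_rho(position, income)
print(round(rho, 3))
```

Because only the ordering of the values enters the calculation, rho suits ordinal variables such as POSITION and is less affected by single extreme values than Pearson's r.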

Literature
Basic literature:
- Trochim, W. & Donnelly, J.: The Research Methods Knowledge Base (3rd edition). Atomic Dog. Internet WWW page, URL: (version current as of October 20, 2006).
- Bortz, J., Döring, N. (2006). Forschungsmethoden und Evaluation. Heidelberg: Springer Verlag.
- Hatzinger, R. (2006). Angewandte Statistik mit SPSS. Wien: Facultas.
- Hatzinger, R., Nagel, H. (2009). PASW Statistics. Statistische Methoden und Fallbeispiele. München: Pearson Studium.
- Nagel, H. (2003). Empirische Sozialforschung.
