Quantitative Data Analysis Using SPSS
An Introduction for Health & Social Science
Pete Greasley
Open University Press
McGraw-Hill Education
McGraw-Hill House
Shoppenhangers Road
Maidenhead
Berkshire
England
SL6 2QL
email: enquiries@openup.co.uk
world wide web: www.openup.co.uk
and Two Penn Plaza, New York, NY 10121-2289, USA
First published 2008
Copyright © Pete Greasley 2008
All rights reserved.
Except for the quotation of short passages for the purpose of criticism and review, no part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the publisher or a licence from the Copyright Licensing Agency Limited. Details of such licences (for reprographic reproduction) may be obtained from the Copyright Licensing Agency Ltd of Saffron House, 6–10 Kirby Street, London EC1N 8TS.
A catalogue record of this book is available from the British Library
ISBN-10: 0 335 22305 2 (pb) 0 335 22306 0 (hb)
ISBN-13: 978 0335 22305 3 (pb) 978 0335 22306 0 (hb)
Library of Congress Cataloging-in-Publication Data
CIP data applied for
Typeset by RefineCatch Limited, Bungay, Suffolk
Printed in the UK by Bell and Bain Ltd, Glasgow

Contents
Introduction
1 A questionnaire and what to do with it: types of data and relevant analyses
1.1 The questionnaire
1.2 What types of analyses can we perform on this questionnaire?
1.2.1 Descriptive statistics
1.2.2 Relationships and differences in the data
1.3 Summary
1.4 Exercises
1.5 Notes
2 Coding the data for SPSS, setting up an SPSS database and entering the data
2.1 The dataset
2.2 Coding the data for SPSS
2.3 Setting up an SPSS database
2.3.1 Defining the variables
2.3.2 Adding value labels
2.4 Entering the data
2.5 Exercises
2.6 Notes
3 Descriptive statistics: frequencies, measures of central tendency and illustrating the data using graphs
3.1 Frequencies
3.2 Measures of central tendency for interval variables
3.3 Using graphs to visually illustrate the data
3.3.1 Bar charts
3.3.2 Histograms
3.3.3 Editing a chart
3.3.4 Boxplots
3.3.5 Copying charts and tables into a Microsoft Word document
3.3.6 Navigating the Output Viewer
3.4 Summary
3.5 Ending the SPSS session
3.6 Exercises
3.7 Notes
4 Cross-tabulation and the chi-square statistic
4.1 Introduction
4.2 Cross-tabulating data in the questionnaire
4.3 The chi-square statistical test
4.4 Levels of statistical significance
4.5 Re-coding interval variables into categorical variables
4.6 Summary
4.7 Exercises
4.8 Notes
5 Correlation: examining relationships between interval data
5.1 Introduction
5.2 Examining correlations in the questionnaire
5.2.1 Producing a scatterplot in SPSS
5.2.2 The strength of a correlation
5.2.3 The coefficient of determination
5.3 Summary
5.4 Exercises
5.5 Notes
6 Examining differences between two sets of scores
6.1 Introduction
6.2 Comparing satisfaction ratings for the two counsellors
6.2.1 Independent or related samples?
6.2.2 Parametric or non-parametric test?
6.3 Comparing the number of sessions for each counsellor
6.4 Summary
6.5 Exercises
6.6 Notes
7 Reporting the results and presenting the data
7.1 Introduction
7.2 Structuring the report
7.3 How not to present data
Concluding remarks
Answers to the quiz and exercises
Glossary
References
Index

Introduction
I remember reading somewhere that for every mathematical formula included in a book, the sales would be reduced by half.
So guess what: there are no formulas, equations or mathematical calculations in this book. This is a practical introduction to quantitative data analysis using the most widely available statistical software – SPSS (Statistical Package for the Social Sciences). The aim is to get students and professionals past that first hurdle of dealing with quantitative data analysis and statistics. The book is based upon a simple scenario: a local doctor has conducted a brief patient satisfaction questionnaire about the counselling service offered at his health centre.
The doctor, having no knowledge of quantitative data analysis, sends the data from 30 questionnaires to you, the researcher, for analysis. The book begins by exploring the types of data that are produced from this questionnaire and the types of analysis that may be conducted on the data. The subsequent chapters explain how to enter the data into SPSS and conduct the various types of analyses in a very simple step-by-step format, just as a researcher might proceed in practice. The analysis and exercises in each chapter should take about an hour to complete.
So, in principle, the basic essentials of quantitative data analysis and SPSS may be mastered in just six hours of independent study. The chapters are listed below with a brief synopsis of their content.
• Chapter 1 A questionnaire and what to do with it: types of data and relevant analyses. The aim of this chapter is to familiarize yourself with the questionnaire and the types of analyses that may be conducted on the data.
• Chapter 2 Coding the data for SPSS, setting up an SPSS database and entering data. In this chapter you will learn how to code the data for SPSS, set up an SPSS database and enter the data from 30 questionnaires.
• Chapter 3 Descriptive statistics: frequencies, measures of central tendency and visually illustrating data using graphs. In this chapter you will use SPSS to produce some basic descriptive statistics from the data: frequencies for categorical data and measures of central tendency (the mean, median and mode) for interval level data. You will also learn how to produce and edit charts to illustrate the data analysis, and how to copy your work into a Microsoft Word file.
• Chapter 4 Cross-tabulation and the chi-square statistic. In this chapter you will learn about cross-tabulation for categorical data, a statistical test (chi-square) to examine associations between variables, and the concept of statistical significance. You will also learn how to re-code interval data into categories.
• Chapter 5 Correlation: examining relationships between interval data. In this chapter you will learn about scatterplots and correlation to examine the direction and strength of relationships between variables.
• Chapter 6 Examining differences between two sets of scores. In this chapter you will learn about tests which tell us if there is a statistically significant difference between two sets of scores. In so doing you will learn about independent and dependent variables, parametric and non-parametric data, and independent and related samples.
There is also a final concluding chapter (Chapter 7) which provides advice on how to structure the report of a quantitative study and how not to present data.
The approach
Quantitative data analysis and statistics are often a frightening hurdle for students in the health and social sciences, so my primary concern has been to make the book as simple and accessible as possible.
This quest for simplicity starts with the fact that the student has only one dataset to familiarize themselves with – and that dataset itself is very simple: a patient satisfaction questionnaire consisting of just five questions. The questionnaire is, however, designed to yield a range of statistical analyses and should hopefully illustrate the potentially complex levels of analyses that can arise from just a few questions.
This will also act as a warning to students who embark upon research projects involving complex designs without fully appreciating how they will actually analyse the data. My advice to students who are new to research is always to ‘keep it simple’ and, where possible, to design the study according to the statistics they understand. I have taken a pragmatic approach to quantitative data analysis which means that I have focused on the practicalities of doing the analyses rather than ruminating on the theoretical underpinnings of statistical principles.
And since actually doing the analyses requires knowledge of appropriate statistical software, I have chosen to illustrate this using the most widely available and comprehensive statistical package in universities: SPSS. Thus, by the end of this book you should not only be able to select the appropriate statistical test for the data, you should also be able to conduct the analysis and produce the results using SPSS.
The scope of the book
I have set a distinct limit to the level of analysis which I think is appropriate for an introductory text.
This limit is the analysis of two variables – known as bivariate analysis. In my experience of teaching health and social science students, most of whom are new to quantitative data analysis and statistics, this is sufficient for an introduction. Also, I did not want to scare people off with a more imposing tome covering things like logistic regression and factorial ANOVA. There are many other books which include these more advanced statistics, some of which are listed in the references.
This book is designed to get people started with quantitative data analysis using SPSS; as such it may provide a platform for readers to consult these texts with more confidence.
The audience: health and social sciences
As an introduction to quantitative data analysis, this book should be relevant to undergraduates, postgraduates or diploma level students undertaking a first course in quantitative research methods. I have used these materials to teach students from a variety of backgrounds including health, social sciences and management.
It may be particularly relevant for students and professionals in health and social care, partly due to the subject matter (a patient satisfaction questionnaire about counselling) and the examples used throughout the text, but also due to the design of the materials. Many students and professionals in health and social care are studying part-time or by distance learning, or perhaps undertaking short courses in research methods. This means that their opportunities for attendance are often limited and courses need to be designed to cater for this mode of study, for example, attendance for one or two days at a time.
It is with these students in mind that the materials have been designed to be suitable for independent study. After the introductory chapter outlining the types of data and analyses, the book continues with step-by-step instructions for conducting the analysis using SPSS. Furthermore, the practical approach should suit professionals who may wish to develop their own proposals and conduct their own research but have limited time to delve into the theoretical details of statistical principles.
In health studies the emphasis on evidence-based practice has reinforced the need for professionals not only to understand and critically appraise the research evidence but also to conduct research in their own areas of practice. This book should provide professionals with a basic knowledge of the principles of quantitative research along with the means to actually design and conduct the analysis of data using SPSS.
For lecturers
This book is an organized course divided into six chapters/sessions which may be delivered as a combination of lectures and practical sessions on SPSS.
I have delivered this course in three ways:
1 First, as a series of five weekly lectures and practical sessions (two–three hours) for the first half of a postgraduate module on quantitative and qualitative data analysis. The first session primarily consists of a lecture introducing the questionnaire, the dataset and relevant analyses (Chapter 1) before moving on to enter the data (Chapter 2). Thereafter, each of the remaining four sessions consists of an introductory lecture discussing the analysis in subsequent chapters (descriptive statistics and graphs, cross-tabulation and chi-square, correlation, examining differences in two sets of scores) before moving on to SPSS to conduct the analyses and exercises in each chapter. In the final session I also include discussion of writing up the results and reporting more generally (Chapter 7). For a full module of 10–11 sessions this book could either be supplemented by additional materials covering more advanced analyses (e.g., ANOVA and regression analysis) or students could design (and conduct) their own study (in groups) based upon the analyses covered in the book.
2 A one/two day course for a postgraduate module on Research Methods. This starts with a formal lecture introducing the questionnaire, the dataset and relevant analyses (Chapter 1), and then students (in pairs) work through the materials at their own pace, continuing with independent study. The practical sessions may be interspersed with brief lectures reviewing the types of analyses.
3 A half-day workshop on SPSS. Again, this begins with a brief introduction to the questionnaire, the dataset and types of analyses, with guided instruction on specific exercises from each of the chapters.
It has to be said, though, that half a day is not really sufficient time to cover the materials (in my view, and according to the student evaluations!). This is especially the case for students with little prior knowledge of statistics. Where an assignment has been set for the course, students have been asked to produce a report for the doctor who requested the analysis. This requires students to write a structured report in which the ‘most relevant analyses’ are presented along with some discussion of the results, critical reflections on the survey and recommendations for further research.
Getting a copy of SPSS
SPSS, as noted above, is the most widely used software for the statistical analysis of quantitative data. It is available at most universities, where staff and students can usually purchase their own copy on CD for £10–£20. The licence, which expires at the end of each year, can be renewed by contacting the supplier at the university, who will provide the necessary ‘authorization’ code.
Acknowledgements
Thanks to all the students who have endured evolving versions of this text, to the publishing people for coping with the numerous figures and screenshots, and to Wendy Calvert (proof-reader extraordinaire).
1 A questionnaire and what to do with it: types of data and relevant analyses
The aim of this chapter is to familiarize yourself with the questionnaire and the types of analyses that may be conducted on the data before we go on to SPSS. By the end of the chapter you should be familiar with: types of data and levels of measurement; frequencies and cross-tabulation; measures of central tendency; normal and skewed distributions of data; correlation and scatterplots; independent and dependent variables.
1.1 The questionnaire
A local general practice (family practice or health centre for those outside the UK) has been offering a counselling service to patients for over a year now. The doctor at the practice refers patients to a counsellor if they are suffering from mild to moderate mental health issues, like anxiety or depression. 1 The doctor decided that he wanted to evaluate the service by gathering some information about the patients referred for counselling and their satisfaction with the service. So, he designed a brief questionnaire and sent it to every patient who attended for counselling over the year.
The doctor had referred 30 patients to the service and was delighted to find that all 30 returned the questionnaire. But then he realized he had a bit of a problem – he did not know how to analyse the data! That is when he thought of you. So, with a polite accompanying letter appealing for help, he sends you the 30 completed questionnaires for analysis. A copy of the questionnaire is provided in Figure 1.1. The first thing you notice is that he has collected some basic demographic data about the gender and age of the patients.
Then you see that he has asked whether they saw the male or female counsellor – that might be interesting in terms of satisfaction ratings: perhaps one received higher ratings than the other? He has also collected information about the number of counselling sessions conducted for each patient because, he tells you in the letter, the counsellors are supposed to offer ‘brief therapy’ averaging six sessions. Are they both abiding by this? Finally, you see that patients were asked to rate their satisfaction with the service on a seven point scale. Will the ratings depend on the sex or age of the patient?
Perhaps they would be related to the number of counselling sessions or, as noted above, which particular counsellor the patient saw. Well, you think, that is not too bad – at least it is simple. But what sorts of analysis can you do on this questionnaire? (See Box 1.1 for a brief discussion of some questionnaire design issues.)
Figure 1.1 Counselling service: patient satisfaction questionnaire
Box 1.1 Questionnaires: some design issues
While this is not the place for a full discussion of questionnaire design issues, there are some cardinal rules that should be briefly noted. First, make sure the questions are clear, brief and unambiguous. In particular, avoid ‘double questions’, for example: ‘Was the room in which the counselling took place quiet and comfortable?’ ‘Well, it was comfortable but there was a lot of noise from the next room . . .’ Second, make sure that the questionnaire is easy to complete by using ‘closed questions’ with check boxes providing the relevant options that respondents can simply tick. So, you should avoid questions like: ‘Q79: Please list all the times you felt anxious, where you were, who you were with, and what you’d had to eat the night before’.
The more you think through the options beforehand, the less work there will be later when it comes to analysing the data. As Robson (2002: 245) points out: ‘The desire to use open-ended questions appears to be almost universal in novice survey researchers, but is usually extinguished with experience . . .’ Piloting the questionnaire, which is important to check how respondents may interpret the questions, can also provide suggestions for closed alternatives. There are some occasions, however, when ‘open questions’ are necessary to provide useful information.
For example, the question about level of satisfaction with the service may have benefited from a comments box to allow patients to expand on issues relating to satisfaction. An alternative strategy may have been to use more scales to measure different dimensions of satisfaction (for example, relating to the counsellor, the room in which counselling was conducted, the referral procedure, etc.). Another issue is the design of the scale used to measure satisfaction. The doctor might have used a more typical Likert-type format where the respondent indicates the extent to which they agree or disagree with a statement:
I was satisfied with the service:
Strongly disagree   Disagree   Undecided   Agree   Strongly agree
Notice that there are only five options here (and they are labelled). The format you use will depend on the context and the level of sensitivity you require, which may result in a seven- or nine-point scale. Also notice that whatever the length of the scale, there is an option for a ‘neutral’ or ‘undecided’ response. In the counselling questionnaire you may also notice that the question asking for the age of patients could have provided a list of age groupings instead, for example, 20–29, 30–39.
Although categories can make the questionnaire easier to complete, and more anonymous (some people may not like to specify their age because it may help to identify them), my advice would be to gather the precise ages where possible because you can convert them into any categories you want later; the same principle applies to the number of counselling sessions. A full discussion of questionnaire design issues would require a chapter unto itself. For further reading, Robson (2002) provides a relatively succinct chapter with guidance on design and other issues.
1.2 What types of analyses can we perform on this questionnaire?
1.2.1 Descriptive statistics
Descriptive statistics provide summary information about data, for example, the number of patients who are male or female, or the average age of patients. There are three distinct types of data that are important for statistical analysis:
Types of data (or levels of measurement)
1 Interval or ratio: This is data which takes the form of a scale in which the numbers go from low to high in equal intervals. Height and weight are obvious examples. In our data this applies to age, number of counselling sessions and patient satisfaction ratings.
2 Ordinal: This is data that can be put into an ordered sequence. For example, the rank order of runners in a race – 1st, 2nd, 3rd, etc. Notice that this gives no information on how much quicker 1st was than 2nd, or 2nd was than 3rd. So, in a race, the winner may have completed the course in 20 seconds, the runner-up in 21 seconds, but third place may have taken 30 seconds. Whereas there is only one second between 1st and 2nd, there are nine seconds between 2nd and 3rd. Do we have any of this type of data in our sample? No, we do not (though see Box 1.2 for further discussion).
3 Categorical or nominal: This is data that represents different categories, rather than a scale. In our data this applies to sex (male or female) and counsellor (John or Jane).
So, if we were assigning numbers to these categories, as we will be doing, they would not have any order as they would in a scale: if we were to code male as 1 and female as 2, this does not imply any order to the numbers – it is just an arbitrary assignment of numbers to categories. Making a distinction between these levels of measurement is important because the type of analysis we can perform on the data from the questionnaire depends on the type of data, as illustrated in Table 1.1.
Table 1.1 Type of data and appropriate descriptive statistics
Type of data          Descriptive statistics
Categorical data      Frequencies, cross-tabulation
Interval/ratio data   Measures of central tendency: mean, median, mode
We will now examine each of these in turn.
Box 1.2 Types of data & levels of measurement
While this brief review is really all we need to know for our questionnaire data, there is in fact a lot more to say about types of data and levels of measurement.
For example, although I have grouped interval and ratio data together, as many textbooks do (e.g., Bryman and Cramer 2001: 57), there is much debate about the differences between true interval data and that provided in rating scales. In our questionnaire, age and counselling sessions are ratio data because there is a true zero point and we know that someone who is 40 years old is twice as old as someone who is 20 years old; similarly, we know that 12 counselling sessions is four times as many as three; we know the ratio of scores.
The problem with interval data is that, while the intervals may be equal, we cannot be sure that the ratio of scores is equal. For example, if we were measuring anxiety on a scale of 0–100, should we maintain that a person who scored 80 had twice as much anxiety as a person who scored 40? (Howell 1997: 6.) This issue could be raised about our satisfaction ratings: can we really be sure that a patient who circles 6 is twice as satisfied as a person who circles 3, or three times as satisfied as a person who circles 2?
It is for this reason that some analysts would treat this as ordinal data – like the rank order of runners in a race – 1st, 2nd, 3rd, etc. described above. But clearly, our satisfaction rating scale is more than ordinal, and since the numerical intervals in the scale are presented as equal (assuming equal intervals between the numbers) we might say they ‘approximate’ interval data. For those who wish to delve further into this debate about whether rating scales should be treated as ordinal or interval data see Howell (1997) or the recent articles in Medical Education by Jamieson (2004) and Pell (2005).
Descriptive statistics for categorical data
Frequencies. Probably the first thing a researcher would do with the data from our questionnaire is to ‘run some frequencies’. This simply means that we would look at the numbers and percentages for our categorical questions, which we might hereafter refer to as ‘variables’ (because the data may vary according to the patient answering the question: male/female, old/young, satisfied or not satisfied, etc.).
• How many males/females were referred for counselling? Are they similar proportions? Were there more males or females?
• How many patients were seen by John and how many were seen by Jane? Did they both see a similar number of patients?
Cross-tabulation. The next step might be to cross-tabulate this data to gain more specific information about the relationship between these two variables. For example, imagine that we had collected this information for 200 patients and, from our frequencies analysis on each variable, we found the following results:
Table 1.2 Sex of patients
           Number   Percent
Male          100       50%
Female        100       50%
Total         200      100%
Table 1.3 Counsellor seen by patients
           Number   Percent
John          100       50%
Jane          100       50%
Total         200      100%
While these tables tell us that 50 per cent of patients were male, and that each counsellor saw 50 per cent of patients, they do not inform us about the relationship between the two variables: were the male and female patients equally distributed across the two counsellors or, at the other extreme, did all the female patients see Jane and all the male patients see John? In order to find this out we need to cross-tabulate the data. It might produce the following table:
Table 1.4 Cross-tabulation of gender and counsellor
           John   Jane   Total
Male         80     20     100
Female       20     80     100
Total       100    100     200
In this example we can see that there were 100 male and 100 female patients (row totals). We can also see that the counsellors saw an equal number of patients: 100 saw John and 100 saw Jane (column totals). However, this cross-tabulation table also shows us that patients were not equally distributed across the two counsellors: whereas 80 per cent of males saw John, 80 per cent of females saw Jane. If the patients were randomly distributed to each of the counsellors, you would expect a similar proportion of each sex to see each counsellor. So in this hypothetical example it would appear that there is some preference for male patients to see a male counsellor, and for females to see a female counsellor. This might be important information for the doctor. For example, if one of the counsellors was intending to leave and the doctor needed to employ another counsellor, this might suggest it is necessary to ensure a male and a female counsellor are available to cater for patient preferences.
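SPSS produces these tables through its menus, as later chapters show. Purely as an illustration of the counting involved, the sketch below uses Python (not SPSS) to rebuild the hypothetical 200-patient dataset from the cell counts in Table 1.4, then derives the frequencies of Tables 1.2 and 1.3 and the row percentages discussed above. The helper names `frequencies` and `crosstab` are hypothetical, chosen for this example only.

```python
from collections import Counter

# Hypothetical data behind Table 1.4: (sex, counsellor) cell counts for 200 patients
CELL_COUNTS = {
    ("Male", "John"): 80,
    ("Male", "Jane"): 20,
    ("Female", "John"): 20,
    ("Female", "Jane"): 80,
}

# Expand the cell counts into one (sex, counsellor) record per patient
patients = [pair for pair, n in CELL_COUNTS.items() for _ in range(n)]


def frequencies(values):
    """Return {category: (count, percent)} for a single categorical variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


def crosstab(pairs):
    """Count every (row category, column category) combination."""
    return Counter(pairs)


sex_freq = frequencies([sex for sex, _ in patients])
counsellor_freq = frequencies([counsellor for _, counsellor in patients])
table = crosstab(patients)

print("Sex:", sex_freq)
print("Counsellor:", counsellor_freq)

# Row percentages: within each sex, what share saw each counsellor?
for sex in ("Male", "Female"):
    row_total = sum(table[(sex, c)] for c in ("John", "Jane"))
    for counsellor in ("John", "Jane"):
        n = table[(sex, counsellor)]
        print(f"{sex:<6} saw {counsellor}: {n:>3} ({100 * n / row_total:.0f}%)")
```

Each single-variable frequency shows a perfect 50/50 split, while the row percentages reveal the 80/20 pattern; this is exactly why the text moves from frequencies to cross-tabulation.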
Descriptive statistics for interval data: Measures of central tendency
Having ‘run frequencies’ and cross-tabulated our categorical variables, we would next turn to the other variables that contain interval data: age, number of counselling sessions and satisfaction ratings. If we wanted to produce summary information about these items it would be more useful to provide measures of central tendency: means, medians or modes. The Mean. The arithmetic mean is the most common measure of central tendency. It is simply the sum of the scores divided by the number of scores.
So, to calculate the mean in the following example, we simply divide the sum of the ages by the number of patients: 355/11 = 32.3 (to one decimal place). Thus, the mean age of the patients is about 32 years.
Table 1.5 Calculating the mean age
Patient   1   2   3   4   5   6   7   8   9   10  11   Sum
Age       46  23  34  25  28  31  23  40  36  45  24   355
The Median. The median is another common measure of central tendency. It is the midpoint of an ordered distribution of scores. Thus, if we order the age of patients from lowest to highest it looks like this:
Table 1.6 Finding the median age
Patient   1   2   3   4   5   6   7   8   9   10  11
Age       23  23  24  25  28  31  34  36  40  45  46
The median is simply the middle number, in this case 31. If you have an even number of cases, with no single middle number, then you just take the midpoint between the two middle numbers:
Table 1.7 Finding the median age in an even number of cases
Patient   1   2   3   4   5   6   7   8   9   10
Age       23  23  24  25  28  31  34  40  45  46
Here the two central values are 28 and 31, so the median is their midpoint: (28+31)/2 = 29.5.
The Mode. The mode, which is generally of less use, is simply the most frequently occurring value. In our age example above that would be 23, since it occurs twice; all the other ages occur only once. As an example, the mode might be useful for a shoe manufacturer who wanted to know the most common shoe size in the population.
When should I refer to the mean or the median? In the data above, which provided the ages of 11 patients (Tables 1.5 and 1.6), we saw that the mean age was 32 years and the median age was just one year younger at 31 years. So the two values are very similar. However, in some data the mean and the median values might be quite different. Consider the following example, which shows the salaries of employees at a small company:
Table 1.8 Salary of employees at a small company
Employee     1      2      3      4      5       6       7       8       9       Total
Salary (£)   8,000  8,000  9,000  9,000  10,000  11,000  11,000  40,000  45,000  151,000
Here, whereas the median salary is £10,000, the mean actually works out at £151,000/9 = £16,778 (to the nearest pound). This is clearly not representative as a measure of central tendency, since the majority of employees (seven out of nine) earn well below the mean salary! From this example, we can see how a couple of extreme values can distort the mean value of a dataset. In such cases we should cite the median, which is more representative.
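SPSS calculates these measures for you (Chapter 3), but the arithmetic is easy to check. As a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics` module (an illustration only, not part of the SPSS workflow), the following reproduces the worked values from Tables 1.5 to 1.8:

```python
import statistics

# Ages of the 11 patients in Table 1.5
ages = [46, 23, 34, 25, 28, 31, 23, 40, 36, 45, 24]

mean_age = statistics.mean(ages)      # sum of the scores / number of scores
median_age = statistics.median(ages)  # middle value once the scores are sorted
mode_age = statistics.mode(ages)      # most frequently occurring value

print(f"mean = {mean_age:.1f}, median = {median_age}, mode = {mode_age}")
# mean = 32.3, median = 31, mode = 23

# The ten ages in Table 1.7: with an even count, median() averages the two middle values
ten_ages = [23, 23, 24, 25, 28, 31, 34, 40, 45, 46]
print(statistics.median(ten_ages))  # 29.5

# Salaries in Table 1.8: two high earners pull the mean well above the median
salaries = [8000, 8000, 9000, 9000, 10000, 11000, 11000, 40000, 45000]
print(round(statistics.mean(salaries)), statistics.median(salaries))
# 16778 10000
```

`statistics.mode` returns a single value; if two ages were tied as most frequent, `statistics.multimode` would list both.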
Darrell Huff (1991), in his classic short book How to Lie with Statistics, points out that this ambiguity in the common use of the term ‘average’ (a more-or-less typical centre value) is a common ploy in the deceptive use of statistics. For example, a magazine may choose to cite the larger mean (rather than median) income of their readers to make it look like they have a wealthier readership, thus encouraging more advertising revenue. In statistical terms the ‘average’ will invariably be used to refer to the arithmetic mean. The normal distribution. Statistically speaking, the mean should be used when the data is normally distributed.
For example, if we survey 30 people coming out of a supermarket we might expect that there would be a few very young and very old shoppers, but most people would be aged between, say, 30 and 60 years. This is illustrated in Figure 1.2. Here we can see that most people are aged between 30 and 60, and the mean value and median value are virtually identical: mean age = 40.5 years; median age = 40 years.
Figure 1.2 Supermarket shoppers: age normally distributed
However, when the data is not normally distributed, when it is skewed towards the lower or higher end, the mean and the median values are not equivalent. So, if we turn back to our employee salaries example (with a larger set of fictitious data), where a few employees get very high salaries, we might find a distribution like the one illustrated in Figure 1.3. This is known as a skewed distribution because the data is skewed to one end of the scale – in this case it is ‘skewed off’ towards the higher end of the salary scale – whereas the salary for most people, as illustrated by the median value, is at the lower end of the scale. It is important that we examine the spread of interval data to see whether the mean or the median is the most valid measure of central tendency. If we have data that is markedly skewed, then the mean value may not be a reliable measure. We shall see later that a ‘normal’ or skewed distribution can also dictate which statistical test we should use to analyse the data.
Section summary
We have thus far considered the various ways of describing the data from the doctor’s questionnaire. We have seen that categorical data may be described using frequencies and cross-tabulation, and that interval data may be described using measures of central tendency: the mean, median and mode.
We have also seen that the validity of citing the mean or the median depends on the distribution of the data. Where it is normally distributed the mean can be used, but when data is extremely skewed to one end of the scale, the median may be a more reliable measure of central tendency.

How might we apply this to our counselling data? Well, we might want to summarize our interval data to answer the following questions:

• What was the mean age of patients seen for counselling?
• What was the mean number of sessions?
• Were most patients satisfied with the service? What was the mean score?
Figure 1.3 Employee salaries: skewed distribution

Notice that I have referred to mean values in the above questions. The most appropriate measure of central tendency (the mean or the median) should of course be used, and this will depend on the spread of the data. It is not unusual to find that both are cited to demonstrate the reliability of the mean, or otherwise.

1.2.2 Relationships and differences in the data

There are two further types of analyses that we might conduct on interval data:

• Examine the relationship between variables.
• Examine differences between two sets of scores.

Examining relationships between variables with interval data: correlation

We have already looked at the relationship between items with categorical data using cross-tabulation. For variables with interval data, such as age, we can use another technique known as correlation.

A correlation illustrates the direction and strength of a relationship between two variables. For example, we might expect that height and shoe size are related: that taller people have larger feet. Figure 1.4 shows a scatterplot of height and shoe size for 30 people, where we can see that, as height increases, so does shoe size. This is known as a positive correlation: the more tightly the plot forms a line rising from left to right, the stronger the correlation.

Figure 1.4 Positive correlation between height and shoe size

A negative correlation is the opposite: high scores on one variable are linked with low scores on another. For example, we might find a negative correlation between IQ scores and the number of hours spent each week watching reality TV shows, as illustrated in Figure 1.5 (hypothetical data). From this scatterplot we can discern a line descending from left to right, in the opposite direction to the height and shoe size plot. Finally, we would probably not expect to find an association between IQ and shoe size, as illustrated in Figure 1.6, where there is no discernible correlation between the two variables (see note 2). In Chapter 5 we will look at how to produce these scatterplots in SPSS.

How might we apply this to our counselling data? Well, first of all we need to identify two interval variables that might be correlated.
We have three to choose from: age, number of counselling sessions and satisfaction ratings. So, as one example, we might want to see if patients’ satisfaction ratings are linked to the number of appointments they had. Perhaps the more appointments they had, the more satisfied they were? Or maybe this is wrong: perhaps more appointments are linked to more unresolved problems and thus less satisfaction?

Figure 1.5 Negative correlation between IQ and interest in reality TV shows

Figure 1.6 Scatterplot for IQ and shoe size
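One widely used way of putting a single number on the direction and strength of such a relationship is Pearson’s correlation coefficient, r, which runs from −1 (perfect negative correlation) through 0 (no linear relationship) to +1 (perfect positive correlation). The Python sketch below computes r from first principles and applies it to the sessions and satisfaction columns of the counselling data (listed in full in Table 2.1). It is an illustration only, not the SPSS procedure covered in Chapter 5, and the function name pearson_r is our own.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Number of sessions and satisfaction rating for patients 1-30 (Table 2.1)
sessions = [8, 8, 9, 6, 4, 5, 12, 7, 10, 11, 6, 9, 9, 5, 12,
            7, 5, 8, 8, 9, 6, 6, 3, 7, 10, 7, 8, 6, 7, 3]
satisfaction = [5, 4, 7, 2, 1, 3, 5, 3, 5, 6, 4, 5, 4, 1, 5,
                5, 4, 6, 6, 7, 4, 3, 5, 3, 2, 5, 4, 4, 4, 2]

r = pearson_r(sessions, satisfaction)
# r > 0: more sessions go with higher satisfaction; r < 0: the reverse
print(f"r = {r:.2f}")
```

A value of r close to 0 would suggest neither of the two hunches above is borne out, while its sign tells us which of them the data leans towards.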
Examining the differences in scores within variables

Finally, we should also be interested in examining any differences in scores within a particular variable. For example, we might wish to compare the mean satisfaction ratings achieved by John with those achieved by Jane. To do this it is useful to categorize variables into two kinds: independent variables and dependent variables. So, if we think that level of satisfaction depends on which counsellor the patient saw, we would have the following independent and dependent variables:

• Independent variable: counsellor (John or Jane).
• Dependent variable: satisfaction rating.

Thus we are examining whether patients’ satisfaction ratings are dependent on the counsellor they saw. For example, if each counsellor saw five patients, then we would calculate the mean score for John and the mean score for Jane and consider the difference, as illustrated in Table 1.9.

Table 1.9 Comparing mean satisfaction ratings for John and Jane (hypothetical data)

        John   Jane
        2      5
        3      4
        4      7
        2      6
        3      5
Sum     14     27
Mean    2.8    5.4

In Chapter 6 we will use a statistic that tells us whether or not any difference in the two mean scores is statistically significant, which basically means that it was unlikely to have occurred by chance.

1.3 Summary

That is the end of this first chapter, in which you have learned about:

• Different types of data and levels of measurement (categorical, ordinal and interval data).
• Frequencies and cross-tabulation.
• Measures of central tendency: mean, median and mode.
• Appropriate use of the mean or the median value depending on the distribution of the data: is it normally distributed or skewed to one end of the scale?
• Using scatterplots to see if interval data is correlated.
• Categorizing variables into independent variables and dependent variables to examine differences between two sets of scores.
Having familiarized ourselves with the dataset and the types of analysis we may conduct on it, we now need to enter the data into SPSS. That is, after you have completed the exercises . . .

1.4 Exercises

Exercise 1.1 Types of data
Would the following variables yield interval or nominal/categorical data?
(a) ethnic background;
(b) student assignment marks;
(c) level of education;
(d) patient satisfaction ratings on a 1–7 scale.

Exercise 1.2 Measures of central tendency
Which do you imagine would be the most representative measure of central tendency for the following data?
(a) number of days taken by students at a university to return overdue library books;
(b) IQ scores for a random sample of the population;
(c) number of patients cured of migraine in a year by an acupuncturist;
(d) number of counselling sessions attended by patients.

Exercise 1.3 Correlation
What sort of correlation would you expect to see from the following variables?
(a) fuel bills and temperature;
(b) ice-cream sales and temperature;
(c) number of counselling sessions and gender.
Exercise 1.4 Independent and dependent variables
Identify the independent and dependent variables in the following research questions:
(a) Does alcohol affect a person’s ability to calculate mathematical problems?
(b) Is acupuncture better than physiotherapy in treating back pain?

Exercise 1.5 What type of analysis?
What type of analysis would you perform to examine the following?
(a) relationship between gender and preference for a cat or dog as a pet;
(b) relationship between time spent on an assignment and percentage mark;
(c) relationship between gender and patient satisfaction ratings.

You will find answers to the exercises at the end of the book.

1.5 Notes

1 Mental health issues are the third most common reason for consulting a general practitioner (GP), after respiratory disorders and cardiovascular disorders. A quarter of routine GP consultations relate to people with a mental health problem, most commonly depression and anxiety. It has been estimated that over half the general practices in England (51%) provide counselling services for patients (for further details and references see Greasley and Small 2005a).
2 While we might not expect to find a correlation between IQ and shoe size in a random sample of 30 people, there may be some samples in which we would find a correlation, for example, one relating to age differences.

2 Coding the data for SPSS, setting up an SPSS database and entering the data

In this chapter you will learn how to code the data for SPSS, set up an SPSS database and enter the data.

2.1 The dataset

The data from the 30 questionnaires is provided in Table 2.1. Each row provides the data for one patient: their sex and age, the counsellor they saw, how many sessions they attended, and their satisfaction rating for the counselling.
Table 2.1 Data from the counselling satisfaction questionnaire

Patient  Sex     Age  Counsellor  Sessions  Satisfaction
1        male    21   John        8         5
2        male    22   John        8         4
3        male    25   John        9         7
4        female  36   Jane        6         2
5        male    41   Jane        4         1
6        female  28   Jane        5         3
7        male    26   John        12        5
8        female  38   Jane        7         3
9        male    35   John        10        5
10       male    24   John        11        6
11       female  41   Jane        6         4
12       female  34   Jane        9         5
13       male    32   John        9         4
14       female  38   Jane        5         1
15       male    33   John        12        5
16       female  42   Jane        7         5
17       male    31   John        5         4
18       male    33   John        8         6
19       female  40   John        8         6
20       female  47   Jane        9         7
21       female  27   Jane        6         4
22       male    44   Jane        6         3
23       female  43   John        3         5
24       female  33   Jane        7         3
25       female  45   John        10        2
26       male    36   Jane        7         5
27       female  49   John        8         4
28       female  39   Jane        6         4
29       male    35   Jane        7         4
30       female  38   Jane        3         2

2.2 Coding the data for SPSS

We now need to enter our data from Table 2.1 into SPSS. This will enable us to conduct all the analyses discussed in the previous chapter (frequencies, cross-tabulation, measures of central tendency (mean, median, mode), correlations, graphs, etc.) at the click of a button (well, a few buttons in some cases).
But before we can enter the data into SPSS we need to give our variables names and code the data, because all data in SPSS should be entered as numbers. The simplest way to illustrate this is through the codebook I have produced for the data (Figure 2.1). The first column provides the variable name that we will use for SPSS. In the second column I have written some coding instructions. Since all our data needs to be entered as numbers, data which is not collected as numbers, like male/female, needs to be converted into numbers.
For example, in our codebook, we have assigned the number 1 for male and 2 for female (see note 1).

Figure 2.1 An SPSS codebook for the data in Table 2.1

In practice, for a questionnaire with only five questions and only two coded variables, producing a codebook is a little excessive. But for larger questionnaires it can be a very useful reference point. Alternatively, you can simply write the codes for categorical variables on a copy of the questionnaire so that you have a record of them.
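The coding step itself is a simple lookup. As a rough illustration outside SPSS, the Python sketch below applies the codes from our codebook (male = 1, female = 2; John = 1, Jane = 2) to the first four patients in Table 2.1. The dictionary layout and the helper name code_row are assumptions made for this example; they are not part of SPSS.

```python
# Numeric codes for the two categorical variables, as recorded in the codebook (Figure 2.1)
CODEBOOK = {
    "sex": {"male": 1, "female": 2},
    "counsellor": {"John": 1, "Jane": 2},
}

def code_row(row):
    """Return a copy of one patient's record with categorical answers replaced by numeric codes."""
    coded = dict(row)  # copy, so the original record is left untouched
    for variable, codes in CODEBOOK.items():
        # A KeyError here flags an answer that has no code in the codebook
        coded[variable] = codes[row[variable]]
    return coded

# Patients 1-4 from Table 2.1, as written on the questionnaires
raw_rows = [
    {"patient": 1, "sex": "male", "age": 21, "counsellor": "John", "sessions": 8, "satisfaction": 5},
    {"patient": 2, "sex": "male", "age": 22, "counsellor": "John", "sessions": 8, "satisfaction": 4},
    {"patient": 3, "sex": "male", "age": 25, "counsellor": "John", "sessions": 9, "satisfaction": 7},
    {"patient": 4, "sex": "female", "age": 36, "counsellor": "Jane", "sessions": 6, "satisfaction": 2},
]

coded_rows = [code_row(row) for row in raw_rows]
for row in coded_rows:
    print(row)
```

Keeping the codes in one place, as a codebook does, means every questionnaire is coded the same way and an unexpected answer is caught rather than silently mis-coded.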
Now that we have decided upon our variable names and codes for categorical data, as illustrated in Figure 2.1, we can set up the SPSS database.

2.3 Setting up an SPSS database

When you open SPSS you should be faced with the following screen:

Screenshot 2.1

Click Type in data and then click OK. This opens a blank spreadsheet.

The SPSS data screen

The screen below is known as the Data View. This is where you will enter the data. But not yet: there’s a little more reading to do. Each row will contain the data for one patient. So, in the example on the next page, we have data for 3 patients:

• Patient 1 is a male (coded 1), aged 24.
• Patient 2 is a female (coded 2), aged 25.
• Patient 3 is male, aged 26.

Screenshot 2.2

2.3.1 Defining the variables

Before we can enter our data, we need to enter variable names and coding instructions. On the bottom left of the Data Editor screen you will see two tabs labelled Data View and Variable View.

Screenshot 2.3

Click on Variable View. This will produce the following screen, where you will type in information about the variables:

Screenshot 2.4

In this view each row will provide information for one variable.
This is where we need to refer to our SPSS codebook (see Figure 2.1).

Name
Enter the first SPSS variable name listed in your codebook, i.e., patient. Then press the right arrow on your keyboard to go to the next column, Type.

Box 2.1 SPSS rules for naming variables
SPSS has a number of rules for naming variables:
• The length of the name cannot exceed 64 characters, though clearly you should keep the variable name as short and succinct as possible, as we have done for the counselling questionnaire.
• The name must begin with a letter. The remaining characters can be any letter, any digit, a full stop or the symbols @, #, _ or $.
• Variable names cannot contain spaces or end with a full stop.
• Each variable name must be unique: duplication is not allowed.
• Reserved keywords cannot be used as variable names. Reserved keywords are: ALL, AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH.
• Variable names can be defined with any mixture of upper and lower case characters.

Type: what type of data is it?
Once you have entered a variable name, the default value for Type will appear automatically as Numeric.
All of our variables will be numeric because we will be coding any words, such as male and female, as numbers (i.e., 1 for male and 2 for female). So you can move on to the next column, Width (see Box 2.2 for occasions when you really have to enter words).

Box 2.2 Entering words into SPSS
There are certain occasions when you might want to enter words into SPSS: for example, if the data forms part of a database where you need to retain the names of individuals, or if you are copying data into SPSS from, say, an Excel spreadsheet that includes words.
If you do not tell SPSS to expect words it may not recognize them (your column of words from Excel may not appear). As an example, if we wanted to enter the words ‘male’ and ‘female’ (instead of numbers), we would need to click in the Type cell, where a shaded square with three dots will appear. Click on this and a list of options will appear:

Screenshot 2.5

In SPSS, variables containing words are known as string variables. So we would click on String and then OK, and this cell would say String rather than Numeric. Notice also that SPSS provides options to enter dates or currency data.
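The naming rules in Box 2.1 are mechanical enough to check in code. The Python sketch below is a simplified checker written for this chapter, not part of SPSS: it encodes only the rules listed in the box, treats ‘letter’ as the ASCII letters A to Z, and assumes that names differing only in case (such as Sex and sex) count as duplicates. SPSS itself may accept or reject some names differently, and the function name is_valid_spss_name is hypothetical.

```python
import re

# Reserved keywords listed in Box 2.1 (compared case-insensitively)
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# A letter first, then letters, digits, full stops or @ # _ $ (so no spaces)
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")

def is_valid_spss_name(name, existing=()):
    """Check a proposed variable name against the rules in Box 2.1 (simplified)."""
    if len(name) > 64:
        return False
    if not NAME_PATTERN.fullmatch(name):
        return False
    if name.endswith("."):
        return False
    if name.upper() in RESERVED:
        return False
    # Assumption: names that differ only in upper/lower case count as duplicates
    if name.lower() in {n.lower() for n in existing}:
        return False
    return True

for candidate in ["patient", "satisfaction", "1st_visit", "my var", "sex.", "AND"]:
    print(candidate, is_valid_spss_name(candidate))
```

Checking a list of planned names like this before typing them into the Variable View can be handy for long questionnaires, though SPSS will of course tell you if a name is not allowed.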
Width: how many numbers will you be entering?
SPSS defaults to eight characters. The most we will need is two, for our age and session data (e.g., 24 years, 12 sessions), so there is no need to change this. You would increase it if you were entering very large numbers, for example, 184,333,333.24 (i.e., 11 digits plus a decimal point).

Decimals
SPSS defaults to two decimal places. Since our data does not require decimal places, we can simply click in the Decimals cell and click the up or down arrows (which appear to the right of the cell) to adjust the number of decimal places needed for that particular variable.
Screenshot 2.6

For datasets with many variables that do not have decimal places it may be worthwhile changing the default setting to 0 decimal places. You can do this by clicking on Edit from the menu at the top of the screen and then choosing Options. Next, click the Data tab and change the decimal place value to 0 in Display format for new numeric variables.

Label
The Label column allows you to provide a longer description of your variable, which will be shown in the output produced by SPSS (see note 2). You do not need to put anything here for ‘patient’, or the other variables, since the names are self-explanatory.

Values
Values are numbers assigned to categories for nominal variables, for example, where male = 1 and female = 2. Since your first variable (patient) has no ‘values’, you do not need to put anything here.

Missing
Sometimes it is useful to assign specific values to indicate different reasons for missing data (see note 3). However, SPSS recognizes any blank cell as missing data and excludes it from any calculations, so if you intend to leave the cell blank there is no need to enter values for missing data. And we have no missing data anyway.

Columns
You can change the column width to reduce the space it takes on the screen. But you need to allow enough space for variable names, so the default of eight is usually OK.

Align
This is usually set at Right, which is OK.

Measure
The default measure is Scale, but you can change this to Ordinal or Nominal by clicking in the cell and then on the down arrow on the right of the cell. We have scale data and nominal data:

Table 2.2 Measure definitions

Measure   Definition
Scale     For numeric values on an interval or ratio scale: age, sessions, satisfaction.
Nominal   For values that represent categories with no intrinsic order: patient, sex, counsellor.
Ordinal   For values with some intrinsic order (e.g., low, medium, high; first, second, third).

Since our first variable simply lists the patient/questionnaire number, we should change this to ‘nominal’, because the numbers are simply assigned to the patients (they have no meaning as a scale). You should now have completed the information for the first variable, ‘patient’, and it should look like this:

Screenshot 2.7

2.3.2 Adding value labels

The next variable is ‘sex’.
Proceed as you did for ‘patient’ until you get to Values. Whereas ‘patient’ had no values (it was simply patient numbers), sex has two values: we will be entering the number 1 for male and the number 2 for female. So you need to tell SPSS that this is what you are doing. This is how you add value labels in SPSS:

1 Click in the Values cell and then on the button with three dots on the right side of the cell. This opens the Value Label box.
2 Click in the box marked Value. Type in 1.
3 Click in the box marked Value Label. Type in male.
4 Click on Add. You will then see in the summary box: 1 = male.
5 Repeat this procedure for female (Value: 2; Value Label: female; Add).

Screenshot 2.8

6 Then click OK. This information is now stored in your SPSS database. The only other thing to change is Measure (to Nominal), and your database should look like this:

Screenshot 2.9

7 You should now be able to complete the information for the remaining variables, ensuring that you enter the value labels for ‘counsellor’ (1 = John; 2 = Jane). When you have done this your database should look like this:

Screenshot 2.10
You have now defined your variables and can proceed to enter the data.

2.4 Entering the data

Click the Data View tab (bottom left of screen) and this will switch you from Variable View to Data View. Notice that the columns are now headed with your variable names.

Screenshot 2.11

You can now enter the data for the 30 patients. Simply click in the top left-hand cell (patient 1) and enter 1. Then move to the next cell (press the right arrow on your keyboard) and enter the data for each patient according to the data in Table 2.1. This should result in the following data:

Screenshot 2.12

Once you have entered the data make sure you save the file (see note 4). We are now ready to begin the analyses. That is, after you have done the following exercises.

2.5 Exercises

Exercise 2.1 Viewing value labels
In order to check that you have entered the data correctly you might wish to display your value labels so that the SPSS dataset looks exactly like the data in Table 2.1. You can view the value labels by clicking View, then Value Labels:

Screenshot 2.13

Your value labels for sex and counsellor are now displayed:
Screenshot 2.14

Exercise 2.2 Sorting the data
Sometimes it is useful to re-order the data, for example, if you wanted to visually examine all the male cases together (see note 5). To do this:

1 From the menu at the top of the screen, click Data, then click Sort Cases.

Screenshot 2.15

2 Click on the variable sex, then click the arrow (to the left of the Sort by box) to move it into the Sort by box:

Screenshot 2.16

3 You can now click OK to sort your dataset by the sex of the patient:

Screenshot 2.17

4 Your dataset is now re-ordered with all the male patients at the top (see Screenshot 2.18).
Note: If you are working in SPSS v15, an Output Viewer screen will appear logging the fact that you have conducted a procedure in SPSS. Close this by clicking the cross in the orange box at the top right of the Output Viewer screen. When it asks if you want to save this, click No.

Screenshot 2.18

5 To get the data back into its original order, go back to Data/Sort Cases. Double-click (left mouse button) on Patient to sort the data back into its original order. (Note that we would not have been able to do this if we had not numbered each patient/questionnaire.)

You can actually sort by as many variables as you want at the same time. For example, you could sort the data by sex of patients and age by putting both variables in the Sort by box, but really, I think we need to get on with the analysis.

2.6 Notes

1 This is an arbitrary assignment of the numbers 1 and 2, and is not meant in any way to reflect the order of importance of the male and female genders. I point this out because one student did raise this issue (in class) and it was debated for some time . . .
2 You might make use of the Label facility if you had a long questionnaire and you decided to label your variables q1, q2, q3, etc.
In the Label column you would provide a description of each particular question. For example, if we took this approach for the counselling questionnaire, the label for q4 would be ‘number of counselling sessions’, and this label would appear in the output rather than ‘q4’ to help us identify the particular question. This approach used to be more common in older versions of SPSS, when variable names were limited to eight characters, so truncated (barely identifiable) names often had to be entered.
3 As an example, we might want to differentiate missing data due to a patient refusing to answer a sensitive question from missing data due to a question not being applicable to a patient. In such cases we would need to click in the Missing cell, whereupon a button with three dots will appear to the right of the cell; clicking it opens a dialogue box where you can enter a value for the missing data. The value you enter should be outside the range of values that may occur as part of the data: 99 is a commonly used value to define missing data, though if your variable may legitimately include that value (potentially ‘age’ in our dataset) you should choose a more remote number (999).
4 And make sure you know where you have saved it! There have been a few occasions when students have saved the data file on the university network but were unable to locate it for the next session. Also, if you are working on a university or college computer, be aware that some computers are programmed to ‘log out’ after a specified period if they are not being used; if you leave the computer without saving the file, the data may not be there when you return . . .
5 Sorting the data can be useful for a number of reasons. I recently found it useful for a questionnaire where I needed to edit the dataset according to the month patients were referred to a service. Sorting the cases according to month of referral made this much easier.
3 Descriptive statistics: frequencies, measures of central tendency and illustrating the data using graphs

In this chapter you will use SPSS to produce some basic descriptive statistics from the data: frequencies for categorical data and measures of central tendency for interval level data.
You will also learn how to produce and edit charts to illustrate the data analysis, and how to copy your work into a Microsoft Word file.

3.1 Frequencies

We noted in Chapter 1 that the first thing a researcher would do with this data is to ‘run some frequencies’. This simply means that we want to look at the frequencies of our categorical data:

• How many male/female patients are there?
• How many patients were treated by each of the counsellors, John and Jane?

This will provide us with an initial overview of our sample, or ‘population’. So let us start with our first categorical variable, sex.

Running frequencies in SPSS

1 From the menu at the top of the screen click Analyze, then Descriptive Statistics, then Frequencies.

Screenshot 3.1

2 Choose the variable sex by clicking with your mouse.
3 Once sex is highlighted, move it across into the Variables box by clicking the arrow. Alternatively, you could just double-click it once it is highlighted.

Screenshot 3.2

4 Now click OK.

A new SPSS window should now appear. This is called the Output Viewer, as shown next.
The results of all analyses performed by SPSS will appear in this viewer, which can be saved separately as a file. Note: You may need to maximize this screen by clicking the maximize button (top right of the screen, the middle icon to the left of the x).

Screenshot 3.3

This has produced the frequencies analysis for the variable sex in the form of an SPSS ‘pivot table’. The first column has the labels male and female in it: if you had not entered value labels, instead of male there would be a 1, and instead of female there would be a 2. The second column tells us that there are 30 cases: 14 are male and 16 are female.
The third column provides percentages. The fourth column, Valid Percent, calculates percentages ignoring any missing values. Since there are no missing values here, valid per cent is the same as actual per cent. But in the hypothetical example on the next page, I have removed data for ten cases.

Screenshot 3.4

When we have missing data the values for per cent and valid per cent are different. This is because the per cent column calculates percentages for all the data, including missing data.
So, in the output above, we have ten males, ten females and ten missing data, each of which constitutes 33.3 per cent of the total data (30 cases). Valid per cent, however, ignores the missing data (ten cases) and has calculated percentages based on a total of 20 cases. Thus, 10/20 = 50%. If you have missing data this is the percentage figure you should cite.

Note that you now have two SPSS windows in operation:

• SPSS Data Editor (with data and variable views).
• Output Viewer.

You can switch between the two by clicking the tabs at the bottom of your screen.

Screenshot 3.5
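The Frequency, Percent and Valid Percent columns described above are straightforward to reproduce by hand. The Python sketch below is an illustration, not SPSS’s own code: it represents a missing (blank) value as None, calculates Percent over all cases and Valid Percent over non-missing cases only, and is checked against both the sex column of Table 2.1 and the hypothetical ten/ten/ten example. The function name frequency_table is hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Frequency, Percent and Valid Percent for each value; None marks a missing case."""
    n_total = len(values)                              # every case, missing or not
    n_valid = sum(1 for v in values if v is not None)  # cases with a recorded answer
    table = {}
    for value, freq in Counter(values).items():
        percent = round(100 * freq / n_total, 1)
        if value is None:
            valid_percent = None                       # missing cases are left out of Valid Percent
        else:
            valid_percent = round(100 * freq / n_valid, 1)
        table[value] = (freq, percent, valid_percent)
    return table

# Sex of patients 1-30 (Table 2.1)
sex = ["male", "male", "male", "female", "male", "female", "male", "female",
       "male", "male", "female", "female", "male", "female", "male", "female",
       "male", "male", "female", "female", "female", "male", "female", "female",
       "female", "male", "female", "female", "male", "female"]
print(frequency_table(sex))
# {'male': (14, 46.7, 46.7), 'female': (16, 53.3, 53.3)}

# Hypothetical example: ten males, ten females, ten missing
print(frequency_table(["male"] * 10 + ["female"] * 10 + [None] * 10))
# {'male': (10, 33.3, 50.0), 'female': (10, 33.3, 50.0), None: (10, 33.3, None)}
```

With no missing data the two percentage columns agree; once cases are missing, only the valid percentages add up to 100 across the recorded answers.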
Now try running frequencies for the other categorical variable, counsellor, to see how many patients each counsellor treated. Go to the menu at the top of the screen and click Analyze, then Descriptive Statistics, then Frequencies. You should then remove the variable sex from the frequencies analysis box by double-clicking it with the left mouse button. You can now run frequencies for counsellor, which should result in the following output:

Screenshot 3.6

You can in fact run Frequencies for more than one variable at a time by moving them all into the Variables box, as shown below:

Screenshot 3.7

3.2 Measures of central tendency for interval variables

Having examined frequencies for our categorical data, we now need to examine our variables containing interval data: age, number of sessions and satisfaction with the service. Our aim here is to obtain basic, descriptive information about these variables. For example, we would want to know the age range of patients attending for counselling and the mean (or median) age of our sample. This is basic information that the doctor would want to know about the ‘population’ attending for counselling. So, the information we are seeking is:
1 What is the mean/median:
   • age of patients;
   • number of sessions;
   • satisfaction rating.
2 What is the range of values, from lowest to highest, for:
   • age of patients;
   • number of sessions;
   • satisfaction ratings.

We can obtain this information by running Frequencies for age, sessions and satisfaction.

Running frequencies for measures of central tendency

1 From the menu at the top of the screen click Analyze, then Descriptive Statistics, then Frequencies (remove any existing variables from previous analyses by double-clicking them to return them to the list).
2 Double-click (left mouse button) the variables age, sessions and satisfaction to move them into the Variables box.
3 Click Statistics and click in the boxes next to mean, median, mode, and minimum and maximum. Then click Continue.

Screenshot 3.8

4 De-select Display Frequency Tables (otherwise you will get a table listing every age).

Screenshot 3.9

5 Click OK, and this should produce the output table below:

Table 3.1 SPSS descriptive statistics

This table provides us with information about the mean, median and mode, and the lowest and highest values (range of scores) for our three variables.
Focusing on the variable ‘age’, we can see that the mean age was 35.2 years and the median age was 35.5 years. Since the mean and the median are very similar, this tells us that our data is not markedly skewed towards one end of the scale (as we discussed in detail in Chapter 1). We can also see that ages ranged from 21 years to 49 years. The modal age (the most common actual age) is also provided (33 years), but note that SPSS tells us that ‘multiple modes exist’ and that this is the smallest value.
The most commonly occurring age is not, of course, very useful for our data analysis. The modal value may, however, be of more interest for the other two variables. So while we know that the mean number of sessions is just over 7 (7.37), it may also be interesting to know that the modal number of sessions reported by SPSS is 6 (the smallest of three modes: 6, 7 and 8 sessions each occur five times) if the doctor or counselling service is recommending that most patients should be limited to 6 sessions. Similarly, although we know that the mean satisfaction rating is a relatively neutral 4 (4.13), it may be useful to know that the modal rating is also 4.
The mean value of 4 could have come from data that was an average of very low and very high ratings, indicating that patients were divided in their perceptions of the service; a modal rating of 4 suggests that this is not the case (though note, of course, that SPSS tells us that multiple modes exist).

So, in our report to the doctor, from the analyses we have so far conducted, we can inform him about:

• The number of patients seen by each of the counsellors, and the number of male and female patients.
• The mean/median age, number of sessions and satisfaction rating, along with the range of values for each of these variables.
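As a cross-check on Table 3.1, the Python sketch below recomputes the same summary statistics (mean, median, mode, minimum and maximum) for the three interval variables in Table 2.1 using the standard-library statistics module. It uses statistics.multimode (Python 3.8 or later) so that, where several values tie for most common, all of them are listed rather than only the smallest one that SPSS reports. The function name describe is hypothetical.

```python
import statistics

# Interval variables for patients 1-30 (Table 2.1)
data = {
    "age": [21, 22, 25, 36, 41, 28, 26, 38, 35, 24, 41, 34, 32, 38, 33,
            42, 31, 33, 40, 47, 27, 44, 43, 33, 45, 36, 49, 39, 35, 38],
    "sessions": [8, 8, 9, 6, 4, 5, 12, 7, 10, 11, 6, 9, 9, 5, 12,
                 7, 5, 8, 8, 9, 6, 6, 3, 7, 10, 7, 8, 6, 7, 3],
    "satisfaction": [5, 4, 7, 2, 1, 3, 5, 3, 5, 6, 4, 5, 4, 1, 5,
                     5, 4, 6, 6, 7, 4, 3, 5, 3, 2, 5, 4, 4, 4, 2],
}

def describe(values):
    """Mean (to 2 decimal places), median, all modes, minimum and maximum of a list of numbers."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "modes": sorted(statistics.multimode(values)),  # every value tied for most frequent
        "minimum": min(values),
        "maximum": max(values),
    }

for name, values in data.items():
    print(name, describe(values))
```

For age this gives a mean of 35.2, a median of 35.5, modes of 33 and 38, and a range of 21 to 49, matching the figures discussed above; it also shows the tied modes behind SPSS’s ‘multiple modes exist’ footnote for sessions (6, 7 and 8) and satisfaction (4 and 5).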
This information would be an important first step in presenting the results of our analysis to the doctor.

3.3 Using graphs to visually illustrate the data

This information may also be graphically illustrated using bar charts, histograms and boxplots.

3.3.1 Bar charts

Bar charts present a graphical display of categorical data, for example, comparing the mean number of sessions provided by each counsellor.

Producing a bar chart comparing the mean number of sessions provided by each counsellor

1 From the menu at the top of the screen click Graphs, then Chart Builder (a dialogue box may appear asking if you have set the correct measurement levels for each variable and included value labels for categorical variables). Since you have done both (you have, haven’t you . . . ), put a tick next to Do not show this dialogue again and click OK.
2 You should then be faced with the follo