Cleaning your data in SPSS means preparing your raw dissertation dataset so it is accurate, consistent, and ready for analysis. It covers checking each variable for impossible values, fixing how categories are coded, deciding what to do with missing data, and screening for outliers before you run a single test. Done first, it protects every result that follows in your thesis.

Why cleaning comes before any analysis in your thesis

A statistical test cannot tell the difference between a real value and a typing error, so it will happily run on dirty data and hand you a confident but wrong answer. That is why data cleaning is the first stage of the work behind your results chapter, not an afterthought. When you clean your dissertation dataset carefully, you can defend every number to your supervisor, and you avoid the painful situation of discovering an error after the whole analysis is written up. The goal is a dataset where every cell means exactly what you think it means.

[Figure: normal Q-Q plot of sample quantiles against theoretical (normal) quantiles, with a reference line]
Screening continuous variables for shape and outliers is part of cleaning: points that drift far from the reference line flag cases worth inspecting before you analyse.
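If you want to see what sits behind a plot like this, the following is a minimal sketch in Python rather than SPSS. It assumes the common (i - 0.5) / n plotting position, which may differ from the one your software uses, and the scores are made up, with 48 planted as an extreme value.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical, sample) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n                       # plotting position (assumed convention)
        theoretical = NormalDist().inv_cdf(p)   # standard normal quantile
        sample = (x - m) / s                    # standardised sample value
        points.append((theoretical, sample))
    return points

# Made-up scores for illustration; 48 is a planted extreme value.
scores = [12, 14, 15, 15, 16, 17, 18, 19, 20, 48]
points = qq_points(scores)

# Vertical distance of each point from the reference line y = x.
gaps = [abs(sample - theoretical) for theoretical, sample in points]
worst_index = max(range(len(gaps)), key=lambda i: gaps[i])
worst_value = sorted(scores)[worst_index]
```

Here worst_value holds the score that sits furthest from the reference line, which is the case you would open and inspect first.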

A practical cleaning order for your SPSS dataset

Work through your dissertation dataset in a fixed order so nothing is missed. First, open Variable View and set each variable's type, measure (nominal, ordinal, or scale), and value labels so the software treats each column correctly. Second, run frequencies on every categorical variable to catch stray codes, such as a 7 appearing in a question that only allows 1 to 5. Third, run descriptives on every continuous variable and read the minimum and maximum: an age of 200, or a negative score on a scale that cannot go below zero, is an impossible value you can trace back and correct. Fourth, screen for outliers using boxplots and standardised (z) scores, deciding case by case whether each is a genuine extreme or a data-entry slip.
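The logic of steps two to four can be sketched outside SPSS as follows. This is an illustration in Python, not SPSS syntax. The variable names, the plausible age range of 16 to 100, and the z-score cutoff of 3 are all assumptions chosen for the example; pick limits that fit your own study.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical raw data: q1 is a 1-to-5 item, age is in years.
rows = [
    {"id": 1, "q1": 4, "age": 23},
    {"id": 2, "q1": 7, "age": 31},    # 7 is a stray code
    {"id": 3, "q1": 2, "age": 200},   # impossible age
    {"id": 4, "q1": 5, "age": 27},
    {"id": 5, "q1": 3, "age": 25},
]

# Step 2: a frequency table exposes codes outside the allowed set.
freq = Counter(r["q1"] for r in rows)
stray_ids = [r["id"] for r in rows if r["q1"] not in range(1, 6)]

# Step 3: minimum and maximum expose impossible values.
ages = [r["age"] for r in rows]
age_min, age_max = min(ages), max(ages)
AGE_LOW, AGE_HIGH = 16, 100   # assumed plausible range for this example
bad_age_ids = [r["id"] for r in rows if not AGE_LOW <= r["age"] <= AGE_HIGH]

# Step 4: standardised scores flag candidate outliers for inspection.
def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Made-up test scores; 120 is a planted extreme value.
scores = [48, 50, 52, 49, 51, 50, 47, 53, 50, 49, 51, 52, 48, 50, 120]
Z_CUTOFF = 3   # a common rule of thumb, not a fixed standard
flagged = [v for v, z in zip(scores, z_scores(scores)) if abs(z) > Z_CUTOFF]
```

The sketch only lists the suspect cases. Whether you correct, keep, or exclude each one is still a decision to make by going back to the original questionnaire or record.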

Throughout, keep a written log of every change you make. Examiners expect a clear audit trail, and a clean log also lets you rebuild the dataset if you ever need to start the analysis again. When the shape of a continuous variable looks unusual after cleaning, that feeds straight into testing for normality, which decides whether parametric tests are appropriate.

Recoding, computing, and clearing variables the right way

Cleaning is not only about removing errors; it is also about reshaping variables so they answer your research questions. Use Recode into Different Variables to collapse categories or reverse-score items, always keeping the original column intact so nothing is lost. Use Compute Variable to build totals and subscale scores from individual items. If a variable was entered by mistake or is no longer needed, clear it cleanly: select its column in Data View or its row in Variable View and delete it, rather than blanking individual cells, which leaves an empty column behind in the file. Document each recode so a reader can follow how your final variables were built from the raw responses.
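The arithmetic behind reverse-scoring and totals is simple enough to check by hand. Here is a minimal sketch in Python, not SPSS syntax, assuming a 1-to-5 response scale and hypothetical item names, with item2 standing in for a negatively worded question.

```python
def reverse_score(value, low=1, high=5):
    """Reverse a response on a low-to-high scale: 1 becomes 5, 2 becomes 4, 3 stays 3."""
    return low + high - value

# Hypothetical raw responses from one participant.
respondent = {"item1": 4, "item2": 2, "item3": 5}
REVERSED_ITEMS = {"item2"}   # assumed to be negatively worded

# Work on a copy and write to new names, so the originals stay intact.
cleaned = dict(respondent)
for name in REVERSED_ITEMS:
    cleaned[name + "_r"] = reverse_score(respondent[name])

# Build the total from the reversed version, not the raw item.
scale_items = ["item1", "item2_r", "item3"]
cleaned["total"] = sum(cleaned[item] for item in scale_items)
```

Writing the reversed score to a new name mirrors Recode into Different Variables: the raw response is still there if you ever need to check how the total was built.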

Handling the gaps and reading the output

Almost every dissertation dataset has gaps, and how you treat them changes your results. SPSS marks blank cells in numeric variables as system-missing, but you should also define user-missing values so codes like 99 for "prefer not to say" are never analysed as real numbers. The strategy you choose, whether listwise deletion, pairwise deletion, or an imputation approach, deserves a deliberate decision rather than a default, which is covered in handling missing data. Once the file is clean, your first real output is a set of descriptive statistics, and the difference between describing your sample and generalising from it is set out in descriptive versus inferential statistics. Reading those tables correctly is the next skill, explained in interpreting SPSS output.
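To see why an undeclared missing code matters, consider this small sketch in Python, not SPSS syntax. The variable names and values are made up, 99 is assumed to be the "prefer not to say" code, and None stands in for a blank cell. The "available cases" mean shows the idea behind pairwise deletion applied to a single variable.

```python
from statistics import mean

USER_MISSING = {99}   # assumed code for "prefer not to say"

# Hypothetical raw data; None stands in for a blank cell.
raw = [
    {"stress": 3, "sleep": 7},
    {"stress": 99, "sleep": 6},
    {"stress": 5, "sleep": None},
    {"stress": 4, "sleep": 8},
]

def to_missing(value):
    """Treat blanks and declared missing codes the same way: as no value."""
    return None if value is None or value in USER_MISSING else value

data = [{name: to_missing(v) for name, v in row.items()} for row in raw]

# If 99 is left undeclared, it is averaged in as if it were a real score.
naive_mean = mean(r["stress"] for r in raw)

# Available cases: use every case that has a value for this variable.
stress_available = [r["stress"] for r in data if r["stress"] is not None]

# Listwise deletion: keep only cases with no missing value on any variable.
complete_cases = [r for r in data if all(v is not None for v in r.values())]
stress_listwise = [r["stress"] for r in complete_cases]
```

In this toy example the undeclared 99 drags the stress mean from 4 up to 27.75, and the two deletion strategies give different answers (4 with available cases, 3.5 with listwise) because they keep different participants.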

Treat cleaning as the foundation everything else rests on. A tidy, well-documented file makes your analysis faster, your write-up clearer, and your defence far easier, because every value in the table can be traced back to a decision you made on purpose.