PSPP Syntax: Examples and Notes

A practical guide to PSPP syntax and psppire commands for data analysis.

PSPP is free, open-source statistical software and an alternative to SPSS. It supports many of the same statistical and data-management procedures, uses a similar command language, and works well for batch processing, scripting, and reproducible analysis as well as interactive use.

PSPP is free and open-source, while SPSS is commercial software that requires a paid license. Both PSPP and SPSS can read very large files. PSPP is useful for coursework, teaching, and many practical data-analysis tasks when the procedures it offers cover what you need.

For a definitive list of SPSS features that are not yet implemented in PSPP, see the official GNU PSPP “Not Implemented” page.

What does PSPP stand for? This is best answered by the "What does PSPP stand for?" entry in the PSPP frequently asked questions file. TL;DR: it doesn't have any official acronymic expansion.

 

About This Page

This page collects practical examples of PSPP syntax for PSPP and psppire. It focuses on commands that people actually use, and that I use when I program with SPSS and PSPP: GET DATA, DATA LIST, REGRESSION, CROSSTABS, and DESCRIPTIVES, plus specialized topics including using functions for data cleaning, reading special types of data, and computing p-values from ANOVA output.

My goal is to give clear, reproducible patterns you can adapt to your own work. The page also gives tips for improving your PSPP workflow.

Much of the material on this page applies to SPSS users as well, since PSPP follows the same command language for many core procedures. Most workflow habits are general and transfer between the two programs, though a few relate to PSPP-only features. In most cases, PSPP syntax runs in SPSS unless PSPP-specific commands are used, such as CLOSE FILE HANDLE. Where PSPP and SPSS differ, those differences are noted.

The example outputs on this page are mostly plain text because correctness matters more than cosmetics and that's what psppire and command line pspp produce unless you change the output format. If the numbers are wrong, it doesn't matter how pretty the presentation is or how modern the application looks. That said, I do provide a screenshot of psppire in action.

This page is for the curious. Run the examples. See what happens.

PSPP Command Formatting

Before the next section, let's talk about how pspp commands need to be formatted. The psppire GUI formats them automatically when it generates code, but if you are editing a program yourself in the Syntax Editor, another editor, or you accidentally change the code that psppire created, you need to be aware of the rules.

PSPP commands end with periods. Commands may span multiple lines, and indentation does not matter after the first line. The first line of a command should begin in column 1; an indented line can be treated as a continuation of the previous command. Comments begin with * (a comment command, which ends with a period like any other command) or are written between /* and */. Extra spaces are ignored. Blank lines between commands are harmless, but PSPP can also treat a blank line as the end of a command, so do not put blank lines inside a command, and always end a command with a period rather than relying on a blank line.
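
As a minimal sketch of these formatting rules, the short program below reads two made-up cases and runs one multi-line command. The variable names (id, age) and the data values are assumptions for illustration only; they do not come from a real file.

```pspp
* A comment that starts with an asterisk is a command, so it ends with a period.
DATA LIST LIST /id age.
BEGIN DATA.
1 27
2 34
END DATA.

/* A slash-star comment can sit on a line of its own. */
DESCRIPTIVES
  VARIABLES=age
  /STATISTICS=MEAN MINIMUM MAXIMUM.
```

Each command starts in column 1 and ends with a period. The DESCRIPTIVES command is spread over three lines, with the continuation lines indented.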

 

Using PSPP for Data Preparation and Analysis

Before we get into real data and statistical procedures, here’s a tiny example you can type into the psppire Syntax Editor and Run, to see how it works.

ECHO "Hello, world!" .

Hello, world!

Most examples on this page follow this pattern: a short block of PSPP command(s), followed by the output PSPP produces. This shows you how PSPP creates output and also that the psppire GUI can operate the same way as the command-line pspp program.

PSPP uses a statistical command language similar to SPSS and supports batch processing, scripting, and reproducible analysis. You can write commands in any text editor and save them as .sps. If you prefer a GUI, the psppire Syntax Editor can open, edit, and run these files. The examples on this page were run using both psppire and pspp; PSPP produces the same results in either environment.

If you learn better from examples, feel free to skip ahead, run the examples, and come back later for the explanations and tips.

Here is a short example of reading a CSV file into pspp or psppire:

Data (example.csv):
id,age,sex,score1,score2
101,27,M,88,91
102,34,F,76,82
103,29,F,90,87
104,41,M,72,78

GET DATA
  /TYPE=TXT
  /FILE='example.csv'
  /DELIMITERS=","
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    id F8
    age F8
    sex A1
    score1 F8
    score2 F8
  .

LIST.
DISPLAY DICTIONARY.

         Data List
+---+---+---+------+------+
| id|age|sex|score1|score2|
+---+---+---+------+------+
|101| 27|M  |    88|    91|
|102| 34|F  |    76|    82|
|103| 29|F  |    90|    87|
|104| 41|M  |    72|    78|
+---+---+---+------+------+

                                   Variables
+------+--------+----------------+-----+-----+---------+----------+-----------+
|      |        |   Measurement  |     |     |         |   Print  |   Write   |
|Name  |Position|      Level     | Role|Width|Alignment|  Format  |   Format  |
+------+--------+----------------+-----+-----+---------+----------+-----------+
|id    |       1|Scale           |Input|    8|Right    |F8.0      |F8.0       |
|age   |       2|Scale           |Input|    8|Right    |F8.0      |F8.0       |
|sex   |       3|Nominal         |Input|    1|Left     |A1        |A1         |
|score1|       4|Scale           |Input|    8|Right    |F8.0      |F8.0       |
|score2|       5|Scale           |Input|    8|Right    |F8.0      |F8.0       |
+------+--------+----------------+-----+-----+---------+----------+-----------+

This example shows a PSPP command reading the example.csv data file. The PSPP syntax is readable and understandable. The /FILE subcommand is required and names the data file to read. The /VARIABLES subcommand is also required; to build the variable list, copy the first line of the CSV file (the variable names), remove the commas, and add a format after each name.

LIST shows the data that were imported, and DISPLAY DICTIONARY shows the attributes assigned to the variables, so you can check that both are correct. These are good commands to run after every data import.

PSPPIRE Note: The psppire GUI also shows the list of loaded data and the data dictionary in the Data Editor on the Data View and Variable View tabs at the bottom left of the window, respectively.

pspp Data Editor Views

View Buttons at the bottom of the Data Editor


psppire screenshot

psppire: The Graphical Interface for pspp

psppire is a graphical interface for the PSPP engine. It’s useful for exploring data, running common procedures, and generating basic commands. Many users begin in psppire and use the generated commands in the Syntax Editor as a basis for modifying their program file for further analysis.

Some PSPP commands and options are not available in the psppire menus. When features that the GUI doesn’t expose are needed, commands can be entered directly in a Syntax Editor window or an .sps file can be run with pspp at the command line. psppire runs all PSPP syntax, not just what appears in the dialogs. Thus, psppire can be used to develop and run all PSPP commands.

psppire does not save commands or workflow automatically. If the program is closed without saving the commands, any unsaved commands entered or generated during the session cannot be recovered. Saving matters if you need to reproduce or debug your analysis. psppire generally prompts you to save, but do not rely on the prompt; save your work as you go.

And you will need to reproduce your work more often than you think. Schools want to see how you did your assignments, and your boss might want to see how you do things on the job, for example.


This section covers only the basics needed to open psppire, capture syntax, and run commands. A more detailed review of psppire’s behavior and limitations appears later in this page.

Launching psppire depends on your operating system. On most systems it appears in the standard applications menu; on Unix‑like systems it can also be started from a terminal by running psppire.

On Windows, at least from what I've seen, psppire, the graphical interface, is opened from the PSPP icon. The pspp command line program is also installed on Windows, and it provides the full PSPP language, including features beyond what the graphical interface menus expose. The Syntax Editor window in psppire likewise runs far more commands than the menus provide.

I renamed my start menu shortcut to "PSPP GUI" so it's clear what it is running, but doing that is beyond the scope of this page.

psppire Syntax Editor - How to Open It

A new PSPP program can be created using File->New->Syntax, which opens an empty Syntax Editor window where commands can be entered and executed. The Run menu executes the commands in the window, either all code in the Editor or the selected code. Files should be saved if they will be used later.

psppire syntax editor

Syntax Editor in psppire from the File Menu of the Data Editor.

Many dialogs in psppire include a "Paste" button. Clicking it opens the Syntax editor (if not already open) and inserts the commands from the dialog. Files should be saved if further work is planned.

psppire paste button

Paste button in psppire in the Frequencies setup.

Opening Existing Files

To open existing .sps files, use File->Open. The dialog displays PSPP data files and .sps command files by default. To display only command files, choose Syntax Files (*sps) from the filter menu in the lower right. Navigate to other directories if necessary to find your files.

psppire open file menu

Open File Menu in psppire from the File Menu in the Data Editor.

Creating a new command file via File->Open requires an empty .sps file to exist beforehand; File->New->Syntax is simpler.

About PSPP Syntax (Commands)

PSPP syntax uses commands terminated by periods. Commands may span multiple lines. Comments begin with an asterisk (*) or may be written as block comments enclosed in /* and */. Most data management and analysis operations can be expressed in this format, whether run from the command line or from psppire’s Syntax Editor.

Running Commands with pspp and the GUI (psppire)

Commands can be run in the psppire Syntax Editor window via the Run menu there, either all the code in the file or only selected lines. Running a program in psppire opens the Output Window where you can review the results and confirm they match expectations.

On Linux or Unix, the basic command is:

pspp -o example1.new2.list example1.sps

The -o option names the output file, and the .sps file holds the commands. If there are no errors or warnings, the run completes without a message. Check the output file on disk (here, example1.new2.list) to see what happened.

If you need wider output, add -O width=150. Be aware that this option works only for text and list output; with HTML, PostScript, or PDF output it gives an "unknown option 'width'" error.

To produce HTML output, use -o example1.new2.html.

For further details of command line options in pspp, type pspp --help and hit Enter at a command line.

No matter how you got your code — typed, pasted, or generated — you are still using and running syntax.

Next will be a full GET DATA example, including the commands to use and the output.

Reading Data With the GET DATA Command

Note: psppire’s File Import wizard uses GET DATA internally; the generated command appears in the output window. This command can be edited in the Syntax Editor, for example to change variable names and labels, without going through the import wizard repeatedly. psppire's Data Editor also allows changing variable names, types, alignment, and widths directly.

I edited the GET DATA command below; it was not generated by the File Import wizard. If an analysis needs to be reproduced, including the data-input step, the syntax file of PSPP commands must be saved so that it can be rerun. This has been my entire work experience with programming.

The LIST output shows what happened, but it does not repeat the import. psppire’s output window always shows the results of your commands, while the command‑line pspp program produces output only when your commands include procedures that generate it or when PSPP reports an error. In either case, the output alone is not enough to recreate the import; the commands used must be saved to document the data-input step.

Note: If psppire is preferred, rather than command-line syntax in an editor, setting the correct delimiter allows the import wizard to split the fields and load the data. However, I have found that its format guesses are sometimes wrong or missing. Variable formats may still need to be added manually (as shown below) to read the data correctly.

* Save the following lines into a text file, for example: data.txt
* GET DATA reads data from files.
*123456 27 1984 Male
*987654 58 1990 Female

GET DATA
  /TYPE = TXT
  /FILE = 'data.txt'
  /DELCASE = LINE
  /DELIMITERS = " "
  /VARIABLES =
      ID  F6
      age F2
      year F4
      sex  A6.
LIST.
DISPLAY DICTIONARY.

Output:

        Data List
+------+---+----+------+
|  ID  |age|year|  sex |
+------+---+----+------+
|123456| 27|1984|Male  |
|987654| 58|1990|Female|
+------+---+----+------+

                                   Variables
+----+--------+----------------+-----+-----+---------+------------+-----------+
|    |        |   Measurement  |     |     |         |            |   Write   |
|Name|Position|      Level     | Role|Width|Alignment|Print Format|   Format  |
+----+--------+----------------+-----+-----+---------+------------+-----------+
|ID  |       1|Scale           |Input|    8|Right    |F6.0        |F6.0       |
|age |       2|Scale           |Input|    8|Right    |F2.0        |F2.0       |
|year|       3|Scale           |Input|    8|Right    |F4.0        |F4.0       |
|sex |       4|Nominal         |Input|    6|Left     |A6          |A6         |
+----+--------+----------------+-----+-----+---------+------------+-----------+

There were no issues with the read of the data. The only limitation of GET DATA is that it always reads data from a file, so a data file must exist. GET DATA reads these data just like the DATA LIST FREE example below and gives the same results.

Reading Data in Free Format

The following example demonstrates data processing in PSPP using the command‑line pspp. The same commands can be copied into psppire’s Syntax Editor window and executed there via the Run menu. DATA LIST FREE does not appear in psppire’s pull‑down menus, but it can still be entered and executed in the Syntax Editor.

This example uses the same data as the GET DATA example above and the results should be the same.

BEGIN DATA and END DATA mark the literal block of raw data that PSPP reads when using DATA LIST to read inline raw data.

If there are issues reading the data with DATA LIST FREE, such as bad formats assigned, the only reliable method to read the data is to apply formats for all variables on the DATA LIST FREE line.

* Read simple space‑delimited data.
DATA LIST FREE 
  /ID (F6.0)
   AGE (F2.0)
   YEAR (F4.0) 
   SEX (A6) .
BEGIN DATA.
123456  27 1984 Male
987654  58 1990 Female
END DATA.
FORMATS
  ID   (F6.0)
  AGE  (F2.0)
  YEAR (F4.0)
  SEX  (A6).
LIST.
DISPLAY DICTIONARY.

Output:

        Data List
+------+---+----+------+
|  ID  |AGE|YEAR|  SEX |
+------+---+----+------+
|123456| 27|1984|Male  |
|987654| 58|1990|Female|
+------+---+----+------+

                                    Variables
+----+--------+-----------------+-----+-----+---------+------------+------------+
|Name|Position|Measurement Level| Role|Width|Alignment|Print Format|Write Format|
+----+--------+-----------------+-----+-----+---------+------------+------------+
|ID  |       1|Scale            |Input|    8|Right    |F6.0        |F6.0        |
|AGE |       2|Scale            |Input|    8|Right    |F2.0        |F2.0        |
|YEAR|       3|Scale            |Input|    8|Right    |F4.0        |F4.0        |
|SEX |       4|Nominal          |Input|    6|Left     |A6          |A6          |
+----+--------+-----------------+-----+-----+---------+------------+------------+

DATA LIST FREE can behave unpredictably when reading some data. The most dependable approach to ensure correct and reproducible results is to specify formats for all variables. If in doubt, verify the DATA LIST FREE read by comparing it with GET DATA, which always requires explicit formats for delimited data.

Inspecting Imported Data

Use FREQUENCIES when quick counts, ranges, and missing‑value checks are needed. Use LIST when inspecting exact values, string variables, or formatting. Use DISPLAY DICTIONARY to see the attributes of what was imported. If the data are not being read correctly, add or adjust variable formats, and recheck. It is a good idea to do this with every data import.
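
The checks described above can be sketched as one short program. The data and variable names (id, age, sex) are made up for illustration and are only an assumption about what an import might contain; substitute your own import step and variable names.

```pspp
* Made-up data for illustration only.
DATA LIST LIST /id (F3.0) age (F3.0) sex (A1).
BEGIN DATA.
101 27 M
102 34 F
103 29 F
104 41 M
END DATA.

* Counts for a categorical variable.
FREQUENCIES VARIABLES=sex.

* Range check for a numeric variable, without the full table.
FREQUENCIES VARIABLES=age
  /FORMAT=NOTABLE
  /STATISTICS=MINIMUM MAXIMUM.

* Exact values for the first three cases.
LIST /VARIABLES=id sex /CASES=FROM 1 TO 3.

DISPLAY DICTIONARY.
```

With these four cases, the sex table should show two M and two F, and the age range should run from 27 to 41. An unexpected category or an out-of-range value in your own data is the cue to revisit the formats.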

Another good thing to do is to look at your data, really look at it. If the data allow, do a plot. Look for large values, small values, values that are clumped together, values that are off by themselves, variables that change together, values that don't belong, and the general shape of the data. Doing so helps you anticipate how your analysis might be affected.

Reading Data in Fixed Column Format

DATA LIST FIXED reads variables from specific column locations in a text data file. As long as the text is aligned in columns, PSPP can parse it exactly as specified. Column locations in DATA LIST FIXED are 1-based. Thus the first data column is 1 and below we see ID is in columns 1-5.

This example was run with psppire.

Commands for Fixed Column

DATA LIST FIXED
  /ID    1-5
   NAME  6-13 (A)
   AGE   14-15.
BEGIN DATA.
12345John    27
98765Maria   34
END DATA.

LIST.
DISPLAY DICTIONARY.

PSPP sees:

- ID → columns 1–5 → numeric
- NAME → columns 6–13 → 8‑character string
- AGE → columns 14–15 → numeric

Output

DATA LIST FIXED
  /ID    1-5
   NAME  6-13 (A)
   AGE   14-15.

  Reading 1 record from INLINE.
╭────────┬──────┬───────┬──────╮
│Variable│Record│Columns│Format│
├────────┼──────┼───────┼──────┤
│ID      │     1│1-5    │F5.0  │
│NAME    │     1│6-13   │A8    │
│AGE     │     1│14-15  │F2.0  │
╰────────┴──────┴───────┴──────╯
BEGIN DATA.
12345John    27
98765Maria   34
END DATA.

LIST.
    Data List
╭─────┬─────┬───╮
│  ID │ NAME│AGE│
├─────┼─────┼───┤
│12345│John │ 27│
│98765│Maria│ 34│
╰─────┴─────┴───╯

DISPLAY DICTIONARY.
                                    Variables
╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
│Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
│ID  │       1│Scale            │Input│    8│Right    │F5.0        │F5.0        │
│NAME│       2│Nominal          │Input│    8│Left     │A8          │A8          │
│AGE │       3│Scale            │Input│    8│Right    │F2.0        │F2.0        │
╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

There are a couple of points to remember when using DATA LIST FIXED:

1. DATA LIST FIXED is brittle by design

If the file shifts by even one space, every variable after that point is wrong. This is why fixed‑width files must be inspected in a monospace editor and each column range verified to ensure PSPP is reading the data correctly.

2. Strings must match their column width

PSPP does not infer string length in DATA LIST FIXED. If NAME spans columns 6–13, it must be (A8) or (A) with that exact column range. The data width must match the span of the fixed width field.

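3. psppire states "Reading 1 record from INLINE" even though there are two lines of data. The message refers to the number of records (lines) that make up each case, not the number of cases: here every case occupies one line. The table that follows lists the column ranges and formats taken from the DATA LIST specification; PSPP then reads all of the data lines.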

4. DATA LIST FIXED uses 1-based column locations, whereas GET DATA with /ARRANGEMENT=FIXED uses 0-based column locations. The same field therefore has different column numbers in the two commands (columns 1-5 in DATA LIST FIXED are columns 0-4 in GET DATA), and the two commands will not read the same bytes unless the column ranges are adjusted accordingly.

Reading Data with Implied Decimals

Some types of data, including government survey data such as CPS, NHANES in older ASCII releases, and older Census PUMS files, include data fields that have implied decimal places. This means the decimal point is not stored in the data file.

For example, a monthly earnings value that is $582.89 in real life would appear in the raw data as 58289 when two implied decimal places are used. Implied decimals are a legacy formatting method used in many long‑running surveys, and PSPP can read them directly using DATA LIST FIXED with the right setup.

The number of implied decimal places should be defined in the codebook or other documentation supplied for the study. It is usually noted in the variable definition as “IMPLIED DECIMAL” along with the number of decimal places. The codebook may be the only documentation that will tell you where that decimal belongs, so try to get the codebook for your data.

This section goes over a short example of implied-decimal data and how to read it with PSPP.

Raw data:

58289
43750
120055

What the codebook should show:

EARN columns 1-6 IMPLIED DECIMAL 2

** PSPP implied decimals example.
DATA LIST FIXED
  /earn 1-6 (2).
BEGIN DATA
58289
43750
120055
END DATA.

LIST.
DISPLAY DICTIONARY.

For implied decimals only the number of decimals is specified in parentheses for that field, such as EARN in this example. PSPP determines the width from the column range.

Run this in psppire (via the Run menu in the Syntax Editor) and it should produce output similar to the following. psppire uses box-drawing characters for tables, while the command-line pspp uses plain ASCII.

** Implied decimals example.
DATA LIST FIXED
  /earn 1-6 (2).
  Reading 1 record from INLINE.
╭────────┬──────┬───────┬──────╮
│Variable│Record│Columns│Format│
├────────┼──────┼───────┼──────┤
│earn    │     1│1-6    │F6.2  │
╰────────┴──────┴───────┴──────╯
BEGIN DATA
58289
43750
120055
END DATA.

LIST.
Data List
╭───────╮
│  earn │
├───────┤
│ 582.89│
│ 437.50│
│1200.55│
╰───────╯

DISPLAY DICTIONARY.
                                    Variables
╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
│Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
│earn│       1│Scale            │Input│    8│Right    │F7.2        │F7.2        │
╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

The LIST output shows that the input values were read with two decimal places applied.

Reading Saved Files

Use GET FILE= to read PSPP system files and portable files. Use GET DATA to read CSV delimited files (see below).

Reading CSV Files with GET DATA: GET DATA can read CSV files by using TYPE=TXT and DELIMITERS=",". CSV files are treated as ordinary delimited text, so variable formats must still be specified. /VARIABLES is required.

* Reading those rare portable SPSS files.
GET FILE='legacy.por'.

* Reading a PSPP system file.
GET FILE='mydata.sav'.

* Reading a CSV file.
GET DATA
  /TYPE=TXT
  /FILE='data.csv'
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    id F8.0
    age F3.0
    gender A1
    score F5.2
    .

Note: FIRSTCASE tells pspp that the data to read begin on line 2. The variable names are often in line 1, so that line is skipped. FIRSTCASE is not needed if the CSV file has no header row. This example defines the file as DELIMITED, in this case comma-separated values, with the comma as the delimiter and double quotes as the text value qualifier. The /VARIABLES and /FILE subcommands are both required.

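Note: In the PSPP manual, /DELCASE is a subcommand of the delimited arrangement, and the GET DATA example earlier on this page uses /DELCASE=LINE with space-delimited data. /DELCASE=LINE (one case per line) is the default, so the subcommand can normally be left out; if it causes an error in your version of PSPP, omit it. The fixed arrangement uses /FIXCASE instead.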

File Handles in PSPP

File handles help reduce the typing of long path names in the program. Instead, a shorter file handle is defined and after that, the handle can be used to refer to the data file. Below, "survey" refers to "C:\path\to\survey.sav". After the handle is defined in this example, other commands can refer to the file as simply "survey".

After defining it, you can use paths like 'handle/filename.sav' directly in GET FILE and SAVE OUTFILE.

* File handle usage.
FILE HANDLE survey /NAME='C:\path\to\survey.sav'.
GET FILE=survey.

Workflow Tip: A FILE HANDLE can also point to a directory, not just a single file. This makes referencing data files in a common directory faster and keeps your syntax cleaner.

* File handle as a directory.
FILE HANDLE datadir /NAME='/path/to/'.
GET FILE='datadir/survey.sav'.

Workflow Tip: When the file handle is no longer needed, close it with CLOSE FILE HANDLE [handle name]. psppire keeps file handles between runs of a program, so close them; otherwise they cause file handle errors when the program is rerun.

The entire sequence of using file handles looks like this:

FILE HANDLE demo /NAME='/home/analysis/data/demo.sav'.

GET FILE=demo.

* Do some work here (uses the active file from GET FILE).
FREQUENCIES VARIABLES=age sex income.

CLOSE FILE HANDLE demo.

The CLOSE FILE HANDLE command is an extension provided by PSPP and is specific to PSPP.
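
To try the define, use, and close sequence without an existing data file, the sketch below first writes a small system file and then reads it back through a handle. The file name demo_tmp.sav and the two made-up cases are assumptions for illustration; the file is written to the current working directory.

```pspp
* Create a small system file so the example is self-contained.
DATA LIST LIST /id age.
BEGIN DATA.
1 27
2 34
END DATA.
SAVE OUTFILE='demo_tmp.sav'.

* Define the handle, read through it, then release it.
FILE HANDLE demo /NAME='demo_tmp.sav'.
GET FILE=demo.
LIST.
CLOSE FILE HANDLE demo.
```

Because the handle is closed at the end, the whole block can be rerun in the same psppire session.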

Reading Less Data for Testing: N OF CASES

When developing commands on a large data file, reading many thousands of cases just to test a few lines of code on a couple of values can waste time. N OF CASES tells PSPP to read only the first N cases in the file. This speeds up development and makes debugging easier.

* Read the file first, then limit processing to the first 15 cases.
GET FILE=survey.
N OF CASES 15.

* Try out the transformations.
RECODE age income (SYSMIS=0).

* Check a few variables.
FREQUENCIES VARIABLES=age income.

Workflow Tip: Modify these commands to fit the situation. Once the code is working, remove N OF CASES and run the full analysis. Testing on a few cases speeds up development and makes it go more smoothly overall. It is especially useful when reading large raw data files, testing recodes and COMPUTE statements, checking variable formats, verifying merges, and building commands incrementally.
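
A small self-contained sketch shows the effect of N OF CASES. The four cases and the variable names are made up for illustration; here N OF CASES is placed after the data have been defined.

```pspp
* Four made-up cases.
DATA LIST LIST /id age.
BEGIN DATA.
1 27
2 34
3 29
4 41
END DATA.

* Keep only the first two cases for testing.
N OF CASES 2.
LIST.
```

LIST should show only the cases with id 1 and 2.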

Selecting Data for Analysis: SELECT IF

SELECT IF is used to select cases for analysis or testing. Cases that are not selected are removed from the active dataset. If the dataset is saved after this point, the removed cases cannot be recovered except by reloading the original data file.

* Keep only adults with non‑missing income.
SELECT IF (age >= 18) AND (NOT SYSMIS(income)).
EXECUTE.

SYSMIS (the pspp system level missing value) is used here to exclude cases with missing income values.
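
To see which cases survive the selection, the sketch below applies the same SELECT IF to four made-up cases. The values are assumptions for illustration; a lone period in the inline data stands for a system-missing income.

```pspp
* Made-up cases: id 1 is under 18 and id 2 has a missing income.
DATA LIST LIST /id age income.
BEGIN DATA.
1 17 12000
2 25 .
3 40 52000
4 63 31000
END DATA.

SELECT IF (age >= 18) AND (NOT SYSMIS(income)).
LIST.
```

LIST should show only the cases with id 3 and 4. Because LIST is a procedure that reads the data, no separate EXECUTE is needed in this sketch.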

The remaining sections cover data formats, exporting, psppire behavior, and file‑combining (merging). These topics are not part of the basic procedures shown above, but they are useful when working with real datasets and larger projects. psppire is introduced briefly above so new users can get started; its detailed behavior and limitations are covered in several later sections.

 


Notes and Limitations


 

Up to this point we have seen GET DATA, DATA LIST FIXED, and DATA LIST FREE used to read ordinary raw data files.

Another kind of raw data that still appears in practice is the multi-record data format. That material has been moved to a separate multi-record page to keep this page shorter.

Recoding Variables using RECODE

The RECODE command can be used to change values in a variable or create new values in a new variable.

In the next two examples, the recoded values are strings, so PSPP requires creating a string variable to hold them. String variables cannot hold the System Missing value (SYSMIS), which is numeric only, so anything not recoded is set to a blank string with ELSE=" ".

This example also includes variable labels, value labels, and variable level definitions to help identify these variables, especially ones being created here.

/* Create GENDERC wide enough to accept the longest value, then recode. */
STRING GENDERC (A8).
RECODE GENDER (2="Female") (1="Male") (ELSE=" ") INTO GENDERC.

/* Apply labels to the new variable */
VARIABLE LABELS
  GENDER "Gender of Respondent"
  GENDERC "Gender of Respondent (Char)".

/* Value labels apply to the numeric variable (GENDER), not the string version. */
VALUE LABELS
  GENDER
    1 "Male"
    2 "Female"
    .

VARIABLE LEVEL
    GENDER  (NOMINAL)
    GENDERC (NOMINAL).

/* Create a text Treatment var (TRTA) */
STRING TRTA (A1).
RECODE TRTMENT (1="A") (2="B") (3="C") (4="D") (5="E") (6="F") (7="G")
               (8="H") (9="I") (10="J")
               (ELSE=" ") INTO TRTA.

VARIABLE LABELS TRTA "Treatment Assignment (Char)".
VARIABLE LEVEL TRTA (NOMINAL).

/* Reverse these scales */
RECODE
   CONSIST TO KEEP_RID  /* First consecutive block */
   UNCERT TO POWER  /* Second consecutive block */
   MORALITY CONTROL CAUS_YOU /* Individual variables */
   (1=9) (2=8) (3=7) (4=6) (5=5) (6=4) (7=3) (8=2) (9=1)
   (ELSE=SYSMIS).

This scale-reversing recode is changing the numeric values in the same variables it reads from (CONSIST TO KEEP_RID UNCERT TO POWER, and MORALITY CONTROL CAUS_YOU). Because those variables already exist, no new variables are needed. Anything not recoded is given the System Missing value (SYSMIS).

Workflow Tip: Reverse scoring is always done by explicitly mapping each value to its opposite (e.g. (1=9) (2=8)...). PSPP does not infer the scale range automatically, so you must specify the full mapping.

This example also uses "TO" in the variable list. TO is useful for shortening long variable lists and reducing typing, but it only works on variables that are consecutive in the data set. That is why this example has a break in the variable list. CONSIST TO KEEP_RID covers one consecutive block, UNCERT TO POWER covers another, and MORALITY CONTROL CAUS_YOU are variables not part of any consecutive sequence, so they must be listed separately.

A variable list like CONSIST TO CAUS_YOU causes PSPP to include every variable between those two names, inclusive, in dictionary order, which is often far more than intended.

Workflow Tip: When recoding a large set of consecutive variables, use TO in the variable list. It reduces typing, prevents spelling errors, and ensures the same transformation is applied consistently across the entire scale.

DISPLAY DICTIONARY. shows the variable order, making it easy to confirm which variables are consecutive before using TO.

RECODE handles all four combinations: numeric to numeric, numeric to string, string to numeric, and string to string.
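
As a sketch of the numeric-to-numeric case, the example below groups ages into three bands using value ranges. The cut points, the variable name agegrp, and the three made-up cases are assumptions for illustration.

```pspp
* Three made-up cases.
DATA LIST LIST /id age.
BEGIN DATA.
1 15
2 34
3 70
END DATA.

* Recode ranges of age into a new numeric variable.
RECODE age (LOWEST THRU 17=1) (18 THRU 64=2) (65 THRU HIGHEST=3) INTO agegrp.
VALUE LABELS agegrp 1 "Under 18" 2 "18 to 64" 3 "65 and over".
LIST.
```

The three cases should receive agegrp values of 1, 2, and 3. Because the target is numeric, no STRING declaration is needed; RECODE creates agegrp itself.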

Using COMPUTE to Create Values

COMPUTE creates a new variable or replaces the values of an existing one. It is used for simple arithmetic, scale scores, or system variables such as $CASENUM.

COMPUTE operates on the active dataset and runs once per case. If no data have been read yet, PSPP has no cases to apply the transformation to, and COMPUTE will return an error. This is why you must load or create data before using COMPUTE.

/* Compute an index number based on current case number */
COMPUTE IDN = $CASENUM.

/* Average three items */
COMPUTE SITUATN = (CONSIST + WANTEDBY + IMPROVED) / 3.

/* Average three items using a function */
COMPUTE SITUATN = MEAN(CONSIST, WANTEDBY, IMPROVED).

These examples illustrate the main patterns of COMPUTE: using system variables like $CASENUM, creating scale scores with arithmetic, and using functions inside a transformation. Most PSPP transformations follow one of these forms.
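
The two averaging forms above differ when an item is missing. The sketch below uses three made-up cases to show this; the output variable names (SIT_ARITH, SIT_MEAN, SIT_MEAN2) are hypothetical. It also shows the .n suffix, which sets the minimum number of valid arguments a function needs before it returns a value.

DATA LIST LIST /CONSIST WANTEDBY IMPROVED.
BEGIN DATA
7 8 9
7 . 9
. . 9
END DATA.

/* Arithmetic: any missing item makes the result missing. */
COMPUTE SIT_ARITH = (CONSIST + WANTEDBY + IMPROVED) / 3.

/* MEAN(): averages whichever items are valid. */
COMPUTE SIT_MEAN = MEAN(CONSIST, WANTEDBY, IMPROVED).

/* MEAN.2(): requires at least two valid items. */
COMPUTE SIT_MEAN2 = MEAN.2(CONSIST, WANTEDBY, IMPROVED).

LIST.

For the first case all three results are 8. For the second case the arithmetic version is missing while both MEAN versions return 8. For the third case only MEAN() returns a value (9), since a single valid item does not satisfy MEAN.2().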

PSPP Function Examples

PSPP has many functions that can be used to work on data. Here are several that are used frequently.

1. MEAN() — the workhorse

Handles missing values gracefully and uses the “function inside COMPUTE” pattern.

COMPUTE avg_score = MEAN(v1, v2, v3, v4).

2. SUM() — simple, predictable, and widely needed. Good for scales, counts, and composite scores.

COMPUTE total = SUM(item1, item2, item3).

3. DATEDIFF() — essential for age and time intervals. This one solves a real problem for archivists and anyone working with dates. $DATE contains today's date as a string, so it is converted to a number for the age calculation. birthdate is a date variable in the active file and is formatted as MM-DD-YYYY.

COMPUTE age = DATEDIFF(NUMBER($DATE, EDATE10), birthdate, "years").

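4. LTRIM() / RTRIM() — string cleanup. These remove leading or trailing spaces from a string variable's values. COMPUTE cannot create a new string variable, so the target is declared with STRING first (the width A20 is only an example; match it to the width of name).

STRING clean_name (A20).
COMPUTE clean_name = RTRIM(LTRIM(name)).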

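5. SD() — returns the standard deviation of its arguments within each case, that is, across variables rather than across cases. Useful for quick diagnostics such as flagging respondents who give nearly the same answer to every item. Like MEAN(), it cannot produce a z-score by itself, because a z-score needs the mean and standard deviation of one variable across all cases; a procedure such as DESCRIPTIVES with /SAVE does that.

COMPUTE item_sd = SD(v1, v2, v3, v4).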

6. LAG() — The LAG function returns the value of a variable from the previous case. It is commonly used for detecting group boundaries, computing running totals, and performing transformations that depend on earlier rows.

COMPUTE prev = LAG(variable).

LAG only works correctly when the data is sorted in the order you intend to reference. It cannot be used after TEMPORARY.

7. COMPUTE highbp = (bp_sys > 140). The comparison bp_sys > 140 yields a Boolean result, so highbp is 1 when the test is true and 0 when it is false. If bp_sys is missing, highbp is missing as well.

Workflow Tip: PSPP includes hundreds of functions across math, statistics, strings, dates, logical tests, and data transformations. Only a few were shown here; the full list is in the PSPP manual, which is also linked at the end of this page.
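
The functions above work within a single case, across the variables given as arguments. A statistic across cases, such as a z-score, comes from a procedure instead. As a sketch with made-up data, DESCRIPTIVES with the SAVE subcommand standardizes a variable and stores the result in a new variable whose name is the original name with Z prepended.

DATA LIST LIST /score.
BEGIN DATA
10
20
30
END DATA.

/* Adds a z-score variable for score to the active dataset. */
DESCRIPTIVES VARIABLES=score /SAVE.

LIST.

With these three values the mean is 20 and the standard deviation is 10, so the saved z-scores are -1, 0, and 1.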

Converting Stacked Data to Wide Format (Using LAG and AGGREGATE)

PSPP does not fully implement SPSS's INPUT PROGRAM for multi-record or stacked-record processing. In particular, END CASE does not reliably flush cases at group boundaries, and PSPP retains only the last flushed case. As a result, stacked to wide conversion cannot be done inside INPUT PROGRAM.

Note: Stacked data is not multi-record data. PSPP's RECORDS= and DATA LIST FIXED cannot process this data because each record repeats the same variables. This section details another method for using pspp to convert this stacked data format to wide format.

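PSPP can reshape stacked data using a combination of:

– SORT CASES and LAG() to number the rows within each group

– TEMPORARY and SELECT IF with AGGREGATE to write one file per row position

– MATCH FILES to join those files into one wide case per group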

This method works for any number of stacked rows per conceptual case. The example below shows the basic pattern using two stacked rows per case.

Example: Stacked Data (Two Rows Per Case)

Suppose the stacked file contains two rows per case:

Data:

  16 A 1
  18 A 1
  14 B 2
  13 B 2

The goal is to produce:

Expected Output:

 
  value1 value2 group groupn
  16     18     A     1
  14     13     B     2

Now the example will be run in pspp to show the real output.

Step 1 - Read the stacked data normally

Code:

  DATA LIST FILE='stacked.txt'
     /value 1-3 group 4-5 (A) groupn 6-7.

Step 2 - Assign row numbers within each group

rownum identifies the position of each stacked row using the LAG() function.

Code:

  SORT CASES BY groupn.
  COMPUTE rownum = 1.
  DO IF groupn = LAG(groupn).
  COMPUTE rownum = LAG(rownum) + 1.
  END IF.

After this step, the data looks like:

Output Data

         Data List
+-----+-----+------+------+
|value|group|groupn|rownum|
+-----+-----+------+------+
|   16| A   |     1|  1.00|
|   18| A   |     1|  2.00|
|   14| B   |     2|  1.00|
|   13| B   |     2|  2.00|
+-----+-----+------+------+

Step 3 - Use AGGREGATE to pivot to wide format

Note: PSPP does not support SPSS’s WHERE clause inside AGGREGATE expressions. To aggregate conditionally, first filter the cases using SELECT IF, then run AGGREGATE on the filtered subset. Repeat as needed and combine results using MATCH FILES.

Define one output variable per stacked row:

Code:

TEMPORARY.
SELECT IF rownum = 1.
AGGREGATE
 /OUTFILE='first.sav'
 /BREAK=groupn
 /value1 = MAX(value).

LIST.

TEMPORARY.
SELECT IF rownum = 2.
AGGREGATE
 /OUTFILE='second.sav'
 /BREAK=groupn
 /value2 = MAX(value).

LIST.

MATCH FILES
 /FILE='first.sav'
 /FILE='second.sav'
 /BY groupn.
EXECUTE.

LIST.

This produces one wide case per BY group value.

Output:

       Data List
+------+------+------+
|groupn|value1|value2|
+------+------+------+
|     1|    16|    18|
|     2|    14|    13|
+------+------+------+
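
The result contains groupn, value1, and value2, but not the string variable group that the expected output listed. AGGREGATE keeps only the BREAK variables and the aggregated variables. One way to carry group along, sketched here as an untested variant of Step 3 rather than the run shown above, is to add it to BREAK in both AGGREGATE commands and to the BY list in MATCH FILES.

TEMPORARY.
SELECT IF rownum = 1.
AGGREGATE
 /OUTFILE='first.sav'
 /BREAK=groupn group
 /value1 = MAX(value).

TEMPORARY.
SELECT IF rownum = 2.
AGGREGATE
 /OUTFILE='second.sav'
 /BREAK=groupn group
 /value2 = MAX(value).

MATCH FILES
 /FILE='first.sav'
 /FILE='second.sav'
 /BY groupn group.
EXECUTE.

LIST.

This assumes group has the same value on every row of a given groupn, as it does in the example data.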

For stacked data with more than two rows per case, add one TEMPORARY / SELECT IF / AGGREGATE block for each additional rownum value, each writing its own file, and list every file in MATCH FILES. It can get onerous if you have many rows, but it works.
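
As a sketch of that extension for three rows per case, the block below adds a third pass. The file name 'third.sav' and the variable value3 are hypothetical. The pass has to run while the stacked data from Step 2 is still the active dataset, so it goes before MATCH FILES, which replaces the active dataset.

/* Third pass: keep only the third stacked row of each group. */
TEMPORARY.
SELECT IF rownum = 3.
AGGREGATE
 /OUTFILE='third.sav'
 /BREAK=groupn
 /value3 = MAX(value).

/* Join all three files into one wide case per group. */
MATCH FILES
 /FILE='first.sav'
 /FILE='second.sav'
 /FILE='third.sav'
 /BY groupn.
EXECUTE.

LIST.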

Why this works

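AGGREGATE evaluates each expression once per group. Because TEMPORARY and SELECT IF leave exactly one row per group for each AGGREGATE, MAX(value) simply returns that row's value:

– value1 gets the first stacked row

– value2 gets the second stacked row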

Exporting and Saving PSPP Data (CSV, POR, SAV)

PSPP can export data in several formats using commands, which is the most reproducible way to create files for use in other software. The examples below show how to write datasets in common formats. psppire also supports exporting through its File->Export dialog; its available formats are described in a later section.

/* Set the working directory to avoid path errors when saving files. */
CD '/home/user/analysis'.

SAVE OUTFILE='spss/emo.sav'.

EXPORT OUTFILE='spss/emo.por'.

/* Write a CSV file with variable names in the first row. */
SAVE TRANSLATE
  /OUTFILE='spss/emo.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE.

The commands above show how to write data files in several formats using pspp commands. SAVE creates an SPSS system file (.sav), and EXPORT writes a portable file (.por) for transferring data to other applications that support this format. The SAVE TRANSLATE command writes a CSV file with variable names in the first row; the REPLACE option allows it to overwrite an existing file. SAVE TRANSLATE can also write tab‑separated files by specifying TYPE=TAB.
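
A tab-separated version follows the same pattern. This is a minimal sketch; the output file name 'spss/emo.tsv' is an assumption chosen for illustration.

/* Write a tab-separated file with variable names in the first row. */
SAVE TRANSLATE
  /OUTFILE='spss/emo.tsv'
  /TYPE=TAB
  /FIELDNAMES
  /REPLACE.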

Note: CD is a PSPP command (change directory) and one way to change where files are saved or imported from. An alternative is to use a FILE HANDLE for the directory and then use the file handle as the location of files, e.g. in GET FILE.

PSPP commands can be run either from a shell command line or by pasting them into psppire’s Syntax Editor. psppire can also export the assembled data directly through its File->Export dialog, which provides a subset of the formats available in PSPP commands. The following section describes psppire itself in more detail.

psppire: The PSPP Graphical Interface

Many PSPP users prefer psppire, the graphical interface that resembles the SPSS Data Editor. psppire is useful for data entry, quick exploration, and running common procedures without writing commands. However, it does not expose all of PSPP's capabilities, and some dialogs correspond to commands that are only partially implemented.

For reproducible analysis, batch processing, or advanced procedures, PSPP command language is still the recommended approach. psppire can generate commands for many procedures, which can then be copied, edited, and reused in command files. But there are many more PSPP commands outside the GUI that may be needed from time to time.

For example, merging datasets (MATCH FILES) is a standard PSPP operation, but it is not currently available through psppire’s menus.

psppire is ideal for learning PSPP commands by example, but complex workflows are best handled directly in syntax files, where every PSPP command can be used. psppire's menus cover only the commands for which dialogs exist; the command language supports many additional commands and options, and any of them can be run from psppire's Syntax Editor.

psppire and File Handles vs. Rerunning Programs

psppire keeps FILE HANDLE definitions for the duration of the session. If a file handle is defined via commands and then the program is rerun, psppire will report that the handle is already in use. This does not happen when running PSPP from the command line, because each run starts a fresh session.

When using FILE HANDLEs to give files meaningful names, add CLOSE FILE HANDLE commands at the end of the program. This removes the handles so the command file can be rerun without errors about handles in use.

CLOSE FILE HANDLE demo.
CLOSE FILE HANDLE psych.
CLOSE FILE HANDLE out.

Closing the file handles prevents the “handle already in use” error and allows rerunning the commands in psppire without restarting the application.
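
Put together, a rerunnable program defines the handle, uses it, and closes it at the end. The sketch below shows the pattern for a single handle; the path is hypothetical and should be replaced with the location of your own file.

/* Define a handle for one data file. */
FILE HANDLE demo /NAME='/home/user/analysis/spss/demo.sav'.

/* Use the handle wherever a file name is expected. */
GET FILE=demo.
DESCRIPTIVES VARIABLES=ALL.

/* Release the handle so the program can be rerun in psppire. */
CLOSE FILE HANDLE demo.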

psppire Export Formats

psppire can export the contents of the Output window in several formats. Only the formats shown in the File->Export menu are supported; if a format is not listed, psppire does not produce it. The export format is determined by the output file extension entered in the dialog. For example, entering "myfile.pdf" in the export dialog causes psppire to generate a PDF file.

Rich / Page‑Description Formats (cairo‑based)

These preserve layout, fonts, borders, and the exact appearance of the Output window.

Structured / Document Formats

These formats preserve tables and structure, suitable for editing or further processing.

Plain Data Formats

These formats provide minimal formatting and are useful for scripting, data interchange or raw text.

Unsupported Formats

Formats not shown in the File->Export menu are not implemented in the version of psppire being used. Examples include:

Choosing a Format

Merging Files in PSPP

PSPP supports several methods for combining or merging datasets, including ADD FILES, MATCH FILES, and UPDATE. A complete set of examples and notes is available on the pspp merge page: Merging Files in PSPP.

Analysis of Data with PSPP

PSPP Analysis Commands: this page covers DESCRIPTIVES, FREQUENCIES, CROSSTABS, LINEAR REGRESSION, ONEWAY, and other PSPP analysis commands. The plan is to add more analysis sections.

APPENDIX

Further Reading

The GNU PSPP Web-Based Manual. This is the authoritative reference. If something is missing from the PDF version, it should appear in the web version. It is worth consulting when details beyond the examples shown here are needed, or when exploring options available in the psppire menus.

The PSPP user manual provides the reference for all commands, functions, and procedures available in PSPP.

Installing PSPP

PSPP is currently at version 2.1.1. After several important bug fixes (file handle close, writing portable files, and reading multi-record data with skipped records in DATA LIST, among other issues) and the addition of the GLM and CTABLES commands in the last several years, PSPP is more capable than ever. If your version is older than 2.1.1, consider upgrading.

PSPP download page at gnu.org: the GNU PSPP site gives installation instructions for Windows, Mac, Debian, Ubuntu, Fedora, and Flatpak.

PSPP source can also be obtained with git: savannah.gnu.org/git

See INSTALL and README.Git for details. gmake -f Smake bootstraps the build.

FreeBSD: PSPP is not in the FreeBSD ports collection (expired 2025-03-01 after being marked broken). FreeBSD users can still install PSPP by building it from the official GNU source release. The PSPP build instructions work on FreeBSD. See pspp/INSTALL after cloning with Git for the details. Be sure to install all prerequisite programs. Use gmake in the build.