PSPP Syntax Guide: Examples and Notes

PSPP is a free, open-source alternative to SPSS. It supports many of the same statistical procedures, uses a similar syntax language, and is well suited to batch processing, scripting, and reproducible analysis as well as interactive use.

 

About This Page

This page collects practical, syntax‑first examples for PSPP. It focuses on commands people actually use — DATA LIST, REGRESSION, CROSSTABS, multi‑record (“stacked”) data handling, and small workflow techniques such as using functions and computing p‑values from ANOVA output. The goal is to give clear, reproducible patterns you can adapt to your own work.

Note: A common misconception is that PSPP lacks support for multi‑record or hierarchical data files. In fact, PSPP has long supported reading multi‑record data using DATA LIST FIXED with RECORDS= and slash notation.

"Stacked" data handling as used here is unrelated to modern ‘data stacking’ (VARTOCASES), which restructures data into long format.

 

Programming with PSPP

PSPP uses a syntax language similar to SPSS and supports batch processing, scripting, and reproducible analysis. You can write syntax in any text editor and save it as .sps. If you prefer a GUI, the psppire Syntax Editor can open, edit, and run these files, but the examples here assume a syntax‑driven workflow.

psppire screenshot

psppire: The Graphical Interface for pspp

psppire is a graphical interface for the PSPP engine. It’s useful for exploring data, running common procedures, and generating basic syntax. Many users begin in psppire and then copy the generated syntax into the Syntax Editor or a program file.

Some PSPP commands and options are not available in the psppire menus. When features that the GUI doesn’t expose are needed, syntax can be entered directly in a syntax window or an .sps file can be run with pspp. psppire runs all PSPP syntax, not just what appears in the dialogs. Thus, psppire can be used to develop and run PSPP syntax.

psppire does not save syntax or workflow automatically. If the program is closed without saving, any syntax entered or generated during the session cannot be recovered, which matters if you need to reproduce or debug your analysis. psppire generally prompts you to save unsaved syntax before closing, but do not rely on the prompt; save the syntax file as you work.


This section covers only the basics needed to open psppire, capture syntax, and run syntax. A more detailed review of psppire’s behavior and limitations appears later in this page.

Launching psppire depends on your operating system. On most systems it appears in the standard applications menu; on Unix‑like systems it can also be started from a terminal by running psppire.

psppire Syntax Editor - How to Open It

A new PSPP program can be created using File->New->Syntax, which opens an empty Syntax Editor window where commands can be entered and executed. The Run menu executes either all the commands in the window or only the selected ones. Save the file if it will be used later.

Many dialogs in psppire include a "Paste" button. Clicking it opens the Syntax editor (if not already open) and inserts the syntax corresponding to the dialog choices. This is a quick way to learn PSPP syntax and to refine or extend the generated commands. Files should be saved if further work is planned.

Opening Existing Files

To open existing .sps files, use File->Open. The dialog displays PSPP data files and .sps syntax files by default. To display only syntax files, choose Syntax Files (*.sps) from the filter menu in the lower right.

Creating a new syntax file via File->Open requires an empty .sps file to exist beforehand; File->New->Syntax is simpler.

About PSPP Syntax

PSPP syntax uses commands terminated by periods. Commands may span multiple lines. Comments begin with an asterisk (*) or may be written as block comments enclosed in /* and */. Most data management and analysis operations can be expressed in this format, whether run from the command line or from psppire’s Syntax Editor.
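As a minimal sketch of these rules, the lines below show both comment styles and a command that spans two lines. The variable names a and b and the data values are made up for illustration.

```pspp
* A comment is itself a command, so it ends with a period.
COMMENT This keyword form is equivalent to the asterisk form.

DATA LIST FREE /a b.
BEGIN DATA.
1 2
3 4
END DATA.

DESCRIPTIVES VARIABLES = a b   /* in-line comment */
  /STATISTICS = MEAN.
```

The DESCRIPTIVES command continues onto the second line because the first line does not end with a period.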

Example: Running Syntax Not Available in psppire Menus

With these basics in place, the following example demonstrates data processing in PSPP using the command‑line pspp. The same syntax can be copied into psppire’s Syntax Editor window and executed there. psppire provides no menu item for DATA LIST FREE, but the command can still be entered and executed in the Syntax Editor.

BEGIN DATA and END DATA mark the literal block of raw data that PSPP reads when using DATA LIST to read raw data.

If DATA LIST FREE has trouble reading the data, for example because of unexpected formats, the most reliable fix is to specify a format for every variable on the DATA LIST FREE command.

* Read simple space‑delimited data.
DATA LIST FREE 
  /ID (F6.0)
   AGE (F2.0)
   YEAR (F4.0) 
   SEX (A6) .
BEGIN DATA.
123456  27 1984 Male
987654  58 1990 Female
END DATA.
FORMATS
  ID   (F6.0)
  AGE  (F2.0)
  YEAR (F4.0)
  SEX  (A6).
LIST.
DISPLAY DICTIONARY.

Output:

        Data List
+------+---+----+------+
|  ID  |AGE|YEAR|  SEX |
+------+---+----+------+
|123456| 27|1984|Male  |
|987654| 58|1990|Female|
+------+---+----+------+

                                    Variables
+----+--------+-----------------+-----+-----+---------+------------+------------+
|Name|Position|Measurement Level| Role|Width|Alignment|Print Format|Write Format|
+----+--------+-----------------+-----+-----+---------+------------+------------+
|ID  |       1|Scale            |Input|    8|Right    |F6.0        |F6.0        |
|AGE |       2|Scale            |Input|    8|Right    |F2.0        |F2.0        |
|YEAR|       3|Scale            |Input|    8|Right    |F4.0        |F4.0        |
|SEX |       4|Nominal          |Input|    6|Left     |A6          |A6          |
+----+--------+-----------------+-----+-----+---------+------------+------------+

DATA LIST FREE can behave unpredictably reading some data. To ensure correct and reproducible results, specify formats for all variables. If in doubt, verify the DATA LIST FREE read by comparing it with GET DATA, which always requires explicit formats.

Note: If you prefer psppire's import wizard to writing command-line syntax in an editor, setting the correct delimiter allows the wizard to split the fields and load the data. However, its format guesses are sometimes wrong or missing. Variable formats may still need to be added manually (as shown above) to read the data correctly.

Inspecting Imported Data

Use FREQUENCIES when quick counts, ranges, and missing‑value checks are needed. Use LIST when inspecting exact values, string variables, or formatting. Use DISPLAY DICTIONARY to see the attributes of what was imported. If the data are not being read correctly, add or adjust variable formats, and recheck. It is a good idea to do this with every data read.

It also pays to look closely at the data themselves. If the data allow, plot them. Look for large values, small values, values that are clumped together, values that are off by themselves, variables that change together, values that don't belong, and the general shape of the data. This helps you anticipate how your analysis might be affected.
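The checks described in this section can be sketched as one short program. The dataset below is hypothetical and exists only to make the example runnable; the GRAPH commands assume an output format that can display charts, such as psppire's output window.

```pspp
* Hypothetical data for illustration only.
DATA LIST LIST /id (F6.0) age (F3.0) year (F4.0) sex (A6).
BEGIN DATA.
101 27 1984 Male
102 58 1990 Female
103 41 1984 Female
104 35 2001 Male
END DATA.

* Counts and ranges for the coded variables.
FREQUENCIES VARIABLES = age year sex.

* Exact values for the first three cases only.
LIST VARIABLES = id sex /CASES = FROM 1 TO 3.

* Variable names, types, and formats as PSPP stored them.
DISPLAY DICTIONARY.

* A quick look at the shape of the data.
GRAPH /HISTOGRAM = age.
GRAPH /SCATTERPLOT(BIVARIATE) = year WITH age.
```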

GET DATA command

This example uses the same dataset as the DATA LIST FREE example above so the results of GET DATA can be compared with those from DATA LIST FREE.

Note: psppire’s File Import wizard uses GET DATA internally; the generated command appears in the output window. This syntax can be edited in the Syntax Editor, for example to change the variable names and labels, without going through the import wizard repeatedly. psppire's data editor also allows changing variable names, types, alignment, and widths directly.

The GET DATA syntax below was edited by hand; it was not generated by the File Import wizard. If an analysis needs to be reproduced—including the data‑input step—the syntax file of PSPP commands must be saved and rerun. The list output shows what happened, but it does not repeat the import. psppire’s output window always shows the results of your commands, while the command‑line pspp program produces output only when your syntax includes procedures that generate it or when PSPP reports an error. In either case, the output alone is not enough to recreate the import; the syntax must be saved to document the data-input step.

* Save the following lines into a text file, for example: data.txt
* GET DATA reads data from files.
*123456 27 1984 Male
*987654 58 1990 Female

GET DATA
  /TYPE = TXT
  /FILE = 'data.txt'
  /DELCASE = LINE
  /DELIMITERS = " "
  /VARIABLES =
      ID  F6
      age F2
      year F4
      sex  A6.
LIST.
DISPLAY DICTIONARY.

Output:

        Data List
+------+---+----+------+
|  ID  |age|year|  sex |
+------+---+----+------+
|123456| 27|1984|Male  |
|987654| 58|1990|Female|
+------+---+----+------+

                                   Variables
+----+--------+----------------+-----+-----+---------+------------+-----------+
|    |        |   Measurement  |     |     |         |            |   Write   |
|Name|Position|      Level     | Role|Width|Alignment|Print Format|   Format  |
+----+--------+----------------+-----+-----+---------+------------+-----------+
|ID  |       1|Scale           |Input|    8|Right    |F6.0        |F6.0       |
|age |       2|Scale           |Input|    8|Right    |F2.0        |F2.0       |
|year|       3|Scale           |Input|    8|Right    |F4.0        |F4.0       |
|sex |       4|Nominal         |Input|    6|Left     |A6          |A6         |
+----+--------+----------------+-----+-----+---------+------------+-----------+

GET DATA read the data without any issues and gave the same results as the DATA LIST FREE example above. Its main limitation is that it always reads from a file, so a data file must exist.

Reading CSV Files with GET DATA: GET DATA can read CSV files by using TYPE=TXT and DELIMITERS=",". CSV files are treated as ordinary delimited text, so variable formats must still be specified.

Reading Data in Fixed Column Format

DATA LIST FIXED reads variables from specific column locations in a text data file. As long as the text is aligned in columns, PSPP can parse it exactly as specified.

This example was run with psppire.

Syntax

DATA LIST FIXED
  /ID    1-5
   NAME  6-13 (A)
   AGE   14-15.
BEGIN DATA.
12345John    27
98765Maria   34
END DATA.

LIST.
DISPLAY DICTIONARY.

PSPP sees:

- ID → columns 1–5 → numeric
- NAME → columns 6–13 → 8‑character string
- AGE → columns 14–15 → numeric

Output

DATA LIST FIXED
  /ID    1-5
   NAME  6-13 (A)
   AGE   14-15.

  Reading 1 record from INLINE.
╭────────┬──────┬───────┬──────╮
│Variable│Record│Columns│Format│
├────────┼──────┼───────┼──────┤
│ID      │     1│1-5    │F5.0  │
│NAME    │     1│6-13   │A8    │
│AGE     │     1│14-15  │F2.0  │
╰────────┴──────┴───────┴──────╯
BEGIN DATA.
12345John    27
98765Maria   34
END DATA.

LIST.
    Data List
╭─────┬─────┬───╮
│  ID │ NAME│AGE│
├─────┼─────┼───┤
│12345│John │ 27│
│98765│Maria│ 34│
╰─────┴─────┴───╯

DISPLAY DICTIONARY.
                                    Variables
╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
│Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
│ID  │       1│Scale            │Input│    8│Right    │F5.0        │F5.0        │
│NAME│       2│Nominal          │Input│    8│Left     │A8          │A8          │
│AGE │       3│Scale            │Input│    8│Right    │F2.0        │F2.0        │
╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

There are a couple of points to remember when using DATA LIST FIXED:

1. DATA LIST FIXED is brittle by design

If the file shifts by even one space, every variable after that point is wrong. This is why fixed‑width files must be inspected in a monospace editor and each column range verified to ensure PSPP is reading the data correctly.

2. Strings must match their column width

PSPP does not infer string length in DATA LIST FIXED. When a column range is given, the width comes from the range: NAME 6-13 (A) is read as an 8‑character string (A8), as the output above shows. The field in the data file must occupy exactly those columns.

3. psppire states "Reading 1 record from INLINE"

even though there are two lines of data. The message refers to the number of records (lines) that make up each case, not to the number of lines in the data. Here each case occupies one record, so PSPP reports 1 record and then reads the two lines as two separate cases.
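To make the meaning of a record concrete, here is a minimal sketch of a two-record layout in which each case spans two lines. The variable names, column positions, and data values are invented for illustration; RECORDS= and the /1 and /2 record markers are the DATA LIST FIXED notation for multi-record data.

```pspp
* Each case occupies two lines (records) of the data.
DATA LIST FIXED RECORDS=2
  /1 id 1-3 name 5-12 (A)
  /2 age 1-2 income 4-9.
BEGIN DATA.
001 Alice
34  52000
002 Bob
41  61500
END DATA.

LIST.
```

With this layout PSPP should report that it is reading 2 records per case, and LIST should show two cases rather than four.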

Reading Data with Implied Decimals

Some types of data, including government survey data such as CPS, NHANES in older ASCII releases, and older Census PUMS files, include data fields that have implied decimal places. This means the decimal point is not stored in the data file.

For example, a monthly earnings value that is $582.89 in real life would appear in the raw data as 58289 when two implied decimal places are used. Implied decimals are a legacy formatting method used in many long‑running surveys, and PSPP can read them directly using DATA LIST FIXED with the right setup.

The number of implied decimal places should be defined in the codebook or other documentation supplied for the study. It is usually noted in the variable definition as “IMPLIED DECIMAL” along with the number of decimal places.

This section goes over a short example of implied-decimal data and how to read it with PSPP.

Raw data:

58289
43750
120055

What the codebook should show:

EARN columns 1-6 IMPLIED DECIMAL 2

** PSPP implied decimals example.
DATA LIST FIXED
  /earn 1-6 (2).
BEGIN DATA
58289
43750
120055
END DATA.

LIST.
DISPLAY DICTIONARY.

For implied decimals only the number of decimals is specified in parentheses for that field, such as EARN in this example. PSPP determines the width from the column range.

Run this in psppire (via the Run menu) and it should produce output similar to the following. psppire uses box-drawing characters for tables, while the command-line pspp uses plain ASCII.

** Implied decimals example.
DATA LIST FIXED
  /earn 1-6 (2).
  Reading 1 record from INLINE.
╭────────┬──────┬───────┬──────╮
│Variable│Record│Columns│Format│
├────────┼──────┼───────┼──────┤
│earn    │     1│1-6    │F6.2  │
╰────────┴──────┴───────┴──────╯
BEGIN DATA
58289
43750
120055
END DATA.

LIST.
Data List
╭───────╮
│  earn │
├───────┤
│ 582.89│
│ 437.50│
│1200.55│
╰───────╯

DISPLAY DICTIONARY.
                                    Variables
╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
│Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
│earn│       1│Scale            │Input│    8│Right    │F7.2        │F7.2        │
╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

The LIST output shows that the input values were read with two decimal places applied.
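As a cross-check, the same values can be read as plain integers and scaled by hand. This is only a sketch of an alternative, not a replacement for the implied-decimal format; the variable name earn_raw is introduced here for illustration.

```pspp
* Read the field as an integer, then divide by 100 for two decimal places.
DATA LIST FIXED
  /earn_raw 1-6.
BEGIN DATA.
58289
43750
120055
END DATA.

COMPUTE earn = earn_raw / 100.
FORMATS earn (F7.2).
LIST.
```

The earn column should match the values read with the implied-decimal specification.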

Reading Saved Files

Use GET FILE= to read PSPP system files and portable files. Use GET DATA to read CSV files.

* Reading those rare portable SPSS files.
GET FILE='legacy.por'.

*Reading a PSPP system file.
GET FILE='mydata.sav'.

* Reading a CSV file (CSV is read as delimited text, so TYPE is TXT).
GET DATA
  /TYPE=TXT
  /FILE='data.csv'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /VARIABLES=
    id F8.0
    age F3.0
    gender A1
    score F5.2.

Note: FIRSTCASE tells pspp that the data begin on line 2. The variable names are often on line 1, so that line is skipped. FIRSTCASE is not needed if the file has no header row. This example defines the file as DELIMITED, with the comma as the delimiter and double quotes as the text qualifier.

File Handles in PSPP

File handles help reduce the typing of long path names in the program. Instead, a shorter file handle is defined and after that, the handle can be used to refer to the data file. Below, "survey" refers to "C:\path\to\survey.sav". After the handle is defined in this example, other commands can refer to the file as simply "survey".

* File handle usage.
FILE HANDLE survey /NAME='C:\path\to\survey.sav'.
GET FILE=survey.

When the file handle is no longer needed, close it with CLOSE FILE HANDLE [handle name]. psppire requires closing file handles so they don’t persist between runs and cause file handle errors when the program is rerun.

The entire sequence of using file handles looks like this:

FILE HANDLE demo /NAME='/home/analysis/data/demo.sav'.

GET FILE=demo.

* Do some work here (uses the active file from GET FILE).
FREQUENCIES VARIABLES=age sex income.

CLOSE FILE HANDLE demo.

Reading Less Data for Testing: N OF CASES

When developing syntax on a large data file, processing many thousands of cases just to test a few lines of code wastes time. N OF CASES limits processing to the first N cases of the active dataset, so it is placed after the command that reads the data. This speeds up development and makes debugging easier.

GET FILE=survey.
N OF CASES 15.

* Try out the transformations.
RECODE age income (SYSMIS=0).

* Check a few variables.
FREQUENCIES VARIABLES=age income.

Modify these commands to fit the situation. Once the code is working, remove N OF CASES and run the full analysis. The technique is especially useful when reading large raw data files, testing recodes and COMPUTE statements, checking variable formats, verifying merges, and building syntax incrementally.

Selecting Data for Analysis: SELECT IF

SELECT IF is used to select cases for analysis or testing. Cases that are not selected are removed from the active dataset. If the dataset is saved after this point, the removed cases cannot be recovered except by reloading the original data file.

* Keep only adults with non‑missing income.
SELECT IF (age >= 18) AND (NOT SYSMIS(income)).
EXECUTE.

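SYSMIS(income) is true when income is system‑missing, so NOT SYSMIS(income) keeps only cases that have an income value. It does not detect user‑defined missing values; use MISSING(income) to exclude those as well.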
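Because SELECT IF removes cases permanently, a selection meant for a single procedure can be preceded by TEMPORARY. The sketch below uses a small made-up dataset. TEMPORARY limits the selection to the next procedure, and MISSING() is used in place of SYSMIS() so that user-missing codes, if any were defined, would also be excluded.

```pspp
* Hypothetical data: the third case has a system-missing income.
DATA LIST LIST /age income.
BEGIN DATA.
17 1200
25 30000
40 .
62 45000
END DATA.

* The selection applies only to the next procedure.
TEMPORARY.
SELECT IF (age >= 18) AND (NOT MISSING(income)).
DESCRIPTIVES VARIABLES = income.

* All four cases are still in the active dataset here.
LIST.
```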

Descriptive Statistics

The DESCRIPTIVES command computes means, standard deviations, and other summary statistics. PSPP supports variable lists, subcommands, and formatting options. These are specified through the DESCRIPTIVES command syntax.

DESCRIPTIVES does not read raw data, so if no active dataset is available—or there is no PSPP system file to load—use DATA LIST or GET DATA to read the raw data before running the procedure.

* Short example dataset for DESCRIPTIVES.
DATA LIST FREE 
  /x y z.
BEGIN DATA
5  24  7  
96  8  3
END DATA.
LIST.

DESCRIPTIVES VARIABLES = x y z
  /STATISTICS = MEAN STDDEV MIN MAX.

Output:

     Data List
+-----+-----+----+
|  x  |  y  |  z |
+-----+-----+----+
| 5.00|24.00|7.00|
|96.00| 8.00|3.00|
+-----+-----+----+

                Descriptive Statistics
+--------------------+-+-----+-------+-------+-------+
|                    |N| Mean|Std Dev|Minimum|Maximum|
+--------------------+-+-----+-------+-------+-------+
|x                   |2|50.50|  64.35|   5.00|  96.00|
|y                   |2|16.00|  11.31|   8.00|  24.00|
|z                   |2| 5.00|   2.83|   3.00|   7.00|
|Valid N (listwise)  |2|     |       |       |       |
|Missing N (listwise)|0|     |       |       |       |
+--------------------+-+-----+-------+-------+-------+

This example uses a very small dataset so the results are easy to verify. The LIST command shows that the data were read correctly. Using LIST is always good practice when using DATA LIST FREE. The means and minimum/maximum values can be checked by inspection, and although computing the standard deviations requires a bit more arithmetic, the values reported are consistent with the data.
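The arithmetic for the standard deviation of x can itself be checked in PSPP. This is a sketch that hard-codes the two x values and their mean; the variable names sd_by_hand and sd_func are introduced here for illustration.

```pspp
* Same two cases as the DESCRIPTIVES example.
DATA LIST FREE /x y z.
BEGIN DATA.
5 24 7
96 8 3
END DATA.

* Sample standard deviation of x written out, with mean 50.5 and n - 1 equal to 1.
COMPUTE sd_by_hand = SQRT(((5 - 50.5)**2 + (96 - 50.5)**2) / (2 - 1)).

* The SD function computes the same statistic across its arguments.
COMPUTE sd_func = SD(5, 96).

LIST VARIABLES = sd_by_hand sd_func.
```

Both new variables should agree with the 64.35 that DESCRIPTIVES reports for x.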

If the /SAVE subcommand is included, DESCRIPTIVES also computes Z‑scores for all variables specified and adds them to the active dataset as new variables. This works in PSPP syntax, and psppire includes a checkbox in the DESCRIPTIVES dialog labeled “Save Z‑scores of selected variables as new.”
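A minimal sketch of saving Z-scores with the same small dataset follows. PSPP normally names each new variable by prefixing Z to the original name, so the assumption here is that the new variables will be Zx, Zy, and Zz.

```pspp
DATA LIST FREE /x y z.
BEGIN DATA.
5 24 7
96 8 3
END DATA.

* SAVE adds a Z-score variable for each variable named on VARIABLES.
DESCRIPTIVES VARIABLES = x y z
  /SAVE.

* The new variables appear after the originals.
LIST.
```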

The FORMAT subcommand used by DESCRIPTIVES is accepted for backward compatibility but has no effect in PSPP.

Frequencies

The FREQUENCIES command produces counts and percentages for the values of one or more variables and is often the first procedure run after reading a dataset, because it quickly shows whether the values were read correctly and whether any unexpected categories or codes appear.

FREQUENCIES can also produce summary statistics and simple charts, but in its basic form it is most useful as a data‑checking tool.

DATA LIST FREE /group score.
BEGIN DATA
1 10
1 12
2 15
2 18
2 18
END DATA.

LIST.

FREQUENCIES VARIABLES = group score.

Output:

  Data List
+-----+-----+
|group|score|
+-----+-----+
| 1.00|10.00|
| 1.00|12.00|
| 2.00|15.00|
| 2.00|18.00|
| 2.00|18.00|
+-----+-----+

       Statistics
+---------+-----+-----+
|         |group|score|
+---------+-----+-----+
|N Valid  |    5|    5|
|  Missing|    0|    0|
+---------+-----+-----+
|Mean     | 1.60|14.60|
+---------+-----+-----+
|Std Dev  |  .55| 3.58|
+---------+-----+-----+
|Minimum  | 1.00|10.00|
+---------+-----+-----+
|Maximum  | 2.00|18.00|
+---------+-----+-----+

                             group
+----------+---------+-------+-------------+------------------+
|          |Frequency|Percent|Valid Percent|Cumulative Percent|
+----------+---------+-------+-------------+------------------+
|Valid 1.00|        2|  40.0%|        40.0%|             40.0%|
|      2.00|        3|  60.0%|        60.0%|            100.0%|
+----------+---------+-------+-------------+------------------+
|Total     |        5| 100.0%|             |                  |
+----------+---------+-------+-------------+------------------+

                              score
+-----------+---------+-------+-------------+------------------+
|           |Frequency|Percent|Valid Percent|Cumulative Percent|
+-----------+---------+-------+-------------+------------------+
|Valid 10.00|        1|  20.0%|        20.0%|             20.0%|
|      12.00|        1|  20.0%|        20.0%|             40.0%|
|      15.00|        1|  20.0%|        20.0%|             60.0%|
|      18.00|        2|  40.0%|        40.0%|            100.0%|
+-----------+---------+-------+-------------+------------------+
|Total      |        5| 100.0%|             |                  |
+-----------+---------+-------+-------------+------------------+

The DATA LIST successfully read the data, as shown by the LIST output. FREQUENCIES prints its default statistics next, since no STATISTICS subcommand was given. Of particular interest are the counts of valid and missing values. In this case all values are valid and none are missing, because this small example was constructed to be complete. Real datasets often contain missing or unexpected values, and FREQUENCIES is a quick way to identify them.

Following the statistics output are the frequency tables for group and score. There are two groups and 5 subjects. The score variable shows that 18 occurs twice and the other values occur once. These results can be checked directly against the data.

FREQUENCIES can summarize multiple variables at once, but it treats each variable separately. CROSSTABS and CTABLES can also produce counts for categorical variables, but they’re designed for examining relationships or producing formatted tables, not for quick single‑variable inspection.
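For a variable with many distinct values, the frequency table is often less useful than the summary statistics. The following sketch, using the same data as above, suppresses the table with FORMAT=NOTABLE and asks for a specific set of statistics.

```pspp
DATA LIST FREE /group score.
BEGIN DATA.
1 10
1 12
2 15
2 18
2 18
END DATA.

* Suppress the frequency table and request specific statistics for score.
FREQUENCIES VARIABLES = score
  /FORMAT = NOTABLE
  /STATISTICS = MEAN MEDIAN STDDEV MINIMUM MAXIMUM.
```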

Cross-Tabulation

The CROSSTABS command displays the relationship between two categorical variables by showing the joint distribution of their values. It is often used after FREQUENCIES to check that the categories of each variable combine as expected.

Two-Variable Crosstabs with Counts and Percentages

The simplest use of CROSSTABS is a two‑variable table. The following example shows a basic 2×2 crosstab with counts, percentages, and the chi‑square test.

CROSSTABS can also produce row, column, and total percentages, which make it easier to compare groups. A small dataset is used in this example to show how the values of one variable are distributed within the categories of another. The example also requests the chi‑square test, which PSPP prints below the table.

Syntax and output:

DATA LIST FREE /group outcome.
BEGIN DATA
1 0
1 1
2 0
2 1
2 1
END DATA.
LIST.
CROSSTABS
  /TABLES = group BY outcome
  /CELLS = COUNT ROW COLUMN TOTAL
  /STATISTICS = CHISQ.

   Data List
╭─────┬───────╮
│group│outcome│
├─────┼───────┤
│ 1.00│    .00│
│ 1.00│   1.00│
│ 2.00│    .00│
│ 2.00│   1.00│
│ 2.00│   1.00│
╰─────┴───────╯
CROSSTABS
  /TABLES = group BY outcome
  /CELLS = COUNT ROW COLUMN TOTAL
  /STATISTICS = CHISQ.

                    Summary
╭───────────────┬─────────────────────────────╮
│               │            Cases            │
│               ├─────────┬─────────┬─────────┤
│               │  Valid  │ Missing │  Total  │
│               ├─┬───────┼─┬───────┼─┬───────┤
│               │N│Percent│N│Percent│N│Percent│
├───────────────┼─┼───────┼─┼───────┼─┼───────┤
│group × outcome│5│ 100.0%│0│    .0%│5│ 100.0%│
╰───────────────┴─┴───────┴─┴───────┴─┴───────╯
              group × outcome
╭───────────────────┬─────────────┬──────╮
│                   │   outcome   │      │
│                   ├──────┬──────┤      │
│                   │  .00 │ 1.00 │ Total│
├───────────────────┼──────┼──────┼──────┤
│group 1.00 Count   │     1│     1│     2│
│           Row %   │ 50.0%│ 50.0%│100.0%│
│           Column %│ 50.0%│ 33.3%│ 40.0%│
│           Total % │ 20.0%│ 20.0%│ 40.0%│
│     ╶─────────────┼──────┼──────┼──────┤
│      2.00 Count   │     1│     2│     3│
│           Row %   │ 33.3%│ 66.7%│100.0%│
│           Column %│ 50.0%│ 66.7%│ 60.0%│
│           Total % │ 20.0%│ 40.0%│ 60.0%│
├───────────────────┼──────┼──────┼──────┤
│Total      Count   │     2│     3│     5│
│           Row %   │ 40.0%│ 60.0%│100.0%│
│           Column %│100.0%│100.0%│100.0%│
│           Total % │ 40.0%│ 60.0%│100.0%│
╰───────────────────┴──────┴──────┴──────╯
                                               Chi-Square Tests
╭────────────────────────────┬─────┬──┬──────────────────────────┬─────────────────────┬─────────────────────╮
│                            │Value│df│Asymptotic Sig. (2-tailed)│Exact Sig. (2-tailed)│Exact Sig. (1-tailed)│
├────────────────────────────┼─────┼──┼──────────────────────────┼─────────────────────┼─────────────────────┤
│Pearson Chi-Square          │  .14│ 1│                      .709│                     │                     │
│Likelihood Ratio            │  .14│ 1│                      .710│                     │                     │
│Fisher's Exact Test         │     │  │                          │                1.033│                 .700│
│Continuity Correction       │  .00│ 1│                     1.000│                     │                     │
│Linear-by-Linear Association│  .11│ 1│                      .739│                     │                     │
│N of Valid Cases            │    5│  │                          │                     │                     │
╰────────────────────────────┴─────┴──┴──────────────────────────┴─────────────────────┴─────────────────────╯

The crosstabulation shows how the categories of the two variables combine. This confirms that the values were read correctly and that all expected category combinations appear. The chi‑square tests are printed below the table. In this example the Pearson chi‑square p‑value is 0.709. Whether this is considered evidence against the null hypothesis depends on the analyst’s chosen significance level and the question being asked. With only five cases the chi‑square approximation is rough, and the Fisher two‑tailed value shown as 1.033 cannot be a valid probability, since probabilities do not exceed 1. The important point here is that the procedure ran and produced the expected tables.
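The Pearson chi-square and its p-value can be reproduced from the four cell counts with COMPUTE and a distribution function. This is a sketch that uses the standard shortcut formula for a 2×2 table; the dummy one-case dataset and the variable names chisq and p_chisq are assumptions made only so that COMPUTE has a case to work on.

```pspp
* One dummy case so that COMPUTE has a case to work on.
DATA LIST FREE /dummy.
BEGIN DATA.
1
END DATA.

* Pearson chi-square for a 2x2 table with cells a=1, b=1, c=1, d=2.
* Shortcut formula is N * (a*d - b*c)**2 / (row1 * row2 * col1 * col2).
COMPUTE chisq = 5 * (1*2 - 1*1)**2 / (2 * 3 * 2 * 3).

* Upper-tail probability with 1 degree of freedom.
COMPUTE p_chisq = 1 - CDF.CHISQ(chisq, 1).

FORMATS chisq p_chisq (F8.3).
LIST VARIABLES = chisq p_chisq.
```

The listed values should be close to the .14 and .709 shown in the Pearson Chi-Square row.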

Three-Variable Crosstabs Example

CROSSTABS can include more than two variables. When you specify three variables, PSPP produces a single nested table (for example, AGE × YEAR × SEX) with hierarchical counts and totals. If you include additional variables, PSPP may split the output into multiple tables depending on how much nesting it can format. For more flexible multi‑layered layouts, use CTABLES, which is designed for complex, formatted tables.

Here is a small example of survey data that records YEAR of the survey, AGEGROUP of the respondent, and INCCAT, the respondent's income category. CROSSTABS can display all three variables in one table.

** Survey data example .

  DATA LIST LIST /
  YEAR (F4.0)
  AGEGROUP (A5)
  INCCAT (A7).

Reading free-form data from INLINE.
╭────────┬──────╮
│Variable│Format│
├────────┼──────┤
│YEAR    │F4.0  │
│AGEGROUP│A5    │
│INCCAT  │A7    │
╰────────┴──────╯

BEGIN DATA
2018 18-29 <30k
2018 18-29 30-60k
2018 30-44 30-60k
2018 45-64 60-90k
2020 18-29 30-60k
2020 30-44 60-90k
2020 30-44 90k+
2020 45-64 30-60k
END DATA.

CROSSTABS
  /TABLES = AGEGROUP BY INCCAT BY YEAR.

                         Summary
╭────────────────────────┬─────────────────────────────╮
│                        │            Cases            │
│                        ├─────────┬─────────┬─────────┤
│                        │  Valid  │ Missing │  Total  │
│                        ├─┬───────┼─┬───────┼─┬───────┤
│                        │N│Percent│N│Percent│N│Percent│
├────────────────────────┼─┼───────┼─┼───────┼─┼───────┤
│AGEGROUP × INCCAT × YEAR│8│ 100.0%│0│    .0%│8│ 100.0%│
╰────────────────────────┴─┴───────┴─┴───────┴─┴───────╯


                   AGEGROUP × INCCAT × YEAR
╭──────────────────────────────┬───────────────────────┬─────╮
│                              │         INCCAT        │     │
│                              ├──────┬──────┬────┬────┤     │
│                              │30-60k│60-90k│90k+│<30k│Total│
├──────────────────────────────┼──────┼──────┼────┼────┼─────┤
│YEAR 2018 AGEGROUP 18-29 Count│     1│     0│    │   1│    2│
│                  ╶───────────┼──────┼──────┼────┼────┼─────┤
│                   30-44 Count│     1│     0│    │   0│    1│
│                  ╶───────────┼──────┼──────┼────┼────┼─────┤
│                   45-64 Count│     0│     1│    │   0│    1│
│         ╶────────────────────┼──────┼──────┼────┼────┼─────┤
│          Total          Count│     2│     1│    │   1│    4│
│    ╶─────────────────────────┼──────┼──────┼────┼────┼─────┤
│     2020 AGEGROUP 18-29 Count│     1│     0│   0│    │    1│
│                  ╶───────────┼──────┼──────┼────┼────┼─────┤
│                   30-44 Count│     0│     1│   1│    │    2│
│                  ╶───────────┼──────┼──────┼────┼────┼─────┤
│                   45-64 Count│     1│     0│   0│    │    1│
│         ╶────────────────────┼──────┼──────┼────┼────┼─────┤
│          Total          Count│     2│     1│   1│    │    4│
╰──────────────────────────────┴──────┴──────┴────┴────┴─────╯

While CROSSTABS summarizes how categories combine, many analyses require modeling a quantitative outcome using one or more predictors. PSPP’s REGRESSION command performs ordinary least squares estimation for this purpose. The following section shows a simple regression example and the standard output produced by PSPP.

Linear Regression

The REGRESSION command fits an ordinary least squares model to a continuous dependent variable using one or more predictors. After checking the variables with FREQUENCIES and CROSSTABS, regression shows how the outcome relates to the predictors. PSPP provides the standard coefficients and model statistics used in OLS (ordinary least squares).

DATA LIST LIST /
    y  x1  x2.
BEGIN DATA
  10  1  4.1
  12  2  5.0
  13  3  6.2
  15  4  7.1
  16  5  8.0
END DATA.
LIST.

REGRESSION
  /DEPENDENT = y
  /METHOD = ENTER x1 x2
  /STATISTICS = COEFF R ANOVA
  /SAVE = PRED RESID.

* List the data again to see the saved residuals and predicted values.
LIST.

The /SAVE subcommand above creates two additional variables: PRED1 (predicted values) and RES1 (residuals). Many statistics courses require students to examine and plot the residuals and predicted values to check model assumptions, and saving them from REGRESSION provides what those assignments need.

Output:

    Data List
╭─────┬────┬────╮
│  y  │ x1 │ x2 │
├─────┼────┼────┤
│10.00│1.00│4.10│
│12.00│2.00│5.00│
│13.00│3.00│6.20│
│15.00│4.00│7.10│
│16.00│5.00│8.00│
╰─────┴────┴────╯
                     Model Summary (y)
╭───┬────────┬─────────────────┬──────────────────────────╮
│ R │R Square│Adjusted R Square│Std. Error of the Estimate│
├───┼────────┼─────────────────┼──────────────────────────┤
│.99│     .99│              .98│                       .37│
╰───┴────────┴─────────────────┴──────────────────────────╯
                      ANOVA (y)
╭──────────┬──────────────┬──┬───────────┬─────┬────╮
│          │Sum of Squares│df│Mean Square│  F  │Sig.│
├──────────┼──────────────┼──┼───────────┼─────┼────┤
│Regression│         22.53│ 2│      11.27│84.50│.012│
│Residual  │           .27│ 2│        .13│     │    │
│Total     │         22.80│ 4│           │     │    │
╰──────────┴──────────────┴──┴───────────┴─────┴────╯
                               Coefficients (y)
╭──────────┬────────────────────────────┬─────────────────────────┬────┬────╮
│          │ Unstandardized Coefficients│Standardized Coefficients│    │    │
│          ├───────────┬────────────────┼─────────────────────────┤    │    │
│          │     B     │   Std. Error   │           Beta          │  t │Sig.│
├──────────┼───────────┼────────────────┼─────────────────────────┼────┼────┤
│(Constant)│      12.16│            6.92│                      .00│1.76│.177│
│x1        │       2.60│            2.20│                     1.72│1.18│.359│
│x2        │      -1.11│            2.22│                     -.73│-.50│.667│
╰──────────┴───────────┴────────────────┴─────────────────────────┴────┴────╯

LIST.

          Data List
╭─────┬────┬────┬────┬─────╮
│  y  │ x1 │ x2 │RES1│PRED1│
├─────┼────┼────┼────┼─────┤
│10.00│1.00│4.10│-.20│10.20│
│12.00│2.00│5.00│ .20│11.80│
│13.00│3.00│6.20│-.07│13.07│
│15.00│4.00│7.10│ .33│14.67│
│16.00│5.00│8.00│-.27│16.27│
╰─────┴────┴────┴────┴─────╯

The output includes the standard tables for an ordinary least squares model. The Model Summary shows an R of 0.99 and an R‑square of 0.99 for this small example. The Coefficients table lists each predictor along with its estimate, standard error, t-statistic, and p‑value. These values confirm that the procedure ran correctly and that PSPP produced the expected regression statistics. Interpretation of the coefficients depends on the context and the analyst’s goals; the purpose here is simply to show how PSPP fits the model and reports the results.
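The saved variables can be used for the residual checks mentioned earlier. The following is a minimal sketch, assuming the PRED1 and RES1 variables created by /SAVE are still in the active dataset and that output goes somewhere that can render charts (psppire, or pspp writing to a format such as PDF). It uses the scatterplot and histogram forms of the GRAPH command.

```pspp
/* Residuals against predicted values: look for curvature or funnel shapes */
GRAPH /SCATTERPLOT(BIVARIATE) = PRED1 WITH RES1.

/* Distribution of the residuals with a normal curve overlaid */
GRAPH /HISTOGRAM(NORMAL) = RES1.
```

With only five cases the plots themselves say little; the point is the pattern, which carries over unchanged to larger datasets.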

Computing the P-value for F Manually

PSPP can be used to compute the p-value for the F-statistic. The CDF.F function returns the cumulative distribution function of the F distribution, that is, the probability that an F-distributed variable is less than or equal to the specified value.

The following calculates the upper tail probability for an F value of 84.50 with 2 model and 2 error degrees of freedom, as shown in the ANOVA table from the regression output.

DO IF $CASENUM = 1.
  COMPUTE p_F = 1 - CDF.F(84.50, 2, 2).
END IF.
FORMATS p_F (F10.6).
LIST p_F /CASES=FROM 1 TO 1.

Data List
╭───────╮
│  p_F  │
├───────┤
│.011696│
╰───────╯

The computed p-value is 0.011696, which matches the Sig. value of .012 in the PSPP ANOVA table after rounding. The DO IF block runs the COMPUTE expression only for the first case, and the LIST command displays only that case. All remaining cases contain system-missing values for p_F.
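As an independent check on CDF.F, the upper tail of an F distribution has a simple closed form when the numerator degrees of freedom equal 2: P(F > f) = (1 + 2f/df2) ** (-df2/2). This identity is not part of the original example; it is a standard result added here only for verification. A minimal sketch, assuming the same active dataset as above and a hypothetical variable name p_check:

```pspp
/* Closed form for df1 = 2, using F = 84.50 and df2 = 2 */
DO IF $CASENUM = 1.
  COMPUTE p_check = (1 + 2 * 84.50 / 2) ** (-2 / 2).
END IF.
FORMATS p_check (F10.6).
LIST p_check /CASES=FROM 1 TO 1.
```

With df2 = 2 the expression reduces to 1 / 85.5, which is about 0.011696 and agrees with the CDF.F result.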

The remaining sections cover data formats, exporting, psppire behavior, and file‑combining. These topics are not part of the basic procedures shown above, but they are useful when working with real datasets and larger projects. psppire is introduced briefly above so new users can get started; its detailed behavior and limitations are covered in several later sections.

 


Notes and Limitations


 

Up to this point we have seen DATA LIST FIXED, DATA LIST FREE and GET DATA used to read ordinary raw data files. Another kind of raw data that still appears in practice is the multi‑record data format.

Multi-Record Data Format with PSPP

Multi-record data is fixed-width data where each case spans multiple lines in a fixed sequence. Data like this is still encountered in industries where legacy reporting systems, long-running surveys, or fixed-width exports are produced, or where efficient storage is important.

Here are two screenshots of parts of the 1980 PUMS codebook showing two different record types (Person and Housing):

1980 PUMS Codebook example (P Records):

PUMS P Record screenshot

1980 PUMS Codebook example (H Records):

PUMS H Record screenshot

PSPP can read this kind of multi-record data directly. This section shows how. There is no need to preprocess this data to get it into a dataset.

You may hear these referred to as multi-record case files, multi-record data files, stacked data, or (historically) hierarchical data files. This format originated when data were stored on punched paper cards, which limited the width of the data. Instead of making each record wider, this format made it taller by spreading one case across multiple fixed-width cards in a defined sequence.

This format stores a large number of variables and values in a compact form, spreading each case across multiple physical records. However, the format is not self-documenting. External documentation is required to interpret the file, usually in the form of a codebook defining how the data were collected, what the cards represent, the variable names, and the column locations, or PSPP code that reads the data correctly.

Here is an example of multi-record data:

044104101712288855439855238255785555221010000010000000005080
805060909080808090500000708000005092543214424122111432310100
000200000000080908080810090909100800000007070000051015441344
2412311243221
087107002688878693388888577338897238210020001020001010308090
810071008070506060906030607020505092444443225332444554400200
000100010001081009090810100708090707060304070302050845444421
1522124444441

Each block in the example above consists of four records, or "cards." Because the responses are single digits packed side by side, it is nearly impossible to tell by eye which value belongs to which variable. Different variables appear on different cards (lines in the file), so this file cannot be read vertically as a simple rectangular dataset or CSV file.

PSPP can handle this type of data by reading each record with a separate set of input statements for each card. It then combines the values from all the records into a single (wide) case.

Tip: Every Record Must be Read or Explicitly Skipped

A multi-record file is like a stack of cards. PSPP reads them in order, one record at a time. A card that is not needed for analysis can be left out of the variable definitions, but it cannot be left out of the read pattern: it must either be read or be explicitly skipped.

Every card must be accounted for. Removing a card is like removing variables from the middle of a DATA LIST FREE statement: everything after the gap shifts left, and the data becomes misaligned. In multi-record files, this misalignment appears vertically.

Multi‑record files follow a vertical, sequential pattern that is not obvious at first glance. Without understanding the data file structure, it is easy to make assumptions that don’t match how these files must be read.

This page shows how to read a multi‑record file with PSPP and export the assembled dataset to CSV or SAV. Once in a modern format, the data can be used in modern analysis tools.

Before looking at PSPP syntax, it helps to see the basic pattern of a multi-record file. Each card has its own layout, and PSPP reads them in order:

- card 1 → /1 variables (Subject IDs and Treatments)
- card 2 → /2 variables (Attitudes, Part 1)
- card 3 → /3 variables (Attitudes, Part 2)
- etc.

This dataset has 4 cards per case. Two of the cards are not needed for analysis, so PSPP skips them using blank slash lines.

DATA LIST FILE=studydat RECORDS=4 NOTABLE 
 /1 SUBJECT 1-3  STUDY 4  TRTMENT 5-6
    FEEL1 10  FEEL2 11  FEEL3 12
 /2 LIKE1 1  LIKE2 2  LIKE3 3
    LIKE4 4  LIKE5 5  LIKE6 6  LIKE7 7  LIKE8 8
    LIKE9 9  LIKE10 10  ACTU11 11  ACTU12 12  ACTU13 13
 /
 /
 .

Each bare slash (/) tells PSPP to advance one record without reading any variables from it. To skip two cards, enter two slashes, then end the command with a period.

This code illustrates how DATA LIST (and DATA LIST FIXED) reads multi-record data. RECORDS=4 tells PSPP that each case consists of four cards, even though two of the cards are not used for analysis.

Each card is defined by a /N and followed by variable names and their column locations on that card. Note that the column locations differ across the cards because each card has its own layout. In this example, /1 and /2 define the two cards being read.

Output:

                                  Data List                                                                             
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|  
|     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|  
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

+-----+-----+-----+------+------+------+------+
|LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
+-----+-----+-----+------+------+------+------+
|    9|    0|    9|     0|     8|     0|     8|  
|    0|    0|    8|     0|     7|     0|     5|  
+-----+-----+-----+------+------+------+------+

                                   Variables
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|       |        |  Measurement  |     |     |         |   Print   |   Write  |
|Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |   
|STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |   
|FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
+-------+--------+---------------+-----+-----+---------+-----------+----------+

The two cases, each originally spread across four records, have been assembled into two cases of one row each.
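The same pattern can be tried without an external file. The following is a small self-contained sketch using inline data; the two-card layout and the variable names (ID, GROUP, Q1 to Q3) are invented for illustration and are not part of the study file above.

```pspp
/* Hypothetical layout: card 1 holds ID and GROUP, card 2 holds Q1 to Q3 */
DATA LIST FIXED RECORDS=2
  /1 ID 1-3 GROUP 4
  /2 Q1 1 Q2 2 Q3 3.
BEGIN DATA
0011
345
0022
678
END DATA.
LIST.
```

LIST should show two cases built from four lines of data: ID 1 in GROUP 1 with items 3, 4, 5, and ID 2 in GROUP 2 with items 6, 7, 8.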

Moral of the story: Yes, PSPP can read multi-record data, and anyone who says it can't is spreading misinformation. Please stop. It has been able to do this via DATA LIST FIXED for many, many years, at least as far back as v0.82 of PSPP. Even psppire can run the code above, although the feature does not appear in any GUI menu.

Note: Although the PSPP user manual describes a /rec# form of GET DATA for multi-record fixed-column files, PSPP does not currently accept this syntax. Multi-record files are supported, but only through DATA LIST FIXED using the RECORDS= subcommand.

Prevent errors when the data don’t match the expected pattern

For anyone unfamiliar with the vertical, sequential reading pattern that multi-record files depend upon, the format may seem unusual at first. The structure itself is not the problem. It is an efficient, compact way to store data, and it still works well today. But because it relies on reading multiple cards of data in the correct sequence, even a small slip can create problems that are hard to detect.

Just because it looks okay doesn’t always mean it is.

If a card is missing, out of order, or removed from the read pattern, the software doesn't complain. It simply reads the next physical record as the next logical one, and everything after that is wrong. The first card may appear correct, but the misalignment often becomes apparent only on later cards. Because PSPP cannot infer what was intended, it is essential to run FREQUENCIES on a few key variables drawn from several cards to confirm that the data were read correctly.

When all variables are single digits, a one-off shift can still produce values that look plausible. In those cases, FREQUENCIES may not reveal the problem because the range of values is unchanged; only the counts shift. This is where having the original, documented frequencies or an original filled-out survey form is essential. When FREQUENCIES cannot be trusted to expose a misread due to complexity, the only dependable method is to check the values by hand to see where the read went wrong.
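The checks described above might look like the following sketch. It assumes the variables defined in the DATA LIST example earlier in this section; the particular variables chosen are only a suggestion, picked so that both cards that were read are represented.

```pspp
/* Spot-check variables from card 1 and card 2 */
FREQUENCIES /VARIABLES=STUDY TRTMENT FEEL1 LIKE1 ACTU13.

/* Compare the first few assembled cases against the raw file by eye */
LIST /VARIABLES=SUBJECT FEEL1 LIKE1 ACTU13 /CASES=FROM 1 TO 5.
```

The LIST output is the one to hold next to the raw data or a completed survey form when the frequencies alone look plausible.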

Recoding Variables using RECODE

The RECODE command can be used to change values in a variable or create new values in a new variable.

In the next two examples, the recoded values are strings, so PSPP requires creating a string variable to hold them. Anything not recoded is given the System Missing value (SYSMIS) because the code includes ELSE=SYSMIS.

This example also includes variable labels, value labels, and variable level definitions.

/* Create  GENDERC wide enough to accept the longest value then recode.*/
STRING GENDERC (A8). 
RECODE GENDER (2="Female") (1="Male") (ELSE=SYSMIS) into GENDERC.

/* Apply labels to the new variable */
VARIABLE LABELS     
  GENDER "Gender of Respondent"
  GENDERC "Gender of Respondent (Char)".

/* Value labels apply to the numeric variable (GENDER), not the string version. */
VALUE LABELS
  GENDER   
    1 "Male"    
    2 "Female"
    .

VARIABLE LEVEL
    GENDER  (NOMINAL)
    GENDERC (NOMINAL).

/* Create a text Treatment var (TRTA) */
STRING TRTA(A1).
RECODE TRTMENT (1="A") (2="B") (3="C") (4="D") (5="E") (6="F") (7="G") 
               (8="H") (9="I") (10="J") 
               (ELSE=SYSMIS) into TRTA.

VARIABLE LABELS TRTA "Treatment Assignment (Char)".
VARIABLE LEVEL TRTA (NOMINAL).

/* Reverse these scales */
RECODE 
   CONSIST TO KEEP_RID       /* First consecutive block */
   UNCERT TO POWER           /* Second consecutive block */
   MORALITY CONTROL CAUS_YOU /* Individual variables */
   (1=9) (2=8) (3=7) (4=6) (5=5) (6=4) (7=3) (8=2) (9=1) 
   (ELSE=SYSMIS).

This scale-reversing recode changes the numeric values in the same variables it reads from (CONSIST TO KEEP_RID, UNCERT TO POWER, and the individual variables MORALITY, CONTROL, and CAUS_YOU). Because those variables already exist and no INTO clause is used, no new variables are needed. Anything not recoded is given the System Missing value (SYSMIS).
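On a 1 to 9 scale, reversing is the same as subtracting each value from 10. The sketch below shows that arithmetic alternative for the three individually listed variables only. It is a substitute for the RECODE above, not an addition to it, since running both would reverse the items twice. Unlike the RECODE, it does not set out-of-range values to system-missing.

```pspp
/* Reverse a 1-9 scale arithmetically: 1 becomes 9, 5 stays 5, 9 becomes 1 */
DO REPEAT v = MORALITY CONTROL CAUS_YOU.
  COMPUTE v = 10 - v.
END REPEAT.
EXECUTE.
```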

This example also uses "TO" in the variable list. TO is useful for shortening long variable lists and reducing typing, but it only works on variables that are consecutive in the data set. That is why this example has a break in the variable list. CONSIST TO KEEP_RID covers one consecutive block, UNCERT TO POWER covers another, and MORALITY CONTROL CAUS_YOU are variables not part of any consecutive sequence, so they must be listed separately.

A variable list like CONSIST TO CAUS_YOU causes PSPP to include every variable between those names in dictionary order — often far more than intended.

The DISPLAY DICTIONARY command shows the variable order, making it easy to confirm which variables are consecutive before using TO.

RECODE works on both string to numeric coding and numeric to string coding.
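The string-to-numeric direction can be sketched by reversing the GENDERC example. GENDERN is a hypothetical variable name; the sketch assumes GENDERC holds exactly the strings "Male" and "Female" as created earlier, and that any other value should leave the new variable system-missing.

```pspp
/* GENDERN is a hypothetical new numeric variable created by INTO */
RECODE GENDERC ("Male"=1) ("Female"=2) INTO GENDERN.
EXECUTE.
```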

Using COMPUTE to Create Values

COMPUTE creates a new variable or replaces the values of an existing one. It is used for simple arithmetic, scale scores, or system variables such as $CASENUM.

/* Compute an index number based on current case number */
COMPUTE IDN = $CASENUM.

/* Average three items */
COMPUTE SITUATN = (CONSIST + WANTEDBY + IMPROVED) / 3.

/* Average three items using a function */
COMPUTE SITUATN = MEAN(CONSIST, WANTEDBY, IMPROVED).

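These examples illustrate the main patterns of COMPUTE: using system variables like $CASENUM, creating scale scores with arithmetic, and using functions inside a transformation. The two SITUATN versions differ in how they treat missing data: the arithmetic version returns system-missing if any of the three items is missing, while MEAN() averages whichever items are valid. Most PSPP transformations follow one of these forms.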
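When a scale score should only be computed if enough items were answered, statistical functions such as MEAN accept a numeric suffix giving the minimum number of valid arguments. A minimal sketch, where the threshold of two valid items is an arbitrary choice made for illustration:

```pspp
/* Require at least 2 valid items; otherwise SITUATN is system-missing */
COMPUTE SITUATN = MEAN.2(CONSIST, WANTEDBY, IMPROVED).
```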

PSPP Function Examples

PSPP has many functions that can be used to work on data. Here are five that are used frequently.

1. MEAN() — the workhorse

Handles missing values gracefully and uses the “function inside COMPUTE” pattern.

COMPUTE avg_score = MEAN(v1, v2, v3, v4).

2. SUM() — simple, predictable, and widely needed. Good for scales, counts, and composite scores.

COMPUTE total = SUM(item1, item2, item3).

3. DATEDIFF() — essential for age and time intervals. This one solves a real problem for archivists and anyone working with dates.

COMPUTE age = DATEDIFF($TIME, birthdate, "years").

$TIME is the system variable that holds the current date and time. DATEDIFF returns the difference between two dates in whole units, without adding 1. If inclusive counting is needed, add 1 manually.

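4. LTRIM() / RTRIM() — string cleanup. These remove leading or trailing spaces from a string variable's values. The target string variable must be declared before COMPUTE can assign to it.

/* The A20 width is illustrative; size it to fit the longest name */
STRING clean_name (A20).
COMPUTE clean_name = RTRIM(LTRIM(name)).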

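5. SD() — the standard deviation across a list of values within each case. Useful for quick diagnostics, such as spotting respondents who give the same answer to every item.

COMPUTE item_sd = SD(item1, item2, item3).

Like MEAN() and SUM(), SD() works across its arguments within a single case, not down a column, so an expression such as SD(score) cannot produce z-scores. For z-scores, use DESCRIPTIVES with the /SAVE subcommand, which adds a standardized copy of each listed variable.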

PSPP includes hundreds of functions across math, statistics, strings, dates, logical tests, and data transformations. Only a few were shown here; the full list is in the PSPP manual, which is also linked at the end of this page.

Exporting and Saving PSPP Data (CSV, POR, SAV)

PSPP can export data in several formats using syntax, which is the most reproducible way to create files for use in other software. The examples below show how to write datasets in common formats. psppire also supports exporting through its File->Export dialog; its available formats are described in a later section.

/* Set the working directory to avoid path errors when saving files. */
CD '/home/user/analysis'.

SAVE OUTFILE='spss/emo.sav'.

EXPORT OUTFILE='spss/emo.por'.

/* Write a CSV file with variable names in the first row. */
SAVE TRANSLATE
  /OUTFILE='spss/emo.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE.

The commands above show how to write data files in several formats using syntax. SAVE creates an SPSS system file (.sav), and EXPORT writes a portable file (.por) for transferring data to other applications that support this format. The SAVE TRANSLATE command writes a CSV file with variable names in the first row; the REPLACE option allows it to overwrite an existing file. SAVE TRANSLATE can also write tab‑separated files by specifying TYPE=TAB.
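The tab-separated variant mentioned above differs from the CSV example only in the TYPE setting. A minimal sketch, where the output file name emo.tsv is an assumption chosen for illustration:

```pspp
/* Same export as the CSV example, written as tab-separated text */
SAVE TRANSLATE
  /OUTFILE='spss/emo.tsv'
  /TYPE=TAB
  /FIELDNAMES
  /REPLACE.
```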

PSPP syntax can be run either from a shell command line or by pasting it into psppire’s Syntax Editor. psppire can also export the assembled data directly through its File->Export dialog, which provides a subset of the formats available in PSPP syntax. The following section describes psppire itself in more detail.

psppire: The PSPP Graphical Interface

Many PSPP users prefer psppire, the graphical interface that resembles the SPSS Data Editor. psppire is useful for data entry, quick exploration, and running common procedures without writing syntax. However, it does not expose all of PSPP's capabilities, and some dialogs correspond to syntax that is only partially implemented.

For reproducible analysis, batch processing, or advanced procedures, PSPP syntax is still the recommended approach. psppire can generate syntax for many commands, which can then be copied, edited, and reused in syntax files. But there are many more PSPP commands outside the GUI that may be needed from time to time.

For example, merging datasets (MATCH FILES) is a standard PSPP operation, but it is not currently available through psppire’s menus.

psppire is ideal for learning the syntax by example, but complex workflows are best handled directly in syntax files, where every PSPP command is available. The psppire menus cover only the commands that have dialogs, although any PSPP command can still be run from psppire's Syntax Editor.

psppire and File Handles vs. Rerunning Programs

psppire keeps FILE HANDLE definitions for the duration of the session. If a file handle is defined in syntax and then the program is rerun, psppire will report that the handle is already in use. This does not happen when running PSPP from the command line, because each run starts a fresh session.

When using FILE HANDLEs to give files meaningful names, add CLOSE FILE HANDLE commands at the end of the program. This removes the handles so the syntax can be rerun without errors about handles in use.

CLOSE FILE HANDLE demo.
CLOSE FILE HANDLE psych.
CLOSE FILE HANDLE out.

Closing the file handles prevents the “handle already in use” error and allows rerunning the syntax in psppire without restarting the application.
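Put together, the life cycle of a handle looks roughly like the sketch below. The handle name demo matches the CLOSE commands above, but the file it points to and the variable analyzed (demo.sav and SEX, borrowed from the merging example later on this page) are assumptions made for illustration.

```pspp
/* Define a handle, use it, then release it so the program can be rerun */
FILE HANDLE demo /NAME='demo.sav'.
GET FILE=demo.
FREQUENCIES /VARIABLES=SEX.
CLOSE FILE HANDLE demo.
```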

psppire Export Formats

psppire can export the contents of the Output window in several formats. Only the formats shown in the File->Export menu are supported; if a format is not listed, psppire does not produce it. The export format is determined by the output file extension entered in the dialog. For example, entering "myfile.pdf" in the export dialog causes psppire to generate a PDF file.

Rich / Page‑Description Formats (cairo‑based)

These preserve layout, fonts, borders, and the exact appearance of the Output window.

Structured / Document Formats

These formats preserve tables and structure, suitable for editing or further processing.

Plain Data Formats

These formats provide minimal formatting and are useful for scripting, data interchange or raw text.

Unsupported Formats

Formats not shown in the File->Export menu are not implemented in the current version of psppire (or the one being used).

Choosing a Format

Merging Files (MATCH FILES)

Merging data files is a routine part of data processing. Additional variables often come from a different file, for example. PSPP's MATCH FILES command merges them into the main file for analysis or reporting purposes.

psppire does not currently provide dialogs for merging datasets. MATCH FILES is a standard PSPP command, but it must be run from PSPP syntax.

The following example shows how to merge two datasets that share a common key variable (PATID). One file contains demographic variables; the other contains psychological measures. MATCH FILES combines them into a single dataset by matching cases on PATID.

Example datasets

demo.sav

PATID  AGE  SEX
101    34   1
102    29   2
103    41   1

psych.sav

PATID  SCORE1  SCORE2
101    12      18
102    15      20
103    11      17

Merged Output

PATID  AGE  SEX  SCORE1  SCORE2
101    34   1    12      18
102    29   2    15      20
103    41   1    11      17

MATCH FILES
  /FILE='demo.sav'
  /FILE='psych.sav'
  /BY PATID.
EXECUTE.

SAVE OUTFILE='merged.sav'.

PSPP matches cases from both files using PATID, so both files must already be sorted by PATID. Variables from demo.sav and psych.sav appear together in the merged dataset. Because both inputs are named on /FILE, a case that appears in one file but not the other still produces a case in the result, with the variables from the other file set to system-missing. Naming psych.sav on /TABLE instead would treat it as a lookup table: its variables would be added to matching demo.sav cases, but it would contribute no cases of its own.

The result of MATCH FILES becomes the active dataset. It should be saved (for example, with SAVE OUTFILE='merged.sav') if it is needed later. PSPP system files contain the data set with all its variable attributes.

PSPP always has exactly one active file, referred to in syntax as * (asterisk). Commands such as GET FILE, MATCH FILES, and ADD FILES replace the active file with their result. Because the active file is overwritten whenever a new dataset is read or created, save any results you want to keep before running another command that changes the active file.

What happens when there is not a one-to-one merge? Subject 104 has been added to psych.sav but not to demo.sav. In this case, PATID 104 is merged but that subject's demographic variables are all missing.

Revised psych.sav

PATID  SCORE1  SCORE2
101    12      18
102    15      20
103    11      17
104    14      19

Merged Output

PATID  AGE  SEX  SCORE1  SCORE2
101    34   1    12      18
102    29   2    15      20
103    41   1    11      17
104    .    .    14      19

If a PATID value appears in one file but not the other, and both inputs are named on /FILE, MATCH FILES still creates a case in the merged dataset. This is not a missing file or a missing variable; it is a missing case in one of the datasets. PSPP has no values to supply for that side of the merge, so the variables from the file where the case is absent are set to system-missing. This is the safest behavior: it preserves the case without inventing zeros or placeholder values.

Flags created with /IN= track which file each case came from. An /IN variable is set to 1 if the case was present in that file, and 0 if it was not. This makes it easy to identify unmatched cases after the merge.

MATCH FILES
  /FILE='demo.sav'  /IN=indemo
  /FILE='psych.sav' /IN=inpsych
  /BY PATID.

SAVE OUTFILE='merged.sav'.

Merged Output with /IN= Flags

PATID  AGE  SEX  SCORE1  SCORE2  indemo  inpsych
101    34   1    12      18      1       1
102    29   2    15      20      1       1
103    41   1    11      17      1       1
104    .    .    14      19      0       1
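
Once the flags exist, the unmatched cases can be pulled out directly. A minimal sketch, assuming the merged dataset with indemo and inpsych is still the active file; TEMPORARY keeps the selection from permanently dropping the matched cases.

```pspp
/* List only the cases that were missing from at least one input file */
TEMPORARY.
SELECT IF (indemo = 0 OR inpsych = 0).
LIST /VARIABLES=PATID indemo inpsych.
```

For the merged output shown above, this lists only PATID 104.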

In addition to MATCH FILES, PSPP also provides ADD FILES to append data from multiple files to the active data set, and UPDATE, which updates a master file with modifications from a transaction file.
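
Appending cases with ADD FILES follows the same shape as MATCH FILES. In this sketch the file names wave1.sav, wave2.sav, and allwaves.sav are hypothetical, and the two input files are assumed to contain the same variables.

```pspp
/* Stack the cases of two files that share the same variables */
ADD FILES
  /FILE='wave1.sav'
  /FILE='wave2.sav'.
SAVE OUTFILE='allwaves.sav'.
```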

A note about copy-and-paste merging:

The /IN= flags above show exactly what goes wrong when the data in two files doesn't match perfectly. A statistical merge can make these mismatches visible and preserve the structure of the data but only if you use the tools designed for that purpose. A copy‑and‑paste merge cannot. In a spreadsheet, unmatched cases, misaligned rows, and missing variables are all silent — you won't see them, and you won't know they happened. Copy/paste may appear to “work” on perfectly clean data, but the first time the files differ even slightly, the structure is destroyed and information is lost. This is why merges should always be done by a statistical package like PSPP, not by hand. Understand your data, understand its structure, and always verify your merges.

APPENDIX

Further Reading

The PSPP user manual provides the full reference for all commands, functions, and procedures available in PSPP. It is worth consulting when details beyond the examples shown here are needed, or when exploring options available in the psppire menus.

Installing PSPP

PSPP is currently at version 2.1.1. After several important bug fixes in the last several years (closing file handles, writing portable files, reading multi-record data with skipped cards in DATA LIST, and other issues) and the addition of the GLM and CTABLES commands, PSPP is more capable than ever. If your version is older than 2.1.1, consider upgrading.

PSPP download page at gnu.org. The GNU site gives instructions for installing PSPP on Windows, Mac, Debian, Ubuntu, and Fedora, and with Flatpak.

PSPP source can also be obtained with git: savannah.gnu.org/git

FreeBSD: PSPP is not in the FreeBSD ports collection (expired 2025-03-01 after being marked broken). FreeBSD users can still install PSPP by building it from the official GNU source release. The PSPP build instructions work on FreeBSD. See pspp/INSTALL after cloning with Git for the details.