PSPP Syntax Guide: Examples and Notes
PSPP is a free, open-source alternative to SPSS. It supports many of the same statistical procedures, uses a similar syntax language, and is well-suited for batch processing, scripting, reproducible analysis, as well as interactive use.
About This Page
This page collects practical, syntax‑first examples for PSPP. It focuses on commands people actually use: DATA LIST, REGRESSION, CROSSTABS, multi‑record (“stacked”) data handling, and small workflow techniques such as using functions and computing p‑values from ANOVA output. The goal is to give clear, reproducible patterns you can adapt to your own work.
Note: A common misconception is that PSPP lacks support for multi‑record or hierarchical data files. In fact, PSPP has long supported reading multi‑record data using DATA LIST FIXED with RECORDS= and slash notation. "Stacked" data handling as used here is unrelated to modern ‘data stacking’ (VARSTOCASES), which restructures data into long format.
Programming with PSPP
PSPP uses a syntax language similar to SPSS and supports batch processing, scripting, and reproducible analysis. You can write syntax in any text editor and save it as .sps. If you prefer a GUI, the psppire Syntax Editor can open, edit, and run these files, but the examples here assume a syntax‑driven workflow.
psppire: The Graphical Interface for pspp
psppire is a graphical interface for the PSPP engine. It’s useful for exploring data, running common procedures, and generating basic syntax. Many users begin in psppire and then copy the generated syntax into the Syntax Editor or a program file.
Some PSPP commands and options are not available in the psppire menus. When features that the GUI doesn’t expose are needed, syntax can be entered directly in a syntax window or an .sps file can be run with pspp. psppire runs all PSPP syntax, not just what appears in the dialogs. Thus, psppire can be used to develop and run PSPP syntax.
psppire does not save syntax or workflow automatically. If the program is closed without saving, any syntax entered or generated during the session cannot be recovered. psppire generally prompts you to save before closing, but do not rely on the prompt: save the syntax whenever you need to reproduce or debug an analysis.
This section covers only the basics needed to open psppire, capture syntax, and run syntax. A more detailed review of psppire’s behavior and limitations appears later in this page.
Launching psppire depends on your operating system. On most systems it appears in the standard applications menu; on Unix‑like systems it can also be started from a terminal by running psppire.
psppire Syntax Editor - How to Open It
A new PSPP program can be created using File->New->Syntax, which opens an empty Syntax Editor window where commands can be entered and executed. The Run menu executes the commands in the window, either all code in the Editor or only the selected code. Files should be saved if they will be used later.
Many dialogs in psppire include a "Paste" button. Clicking it opens the Syntax editor (if not already open) and inserts the syntax corresponding to the dialog choices. This is a quick way to learn PSPP syntax and to refine or extend the generated commands. Files should be saved if further work is planned.
Opening Existing Files
To open existing .sps files, use File->Open. The dialog displays PSPP data files and .sps syntax files by default. To display only syntax files, choose Syntax Files (*sps) from the filter menu in the lower right.
Creating a new syntax file via File->Open requires an empty .sps file to exist beforehand; otherwise, File->New->Syntax is simpler.
About PSPP Syntax
PSPP syntax uses commands terminated by periods. Commands may span multiple lines. Comments begin with an asterisk (*) or may be written as block comments enclosed in /* and */. Most data management and analysis operations can be expressed in this format, whether run from the command line or from psppire’s Syntax Editor.
Example: Running Syntax Not Available in psppire Menus
With these basics in place, the following example demonstrates data processing in PSPP using the command‑line pspp program. The same syntax can be copied into psppire’s Syntax Editor window and executed there.
DATA LIST FREE does not appear in psppire’s pull‑down menus, but it can still be entered and executed in the Syntax Editor.
BEGIN DATA and END DATA mark the literal block of raw data that PSPP reads when DATA LIST is used to read inline data.
If there are issues reading the data with DATA LIST FREE, such as bad formats, the most reliable fix is to specify formats for all variables on the DATA LIST FREE line.

    * Read simple space-delimited data.
    DATA LIST FREE /ID (F6.0) AGE (F2.0) YEAR (F4.0) SEX (A6).
    BEGIN DATA.
    123456 27 1984 Male
    987654 58 1990 Female
    END DATA.
    * FORMATS repeats the formats already given on DATA LIST, so it is redundant here.
    FORMATS ID (F6.0) AGE (F2.0) YEAR (F4.0) SEX (A6).
    LIST.
    DISPLAY DICTIONARY.

Output:

    Data List
    +------+---+----+------+
    |  ID  |AGE|YEAR|  SEX |
    +------+---+----+------+
    |123456| 27|1984|Male  |
    |987654| 58|1990|Female|
    +------+---+----+------+

    Variables
    +----+--------+-----------------+-----+-----+---------+------------+------------+
    |Name|Position|Measurement Level| Role|Width|Alignment|Print Format|Write Format|
    +----+--------+-----------------+-----+-----+---------+------------+------------+
    |ID  |       1|Scale            |Input|    8|Right    |F6.0        |F6.0        |
    |AGE |       2|Scale            |Input|    8|Right    |F2.0        |F2.0        |
    |YEAR|       3|Scale            |Input|    8|Right    |F4.0        |F4.0        |
    |SEX |       4|Nominal          |Input|    6|Left     |A6          |A6          |
    +----+--------+-----------------+-----+-----+---------+------------+------------+

DATA LIST FREE can behave unpredictably with some data. To ensure correct and reproducible results, specify formats for all variables. If in doubt, verify the DATA LIST FREE read by comparing it with GET DATA, which always requires explicit formats.
Note: If psppire is preferred over command-line syntax, setting the correct delimiter allows the import wizard to split the fields and load the data. However, its format guesses are sometimes wrong or missing, so variable formats may still need to be added manually (as shown above) to read the data correctly.
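One source of the unpredictability described above is that DATA LIST FREE treats the input as a continuous stream of fields and does not care where lines end, so one missing field shifts every later value. As a hedged sketch, the variant below reads the same two cases with DATA LIST LIST, which expects exactly one case per line and issues a diagnostic when a line has too few or too many fields. The data and variable names are the ones from the example above.

```pspp
* Same data as the DATA LIST FREE example, read with one case per line enforced.
DATA LIST LIST /ID (F6.0) AGE (F2.0) YEAR (F4.0) SEX (A6).
BEGIN DATA
123456 27 1984 Male
987654 58 1990 Female
END DATA.
LIST.
DISPLAY DICTIONARY.
```

For clean data like this the result should match the DATA LIST FREE read. The two forms differ only when a line is short or long.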
Inspecting Imported Data
Use FREQUENCIES when quick counts, ranges, and missing‑value checks are needed. Use LIST when inspecting exact values, string variables, or formatting. Use DISPLAY DICTIONARY to see the attributes of what was imported. If the data are not being read correctly, add or adjust variable formats and recheck. It is a good idea to do this with every data read.
It also helps to look closely at the data themselves. If the data allow, make a plot. Look for large values, small values, values that are clumped together, values that are off by themselves, variables that change together, values that don't belong, and the general shape of the data. This helps you understand how your analysis might be affected.
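The inspection steps described in this section can be sketched as a short pass over the small ID/AGE/YEAR/SEX dataset used earlier. This is an illustration, not a fixed recipe: the choice of variables for each command is an assumption, and on a real dataset the histogram would be far more informative than it is with two cases.

```pspp
* Re-read the small example dataset so this block runs on its own.
DATA LIST FREE /ID (F6.0) AGE (F2.0) YEAR (F4.0) SEX (A6).
BEGIN DATA
123456 27 1984 Male
987654 58 1990 Female
END DATA.

* Attributes of what was imported.
DISPLAY DICTIONARY.

* Exact values, including the string variable.
LIST VARIABLES = ID SEX /CASES = FROM 1 TO 10.

* Counts and missing-value checks for a categorical variable.
FREQUENCIES VARIABLES = SEX.

* Range and shape of a numeric variable, without the frequency table.
FREQUENCIES VARIABLES = AGE
  /FORMAT = NOTABLE
  /STATISTICS = MINIMUM MAXIMUM MEAN
  /HISTOGRAM.
```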
GET DATA command
This example uses the same dataset as the DATA LIST FREE example above so the results of GET DATA can be compared with those from DATA LIST FREE.
Note: psppire’s File Import wizard uses GET DATA internally; the generated command appears in the output window. This syntax can be edited in the Syntax Editor, for example to change variable names and labels, without going through the import wizard repeatedly. psppire's data editor also allows changing variable names, types, alignment, and widths directly.
The GET DATA syntax below was edited by hand; it was not generated by the File Import wizard. If an analysis needs to be reproduced, including the data‑input step, the syntax file of PSPP commands must be saved and rerun. The list output shows what happened, but it does not repeat the import. psppire’s output window always shows the results of your commands, while the command‑line pspp program produces output only when your syntax includes procedures that generate it or when PSPP reports an error. In either case, the output alone is not enough to recreate the import; the syntax must be saved to document the data-input step.

    * Save the two data lines below (without the asterisks) into a text file, for example data.txt.
    *   123456 27 1984 Male
    *   987654 58 1990 Female
    * GET DATA reads data from files.
    GET DATA
      /TYPE = TXT
      /FILE = 'data.txt'
      /DELCASE = LINE
      /DELIMITERS = " "
      /VARIABLES = ID F6 age F2 year F4 sex A6.
    LIST.
    DISPLAY DICTIONARY.

Output:

    Data List
    +------+---+----+------+
    |  ID  |age|year|  sex |
    +------+---+----+------+
    |123456| 27|1984|Male  |
    |987654| 58|1990|Female|
    +------+---+----+------+

    Variables
    +----+--------+----------------+-----+-----+---------+------------+-----------+
    |    |        |   Measurement  |     |     |         |            |   Write   |
    |Name|Position|      Level     | Role|Width|Alignment|Print Format|   Format  |
    +----+--------+----------------+-----+-----+---------+------------+-----------+
    |ID  |       1|Scale           |Input|    8|Right    |F6.0        |F6.0       |
    |age |       2|Scale           |Input|    8|Right    |F2.0        |F2.0       |
    |year|       3|Scale           |Input|    8|Right    |F4.0        |F4.0       |
    |sex |       4|Nominal         |Input|    6|Left     |A6          |A6         |
    +----+--------+----------------+-----+-----+---------+------------+-----------+

There were no issues with the read of the data. The only limitation of GET DATA is that it always reads data from a file, so a data file must exist. GET DATA reads the data just like the DATA LIST FREE example above and gives the same results.
Reading CSV Files with GET DATA:
GET DATA can read CSV files by using TYPE=TXT and DELIMITERS=",". CSV files are treated as ordinary delimited text, so variable formats must still be specified.
Reading Data in Fixed Column Format
DATA LIST FIXED reads variables from specific column locations in a text data file. As long as the text is aligned in columns, PSPP can parse it exactly as specified.
This example was run with psppire.
Syntax

    DATA LIST FIXED /ID 1-5 NAME 6-13 (A) AGE 14-15.
    BEGIN DATA.
    12345John    27
    98765Maria   34
    END DATA.
    LIST.
    DISPLAY DICTIONARY.

PSPP sees:
- ID → columns 1–5 → numeric
- NAME → columns 6–13 → 8‑character string
- AGE → columns 14–15 → numeric
Output

    DATA LIST FIXED /ID 1-5 NAME 6-13 (A) AGE 14-15.

    Reading 1 record from INLINE.
    ╭────────┬──────┬───────┬──────╮
    │Variable│Record│Columns│Format│
    ├────────┼──────┼───────┼──────┤
    │ID      │     1│1-5    │F5.0  │
    │NAME    │     1│6-13   │A8    │
    │AGE     │     1│14-15  │F2.0  │
    ╰────────┴──────┴───────┴──────╯

    BEGIN DATA.
    12345John    27
    98765Maria   34
    END DATA.
    LIST.

    Data List
    ╭─────┬─────┬───╮
    │  ID │ NAME│AGE│
    ├─────┼─────┼───┤
    │12345│John │ 27│
    │98765│Maria│ 34│
    ╰─────┴─────┴───╯

    DISPLAY DICTIONARY.

    Variables
    ╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
    │Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
    ├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
    │ID  │       1│Scale            │Input│    8│Right    │F5.0        │F5.0        │
    │NAME│       2│Nominal          │Input│    8│Left     │A8          │A8          │
    │AGE │       3│Scale            │Input│    8│Right    │F2.0        │F2.0        │
    ╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

There are a couple of points to remember when using DATA LIST FIXED:
1. DATA LIST FIXED is brittle by design
If the file shifts by even one space, every variable after that point is wrong. This is why fixed‑width files must be inspected in a monospace editor and each column range verified to ensure PSPP is reading the data correctly.
2. Strings must match their column width
PSPP does not infer string length in DATA LIST FIXED. If NAME spans columns 6–13, it must be (A8) or (A) with that exact column range. The data width must match the span of the fixed-width field.
3. psppire states "Reading 1 record from INLINE" even though there are two lines of data
The message gives the number of records (lines) that make up each case, not the number of cases. Here every case occupies one record, so the count is 1 however many lines of data follow. The formats in the table come from the DATA LIST specification, not from inspecting the data.
Reading Data with Implied Decimals
Some types of data, including government survey data such as CPS, NHANES in older ASCII releases, and older Census PUMS files, include data fields that have implied decimal places. This means the decimal point is not stored in the data file.
For example, a monthly earnings value that is $582.89 in real life would appear in the raw data as 58289 when two implied decimal places are used. Implied decimals are a legacy formatting method used in many long‑running surveys, and PSPP can read them directly using DATA LIST FIXED with the right setup.
The number of implied decimal places should be defined in the codebook or other documentation supplied for the study. It is usually noted in the variable definition as “IMPLIED DECIMAL” along with the number of decimal places.
This section goes over a short example of implied-decimal data and how to read it with PSPP.
Raw data:

    58289
    43750
    120055

What the codebook should show:
EARN columns 1-6 IMPLIED DECIMAL 2

    ** PSPP implied decimals example.
    DATA LIST FIXED /earn 1-6 (2).
    BEGIN DATA
    58289
    43750
    120055
    END DATA.
    LIST.
    DISPLAY DICTIONARY.

For implied decimals only the number of decimals is specified in parentheses for that field, such as EARN in this example. PSPP determines the width from the column range.
Run this in psppire (via the Run menu) and it should produce output similar to the following. In the examples on this page, psppire output uses box-drawing characters for tables, while command-line pspp output uses plain ASCII.

    ** Implied decimals example.
    DATA LIST FIXED /earn 1-6 (2).

    Reading 1 record from INLINE.
    ╭────────┬──────┬───────┬──────╮
    │Variable│Record│Columns│Format│
    ├────────┼──────┼───────┼──────┤
    │earn    │     1│1-6    │F6.2  │
    ╰────────┴──────┴───────┴──────╯

    BEGIN DATA
    58289
    43750
    120055
    END DATA.
    LIST.

    Data List
    ╭───────╮
    │  earn │
    ├───────┤
    │ 582.89│
    │ 437.50│
    │1200.55│
    ╰───────╯

    DISPLAY DICTIONARY.

    Variables
    ╭────┬────────┬─────────────────┬─────┬─────┬─────────┬────────────┬────────────╮
    │Name│Position│Measurement Level│ Role│Width│Alignment│Print Format│Write Format│
    ├────┼────────┼─────────────────┼─────┼─────┼─────────┼────────────┼────────────┤
    │earn│       1│Scale            │Input│    8│Right    │F7.2        │F7.2        │
    ╰────┴────────┴─────────────────┴─────┴─────┴─────────┴────────────┴────────────╯

The LIST output shows that the input values were read with two decimal places applied.
Reading Saved Files
Use GET FILE= to read PSPP system files and portable files. Use GET DATA to read CSV files.

    * Reading those rare portable SPSS files.
    GET FILE='legacy.por'.
    * Reading a PSPP system file.
    GET FILE='mydata.sav'.
    * Reading a CSV file. CSV is read as delimited text, so the type is TXT.
    GET DATA
      /TYPE=TXT
      /FILE='data.csv'
      /ARRANGEMENT=DELIMITED
      /DELCASE=LINE
      /FIRSTCASE=2
      /DELIMITERS=","
      /QUALIFIER='"'
      /VARIABLES=
        id F8.0
        age F3.0
        gender A1
        score F5.2.

Note: FIRSTCASE tells pspp that the data to read begin on line 2. The variable names are often in line 1, so that line is skipped. FIRSTCASE is not needed if the CSV file has no header row. This example defines the file as DELIMITED, in this case comma-separated values, with the comma as the delimiter and double quotes as the text qualifier.
File Handles in PSPP
File handles help reduce the typing of long path names in the program. Instead, a shorter file handle is defined and after that, the handle can be used to refer to the data file. Below, "survey" refers to "C:\path\to\survey.sav". After the handle is defined in this example, other commands can refer to the file as simply "survey".

    * File handle usage.
    FILE HANDLE survey /NAME='C:\path\to\survey.sav'.
    GET FILE=survey.

When the file handle is no longer needed, close it with CLOSE FILE HANDLE [handle name]. psppire requires closing file handles so they don’t persist between runs and cause file handle errors when the program is rerun.
The entire sequence of using file handles looks like this:

    FILE HANDLE demo /NAME='/home/analysis/data/demo.sav'.
    GET FILE=demo.
    * Do some work here (uses the active file from GET FILE).
    FREQUENCIES VARIABLES=age sex income.
    CLOSE FILE HANDLE demo.

Reading Less Data for Testing: N OF CASES
When developing syntax on a large data file, reading many thousands of cases just to test a few lines of code on a couple of values wastes time.
N OF CASES tells PSPP to process only the first N cases of the active dataset. This speeds up development and makes debugging easier.

    GET FILE=survey.
    * Limit processing to the first 15 cases. N OF CASES follows the command that defines the data.
    N OF CASES 15.
    * Try out the transformations.
    RECODE age income (SYSMIS=0).
    * Check a few variables.
    FREQUENCIES VARIABLES=age income.

Modify these commands to fit the situation. Once the code is working, remove N OF CASES and run the full analysis. The technique is especially useful when reading large raw data files, testing recodes and COMPUTE statements, checking variable formats, verifying merges, and building syntax incrementally.
Selecting Data for Analysis: SELECT IF
SELECT IF is used to select cases for analysis or testing. Cases that are not selected are removed from the active dataset. If the dataset is saved after this point, the removed cases cannot be recovered except by reloading the original data file.

    * Keep only adults with non-missing income.
    SELECT IF (age >= 18) AND (NOT SYSMIS(income)).
    EXECUTE.

SYSMIS is used here to exclude cases with system-missing income values. To also exclude user-missing codes declared with MISSING VALUES, use MISSING(income) instead.
Descriptive Statistics
The DESCRIPTIVES command computes means, standard deviations, and other summary statistics. PSPP supports variable lists, subcommands, and formatting options. These are specified through the DESCRIPTIVES command syntax.
DESCRIPTIVES does not read raw data, so if no active dataset is available, or there is no PSPP system file to load, use DATA LIST or GET DATA to read the raw data before running the procedure.

    * Short example dataset for DESCRIPTIVES.
    DATA LIST FREE /x y z.
    BEGIN DATA
    5 24 7
    96 8 3
    END DATA.
    LIST.
    DESCRIPTIVES VARIABLES = x y z
      /STATISTICS = MEAN STDDEV MIN MAX.

Output:

    Data List
    +-----+-----+----+
    |  x  |  y  |  z |
    +-----+-----+----+
    | 5.00|24.00|7.00|
    |96.00| 8.00|3.00|
    +-----+-----+----+

    Descriptive Statistics
    +--------------------+-+-----+-------+-------+-------+
    |                    |N| Mean|Std Dev|Minimum|Maximum|
    +--------------------+-+-----+-------+-------+-------+
    |x                   |2|50.50|  64.35|   5.00|  96.00|
    |y                   |2|16.00|  11.31|   8.00|  24.00|
    |z                   |2| 5.00|   2.83|   3.00|   7.00|
    |Valid N (listwise)  |2|     |       |       |       |
    |Missing N (listwise)|0|     |       |       |       |
    +--------------------+-+-----+-------+-------+-------+

This example uses a very small dataset so the results are easy to verify. The LIST command shows that the data were read correctly. Using LIST is always good practice when using DATA LIST FREE. The means and minimum/maximum values can be checked by inspection, and although computing the standard deviations requires a bit more arithmetic, the values reported are consistent with the data. For example, both x values lie 45.50 from their mean of 50.50, so the sample standard deviation is 45.50 × √2 ≈ 64.35.
If the /SAVE option is included, DESCRIPTIVES also computes Z‑scores for all variables specified and adds them to the active dataset as new variables. This works in PSPP syntax, and psppire includes a checkbox in the DESCRIPTIVES dialog labeled “Save Z‑scores of selected variables as new.”
The FORMAT subcommand used by DESCRIPTIVES is accepted for backward compatibility but has no effect in PSPP.
Frequencies
The FREQUENCIES command produces counts and percentages for the values of one or more variables and is often the first procedure run after reading a dataset, because it quickly shows whether the values were read correctly and whether any unexpected categories or codes appear.
FREQUENCIES can also produce summary statistics and simple charts, but in its basic form it is most useful as a data‑checking tool.

    DATA LIST FREE /group score.
    BEGIN DATA
    1 10
    1 12
    2 15
    2 18
    2 18
    END DATA.
    LIST.
    FREQUENCIES VARIABLES = group score.

Output:

    Data List
    +-----+-----+
    |group|score|
    +-----+-----+
    | 1.00|10.00|
    | 1.00|12.00|
    | 2.00|15.00|
    | 2.00|18.00|
    | 2.00|18.00|
    +-----+-----+

    Statistics
    +---------+-----+-----+
    |         |group|score|
    +---------+-----+-----+
    |N Valid  |    5|    5|
    |  Missing|    0|    0|
    +---------+-----+-----+
    |Mean     | 1.60|14.60|
    +---------+-----+-----+
    |Std Dev  |  .55| 3.58|
    +---------+-----+-----+
    |Minimum  | 1.00|10.00|
    +---------+-----+-----+
    |Maximum  | 2.00|18.00|
    +---------+-----+-----+

    group
    +----------+---------+-------+-------------+------------------+
    |          |Frequency|Percent|Valid Percent|Cumulative Percent|
    +----------+---------+-------+-------------+------------------+
    |Valid 1.00|        2|  40.0%|        40.0%|             40.0%|
    |      2.00|        3|  60.0%|        60.0%|            100.0%|
    +----------+---------+-------+-------------+------------------+
    |Total     |        5| 100.0%|             |                  |
    +----------+---------+-------+-------------+------------------+

    score
    +-----------+---------+-------+-------------+------------------+
    |           |Frequency|Percent|Valid Percent|Cumulative Percent|
    +-----------+---------+-------+-------------+------------------+
    |Valid 10.00|        1|  20.0%|        20.0%|             20.0%|
    |      12.00|        1|  20.0%|        20.0%|             40.0%|
    |      15.00|        1|  20.0%|        20.0%|             60.0%|
    |      18.00|        2|  40.0%|        40.0%|            100.0%|
    +-----------+---------+-------+-------------+------------------+
    |Total      |        5| 100.0%|             |                  |
    +-----------+---------+-------+-------------+------------------+

The DATA LIST successfully read the data, as shown by the LIST output. The statistics we requested appear next. Of particular interest are the counts of valid and missing values. In this case all values are valid and none are missing, because this small example was constructed to be complete. Real datasets often contain missing or unexpected values, and FREQUENCIES is a quick way to identify them.
Following the statistics output are the frequency tables for group and score. There are two groups and 5 subjects. The score variable shows that 18 occurs twice and the other values occur once. These results can be checked directly against the data.
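To illustrate the data-checking use described above, here is a hedged variant of the same example with two deliberate problems: a score of 999 standing in for a missing-value code, and an unexpected group 3. Both values are invented for illustration and are not part of the original dataset.

```pspp
* Hypothetical variant of the data above: 999 is a missing-value code, group 3 is unexpected.
DATA LIST FREE /group score.
BEGIN DATA
1 10
1 12
2 15
2 999
3 18
END DATA.

* Declare 999 as user-missing so it is counted under Missing rather than Valid.
MISSING VALUES score (999).
FREQUENCIES VARIABLES = group score.
```

In the resulting tables the stray category should appear as an extra row for group, and the 999 should be counted as missing for score rather than inflating the mean and maximum.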
FREQUENCIES can summarize multiple variables at once, but it treats each variable separately. CROSSTABS and CTABLES can also produce counts for categorical variables, but they’re designed for examining relationships or producing formatted tables, not for quick single‑variable inspection.
Cross-Tabulation
The CROSSTABS command displays the relationship between two categorical variables by showing the joint distribution of their values. It is often used after FREQUENCIES to check that the categories of each variable combine as expected.
Two-Variable Crosstabs with Counts and Percentages
The simplest use of CROSSTABS is a two‑variable table. The following example shows a basic 2×2 crosstab with counts, percentages, and the chi‑square test.
CROSSTABS can also produce row, column, and total percentages, which make it easier to compare groups. A similarly small dataset is used in this example to show how the values of one variable are distributed within the categories of another. It also requests the chi‑square test, which PSPP prints below the table.
Output:

    DATA LIST FREE /group outcome.
    BEGIN DATA
    1 0
    1 1
    2 0
    2 1
    2 1
    END DATA.
    LIST.
    CROSSTABS /TABLES = group BY outcome /STATISTICS = CHISQ.

    Data List
    ╭─────┬───────╮
    │group│outcome│
    ├─────┼───────┤
    │ 1.00│    .00│
    │ 1.00│   1.00│
    │ 2.00│    .00│
    │ 2.00│   1.00│
    │ 2.00│   1.00│
    ╰─────┴───────╯

    CROSSTABS /TABLES = group BY outcome /STATISTICS = CHISQ.

    Summary
    ╭───────────────┬─────────────────────────────╮
    │               │            Cases            │
    │               ├─────────┬─────────┬─────────┤
    │               │  Valid  │ Missing │  Total  │
    │               ├─┬───────┼─┬───────┼─┬───────┤
    │               │N│Percent│N│Percent│N│Percent│
    ├───────────────┼─┼───────┼─┼───────┼─┼───────┤
    │group × outcome│5│ 100.0%│0│    .0%│5│ 100.0%│
    ╰───────────────┴─┴───────┴─┴───────┴─┴───────╯

    group × outcome
    ╭───────────────────┬─────────────┬──────╮
    │                   │   outcome   │      │
    │                   ├──────┬──────┤      │
    │                   │  .00 │ 1.00 │ Total│
    ├───────────────────┼──────┼──────┼──────┤
    │group 1.00 Count   │     1│     1│     2│
    │           Row %   │ 50.0%│ 50.0%│100.0%│
    │           Column %│ 50.0%│ 33.3%│ 40.0%│
    │           Total % │ 20.0%│ 20.0%│ 40.0%│
    │     ╶─────────────┼──────┼──────┼──────┤
    │      2.00 Count   │     1│     2│     3│
    │           Row %   │ 33.3%│ 66.7%│100.0%│
    │           Column %│ 50.0%│ 66.7%│ 60.0%│
    │           Total % │ 20.0%│ 40.0%│ 60.0%│
    ├───────────────────┼──────┼──────┼──────┤
    │Total      Count   │     2│     3│     5│
    │           Row %   │ 40.0%│ 60.0%│100.0%│
    │           Column %│100.0%│100.0%│100.0%│
    │           Total % │ 40.0%│ 60.0%│100.0%│
    ╰───────────────────┴──────┴──────┴──────╯

    Chi-Square Tests
    ╭────────────────────────────┬─────┬──┬──────────────────────────┬─────────────────────┬─────────────────────╮
    │                            │Value│df│Asymptotic Sig. (2-tailed)│Exact Sig. (2-tailed)│Exact Sig. (1-tailed)│
    ├────────────────────────────┼─────┼──┼──────────────────────────┼─────────────────────┼─────────────────────┤
    │Pearson Chi-Square          │  .14│ 1│                      .709│                     │                     │
    │Likelihood Ratio            │  .14│ 1│                      .710│                     │                     │
    │Fisher's Exact Test         │     │  │                          │                1.033│                 .700│
    │Continuity Correction       │  .00│ 1│                     1.000│                     │                     │
    │Linear-by-Linear Association│  .11│ 1│                      .739│                     │                     │
    │N of Valid Cases            │    5│  │                          │                     │                     │
    ╰────────────────────────────┴─────┴──┴──────────────────────────┴─────────────────────┴─────────────────────╯

The crosstabulation shows how the categories of the two variables combine. This confirms that the values were read correctly and that all expected category combinations appear. The chi‑square test is printed below the table. In this example the Pearson chi‑square p‑value is 0.709. Whether this is considered evidence against the null hypothesis depends on the analyst’s chosen significance level and the question being asked. The important point here is that the procedure ran and produced the expected set of statistics. One value deserves caution: the exact two‑tailed significance reported for Fisher's test (1.033) exceeds 1, which is not a valid probability; a hand calculation from the table margins gives 1.000.
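When the data arrive already summarized as a table of counts, the same crosstab can be sketched by entering one row per cell and weighting by the count. This is a minimal sketch: the variable name n is an assumption made for illustration, and /CELLS is spelled out so the requested counts and percentages do not depend on defaults.

```pspp
* One row per cell of the 2x2 table; n holds the cell count (hypothetical variable name).
DATA LIST FREE /group outcome n.
BEGIN DATA
1 0 1
1 1 1
2 0 1
2 1 2
END DATA.

* Treat each row as n identical cases.
WEIGHT BY n.
CROSSTABS
  /TABLES = group BY outcome
  /CELLS = COUNT ROW COLUMN TOTAL
  /STATISTICS = CHISQ.
```

The cell counts sum to 5, so the table should match the one built from the five individual cases above.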
Three-Variable Crosstabs Example
CROSSTABS can include more than two variables. When you specify three variables, PSPP produces a single nested table (for example, AGE × YEAR × SEX) with hierarchical counts and totals. If you include additional variables, PSPP may split the output into multiple tables depending on how much nesting it can format. For more flexible multi‑layered layouts, use CTABLES, which is designed for complex, formatted tables.
Here is a small example of survey data that collects YEAR of the survey, AGEGROUP of respondent and INCCAT, the respondent's income category. CROSSTABS can display all three categories.

    ** Survey data example .
    DATA LIST LIST / YEAR (F4.0) AGEGROUP (A5) INCCAT (A7).

    Reading free-form data from INLINE.
    ╭────────┬──────╮
    │Variable│Format│
    ├────────┼──────┤
    │YEAR    │F4.0  │
    │AGEGROUP│A5    │
    │INCCAT  │A7    │
    ╰────────┴──────╯

    BEGIN DATA
    2018 18-29 <30k
    2018 18-29 30-60k
    2018 30-44 30-60k
    2018 45-64 60-90k
    2020 18-29 30-60k
    2020 30-44 60-90k
    2020 30-44 90k+
    2020 45-64 30-60k
    END DATA.
    CROSSTABS /TABLES = AGEGROUP BY INCCAT BY YEAR.

    Summary
    ╭────────────────────────┬─────────────────────────────╮
    │                        │            Cases            │
    │                        ├─────────┬─────────┬─────────┤
    │                        │  Valid  │ Missing │  Total  │
    │                        ├─┬───────┼─┬───────┼─┬───────┤
    │                        │N│Percent│N│Percent│N│Percent│
    ├────────────────────────┼─┼───────┼─┼───────┼─┼───────┤
    │AGEGROUP × INCCAT × YEAR│8│ 100.0%│0│    .0%│8│ 100.0%│
    ╰────────────────────────┴─┴───────┴─┴───────┴─┴───────╯

    AGEGROUP × INCCAT × YEAR
    ╭──────────────────────────────┬───────────────────────┬─────╮
    │                              │         INCCAT        │     │
    │                              ├──────┬──────┬────┬────┤     │
    │                              │30-60k│60-90k│90k+│<30k│Total│
    ├──────────────────────────────┼──────┼──────┼────┼────┼─────┤
    │YEAR 2018 AGEGROUP 18-29 Count│     1│     0│    │   1│    2│
    │                  ╶───────────┼──────┼──────┼────┼────┼─────┤
    │                   30-44 Count│     1│     0│    │   0│    1│
    │                  ╶───────────┼──────┼──────┼────┼────┼─────┤
    │                   45-64 Count│     0│     1│    │   0│    1│
    │         ╶────────────────────┼──────┼──────┼────┼────┼─────┤
    │          Total          Count│     2│     1│    │   1│    4│
    │    ╶─────────────────────────┼──────┼──────┼────┼────┼─────┤
    │     2020 AGEGROUP 18-29 Count│     1│     0│   0│    │    1│
    │                  ╶───────────┼──────┼──────┼────┼────┼─────┤
    │                   30-44 Count│     0│     1│   1│    │    2│
    │                  ╶───────────┼──────┼──────┼────┼────┼─────┤
    │                   45-64 Count│     1│     0│   0│    │    1│
    │         ╶────────────────────┼──────┼──────┼────┼────┼─────┤
    │          Total          Count│     2│     1│   1│    │    4│
    ╰──────────────────────────────┴──────┴──────┴────┴────┴─────╯

While CROSSTABS summarizes how categories combine, many analyses require modeling a quantitative outcome using one or more predictors. PSPP’s REGRESSION command performs ordinary least squares estimation for this purpose. The following section shows a simple regression example and the standard output produced by PSPP.
Linear Regression
The REGRESSION command fits an ordinary least squares model to a continuous dependent variable using one or more predictors. After checking the variables with FREQUENCIES and CROSSTABS, regression shows how the outcome relates to the predictors. PSPP provides the standard coefficients and model statistics used in OLS (ordinary least squares).

    DATA LIST LIST / y x1 x2.
    BEGIN DATA
    10 1 4.1
    12 2 5.0
    13 3 6.2
    15 4 7.1
    16 5 8.0
    END DATA.
    LIST.
    REGRESSION
      /DEPENDENT = y
      /METHOD = ENTER x1 x2
      /STATISTICS = COEFF R ANOVA
      /SAVE = PRED RESID.
    * List again to show the saved residuals and predicted values.
    LIST.

The /SAVE option above creates two additional variables: PRED1 (predicted values) and RES1 (residuals). Many statistics courses require students to examine and plot the residuals and predicted values to check model assumptions. Saving these values in PSPP's regression procedure provides exactly what those assignments need.
Output:

    Data List
    ╭─────┬────┬────╮
    │  y  │ x1 │ x2 │
    ├─────┼────┼────┤
    │10.00│1.00│4.10│
    │12.00│2.00│5.00│
    │13.00│3.00│6.20│
    │15.00│4.00│7.10│
    │16.00│5.00│8.00│
    ╰─────┴────┴────╯

    Model Summary (y)
    ╭───┬────────┬─────────────────┬──────────────────────────╮
    │ R │R Square│Adjusted R Square│Std. Error of the Estimate│
    ├───┼────────┼─────────────────┼──────────────────────────┤
    │.99│     .99│              .98│                       .37│
    ╰───┴────────┴─────────────────┴──────────────────────────╯

    ANOVA (y)
    ╭──────────┬──────────────┬──┬───────────┬─────┬────╮
    │          │Sum of Squares│df│Mean Square│  F  │Sig.│
    ├──────────┼──────────────┼──┼───────────┼─────┼────┤
    │Regression│         22.53│ 2│      11.27│84.50│.012│
    │Residual  │           .27│ 2│        .13│     │    │
    │Total     │         22.80│ 4│           │     │    │
    ╰──────────┴──────────────┴──┴───────────┴─────┴────╯

    Coefficients (y)
    ╭──────────┬────────────────────────────┬─────────────────────────┬────┬────╮
    │          │ Unstandardized Coefficients│Standardized Coefficients│    │    │
    │          ├───────────┬────────────────┼─────────────────────────┤    │    │
    │          │     B     │   Std. Error   │           Beta          │  t │Sig.│
    ├──────────┼───────────┼────────────────┼─────────────────────────┼────┼────┤
    │(Constant)│      12.16│            6.92│                      .00│1.76│.177│
    │x1        │       2.60│            2.20│                     1.72│1.18│.359│
    │x2        │      -1.11│            2.22│                     -.73│-.50│.667│
    ╰──────────┴───────────┴────────────────┴─────────────────────────┴────┴────╯

    LIST.

    Data List
    ╭─────┬────┬────┬────┬─────╮
    │  y  │ x1 │ x2 │RES1│PRED1│
    ├─────┼────┼────┼────┼─────┤
    │10.00│1.00│4.10│-.20│10.20│
    │12.00│2.00│5.00│ .20│11.80│
    │13.00│3.00│6.20│-.07│13.07│
    │15.00│4.00│7.10│ .33│14.67│
    │16.00│5.00│8.00│-.27│16.27│
    ╰─────┴────┴────┴────┴─────╯

The output includes the standard tables for an ordinary least squares model. The Model Summary shows an R of 0.99 and an R‑square of 0.99 for this small example. The Coefficients table lists each predictor along with its estimate, standard error, t-statistic, and p‑value. These values confirm that the procedure ran correctly and that PSPP produced the expected regression statistics. Interpretation of the coefficients depends on the context and the analyst’s goals; the purpose here is simply to show how PSPP fits the model and reports the results.
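The residual checks mentioned above can be sketched as follows. This is a minimal illustration that repeats the regression so the block runs on its own. It assumes a fresh session, so that the saved variables are named PRED1 and RES1 as in the output above; with only five cases the plot is a demonstration of the commands rather than a meaningful diagnostic.

```pspp
* Refit the example model and save predicted values and residuals.
DATA LIST LIST / y x1 x2.
BEGIN DATA
10 1 4.1
12 2 5.0
13 3 6.2
15 4 7.1
16 5 8.0
END DATA.
REGRESSION
  /DEPENDENT = y
  /METHOD = ENTER x1 x2
  /SAVE = PRED RESID.

* Residuals against predicted values: look for curvature or a funnel shape.
GRAPH /SCATTERPLOT(BIVARIATE) = PRED1 WITH RES1.

* With an intercept in the model, the residuals should average to about zero.
DESCRIPTIVES VARIABLES = RES1
  /STATISTICS = MEAN STDDEV MIN MAX.
```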
Computing the P-value for F Manually
PSPP can be used to compute the p-value for the F-statistic. The CDF.F function returns the cumulative distribution function of the F distribution, that is, the probability that an F‑distributed variable is less than or equal to the specified value.
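In symbols, the p-value is the upper tail, one minus the CDF. As a hand check, the F distribution with 2 and 2 degrees of freedom (the values in the ANOVA table above) has a simple closed-form CDF, so the result can be verified without software. The symbols f, d_1 and d_2 are notation introduced here for the observed statistic and its degrees of freedom.

```latex
% Upper-tail probability for an observed statistic f
p = P\left(F_{d_1,d_2} > f\right) = 1 - \mathrm{CDF.F}(f, d_1, d_2)

% Special case d_1 = d_2 = 2, where \mathrm{CDF.F}(f, 2, 2) = \frac{f}{1 + f}
p = \frac{1}{1 + f} = \frac{1}{1 + 84.50} = \frac{1}{85.50} \approx 0.011696
```

The closed form holds only for this special case; for other degrees of freedom the CDF has to be evaluated numerically, which is what CDF.F does.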
The following calculates the upper tail probability for an F value of 84.50 with 2 model and 2 error degrees of freedom, as shown in the ANOVA table from the regression output.

    DO IF $CASENUM = 1.
    COMPUTE p_F = 1 - CDF.F(84.50, 2, 2).
    END IF.
    FORMATS p_F (F10.6).
    LIST p_F /CASES=FROM 1 TO 1.

    Data List
    ╭───────╮
    │  p_F  │
    ├───────┤
    │.011696│
    ╰───────╯

The computed p-value is 0.011696, which matches the Sig. value of .012 in the ANOVA table within rounding error. The DO IF block runs the COMPUTE expression only for the first case, and the LIST command displays only that case. All remaining cases contain system-missing values for p_F.
The remaining sections cover data formats, exporting, psppire behavior, and file‑combining. These topics are not part of the basic procedures shown above, but they are useful when working with real datasets and larger projects. psppire is introduced briefly above so new users can get started; its detailed behavior and limitations are covered in several later sections.
Notes and Limitations
- PSPP does not yet support everything SPSS does. Choose software based on what it can do, not what you wish it did.
- PSPP includes some functionality that SPSS does not.
- Your mileage may vary. New versions may have what you need. PSPP is a work in progress and maintained by volunteers.
- If you want to contribute, report bugs, or make suggestions, consider helping the PSPP project at gnu.org.
- If you’re not sure what a PSPP command will do, read the manual or run the examples on a copy of your data before applying it to anything important. Some commands (like NEW FILE, GET, MATCH FILES, or DATA LIST) replace or clear the active dataset, so save your data first if you want to keep it.
- If a command works with files in memory or on disk, assume the results will disappear unless you save them before running the next command.
- Again: If you create it and want to keep it, it needs to be saved.
- Batch mode is stable and well-suited for reproducible workflows.
- Some dialogs in the GUI correspond to syntax that is not fully implemented.
- Table and chart output can be produced in several formats: plain text, PostScript, PDF, DocBook, TeX, or HTML depending on configuration.
Up to this point we have seen DATA LIST FIXED, DATA LIST FREE, and GET DATA used to read ordinary raw data files. Another kind of raw data that still appears in practice is the multi‑record data format.

Multi-Record Data Format with PSPP
Multi-record data is fixed-width data where each case spans multiple lines in a fixed sequence. Data like this is still encountered in industries where legacy reporting systems, long-running surveys, or fixed-width exports are produced, or where efficient storage is important.
Here are two screenshots of parts of the 1980 PUMS codebook showing two different record types (Person and Housing):
1980 PUMS Codebook example (P Records):
![]()
1980 PUMS Codebook example (H Records):
![]()
PSPP can read this kind of multi-record data directly. This section shows how. There is no need to preprocess this data to get it into a dataset.
You may hear these referred to as multi-record case files, multi-record data files, stacked data, or (historically) hierarchical data files. This format originated when data were stored on punched paper cards, which limited the width of the data. Instead of making each record wider, this format made it taller by spreading one case across multiple fixed-width cards in a defined sequence.
This format stores a large number of variables and values in a compact form, spreading each case across multiple physical records. However, the format is not self-documenting. External documentation is required to interpret the file, usually in the form of a codebook defining how the data were collected, what the cards represent, the variable names, and the column locations, or PSPP code that reads the data correctly.
Here is an example of multi-record data:
    044104101712288855439855238255785555221010000010000000005080
    805060909080808090500000708000005092543214424122111432310100
    000200000000080908080810090909100800000007070000051015441344
    2412311243221
    087107002688878693388888577338897238210020001020001010308090
    810071008070506060906030607020505092444443225332444554400200
    000100010001081009090810100708090707060304070302050845444421
    1522124444441

Each block in the example above consists of four records or "cards." Single digit responses make it nearly impossible to tell which value belongs to which variable. Different variables appear on different cards (or lines in the file), so this file cannot be read vertically as a simple rectangular dataset or CSV file.
PSPP can handle this type of data by reading each record with a separate set of input statements for each card. It then combines the values from all the records into a single (wide) case.
Tip: Every Record Must be Read or Explicitly Skipped
A multi‑record file is like a stack of cards. PSPP reads them in order, one record at a time. There are two safe ways to handle a card you do not need:
- Read the card (and delete variables later if they are not needed)
- Skip the card using a blank record specification (for example, / / /: slashes in DATA LIST skip cards)

Every card must be accounted for. Removing a card is like removing variables from the middle of a DATA LIST FREE statement: everything after the gap shifts left, and the data becomes misaligned. In multi-record files, this misalignment appears vertically.

Multi‑record files follow a vertical, sequential pattern that is not obvious at first glance. Without understanding the data file structure, it is easy to make assumptions that don’t match how these files must be read.
This page shows how to read a multi‑record file with PSPP and export the assembled dataset to CSV or SAV. Once in a modern format, the data can be used in modern analysis tools.
Before looking at PSPP syntax, it helps to see the basic pattern of a multi-record file. Each card has its own layout, and PSPP reads them in order:
- card 1 → /1 variables (Subject IDs and Treatments)
- card 2 → /2 variables (Attitudes, Part 1)
- card 3 → /3 variables (Attitudes, Part 2)
- etc.

This dataset has 4 cards per case. Two of the cards are not needed for analysis, so PSPP skips them using blank slash lines.
    DATA LIST FILE=studydat RECORDS=4 NOTABLE
      /1 SUBJECT 1-3 STUDY 4 TRTMENT 5-6 FEEL1 10 FEEL2 11 FEEL3 12
      /2 LIKE1 1 LIKE2 2 LIKE3 3 LIKE4 4 LIKE5 5 LIKE6 6 LIKE7 7
         LIKE8 8 LIKE9 9 LIKE10 10 ACTU11 11 ACTU12 12 ACTU13 13
      /
      /
      .

To skip a card, PSPP is instructed to advance one record for each slash (/). To skip two cards, enter two slashes and end the command with a period.
This code illustrates how DATA LIST (and DATA LIST FIXED) reads multi-record data. First, RECORDS=4 tells PSPP that each case consists of four cards, even though two cards are not used for analysis.

Each card is defined by a /N followed by variable names and their column locations on that card. Note that the column locations differ across the cards because each card has its own layout. In this example, /1 and /2 define the two cards being read.

Output:
    Data List
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|
    |     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

    +-----+-----+-----+------+------+------+------+
    |LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
    +-----+-----+-----+------+------+------+------+
    |    9|    0|    9|     0|     8|     0|     8|
    |    0|    0|    8|     0|     7|     0|     5|
    +-----+-----+-----+------+------+------+------+

    Variables
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |       |        |  Measurement  |     |     |         |   Print   |   Write  |
    |Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |
    |STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |
    |FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+

The two cases, each originally spread across four records of data, have been turned into two cases of one row each.
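The other safe option mentioned in the tip, reading every card and deleting what is not needed, can be sketched as follows. CARD3 and CARD4 are hypothetical placeholder names, and the column ranges 1-60 and 1-13 are assumptions taken from the record lengths in the sample data shown earlier.

```spss
/* Sketch: read all four cards, then drop the placeholders.          */
/* CARD3 and CARD4 are hypothetical names; the column ranges are     */
/* assumed from the sample records (60 and 13 characters wide).      */
DATA LIST FILE=studydat RECORDS=4 NOTABLE
  /1 SUBJECT 1-3 STUDY 4 TRTMENT 5-6 FEEL1 10 FEEL2 11 FEEL3 12
  /2 LIKE1 1 LIKE2 2 LIKE3 3 LIKE4 4 LIKE5 5 LIKE6 6 LIKE7 7
     LIKE8 8 LIKE9 9 LIKE10 10 ACTU11 11 ACTU12 12 ACTU13 13
  /3 CARD3 1-60 (A)
  /4 CARD4 1-13 (A).

/* Remove the placeholder variables once the read has been checked. */
DELETE VARIABLES CARD3 CARD4.
```

Reading the unused cards as single string variables keeps every record accounted for, and listing them before deletion gives one more way to confirm that the cards line up.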
Moral of the story: Yes, PSPP can read multi-record data and anyone who says it can't is spreading misinformation. Please stop. It's been able to do this via DATA LIST FIXED for many, many years, at least as far back as v0.82 of PSPP. Even psppire can run the code above, although the feature doesn't appear in any GUI menu. The functionality exists and PSPP can read multi-record files.

Note: Although the PSPP user manual describes a /rec# form of GET DATA for multi-record fixed-column files, PSPP does not currently accept this syntax. Multi-record files are supported, but only through DATA LIST FIXED using the RECORDS= subcommand.

Prevent errors when the data don’t match the expected pattern
For anyone unfamiliar with the vertical, sequential reading pattern that multi-record files depend upon, the format may seem unusual at first. The structure itself is not the problem. It is an efficient, compact way to store data--and it still works well today. But because it relies on reading multiple cards of data in the correct sequence, even a small slip can create problems that are hard to detect.
Just because it looks okay doesn’t always mean it is.
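One way to look past appearances is to tabulate a few variables from each card and compare the results with the codebook. The sketch below assumes the active dataset was just read with the DATA LIST example above; the choice of variables is illustrative.

```spss
/* Sketch: spot-check variables taken from different cards.         */
/* TRTMENT and FEEL1 come from card 1; LIKE1 and ACTU13 from card 2. */
FREQUENCIES VARIABLES=TRTMENT FEEL1 LIKE1 ACTU13.

/* List the first few cases to compare against the raw file by eye. */
LIST SUBJECT TRTMENT LIKE1 ACTU13 /CASES=FROM 1 TO 5.
```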
If a card is missing, out of order, or removed from the read pattern, the software doesn't complain. It simply reads the next physical record as the next logical one, and everything after that is wrong. The first card may appear correct, but the misalignment often becomes apparent only on later cards. Because PSPP cannot infer what was intended, it is essential to run FREQUENCIES on a few key variables drawn from several cards to confirm that the data were read correctly.

When all variables are single digits, an off-by-one shift can still produce values that look plausible. In those cases, FREQUENCIES may not reveal the problem because the range of values is unchanged: only the counts shift. This is where having the original, documented frequencies or an original filled-out survey form is essential. When FREQUENCIES cannot be trusted to expose a misread due to complexity, the only dependable method is to check the values by hand to see where the read went wrong.

Recoding Variables using RECODE
The RECODE command can be used to change values in a variable or create new values in a new variable.

In the next two examples, the recoded values are strings, so PSPP requires creating a string variable to hold them. Anything not recoded is given the System Missing value (SYSMIS) because the code includes ELSE=SYSMIS.

This example also includes variable labels, value labels, and variable level definitions.

    /* Create GENDERC wide enough to accept the longest value then recode. */
    STRING GENDERC (A8).
    RECODE GENDER (2="Female") (1="Male") (ELSE=SYSMIS) into GENDERC.

    /* Apply labels to the new variable */
    VARIABLE LABELS GENDER "Gender of Respondent"
                    GENDERC "Gender of Respondent (Char)".

    /* Value labels apply to the numeric variable (GENDER), not the string version. */
    VALUE LABELS GENDER 1 "Male" 2 "Female" .
    VARIABLE LEVEL GENDER (NOMINAL) GENDERC (NOMINAL).

    /* Create a text Treatment var (TRTA) */
    STRING TRTA(A1).
    RECODE TRTMENT (1="A") (2="B") (3="C") (4="D") (5="E")
                   (6="F") (7="G") (8="H") (9="I") (10="J")
                   (ELSE=SYSMIS) into TRTA.
    VARIABLE LABELS TRTA "Treatment Assignment (Char)".
    VARIABLE LEVEL TRTA (NOMINAL).

    /* Reverse these scales */
    RECODE CONSIST TO KEEP_RID       /* First consecutive block */
           UNCERT TO POWER           /* Second consecutive block */
           MORALITY CONTROL CAUS_YOU /* Individual variables */
      (1=9) (2=8) (3=7) (4=6) (5=5) (6=4) (7=3) (8=2) (9=1) (ELSE=SYSMIS).

This scale-reversing recode changes the numeric values in the same variables it reads from (CONSIST TO KEEP_RID, UNCERT TO POWER, and MORALITY CONTROL CAUS_YOU). Because those variables already exist, no new variables are needed. Anything not recoded is given the System Missing value (SYSMIS).

This example also uses "TO" in the variable list. TO is useful for shortening long variable lists and reducing typing, but it only works on variables that are consecutive in the data set. That is why this example has a break in the variable list. CONSIST TO KEEP_RID covers one consecutive block, UNCERT TO POWER covers another, and MORALITY CONTROL CAUS_YOU are variables not part of any consecutive sequence, so they must be listed separately.
A variable list like CONSIST TO CAUS_YOU causes PSPP to include every variable between those names in dictionary order, often far more than intended.
DISPLAY DICTIONARY shows the variable order, making it easy to confirm which variables are consecutive before using TO.
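RECODE can also go in the other direction, from string values to numeric codes. The following is a minimal sketch, assuming GENDERC was created as in the example above. GENDERN is a hypothetical new variable, declared with NUMERIC first so that the target is known to exist.

```spss
/* Sketch: recode the string GENDERC back to numeric codes.   */
/* GENDERN is a hypothetical name introduced for illustration. */
NUMERIC GENDERN (F1.0).
RECODE GENDERC ("Male"=1) ("Female"=2) (ELSE=SYSMIS) INTO GENDERN.
VALUE LABELS GENDERN 1 "Male" 2 "Female".
```

Comparing FREQUENCIES for GENDER and GENDERN is a quick way to confirm the round trip lost nothing.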
RECODE handles both string-to-numeric and numeric-to-string recoding.

Using COMPUTE to Create Values
COMPUTE creates a new variable or replaces the values of an existing one. It is used for simple arithmetic, scale scores, or system variables such as $CASENUM.

    /* Compute an index number based on current case number */
    COMPUTE IDN = $CASENUM.

    /* Average three items */
    COMPUTE SITUATN = (CONSIST + WANTEDBY + IMPROVED) / 3.

    /* Average three items using a function */
    COMPUTE SITUATN = MEAN(CONSIST, WANTEDBY, IMPROVED).

These examples illustrate the main patterns of COMPUTE: using system variables like $CASENUM, creating scale scores with arithmetic, and using functions inside a transformation. The two averages differ when an item is missing: the arithmetic version yields system-missing, while MEAN() averages the values that are present. Most PSPP transformations follow one of these forms.

PSPP Function Examples
PSPP has many functions that can be used to work on data. Here are five that are used frequently.
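1. MEAN(): the workhorse. Handles missing values gracefully and uses the “function inside COMPUTE” pattern.

    COMPUTE avg_score = MEAN(v1, v2, v3, v4).

2. SUM(): simple, predictable, and widely needed. Good for scales, counts, and composite scores.

    COMPUTE total = SUM(item1, item2, item3).

3. DATEDIFF(): essential for age and time intervals. This one solves a real problem for archivists and anyone working with dates.

    COMPUTE age = DATEDIFF($TIME, birthdate, "years").

The system variable $TIME supplies the current date and time; PSPP does not provide a TODAY() function. DATEDIFF returns the difference between two dates without adding 1. If inclusive counting is needed, add 1 manually.

4. LTRIM() / RTRIM(): string cleanup. These remove leading or trailing spaces from a string variable's values.

    STRING clean_name (A20).
    COMPUTE clean_name = RTRIM(LTRIM(name)).

The target of a string COMPUTE must already exist, so clean_name is declared first (the width A20 is only an example). Because string variables have a fixed width, the stored value is padded with spaces on the right again; the visible effect is the removal of leading spaces.

5. SD(): a simple statistical function that demonstrates PSPP’s analytic side. It returns the standard deviation of its arguments within each case, which is useful for quick diagnostics such as how much a respondent's answers vary across items.

    COMPUTE item_sd = SD(item1, item2, item3).

Like MEAN() and SUM(), SD() works across its arguments within a case, not down a column, so an expression such as (score - MEAN(score)) / SD(score) does not produce z‑scores. For z‑scores of a variable such as score, use DESCRIPTIVES with the /SAVE subcommand.

PSPP includes hundreds of functions across math, statistics, strings, dates, logical tests, and data transformations. Only a few were shown here; the full list is in the PSPP manual, which is also linked at the end of this page.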
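Functions such as MEAN() and SD() operate on their arguments within a single case, so standardizing a variable across cases needs a procedure instead. The following is a minimal sketch using DESCRIPTIVES with /SAVE; "score" is an assumed variable name, and the name of the new variable (Zscore here) is chosen by PSPP, normally by prefixing Z to the original name.

```spss
/* Sketch: z-scores computed across cases with DESCRIPTIVES.   */
/* "score" is an assumed variable name. PSPP names the new     */
/* variable itself, normally by prefixing Z (here, Zscore).    */
DESCRIPTIVES VARIABLES=score /SAVE.

/* Check the result: a standardized variable has mean 0 and SD 1. */
DESCRIPTIVES VARIABLES=Zscore /STATISTICS=MEAN STDDEV.
```

If the generated name differs in your dataset, DISPLAY DICTIONARY shows what was created.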
Exporting and Saving PSPP Data (CSV, POR, SAV)
PSPP can export data in several formats using syntax, which is the most reproducible way to create files for use in other software. The examples below show how to write datasets in common formats. psppire also supports exporting through its File->Export dialog; its available formats are described in a later section.
    /* Set the working directory to avoid path errors when saving files. */
    CD '/home/user/analysis'.

    SAVE OUTFILE='spss/emo.sav'.
    EXPORT OUTFILE='spss/emo.por'.

    /* Write a CSV file with variable names in the first row. */
    SAVE TRANSLATE
      /OUTFILE='spss/emo.csv'
      /TYPE=CSV
      /FIELDNAMES
      /REPLACE.

The commands above show how to write data files in several formats using syntax. SAVE creates an SPSS system file (.sav), and EXPORT writes a portable file (.por) for transferring data to other applications that support this format. The SAVE TRANSLATE command writes a CSV file with variable names in the first row; the REPLACE option allows it to overwrite an existing file. SAVE TRANSLATE can also write tab‑separated files by specifying TYPE=TAB.

PSPP syntax can be run either from a shell command line or by pasting it into psppire’s Syntax Editor. psppire can also export the assembled data directly through its File->Export dialog, which provides a subset of the formats available in PSPP syntax. The following section describes psppire itself in more detail.
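A tab-separated export limited to a few variables can be sketched in the same way. The output file name and the variable list are illustrative assumptions; substitute names from your own dataset.

```spss
/* Sketch: tab-separated export of selected variables.        */
/* The file name and the variable list are illustrative only. */
SAVE TRANSLATE
  /OUTFILE='spss/emo_subset.tsv'
  /TYPE=TAB
  /FIELDNAMES
  /REPLACE
  /KEEP=SUBJECT TRTMENT LIKE1 TO LIKE10.
```

/KEEP restricts the export to the listed variables without changing the active dataset, which is convenient when only part of a wide file is needed elsewhere.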
psppire: The PSPP Graphical Interface
Many PSPP users prefer psppire, the graphical interface that resembles the SPSS Data Editor. psppire is useful for data entry, quick exploration, and running common procedures without writing syntax. However, it does not expose all of PSPP's capabilities, and some dialogs correspond to syntax that is only partially implemented.
For reproducible analysis, batch processing, or advanced procedures, PSPP syntax is still the recommended approach. psppire can generate syntax for many commands, which can then be copied, edited, and reused in syntax files. But there are many more PSPP commands outside the GUI that may be needed from time to time.
For example, merging datasets (MATCH FILES) is a standard PSPP operation, but it is not currently available through psppire’s menus.

psppire is ideal for learning the syntax by example, but complex workflows are best handled directly in syntax files where all the PSPP commands can be used, including commands not in psppire. psppire shows only the commands for which dialogs exist; PSPP syntax supports many additional commands and options (although any PSPP commands can be used in psppire's Syntax Editor).
psppire and File Handles vs. Rerunning Programs
psppire keeps FILE HANDLE definitions for the duration of the session. If a file handle is defined in syntax and then the program is rerun, psppire will report that the handle is already in use. This does not happen when running PSPP from the command line, because each run starts a fresh session.

When using FILE HANDLEs to give files meaningful names, add CLOSE FILE HANDLE commands at the end of the program. This removes the handles so the syntax can be rerun without errors about handles in use.

    CLOSE FILE HANDLE demo.
    CLOSE FILE HANDLE psych.
    CLOSE FILE HANDLE out.

Closing the file handles prevents the “handle already in use” error and allows rerunning the syntax in psppire without restarting the application.
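Put together, a rerunnable program might look like the sketch below. The handle names match the CLOSE FILE HANDLE example above; the file names are illustrative assumptions, PATID is the key variable used in the merge example later on this page, and both input files are assumed to be sorted by PATID already.

```spss
/* Sketch: define handles, use them, then close them.          */
/* File names are illustrative; both inputs are assumed to be  */
/* sorted by PATID already.                                     */
FILE HANDLE demo  /NAME='demo.sav'.
FILE HANDLE psych /NAME='psych.sav'.
FILE HANDLE out   /NAME='merged.sav'.

MATCH FILES /FILE=demo /FILE=psych /BY PATID.
SAVE OUTFILE=out.

/* Release the handles so the program can be rerun in psppire. */
CLOSE FILE HANDLE demo.
CLOSE FILE HANDLE psych.
CLOSE FILE HANDLE out.
```

Keeping the definitions at the top and the closes at the bottom makes it easy to see that every handle that is opened is also released.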
psppire Export Formats
psppire can export the contents of the Output window in several formats. Only the formats shown in the File->Export menu are supported; if a format is not listed, psppire does not produce it. The export format is determined by the output file extension entered in the dialog. For example, entering "myfile.pdf" in the export dialog causes psppire to generate a PDF file.
Rich / Page‑Description Formats (cairo‑based)
These preserve layout, fonts, borders, and the exact appearance of the Output window.
- PDF (.pdf) — Suitable for sharing, printing, and archiving.
- PostScript (.ps) — Vector format; compatible with PostScript viewers and printers.
- Scalable Vector Graphics (.svg) — Vector format; editable in tools such as Inkscape.
Structured / Document Formats
These formats preserve tables and structure, suitable for editing or further processing.
- HTML (.html) — Good for viewing in a web browser, copying into other documents, or converting to other formats.
- OpenDocument Text (.odt) — Editable in LibreOffice and other ODT-compatible editors.
- SPSS Viewer (.spv) — Compatible with SPSS Viewer applications.
Plain Data Formats
These formats provide minimal formatting and are useful for scripting, data interchange or raw text.
- Text (generic) (.txt).
- Text [plain] (.txt) — Plain text output.
- Comma‑separated values (.csv) — For exporting data tables.
Unsupported Formats
Formats not shown in the File->Export menu are not available in the version of psppire being used. Examples include:
- TeX / LaTeX (.tex)
- RTF
- PDF/A
Choosing a Format
- Use PDF, HTML, ODT, TXT, or PS for non-graphical statistical output.
- Use PDF, SVG, PNG, or PS for graphical output.
- Use CSV when exporting tables for use in other software.
Merging Files (MATCH FILES)
Merging data files is a routine part of data processing. Additional variables often come from a different file, for example. PSPP's MATCH FILES command merges them into the main file for analysis or reporting purposes.

psppire does not currently provide dialogs for merging datasets. MATCH FILES is a standard PSPP command, but it must be run from PSPP syntax.

The following example shows how to merge two datasets that share a common key variable (PATID). One file contains demographic variables; the other contains psychological measures. MATCH FILES combines them into a single dataset by matching cases on PATID.

Example datasets

demo.sav

    PATID  AGE  SEX
    101    34   1
    102    29   2
    103    41   1

psych.sav

    PATID  SCORE1  SCORE2
    101    12      18
    102    15      20
    103    11      17

Merged Output

    PATID  AGE  SEX  SCORE1  SCORE2
    101    34   1    12      18
    102    29   2    15      20
    103    41   1    11      17

    MATCH FILES
      /FILE='demo.sav'
      /FILE='psych.sav'
      /BY PATID.
    EXECUTE.
    SAVE OUTFILE='merged.sav'.

PSPP matches cases from both files using PATID (both files must already be sorted by PATID). Variables from demo.sav and psych.sav appear together in the merged dataset. If a case appears in one file but not the other, MATCH FILES still produces a case, but variables from the missing file are system-missing. This holds because both files are named on /FILE. A file named on /TABLE is treated only as a lookup table, and its cases that match no /FILE case are left out of the result.

The merged result becomes the active dataset after the merge. In this example, demo.sav is listed first, and the variables from psych.sav are added to its cases. After MATCH FILES completes, the active dataset contains the merged result, which should be saved (for example, with SAVE OUTFILE='merged.sav'). PSPP system files contain the data set with all its variable attributes.

PSPP always has exactly one active file, referred to in syntax as * (asterisk). Commands such as GET FILE, MATCH FILES, and ADD FILES replace the active file with their result. Because the active file is overwritten whenever a new dataset is read or created, save any results you want to keep before running another command that changes the active file.

What happens when there is not a one-to-one merge? Subject 104 has been added to psych.sav but not to demo.sav. In this case, PATID 104 is merged but that subject's demographic variables are all missing.
Revised psych.sav

    PATID  SCORE1  SCORE2
    101    12      18
    102    15      20
    103    11      17
    104    14      19

Merged Output

    PATID  AGE  SEX  SCORE1  SCORE2
    101    34   1    12      18
    102    29   2    15      20
    103    41   1    11      17
    104    .    .    14      19

If a PATID value appears in one file but not the other, MATCH FILES still creates a case in the merged dataset, provided both files are named on /FILE rather than /TABLE. This is not a missing file or a missing variable; it is a missing case in one of the datasets. PSPP has no values to supply for that side of the merge, so the variables from the file where the case is absent are set to system-missing. This is the safest behavior: it preserves the case without inventing zeros or placeholder values.

Flags created with /IN= track which file each case came from. An /IN variable is set to 1 if the case was present in that file, and 0 if it was not. This makes it easy to identify unmatched cases after the merge.

    MATCH FILES
      /FILE='demo.sav' /IN=indemo
      /FILE='psych.sav' /IN=inpsych
      /BY PATID.
    SAVE OUTFILE='merged.sav'.

Merged Output with /IN= Flags

    PATID  AGE  SEX  SCORE1  SCORE2  indemo  inpsych
    101    34   1    12      18      1       1
    102    29   2    15      20      1       1
    103    41   1    11      17      1       1
    104    .    .    14      19      0       1

In addition to MATCH FILES, PSPP also provides ADD FILES to append data from multiple files to the active data set, and UPDATE, which updates a master file with modifications from a transaction file.

A note about copy-and-paste merging:
The /IN= flags above show exactly what goes wrong when the data in two files doesn't match perfectly. A statistical merge can make these mismatches visible and preserve the structure of the data but only if you use the tools designed for that purpose. A copy‑and‑paste merge cannot. In a spreadsheet, unmatched cases, misaligned rows, and missing variables are all silent: you won't see them, and you won't know they happened. Copy/paste may appear to “work” on perfectly clean data, but the first time the files differ even slightly, the structure is destroyed and information is lost. This is why merges should always be done by a statistical package like PSPP, not by hand. Understand your data, understand its structure, and always verify your merges.

APPENDIX
Further Reading
The PSPP user manual provides the full reference for all commands, functions, and procedures available in PSPP. It is worth consulting when details beyond the examples shown here are needed, or when exploring options available in the psppire menus.
Installing PSPP
PSPP is currently at version 2.1.1. After several important bug fixes (file handle close, writing portable files, reading multi-record data with skipped cards in DATA LIST, and other issues) and the addition of the GLM and CTABLES commands in the last several years, PSPP is more capable than ever. If your version is older than 2.1.1, consider upgrading.

PSPP download page at gnu.org

The gnu site gives instructions for installing PSPP for Windows, Mac, Debian, Ubuntu, Fedora, and with Flatpak.
PSPP source can also be obtained with git: savannah.gnu.org/git
FreeBSD: PSPP is not in the FreeBSD ports collection (expired 2025-03-01 after being marked broken). FreeBSD users can still install PSPP by building it from the official GNU source release. The PSPP build instructions work on FreeBSD. See pspp/INSTALL after cloning with Git for the details.