This tutorial demonstrates how to use full information maximum likelihood (FIML) estimation to deal with missing data in a regression model using `lavaan`.

## Import Data

In this post I use FIML to deal with missing data in a multiple regression framework. First, I import the data from a text file named ‘employee.dat’. You can download a zip file of the data from the Applied Missing Data website. I also have a GitHub page for these examples. Remember to replace the file path in the **read.table** function with the path to the text file on your computer.

```
employee <- read.table("data/employee.dat")
```

Because the original text file does not include variable names, I name the variables in the new data frame:

```
names(employee) <- c("id", "age", "tenure", "female", "wbeing", "jobsat",
"jobperf", "turnover", "iq")
```

Then I recode every value of -99, which marks a missing value in the original text file, to `NA`, the missing data value recognized by R.

```
employee[employee == -99] <- NA
```
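As a rough sketch of an alternative, the naming and recoding steps can also be done at import time with the `col.names` and `na.strings` arguments of **read.table**. The three rows of toy data below are invented for illustration and are not taken from ‘employee.dat’; the sketch also assumes the missing code appears in the file literally as `-99` (a value written as `-99.00` would not match).

```r
# Toy stand-in for employee.dat (made-up values): nine columns, -99 marks missing.
raw <- "1 40 10 1 -99 5 6 0 100
2 35 8 0 6 -99 5 1 95
3 50 12 1 7 4 6 0 -99"

vars <- c("id", "age", "tenure", "female", "wbeing", "jobsat",
          "jobperf", "turnover", "iq")

# col.names labels the columns; na.strings turns the literal field -99 into NA.
toy <- read.table(text = raw, col.names = vars, na.strings = "-99")

# Count the missing values per variable as a quick sanity check.
n_missing <- colSums(is.na(toy))
print(n_missing)
```

Counting the `NA` values per column after import is a cheap way to confirm that the missing data code was actually recognized.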

## Create Regression Model Object

Now we are ready to create a character string containing the regression model using the `lavaan` model conventions. Note that **b1** and **b2** are labels that will be used later for the Wald test. These labels are equivalent to **(b1)** and **(b2)** after these variables in the Mplus code.

```
model <- '
# Regression model
jobperf ~ b1*wbeing + b2*jobsat
# Variances
wbeing ~~ wbeing
jobsat ~~ jobsat
# Covariance/correlation
wbeing ~~ jobsat
'
```

In addition to the regression model, I also estimated the variances and covariances of the predictors. I did this to replicate the results of the original Mplus example. In Mplus you have to estimate the variances of all of the predictors if any of them have missing data that you would like to model. In `lavaan` the *fixed.x=FALSE* argument has the same effect (see below).
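To see why the predictors need a distributional model at all, it may help to recall the general form of the casewise log-likelihood that FIML maximizes under multivariate normality. The notation is standard textbook notation and an assumption of this sketch, not something taken from the Mplus or `lavaan` output: $y_i$ holds the variables observed for case $i$, $p_i$ is how many there are, and $\mu_i$ and $\Sigma_i$ are the matching sub-vector and sub-matrix of the model-implied mean vector $\mu$ and covariance matrix $\Sigma$.

$$
\log L_i = -\frac{1}{2}\left[\, p_i \log(2\pi) + \log\lvert\Sigma_i\rvert + (y_i - \mu_i)^{\top}\,\Sigma_i^{-1}\,(y_i - \mu_i) \right],
\qquad
\log L = \sum_{i=1}^{N} \log L_i
$$

A case with a missing predictor can contribute to this sum only if the predictors have their own means, variances, and covariances in $\mu$ and $\Sigma$, which is what estimating those parameters provides.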

## Fit the Model

Next, I use the **sem** function to fit the model.

```
fit <- sem(model, employee, missing='fiml', meanstructure=TRUE,
fixed.x=FALSE)
```

Listwise deletion is the default, so the *missing=‘fiml’* argument tells `lavaan` to use FIML instead. I also included the *meanstructure=TRUE* argument to include the means of the observed variables in the model, and the *fixed.x=FALSE* argument to estimate the means, variances, and covariances of the predictors. Again, I do this to replicate the results of the original Mplus example.
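The practical difference between the two settings can be seen in how many cases each fit uses. The sketch below uses simulated data with made-up variable names (`y`, `x1`, `x2`), not the employee data, and assumes the `lavaan` package is installed; `lavInspect(fit, "nobs")` reports the number of observations used in the analysis.

```r
library(lavaan)

# Simulated data (hypothetical, for illustration only).
set.seed(123)
n  <- 300
x1 <- rnorm(n)
x2 <- 0.3 * x1 + rnorm(n)
y  <- 0.5 * x1 + 0.1 * x2 + rnorm(n)
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Remove some predictor and outcome values completely at random.
sim$x1[sample(n, 40)] <- NA
sim$y[sample(n, 30)]  <- NA

model_sim <- 'y ~ x1 + x2'

# Same model, two missing data treatments.
fit_lw   <- sem(model_sim, sim, missing = "listwise",
                meanstructure = TRUE, fixed.x = FALSE)
fit_fiml <- sem(model_sim, sim, missing = "fiml",
                meanstructure = TRUE, fixed.x = FALSE)

# Number of cases that entered each analysis.
n_lw   <- lavInspect(fit_lw, "nobs")
n_fiml <- lavInspect(fit_fiml, "nobs")
print(c(listwise = n_lw, fiml = n_fiml))
```

Under listwise deletion only the complete cases remain, whereas FIML keeps every case that has at least one observed value.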

## Generate Output

We are now ready to look at the results.

```
summary(fit, fit.measures=TRUE, rsquare=TRUE, standardized=TRUE)
```

Compared to what we learned in the last post, the only thing new to the **summary** function is the *rsquare=TRUE* argument, which, not surprisingly, results in the model R² being included in the summary output.

I only show the **Parameter estimates** section here:

```
Parameter estimates:

  Information                                 Observed
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Regressions:
  jobperf ~
    wbeing  (b1)      0.476    0.055    8.665    0.000    0.476    0.447
    jobsat  (b2)      0.027    0.060    0.444    0.657    0.027    0.025

Covariances:
  wbeing ~~
    jobsat            0.467    0.098    4.780    0.000    0.467    0.336

Intercepts:
    jobperf           2.869    0.382    7.518    0.000    2.869    2.289
    wbeing            6.286    0.063   99.692    0.000    6.286    5.338
    jobsat            5.959    0.065   91.836    0.000    5.959    5.055

Variances:
    wbeing            1.387    0.108                      1.387    1.000
    jobsat            1.390    0.109                      1.390    1.000
    jobperf           1.243    0.087                      1.243    0.792

R-Square:
    jobperf           0.208

## Wald Test

In `lavaan` the Wald test is called separately from the estimation function. The **lavTestWald** function uses the labels assigned in the model object above.

```
# Wald test is called separately.
lavTestWald(fit, constraints='b1 == 0
b2 == 0')
```

Results of the Wald test:

```
$stat
[1] 95.88081
$df
[1] 2
$p.value
[1] 0
```
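The reported p-value of 0 should be read as a value too small to represent, not as an exact zero. A minimal sketch in base R, using the statistic and degrees of freedom copied from the output above, recovers the upper-tail probability directly. The `1 - pchisq(...)` line is only an illustration of how such a value can round to 0; it is an assumption, not a statement about how `lavaan` computes it.

```r
# Values copied from the Wald test output above.
stat <- 95.88081
df   <- 2

# Upper-tail probability computed directly, which avoids cancellation.
p_upper <- pchisq(stat, df = df, lower.tail = FALSE)

# Subtracting the lower tail from 1 loses the tiny tail in floating point.
p_naive <- 1 - pchisq(stat, df = df)

print(p_upper)
print(p_naive)

# With 2 degrees of freedom the upper tail has the closed form exp(-stat / 2).
print(exp(-stat / 2))
```

The upper-tail value is on the order of 1e-21, so the conclusion is unchanged: the joint hypothesis that both slopes are zero is rejected.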

There you have it! Regression with FIML in R. But what if you have variables that you do not want to include in your model, yet that may carry information about the missingness in the variables that are in your model? I will talk about that in the next post.