Missing Data in MATLAB
Working with missing data is a common task in data preprocessing. Although sometimes missing values signify a meaningful event in the data, they often represent unreliable or unusable data points. In either case, MATLAB® has many options for handling missing data.
Create and Organize Missing Data
The form that missing values take in MATLAB depends on the data type. For example, numeric data types such as double use NaN (not a number) to represent missing values.
x = [NaN 1 2 3 4];
You can also use the missing value to represent missing numeric data or data of other types, such as datetime, string, and categorical. MATLAB automatically converts the missing value to the data's native type.
xDouble = [missing 1 2 3 4]
xDouble = 1×5
   NaN     1     2     3     4
xDatetime = [missing datetime(2014,1:4,1)]
xDatetime = 1x5 datetime
   NaT           01-Jan-2014   01-Feb-2014   01-Mar-2014   01-Apr-2014
xString = [missing "a" "b" "c" "d"]
xString = 1x5 string
    <missing>    "a"    "b"    "c"    "d"
xCategorical = [missing categorical({'cat1' 'cat2' 'cat3' 'cat4'})]
xCategorical = 1x5 categorical
     <undefined>      cat1      cat2      cat3      cat4
A data set might contain values that you want to treat as missing data but that are not standard MATLAB missing values such as NaN. You can use the standardizeMissing function to convert those values to the standard missing value for that data type. For example, treat 4 as a missing double value in addition to NaN.
xStandard = standardizeMissing(xDouble,[4 NaN])
xStandard = 1×5
   NaN     1     2     3   NaN
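The same conversion applies to nonnumeric data. The following is a minimal sketch, assuming a hypothetical string array in which the text "N/A" marks an unanswered entry; the variable names and the "N/A" convention are illustrative and not part of the example above.

```matlab
% Hypothetical survey responses where "N/A" marks an unanswered question
responses = ["yes" "N/A" "no" "N/A"];

% Convert every "N/A" entry to the standard missing string value
cleaned = standardizeMissing(responses,"N/A");

% Logical mask showing which entries are now standard missing values
tf = ismissing(cleaned);
```

After the conversion, functions such as ismissing, fillmissing, and rmmissing treat those entries like any other missing string.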
Suppose you want to keep missing values as part of your data set but segregate them from the rest of the data. Several MATLAB functions enable you to control the placement of missing values before further processing. For example, use the 'MissingPlacement' option with the sort function to move NaNs to the end of the data.
xSort = sort(xStandard,'MissingPlacement','last')
xSort = 1×5
     1     2     3   NaN   NaN
Find, Replace, and Ignore Missing Data
Even if you do not explicitly create missing values in MATLAB, they can appear when importing existing data or computing with the data. If you are not aware of missing values in your data, subsequent computation or analysis can be misleading.
For example, if you unknowingly plot a vector containing a NaN value, the NaN does not appear because the plot function ignores it and plots the remaining points normally.
nanData = [1:9 NaN];
plot(1:10,nanData)
However, if you compute the average of the data, the result is NaN. In this case, it is more helpful to know in advance that the data contains a NaN, and then choose to ignore or remove it before computing the average.
meanData = mean(nanData)
meanData = NaN
One way to find NaNs in data is by using the isnan function, which returns a logical array indicating the location of any NaN value.
TF = isnan(nanData)
TF = 1x10 logical array
   0   0   0   0   0   0   0   0   0   1
Similarly, the ismissing function returns the location of missing values in data for multiple data types.
TFdouble = ismissing(xDouble)
TFdouble = 1x5 logical array
   1   0   0   0   0
TFdatetime = ismissing(xDatetime)
TFdatetime = 1x5 logical array
   1   0   0   0   0
Suppose you are working with a table or timetable made up of variables with multiple data types. You can find all of the missing values with one call to ismissing, regardless of their type.
xTable = table(xDouble',xDatetime',xString',xCategorical')
xTable = 5×4 table
    Var1       Var2          Var3          Var4
    ____    ___________    _________    ___________
    NaN             NaT    <missing>    <undefined>
      1     01-Jan-2014    "a"          cat1
      2     01-Feb-2014    "b"          cat2
      3     01-Mar-2014    "c"          cat3
      4     01-Apr-2014    "d"          cat4
TF = ismissing(xTable)
TF = 5x4 logical array
   1   1   1   1
   0   0   0   0
   0   0   0   0
   0   0   0   0
   0   0   0   0
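The logical array returned by ismissing can be reduced to per-variable counts or per-row flags. The sketch below assumes a small hypothetical table (the variable names Num, Date, and Str are illustrative) rather than xTable, so that it runs on its own.

```matlab
% Hypothetical table with one missing entry in each variable
T = table([NaN; 1; 2], ...
          [NaT; datetime(2014,1,1); datetime(2014,2,1)], ...
          [missing; "a"; "b"], ...
          'VariableNames',{'Num','Date','Str'});

TF = ismissing(T);               % 3x3 logical array, one column per variable

missingPerVar   = sum(TF,1);     % number of missing entries in each variable
rowsWithMissing = any(TF,2);     % true for rows with at least one missing entry

% Keep only the rows that have no missing entries
Tclean = T(~rowsWithMissing,:);
```

Summing along the first dimension shows which variables are affected, while any along the second dimension identifies the rows to inspect or drop.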
Missing values can represent unusable data for processing or analysis. Use fillmissing to replace missing values with another value, or use rmmissing to remove missing values altogether.
xFill = fillmissing(xStandard,'constant',0)
xFill = 1×5
     0     1     2     3     0
xRemove = rmmissing(xStandard)
xRemove = 1×3
     1     2     3
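A constant is not the only fill option. As a rough sketch, assuming a hypothetical vector with two interior gaps, fillmissing can also carry the previous value forward or interpolate linearly, and rmmissing can report which entries it removed.

```matlab
% Hypothetical data with two interior gaps
x = [1 NaN 3 4 NaN 6];

% Replace each NaN with the previous nonmissing value
xPrevious = fillmissing(x,'previous');   % 1 1 3 4 4 6

% Replace each NaN by linear interpolation between its neighbors
xLinear = fillmissing(x,'linear');       % 1 2 3 4 5 6

% Remove the NaNs and return a mask of the removed positions
[xClean,removed] = rmmissing(x);         % xClean is 1 3 4 6
```

Which method is appropriate depends on the data: carrying a value forward suits step-like signals, whereas interpolation assumes the values vary smoothly between samples.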
Many MATLAB functions enable you to ignore missing values, without having to explicitly locate, fill, or remove them first. For example, if you compute the sum of a vector containing NaN values, the result is NaN. However, you can directly ignore NaNs in the sum by using the 'omitnan' option with the sum function.
sumNan = sum(xDouble)
sumNan = NaN
sumOmitnan = sum(xDouble,'omitnan')
sumOmitnan = 10
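The 'omitnan' option is also accepted by mean, which addresses the NaN average computed earlier. The following sketch redefines nanData so that it runs on its own; the names nValid and meanManual are illustrative.

```matlab
nanData = [1:9 NaN];

% Average of the nine nonmissing values, ignoring the NaN
meanOmit = mean(nanData,'omitnan');            % 5

% Equivalent manual computation: omit NaNs from the sum and
% divide by the number of valid entries
nValid     = nnz(~isnan(nanData));             % 9
meanManual = sum(nanData,'omitnan')/nValid;    % 5
```

Note that the divisor is the number of nonmissing values, not the length of the vector, so the result reflects only the data that is actually present.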
See Also
ismissing | fillmissing | standardizeMissing | missing