Counting Missing Values

Generic function for counting the percentage/amount of missing values in a zoo object, using a user-defined temporal scale.

Usage

cmv(x, ...)

# S3 method for default
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)

# S3 method for zoo
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)

# S3 method for data.frame
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, 
            dates=1, date.fmt="%Y-%m-%d", ...)

# S3 method for matrix
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz,
            dates=1, date.fmt="%Y-%m-%d", ...)

Arguments

x

zoo, data.frame or matrix object, with the time series to be analysed.
Measurements at several gauging stations can be stored in a data.frame or matrix object. In that case, each column of x represents the time series measured at one gauging station, and the column names of x must correspond to the ID of each station (starting with a letter).

tscale

character with the temporal scale to be used for analysing the missing data. Valid values are:
-) hourly: the percentage/amount of missing values will be given for each hour and, therefore, the expected time frequency of x must be sub-hourly.
-) daily: the percentage/amount of missing values will be given for each day and, therefore, the expected time frequency of x must be sub-daily (i.e., hourly or sub-hourly).
-) weekly: the percentage/amount of missing values will be given for each week (starting on Monday) and, therefore, the expected time frequency of x must be sub-weekly (i.e., daily, hourly or sub-hourly).
-) monthly: the percentage/amount of missing values will be given for each month and, therefore, the expected time frequency of x must be sub-monthly (i.e., daily, hourly or sub-hourly).
-) quarterly: the percentage/amount of missing values will be given for each quarter and, therefore, the expected time frequency of x must be sub-quarterly (i.e., monthly, daily, hourly or sub-hourly).
-) seasonal: the percentage/amount of missing values will be given for each weather season (see ?time2season) and, therefore, the expected time frequency of x must be sub-seasonal (i.e., monthly, daily, hourly or sub-hourly).
-) annual: the percentage/amount of missing values will be given for each year and, therefore, the expected time frequency of x must be sub-annual (i.e., seasonal, monthly, daily, hourly or sub-hourly).

dec

integer indicating the number of decimal places included in the output.
It is only used when out.type=='percentage'.

start

character, indicating the starting time used for aggregating sub-daily time series into daily ones. It MUST be provided in the format specified by start.fmt.
This value is used to define the time when a new day begins (e.g., for some rain gauge stations).
-) All the values of x with a time attribute before start are considered as belonging to the day before the one indicated in the time attribute of those values.
-) All the values of x with a time attribute equal to start are considered to be equal to "00:00:00" in the output zoo object.
-) All the values of x with a time attribute after start are considered as belonging to the same day as the one indicated in the time attribute of those values.

It is useful when the daily values start at a time different from "00:00:00". Use with caution. See examples.
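
The three rules above can be sketched in base R by shifting every time stamp back by the number of seconds between midnight and start. This is only an illustration of the rule, not necessarily how cmv implements it; the time stamps and the value start="08:00:00" are hypothetical.

```r
# Hypothetical hourly time stamps spanning two calendar days (UTC)
tt <- seq(as.POSIXct("2021-03-01 06:00:00", tz = "UTC"),
          as.POSIXct("2021-03-02 10:00:00", tz = "UTC"), by = "hour")

start     <- "08:00:00"   # assumed time at which a new day begins
start.fmt <- "%H:%M:%S"

# Seconds elapsed between midnight and 'start'
s      <- strptime(start, format = start.fmt, tz = "UTC")
offset <- s$hour * 3600 + s$min * 60 + s$sec

# Shifting the time stamps moves 'start' onto 00:00:00
shifted <- tt - offset
day     <- as.Date(shifted, tz = "UTC")

# Day assigned to each original time stamp
data.frame(time = format(tt, "%Y-%m-%d %H:%M:%S", tz = "UTC"), day = day)
```

In this sketch, the values stamped 06:00 and 07:00 on 2021-03-01 fall on 2021-02-28, and the value stamped 08:00 becomes the first value (00:00:00) of 2021-03-01.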

start.fmt

character indicating the format in which the time is provided in start. By default start.fmt="%H:%M:%S". See format in as.POSIXct.

tz

character, with the specification of the time zone used in both x and start. System-specific (see time zones), but "" is the current time zone, and "GMT" is UTC (Universal Time, Coordinated). See Sys.timezone and as.POSIXct.
If tz is missing (the default), it is automatically set to the time zone used in time(x).
This argument can be used to force the use of the local time zone, or of any other time zone, instead of UTC.

dates

numeric, factor, POSIXct or POSIXt object indicating how to obtain the dates and times for each column of x (e.g., gauging station).
If dates is a number, it indicates the index of the column in x that stores the date and times.
If dates is a factor, it is converted into POSIXct class, using the date format specified by date.fmt.
If dates is already of POSIXct or POSIXt class, this function verifies that the number of elements in it is equal to the number of elements in x.

date.fmt

character indicating the format in which the dates are stored in dates. By default date.fmt="%Y-%m-%d", as shown in Usage. See format in as.Date.
ONLY required when class(dates)=="factor" or class(dates)=="numeric".

out.type

character indicating how the missing values should be returned for each temporal scale. Valid values are:
-) percentage: the missing values are returned as a real value, representing the percentage of missing values in each temporal scale.
-) amount: the missing values are returned as an integer value, representing the absolute amount of missing values in each temporal scale.

...

further arguments passed to or from other methods.

Details

The amount of missing values in each temporal scale is computed by counting the NAs in each hour / day / week / month / quarter / season / year. The percentage of missing values in each temporal scale is computed by dividing that count by the total number of data elements in the same hour / day / week / month / quarter / season / year.
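
The computation described above can be sketched in base R for the monthly scale. This is a minimal illustration with a hypothetical daily series, not the code used by cmv. The ratio is computed exactly as described (count of NAs divided by the number of elements) and rounded to 3 decimals to mirror the default dec=3. Whether cmv additionally rescales this value is not stated here.

```r
# Hypothetical daily series covering January and February 2021
dates <- seq(as.Date("2021-01-01"), as.Date("2021-02-28"), by = "day")
x     <- as.numeric(seq_along(dates))
x[c(2, 5, 40)] <- NA   # two gaps in January, one in February

# Grouping label for the monthly scale
month <- format(dates, "%Y-%m")

# Amount: number of NAs in each month
amount <- tapply(is.na(x), month, sum)

# Percentage as described in the text: NAs divided by the number of
# data elements in each month (not multiplied by 100 in this sketch)
percentage <- round(tapply(is.na(x), month, mean), 3)

amount
percentage
```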

This function was developed to allow the selective removal of values when aggregating from a high temporal resolution into a lower temporal resolution (e.g., from hourly to daily or from daily to monthly), using any of the temporal aggregation functions available in this package (e.g., hourly2daily, daily2monthly).
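
As an illustration of this selective removal, the sketch below aggregates a hypothetical daily series into monthly totals in base R and then discards the months whose fraction of missing values exceeds a threshold. The variable names and the 10% threshold (max.na) are assumptions made for this example only. In practice, the output of cmv and the aggregation functions of this package would play the roles of na.frac and total.

```r
# Hypothetical daily series covering January to March 2021
dates <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")
x     <- rep(1, length(dates))
x[1:20]  <- NA   # January: 20 of 31 values missing
x[32:33] <- NA   # February: 2 of 28 values missing

month <- format(dates, "%Y-%m")

# Fraction of missing values and NA-tolerant total for each month
na.frac <- tapply(is.na(x), month, mean)
total   <- tapply(x, month, sum, na.rm = TRUE)

# Hypothetical threshold: discard months with more than 10% of missing values
max.na <- 0.1
total[na.frac > max.na] <- NA

total
```

Here January is set to NA because about 65% of its values are missing, while February (about 7% missing) and March (complete) keep their totals.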

Value

a zoo object with the percentage/amount of missing values for each temporal scale selected by the user.

Author

Mauricio Zambrano-Bigiarini, mzb.devel@gmail

Examples

######################
## Ex1: Loading the DAILY precipitation data at SanMartino (25567 daily values)
data(SanMartinoPPts)
x <- SanMartinoPPts

## Setting 10% of the values in 'x' to NA
n           <- length(x)
n.nas       <- round(0.1*n, 0)
na.index    <- sample(1:n, n.nas)
x[na.index] <- NA

# Getting the percentage of NAs (default 'out.type') in 'x' for each week (starting on Monday)
cmv(x, tscale="weekly")

# Getting the percentage of NAs in 'x' for each month
cmv(x, tscale="monthly")

# Getting the percentage of NAs in 'x' for each quarter
cmv(x, tscale="quarterly")

# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")

# Getting the percentage of NAs in 'x' for each year
cmv(x, tscale="annual")
######################
## Ex2: Loading the time series of HOURLY streamflows for the station 
## Karamea at Gorge (52579 hourly values)
data(KarameaAtGorgeQts)
x <- KarameaAtGorgeQts

## Setting 10% of the values in 'x' to NA
n           <- length(x)
n.nas       <- round(0.1*n, 0)
na.index    <- sample(1:n, n.nas)
x[na.index] <- NA

# Getting the percentage of NAs (default 'out.type') in 'x' for each day
cmv(x, tscale="daily")

# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")