Counting Missing Values
Generic function for counting the percentage or amount of missing values in a zoo object (or in a data.frame or matrix), using a user-defined temporal scale.
Usage
cmv(x, ...)
# S3 method for default
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly",
"quarterly", "seasonal", "annual"),
out.type=c("percentage", "amount"), dec=3,
start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)
# S3 method for zoo
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly",
"quarterly", "seasonal", "annual"),
out.type=c("percentage", "amount"), dec=3,
start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)
# S3 method for data.frame
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly",
"quarterly", "seasonal", "annual"),
out.type=c("percentage", "amount"), dec=3,
start="00:00:00", start.fmt= "%H:%M:%S", tz,
dates=1, date.fmt="%Y-%m-%d", ...)
# S3 method for matrix
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly",
"quarterly", "seasonal", "annual"),
out.type=c("percentage", "amount"), dec=3,
start="00:00:00", start.fmt= "%H:%M:%S", tz,
dates=1, date.fmt="%Y-%m-%d", ...)
Arguments
- x
zoo, data.frame or matrix object, with the time series to be analysed.
Measurements at several gauging stations can be stored in a data.frame or matrix object; in that case, each column of x represents the time series measured at one gauging station, and the column names of x have to correspond to the ID of each station (starting with a letter).
- tscale
character with the temporal scale to be used for analysing the missing data. Valid values are:
-) hourly: the percentage/amount of missing values will be given for each hour and, therefore, the expected time frequency of x must be sub-hourly.
-) daily: the percentage/amount of missing values will be given for each day and, therefore, the expected time frequency of x must be sub-daily (i.e., hourly or sub-hourly).
-) weekly: the percentage/amount of missing values will be given for each week (starting on Monday) and, therefore, the expected time frequency of x must be sub-weekly (i.e., daily, hourly or sub-hourly).
-) monthly: the percentage/amount of missing values will be given for each month and, therefore, the expected time frequency of x must be sub-monthly (i.e., daily, hourly or sub-hourly).
-) quarterly: the percentage/amount of missing values will be given for each quarter and, therefore, the expected time frequency of x must be sub-quarterly (i.e., monthly, daily, hourly or sub-hourly).
-) seasonal: the percentage/amount of missing values will be given for each weather season (see ?time2season) and, therefore, the expected time frequency of x must be sub-seasonal (i.e., monthly, daily, hourly or sub-hourly).
-) annual: the percentage/amount of missing values will be given for each year and, therefore, the expected time frequency of x must be sub-annual (i.e., seasonal, monthly, daily, hourly or sub-hourly).
- dec
integer indicating the number of decimal places included in the output.
It is only used when out.type=='percentage'.
- start
character, indicating the starting time used for aggregating sub-daily time series into daily ones. It MUST be provided in the format specified by start.fmt.
This value is used to define the time when a new day begins (e.g., for some rain gauge stations).
-) All the values of x with a time attribute before start are considered as belonging to the day before the one indicated in the time attribute of those values.
-) All the values of x with a time attribute equal to start are considered to be equal to "00:00:00" in the output zoo object.
-) All the values of x with a time attribute after start are considered as belonging to the same day as the one indicated in the time attribute of those values.
It is useful when the daily values start at a time different from "00:00:00". Use with caution. See examples.
- start.fmt
character indicating the format in which the time is provided in start. By default start.fmt="%H:%M:%S". See format in as.POSIXct.
- tz
character, with the specification of the time zone used in both x and start. System-specific (see time zones), but "" is the current time zone, and "GMT" is UTC (Universal Time, Coordinated). See Sys.timezone and as.POSIXct.
If tz is missing (the default), it is automatically set to the time zone used in time(x).
This argument can be used to force the use of the local time zone, or any other time zone, instead of UTC.
- dates
numeric, factor, POSIXct or POSIXt object indicating how to obtain the dates and times for each column of x (e.g., gauging station).
If dates is a number, it indicates the index of the column in x that stores the dates and times.
If dates is a factor, it is converted into POSIXct class, using the date format specified by date.fmt.
If dates is already of POSIXct or POSIXt class, this function verifies that the number of elements in it is equal to the number of rows in x.
- date.fmt
character indicating the format in which the dates are stored in dates. By default date.fmt="%Y-%m-%d". See format in as.Date.
ONLY required when class(dates)=="factor" or class(dates)=="numeric".
- out.type
character indicating how the missing values should be returned for each temporal scale. Valid values are:
-) percentage: the missing values are returned as a real value, representing the percentage of missing values in each temporal scale.
-) amount: the missing values are returned as an integer value, representing the absolute amount of missing values in each temporal scale.
- ...
further arguments passed to or from other methods.
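To make the grouping implied by tscale and start more concrete, here is a minimal base-R sketch of how time stamps could be mapped to period labels. The helper period_key() is hypothetical and is not hydroTSM's internal code. It assumes that start shifts every non-hourly scale by the same offset, and it leaves out the seasonal scale (see ?time2season for that one).

```r
# Hypothetical helper (NOT part of hydroTSM): returns, for each time stamp,
# the label of the period it falls in for a given temporal scale.
period_key <- function(x, tscale = c("hourly", "daily", "weekly",
                                     "monthly", "quarterly", "annual"),
                       start = "00:00:00", start.fmt = "%H:%M:%S",
                       tz = "UTC") {
  tscale <- match.arg(tscale)
  x <- as.POSIXct(x, tz = tz)

  # Offset (in seconds) of the time at which a new day begins.
  st  <- as.POSIXlt(start, format = start.fmt, tz = tz)
  off <- st$hour * 3600 + st$min * 60 + st$sec

  # Assumption: values before 'start' belong to the previous day and values
  # at 'start' map to 00:00:00; shifting by 'off' reproduces both rules.
  if (tscale != "hourly") x <- x - off

  lt <- as.POSIXlt(x, tz = tz)
  switch(tscale,
    hourly    = format(x, "%Y-%m-%d %H", tz = tz),
    daily     = format(x, "%Y-%m-%d", tz = tz),
    # Weeks start on Monday: step back (wday + 6) %% 7 days to that Monday.
    weekly    = format(as.Date(format(x, "%Y-%m-%d", tz = tz)) - (lt$wday + 6) %% 7),
    monthly   = format(x, "%Y-%m", tz = tz),
    quarterly = paste0(lt$year + 1900, "-Q", lt$mon %/% 3 + 1),
    annual    = format(x, "%Y", tz = tz))
}

# With start="08:00:00", 07:59 belongs to the previous day.
times <- as.POSIXct(c("2020-01-06 07:59:00", "2020-01-06 08:00:00",
                      "2020-01-06 09:00:00"), tz = "UTC")
period_key(times, "daily", start = "08:00:00")
```

Counting NAs per label (e.g. with tapply) then gives one value per hour, day, week, month, quarter or year.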
Details
The amount of missing values in each temporal scale is computed simply by counting the number of NAs in each hour / day / week / month / quarter / season / year, while the percentage of missing values in each temporal scale is computed by dividing that number by the total number of data elements in the same hour / day / week / month / quarter / season / year.
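The two definitions above can be written down directly. The sketch below uses a hypothetical base-R helper, count_missing(), which is not the code of cmv(). It groups one station column of a small data.frame by month (with the dates stored in column 1, in the spirit of dates=1) and returns either the number of NAs or the ratio of NAs to all elements, rounded to dec decimals. The text above does not say whether the ratio is scaled by 100, so this sketch returns the plain ratio.

```r
# Hypothetical helper (NOT hydroTSM's cmv()): counts missing values per group,
# following the definitions given in Details.
count_missing <- function(values, groups,
                          out.type = c("percentage", "amount"), dec = 3) {
  out.type <- match.arg(out.type)
  n.na <- tapply(values, groups, function(v) sum(is.na(v)))  # number of NAs
  if (out.type == "amount") return(n.na)
  n.tot <- tapply(values, groups, length)                     # elements per group
  round(n.na / n.tot, dec)                                    # ratio, 'dec' decimals
}

# Toy data.frame whose first column stores the dates
# (cf. dates=1 and date.fmt="%Y-%m-%d" in the data.frame method).
obs <- data.frame(date = c("2020-01-30", "2020-01-31", "2020-02-01",
                           "2020-02-02", "2020-02-03"),
                  P1   = c(1.2, NA, 0.0, NA, NA))
d      <- as.Date(obs[[1]], format = "%Y-%m-%d")
months <- format(d, "%Y-%m")

count_missing(obs$P1, months, out.type = "amount")  # 2020-01: 1, 2020-02: 2
count_missing(obs$P1, months)                       # 2020-01: 0.5, 2020-02: 0.667
```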
This function was developed to allow the selective removal of values when aggregating from a high temporal resolution into a lower one (e.g., from hourly to daily or from daily to monthly), using any of the temporal aggregation functions available in this package (e.g., hourly2daily, daily2monthly).
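A minimal base-R sketch of that selective-removal idea, going from daily to monthly values, follows. The helper name monthly_total_filtered() and the 20% threshold are hypothetical. Within hydroTSM one would presumably combine the output of cmv() with an aggregation function such as daily2monthly instead.

```r
# Hypothetical helper (NOT part of hydroTSM): monthly totals of daily values,
# set to NA when the fraction of missing days exceeds 'max.na'.
monthly_total_filtered <- function(values, dates, max.na = 0.2) {
  month   <- format(dates, "%Y-%m")
  na.frac <- tapply(values, month, function(v) mean(is.na(v)))  # fraction of NAs
  total   <- tapply(values, month, sum, na.rm = TRUE)           # totals ignoring NAs
  total[na.frac > max.na] <- NA                                  # drop gappy months
  total
}

dates  <- seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day")
values <- rep(1, length(dates))
values[which(format(dates, "%m") == "02")[1:10]] <- NA  # 10 of 29 February days

monthly_total_filtered(values, dates, max.na = 0.2)     # 2020-01: 31, 2020-02: NA
```

Raising max.na to 0.5 would keep February (10/29 is about 0.34 missing) with a total of 19.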
Value
a zoo object with the percentage/amount of missing values for each temporal scale selected by the user.
Author
Mauricio Zambrano-Bigiarini, mzb.devel@gmail
Examples
######################
## Ex1: Loading the DAILY precipitation data at SanMartino (25567 daily values)
library(hydroTSM)
data(SanMartinoPPts)
x <- SanMartinoPPts
## Transforming into NA the 10% of values in 'x'
n <- length(x)
n.nas <- round(0.1*n, 0)
na.index <- sample(1:n, n.nas)
x[na.index] <- NA
# Getting the percentage of NAs in 'x' for each week (starting on Monday)
cmv(x, tscale="weekly")
# Getting the percentage of NAs in 'x' for each month
cmv(x, tscale="monthly")
# Getting the percentage of NAs in 'x' for each quarter
cmv(x, tscale="quarterly")
# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")
# Getting the percentage of NAs in 'x' for each year
cmv(x, tscale="annual")
######################
## Ex2: Loading the time series of HOURLY streamflows for the station
## Karamea at Gorge (52579 hourly values)
data(KarameaAtGorgeQts)
x <- KarameaAtGorgeQts
## Transforming into NA the 10% of values in 'x'
n <- length(x)
n.nas <- round(0.1*n, 0)
na.index <- sample(1:n, n.nas)
x[na.index] <- NA
# Getting the percentage of NAs in 'x' for each day
cmv(x, tscale="daily")
# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")