INTERPOLATING A LOW-FREQUENCY TIME SERIES TO A HIGH-FREQUENCY ONE: PROGRAMMING AND ESTIMATION PROCEDURE FOR MATLAB

This study provides an estimation procedure and Matlab programming for the temporal disaggregation of time series data, that is, for converting low-frequency data into higher-frequency data. Temporal disaggregation can be performed with one or more high-frequency indicator series.


INTRODUCTION
Data are a crucial part of responsible research. Whenever investigators or research teams start a new project, they should consider the issues related to their data. A clear data plan at the beginning of the research saves time and effort later on, and it helps ensure that the data produced will be preserved in a clear, usable format.
Research data are an essential and costly output of the scholarly research process across all disciplines. They are an important part of the evidence necessary to evaluate research results and to reconstruct the events and processes leading to them. A common problem for researchers and analysts is not having a series at the preferred frequency. For instance, instead of monthly output (gross domestic product: GDP), they have only quarterly or annual GDP, and sometimes even quarterly GDP is unavailable. Instead of a daily stock market index, they have only a weekly index. While there is no way to completely make up for the missing time series, there are some useful techniques. Using one or more high-frequency data series, the low-frequency series can be disaggregated into a high-frequency series. For example, quarterly imports could help disaggregate annual GDP, and monthly investment and monthly exports could help disaggregate annual output.
To maintain the reliability of research, accurate data collection is necessary regardless of the field of study or the type of data (quantitative or qualitative). Both the selection of an appropriate data collection or disaggregation method and clearly delineated instructions for its correct use reduce the likelihood of errors. The primary motivation for preserving data integrity is to support the detection of errors in the data collection process, whether they are made intentionally or not. Most, Craddick, Crawford, Redican, Rhodes, Rukenbrod, and Laws (2003) explain 'quality assurance' and 'quality control' [1] as two approaches that can preserve data integrity and ensure the scientific validity of study results. Each approach is implemented at a different point in the research timeline (Whitney, Lind, and Wahl, 1998). Several researchers take these quality approaches into account when they interpolate low-frequency data into high-frequency data. For example, Chow and Lin (1971) and Goldberger (1962) used the best linear unbiased interpolation method.
Litterman (1983), Fernandez (1981) and Chow and Lin (1971) all use one or several indicators and perform a regression on the low-frequency series. However, Litterman (1983) and Fernandez (1981) deal with non-cointegrated series, while Chow and Lin (1971) is suited to cointegrated series. Alternatively, Dagum and Cholette (2006) can disaggregate a series without an indicator. Their approach is primarily concerned with movement preservation: when an indicator is used, it generates a series that is similar to the indicator series whether or not the indicator is correlated with the low-frequency series. Pavía-Miralles (2010) classifies and reviews the procedures, discusses the history of the methodological development in this literature, and identifies the assets and drawbacks of each method, the current state of the art, and the topics in need of further development.
All of the above techniques ensure that the first value, the last value, the sum or the average of the resulting high-frequency series is consistent with the low-frequency series. It has been noted that "they can deal with situations where the high frequency is an integer multiple of the low frequency (e.g. years to quarters, weeks to days), but not with irregular frequencies (e.g. weeks to months)". The interpolation methods are widely used in official statistics, and researchers employ different software packages to perform temporal disaggregation. For example, an R extension is available, Quilis (2012) used a Matlab extension, Doan (2008) provides a RATS extension, and Barcellan et al. (2003) employed an Ecotrim extension.
Only a few studies (e.g. Quilis, 2012) provide programming to interpolate low-frequency data into high-frequency data using Matlab, and they rely on an early version of it (Matlab 7.6 [R2008a]). Therefore, in this paper we derive the best linear unbiased predictor of an individual drawing of Y (the low-frequency series) given X (which may be one or more high-frequency series) in the linear regression model, using Matlab 7.14 (R2012a). To that end, we describe the estimation procedure and the manual Matlab programming.
[1] Quality assurance refers to the activities that take place before data collection begins, and quality control to the activities that take place during and after data collection, whether the data are primary or secondary.
Section 2 discusses the framework of the interpolation method. Section 3 presents the estimation procedure and programming using an example. Finally, the key results are presented and discussed in Section 4.

THE INTERPOLATION METHOD
The purpose of interpolation is to find an unknown high-frequency series (say Y: monthly GDP) whose averages, sums, first or last values are consistent with a known low-frequency series (say annual or quarterly GDP). To estimate monthly GDP, one or more other high-frequency indicator variables can be used. We collect these high-frequency series in a matrix X. Hence, the monthly observations of a series can be estimated using either a bivariate or a multiple regression relationship. Following the Chow and Lin (1971) approach, the generalized linear regression model is given by [2]:

$$Y = X\beta + u \tag{1}$$

where $Y$ is the $12n \times 1$ (or $T \times 1$) vector of regressand observations, $X$ is the $12n \times K$ (or $T \times K$) matrix of regressor observations, $\beta$ is the $K \times 1$ vector of coefficients and $u \sim N(0, W)$. Here $n$ is the number of years, $K$ is the number of high-frequency indicator series, $T$ is the number of months ($12 \times n$), and $W$ is the $12n \times 12n$ (or $T \times T$) positive-definite variance-covariance matrix of the disturbances ($W = \sigma^2 I_{12n}$). For the purpose of statistical analysis, the indicator series $X$ is treated as fixed in equation (1).
Equation (1) describes the relation between the regressand and the regressors over the sample period of 12n months, but we do not have the monthly series of Y; we have annual data only. Therefore, to convert the 12n monthly observations into n annual observations, we transform this equation by pre-multiplying it by a compatibility matrix $C$. The transformed equation is given by:

$$\dot{Y} = CY = CX\beta + Cu = \dot{X}\beta + \dot{u} \tag{2}$$

where $C$ is an $n \times 12n$ matrix. The averages, first values or sums of the monthly series are then consistent with the known low-frequency series (i.e. $\dot{Y}$). For distribution and interpolation, the $C$ matrix can therefore take one of the following forms:

$$C_A = \tfrac{1}{12}\,\bigl(I_n \otimes \mathbf{1}_{12}'\bigr), \qquad C_S = I_n \otimes \mathbf{1}_{12}', \qquad C_F = I_n \otimes (1, 0, \ldots, 0)$$

where $\mathbf{1}_{12}$ is a $12 \times 1$ vector of ones and $(1, 0, \ldots, 0)$ is a $1 \times 12$ row vector. $C_A$ makes the average of the monthly values consistent with the regressand, $C_S$ makes the sum of the monthly values consistent with the regressand, and $C_F$ makes the first-month value consistent with the regressand. For temporal disaggregation we can use any one of these three alternatives.
[2] Throughout this study we discuss estimating a monthly series given annual data on that series and monthly data on the indicator series. We also provide the Matlab program for estimating a monthly series given annual and quarterly data on that series and monthly data on the indicator series.
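The three aggregation matrices can be built compactly with a Kronecker product. The following is a minimal Matlab sketch, assuming n = 3 years and 12 months per year; the variable names (C_S, C_A, C_F, s) are illustrative and not taken from the original program.

```matlab
% Sketch: aggregation matrices for n years of monthly data.
n = 3;                      % number of annual observations (illustrative)
s = 12;                     % months per year

C_S = kron(eye(n), ones(1, s));             % sums:     each row adds up 12 months
C_A = kron(eye(n), ones(1, s) / s);         % averages: each row averages 12 months
C_F = kron(eye(n), [1, zeros(1, s - 1)]);   % first:    each row picks the first month

disp(size(C_S));            % n rows and 12*n columns

% C*C' is a scalar multiple of the identity in all three cases
disp(diag(C_S * C_S')');    % 12 for every year
disp(diag(C_A * C_A')');    % 1/12 for every year
disp(diag(C_F * C_F')');    % 1 for every year
```

Because C*C' is proportional to the identity matrix for each choice, the annual disturbances in equation (2) remain homoskedastic when the monthly disturbances are.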
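Since all the data series in equation (2) are on an annual basis, we are now able to estimate this model by least squares. The estimator of $\beta$ is given by:

$$\hat{\beta} = \bigl(\dot{X}'\dot{w}^{-1}\dot{X}\bigr)^{-1}\dot{X}'\dot{w}^{-1}\dot{Y} \tag{3}$$

where $\dot{X} = CX$ and $\dot{Y} = CY$ are based on the annual data, and $\dot{w} = CWC' = \sigma^2 CC'$. Because $CC'$ is proportional to the identity matrix for $C_F$, $C_A$ and $C_S$, this estimator coincides with ordinary least squares (OLS) when $W = \sigma^2 I_{12n}$.

The problem now is how to estimate the monthly observations on the dependent variable. To that end, suppose we estimate a vector $z$, which is identical to $Y$ in the case of temporal disaggregation. The regression model is then:

$$z = X_z\beta + u_z \tag{4}$$

where $X_z$ and $u_z$ are identical to $X$ and $u$ of equation (1) for interpolation and distribution. Using some $12n \times n$ matrix $A$, a linear unbiased estimator $\hat{z}$ of $z$ satisfies:

$$\hat{z} = A\dot{Y} = A(\dot{X}\beta + \dot{u}), \qquad A\dot{X} - X_z = 0 \tag{5}$$

Solving this, the estimated value of $z$ is:

$$\hat{z} = X_z\hat{\beta} + \bigl(\dot{w}_z\dot{w}^{-1}\bigr)\dot{u} \tag{6}$$

where $\dot{u} = \dot{Y} - \dot{X}\hat{\beta}$. For temporal disaggregation, we assume that

$$\dot{w}_z = E(u_z\dot{u}') = WC'$$

where $C$ is either $C_F$, $C_A$ or $C_S$.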
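The estimator of the coefficients and the disaggregated series can be computed in a few lines. The sketch below is illustrative only: it uses synthetic data, takes W as the identity matrix (the serially uncorrelated case), uses the sum-preserving aggregation matrix, and adds an intercept column to X, all of which are assumptions rather than part of the original program.

```matlab
rng(0);                                  % reproducible synthetic data (illustrative only)
n = 10;  s = 12;  T = s * n;  K = 2;     % years, months per year, months, indicators

X     = [ones(T, 1), cumsum(randn(T, K))];        % intercept (assumed) plus K indicators
Ytrue = X * [5; 0.8; -0.3] + 0.5 * randn(T, 1);   % "unknown" monthly series

C    = kron(eye(n), ones(1, s));         % annual value = sum of the monthly values
Ydot = C * Ytrue;                        % observed annual series
Xdot = C * X;                            % annualised indicators

W    = eye(T);                           % uncorrelated case; sigma^2 cancels out
Wdot = C * W * C';                       % covariance of the annual disturbances
Wz   = W * C';                           % covariance of monthly with annual disturbances

% Generalized least squares estimator of beta on the annual equation
beta_hat = (Xdot' * (Wdot \ Xdot)) \ (Xdot' * (Wdot \ Ydot));

% Annual residuals and the disaggregated monthly series
udot  = Ydot - Xdot * beta_hat;
z_hat = X * beta_hat + Wz * (Wdot \ udot);

disp(max(abs(C * z_hat - Ydot)));        % aggregation error, numerically near zero
```

Pre-multiplying z_hat by C returns the annual series up to rounding error, which is the consistency requirement the method is built around.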
As in Chow and Lin (1971), we consider three assumptions about the disturbances in order to compute equation (6). The first is that the monthly regression residuals are serially uncorrelated, so that $W = \sigma^2 I_{12n}$ and $\dot{w}_z\dot{w}^{-1} = C'(CC')^{-1}$. With $C_F$, the term $(\dot{w}_z\dot{w}^{-1})\dot{u} = C_F'\dot{u}$ in equation (6) assigns the regression residual for any year to the first month of that year. With $C_A$, the term equals $12\,C_A'\dot{u}$, which assigns the residual for any year to each of the 12 months of that year, so that their average equals the annual residual. With $C_S$, the term equals $\tfrac{1}{12}C_S'\dot{u}$, which spreads the residual for any year equally over the 12 months of that year, so that their sum equals the annual residual.

The second is that the monthly residuals follow a first-order autoregression $u_t = \alpha u_{t-1} + \varepsilon_t$ with $E(\varepsilon_t\varepsilon_s) = \delta_{ts}\sigma^2$. In this case, interpolation requires the covariance matrix $W$ implied by this process.

The third is that the monthly residuals are serially uncorrelated but their variances are proportional to a certain linear combination of the independent variables or to a known function of the regressors. Then $W$ is diagonal and proportional to a given matrix.
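To see how the first two assumptions differ in practice, one can compare the matrix that distributes the annual residuals over the months. The sketch below relies on the standard covariance of a stationary first-order autoregression, whose (t, s) element is sigma^2 * alpha^|t-s| / (1 - alpha^2); the value alpha = 0.7, the sample size and all variable names are assumptions chosen for illustration.

```matlab
n = 4;  s = 12;  T = s * n;              % illustrative sizes
alpha  = 0.7;                            % assumed AR(1) coefficient, |alpha| < 1
sigma2 = 1;                              % innovation variance (cancels below)

% Covariance matrices of the monthly disturbances under the two assumptions
W_iid = sigma2 * eye(T);                                      % serially uncorrelated
W_ar  = (sigma2 / (1 - alpha^2)) * toeplitz(alpha .^ (0:T-1)); % stationary AR(1)

C = kron(eye(n), ones(1, s));            % sum-preserving aggregation matrix

% Distribution matrix W*C' * inv(C*W*C') applied to the annual residuals
L_iid = (W_iid * C') / (C * W_iid * C');
L_ar  = (W_ar  * C') / (C * W_ar  * C');

disp(L_iid(1:12, 1)');            % flat: each month receives 1/12 of the residual
disp(L_ar(1:12, 1)');             % no longer flat under autocorrelation
disp(norm(C * L_ar - eye(n)));    % still reproduces the annual residuals exactly
```

In an application alpha is unknown and has to be estimated; this sketch only shows the effect of a given value.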

THE ESTIMATION PROCEDURE AND PROGRAMMING
Let us assume that we have annual data on gross domestic product (GDP), which is taken as the dependent variable (denoted Y). Let X be the monthly data matrix, which contains the explanatory variables. Following the methodology described in Section 2, we now write a Matlab program to interpolate the GDP data from an annual series to a monthly series.
Before starting the programming, the data should be in a proper format. Since the number of observations differs between the dependent and explanatory variables, keep two separate Excel/txt data files: one for the dependent variable (the low-frequency series) and another for the explanatory variable(s) (the high-frequency indicator series). First, load the data files into Matlab as follows:

    clear all;                               % clear the workspace and start fresh
    format bank;                             % "format short" or "format long" also work
    [num, txt, raw] = xlsread('explanatory.xlsx');
    % num holds all the numbers
    % txt holds all the text
    % raw is a cell array with all the numbers and text
    numbers  = cell2mat(raw(2:end,2:end));   % numeric matrix without the header row and first column
    headings = raw(1,1:end);                 % column headings, kept as a cell array
    labels   = raw(2:end,1);                 % first column; not named "text" to avoid shadowing the built-in
    data1 = numbers(:,:);                    % monthly data as a matrix
    data2 = xlsread('gdp.xlsx');             % annual data

Second, obtain the numbers of rows and columns of the data series:

    [p1, q1] = size(data1);                  % rows and columns of data1
    [p2, q2] = size(data2);                  % rows and columns of data2

Third, define the dependent and independent variables in matrix/vector form:

    x = data1(:,:);                          % explanatory variables as a matrix
    y = data2(:,2);                          % dependent variable as a vector (second column of the annual file)

Fourth, since the variables y and x have different frequencies, we construct the matrix C to convert regression model (1) from monthly to annual so that the number of observations is consistent. In this case, we use the C_F matrix, in which the first-month value of each year is consistent with the annual series.
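A minimal sketch of how this fourth step might look for the first-month case is given below. Since the Excel files are not available here, x and y are replaced by synthetic stand-ins with the same shapes; the intercept column, the coefficient values and the names C_F, xdot, udot and z_hat are assumptions made for illustration only.

```matlab
% Synthetic stand-ins for the loaded data (replace with x and y from the steps above)
rng(1);
p2 = 8;  p1 = 12 * p2;                        % 8 years and 96 months (illustrative)
x  = [ones(p1, 1), cumsum(randn(p1, 2))];     % intercept column is an assumption
y  = x(1:12:end, :) * [2; 1; 0.5] + 0.1 * randn(p2, 1);

s = p1 / p2;                                  % months per year implied by the data
assert(s == round(s), 'Monthly rows must be a whole multiple of annual rows.');

C_F  = kron(eye(p2), [1, zeros(1, s - 1)]);   % picks the first month of each year
xdot = C_F * x;                               % annual regressors: first-month values
beta_hat = xdot \ y;                          % least squares on the annual equation
udot = y - xdot * beta_hat;                   % annual residuals

% With uncorrelated monthly errors, each residual goes to the first month only
z_hat = x * beta_hat + C_F' * udot;
disp(max(abs(z_hat(1:s:end) - y)));           % first-month values match the annual data
```

The divisibility check guards against a monthly file whose length is not a whole number of years, which would make the Kronecker construction dimensionally inconsistent.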
For the average-value case, the first three steps are the same as above: load the .xlsx data files into Matlab, obtain the numbers of rows and columns of data1 and data2, and define the explanatory variables x and the dependent variable y. Fourth, we construct the C_A matrix, in which the average monthly value equals the low-frequency value, to convert eq. (1) into the annual equation.
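For the average-value case, the construction could be sketched as follows. The data are again synthetic stand-ins for x and y, and the intercept column and all names other than x, y, p1 and p2 are illustrative assumptions. The cross-check against a direct yearly mean is included only to confirm that C_A does what its name suggests.

```matlab
rng(2);
p2 = 6;  p1 = 12 * p2;                        % 6 years and 72 months (illustrative)
x  = [ones(p1, 1), cumsum(randn(p1, 2))];     % intercept column is an assumption
s  = p1 / p2;                                 % months per year

C_A  = kron(eye(p2), ones(1, s) / s);         % each row averages one year's months
xdot = C_A * x;                               % annual averages of the indicators

% Cross-check one indicator against a direct yearly mean
yearly_mean = mean(reshape(x(:, 2), s, p2), 1)';
disp(max(abs(xdot(:, 2) - yearly_mean)));     % numerically zero

y = xdot * [3; 1; -0.5] + 0.2 * randn(p2, 1); % synthetic annual series
beta_hat = xdot \ y;                          % least squares on the annual equation
udot = y - xdot * beta_hat;                   % annual residuals

% C_A' * inv(C_A*C_A') * udot repeats each annual residual in all 12 months
z_hat = x * beta_hat + kron(udot, ones(s, 1));
disp(max(abs(C_A * z_hat - y)));              % monthly averages match the annual data
```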

ESTIMATION RESULTS
Following the Chow and Lin (1971) approach, we used three methods to disaggregate the low-frequency data into high-frequency data: the first-month value, the average monthly value and the sum of the monthly values. Under all three methods there is a one-to-one relationship between the initial data and the interpolated data. In Figures 1, 2 and 3, the first panel shows the monthly GDP data from 1978M1 to 2011M12 after temporal disaggregation, and the second panel compares the original low-frequency data (i.e. the original annual GDP) with the interpolated GDP data. In the first panels we observe a similar pattern across the three methods regardless of the form of the C matrix used to convert regression model (1) into (2), that is, first-month value (C_F), average monthly value (C_A) or sum of the monthly values (C_S). This is consistent with the best linear unbiased property shown in Chow and Lin (1971).
In our example, we know the true annual GDP data, so we can compare the interpolated values with the true values. With an indicator series, the Chow-Lin procedure produces series that have a one-to-one relationship with the true values under all three methods (Panel 2 of Figures 1, 2 and 3). This is, of course, due to the fact that in this example our estimates satisfy the best linear unbiased property.
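The comparison between re-aggregated and annual values can also be checked numerically. The following sketch runs the three aggregation rules on one synthetic monthly series and reports the largest gap between the annual data and the re-aggregated interpolated series; the data, the single indicator with an intercept, and all variable names are assumptions for illustration and do not reproduce the GDP example.

```matlab
rng(3);
n = 12;  s = 12;  T = n * s;                  % illustrative sample: 12 years
X = [ones(T, 1), cumsum(randn(T, 1))];        % intercept (assumed) plus one indicator
Ymonthly = X * [10; 2] + randn(T, 1);         % synthetic "true" monthly series

Cs = {kron(eye(n), [1, zeros(1, s - 1)]), ... % first-month value
      kron(eye(n), ones(1, s) / s), ...       % average monthly value
      kron(eye(n), ones(1, s))};              % sum of the monthly values
names = {'first', 'average', 'sum'};
err = zeros(1, 3);

for k = 1:3
    C     = Cs{k};
    Ydot  = C * Ymonthly;                     % annual data under this rule
    Xdot  = C * X;
    b     = Xdot \ Ydot;                      % OLS, valid here since C*C' is proportional to I
    z_hat = X * b + C' * ((C * C') \ (Ydot - Xdot * b));
    err(k) = max(abs(C * z_hat - Ydot));      % gap after re-aggregation
    fprintf('%-8s max aggregation error: %.2e\n', names{k}, err(k));
end
```

In each case the gap is at the level of rounding error, because the residual term in the disaggregation formula restores exactly what the regression part leaves unexplained at the annual level.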

CONCLUDING REMARKS
This study offers an estimation procedure and Matlab programming to disaggregate a low-frequency time series into a higher-frequency series such that the first value, the average or the sum of the resulting high-frequency series is consistent with the low-frequency series. Although temporal disaggregation can be performed with or without the help of one or more high-frequency indicators, here we used more than one high-frequency indicator series for the disaggregation. If good indicators and estimation procedures are at hand, the resulting series may be close to the true series. The second panels of Figures 1, 2 and 3 support this statement: we found a one-to-one relationship between the resulting series and the true series under all three methods, suggesting that empirical researchers can use any one of these methods to disaggregate their data from low frequency to high frequency and proceed with their work.