Chair: Krzysztof Jajuga
Room: S3B Sukiennice
Date: 28 June
|Title: <<< Effects of the influence of calibration procedures on reliability of indicators estimated by the state household sample surveys in Ukraine >>>
Procedures for calibrating statistical weights are increasingly applied at the indicator estimation stage of the state household sample surveys in Ukraine. The practical purpose of calibrated estimation is to align survey results with current auxiliary information. This allows, on the one hand, some reduction of the possible bias of estimates and, on the other hand, ensures that estimates match data from other sources. At the same time, calibration can lead to a significant decrease in the effective sample size and, accordingly, in the accuracy of estimates. For instance, in the State Household Living Conditions Survey (HLCS), conducted by the State Statistics Service of Ukraine on a quarterly basis, statistical weights are calibrated using a relatively large number of external sources: primarily demographic statistics on the size and structure of the population and data on the number of households, including households with children. In experimental mode, national accounts data and tax administration data are also used. Given that the HLCS is the main source of information for a number of important indicators that reflect in detail the incomes, expenditures, consumption patterns and poverty of Ukrainian households, it is important to estimate the impact of calibration on the reliability of key indicators. The article presents estimates of the influence of various calibration procedures on the quality of the statistical weights and on the reliability of indicators. Recommendations are given for optimizing calibration procedures so as to minimize their negative impact on the reliability of estimation.
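The calibration step described above can be sketched as iterative raking (iterative proportional fitting), one common calibration method; the categories, weights and control totals below are purely illustrative, not those of the HLCS:

```python
# A minimal raking sketch: repeatedly rescale weights so that weighted counts
# match known control totals for two categorical auxiliary variables.
# All variable names and totals here are hypothetical.

def rake(weights, cats_a, cats_b, totals_a, totals_b, iters=50):
    """Adjust weights so weighted counts match both sets of control totals."""
    w = list(weights)
    for _ in range(iters):
        for cats, totals in ((cats_a, totals_a), (cats_b, totals_b)):
            # current weighted total in each category
            cur = {c: 0.0 for c in totals}
            for wi, c in zip(w, cats):
                cur[c] += wi
            # scale each unit's weight by the ratio target / current
            w = [wi * totals[c] / cur[c] for wi, c in zip(w, cats)]
    return w
```

After raking, weighted sex and age-group counts both match their external totals; the trade-off discussed in the abstract shows up as increased weight variability, which shrinks the effective sample size.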
|Annalisa Lucarelli|
|Title: <<< A reweighting method to adjust for enterprise changes: the Italian quarterly job vacancy rate and hours worked estimates >>>
The Italian National Institute of Statistics produces quarterly estimates of the job vacancy rate and hours worked for EU Regulations and national dissemination through calibrated weights. Job vacancy and hours worked data are collected by two direct business surveys. The survey samples are drawn from the Italian statistical business register (ASIA), updated to two years before the reference quarter of the survey data, while the calibration constraints are derived from an administrative-based source for the reference quarter. Because of changes in the enterprises’ economic activity and/or size between the two reference periods, this distance in time implies that sample units may be classified as belonging to different strata depending on whether the information in the sampling frame or in the source used for the calibration constraints is used. As a consequence, if the initial weights are based on the classification of the sampling frame units, the calibrated weights show high variability within some calibration strata. In particular, in a given calibration stratum, weights can be concentrated on only a few sample units while the remaining ones can be close to zero. To ensure a more homogeneous distribution of the calibration weights within strata, the initial weights have been recalculated, updating the sample units’ classification with the information used in the construction of the calibration strata, i.e. the source used for the calibration constraints. Furthermore, the initial weights are calculated so as to also correct for non-response, by the inverse of the response rate in the calibration stratum. The first empirical results obtained by applying this method show a reduction in the estimated sampling variance compared with that obtained using initial weights based only on the non-updated information of the sampling frame.
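The reweighting idea can be illustrated with a minimal sketch: design weights are grouped by the updated stratum classification, and respondents' weights are inflated by the inverse of the stratum response rate. The data layout and figures are hypothetical, not the production implementation:

```python
# A hedged sketch of recomputing initial weights: group units by the *updated*
# stratum classification, then correct for non-response by the inverse of the
# response rate within that stratum. Field names are illustrative.

def adjusted_initial_weights(units):
    """units: list of dicts with 'stratum' (updated), 'design_w', 'responded'."""
    # response rate per updated stratum
    by_stratum = {}
    for u in units:
        s = by_stratum.setdefault(u['stratum'], {'n': 0, 'r': 0})
        s['n'] += 1
        s['r'] += u['responded']
    rates = {k: v['r'] / v['n'] for k, v in by_stratum.items()}
    # respondents carry design_w / response_rate; non-respondents drop out
    return [u['design_w'] / rates[u['stratum']] for u in units if u['responded']]
```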
|Carlos Fullea Carrera|
|Title: <<< Calibration as a tool to enhance accuracy and coherence in tourism statistics >>>
Calibration is a reweighting method widely used to improve the accuracy and coherence of official statistics. The Spanish Residents Travel Survey has as its main objective the measurement of the number of trips made by the resident population each month. Travelling is an activity performed by a small percentage of the population, making sampling highly inefficient. Besides, those making trips are more difficult to contact. On the other hand, the Hotel Occupancy Survey also measures the trips and overnight stays of residents staying at hotels and similar establishments, based on the information provided by the hotels. Each survey provides different figures for very similar magnitudes, creating confusion in the measurement of the tourism sector. This paper aims to describe I) the measures applied to improve the efficiency of the sample (sub-stratification based on a logit model predicting the probability of travelling), II) the calibration to sociodemographic variables to improve the accuracy of the estimates, and III) its extension to include the hotel overnight stays provided by the Occupancy Survey in the calibration, enhancing the coherence of tourism statistics. This paper assesses the differences in the main survey estimates (trips, overnight stays and average length) when comparing three scenarios: no calibration, calibration with sociodemographic variables, and calibration with all the variables, including overnight stays. In this process, regions with fewer records have to be grouped together each month due to sample size limitations. To calibrate each region separately, a reference period of three months could be considered, thereby increasing their sample size. The paper also explores the results of calibrating the three months of each quarter altogether, comparing them with the monthly estimates.
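The sub-stratification step can be sketched as follows: once a logit model has produced a predicted probability of travelling for each unit, the sample is cut into propensity sub-strata. The cut points and data below are illustrative, not those of the Spanish survey:

```python
# A hedged sketch of sub-stratification by predicted propensity to travel,
# assuming predictions from some fitted logit model are already available.
# The cut points (0.2, 0.5) are purely illustrative.

def substratify(probs, cuts=(0.2, 0.5)):
    """Assign each unit to a sub-stratum by its predicted travel probability."""
    strata = []
    for p in probs:
        s = sum(p >= c for c in cuts)  # 0 = low, 1 = mid, 2 = high propensity
        strata.append(s)
    return strata
```

Sampling can then allocate more interviews to the high-propensity sub-strata, where travellers are concentrated, which is the efficiency gain the abstract refers to.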
|Title: <<< Prediction Error Estimation of the Survey-Weighted Least Squares Model under Complex Sampling >>>
Many large-scale surveys make use of a complex sampling design where each observation unit is assigned a sampling weight which is developed over different stages. Survey-weighted least squares modelling (SWLS), the linear modelling of a continuous response based on its relationship with a number of covariates, correctly accounts for this complex sample design. One of the objectives of statistical modelling is the prediction of a future response. As such it is important to determine how well the selected model performs in the prediction of a future response. Cross-validation and resampling methods have long been used for this purpose under i.i.d. data modelling, but not for the modelling of complex survey data. This talk introduces cross-validation and resampling methods for the evaluation of SWLS models based on the model’s prediction error. The performance of the different prediction error estimation methods is evaluated through a simulation study. The Income and Expenditure Survey 2010/2011 of Statistics South Africa will form the basis of the analysis. The simulation study will also investigate whether the SWLS model’s predictive performance is improved through the truncation of outlier sampling weights. For this purpose two new thresholds, viz. the 1.5IQR and Hill, will be introduced.
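The combination of SWLS and cross-validated prediction error can be sketched for the one-covariate case: fit the weighted least squares line on the training folds and accumulate a weight-adjusted squared prediction error on the held-out folds. This is a generic illustration under simplified assumptions, not the authors' estimator:

```python
# A minimal sketch: survey-weighted least squares with one covariate, and a
# weighted K-fold cross-validated prediction error. Data are illustrative.

def swls_fit(x, y, w):
    """Closed-form weighted least squares: returns (intercept, slope)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x)))
    return ym - b1 * xm, b1

def cv_prediction_error(x, y, w, k=2):
    """Weighted mean squared prediction error over k interleaved folds."""
    folds = [range(i, len(x), k) for i in range(k)]
    err, wsum = 0.0, 0.0
    for test in folds:
        train = [i for i in range(len(x)) if i not in test]
        b0, b1 = swls_fit([x[i] for i in train], [y[i] for i in train],
                          [w[i] for i in train])
        for i in test:
            err += w[i] * (y[i] - (b0 + b1 * x[i])) ** 2
            wsum += w[i]
    return err / wsum
```

Weight truncation along the lines the abstract mentions would simply cap each `w[i]` at a threshold (e.g. Q3 + 1.5·IQR of the weights) before fitting.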
|Title: <<< The R-Package ‘surveysd’ - Estimating Standard Errors in Complex Surveys with Rotating Panel Design >>>
Policy makers in the EU urgently need regional indicators, especially on poverty and social exclusion. Surveys which were designed to estimate these indicators at national level, such as EU-SILC, usually do not provide the required precision for reliable estimates at regional levels like NUTS2 and below. With the R-Package ‘surveysd’, we present a package to improve precision and estimate standard errors for social indicators on regional levels in a straightforward way. Regional estimates from subsequent waves are simply cumulated over time, assuming that the underlying structural patterns remain fairly robust. Variance estimation for pooled data is complicated due to a high correlation within the pooled data. The package resolves this problem by using bootstrap techniques that incorporate pooling of correlated samples, like annual waves of EU-SILC. In addition to variance estimation for point estimates, variance estimation for differences is supported. Usability of the package and the variance improvement achieved with this bootstrap methodology are demonstrated on EU-SILC UDB data of selected countries with various sampling designs.
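The key idea behind bootstrapping pooled panel waves can be sketched in a few lines: households are resampled once, and the same replicate is reused in every wave, so the between-wave correlation is preserved in the replicate distribution. This is a generic Python illustration of the principle, not the surveysd API, and the data are invented:

```python
# A hedged sketch: bootstrap SE for a mean pooled over correlated panel waves.
# Households (not observations) are resampled, and the same household draw is
# applied to every wave, preserving the wave-to-wave correlation.

import random

def bootstrap_se_pooled(waves, n_boot=200, seed=1):
    """waves: dict wave -> {household_id: value}; SE of the pooled mean."""
    rng = random.Random(seed)
    hids = sorted({h for w in waves.values() for h in w})
    reps = []
    for _ in range(n_boot):
        draw = [rng.choice(hids) for _ in hids]   # one draw, reused per wave
        vals = [waves[w][h] for w in waves for h in draw if h in waves[w]]
        reps.append(sum(vals) / len(vals))
    m = sum(reps) / n_boot
    return (sum((r - m) ** 2 for r in reps) / (n_boot - 1)) ** 0.5
```

Treating the pooled observations as independent instead would understate the variance, since the same household contributes to several waves.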
|Title: <<< Using administrative registers to improve sampling of EU-SILC in Austria >>>
EU-SILC in Austria uses income register data from administrative sources that are linked to the sample on micro-level to collect information about most components of the household income. Since this information is also available for the sampling frame, it can be used for the sampling design. If the main research variables, especially the rate of people at risk of poverty or social exclusion (AROPE), were known beforehand, stratifying the sample by those variables would reduce the standard error of these variables and thus create a more efficient sample. Since the former is obviously not the case, even stratifying by variables that are correlated with the main variables of interest should help reduce the standard error. For each person registered at addresses of private households in the sampling frame of EU-SILC 2016 and 2017, income data from registers were matched. The sum of all net income components was then aggregated for each address in the sampling frame, creating a net household income based solely on register information. The first quartile of this characteristic is correlated with AROPE. Therefore it was decided to use it as a stratification criterion for the selection of the first-wave sample of EU-SILC 2016 and 2017. Results indeed show a reduction of the standard error of AROPE and other main indicators, but only if other rotational samples of EU-SILC are not taken into account too. For the first wave of EU-SILC 2018 this approach was enhanced by using the newly available rich frame. It is a quarterly generated frame of the whole Austrian population based on several different registers. Socio-demographic variables in combination with the available income information were used to train a machine learning model for estimating the AROPE for the whole frame. This predicted AROPE was then used as the stratification variable.
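The stratification criterion can be sketched as follows: split the frame at the first quartile of the register-based net household income and draw the sample separately from the two strata. The income figures and the 50/50 allocation are illustrative, not the Austrian design:

```python
# A minimal sketch of stratified first-wave selection: addresses at or below
# the first quartile of register income form one stratum, the rest the other.
# The allocation fraction and data are hypothetical.

import random

def stratified_sample(incomes, n, frac_low=0.5, seed=7):
    """Draw n addresses: frac_low from the bottom income quartile, rest above."""
    rng = random.Random(seed)
    q1 = sorted(incomes)[len(incomes) // 4]        # first-quartile cut point
    low = [i for i, v in enumerate(incomes) if v <= q1]
    high = [i for i, v in enumerate(incomes) if v > q1]
    n_low = round(n * frac_low)
    return rng.sample(low, n_low) + rng.sample(high, n - n_low)
```

Because the bottom-quartile stratum correlates with AROPE, fixing its sample size removes part of the sampling variability of the AROPE estimate, which is the effect reported in the abstract.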
|Title: <<< The Bundesbank’s Research Data and Service Center (RDSC): Gateway to Treasures of Micro-Data on the German Financial System >>>
The Bundesbank is one of the largest data producers in Germany, and the data are of high quality because they are quality-tested administrative micro data. Due to legal requirements and in order to meet data protection requirements, individual data can be made available only under certain restrictions. In 2014 the Bundesbank set up the Research Data and Service Center (RDSC), which provides analysts and internal and external researchers with access to selected micro-level data in the context of independent scientific research projects.
Some of the RDSC's main tasks are:
Provide data access and data protection.
Mediation between data producers and external users.
Responsibility for the methodological improvement, physical provision and comprehensive documentation of high-quality microdata sets.
Consultancy and support services to prospective and existing data users inside and outside of the Bundesbank.
Provision of standardized linked micro data sets containing multiple data sources, not only from internal Bundesbank micro data but also from external data providers.
In order to create useful and comprehensive metadata for the provided micro data, global standards like SDMX and DDI are used. For the linkage of different micro data sources, a record linkage group creates high-quality master data sets that enable linkage not only between multiple Bundesbank micro data sets but also with micro data provided by external sources, such as company data. Consultancy and support services are provided by experienced data and research experts in the areas of economics, finance and social studies.
|Title: <<< Improving Monthly Estimates of Job Vacancies Survey With Statistical Model >>>
The monthly Job Vacancies Survey in Israel was launched in 2009 by the Central Bureau of Statistics. The survey's goals are: to serve as a leading indicator of the cyclicality of the labour market (during a recession, firms will first reduce their job openings and only later proceed to dismiss employees); to aid in assessing the demand for labour and identifying work opportunities by industry and composition of employed persons; and to supply a broad view of the labour market by comparing estimates of job vacancies and the profile of workers requested by employers (labour demand) against the estimates of job seekers and their profiles as derived from labour force survey data (labour supply). In order to reduce the response burden, it was determined that the firms belonging to the take-some strata (small and medium firms) would be divided randomly across the quarter, while the large firms from the take-all strata report monthly. The strata are determined by the number of employees in the firm, while the survey's target is the number of vacancies in each firm. These variables (employees, vacancies) do not always have a high correlation: there is a large variation of vacancies between firms in the same stratum. This sample design led to high variation of the monthly estimates, with an error pattern and a high correlation with the estimates of period t-3. We applied a state space model that reduces the sampling error and smooths the estimates. This process improved the quality of the survey's estimates, and I will present this method, which is strongly relevant to surveys with a panel sample. The presentation will also analyze the unique characteristics of the job vacancies survey and their implications for imputing missing items.
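The general idea of state space smoothing can be illustrated with a local-level model filtered by the Kalman recursion: the survey estimate is treated as a noisy observation of an underlying level that evolves slowly over time. This is a generic sketch with illustrative variances, not the specification actually used for the Israeli survey:

```python
# A hedged sketch of a local-level state space model:
#   y_t  = mu_t + eps_t      (observation, variance r: sampling error)
#   mu_t = mu_{t-1} + eta_t  (level evolves slowly, variance q)
# The Kalman filter below smooths a noisy monthly series toward its level.

def local_level_filter(y, q=0.1, r=1.0):
    """Return the filtered (smoothed) level estimates for the series y."""
    mu, p = y[0], r          # rough diffuse start at the first observation
    smoothed = [mu]
    for obs in y[1:]:
        p += q               # predict: state variance grows by q
        k = p / (p + r)      # Kalman gain
        mu += k * (obs - mu) # update the level toward the new observation
        p *= (1 - k)
        smoothed.append(mu)
    return smoothed
```

With `q` small relative to `r`, the filter trusts the level more than each monthly observation, which damps exactly the kind of sampling-error pattern described above.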