Speed Talk Session 12


Title of session: Estimation and calibration

Chair:

Room: Lecture room 2

Time: 13:15 - 14:00

Date: 28 June


Presenting Authors and Abstracts
Carlos Fullea Carrera
e-mail: carlos.fullea.carrera@ine.es
Title: <<< Calibration as a tool to enhance accuracy and coherence in tourism statistics >>>
Calibration is a reweighting method widely used to improve the accuracy and coherence of official statistics. The main objective of the Spanish Residents Travel Survey is to measure the number of trips made by the resident population each month. Travelling is an activity performed by a small percentage of the population, which makes sampling highly inefficient; moreover, those who travel are more difficult to contact. The Hotel Occupancy Survey, on the other hand, also measures the trips and overnight stays of residents staying at hotels and similar establishments, based on information provided by the hotels. Each survey produces different figures for very similar magnitudes, creating confusion in the measurement of the tourism sector. This paper describes I) the measures applied to improve the efficiency of the sample (sub-stratification based on a logit model predicting the probability of travelling), II) calibration to sociodemographic variables to improve the accuracy of the estimates, and III) its extension to include the hotel overnight stays provided by the Occupancy Survey in the calibration, enhancing the coherence of tourism statistics. The paper assesses the differences in the main survey estimates (trips, overnight stays and average length of stay) under three scenarios: no calibration, calibration with sociodemographic variables, and calibration with all the variables, including overnight stays. In this process, regions with fewer records have to be grouped together each month due to sample size limitations. To calibrate each region separately, a reference period of three months could be considered, thereby increasing the sample size. The paper also explores the results of calibrating the three months of each quarter together, comparing them with the monthly estimates.
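As a rough illustration of the calibration idea described in this abstract — not the INE implementation — the following sketch rakes design weights so that weighted sample totals match known population margins. All units, categories and totals are invented:

```python
# Minimal sketch of raking (iterative proportional fitting), one common
# calibration method: design weights are scaled, margin by margin, until
# weighted category totals match the known population totals.
import numpy as np

def rake(weights, categories, margins, n_iter=50, tol=1e-8):
    """Adjust `weights` so weighted category totals match population totals.

    categories: list of arrays, one per margin, giving each unit's category
    margins:    list of dicts {category: population total}
    """
    w = weights.astype(float).copy()
    for _ in range(n_iter):
        max_shift = 0.0
        for cats, totals in zip(categories, margins):
            for cat, target in totals.items():
                mask = cats == cat
                current = w[mask].sum()
                if current > 0:
                    factor = target / current
                    w[mask] *= factor
                    max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:   # all margins matched, stop early
            break
    return w

# Toy example: 6 respondents, margins for sex and a travel indicator.
base_w = np.full(6, 100.0)
sex = np.array(["m", "m", "f", "f", "f", "m"])
trav = np.array([0, 1, 0, 1, 1, 0])
w = rake(base_w, [sex, trav], [{"m": 320, "f": 280}, {0: 350, 1: 250}])
print(w.round(1))
```

After raking, the weighted counts of men/women and travellers/non-travellers reproduce the stated population margins exactly, which is the coherence property the abstract exploits when overnight-stay totals from the Occupancy Survey are added as calibration variables.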
Retha Luus
e-mail: rluus@uwc.ac.za
Title: <<< Prediction Error Estimation of the Survey-Weighted Least Squares Model under Complex Sampling >>>
Many large-scale surveys make use of a complex sampling design in which each observation unit is assigned a sampling weight developed over several stages. Survey-weighted least squares (SWLS) modelling, the linear modelling of a continuous response based on its relationship with a number of covariates, correctly accounts for this complex sample design. One objective of statistical modelling is the prediction of a future response, so it is important to determine how well the selected model predicts such a response. Cross-validation and resampling methods have long been used for this purpose under i.i.d. data modelling, but not for the modelling of complex sample (CS) data. This talk introduces cross-validation and resampling methods for the evaluation of SWLS models based on the model’s prediction error. The performance of the different prediction error estimation methods is evaluated through a simulation study. The Income and Expenditure Survey 2010/2011 of Statistics South Africa will form the basis of the analysis. The simulation study will also investigate whether the SWLS model’s predictive performance is improved through the truncation of outlier sampling weights. For this purpose two new thresholds, viz. the 1.5IQR and Hill thresholds, will be introduced.
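A hypothetical sketch of the two ingredients named in this abstract — the survey-weighted least squares fit and an IQR-based truncation of outlier weights. The data are simulated, and "cap at Q3 + 1.5·IQR" is one plausible reading of the "1.5IQR" threshold, not necessarily the authors' definition:

```python
# SWLS via the weighted normal equations, plus IQR-based weight truncation.
# All data and parameter values below are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)
w = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # skewed sampling weights

def swls(X, y, w):
    """Solve min_b sum_i w_i (y_i - x_i'b)^2 via weighted normal equations."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, X.T @ (w * y))

def truncate_iqr(w):
    """Cap weights at Q3 + 1.5*IQR (one reading of the '1.5IQR' rule)."""
    q1, q3 = np.percentile(w, [25, 75])
    return np.minimum(w, q3 + 1.5 * (q3 - q1))

X = np.column_stack([np.ones(n), x])
b_full = swls(X, y, w)                  # all weights as delivered
b_trunc = swls(X, y, truncate_iqr(w))   # outlier weights capped
print(b_full, b_trunc)
```

Both fits recover coefficients close to the simulated truth (intercept 2.0, slope 1.5); the question the talk addresses is which variant predicts a future response better, judged by cross-validated prediction error.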
Johannes Gussenbauer
e-mail: Johannes.Gussenbauer@statistik.gv.at
Title: <<< The R-Package ‘surveysd’ - Estimating Standard Errors in Complex Surveys with Rotating Panel Design >>>
Policy makers in the EU urgently need regional indicators, especially on poverty and social exclusion. Surveys designed to estimate these indicators at national level, such as EU-SILC, usually do not provide the precision required for reliable estimates at regional levels such as NUTS2 and below. With the R package 'surveysd' we present a straightforward way to improve precision and estimate standard errors for social indicators at regional levels. Regional estimates from subsequent waves are simply cumulated over time, assuming that regional structural patterns remain fairly robust. Variance estimation for the pooled data is complicated by the high correlation within it. The package resolves this problem with bootstrap techniques that account for the pooling of correlated samples, such as annual waves of EU-SILC. In addition to variance estimation for point estimates, variance estimation for differences is supported. The usability of the package and the variance improvement achieved with this bootstrap methodology are demonstrated on EU-SILC UDB data for selected countries with various sampling designs.
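The core difficulty the abstract mentions — pooled waves of a rotating panel are correlated because the same household appears in several waves — can be illustrated with a minimal sketch. This is not the surveysd implementation; all data are simulated, and the statistic is just a pooled mean:

```python
# A naive i.i.d. bootstrap over pooled observations understates the
# variance; resampling whole households preserves the within-household
# correlation across waves and gives an honest standard error.
import numpy as np

rng = np.random.default_rng(1)
n_hh, n_waves = 200, 3
hh_effect = rng.normal(size=n_hh)                 # persistent household level
hh_id = np.repeat(np.arange(n_hh), n_waves)       # long format, 3 waves each
income = 10 + hh_effect[hh_id] + rng.normal(scale=0.3, size=n_hh * n_waves)

def bootstrap_se_pooled_mean(values, hh_id, n_boot=500, rng=rng):
    """SE of the pooled mean, resampling households (not observations)."""
    hh = np.unique(hh_id)
    by_hh = [np.flatnonzero(hh_id == h) for h in hh]   # group rows by household
    stats = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.integers(0, len(hh), size=len(hh))  # resample households
        idx = np.concatenate([by_hh[d] for d in draw])
        stats[b] = values[idx].mean()
    return stats.std(ddof=1)

se_hh = bootstrap_se_pooled_mean(income, hh_id)
se_naive = income.std(ddof=1) / np.sqrt(income.size)   # pretends rows are i.i.d.
print(se_hh, se_naive)
```

The household-level bootstrap SE is clearly larger than the naive i.i.d. one, which is exactly why pooling waves without a correlation-aware variance estimator would overstate precision.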
Thomas Glaser
e-mail: thomas.glaser@statistik.gv.at
Title: <<< Using administrative registers to improve sampling of EU-SILC in Austria >>>
EU-SILC in Austria uses income register data from administrative sources, linked to the sample at micro level, to collect information about most components of household income. Since this information is also available for the sampling frame, it can be used in the sampling design. If the main research variables, especially the rate of people at risk of poverty or social exclusion (AROPE), were known beforehand, stratifying the sample by those variables would reduce their standard errors and thus create a more efficient sample. Since this is obviously not the case, stratifying by variables that are correlated with the main variables of interest should still help reduce the standard error. For each person registered at the addresses of private households in the sampling frame of EU-SILC 2016 and 2017, income data from registers were matched. The net income components were then summed for each address in the sampling frame, creating a net household income based solely on register information. The first quartile of this characteristic is correlated with AROPE, so it was used as a stratification criterion for the selection of the first-wave samples of EU-SILC 2016 and 2017. Results indeed show a reduction of the standard error of AROPE and other main indicators, but only when the other rotational samples of EU-SILC are left out of account. For the first wave of EU-SILC 2018 this approach was enhanced by using the newly available rich frame, a quarterly generated frame of the whole Austrian population based on several different registers. Socio-demographic variables in combination with the available income information were used to train a machine learning model that estimates AROPE for the whole frame, and this predicted AROPE was then used as the stratification variable.
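The mechanism behind this design — stratifying by a register-based proxy that is correlated with AROPE reduces the standard error of the estimated AROPE rate — can be shown in a toy simulation. The population, the proxy (first income quartile) and the AROPE probabilities below are entirely synthetic, not the Austrian frame:

```python
# Compare simple random sampling with proportional stratified sampling
# on a proxy stratum (register income below the first quartile) that is
# correlated with the binary outcome of interest (AROPE).
import numpy as np

rng = np.random.default_rng(2)
N, n = 100_000, 2_000
income = rng.lognormal(mean=10, sigma=0.6, size=N)
low = income < np.percentile(income, 25)      # first-quartile proxy stratum
p_arope = np.where(low, 0.45, 0.08)           # proxy is correlated with AROPE
arope = rng.random(N) < p_arope

def srs_est(rng):
    idx = rng.choice(N, size=n, replace=False)
    return arope[idx].mean()

def strat_est(rng):
    # proportional allocation over the two proxy strata
    est = 0.0
    for mask in (low, ~low):
        ids = np.flatnonzero(mask)
        n_h = round(n * ids.size / N)
        samp = rng.choice(ids, size=n_h, replace=False)
        est += (ids.size / N) * arope[samp].mean()
    return est

reps = 1000
se_srs = np.std([srs_est(rng) for _ in range(reps)], ddof=1)
se_strat = np.std([strat_est(rng) for _ in range(reps)], ddof=1)
print(se_srs, se_strat)
```

The stratified design shows a visibly smaller empirical standard error for the estimated rate, mirroring the reduction reported for AROPE when the register-income quartile (and later the predicted AROPE) is used as the stratification variable.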
Rafael Beier
e-mail: rafael.beier@bundesbank.de
Title: <<< The Bundesbank’s Research Data and Service Center (RDSC): Gateway to Treasures of Micro-Data on the German Financial System >>>
The Bundesbank is one of the largest data producers in Germany, and its data are of high quality because they are quality-tested administrative micro data. Due to legal and data protection requirements, individual data can be made available only under certain restrictions. In 2014 the Bundesbank set up the Research Data and Service Center (RDSC), which provides analysts and internal and external researchers with access to selected micro-level data in the context of independent scientific research projects.
Some of the RDSC's main tasks are:
Provision of data access and data protection.
Mediation between data producers and external users.
Responsibility for the methodological improvement, physical provision and comprehensive documentation of high-quality microdata sets.
Consultancy and support services to prospective and existing data users inside and outside of the Bundesbank.
Provision of standardized linked micro data sets containing multiple data sources, not only from internal Bundesbank micro data but also from external data providers.
In order to create useful and comprehensive metadata for the micro data provided, global standards such as SDMX and DDI are used. For the linkage of different micro data sources, a record linkage group creates high-quality master data sets that enable linkage not only between multiple Bundesbank micro data sets but also with micro data provided by external sources, such as company data. Consultancy and support services are provided by experienced data and research experts in the areas of economics, finance and social studies.
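The role of such a master data set can be sketched very simply: each source carries its own local identifier, and the master table maps local IDs to a common entity key on which the sources can be joined. All identifiers and fields below are invented for illustration; this is not the RDSC's actual linkage procedure:

```python
# Deterministic record linkage via a master data set: a lookup from
# (source, local id) to a common entity key, used to merge records from
# two micro data sources into one linked data set per entity.

master = {("banking", "B17"): "E001", ("company", "C42"): "E001",
          ("banking", "B99"): "E002", ("company", "C77"): "E003"}

banking = {"B17": {"loans": 5}, "B99": {"loans": 2}}
company = {"C42": {"employees": 120}, "C77": {"employees": 8}}

def link(banking, company, master):
    """Join the two sources on the common entity key from the master set."""
    linked = {}
    for src_name, src in (("banking", banking), ("company", company)):
        for local_id, record in src.items():
            key = master.get((src_name, local_id))
            if key is not None:
                linked.setdefault(key, {}).update(record)
    return linked

print(link(banking, company, master))
```

Entity E001 ends up with fields from both sources, while entities seen in only one source keep what is available, which is the point of maintaining the master data set centrally.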

