Session 25

Title of session: Data integration at work

Chair: Jean-Pierre Poncelet

Room: S4A Mariacki

Time: 11:30 - 13:00

Date: 28 June

Session 25 - papers & presentations

Presenting Author / Abstract
Dag F Gravem
Title: <<< The use of loyalty card transaction data from retailers in Household Budget Statistics: an exploration >>>
Statistics Norway (SSB) is exploring alternative data sources for household budget statistics. One such source is loyalty membership data from retail chains, where data from grocery retail chains (food and drink consumption) is of particular interest. The paper presents an explorative analysis of the use of this type of data from a methodological perspective, using a test sample of receipt data with information on loyalty members from one of the largest grocery retail chains in Norway, containing over 121 million transactions, about half of them made by members. The main finding of the analysis is that using loyalty members' purchase data from one retail chain to estimate Norwegian households' grocery consumption is not advisable, even though the data derive from a large retail chain with a high market share. Members have a spending pattern that differs from that of non-members, and further from that of the general population. Not only are the members a specific subpopulation when it comes to spending, there are also demographic differences. With respect to gender and age, the members are a selective subset, and we find that differences in spending vary across age groups and between men and women. The observed skewness of the members with respect to background variables, such as gender and age, suggests that an adjustment for these variables might remove some of the selection bias seen in the expenditure pattern. We found that this is not the case: adjusting for age and gender has only a very small effect on the resulting spending pattern. We conclude that the members are a specific subpopulation with a spending pattern unlike that of the average Norwegian, and that this cannot be explained by age and gender. There must be other factors at play that are not available in the data used in this analysis.
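The abstract does not say which adjustment method was used. As an illustration only, the sketch below shows one common way to adjust a selective sample for age and gender: post-stratification, where cell means from the sample are reweighted by known population shares. The function name, cell labels, and numbers are hypothetical and are not taken from the paper.

```python
def poststratified_mean(sample, population_shares):
    """Post-stratified mean of a selective sample.

    sample: list of (cell, value) pairs, e.g. cell = "F_18-39" (hypothetical labels).
    population_shares: dict mapping each cell to its share of the target
        population (assumed known, summing to 1).
    """
    totals, counts = {}, {}
    for cell, value in sample:
        totals[cell] = totals.get(cell, 0.0) + value
        counts[cell] = counts.get(cell, 0) + 1

    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no sample observations in cells: {sorted(missing)}")

    # Weight each cell mean by the population share of that cell.
    return sum(
        share * totals[cell] / counts[cell]
        for cell, share in population_shares.items()
    )


# Made-up spending values for a sample that over-represents one cell.
members = [("F_18-39", 100.0), ("F_18-39", 120.0), ("M_60+", 200.0)]
shares = {"F_18-39": 0.5, "M_60+": 0.5}

unadjusted = sum(v for _, v in members) / len(members)
adjusted = poststratified_mean(members, shares)
```

An adjustment of this kind can only remove bias that is explained by the cell variables. If members differ from non-members within each age and gender cell, as the abstract reports, the adjusted estimate stays biased.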
Sofie De Broe
Title: <<< Smart business statistics: how to integrate technology and official statistics? >>>
Businesses are requested to deliver a lot of data for official statistical purposes. Traditionally, questionnaires are used as the collection instrument, which entails high costs and a high response burden for businesses. Nowadays, however, a lot of data within businesses are already available in electronic format. Electronically available data could be collected by means of System-to-System (S2S) data-communication technology as a primary data-collection mode. One example is financial data in a computerized business information chain, which automatically provides input for financial, tax and statistical reports. These data can be collected using, e.g., SBR (Standard Business Reporting), a standard for S2S data communication. A second example of S2S data collection for statistical purposes involves sensor data. Electronic sensors are increasingly used to run a business, e.g. by agricultural businesses (such as dairy farms) and transportation businesses. In this paper we explore the challenges and consequences of S2S data collection. Drivers for switching from questionnaires to S2S data collection are working towards smart business statistics (i.e. timely and new statistical output integrated in business processes), the reduction of response burden, and monitoring and benchmarking businesses against their counterparts. S2S data collection seems straightforward. However, important factors that affect or impede implementation are the standardisation of S2S technology and the harmonisation of metadata. Definitional issues and trust in the quality of the data also affect its use. S2S data collection offers opportunities to get access to data in a cost-effective manner (as opposed to sending out questionnaires); however, both companies and NSIs may need to make initial investments in the technology. In terms of getting access to the data and business participation, the question from businesses remains: "what’s in it for me?"
Our statement is: Technology is the enabler of innovations; it is the applied methodology and organizational context that make innovations work.
Arkadiusz Wisniowski
Title: <<< Utilizing Non-Probability Survey Data to Improve the Quality of Probability Survey Data for Small Samples >>>
Probability-based surveys serve a useful role in society as they provide a critical source of representative data used to track changes in the population over time, evaluate the impact of interventions, and inform policy-relevant decisions. However, probability-based surveys are struggling against increasing non-response rates and non-coverage rates in western societies, which have contributed to rising costs of data collection that have outpaced research budgets. To cut costs, surveys may choose (or be forced) to reduce sample sizes, but this comes with the drawback of increased variability of survey estimates and a higher likelihood of Type II errors. In this paper, we evaluate a Bayesian modeling approach for reducing the variability of survey estimates derived from small probability samples. The approach incorporates auxiliary information collected from concurrent (and larger) non-probability samples into the estimation process. We demonstrate the method using actual nationally-representative probability and non-probability survey data collected in Germany. We show that the data combination method produces survey estimates with substantially lower variability compared to probability-only survey estimates, especially for small sample sizes. We conclude with a discussion of the implications of this procedure for survey practice and propose some future research directions.
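The abstract does not specify the Bayesian model. To illustrate the general idea of borrowing strength from a larger non-probability sample, the sketch below uses a textbook normal-normal conjugate update in which the non-probability estimate supplies the prior and the small probability sample supplies the data. This is an assumed simplification and not the authors' model; all names and numbers are hypothetical.

```python
def posterior_normal(prior_mean, prior_var, sample_mean, sample_var, n):
    """Normal-normal conjugate update for a population mean.

    prior_mean, prior_var: prior taken from the non-probability sample
        (assumption: its possible bias is reflected in prior_var).
    sample_mean, sample_var, n: mean, unit-level variance and size of the
        small probability sample (variance treated as known).
    Returns (posterior mean, posterior variance).
    """
    data_var = sample_var / n                  # variance of the sample mean
    post_precision = 1.0 / prior_var + 1.0 / data_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sample_mean / data_var)
    return post_mean, post_var


# Made-up inputs: a small probability sample of n = 4 units.
post_mean, post_var = posterior_normal(
    prior_mean=10.0, prior_var=4.0, sample_mean=12.0, sample_var=16.0, n=4
)
prob_only_var = 16.0 / 4
```

In this toy setting the posterior variance is always smaller than the variance of the probability-only estimate, which mirrors the variance reduction described in the abstract. The price is that a biased prior pulls the estimate towards the non-probability value, so the choice of prior variance matters.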
Romina Filippini
Title: <<< Design and evaluation of an editing and imputation strategy for micro-data from integrated administrative sources: the Italian case of the ARCHIMEDE Project >>>
In order to provide users with detailed statistical information at local level, in recent years the Italian National Institute of Statistics (Istat) has made available micro-data collections based on the integration of several administrative sources. In particular, micro-data archives have been produced in the context of the ARCHIMEDE project (Integrated Archives of Economic and Demographic Microdata) to extend the range of statistical information on households' socio-economic conditions. Extracting reliable statistical information from multiple data sources is in general a complex task. The present work describes the methodology adopted to ensure more "complete" and "coherent" data dissemination and provides indications regarding the quality of the results produced. Specifically, the different editing and imputation techniques and tools used for the main variables in the database are illustrated. A specific focus concerns the consistency between the two quantitative variables “Income from employment” (from fiscal sources) and “Work Intensity” (from social security data). The latter, taking values in [0,1], is defined in terms of the amount of work carried out over one year. Before applying the error localization and imputation procedures, a detailed analysis based on auxiliary administrative sources was conducted to properly identify the erroneous cases to be corrected. Finally, the impact of imputation at territorial level is evaluated by comparing the values of some indicators based on the “raw” data with the corresponding values based on the final adjusted data.
Diego Zardetto
Title: <<< Reconciling Estimates of Demographic Stocks and Flows through Balancing Methods >>>
In the near future, the Italian population census will be the result of the integration of administrative and survey data. This will enable Istat to deliver official population size estimates more frequently than was possible with traditional censuses. Census-based estimates of population counts (‘stocks’) should be consistent with information about demographic events (‘flows’) available from municipal civil registries. In particular, the Demographic Balancing Equation (DBE) should be fulfilled, which states that final population counts P(t+1) are equal to starting population counts P(t) plus the sum of natural increase N (difference between births and deaths) and net migration M (difference between immigrants and emigrants):
P(t+1) = P(t) + N + M
Due to sampling and non-sampling errors, the DBE will not be trivially satisfied. Therefore, suitable methods must be investigated to obtain consistent final estimates. These methods should simultaneously adjust both the initial estimates of population counts and the rough civil registry figures, in such a way that the resulting macrodata exactly fulfill the DBE.
We formalize the problem of ensuring time and space consistency of demographic estimates as a constrained optimization problem. Given initial, rough estimates of stocks and flows entering the demographic balancing equations defined for all the geographic areas of a given territorial level, we search for final estimates that are balanced, i.e. (i) satisfy all the DBEs, and (ii) are as close as possible to the initial estimates. To solve the problem, we propose to exploit the Stone-Byron approach that is commonly adopted for balancing large systems of national accounts.
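The balancing idea can be sketched for a single area, as a hedged illustration rather than the authors' implementation. Stone-type balancing is usually formulated as a generalized least squares adjustment: the initial estimates are moved as little as possible, relative to their assumed variances, so that the linear constraint holds exactly. The sketch assumes one area, uncorrelated errors (a diagonal variance matrix), and made-up figures; the real problem couples many areas through migration flows.

```python
def balance_single_area(p_start, natural, migration, p_end, variances):
    """Adjust (P(t), N, M, P(t+1)) so that P(t) + N + M - P(t+1) = 0.

    variances: four assumed error variances, one per input; a larger
        variance means a less reliable figure, which absorbs more of the
        discrepancy. At least one variance must be positive.
    Returns the balanced values as a list in the same order.
    """
    x = [p_start, natural, migration, p_end]
    a = [1.0, 1.0, 1.0, -1.0]                  # constraint coefficients
    residual = sum(ai * xi for ai, xi in zip(a, x))
    scale = sum(ai * ai * vi for ai, vi in zip(a, variances))
    if scale <= 0:
        raise ValueError("at least one variance must be positive")
    # GLS adjustment with diagonal V: x* = x - V a (a' V a)^-1 (a' x)
    return [xi - vi * ai * residual / scale
            for xi, ai, vi in zip(x, a, variances)]


# Made-up initial estimates: 1000 + 5 + 10 = 1015, but P(t+1) is 1020.
balanced = balance_single_area(1000.0, 5.0, 10.0, 1020.0,
                               variances=[2.0, 1.0, 1.0, 1.0])
```

With these inputs the discrepancy of 5 is spread over the four figures in proportion to their variances, so the starting population, which is assumed least reliable here, moves the most. A figure given variance zero is treated as exact and left unchanged.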
Experiments on real data suggest that, under reasonable assumptions, the proposed approach determines improved estimates of population counts: besides gaining consistency, they exhibit lower bias and variance as compared to rough ones, and the observed efficiency gain seems robust against misspecification of reliability weights.
