Speed Talk Session 4

Title of session: Quality in statistical domains

Chair: Luisa Muñoz

Room: S4A Mariacki

Time: 13:15 - 14:00

Date: 27 June

Speed Talk Session 4 - papers & presentations

Presenting Author / Abstract
Roland Sturm
e-mail: roland.sturm@destatis.de
Title: <<< Quality assessment and quality measurement for profiling in Germany >>>
Official statistics in Germany is introducing profiling of enterprises as a new task of the statistical business register. Profiling in Germany aims to analyze the structure of enterprise groups in order to identify, within those groups, the enterprises as defined by European law (Regulation 696/93). The staff for manual profiling is located in the 14 Statistical Offices of the Laender. The tasks of the Federal Statistical Office are to coordinate the work, to evaluate and ensure good quality, and to improve the methodology of profiling. The presentation will propose ways to deal with issues such as:
Target population: Manual profiling is resource-intensive. Accordingly, profilers should prioritize “the most relevant” enterprise groups, expecting to learn about “the most relevant” enterprises. What are suitable indicators for selecting profiling cases? How can the effects of profiling be assessed?
Workflow: How can 14 profiling institutions be coordinated in order to produce comparable results? How to assess effectiveness and efficiency of the methodology and the working procedures?
User orientation: Profiling is a means of providing statistical users in the statistical offices with appropriate statistical units, namely enterprises. How to identify the main points of interest of the users? How to store the results of profiling and provide them to the users? How to integrate user feedback into the adaptation of the profiling methodology?
Structure and content: Profiling detects the enterprise perimeter, whereas domain statisticians need a set of variables. How to organize the division of work between profiling (detection of structures) and data collection (detection of variables)?
Mathias Revold
e-mail: rmk@ssb.no
Title: <<< Survey quality: Response bias in retrospective questions >>>
Studies show that both survey respondents' failure to remember events accurately (forgetting) and psychological mechanisms that lead people to present themselves in a socially desirable manner (social desirability bias) lead to inaccurate results. We will look at self-reported electoral participation as a case to examine the impact of forgetting on social desirability bias, and thereby on the quality of survey data. In 2014 and 2017 the Norwegian survey on living conditions included questions on participation in the last parliamentary election. As there were no national elections between the surveys, both referred to the 2013 election. Thus, we asked about the same event both shortly after it happened and several years later. Self-reported electoral participation is higher than actual participation. Some studies show that socially desirable answers increase when respondents forget events as time passes (Belli, Traugott, and Beckmann 2001). At the same time, the likelihood of forgetting decreases when events are regular and seen as important (Tourangeau, Rips, and Rasinski 2000). This could apply to electoral participation. We will look at differences in reported electoral participation between the surveys, overall and for demographic groups. These results can be compared to the actual turnout in the same groups by using the administrative register on electoral participation. Furthermore, the survey has a panel design, which makes it possible to study changes in individuals’ responses. Thus, we will study the impact of forgetting and social desirability bias when surveys refer to events years after they happened. If forgetting leads people to give more socially desirable answers, self-reported electoral participation should increase over time. However, previous studies have examined the impact only over relatively short time periods. If respondents feel less social desirability pressure years after the event, the reported participation rate may decrease and come closer to the actual election results.
Álvaro Gómez Losada
e-mail: alvaro.gomez-losada@ec.europa.eu
Title: <<< Machine Learning Techniques to Forecast Population Using Eurostat Data: An Exploratory Study >>>
Machine learning (ML) is concerned with the algorithms that transform data into useful intelligence. ML algorithms play a key role in the big data era, making it possible to analyze vast amounts of heterogeneous data in order to extract unknown and potentially useful information and to help inform intelligent decisions. National Statistical Institutes provide high-quality statistical information that has been gathered and processed based on international standards and appropriate data analysis procedures. However, this rich information may not be suited to analysis by traditional ML algorithms because the sizes of the typical datasets fall well below big data thresholds. In this study we explore the usefulness of ML techniques to forecast the 2016 Spanish population size by age class, using demographic data for the years 2004 to 2015, obtained from Eurostat's web page. The following variables were used to perform forecasting: immigration, emigration, fertility, deaths, first-time marriage and life expectancy by year. This approach was compared with ARIMA and exponential smoothing (ETS) projections, as well as with a naive baseline, which took the 2015 population by age as the projection for 2016. The Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) quality measures were used to evaluate the accuracy of the estimated population pyramid. In all cases, the ML algorithm outperformed the other approaches. Although these results seem very promising, a few methodological issues should be discussed, such as the implementation of ML methodologies on small data sets, and the type and nature of the variables used to infer the predicted outcome.
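As an illustration of the two accuracy measures and of the naive baseline mentioned in the abstract, the following minimal sketch computes MAE and RMSE for a "last year equals next year" projection. The function names and the population figures are hypothetical and chosen only for the example; they are not Eurostat data and not the authors' code.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Hypothetical population counts (in thousands) for four age classes.
# These numbers are invented for illustration only.
pop_2015 = [2100.0, 2300.0, 2500.0, 2400.0]
pop_2016 = [2080.0, 2310.0, 2520.0, 2430.0]

# Naive baseline: use the 2015 population by age as the 2016 projection.
naive_forecast = pop_2015

print("MAE :", mae(pop_2016, naive_forecast))
print("RMSE:", rmse(pop_2016, naive_forecast))
```

Any competing forecast (ML, ARIMA or ETS) can be scored with the same two functions and compared against the baseline values.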
Cristina Neves
e-mail: cristina.neves@ine.pt
Title: <<< Survey on management practices >>>
Statistics Portugal carried out an unprecedented survey of a sample of enterprises established in the legal form of companies, achieving nearly four thousand valid replies, which made it possible to obtain information on management practices and characteristics in 2016. The survey, of a qualitative nature, falls within the scope of a range of statistical operations intended to disclose information on factors that, although they have no explicit monetary reflection in enterprise accounting, constrain enterprises' competitiveness in a context of growing integration within the overall economy. The main results obtained were broken down by four stratification variables: Age of the enterprise, Belonging to an economic group, Size of the enterprise and Economic activity. Some examples of the main findings were:
- In 61.0% of the companies, top managers had a bachelor's or higher degree. This percentage was 82.9% in large enterprises and 43.7% in microenterprises.
- In about 70% of the companies, the top manager exercised functions under exclusivity (60.6% in microenterprises and 78.4% in large enterprises).
- In about 51% of the companies no promotions were awarded to employees with management functions. This percentage decreased to 44.3% in the case of employees with no management functions.
In order to obtain a synthetic measure of management quality, an indicator has been built (gscore), based on an indicator designed for a similar survey of the US Census Bureau. Combining the information from this survey with information reported to Statistics Portugal in other statistical operations, the results obtained indicate a significant relationship between management quality and the economic performance of the enterprises.
Anna-Kaisa Jaakkonen
e-mail: anna-kaisa.jaakkonen@luke.fi
Title: <<< Improving efficiency of the sample design with sensitivity analysis of the thresholds in the Finnish horticultural survey >>>
The horticultural statistics are collected for the European agricultural policies based on the regulations on permanent crops and crop statistics. In Finland the horticultural survey has been conducted annually as a total survey with a threshold on the standard economic output (SO) of the horticultural enterprises. We investigate the impact of increasing the SO threshold both on the quality and coverage of the final estimates and on the survey costs. With the sensitivity analysis, we demonstrate how to optimise the threshold on the standard economic output to balance the survey costs against the quality criteria of the survey defined in the EU regulation on permanent crops. We also present the method of deriving the standard economic output for the horticultural enterprises in the sampling frame. To reduce the survey costs and to improve the quality of the survey data, the sampling design of the survey is reviewed in detail. The data collection of the horticultural survey uses a mixed-mode approach: register data, a web survey and telephone interviews, with paper questionnaires also accepted. With the increase of the thresholds we can also analyse the increase in the web-survey response rate. It is expected that larger horticultural enterprises are more likely to respond through the web survey, while those who are interviewed tend, on average, to be smaller enterprises. Therefore, we will also present the impact of the efficient sample design on the expected improvement in the timeliness of the survey data. The improvement of sampling designs is increasingly topical, as there are extensive new information needs and the statistical offices must balance the statistical burden against the response burden. Improving the efficiency of the sampling design directly reduces the survey costs and the response burden.
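The trade-off described in the abstract can be sketched as a simple threshold sensitivity table: for each candidate SO threshold, count how many enterprises remain in the survey (a proxy for cost and response burden) and what share of total standard output they cover (a proxy for coverage). The function name, the SO values and the thresholds below are hypothetical and only illustrate the idea; they do not reproduce the authors' method or the Finnish frame.

```python
def threshold_sensitivity(standard_outputs, thresholds):
    """For each threshold, return (threshold, units kept, share of total SO covered)."""
    total = sum(standard_outputs)
    rows = []
    for threshold in thresholds:
        # Enterprises at or above the threshold stay in the survey.
        kept = [so for so in standard_outputs if so >= threshold]
        rows.append((threshold, len(kept), sum(kept) / total))
    return rows


# Hypothetical standard output values (euros) for six enterprises in the frame.
so_values = [2000, 5000, 8000, 15000, 40000, 130000]

rows = threshold_sensitivity(so_values, thresholds=[0, 4000, 10000])
for threshold, units, coverage in rows:
    print(f"threshold={threshold:>6}  units={units}  SO coverage={coverage:.1%}")
```

In this invented frame, raising the threshold from 0 to 10000 halves the number of surveyed units while still covering over 90% of total SO, which is the kind of pattern a sensitivity analysis would look for.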
Caroline Bo
e-mail: cbo@dst.dk
Title: <<< Understanding the effect of the global economy on the Balance of Payment >>>
Increasing our focus on large enterprises’ production setups has been of great importance in raising the quality of our measurement of global activities affecting the Danish economy. Two global production setups, merchanting and processing, are central. During 2014, when BPM6 was being implemented, we found that there was significant underreporting of these types of activities. Hence, we were not capturing the full scale of global activities by Danish enterprises affecting the Danish economy. We decided to start validating data on the largest cases, i.e. the Danish enterprises expected to have the most global production. By validating data on the largest companies on an ongoing basis, we now believe we have a trustworthy indication of how our large enterprises affect our balance of payments. In this paper we will describe how we came to validate merchanting and processing data etc. in a validation across statistical domains, and we will describe the practical approach to this work.
Annalisa Lucarelli
e-mail: anlucare@istat.it
Title: <<< 45-day flash estimates of a PEEI: the Italian job vacancy rate – methods, revisions, cyclical properties >>>
The EU regulation on quarterly job vacancy statistics requires data transmission within 70 and 45 days after the end of the reference quarter. The published indicator is the job vacancy rate, that is, the ratio between the number of vacant posts and the sum of vacant and occupied posts. It is included among the Principal European Economic Indicators (PEEIs) and is considered a potential leading indicator of the business cycle. The Italian job vacancy data are based on two direct business surveys and an auxiliary administrative-based source (used for editing, imputation and calibration). The procedure used to produce the data for the 70-day deadline makes full use of the reference quarter data from all three sources. However, for the 45-day deadline fewer data are available, and as a consequence a different procedure needed to be developed and implemented. In particular, administrative-based data for previous quarters are used, as well as more limited sets of respondents to the two direct surveys. The results have so far proven very satisfactory. The revisions between job vacancy rate estimates for the 45- and 70-day deadlines are often zero, especially at the higher aggregation levels. This happens even when the numerator and denominator of the rate change significantly between the two estimates, due to the different sets of direct survey respondents and the different populations on which the calibration constraints are based. Furthermore, the flash estimates of the job vacancy rate generally show good cyclical properties. The quality of the flash estimates, however, can be negatively affected by intense and prolonged downturns and upturns, when the impact of using calibration constraints based on previous quarters rather than the reference quarter can be more relevant. Improvements in the procedure to account for this limitation could be studied in the future.
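The definition of the job vacancy rate given in the abstract, and the observation that revisions can be zero even when numerator and denominator both change, can be illustrated with a short sketch. The figures are hypothetical, and expressing the rate as a percentage rounded to one decimal is an assumption made for this example.

```python
def job_vacancy_rate(vacant, occupied):
    """Job vacancy rate in percent: vacant / (vacant + occupied) * 100."""
    return 100.0 * vacant / (vacant + occupied)


# Hypothetical estimates for one quarter (invented numbers).
# Flash (45-day) estimate, based on a more limited set of respondents.
flash = job_vacancy_rate(vacant=9_000, occupied=891_000)
# Final (70-day) estimate, based on all sources for the reference quarter.
final = job_vacancy_rate(vacant=10_000, occupied=990_000)

# Revision of the published rate, assuming publication at one decimal place.
revision = round(final, 1) - round(flash, 1)

print(f"flash={flash:.1f}%  final={final:.1f}%  revision={revision:.1f} points")
```

Here the vacant and occupied posts both grow by about 11% between the two estimates, so the ratio, and therefore the published rate, is unchanged.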
