Chair: Maria João Zilhão
Room: S4A Mariacki
Time: 14:30 - 16:00
Date: 27 June
|Title: <<< Trusted Smart Statistics: A reflection on the future of (Official) Statistics >>>
The extended use of the Internet of Things (IoT) will eventually take big data to a whole new level and change the data landscape. Data capturing and processing capabilities, coupled with analytical and statistical capabilities, will be embedded in the smart systems themselves. Intelligence along the data life-cycle, enhanced with cognitive processes, will be an essential component of future statistics. Algorithms will handle amounts of data far beyond what humans can exploit with traditional data processing methods. We call this smart statistics, and we identify it as the future role of official statistics in a world permeated by smart technologies: real-time, automated, interactive technologies that optimize the physical operation of appliances and consumer devices. Statistics themselves would then be transformed into a smart technology embedded in smart systems. However, statistics are only useful when they are trusted. To build trust into smart statistics, the data life-cycle needs to be auditable and transparent, with guarantees of accuracy and privacy by design. This paper provides a reflection on the future of official statistics in a hyper-connected world dominated by the IoT. It briefly outlines the concept of smart technologies shaping the future of statistics, emphasising the need to embed trust in smart statistics through principles of algorithmic transparency and accountability.
|Title: <<< Improving the quality of official statistics with geographical disaggregation and dasymetric mapping: Two Eurostat experiments on tourism and population statistics >>>
Official statistics are often reported on statistical units that are sometimes too large to properly depict the geographical distribution of the underlying phenomenon. In the European context, for example, most statistics are produced only at national level (NUTS 0) and do not allow a true understanding of the spatial pattern at more local scales. Geographic resolution is a crucial component of quality in official statistics and should be better addressed.
This article describes two experiments carried out at Eurostat for disaggregating statistics with auxiliary geographic data. Both experiments are based on dasymetric mapping: input statistical values are distributed at the level of geographical features; these new statistical values are then re-aggregated at the level of target statistical units with a finer resolution. A first experiment was the disaggregation of tourism statistics over Europe from NUTS 2 to NUTS 3 and a 10km resolution grid. The auxiliary geographic information used is a database containing the locations of around 160,000 tourist accommodations across Europe. The outcome reveals a striking image of tourist activity over Europe, with spatial patterns that cannot be revealed at NUTS 2 level. The second experiment was the disaggregation of mobile phone data over Belgium to assess population distribution on a 1km resolution grid. Mobile phone data are collected at antenna level, and antenna reception zones are extremely irregular in shape and size, especially in rural areas. Cadastral information on the location and volume of every single building in Belgium has been used to locate mobile phone users more precisely around built-up areas. Both experiments show the pertinence of using geographic information with the dasymetric mapping method to improve quality related to geographical resolution. The method has been implemented in the generic library EuroGeoStat (github.com/eurostat/EuroGeoStat) and is intended to be applied to other domains.
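The dasymetric step described above (distribute each source unit's value over its auxiliary features in proportion to a weight, then re-aggregate by target unit) can be sketched as follows. This is a minimal illustration only, assuming invented data structures; it is not the EuroGeoStat implementation, and the field names and example figures are hypothetical.

```python
# Minimal sketch of dasymetric disaggregation. All names and numbers
# are illustrative assumptions, not the EuroGeoStat library's API.
from collections import defaultdict

def disaggregate(source_totals, features):
    """Distribute each source unit's total over its auxiliary features
    in proportion to feature weight, then re-aggregate by target unit.
    The total mass per source unit is preserved by construction."""
    # Sum of auxiliary weights per source unit (e.g. beds per NUTS 2 region)
    weight_per_source = defaultdict(float)
    for f in features:
        weight_per_source[f["source"]] += f["weight"]

    # Each feature carries a share of its source unit's value
    # into the target unit it falls in.
    target_totals = defaultdict(float)
    for f in features:
        share = f["weight"] / weight_per_source[f["source"]]
        target_totals[f["target"]] += share * source_totals[f["source"]]
    return dict(target_totals)

# Hypothetical example: one source region "A" with 100 tourist nights,
# whose accommodations fall into two finer target cells.
features = [
    {"source": "A", "target": "A1", "weight": 30},  # bed capacity in cell A1
    {"source": "A", "target": "A2", "weight": 10},  # bed capacity in cell A2
]
print(disaggregate({"A": 100.0}, features))  # {'A1': 75.0, 'A2': 25.0}
```

Because every feature's share sums to 1 within its source unit, re-aggregating the target values back to source units recovers the input totals exactly, which is the property that makes the method safe for official figures.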
|Title: <<< Investigation of linked open data technologies for publishing georeferenced statistical data >>>
Polish official statistics holds a vast amount of statistical data dispersed among different databases and disseminated using various publication methods. While the openness of the data has increased significantly, there is still a lot of work to be done in terms of integrating different data sources. That is why Statistics Poland decided to look into linked open data technology and launched a project for a pilot implementation based on statistical and geospatial data samples. Multiple data sources published by official statistics have been identified, described with metadata and assessed in terms of their openness. At the same time, the units of territorial division of the country that are used for statistical data dissemination have been catalogued, harmonized and generalized for the years 2002-2016. Finally, linked open data technologies have been explored in order to find a feasible implementation method. The pilot covered statistical data from three major databases (Local Data Bank, STRATEG system and Demography database), geographical data for statistical units and the data sources catalogue. Thorough research was performed on existing vocabularies and statistical linked open data implementations in order to create RDF metadata and establish a test SPARQL endpoint. The pilot linked open data implementation was a valuable exercise which provided many answers but at the same time raised many new questions: Is there a reference implementation for statistical data? Which vocabularies should be used? What should we link to? How should geospatial data be encoded to make them most usable? Most implementations are technically correct, but are they of good quality? Hopefully an increasing interest in linked open data, along with pan-European cooperation fuelled by Eurostat's DIGICOM project, will provide answers to these questions and a reference statistical linked data implementation will surface soon.
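To make the RDF encoding question concrete, the sketch below renders a single statistical observation as Turtle using the W3C RDF Data Cube vocabulary (`qb:`), linking a value to a territorial unit and a reference period. The `ex:` namespace, property names, unit code, and figure are invented placeholders for illustration, not Statistics Poland's actual identifiers or data.

```python
# Illustrative sketch only: the ex: namespace, predicates, unit code
# and value are hypothetical, not Statistics Poland's real vocabulary.
def observation_turtle(obs_id, unit_uri, year, value):
    """Render one RDF Data Cube observation as Turtle, linking a
    statistical value to a territorial unit and a reference period."""
    return f"""\
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix sdmx: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix ex:   <http://example.org/stat/> .

ex:{obs_id} a qb:Observation ;
    ex:refArea <{unit_uri}> ;
    sdmx:refPeriod "{year}" ;
    ex:population {value} .
"""

print(observation_turtle("obs1", "http://example.org/unit/PL911", 2016, 1754000))
```

Modelling each observation this way lets the territorial unit URI serve as the link between the statistical databases and the geographical data for statistical units, and the resulting triples can be loaded into any store behind a SPARQL endpoint.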
|Title: <<< The impact of a centralized data collection approach on response rates of economic surveys and data quality. The Istat experience. >>>
In April 2016, Istat (the Italian National Statistical Institute) started a corporate restructuring process that affected all statistical production structures and led to a completely renewed organizational model. Before this reorganization, the statistical processes were organized according to the classical 'stovepipe' model, which involved independent, non-integrated statistical processes, each including all the necessary skills: statisticians, information technology experts, thematic experts and methodologists. The new model restricts the thematic production processes to the thematic experts only, while all cross-cutting expertise is assigned to specialized structures. The main advantage of the new setup is overall system efficiency; the main disadvantage is the increased fragmentation of the production processes. The new model was based on the following criteria:
- standardization and generalization of each phase of the productive process;
- specialization of personnel devoted to specific activities;
- detailed planning of the surveys calendar and of each data collection activity;
- realization of a Business portal offering a set of services oriented to respondents;
- introduction of a centralized contact center (inbound and outbound).
Before the restructuring process, response rates in economic structural surveys (SBS, Prodcom, ICT, R&D, Innovation, Inwards and Outwards surveys) were quite low and unsatisfactory: the mean response rate was 59.9 per cent. One year after the introduction of the new organization, the mean response rate had increased to 66.0 per cent. At the same time, the duration of the data collection periods was reduced from 139.3 to 116.9 days. For short-term surveys, the main result obtained is the increase in the number of questionnaires transmitted by respondents within the 'useful term', that is, the deadline for the calculation of the provisional index. The paper will describe in detail the process innovations introduced and the main results achieved.
|Title: <<< Mode-effects in mixed-mode surveys: the Italian experience on social surveys using the web >>>
Due to the ever-increasing penetration of the internet among the Italian population, Istat is progressively expanding the use of multi-mode data collection involving computer-assisted web interviewing (Cawi) in social surveys. In 2017 a project was carried out on the optimal design of mixed-mode strategies to ensure high data quality by preventing and treating mixed-mode effects. The results of the project fed into the handbook "Mode-effects in mixed-mode surveys - Theoretical issues and experimental applications on social surveys using the web", which is expected to contribute to the standardization of the current social surveys. The handbook provides a conceptual reference framework in the area of mixed-mode surveys, as well as an overview of the main issues related to the design of multi-mode data collection strategies and of possible methodological approaches to address them. A selection of methods for the diagnosis and treatment of selection and measurement effects is thoroughly analyzed and assessed on the basis of experimental applications to multi-mode social surveys. The handbook is expected to make Istat researchers more aware of the quality issues related to mixed-mode data collection using the web and of the need to prevent mixed-mode effects so as to limit bias in the estimates. The paper summarizes the theoretical contents of the handbook, focusing on the main findings and recommendations that emerged from the analyses and studies carried out.