Training course 3

Theme of the course/workshop: Big Data in Official Statistics

Facilitators/ Instructors: Jacek Maślankowski

Program

Time slot	Topic
09:30 – 09:45	Introduction and objectives of the course. Presentation of the lecturers and participants.
Morning session
09:45 – 11:00	Web scraping – history, tools, types of web scraping. Acquiring data from the web – manual vs. automatic tools. Quality issues in web scraping – sustainability, coverage and representativeness. Examples and exercises.
11:00 – 11:30	Coffee Break
11:30 – 13:00	Web scraping semi-structured data. Combining two different web data sources – de-duplication issues. Examples and exercises.
13:00 – 14:00	Lunch Break
Afternoon session
14:00 – 15:30	Machine learning fundamentals – supervised vs. unsupervised learning. Examples of text and numeric data. Text mining – processing high quality text data for machine learning. Examples and exercises.
15:30 – 16:00	Coffee Break
16:00 – 17:30	Machine learning with web data – how to prepare a good training dataset. Quality aspects of machine learning. Examples and exercises.

Description and objectives of the course:

Objectives

Show the fundamentals of using Big Data in official statistics through three aspects of Big Data: web scraping, text mining and machine learning. The course is built on practical examples and exercises that will help participants better understand how Big Data is used in official statistics. Examples will be based on real issues of data gathering and processing for official statistics.

Participants’ profile

Participants should have a basic knowledge of the concept of Big Data and be familiar with the basic use of IT tools. No programming skills are needed: all examples in Python will be worked through with the instructor.

Overall description and approach

The workshop will be based on examples: participants will do exercises and run examples in Python. Fundamentals of web scraping and machine learning will be presented with practical examples for use in official statistics. The goal is to identify the data quality risks of using web scraping and machine learning in official statistics.
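
As an illustration of the kind of exercise the programme refers to (combining two web data sources and removing duplicates), a minimal sketch might look like the following. It is not taken from the course materials: the records, the field names `name` and `city`, and the normalisation rule are all hypothetical, and real de-duplication usually needs more careful matching.

```python
import re


def normalise(text: str) -> str:
    """Lower-case the text and collapse punctuation and whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(cleaned.split())


def combine(source_a, source_b):
    """Merge two lists of records, keeping the first record seen per key."""
    merged = {}
    for record in list(source_a) + list(source_b):
        # Hypothetical matching key: normalised name plus normalised city.
        key = (normalise(record["name"]), normalise(record["city"]))
        merged.setdefault(key, record)
    return list(merged.values())


# Hypothetical records, as they might arrive from two scraped portals.
portal_a = [
    {"name": "ACME Sp. z o.o.", "city": "Gdańsk"},
    {"name": "Baltic Data", "city": "Gdynia"},
]
portal_b = [
    {"name": "acme sp z o o", "city": "gdańsk"},
    {"name": "Nord Stat", "city": "Sopot"},
]

combined = combine(portal_a, portal_b)
for record in combined:
    print(record["name"], "-", record["city"])
```

The two spellings of the same enterprise collapse into one record because they share a normalised key. The quality risk is visible even here: a looser rule would merge distinct units, while a stricter one would leave duplicates in the combined dataset.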

Facilitators/ Instructors (short biographical note)

Jacek Maślankowski, Ph.D., is a researcher and academic teacher at the Department of Business Informatics, University of Gdańsk, and a consultant at the Statistical Office in Gdańsk (Statistics Poland). His research activities mostly concentrate on Big Data and Data Warehousing with Business Intelligence. He is an author of numerous publications regarding Big Data appliances. He has been involved in several Big Data projects in official statistics, including a contribution to the Big Data Quality Framework by UNECE. Currently he is a delegate to the ESSNet Big Data work packages: WP2 – Web Scraping Enterprise Characteristics (member), WP7 – Multi Domain (leader of methodology, consultant) and WP8 – Methodology (member, internal coordinator of IT Report). He is the author or co-author of statistical software, including Big Data Social Media Presence and Life Satisfaction.
