Examining the Visualization Practices of Data Scientists on Kaggle

Naimul Hoque, Darius Coelho, Klaus Mueller

Visualization is an integral part of data science, as it is used extensively in the different phases of designing a data-based model and communicating its outcomes. Real-world data today is often large in terms of both the number of data entries and dimensionality, and it can also contain inconsistencies or noise. Appropriate visualizations allow data scientists to quickly explore the data and examine outliers or inconsistencies in it. This process of finding patterns and trends in the data is known as Exploratory Data Analysis (EDA). EDA can be used to extract unique insights from data, make appropriate business decisions, and learn relationships between variables, all of which contribute towards the design of a good data-based model.