Tutorial: Visualization & Data Mining for High-Dimensional Datasets

by Alfred Inselberg, Tel Aviv University

Date: Nov 30, 2012, 13:15-16:15
Place: Linné Lecture Hall, Building H, Linnaeus University, Växjö 

The tutorial is free of charge and open to everyone interested in the topic. However, all participants must register via the form given below. People who are already registered for SIGRAD 2012 do NOT need to register for the tutorial separately. A poster is available for download here.

Abstract

A dataset with M items has 2^M subsets, any one of which may be the one fulfilling our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths searching through this combinatorial explosion, but also extract insights from the visual patterns. These are the core reasons for data visualization. With parallel coordinates (abbr. ||-cs) the search for relations in multivariate datasets is transformed into a 2-D pattern recognition problem. The foundations are developed interlaced with applications. Guidelines and strategies for knowledge discovery are illustrated on several real datasets (financial, process control, credit-score, intrusion-detection etc.), one with hundreds of variables. A geometric classification algorithm is presented and applied to complex datasets. It has low computational complexity and provides the classification rule explicitly and visually. The minimal set of variables required to state the rule (features) is found and ordered by their predictive value. Multivariate relations can be modeled as hypersurfaces and used for decision support. A model of a (real) country's economy reveals sensitivities, impact of constraints, trade-offs and economic sectors unknowingly competing for the same resources. An overview of the methodology provides foundational understanding, including learning the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, and that is good news for the applications. We stand at the threshold of breaching the gridlock of multidimensional visualization.
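As a small illustration of how parallel coordinates turn a multivariate relation into a 2-D pattern, the following Python sketch maps points to polylines and checks the well-known point-line duality: points lying on a line x2 = m*x1 + b become polyline segments that all cross at one point, (d/(1-m), b/(1-m)), where d is the spacing between the two axes. This is a minimal sketch and not material from the tutorial itself; the function names `to_polyline` and `crossing`, the axis spacing, and the sample values of m and b are assumptions chosen for illustration.

```python
from fractions import Fraction


def to_polyline(point, spacing=1):
    """Map an N-dimensional point to its polyline vertices.

    Axis i is the vertical line x = i * spacing; the i-th coordinate of
    the point is plotted as the height on that axis.
    """
    return [(Fraction(i * spacing), Fraction(v)) for i, v in enumerate(point)]


def crossing(p, q, spacing=1):
    """Return where the polylines of two 2-D points p and q cross.

    Each polyline is the line through (0, first coordinate) and
    (spacing, second coordinate). Raises ZeroDivisionError when the two
    lines are parallel, which happens for points on a line of slope 1.
    """
    (_, p0), (_, p1) = to_polyline(p, spacing)
    (_, q0), (_, q1) = to_polyline(q, spacing)
    slope_p = (p1 - p0) / spacing
    slope_q = (q1 - q0) / spacing
    x = (q0 - p0) / (slope_p - slope_q)
    return (x, p0 + slope_p * x)


# Illustrative relation (assumed values): x2 = m * x1 + b
m, b, spacing = -1, 4, 1
points = [(x1, m * x1 + b) for x1 in range(4)]

# Every pair of polylines crosses at the same place ...
crossings = {crossing(points[0], q, spacing) for q in points[1:]}

# ... which is the dual point predicted by the formula above.
dual_point = (Fraction(spacing) / (1 - m), Fraction(b) / (1 - m))

print(crossings, dual_point)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the crossings compare equal without floating-point tolerance. With a negative slope the common crossing falls between the two axes, which is the familiar "X" pattern seen for negatively correlated variables.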

The parallel coordinates methodology has been applied to collision avoidance and conflict resolution algorithms for air traffic control (3 USA patents), computer vision (1 USA patent), data mining (1 USA patent), optimization, decision support and elsewhere.

Keywords: Exploratory Data Analysis, Classification for Data Mining, Multidimensional Visualization, Parallel Coordinates, Multidimensional/Multivariate Applications