23 Data Principles

T. Raghuveera

In the previous module, we looked at the various types of visualization techniques and how they are applied. We also looked at the design principles in detail, along with a few examples of visualization techniques. Starting with this module, the next few modules examine the principles of data visualization in detail. We begin with an overview of data visualization. Data visualization is the visual presentation of information (data), with the goal of providing the viewer with a qualitative understanding of the information contents. The learning objectives for this module are as follows:

Learning Objectives:

(i) To introduce the common types of data in Data Visualization.

(ii) To learn how these types of data are used in various contexts.

(iii) To learn the principles of Data Visualization.

(iv) To look at examples used to visualize data in Data Visualization.

23.1.1 Data Visualization – Overview

Data visualization is a general term for any effort to help people understand the significance of data by placing it in a visual context, for example to explore the data and discover patterns in it. Today’s data visualization tools go beyond the standard static images (Fig 1), such as the charts and graphs used in Excel spreadsheets, and display data in more sophisticated ways such as infographics, dials and gauges, geographic maps, sparklines, heat maps, and detailed bar, pie and fever charts. The images may include interactive capabilities (Fig 2), enabling users to manipulate them or drill into the data.

23.1.2 Language of Data Visualization:

Data visualization works by forming a sign system to solve professional problems. In general, a sign system is an ordered set of uniformly interpreted messages or signals that can be exchanged during an interaction. In such a system, each mark (point, line, or area) represents a data element. Visual variables are chosen to encode the relationships between data elements: difference, similarity, order, and proportion. Only position supports all of these relationships. For data with diverse attributes a huge range of alternative encodings is available, so the data should be organized and matched with an effective visual representation that conveys the information to the users.

23.2 Data Type Taxonomy

[1] One Dimensional (1-D):

A 1-dimensional plotting architecture is designed to work efficiently and effectively with data sets consisting of many points, many images, or many series. A 1-D plot usually shows the distribution of values of a single variable. Most plots are made from data in an image window, but plots can also be generated from tabular measurements. A collection of such plotting functions can be provided as a module or a bar. An example of a 1-D representation is shown in Fig 3.
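Showing the distribution of a single variable comes down to counting how many values fall into each interval. The following is a minimal sketch in plain Python, assuming equal-width bins; the function name `histogram_counts` and the sample marks are illustrative and are not part of the module.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin rather than in a bin of its own.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical sample: marks of ten students (illustrative data only).
marks = [35, 42, 47, 51, 55, 58, 63, 67, 72, 95]
counts = histogram_counts(marks, 3)

# Print one text bar per bin as a crude 1-D distribution plot.
for i, c in enumerate(counts):
    print(f"bin {i}: " + "#" * c)
```

A plotting library would draw the same counts as bars; the binning step is what determines the shape of the distribution that the viewer sees.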

[2] Temporal Data Visualization:

Temporal visualizations are similar to one-dimensional linear visualizations, but differ in that items have a start and finish time and may overlap each other. The term also refers to analyzing features or data that are dependent variables, with time as the independent variable. It can take many different forms, such as reconstructing the spatial path of a feature as it travels over time, or showing the rate of change of velocity, size, or other factors over time. Examples are:

(a) Connected Scatter Plot :

A connected scatter plot as shown in Fig 4 is a scatter plot, a plot that displays values of two variables for a set of data, with an added line that connects the data series.
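The connecting line of a connected scatter plot can be thought of as a list of segments, each joining one observation to the next one in time. The sketch below assumes each record carries a time stamp plus the two plotted variables; the function name and the sample records are hypothetical.

```python
def connected_scatter_segments(records):
    """Return the line segments joining (x, y) points in time order.

    records: list of (time, x, y) tuples, in any order.
    """
    ordered = sorted(records, key=lambda r: r[0])  # sort by time
    points = [(x, y) for _, x, y in ordered]
    # Each segment joins one observation to the next one in time.
    return list(zip(points, points[1:]))


# Hypothetical yearly observations: (year, price, units sold).
records = [(2012, 10.0, 500), (2010, 8.0, 650), (2011, 9.0, 600)]
segments = connected_scatter_segments(records)

for start, end in segments:
    print(start, "->", end)
```

Sorting by time before connecting matters: joining the points in x order instead would hide the path the data took over time.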

(b) Polar Area Diagram:

A polar area diagram as shown in Fig 5 is similar to a traditional pie chart, but sectors differ in how far they extend from the center of the circle rather than by the size of their angles.
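Because every sector spans the same angle, the radius has to carry the value. A sector of angle θ and radius r has area θr²/2, so making the area proportional to the value means the radius grows with the square root of the value. The sketch below assumes equal angles and non-negative values; the function name and figures are illustrative.

```python
import math


def polar_area_radii(values):
    """Radius of each equal-angle sector so that sector area equals the value."""
    angle = 2 * math.pi / len(values)  # every sector gets the same angle
    # Sector area = angle * r**2 / 2, solved for r.
    return [math.sqrt(2 * v / angle) for v in values]


# Hypothetical monthly counts (illustrative data only).
values = [100, 400, 900]
radii = polar_area_radii(values)

for v, r in zip(values, radii):
    print(f"value {v:4d} -> radius {r:.2f}")
```

Scaling the radius directly by the value instead would exaggerate large values, since the eye tends to compare the areas of the sectors.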

[3] 2D Data Visualization:

It is the representation of two sets of related data. 2D (area) data visualizations are usually geospatial, meaning that they relate to the relative position of things on the earth’s surface. A pictorial 2D representation uses both the x-axis and the y-axis to plot the coordinate points. 2D data can be represented in various forms, such as a bar chart (Fig 6), pie chart, or bubble chart.

[4] 3D Data Visualization :

3D data elements are those with two dimensions on the plane and variation in the Z dimension. 3D navigation, as shown in Fig 7, and manipulation are easy to use if models are rendered in such a way as to imply a correspondence between data and image, so that the viewer can perceive the object’s 3-D shape.

[5] N-D Data Visualization:

Objects are also called items, instances, samples, or observations. Features are also called attributes, parameters, properties, variables, or dimensions. Objects described by the same features x1, x2, …, xn form a data set. A combination of values of all features characterizes a particular object:

Xi = (xi1, xi2, …, xin), where i belongs to {1, …, m}

Here n is the number of features, m is the number of objects, and i is the index of the object. If the objects Xi = (xi1, xi2, …, xin), i = 1, …, m, are described by more than one feature, the data characterizing the objects are called multidimensional data. If the number of features is n, then X1, X2, …, Xm are n-dimensional data items. Put simply, N-D visualization is the visualization of a data set with n dimensions. It is also called multi-dimensional data visualization, as shown in Fig 8.
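The notation above maps naturally onto a table with one row per object and one column per feature. The following sketch shows that layout in plain Python; the feature names and values are hypothetical and serve only to make m and n concrete.

```python
# Hypothetical data set: m = 3 objects, each described by n = 4 features.
feature_names = ["height_cm", "weight_kg", "age_years", "resting_pulse"]
data = [
    [172.0, 68.5, 34, 61],  # X1
    [158.0, 54.0, 27, 72],  # X2
    [181.0, 90.2, 45, 66],  # X3
]

m = len(data)           # number of objects
n = len(feature_names)  # number of features (dimensions)


def feature_column(rows, j):
    """Values of feature j across all objects (one axis of an N-D plot)."""
    return [row[j] for row in rows]


print(f"{m} objects, {n} features")
print("age_years:", feature_column(data, feature_names.index("age_years")))
```

Techniques such as parallel coordinates or scatter plot matrices work on exactly these columns, assigning each feature its own axis or panel.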

[6] Networks:

Here, each circle (node) represents a character and each connecting line (edge) represents a relationship between two characters. Nodes are more likely to be perceived as groups if the groups appear familiar or meaningful. Connecting and grouping nodes that share similar data by means of edges forms a network, as shown in Fig 9.
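Before a network can be drawn, the nodes and edges are usually held in a structure such as an adjacency list. The sketch below assumes undirected edges between named characters; the names and the helper `build_adjacency` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Hypothetical edges: each pair is a link between two characters.
edges = [("Anna", "Boris"), ("Anna", "Clara"), ("Clara", "Dmitri")]
adjacency = build_adjacency(edges)

# Degree (number of links) is often mapped to node size in a drawing.
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
print(degree)
```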

[7] Text and Documents:

Here we consider visualizing the text within a document, and collections of documents. Words, numbers, images, and diagrams should be fully integrated so that different modes of evidence are brought together in one visualization, rather than locking the viewer into a single way of seeing the evidence. Visualization of this kind lets us obtain insight from large collections of text documents. Dependencies within the text structure can be mapped directly to visual forms, as shown in Fig 10.

23.3 Types of Data :

Data can be classified into tabular data, relational data, and spatial data. Tabular data can be further classified into categorical (nominal) data and ordered data. Ordered data is of two types: ordinal data and quantitative data.
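The classification above is a small tree, and writing it out as one can make the nesting easier to see. The sketch below encodes it as a nested dictionary and lists the leaf types; the structure follows the text, while the names `TAXONOMY` and `leaf_types` are illustrative.

```python
# Nested dictionary mirroring the classification in the text;
# an empty dictionary marks a type with no further subdivision.
TAXONOMY = {
    "tabular": {
        "categorical (nominal)": {},
        "ordered": {"ordinal": {}, "quantitative": {}},
    },
    "relational": {},
    "spatial": {},
}


def leaf_types(tree, prefix=()):
    """Return the full path of every leaf type in the taxonomy."""
    leaves = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            leaves.extend(leaf_types(children, path))
        else:
            leaves.append(" > ".join(path))
    return leaves


for leaf in leaf_types(TAXONOMY):
    print(leaf)
```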

23.4 Clear vision in Visualization:

Make the data stand out. Use visually prominent graphical elements to show the data. Do not clutter the data region. Use a reference line when there is an important value that must be seen across the entire graph, but do not let the line interfere with the data. Do not allow data labels in the data region to interfere with the quantitative data or to clutter the graph. Avoid putting notes, keys, and markers in the data region. Put keys and markers just outside the data region and put notes in the legend or in the text. Overlapping plotting symbols must be visually distinguishable. Superposed data sets must be readily visually discriminated. Visual clarity must be preserved under reduction and reproduction.

23.5 Clear Understanding in Visualization:

Put major conclusions into graphical form. Make legends comprehensive and informative. Error bars should be clearly explained. Proofread graphs. Strive for clarity.

23.6 Graphs:

A visualization of a graph is a pictorial representation of its vertices and edges, as shown in Fig 11. A large amount of quantitative information can be packed into a small region through graphical conventions. Graphs are frequently drawn as node-link diagrams. Graphing data should be an interactive and experimental process. The graph drawing should not be confused with the graph itself, and it requires careful and detailed study.

We now discuss the eight basic principles of data visualization in the following sections.

23.7 Eight Principles of Data Visualization:

1. Understand the problem domain

2. Get sound data

3. Show the data and show comparisons

4. Incorporate visual design principles

5. Add small multiples

6. Add layers

7. Add axes or coding patterns

8. Combine metaphors

23.7.1 Understand the problem domain:

If you are producing a visualization for your own use or for your department, chances are good that you already understand the area you will be working in. In reality, however, there are many cases in which the visualization has to be produced for another department, or even for an external stakeholder. In such cases, you may need to ask questions and do more research to understand what is involved in the problem domain.

23.7.2 Get sound data:

It may seem obvious, but good data is at the heart of any effective visualization. Make sure the data you select is as accurate as possible. Have a sense of how it was gathered and what errors or inadequacies may exist. Make sure you get relevant data and enough of it. To create an effective visualization, you need to understand the meaning of the data you are working with. This can be a challenge if it has been stored as raw numbers.

23.7.3 Show the data and show comparisons:

Under this principle, the idea is to show the data through comparison, for clear understanding and analysis of the information. Picking the best type of visualization is both an art and a science. However, the basic rule of thumb is to choose a spatial metaphor that will show your data and its relationships, as shown in Fig 12, with minimum distraction or effort on the part of the viewer. The representation can take different forms, such as:

1. Network – To show connections, sometimes in a radial layout.

2. Linear – To show how something varies over time or in relation to another factor, often on an X/Y space.

3. Hierarchical – To show groupings and importance, these can come in many different layouts.

4. Parallel – To show reach, frequency or shares of a whole, these can come in many different layouts.

23.7.4 Incorporate visual design principles:

Under this principle, sound visual design elements such as line, form, shape, value, and colour are used to improve the visual design and make analysis easier. Principles like “balance” and “variety” make a visualization both more inviting and easier to read for trends and comparisons, as shown in Fig 13. This will become particularly important as we take our linear metaphor visualization to the next level.

23.7.5 Bring in More Dimensions:

Once we have good data and a sound basic visualization, the next step is to bring in more dimensions, since the time has come to take account of the complexity of the task. Based on our knowledge and research, we can take an initial look at a simple linear metaphor visualization, for example of store sales, and ask questions such as:

1. How do we know that this uptick in sales is not just a seasonal trend?

2. Total sales are up, but has the new store layout succeeded in improving the performance of some departments that were struggling before?

3. Are we succeeding in getting more customers into the store and not just selling more to existing ones?

4. Are customers shopping more departments and buying a more diverse mix of items?

23.7.6 Add small multiples:

Small multiples are sets of charts of the same type, with the same scale, presented together at a small size and with minimal detail, usually in a grid of some kind. Small repeated variations of a graphic, placed side by side, allow for quick visual comparison. Whenever possible, scales should be kept the same and the axis of comparison aligned. Adding some small, stacked thumbnails alongside the main chart allows the current sales trend to be compared with the one before it.
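Two bookkeeping steps sit behind any set of small multiples: computing one axis range that every panel shares, and assigning each panel a cell in the grid. The sketch below shows both in plain Python; the department names, figures, and helper names are hypothetical.

```python
def shared_axis_range(panels):
    """One (min, max) range used by every panel so that heights are comparable."""
    all_values = [v for series in panels.values() for v in series]
    return min(all_values), max(all_values)


def grid_positions(names, columns):
    """Assign each panel a (row, column) cell, filling the grid row by row."""
    return {name: divmod(i, columns) for i, name in enumerate(names)}


# Hypothetical quarterly sales per department (illustrative data only).
panels = {
    "Appliances": [12, 15, 19],
    "Media": [8, 8, 9],
    "Clothing": [20, 18, 22],
    "Toys": [5, 7, 6],
}

y_range = shared_axis_range(panels)
positions = grid_positions(list(panels), columns=2)

print("shared y range:", y_range)
print(positions)
```

If each panel were scaled to its own data instead, a small department could appear to have the same swings as a large one, which defeats the purpose of the comparison.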

23.7.7 Add layers:

Layers are objects on the map that consist of one or more separate items, but are manipulated as a single unit. Layers generally reflect collections of objects that you add on top of the map to designate a common association. Adding layers makes a graphic representation more flexible and useful. For example, we can break down the “top line” of total sales into departments, as shown in Fig 14. The resulting stacked area chart shows that sales from the appliances department have increased as a proportion of the whole, but media department sales have not improved much.
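A stacked area chart places each department's band on top of the running total of the departments beneath it. The sketch below computes those bands and each department's share of the total; the sales figures are invented for illustration and are not the data behind Fig 14.

```python
def stack_series(series_by_dept):
    """Return, per department, the (lower, upper) edge of its band at each time step."""
    length = len(next(iter(series_by_dept.values())))
    baseline = [0] * length
    bands = {}
    for dept, values in series_by_dept.items():
        upper = [b + v for b, v in zip(baseline, values)]
        bands[dept] = list(zip(baseline, upper))
        baseline = upper  # the next department is stacked on top of this one
    return bands


def share_of_total(series_by_dept, t):
    """Each department's proportion of total sales at time step t."""
    total = sum(values[t] for values in series_by_dept.values())
    return {dept: values[t] / total for dept, values in series_by_dept.items()}


# Hypothetical sales per period (illustrative data only).
sales = {
    "Appliances": [20, 30, 50],
    "Media": [30, 30, 31],
    "Other": [50, 52, 54],
}

bands = stack_series(sales)
print("Media band:", bands["Media"])
print("first period shares:", share_of_total(sales, 0))
print("last period shares:", share_of_total(sales, 2))
```

The top edge of the last band is the total, so the original "top line" is still visible while the bands show which departments account for its growth.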

23.7.8 Add axes or coding patterns:

Another way to get more dimensions into a graphic is to add additional patterns for coding information, such as varying the shape or colour of points on a plot based on a variable. An extra axis in space, alongside an existing one or in a new direction (for a 3D chart), can also be useful for showing new variables. It is important to be careful with this approach, as it can add clutter, but when used sparingly and with good design principles it helps increase a graphic’s usefulness.
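Coding a variable by shape or colour amounts to a lookup from each category to one value of a visual channel. The sketch below assumes two categorical variables, one mapped to shape and one to colour; the palettes, field names, and data points are hypothetical.

```python
from itertools import cycle

# Hypothetical palettes; any clearly distinguishable shapes and colours would do.
SHAPES = ["circle", "square", "triangle", "diamond"]
COLOURS = ["blue", "orange", "green", "red"]


def build_encoding(categories, channel_values):
    """Map each distinct category to one value of a visual channel."""
    unique = sorted(set(categories))
    # cycle() reuses channel values if there are more categories than values.
    return dict(zip(unique, cycle(channel_values)))


# Hypothetical sales points: (week, sales, department, sales channel).
points = [
    (1, 120, "Media", "online"),
    (1, 200, "Appliances", "in-store"),
    (2, 130, "Media", "in-store"),
    (2, 260, "Appliances", "online"),
]

shape_for = build_encoding([p[2] for p in points], SHAPES)
colour_for = build_encoding([p[3] for p in points], COLOURS)

encoded = [
    {"x": week, "y": sales, "shape": shape_for[dept], "colour": colour_for[channel]}
    for week, sales, dept, channel in points
]

for mark in encoded:
    print(mark)
```

Keeping the number of distinct shapes and colours small is one way to apply the caution about clutter: once the palette has to be reused, the coding stops being readable.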

23.7.9 Combine metaphors:

So far, we have used a linear metaphor for our visualization. We now want to add a network metaphor to show connections between product categories in purchases. A pair of circular relationship (chord) diagrams showing snapshots at the beginning and end of the time period under consideration can help compare these connections. As in a pie chart, each product category is assigned a section of the circle by percentage of total sales, but the centre of the circle is hollow. If a majority of purchases containing items in one category also included items in a second category, as shown in Fig 15, a line is drawn to that second category; line width is based on the average proportion of both categories in the mixed purchases. The increase in these chord lines from the first to the second diagram suggests there are indeed more purchases that cross departments since our initiatives went into place. This relationship data would be even better if we could see it at any chosen point in time, for example to see what effect, if any, the layout change alone had before the promotion started. As our graphics increase in complexity and sophistication, we need to think more carefully about how to deliver them.
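The rule for drawing a chord can be sketched directly from the description: for each ordered pair of categories, find the purchases containing the first category and check whether more than half of them also contain the second. The sketch below covers only that majority test and leaves out the line-width calculation; the purchase data, the 0.5 threshold parameter, and the function name are assumptions made for illustration.

```python
from itertools import permutations


def chord_links(purchases, threshold=0.5):
    """Directed links (a, b) where most purchases containing a also contain b.

    purchases: list of sets, each holding the categories present in one purchase.
    Returns a dict mapping (a, b) to the fraction of a-purchases that include b.
    """
    categories = sorted(set().union(*purchases))
    links = {}
    for a, b in permutations(categories, 2):
        with_a = [p for p in purchases if a in p]
        both = [p for p in with_a if b in p]
        fraction = len(both) / len(with_a)
        if fraction > threshold:  # "a majority" read as strictly more than half
            links[(a, b)] = fraction
    return links


# Hypothetical purchases (illustrative data only).
purchases = [
    {"Appliances", "Media"},
    {"Appliances", "Media"},
    {"Appliances"},
    {"Media"},
    {"Media", "Toys"},
    {"Toys", "Media"},
]

links = chord_links(purchases)
for (a, b), fraction in links.items():
    print(f"{a} -> {b}: {fraction:.2f}")
```

Running the same function on purchases from the start and from the end of the period gives the two sets of links that the pair of chord diagrams would compare.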

23.8 Consider New Delivery Methods:

New delivery methods should be considered, since the point of a visualization is to be viewed by the right people, in the right context, so that the information is delivered in a more understandable way. (Integrated Project Delivery (IPD), for instance, is a delivery method that has been adopted in a few construction projects.) For example, we could enable scrubbing through time (great for seeing more network metaphors), as well as:

1. Drilling down and zooming out for a bird’s eye view.

2. Seeing new data live as it becomes available or even manipulating future variables to watch different scenarios.

23.9 Five Principles for creating effective data Visualization:

1. Be Open to Discovering New Insights:

Effective data visualizations enable the user to discover unexpected patterns and invite a different perspective of the data. During this discovery process, it was tempting to focus on the goals of the iteration and avoid being distracted by unexpected insights. For example, in the early iterations of looking at the data, we found an issue in the data collection process for one customer that opened up opportunities to improve the type of data being collected. This resulted in further operational savings that wouldn’t have been discovered if the team were only focused on the task.

2. Think Big, but Start Small:

Visualizing years of hospital spend data was no easy feat! Opinions were rampant and time was of the essence. Our internal users needed quick solutions to a very painful process that was costing them time and lost business. We kick-started the project with a collaborative workshop to understand the quality of the data, the business objectives and the user needs of the visualization. Together we sketched out the high level end-to-end user journey, and identified a thin slice of the project by answering the question: What’s the smallest visualization we might build that gets the data into the hands of users and will generate the most learnings for the project? We kept this high level journey alive over the course of the project to remind ourselves what we were creating and to avoid going deep in areas that weren’t necessary for the specific iteration goals.

3. Design for Your User:

User journeys and prototyping helped us define the user goals, which vital pieces of information were needed, and at which point in the process. During the first iteration of the visualizations, we realized how little screen real-estate we had! Information was tightly packed, the screen had too much information and the fonts were too small. We explored treemaps and pie charts as ways to visualize the potential cost savings by product category. Lots of discrete clusters made pie charts ineffective. When we tested the treemaps, users found it difficult to arrive at a clear decision when comparing across product categories. Users also struggled to understand more complex visualizations as they were mostly accustomed to Excel-type visualizations.

4. Prototype to Identify Needs:

Each visualization, as shown in Fig 16, was created and designed by a small cross-functional team that set out to find a solution at the intersection of: Will customers use it? Can we build it? And does it meet business needs?

Fig 16-Prototype to identify Needs.

5. Obtain Feedback Early and Often:

One of the common misconceptions about user testing is that the user will have all the magical answers to exactly what they should see and how the system will work. For example, our first user sketching session resulted in Excel tables with lots of filters and sorting! At the risk of rebuilding Excel, we revisited the user goals, and did a collaborative sketching session to first define the types of data points required. Then we did a power-ideation session to come up with alternative visual ways to arrive at the user goals that were more effective than tables of data.

23.10 Examples Of Data Visualization:

Examples of Data Visualization are shown in Fig 17:

Fig 17.1-Examples of Data visualization.

Sortable tables, matrices:

Fig 17.2- Sortable Tables,Matrices.

23.11 Summary

To summarize, in this module we have examined the various types of data in visualization, the basic concepts of data visualization, its applications, and its principles. Examples and techniques used in data visualization were also discussed.

References

1. Colin Ware, “Information Visualization: Perception for Design”, 2nd Edition, Morgan Kaufmann Publishers, 2004.
2. Robert Spence, “Information Visualization: Design for Interaction”, 2nd Edition, Pearson Education, 2007.
3. Stephen Few, “Information Dashboard Design: The Effective Visual Communication of Data”, 1st Edition, O’Reilly Media, 2006.