Viz For Social Good - Part 2, The Build

This blog is going to centre around a recent visualization of mine and why and how I created it.

The visualisation this blog will cover

Before starting this blog post I reached out to ask the community if there would be much interest in writing this blog.  The response was a resounding ‘yes’ but it was also clear that not only should I provide a detailed guide on ‘how’ I built this visualization, but more importantly ‘why’ I came to my design choices.


To make this more accessible I have decided to split the blog into three separate blog posts and avoid creating one mammoth length of text that every gives up reading halfway through!


This blog is Part 2 - The Build


This is the second in a blog three-parter which will go into more detail of the calculations used and what they do, whilst also explaining other areas of the build such as colour use and interactivity.

Let’s begin the build 

The assumption here is that you have read the 1st part of this three-part blog series.  If you haven't then you can find the 1st blog, here.   


The build

Starting off with the VFSG dataset with the 'reach' data added with the two columns at the end.   

Note: for this viz, I used MSExcel for my data prep, Adobe Illustrator for the design elements and Tableau for the visualisation.


We start with our data densification using our data model (as discussed in Part 1) as this will allow us to create the various stages of the main radial.  

This is what we are going to need for our densification, [T] value.  

Full detail of the Data Model


Note: I did not know all of this when I started.  When I started I was adding additional [T] values as I wanted to plot new stages to the radial and I tweaked a lot of values as I was building and even reordering the values - so that the flow of the lines were continuous and the colour scale I finally settled on worked as I wanted.


Snippet of the Data Model build in MSExcel.


What the different [T] values create, in the final viz.



Hopefully the above images help with the understanding of what the Data Model [T] value is doing and what each value is enabling me to create.  To utilise the Data Model with the [T] values we need to join two tables, the original data set and the Data Model.  To help us do that I added an additional column called "One" to my original data set and in the column was the number '1' in every row.  When bringing the MSExcel sheets into Tableau to join them, both the Original Data set and the Data Model have a column with the number '1' on every row.  We can then use these columns to join the tables together, like this:

So to recap, this data densification produced by joining these two tables in this way is doing this:

For every row in your original data set, these rows are being duplicated the same number of times that you have rows in your data model; with the Data Model [T] column being added (joined) to the Original Data set.  Each row and duplicated row of the Original Data set takes one of the [T] values from the Data Model.  

This effectively is giving a duplicate of the Original Data set for every [T] value in your Data Model.  With this, as explained in Part 1, identifying these [T] values will allow you to build varying visual elements all within the same worksheet, as you can isolate those [T] values and apply different calculations to them.

So now we have densified the original data, lets identify our [T] values, by writing a calculated field called 'Separator':


The next calculation we will write is to help us draw the curves for the circular sankey and we will adapt for other curved lines from the volunteers to the individual visualisations ([T] 6 to 7). 


The adaption for the curved lines between the volunteers and their visualisations was required because the [T] values between 6 and 7 were too large for the LOG function to use.  So in the formula above I simply too away 6 from the [T} values to bring the LOG calculation for the curve to run between 0 and 1, in the same way that the LOG calculation for the circular sankey is taking away 1 from the [T] values between 1 and 2, to again run the calculation between 0 and 1.

The next part of what we need to produce the circular sankey is a 'Ranking' of the inner circles (Project / Charity) and the outer circles (Volunteers).  It is called a 'Ranking' because in Luke Stanke's original circular sankey this element was adaptive through interaction and selection of the data via a parameter control.  Here though there is no intention to create something where the data will change and the number of circles inner or outer will change.

So I found the easiest option was to push back this element of the build back into the original data set, rather than creating a bespoke calculation in Tableau.

The 'Ranking' is essentially an ordering of placing the circles.  I decided to place the Projects / Charities in order they were featured for VFSG.  The Volunteers were ordered by Surname and the Visualisations were ordered by volunteer and then project / charity.  This was done through a couple of straightforward lookups in MSExcel.

Example data set addition:


Just as with most radials the aim is to produce calculations that allow us to use trigonometry to plot each point on an x-y grid to produce the required visual.

The first direct step in creating that trigonometry calculation is to find the 'distance' from the centre of the radial for each point we want to plot - otherwise known as the Radius (but we will continue to refer to it as distance).


What you will notice here is there are a number of different sub-calculations going on within this.  Firstly there are straight distance values for the singular points (circles) we want to plot on some [T] values.  Next there are the sankey curves that use our [t (logodds)] calculation.  Lastly there are two calculations for the two 'reach' elements that use a parameter as part of the calculation.

I'll run through each.

Circle plots ([T] = 0, 3, 4, 5, 8, 9 and 10)

As mentioned these are plotted to a value that varies with the [T] number.  So as our [T] value increases so does the distance from the radial centre.  There was a lot of trial and error, and adjustment at play here.  Most of it was just visually how much room I wanted to provide each element in the radial.  For example, I didn't want the circular sankey to dominate the other elements, so I reduced the distance from Project / Charity to Volunteer circels.  Equally, I wanted to give the inner Project / Charity circles enough space between them to potentially label them at some stage, so I made sure the starting value for [T] = 0 was large enough.

Circular Sankey plots ([T] >=1 & <=2 and >=6 and <=7)

The circular sankey lines are actually plots of 100 points for each line.  This comes from the Data Model, where the [T] values between 1 - 2 and 6 - 7 are separated by 100 increments of 0.01.  

The calculation for Distance in the above image is broken into three parts.

Starting Distance - Curve calculation * adjustment multiplier

The starting distance was deliberately more than the circle distance the lines connect to; this was tweaked after looking at the end result.  The distance gap was to help the line curves come together more before meeting the circle. 

Reach elements ([T] = 11 and =12)

The 'reach' elements are the views and the followers data I added to the original data set.  These calculations are slightly different than before - The views are going to be plotted as a radial bar chart and therefore each bar will finish at a different radius from the centre. The followers, similar, but will be plotted as circles to begin with.

Again these calculations are broken into three parts.

Starting Distance + Normalised Value + Bar Multiplier

The starting distance is as described above in the Circular Sankey plots.  The Bar Multiplier, which is a parameter that I could control when in setting up the final view, allowed me to adjust how much real estate on the chart I was giving to the Radial Bars and Follower plots verses the rest of the radial as a whole.

The Normalised values were another set of calculations I pushed back to the original data set.  I knew from looking at the data that I couldn't just add views ranging between 10 and 35,000 to my radius, it just wouldn't work, my bars would be too long and the rest of the radial inside the bar chart wouldn't be seen.  So I decided to try and normalise the values for both Views and Followers so they ranged on a scale between 0 and 1.  Then those numbers could be added to the distance calculation without dominating the radial.  
I normalised the numbers in Excel by:


However, I quickly found when plotting the Radial Bar, that the range in values in the Views, meant the bar chart was massively impacted by a handful of visualisations that either were made Tableau 'Viz of the Day' or had even more impressive numbers.  

My radial bars when plotted looked like this:


I wasn't happy with that result.  

Whilst it is not technically wrong and it shows the 'reach' that some visualisations had were far more than others, I didn't feel it went far enough in 'celebrating' EVERYONE's work.  I wanted the final viz, that all the volunteers could look through and SEE their own work and efforts shown in the radial.  So I quickly realised to get a better 'form' to the visualisation to provide what I was looking for I needed to make the Radial bars work in a LOG scale.  Sadly though the Radial Bar chart is not an out-the-box chart type with an axis I can quickly alter through a tick box.  So here I went again, back to Excel to adjust the Views and Followers normalised values to work on a Log Scale using the LOG function in Excel to adjust the values.  This gave me a better range of bars, that allowed for the majority of the visualisations to be seen and interacted with.

The next calculation we needed to build the radial was Path.

As Luke Stanke says in his circular sankey tutorial, if Distance is our Radial, then Path is now correctly termed as Angle (remember we will be working with Trigonometry, SIN & COS).

This is the calculation used for Path:

This is the most complicated calculation used in the build of this radial and I will attempt to explain what is happening the best I can.

First thing to say is that [Rank Delta] and [Rank Delta 2] are both formulae that we haven't covered yet, but will do.  Suffice to say, they determine if the sankey lines need to go clockwise or anti-clockwise round the radial, dependent on which route is shorter.

Again because of the nature of this radial build we have a number of different calculations going on depending on the [T] value / radial element we are drawing.  But this time there are just two types of calculation being used.

I'll run through each.

Circle plots AND Reach elements ([T] = 0, 3, 4, 5, 8, 9 and 10) AND ([T] = 11 and =12)

These are combined here in the explanation because they are using the same logic.  We want to find the angle required for each of these and because all of these are plotting singularly a Project / Charity, a Volunteer or a Visualisation, all of these are plotted with the same method:

Order Number ('Ranking') of the element  /  total number of that element.

So for example, Projects / Charities ([T]=0):

[Rank Charity] / 31
because there are 31 Projects / Charities in the data set.

You will notice in the above image of the full calculation that I have highlighted the [Rank Select].  This is a parameter I used to adjust the total number of volunteers, because I found when visualising the data with the end radial, that some volunteers were duplicated because their names were misspelt or had a change or home town.  So it was easier to correct the data and then update the parameter than going back through any custom calculations.

Circular Sankey plots ([T] >=1 & <=2 and >=6 and <=7)

The circular sankey plots are where we make use of the 'Rank Delta' formulae - hence why there is a nested IF statement used.

This formula was taken from Luke's tutorial and I was able adjust it to work for my own needs.  Essentially as I understand it, the angle is calculated through a combination of the start and end positions and the angle is adjusted as the [T] value increases.

To quickly cover off the two Rank Delta calculations that determine if the sankey lines flow clockwise or anti-clockwise.


Trigonometry!

We are nearly there now to build the radial just a couple of calculations to finalise - our trigonometry calculations that allow us to plot on an X-Y grid.

Firstly, our X plot.  

The IF statement is removing unwanted [T] values (the Followers data and the volunteers with multiple vizzes), because those values we want to plot with lines AROUND the radial and not from in to out.

The [x] calculation is basic trigonometry with a +1 adjustment to the distance calculation (to make more space in the centre of the radial for labelling), and also a [Angle Adj] parameter to help me ensure the radial starts (Project / Charity) from the top (12 o'clock) position.

Our Y plot.
Same rules apply, except for excluding the [T] values as before.  This is because we want [y] to work for all [T] values.  

Next, we identify a second [x] plot called [x (2nd Axis)], this calculation covers all the [T] values previous excluded in the first [x] calculation.




We are now ready to build the radial in Tableau!

Without running a step-by-step "put pill A to card B, then drag....".  There are no Table Calculations to worry about so these two screenshots of the set up should provide everything required to build the visualisation.

Axis 1, this is using the Line marks to draw from the inner radial to the outer radial bars, using [T] for the path.

Axis 2, this is again using the Line mark to draw around the radial linking each volunteer.  Which produces our lines for the level of followers and also connecting the multiple vizzes for a single volunteer at the base of the Radial bar chart.



I am going to leave the build here and will be writing a THIRD blog in the next few days which will cover the DESIGN of the radial - what I did to finesse it, colours, sizes, imagery etc.

Thank you for reading and I hope this proves inspirational for others.

Lastly a BIG thank you to Luke Stanke, who's initial circular sankey work I based so much of my method for this off of.  Thanks Luke!

All the best and I hope you read the next blog post!

Sam 




Comments

  1. Thanks for sharing the best information and suggestions, it is very nice and very useful to us. I appreciate the work that you have shared in this post. Keep sharing these types of articles here.How To Use Search Function In Excel

    ReplyDelete

Post a Comment