As an established tech job board going back many years, Dice has a treasure trove of data from which it is always looking to draw insights. In 2016, it produced a visualisation showing how two measures, supply and demand, were correlated for skills and job titles. A year later, an update was due and I became involved. Having greater experience with D3, I was able to move beyond the standard form provided by the C3 framework and explore bespoke possibilities.
This is a slightly more in-depth version of the Dice Insights blog post, which you should definitely check out. It describes some of the behind-the-scenes processes and the design decisions that went into this iteration.
When I approached the original visualisation, I was looking for the specific insights it afforded and how these were conveyed. Restricting myself to the skills, each one had two key variables, supply and demand. These were defined as the relative proportion of candidate profiles and job postings, respectively, normalised. The ratio between supply and demand was naturally captured by the term 'heat'. This is clearly more useful than knowing about either of these indicators separately.
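To make the definition concrete, here is a minimal sketch of how a heat value could be derived from raw counts. The record shape, the field names, the sample figures and the direction of the ratio (demand over supply, so that under-supplied skills come out 'hot') are all assumptions for illustration, not Dice's actual data model.

```typescript
// Hypothetical input record: raw counts for one skill.
interface SkillCounts {
  name: string;
  profiles: number; // candidate profiles listing the skill
  postings: number; // job postings asking for the skill
}

interface SkillHeat {
  name: string;
  supply: number; // share of all candidate profiles
  demand: number; // share of all job postings
  heat: number;   // demand / supply (direction is an assumption)
}

function computeHeat(
  skills: SkillCounts[],
  totalProfiles: number,
  totalPostings: number,
): SkillHeat[] {
  return skills
    // Skip skills with no profiles or no postings: the ratio is undefined or zero.
    .filter((s) => s.profiles > 0 && s.postings > 0)
    .map((s) => {
      const supply = s.profiles / totalProfiles;
      const demand = s.postings / totalPostings;
      return { name: s.name, supply, demand, heat: demand / supply };
    });
}

// Made-up figures: skill-a is over-supplied, skill-b is under-supplied.
const heatSample = computeHeat(
  [
    { name: "skill-a", profiles: 200, postings: 100 },
    { name: "skill-b", profiles: 50, postings: 100 },
  ],
  1000,
  1000,
);
```

With these made-up figures, skill-a has twice the supply of its demand and skill-b the reverse, which is exactly the kind of contrast neither variable shows on its own.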
A scatterplot is the standard form for a data scientist to inspect the relationship between these two variables. The original visualisation used a regression line to emphasise the insight that market forces were working and job seekers were listing skills that employers were looking for. Unfortunately, this plot made it hard to present more than a few hundred points before they started severely obscuring each other.
For people in the careers marketplace, whether as recruiters, candidates or hirers, there were a host of additional questions that the visualisation could help answer: Which skills are the most underserved? As a candidate, which of my skills are most 'valuable'? Which should I learn? What are the outliers where demand is chronically underserved or there is a glut of supply?
The original visualisation displayed many numbers, both percentages and ratios, in the popup and on the logarithmic axis. I reproduced these initially until I realised that they only carried insight within the context of this visualisation, relative to other skills. It's impossible to visually read the exact position of a data point on an axis. It was also hard to compare skills which might have a similar heat but were far apart on the two-dimensional plot. Concluding that the display of precise numeric values was transmitting lots of data with negligible insight, I removed them. The chart immediately felt unburdened.
By focusing on the comparison in terms of heat and exposing all points, we arrived at an alternative plot: the beeswarm plot.
The beeswarm gave us a much more economical use of the space, allowing us to easily double the number of skills shown. Clustering them also enabled easier comparison whilst distributing them along the axis of primary interest - heat. D3 provided the force-based layout behind this. It also made it straightforward to transition to and from the scatterplot.
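To show the idea behind a beeswarm without pulling in any library, here is a minimal sketch of a greedy placement. It is not the D3 force layout used for the actual chart: instead of simulating forces, it fixes each point's x (its position on the heat axis, in pixels) and gives it the free y closest to the centre line. The function name and the fixed node radius are assumptions for illustration.

```typescript
interface SwarmPoint {
  x: number; // position along the heat axis, in pixels
  y: number; // offset from the centre line, chosen by the layout
}

// Greedy beeswarm: each circle keeps its x and takes the smallest |y|
// at which it does not overlap any circle placed before it.
function beeswarmLayout(xs: number[], radius: number): SwarmPoint[] {
  const d2 = 4 * radius * radius; // squared minimum distance between centres
  const minD2 = d2 * (1 - 1e-9);  // small tolerance for floating-point rounding
  const points: SwarmPoint[] = xs.map((x) => ({ x, y: 0 }));
  const placed: SwarmPoint[] = [];
  const order = points.slice().sort((a, b) => a.x - b.x);

  for (const p of order) {
    // Only circles within one diameter in x can possibly collide with p.
    const near = placed.filter((q) => (q.x - p.x) * (q.x - p.x) < d2);

    // Candidate offsets: the centre line, plus every position where p
    // would just touch one of its neighbours from above or below.
    const candidates: number[] = [0];
    for (const q of near) {
      const dy = Math.sqrt(d2 - (q.x - p.x) * (q.x - p.x));
      candidates.push(q.y + dy, q.y - dy);
    }
    candidates.sort((a, b) => Math.abs(a) - Math.abs(b));

    for (const y of candidates) {
      const free = near.every(
        (q) => (q.x - p.x) * (q.x - p.x) + (q.y - y) * (q.y - y) >= minD2,
      );
      if (free) {
        p.y = y;
        break;
      }
    }
    placed.push(p);
  }
  return points; // same order as the input xs
}

// Three skills with identical heat, one close by, one isolated.
const swarmExample = beeswarmLayout([0, 0, 0, 1, 10], 1);
```

A force-based layout reaches a similar arrangement iteratively, which is what makes animated transitions between arrangements natural; the greedy version is deterministic but has no notion of movement.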
The space gained supported another enhancement to this visualisation: categories. We already had a categorisation of skills from earlier work which was easily adapted for this purpose. After experimenting with colours and multiple simultaneous groups, we settled on pulling out one category at a time.
By request, I also added programming languages. This was more of a grey area than you might expect at first: SQL? Shell? HTML? jQuery? Ruby on Rails? As a rule, I excluded frameworks and probably anything that wasn't Turing complete. For support, I used Wikipedia's list of programming languages.
A clear enhancement for this iteration was rapid labelling, with support for multiple persistent labels for easier comparison. However, this gave rise to the challenge of overlapping placement, particularly with the dense layout of the beeswarm. Force layout came to the rescue again, anchoring labels to their skills and letting them find free space even as the skills transition between different arrangements.
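The anchoring idea can be sketched as a small relaxation loop: each label is pulled back towards its skill like a spring, and any two labels that come too close are pushed apart. This is a hand-rolled stand-in for a force layout, not the D3 code behind the chart; treating labels as circles of a fixed minimum distance, and the `pull` and `ticks` defaults, are simplifying assumptions.

```typescript
interface Label {
  anchorX: number; // centre of the skill node the label belongs to
  anchorY: number;
  x: number;       // current label position, updated in place
  y: number;
}

function relaxLabels(labels: Label[], minDist: number, pull = 0.1, ticks = 50): void {
  for (let t = 0; t < ticks; t++) {
    // Spring step: move each label a fraction of the way back to its anchor.
    for (const l of labels) {
      l.x += (l.anchorX - l.x) * pull;
      l.y += (l.anchorY - l.y) * pull;
    }
    // Collision step: separate every pair that is closer than minDist,
    // moving both labels by half of the overlap.
    labels.forEach((a, i) =>
      labels.slice(i + 1).forEach((b) => {
        const dx = b.x - a.x;
        const dy = b.y - a.y;
        const dist = Math.sqrt(dx * dx + dy * dy);
        if (dist >= minDist) return;
        // Coincident labels have no direction; push them apart along x.
        const ux = dist > 0 ? dx / dist : 1;
        const uy = dist > 0 ? dy / dist : 0;
        const shift = (minDist - dist) / 2;
        a.x -= ux * shift;
        a.y -= uy * shift;
        b.x += ux * shift;
        b.y += uy * shift;
      }),
    );
  }
}

// Two labels whose skills sit almost on top of each other.
const firstLabel: Label = { anchorX: 0, anchorY: 0, x: 0, y: 0 };
const secondLabel: Label = { anchorX: 1, anchorY: 0, x: 1, y: 0 };
relaxLabels([firstLabel, secondLabel], 10);
```

Because the anchors are read on every tick, updating `anchorX` and `anchorY` while the skills move lets the labels follow them, which is the property that matters when the chart changes arrangement.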
Aligning skills along a heat axis highlighted the most important feature of the skills but it did drop the source information about the demand and supply values. Very common skills were indistinguishable from rare ones if their heat was the same. I tried restoring this by splitting the circular nodes into semi-circles with the top and bottom half representing demand and supply respectively. This felt like an intuitive and visually appealing way to include these factors without disturbing the base chart or demanding much more space.
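As an illustration of the split nodes, here is a minimal sketch that builds SVG path strings for the two halves of a node. The function names are hypothetical, and sizing each half so that its area is proportional to the value (radius growing with the square root) is an assumption; the sizing rule could equally be linear or capped.

```typescript
type Half = "top" | "bottom";

// SVG path for one half of a circle centred on (cx, cy).
// Screen y grows downwards, so an arc from the left point to the right
// point with sweep-flag 1 (clockwise) passes over the top of the circle.
function halfCirclePath(cx: number, cy: number, r: number, half: Half): string {
  const sweep = half === "top" ? 1 : 0;
  return `M${cx - r},${cy} A${r},${r} 0 0 ${sweep} ${cx + r},${cy} Z`;
}

// Assumption: area-proportional sizing, so radius scales with sqrt(value).
function valueToRadius(value: number, maxValue: number, maxRadius: number): number {
  return Math.sqrt(value / maxValue) * maxRadius;
}

// Top half encodes demand, bottom half encodes supply.
function splitNodePaths(
  cx: number,
  cy: number,
  demand: number,
  supply: number,
  maxValue: number,
  maxRadius: number,
): { demand: string; supply: string } {
  return {
    demand: halfCirclePath(cx, cy, valueToRadius(demand, maxValue, maxRadius), "top"),
    supply: halfCirclePath(cx, cy, valueToRadius(supply, maxValue, maxRadius), "bottom"),
  };
}

// A skill with four times as much demand as supply (made-up values).
const nodePaths = splitNodePaths(0, 0, 1, 0.25, 1, 10);
```

Each string can be used as the `d` attribute of an SVG `path` element, so a skill with strong demand and thin supply shows a large top half over a small bottom half.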
We had lots of great feedback in places like Fast Company and already have the next iteration in testing.