I recently started learning D3 in an attempt to add another arrow to my data visualization quiver. D3 is a data visualization library built on top of JavaScript, and is generally considered the most detail-oriented, top of the line, data visualization tool in the business. You control everything in your graphics.
I’m currently 100 pages into the wonderful book Interactive Data Visualization for the Web by Scott Murray, and I figured this is a great time to take the training wheels off and build my first D3 plot outside of a canned tutorial. It’s also an opportunity to test out the new R package r2d3, which lets you build D3 plots within an R Markdown file.
A great use case for r2d3 is a situation where you want to conduct your data cleaning and analysis in R and your visualizations within D3. With r2d3 you can do it all within one R Markdown file. This creates a clean, short, and reproducible data pipeline. You don’t have to export an R data frame as a csv or JSON file and then import that file into an additional D3 file. You’ve cut out an additional data file and script file.
Let’s run through a basic example. Let’s say you want to plot the median household income of every North Carolina county. You want to use R to import and clean the data, and create the visualization with D3. Now this is a simple use case and ggplot within R could easily handle such a basic plot. But, I’m only one week into D3, so hang with me.
First, we can use the tidycensus package in R to call the US Census API and import the median incomes of every county in North Carolina. We’ll also clean up the data once it’s imported, including ordering the data from highest to lowest median incomes.
library(tidyverse)
library(r2d3)
library(tidycensus)
library(knitr)
# import household median income for all counties in NC through the US Census API
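# note: the 1-year ACS ("acs1") only publishes estimates for areas with
# populations of 65,000 or more, so smaller counties will not be returned
median_income <- get_acs(geography = "county",
                         variables = "S1903_C03_001",
                         year = 2017,
                         state = "NC",
                         survey = "acs1")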
# clean the data
median_income <- median_income %>%
  # only keep needed columns
  select(NAME, estimate) %>%
  # remove phrase " County, North Carolina" from county name
  mutate(NAME = str_replace(NAME, " County, North Carolina", "")) %>%
  # order rows from highest to lowest median income
  arrange(desc(estimate)) %>%
  # rename columns to more descriptive names
  rename(county = NAME, income = estimate)
With R’s magic we’ve easily imported and cleaned the data. We now have an R data frame that holds the median incomes of North Carolina counties. Here are the first six rows:
kable(head(median_income))
| county | income |
|---|---|
| Union | 77691 |
| Wake | 77318 |
| Orange | 69940 |
| Mecklenburg | 65588 |
| Moore | 64184 |
| Cabarrus | 61490 |
Now, we want a simple bar chart of these median incomes. Without the r2d3 package our hassles would start. We would have to export our R data frame as a csv or JSON file, create a separate D3 file and script, and import the data into the D3 script. Not a big deal, but annoying. It’s more files to manage and more path dependencies to preserve.
But with r2d3 we can simply pass our R data frame to an R Markdown d3 code block. In our running example, here is what the code chunk header looks like:
```{d3 data=median_income, options=list(color = 'teal'), fig.height = 10}
data=median_income makes everything possible. We pass an R data frame to the data argument and r2d3 converts the R data frame to JSON. It’s this JSON data that D3 works with. Then, within the code block we create a stunning plot using D3 and JavaScript (or in the case of our beginner bar chart, not so stunning).
// create comma format for median income numbers, which we will
// use when displaying the numbers on the plot
var format = d3.format(",");
// create padding variable
var padding = 20;
// create scaling variables
var xScale = d3.scaleLinear()
  .domain([0, d3.max(data, function(d) { return d.income; })])
  .range([padding, width * .8]); // make width 80% of frame
var yScale = d3.scaleBand()
  .domain(d3.range(data.length))
  .rangeRound([0, height])
  .paddingInner(0.05);
// make x axis (defined here, but not yet drawn on the chart)
var xAxis = d3.axisBottom()
  .scale(xScale);
// create variable that signifies how much to offset the y text values
// it will be used for county names and text labels of median income values
var yTextOffset = yScale.bandwidth() * .7;
// create bar chart with data of median income
svg.selectAll("rect")
  .data(data)
  .enter()
  .append('rect')
  .attr('width', function(d) { return xScale(d.income); })
  .attr('height', yScale.bandwidth())
  .attr('y', function(d, i) { return yScale(i); })
  .attr('fill', options.color);
// create variable to store both text elements
// (county name and text value of median income)
var texts = svg.selectAll("text")
  .data(data)
  .enter();
// create text of county name
texts.append('text')
  .text(function(d) { return d.county; })
  .attr('y', function(d, i) { return yScale(i) + yTextOffset; })
  .attr('x', function(d) { return 3; })
  .attr('font-size', '13px')
  .attr('font-family', 'sans-serif')
  .attr('fill', 'white');
// create text of median income value
texts.append('text')
  .text(function(d) { return "$" + format(d.income); })
  .attr('y', function(d, i) { return yScale(i) + yTextOffset; })
  .attr('x', function(d) { return xScale(d.income) - 50; })
  .attr('font-size', '13px')
  .attr('font-family', 'sans-serif')
  .attr('fill', 'white')
  .attr('font-weight', 'bold');
Tada, my first non-tutorial D3 graphic! All within the comfort of RStudio and an R Markdown file.
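For anyone wondering what the `data` variable looks like once it reaches the D3 code, here is a small plain-JavaScript sketch. My understanding is that r2d3 sends a data frame column by column and reshapes it into an array of row objects on the JavaScript side, but treat that as an assumption rather than a description of r2d3's internals. The `columnsToRows` helper is hypothetical and only illustrates the reshaping; the values are three rows from the table above.

```javascript
// Hypothetical column-oriented payload, using three rows from the table above.
const columns = {
  county: ["Union", "Wake", "Orange"],
  income: [77691, 77318, 69940],
};

// Hypothetical helper: turn { col: [values...] } into [{ col: value, ... }, ...].
function columnsToRows(cols) {
  const names = Object.keys(cols);
  const length = names.length === 0 ? 0 : cols[names[0]].length;
  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const name of names) {
      row[name] = cols[name][i];
    }
    rows.push(row);
  }
  return rows;
}

// Each element now has the d.county and d.income fields the chart code reads.
const data = columnsToRows(columns);
console.log(data[0].county, data[0].income);
```

This row-per-object shape is why the accessor functions in the chart code can simply write `d.county` and `d.income`.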
Now let’s talk D3. That simple bar chart took me 50 lines of code! No interactivity, not even a title (next chapter in the book I’m working through). With D3, not only do you get to specify everything, but you have to specify everything.
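To make "specify everything" a bit more concrete, here is roughly what the two scales in the chart code compute, written in plain JavaScript with no D3 at all. This is a simplified sketch, not D3's implementation: `linearScale` and `bandScale` are hypothetical stand-ins, the band version skips the pixel rounding that `rangeRound()` does, and the `width` and `height` values are made up for the example.

```javascript
// Hand-rolled stand-in for d3.scaleLinear().domain([d0, d1]).range([r0, r1]).
// Illustration only; the real D3 scale does much more (ticks, clamping, inversion).
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    const t = (value - d0) / (d1 - d0); // position within the domain, 0..1
    return r0 + t * (r1 - r0);          // same relative position within the range
  };
}

// Simplified stand-in for d3.scaleBand() with inner padding and no outer padding.
// Unlike rangeRound() in the chart code, nothing is rounded to whole pixels here.
function bandScale(count, range, paddingInner) {
  const [r0, r1] = range;
  const step = (r1 - r0) / (count - paddingInner); // distance between bar tops
  const bandwidth = step * (1 - paddingInner);     // thickness of each bar
  return { position: (i) => r0 + i * step, bandwidth: bandwidth };
}

// Three incomes from the table above; width and height are made-up frame sizes.
const incomes = [77691, 77318, 69940];
const width = 500;
const height = 300;
const padding = 20;

const xScale = linearScale([0, Math.max(...incomes)], [padding, width * 0.8]);
const yScale = bandScale(incomes.length, [0, height], 0.05);

// One object per bar, holding the attributes the chart code sets on each rect.
const bars = incomes.map((income, i) => ({
  y: yScale.position(i),
  height: yScale.bandwidth,
  width: xScale(income),
}));
console.log(bars);
```

The largest income maps to 80% of the frame width and the last bar ends exactly at the bottom of the frame. D3 hands you that arithmetic as building blocks, but you still have to wire every piece to an attribute yourself.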
In reality, a simple bar chart like this doesn’t need D3’s power. For example, here’s a similar bar chart in ggplot.
ggplot(median_income, aes(fct_reorder(county, income), income)) +
  geom_col(fill = 'lightseagreen') +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  # labs() follows the aesthetics, not the flipped display: x is county, y is income
  labs(title = 'North Carolina median household income by county',
       x = 'County',
       y = 'Median Income') +
  theme_minimal()
Eight lines of code. And it probably looks better than my D3 version. In fairness to D3, however, I’ve been using ggplot for over two years and I just hit my one-week anniversary with D3.
D3 lets you customize everything, and your imagination is the only limiting factor. On the surface this is great. But you’d better have a good imagination. There are few defaults and few presets; you define everything. The color palettes, fonts, spacing, and sizing are all on you. So you’d better have the creativity and patience to define your graphics.
For simple graphics like the bar chart above, D3 isn’t worth it. ggplot will still be my go-to tool. But I can envision D3’s power when I need an interactive web visualization that pops and isn’t just another bar chart or line chart. And hopefully my next D3 post will unveil a graphic with a bit more sizzle and flair.
Header photo courtesy of Dardan