Visualizing Bar Passage with Python’s Altair

I stumbled upon the Python visualization library Altair about six months ago. I was instantly hooked, as it blows away the reigning king of Python visualizations, matplotlib. But, I hate matplotlib. Between matplotlib and a half-missing box of crayons, I’ll take the crayons.

That’s why Altair’s clean graphics and logical syntax easily captured me. But, I’ve really only dabbled in Altair, and have not explored its interactivity at all. I’ll dig a little deeper into Altair’s interactivity in this post, and see if I can uncover some bar passage insights along the way.

State and national bar passage rates

The first plot shows individual state bar passage rates, with the US rate superimposed in black. Lines for states highlight when you hover over them and a tooltip window shows the highlighted state and its passage rate.

There is one problem with the chart as-is. The tooltip only reveals its fruits when you hover over the intersection of the line and year; you cannot just hover over any section of a state’s line. You’ve got to go to the specific section of the state’s line where it crosses a year. The issue is addressed in this Stack Overflow question, but I could not successfully implement the solution.

import pandas as pd
import altair as alt
import numpy as np

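# `state` and `master_nation` are assumed to be pandas DataFrames loaded beforehand (not shown here)
# isolate unique years, to set as tick mark labels on x-axis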
unique_years = list(state['calendaryear'].unique())

# select line when mouse hovers over it
selector = alt.selection_single(on='mouseover', nearest=True, empty='none')

# create lines for state rates
state_lines = alt.Chart(state).mark_line().encode(
    x=alt.X('calendaryear:Q', title='Year', 
            axis=alt.Axis(values=unique_years,
                         format="0")),
    y=alt.Y('state_pass_pct', title='Bar Pass Rate (%)',
            scale=alt.Scale(zero=False),
            axis=alt.Axis(format='%')),
    color=alt.condition(selector, 'state:N', alt.value('lightgray'), legend=None),
    tooltip=[alt.Tooltip('state:N', title='State'),
             alt.Tooltip('calendaryear:Q', title='Year'),
             alt.Tooltip('state_pass_pct:Q', title='Bar Passage (%)')]
).add_selection(
        selector
)

# create single line for US rate
us_lines = alt.Chart(master_nation).mark_line().encode(
    x='calendaryear:Q',
    y='avg_pass:Q',
    color=alt.value('black')
)

alt.layer(state_lines, us_lines
).properties(
    width=600,
    height=400
)

Does the drop in bar passage relate to drops in applications, GPAs and/or LSAT scores?

The line charts - both at the state level and nation-wide - show a general decrease in bar passage. Moreover, the decrease picks up speed in 2013. Let’s see if this relates to a decrease in average LSAT and undergraduate GPA. The plot below shows the average LSAT and undergraduate GPA for all entering students, along with the nation-wide bar passage rate and the number of people applying to law school. Since we’re focusing on comparing trends, and not examining absolute values, the data was standardized to have a mean of zero and standard deviation of one. This places bar passage, LSAT, GPA, and number of applicants on the same scale, allowing for easy visual comparisons of yearly trends.
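For anyone who wants to reproduce the scaling step, here is a minimal sketch of one way to do it with pandas. It is not necessarily how the data behind the chart was prepared. The column names `avg_lsat` and `applicants`, the DataFrame names, and all of the numbers are made up for illustration; only `calendaryear`, `avg_pass`, `variable`, and `value` appear in the plotting code.

```python
import pandas as pd

# Hypothetical nation-wide yearly data; every number here is invented for illustration.
nation = pd.DataFrame({
    'calendaryear': [2013, 2014, 2015, 2016, 2017],
    'avg_pass': [0.78, 0.76, 0.72, 0.70, 0.71],
    'avg_lsat': [155.2, 154.8, 154.1, 153.9, 154.0],
    'applicants': [59400, 55700, 54500, 56500, 56900],
})

value_cols = ['avg_pass', 'avg_lsat', 'applicants']

# z-score each column: subtract its mean, then divide by its (sample) standard deviation
scaled = nation.copy()
scaled[value_cols] = (nation[value_cols] - nation[value_cols].mean()) / nation[value_cols].std()

# reshape to long format: one row per year and variable, ready for color='variable:N'
nation_scaled_long = scaled.melt(
    id_vars='calendaryear',
    value_vars=value_cols,
    var_name='variable',
    value_name='value',
)
```

After this step every series has a mean of zero and a standard deviation of one, so the lines share a single y-axis even though the raw units differ wildly.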

# create line chart with horizontal rule selector --------------
# taken from https://altair-viz.github.io/gallery/multiline_tooltip.html

# Create a selection that chooses the nearest point & selects based on x-value
nearest = alt.selection(type='single', nearest=True, on='mouseover',
                        fields=['calendaryear'], empty='none')

# create line chart
line = alt.Chart(master_nation_scale).mark_line(point=True).encode(
    x=alt.X('calendaryear:O', title='Year'), 
    y=alt.Y('value', title='Scaled Value'),
    color='variable:N'
).properties(
    width=400,
    height=300
)

# Transparent selectors across the chart. This is what tells us
# the x-value of the cursor
selectors = alt.Chart(master_nation_scale).mark_point().encode(
    x='calendaryear:O',
    opacity=alt.value(0),
).add_selection(
    nearest
)

# Draw points on the line, and highlight based on selection
points = line.mark_point().encode(
    opacity=alt.condition(nearest, alt.value(1), alt.value(0))
)

# Draw a rule at the location of the selection
rules = alt.Chart(master_nation_scale).mark_rule(color='gray').encode(
    x='calendaryear:O',
).transform_filter(
    nearest
)

# Put the four layers into a chart and bind the data
alt.layer(
    line, selectors, points, rules
).properties(
    width=400, height=300
)

It’s clear from the chart that bar passage, GPA, LSAT, and the number of applicants rise and fall together. Correlation isn’t causation, of course, but it’s also not nothing.

The relationship between bar passage and school-level undergraduate GPA and LSAT scores

Now let’s create scatter plots of bar passage against a school’s median undergraduate GPA and against its median LSAT score. The scatter plots below incorporate brushing interactivity. You can click and drag your mouse over points in one of the plots and those points will become highlighted in the other plot.

brush = alt.selection(type='interval', resolve='global')

base = alt.Chart(school_pass).mark_circle(size=60).encode(
        y=alt.Y('avgschoolpasspct', title='School Bar Passage Rate (%)',
               scale=alt.Scale(zero=False),
               axis=alt.Axis(format='%')),
        color=alt.condition(brush, alt.value('steelblue'), alt.value('lightgray')),
        tooltip=[alt.Tooltip('schoolname:N', title='School'),
                 alt.Tooltip('calendaryear:Q', title='Year'),
                 alt.Tooltip('adj_uggpa50:Q', title='Median GPA'),
                 alt.Tooltip('adj_lsat50:Q', title='Median LSAT'),
                 alt.Tooltip('avgschoolpasspct:Q', title='Bar Passage (%)')]
).add_selection(
    brush
).properties(
    width=400,
    height=400
)

alt.hconcat(base.encode(x=alt.X('adj_uggpa50', title='Median GPA' , scale=alt.Scale(zero=False))), 
            base.encode(x=alt.X('adj_lsat50', title='Median LSAT', scale=alt.Scale(zero=False))))

Did different types of schools see different changes in their LSAT / GPA?

For the final set of graphs, we will see if different types of schools saw different changes in bar passage rates. For each year, let’s bin all law schools into one of seven buckets, with the buckets created as follows:

Standardize all median GPA and median LSAT scores for a given year to have a mean of zero and standard deviation of one;
Sum these two scores for each school;
Rank all schools based on this summed score; and
Assign all schools to one of seven bins, from lowest to highest based on the ranking of the summed score, with an equal number of schools in each bin.
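The four steps above can be sketched in pandas roughly as follows. This is an illustration under assumptions, not the exact code used for this post: the toy data is invented, and the `combo_score`, `combo_rank`, and `bin_num` columns and the `assign_bins` helper are hypothetical names. The `adj_uggpa50`, `adj_lsat50`, `schoolname`, and `calendaryear` columns match the scatter plot code.

```python
import pandas as pd

# Toy data: 14 hypothetical schools in each of two years (values invented).
schools = pd.DataFrame([
    {'schoolname': f'School {i}', 'calendaryear': year,
     'adj_uggpa50': 3.0 + 0.05 * i, 'adj_lsat50': 150 + i}
    for year in (2013, 2014) for i in range(14)
])

def zscore(s):
    """Standardize a Series to mean zero and standard deviation one."""
    return (s - s.mean()) / s.std()

def assign_bins(df, n_bins=7):
    out = df.copy()
    by_year = out.groupby('calendaryear')
    # steps 1 and 2: standardize GPA and LSAT within each year, then sum them
    out['combo_score'] = (by_year['adj_uggpa50'].transform(zscore)
                          + by_year['adj_lsat50'].transform(zscore))
    # step 3: rank schools within each year; method='first' breaks ties
    out['combo_rank'] = out.groupby('calendaryear')['combo_score'].rank(method='first')
    # step 4: cut the ranks into equal-sized bins, 1 (lowest) to n_bins (highest)
    out['bin_num'] = out.groupby('calendaryear')['combo_rank'].transform(
        lambda r: pd.qcut(r, n_bins, labels=False) + 1)
    return out

binned = assign_bins(schools)
```

Ranking before cutting is a deliberate choice in this sketch: cutting the raw scores directly can produce uneven bins when several schools share the same score.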

We can then calculate the average bar passage rate, percentage change in applications, median GPA, and median LSAT for each of the seven bins. This might tell us whether schools with lower or higher GPA and LSAT combos saw larger changes in bar passage, applications, GPA, and LSAT scores than other schools.

We’re using percentage changes in applications instead of application numbers because top schools receive a lot more applications than other schools. The percentage change allows us to normalize the data and spot relative differences.

Additionally, the percentage change is each year’s percentage change since 2013. That’s why 2013 will be zero for all bins.
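As a rough sketch of that calculation, each bin’s yearly applications can be divided by the same bin’s 2013 applications. The `apps` column, the bin labels, and the numbers are hypothetical; `group_num`, `calendaryear`, and `apps_perc_change` are the column names used in the plotting code further down.

```python
import pandas as pd

# Hypothetical application counts for two bins (all numbers invented).
apps = pd.DataFrame({
    'group_num': ['lowest'] * 3 + ['highest'] * 3,
    'calendaryear': [2013, 2014, 2015] * 2,
    'apps': [1000, 900, 800, 5000, 5000, 4500],
})

# look up each bin's 2013 application count to use as the baseline
baseline = apps[apps['calendaryear'] == 2013].set_index('group_num')['apps']

# change relative to 2013, stored as a fraction (-0.2 means a 20% drop)
apps['apps_perc_change'] = apps['apps'] / apps['group_num'].map(baseline) - 1
```

The result is kept as a fraction rather than multiplied by 100 because the "%" axis format used in the plots does that multiplication when it renders the labels.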

We’ll create four different plots that will be identical except for the y-axis. To simplify our code, we’ll create a function for the plot with parameters to control the y axis values and labels. That way, we don’t have to copy and paste the code four times.

When you hover over the plots, a vertical line is displayed along with text of the y value. This allows for easy comparison of the y values, which are the values of interest for these plots.

def group_plot(y_axis, y_axis_label, y_axis_format):
    
    # Create a selection that chooses the nearest point & selects based on x-value
    nearest = alt.selection(type='single', nearest=True, on='mouseover',
                            fields=['calendaryear'], empty='none')

    # create line chart
    line = alt.Chart(groups).mark_line(point=True).encode(
        x=alt.X('calendaryear:O', title='Year'), 
        y=alt.Y(y_axis, title=y_axis_label,
               scale=alt.Scale(zero=False),
               axis=alt.Axis(format=y_axis_format)),
        color=alt.Color('group_num:N', title='LSAT / GPA Combo', sort=gp_descriptions)
    ).properties(
        width=400,
        height=300
    )

    # Transparent selectors across the chart. This is what tells us
    # the x-value of the cursor
    selectors = alt.Chart(groups).mark_point().encode(
        x='calendaryear:O',
        opacity=alt.value(0),
    ).add_selection(
        nearest
    )

    # Draw points on the line, and highlight based on selection
    points = line.mark_point().encode(
        opacity=alt.condition(nearest, alt.value(1), alt.value(0))
    )
    
    # Draw text labels near the points, and highlight based on selection
    text = line.mark_text(align='left', dx=5, dy=-5).encode(
        text=alt.condition(nearest, y_axis, alt.value(' '))
    )

    # Draw a rule at the location of the selection
    rules = alt.Chart(groups).mark_rule(color='gray').encode(
        x='calendaryear:O',
    ).transform_filter(
        nearest
    )

    # Put the five layers into a chart and bind the data
    final = alt.layer(
        line, selectors, points, rules, text
    )
    
    return final

Now, we’ll create each individual plot.

# create individual plots
bar_pass_plot = group_plot('bar_pass:Q', 'Bar Passage Rate', "%")
avg_gpa_plot = group_plot('avg_gpa:Q', 'Average GPA', ".2")
avg_lsat_plot = group_plot('avg_lsat:Q', 'Average LSAT', "0")
perc_app_plot = group_plot('apps_perc_change:Q', 'Percent Change in Applications Since 2013', "%")

linechart_top = alt.hconcat(bar_pass_plot, perc_app_plot)
linechart_bottom = alt.hconcat(avg_gpa_plot, avg_lsat_plot)

linechart_top

linechart_bottom

The story of these plots lies in the top two. Look at the purple line, which represents schools with the lowest LSAT / GPA combo. Not only did their bar passage rates fall more sharply than those of other groups, but their applications collapsed, dropping nearly 40% between 2013 and 2017. Conversely, the blue lines - representing schools with the highest LSAT / GPA combo - stayed relatively stable in both bar passage and the number of applications.

Summary

Prior to Altair, Python lagged behind R in visualizations. This is no longer so. Altair is still relatively new, so documentation is slim and features are lacking at times. But, the graphics are stunning and it’s easy to add simple interactivity. Plus, the syntax is easy to grasp. Overall, it fills a void in the Python data visualization landscape, and I look forward to diving deeper into its mechanics.