Figures of the week
Below are some thoughts on each week's figures. Your opinion may differ, and that is okay. If you come across a good (or bad, or ugly) visualization that you find interesting, send it to me!
Some of these figures are self-contained, but others will require taking a look at the document the figure is a part of.
-
Week 1: The Fed balance sheet. This is a fairly straightforward figure. The x-axis is time and the y-axis is millions of dollars of different types of assets held by the Fed.
The figure is well-labeled. It will not look good printed in black-and-white, but this is clearly meant for web use. The colors are also a bit too saturated for my eye.
This is a stacked area graph. This is a good choice for representing the composition of a total. In this case, the total height at any point in time is the size of the balance sheet. The colors tell you the importance of the subcomponents of the total. It is quickly apparent that the Fed's balance sheet grew from less than $1 tril. to $4+ tril. through acquisitions of long-term treasuries and federal agency debt (Fannie and Freddie bonds). These purchases are largely the result of the quantitative and credit easing programs the Fed undertook during the last recession. -
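For anyone who wants to build a stacked area graph like the Week 1 figure, here is a minimal matplotlib sketch. The numbers and category names are invented for illustration and are not the Fed's data; only `stackplot` itself is the real matplotlib call.

```python
import matplotlib.pyplot as plt

# Illustrative numbers only (trillions of dollars), NOT actual Fed data.
years = [2007, 2009, 2011, 2013, 2015]
assets = {
    "Treasuries": [0.8, 0.8, 1.6, 2.2, 2.5],
    "Agency debt and MBS": [0.0, 1.0, 1.0, 1.5, 1.8],
    "Other": [0.1, 0.4, 0.3, 0.3, 0.2],
}

fig, ax = plt.subplots()
# Each series is stacked on the previous one, so the upper edge of the
# top band traces the total.
bands = ax.stackplot(years, list(assets.values()), labels=list(assets.keys()))
ax.set_xlabel("Year")
ax.set_ylabel("Trillions of dollars")
ax.legend(loc="upper left")

# The total at each date is the sum across components.
totals = [sum(column) for column in zip(*assets.values())]
```

Call `plt.show()` or `fig.savefig(...)` to see the result. The height of the stack at each date equals the corresponding entry in `totals`.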
Week 2: Popularity of programming languages. This figure is from the State of the Octoverse: a yearly report from GitHub.
The goal of this figure is pretty simple and it gets its point across. I like that the lines are labeled in 2019. We can immediately read off the current ordering of popularity. Plotting the shares of each language instead of their rank would have made this a stronger figure. With the market shares, we can see how languages are becoming more or less dominant over time.
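To see what a rank plot can hide, here is a small sketch with hypothetical counts (they are not from the Octoverse report). The ranks are identical in both years, but the shares show one language gaining ground.

```python
# Hypothetical activity counts per language; NOT from the Octoverse report.
counts = {
    2018: {"JavaScript": 500, "Python": 300, "Java": 290},
    2019: {"JavaScript": 520, "Python": 420, "Java": 300},
}


def ranks(year_counts):
    """Rank languages from most (1) to least popular."""
    ordered = sorted(year_counts, key=year_counts.get, reverse=True)
    return {lang: position for position, lang in enumerate(ordered, start=1)}


def shares(year_counts):
    """Each language's fraction of the year's total."""
    total = sum(year_counts.values())
    return {lang: n / total for lang, n in year_counts.items()}


for year, year_counts in counts.items():
    rounded = {lang: round(s, 3) for lang, s in shares(year_counts).items()}
    print(year, ranks(year_counts), rounded)
```

In this made-up example, a rank plot would draw three flat lines, while a share plot would show the second-place language closing the gap.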
When creating visualizations, every bit of ink should be helping the reader understand the data. In this case, the background is completely unnecessary. In fact, the background makes the lines harder to see: It's making the figure worse! -
Week 3: Farmers and the presidential vote. The figure is a scatter plot in which each point is a state. The x-axis is the Trump vote share and the y-axis is the share of farmers in the voting-age population. The data points are further differentiated by color: Red for Trump states and Blue for Clinton states.
This is a clean figure that makes it easy to see that Trump did better in the farming-intensive states (scatter plots are good choices for showing correlations). I like that the figure has words beneath the title that explain the data in the figure. This is part of FiveThirtyEight's style: If you poke around a bit on the site, you will see this everywhere.
This figure left me wanting more: Which two states are up at the top? Directly below the figure is a table with exactly this kind of information. This is also a nice example of the idea that you do not always need a picture. Sometimes, a table is a better choice. -
Week 4: Trade flows. I find this figure incredibly hard to read. It reminds me of a mediocre Johnny Depp movie.
The reader needs to keep straight what the size and the color of each country mean. I have a hard time comparing sizes: Is the United States larger than Russia? By how much? Why are all the North American countries red? I do not think it is because they are North American. It has something to do with "Share of Partner in Opposite Direction." What does that mean?
This figure is trying to do too much at once and ends up doing nothing. -
Week 5: Birthdays. The figure here is called a heatmap. Each axis corresponds to a different variable: In this case, months on one axis and days of the month on the other. The color of each square is proportionate to a third variable: In this case, the number of people born on that day.
The eye is drawn to the dark bands in July, August, and September — these months are most popular for births. I will leave it to you to speculate as to why this pattern might occur.
Heatmaps are often used to summarize the correlation between many variables. This is done to easily see which variables are strongly related to each other. This example (last figure on the page, bottom right panel...See why we number our figures?) computes the correlation of planted vegetables. Darker colors are more positively-correlated vegetable pairs.
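A correlation heatmap takes only a few lines. The sketch below uses simulated data with made-up variable names. It also swaps in a diverging colormap centered at zero, which is my choice here and not necessarily what the linked example does.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated data: 200 observations of 4 made-up variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    x + rng.normal(scale=0.5, size=200),   # positively related to x
    -x + rng.normal(scale=0.5, size=200),  # negatively related to x
    rng.normal(size=200),                  # unrelated to x
])
names = ["a", "b", "c", "d"]

# Columns are variables, so rowvar=False gives a 4 x 4 correlation matrix.
corr = np.corrcoef(data, rowvar=False)

fig, ax = plt.subplots()
# Fixing the color range at [-1, 1] puts zero correlation at the midpoint.
im = ax.imshow(corr, cmap="RdBu_r", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
fig.colorbar(im, ax=ax, label="Correlation")
```

With a single-hue colormap, darker would simply mean more positive. The diverging map makes it easier to tell negative correlations from weak ones.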
Lastly, heatmaps are popular in visualizing spatial data (yes, heatmaps on maps). The x-axis is longitude and the y-axis is latitude. Here is a somewhat-clunky map that shows the crime rate in and around Madison. (You can find the Madison police department's data portal here.) -
Week 6: U.S. energy usage. This figure is from a report on U.S. energy production and usage. In the introductory memorandum, it states that one of the objectives was to "develop an 'energy display' system which, in less than an hour, could give an extremely busy person an understanding of the size and complexity of our national energy dilemma." I wonder which extremely busy person they had in mind?
The figure is a Sankey diagram. Sankey diagrams visualize the flow through a network. The left-hand side of the figure is the production of energy and as we move to the right, we see the uses of the energy. This figure is for 1970. The figure gives a good sense of how important different sources are (oil, gas, and coal dominate) and where the energy goes (almost half of it is wasted!). Note also that this figure is set up for black-and-white printing. This is not surprising given that it was created in 1973, but keep in mind that not all figures will be consumed in color.
Perhaps the most famous Sankey diagram — and one of the most famous visualizations created — is Minard's depiction of Napoleon's campaign into Russia. It shows with startling clarity the staggering losses Napoleon's forces suffered. I have several colleagues with framed reproductions of this figure on their office walls. (I know, I know...)
I have never made a Sankey diagram before, but it seems like a natural use would be to visualize income and expenditure in the macroeconomy or the uses of revenue in a firm. The package matplotlib has a facility for creating these figures, as does plotly. -
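As a rough sketch of the income-and-expenditure idea, here is a single-node Sankey diagram built with `matplotlib.sankey`. The budget numbers are hypothetical. Inflows are positive, outflows are negative, and they are chosen to sum to zero.

```python
import matplotlib.pyplot as plt
from matplotlib.sankey import Sankey

# Hypothetical household budget in percent of income.
# Inflows are positive and outflows are negative; they sum to zero.
flows = [100, -60, -25, -15]
labels = ["Income", "Consumption", "Taxes", "Saving"]
# 0 = horizontal, 1 = branch toward the top, -1 = branch toward the bottom.
orientations = [0, 0, 1, -1]

fig, ax = plt.subplots()
ax.axis("off")
# scale normalizes the largest flow to about 1 so the default geometry works.
sankey = Sankey(ax=ax, scale=1 / 100, unit="%")
sankey.add(flows=flows, labels=labels, orientations=orientations)
diagrams = sankey.finish()
```

The width of each arrow is proportional to its flow, so the relative importance of each use of income can be read at a glance. Multi-stage diagrams like the energy figure need several `add` calls linked with the `prior` and `connect` arguments.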
Week 7: The year in charts. This article has a lot of figures in it — it is worth looking through the whole thing.
The first two figures are nice examples of bar graphs. Each bar is a month, and color is used to differentiate the year. The y-axis does not have a line, just a few labels: No unnecessary ink. I like using grid lines that are the same color as the figure background. They let you get a sense of the height of the bars in a quantitative way, but they are invisible otherwise.
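Here is one way to get that look in matplotlib, as a sketch with made-up monthly values: color the grid lines like the axes background and draw them above the bars, so that they are visible only where they cut through a bar.

```python
import matplotlib.pyplot as plt

# Made-up monthly values for two years, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr"]
values = {"2018": [3.0, 4.5, 4.0, 5.5], "2019": [3.5, 5.0, 4.2, 6.0]}

fig, ax = plt.subplots()
width = 0.4
for i, (year, heights) in enumerate(values.items()):
    # Shift each year's bars left or right of the month's tick position.
    positions = [m + (i - 0.5) * width for m in range(len(months))]
    ax.bar(positions, heights, width=width, label=year)

ax.set_xticks(range(len(months)))
ax.set_xticklabels(months)
ax.legend(frameon=False)

# Grid lines in the background color, drawn above the bars.
ax.set_axisbelow(False)
ax.yaxis.grid(True, color=ax.get_facecolor(), linewidth=1.5)

# No axis lines or tick marks, just the labels.
for spine in ax.spines.values():
    spine.set_visible(False)
ax.tick_params(length=0)
```

The grid lines slice the bars at each labeled value and disappear against the background everywhere else.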
I do not like that the figure title is in the upper-left corner of the figure. It looks too much like the other blobs of text in the figure. (The blobs of text, however, I do like.) -
Week 8: Taste of Madison bands. This figure visualizes the number of times a given band has played the WJJO stage at Taste of Madison. I'm not much of a music person, but the title of the article click-baited its way into my brain.
This figure gets a lot right, but I'm mystified by the bars. Does each band get two bars? Why? It makes it look like there are four bands that have played five times. I would rather see one bar per band. We could put some space (padding) between the bars so that the band names (important data in this case) do not run into each other.
Given the height of the figure, I would add the x-axis labels at the top of the figure as well. The grid lines are not very important here: the values are very discrete. Notice, though, that the grid lines are almost invisible when a bar crosses them. -
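A sketch of the Week 8 alternative: one horizontal bar per band, some padding between bars, and the x-axis labels repeated at the top. The band names and counts below are placeholders, not the actual lineup.

```python
import matplotlib.pyplot as plt

# Placeholder band names and appearance counts, not the actual lineup.
appearances = {"Band A": 5, "Band B": 4, "Band C": 4, "Band D": 2, "Band E": 1}

names = list(appearances)
counts = list(appearances.values())

fig, ax = plt.subplots(figsize=(5, 6))
# One bar per band; a height below 1 leaves padding between neighbors.
bars = ax.barh(names, counts, height=0.6)
ax.invert_yaxis()  # first band in the dict appears at the top

# The counts are small integers, so use integer ticks and repeat the
# tick labels along the top edge of a tall figure.
ax.set_xticks(range(max(counts) + 1))
ax.tick_params(axis="x", top=True, labeltop=True)
ax.set_xlabel("Times played")
```

With many bands, increase the figure height in `figsize` so the names do not crowd each other.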
Week 9: Dining in the time of COVID-19. We are seeing a lot of data analysis surrounding the effects of COVID-19. We have so much more data, and with much greater detail, than we have had in previous crises. This figure looks at year-over-year changes in OpenTable reservations (Q: Why year-over-year changes?).
The patterns are so obvious, we can get away with 50 small graphs in a grid. In general, this approach is problematic because the small size of each figure makes it difficult to see detail. I like the circles that indicate when a state shut down seated dining. They make it easy to see that most of the decline happened before officials closed the restaurants. The color (magenta?) doesn't appeal to me. It's harsh and doesn't really convey much: it is already easy to see which bars are positive and which are negative. -
Week 10: Chinese tariff exposure by county. This figure summarizes each county's exposure to Chinese retaliatory tariffs. The counties are color-coded to indicate where they fall in the national distribution of tariff exposure. Dark-red counties are in the top decile and dark-blue counties are in the bottom decile.
I used a two-color scheme so that it is easy to see which counties are in the bottom half of the distribution (any color blue) and the top half of the distribution (any color red). Most people associate red with 'hot,' so it seemed natural to make the most exposed counties red. I originally did not have the white state borders and it was difficult to see where one state ended and the other began. It is easy to see now. [You might also take a look at figure 4. It is a horizontal bar graph, which we covered in Friday #9.] -
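The binning behind a map like the Week 10 figure can be sketched with the standard library alone. The exposure values below are hypothetical and the helper names are mine. The idea is to cut the distribution into ten equal-sized groups and pick the color family from the half of the distribution a county falls in.

```python
import bisect
import statistics

# Hypothetical exposure values for 20 counties (arbitrary units).
exposure = [0.1, 0.4, 0.2, 1.5, 3.2, 0.9, 2.1, 0.3, 5.0, 0.7,
            1.1, 0.6, 2.8, 0.05, 4.1, 1.9, 0.8, 3.6, 1.3, 2.4]

# Nine cut points split the distribution into ten equal-sized groups.
cuts = statistics.quantiles(exposure, n=10)


def decile(value):
    """Return 1 (least exposed) through 10 (most exposed)."""
    return bisect.bisect_right(cuts, value) + 1


def color_family(value):
    """Blue for the bottom half of the distribution, red for the top half."""
    return "blue" if decile(value) <= 5 else "red"


for value in sorted(exposure):
    print(f"{value:5.2f}  decile {decile(value):2d}  {color_family(value)}")
```

Within each family, the shade would deepen toward the extremes, so decile 1 is the darkest blue and decile 10 the darkest red.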
Week 11: Too many central bankers? This figure highlights two pieces of data: The relative size of the number of central bankers in three countries and the distribution of those central bankers within a country. The Euro area has more than twice as many bankers as the United States, which has about five times as many as Japan. We see, for example, that in the United States, the New York Fed and the Board of Governors employ the largest numbers of central bankers.
This is an example of a treemap chart. It is not much different from a pie chart, and has many of the same weaknesses. I could not tell by comparing the sizes of the blocks that Europe had more than twice as many bankers as the United States; I figured it out by looking at the numbers reported at their tops. Part of the problem is that the blocks' areas represent the number of bankers, and area grows with the square of a block's side length, which is difficult for me to visualize. In this case, I think a table of numbers would have worked better, or maybe a stacked bar plot. If you do want to make treemaps, the plotly package will get the job done. -
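A quick back-of-the-envelope check shows why areas are hard to compare by eye. The headcounts below are in relative units that I made up to roughly match the ratios described for Week 11 (they are not the figure's numbers), and I assume square blocks for simplicity.

```python
import math

# Illustrative headcounts in relative units (US = 1); not the figure's data.
headcount = {"Japan": 0.2, "United States": 1.0, "Euro area": 2.0}

for region, n in headcount.items():
    side = math.sqrt(n)  # side of a square block whose area equals n
    print(f"{region:14s} area = {n:.1f}  side = {side:.2f}")

# Doubling the area lengthens each side by only the square root of two.
side_ratio = math.sqrt(headcount["Euro area"] / headcount["United States"])
```

A block with twice the area is only about 1.4 times as wide and as tall, which is why the two-to-one difference does not jump out. In a bar plot, the same difference would be a bar twice as long.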
Week 12: Covid models. (From this NYT article.) The figure presents covid-death forecasts for New York from five different models. Each model projection is a different color and the confidence intervals are in lighter shades. The Los Alamos confidence intervals are so large they encompass both "I Am Legend" and "Breakfast at Tiffany's." All the projections seem to be showing a decline in deaths. There is a disturbing difference between the JHU reported deaths and the NYT reported deaths.
Which model belongs to the brownish confidence interval? None of the lines are brown. I see an orange and a red line as contenders. The correct answer is the red line—why not make the confidence interval pink? Or make the red line dark brown? I also do not like the text "Reported New York deaths" and "Five models of future New York deaths" at random places in the figure and in seemingly random font sizes and weights. Instead, I might have drawn a horizontal line below the x-axis with arrows on each end. Then I would have marked the line where the data stop and the forecasts begin. h/t Michael Marotta '20 -
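One way to avoid the mismatch between line and band colors in a figure like the Week 12 one is to reuse the line's own color for its confidence band, at a lower opacity. The sketch below uses synthetic series (not the NYT, JHU, or any model's data), and it marks the end of the data with a simple vertical line instead of the arrowed line described above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic series for illustration; not the NYT, JHU, or any model's data.
days = np.arange(30)
last_obs = 19  # index of the last reported value
reported = 800 * np.exp(-0.03 * days[: last_obs + 1])
horizon = days[last_obs:]
forecast = reported[-1] * np.exp(-0.05 * (horizon - last_obs))
half_width = 20 * (horizon - last_obs)  # uncertainty widens with the horizon

fig, ax = plt.subplots()
ax.plot(days[: last_obs + 1], reported, color="black", label="Reported")

# Draw the line and its band in the same hue so they are easy to match.
(line,) = ax.plot(horizon, forecast, color="tab:red", label="Model A")
band = ax.fill_between(horizon, forecast - half_width, forecast + half_width,
                       color=line.get_color(), alpha=0.2, linewidth=0)

# Mark where the data stop and the forecasts begin.
ax.axvline(last_obs, color="gray", linestyle="--")
ax.legend(frameon=False)
```

With several models, repeating the `plot` and `fill_between` pair in a loop keeps every band tied to its own line.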
Week 13: NFL draft scores. (by @RNBWCV) As a Packer fan, this figure is particularly depressing. The figure reports the scores for each NFL team from several draft "gurus." The use of color seems appropriate, with blue being "cool" and red being "hot." Of additional interest here is the inclusion of the team logos. It is reasonably easy to add an image to a matplotlib plot. This Stack Overflow thread discusses ways to do it.
Do the logos add any content to the figure? No, the team names are right next to the logos. Does it make the figure easier to read? Possibly. It might be easier to pick out a familiar logo than to read through each line searching for a team. -
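Here is one possible way to place small images on a matplotlib plot, using `OffsetImage` and `AnnotationBbox` from `matplotlib.offsetbox`. It may or may not be the approach in the linked thread. The "logo" is a plain colored square standing in for a real image file, and the team names and scores are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, OffsetImage

# Stand-in "logo": a 32 x 32 green square. With a real file you would load
# an array with plt.imread("path/to/logo.png") (the path is hypothetical).
logo = np.zeros((32, 32, 3))
logo[..., 1] = 0.5

teams = ["Team A", "Team B", "Team C"]  # placeholder names
scores = [3.4, 2.9, 2.1]                # placeholder scores

fig, ax = plt.subplots()
ax.barh(teams, scores)
ax.set_xlim(0, 4)

for row, score in enumerate(scores):
    image_box = OffsetImage(logo, zoom=0.5)
    # Position the image just past the end of each bar, in data coordinates.
    marker = AnnotationBbox(image_box, (score + 0.2, row), frameon=False)
    ax.add_artist(marker)
```

The `zoom` argument scales the image relative to its pixel size, so it usually takes some trial and error to get logos that are legible without crowding the bars.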
Week 14: UW's covid dashboard. A dashboard is a collection of visualizations presented in one place that aims to provide the user with all the data needed to monitor the data stream and make decisions. There are many tools out there that allow you to easily get a dashboard up and running and the companies providing these tools are cranking out covid dashboards all over the world.
In theory, dashboards are great. Who wouldn't want all their analysis in one place? Dashboards, however, need the same care and attention that a figure in a publication would receive. Do I need that pie chart? Do I need data markers when there are 270+ data points? Do I need this visualization at all?
Never trust the defaults.