most-common-name/adjusted-name-combinations-matrix: 19

This directory contains the code and data behind the story "Dear Mona, What’s The Most Common Name In America?"

The main script file is most-common-name.R.

There are four input files:

  • state-pop.csv - Total population and Hispanic population by state.
  • surnames.csv - Data on surnames from the U.S. Census Bureau, including a breakdown by race/ethnicity.
  • aging-curve.csv - Data from the Social Security Administration on the probability that someone born in a given decade was still alive in 2013: http://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html
  • adjustments.csv - Taken directly from Lee Hartman's article: http://mypage.siu.edu/lhartman/johnsmith.html

And five output files:

  • adjusted-name-combinations-list.csv - Adjusted estimates for the most common full names.
  • adjusted-name-combinations-matrix.csv - The same data as adjusted-name-combinations-list.csv, in matrix form. These are the estimates presented in the second (and final) table of the article.
  • independent-name-combinations-by-pop.csv - Matrix of estimates for the top 100 most common first names by the top 100 most common surnames. These were calculated using independent odds and are displayed in the first table of the article.
  • new-top-firstNames.csv - Final estimated ranking of top first names.
  • new-top-surnames.csv - Final estimated ranking of top surnames.
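
The "independent odds" behind independent-name-combinations-by-pop.csv can be illustrated with a minimal Python sketch. It assumes the estimate for a full name is the population multiplied by the share of people with that first name and the share with that surname, treating the two as statistically independent. The function name and the input shares are made up for illustration; they are not taken from most-common-name.R or from the data files.

```python
def independent_estimate(first_name_share, surname_share, population):
    """Estimate how many people carry a given full name, assuming the
    first name and the surname are independent of each other."""
    for share in (first_name_share, surname_share):
        if not 0.0 <= share <= 1.0:
            raise ValueError("shares must be between 0 and 1")
    return population * first_name_share * surname_share


# Hypothetical inputs, not real figures: 1% of people share the first name,
# 0.5% share the surname, in a population of 1,000,000.
estimate = independent_estimate(0.01, 0.005, 1_000_000)
print(round(estimate))
```

Independence is a simplification, since first names and surnames are correlated in practice, which appears to be why the directory also includes adjusted estimates.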

Data source: https://github.com/fivethirtyeight/data/blob/master/most-common-name/adjusted-name-combinations-matrix.csv


rowid: 19
Unnamed: 0: 66
FirstName: Mark

Surname      Estimate
SMITH        9368.70696605657
JOHNSON      8189.30425322347
WILLIAMS     5558.58805597028
BROWN        4570.49562730225
JONES        4382.62989491142
GARCIA       1508.69831211492
RODRIGUEZ    1232.88145178821
MILLER       5964.8689706833
MARTINEZ     1527.38081379791
DAVIS        4405.49784213841
HERNANDEZ    (blank)
LOPEZ        (blank)
GONZALEZ     (blank)
WILSON       3639.01752851825
ANDERSON     4192.19400520269
THOMAS       2838.14785350277
TAYLOR       3051.66142742583
LEE          1811.77602332064
MOORE        2045.74460158949
JACKSON      2010.93578090779
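
As a rough illustration of how the matrix form relates to the list form, the Python sketch below unpivots one matrix row into (first name, surname, estimate) records and skips blank cells. It uses only the standard csv module. The inline sample is a cut-down copy of the Mark row above, with values rounded to one decimal and the rowid and Unnamed: 0 columns omitted. The function name and the output layout are assumptions, since the column names of adjusted-name-combinations-list.csv are not shown here.

```python
import csv
import io

# Cut-down sample of the matrix: three surname columns only, values rounded.
# The blank HERNANDEZ cell mirrors the empty cells in the row shown above.
SAMPLE = """FirstName,SMITH,JOHNSON,HERNANDEZ
Mark,9368.7,8189.3,
"""


def matrix_to_list(csv_text, key="FirstName"):
    """Unpivot a first-name-by-surname matrix into
    (first name, surname, estimate) tuples, skipping empty cells."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        first = row[key]
        for surname, cell in row.items():
            # Skip the key column itself and any missing or blank cell.
            if surname == key or cell is None or not cell.strip():
                continue
            records.append((first, surname, float(cell)))
    return records


for record in matrix_to_list(SAMPLE):
    print(record)
```

To run the same function on the real file, pass in the text of adjusted-name-combinations-matrix.csv; any extra index column would need to be dropped or skipped first.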