| Title: | Updated US State Facts and Figures |
|---|---|
| Description: | Updated versions of the 1970s "US State Facts and Figures" objects from the 'datasets' package included with R. The new data is compiled from a number of sources, primarily from the United States Census Bureau or the relevant federal agency. Modern tidy tibbles provide richer state-level data including identifiers, geography, capitals, demographics, and socioeconomic statistics. Convenience vectors parallel the base 'datasets' state objects but extend coverage to all 51 jurisdictions: the 50 states and the District of Columbia. |
| Authors: | Kiernan Nicholls [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-9229-7897>) |
| Maintainer: | Kiernan Nicholls <[email protected]> |
| License: | CC BY 4.0 |
| Version: | 1.0.0.9000 |
| Built: | 2026-05-14 17:54:04 UTC |
| Source: | https://github.com/k5cents/usa |
The United States Postal Service's official names for the cities in which ZIP codes are contained. This vector contains unique values, sorted alphabetically; as a result, it is not parallel to the other vectors in the way zip_codes and zip_centers are.
city_names
A character vector of length 19108.
Daniel Coven's web site and the CivicSpace US ZIP Code Database written by Schuyler Erle [email protected], 5 August 2004.
The county subdivisions of the US states and territories.
counties
A tibble with 3,235 rows and 3 variables:
Five-digit FIPS code (state FIPS + county FIPS)
County name (type suffix such as "County", "Parish", "Borough" removed)
USPS state/territory abbreviation
Census TIGER 2020 national county reference file, https://www2.census.gov/geo/docs/reference/codes2020/national_county2020.txt
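The five-digit code described above concatenates a two-digit state FIPS code and a three-digit county FIPS code. The following is a minimal base R sketch of that composition; the helper names `make_county_fips()` and `split_county_fips()` are hypothetical and are not part of this package.

```r
# Hypothetical helper (not part of the package): build a five-digit county
# FIPS code from numeric state and county parts, zero-padding each piece.
make_county_fips <- function(state, county) {
  sprintf("%02d%03d", as.integer(state), as.integer(county))
}

# Hypothetical helper: split a five-digit code back into its two parts.
split_county_fips <- function(fips) {
  list(state = substr(fips, 1, 2), county = substr(fips, 3, 5))
}

# California is state 06 and Los Angeles County is county 037.
la <- make_county_fips(6, 37)
la
split_county_fips(la)
```

Keeping the code as a character string preserves the leading zero that a numeric type would drop.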
The distinct names of US counties.
county_names
A character vector of length 1,925.
Census TIGER 2020 national county reference file, https://www2.census.gov/geo/docs/reference/codes2020/national_county2020.txt
A statistically representative synthetic sample of 20,000 Americans. Each record is a simulated survey respondent.
people
A tibble with 20,000 rows and 40 variables:
Sequential unique ID
Random first name, see details
Random last name, see details
Gender (male/female)
Age capped at 85
Race and Ethnicity
Educational attainment
Census regional division
Marital status
Household size
Has children
Is a US citizen
Was born in the US
Family income
Employment status
Employment sector
Hours worked per week
Hours vary week to week
Has served in the military
Home ownership
Lives in metropolitan area
Household has internet access
Receives food stamps
Moved in the last year
Contacted or visited a public official
Participated in a product boycott
Participated in a community association
Talked with neighbors
Trusts neighbors
Uses a tablet or e-reader
Uses text messaging
Uses social media
Volunteered
Is registered to vote
Voted in the most recent midterm election
Political party
Religious (evangelical) affiliation
Political ideology
Follows government and public affairs
Owns a gun
This dataset was originally produced by the Pew Research Center for their paper "For Weighting Online Opt-In Samples, What Matters Most?" The synthetic population dataset was created to serve as a reference for making online opt-in surveys more representative of the overall population.
See Appendix B: Synthetic population dataset for a more detailed description of the method for and rationale behind creating this dataset.
In short, the dataset was created to overcome the limitations of using large federal benchmark survey datasets such as the American Community Survey (ACS) or Current Population Survey (CPS). These surveys often do not contain the exact questions asked in online opt-in surveys, which keeps them from being used for proper adjustment.
This synthetic dataset was created by combining nine separate benchmark datasets. Each had a set of common demographic variables, but many added unique variables such as gun ownership or voter registration. The surveys were stratified, sampled, combined, and imputed to fill the values missing from each. From this large dataset, the original 20,000 cases from the ACS were kept to ensure an accurate demographic distribution.
The names were randomly assigned to respondents to better simulate a
synthetic sample of the population. First names were taken from the
babynames dataset, which contains the Social Security Administration's
record of baby names from 1880 to 2017 along with sex and proportion.
First names were randomly assigned in proportion to their frequency by
birth year and sex. Last names were taken from the Census Bureau, which
provides the 162,254 most common last names in the 2010 Census, covering
over 90% of the population. For a given surname, the proportion of that
name belonging to members of each race and ethnicity is provided. Last
names were randomly assigned in proportion to their frequency by race.
“For Weighting Online Opt-In Samples, What Matters Most?” Pew Research Center, Washington, D.C. (January 26, 2018) https://www.pewresearch.org/methods/2018/01/26/for-weighting-online-opt-in-samples-what-matters-most/
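The proportional random assignment of names described above can be illustrated with weighted sampling. This is only a sketch under stated assumptions: the name pool and its proportions are made up, the helper `assign_names()` is hypothetical, and the actual procedure drew from separate pools by birth year and sex (first names) or by race (last names).

```r
set.seed(1)  # fixed seed so this illustration is reproducible

# Hypothetical name pool with made-up proportions (not real SSA figures).
pool <- data.frame(
  name = c("Alex", "Blake", "Casey"),
  prop = c(0.5, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw one name per respondent, with the chance of
# each name proportional to its share in the pool.
assign_names <- function(n, pool) {
  sample(pool$name, size = n, replace = TRUE, prob = pool$prop)
}

first <- assign_names(1000, pool)

# Observed shares should sit close to the proportions in the pool.
round(prop.table(table(first)), 2)
```

In a fuller version, the same draw would be repeated within each demographic group using that group's own pool.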
The 2-letter USPS abbreviations for the 50 states and District of Columbia. Parallel to state_names.
state_abbs
A character vector of length 51.
https://www2.census.gov/geo/docs/reference/state.txt
Land area in square miles for the 50 states and District of Columbia. Parallel to state_names.
state_areas
A numeric vector of length 51.
TIGER/Web REST API (State_County layer)
Capital cities for the 50 states and District of Columbia, with coordinates and 2020 Census population.
state_capitals
A tibble with 51 rows and 5 variables:
2-letter USPS abbreviation (join key)
Capital city name
Latitudinal coordinate of the capital
Longitudinal coordinate of the capital
Capital city population (2020 Decennial Census, city proper)
https://www.census.gov/quickfacts/
A list with components named x and y giving the approximate geographic
centroid of each state in longitude and latitude. Parallel to state_names.
state_centers
A list of length two, each element a numeric vector of length 51.
Centroid longitudinal coordinate
Centroid latitudinal coordinate
TIGER/Web REST API (State_County layer)
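Because the x and y components are parallel to the name vector, they can be bound column-wise into a single table. The sketch below uses base R's `datasets::state.center` and `datasets::state.name` (50 states) as stand-ins, since they share the same x/y list layout; it assumes that state_centers and state_names can be substituted in the same way.

```r
# Base R's 50-state objects stand in for state_centers and state_names.
centers <- data.frame(
  name = datasets::state.name,
  lon  = datasets::state.center$x,  # x holds longitude
  lat  = datasets::state.center$y,  # y holds latitude
  stringsAsFactors = FALSE
)

# Element i of each vector describes the same state.
head(centers, 3)
centers[centers$name == "Vermont", ]
```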
Take a vector of state identifiers and convert them to a common format. Supports five of the identifier types in state_ids: USPS abbreviation, full name, FIPS code, AP style abbreviation, and ISO 3166-2 code.
state_convert(x, to = c("abb", "name", "fips", "ap", "iso"))
| x | A character vector of state identifiers in any supported format. |
|---|---|
| to | The format returned: one of "abb", "name", "fips", "ap", or "iso". |
A character vector of single format state identifiers.
state_convert(c("AL", "Vermont", "06"))
state_convert(c("AL", "Vermont", "06"), to = "name")
state_convert(c("AL", "Vermont", "06"), to = "fips")
state_convert(c("AL", "Vermont", "06"), to = "ap")
state_convert(c("AL", "Vermont", "06"), to = "iso")
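One way to picture a conversion from mixed identifiers is a lookup against a table of identifiers. The base R sketch below is an assumption-laden illustration, not the package's implementation: `convert_sketch()` and the three-row `ids` table are hypothetical, and only three of the identifier types are covered.

```r
# A tiny hand-built lookup table; the package's state_ids has 51 rows.
ids <- data.frame(
  abb  = c("AL", "CA", "VT"),
  name = c("Alabama", "California", "Vermont"),
  fips = c("01", "06", "50"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch: find the row in which each input appears in any
# column, then return the requested column for that row.
convert_sketch <- function(x, to = c("abb", "name", "fips"), lookup = ids) {
  to <- match.arg(to)
  row <- rep(NA_integer_, length(x))
  for (col in names(lookup)) {
    hit <- match(x, lookup[[col]])
    # Only fill positions that no earlier column has matched.
    row[is.na(row)] <- hit[is.na(row)]
  }
  lookup[[to]][row]
}

convert_sketch(c("AL", "Vermont", "06"))               # "AL" "VT" "CA"
convert_sketch(c("AL", "Vermont", "06"), to = "name")  # "Alabama" "Vermont" "California"
```

Inputs that match no column come back as NA in this sketch; how the package itself treats unmatched inputs is not stated here.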
The Census division to which each state belongs, one of nine. Parallel to state_names.
state_divisions
A factor vector of length 51.
New England
Middle Atlantic
East North Central
West North Central
South Atlantic
East South Central
West South Central
Mountain
Pacific
https://www2.census.gov/programs-surveys/popest/geographies/2018/state-geocodes-v2018.xlsx
Updated version of the datasets::state.x77 matrix, which provided eight statistics from the 1970s. This version is a modern tibble with updated statistics.
state_facts
A tibble with 51 rows and 9 variables:
Full state name
Resident population (2020 Decennial Census, April 1, 2020)
Votes in the Electoral College (2020 Census reapportionment, applies to the 2024 and 2028 presidential elections)
The date on which the state was admitted to the union
Per capita income in dollars (2022 ACS 1-year)
Life expectancy at birth in years, both sexes (2021 NCHS)
Homicide rate per 100,000 population (2022 FBI NIBRS)
Proportion of population 25+ with a bachelor's degree or higher (2022 ACS 1-year)
Mean number of days per year with minimum temperature below freezing (1991-2020 NCEI Climate Normals)
See also state_ids for state identifiers and state_geo for geography.
Population: 2020 Decennial Census PL 94-171 file, variable P1_001N via tidycensus
Electoral College: 2020 Census reapportionment (NARA https://www.archives.gov/electoral-college/allocation)
Income: 2022 ACS 1-year, variable B19301_001 (per capita income) via tidycensus
Life Expectancy: NCHS 2021 state life tables via https://data.cdc.gov/api/views/it4f-frdc/rows.csv
Murder: FBI Crime Data Explorer API (2022 NIBRS)
Education: 2022 ACS 1-year Subject Table S1501, variable S1501_C02_015 via tidycensus
Frost: NCEI 1991-2020 Climate Normals, variable ANN-TMIN-AVGNDS-LSTH032, https://www.ncei.noaa.gov/data/normals-annualseasonal/1991-2020/
Geographic and classificatory properties for the 50 states and District of
Columbia. Keyed by abb to join with state_ids.
state_geo
A tibble with 51 rows and 10 variables:
2-letter USPS abbreviation (join key)
Census Bureau region
Census Bureau division
Land area in square miles
Water area in square miles
Centroid latitudinal coordinate
Centroid longitudinal coordinate
TRUE for the 48 contiguous states and DC;
FALSE for Alaska and Hawaii
TRUE for states with no coastline on an ocean,
gulf, or Great Lake (21 states including DC)
Elevation of the state high point in feet
Regions and divisions: https://www2.census.gov/programs-surveys/popest/geographies/2018/state-geocodes-v2018.xlsx
Area and centroids: TIGER/Web REST API (State_County layer)
Peak elevations: USGS state high point records
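Since the table is keyed by abb, it can be joined to an identifier table on that column. The sketch below uses two small hand-built data frames as stand-ins; apart from abb, the column names (`name`, `contiguous`) are assumptions chosen for illustration, and base R's `merge()` is used rather than any particular join helper.

```r
# Hand-built stand-ins for an identifier table and a geography table.
ids <- data.frame(
  abb  = c("AK", "VT"),
  name = c("Alaska", "Vermont"),
  stringsAsFactors = FALSE
)
geo <- data.frame(
  abb        = c("VT", "AK"),   # row order need not match
  contiguous = c(TRUE, FALSE),  # assumed column name
  stringsAsFactors = FALSE
)

# Join on the shared key; merge() sorts the result by the key column.
joined <- merge(ids, geo, by = "abb")
joined
```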
The 50 states and District of Columbia — all naming and coding
systems used to refer to each state. The backing data for state_convert().
state_ids
A tibble with 51 rows and 6 variables:
Full legal name
2-letter USPS abbreviation
Federal Information Processing Standard Publication 5-2 code
ICPSR state code, as used by the IPUMS STATEICP variable; zero-padded 2-digit string
AP style abbreviation; the 8 states with no AP abbreviation (Alaska, Hawaii, Idaho, Iowa, Maine, Ohio, Texas, Utah) use the full state name per AP style
ISO 3166-2 code (e.g. "US-AL")
Naming convention: underscore objects (state_ids, state_facts,
state_geo) are modern purpose-built tibbles. Convenience vectors
(state_abbs, state_names, etc.) mirror the base R
datasets::state.* vectors but cover all 51 rows (50 states + DC).
Names, abbreviations, FIPS: https://www2.census.gov/geo/docs/reference/state.txt
ICP codes: https://usa.ipums.org/usa-action/variables/STATEICP
AP abbreviations: AP Stylebook
ISO 3166-2: ISO Online Browsing Platform
The full names for the 50 states and District of Columbia. Parallel to state_abbs.
state_names
A character vector of length 51.
https://www2.census.gov/geo/docs/reference/state.txt
The Census region to which each state belongs, one of four. Parallel to state_names.
state_regions
A factor vector of length 51.
Northeast
Midwest
South
West
https://www2.census.gov/programs-surveys/popest/geographies/2018/state-geocodes-v2018.xlsx
The 6 US territories: Puerto Rico (PR) and the 5 island territories (AS, GU, MP, UM, VI).
territory
A tibble with 6 rows and 6 variables:
2-letter abbreviation
Full legal name
Federal Information Processing Standard Publication 5-2 code
Area in square miles
Center latitudinal coordinate
Center longitudinal coordinate
The 2-letter abbreviations for the US territories (PR, AS, GU, MP, UM, VI).
territory_abbs
A character vector of length 6.
https://www2.census.gov/geo/docs/reference/state.txt
The area in square miles of the US territories (PR, AS, GU, MP, UM, VI).
territory_areas
A numeric vector of length 6.
TIGER/Web REST API (State_County layer)
A list with components named x and y giving the approximate geographic
center of each territory in longitude and latitude.
territory_centers
A list of length two, each element a numeric vector of length 6.
Center longitudinal coordinate
Center latitudinal coordinate
TIGER/Web REST API (State_County layer)
The full names for the US territories (PR, AS, GU, MP, UM, VI).
territory_names
A character vector of length 6.
https://www2.census.gov/geo/docs/reference/state.txt
A list with components named x and y giving the approximate geographic
center of each ZIP code in longitude and latitude.
zip_centers
A list of length two, each element a numeric vector of length 44336.
Center longitudinal coordinate
Center latitudinal coordinate
Daniel Coven's web site and the CivicSpace US ZIP Code Database written by Schuyler Erle [email protected], 5 August 2004.
The United States Postal Service's 5-digit codes used to identify a particular postal delivery area.
zip_codes
A character vector of length 44336.
Daniel Coven's web site and the CivicSpace US ZIP Code Database written by Schuyler Erle [email protected], 5 August 2004.
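Storing the codes as character strings matters because some ZIP codes begin with zero. The short base R sketch below, using three example codes chosen for illustration, shows what a numeric conversion loses and how zero-padding restores it.

```r
# Three example ZIP codes, two of which start with a zero.
zips <- c("00501", "02134", "90210")

# Converting to integer silently drops the leading zeros.
as_numbers <- as.integer(zips)
as_numbers

# Zero-padding to five digits recovers the original strings.
restored <- sprintf("%05d", as_numbers)
restored
```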
This tibble contains city, state, latitude, and longitude for U.S. ZIP codes
from the CivicSpace Database (August 2004) augmented by Daniel Coven's web site (updated on January 22, 2012).
The data was originally contained in the
zipcode CRAN package, which
was archived on January 1, 2020.
zipcodes
A tibble with 44,336 rows and 5 variables:
5-digit ZIP code or military postal code (FPO/APO)
USPS official city name
USPS official state or territory abbreviation
Decimal latitude
Decimal longitude
Daniel Coven's web site and the CivicSpace US ZIP Code Database written by Schuyler Erle [email protected], 5 August 2004.