Download American Community Survey data from the U.S. Census Bureau and reformat it for humans.
All of the data files processed by this repository are published in the data/processed/ folder. They can be pulled into applications via their raw URLs, like https://raw.githubusercontent.com/datadesk/census-data-downloader/master/data/processed/acs5_2017_population_counties.csv
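Because the processed files are plain CSVs, they can be read straight off their raw URLs. Here's a minimal sketch of building one of those URLs from its parts; the `dataset_year_table_geography.csv` pattern and the `processed_url` helper are inferred from the example above, so verify the exact filename in data/processed/ before relying on it.

```python
BASE = (
    "https://raw.githubusercontent.com/datadesk/census-data-downloader"
    "/master/data/processed"
)

def processed_url(dataset, year, table, geography):
    """Build the raw URL for a processed file, following the
    dataset_year_table_geography.csv pattern seen above."""
    return f"{BASE}/{dataset}_{year}_{table}_{geography}.csv"

url = processed_url("acs5", 2017, "population", "counties")
# With pandas installed, the file can be read directly from the web:
# import pandas as pd
# df = pd.read_csv(url)
```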
The library can be installed as a command-line interface that lets you download files on demand.
```bash
$ pipenv install census-data-downloader
```

There's now a tool named `censusdatadownloader` ready for you.
```bash
Usage: censusdatadownloader [OPTIONS] TABLE COMMAND [ARGS]...

  Download Census data and reformat it for humans

Options:
  --data-dir TEXT     The folder where you want to download the data
  --year [2009-2020]  The years of data to download. By default it gets only
                      the latest year. Not all data are available for every
                      year. Submit 'all' to get every year.
  --force             Force the downloading of the data
  --help              Show this message and exit.

Commands:
  aiannhhomelands            Download American Indian, Alaska Native and...
  cnectas                    Download combined New England city and town...
  congressionaldistricts     Download Congressional districts
  counties                   Download counties in all states
  countysubdivision          Download county subdivisions
  csas                       Download combined statistical areas
  divisions                  Download divisions
  elementaryschooldistricts  Download elementary school districts
  everything                 Download everything from everywhere
  msas                       Download metropolitan statistical areas
  nationwide                 Download nationwide data
  nectas                     Download New England city and town areas
  places                     Download Census-designated places
  pumas                      Download public use microdata areas
  regions                    Download regions
  secondaryschooldistricts   Download secondary school districts
  statelegislativedistricts  Download statehouse districts
  states                     Download states
  tracts                     Download Census tracts
  unifiedschooldistricts     Download unified school districts
  urbanareas                 Download urban areas
  zctas                      Download ZIP Code tabulation areas
```
Before you can use it you will need to add your `CENSUS_API_KEY` to your environment. If you don't have an API key, you can request one from the Census Bureau. One quick way to add your key:
```bash
$ export CENSUS_API_KEY='<your API key>'
```
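If you'd rather not export the variable globally, the same variable can be set and read per-process. A small sketch using only the standard library's `os.environ`:

```python
import os

# Simulate the shell export for this process only; in practice you would
# export CENSUS_API_KEY in your shell profile instead.
os.environ.setdefault("CENSUS_API_KEY", "<your API key>")

api_key = os.environ["CENSUS_API_KEY"]
```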
Using it is as simple as providing one of our processed table names to one of the download subcommands.
Here's an example of downloading all state-level data from the medianage dataset.

```bash
$ censusdatadownloader medianage states
```
You can specify the download directory with `--data-dir`.

```bash
$ censusdatadownloader --data-dir ./my-special-folder/ medianage states
```
And you can change the year you download with `--year`.

```bash
$ censusdatadownloader --year 2010 medianage states
```
That's it. Mix and match tables and subcommands to get whatever you need.
You can also download tables from Python scripts. Import the class of the processed table you wish to retrieve and pass in your API key. Then call one of the download methods.
This example brings in all state-level data from the medianhouseholdincomeblack dataset.
```python
from census_data_downloader.tables import MedianHouseholdIncomeBlackDownloader

downloader = MedianHouseholdIncomeBlackDownloader('<your API key>')
downloader.download_states()
```
You can specify the data directory and the years by passing in the `data_dir` and `years` keyword arguments.

```python
downloader = MedianHouseholdIncomeBlackDownloader(
    '<your API key>',
    data_dir='./',
    years=2016
)
downloader.download_states()
```
A gallery of graphics powered by our data is available on Observable.
The Los Angeles Times used this library for an analysis of Census undercounts on Native American reservations. The code that powers it is available as an open-source computational notebook.
Subclass our downloader and provide it with its required inputs.
```python
import collections
from census_data_downloader.core.tables import BaseTableConfig
from census_data_downloader.core.decorators import register


@register
class MedianHouseholdIncomeDownloader(BaseTableConfig):
    PROCESSED_TABLE_NAME = "medianhouseholdincome"  # Your humanized table name
    UNIVERSE = "households"  # The universe value for this table
    RAW_TABLE_NAME = 'B19013'  # The id of the source table
    # A crosswalk between the raw field name and our humanized field name
    RAW_FIELD_CROSSWALK = collections.OrderedDict({
        "001": "median"
    })
```
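The crosswalk is what turns the Census Bureau's opaque column codes into readable names during processing. As a standalone illustration of that idea (not the library's internal implementation), here is roughly what the rename amounts to; the `humanize` helper and toy row are made up for this sketch:

```python
import collections

# The crosswalk maps raw field suffixes to humanized names, as in the
# MedianHouseholdIncomeDownloader example above.
RAW_FIELD_CROSSWALK = collections.OrderedDict({"001": "median"})

# A toy raw API row for table B19013; the "E" suffix marks an estimate column.
raw_row = {"B19013_001E": 61015, "state": "06"}

def humanize(row, table_id, crosswalk):
    """Rename raw estimate fields to their humanized names, leaving
    other keys (like geography identifiers) untouched."""
    renames = {f"{table_id}_{suffix}E": name for suffix, name in crosswalk.items()}
    return {renames.get(key, key): value for key, value in row.items()}

humanized = humanize(raw_row, "B19013", RAW_FIELD_CROSSWALK)
# humanized == {"median": 61015, "state": "06"}
```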
Add it to the imports in the `__init__.py` file and it's good to go.
The command-line interface is implemented using Click and setuptools. To install it locally for development inside your virtual environment, run the following installation command, as prescribed by the Click documentation.
```bash
$ pip install --editable .
```
That's it. If you make some good ones, please consider submitting them as pull requests so everyone can benefit.
Bumps ipython from 8.7.0 to 8.10.0.
- `15ea1ed` release 8.10.0
- `560ad10` DOC: Update what's new for 8.10 (#13939)
- `7557ade` DOC: Update what's new for 8.10
- `385d693` Merge pull request from GHSA-29gw-9793-fvw7
- `e548ee2` Swallow potential exceptions from showtraceback() (#13934)
- `0694b08` MAINT: mock slowest test. (#13885)
- `8655912` MAINT: mock slowest test.
- `a011765` Isolate the attack tests with setUp and tearDown methods
- `c7a9470` Add some regression tests for this change
- `fd34cf5` Swallow potential exceptions from showtraceback()

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Bumps cryptography from 38.0.4 to 39.0.1.
Sourced from cryptography's changelog.
39.0.1 - 2023-02-07
- **SECURITY ISSUE** - Fixed a bug where `Cipher.update_into` accepted Python buffer protocol objects, but allowed immutable buffers. **CVE-2023-23931**
- Updated Windows, macOS, and Linux wheels to be compiled with OpenSSL 3.0.8.
39.0.0 - 2023-01-01
- BACKWARDS INCOMPATIBLE: Support for OpenSSL 1.1.0 has been removed. Users on older versions of OpenSSL will need to upgrade.
- BACKWARDS INCOMPATIBLE: Dropped support for LibreSSL < 3.5. The new minimum LibreSSL version is 3.5.0. Going forward our policy is to support versions of LibreSSL that are available in versions of OpenBSD that are still receiving security support.
- BACKWARDS INCOMPATIBLE: Removed the `encode_point` and `from_encoded_point` methods on `cryptography.hazmat.primitives.asymmetric.ec.EllipticCurvePublicNumbers`, which had been deprecated for several years. `EllipticCurvePublicKey.public_bytes` and `EllipticCurvePublicKey.from_encoded_point` should be used instead.
- BACKWARDS INCOMPATIBLE: Support for using MD5 or SHA1 in `cryptography.x509.CertificateBuilder`, other X.509 builders, and PKCS7 has been removed.
- BACKWARDS INCOMPATIBLE: Dropped support for macOS 10.10 and 10.11; macOS users must upgrade to 10.12 or newer.
- ANNOUNCEMENT: The next version of cryptography (40.0) will change the way we link OpenSSL. This will only impact users who build cryptography from source (i.e., not from a wheel) and specify their own version of OpenSSL. For those users, the `CFLAGS`, `LDFLAGS`, `INCLUDE`, `LIB`, and `CRYPTOGRAPHY_SUPPRESS_LINK_FLAGS` environment variables will no longer be respected. Instead, users will need to configure their builds as documented here.
- Added support for disabling the legacy provider in OpenSSL 3.0.x.
- Added support for disabling RSA key validation checks when loading RSA keys via `load_pem_private_key`, `load_der_private_key`, and `RSAPrivateNumbers.private_key`. This speeds up key loading but is unsafe if you are loading potentially attacker-supplied keys.
- Significantly improved performance for `ChaCha20Poly1305`

... (truncated)
- `d6951dc` changelog + security fix backport (#8231)
- `138da90` workaround scapy bug in downstream tests (#8218) (#8228)
- `69527bc` bookworm is py311 now (#8200)
- `111deef` backport main branch CI to 39.0.x (#8153)
- `338a65a` 39.0.0 version bump (#7954)
- `84a3cd7` automatically download and upload circleci wheels (#7949)
- `525c0b3` Type annotate release.py (#7951)
- `46d2a94` Use the latest 3.10 release when wheel building (#7953)
- `f150dc1` fix CI to work with ubuntu 22.04 (#7950)
- `8867724` fix README for python3 (#7947)
Bumps certifi from 2022.9.24 to 2022.12.7.
- `9e9e840` 2022.12.07
This is somewhat related to #2.
I find this project to be extremely useful and a great framework for a task that I have to do often. In my projects, I've found myself using the base classes and concepts from this project when I want to download and process data from other Census Bureau API sources.
However, for non-ACS sources, I find myself entirely reimplementing many of the methods on my geotype downloader classes because the changes in functionality aren't possible by just calling `super()` and then adding additional logic.
I think adding these methods to `BaseGeoTypeDownloader` could make adding additional data sources easier, both in this project and for other users in their own projects:
- `BaseGeoTypeDownloader.get_api_client()`: This would be called from the constructor to set `self.api` and allow subclasses to specify a customized subclass of `census.Census` that supports additional API endpoints.
- `BaseGeoTypeDownloader.get_field_type_map()`: This would be similar to `BaseGeoTypeDownloader.get_raw_field_map()` except it would map from raw field names to types that would be passed to `pd.Series.astype()`. Like `get_raw_field_map()`, this would be called from `BaseGeoTypeDownloader.process()` when setting the column types after reading in the raw table. The implementation could check for the existence of a `FIELD_TYPES` attribute on the table configuration class, and if that doesn't exist, default to the existing logic for ACS tables that checks the field name suffix. Adding the ability to explicitly set type conversions allows supporting non-ACS tables that might have field names that don't follow the same suffix convention as ACS tables.
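A hypothetical sketch of how the proposed `get_field_type_map()` fallback could work. Every name here (`get_field_type_map`, `FIELD_TYPES`, `FakeConfig`) mirrors the proposal above and is not part of the current library; the suffix rule is a simplified stand-in for the existing ACS logic:

```python
def get_field_type_map(config, raw_field_map):
    """Map raw field names to dtypes for pd.Series.astype().

    `config` stands in for a table configuration class; `raw_field_map`
    maps raw field names to humanized names, as in get_raw_field_map().
    """
    explicit = getattr(config, "FIELD_TYPES", None)
    if explicit is not None:
        # Non-ACS tables can declare their type conversions directly.
        return dict(explicit)
    # Default: the ACS suffix convention, where estimate ("E") and
    # margin-of-error ("M") columns are numeric and everything else is text.
    return {
        field: (float if field.endswith(("E", "M")) else str)
        for field in raw_field_map
    }

class FakeConfig:
    FIELD_TYPES = {"POP": int}
```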