A Flask+Elasticsearch UI for exploring the DC Inbox dataset from http://web.stevens.edu/dcinbox/Home.html
See it in action at https://dcinbox.herokuapp.com/
DC Inbox is a collection of over 60,000 official e-newsletters sent from members of the U.S. Congress to their constituents, established by Dr. Lindsey Cormack at Stevens Institute of Technology.
This repository contains a Flask app which uses Elasticsearch to provide a faceted search interface to the collection, along with some scripts to import the dataset into Elasticsearch.
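In Elasticsearch, a faceted search is usually a single request that combines a full-text query with terms aggregations. Each aggregation returns the value counts shown next to one facet. The sketch below builds such a request body as a plain dict. It is not taken from this repository, and the field names (body, party, gender) are assumptions about how the newsletters might be indexed.

```python
import json
from typing import Dict, List, Optional


def build_faceted_query(text: str, facet_fields: List[str],
                        selected: Optional[Dict[str, str]] = None,
                        size: int = 10) -> dict:
    """Build a search body: full-text match, facet filters, facet counts."""
    selected = selected or {}
    # "body" is a hypothetical name for the newsletter text field.
    must = [{"match": {"body": text}}] if text else [{"match_all": {}}]
    # Each facet value the user has clicked narrows the results.
    filters = [{"term": {field: value}} for field, value in selected.items()]
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        # One terms aggregation per facet returns the value counts for the UI.
        "aggs": {field: {"terms": {"field": field}} for field in facet_fields},
    }


body = build_faceted_query("veterans", ["party", "gender"], {"party": "Democrat"})
print(json.dumps(body, indent=2))
```

Here the facet counts are computed over the already-filtered results. If the UI should keep showing counts for values that are not selected, Elasticsearch's post_filter is the usual alternative to putting the selections in the bool filter.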
You'll need an Elasticsearch server. The easiest way to get one is to download Elasticsearch 5 from https://www.elastic.co/downloads/elasticsearch, extract the archive, and run it locally using bin/elasticsearch.
I recommend setting up a Python virtual environment. Assuming you have virtualenv installed, you can do so like this:
cd dcinbox_explorer
virtualenv venv
source venv/bin/activate
Then install the dependencies:
pip install -r requirements.txt
Next, initialize the Elasticsearch index:
ELASTICSEARCH_HOST='localhost' \
ELASTICSEARCH_PORT=9200 \
ELASTICSEARCH_AUTH='' \
venv/bin/python script_create_index.py
Now you can run the development server using:
ELASTICSEARCH_HOST='localhost' \
ELASTICSEARCH_PORT=9200 \
ELASTICSEARCH_AUTH='' \
FLASK_APP=dcinbox.py \
FLASK_DEBUG=1 \
venv/bin/flask run
The site should be available at http://localhost:5000/
To import the data, first download the dataset from http://web.stevens.edu/dcinbox/dataset.json (around 250MB).
Then run the following:
ELASTICSEARCH_HOST='localhost' \
ELASTICSEARCH_PORT=9200 \
ELASTICSEARCH_AUTH='' \
venv/bin/python importer.py dataset.json
The importer script can accept a URL instead of a filename.
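The importer's internals aren't shown here. As a rough sketch, and assuming dataset.json is a single JSON array of newsletter objects, loading from either a path or a URL and batching documents for Elasticsearch's _bulk endpoint could look like this. The index name dcinbox, the mapping type newsletter, the record fields and the chunk size are all hypothetical.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator, List
from urllib.parse import urlparse
from urllib.request import urlopen


def load_records(source: str) -> list:
    """Load the dataset from a local path or an http(s) URL.

    Assumes the file holds one JSON array of newsletter objects.
    """
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as resp:
            return json.load(resp)  # json accepts the bytes returned by resp.read()
    with open(source, encoding="utf-8") as fh:
        return json.load(fh)


def bulk_bodies(records: Iterable[dict], index: str, doc_type: str,
                chunk_size: int = 500) -> Iterator[str]:
    """Yield newline-delimited _bulk request bodies of up to chunk_size docs."""
    lines: List[str] = []
    for i, record in enumerate(records, start=1):
        # Elasticsearch 5 still expects a mapping type in each action line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(record))
        if i % chunk_size == 0:
            yield "\n".join(lines) + "\n"  # a _bulk body must end with a newline
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


# Demo with a small local file standing in for dataset.json (hypothetical records).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([{"subject": "Hello"}, {"subject": "Town hall"}, {"subject": "Update"}], fh)
    records = load_records(path)

chunks = list(bulk_bodies(records, "dcinbox", "newsletter", chunk_size=2))
print(len(chunks))
```

json.load reads the whole file into memory, which is simple but costly for a dataset of around 250MB; a streaming JSON parser would be the natural next step if memory becomes a problem.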
The app is ready to host on Heroku; the necessary Procfile is already in place. You'll need to specify the Elasticsearch server location using environment variables like so:
heroku config:set ELASTICSEARCH_HOST=blah.us-east-1.aws.found.io
heroku config:set ELASTICSEARCH_PORT=9243
heroku config:set ELASTICSEARCH_AUTH='user:password'
heroku config:set ELASTICSEARCH_USE_SSL=1
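The four ELASTICSEARCH_* variables above make up the whole connection configuration. A minimal sketch of turning them into connection settings follows. The defaults, treating ELASTICSEARCH_USE_SSL=1 as "use HTTPS", and the shape of the returned dict are assumptions, not the app's actual code.

```python
import os
from typing import Mapping, Optional


def es_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Turn the ELASTICSEARCH_* variables into connection settings.

    Defaults mirror the local setup in this README (assumption).
    """
    env = os.environ if env is None else env
    host = env.get("ELASTICSEARCH_HOST", "localhost")
    port = int(env.get("ELASTICSEARCH_PORT", "9200"))
    auth = env.get("ELASTICSEARCH_AUTH", "")
    # Assumption: any value other than empty or "0" means "use HTTPS".
    use_ssl = env.get("ELASTICSEARCH_USE_SSL", "") not in ("", "0")
    scheme = "https" if use_ssl else "http"
    return {
        "url": f"{scheme}://{host}:{port}/",
        # 'user:password' becomes ('user', 'password'); an empty value means no auth.
        "http_auth": tuple(auth.split(":", 1)) if auth else None,
    }


heroku = es_settings_from_env({
    "ELASTICSEARCH_HOST": "blah.us-east-1.aws.found.io",
    "ELASTICSEARCH_PORT": "9243",
    "ELASTICSEARCH_AUTH": "user:password",
    "ELASTICSEARCH_USE_SSL": "1",
})
print(heroku["url"])  # https://blah.us-east-1.aws.found.io:9243/
```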
Once you have deployed to Heroku, you can create the index like so:
heroku run python script_create_index.py
You can even kick off a full dataset import like so:
heroku run python importer.py http://web.stevens.edu/dcinbox/dataset.json
Bumps ipython from 5.1.0 to 7.16.3.
d43c7c7 release 7.16.3
5fa1e40 Merge pull request from GHSA-pq7m-3gw7-gq5x
8df8971 back to dev
9f477b7 release 7.16.2
138f266 bring back release helper from master branch
5aa3634 Merge pull request #13341 from meeseeksmachine/auto-backport-of-pr-13335-on-7...
bcae8e0 Backport PR #13335: What's new 7.16.2
8fcdcd3 Pin Jedi to <0.17.2.
2486838 release 7.16.1
20bdc6f fix conda build
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Bumps urllib3 from 1.19 to 1.26.5.
Sourced from urllib3's releases.
1.26.5
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed deprecation warnings emitted in Python 3.10.
- Updated vendored six library to 1.16.0.
- Improved performance of URL parser when splitting the authority component.
If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors
1.26.4
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.
1.26.3
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed bytes and string comparison issue with headers (Pull #2141)
- Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme (Pull #2107)
1.26.2
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)
1.26.1
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)
1.26.0
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)
- Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning should opt-in explicitly by setting ssl_version=ssl.PROTOCOL_TLSv1_1 (Pull #2002). Starting in urllib3 v2.0: Connections that receive a DeprecationWarning will fail.
- Deprecated Retry options Retry.DEFAULT_METHOD_WHITELIST, Retry.DEFAULT_REDIRECT_HEADERS_BLACKLIST and Retry(method_whitelist=...) in favor of Retry.DEFAULT_ALLOWED_METHODS, Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT, and Retry(allowed_methods=...) (Pull #2000). Starting in urllib3 v2.0: Deprecated options will be removed.
... (truncated)
Sourced from urllib3's changelog.
1.26.5 (2021-05-26)
- Fixed deprecation warnings emitted in Python 3.10.
- Updated vendored six library to 1.16.0.
- Improved performance of URL parser when splitting the authority component.
1.26.4 (2021-03-15)
- Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.
1.26.3 (2021-01-26)
- Fixed bytes and string comparison issue with headers (Pull #2141)
- Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme. (Pull #2107)
1.26.2 (2020-11-12)
- Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)
1.26.1 (2020-11-11)
- Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)
1.26.0 (2020-11-10)
NOTE: urllib3 v2.0 will drop support for Python 2. Read more in the v2.0 Roadmap (https://urllib3.readthedocs.io/en/latest/v2-roadmap.html).
- Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)
- Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning
... (truncated)
d161647 Release 1.26.5
2d4a3fe Improve performance of sub-authority splitting in URL
2698537 Update vendored six to 1.16.0
07bed79 Fix deprecation warnings for Python 3.10 ssl module
d725a9b Add Python 3.10 to GitHub Actions
339ad34 Use pytest==6.2.4 on Python 3.10+
f271c9c Apply latest Black formatting
1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
a891304 Release 1.26.4
8d65ea1 Merge pull request from GHSA-5phf-pp7p-vc2r
Bumps pygments from 2.1.3 to 2.7.4.
Sourced from pygments's releases.
2.7.4
Updated lexers:
Fix infinite loop in SML lexer (#1625)
Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)
Limit recursion with nesting Ruby heredocs (#1638)
Fix a few inefficient regexes for guessing lexers
Fix the raw token lexer handling of Unicode (#1616)
Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!
Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)
Fix incorrect MATLAB example (#1582)
Thanks to Google's OSS-Fuzz project for finding many of these bugs.
2.7.3
... (truncated)
Sourced from pygments's changelog.
Version 2.7.4
(released January 12, 2021)
Updated lexers:
Fix infinite loop in SML lexer (#1625)
Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)
Limit recursion with nesting Ruby heredocs (#1638)
Fix a few inefficient regexes for guessing lexers
Fix the raw token lexer handling of Unicode (#1616)
Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!
Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)
Fix incorrect MATLAB example (#1582)
Thanks to Google's OSS-Fuzz project for finding many of these bugs.
Version 2.7.3
(released December 6, 2020)
... (truncated)
4d555d0 Bump version to 2.7.4.
fc3b05d Update CHANGES.
ad21935 Revert "Added dracula theme style (#1636)"
e411506 Prepare for 2.7.4 release.
275e34d doc: remove Perl 6 ref
2e7e8c4 Fix several exponential/cubic complexity regexes found by Ben Caller/Doyensec
eb39c43 xquery: fix pop from empty stack
2738778 fix coding style in test_analyzer_lexer
02e0f09 Added 'ERROR STOP' to fortran.py keywords. (#1665)
c83fe48 support added for css variables (#1633)
Bumps jinja2 from 2.8 to 2.11.3.
Sourced from jinja2's releases.
2.11.3
This contains a fix for a speed issue with the urlize filter. urlize is likely to be called on untrusted user input. For certain inputs some of the regular expressions used to parse the text could take a very long time due to backtracking. As part of the fix, the email matching became slightly stricter. The various speedups apply to urlize in general, not just the specific input cases.
- PyPI: https://pypi.org/project/Jinja2/2.11.3/
- Changes: https://jinja.palletsprojects.com/en/2.11.x/changelog/#version-2-11-3
2.11.2
2.11.1
This fixes an issue in async environment when indexing the result of an attribute lookup, like {{ data.items[1:] }}.
2.11.0
- Changes: https://jinja.palletsprojects.com/en/2.11.x/changelog/#version-2-11-0
- Blog: https://palletsprojects.com/blog/jinja-2-11-0-released/
- Twitter: https://twitter.com/PalletsTeam/status/1221883554537230336
This is the last version to support Python 2.7 and 3.5. The next version will be Jinja 3.0 and will support Python 3.6 and newer.
2.10.3
2.10.2
2.10.1
- Changes: https://jinja.palletsprojects.com/en/2.10.x/changelog/#version-2-10-1
- Blog: https://palletsprojects.com/blog/jinja-2-10-1-released/
- Twitter: https://twitter.com/PalletsTeam/status/1114605127308992513
2.10
Primary changes
- A NativeEnvironment that renders Python types instead of strings. http://jinja.pocoo.org/docs/2.10/nativetypes/
- A namespace object that works with {% set %}. This replaces previous hacks for storing state across iterations or scopes. http://jinja.pocoo.org/docs/2.10/templates/#assignments
- The loop object now has nextitem and previtem attributes, as well as a changed method, for the common case of outputting something as a value in the loop changes. More complicated cases can use the namespace object. http://jinja.pocoo.org/docs/2.10/templates/#for
Install or upgrade
Install from PyPI with pip:
... (truncated)
Sourced from jinja2's changelog.
Version 2.11.3
Released 2021-01-31
- Improve the speed of the urlize filter by reducing regex backtracking. Email matching requires a word character at the start of the domain part, and only word characters in the TLD. :pr:1343
Version 2.11.2
Released 2020-04-13
- Fix a bug that caused callable objects with __getattr__, like :class:~unittest.mock.Mock to be treated as a :func:contextfunction. :issue:1145
- Update wordcount filter to trigger :class:Undefined methods by wrapping the input in :func:soft_str. :pr:1160
- Fix a hang when displaying tracebacks on Python 32-bit. :issue:1162
- Showing an undefined error for an object that raises AttributeError on access doesn't cause a recursion error. :issue:1177
- Revert changes to :class:~loaders.PackageLoader from 2.10 which removed the dependency on setuptools and pkg_resources, and added limited support for namespace packages. The changes caused issues when using Pytest. Due to the difficulty in supporting Python 2 and :pep:451 simultaneously, the changes are reverted until 3.0. :pr:1182
- Fix line numbers in error messages when newlines are stripped. :pr:1178
- The special namespace() assignment object in templates works in async environments. :issue:1180
- Fix whitespace being removed before tags in the middle of lines when lstrip_blocks is enabled. :issue:1138
- :class:~nativetypes.NativeEnvironment doesn't evaluate intermediate strings during rendering. This prevents early evaluation which could change the value of an expression. :issue:1186
Version 2.11.1
Released 2020-01-30
- Fix a bug that prevented looking up a key after an attribute ({{ data.items[1:] }}) in an async template. :issue:1141
... (truncated)
cf21539 release version 2.11.3
15ef8f0 Merge pull request #1343 from pallets/urlize-speedup
ef658dc speed up urlize matching
eeca0fe Merge pull request #1207 from mhansen/patch-1
2dd7691 Merge pull request #1209 from mhansen/patch-3
4892940 do_dictsort: update example ready to copy/paste
7db7d33 api.rst: bugfix in docs, import PackageLoader
9ec465b fix changelog header
737a4cd release version 2.11.2
179df6b Merge pull request #1190 from pallets/native-eval
Bumps flask from 0.11.1 to 1.0.
I happened upon these installation instructions and thought "this is what Docker is for". ;)
Getting this error when running it, but it looks unrelated:
Fielddata is disabled on text fields by default. Set fielddata=true on [gender] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.
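That error typically appears when a terms aggregation (a facet) runs on a field mapped as text. A hedged sketch of an Elasticsearch 5-style index body that maps facet fields as keyword instead is below. Apart from gender, which comes from the error message, the field names and the newsletter mapping type are hypothetical. An existing field's type can't be changed in place, so the index would need to be recreated with such a mapping before importing.

```python
import json
from typing import Iterable


def build_index_body(keyword_fields: Iterable[str], text_fields: Iterable[str],
                     doc_type: str = "newsletter") -> dict:
    """Build an Elasticsearch 5.x create-index body with explicit field types."""
    properties = {}
    for field in keyword_fields:
        # keyword fields can back terms aggregations (facets) without fielddata.
        properties[field] = {"type": "keyword"}
    for field in text_fields:
        # text fields are analyzed for full-text search; aggregating on them
        # requires fielddata=true, which is what the error message suggests.
        properties[field] = {"type": "text"}
    # In Elasticsearch 5, properties sit under a mapping type name.
    return {"mappings": {doc_type: {"properties": properties}}}


body = build_index_body(["gender", "party"], ["subject", "body"])
print(json.dumps(body, indent=2))
```

With Elasticsearch's default dynamic mapping, string fields also get a .keyword sub-field, so aggregating on gender.keyword is another way to avoid enabling fielddata.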
(A note about the implementation: the --build flag to docker-compose up isn't essential, but ensures that the development environment is always up to date by doing pip install if requirements.txt has changed. Really, you only need to use the --build flag if requirements.txt changes.)