py-lexml-acervo
é um wrapper para a API de consulta do acervo do projeto LexML.
Trata-se de um portal especializado em informação jurídica e legislativa. Pretende-se reunir leis, decretos, acórdãos, súmulas, projetos de leis entre outros documentos das esferas federal, estadual e municipal dos Poderes Executivo, Legislativo e Judiciário de todo o Brasil: uma rede de informação legislativa e jurídica que pretende organizar, integrar e dar acesso às informações disponibilizadas nos diversos portais de órgãos do governo na Internet. @Fonte.
A API do LexML permite realizar pesquisas por meio de URLs e receber o resultado no formato XML. A API segue o padrão definido pelo Biblioteca do Congresso Norte-americano no projeto Search/Retrieval via URL (SRU).
Entre os dias 12 a 13 de agosto de 2019 por meio da presente API foram coletados os metadados do acervo do LexML de publicações datadas entre os anos de 1556 a 2019. O referido dataset está disponível na plataforma Kaggle. Para acessá-lo basta clicar na imagem abaixo. O dataset possui aproximadamente 2GB compactado, e descompacto em torno de 10GB.
<tipoDocumento>
<facet-tipoDocumento>
<date>
<urn>
<localidade>
<facet-localidade>
<autoridade>
<facet-autoridade>
<title>
<description>
<subject>
<type>
<identifier>
xml
<srw:recordData>
<srw_dc:dc xsi:schemaLocation="info:srw/schema/1/dc-schema http://www.loc.gov/z3950/agency/zing/srw/dc-schema.xsd">
<tipoDocumento>Lei</tipoDocumento>
<facet-tipoDocumento>Legislação::Lei</facet-tipoDocumento>
<dc:date>1983-12-05</dc:date>
<urn>urn:lex:br;rondonia:estadual:lei:1983-12-05;12</urn>
<localidade>Rondônia</localidade>
<facet-localidade>Estados::Rondônia</facet-localidade>
<autoridade>Estadual</autoridade>
<facet-autoridade>Estadual</facet-autoridade>
<dc:title>Lei ordinária nº 12, de 05 de Dezembro de 1983</dc:title>
<dc:description>ABRE CRÉDITO SUPLEMENTAR NO ORÇAMENTO VIGENTE.</dc:description>
<dc:subject>CRÉDITO SUPLEMENTAR</dc:subject>
<dc:type>html</dc:type>
<dc:identifier>000349953</dc:identifier>
</srw_dc:dc>
</srw:recordData>
git clone https://github.com/netoferraz/py-lexml-acervo.git
cd py-lexml-acervo
python -m venv .venv
source ./.venv/bin/activate
pip install -r requirements.txt
As consultas à API se baseiam na classe LexmlAcervo
localizada em ./apis/acervo
. Para realizar uma consulta se deve instanciar a referida classe passando como parâmetro de inicialização uma query string
no padrão CQL.
```python
consulta_acervo_2019 = LexmlAcervo("date=2019")
``
A efetiva consulta à API ocorre pela chamada de um dos métodos a seguir:
queryou
automatic_pagination. O primeiro é um chamado manual à API passando os parâmetros
startRecorde
maximumRecordsPerPageque são respectivamente: a posição de início no set de resultados retornado pela API gerado pela
query_string, enquanto o segundo faz uso dos mesmos parâmetros realizando um esgotamento da resposta associada a
query_string. Portanto, ao se utilizar o método
automatic_pagination` ele fará a paginação automática de todo o resultado associado à consulta.
```python
urn_decreto_2018 = LexmlAcervo("urn any decreto and date any 2018")
tree, start_record_next_query, maximumRecordsPerPage = urn_decreto_2018.query(startRecord=1, maximumRecordsPerPage=100) ```
```python
urn_decreto_2018 = LexmlAcervo("urn any decreto and date any 2018")
urn_decreto_2018.automatic_pagination(startRecord=1, maximumRecordsPerPage=100) ```
Cada página do set de resultados da query string
será armazenado em um arquivo XML
distinto. Para
realizar a persistência dos resultados deve-se invocar o método saveResults
da instância da classe LexmlAcervo
. O referido método possui dois parâmetros obrigatórios: path
e filename
. O primeiro é o diretório onde se deseja salvar os arquivos e o segundo é o nome base dos arquivos.
python
urn_decreto_2018.saveResults("./data/xml/decreto/2018", "2018")
A classe XmlToJson
oferece uma interface para converter os arquivos XML em JSON.
```python
year = 2018
xmlpath = Path(f"./data/xml/decreto/{year}").glob("*.xml")
jsonFiles = [XmlToJson(xmlfile).parseToJson() for xmlfile in xmlpath]
XmlToJson.saveResults(jsonFiles, f"./data/json/{year}", f"{year}") ```
Bumps certifi from 2019.6.16 to 2022.12.7.
9e9e840
2022.12.07b81bdb2
2022.09.24939a28f
2022.09.14aca828a
2022.06.15.2de0eae1
Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...b8eb5e9
2022.06.15.147fb7ab
Fix deprecation warning on Python 3.11 (#199)b0b48e0
fixes #198 -- update link in license9d514b4
2022.06.154151e88
Add py.typed to MANIFEST.in to package in sdist (#196)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps numpy from 1.18.4 to 1.22.0.
Sourced from numpy's releases.
v1.22.0
NumPy 1.22.0 Release Notes
NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:
- Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
- A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
- NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
- New methods for
quantile
,percentile
, and related functions. The new methods provide a complete set of the methods commonly found in the literature.- A new configurable allocator for use by downstream projects.
These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.
Expired deprecations
Deprecated numeric style dtype strings have been removed
Using the strings
"Bytes0"
,"Datetime64"
,"Str0"
,"Uint32"
, and"Uint64"
as a dtype will now raise aTypeError
.(gh-19539)
Expired deprecations for
loads
,ndfromtxt
, andmafromtxt
in npyio
numpy.loads
was deprecated in v1.15, with the recommendation that users usepickle.loads
instead.ndfromtxt
andmafromtxt
were both deprecated in v1.17 - users should usenumpy.genfromtxt
instead with the appropriate value for theusemask
parameter.(gh-19615)
... (truncated)
4adc87d
Merge pull request #20685 from charris/prepare-for-1.22.0-releasefd66547
REL: Prepare for the NumPy 1.22.0 release.125304b
wipc283859
Merge pull request #20682 from charris/backport-204165399c03
Merge pull request #20681 from charris/backport-20954f9c45f8
Merge pull request #20680 from charris/backport-20663794b36f
Update armccompiler.pyd93b14e
Update test_public_api.py7662c07
Update init.py311ab52
Update armccompiler.pyDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps ipython from 7.7.0 to 7.16.3.
d43c7c7
release 7.16.35fa1e40
Merge pull request from GHSA-pq7m-3gw7-gq5x8df8971
back to dev9f477b7
release 7.16.2138f266
bring back release helper from master branch5aa3634
Merge pull request #13341 from meeseeksmachine/auto-backport-of-pr-13335-on-7...bcae8e0
Backport PR #13335: What's new 7.16.28fcdcd3
Pin Jedi to <0.17.2.2486838
release 7.16.120bdc6f
fix conda buildDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps urllib3 from 1.25.3 to 1.26.5.
Sourced from urllib3's releases.
1.26.5
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed deprecation warnings emitted in Python 3.10.
- Updated vendored
six
library to 1.16.0.- Improved performance of URL parser when splitting the authority component.
If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors
1.26.4
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Changed behavior of the default
SSLContext
when connecting to HTTPS proxy during HTTPS requests. The defaultSSLContext
now setscheck_hostname=True
.If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors
1.26.3
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
Fixed bytes and string comparison issue with headers (Pull #2141)
Changed
ProxySchemeUnknown
error message to be more actionable if the user supplies a proxy URL without a scheme (Pull #2107)If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors
1.26.2
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed an issue where
wrap_socket
andCERT_REQUIRED
wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)1.26.1
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
- Fixed an issue where two
User-Agent
headers would be sent if aUser-Agent
header key is passed asbytes
(Pull #2047)1.26.0
:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap
Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)
Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning should opt-in explicitly by setting
ssl_version=ssl.PROTOCOL_TLSv1_1
(Pull #2002) Starting in urllib3 v2.0: Connections that receive aDeprecationWarning
will failDeprecated
Retry
optionsRetry.DEFAULT_METHOD_WHITELIST
,Retry.DEFAULT_REDIRECT_HEADERS_BLACKLIST
andRetry(method_whitelist=...)
in favor ofRetry.DEFAULT_ALLOWED_METHODS
,Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT
, andRetry(allowed_methods=...)
(Pull #2000) Starting in urllib3 v2.0: Deprecated options will be removed
... (truncated)
Sourced from urllib3's changelog.
1.26.5 (2021-05-26)
- Fixed deprecation warnings emitted in Python 3.10.
- Updated vendored
six
library to 1.16.0.- Improved performance of URL parser when splitting the authority component.
1.26.4 (2021-03-15)
- Changed behavior of the default
SSLContext
when connecting to HTTPS proxy during HTTPS requests. The defaultSSLContext
now setscheck_hostname=True
.1.26.3 (2021-01-26)
Fixed bytes and string comparison issue with headers (Pull #2141)
Changed
ProxySchemeUnknown
error message to be more actionable if the user supplies a proxy URL without a scheme. (Pull #2107)1.26.2 (2020-11-12)
- Fixed an issue where
wrap_socket
andCERT_REQUIRED
wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)1.26.1 (2020-11-11)
- Fixed an issue where two
User-Agent
headers would be sent if aUser-Agent
header key is passed asbytes
(Pull #2047)1.26.0 (2020-11-10)
NOTE: urllib3 v2.0 will drop support for Python 2.
Read more in the v2.0 Roadmap <https://urllib3.readthedocs.io/en/latest/v2-roadmap.html>
_.Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)
Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning
... (truncated)
d161647
Release 1.26.52d4a3fe
Improve performance of sub-authority splitting in URL2698537
Update vendored six to 1.16.007bed79
Fix deprecation warnings for Python 3.10 ssl moduled725a9b
Add Python 3.10 to GitHub Actions339ad34
Use pytest==6.2.4 on Python 3.10+f271c9c
Apply latest Black formatting1884878
[1.26] Properly proxy EOF on the SSLTransport test suitea891304
Release 1.26.48d65ea1
Merge pull request from GHSA-5phf-pp7p-vc2rDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps pygments from 2.4.2 to 2.7.4.
Sourced from pygments's releases.
2.7.4
Updated lexers:
Fix infinite loop in SML lexer (#1625)
Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)
Limit recursion with nesting Ruby heredocs (#1638)
Fix a few inefficient regexes for guessing lexers
Fix the raw token lexer handling of Unicode (#1616)
Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!
Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)
Fix incorrect MATLAB example (#1582)
Thanks to Google's OSS-Fuzz project for finding many of these bugs.
2.7.3
... (truncated)
Sourced from pygments's changelog.
Version 2.7.4
(released January 12, 2021)
Updated lexers:
Fix infinite loop in SML lexer (#1625)
Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)
Limit recursion with nesting Ruby heredocs (#1638)
Fix a few inefficient regexes for guessing lexers
Fix the raw token lexer handling of Unicode (#1616)
Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!
Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)
Fix incorrect MATLAB example (#1582)
Thanks to Google's OSS-Fuzz project for finding many of these bugs.
Version 2.7.3
(released December 6, 2020)
... (truncated)
4d555d0
Bump version to 2.7.4.fc3b05d
Update CHANGES.ad21935
Revert "Added dracula theme style (#1636)"e411506
Prepare for 2.7.4 release.275e34d
doc: remove Perl 6 ref2e7e8c4
Fix several exponential/cubic complexity regexes found by Ben Caller/Doyenseceb39c43
xquery: fix pop from empty stack2738778
fix coding style in test_analyzer_lexer02e0f09
Added 'ERROR STOP' to fortran.py keywords. (#1665)c83fe48
support added for css variables (#1633)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Olá, tudo certo? Estou implementando um sistema para buscar os documentos jurídicos do LexML, mas gostaria de poder ordenar os resultados por data. Saberia me dizer como funciona? Tentei procurar na documentação do CQL e do SRU, mas não encontrei algo que funcionasse.
PS.: ótimo projeto este seu!