A server for multilanguage, composable NLP API in Python

andreaferretti, updated πŸ•₯ 2022-12-08 07:27:36

1. Charade

A server for multilanguage, composable NLP API in Python.

1.1. Philosophy

Charade was born as a container where multiple independent natural language services can coexist and interact with each other. In order to develop on Charade, it may be useful to understand the reasons behind its implementation.

  • multiple analyses can be run over a single text - for instance named entity recognition and sentiment detection - so a request from a user should be able to specify what kind of tasks should be performed on the provided text
  • to avoid repeating work and ensure consistency, one task may depend on another: for instance, if both NER and sentiment analysis rely on the same parsing stage, they will see the same tokens, something that would not be guaranteed if the two analyses performed tokenization internally
  • a single task can have many coexisting implementations, so that a developer is free to experiment with new models without interfering with existing ones. The user consuming the service can then request a particular implementation of a task by specifying its name
  • multiple implementations of a single task should offer a consistent interface, in order to ensure that clients or other downstream services can switch between them freely
  • the server should not be restricted to a single (natural) language, and various implementations should be free to decide what languages to support
  • developers implementing various models should be able to choose freely what technology to use, so various services can be implemented on top of NLTK, spaCy, pyTorch, TensorFlow, GenSim... Charade should make it easy to use any of these libraries to implement a particular model, without forcing other developers to adopt the same library
  • one should be able to implement as many tasks and models as desired, while choosing at deploy time which ones are supported by the server - i.e. the server should be composable from Lego pieces

Therefore, the process of deploying Charade servers works as follows. The developers write various models to perform some tasks, possibly trying competing implementations in parallel. Various kinds of models are already provided with Charade, but you should not shy away from writing your own.

Once the models are ready, one writes an entry point script that loads only the ones that will be used in production. At every point of the process, an API offering the existing models is available, together with a user interface to try them.

1.2. What Charade is and is not

Charade is a framework that helps teams experiment with multiple approaches to tackle custom NLP tasks. It is meant to leverage existing NLP libraries, such as NLTK or spaCy, not to replace them. A team using Charade can develop and evolve a suite of NLP capabilities - say NER, sentiment analysis and so on - while maintaining the possibility to customize them on particular datasets, and compose servers where only the relevant capabilities are deployed.

Charade is not itself a library for NLP tasks, although it provides some examples of models developed using various libraries. It is not a ready-made component either: while some of the models provided can be useful, we expect that teams using Charade will develop and customize their own models. The provided ones can serve as example, or can provide some capabilities in a larger deployment.

1.3. Installing

NB If you are on macOS Mojave, make sure to have the Xcode headers installed:

```
xcode-select --install
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
```

Also, OpenMP is required by PyTorch; on macOS it can be installed with

brew install libomp

1.3.1. Using Pipenv (recommended)

Install Pipenv if needed (pip install pipenv). An introduction to Pipenv can be found here.

Create a virtual environment related to this project by running pipenv shell from inside the top directory in the project.

  1. If you want to develop Charade, you can install dependencies with this command:

pipenv install --dev

If you also want to make the IPython kernel for Charade visible to other environments, you can use

python -m ipykernel install --user --name="charade"

In this way, you can use any installation of Jupyter to launch the charade kernel.

  2. If instead you want to try Charade without developing, then run

pipenv install --ignore-pipfile

to install all dependencies.

  3. In both cases, download the models for spaCy, AllenNLP and NLTK via

```
python -m spacy download en
python -m spacy download it
python -m spacy download de

python -m nltk.downloader averaged_perceptron_tagger
python -m nltk.downloader maxent_ne_chunker
python -m nltk.downloader words

mkdir -p models/allen/pretrained
wget https://s3-us-west-2.amazonaws.com/allennlp/models/ner-model-2018.12.18.tar.gz -O models/allen/pretrained/ner-model-2018.12.18.tar.gz
```

1.3.1.1. Common errors

NB If you get an error that you don't have the right version of Python, you can manage that through PyEnv. To install PyEnv, see the installation instructions; on macOS just run brew install pyenv. After installing PyEnv, install the required version of Python, for instance pyenv install 3.6.8.

After this step, pipenv should pick up the Python version from pyenv automatically.

1.3.2. Using Conda and Pip

If you don't need to develop Charade itself, you can create a virtual environment in Conda by running something like conda create -n charade python=3.6, then activate it with source activate charade (any other name will do). Then install dependencies with Pip inside the environment:

pip install -r requirements.txt

Finally, update spacy models via

```
python -m spacy download en
python -m spacy download it
```

NB The requirements.txt file is autogenerated by Pipenv with the command pipenv lock --requirements > requirements.txt - do not edit this file by hand.

1.4. Running

Just define the server in src/main.py, then run

python src/main.py

The existing main.py file only contains those models that do not require a custom training step; the other models are commented out. You can launch any of the training scripts - they are ready, but may be trained on toy datasets, so be ready to adjust them to your needs - and then uncomment the resulting models in the main script.

Once you have a running server, you can try some queries. An example query can be sent using examples/request.sh. You can pass a parameter to select a particular request, for instance

bash examples/request.sh reprise

You can see available examples with ls examples.

Also, there is a frontend available at http://localhost:9000/app.

1.5. Docker running

The Docker image can be built using scripts/build-docker.sh. Then, to run the container, simply do

docker-compose up

NB Since both uwsgi and some services (e.g. pytorch) make use of multiple threads, deadlocks can arise. To avoid them, we need to run the uwsgi command with the option --lazy-apps, as specified in the Dockerfile (see https://engineering.ticketea.com/uwsgi-preforking-lazy-apps/ for an explanation of this mechanism). Note that if the uwsgi option --processes is > 1, each worker loads the full application, so server startup may require a lot of time and memory. By employing multiple threads and a single process instead (e.g. --processes 1 --threads 4), server startup is fast enough.

1.6. Endpoints

A Charade server has just two endpoints:

  • GET /: returns a JSON describing the available services
  • POST /: post a request with a text and some services to be performed
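
As an illustration, a body for POST / can be assembled like this. This is a minimal sketch: the build_request helper is hypothetical, and the commented-out call assumes a server listening on port 9000, as for the frontend.

```python
import json

# Hypothetical helper: builds the JSON body for POST /.
# The field names ('text', 'debug', 'tasks') follow the request format.
def build_request(text, tasks, debug=False):
    return {
        'text': text,
        'debug': debug,
        'tasks': [{'task': task, 'name': name} for task, name in tasks]
    }

payload = build_request('Ulisse Dini was born in Pisa.',
                        [('parse', 'spacy'), ('ner', 'allen')])
body = json.dumps(payload)

# With a running server one could send it, e.g. with the standard library:
#   import urllib.request
#   req = urllib.request.Request('http://localhost:9000/', body.encode(),
#                                {'Content-Type': 'application/json'})
#   print(urllib.request.urlopen(req).read().decode())
```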

1.7. Architecture

A Charade server is defined by instantiating and putting together various services. Each service is defined by

  • a task
  • a service name
  • optional dependencies
  • an actual implementation.

Tasks are used to denote interchangeable services. For instance, there may exist various NER models, possibly using different libraries and technologies. In this case, we will define a ner task, with the only requirement that the various implementations of ner abide by the same interface.

Names are used to distinguish different implementations of the same task. The task/name pair should identify a unique service. For instance, one could have deployed ner services named allen, nltk, pytorch-crf, pytorch-crf-2.

Dependencies can be used to avoid repeating the same task over and over. For instance, a ner implementation may (or may not) depend on some implementation of the parse task, which takes care of tokenization. At runtime, the server will ensure that the parse task is executed before ner.

The precise mechanism is as follows. The user request contains a field called tasks, which contains the list of tasks to be executed on the given chunk of text. For instance:

```json
"tasks": [
  {"task": "parse", "name": "spacy"},
  {"task": "ner", "name": "allen"},
  {"task": "dates", "name": "misc"}
]
```

Tasks are executed in the order requested by the user. The objects returned by the various tasks populate corresponding fields in a response dictionary. For instance, for this request, the response object will have the shape

```json
{
  "parse": ...,
  "ner": ...,
  "dates": ...
}
```

Each service can look at the request object and the response object (the part that has been populated so far). In this way, a service can look at the output produced by other services that come before.

If a dependency for a service has not been requested explicitly by the user, the server will choose any implementation of the dependency task and execute it before the dependent task. For instance, say one has a ner service called custom which depends on parse. If the user request contains

```json
"tasks": [
  {"task": "ner", "name": "custom"},
  {"task": "dates", "name": "misc"}
]
```

then the server will choose any implementation of parse and perform it before ner. This has two advantages:

  • duplication is reduced, for instance the parsing and tokenization of the text can be done just once and many other services can consume it
  • one has the guarantee that all services rely on the same tokenization, which improves consistency.
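
The scheduling step described above can be sketched as follows. The registry layout and function names here are illustrative, not Charade's actual internals: services are keyed by (task, name) pairs, and dependencies are task names.

```python
# Illustrative registry: (task, name) -> list of task names it depends on
REGISTRY = {
    ('parse', 'spacy'): [],
    ('ner', 'custom'): ['parse'],
    ('dates', 'misc'): [],
}

def resolve(requested):
    """Return an execution order, prepending any missing dependency task."""
    order = []
    scheduled = set()

    def schedule(task, name):
        for dep in REGISTRY[(task, name)]:
            if dep not in scheduled:
                # pick any available implementation of the dependency task
                dep_name = next(n for (t, n) in REGISTRY if t == dep)
                schedule(dep, dep_name)
        if task not in scheduled:
            scheduled.add(task)
            order.append((task, name))

    for task, name in requested:
        schedule(task, name)
    return order

# 'parse' is not requested, but 'ner'/'custom' depends on it,
# so it is scheduled first:
print(resolve([('ner', 'custom'), ('dates', 'misc')]))
# β†’ [('parse', 'spacy'), ('ner', 'custom'), ('dates', 'misc')]
```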

Implementations are defined by writing a class that inherits from services.Service. The methods to override are Service.run(request, response) and Service.describe() (optional, but recommended). The former has access to

  • the user request
  • the part of the response constructed so far

and has to return a dictionary containing the service output. This method can raise services.MissingLanguage if the language of the request is not supported in the given service. The class should load any needed model in its constructor, to avoid reloading models for each request.

For instance, a trivial parser that just splits sentences on period and tokens on whitespace may look like this:

```python
from services import Service

class SimpleParser(Service):
    def __init__(self):
        # register under the 'parse' task; no required or optional deps
        Service.__init__(self, 'parse', 'simple', [], [])

    def run(self, request, response):
        text = request['text']
        debug = request.get('debug', False)
        result = []
        end = 0
        for sentence in text.split('.'):
            tokens = []
            for token in sentence.split(' '):
                start = end
                end = start + len(token)
                entry = {'start': start, 'end': end}
                if debug:
                    entry['text'] = token
                tokens.append(entry)
                end += 1  # skip the separator character
            result.append(tokens)
        return result
```

1.8. Requests

The user requests have the following fields:

  • text: required, the text to be analyzed
  • debug: optional flag, default False. Services can use this flag to decide to include additional information. Also, when this flag is set, the response contains an additional field debug with general information, such as timing of the services and the resolved ordering among tasks.
  • lang: 2 letter language of the text, optional. Default: autodetect
  • previous: see Resumable requests
  • tasks: a list of requested tasks, with the shape

```json
"tasks": [
  {"task": "parse", "name": "spacy"},
  {"task": "ner", "name": "allen"},
  {"task": "dates", "name": "misc"}
]
```

plus possibly other service-dependent fields.

1.8.1. Resumable requests

Say there are two tasks, task A and task B. Task A has a dependency on B, which is much slower. When trying various implementations for A, it does not make sense to recompute the result of task B again and again. In this case, one may want to issue a request for task B, and then a second request for task A, passing the result of the previous request. In this way, there will be no need to recompute the result of task B.

In this case, one can put a field called previous in the request. The content of the field must match the response for the previous request. In this case, the server will resume computation from that point. For instance, a user request may look like this:

```json
{
  "text": "Ulisse Dini (Pisa, 14 novembre 1845 ...",
  "tasks": [
    {"task": "names", "name": "misc"}
  ],
  "previous": {
    "ner": [
      {
        "text": "Ulisse Dini",
        "start": 0,
        "end": 11,
        "label": "PER"
      },
      ...
    ]
  }
}
```

In this example, the ner step is already computed and does not need to be recomputed.
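
The resumption logic can be sketched as follows. Both run_tasks and the toy names service are hypothetical, shown only to illustrate how previous seeds the response and short-circuits already-computed tasks.

```python
# Sketch: results already present in 'previous' are reused, and the
# corresponding tasks are skipped.
def run_tasks(request, services):
    response = dict(request.get('previous', {}))
    for spec in request['tasks']:
        task = spec['task']
        if task in response:
            continue  # already computed in a previous request
        response[task] = services[(task, spec['name'])](request, response)
    return response

services = {
    # toy 'names' implementation that just reuses the NER spans
    ('names', 'misc'): lambda req, res: [
        {'start': e['start'], 'end': e['end']} for e in res['ner']
    ]
}

request = {
    'text': 'Ulisse Dini (Pisa, 14 novembre 1845 ...',
    'tasks': [{'task': 'names', 'name': 'misc'}],
    'previous': {'ner': [{'start': 0, 'end': 11, 'label': 'PER'}]},
}
print(run_tasks(request, services))
```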

1.9. Describing services

Each service can be made self-describing by overriding the method describe(self) of the Service class. This can be used to report information about supported languages, dependencies, additional parameters needed in the request, trained models and so on. The class Service already defines a basic implementation, while services can add more specific information. Some standard keys to use for this purpose are:

  • langs: the supported languages; use ['*'] if the service supports any language
  • extra-params: an optional list of additional parameters of the request accepted by the service (see example)
  • models: a dictionary containing the information about the models used by the service

For each model, the following parameters are standardized:

  • pretrained: indicates that the model is included in the library
  • trained-at: datetime in ISO format
  • training-time: as format HH:mm:ss
  • datasets: list of datasets on which the model is trained
  • metrics: a dictionary of metrics that measure the performance of the model
  • params: a dictionary of parameters that were used to train the model

A complete example of response could look like this:

```python
{
    'task': 'some-task',
    'name': 'my-name',
    'deps': ['parse'],
    'optional_deps': ['ner'],
    'langs': ['it', 'en'],
    'extra-params': [
        {'name': 'some-param1', 'type': 'string', 'required': False},
        {'name': 'some-param2', 'type': 'int', 'required': True},
        {'name': 'some-param3', 'type': 'string', 'choices': ['value1', 'value2'], 'required': True}
    ],
    'models': {
        'it': {
            'pretrained': False,
            'trained-at': '2019-03-27T16:00:49',
            'training-time': '02:35:23',
            'datasets': ['some-dataset'],
            'metrics': {
                'accuracy': 0.935,
                'precision': 0.87235,
                'recall': 0.77253
            },
            'params': {
                'learning-rate': 0.001,
                'momentum': 0.8,
                'num-epochs': 50
            }
        },
        'en': {
            'pretrained': True
        }
    }
}
```

You can use the extra-params field to describe additional parameters that are required (or optional) for a specific service. Each extra parameter can take the shape

```python
{'name': <string>, 'type': <string>, 'choices': <string list?>, 'required': <bool>}
```

where type can take the values "string" or "int", and choices can be used to optionally constrain the valid values for the parameter.
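
As a sketch, validating a request value against such a parameter description could look like this (validate_param is a hypothetical helper, not part of Charade):

```python
# Checks a request value against an extra-params entry of the shape above.
def validate_param(spec, request):
    value = request.get(spec['name'])
    if value is None:
        # missing values are fine only for non-required parameters
        return not spec.get('required', False)
    expected = {'string': str, 'int': int}[spec['type']]
    if not isinstance(value, expected):
        return False
    if 'choices' in spec and value not in spec['choices']:
        return False
    return True

spec = {'name': 'some-param3', 'type': 'string',
        'choices': ['value1', 'value2'], 'required': True}
validate_param(spec, {'some-param3': 'value1'})   # True
validate_param(spec, {'some-param3': 'value9'})   # False: not among choices
validate_param(spec, {})                          # False: required but missing
```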

1.10. Services

The following services are defined. To read the interface: output types are written inside <>. A trailing ? denotes that the field is only present when debug is True in the user request.

1.10.1. Parsing

Splits the text into sentences and the sentences into tokens. The interface requires that the output has the shape

```python
[
  [
    {'start': <int>, 'end': <int>, 'text': <string?>},
    ...
  ]
]
```

1.10.2. NER

Finds people, organizations, dates, places and other entities in the text. The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'text': <string?>, 'label': <string>},
  ...
]
```

1.10.3. Date extraction

Finds and parses dates in the text. The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'text': <string?>, 'date': <string>},
  ...
]
```

where date is formatted as yyyy-MM-dd.
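
For reference, the yyyy-MM-dd pattern corresponds to '%Y-%m-%d' in Python's strftime (or simply date.isoformat()):

```python
from datetime import date

# Format a date in the yyyy-MM-dd shape used by the dates service
d = date(1845, 11, 14)
formatted = d.strftime('%Y-%m-%d')
print(formatted)  # 1845-11-14
```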

1.10.4. Codes extraction

Finds common codes in the text. The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'text': <string>, 'type': <string>, 'lang': <lang code>},
  ...
]
```

1.10.5. Fiscal codes

Extracts information from fiscal codes. The interface requires that the output has the shape

```python
[
  {
    'start': <int>,
    'end': <int>,
    'text': <string>,
    'type': <string>,
    'lang': <lang code>,
    'correct': <bool>,  # whether the fiscal code is formally correct
    'sex': <sex code>,
    'birthdate': <string>
  }
]
```

1.10.6. Extractive summarization

Extracts the sentences from the text that best summarize it. The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'text': <string?>},
  ...
]
```

where the sentences are in order from most informative to least informative.

It can require additional (optional) parameters in the request:

  • num-extractive-sentences: the number of sentences to extract

1.10.7. Keyword extraction

Extracts the most relevant keywords from the text. The interface requires that the output has the shape

```python
[
  {'text': <string>},
  ...
]
```

where the keywords are in order from most to least relevant. Here we do not use spans, since the important information is the keyword, which is probably repeated many times across the text.

It can require additional (optional) parameters in the request:

  • num-keywords: the number of keywords to extract

1.10.8. Sentiment detection

Detects the sentiment used in various sentences of the text. The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'sentiment': <float>, 'text': <string?>},
  ...
]
```

where there is an entry for each sentence, and sentiment ranges from 0 (extremely negative) to 1 (extremely positive).

1.10.9. Names

Extracts names and surnames of people mentioned in the text. It is a more refined version of NER, restricted to entities of type PER.

The interface requires that the output has the shape

```python
[
  {'start': <int>, 'end': <int>, 'name': <string?>, 'surname': <string?>},
  ...
]
```

1.10.10. Topic modeling

Does a soft clustering of the text (for instance using LDA or similar techniques). This means that the text is associated with a distribution over topics. Topics themselves are discovered as word mixtures from the training data. The interface requires that the output has the shape

```python
{
  'distribution': <array[float]>,
  'best-topic': <int>,
  'best-score': <float>,
  'topics': <array[array[string]]?>
}
```

where each topic is represented by the array of its most representative words. The topics field is only present in debug mode.

It can require additional (optional) parameters in the request:

  • lda-model: the name of a pretrained LDA model

1.10.11. Classification

Does a classification of the text into a pretrained, finite set of possible classes. This means that the text is associated with a distribution over possible classes, of which we only output the most fitting. The interface requires that the output has the shape

```python
{
  'category': <string>,
  'category_probability': <float>,
  'distribution': <map[string, float]?>
}
```

The distribution field is only present in debug mode.

1.11. How to create a new service

Create a new class in a file inside src/services which inherits from services.Service. In this class, make sure to call the Service constructor to register the service, like this:

```python
class SomeService(Service):
    def __init__(self, langs):
        # first required deps, then optional deps
        Service.__init__(self, 'some-task', 'some-name', [], [])
        ...
```

Override the method def run(self, request, response), which implements the logic for your service. The return type of the service should be a dictionary.

Also, override the method describe(self) to return information about the service itself. A basic implementation of describe is in the Service class, so a standard implementation would look like:

```python
def describe(self):
    result = super().describe()
    result['key'] = value
    # more keys
    return result
```

For the common keys, see the section on Describing services.

Be sure to check out the following things:

  • The return type of run should be JSON serializable
  • If your service defines a new task, make sure to document it in the README
  • Otherwise, follow the type convention of existing services for the same task
  • If your service requires some previous step (e.g. parsing), try to add it as a dependency and do not hardcode it inside the service
  • If your service may benefit from some previous step (e.g. extra hints), you can add it as an optional dependency; the main task will be performed whether or not the optional dependency is scheduled, but if it is scheduled, it will be executed first.
  • If your service requires an optional parameter in the request, add it in the schema validator in src/server.py
  • If you cannot handle a certain language, raise services.MissingLanguage
  • If you have a model that needs a training step, follow the conventions under Organization
  • If you need an additional library, pipenv install the-library, then commit the new Pipfile and Pipfile.lock. Also remember to keep the requirements file up to date with pipenv lock --requirements > requirements.txt.
  • Add tests as needed
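
Putting the checklist together, a complete toy service might look like this. The Service base class below is a minimal stand-in for services.Service, included only to keep the sketch self-contained, and the uppercase-words task is made up:

```python
class Service:
    """Minimal stand-in for services.Service, for illustration only."""

    def __init__(self, task, name, deps, optional_deps):
        self.task, self.name = task, name
        self.deps, self.optional_deps = deps, optional_deps

    def describe(self):
        # basic self-description, as provided by the real base class
        return {'task': self.task, 'name': self.name,
                'deps': self.deps, 'optional_deps': self.optional_deps}

class UppercaseWords(Service):
    """A made-up task that returns the all-uppercase words of the text."""

    def __init__(self):
        # first required deps, then optional deps
        Service.__init__(self, 'uppercase-words', 'toy', [], [])

    def run(self, request, response):
        words = [w for w in request['text'].split() if w.isupper()]
        return {'words': words}  # JSON-serializable

    def describe(self):
        result = super().describe()
        result['langs'] = ['*']  # language-independent
        return result

service = UppercaseWords()
print(service.run({'text': 'NATO and the EU met in Rome'}, {}))
# β†’ {'words': ['NATO', 'EU']}
```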

1.12. Testing

Tests are written with nose. If you have installed Charade in development mode (pipenv install --dev), you can run tests with the nosetests command.

Tests for a particular service should be put under tests/services/test_the_service.py. This naming convention ensures that Nose autodiscovery finds them when running nosetests. Classes and methods should also follow this naming convention:

```python
class TestTheThing(TestCase):
    def test_something(self):
        ...
```

You can also test classes and functions under common here. If you need to test something that is only used in training, put it under common as well.

Tests for Charade itself are placed under tests without further nesting.

1.13. Style guide

  • Follow PEP-8
  • Prefer long names such as request, result, token over req, res, tok
  • But be consistent with libraries: for instance, spacy defines document.ents; iterate over it as for ent in document.ents:
  • Do not use trailing commas
  • Do not commit models or data - commit scripts to retrieve them
  • All bash scripts use set -e, set -u
  • Make sure that bash scripts can be called from anywhere (see the existing one for examples)

1.14. Organization

Follow a tree similar to the following

```
.
β”œβ”€β”€ Pipfile
β”œβ”€β”€ Pipfile.lock
β”œβ”€β”€ README.md
β”œβ”€β”€ TODO.md
β”œβ”€β”€ data
β”‚   └── ner
β”‚       └── ...
β”œβ”€β”€ examples
β”‚   β”œβ”€β”€ request.json
β”‚   β”œβ”€β”€ request.sh
β”‚   β”œβ”€β”€ request2.json
β”‚   └── request3.json
β”œβ”€β”€ models
β”‚   └── pytorch
β”‚       └── ner
β”‚           └── ...
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ resources
β”‚   β”œβ”€β”€ names
β”‚   β”‚   └── it.txt
β”‚   β”œβ”€β”€ stopwords
β”‚   β”‚   └── en.txt
β”‚   └── surnames
β”‚       └── it.txt
β”œβ”€β”€ scripts
β”‚   └── pytorch
β”‚       └── ner
β”‚           └── it
β”‚               β”œβ”€β”€ 1-get-data.sh
β”‚               β”œβ”€β”€ 2-prepare-data.sh
β”‚               └── 3-train.sh
β”œβ”€β”€ src
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ common
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── pytorch
β”‚   β”‚       β”œβ”€β”€ __init__.py
β”‚   β”‚       └── ner
β”‚   β”‚           β”œβ”€β”€ __init__.py
β”‚   β”‚           └── model.py
β”‚   β”œβ”€β”€ main.py
β”‚   β”œβ”€β”€ server.py
β”‚   β”œβ”€β”€ services
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ allen.py
β”‚   β”‚   β”œβ”€β”€ misc.py
β”‚   β”‚   β”œβ”€β”€ pytorch.py
β”‚   β”‚   β”œβ”€β”€ regex.py
β”‚   β”‚   β”œβ”€β”€ spacy.py
β”‚   β”‚   └── textrank.py
β”‚   └── training
β”‚       └── pytorch
β”‚           └── ner
β”‚               β”œβ”€β”€ generate_wikiner_vectors.py
β”‚               └── train.py
└── tests
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ services
    β”‚   β”œβ”€β”€ __init__.py
    β”‚   └── test_textrank.py
    └── test_server.py
```

It should be clear what goes where: data, models, resources, training and so on. When in doubt, follow existing conventions. The directory common holds code that should be shared at inference and training time.

Under data, only put data that is needed at training time - everything that is needed at inference time goes under models. If some data file is needed also at inference time, either

  • store the content of the file as a field inside the model, or
  • make sure that the training scripts copy the necessary files from data to models.

Issues

Bump certifi from 2019.9.11 to 2022.12.7

opened on 2022-12-08 07:27:35 by dependabot[bot]

Bumps certifi from 2019.9.11 to 2022.12.7.

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/andreaferretti/charade/network/alerts).

Bump py from 1.8.0 to 1.10.0

opened on 2022-10-18 19:53:44 by dependabot[bot]

Bumps py from 1.8.0 to 1.10.0.

Changelog

Sourced from py's changelog.

1.10.0 (2020-12-12)

  • Fix a regular expression DoS vulnerability in the py.path.svnwc SVN blame functionality (CVE-2020-29651)
  • Update vendored apipkg: 1.4 => 1.5
  • Update vendored iniconfig: 1.0.0 => 1.1.1

1.9.0 (2020-06-24)

  • Add type annotation stubs for the following modules:

    • py.error
    • py.iniconfig
    • py.path (not including SVN paths)
    • py.io
    • py.xml

    There are no plans to type other modules at this time.

    The type annotations are provided in external .pyi files, not inline in the code, and may therefore contain small errors or omissions. If you use py in conjunction with a type checker, and encounter any type errors you believe should be accepted, please report it in an issue.

1.8.2 (2020-06-15)

  • On Windows, py.path.locals which differ only in case now have the same Python hash value. Previously, such paths were considered equal but had different hashes, which is not allowed and breaks the assumptions made by dicts, sets and other users of hashes.

1.8.1 (2019-12-27)

  • Handle FileNotFoundError when trying to import pathlib in path.common on Python 3.4 (#207).

  • py.path.local.samefile now works correctly in Python 3 on Windows when dealing with symlinks.

Commits
  • e5ff378 Update CHANGELOG for 1.10.0
  • 94cf44f Update vendored libs
  • 5e8ded5 testing: comment out an assert which fails on Python 3.9 for now
  • afdffcc Rename HOWTORELEASE.rst to RELEASING.rst
  • 2de53a6 Merge pull request #266 from nicoddemus/gh-actions
  • fa1b32e Merge pull request #264 from hugovk/patch-2
  • 887d6b8 Skip test_samefile_symlink on pypy3 on Windows
  • e94e670 Fix test_comments() in test_source
  • fef9a32 Adapt test
  • 4a694b0 Add GitHub Actions badge to README
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/andreaferretti/charade/network/alerts).

Bump protobuf from 3.10.0 to 3.18.3

opened on 2022-09-23 22:38:30 by dependabot[bot]

Bumps protobuf from 3.10.0 to 3.18.3.

Release notes

Sourced from protobuf's releases.

Protocol Buffers v3.18.3

C++

Protocol Buffers v3.16.1

Java

  • Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.2

Java

  • Improve performance characteristics of UnknownFieldSet parsing (#9371)

Protocol Buffers v3.18.1

Python

  • Update setup.py to reflect that we now require at least Python 3.5 (#8989)
  • Performance fix for DynamicMessage: force GetRaw() to be inlined (#9023)

Ruby

  • Update ruby_generator.cc to allow proto2 imports in proto3 (#9003)

Protocol Buffers v3.18.0

C++

  • Fix warnings raised by clang 11 (#8664)
  • Make StringPiece constructible from std::string_view (#8707)
  • Add missing capability attributes for LLVM 12 (#8714)
  • Stop using std::iterator (deprecated in C++17). (#8741)
  • Move field_access_listener from libprotobuf-lite to libprotobuf (#8775)
  • Fix #7047 Safely handle setlocale (#8735)
  • Remove deprecated version of SetTotalBytesLimit() (#8794)
  • Support arena allocation of google::protobuf::AnyMetadata (#8758)
  • Fix undefined symbol error around SharedCtor() (#8827)
  • Fix default value of enum(int) in json_util with proto2 (#8835)
  • Better Smaller ByteSizeLong
  • Introduce event filters for inject_field_listener_events
  • Reduce memory usage of DescriptorPool
  • For lazy fields copy serialized form when allowed.
  • Re-introduce the InlinedStringField class
  • v2 access listener
  • Reduce padding in the proto's ExtensionRegistry map.
  • GetExtension performance optimizations
  • Make tracker a static variable rather than call static functions
  • Support extensions in field access listener
  • Annotate MergeFrom for field access listener
  • Fix incomplete types for field access listener
  • Add map_entry/new_map_entry to SpecificField in MessageDifferencer. They record the map items which are different in MessageDifferencer's reporter.
  • Reduce binary size due to fieldless proto messages
  • TextFormat: ParseInfoTree supports getting field end location in addition to start.

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.



Bump bottle from 0.12.18 to 0.12.20

opened on 2022-06-03 22:51:33 by dependabot[bot]

Bumps bottle from 0.12.18 to 0.12.20.

Commits
  • a2b0ee6 Release of 0.12.20
  • 04b27f1 Merge branch 'release-0.12+cheroot' of https://github.com/juergh/bottle into ...
  • e140e1b Gracefully handle errors during early request binding.
  • 6e9c55a Added depr warning for the outdated cherrypy server adapter.
  • a3ba0eb Added 'cheroot' server adapter to list of server names, so it can be selected...
  • 888aa8e Add ServerAdapter for CherryPy >= 9
  • e1be22d Fix for Issue #586
  • ed32f36 Fix: Multipart file uploads with empty filename not detected as binary.
  • 1522198 Release of 0.12.19
  • 57a2f22 Do not split query strings on ; anymore.
  • Additional commits viewable in compare view
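The last commit above ("Do not split query strings on ; anymore") parallels a hardening change in Python's own standard library, which likewise dropped `;` as a default query-string separator. A quick stdlib illustration (this is `urllib.parse`, not bottle's code):

```python
from urllib.parse import parse_qs

# '&' is the only separator recognised by default
assert parse_qs("a=1&b=2") == {"a": ["1"], "b": ["2"]}

# ';' no longer splits parameters on patched Pythons, so 'b' is not
# parsed out as a separate key here
assert "b" not in parse_qs("a=1;b=2")
```

Treating `;` and `&` as interchangeable separators had enabled web-cache-poisoning ambiguities, which is why both bottle and the stdlib moved away from it.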





Bump ipython from 7.13.0 to 7.16.3

opened on 2022-01-21 20:31:45 by dependabot[bot]

Bumps ipython from 7.13.0 to 7.16.3.

Commits





Bump babel from 2.8.0 to 2.9.1

opened on 2021-10-21 18:51:31 by dependabot[bot]

Bumps babel from 2.8.0 to 2.9.1.

Release notes

Sourced from babel's releases.

Version 2.9.1

Bugfixes

  • The internal locale-data loading functions now validate the name of the locale file to be loaded and only allow files within Babel's data directory. Thank you to Chris Lyne of Tenable, Inc. for discovering the issue!
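The commit list for this release mentions running locale identifiers through `os.path.basename()`, the usual pattern for confining lookups to a data directory. A hypothetical sketch of that idea (names and paths here are illustrative, not Babel's actual internals):

```python
import os.path

DATA_DIR = "locale-data"  # illustrative data directory

def locale_data_path(locale: str) -> str:
    # basename() strips any directory components, so an identifier such as
    # "../../etc/passwd" cannot escape DATA_DIR
    name = os.path.basename(locale)
    return os.path.join(DATA_DIR, name + ".dat")

assert locale_data_path("en_US").endswith("en_US.dat")
assert ".." not in locale_data_path("../../etc/passwd")
```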

Version 2.9.0

Upcoming version support changes

  • This version, Babel 2.9, is the last version of Babel to support Python 2.7, Python 3.4, and Python 3.5.

Improvements

  • CLDR: Use CLDR 37 – Aarni Koskela (#734)
  • Dates: Handle ZoneInfo objects in get_timezone_location, get_timezone_name - Alessio Bogon (#741)
  • Numbers: Add group_separator feature in number formatting - Abdullah Javed Nesar (#726)
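The new `group_separator` flag toggles locale-aware digit grouping in babel's number formatters. As a rough stdlib analogue only (this is Python's format-spec mini-language, not babel's API):

```python
# "," in a format spec inserts thousands separators
assert f"{1234567:,}" == "1,234,567"
# without it, no grouping is applied
assert f"{1234567}" == "1234567"
```

Babel additionally chooses the separator per locale (e.g. "." in many European locales), which the plain format spec does not.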

Bugfixes

  • Dates: Correct default Format().timedelta format to 'long' to mute deprecation warnings – Aarni Koskela
  • Import: Simplify iteration code in "import_cldr.py" – Felix Schwarz
  • Import: Stop using deprecated ElementTree methods "getchildren()" and "getiterator()" – Felix Schwarz
  • Messages: Fix unicode printing error on Python 2 without TTY. – Niklas Hambüchen
  • Messages: Introduce invariant that _invalid_pofile() takes unicode line. – Niklas Hambüchen
  • Tests: fix tests when using Python 3.9 – Felix Schwarz
  • Tests: Remove deprecated 'sudo: false' from Travis configuration – Jon Dufresne
  • Tests: Support Py.test 6.x – Aarni Koskela
  • Utilities: LazyProxy: Handle AttributeError in specified func – Nikiforov Konstantin (#724)
  • Utilities: Replace usage of parser.suite with ast.parse – Miro Hrončok

Documentation

  • Update parse_number comments – Brad Martin (#708)
  • Add __iter__ to Catalog documentation – @CyanNani123

Version 2.8.1

This patch version only differs from 2.8.0 in that it backports in #752.

Commits

  • a99fa24 Use 2.9.0's setup.py for 2.9.1
  • 60b33e0 Become 2.9.1
  • 412015e Merge pull request #782 from python-babel/locale-basename
  • 5caf717 Disallow special filenames on Windows
  • 3a700b5 Run locale identifiers through os.path.basename()
  • 5afe2b2 Merge pull request #754 from python-babel/github-ci
  • 58de834 Replace Travis + Appveyor with GitHub Actions (WIP)
  • d1bbc08 import_cldr: use logging; add -q option
  • 156b7fb Quiesce CLDR download progress bar if requested (or not a TTY)
  • 613dc17 Make the import warnings about unsupported number systems less verbose
  • Additional commits viewable in [compare view](https://github.com/python-babel/babel/compare/v2.8.0...v2.9.1)

Andrea Ferretti

nlp nlp-apis python