Data and implementation of ECCV2020 paper 'Adaptive Text Recognition through Visual Matching'

Chuhanxx, updated 🕥 2022-11-22 04:23:08

Adaptive Text Recognition through Visual Matching

📋 This repository contains the data and implementation of the ECCV 2020 paper "Adaptive Text Recognition through Visual Matching".

Abstract

This work addresses the problems of generalization and flexibility for text recognition in documents.
We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual decoding and linguistic modelling stages through intermediate representations in the form of similarity maps. By doing this, we turn text recognition into a visual matching problem, thereby achieving one-shot sequence recognition.
It can handle challenges that traditional architectures are unable to solve without expensive retraining, including: (i) it can change the number of classes simply by changing the exemplars; and (ii) it can generalize to novel languages and characters (not in the training data) simply by providing a new glyph exemplar set. We also demonstrate that the model can generalize to unseen fonts without requiring new exemplars from them.
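To make the matching idea concrete, here is a toy sketch. It is not the authors' model: it assumes every glyph exemplar and every position in the text line has already been encoded as a feature vector (the vectors below are made up), builds a cosine-similarity map between the two, and labels each position with its best-matching exemplar. The per-position argmax is a simplifying assumption standing in for the learned decoding stage described in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_map(exemplars, line_features):
    """Rows follow the exemplars (insertion order), columns the line positions."""
    return [[cosine(e, x) for x in line_features] for e in exemplars.values()]


def match(exemplars, line_features):
    """Label each line position with the most similar exemplar (toy argmax)."""
    labels = list(exemplars)
    sim = similarity_map(exemplars, line_features)
    decoded = []
    for col in range(len(line_features)):
        best_row = max(range(len(labels)), key=lambda row: sim[row][col])
        decoded.append(labels[best_row])
    return "".join(decoded)


# Made-up 3-d features standing in for encoder outputs (hypothetical values).
latin = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}
line = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.8, 0.0, 0.2]]
print(match(latin, line))

# A larger class set only needs a new exemplar; the matching code is unchanged.
extended = dict(latin, c=[0.0, 0.0, 1.0])
print(match(extended, line + [[0.0, 0.1, 0.9]]))
```

In this toy setting, adding the exemplar "c" is all it takes to recognize a third class, which mirrors point (i) above.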

Getting started

  1. Clone this repository:
    git clone https://github.com/Chuhanxx/FontAdaptor.git
  2. Create a conda virtual env and install the requirements
    (This implementation requires CUDA and python > 3.7):
    cd FontAdaptor
    source build_venv.sh
  3. Download the data for training and evaluation
    (The dataset contains FontSynth + Omniglot):
    source download_data.sh
  4. Download our pre-trained model on four font attributes + Omniglot
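Before running the setup scripts, it may help to confirm the prerequisites listed above. The following is a minimal sketch using only the standard library. It is not part of the repository, and two things in it are assumptions: that "python > 3.7" can be read as "3.7 or newer", and that the presence of `nvidia-smi` on the PATH is a reasonable proxy for a CUDA-capable driver.

```python
import shutil
import sys


def check_environment(version=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    # Assumption: versions below 3.7 are rejected, 3.7 itself is accepted.
    if version < (3, 7):
        problems.append("Python %d.%d found, need 3.7 or newer" % version)
    # git and conda are used in steps 1 and 2; nvidia-smi is only a CUDA hint.
    for tool in ("git", "conda", "nvidia-smi"):
        if which(tool) is None:
            problems.append("'%s' not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The `version` and `which` parameters exist only so the check can be exercised without touching the real environment.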

Test trained models

Test the model using test fonts as exemplars:

python test.py --evalset FontSynth --root ./data --model_folder /PATH/TO/CHECKPOINT

Test the model using randomly chosen training fonts as exemplars:

python test.py --evalset FontSynth --root ./data --model_folder /PATH/TO/CHECKPOINT --cross

Test the model on Omniglot:

python test.py --evalset Omniglot --root ./data --model_folder /PATH/TO/CHECKPOINT

You can visualize the model's predictions by adding the --visualize flag.
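The evaluation settings above differ only in a few flags, so they can be generated from one helper. This is a hedged sketch, not a script shipped with the repository: `build_test_command` is a hypothetical name, it uses only the flags shown above, and restricting `--evalset` to the two values that appear in this README is an assumption.

```python
import shlex


def build_test_command(evalset, model_folder, root="./data",
                       cross=False, visualize=False):
    """Assemble the argument list for test.py from the options shown above."""
    # Assumption: only the two evalset values used in this README are valid.
    if evalset not in ("FontSynth", "Omniglot"):
        raise ValueError("evalset must be 'FontSynth' or 'Omniglot'")
    cmd = ["python", "test.py", "--evalset", evalset, "--root", root,
           "--model_folder", model_folder]
    if cross:
        cmd.append("--cross")      # exemplars from randomly chosen training fonts
    if visualize:
        cmd.append("--visualize")  # also visualize the predictions
    return cmd


checkpoint = "/PATH/TO/CHECKPOINT"  # placeholder, as in the commands above
runs = [
    build_test_command("FontSynth", checkpoint),
    build_test_command("FontSynth", checkpoint, cross=True),
    build_test_command("Omniglot", checkpoint, visualize=True),
]
for cmd in runs:
    # Print the shell-quoted command; to run it, pass cmd to subprocess.run
    # from the repository root.
    print(" ".join(shlex.quote(part) for part in cmd))
```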

Training

** Note: our FontSynth dataset was updated on 04/12/2020; please download/update it from here.

Train the model with English data. Choose the number of attributes by setting --trainset to attribute1, attribute2, attribute3 or attribute4:

python train.py --trainset attribute4 --data eng --char_aug --name EXP_NAME --root ./data

Train the model with English data + Omniglot:

python train.py --trainset attribute4 --data omnieng --char_aug --name EXP_NAME --root ./data
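To compare the attribute settings, the two training commands above can be expanded into a small sweep. The sketch below only prints the commands. The experiment naming scheme is hypothetical, and it is an assumption that every combination of `--trainset` and `--data` is supported, since the README only shows attribute4 with each data mode.

```python
from itertools import product

TRAINSETS = ("attribute1", "attribute2", "attribute3", "attribute4")
DATA_MODES = ("eng", "omnieng")  # English only / English + Omniglot


def build_train_command(trainset, data, name, root="./data"):
    """Assemble the train.py argument list using only the flags shown above."""
    if trainset not in TRAINSETS:
        raise ValueError("unknown trainset: %r" % trainset)
    if data not in DATA_MODES:
        raise ValueError("unknown data mode: %r" % data)
    return ["python", "train.py", "--trainset", trainset, "--data", data,
            "--char_aug", "--name", name, "--root", root]


# Hypothetical naming scheme: one experiment name per (trainset, data) pair.
sweep = [build_train_command(t, d, name="%s_%s" % (t, d))
         for t, d in product(TRAINSETS, DATA_MODES)]
for cmd in sweep:
    print(" ".join(cmd))
```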

Data

Our FontSynth dataset (16GB) can be downloaded directly from here. (updated 03/12/20)

We take 1444 fonts from the MJSynth dataset and split them into five categories by their appearance attributes as determined from their names: (1) regular, (2) bold, (3) italic, (4) light, and (5) others (i.e., all fonts with none of the first four attributes in their name).
For training, we select 50 fonts at random from each split and generate 1000 text-line and glyph images for each font. For testing, we use all the 251 fonts in category (5).
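The split described above can be sketched as follows. This illustrates the idea and is not the script used to build FontSynth. It makes three assumptions: attributes are detected by case-insensitive substring matching, a name containing several attributes is assigned to the first match in the order listed, and training fonts are drawn only from the four attribute categories. The font names in the example are illustrative.

```python
import random

ATTRIBUTES = ("regular", "bold", "italic", "light")


def categorize(font_name):
    """Return the first attribute found in the font name, or 'others'."""
    lowered = font_name.lower()
    for attribute in ATTRIBUTES:
        if attribute in lowered:
            return attribute
    return "others"


def split_fonts(font_names, per_split=50, seed=0):
    """Group fonts by attribute, sample training fonts, keep 'others' for testing."""
    groups = {name: [] for name in ATTRIBUTES + ("others",)}
    for font in font_names:
        groups[categorize(font)].append(font)
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    train = {a: rng.sample(groups[a], min(per_split, len(groups[a])))
             for a in ATTRIBUTES}
    test = groups["others"]    # every font with none of the four attributes
    return train, test


fonts = ["Lato-Regular", "Lato-Bold", "Lato-Italic", "Lato-Light",
         "OpenSans-Bold", "Pacifico"]
train, test = split_fonts(fonts, per_split=1)
print({attribute: len(chosen) for attribute, chosen in train.items()}, test)
```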

The structure of this dataset is:

ims/
  font1/
  font2/
  ...
gt/
  train/
    train_regular_50_resample.txt
  test/
  val/
  test_FontSynth.txt
  train_att4.txt
  ...
fontlib/
  googlefontdirectory/
  ...

In folder gt, there are txt files with lines in the following format:

font_name img_name gt_sentence (H,W)

For training, each line corresponds to a text-line image with path: ims/font_name/lines_byclusters/img_name
For testing, each line corresponds to a text-line image with path: ims/font_name/test_new/img_name
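A ground-truth line in this format can be parsed and turned into an image path as sketched below. The parser is an illustration, not the repository's loader. It assumes the fields are whitespace-separated, that font and image names contain no spaces, and that (H,W) is a pair of integers at the end of the line. The sentence itself may contain spaces. The sample line is made up.

```python
import re
from pathlib import PurePosixPath

# font_name, img_name, free-text sentence, then "(H,W)" at the end of the line.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(.*?)\s*\((\d+),\s*(\d+)\)\s*$")


def parse_gt_line(line, split="train", root="ims"):
    """Parse one gt line into its fields and the matching image path."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognised ground-truth line: %r" % line)
    font_name, img_name, sentence, height, width = match.groups()
    # Training images live in lines_byclusters/, test images in test_new/.
    subdir = "lines_byclusters" if split == "train" else "test_new"
    path = PurePosixPath(root) / font_name / subdir / img_name
    return {
        "font": font_name,
        "image": str(path),
        "text": sentence,
        "size": (int(height), int(width)),
    }


# Made-up example line in the documented format.
record = parse_gt_line("Lato-Bold 12.png hello world (32,180)")
print(record["image"], record["text"], record["size"])
```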

gt/train_att4.txt and gt/test_FontSynth.txt list the fonts selected for training and testing; the source files of these fonts can be found in fontlib.

Citation

If you use this code etc., please cite the following paper:

@inproceedings{zhang2020Adaptive,
  title={Adaptive Text Recognition through Visual Matching},
  author={Chuhan Zhang and Ankush Gupta and Andrew Zisserman},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}

If you have any questions, please contact [email protected].

Issues

Bump pillow from 6.1.0 to 9.3.0

opened on 2022-11-22 04:23:05 by dependabot[bot]

Bumps pillow from 6.1.0 to 9.3.0.

Release notes

Sourced from pillow's releases.

9.3.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

Changes

... (truncated)

Changelog

Sourced from pillow's changelog.

9.3.0 (2022-10-29)

  • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

  • Initialize libtiff buffer when saving #6699 [radarhere]

  • Inline fname2char to fix memory leak #6329 [nulano]

  • Fix memory leaks related to text features #6330 [nulano]

  • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

  • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

  • Fixed set_variation_by_name offset #6445 [radarhere]

  • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

  • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

  • Added ExifTags enums #6630 [radarhere]

  • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

  • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

  • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

  • Added GPS TIFF tag info #6661 [radarhere]

  • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

  • Do not attempt normalization if mode is already normal #6644 [radarhere]

... (truncated)


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Chuhanxx/FontAdaptor/network/alerts).

Bump joblib from 0.13.2 to 1.2.0

opened on 2022-09-30 19:08:07 by dependabot[bot]

Bumps joblib from 0.13.2 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

  • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

  • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

  • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

  • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

  • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

  • Vendor loky 3.3.0 which fixes several bugs including:

    • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

    • avoiding leaking worker processes in case of nested loky parallel calls;

    • reliability spawn the correct number of reusable workers.

Release 1.1.0

  • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

  • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)

Commits
  • 5991350 Release 1.2.0
  • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
  • cea26ff CI test the future loky-3.3.0 branch (#1338)
  • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
  • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
  • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
  • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
  • ac09691 [MAINT] various test updates (#1334)
  • 4a314b1 Vendor loky 3.2.0 (#1333)
  • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
  • Additional commits viewable in compare view



Bump ipython from 7.6.1 to 7.16.3

opened on 2022-01-21 20:35:08 by dependabot[bot]

Bumps ipython from 7.6.1 to 7.16.3.


About the pkl file

opened on 2021-08-02 03:35:29 by jiayunshu

I want to use my own data for training, can you tell me how to get the pkl file?

Can you provide a single-image inference demo that is decoupled from the FontSynth dataset?

opened on 2020-12-22 13:59:19 by KaiOtter

Hi, I appreciate your work. This novel idea of one-shot sequence recognition is meaningful for various customized OCR solutions. Because the pipeline and architecture are totally different from traditional text recognition projects, I'm puzzled by some implementation details from reading the code alone. So I hope you can provide an inference demo script that takes a single sample image and a glyph-line image as input.

Although test.py is provided, the dataloader classes are deeply coupled with the FontSynth dataset. People interested in this repo have to get the data ready before running a test. Honestly, the cost of trying it out is too high. The download speed also bothers me (from China): at <50KB/s, it will take more than 3 days to download 16GB.

Hope it's easy work for you. XD