Kanji testing web app leveraging ML

Ambiwlans, updated 🕥 2023-03-13 23:59:33

JiKen

JiKen is a simple kanji quiz that uses statistics and machine learning to accurately and quickly predict a user's knowledge level. I always found it tedious to get a good read of my current kanji level while studying using existing tests which either take forever or are terribly innaccurate so I made this.

The name JiKen is a bit of a play on words/kanji. It could be read as 字検 (letter test, similar to 漢検 the infamous official kanji test) or 事件 (incident). I left it in romaji for the ambiguity. KTest was just a working title, and kind of lame.

Host/Location

https://jiken.fly.dev/

  • Using Fly as a webhost, ClearDB for MySQL, HerokuRedis for sessions.

Math

First thing to know to understand why this works so well is that kanji usage (and recognition) is not flat/random but has a relatively normal distribution and follows Ziph's Law. This allows us to make relatively sensible predictions of people's knowledge using a sigmoid function.

There are two main algorithms worth noting.

One predicts how many kanji you know (the graph) based on your answers. This is a Nelder-Mead regression algo with custom regularization: giving a lot of weight to the initial weights (safe assumption until data is collected), L2 reg (to avoid traps), some penalty to change between questions to give users a smooth experience. I also do bias correction as per https://cs.nyu.edu/~mohri/pub/bias.pdf since the questions selected are not random. No formal tuning methods were use, everything was done by hand until it felt good (the tuning target was to meet user expectations rather than simply being mathematically accurate).

The other algorithm ranks the difficulty of every kanji for future testing. If 100 people know "馬" but don't know "鹿" then the algorithm will shuffle the ranks around so that "鹿" is ranked lower, "馬" higher. This is called a Learning to rank algorithm: https://mlexplained.com/2019/05/27/learning-to-rank-explained-with-code/. Of course, this was again made more complicated by having biased sample selection.

Built With

  • Flask
  • SQLAlchemy (MySQL)
  • Redis (for sessions/ buffering)
  • APscheduler
  • Bootstrap
  • ChartJS
  • Genanki https://github.com/kerrickstaley/genanki (to generate anki files)
  • Datatables.js (for the missed word list)

Contact/Bugs

You can report bugs here or contact me via reddit, twitter #jiken, or e-mail.

Licensing/Contribute

Shoot me a message if you want to do something with this code.

Acknowledgments

More Info

  • Open alpha reddit thread: https://www.reddit.com/r/LearnJapanese/comments/eq380w/made_an_app_that_tests_your_kanji_level_in_30/

Issues

Bump werkzeug from 0.15.5 to 2.2.3

opened on 2023-02-15 22:48:18 by dependabot[bot]

Bumps werkzeug from 0.15.5 to 2.2.3.

Release notes

Sourced from werkzeug's releases.

2.2.3

This is a fix release for the 2.2.x release branch.

This release contains security fixes for:

2.2.2

This is a fix release for the 2.2.0 feature release.

2.2.1

This is a fix release for the 2.2.0 feature release.

2.2.0

This is a feature release, which includes new features and removes previously deprecated features. The 2.2.x branch is now the supported bugfix branch, the 2.1.x branch will become a tag marking the end of support for that branch. We encourage everyone to upgrade, and to use a tool such as pip-tools to pin all dependencies and control upgrades.

2.1.2

This is a fix release for the 2.1.0 feature release.

2.1.1

This is a fix release for the 2.1.0 feature release.

2.1.0

This is a feature release, which includes new features and removes previously deprecated features. The 2.1.x branch is now the supported bugfix branch, the 2.0.x branch will become a tag marking the end of support for that branch. We encourage everyone to upgrade, and to use a tool such as pip-tools to pin all dependencies and control upgrades.

2.0.3

... (truncated)

Changelog

Sourced from werkzeug's changelog.

Version 2.2.3

Released 2023-02-14

  • Ensure that URL rules using path converters will redirect with strict slashes when the trailing slash is missing. :issue:2533
  • Type signature for get_json specifies that return type is not optional when silent=False. :issue:2508
  • parse_content_range_header returns None for a value like bytes */-1 where the length is invalid, instead of raising an AssertionError. :issue:2531
  • Address remaining ResourceWarning related to the socket used by run_simple. Remove prepare_socket, which now happens when creating the server. :issue:2421
  • Update pre-existing headers for multipart/form-data requests with the test client. :issue:2549
  • Fix handling of header extended parameters such that they are no longer quoted. :issue:2529
  • LimitedStream.read works correctly when wrapping a stream that may not return the requested size in one read call. :issue:2558
  • A cookie header that starts with = is treated as an empty key and discarded, rather than stripping the leading ==.
  • Specify a maximum number of multipart parts, default 1000, after which a RequestEntityTooLarge exception is raised on parsing. This mitigates a DoS attack where a larger number of form/file parts would result in disproportionate resource use.

Version 2.2.2

Released 2022-08-08

  • Fix router to restore the 2.1 strict_slashes == False behaviour whereby leaf-requests match branch rules and vice versa. :pr:2489
  • Fix router to identify invalid rules rather than hang parsing them, and to correctly parse / within converter arguments. :pr:2489
  • Update subpackage imports in :mod:werkzeug.routing to use the import as syntax for explicitly re-exporting public attributes. :pr:2493
  • Parsing of some invalid header characters is more robust. :pr:2494
  • When starting the development server, a warning not to use it in a production deployment is always shown. :issue:2480
  • LocalProxy.__wrapped__ is always set to the wrapped object when the proxy is unbound, fixing an issue in doctest that would cause it to fail. :issue:2485
  • Address one ResourceWarning related to the socket used by run_simple. :issue:2421

... (truncated)

Commits
  • 22a254f release version 2.2.3
  • 517cac5 Merge pull request from GHSA-xg9f-g7g7-2323
  • babc8d9 rewrite docs about request data limits
  • 09449ee clean up docs
  • fe899d0 limit the maximum number of multipart form parts
  • cf275f4 Merge pull request from GHSA-px8h-6qxv-m22q
  • 8c2b4b8 don't strip leading = when parsing cookie
  • 7c7ce5c [pre-commit.ci] pre-commit autoupdate (#2585)
  • 19ae03e [pre-commit.ci] auto fixes from pre-commit.com hooks
  • a83d3b8 [pre-commit.ci] pre-commit autoupdate
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Ambiwlans/JiKen/network/alerts).

Bump certifi from 2019.6.16 to 2022.12.7

opened on 2022-12-08 06:15:16 by dependabot[bot]

Bumps certifi from 2019.6.16 to 2022.12.7.

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Ambiwlans/JiKen/network/alerts).

Bump numpy from 1.16.2 to 1.22.0

opened on 2022-06-22 01:58:19 by dependabot[bot]

Bumps numpy from 1.16.2 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

  • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
  • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
  • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
  • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
  • A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Ambiwlans/JiKen/network/alerts).

Three or Four input boxes / checks, for more accurate calculations (and reduce bias)

opened on 2022-04-23 14:27:21 by patarapolw

Those <input type="text"> would be

  1. Meanings
  2. On readings
  3. Kun readings
  4. Associate vocabularies

`<html lang="ja">` or include Japanese-only fonts

opened on 2022-04-23 14:24:11 by patarapolw

This is important, as many Kanji have different shapes due to Han Unification. Chinese variant of a Kanji is of course outside the range a native, at least, should recognize the Kanji.

Ambiwlans

Space. Machine Learning. Code.

GitHub Repository