A simple Redis-based proxy factory

ShichaoMa, updated 2022-12-08 06:42:23

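Automatically scrapes free proxies from the web and checks each one for availability and anonymity. Valid and invalid proxies are both re-checked on a schedule, and a proxy that still fails after repeated checks is discarded. The check function can be customized, so different check results can trigger different responses. The proxy sites can be customized as well: a few lines of code and a few configuration entries are enough, for maximum free-style.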

INSTALL

```

Python 3 or later

pip install proxy-factory

Dependencies: redis (required), tesseract-ocr (optional)

```

USAGE

```
mashichaodeMac-mini:toolkit mashichao$ product -h
usage: product [-h] [-cm CHECK_METHOD] [-sm SPIDER_MODULE] [--console]
               [--console-host CONSOLE_HOST] [--console-port CONSOLE_PORT]
               [-s SETTINGS] [-ls LOCALSETTINGS] [-d]
               [{stop,start,restart,status}]

positional arguments:
  {stop,start,restart,status}

optional arguments:
  -h, --help            show this help message and exit
  -cm CHECK_METHOD, --check-method CHECK_METHOD
                        proivde a check method to check proxies.
                        eg:module.func
  -sm SPIDER_MODULE, --spider-module SPIDER_MODULE
                        proivde a module contains proxy site spider methods.
                        eg:module1.module2
  --console             start a console.
  --console-host CONSOLE_HOST
                        console host.
  --console-port CONSOLE_PORT
                        console port.
  -s SETTINGS, --settings SETTINGS
                        Setting module.
  -ls LOCALSETTINGS, --localsettings LOCALSETTINGS
                        Local setting module.
  -d, --daemon
```

  • product start: start the program (blocking)
  • product -d start: start the program (daemon mode)
  • product restart: restart the program (daemon mode)
  • product stop: stop the program (daemon mode)
  • product status: show the program status (daemon mode)
  • product --console: open a console client, intended for debugging only; for details see https://github.com/ShichaoMa/toolkit
  • product -s settings: specify a settings module (it is found as long as it is on sys.path)
  • product -ls localsettings: specify a custom settings module (it is found as long as it is on sys.path)
  • product -cm check-method: specify a custom check method (it is found as long as it is on sys.path)
  • product -sm spider-module: specify a custom spider module that holds custom spider methods (it is found as long as it is on sys.path)
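The `-cm` and `-sm` options take dotted paths such as `module.func`, which only need to be importable from `sys.path`. The sketch below shows one common way such a string can be resolved to a Python object with the standard library. It is an illustration only: `load_object` is a hypothetical helper, not a function taken from proxy-factory, and the tool itself may resolve these paths differently.

```python
import importlib


def load_object(dotted_path):
    """Resolve a 'module.attr' string to the object it names."""
    # Split on the last dot: everything before it is the module path.
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.func', got %r" % dotted_path)
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Example with a standard-library function standing in for a custom check.
dumps = load_object("json.dumps")
```

If the module is not on `sys.path`, `importlib.import_module` raises `ModuleNotFoundError`, which is the usual symptom of a mistyped or unreachable path.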

CONFIG

CUSTOM CHECK

```python
def check(self, proxy):
    """
    Custom check method.
    :param self: the ProxyFactory object
    :param proxy: the proxy to check
    :return: True if the proxy is usable, otherwise False
    """
    import requests
    resp = requests.get("http://2017.ip138.com/ic.asp",
                        proxies={"http": "http://%s" % proxy})
    self.logger.info(resp.text)
    ...  # further checks elided in the original
    return resp.status_code < 300
```
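A check like the one above raises an exception when the proxy times out or refuses the connection, and this README does not say whether ProxyFactory catches it. The following is a minimal sketch, under that uncertainty, of a decorator that turns any exception into a `False` result. `safe_check` and `flaky_check` are hypothetical names used for illustration and are not part of proxy-factory.

```python
import functools


def safe_check(check_func):
    """Wrap a check(self, proxy) function so any exception means 'unusable'."""
    @functools.wraps(check_func)
    def wrapper(self, proxy):
        try:
            return bool(check_func(self, proxy))
        except Exception:
            # Timeouts, refused connections and similar errors count as failure.
            return False
    return wrapper


@safe_check
def flaky_check(self, proxy):
    # Stand-in for a real network request: fail for one specific proxy.
    if proxy == "192.0.2.1:8080":
        raise ConnectionError("proxy refused the connection")
    return True
```

Because `functools.wraps` preserves the function name, the wrapped function can still be referenced as `module.flaky_check` in a dotted path.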

CUSTOM PROXY SITE METHOD

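```python
from bs4 import BeautifulSoup

# get_html is the page-download helper used by the original example; its
# import path is not shown here, so import it from wherever it is defined.


def fetch_custom(self, page=5):
    """
    Custom proxy-site scraper.
    :param self: the ProxyFactory object
    :param page: optional parameters may be declared here, but the method
                 may take only one required parameter
    :return: a set of proxies, each formatted as ip:port
    """
    proxies = set()
    url_tmpl = "http://www.kxdaili.com/dailiip/1/%d.html"
    for page_num in range(page):
        url = url_tmpl % (page_num + 1)
        soup = BeautifulSoup(get_html(url, self.headers), "html")
        table_tag = soup.find("table", attrs={"class": "segment"})
        trs = table_tag.tbody.find_all("tr")
        for tr in trs:
            tds = tr.find_all("td")
            ip = tds[0].text
            port = tds[1].text
            latency = tds[4].text.split(" ")[0]
            if float(latency) < 0.5:  # keep only proxies with latency below 0.5 s
                proxy = "%s:%s" % (ip, port)
                proxies.add(proxy)
    return proxies
```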
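The filtering step inside the scraper (parse the latency cell, keep fast proxies, format them as `ip:port`) can be tested without any network access if it is pulled out into a plain function. The sketch below assumes rows have already been extracted as `(ip, port, latency_text)` tuples; `rows_to_proxies` and the sample rows are made up for illustration and do not come from proxy-factory or from the real site.

```python
def rows_to_proxies(rows, max_latency=0.5):
    """Turn (ip, port, latency_text) rows into a set of 'ip:port' strings."""
    proxies = set()
    for ip, port, latency_text in rows:
        try:
            latency = float(latency_text.split(" ")[0])
        except ValueError:
            continue  # skip rows whose latency cell cannot be parsed
        if latency < max_latency:
            proxies.add("%s:%s" % (ip.strip(), port.strip()))
    return proxies


# Hypothetical rows using documentation-only IP addresses.
sample_rows = [
    ("192.0.2.1", "8080", "0.3 s"),
    ("192.0.2.2", "3128", "1.2 s"),
    ("192.0.2.3", "80", "n/a"),
]
fast = rows_to_proxies(sample_rows)
```

Only the first row survives: the second is too slow and the third has an unparseable latency, which the original loop would have turned into an uncaught `ValueError`.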

SETTINGS

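```python
REDIS_HOST = "0.0.0.0"

REDIS_PORT = 6379

# Interval between checks of poor-quality proxies
BAD_CHECK_INTERVAL = 60

# Maximum number of consecutive failed checks for a poor-quality proxy;
# a proxy that exceeds it is discarded
FAILED_TIMES = 5

# Interval between checks of good-quality proxies
GOOD_CHECK_INTERVAL = 60

# Interval between fetches of new proxies
FETCH_INTERVAL = 60

# Redis set that stores valid proxies
GOOD_PROXY_SET = "good_proxies"

# Redis hash that stores invalid proxies
BAD_PROXY_HASH = "bad_proxies"

```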
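To make the interplay between `GOOD_PROXY_SET`, `BAD_PROXY_HASH` and `FAILED_TIMES` concrete, here is a small in-memory sketch of the lifecycle the settings describe. It is an assumption-laden illustration, not the project's code: a Python `set` and `dict` stand in for the Redis set and hash, `record_check` is a hypothetical function, and the exact point at which the real tool discards a proxy may differ from the "more than `FAILED_TIMES` consecutive failures" rule used here.

```python
FAILED_TIMES = 5

good_proxies = set()  # stands in for the Redis set named by GOOD_PROXY_SET
bad_proxies = {}      # stands in for the Redis hash named by BAD_PROXY_HASH
                      # (proxy -> consecutive failure count)


def record_check(proxy, ok):
    """Update both stores after one check of `proxy` and report its state."""
    if ok:
        bad_proxies.pop(proxy, None)  # a success resets the failure count
        good_proxies.add(proxy)
        return "good"
    good_proxies.discard(proxy)
    failures = bad_proxies.get(proxy, 0) + 1
    if failures > FAILED_TIMES:
        bad_proxies.pop(proxy, None)  # give up on this proxy entirely
        return "discarded"
    bad_proxies[proxy] = failures
    return "bad"


# Six consecutive failures for one hypothetical proxy.
results = [record_check("192.0.2.1:8080", False) for _ in range(6)]
```

With `FAILED_TIMES = 5`, the first five failures leave the proxy in the bad store and the sixth removes it.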

Reference: "One-Click Access to Free, Genuinely Anonymous Proxies" (一键获取免费真实的匿名代理)

Issues

Bump certifi from 2019.9.11 to 2022.12.7

opened on 2022-12-08 06:42:22 by dependabot[bot]

Bumps certifi from 2019.9.11 to 2022.12.7.

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
  • `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
  • `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
  • `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/ShichaoMa/proxy_factory/network/alerts).

Bump ipython from 7.8.0 to 7.16.3

opened on 2022-01-21 20:13:32 by dependabot[bot]

Bumps ipython from 7.8.0 to 7.16.3.


Bump pillow from 6.1.0 to 8.3.2

opened on 2021-09-08 01:32:38 by dependabot[bot]

Bumps pillow from 6.1.0 to 8.3.2.

Release notes

Sourced from pillow's releases.

8.3.2

https://pillow.readthedocs.io/en/stable/releasenotes/8.3.2.html

Security

  • CVE-2021-23437 Raise ValueError if color specifier is too long [hugovk, radarhere]

  • Fix 6-byte OOB read in FliDecode [wiredfool]

Python 3.10 wheels

  • Add support for Python 3.10 #5569, #5570 [hugovk, radarhere]

Fixed regressions

  • Ensure TIFF RowsPerStrip is multiple of 8 for JPEG compression #5588 [kmilos, radarhere]

  • Updates for ImagePalette channel order #5599 [radarhere]

  • Hide FriBiDi shim symbols to avoid conflict with real FriBiDi library #5651 [nulano]

8.3.1

https://pillow.readthedocs.io/en/stable/releasenotes/8.3.1.html

Changes

8.3.0

https://pillow.readthedocs.io/en/stable/releasenotes/8.3.0.html

Changes

... (truncated)

Changelog

Sourced from pillow's changelog.

8.3.2 (2021-09-02)

  • CVE-2021-23437 Raise ValueError if color specifier is too long [hugovk, radarhere]

  • Fix 6-byte OOB read in FliDecode [wiredfool]

  • Add support for Python 3.10 #5569, #5570 [hugovk, radarhere]

  • Ensure TIFF RowsPerStrip is multiple of 8 for JPEG compression #5588 [kmilos, radarhere]

  • Updates for ImagePalette channel order #5599 [radarhere]

  • Hide FriBiDi shim symbols to avoid conflict with real FriBiDi library #5651 [nulano]

8.3.1 (2021-07-06)

  • Catch OSError when checking if fp is sys.stdout #5585 [radarhere]

  • Handle removing orientation from alternate types of EXIF data #5584 [radarhere]

  • Make Image.array take optional dtype argument #5572 [t-vi, radarhere]

8.3.0 (2021-07-01)

  • Use snprintf instead of sprintf. CVE-2021-34552 #5567 [radarhere]

  • Limit TIFF strip size when saving with LibTIFF #5514 [kmilos]

  • Allow ICNS save on all operating systems #4526 [baletu, radarhere, newpanjing, hugovk]

  • De-zigzag JPEG's DQT when loading; deprecate convert_dict_qtables #4989 [gofr, radarhere]

  • Replaced xml.etree.ElementTree #5565 [radarhere]

... (truncated)

Commits
  • 8013f13 8.3.2 version bump
  • 23c7ca8 Update CHANGES.rst
  • 8450366 Update release notes
  • a0afe89 Update test case
  • 9e08eb8 Raise ValueError if color specifier is too long
  • bd5cf7d FLI tests for Oss-fuzz crash.
  • 94a0cf1 Fix 6-byte OOB read in FliDecode
  • cece64f Add 8.3.2 (2021-09-02) [CI skip]
  • e422386 Add release notes for Pillow 8.3.2
  • 08dcbb8 Pillow 8.3.2 supports Python 3.10 [ci skip]
  • Additional commits viewable in compare view



Bump urllib3 from 1.25.6 to 1.25.8

opened on 2021-04-30 21:29:58 by dependabot[bot]

Bumps urllib3 from 1.25.6 to 1.25.8.

Release notes

Sourced from urllib3's releases.

1.25.8

Release: 1.25.8

1.25.7

No release notes provided.

Changelog

Sourced from urllib3's changelog.

1.25.8 (2020-01-20)

  • Drop support for EOL Python 3.4 (Pull #1774)

  • Optimize _encode_invalid_chars (Pull #1787)

1.25.7 (2019-11-11)

  • Preserve chunked parameter on retries (Pull #1715, Pull #1734)

  • Allow unset SERVER_SOFTWARE in App Engine (Pull #1704, Issue #1470)

  • Fix issue where URL fragment was sent within the request target. (Pull #1732)

  • Fix issue where an empty query section in a URL would fail to parse. (Pull #1732)

  • Remove TLS 1.3 support in SecureTransport due to Apple removing support (Pull #1703)

Commits
  • 2a57bc5 Release 1.25.8 (#1788)
  • a2697e7 Optimize _encode_invalid_chars (#1787)
  • d2a5a59 Move IPv6 test skips in server fixtures
  • d44f0e5 Factorize test certificates serialization
  • 84abc7f Generate IPV6 certificates using trustme
  • 6a15b18 Run IPv6 Tornado server from fixture
  • 4903840 Use trustme to generate IP_SAN cert
  • 9971e27 Empty responses should have no lines.
  • 62ef68e Use trustme to generate NO_SAN certs
  • fd2666e Use fixture to configure NO_SAN test certs
  • Additional commits viewable in compare view



Bump py from 1.8.0 to 1.10.0

opened on 2021-04-20 18:39:43 by dependabot[bot]

Bumps py from 1.8.0 to 1.10.0.

Changelog

Sourced from py's changelog.

1.10.0 (2020-12-12)

  • Fix a regular expression DoS vulnerability in the py.path.svnwc SVN blame functionality (CVE-2020-29651)
  • Update vendored apipkg: 1.4 => 1.5
  • Update vendored iniconfig: 1.0.0 => 1.1.1

1.9.0 (2020-06-24)

  • Add type annotation stubs for the following modules:

    • py.error
    • py.iniconfig
    • py.path (not including SVN paths)
    • py.io
    • py.xml

    There are no plans to type other modules at this time.

    The type annotations are provided in external .pyi files, not inline in the code, and may therefore contain small errors or omissions. If you use py in conjunction with a type checker, and encounter any type errors you believe should be accepted, please report it in an issue.

1.8.2 (2020-06-15)

  • On Windows, py.path.locals which differ only in case now have the same Python hash value. Previously, such paths were considered equal but had different hashes, which is not allowed and breaks the assumptions made by dicts, sets and other users of hashes.

1.8.1 (2019-12-27)

  • Handle FileNotFoundError when trying to import pathlib in path.common on Python 3.4 (#207).

  • py.path.local.samefile now works correctly in Python 3 on Windows when dealing with symlinks.

Commits
  • e5ff378 Update CHANGELOG for 1.10.0
  • 94cf44f Update vendored libs
  • 5e8ded5 testing: comment out an assert which fails on Python 3.9 for now
  • afdffcc Rename HOWTORELEASE.rst to RELEASING.rst
  • 2de53a6 Merge pull request #266 from nicoddemus/gh-actions
  • fa1b32e Merge pull request #264 from hugovk/patch-2
  • 887d6b8 Skip test_samefile_symlink on pypy3 on Windows
  • e94e670 Fix test_comments() in test_source
  • fef9a32 Adapt test
  • 4a694b0 Add GitHub Actions badge to README
  • Additional commits viewable in compare view



Bump pygments from 2.4.2 to 2.7.4

opened on 2021-03-29 20:39:38 by dependabot[bot]

Bumps pygments from 2.4.2 to 2.7.4.

Changelog

Sourced from pygments's changelog.

Version 2.7.4

(released January 12, 2021)

  • Updated lexers:

    • Apache configurations: Improve handling of malformed tags (#1656)

    • CSS: Add support for variables (#1633, #1666)

    • Crystal (#1650, #1670)

    • Coq (#1648)

    • Fortran: Add missing keywords (#1635, #1665)

    • Ini (#1624)

    • JavaScript and variants (#1647 -- missing regex flags, #1651)

    • Markdown (#1623, #1617)

    • Shell

      • Lex trailing whitespace as part of the prompt (#1645)
      • Add missing in keyword (#1652)
    • SQL - Fix keywords (#1668)

    • Typescript: Fix incorrect punctuation handling (#1510, #1511)

  • Fix infinite loop in SML lexer (#1625)

  • Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)

  • Limit recursion with nesting Ruby heredocs (#1638)

  • Fix a few inefficient regexes for guessing lexers

  • Fix the raw token lexer handling of Unicode (#1616)

  • Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!

  • Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)

  • Fix incorrect MATLAB example (#1582)

Thanks to Google's OSS-Fuzz project for finding many of these bugs.

Version 2.7.3

(released December 6, 2020)

... (truncated)

Commits
  • 4d555d0 Bump version to 2.7.4.
  • fc3b05d Update CHANGES.
  • ad21935 Revert "Added dracula theme style (#1636)"
  • e411506 Prepare for 2.7.4 release.
  • 275e34d doc: remove Perl 6 ref
  • 2e7e8c4 Fix several exponential/cubic complexity regexes found by Ben Caller/Doyensec
  • eb39c43 xquery: fix pop from empty stack
  • 2738778 fix coding style in test_analyzer_lexer
  • 02e0f09 Added 'ERROR STOP' to fortran.py keywords. (#1665)
  • c83fe48 support added for css variables (#1633)
  • Additional commits viewable in compare view

