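Automatically scrapes free proxies from the web and checks each one for availability and anonymity. Good and bad proxies are re-checked on a schedule, and proxies that keep failing repeated checks are discarded. The check function is customizable, so you can react differently to different check results, and the proxy sites to scrape are customizable too: a few lines of code and a few settings give you maximum free-style.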
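The README does not say how anonymity is judged. A common approach, shown here as an assumption rather than proxy-factory's actual logic, is to call an endpoint that echoes the caller's IP and request headers (httpbin's `/get` is one such service) through the proxy, then compare what it reports with your real IP. `classify_anonymity`, `PROXY_HEADERS` and the three category names are hypothetical.

```python
import re
from typing import Mapping, Set

# Headers that commonly reveal a proxy in the path (an assumed, non-exhaustive list).
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")


def _tokens(value: str) -> Set[str]:
    # Split header values like "1.2.3.4, 5.6.7.8" or "for=1.2.3.4" into bare tokens.
    return {t.strip('"[]') for t in re.split(r"[,;=\s]+", value) if t}


def classify_anonymity(real_ip: str, seen_ip: str, headers: Mapping[str, str]) -> str:
    """Return "transparent", "anonymous" or "elite" (hypothetical categories).

    real_ip: the client's own public IP, looked up without the proxy
    seen_ip: the remote address the echo endpoint reported through the proxy
    headers: the request headers the echo endpoint reported through the proxy
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    # Transparent: the real IP leaks as the remote address or inside a header.
    if seen_ip == real_ip or any(real_ip in _tokens(v) for v in lowered.values()):
        return "transparent"
    # Anonymous: the real IP is hidden, but headers show that a proxy was used.
    if any(name in lowered for name in PROXY_HEADERS):
        return "anonymous"
    # Elite: neither the real IP nor proxy-revealing headers are visible.
    return "elite"
```

For example, `classify_anonymity("1.2.3.4", "5.6.7.8", {"Via": "1.1 squid"})` returns `"anonymous"`. Header values are split into tokens so that an IP such as `11.2.3.45` is not mistaken for `1.2.3.4` by substring matching.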
```bash
pip install proxy-factory
```
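Command-line usage:

```text
mashichaodeMac-mini:toolkit mashichao$ product -h
usage: product [-h] [-cm CHECK_METHOD] [-sm SPIDER_MODULE] [--console]
               [--console-host CONSOLE_HOST] [--console-port CONSOLE_PORT]
               [-s SETTINGS] [-ls LOCALSETTINGS] [-d]
               [{stop,start,restart,status}]

positional arguments:
  {stop,start,restart,status}

optional arguments:
  -h, --help            show this help message and exit
  -cm CHECK_METHOD, --check-method CHECK_METHOD
                        proivde a check method to check proxies. eg:module.func
  -sm SPIDER_MODULE, --spider-module SPIDER_MODULE
                        proivde a module contains proxy site spider methods. eg:module1.module2
  --console             start a console.
  --console-host CONSOLE_HOST
                        console host.
  --console-port CONSOLE_PORT
                        console port.
  -s SETTINGS, --settings SETTINGS
                        Setting module.
  -ls LOCALSETTINGS, --localsettings LOCALSETTINGS
                        Local setting module.
  -d, --daemon
```

A custom check method, passed as `-cm module.func`:

```python
import requests


def check(self, proxy):
    """
    Custom check method.
    :param self: the ProxyFactory object
    :param proxy: the proxy to check, as ip:port
    :return: True if the proxy is usable, otherwise False
    """
    try:
        resp = requests.get("http://2017.ip138.com/ic.asp",
                            proxies={"http": "http://%s" % proxy},
                            timeout=10)  # seconds; requests has no default timeout
    except requests.RequestException:
        # Connection errors and timeouts mean the proxy is not usable.
        return False
    self.logger.info(resp.text)
    # ... further custom checks on resp can go here
    return resp.status_code < 300
```

A custom proxy-site spider; the module containing such methods is passed as `-sm module1.module2`:

```python
import requests
from bs4 import BeautifulSoup


def fetch_custom(self, page=5):
    """
    Custom proxy-site spider.
    :param self: the ProxyFactory object
    :param page: optional parameters can be given defaults like this, but the
                 method may take only one required parameter
    :return: a set of proxies, each formatted as ip:port
    """
    proxies = set()
    url_tmpl = "http://www.kxdaili.com/dailiip/1/%d.html"
    for page_num in range(page):
        url = url_tmpl % (page_num + 1)
        html = requests.get(url, headers=self.headers, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        table_tag = soup.find("table", attrs={"class": "segment"})
        if table_tag is None:
            continue  # table not found on this page
        # find_all("tr") works whether or not the parsed HTML contains a <tbody>
        for tr in table_tag.find_all("tr"):
            tds = tr.find_all("td")
            if len(tds) < 5:
                continue  # skip header or malformed rows
            ip = tds[0].text.strip()
            port = tds[1].text.strip()
            latency = tds[4].text.split(" ")[0]
            if float(latency) < 0.5:  # keep proxies with latency under 0.5 s
                proxies.add("%s:%s" % (ip, port))
    return proxies
```

Example settings:

```python
REDIS_HOST = "0.0.0.0"
REDIS_PORT = 6379
BAD_CHECK_INTERVAL = 60   # interval between re-checks of bad proxies
FAILED_TIMES = 5          # number of failed checks after which a proxy is discarded
GOOD_CHECK_INTERVAL = 60  # interval between re-checks of good proxies
FETCH_INTERVAL = 60       # interval between rounds of fetching from proxy sites
GOOD_PROXY_SET = "good_proxies"  # Redis key of the set holding good proxies
BAD_PROXY_HASH = "bad_proxies"   # Redis key of the hash holding bad proxies
```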
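The intro says good and bad proxies are re-checked periodically and that proxies which keep failing are discarded, and the settings name a Redis set for good proxies, a Redis hash for bad ones, and a `FAILED_TIMES` threshold. The following is a minimal sketch of one plausible reading of that bookkeeping, using an in-memory stand-in for Redis. `ProxyPool`, `record` and `recheck` are hypothetical, and the rule "drop a proxy after `FAILED_TIMES` consecutive failed checks" is an assumption, not proxy-factory's confirmed behaviour.

```python
from typing import Callable, Dict, Set

FAILED_TIMES = 5  # same name as the setting; assumed to mean "failures before discarding"


class ProxyPool:
    """Hypothetical in-memory stand-in for the good-proxy set and bad-proxy hash."""

    def __init__(self, failed_times: int = FAILED_TIMES) -> None:
        self.failed_times = failed_times
        self.good: Set[str] = set()    # plays the role of GOOD_PROXY_SET
        self.bad: Dict[str, int] = {}  # plays the role of BAD_PROXY_HASH (proxy -> failures)

    def record(self, proxy: str, ok: bool) -> None:
        if ok:
            # A passing check (re)admits the proxy and clears its failure count.
            self.bad.pop(proxy, None)
            self.good.add(proxy)
            return
        self.good.discard(proxy)
        failures = self.bad.get(proxy, 0) + 1
        if failures >= self.failed_times:
            # Consistently failing proxies are dropped entirely.
            self.bad.pop(proxy, None)
        else:
            self.bad[proxy] = failures

    def recheck(self, check: Callable[[str], bool]) -> None:
        # One pass over both pools; a real service would run this on timers
        # such as GOOD_CHECK_INTERVAL and BAD_CHECK_INTERVAL.
        for proxy in list(self.good) + list(self.bad):
            self.record(proxy, check(proxy))
```

With redis-py, the same steps would map onto `sadd`/`srem` for the good set and `hincrby`/`hdel` for the bad hash.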
References: 一键获取免费真实的匿名代理 ("Get free, genuinely anonymous proxies in one click")