crawling app by uiautomator2 & mitmproxy
Crawls Douyin users' videos and video comments by jumping into the app through URL_SCHEMA links.
```python3
URL_SCHEMA_MAP = {
    'home': "snssdk1128://feed?refer=web",
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'challenge': 'snssdk1128://challenge/detail/{challenge_id}?refer=web',
    'music': 'snssdk1128://music/detail/{music_id}?refer=web',
    'live': 'snssdk1128://live?room_id={room_id}&user_id={user_id}&from=webview&refer=web',
    'poi': 'snssdk1128://poi/?id={poi_id}',
    'webview': 'snssdk1128://webview?url={url}&from=webview&refer=web',
    'webview_fullscreen': 'snssdk1128://webview?url={url}&from=webview&hide_nav_bar=1&refer=web',
    'poidetail': 'snssdk1128://poi/detail?id={id}&from=webview&refer=web',
    'forward': 'snssdk1128://forward/detail/{id}',
    'billboard_word': 'snssdk1128://search/trending',
    'billboard_video': "snssdk1128://search/trending?type=1",
    'billboard_music': "snssdk1128://search/trending?type=2",
    'billboard_positive': "snssdk1128://search/trending?type=3",
    'billboard_star': "snssdk1128://search/trending?type=4",
}
```
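A minimal sketch of how one of these templates might be filled in and handed to the device. The helper names `build_schema_url` and `build_adb_command` are hypothetical and not part of this repository. Percent-encoding the parameters is an assumption about what the app accepts. The `am start -a android.intent.action.VIEW -d <url>` form is the standard Android way to open a URL by intent.

```python
import shlex
from urllib.parse import quote

# A subset of the schema map, repeated so the sketch is self-contained.
URL_SCHEMA_MAP = {
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'webview': 'snssdk1128://webview?url={url}&from=webview&refer=web',
}


def build_schema_url(name, **params):
    """Fill in a schema template (hypothetical helper).

    Values are percent-encoded so that a nested URL does not break the
    outer query string; this encoding is an assumption.
    """
    encoded = {key: quote(str(value), safe='') for key, value in params.items()}
    return URL_SCHEMA_MAP[name].format(**encoded)


def build_adb_command(url, device_serial=None):
    """Return the adb argv that opens `url` through a VIEW intent."""
    cmd = ['adb']
    if device_serial:
        cmd += ['-s', device_serial]
    # `adb shell` hands the command line to the device shell, so the URL
    # is quoted to keep `?` and `&` from being interpreted there.
    cmd += ['shell', 'am', 'start',
            '-a', 'android.intent.action.VIEW',
            '-d', shlex.quote(url)]
    return cmd


if __name__ == '__main__':
    url = build_schema_url('user', uid=123)
    print(' '.join(build_adb_command(url, device_serial='72bf965')))
```

The resulting list can be passed to `subprocess.run` once a device is attached. With uiautomator2, `d.open_url(url)` on a connected device serves the same purpose.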
Download Android platform-tools and unpack it to get adb
- https://developer.android.com/studio/releases/platform-tools?hl=zh-Cn
```bash
# requires Developer options to be enabled on the phone
adb devices
```
```bash
pipenv install
pipenv shell
uiautomator2 init
```
Use the web UI to inspect and locate elements
```bash
python -m weditor
```
```bash
cp .env.tpl .env
cp -r .mitmproxy ~/.mitmproxy
make run-mitmproxy
make up
./app/tools/simple_data_import.py
./dy.py crawler_users --max_num=200 --device_serial=xxxxx
./dy.py crawler_comments --device_serial=xxxxxxx
./dy.py crawler_follower --device_serial=72bf965
./crawler.py crawler_follower --max_num=200
./crawler.py run
```
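For orientation, here is a sketch of what a mitmproxy addon for this kind of capture could look like. It is not the addon that `make run-mitmproxy` loads in this repository. The class name `ResponseCollector` and the URL fragments are placeholders chosen for illustration. The `response` hook, `flow.request.pretty_url` and `flow.response.get_text()` are part of mitmproxy's addon API.

```python
import json


class ResponseCollector:
    """Keep the JSON body of every response whose URL contains a watched fragment."""

    def __init__(self, watched_fragments):
        self.watched_fragments = tuple(watched_fragments)
        self.items = []

    def response(self, flow):
        # mitmproxy calls this hook once for each completed HTTP response.
        url = flow.request.pretty_url
        if not any(fragment in url for fragment in self.watched_fragments):
            return
        try:
            payload = json.loads(flow.response.get_text())
        except (TypeError, ValueError):
            return  # empty or non-JSON body, nothing to store
        self.items.append({'url': url, 'payload': payload})


# Placeholder fragments: replace them with the paths seen in your own capture.
addons = [ResponseCollector(['/comment/list/', '/aweme/post/'])]
```

Saved as a script, this could be loaded with `mitmdump -s <script>.py`. A real crawler would write `items` to a database or queue rather than keep them in memory.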
```bash
sudo cp frp/systemd/app-crawler.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start app-crawler.service
sudo systemctl restart app-crawler.service
sudo systemctl enable app-crawler.service
```
```bash
adb kill-server
adb start-server
# if that still does not work, try restarting the phone
```
`adb devices` reports `no permissions (user in plugdev group; are your udev rules wrong?)`
- adb-devices-no-permissions-user-in-plugdev-group-are-your-udev-rules-wrong
`weditor` fails on startup with `adbutils.errors.AdbError: device not found`
This happens after switching devices; clearing Chrome's LocalStorage fixes it.
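The adb problems above can be spotted from a script before a crawl starts. The following is a minimal sketch, assuming the usual tab-separated `adb devices` output; the function names are hypothetical.

```python
import subprocess


def parse_adb_devices(output):
    """Map each device serial to the state string printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon messages.
        if not line or line.startswith('List of devices') or line.startswith('*'):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            serial, state = parts
            devices[serial] = state.strip()
    return devices


def usable_serials(devices):
    """Return the serials that are ready for use (state is exactly 'device')."""
    return [serial for serial, state in devices.items() if state == 'device']


def list_devices():
    """Run `adb devices` and parse its output; adb must be on PATH."""
    result = subprocess.run(['adb', 'devices'],
                            capture_output=True, text=True, check=True)
    return parse_adb_devices(result.stdout)
```

A state that starts with `no permissions` points to the udev rules issue, while an empty result suggests trying the `adb kill-server` / `adb start-server` sequence.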
Tested devices
Xiaomi Mi 6
Bumps certifi from 2019.11.28 to 2022.12.7.
Bumps pillow from 6.2.1 to 9.3.0.
How were these URL_SCHEMA links obtained? I tried searching for Intents with frida and also searching with am, but found nothing, so I am a bit puzzled.
Bumps lxml from 4.4.2 to 4.9.1.
Bumps ipython from 7.10.1 to 7.16.3.
Bumps mitmproxy from 4.0.4 to 7.0.3.
crawler crawling uiautomator2 mitmproxy douyin aweme