crawling App by uiautomator2 & mitmproxy

maguowei, updated 2022-12-08 03:19:36

app crawler

crawling app by uiautomator2 & mitmproxy

Crawls Douyin user videos and video comments by jumping directly to app pages via URL_SCHEMA (deep links)

```python
# Douyin deep-link (URL scheme) templates; {...} are placeholders for IDs
URL_SCHEMA_MAP = {
    'home': 'snssdk1128://feed?refer=web',
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'challenge': 'snssdk1128://challenge/detail/{challenge_id}?refer=web',
    'music': 'snssdk1128://music/detail/{music_id}?refer=web',
    'live': 'snssdk1128://live?room_id={room_id}&user_id={user_id}&from=webview&refer=web',
    'poi': 'snssdk1128://poi/?id={poi_id}',
    'webview': 'snssdk1128://webview?url={url}&from=webview&refer=web',
    'webview_fullscreen': 'snssdk1128://webview?url={url}&from=webview&hide_nav_bar=1&refer=web',
    'poidetail': 'snssdk1128://poi/detail?id={id}&from=webview&refer=web',
    'forward': 'snssdk1128://forward/detail/{id}',
    'billboard_word': 'snssdk1128://search/trending',
    'billboard_video': 'snssdk1128://search/trending?type=1',
    'billboard_music': 'snssdk1128://search/trending?type=2',
    'billboard_positive': 'snssdk1128://search/trending?type=3',
    'billboard_star': 'snssdk1128://search/trending?type=4',
}
```
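To jump to one of these pages, a template is filled in and handed to Android as a VIEW intent. The following is a minimal sketch, assuming the phone is reachable through `adb`; the repo's own jump code is not shown in this README and may work differently (for example through uiautomator2). `build_url` and `open_url_command` are hypothetical helpers, while `am start -a android.intent.action.VIEW -d <uri>` is the standard Android activity-manager way to open a URI.

```python
import shlex
from typing import List

# A few entries copied from URL_SCHEMA_MAP above so this sketch runs on its own.
URL_SCHEMA_MAP = {
    'user': 'snssdk1128://user/profile/{uid}?refer=web',
    'detail': 'snssdk1128://aweme/detail/{aweme_id}?refer=web',
    'live': 'snssdk1128://live?room_id={room_id}&user_id={user_id}&from=webview&refer=web',
}


def build_url(page: str, **params: str) -> str:
    """Fill in one template, e.g. build_url('user', uid='123')."""
    return URL_SCHEMA_MAP[page].format(**params)


def open_url_command(serial: str, url: str) -> List[str]:
    """Build an adb command that opens `url` on device `serial` via a VIEW intent.

    The URL is quoted because `adb shell` hands the arguments to the device-side
    shell, which would otherwise treat '&' in the query string as an operator.
    """
    return [
        'adb', '-s', serial, 'shell',
        'am', 'start', '-a', 'android.intent.action.VIEW',
        '-d', shlex.quote(url),
    ]


if __name__ == '__main__':
    # Only prints the command; pass it to subprocess.run(..., check=True) to execute it.
    print(open_url_command('emulator-5554', build_url('detail', aweme_id='42')))
```

Quoting matters mostly for templates such as `live` and `webview`, whose query strings contain several `&`-separated parameters.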

Installing dependencies

Download Android platform-tools and unzip it to get adb: https://developer.android.com/studio/releases/platform-tools?hl=zh-Cn

```bash
# List connected devices (Developer options must be enabled on the device)
adb devices
```
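The multi-device `./crawler.py` commands need the serial of each connected phone. Below is a small sketch of extracting usable serials from `adb devices` output, assuming the usual `<serial>\t<state>` line format; `parse_adb_devices` is a hypothetical helper, not part of the repo.

```python
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return the serials of devices in the 'device' state from `adb devices` output.

    Entries reported as 'offline', 'unauthorized' or 'no permissions' are skipped,
    because adb cannot run commands on them yet.
    """
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header and daemon messages such as "* daemon started".
        if not line or line.startswith(('List of devices', '*')):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[1] == 'device':
            serials.append(parts[0])
    return serials
```

With adb on the PATH, it can be fed directly: `parse_adb_devices(subprocess.run(['adb', 'devices'], capture_output=True, text=True).stdout)`.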

```bash
pipenv install
pipenv shell
uiautomator2 init
```

Installing Douyin

  • Use Wandoujia to install an older version of the Douyin app (v7.5.0 and earlier still trust user-installed CA certificates)
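Before pointing the phone at mitmproxy, it can help to confirm which Douyin version is actually installed. A minimal sketch, assuming `com.ss.android.ugc.aweme` is Douyin's package name and that `adb shell dumpsys package` prints a `versionName=` line; the 7.5.0 cutoff is taken from the note above and treated as inclusive, matching the v7.5.0 APK published under Releases. All helper names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Assumption: Douyin's Android package name.
DOUYIN_PACKAGE = 'com.ss.android.ugc.aweme'
# Per this README, v7.5.0 and earlier still trust user-installed CA certificates.
MAX_USER_CA_VERSION = (7, 5, 0)


def dumpsys_command(serial: str) -> List[str]:
    """adb command whose output parse_version_name() expects."""
    return ['adb', '-s', serial, 'shell', 'dumpsys', 'package', DOUYIN_PACKAGE]


def parse_version_name(dumpsys_output: str) -> Optional[Tuple[int, ...]]:
    """Extract versionName (e.g. '7.5.0') from `dumpsys package` output as ints."""
    match = re.search(r'versionName=([\d.]+)', dumpsys_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split('.') if part)


def trusts_user_ca(version: Optional[Tuple[int, ...]]) -> bool:
    """True if the version is at or below the cutoff (tuples compare element-wise)."""
    return version is not None and version <= MAX_USER_CA_VERSION
```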

weditor

Use the web UI to inspect and locate elements:

```bash
python -m weditor
```
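What weditor displays is the device's UI hierarchy, where every node has attributes such as `text`, `resource-id` and `bounds`. The sketch below illustrates how such a dump maps to a tap point; it assumes the XML format returned by uiautomator2's `d.dump_hierarchy()`, and `find_center` is a hypothetical helper. In practice uiautomator2's own selectors (for example `d(text=...)`) do this lookup for you.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_center(hierarchy_xml: str, attr: str, value: str) -> Optional[Tuple[int, int]]:
    """Return the tap point (center) of the first node whose `attr` equals `value`.

    `hierarchy_xml` is a UI dump like the one uiautomator2's d.dump_hierarchy()
    returns; each <node> carries bounds in the form "[left,top][right,bottom]".
    """
    root = ET.fromstring(hierarchy_xml)
    for node in root.iter('node'):
        if node.get(attr) != value:
            continue
        numbers = re.findall(r'-?\d+', node.get('bounds', ''))
        if len(numbers) != 4:
            continue  # malformed or missing bounds
        left, top, right, bottom = map(int, numbers)
        return (left + right) // 2, (top + bottom) // 2
    return None
```

On a connected device this could be used as `xy = find_center(d.dump_hierarchy(), 'resource-id', '...')` followed by `d.click(*xy)`, where `d = uiautomator2.connect(serial)`.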

mitmproxy

Installing and trusting the certificate

  • https://docs.mitmproxy.org/stable/concepts-certificates/

Usage

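```bash
# Create .env from the template, copy the bundled .mitmproxy directory to ~/.mitmproxy, then start mitmproxy
cp .env.tpl .env
cp -r .mitmproxy ~/.mitmproxy
make run-mitmproxy

# Start the database
make up

# Import test data
./app/tools/simple_data_import.py

# Crawl user info and video lists on a specific device
./dy.py crawler_users --max_num=200 --device_serial=xxxxx

# Crawl comments on a specific device
./dy.py crawler_comments --device_serial=xxxxxxx

# Crawl a user's followers on a specific device
./dy.py crawler_follower --device_serial=72bf965

# Multiple devices: crawl user followers
./crawler.py crawler_follower --max_num=200

# Multiple devices: crawl user info and comments
./crawler.py run
```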
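On the traffic side, `make run-mitmproxy` presumably starts mitmproxy with the repo's own capture logic, which this README does not show. As an illustration of the mechanism only: a mitmproxy addon is a class whose `response(flow)` hook runs for every completed HTTP response, and a script exposes it through a module-level `addons` list (loaded with `mitmdump -s script.py`). `ResponseRecorder`, the output file name and the URL fragment below are hypothetical; check the real request paths in the mitmproxy UI first.

```python
import json
import time


class ResponseRecorder:
    """Hypothetical mitmproxy addon: append JSON bodies of matching responses to a file."""

    def __init__(self, url_keywords, out_file='responses.jsonl'):
        self.url_keywords = tuple(url_keywords)
        self.out_file = out_file

    def response(self, flow):
        # Called by mitmproxy once the server response has been received.
        url = flow.request.pretty_url
        if not any(keyword in url for keyword in self.url_keywords):
            return
        try:
            data = json.loads(flow.response.get_text())
        except (TypeError, ValueError):
            return  # empty or non-JSON body: ignore it
        record = {'url': url, 'ts': time.time(), 'data': data}
        with open(self.out_file, 'a', encoding='utf-8') as f:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')


# Hypothetical endpoint fragment, for illustration only.
addons = [ResponseRecorder(['/aweme/v1/comment/list/'])]
```

Writing one JSON object per line keeps the capture append-only, so a separate importer can load it into the database later.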

Process management on the deployment machine

```bash
sudo cp frp/systemd/app-crawler.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl start app-crawler.service

# Restart the service process
sudo systemctl restart app-crawler.service

# Start automatically at boot
sudo systemctl enable app-crawler.service
```

FAQ

  1. Device not found

```bash
adb kill-server
adb start-server
```

If that still doesn't work, try restarting the phone.

  2. adb devices shows no permissions (user in plugdev group; are your udev rules wrong?)

     • adb-devices-no-permissions-user-in-plugdev-group-are-your-udev-rules-wrong

  3. Opening weditor fails with adbutils.errors.AdbError: device not found. This happens after switching to a different device; clear Chrome's LocalStorage to fix it.

     • openatx/weditor/issues/57

  4. Tested models

     • Xiaomi Mi 6

     • Redmi Note 8

Issues

Bump certifi from 2019.11.28 to 2022.12.7

opened on 2022-12-08 03:19:36 by dependabot[bot]

Bumps certifi from 2019.11.28 to 2022.12.7.

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/maguowei/app-crawler/network/alerts).

Bump pillow from 6.2.1 to 9.3.0

opened on 2022-11-22 04:55:25 by dependabot[bot]

Bumps pillow from 6.2.1 to 9.3.0.

Release notes

Sourced from pillow's releases.

9.3.0

https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html

Changes

... (truncated)

Changelog

Sourced from pillow's changelog.

9.3.0 (2022-10-29)

  • Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]

  • Initialize libtiff buffer when saving #6699 [radarhere]

  • Inline fname2char to fix memory leak #6329 [nulano]

  • Fix memory leaks related to text features #6330 [nulano]

  • Use double quotes for version check on old CPython on Windows #6695 [hugovk]

  • Remove backup implementation of Round for Windows platforms #6693 [cgohlke]

  • Fixed set_variation_by_name offset #6445 [radarhere]

  • Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]

  • Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]

  • Added ExifTags enums #6630 [radarhere]

  • Do not modify previous frame when calculating delta in PNG #6683 [radarhere]

  • Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]

  • Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]

  • Added GPS TIFF tag info #6661 [radarhere]

  • Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]

  • Do not attempt normalization if mode is already normal #6644 [radarhere]

... (truncated)


Hi, I've looked at several of your projects and they're great. I have a question I'd like to ask.

opened on 2022-10-18 07:55:17 by MaNongXiaoGang

How did you obtain these URL_SCHEMA links? I tried searching for Intents with Frida and also searching with am, but found nothing, so I'm a bit puzzled.
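One common starting point for this question (not necessarily how the author obtained these links) is the app's decoded manifest, where deep-link entry points are declared as `<data android:scheme="..." android:host="..."/>` inside intent filters. A minimal sketch, assuming the manifest has already been decoded to text (for example with `apktool d app.apk`); `deep_link_schemes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# ElementTree stores attributes like android:scheme under this namespace URI.
ANDROID_NS = '{http://schemas.android.com/apk/res/android}'


def deep_link_schemes(manifest_xml: str) -> List[Tuple[str, str]]:
    """List the (scheme, host) pairs declared on <data> elements of a manifest.

    Expects a decoded, text AndroidManifest.xml; the binary manifest inside the
    APK cannot be parsed with ElementTree directly.
    """
    root = ET.fromstring(manifest_xml)
    pairs = set()
    for data in root.iter('data'):
        scheme = data.get(ANDROID_NS + 'scheme')
        if scheme:
            pairs.add((scheme, data.get(ANDROID_NS + 'host', '')))
    return sorted(pairs)
```

This is a simplification: Android merges all `<data>` attributes within one intent filter, so a host declared on a separate element is missed here. The path portions (such as `/profile/{uid}`) are often routed inside the app and may not appear in the manifest at all.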

Bump lxml from 4.4.2 to 4.9.1

opened on 2022-07-06 20:34:57 by dependabot[bot]

Bumps lxml from 4.4.2 to 4.9.1.

Changelog

Sourced from lxml's changelog.

4.9.1 (2022-07-01)

Bugs fixed

  • A crash was resolved when using iterwalk() (or canonicalize()) after parsing certain incorrect input. Note that iterwalk() can crash on valid input parsed with the same parser after failing to parse the incorrect input.

4.9.0 (2022-06-01)

Bugs fixed

  • GH#341: The mixin inheritance order in lxml.html was corrected. Patch by xmo-odoo.

Other changes

  • Built with Cython 0.29.30 to adapt to changes in Python 3.11 and 3.12.

  • Wheels include zlib 1.2.12, libxml2 2.9.14 and libxslt 1.1.35 (libxml2 2.9.12+ and libxslt 1.1.34 on Windows).

  • GH#343: Windows-AArch64 build support in Visual Studio. Patch by Steve Dower.

4.8.0 (2022-02-17)

Features added

  • GH#337: Path-like objects are now supported throughout the API instead of just strings. Patch by Henning Janssen.

  • The ElementMaker now supports QName values as tags, which always override the default namespace of the factory.

Bugs fixed

  • GH#338: In lxml.objectify, the XSI float annotations "nan" and "inf" were spelled in lower case, whereas XML Schema datatypes define them as "NaN" and "INF" respectively.

... (truncated)

Commits
  • d01872c Prevent parse failure in new test from leaking into later test runs.
  • d65e632 Prepare release of lxml 4.9.1.
  • 86368e9 Fix a crash when incorrect parser input occurs together with usages of iterwa...
  • 50c2764 Delete unused Travis CI config and reference in docs (GH-345)
  • 8f0bf2d Try to speed up the musllinux AArch64 build by splitting the different CPytho...
  • b9f7074 Remove debug print from test.
  • b224e0f Try to install 'xz' in wheel builds, if available, since it's now needed to e...
  • 897ebfa Update macOS deployment target version from 10.14 to 10.15 since 10.14 starts...
  • 853c9e9 Prepare release of 4.9.0.
  • d3f77e6 Add a test for https://bugs.launchpad.net/lxml/+bug/1965070 leaving out the a...
  • Additional commits viewable in compare view



Bump ipython from 7.10.1 to 7.16.3

opened on 2022-01-21 20:13:18 by dependabot[bot]

Bumps ipython from 7.10.1 to 7.16.3.


Bump mitmproxy from 4.0.4 to 7.0.3

opened on 2021-09-20 19:59:13 by dependabot[bot]

Bumps mitmproxy from 4.0.4 to 7.0.3.

Release notes

Sourced from mitmproxy's releases.

v7.0.3

  • CVE-2021-39214: Fix request smuggling vulnerabilities reported by @chinchila
  • Expose TLS 1.0 as possible minimum version on older pyOpenSSL releases
  • Fix compatibility with Python 3.10

You can find the latest release packages at https://mitmproxy.org/downloads/.

v7.0.2

  • Fix a WebSocket crash introduced in 7.0.1 (@mhils)

v7.0.1

  • Performance: Re-use OpenSSL contexts to enable TLS session resumption (@mhils)
  • Disable HTTP/2 CONNECT for Secure Web Proxies to fix compatibility with Firefox (@mhils)
  • Use local IP address as certificate subject if no other info is available (@mhils)
  • Make it possible to return multiple chunks for HTTP stream modification (@mhils)
  • Don't send WebSocket CONTINUATION frames when the peer does not send any (@Pilphe)
  • Fix HTTP stream modify example. (@mhils)
  • Fix a crash caused by no-op assignments to Server.address (@SaladDais)
  • Fix a crash when encountering invalid certificates (@mhils)
  • Fix a crash when pressing the Home/End keys in some screens (@rbdixon)
  • Fix a crash when reading corrupted flow dumps (@mhils)
  • Fix multiple crashes on flow export (@mhils)
  • Fix a bug where ASGI apps did not see the request body (@mhils)
  • Minor documentation improvements (@mhils)

v7.0.0

Check out our release announcement blog post! 🎉

v6.0.2

This release fixes another bug in mitmweb's serialization process. All other tools are unaffected.

v6.0.1

This release fixes a bug in mitmweb's serialization process. All other tools are unaffected.

v6.0.0

Check out our release announcement blog post! 🎉

... (truncated)

Changelog

Sourced from mitmproxy's changelog.

16 September 2021: mitmproxy 7.0.3

4 August 2021: mitmproxy 7.0.2

  • Fix a WebSocket crash introduced in 7.0.1 (@mhils)

3 August 2021: mitmproxy 7.0.1

  • Performance: Re-use OpenSSL contexts to enable TLS session resumption (@mhils)
  • Disable HTTP/2 CONNECT for Secure Web Proxies to fix compatibility with Firefox (@mhils)
  • Use local IP address as certificate subject if no other info is available (@mhils)
  • Make it possible to return multiple chunks for HTTP stream modification (@mhils)
  • Don't send WebSocket CONTINUATION frames when the peer does not send any (@Pilphe)
  • Fix HTTP stream modify example. (@mhils)
  • Fix a crash caused by no-op assignments to Server.address (@SaladDais)
  • Fix a crash when encountering invalid certificates (@mhils)
  • Fix a crash when pressing the Home/End keys in some screens (@rbdixon)
  • Fix a crash when reading corrupted flow dumps (@mhils)
  • Fix multiple crashes on flow export (@mhils)
  • Fix a bug where ASGI apps did not see the request body (@mhils)
  • Minor documentation improvements (@mhils)

16 July 2021: mitmproxy 7.0

New Proxy Core (@mhils, blog post)

Mitmproxy has a completely new proxy core, fixing many longstanding issues:

  • Secure Web Proxy: Mitmproxy now supports TLS-over-TLS to already encrypt the connection to the proxy.
  • Server-Side Greetings: Mitmproxy now supports proxying raw TCP connections, including ones that start with a server-side greeting (e.g. SMTP).
  • HTTP/1 – HTTP/2 Interoperability: mitmproxy can now accept an HTTP/2 connection from the client, and forward it to an HTTP/1 server.
  • HTTP/2 Redirects: The request destination can now be changed on HTTP/2 flows.
  • Connection Strategy: Users can now specify if they want mitmproxy to eagerly connect upstream or wait as long as possible. Eager connections are required to detect protocols with server-side greetings, lazy connections enable the replay of responses without connecting to an upstream server.
  • Timeout Handling: Mitmproxy will now clean up idle connections and also abort requests if the client disconnects in the meantime.
  • Host Header-based Proxying: If the request destination is unknown, mitmproxy now falls back to proxying based on the Host header. This means that requests can often be redirected to mitmproxy using DNS spoofing only.
  • Internals: All protocol logic is now separated from I/O ("sans-io"). This greatly improves testing capabilities, prevents a wide array of race conditions, and increases proper isolation between layers.

... (truncated)


Releases

douyin-v7.5.0 apk 2019-12-13 08:39:07

crawler crawling uiautomator2 mitmproxy douyin aweme