抖音爬虫,输入指定用户的抖音id,即可下载TA的所有视频作品

lfykid, updated 🕥 2022-12-08 03:03:39

logo

TikTokSpider

dd

使用方法

  1. 运行dyspider.py文件,输入要爬用户的id (注意不是手机端的抖音号)

dd

  1. 使用命令行参数 --uid 输入用户id

shell python dyspider.py --uid 64742778880

  • 抖音号不是抖音ID,抖音ID在这里:

dd

抖音ID可以通过将主页分享链接发用浏览器打开查看,详细步骤参见

抖音短视频的抖音号-抖音ID怎么看?

https://jingyan.baidu.com/article/d2b1d102ce2a885c7e37d4f5.html

dd

下载效果

运行代码将会在项目文件夹下新建一个与目标用户同名的文件夹,所有视频皆存入于此

dd

dd

分析过程

从用户主页的分享页面入手

  1. 进入用户主页,点击 - 分享名片 - 链接形式,将主页分享链接发送到电脑上用chrome打开,就可以看到用户的主页面了

dd

  1. 但此时还看不到用户的作品,将chrome设置成手机模式,刷新,bingo! 作品出来了

dd

  1. 点击作品,下拉,查看network,就可以看到我们要找的作品url列表啦

https://www.amemv.com/aweme/v1/aweme/post/?user_id=6xx1xx0&count=21&max_cursor=0&aid=1128&_signature=TG2uvBAbGAHzG19a.rniF0xtrq&dytk=14d65256b82dd042058b0eca9f85461b

  1. 观察一下,这是一个ajax链接,我们需要的所有信息都在返回的包里了

  2. 模拟请求,直接点击链接,copy as curl,然后复制到Postman里转成requests代码

  3. 返回json里有一个has_more字段,如果为1表示还可以下拉出现更多作品,如果为0表示已经是最后

  4. 当我们下拉的时候可以发现,新出现的url里只有max_cursor变了,新出现的max_cursor就是上次请求返回的max_cursor,有了has_more和max_cursor两个参数,我们就可以把所有urls取到了

  5. 写一个递归函数 get_all_video_urls 根据has_more字段将所有urls递归爬取下来,终止条件是has_more==0

  6. 用一个全局变量url_list = [] 存放爬到的每一个视频的名字和地址

  7. 运行编写好的代码后发现,视频数据格式不对,返回去检查,原来第9步中的url不是真实的视频url,而是一个302跳转地址,真实视频地址在第9步response headers里的Location里

  8. 添加禁止跳转的代码,获取真实视频url

    python response = requests.request("GET", url, headers=headers, allow_redirects=False) video_url = response.headers['Location']

搞定收工!

2018/09/11更新:添加显示下载进度条功能

todolist:

  • [x] 异常处理
  • [ ] 日志功能
  • [x] 显示下载进度条

Issues

Bump certifi from 2018.8.24 to 2022.12.7

opened on 2022-12-08 03:03:39 by dependabot[bot]

Bumps certifi from 2018.8.24 to 2022.12.7.

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/lfykid/TikTokSpider/network/alerts).

Bump ipython from 6.5.0 to 7.16.3

opened on 2022-01-21 20:01:51 by dependabot[bot]

Bumps ipython from 6.5.0 to 7.16.3.

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/lfykid/TikTokSpider/network/alerts).

Bump urllib3 from 1.23 to 1.26.5

opened on 2021-06-02 00:23:43 by dependabot[bot]

Bumps urllib3 from 1.23 to 1.26.5.

Release notes

Sourced from urllib3's releases.

1.26.5

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Fixed deprecation warnings emitted in Python 3.10.
  • Updated vendored six library to 1.16.0.
  • Improved performance of URL parser when splitting the authority component.

If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

1.26.4

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.

If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

1.26.3

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Fixed bytes and string comparison issue with headers (Pull #2141)

  • Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme (Pull #2107)

If you or your organization rely on urllib3 consider supporting us via GitHub Sponsors

1.26.2

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)

1.26.1

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)

1.26.0

:warning: IMPORTANT: urllib3 v2.0 will drop support for Python 2: Read more in the v2.0 Roadmap

  • Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)

  • Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning should opt-in explicitly by setting ssl_version=ssl.PROTOCOL_TLSv1_1 (Pull #2002) Starting in urllib3 v2.0: Connections that receive a DeprecationWarning will fail

  • Deprecated Retry options Retry.DEFAULT_METHOD_WHITELIST, Retry.DEFAULT_REDIRECT_HEADERS_BLACKLIST and Retry(method_whitelist=...) in favor of Retry.DEFAULT_ALLOWED_METHODS, Retry.DEFAULT_REMOVE_HEADERS_ON_REDIRECT, and Retry(allowed_methods=...) (Pull #2000) Starting in urllib3 v2.0: Deprecated options will be removed

... (truncated)

Changelog

Sourced from urllib3's changelog.

1.26.5 (2021-05-26)

  • Fixed deprecation warnings emitted in Python 3.10.
  • Updated vendored six library to 1.16.0.
  • Improved performance of URL parser when splitting the authority component.

1.26.4 (2021-03-15)

  • Changed behavior of the default SSLContext when connecting to HTTPS proxy during HTTPS requests. The default SSLContext now sets check_hostname=True.

1.26.3 (2021-01-26)

  • Fixed bytes and string comparison issue with headers (Pull #2141)

  • Changed ProxySchemeUnknown error message to be more actionable if the user supplies a proxy URL without a scheme. (Pull #2107)

1.26.2 (2020-11-12)

  • Fixed an issue where wrap_socket and CERT_REQUIRED wouldn't be imported properly on Python 2.7.8 and earlier (Pull #2052)

1.26.1 (2020-11-11)

  • Fixed an issue where two User-Agent headers would be sent if a User-Agent header key is passed as bytes (Pull #2047)

1.26.0 (2020-11-10)

  • NOTE: urllib3 v2.0 will drop support for Python 2. Read more in the v2.0 Roadmap <https://urllib3.readthedocs.io/en/latest/v2-roadmap.html>_.

  • Added support for HTTPS proxies contacting HTTPS servers (Pull #1923, Pull #1806)

  • Deprecated negotiating TLSv1 and TLSv1.1 by default. Users that still wish to use TLS earlier than 1.2 without a deprecation warning

... (truncated)

Commits
  • d161647 Release 1.26.5
  • 2d4a3fe Improve performance of sub-authority splitting in URL
  • 2698537 Update vendored six to 1.16.0
  • 07bed79 Fix deprecation warnings for Python 3.10 ssl module
  • d725a9b Add Python 3.10 to GitHub Actions
  • 339ad34 Use pytest==6.2.4 on Python 3.10+
  • f271c9c Apply latest Black formatting
  • 1884878 [1.26] Properly proxy EOF on the SSLTransport test suite
  • a891304 Release 1.26.4
  • 8d65ea1 Merge pull request from GHSA-5phf-pp7p-vc2r
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/lfykid/TikTokSpider/network/alerts).

Bump pygments from 2.2.0 to 2.7.4

opened on 2021-03-29 20:35:29 by dependabot[bot]

Bumps pygments from 2.2.0 to 2.7.4.

Release notes

Sourced from pygments's releases.

2.7.4

  • Updated lexers:

    • Apache configurations: Improve handling of malformed tags (#1656)

    • CSS: Add support for variables (#1633, #1666)

    • Crystal (#1650, #1670)

    • Coq (#1648)

    • Fortran: Add missing keywords (#1635, #1665)

    • Ini (#1624)

    • JavaScript and variants (#1647 -- missing regex flags, #1651)

    • Markdown (#1623, #1617)

    • Shell

      • Lex trailing whitespace as part of the prompt (#1645)
      • Add missing in keyword (#1652)
    • SQL - Fix keywords (#1668)

    • Typescript: Fix incorrect punctuation handling (#1510, #1511)

  • Fix infinite loop in SML lexer (#1625)

  • Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)

  • Limit recursion with nesting Ruby heredocs (#1638)

  • Fix a few inefficient regexes for guessing lexers

  • Fix the raw token lexer handling of Unicode (#1616)

  • Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!

  • Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)

  • Fix incorrect MATLAB example (#1582)

Thanks to Google's OSS-Fuzz project for finding many of these bugs.

2.7.3

... (truncated)

Changelog

Sourced from pygments's changelog.

Version 2.7.4

(released January 12, 2021)

  • Updated lexers:

    • Apache configurations: Improve handling of malformed tags (#1656)

    • CSS: Add support for variables (#1633, #1666)

    • Crystal (#1650, #1670)

    • Coq (#1648)

    • Fortran: Add missing keywords (#1635, #1665)

    • Ini (#1624)

    • JavaScript and variants (#1647 -- missing regex flags, #1651)

    • Markdown (#1623, #1617)

    • Shell

      • Lex trailing whitespace as part of the prompt (#1645)
      • Add missing in keyword (#1652)
    • SQL - Fix keywords (#1668)

    • Typescript: Fix incorrect punctuation handling (#1510, #1511)

  • Fix infinite loop in SML lexer (#1625)

  • Fix backtracking string regexes in JavaScript/TypeScript, Modula2 and many other lexers (#1637)

  • Limit recursion with nesting Ruby heredocs (#1638)

  • Fix a few inefficient regexes for guessing lexers

  • Fix the raw token lexer handling of Unicode (#1616)

  • Revert a private API change in the HTML formatter (#1655) -- please note that private APIs remain subject to change!

  • Fix several exponential/cubic-complexity regexes found by Ben Caller/Doyensec (#1675)

  • Fix incorrect MATLAB example (#1582)

Thanks to Google's OSS-Fuzz project for finding many of these bugs.

Version 2.7.3

(released December 6, 2020)

... (truncated)

Commits
  • 4d555d0 Bump version to 2.7.4.
  • fc3b05d Update CHANGES.
  • ad21935 Revert "Added dracula theme style (#1636)"
  • e411506 Prepare for 2.7.4 release.
  • 275e34d doc: remove Perl 6 ref
  • 2e7e8c4 Fix several exponential/cubic complexity regexes found by Ben Caller/Doyensec
  • eb39c43 xquery: fix pop from empty stack
  • 2738778 fix coding style in test_analyzer_lexer
  • 02e0f09 Added 'ERROR STOP' to fortran.py keywords. (#1665)
  • c83fe48 support added for css variables (#1633)
  • Additional commits viewable in compare view


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/lfykid/TikTokSpider/network/alerts).