Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System

MVIG-SJTU, updated 🕥 2023-02-08 14:30:00


  • Nov 2022: AlphaPose paper is released! Check out the paper for more details about this project.
  • Sep 2022: Jittor version of AlphaPose is released! It achieves a 1.45x speedup with the resnet50 backbone in the training stage.
  • July 2022: v0.6.0 version of AlphaPose is released! HybrIK for 3D pose and shape estimation is supported!
  • Jan 2022: v0.5.0 version of AlphaPose is released! Stronger whole-body (face, hand, foot) keypoints! More models are available. Check out docs/
  • Aug 2020: v0.4.0 version of AlphaPose is released! Stronger tracking! Includes whole-body (face, hand, foot) keypoints! Colab now available.
  • Dec 2019: v0.3.0 version of AlphaPose is released! Smaller model, higher accuracy!
  • Apr 2019: MXNet version of AlphaPose is released! It runs at 23 fps on COCO validation set.
  • Feb 2019: CrowdPose is integrated into AlphaPose now!
  • Dec 2018: General version of PoseFlow is released! 3x faster, with support for visualizing pose tracking results!
  • Sep 2018: v0.2.0 version of AlphaPose is released! It runs at 20 fps on COCO validation set (4.6 people per image on average) and achieves 71 mAP!


AlphaPose is an accurate multi-person pose estimator and the first open-source system to achieve 70+ mAP (75 mAP) on the COCO dataset and 80+ mAP (82.1 mAP) on the MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose tracker to achieve both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on the PoseTrack Challenge dataset.

AlphaPose supports both Linux and Windows!

COCO 17 keypoints

Halpe 26 keypoints + tracking

Halpe 136 keypoints + tracking YouTube link

SMPL + tracking


Pose Estimation

Results on COCO test-dev 2015:

| Method | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
|:-------|:-----:|:-------:|:-------:|:-------:|:-------:|
| OpenPose (CMU-Pose) | 61.8 | 84.9 | 67.5 | 57.1 | 68.2 |
| Detectron (Mask R-CNN) | 67.0 | 88.0 | 73.1 | 62.2 | 75.6 |
| AlphaPose | 73.3 | 89.2 | 79.1 | 69.0 | 78.6 |

Results on MPII full test set:

| Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Ave |
|:-------|:-----:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|
| OpenPose (CMU-Pose) | 91.2 | 87.6 | 77.7 | 66.8 | 75.4 | 68.9 | 61.7 | 75.6 |
| Newell & Deng | 92.1 | 89.3 | 78.9 | 69.8 | 76.2 | 71.6 | 64.7 | 77.5 |
| AlphaPose | 91.3 | 90.5 | 84.0 | 76.4 | 80.3 | 79.9 | 72.4 | 82.1 |

More results and models are available in the docs/

Pose Tracking

Please read trackers/ for details.


Please read docs/ for details.


Please check out docs/

Model Zoo

Please check out docs/

Quick Start

  • Colab: We provide a colab example for your quick start.

  • Inference: Inference demo
    `bash ./scripts/ ${CONFIG} ${CHECKPOINT} ${VIDEO_NAME} # ${OUTPUT_DIR}, optional`

    Inference SMPL (download the SMPL model basicModel_neutral_lbs_10_207_0_v1.0.0.pkl from here and put it in model_files/):
    `bash ./scripts/ ./configs/smpl/256x192_adam_lr1e-3-res34_smpl_24_3d_base_2x_mix.yaml ${CHECKPOINT} ${VIDEO_NAME} # ${OUTPUT_DIR}, optional`

    For high level API, please refer to ./scripts/
    To enable tracking, please refer to this page.

  • Training: Train from scratch
    `bash ./scripts/ ${CONFIG} ${EXP_ID}`

  • Validation: Validate your model on MSCOCO val2017
    `bash ./scripts/ ${CONFIG} ${CHECKPOINT}`


Demo using FastPose model.

```bash
bash ./scripts/ configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml pretrained_models/fast_res50_256x192.pth ${VIDEO_NAME}

python scripts/ --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/

# or if you want to use yolox-x as the detector
python scripts/ --detector yolox-x --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/
```

Train FastPose on mscoco dataset. bash ./scripts/ ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml exp_fastpose

More detailed inference options and examples, please refer to

Common issue & FAQ

Check out for faq. If it cannot solve your problem, or if you find any bugs, don't hesitate to comment on GitHub or make a pull request!


AlphaPose is based on RMPE (ICCV'17), authored by Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai and Cewu Lu; Cewu Lu is the corresponding author. Currently, it is maintained by Jiefeng Li*, Hao-Shu Fang*, Haoyi Zhu, Yuliang Xiu and Chao Xu.

The main contributors are listed in doc/


  • [x] Multi-GPU/CPU inference
  • [x] 3D pose
  • [x] add tracking flag
  • [ ] PyTorch C++ version
  • [x] Add model trained on mixture dataset (Check the model zoo)
  • [ ] dense support
  • [x] small box easy filter
  • [x] Crowdpose support
  • [ ] Speed up PoseFlow
  • [x] Add stronger/light detectors (yolox is now supported)
  • [x] High level API (check the scripts/

We would really appreciate it if you could offer any help and become a contributor to AlphaPose.


Please cite these papers in your publications if it helps your research:

  author = {Fang, Hao-Shu and Li, Jiefeng and Tang, Hongyang and Xu, Chao and Zhu, Haoyi and Xiu, Yuliang and Li, Yong-Lu and Lu, Cewu},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title = {AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time},
  year = {2022}

  title={{RMPE}: Regional Multi-person Pose Estimation},
  author={Fang, Hao-Shu and Xie, Shuqin and Tai, Yu-Wing and Lu, Cewu},

    title={Crowdpose: Efficient crowded scenes pose estimation and a new benchmark},
    author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
    booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},

If you used the 3D mesh reconstruction module, please also cite:

    title={Hybrik: A hybrid analytical-neural inverse kinematics solution for 3d human pose and shape estimation},
    author={Li, Jiefeng and Xu, Chao and Chen, Zhicun and Bian, Siyuan and Yang, Lixin and Lu, Cewu},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},

If you used the PoseFlow tracking module, please also cite:

  author = {Xiu, Yuliang and Li, Jiefeng and Wang, Haoyu and Fang, Yinghong and Lu, Cewu},
  title = {{Pose Flow}: Efficient Online Pose Tracking},
  year = {2018}


AlphaPose is freely available for non-commercial use, and may be redistributed under these conditions. For commercial queries, please send an e-mail to mvig.alphapose[at]gmail[dot]com and cc lucewu[at]sjtu[dot]edu[dot]cn. We will send the detailed agreement to you.


Link (Mega drive) broken for human reid model

opened on 2023-03-20 14:15:25 by jschristophe


Thanks for your work.

The link for downloading the human reid model is broken on Mega drive, and I can't download it from the other site. Can you re-upload the model to Mega drive and give us the new link?



opened on 2023-03-07 09:41:15 by RosslynD

The error is as follows:
Traceback (most recent call last):
  File "/content/drive/MyDrive/AlphaPose-pytorch/", line 34, in
    os.mkdir(args.outputpath)
FileNotFoundError: [Errno 2] No such file or directory: 'examples/res/'

I created a res folder under the examples folder, but the error persists. How can I fix this?
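A note on the error above: `os.mkdir` creates only the last path component, so it raises `FileNotFoundError` when a parent directory is missing. A relative path such as 'examples/res/' is resolved against the current working directory, which may not be the repository folder. If the folder really existed at the resolved location, `os.mkdir` would raise `FileExistsError` instead. That suggests (as an assumption, not confirmed in this thread) that the script was launched from a different directory. A minimal sketch of a more forgiving directory setup follows; `ensure_output_dir` is a hypothetical helper, not part of AlphaPose.

```python
import os
import tempfile


def ensure_output_dir(path):
    """Create `path` and any missing parents; succeed quietly if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


# Demonstration inside a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "examples", "res")

    try:
        os.mkdir(target)  # the parent "examples" does not exist yet
    except FileNotFoundError as err:
        print("os.mkdir failed:", err.strerror)

    created = ensure_output_dir(target)
    print("created:", os.path.isdir(created))
```

Printing `os.getcwd()` before the call, or passing an absolute output path, is a quick way to check which location the relative path actually points to.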

TypeError: Object of type 'Tensor' is not JSON serializable

opened on 2023-03-07 02:23:45 by JeanGoh

I tried running the code, but I was unable to save the pose estimation results as a JSON file. Is there any solution for this?

Full code:

import os
import time
from threading import Thread
from queue import Queue

import cv2
import json
import numpy as np
import torch
import torch.multiprocessing as mp

from alphapose.utils.transforms import get_func_heatmap_to_coord
from alphapose.utils.pPose_nms import pose_nms, write_json

DEFAULT_VIDEO_SAVE_OPT = {
    'savepath': 'examples/res/1.mp4',
    'fourcc': cv2.VideoWriter_fourcc(*'mp4v'),
    'fps': 25,
    'frameSize': (640, 480)
}

EVAL_JOINTS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

class DataWriter():
def __init__(self, cfg, opt, save_video=False, video_save_opt=DEFAULT_VIDEO_SAVE_OPT, queueSize=1024):
    self.cfg = cfg
    self.opt = opt
    self.video_save_opt = video_save_opt

    self.eval_joints = EVAL_JOINTS
    self.save_video = save_video
    self.heatmap_to_coord = get_func_heatmap_to_coord(cfg)
    # initialize the queue used to store frames read from
    # the video file
    if opt.sp:
        self.result_queue = Queue(maxsize=queueSize)
    else:
        self.result_queue = mp.Queue(maxsize=queueSize)

    if opt.save_img:
        if not os.path.exists(opt.outputpath + '/vis'):
            os.mkdir(opt.outputpath + '/vis')

    if opt.pose_flow:
        from trackers.PoseFlow.poseflow_infer import PoseFlowWrapper
        self.pose_flow_wrapper = PoseFlowWrapper(save_path=os.path.join(opt.outputpath, 'poseflow'))

    if self.opt.save_img or self.save_video or self.opt.vis:
        loss_type = self.cfg.DATA_PRESET.get('LOSS_TYPE', 'MSELoss')
        num_joints = self.cfg.DATA_PRESET.NUM_JOINTS
        if loss_type == 'MSELoss':
            self.vis_thres = [0.4] * num_joints
        elif 'JointRegression' in loss_type:
            self.vis_thres = [0.05] * num_joints
        elif loss_type == 'Combined':
            if num_joints == 68:
                hand_face_num = 42
            else:
                hand_face_num = 110
            self.vis_thres = [0.4] * (num_joints - hand_face_num) + [0.05] * hand_face_num

    self.use_heatmap_loss = (self.cfg.DATA_PRESET.get('LOSS_TYPE', 'MSELoss') == 'MSELoss')

def start_worker(self, target):
    if self.opt.sp:
        p = Thread(target=target, args=())
    else:
        p = mp.Process(target=target, args=())
    # p.daemon = True
    p.start()
    return p

def start(self):
    # start a thread to read pose estimation results per frame
    self.result_worker = self.start_worker(self.update)
    return self

def update(self):
    final_result = []
    norm_type = self.cfg.LOSS.get('NORM_TYPE', None)
    hm_size = self.cfg.DATA_PRESET.HEATMAP_SIZE
    if self.save_video:
        # initialize the file video stream, adapt output video resolution to original video
        stream = cv2.VideoWriter(*[self.video_save_opt[k] for k in ['savepath', 'fourcc', 'fps', 'frameSize']])
        if not stream.isOpened():
            print("Try to use other video encoders...")
            ext = self.video_save_opt['savepath'].split('.')[-1]
            fourcc, _ext = self.recognize_video_ext(ext)
            self.video_save_opt['fourcc'] = fourcc
            self.video_save_opt['savepath'] = self.video_save_opt['savepath'][:-4] + _ext
            stream = cv2.VideoWriter(*[self.video_save_opt[k] for k in ['savepath', 'fourcc', 'fps', 'frameSize']])
        assert stream.isOpened(), 'Cannot open video for writing'
    # keep looping infinitely
    while True:
        # ensure the queue is not empty and get item
        (boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name) = self.wait_and_get(self.result_queue)
        if orig_img is None:
            # if the thread indicator variable is set (img is None), stop the thread
            if self.save_video:
                stream.release()
            write_json(final_result, self.opt.outputpath, form=self.opt.format, for_eval=self.opt.eval)
            print("Results have been written to json.")
            return
        # image channel RGB->BGR
        orig_img = np.array(orig_img, dtype=np.uint8)[:, :, ::-1]
        if boxes is None or len(boxes) == 0:
            if self.opt.save_img or self.save_video or self.opt.vis:
                self.write_image(orig_img, im_name, stream=stream if self.save_video else None)
        else:
            # location prediction (n, kp, 2) | score prediction (n, kp, 1)
            assert hm_data.dim() == 4

            face_hand_num = 110
            if hm_data.size()[1] == 136:
                self.eval_joints = [*range(0,136)]
            elif hm_data.size()[1] == 26:
                self.eval_joints = [*range(0,26)]
            elif hm_data.size()[1] == 133:
                self.eval_joints = [*range(0,133)]
            elif hm_data.size()[1] == 68:
                face_hand_num = 42
                self.eval_joints = [*range(0,68)]
            elif hm_data.size()[1] == 21:
                self.eval_joints = [*range(0,21)]
            pose_coords = []
            pose_scores = []
            for i in range(hm_data.shape[0]):
                bbox = cropped_boxes[i].tolist()
                if isinstance(self.heatmap_to_coord, list):
                    pose_coords_body_foot, pose_scores_body_foot = self.heatmap_to_coord[0](
                        hm_data[i][self.eval_joints[:-face_hand_num]], bbox, hm_shape=hm_size, norm_type=norm_type)
                    pose_coords_face_hand, pose_scores_face_hand = self.heatmap_to_coord[1](
                        hm_data[i][self.eval_joints[-face_hand_num:]], bbox, hm_shape=hm_size, norm_type=norm_type)
                    pose_coord = np.concatenate((pose_coords_body_foot, pose_coords_face_hand), axis=0)
                    pose_score = np.concatenate((pose_scores_body_foot, pose_scores_face_hand), axis=0)
                else:
                    pose_coord, pose_score = self.heatmap_to_coord(hm_data[i][self.eval_joints], bbox, hm_shape=hm_size, norm_type=norm_type)
            preds_img =
            preds_scores =

            _result = []
            for k in range(len(scores)):
                _result.append(
                    {
                        'proposal_score': torch.mean(preds_scores[k]) + scores[k] + 1.25 * max(preds_scores[k]),
                        'box': [boxes[k][0], boxes[k][1], boxes[k][2]-boxes[k][0], boxes[k][3]-boxes[k][1]]
                    }
                )

            result = {
                'imgname': im_name,
                'result': _result
            }

            if self.opt.pose_flow:
                poseflow_result = self.pose_flow_wrapper.step(orig_img, result)
                for i in range(len(poseflow_result)):
                    result['result'][i]['idx'] = poseflow_result[i]['idx']

            if self.opt.save_img or self.save_video or self.opt.vis:
                if hm_data.size()[1] == 49:
                    from alphapose.utils.vis import vis_frame_dense as vis_frame
                elif self.opt.vis_fast:
                    from alphapose.utils.vis import vis_frame_fast as vis_frame
                else:
                    from alphapose.utils.vis import vis_frame
                #img = vis_frame(orig_img, result, self.opt, self.vis_thres)
                img = vis_frame(orig_img, result, self.opt)
                self.write_image(img, im_name, stream=stream if self.save_video else None)

def write_image(self, img, im_name, stream=None):
    if self.opt.vis:
        cv2.imshow("AlphaPose Demo", img)
    if self.opt.save_img:
        cv2.imwrite(os.path.join(self.opt.outputpath, 'vis', im_name), img)
    if self.save_video:
        stream.write(img)

def wait_and_put(self, queue, item):
    queue.put(item)

def wait_and_get(self, queue):
    return queue.get()

def save(self, boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name):
    # save next frame in the queue
    self.wait_and_put(self.result_queue, (boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name))

def running(self):
    # indicate that the thread is still running
    return not self.result_queue.empty()

def count(self):
    # indicate the remaining images
    return self.result_queue.qsize()

def stop(self):
    # indicate that the thread should be stopped
    self.save(None, None, None, None, None, None, None)

def terminate(self):
    # directly terminate
    self.result_worker.terminate()

def clear_queues(self):
    self.clear(self.result_queue)

def clear(self, queue):
    while not queue.empty():
        queue.get()
def results(self):
    # return final result
    return self.final_result

def recognize_video_ext(self, ext=''):
    if ext == 'mp4':
        return cv2.VideoWriter_fourcc(*'mp4v'), '.' + ext
    elif ext == 'avi':
        return cv2.VideoWriter_fourcc(*'XVID'), '.' + ext
    elif ext == 'mov':
        return cv2.VideoWriter_fourcc(*'XVID'), '.' + ext
    else:
        print("Unknow video format {}, will use .mp4 instead of it".format(ext))
        return cv2.VideoWriter_fourcc(*'mp4v'), '.mp4'
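
On the `TypeError` in the title: Python's `json` module only knows built-in types, so a `torch.Tensor` left in a result dictionary (for example the `proposal_score` and `box` values built in `update()` above) makes serialization fail. One possible workaround, sketched below, is to pass a `default` hook that converts anything exposing `tolist()` into plain Python values; both `torch.Tensor` and `numpy.ndarray` provide that method. `tensor_default` and `FakeTensor` are hypothetical names used only for illustration, and whether the hook can be applied depends on where the `json.dump` call sits in the code being run.

```python
import json


def tensor_default(obj):
    """Fallback for json.dumps: turn array-like objects into lists or scalars."""
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch installed."""

    def __init__(self, data):
        self._data = data

    def tolist(self):
        return self._data


# Same shape as one entry of the result dictionary, with tensor-like values.
result = {
    "imgname": "1.jpg",
    "result": [
        {
            "proposal_score": FakeTensor(2.5),
            "box": [FakeTensor(10.0), FakeTensor(20.0), FakeTensor(30.0), FakeTensor(40.0)],
        }
    ],
}

text = json.dumps(result, default=tensor_default)
print(text)
```

An alternative with the same effect is to convert the values up front, for example calling `.tolist()` on each tensor before it is stored in the result dictionary.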

What is the meaning of self._check_centers in preprocessing for the Halpe dataset?

opened on 2023-02-21 08:46:12 by Matthew-CQ

Hi, could you please explain what this preprocessing code does?

```python
if self._check_centers and self._train:
    bbox_center, bbox_area = self._get_box_center_area((xmin, ymin, xmax, ymax))
    kp_center, num_vis = self._get_keypoints_center_count(joints_3d)
    ks = np.exp(-2 * np.sum(np.square(bbox_center - kp_center)) / bbox_area)
    if (num_vis / 80.0 + 47 / 80.0) > ks:
        continue
```
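
Reading the snippet literally, `ks` is a similarity score between the bounding-box center and the keypoint center. It equals 1 when the two coincide and decays toward 0 as their squared distance grows relative to the box area. The sample is skipped when the threshold `(num_vis + 47) / 80` exceeds `ks`. The sketch below reproduces that arithmetic in plain Python. The two helper computations are stand-ins written for illustration (the real `_get_box_center_area` and `_get_keypoints_center_count` are not shown in this thread), and the `(x, y, visible)` keypoint layout is an assumption.

```python
import math


def center_check_skips(bbox, keypoints):
    """Return True when the snippet's condition would `continue` (skip the sample).

    bbox: (xmin, ymin, xmax, ymax)
    keypoints: list of (x, y, visible) tuples; this layout is an assumption.
    """
    xmin, ymin, xmax, ymax = bbox
    # Stand-in for _get_box_center_area: box center and box area.
    bbox_center = ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)
    bbox_area = (xmax - xmin) * (ymax - ymin)

    # Stand-in for _get_keypoints_center_count: mean of the visible keypoints.
    visible = [(x, y) for x, y, v in keypoints if v > 0]
    num_vis = len(visible)
    if num_vis == 0:
        return True  # assumption: treat a sample with no visible keypoints as skipped
    kp_center = (
        sum(x for x, _ in visible) / num_vis,
        sum(y for _, y in visible) / num_vis,
    )

    # Same formula as the snippet: exp(-2 * squared center distance / box area).
    sq_dist = (bbox_center[0] - kp_center[0]) ** 2 + (bbox_center[1] - kp_center[1]) ** 2
    ks = math.exp(-2 * sq_dist / bbox_area)

    # Same threshold as the snippet: it grows with the number of visible keypoints.
    return (num_vis / 80.0 + 47 / 80.0) > ks


box = (0, 0, 100, 100)
centered = [(50, 50, 1)] * 10   # keypoint center equals the box center
off_center = [(90, 90, 1)] * 10  # keypoint center far from the box center
print(center_check_skips(box, centered), center_check_skips(box, off_center))
```

Because `ks` can never exceed 1, the condition as written is always true once `num_vis` is above 33. How `num_vis` is counted for Halpe's larger keypoint sets depends on the real helper, which is not shown here.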

crowdpose train

opened on 2023-02-21 01:41:35 by whffams2

Thank you very much for your work. I would like to ask how I should train on CrowdPose.

TypeError("forward() got an unexpected keyword argument 'flip_test'") when using demo_3d_inference

opened on 2023-02-19 12:59:24 by Schloool

Using the works fine for me.

However, when using the script the error TypeError("forward() got an unexpected keyword argument 'flip_test'") occurs.

Command: python scripts/ --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/ --save_img

Full log:
/root/miniconda3/envs/alphapose/lib/python3.7/site-packages/torchvision/models/ UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  f"The parameter '{pretrained_param}' is deprecated since 0.13 and may be removed in the future, "
/root/miniconda3/envs/alphapose/lib/python3.7/site-packages/torchvision/models/ UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading pose model from pretrained_models/fast_res50_256x192.pth...
Loading YOLO model..
  0%| | 0/3 [00:01<?, ?it/s]
TypeError("forward() got an unexpected keyword argument 'flip_test'")
An error as above occurs when processing the images, please check it

The same error is raised when using the demo_3d_inference script in the official Colab notebook.
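
For context on the message itself: Python raises this `TypeError` whenever a function is called with a keyword its signature does not declare. One possible reading, which is an assumption and not confirmed in this thread, is that the loaded model's `forward()` takes no `flip_test` parameter while the calling script passes one. The sketch below shows the mechanism and a way to inspect a signature before calling. `Pose2D`, `Pose3D` and `accepts_kwarg` are hypothetical names and do not correspond to AlphaPose classes.

```python
import inspect


class Pose2D:
    """Hypothetical model whose forward() has no flip_test parameter."""

    def forward(self, x):
        return x


class Pose3D:
    """Hypothetical model whose forward() does accept flip_test."""

    def forward(self, x, flip_test=False):
        return x


def accepts_kwarg(fn, name):
    """Return True if `fn` can be called with the keyword argument `name`."""
    params = inspect.signature(fn).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


for model in (Pose2D(), Pose3D()):
    print(type(model).__name__, "accepts flip_test:", accepts_kwarg(model.forward, "flip_test"))

try:
    Pose2D().forward(1, flip_test=True)
except TypeError as err:
    print("TypeError:", err)
```

Running the same kind of check on the `forward` method of the loaded model would show whether the config and checkpoint in use match what the script expects.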

Machine Vision and Intelligence Group @ SJTU

pose-estimation posetracking tracking gpu pytorch realtime human-pose-estimation human-tracking human-pose-tracking alpha-pose alphapose person-pose-estimation accurate crowdpose full-body whole-body skeleton keypoints human-computer-interaction human-joints