AlphaPose is an accurate multi-person pose estimator, and the first open-source system to achieve 70+ mAP (75 mAP) on the COCO dataset and 80+ mAP (82.1 mAP) on the MPII dataset. To match poses that correspond to the same person across frames, we also provide an efficient online pose tracker called Pose Flow. It is the first open-source online pose tracker to achieve both 60+ mAP (66.5 mAP) and 50+ MOTA (58.3 MOTA) on the PoseTrack Challenge dataset.
AlphaPose supports both Linux and Windows!
Results on COCO test-dev 2015:
| Method | AP @0.5:0.95 | AP @0.5 | AP @0.75 | AP medium | AP large |
|:-------|:-----:|:-------:|:-------:|:-------:|:-------:|
| OpenPose (CMU-Pose) | 61.8 | 84.9 | 67.5 | 57.1 | 68.2 |
| Detectron (Mask R-CNN) | 67.0 | 88.0 | 73.1 | 62.2 | 75.6 |
| AlphaPose | 73.3 | 89.2 | 79.1 | 69.0 | 78.6 |
Results on MPII full test set:
| Method | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Ave |
|:-------|:-----:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|:-------:|
| OpenPose (CMU-Pose) | 91.2 | 87.6 | 77.7 | 66.8 | 75.4 | 68.9 | 61.7 | 75.6 |
| Newell & Deng | 92.1 | 89.3 | 78.9 | 69.8 | 76.2 | 71.6 | 64.7 | 77.5 |
| AlphaPose | 91.3 | 90.5 | 84.0 | 76.4 | 80.3 | 79.9 | 72.4 | 82.1 |
More results and models are available in docs/MODEL_ZOO.md.
Please read trackers/README.md for details.
Please read docs/CrowdPose.md for details.
Please check out docs/INSTALL.md
Please check out docs/MODEL_ZOO.md
Colab: We provide a Colab example for a quick start.
Inference: Inference demo
```bash
./scripts/inference.sh ${CONFIG} ${CHECKPOINT} ${VIDEO_NAME} # ${OUTPUT_DIR}, optional
```
Inference SMPL (download the SMPL model basicModel_neutral_lbs_10_207_0_v1.0.0.pkl from here and put it in model_files/).
```bash
./scripts/inference_3d.sh ./configs/smpl/256x192_adam_lr1e-3-res34_smpl_24_3d_base_2x_mix.yaml ${CHECKPOINT} ${VIDEO_NAME} # ${OUTPUT_DIR}, optional
```
For the high-level API, please refer to ./scripts/demo_api.py. To enable tracking, please refer to this page.
Training: Train from scratch
```bash
./scripts/train.sh ${CONFIG} ${EXP_ID}
```
Validation: Validate your model on MSCOCO val2017
```bash
./scripts/validate.sh ${CONFIG} ${CHECKPOINT}
```
Examples:
Demo using the FastPose model.
```bash
./scripts/inference.sh configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml pretrained_models/fast_res50_256x192.pth ${VIDEO_NAME}
python scripts/demo_inference.py --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/
python scripts/demo_inference.py --detector yolox-x --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/
```
Train FastPose on the MSCOCO dataset.
```bash
./scripts/train.sh ./configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml exp_fastpose
```
For more detailed inference options and examples, please refer to GETTING_STARTED.md.
Check out faq.md for frequently asked questions. If it cannot solve your problem, or if you find any bugs, don't hesitate to comment on GitHub or make a pull request!
AlphaPose is based on RMPE (ICCV'17), authored by Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, and Cewu Lu; Cewu Lu is the corresponding author. Currently, it is maintained by Jiefeng Li*, Hao-Shu Fang*, Haoyi Zhu, Yuliang Xiu, and Chao Xu.
The main contributors are listed in doc/contributors.md.
We would really appreciate any help you can offer, and we welcome you to become a contributor to AlphaPose.
Please cite these papers in your publications if they help your research:
@article{alphapose,
author = {Fang, Hao-Shu and Li, Jiefeng and Tang, Hongyang and Xu, Chao and Zhu, Haoyi and Xiu, Yuliang and Li, Yong-Lu and Lu, Cewu},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
title = {AlphaPose: Whole-Body Regional Multi-Person Pose Estimation and Tracking in Real-Time},
year = {2022}
}
@inproceedings{fang2017rmpe,
title={{RMPE}: Regional Multi-person Pose Estimation},
author={Fang, Hao-Shu and Xie, Shuqin and Tai, Yu-Wing and Lu, Cewu},
booktitle={ICCV},
year={2017}
}
@inproceedings{li2019crowdpose,
title={Crowdpose: Efficient crowded scenes pose estimation and a new benchmark},
author={Li, Jiefeng and Wang, Can and Zhu, Hao and Mao, Yihuan and Fang, Hao-Shu and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={10863--10872},
year={2019}
}
If you used the 3D mesh reconstruction module, please also cite:
@inproceedings{li2021hybrik,
title={Hybrik: A hybrid analytical-neural inverse kinematics solution for 3d human pose and shape estimation},
author={Li, Jiefeng and Xu, Chao and Chen, Zhicun and Bian, Siyuan and Yang, Lixin and Lu, Cewu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={3383--3393},
year={2021}
}
If you used the PoseFlow tracking module, please also cite:
@inproceedings{xiu2018poseflow,
author = {Xiu, Yuliang and Li, Jiefeng and Wang, Haoyu and Fang, Yinghong and Lu, Cewu},
title = {{Pose Flow}: Efficient Online Pose Tracking},
booktitle={BMVC},
year = {2018}
}
AlphaPose is freely available for non-commercial use, and may be redistributed under these conditions. For commercial queries, please drop an e-mail to mvig.alphapose[at]gmail[dot]com and cc lucewu[at]sjtu[dot]edu[dot]cn. We will send you the detailed agreement.
Hello,
Thanks for your work.
The link to download the human re-ID model is broken on MEGA drive, and I can't download it from the other site. Could you re-upload the model to MEGA drive and give us the new link?
Thanks
The error is as follows:
```
Traceback (most recent call last):
  File "/content/drive/MyDrive/AlphaPose-pytorch/demo.py", line 34, in
```
I tried running the writer.py code, but I was unable to save the pose estimation results as a JSON file. Is there any solution for this?
Full code:
```python
import os
import time
from threading import Thread
from queue import Queue

import cv2
import json
import numpy as np
import torch
import torch.multiprocessing as mp

from alphapose.utils.transforms import get_func_heatmap_to_coord
from alphapose.utils.pPose_nms import pose_nms, write_json

DEFAULT_VIDEO_SAVE_OPT = {
    'savepath': 'examples/res/1.mp4',
    'fourcc': cv2.VideoWriter_fourcc(*'mp4v'),
    'fps': 25,
    'frameSize': (640, 480)
}

EVAL_JOINTS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


class DataWriter():
    def __init__(self, cfg, opt, save_video=False, video_save_opt=DEFAULT_VIDEO_SAVE_OPT, queueSize=1024):
        self.cfg = cfg
        self.opt = opt
        self.video_save_opt = video_save_opt

        self.eval_joints = EVAL_JOINTS
        self.save_video = save_video
        self.heatmap_to_coord = get_func_heatmap_to_coord(cfg)
        # initialize the queue used to store frames read from
        # the video file
        if opt.sp:
            self.result_queue = Queue(maxsize=queueSize)
        else:
            self.result_queue = mp.Queue(maxsize=queueSize)

        if opt.save_img:
            if not os.path.exists(opt.outputpath + '/vis'):
                os.mkdir(opt.outputpath + '/vis')

        if opt.pose_flow:
            from trackers.PoseFlow.poseflow_infer import PoseFlowWrapper
            self.pose_flow_wrapper = PoseFlowWrapper(save_path=os.path.join(opt.outputpath, 'poseflow'))

        if self.opt.save_img or self.save_video or self.opt.vis:
            loss_type = self.cfg.DATA_PRESET.get('LOSS_TYPE', 'MSELoss')
            num_joints = self.cfg.DATA_PRESET.NUM_JOINTS
            if loss_type == 'MSELoss':
                self.vis_thres = [0.4] * num_joints
            elif 'JointRegression' in loss_type:
                self.vis_thres = [0.05] * num_joints
            elif loss_type == 'Combined':
                if num_joints == 68:
                    hand_face_num = 42
                else:
                    hand_face_num = 110
                self.vis_thres = [0.4] * (num_joints - hand_face_num) + [0.05] * hand_face_num

        self.use_heatmap_loss = (self.cfg.DATA_PRESET.get('LOSS_TYPE', 'MSELoss') == 'MSELoss')

    def start_worker(self, target):
        if self.opt.sp:
            p = Thread(target=target, args=())
        else:
            p = mp.Process(target=target, args=())
        # p.daemon = True
        p.start()
        return p

    def start(self):
        # start a thread to read pose estimation results per frame
        self.result_worker = self.start_worker(self.update)
        return self

    def update(self):
        final_result = []
        norm_type = self.cfg.LOSS.get('NORM_TYPE', None)
        hm_size = self.cfg.DATA_PRESET.HEATMAP_SIZE
        if self.save_video:
            # initialize the file video stream, adapt output video resolution to original video
            stream = cv2.VideoWriter(*[self.video_save_opt[k] for k in ['savepath', 'fourcc', 'fps', 'frameSize']])
            if not stream.isOpened():
                print("Try to use other video encoders...")
                ext = self.video_save_opt['savepath'].split('.')[-1]
                fourcc, _ext = self.recognize_video_ext(ext)
                self.video_save_opt['fourcc'] = fourcc
                self.video_save_opt['savepath'] = self.video_save_opt['savepath'][:-4] + _ext
                stream = cv2.VideoWriter(*[self.video_save_opt[k] for k in ['savepath', 'fourcc', 'fps', 'frameSize']])
            assert stream.isOpened(), 'Cannot open video for writing'
        # keep looping infinitely
        while True:
            # ensure the queue is not empty and get item
            (boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name) = self.wait_and_get(self.result_queue)
            if orig_img is None:
                # if the thread indicator variable is set (img is None), stop the thread
                if self.save_video:
                    stream.release()
                write_json(final_result, self.opt.outputpath, form=self.opt.format, for_eval=self.opt.eval)
                print("Results have been written to json.")
                return
            # image channel RGB->BGR
            orig_img = np.array(orig_img, dtype=np.uint8)[:, :, ::-1]
            if boxes is None or len(boxes) == 0:
                if self.opt.save_img or self.save_video or self.opt.vis:
                    self.write_image(orig_img, im_name, stream=stream if self.save_video else None)
            else:
                # location prediction (n, kp, 2) | score prediction (n, kp, 1)
                assert hm_data.dim() == 4
                face_hand_num = 110
                if hm_data.size()[1] == 136:
                    self.eval_joints = [*range(0, 136)]
                elif hm_data.size()[1] == 26:
                    self.eval_joints = [*range(0, 26)]
                elif hm_data.size()[1] == 133:
                    self.eval_joints = [*range(0, 133)]
                elif hm_data.size()[1] == 68:
                    face_hand_num = 42
                    self.eval_joints = [*range(0, 68)]
                elif hm_data.size()[1] == 21:
                    self.eval_joints = [*range(0, 21)]
                pose_coords = []
                pose_scores = []
                for i in range(hm_data.shape[0]):
                    bbox = cropped_boxes[i].tolist()
                    if isinstance(self.heatmap_to_coord, list):
                        pose_coords_body_foot, pose_scores_body_foot = self.heatmap_to_coord[0](
                            hm_data[i][self.eval_joints[:-face_hand_num]], bbox, hm_shape=hm_size, norm_type=norm_type)
                        pose_coords_face_hand, pose_scores_face_hand = self.heatmap_to_coord[1](
                            hm_data[i][self.eval_joints[-face_hand_num:]], bbox, hm_shape=hm_size, norm_type=norm_type)
                        pose_coord = np.concatenate((pose_coords_body_foot, pose_coords_face_hand), axis=0)
                        pose_score = np.concatenate((pose_scores_body_foot, pose_scores_face_hand), axis=0)
                    else:
                        pose_coord, pose_score = self.heatmap_to_coord(hm_data[i][self.eval_joints], bbox, hm_shape=hm_size, norm_type=norm_type)
                    pose_coords.append(torch.from_numpy(pose_coord).unsqueeze(0))
                    pose_scores.append(torch.from_numpy(pose_score).unsqueeze(0))
                preds_img = torch.cat(pose_coords)
                preds_scores = torch.cat(pose_scores)

                _result = []
                for k in range(len(scores)):
                    _result.append(
                        {
                            'keypoints': preds_img[k],
                            'kp_score': preds_scores[k],
                            'proposal_score': torch.mean(preds_scores[k]) + scores[k] + 1.25 * max(preds_scores[k]),
                            'idx': ids[k],
                            'box': [boxes[k][0], boxes[k][1], boxes[k][2] - boxes[k][0], boxes[k][3] - boxes[k][1]]
                        }
                    )

                result = {
                    'imgname': im_name,
                    'result': _result
                }

                if self.opt.pose_flow:
                    poseflow_result = self.pose_flow_wrapper.step(orig_img, result)
                    for i in range(len(poseflow_result)):
                        result['result'][i]['idx'] = poseflow_result[i]['idx']

                final_result.append(result)
                if self.opt.save_img or self.save_video or self.opt.vis:
                    if hm_data.size()[1] == 49:
                        from alphapose.utils.vis import vis_frame_dense as vis_frame
                    elif self.opt.vis_fast:
                        from alphapose.utils.vis import vis_frame_fast as vis_frame
                    else:
                        from alphapose.utils.vis import vis_frame
                    # img = vis_frame(orig_img, result, self.opt, self.vis_thres)
                    img = vis_frame(orig_img, result, self.opt)
                    self.write_image(img, im_name, stream=stream if self.save_video else None)

    def write_image(self, img, im_name, stream=None):
        if self.opt.vis:
            cv2.imshow("AlphaPose Demo", img)
            cv2.waitKey(30)
        if self.opt.save_img:
            cv2.imwrite(os.path.join(self.opt.outputpath, 'vis', im_name), img)
        if self.save_video:
            stream.write(img)

    def wait_and_put(self, queue, item):
        queue.put(item)

    def wait_and_get(self, queue):
        return queue.get()

    def save(self, boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name):
        # save next frame in the queue
        self.wait_and_put(self.result_queue, (boxes, scores, ids, hm_data, cropped_boxes, orig_img, im_name))

    def running(self):
        # indicate that the thread is still running
        return not self.result_queue.empty()

    def count(self):
        # indicate the remaining images
        return self.result_queue.qsize()

    def stop(self):
        # indicate that the thread should be stopped
        self.save(None, None, None, None, None, None, None)
        self.result_worker.join()

    def terminate(self):
        # directly terminate
        self.result_worker.terminate()

    def clear_queues(self):
        self.clear(self.result_queue)

    def clear(self, queue):
        while not queue.empty():
            queue.get()

    def results(self):
        # return final result
        print(self.final_result)
        return self.final_result

    def recognize_video_ext(self, ext=''):
        if ext == 'mp4':
            return cv2.VideoWriter_fourcc(*'mp4v'), '.' + ext
        elif ext == 'avi':
            return cv2.VideoWriter_fourcc(*'XVID'), '.' + ext
        elif ext == 'mov':
            return cv2.VideoWriter_fourcc(*'XVID'), '.' + ext
        else:
            print("Unknown video format {}, will use .mp4 instead of it".format(ext))
            return cv2.VideoWriter_fourcc(*'mp4v'), '.mp4'
```
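Two things stand out in the code above: `write_json` is only reached when `update()` receives the `None` sentinel that `stop()` pushes into the queue, so the JSON file never appears unless the writer is stopped cleanly; and `results()` reads `self.final_result`, which is never assigned anywhere in this class. If you just want to dump the accumulated per-image results yourself, here is a minimal sketch (not part of AlphaPose; the helper name `dump_results_json` is made up for illustration) that serializes the dicts built in `update()` with the standard `json` module:

```python
import json

import torch


def _to_plain(value):
    # convert torch tensors (and lists of tensors) to plain Python numbers/lists
    if torch.is_tensor(value):
        return value.tolist()
    if isinstance(value, (list, tuple)):
        return [_to_plain(v) for v in value]
    return value


def dump_results_json(final_result, json_path):
    """Hypothetical helper: write the per-image dicts built in DataWriter.update()
    ('imgname' plus a list of {'keypoints', 'kp_score', 'proposal_score', 'idx', 'box'})
    to a plain JSON file."""
    serializable = []
    for frame in final_result:
        serializable.append({
            'imgname': frame['imgname'],
            'result': [{k: _to_plain(v) for k, v in person.items()} for person in frame['result']],
        })
    with open(json_path, 'w') as f:
        json.dump(serializable, f)


# usage sketch, assuming you keep a reference to the list that update() accumulates:
# dump_results_json(final_result, 'examples/res/alphapose-results.json')
```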
Hi, could you please explain what this preprocessing code means? https://github.com/MVIG-SJTU/AlphaPose/blob/c60106d19afb443e964df6f06ed1842962f5f1f7/alphapose/datasets/halpe_coco_wholebody_136.py#L155
```python
if self._check_centers and self._train:
    bbox_center, bbox_area = self._get_box_center_area((xmin, ymin, xmax, ymax))
    kp_center, num_vis = self._get_keypoints_center_count(joints_3d)
    ks = np.exp(-2 * np.sum(np.square(bbox_center - kp_center)) / bbox_area)
    if (num_vis / 80.0 + 47 / 80.0) > ks:
        continue
```
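One reading of the quoted snippet (an interpretation, not an official explanation): `ks` is a Gaussian-style similarity between the bounding-box center and the centroid of the visible keypoints, normalized by the box area, so it is close to 1 when the annotated joints sit near the middle of the box and decays toward 0 as they drift away. The sample is skipped (`continue`) when the visibility-dependent threshold `num_vis / 80 + 47 / 80` exceeds `ks`, i.e. when the keypoints are poorly centered in the box given how many of them are visible. A small numeric sketch with made-up values:

```python
import numpy as np


def center_consistency(bbox_center, kp_center, bbox_area):
    """Gaussian-style similarity from the quoted snippet:
    exp(-2 * squared distance between centers / box area)."""
    return np.exp(-2 * np.sum(np.square(bbox_center - kp_center)) / bbox_area)


bbox_center = np.array([100.0, 150.0])
bbox_area = 200.0 * 300.0  # made-up 200x300 box

# keypoint centroid close to the box center -> ks near 1, sample is kept
ks_good = center_consistency(bbox_center, np.array([105.0, 155.0]), bbox_area)

# keypoint centroid far from the box center -> ks small, sample is skipped
ks_bad = center_consistency(bbox_center, np.array([180.0, 290.0]), bbox_area)

num_vis = 20  # made-up number of visible joints
threshold = num_vis / 80.0 + 47 / 80.0  # 0.8375; the sample is skipped if threshold > ks
print(ks_good, ks_bad, threshold)  # ~0.998 (kept), ~0.420 (skipped), 0.8375
```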
Thank you very much for your work. I would like to ask how I should train on CrowdPose.
Using demo_inference.py works fine for me. However, when using the script demo_3d_inference.py, the error TypeError("forward() got an unexpected keyword argument 'flip_test'") occurs.
Command:
```bash
python scripts/demo_3d_inference.py --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir examples/demo/ --save_img
```
Full log:
```
/root/miniconda3/envs/alphapose/lib/python3.7/site-packages/torchvision/models/_utils.py:209: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  f"The parameter '{pretrained_param}' is deprecated since 0.13 and may be removed in the future, "
/root/miniconda3/envs/alphapose/lib/python3.7/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading pose model from pretrained_models/fast_res50_256x192.pth...
Loading YOLO model..
  0%|          | 0/3 [00:01<?, ?it/s]
TypeError("forward() got an unexpected keyword argument 'flip_test'")
An error as above occurs when processing the images, please check it
```
The same error occurs when using the demo_3d_inference script in the official Colab notebook.