Generating realistic hand manipulations

We generate realistic images of hands manipulating objects using a Generative Adversarial Network (GAN) conditioned on the hand and finger pose. We adopt an encoder-decoder architecture: the input is a rough rendering of the skeletal pose, and the output is a realistic rendering of the hand. We generate datasets of simulated and real human arms in different poses and use them to train our model. Given a hand pose and an object pose, we showcase a pipeline that generates an image conditioned on both poses.
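The rough rendering of the skeletal pose that conditions the generator could be produced along the lines of the following minimal sketch. It assumes a hypothetical 21-joint hand layout (a wrist joint plus four joints per finger) whose joints are already projected to 2D pixel coordinates, and it draws each bone as a line in a single-channel image. The function name `render_skeleton`, the joint layout, and the image size are illustrative assumptions, not the pipeline's actual renderer.

```python
import numpy as np

# Hypothetical 21-joint hand layout (wrist = 0, then 4 joints per finger),
# assumed here for illustration only.
HAND_BONES = [(0, 1), (1, 2), (2, 3), (3, 4),          # thumb
              (0, 5), (5, 6), (6, 7), (7, 8),          # index
              (0, 9), (9, 10), (10, 11), (11, 12),     # middle
              (0, 13), (13, 14), (14, 15), (15, 16),   # ring
              (0, 17), (17, 18), (18, 19), (19, 20)]   # little


def render_skeleton(joints_2d, bones=HAND_BONES, size=128, thickness=1):
    """Rasterize 2D joints (shape (J, 2), pixel coords as x, y) into a
    single-channel float image with every bone drawn as a line segment."""
    img = np.zeros((size, size), dtype=np.float32)
    joints = np.asarray(joints_2d, dtype=np.float64)
    for a, b in bones:
        p, q = joints[a], joints[b]
        # Sample the segment densely enough that samples are at most 1 px apart.
        n = int(np.ceil(np.linalg.norm(q - p))) + 1
        for t in np.linspace(0.0, 1.0, n):
            x, y = np.rint(p + t * (q - p)).astype(int)
            # Paint a square of side (2 * thickness - 1) centred on the sample,
            # clipped to the image bounds.
            x0, x1 = max(x - thickness + 1, 0), min(x + thickness, size)
            y0, y1 = max(y - thickness + 1, 0), min(y + thickness, size)
            if x0 < x1 and y0 < y1:  # skip samples that fall outside the image
                img[y0:y1, x0:x1] = 1.0
    return img
```

An image like this would serve as the encoder input of the encoder-decoder generator. Color-coding each finger, or adding per-joint Gaussian heatmaps as extra channels, are common variants that make joint identity explicit to the network.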

Example skeletal poses fed as input to the network

Generated frames from our network

Using our network, we can generate a sequence of frames showing an interaction with an object.
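One way to obtain such a sequence, sketched here under stated assumptions, is to interpolate between hand and object keyframe poses, render the skeletal condition for every intermediate pose, and pass each rendering through the trained generator. `render` and `generator` are hypothetical callables standing in for the pose renderer and the trained network, and poses are treated as plain numeric arrays.

```python
import numpy as np


def interpolate_poses(start, end, num_frames):
    """Linearly interpolate between two pose arrays of the same shape,
    returning num_frames poses that include both endpoints."""
    start = np.asarray(start, dtype=np.float64)
    end = np.asarray(end, dtype=np.float64)
    return [(1.0 - t) * start + t * end for t in np.linspace(0.0, 1.0, num_frames)]


def generate_sequence(keyframes, render, generator, steps_per_segment=8):
    """keyframes: list of at least two (hand_pose, object_pose) tuples.
    render(hand, obj) -> conditioning image (hypothetical renderer).
    generator(image) -> generated frame (hypothetical trained network)."""
    frames = []
    for i, ((h0, o0), (h1, o1)) in enumerate(zip(keyframes, keyframes[1:])):
        hands = interpolate_poses(h0, h1, steps_per_segment + 1)
        objs = interpolate_poses(o0, o1, steps_per_segment + 1)
        # After the first segment, skip the first pose: it repeats the
        # previous segment's last pose.
        first = 0 if i == 0 else 1
        for h, o in zip(hands[first:], objs[first:]):
            frames.append(generator(render(h, o)))
    return frames
```

With K keyframes this yields (K - 1) * steps_per_segment + 1 frames. Linear interpolation is reasonable for joint positions, but for object orientations a quaternion slerp would avoid the distortions of interpolating angles directly. Because each frame is generated independently, nothing in this sketch enforces temporal consistency of appearance between frames.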