ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

¹Tsinghua University, ²Wuhan University, ³MIT-IBM Watson AI Lab, ⁴MIT, ⁵UMass Amherst
(* denotes equal contributions.)

International Conference on Machine Learning (ICML), 2024
Paper | Codebase & Dataset (All Released Now!) | Bibtex



Abstract

We introduce the Continuum Physical Dataset (ContPhy), a novel benchmark for assessing machine physical commonsense. ContPhy complements existing physical reasoning benchmarks by covering the inference of diverse physical properties, such as mass and density, across various scenarios and the prediction of the corresponding dynamics. We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy, which shows that current AI models still lack physical commonsense for the continuum, especially soft bodies, and illustrates the value of the proposed dataset. We also introduce an oracle model (ContPRO) that marries particle-based physical dynamics models with recent large language models, combining the advantages of both: precise dynamic predictions and interpretable reasoning. ContPhy aims to spur progress in perception and reasoning within diverse physical settings, narrowing the divide between human and machine intelligence in understanding the physical world.





Motivation

[Image: teaser]

The motivation comes from everyday soft materials and their interactions with rigid objects, whose physical behaviors or functions vary with their physical properties. a) Gasoline flows more readily than glue due to its lower viscosity, while oil, with its lower density, tends to float on water. b) Poplin and canvas exhibit surface wrinkles of different granularity due to their distinct bending compliance. c) Lifting with a movable pulley requires less force because the pulley redistributes the tensile forces. d) The trajectories of a tennis ball and a dough ball demonstrate their differing elasticity and plasticity.

Showcases




Fluid Hourglass

Introduction: In this device, liquids of different densities and viscosities, each shown in a distinct color, are released from emitters at the top of the apparatus. Under gravity, the liquids descend across a series of fixed ramps (resembling sticks), which alter their flow direction, and are finally funneled into containers at the bottom. The process highlights distinctive behaviors arising from the interaction of multiple fluids with significantly different densities. The questions target the physical properties of these liquids and their dynamic trajectories.

Physical Property Questions

Density I

Is the density of the orange fluid greater than that of the green fluid?

  • a) Yes
  • b) No
  • c) Cannot Judge

Density II

Is the density of the pink fluid less than that of the orange fluid?

  • a) Yes
  • b) No
  • c) Cannot Judge

Dynamics Questions

Temporal Predictive

Will most fluid from the other green emitter pass the red stick?

  • a) Yes
  • b) No

Counterfactual

If the red stick were removed, would most orange fluid flow into the cyan container?

  • a) Yes
  • b) No

Goal-Driven

What can we do to guide most of the orange fluid into the cyan container?

  • a) Remove the red stick
  • b) Remove the orange stick
  • c) Remove the blue stick
  • d) Remove the gray stick

More Samples




Rope-Pulley System

Introduction: An array of pulleys, both movable and fixed, along with anchor points, is arranged on a wall. Ropes have their ends connected to pulleys, loads, or anchor points, and can be wound around the pulleys. The loads have varying masses and interact with other forces in the system, producing distinct motion patterns.

The model's primary objective is to identify the tension distribution within this elementary rope system. It must also recognize correlations or constraints among moving objects, such as the coordinated movement of loads and the rotation of pulleys on a single rope, and infer numerical relationships between the loads' masses.
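As a rough illustration of the tension reasoning this scenario probes, the sketch below computes static tensions for a single idealized movable pulley. It assumes a massless, frictionless pulley and massless rope segments running vertically; this is a textbook simplification, not the dataset's simulator. The function name, dictionary keys, and the 2 kg load are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def movable_pulley_tensions(load_mass_kg: float) -> dict:
    """Static tensions for a load hanging from one ideal movable pulley.

    Assumption: massless, frictionless pulley and massless rope whose two
    supporting segments are vertical, so each segment carries half the weight.
    """
    weight = load_mass_kg * G
    return {
        # Short rope linking the load to the pulley axle carries the full weight.
        "short_rope_to_load": weight,
        # Rope wound around the movable pulley carries half the weight.
        "main_rope": weight / 2.0,
    }


tensions = movable_pulley_tensions(2.0)  # hypothetical 2 kg load
ratio = tensions["main_rope"] / tensions["short_rope_to_load"]
print(ratio)
```

Under these assumptions the ratio is one half regardless of the load, which is the kind of relationship the Tension question below asks a model to recognize from video alone.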

Physical Property Questions

Mass I

Is the mass of the purple cube greater than twice that of the brown cube?

  • a) Yes
  • b) No
  • c) Cannot Judge

Mass II

Is the mass of the sphere less than that of the brown cube?

  • a) Yes
  • b) No
  • c) Cannot Judge

Tension

Is the tension in the black rope approximately equal to half that in the red rope (i.e., the short rope linked to the purple cube)?

  • a) Yes
  • b) No
  • c) Cannot Judge

Dynamics Questions

Counterfactual

If the purple cube were much heavier, in which direction would the blue movable pulley rotate?

  • a) Anti-clockwise
  • b) Clockwise
  • c) Not affected

Goal-Driven

What can we do to rotate the blue fixed pulley clockwise?

  • a) Decrease gray cube mass
  • b) Decrease brown cube mass
  • c) Increase sphere mass
  • d) None of the above works

Goal-Driven

What can we do to lift the red cube upwards?

  • a) Decrease red cube mass
  • b) Increase purple cube mass
  • c) Increase sphere mass
  • d) None of the above works

More Samples




Cloth Magic Trick

Introduction: A small table hosts an assortment of objects, including pillars and plates of varying sizes, colors, and masses. Two square pieces of cloth, each with distinct stretching, bending, and frictional properties, are gripped at one edge and moved forward to cover these objects, possibly causing collisions. The cloths are then promptly released. The fabric obstructs the view of the objects but also delineates their shapes through its deformable surface. Objects may topple over if they exceed a certain height or have low mass, resulting in observable changes in the fabric's dynamic 3D surface geometry. This scenario tests a model's capacity to discern the physical attributes of the fabrics and to predict the spatial behavior of the concealed objects in dynamic situations.

Physical Property Questions

Bending Compliance

Is the left cloth much easier to bend or have wrinkles than the other?

  • a) Yes
  • b) No

Stretching Compliance

Is the elasticity of the right cloth much greater than that of the other?

  • a) Yes
  • b) No

Friction

Is the friction on the right cloth much greater than the other?

  • a) Yes
  • b) No

Dynamics Questions

Spatial Predictive

Does the white pillar collide with the yellow plate?

  • a) Yes
  • b) No

Spatial Predictive

Is the yellow plate finally in touch with the white pillar?

  • a) Yes
  • b) No

Spatial Predictive

Which phrase below can best describe the final pose of the brown pillar?

  • a) Standing upright
  • b) Leaning
  • c) Lying horizontally

More Samples




Ball Playground

Introduction: A playground contains obstacles of different colors and poses, along with randomly arranged pits. Soft balls with varying deformation resistance or plastic yield are launched from varying initial positions. The balls undergo a sequence of dynamic movements, including bouncing and permanent deformation, and some eventually collide with obstacles and fall into pits. This scenario tests whether a model can accurately discern the elasticity and plasticity of the soft bodies and make dynamic predictions and inferences based on these observations.

Physical Property Questions

Elasticity

Is the elasticity of the cyan ball much greater than the blue ball?

  • a) Yes
  • b) No

Plasticity

Is the plasticity of the red ball much less than the blue ball?

  • a) Yes
  • b) No

Dynamics Questions

Temporal Predictive

Which pit will the cyan ball finally drop into?

  • a) The left pit
  • b) The right pit
  • c) It will not drop into any pits

Counterfactual

If we removed the cyan floating wall and other balls, which pit would the cyan ball drop into?

  • a) The left pit
  • b) The right pit
  • c) None of the above

Goal-Driven

What can we do to let the cyan ball drop into the right pit?

  • a) Remove the cyan floating wall and other balls
  • b) Remove the purple floating wall and other balls
  • c) None of the above works

More Samples




ContPRO, an Oracle Model



We introduce the Continuum Physical Reasoning Oracle Model (ContPRO), which marries physics-based dynamics models with recent large language models, combining the advantages of both: precise dynamic predictions and interpretable reasoning. Given questions, predefined APIs, and specific prompts, an LLM acts as a program parser that translates questions into code snippets. The visual perception module predicts objects’ locations and static attributes. The physical simulation module predicts dynamics. The symbolic execution module executes the code snippet to output the answer.
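The four-stage flow described above can be sketched roughly as follows. This is a minimal illustration with stubbed modules, not the released ContPRO implementation: every function name here, the dictionary-based scene, the sample question, and the simulated outcome are assumptions made for the example. The LLM call is replaced by a stub that returns a fixed program string.

```python
from typing import Any, Dict


def bool_to_yesno(flag: bool) -> str:
    """Map a boolean result to a yes/no answer string (hypothetical helper)."""
    return "yes" if flag else "no"


def stub_llm_parser(question: str, api_doc: str) -> str:
    """Stand-in for the LLM program parser: returns a code snippet as text.

    A real system would send the question, API description, and prompt to an LLM.
    """
    return (
        "def execute_command(scene):\n"
        "    event = ('pink fluid', 'entering', 'black container')\n"
        "    return bool_to_yesno(event in scene['events'])\n"
    )


def stub_perception(video: Any) -> Dict[str, Any]:
    """Stand-in for visual perception: objects and their static attributes."""
    return {"objects": ["pink fluid", "black container", "green stick"]}


def stub_simulation(percepts: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for physical simulation: attaches predicted dynamic events."""
    scene = dict(percepts)
    # Made-up outcome, used only so the example has something to query.
    scene["events"] = {("pink fluid", "entering", "black container")}
    return scene


def answer(question: str, video: Any) -> str:
    """Parse the question into a program, build the scene, and execute."""
    program = stub_llm_parser(question, api_doc="(API description placeholder)")
    scene = stub_simulation(stub_perception(video))
    namespace = {"bool_to_yesno": bool_to_yesno}
    exec(program, namespace)  # symbolic execution of the generated snippet
    return namespace["execute_command"](scene)


result = answer("Will most of the pink fluid enter the black container?", video=None)
print(result)
```

Keeping the generated program as plain code, executed against an explicit scene object, is what makes the reasoning trace inspectable: each answer can be traced back to a specific query on perceived or simulated state.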


Selected Qualitative Results of ContPRO Generated Execution Programs

                
# Is the mass of the green cube greater than half that of the purple sphere?
def execute_command(video):
    rope_scene = SoftScene(video, 'rope')
    green_cube = rope_scene.find("green cube")
    purple_sphere = rope_scene.find("purple sphere")

    green_cube_mass = rope_scene.query_single('mass', green_cube[0])
    purple_sphere_mass = rope_scene.query_single('mass', purple_sphere[0])

    if green_cube_mass is not None and purple_sphere_mass is not None:
        return bool_to_yesno(green_cube_mass > 0.5 * purple_sphere_mass)
    else:
        return "can not answer"

# What can we do to let most of the pink fluid enter black container? | Remove green stick
def execute_command(video):
    fluid_scene = SoftScene(video, 'fluid')
    pink_fluid = fluid_scene.find("pink fluid")
    black_container = fluid_scene.find("black container")

    gd_init_event = fluid_scene.register_event([], "remove", "green stick")
    fluid_gd_scene = fluid_scene.init_dyn_simulation(gd_init_event)

    flag = fluid_gd_scene.happen([pink_fluid], "entering", [black_container])

    return bool_to_yesno(flag)

# Is the blue pillar finally in touch with the yellow plate?
def execute_command(video):
    cloth_scene = SoftScene(video, 'cloth')
    blue_pillar = cloth_scene.find("blue pillar")
    yellow_plate = cloth_scene.find("yellow plate")

    pred_init_event = cloth_scene.register_event([], "simulate", "")
    cloth_pred_scene = cloth_scene.init_dyn_simulation(pred_init_event)

    flag = cloth_pred_scene.happen([blue_pillar, yellow_plate], "touching", "")

    return bool_to_yesno(flag)

# If we removed the red floating wall and other balls, which pit would the black ball drop into?
def execute_command(video):
    ball_scene = SoftScene(video, 'ball')
    black_ball = ball_scene.find("black ball")
    pits = ball_scene.find("pit")

    cf_init_event = ball_scene.register_event([], "remove", "red floating wall and other balls")
    ball_cf_scene = ball_scene.init_dyn_simulation(cf_init_event)

    for pit in pits:
        if ball_cf_scene.happen([black_ball], "droping", pit):
            return pit

    return "can not answer"
                
              



Experiments


[Image: experiments]

Comparison between baselines and humans on the ContPhy evaluation. We list representative results across the question families: Property, Counterfactual, Goal-driven, and Predictive. Accuracies are reported per option and per question.
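To make the two accuracy granularities concrete, here is a minimal sketch. It assumes a common convention for multiple-choice video reasoning benchmarks: each option is judged as an independent true/false decision, and a question counts as correct only when all of its options are judged correctly. That convention, the function names, and the toy data are assumptions for illustration, not the official evaluation script.

```python
from typing import List


def per_option_accuracy(preds: List[List[bool]], labels: List[List[bool]]) -> float:
    """Fraction of individual options whose true/false judgment is correct."""
    total = sum(len(question) for question in labels)
    correct = sum(
        p == l
        for q_pred, q_label in zip(preds, labels)
        for p, l in zip(q_pred, q_label)
    )
    return correct / total


def per_question_accuracy(preds: List[List[bool]], labels: List[List[bool]]) -> float:
    """Fraction of questions for which every option is judged correctly."""
    correct = sum(q_pred == q_label for q_pred, q_label in zip(preds, labels))
    return correct / len(labels)


# Toy data: two questions with four and two options respectively.
labels = [[True, False, False, False], [False, True]]
preds = [[True, False, True, False], [False, True]]

opt_acc = per_option_accuracy(preds, labels)  # 5 of 6 options correct
q_acc = per_question_accuracy(preds, labels)  # 1 of 2 questions correct
print(opt_acc, q_acc)
```

Under this convention per-question accuracy is never higher than per-option accuracy, since a single wrong option invalidates the whole question.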




Question Distribution

[Image: distribution]

Question distribution statistics for the fluid, rope, cloth, and ball scenarios.




Dataset Details

Sensor data outputs are multimodal, depicting the 4D states of objects at several levels: object-level, point-level, and event-level.
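One way to picture the three annotation levels is the container sketch below. The class and field names are entirely hypothetical and do not reflect the released file format; they only illustrate how object-level, point-level, and event-level states might sit side by side over time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectState:
    """Object-level state at one frame (hypothetical fields)."""
    name: str
    position: Vec3


@dataclass
class PointState:
    """Point-level state: sampled points of one soft body or fluid."""
    object_name: str
    points: List[Vec3]


@dataclass
class Event:
    """Event-level annotation, e.g. a collision between two objects."""
    kind: str
    participants: List[str]
    frame: int


@dataclass
class FrameAnnotation:
    frame: int
    objects: List[ObjectState] = field(default_factory=list)
    points: List[PointState] = field(default_factory=list)


@dataclass
class VideoAnnotation:
    frames: List[FrameAnnotation] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)


# Toy instance with one frame and one event (made-up values).
video = VideoAnnotation(
    frames=[
        FrameAnnotation(
            frame=0,
            objects=[ObjectState("cyan ball", (0.0, 1.0, 0.0))],
            points=[PointState("cyan ball", [(0.0, 1.0, 0.0), (0.1, 1.0, 0.0)])],
        )
    ],
    events=[Event("collision", ["cyan ball", "purple floating wall"], frame=12)],
)
print(len(video.frames), video.events[0].kind)
```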




Citation

@inproceedings{zheng2024contphy,
  title={ContPhy: Continuum Physical Concept Learning and Reasoning from Videos},
  author={Zheng, Zhicheng and Yan, Xin and Chen, Zhenfang and Wang, Jingzhou and Lim, Qin Zhi Eddie and Tenenbaum, Joshua B and Gan, Chuang},
  booktitle={International Conference on Machine Learning},
  year={2024},
  organization={PMLR}
}