Isaac Sim 101 (7): Camera

Isaac 101

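Preface#

In the previous chapters we built a solid understanding of Isaac Sim, from its basic concepts through importing objects. Not much is left before we can put together a simple object-grasping demo: only the camera and the robot.

This chapter explains how to create a camera and how to use it to obtain different types of data.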

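Creating a camera#

A camera is a Prim type in Isaac Sim and can be created directly through the Camera class. The full API is described in the Isaac Sim API documentation; only the calls needed here are introduced.

The Isaac Sim Camera is essentially a high-level wrapper around a render product, which can be thought of simply as a medium for rendering. Creating a new Camera through the Camera interface generally creates a new render product as well.

However, because of cleanup problems in the Isaac Sim pipeline, the render product of a deleted camera is not cleaned up properly. If you delete a camera and then create the same camera again at the same prim path, some intermediate representations cannot be rendered for quite a few world steps. The workaround used here is to create the render product manually and assign it to the camera: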

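```python
from omni.isaac.sensor import Camera
import omni.replicator.core as rep

# Create the render product by hand instead of letting Camera create one.
rp = rep.create.render_product(rep.create.camera(), (1280, 720))
camera = Camera(
    prim_path="/World/camera",
    name="camera",
    frequency=60,  # sensor update rate in Hz; the period form would be dt=1/60
    resolution=(1280, 720),
    render_product_path=rp.path,
)
```

That is all it takes.

The camera should generally be created before the World is reset; once the reset has run, the camera can be used for rendering.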

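Rendering RGB images#

Isaac Sim can be advanced in three main ways: step(), render(), and step(render=False), all called on the World. step() updates both the physics engine and the renderer, render() updates only the renderer, and step(render=False) updates only the physics engine.

Note that, because of an unexplained bug in Isaac Sim, step() actually performs its update through a closed-source omni update function, whereas step(render=False) and render() behave comparatively normally. In most cases, as far as physics is concerned, it makes little difference whether step renders or not. When the stiffness and damping of an Articulation are set poorly, however, step(render=False) can overshoot while step(render=True) looks fine.

With such poorly tuned parameters, the overshoot of the arm should be regarded as the physically correct behavior. Updating the physics engine with step(render=False) and the renderer with render() is therefore the cleanest option, rather than calling step() directly.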
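The split update described above can be sketched as follows. This is a minimal sketch rather than the author's code: `advance` and its `physics_substeps` parameter are hypothetical names, and `world` is only assumed to expose `step(render=...)` and `render()` as described in the text, so the snippet is written against a small Protocol and does not import Isaac Sim.

```python
from typing import Protocol


class SteppableWorld(Protocol):
    """The two World methods this sketch relies on."""

    def step(self, render: bool = True) -> None: ...

    def render(self) -> None: ...


def advance(world: SteppableWorld, physics_substeps: int = 1) -> None:
    """Advance physics without rendering, then render exactly once."""
    if physics_substeps < 1:
        raise ValueError("physics_substeps must be at least 1")
    for _ in range(physics_substeps):
        world.step(render=False)  # physics engine only
    world.render()  # renderer only
```

In a real script, `world` would be the World instance and `advance(world)` would be called once per frame.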

A simple function is enough to grab the camera image:

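```python
import numpy as np
from omni.isaac.sensor import Camera


def get_rgb(camera: Camera) -> np.ndarray | None:
    """Return the current frame as an (H, W, 3) RGB array, or None if it is not ready."""
    frame = camera.get_rgba()
    if isinstance(frame, np.ndarray) and frame.size > 0:
        # Drop the alpha channel.
        return frame[:, :, :3]
    return None
```

Pass the Camera in as the argument and you get the RGB image back.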
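Because a getter like this returns None until the renderer has produced a frame, calling code usually needs a short warm-up loop. The sketch below is one way to write it; `wait_for_frame`, `grab`, `tick` and `max_ticks` are hypothetical names, not Isaac Sim API. In practice `grab` could be `lambda: get_rgb(camera)` and `tick` could be the World's `render` method.

```python
from typing import Callable, Optional

import numpy as np


def wait_for_frame(
    grab: Callable[[], Optional[np.ndarray]],
    tick: Callable[[], None],
    max_ticks: int = 100,
) -> np.ndarray:
    """Call tick() and then grab() until grab() returns a non-empty array."""
    for _ in range(max_ticks):
        tick()  # advance the renderer by one update
        frame = grab()
        if frame is not None and frame.size > 0:
            return frame
    raise TimeoutError(f"no valid frame after {max_ticks} ticks")
```

Raising after a bounded number of ticks keeps a misconfigured camera from hanging the script silently.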

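Also note that Camera provides .set_world_pose() and .set_local_pose() for setting the camera pose in world coordinates and in local coordinates, respectively. Use these APIs to set the position and orientation of the camera.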
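Setting a pose requires an orientation, and it is often easier to say what the camera should look at. The sketch below computes such an orientation with NumPy only. It rests on two assumptions that are not stated in this article and should be checked against your Isaac Sim version: that the orientation is a scalar-first quaternion (w, x, y, z), and that in the default "world" camera axes the camera looks along +X with +Z up. `look_at_quaternion` and `rotation_matrix_to_quaternion` are hypothetical helper names.

```python
import numpy as np


def rotation_matrix_to_quaternion(R: np.ndarray) -> np.ndarray:
    """Convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z)."""
    trace = R[0, 0] + R[1, 1] + R[2, 2]
    if trace > 0.0:
        s = 2.0 * np.sqrt(trace + 1.0)
        q = [0.25 * s, (R[2, 1] - R[1, 2]) / s, (R[0, 2] - R[2, 0]) / s, (R[1, 0] - R[0, 1]) / s]
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2])
        q = [(R[2, 1] - R[1, 2]) / s, 0.25 * s, (R[0, 1] + R[1, 0]) / s, (R[0, 2] + R[2, 0]) / s]
    elif R[1, 1] > R[2, 2]:
        s = 2.0 * np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2])
        q = [(R[0, 2] - R[2, 0]) / s, (R[0, 1] + R[1, 0]) / s, 0.25 * s, (R[1, 2] + R[2, 1]) / s]
    else:
        s = 2.0 * np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1])
        q = [(R[1, 0] - R[0, 1]) / s, (R[0, 2] + R[2, 0]) / s, (R[1, 2] + R[2, 1]) / s, 0.25 * s]
    q = np.asarray(q, dtype=float)
    return q / np.linalg.norm(q)


def look_at_quaternion(eye, target, up=(0.0, 0.0, 1.0)) -> np.ndarray:
    """Orientation whose +X axis points from eye to target, with +Z as close to up as possible."""
    eye = np.asarray(eye, dtype=float)
    target = np.asarray(target, dtype=float)
    forward = target - eye
    norm = np.linalg.norm(forward)
    if norm < 1e-9:
        raise ValueError("eye and target coincide")
    forward = forward / norm
    left = np.cross(np.asarray(up, dtype=float), forward)
    left_norm = np.linalg.norm(left)
    if left_norm < 1e-9:
        raise ValueError("view direction is parallel to the up vector")
    left = left / left_norm
    true_up = np.cross(forward, left)
    # Columns are the camera's X (forward), Y (left) and Z (up) axes in world coordinates.
    R = np.column_stack([forward, left, true_up])
    return rotation_matrix_to_quaternion(R)
```

Under those assumptions the result could be passed along as `camera.set_world_pose(position=eye, orientation=look_at_quaternion(eye, target))`.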

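Obtaining other representations#

Other data types are read from the camera's annotators, as the following code shows: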

```python
import numpy as np
from omni.isaac.sensor import Camera


def get_depth(camera: Camera) -> np.ndarray | None:
    """Depth as distance to the image plane, or None if no data is available yet."""
    depth = camera._custom_annotators["distance_to_image_plane"].get_data()
    if isinstance(depth, np.ndarray) and depth.size > 0:
        return depth
    return None


def get_pointcloud(camera: Camera) -> np.ndarray | None:
    """Point cloud data, or None if no data is available yet."""
    cloud = camera._custom_annotators["pointcloud"].get_data()["data"]
    if isinstance(cloud, np.ndarray) and cloud.size > 0:
        return cloud
    return None


def get_objectmask(camera: Camera) -> dict | None:
    """Semantic segmentation mask together with its ID-to-label mapping."""
    annotation_data = camera._custom_annotators["semantic_segmentation"].get_data()
    mask = annotation_data["data"]
    id_to_labels = annotation_data["info"]["idToLabels"]
    if isinstance(mask, np.ndarray) and mask.size > 0:
        # int32 keeps semantic IDs above 127 intact (int8 would wrap around).
        return dict(mask=mask.astype(np.int32), id2labels=id_to_labels)
    return None


def get_bounding_box_2d_tight(camera: Camera) -> tuple[np.ndarray, dict]:
    """Tight 2D boxes (visible pixels only) and the ID-to-label mapping."""
    annotation_data = camera._custom_annotators["bounding_box_2d_tight"].get_data()
    return annotation_data["data"], annotation_data["info"]["idToLabels"]


def get_bounding_box_2d_loose(camera: Camera) -> tuple[np.ndarray, dict]:
    """Loose 2D boxes (occluded parts included) and the ID-to-label mapping."""
    annotation_data = camera._custom_annotators["bounding_box_2d_loose"].get_data()
    return annotation_data["data"], annotation_data["info"]["idToLabels"]


def get_bounding_box_3d(camera: Camera) -> tuple[list[dict], dict]:
    """3D boxes as dictionaries, plus the ID-to-label mapping."""
    annotation_data = camera._custom_annotators["bounding_box_3d"].get_data()
    bbox_data = []
    for box in annotation_data["data"]:
        extents = {}
        # Each record holds the class ID, the local extents, a transform, and one unused field.
        (
            extents["class"],
            extents["x_min"],
            extents["y_min"],
            extents["z_min"],
            extents["x_max"],
            extents["y_max"],
            extents["z_max"],
            extents["transform"],
            _,
        ) = box
        # get_world_corners_from_bbox3d must be provided separately; it is not part of this listing.
        extents["corners"] = get_world_corners_from_bbox3d(extents)
        bbox_data.append(extents)
    return bbox_data, annotation_data["info"]["idToLabels"]


def get_motion_vectors(camera: Camera) -> np.ndarray:
    """Per-pixel motion vectors."""
    return camera._custom_annotators["motion_vectors"].get_data()
```
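These getters read from `camera._custom_annotators`, so each annotator typically has to be attached to the camera before its data can be read. The sketch below assumes that the Camera class exposes `add_<name>_to_frame()` methods for this purpose (for example `add_distance_to_image_plane_to_frame()`); that naming is an assumption to verify against your Isaac Sim version, and `enable_annotators` is a hypothetical helper.

```python
from typing import Iterable

# Annotator names used by the getters in this chapter.
ANNOTATOR_NAMES = (
    "distance_to_image_plane",
    "pointcloud",
    "semantic_segmentation",
    "bounding_box_2d_tight",
    "bounding_box_2d_loose",
    "bounding_box_3d",
    "motion_vectors",
)


def enable_annotators(camera, names: Iterable[str] = ANNOTATOR_NAMES) -> list[str]:
    """Call camera.add_<name>_to_frame() for every requested annotator."""
    enabled = []
    for name in names:
        method_name = f"add_{name}_to_frame"  # assumed naming scheme
        method = getattr(camera, method_name, None)
        if method is None:
            raise AttributeError(f"camera has no method {method_name}()")
        method()
        enabled.append(name)
    return enabled
```

Calling this once after the camera has been initialized, and before the first getter is used, avoids reading from an annotator that was never attached.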

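The labels, such as those attached to the bounding boxes, come from semantic labels that you must add to the objects themselves. The function for that is: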

```python
from omni.isaac.core.utils.prims import get_prim_at_path
from omni.isaac.core.utils.semantics import add_update_semantics


def set_semantic_label(prim_path: str, label: str) -> None:
    """Attach a "class" semantic label to the prim at prim_path."""
    prim = get_prim_at_path(prim_path)
    add_update_semantics(prim, semantic_label=label, type_label="class")
```
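Once objects carry labels, the mask and the `id2labels` dictionary returned by `get_objectmask` can be combined to pick out a single class. The following is a sketch under an assumption about the data layout: that `id2labels` maps string IDs to dictionaries of the form `{"class": "<label>"}`. Verify this against your own output before relying on it. `mask_for_label` is a hypothetical helper.

```python
import numpy as np


def mask_for_label(mask: np.ndarray, id2labels: dict, label: str) -> np.ndarray:
    """Return a boolean mask that is True where the pixel's class equals label."""
    wanted_ids = [
        int(semantic_id)
        for semantic_id, entry in id2labels.items()
        if entry.get("class") == label
    ]
    # An unknown label yields an empty ID list and therefore an all-False mask.
    return np.isin(mask, wanted_ids)
```

For example, `mask_for_label(result["mask"], result["id2labels"], "cup")` would select the pixels of every prim labeled "cup", where `result` is the dictionary returned by `get_objectmask`.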

See the official Isaac Sim documentation for the detailed output formats.
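The `get_bounding_box_3d` listing calls `get_world_corners_from_bbox3d`, which is not defined in this chapter. A possible implementation is sketched below; it is not the author's code. It assumes that `transform` is a 4x4 local-to-world matrix in the row-vector convention used by USD (points are multiplied from the left and the translation sits in the last row), and the corner ordering is an arbitrary choice.

```python
from itertools import product

import numpy as np


def get_world_corners_from_bbox3d(extents: dict) -> np.ndarray:
    """Return the 8 box corners in world coordinates as an (8, 3) array."""
    xs = (extents["x_min"], extents["x_max"])
    ys = (extents["y_min"], extents["y_max"])
    zs = (extents["z_min"], extents["z_max"])
    # All min/max combinations in the box's local frame, in homogeneous form.
    local = np.array([[x, y, z, 1.0] for x, y, z in product(xs, ys, zs)], dtype=float)
    transform = np.asarray(extents["transform"], dtype=float).reshape(4, 4)
    # Row-vector convention (assumed): world = local @ transform.
    world = local @ transform
    return world[:, :3]
```

If the corners come out mirrored or displaced, the transform is probably stored in the column-vector convention instead, in which case `transform.T` should be used.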

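Aligning with a real camera#

An Isaac Sim camera can of course be aligned with a real-world camera. The extrinsics are set with .set_world_pose(), and the intrinsics are set with the following code: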

```python
import math

import numpy as np
from omni.isaac.sensor import Camera


def set_camera_rational_polynomial(
    camera: Camera,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
    pixel_size: float = 3,  # pixel size in microns
    f_stop: float = 2.0,
    focus_distance: float = 0.3,
    D: np.ndarray | None = None,  # distortion coefficients
) -> Camera:
    if D is None:
        D = np.zeros(8)  # no distortion
    camera.initialize()
    camera.set_resolution([width, height])

    # Physical sensor size and focal length in mm (pixel_size * 1e-3 is mm per pixel).
    horizontal_aperture = pixel_size * 1e-3 * width
    vertical_aperture = pixel_size * 1e-3 * height
    focal_length_x = fx * pixel_size * 1e-3
    focal_length_y = fy * pixel_size * 1e-3
    focal_length = (focal_length_x + focal_length_y) / 2

    # The setters take these lengths divided by 10 (mm to cm).
    camera.set_focal_length(focal_length / 10.0)
    camera.set_focus_distance(focus_distance)
    camera.set_lens_aperture(f_stop * 100.0)
    camera.set_horizontal_aperture(horizontal_aperture / 10.0)
    camera.set_vertical_aperture(vertical_aperture / 10.0)
    camera.set_clipping_range(0.05, 1.0e5)

    # Diagonal field of view in degrees, measured from the principal point.
    diagonal = 2 * math.sqrt(max(cx, width - cx) ** 2 + max(cy, height - cy) ** 2)
    diagonal_fov = 2 * math.atan2(diagonal, fx + fy) * 180 / math.pi
    camera.set_projection_type("fisheyePolynomial")
    camera.set_rational_polynomial_properties(width, height, cx, cy, diagonal_fov, D)
    return camera
```
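To make the unit conversions concrete, the arithmetic can be pulled out into a pure function and run on example numbers. This is only an illustration of the same formulas: `pinhole_to_physical` is a hypothetical name, and the calibration values (fx = fy = 600, a 1280 x 720 image, 3 micron pixels) are made up for the example.

```python
import math


def pinhole_to_physical(
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
    pixel_size_um: float = 3.0,
) -> dict:
    """Convert pixel-unit intrinsics to sensor size, focal length and diagonal FOV."""
    pixel_size_mm = pixel_size_um * 1e-3
    diagonal_px = 2 * math.sqrt(max(cx, width - cx) ** 2 + max(cy, height - cy) ** 2)
    return {
        "horizontal_aperture_mm": pixel_size_mm * width,
        "vertical_aperture_mm": pixel_size_mm * height,
        "focal_length_mm": (fx + fy) / 2 * pixel_size_mm,
        "diagonal_fov_deg": math.degrees(2 * math.atan2(diagonal_px, fx + fy)),
    }


# Example values only: a 1280 x 720 image with fx = fy = 600 pixels.
params = pinhole_to_physical(600.0, 600.0, 640.0, 360.0, 1280, 720)
# Sensor of about 3.84 mm x 2.16 mm with a focal length of about 1.8 mm.
print(params)
```

Dividing the three lengths by 10 gives the values that the setter calls in the listing receive.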

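After these settings have been applied, Isaac Sim no longer supports reading the intrinsics directly through get_intrinsics_matrix(), so the corresponding code has to be implemented by hand: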

```python
import numpy as np
from omni.isaac.sensor import Camera


def compute_fx_fy(camera: Camera, height: int, width: int) -> tuple[float, float]:
    """Focal lengths in pixels, derived from the focal length and the apertures."""
    focal_length = camera.get_focal_length()
    horiz_aperture = camera.get_horizontal_aperture()
    vert_aperture = camera.get_vertical_aperture()
    # fx scales with the image width, fy with the image height.
    focal_x = width * focal_length / horiz_aperture
    focal_y = height * focal_length / vert_aperture
    return focal_x, focal_y


def get_intrinsic_matrix(camera: Camera) -> np.ndarray:
    """3x3 intrinsic matrix; the principal point is assumed to be the image center."""
    width, height = camera.get_resolution()
    fx, fy = compute_fx_fy(camera, height, width)
    cx, cy = width / 2, height / 2
    return np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float32)
```
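The pixel focal lengths follow from the physical parameters as fx = width * focal_length / horizontal_aperture and fy = height * focal_length / vertical_aperture. The NumPy-only sketch below applies these formulas and also lets the principal point be passed explicitly, since the listing above places it at the image center, which may not match a calibration with custom cx and cy. `intrinsics_from_physical` is a hypothetical helper, not an Isaac Sim function.

```python
from typing import Optional

import numpy as np


def intrinsics_from_physical(
    focal_length: float,
    horizontal_aperture: float,
    vertical_aperture: float,
    width: int,
    height: int,
    cx: Optional[float] = None,
    cy: Optional[float] = None,
) -> np.ndarray:
    """Build a 3x3 pinhole intrinsic matrix from physical camera parameters.

    focal_length and the apertures must share one length unit; only their ratio matters.
    """
    fx = width * focal_length / horizontal_aperture
    fy = height * focal_length / vertical_aperture
    # Fall back to the image center when no principal point is given.
    cx = width / 2 if cx is None else cx
    cy = height / 2 if cy is None else cy
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]], dtype=np.float64)
```

With a focal length of 0.18, apertures of 0.384 and 0.216, and a 1280 x 720 image, both fx and fy come out at 600 pixels, so comparing the result with the calibration values is a quick sanity check.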

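Summary#

This chapter covered how to create a camera and how to use it to obtain different types of data. The Isaac Sim camera is fairly well wrapped, so it is quite convenient to use.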
