Just as we’re coming to terms with the power of the latest AI image generators, another advance has arrived. Hot on the heels of DALL-E comes Point-E, an AI generator for 3D modelling with a similar modus operandi.
AI image generators made great strides in the past year, allowing anyone to create sometimes stunning images from a text prompt. Currently, they can only create still 2D images, but OpenAI, the company behind DALL-E 2, one of the most popular image generators, has just revealed its latest research on an AI-powered 3D modelling tool… though the results look pretty basic (see how to use DALL-E 2 to get started with OpenAI’s image generator).
Following DALL-E comes Point-E, a model that looks to bring revolutionary text-to-image technology to 3D modelling. OpenAI says the tool, which has been trained on millions of 3D models, can generate 3D point clouds from simple text prompts. The catch? The resolution is quite poor.
The research paper, written by a team led by Alex Nichol, says that unlike other methods, Point-E “leverages a large corpus of (text, image) pairs, allowing it to follow diverse and complex prompts, while our image-to-3D model is trained on a smaller dataset of (image, 3D) pairs.”
It says: “To produce a 3D object from a text prompt, we first sample an image using the text-to-image model, and then sample a 3D object conditioned on the sampled image.” Point-E runs this synthetic rendered view through a series of diffusion models to create a 3D RGB point cloud: first a coarse 1,024-point cloud, then a finer 4,096-point cloud.
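To picture what those outputs look like, an RGB point cloud can be represented as an array of points, each carrying an XYZ position and an RGB colour. The toy sketch below (a hypothetical illustration, not Point-E’s actual API) mimics the coarse-to-fine shapes described above; the real refinement step is a learned diffusion model, stood in for here by naive jitter-and-duplicate upsampling:

```python
import numpy as np

# A point cloud is just an (N, 6) array: XYZ position + RGB colour per point.
# This is a toy illustration of the coarse-to-fine output shapes, NOT Point-E's
# actual code; the real upsampler is a learned diffusion model.

def make_coarse_cloud(n: int = 1024, seed: int = 0) -> np.ndarray:
    """Stand-in for the coarse stage: random coloured points on a unit sphere."""
    rng = np.random.default_rng(seed)
    xyz = rng.normal(size=(n, 3))
    xyz /= np.linalg.norm(xyz, axis=1, keepdims=True)  # project onto the sphere
    rgb = rng.uniform(0.0, 1.0, size=(n, 3))
    return np.hstack([xyz, rgb])

def naive_upsample(cloud: np.ndarray, factor: int = 4,
                   noise: float = 0.01, seed: int = 0) -> np.ndarray:
    """Stand-in for the fine stage: duplicate each point and jitter its position."""
    rng = np.random.default_rng(seed)
    repeated = np.repeat(cloud, factor, axis=0)
    repeated[:, :3] += rng.normal(scale=noise, size=(repeated.shape[0], 3))
    return repeated

coarse = make_coarse_cloud()   # shape (1024, 6), like the coarse stage's output
fine = naive_upsample(coarse)  # shape (4096, 6), like the finer stage's output
print(coarse.shape, fine.shape)
```

Swapping the jitter step for a conditional diffusion model is, in rough terms, what turns this sketch into the coarse-to-fine cascade the paper describes.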
The sample results in the research paper may look basic compared to the images DALL-E 2 can produce, and compared to the 3D capabilities of existing systems. But creating 3D imagery is a hugely resource-intensive process: programs like Google’s DreamFusion require hours of processing on multiple GPUs.
OpenAI acknowledges that its method underperforms on quality, but says it produces samples in a fraction of the time, seconds rather than hours, and requires only one GPU, making 3D modelling more accessible. You can already try it yourself, as OpenAI has shared the source code on GitHub.