Meta Just Revealed a System That Will Generate 3D Models With Text

Image via Meta

Creating three-dimensional objects has long been surprisingly difficult. You could either sculpt and scan a physical object or, if you were brave, build it in a 3D CAD application, a time-consuming process that thousands of 3D designers have wrestled with for decades, especially in the worlds of movies and gaming.

Meta, for its part, just broke down a major barrier to 3D design by releasing an experimental tool called Meta 3D Gen. If you’ve ever tried to make something for a 3D printer, for example, the abstract will be pretty interesting:

We introduce Meta 3D Gen (3DGen), a new state-of-the-art, fast pipeline for text-to-3D asset generation. 3DGen offers 3D asset creation with high prompt fidelity and high-quality 3D shapes and textures in under a minute. It supports physically-based rendering (PBR), necessary for 3D asset relighting in real-world applications. Additionally, 3DGen supports generative retexturing of previously generated (or artist-created) 3D shapes using additional textual inputs provided by the user. 3DGen integrates key technical components, Meta 3D AssetGen and Meta 3D TextureGen, that we developed for text-to-3D and text-to-texture generation, respectively. By combining their strengths, 3DGen represents 3D objects simultaneously in three ways: in view space, in volumetric space, and in UV (or texture) space. The integration of these two techniques achieves a win rate of 68% with respect to the single-stage model. We compare 3DGen to numerous industry baselines, and show that it outperforms them in terms of prompt fidelity and visual quality for complex textual prompts, while being significantly faster.

So let’s unpack that. The product basically lets you ask the AI for something like a T-Rex wearing a sweater. That 68% figure, by the way, is a win rate: in head-to-head comparisons, evaluators preferred the two-stage pipeline’s output over the single-stage model’s 68% of the time. It won’t nail every prompt, but when it hits, it hits.

Image via Meta

Then the model lets you change some aspects of the 3D creation after the fact — in this case, changing the coloration.

The most difficult part of this whole process is the generation of a 3D model. Two-dimensional models are, to a degree, easy. You have plenty of examples, plenty of information to pull from, and plenty of potential paths to get from “draw me five people drinking coffee on Mars” to this:

Image via Meta.ai

See if you can spot the error.

AI idiocy aside, making that picture is now a fairly simple process, akin to using LLMs to write your press releases. The AI just cuts away all the stuff that doesn’t look like the prompt.
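That “cutting away” line is a loose description of how diffusion-style image generators work: start from pure noise and repeatedly remove whatever doesn’t match the prompt. Here’s a toy cartoon of that loop in Python. Real models use a learned neural denoiser; the target list, the step count, and the 0.2 nudge factor below are all made up for illustration.

```python
import random

# A toy illustration of the "cut away what doesn't match" idea:
# start from noise and iteratively nudge each value toward a
# target "prompt" signal. A real diffusion model replaces the
# simple nudge below with a learned denoising network.

random.seed(0)
target = [0.2, 0.8, 0.5]                   # stands in for "what the prompt wants"
image = [random.random() for _ in target]  # pure noise to start

for step in range(50):
    # each step removes a little of the difference from the target
    image = [x + 0.2 * (t - x) for x, t in zip(image, target)]

# after enough steps, the "image" sits very close to the target
assert all(abs(x - t) < 0.01 for x, t in zip(image, target))
```

The point is only the shape of the process: many small denoising steps, each pulling the output closer to something that matches the prompt.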

Keep Your SSN Off The Dark Web

Every day, data brokers profit from your sensitive info — phone number, DOB, SSN — selling it to the highest bidder. And who’s buying it? Best case: companies target you with ads. Worst case: scammers and identity thieves.

It’s time you check out Incogni. It scrubs your personal data from the web, confronting the world’s data brokers on your behalf. And unlike other services, Incogni helps remove your sensitive information from all broker types, including those tricky People Search Sites.

Help protect yourself from identity theft, spam calls, and health insurers raising your rates. Plus, just for The Media Copilot readers: Get 55% off Incogni using code COPILOT.

How it works

Three-dimensional objects are harder. What Meta has done is quite interesting. From their paper:

This process begins in AssetGen by generating several fairly consistent views of the object by utilizing a multi-view and multi-channel version of a text-to-image generator. Then, a reconstruction network in AssetGen extracts a first version of the 3D object in volumetric space. This is followed by mesh extraction, establishing the object’s 3D shape and an initial version of its texture. Finally, a TextureGen component regenerates the texture, utilizing a combination of view-space and UV-space generation, boosting the texture quality and resolution while retaining fidelity to the initial prompt.

So basically they are making multiple views of a simple object — a T-Rex in a sweater — and extrapolating a 3D object from those multiple images. This is an ingenious combination of new tech (Meta’s multi-view text-to-image model) and older tech (multi-view 3D reconstruction, which has been around for a while).
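The two-stage flow from the paper can be sketched roughly like this. Every function name and data structure here is hypothetical, standing in for Meta’s actual components (AssetGen’s multi-view generator and reconstruction network, and TextureGen); this is a sketch of the flow, not Meta’s API.

```python
# Hypothetical sketch of the 3DGen two-stage pipeline.
# All names and return values are illustrative stand-ins.

def generate_views(prompt, n_views=4):
    # Stage 1a: a multi-view text-to-image model renders the object
    # from several consistent camera angles (stubbed here as strings).
    return [f"{prompt} @ {i * 360 // n_views} deg" for i in range(n_views)]

def reconstruct_asset(views):
    # Stage 1b: a reconstruction network lifts the views into
    # volumetric space, then a mesh and an initial texture are extracted.
    return {"mesh": "t-rex.obj", "texture": "initial", "n_views": len(views)}

def regenerate_texture(asset, prompt):
    # Stage 2: the texture component refines the texture in view space
    # and UV space, keeping the mesh fixed. This is also how you'd
    # retexture an existing asset with a new prompt.
    asset = dict(asset)
    asset["texture"] = f"refined: {prompt}"
    return asset

views = generate_views("a T-Rex wearing a sweater")
asset = reconstruct_asset(views)
asset = regenerate_texture(asset, "a T-Rex wearing a green sweater")
```

Note that stage 2 standing alone is what makes the retexturing feature mentioned earlier possible: you can feed an existing mesh and a new prompt straight into the texture step.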

What this means in practice is that it just got a lot easier to generate 3D assets for movies and games. You can generate a hero in multiple poses, extrapolate them in 3D, and then render them as movable objects in a game. You can also create, say, a massive set piece like a car or a castle on the fly. It’s pretty scary, but, as I said before, creating 3D objects can be pretty time-consuming, and anything that can help is probably going to be a boon to the industry. Just don’t tell the folks who have been slaving away creating 3D aliens for AAA video games and Marvel movies for the past decade. They probably don’t want to be replaced.

Image via Meta

Introducing The Media Copilot Events and Dinner Series

Over the next year we will be planning our event and dinner series. Here are some specifics:

Events will be held monthly and involve pitches, networking, and deep discussion. If you’d like to sponsor an event, please reply to the newsletter or ping team@mediacopilot.ai. If you have a space in the New York City area that might work for a meetup, please get in touch!

The Dinner Series is all about connecting the companies building AI-driven platforms and experiences with the media (journalists, executives, product managers, and other stakeholders). If you’d like to sponsor one of our dinners, please email team@mediacopilot.ai. We would love to get your project in front of decision-makers, and this is a simple, economical way to do it.

If you’d like to attend one of our upcoming events, please RSVP here and include your city so we can plan an event near you. Thanks!

