Google presented two artificial intelligence models for converting text to video

Facebook’s parent company Meta last week unveiled a new artificial intelligence model that turns text descriptions into short, silent videos. Google, it turns out, is working on the same problem: it has presented two new text-to-video AI models, one focused on image quality and the other on producing longer clips.

First, let’s look at Imagen Video, Google’s artificial intelligence model for creating high-quality videos. The system builds on techniques from Google’s earlier Imagen text-to-image system, but adds a number of new components to turn still frames into smooth motion.

As The Verge writes, the results from Google’s model, like those from Meta’s Make-A-Video, are by turns impressive, strange, and in some cases disappointing. The most convincing examples are looping animations, such as green sprouts forming the word “Imagen” or a wooden sculpture surfing in space, because we don’t necessarily expect such clips to obey strict rules of temporal and spatial composition; they can also get away with running a little slower.

Among the weakest results from Google’s text-to-video model are clips that reproduce human or animal movement, such as the snow-shoveling example. Because we have such a clear sense of how bodies and limbs move, any deformation or degradation in the footage is far more obvious. Even so, all of the videos are fairly impressive.

Prompt provided for this video: a British Shorthair cat jumping off a couch.

Prompt provided for this video: sprouts in the shape of the text “Imagen” coming out of a fairytale book.

Prompt provided for this video: shoveling snow.

Prompt provided for this video: a wooden sculpture surfing on a surfboard in space.

Google researchers note that the base Imagen Video model produces 16 frames at 3 frames per second and a resolution of 24 x 48 pixels. That output is then processed by a cascade of AI-based super-resolution models, which upgrade it to 128 frames at 24 frames per second and a resolution of 1280 x 768 pixels. This is a higher resolution than Meta’s Make-A-Video (768 x 768 pixels).
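The numbers above imply something worth spelling out: the super-resolution cascade does not lengthen the clip, it only densifies it. A minimal arithmetic sketch (using only the figures reported in the article; the stage structure is our own illustration, not Google’s pipeline code):

```python
from fractions import Fraction

# Video specs as reported in the article.
base_frames, base_fps = 16, 3          # base model output: 16 frames at 3 fps
final_frames, final_fps = 128, 24      # after the super-resolution cascade

# Clip duration = frame count / playback rate.
base_duration = Fraction(base_frames, base_fps)     # 16/3 s
final_duration = Fraction(final_frames, final_fps)  # 128/24 s

# Both stages cover the same clip length (~5.3 s); the cascade adds
# temporal density and spatial detail rather than extra footage.
assert base_duration == final_duration

temporal_factor = final_frames // base_frames  # 8x more frames
fps_factor = final_fps // base_fps             # 8x faster playback
print(float(base_duration), temporal_factor, fps_factor)
```

In other words, the cascade interpolates eight times as many frames and plays them eight times as fast, so the output is a smoother, sharper version of the same roughly five-second clip.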

As we explained when covering Meta’s text-to-video model, this technology brings various challenges with it, including racial and gender biases and the potential for abuse in advertising, pornography, and disinformation. Google’s researchers address the issue only briefly in their paper. The team writes:

For video generation models to have a positive impact on society, these systems should be applied in ways that enhance and augment human creativity. However, these models can also be misused to create fake, hateful, or harmful content.

The team notes that it has tested filters to prevent abuse of the text-to-video model, but offers no account of how effective those filters are. The researchers do say that the model performed well in several safety and ethics evaluations.

Imagen Video remains a research project, and by withholding it from the general public, Google limits its potential harms. Meta’s Make-A-Video AI is likewise unavailable to the public and similarly restricted. Like text-to-image systems before them, however, these models will likely become available to other researchers as open-source projects before any general release, and new safety and ethical challenges will arise at that point.

Prompt provided for this video: a cat to the left of a dog.

Prompt provided for this video: a teddy bear washing dishes.

Prompt provided for this video: a hand lifting a cup.


In addition to Imagen Video, Google has a separate team of researchers developing another text-to-video model called Phenaki. Compared with Imagen Video, this model focuses on creating longer videos that follow detailed, multi-step instructions. For example, consider the following text:

Heavy traffic in a futuristic city. An alien spaceship arrives in a futuristic city. The camera zooms in on the alien spaceship. The camera pans forward and shows an astronaut in a blue room. The astronaut is typing on the keyboard. The camera pans away from the astronaut. The astronaut leaves the keyboard and walks to the left. The astronaut leaves the keyboard and walks away. The camera moves beyond the astronaut and shows the screen. The screen behind the astronaut shows fish swimming in the sea. Random zoom in on a blue fish. We follow a blue fish as it swims in a dark ocean. The camera points to the sky through the water. The ocean and coastline of a futuristic city. Zoom into a futuristic skyscraper. The camera zooms in on one of the windows. We are in an office room with empty desks. A lion walks on the office desks. The camera zooms in on the lion’s face inside the office. The zoom continues to reveal a lion wearing a dark suit in an office room. A lion in a suit looks at the camera and smiles. The camera slowly zooms out to the exterior of the skyscraper. Timelapse of the sunset in the modern city.
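A story like this is effectively a sequence of scene-level prompts, one per sentence, each driving roughly one segment of the video. A minimal sketch of that input format (our own illustration, with a hypothetical sentence split and an abbreviated story; this is not Phenaki’s actual preprocessing code):

```python
import re

# Abbreviated version of the multi-scene story from the article.
story = (
    "Heavy traffic in a futuristic city. "
    "An alien spaceship arrives in a futuristic city. "
    "The camera zooms in on the alien spaceship. "
    "The astronaut is typing on the keyboard."
)

# One prompt per sentence; a long-video model consumes them in order,
# generating each segment conditioned on the previous ones.
scene_prompts = [s.strip() for s in re.split(r"\.\s*", story) if s.strip()]
print(len(scene_prompts))  # 4 scene prompts
```

The point of this structure is that the model never needs the whole story in one context window; it only needs the current scene prompt plus enough prior frames to keep the video coherent, which is what lets Phenaki extend clips indefinitely.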

The Phenaki artificial intelligence model produces a video like the one below after receiving the above text.


Clearly, the resulting video lacks coherence and clarity, and is lower in quality than the samples produced with Imagen Video, but its steady progression through scenes and settings is genuinely appealing.

In the paper describing the Phenaki model, the researchers say their method can create very long videos; in principle, there is no limit on video length. They also say that future versions of the model will be part of a broader set of tools for artists and everyday users, offering new and exciting ways to express creativity. It remains to be seen what direction this AI-based system takes and whether Google can resolve its ethical and safety challenges.
