Meta’s new AI can convert text to video

Recently, artificial intelligence tools for converting text to images have become one of the most fascinating topics in technology. Now researchers are moving towards the next frontier of this technology: using artificial intelligence to convert text to video.

A team of engineers from Meta’s machine learning unit has unveiled a new artificial intelligence model called Make-A-Video. As the name suggests, it lets users create a short video by providing a text description of the desired scene. Currently, the videos produced by this model look artificial, with blurred elements and weak animation, but the technology is considered a significant development in AI-based content production.

Meta announced its new AI in a blog post. The company says:

Generative AI research will advance creative expression by giving people the tools to easily and quickly create new content. Just by giving Make-A-Video a few words or a line of text, the tool can bring the imagination to life and produce unique videos with different colors and scenery.

On his Facebook account, Meta CEO Mark Zuckerberg described the Make-A-Video tool as an amazing development, stating:

Producing video is much more difficult than producing photos, because in this situation, in addition to correctly producing each pixel, the system must predict the change of pixels over time.

The clips that Make-A-Video produces are usually no longer than five seconds and have no sound, but the tool supports a wide range of prompts. The best way to judge the performance of Meta’s new AI model is to look at some of its outputs. The videos you see below were all produced with Meta’s Make-A-Video, and each is accompanied by the prompt used to generate it. However, these videos were provided to The Verge by Meta, and for now no one has direct access to the company’s new AI tool. This means the world’s social media giant may have presented only the best results from its new AI.

Although it is clear that the above videos are computer generated, the output of these types of AI models will improve rapidly in the near future. By comparison, AI-based image generation tools have gone from creating unintelligible images to producing realistic, high-quality pictures in just a few years. Given the great complexity of the subject, progress in AI video generation is likely to be slower, but the prize of seamless video generation will motivate many institutions and companies to invest significant resources in it.

In its blog post introducing Make-A-Video, Meta notes that AI-powered video production tools can be invaluable to content creators and artists, but that, as with text-to-image models, text-to-video tools also raise troubling prospects. The output of this technology can be misused for disinformation, propaganda, and most likely, based on what we’ve seen with AI-based image systems and deepfakes, to produce pornography or to harass and intimidate people.

Meta says it wants to be thoughtful about how it builds new AI systems such as text-to-video tools, and has recently published a paper on the Make-A-Video model. The company plans to release a version of the system, but has not shared any details on when or how it will be available.

Of course, Meta is not the only company working on artificial intelligence tools for video production. Earlier this year, a group of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence (BAAI) introduced their text-to-video conversion model, CogVideo, which is now publicly available. Examples of output from the CogVideo tool are given below.

Meta’s researchers note in the paper introducing Make-A-Video that the model is trained on pairs of images and captions, as well as on videos. The training content comes from two datasets (WebVid-10M and HD-VILA-100M), which together contain millions of videos spanning hundreds of thousands of hours of footage. This data includes videos created by sites such as Shutterstock.

The Meta researchers note that the technical limitations of their text-to-video model go beyond current problems such as inconsistent animation or blurry clips. For example, their training methods cannot learn information that a human would infer from watching a video. Other limitations include the inability to produce videos longer than five seconds, videos with multiple scenes and events, or videos at higher resolution. Make-A-Video currently produces 16 frames of video at a resolution of 64 x 64 pixels, which are then upscaled to 768 x 768 pixels using a separate AI model.
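
To give a sense of scale for these figures, the short Python sketch below works out how many color values one clip contains before and after upscaling. It uses only the numbers quoted above (16 frames, 64 x 64, 768 x 768). The three-channel RGB assumption and the helper name values_per_clip are illustrative and do not come from Meta.

```python
# Figures quoted in the article: 16 frames at 64 x 64, upscaled to 768 x 768.
FRAMES = 16
BASE_SIDE = 64
UPSCALED_SIDE = 768
CHANNELS = 3  # assumption: RGB output; the article does not state the channel count


def values_per_clip(frames: int, side: int, channels: int = CHANNELS) -> int:
    """Return the number of color values in a clip made of square frames."""
    return frames * side * side * channels


base_values = values_per_clip(FRAMES, BASE_SIDE)
upscaled_values = values_per_clip(FRAMES, UPSCALED_SIDE)

# Upscaling factor along one side, and the resulting factor in pixels per frame.
side_factor = UPSCALED_SIDE // BASE_SIDE
pixel_factor = side_factor ** 2

print(f"Base clip:     {base_values:,} values")
print(f"Upscaled clip: {upscaled_values:,} values")
print(f"Upscaling: {side_factor}x per side, {pixel_factor}x pixels per frame")
```

Under these assumptions, the upscaling model has to fill in 144 times as many pixels per frame as the base model generates, which helps explain why the final clips can look blurry.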

Meta’s team also stated that Make-A-Video, like other AI models trained on data collected from the web, has learned and possibly exaggerated social biases, including harmful ones. In text-to-image models, such biases often reinforce social prejudices. However, without open access it is impossible to say what biases Meta’s model has learned.

Meta says it will share this research and the results of its new AI with the public, and that it will continue to use its AI framework to refine and evolve its approach to this emerging technology.
