
What Machines (Software) Can Do in Image Editing

05.03.2024

To edit an image programmatically, a GAN model needs a rich semantic representation and prior training; with them, it can reproduce a character's facial expressions and movements. This review looks at what generative adversarial networks have already achieved here.



StyleGAN


A graphical interface for editing portraits, built on the ArcFace face-recognition model. The screen has text fields, and the keywords you type into them set the editing parameters: you can change the color, volume and length of the hair, remove or add makeup, and even adjust the character's age with ease.

EigenGAN


The system discovers interpretable latent dimensions on its own and uses them for manipulations: changing the character's gender, rotating the body, or changing pose or hairstyle. It struggles only with glasses, because they are rare in the training data, and it sometimes confuses gender or pose. Other attributes are handled well.
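How a layer-wise "interpretable subspace" can be edited is easiest to see on a toy example. This is a minimal pure-Python sketch, assuming the formulation described for EigenGAN, where each generator layer injects a point φ = U·diag(L)·z + μ: the columns of U are an orthonormal basis, L holds per-dimension importance weights, μ is the subspace origin, and z is a coordinate vector. Editing then means moving along one coordinate of z while keeping the others fixed. The basis, weights and origin below are hypothetical toy values, not trained parameters; what each dimension means (gender, pose, hairstyle) is something the real model discovers, not something this code encodes.

```python
def subspace_point(basis, importance, mu, z):
    """Return phi = U @ diag(importance) @ z + mu.

    basis: list of basis vectors (the columns of U), assumed orthonormal.
    importance: one weight per basis vector (the diagonal of L).
    mu: the subspace origin; z: one coordinate per basis vector.
    """
    phi = list(mu)
    for u, weight, coord in zip(basis, importance, z):
        for k in range(len(phi)):
            phi[k] += u[k] * weight * coord
    return phi


def traverse(basis, importance, mu, z, dim, values):
    """Vary coordinate `dim` of z over `values`, keeping the others fixed."""
    points = []
    for v in values:
        z_edit = list(z)
        z_edit[dim] = v
        points.append(subspace_point(basis, importance, mu, z_edit))
    return points


# Hypothetical toy subspace: two orthonormal directions in a 3-D feature space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
importance = [2.0, 0.5]
mu = [0.0, 0.0, 1.0]
z = [0.3, -1.0]

for p in traverse(basis, importance, mu, z, dim=0, values=[-1.0, 0.0, 1.0]):
    print(p)  # only the first component changes along this traversal
```

Because each edit touches a single coordinate of z, the change stays confined to one learned direction, which is what makes per-attribute manipulation possible when training has aligned that direction with a visual attribute.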

ReStyle


To edit a real image, the model first inverts it, that is, finds the latent code that reproduces it. Instead of predicting that code in a single pass, the encoder works iteratively: at each step it predicts a residual relative to the current estimate of the latent code, which significantly improves reconstruction quality.
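The iterative residual idea can be illustrated with a toy example. Everything here is a hypothetical stand-in: the "generator" is an elementwise linear map, and the "encoder" is an imperfect function that looks at the target image and the current reconstruction and recovers only half of the remaining latent error per step (the real encoder is a trained network). Starting from an average latent code, each step adds the predicted residual to the current code, and the reconstruction error shrinks step by step.

```python
def generator(w):
    # Hypothetical toy generator: maps a latent code to an "image" elementwise.
    return [2.0 * v for v in w]


def encoder(target, current):
    # Hypothetical imperfect encoder: from the target image and the current
    # reconstruction it predicts a latent residual covering only half the gap.
    return [0.5 * (t - c) / 2.0 for t, c in zip(target, current)]


def restyle_invert(target, w_avg, steps):
    """Iteratively refine a latent code by adding predicted residuals."""
    w = list(w_avg)  # start from the average latent code
    errors = []
    for _ in range(steps):
        current = generator(w)
        delta = encoder(target, current)           # residual w.r.t. current estimate
        w = [wi + di for wi, di in zip(w, delta)]  # w_{t+1} = w_t + delta_t
        recon = generator(w)
        errors.append(max(abs(t - r) for t, r in zip(target, recon)))
    return w, errors


target = generator([1.0, -3.0])  # an "image" whose true latent code is [1, -3]
w, errors = restyle_invert(target, w_avg=[0.0, 0.0], steps=4)
print(errors)  # [3.0, 1.5, 0.75, 0.375]
```

A single-pass encoder corresponds to stopping after the first step; the extra steps let later predictions correct what earlier ones missed.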

Geometry-Free View Synthesis


The system synthesizes new views of a scene from a single photograph. Upload a picture of a room or part of an apartment, and it generates several variants of how the scene looks from other viewpoints on its own. It works with a quantized (discrete) representation of the image rather than ready-made 3D models or explicit geometry: the system learns the spatial relationships by itself.

LatentCLR


Works with the latent space of GAN models and finds meaningful directions in it, using contrastive learning without human supervision. It discovers nonlinear directions in pretrained models such as BigGAN and StyleGAN2.
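A contrastive objective in the spirit of LatentCLR can be sketched as follows. Assumed setup: for each latent sample i and each candidate direction k, a feature vector `feats[i][k]` describes how the generator's intermediate features change when direction k is applied to sample i. Computing these from a real BigGAN or StyleGAN2 is out of scope, so the inputs below are hypothetical hand-made vectors. Changes caused by the same direction on different samples are treated as positives, changes caused by other directions as negatives, so the loss is low when each direction has a consistent and distinct effect. The exact loss form and temperature used in the paper may differ from this sketch.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def direction_contrastive_loss(feats, tau=0.5):
    """NT-Xent-style loss over feature changes.

    feats[i][k]: feature change caused by direction k on latent sample i.
    Positives: same direction, other samples. Negatives: other directions.
    """
    n, n_dirs = len(feats), len(feats[0])
    total = 0.0
    for i in range(n):
        for k in range(n_dirs):
            anchor = feats[i][k]
            pos = sum(math.exp(cosine(anchor, feats[j][k]) / tau)
                      for j in range(n) if j != i)
            neg = sum(math.exp(cosine(anchor, feats[j][l]) / tau)
                      for j in range(n) for l in range(n_dirs) if l != k)
            total += -math.log(pos / neg)
    return total / (n * n_dirs)


# Hypothetical feature changes for 2 samples x 2 directions.
aligned = [    # each direction has its own consistent effect
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
]
entangled = [  # both directions do roughly the same thing
    [[1.0, 0.0], [1.0, 0.1]],
    [[0.9, 0.1], [1.0, 0.0]],
]
print(direction_contrastive_loss(aligned) < direction_contrastive_loss(entangled))  # True
```

In the real method the directions themselves are trainable (possibly nonlinear) functions of the latent code, and minimizing this kind of loss pushes them toward distinct, reusable edits.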

Articulated Animation


Can create full-body deepfakes: it separates the body from the background, captures the style of movement, and generates new movements. It is not tied to a specific person: it learns the motion and applies it to anyone you provide.

VideoGPT


A new architecture for video generation. It first uses a VQ-VAE autoencoder, built with 3D convolutions and self-attention, to learn a discrete latent representation of video without any labels, and then a GPT-style transformer that generates those latents autoregressively, using position encodings over space and time.
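The discrete bottleneck that turns video into tokens for the GPT stage can be sketched as nearest-neighbour vector quantization, the core lookup in a VQ-VAE. In this sketch the codebook and latent vectors are hypothetical hand-picked values (in VideoGPT both come from training), and flattening the T x H x W latent grid in raster order is an assumption made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Replace each latent vector with its nearest codebook entry.

    Returns the code indices (the discrete tokens) and the quantized vectors.
    """
    indices = [
        min(range(len(codebook)), key=lambda c: squared_distance(v, codebook[c]))
        for v in vectors
    ]
    return indices, [codebook[i] for i in indices]


def video_to_tokens(latent_grid, codebook):
    """Flatten a T x H x W grid of latent vectors into one token sequence."""
    flat = [vec for frame in latent_grid for row in frame for vec in row]
    indices, _ = quantize(flat, codebook)
    return indices


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # hypothetical 3-entry codebook
latents = [  # 2 frames, each 1 x 2 positions, each holding a 2-D latent vector
    [[[0.1, -0.2], [0.9, 1.2]]],
    [[[-0.8, 1.7], [0.05, 0.0]]],
]
print(video_to_tokens(latents, codebook))  # [0, 1, 2, 0]
```

A GPT-style transformer would then be trained to predict each token from the ones before it, and the VQ-VAE decoder maps generated tokens back to frames.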

MiVOS


Tracks objects in video and produces binary masks for them. A convolutional network propagates the masks from frame to frame, and the user can review and correct them at any point through a convenient graphical interface.

DINO


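An innovative approach that needs no manual labeling: it combines transformers with self-supervised learning (the name stands for self-distillation with no labels). The models learn from unlabeled data, use attention to focus selectively on the relevant parts of an image, and form their own hypotheses about its content.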
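The self-distillation objective behind DINO can be sketched for a single pair of views. Assumptions: the logit vectors below are hypothetical outputs of a student and a teacher network on two different crops of the same unlabeled image, and the temperatures and momentum values are illustrative. The teacher's output is centered by a running mean and sharpened with a low temperature, the student is trained to match it by cross-entropy, and the teacher's weights are not trained directly but follow an exponential moving average (EMA) of the student's.

```python
import math


def softmax(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher and the student."""
    p_teacher = softmax([t - c for t, c in zip(teacher_logits, center)], teacher_temp)
    p_student = softmax(student_logits, student_temp)
    # In training, gradients flow only into the student; the teacher is a target.
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def ema(old, new, momentum):
    """Exponential moving average, used for teacher weights and for the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


teacher_out = [2.0, 0.0, 0.0]
center = [0.0, 0.0, 0.0]
agreeing = dino_loss([2.0, 0.0, 0.0], teacher_out, center)
disagreeing = dino_loss([0.0, 2.0, 0.0], teacher_out, center)
print(agreeing < disagreeing)  # True: matching the teacher lowers the loss

center = ema(center, teacher_out, momentum=0.9)  # running mean of teacher outputs
# Hypothetical parameter vectors: the teacher slowly tracks the student.
teacher_weights = ema([0.5, -0.2], [0.7, 0.1], momentum=0.996)
```

Centering and sharpening pull in opposite directions, which is what keeps the teacher from collapsing to a constant or a uniform output without any labels.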

CPA


Predicts the combined effect of several interacting features when they are applied together.