JoliGEN Options

Here are all the options available when calling train.py.

| Parameter | Type | Default | Description |
|---|---|---|---|
| --checkpoints_dir | string | ./checkpoints | models are saved here |
| --dataroot | string | None | path to images (should have subfolders trainA, trainB, valA, valB, etc.) |
| --ddp_port | string | 12355 | |
| --gpu_ids | string | 0 | gpu ids, e.g. 0 or 0,1,2 or 0,2; use -1 for CPU |
| --model_type | string | cut | chooses which model to use. Values: cut, cycle_gan, palette |
| --name | string | experiment_name | name of the experiment; it decides where samples and models are stored |
| --phase | string | train | train, val, test, etc. |
| --suffix | string | | customized suffix: opt.name = opt.name + suffix, e.g. {model}_{netG}_size{load_size} |
| --test_batch_size | int | 1 | input batch size |
| --warning_mode | flag | | whether to display warnings |
| --with_amp | flag | | whether to activate torch amp on forward passes |
| --with_tf32 | flag | | whether to activate tf32 for faster computations (Ampere GPU and beyond only) |
| --with_torch_compile | flag | | whether to activate torch.compile for some forward and backward functions (experimental) |
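
For reference, a minimal sketch of a training invocation using the options above; the dataset path and experiment name are placeholders, and the command assumes train.py is run from the repository root.

```bash
# Minimal GAN training run: CUT model on GPU 0.
# /path/to/dataset is a placeholder and should contain trainA, trainB (and valA, valB) subfolders.
python3 train.py \
  --dataroot /path/to/dataset \
  --checkpoints_dir ./checkpoints \
  --name my_experiment \
  --model_type cut \
  --gpu_ids 0
```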

Discriminator

| Parameter | Type | Default | Description |
|---|---|---|---|
| --D_dropout | flag | | whether to use dropout in the discriminator |
| --D_n_layers | int | 3 | only used if netD==n_layers |
| --D_ndf | int | 64 | # of discriminator filters in the first conv layer |
| --D_netDs | array | ['projected_d', 'basic'] | specify discriminator architectures; D_n_layers allows you to specify the number of layers in the discriminator. NB: duplicated arguments will be ignored. |
| --D_no_antialias | flag | | if specified, use stride=2 convs instead of antialiased downsampling (sad) |
| --D_no_antialias_up | flag | | if specified, use [upconv(learned filter)] instead of [upconv(hard-coded [1,3,3,1] filter), conv] |
| --D_norm | string | instance | instance normalization or batch normalization for D. Values: instance, batch, none |
| --D_proj_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file |
| --D_proj_interp | int | -1 | whether to force projected discriminator interpolation to a value > 224; -1 means no interpolation |
| --D_proj_network_type | string | efficientnet | projected discriminator architecture. Values: efficientnet, segformer, vitbase, vitsmall, vitsmall2, vitclip16 |
| --D_proj_weight_segformer | string | models/configs/segformer/pretrain/segformer_mit-b0.pth | path to segformer weights |
| --D_spectral | flag | | whether to use spectral norm in the discriminator |
| --D_temporal_every | int | 4 | |
| --D_temporal_frame_step | int | 30 | number of frames between successive selected frames |
| --D_temporal_num_common_char | int | -1 | how many characters (the first ones) are used to identify a video; if -1, natural sorting is used |
| --D_temporal_number_frames | int | 5 | how many successive frames to use for the temporal loss |
| --D_vision_aided_backbones | string | clip+dino+swin | specify vision-aided discriminator architectures; they are frozen, then their outputs are combined and fitted with a linear network on top. Choose from dino, clip, swin, det_coco, seg_ade and combine them with + |
| --D_weight_sam | string | | path to sam weight for D, e.g. models/configs/sam/pretrain/sam_vit_b_01ec64.pth |
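
A sketch of discriminator-related options, with values taken from the table above; paths and the experiment name are placeholders, and passing array-valued options as space-separated values is an assumption about the CLI parser.

```bash
# Projected discriminator (ViT-small backbone, interpolation forced to 256)
# combined with a basic PatchGAN, both with spectral norm.
python3 train.py \
  --dataroot /path/to/dataset \
  --name disc_example \
  --model_type cut \
  --D_netDs projected_d basic \
  --D_proj_network_type vitsmall \
  --D_proj_interp 256 \
  --D_spectral
```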

Generator

| Parameter | Type | Default | Description |
|---|---|---|---|
| --G_attn_nb_mask_attn | int | 10 | |
| --G_attn_nb_mask_input | int | 1 | |
| --G_backward_compatibility_twice_resnet_blocks | flag | | if true, feats will go through resnet blocks two times for resnet_attn generators. This option will be deleted; it exists for backward compatibility (old models were trained that way). |
| --G_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file for G |
| --G_diff_n_timestep_test | int | 1000 | number of timesteps used for UNet mha inference (test time) |
| --G_diff_n_timestep_train | int | 2000 | number of timesteps used for UNet mha training |
| --G_dropout | flag | | dropout for the generator |
| --G_nblocks | int | 9 | # of layer blocks in G, applicable to resnets |
| --G_netE | string | resnet_256 | specify multimodal latent vector encoder. Values: resnet_128, resnet_256, resnet_512, conv_128, conv_256, conv_512 |
| --G_netG | string | mobile_resnet_attn | specify generator architecture. Values: resnet_9blocks, resnet_6blocks, resnet_3blocks, resnet_12blocks, mobile_resnet_9blocks, mobile_resnet_3blocks, resnet_attn, mobile_resnet_attn, unet_256, unet_128, stylegan2, smallstylegan2, segformer_attn_conv, segformer_conv, ittr, unet_mha, uvit |
| --G_ngf | int | 64 | # of gen filters in the last conv layer |
| --G_norm | string | instance | instance normalization or batch normalization for G. Values: instance, batch, none |
| --G_padding_type | string | reflect | type of padding to use in the generator. Values: reflect, replicate, zeros |
| --G_spectral | flag | | whether to use spectral norm in the generator |
| --G_stylegan2_num_downsampling | int | 1 | number of downsampling layers used by StyleGAN2Generator |
| --G_unet_mha_attn_res | array | [16] | downrate samples at which attention takes place |
| --G_unet_mha_channel_mults | array | [1, 2, 4, 8] | channel multiplier for each level of the UNet mha |
| --G_unet_mha_group_norm_size | int | 32 | |
| --G_unet_mha_norm_layer | string | groupnorm | Values: groupnorm, batchnorm, layernorm, instancenorm, switchablenorm |
| --G_unet_mha_num_head_channels | int | 32 | |
| --G_unet_mha_num_heads | int | 1 | |
| --G_unet_mha_res_blocks | array | [2, 2, 2, 2] | distribution of resnet blocks across the UNet stages; should have the same size as --G_unet_mha_channel_mults |
| --G_unet_mha_vit_efficient | flag | | if true, use efficient attention in UNet and UViT |
| --G_uvit_num_transformer_blocks | int | 6 | number of transformer blocks in UViT |
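
A sketch of generator selection for a GAN model; the architecture name comes from the --G_netG values above, the dataset path and experiment name are placeholders.

```bash
# Lightweight mobile ResNet generator with attention, instance norm and reflect padding.
python3 train.py \
  --dataroot /path/to/dataset \
  --name gen_example \
  --model_type cut \
  --G_netG mobile_resnet_attn \
  --G_ngf 64 \
  --G_norm instance \
  --G_padding_type reflect
```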

Algorithm-specific

GAN model

| Parameter | Type | Default | Description |
|---|---|---|---|
| --alg_gan_lambda | float | 1.0 | weight for GAN loss: GAN(G(X)) |

CUT model

| Parameter | Type | Default | Description |
|---|---|---|---|
| --alg_cut_HDCE_gamma | float | 1.0 | |
| --alg_cut_HDCE_gamma_min | float | 1.0 | |
| --alg_cut_MSE_idt | flag | | use MSE loss for identity mapping: MSE(G(Y), Y) |
| --alg_cut_flip_equivariance | flag | | enforce flip-equivariance as additional regularization. It is used by FastCUT, but not CUT |
| --alg_cut_lambda_MSE_idt | float | 1.0 | weight for MSE identity loss: MSE(G(X), X) |
| --alg_cut_lambda_NCE | float | 1.0 | weight for NCE loss: NCE(G(X), X) |
| --alg_cut_lambda_SRC | float | 0.0 | weight for SRC (semantic relation consistency) loss: NCE(G(X), X) |
| --alg_cut_nce_T | float | 0.07 | temperature for NCE loss |
| --alg_cut_nce_idt | flag | | use NCE loss for identity mapping: NCE(G(Y), Y) |
| --alg_cut_nce_includes_all_negatives_from_minibatch | flag | | (used for single-image translation) if true, include the negatives from the other samples of the minibatch when computing the contrastive loss. Please see models/patchnce.py for more details. |
| --alg_cut_nce_layers | string | 0,4,8,12,16 | layers on which to compute the NCE loss |
| --alg_cut_nce_loss | string | monce | CUT contrastive loss. Values: patchnce, monce, SRC_hDCE |
| --alg_cut_netF | string | mlp_sample | how to downsample the feature map. Values: sample, mlp_sample, sample_qsattn, mlp_sample_qsattn |
| --alg_cut_netF_dropout | flag | | whether to use dropout with F |
| --alg_cut_netF_nc | int | 256 | |
| --alg_cut_netF_norm | string | instance | instance normalization or batch normalization for F. Values: instance, batch, none |
| --alg_cut_num_patches | int | 256 | number of patches per layer |
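
A sketch of a CUT run combining the contrastive-loss options above; the dataset path and experiment name are placeholders.

```bash
# CUT training with MoNCE contrastive loss and identity NCE.
python3 train.py \
  --dataroot /path/to/dataset \
  --name cut_example \
  --model_type cut \
  --alg_cut_nce_loss monce \
  --alg_cut_nce_layers 0,4,8,12,16 \
  --alg_cut_lambda_NCE 1.0 \
  --alg_cut_nce_idt \
  --alg_cut_num_patches 256
```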

CycleGAN model

| Parameter | Type | Default | Description |
|---|---|---|---|
| --alg_cyclegan_lambda_A | float | 10.0 | weight for cycle loss (A -> B -> A) |
| --alg_cyclegan_lambda_B | float | 10.0 | weight for cycle loss (B -> A -> B) |
| --alg_cyclegan_lambda_identity | float | 0.5 | use identity mapping. Setting lambda_identity to a value other than 0 scales the weight of the identity mapping loss. For example, if the weight of the identity loss should be 10 times smaller than the weight of the reconstruction loss, set lambda_identity = 0.1 |
| --alg_cyclegan_rec_noise | float | 0.0 | whether to add noise to the reconstruction |
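
For instance, a CycleGAN run where the identity loss is weighted 10 times lower than the reconstruction loss, per the --alg_cyclegan_lambda_identity note above; the dataset path and experiment name are placeholders.

```bash
python3 train.py \
  --dataroot /path/to/dataset \
  --name cyclegan_example \
  --model_type cycle_gan \
  --alg_cyclegan_lambda_A 10.0 \
  --alg_cyclegan_lambda_B 10.0 \
  --alg_cyclegan_lambda_identity 0.1
```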

ReCUT / ReCycleGAN

| Parameter | Type | Default | Description |
|---|---|---|---|
| --alg_re_P_lr | float | 0.0002 | initial learning rate for P networks |
| --alg_re_adversarial_loss_p | flag | | if true, also train the prediction model with an adversarial loss |
| --alg_re_netP | string | unet_128 | specify P architecture. Values: resnet_9blocks, resnet_6blocks, resnet_attn, unet_256, unet_128 |
| --alg_re_no_train_P_fake_images | flag | | if true, P won't be trained on fake image projections |
| --alg_re_nuplet_size | int | 3 | number of frames loaded |
| --alg_re_projection_threshold | float | 1.0 | threshold of the real-image projection loss below which fake projection and fake reconstruction losses are applied |

Diffusion model

| Parameter | Type | Default | Description |
|---|---|---|---|
| --alg_palette_computed_sketch_list | array | ['canny', 'hed'] | what to use for random sketch |
| --alg_palette_cond_embed_dim | int | 32 | dimension of the conditioning embedding |
| --alg_palette_cond_image_creation | string | y_t | how cond_image is created. Values: y_t, previous_frame, computed_sketch, low_res |
| --alg_palette_conditioning | string | | whether to use conditioning or not. Values: (empty), mask, class, mask_and_class |
| --alg_palette_generate_per_class | flag | | whether to generate samples for each class |
| --alg_palette_inference_num | int | -1 | nb of examples processed for inference |
| --alg_palette_lambda_G | float | 1.0 | weight for supervised loss |
| --alg_palette_loss | string | MSE | loss for the denoising model. Values: L1, MSE, multiscale |
| --alg_palette_prob_use_previous_frame | float | 0.5 | probability to use the previous frame as y cond |
| --alg_palette_sam_crop_delta | flag | | extend the crop's width and height by 2*crop_delta before computing masks |
| --alg_palette_sam_final_canny | flag | | whether to perform a Canny edge detection on the sam sketch to soften the edges |
| --alg_palette_sam_max_mask_area | float | 0.99 | maximum area, as a proportion of image size, for a mask to be kept |
| --alg_palette_sam_min_mask_area | float | 0.001 | minimum area, as a proportion of image size, for a mask to be kept |
| --alg_palette_sam_no_output_binary_sam | flag | | whether to not output a binary sketch before Canny |
| --alg_palette_sam_no_sample_points_in_ellipse | flag | | whether to not sample the points inside an ellipse to avoid the corners of the image |
| --alg_palette_sam_no_sobel_filter | flag | | whether to not use a Sobel filter on each SAM mask |
| --alg_palette_sam_points_per_side | int | 16 | number of points per image side to prompt SAM with (# of prompted points will be points_per_side**2) |
| --alg_palette_sam_redundancy_threshold | float | 0.62 | redundancy threshold above which redundant masks are not kept |
| --alg_palette_sam_sobel_threshold | float | 0.7 | Sobel threshold in % of gradient magnitude |
| --alg_palette_sam_use_gaussian_filter | flag | | whether to apply a Gaussian blur to each SAM mask |
| --alg_palette_sampling_method | string | ddpm | choose the sampling method between ddpm and ddim. Values: ddpm, ddim |
| --alg_palette_sketch_canny_range | array | [0, 765] | range for Canny thresholds |
| --alg_palette_super_resolution_scale | float | 2.0 | scale for super resolution |
| --alg_palette_task | string | inpainting | Values: inpainting, super_resolution |
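
A sketch of a diffusion (palette) inpainting run; the dataset path and experiment name are placeholders, and the chosen --data_dataset_mode is only one of the documented values, to be adapted to your data.

```bash
# Mask-conditioned diffusion inpainting with a UNet mha generator and DDPM sampling.
python3 train.py \
  --dataroot /path/to/dataset \
  --name palette_example \
  --model_type palette \
  --G_netG unet_mha \
  --data_dataset_mode self_supervised_labeled_mask \
  --alg_palette_task inpainting \
  --alg_palette_conditioning mask \
  --alg_palette_loss MSE \
  --alg_palette_sampling_method ddpm
```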

Datasets

| Parameter | Type | Default | Description |
|---|---|---|---|
| --data_crop_size | int | 256 | then crop to this size |
| --data_dataset_mode | string | unaligned | chooses how datasets are loaded. Values: unaligned, unaligned_labeled_cls, unaligned_labeled_mask, self_supervised_labeled_mask, unaligned_labeled_mask_cls, self_supervised_labeled_mask_cls, unaligned_labeled_mask_online, self_supervised_labeled_mask_online, unaligned_labeled_mask_cls_online, self_supervised_labeled_mask_cls_online, aligned, nuplet_unaligned_labeled_mask, temporal, self_supervised_temporal, single |
| --data_direction | string | AtoB | AtoB or BtoA. Values: AtoB, BtoA |
| --data_inverted_mask | flag | | whether to invert the mask, i.e. around the bbox |
| --data_load_size | int | 286 | scale images to this size |
| --data_max_dataset_size | int | 1000000000 | maximum number of samples allowed per dataset. If the dataset directory contains more than max_dataset_size, only a subset is loaded. |
| --data_num_threads | int | 4 | # threads for loading data |
| --data_online_context_pixels | int | 0 | context pixel band around the crop, unused for generation, only for the discriminator |
| --data_online_fixed_mask_size | int | -1 | if > 0, it will be used as the fixed bbox size (warning: in dataset resolution, i.e. before resizing) |
| --data_online_select_category | int | -1 | category to select for bounding boxes; -1 means all boxes are selected |
| --data_online_single_bbox | flag | | whether to only allow a single bbox per online crop |
| --data_preprocess | string | resize_and_crop | scaling and cropping of images at load time. Values: resize_and_crop, crop, scale_width, scale_width_and_crop, none |
| --data_refined_mask | flag | | whether to use a refined mask with sam |
| --data_relative_paths | flag | | whether paths to images are relative to dataroot |
| --data_sanitize_paths | flag | | if true, wrong image or label paths will be removed before training |
| --data_serial_batches | flag | | if true, takes images in order to make batches; otherwise takes them randomly |
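
A sketch of the dataset-related options for the default unaligned mode; the dataset path and experiment name are placeholders, and the sizes shown are the documented defaults.

```bash
# Load images at 286px, random-crop to 256px, 4 loader threads.
python3 train.py \
  --dataroot /path/to/dataset \
  --name data_example \
  --data_dataset_mode unaligned \
  --data_load_size 286 \
  --data_crop_size 256 \
  --data_preprocess resize_and_crop \
  --data_num_threads 4
```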

Online created datasets

| Parameter | Type | Default | Description |
|---|---|---|---|
| --data_online_creation_color_mask_A | flag | | perform the task of replacing color-filled masks by objects |
| --data_online_creation_crop_delta_A | int | 50 | crop sizes are random; allowed values are online_creation_crop_size plus or minus online_creation_crop_delta, for domain A |
| --data_online_creation_crop_delta_B | int | 50 | crop sizes are random; allowed values are online_creation_crop_size plus or minus online_creation_crop_delta, for domain B |
| --data_online_creation_crop_size_A | int | 512 | crop to this size during online creation; it needs to be greater than the bbox size for domain A |
| --data_online_creation_crop_size_B | int | 512 | crop to this size during online creation; it needs to be greater than the bbox size for domain B |
| --data_online_creation_load_size_A | array | [] | load to this size during online creation, format: width height, or only one size if square |
| --data_online_creation_load_size_B | array | [] | load to this size during online creation, format: width height, or only one size if square |
| --data_online_creation_mask_delta_A | array | [0] | ratio mask offset to allow generation of a bigger object in domain B (for semantic loss) for domain A, format: width (x) height (y), or only one size if square |
| --data_online_creation_mask_delta_B | array | [0] | mask offset to allow generation of a bigger object in domain B (for semantic loss) for domain B, format: width (y) height (x), or only one size if square |
| --data_online_creation_mask_random_offset_A | array | [0.0] | ratio mask size randomization (only to make it bigger) to robustify the image generation in domain A, format: width (x) height (y), or only one size if square |
| --data_online_creation_mask_random_offset_B | array | [0.0] | mask size randomization (only to make it bigger) to robustify the image generation in domain B, format: width (y) height (x), or only one size if square |
| --data_online_creation_mask_square_A | flag | | whether masks should be squared for domain A |
| --data_online_creation_mask_square_B | flag | | whether masks should be squared for domain B |
| --data_online_creation_rand_mask_A | flag | | perform the task of replacing noised masks by objects |
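
A sketch of an online-created dataset run, where crops are generated around bounding boxes at load time; the dataset path and experiment name are placeholders, and the crop values are the documented defaults.

```bash
# Online crops of 512 +/- 50 pixels around bboxes, square masks for domain A.
python3 train.py \
  --dataroot /path/to/dataset \
  --name online_example \
  --data_dataset_mode unaligned_labeled_mask_online \
  --data_online_creation_crop_size_A 512 \
  --data_online_creation_crop_delta_A 50 \
  --data_online_creation_crop_size_B 512 \
  --data_online_creation_crop_delta_B 50 \
  --data_online_creation_mask_square_A
```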

Semantic segmentation network

| Parameter | Type | Default | Description |
|---|---|---|---|
| --f_s_all_classes_as_one | flag | | if true, all classes will be considered as the same one (i.e. foreground vs background) |
| --f_s_class_weights | array | [] | class weights for imbalanced semantic classes |
| --f_s_config_segformer | string | models/configs/segformer/segformer_config_b0.json | path to segformer configuration file for f_s |
| --f_s_dropout | flag | | dropout for the semantic network |
| --f_s_net | string | vgg | specify f_s network architecture (vgg, ...) |
| --f_s_nf | int | 64 | # of filters in the first conv layer of the classifier |
| --f_s_semantic_nclasses | int | 2 | number of classes of the semantic loss classifier |
| --f_s_semantic_threshold | float | 1.0 | threshold of the semantic classifier loss below which the semantic loss is applied |
| --f_s_weight_sam | string | | path to sam weight for f_s, e.g. models/configs/sam/pretrain/sam_vit_b_01ec64.pth |
| --f_s_weight_segformer | string | | path to segformer weight for f_s, e.g. models/configs/segformer/pretrain/segformer_mit-b0.pth |
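
A sketch combining these f_s options with mask-based semantic training (--train_semantic_mask and --train_sem_mask_lambda are documented in the training sections below); the dataset path and experiment name are placeholders.

```bash
# Semantic-mask training with a 2-class vgg segmentation network.
python3 train.py \
  --dataroot /path/to/dataset \
  --name semantic_example \
  --data_dataset_mode unaligned_labeled_mask \
  --train_semantic_mask \
  --f_s_net vgg \
  --f_s_semantic_nclasses 2 \
  --train_sem_mask_lambda 1.0
```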

Semantic classification network

| Parameter | Type | Default | Description |
|---|---|---|---|
| --cls_all_classes_as_one | flag | | if true, all classes will be considered as the same one (i.e. foreground vs background) |
| --cls_class_weights | array | [] | class weights for imbalanced semantic classes |
| --cls_config_segformer | string | models/configs/segformer/segformer_config_b0.py | path to segformer configuration file for cls |
| --cls_dropout | flag | | dropout for the semantic network |
| --cls_net | string | vgg | specify cls network architecture (vgg, ...) |
| --cls_nf | int | 64 | # of filters in the first conv layer of the classifier |
| --cls_semantic_nclasses | int | 2 | number of classes of the semantic loss classifier |
| --cls_semantic_threshold | float | 1.0 | threshold of the semantic classifier loss below which the semantic loss is applied |
| --cls_weight_segformer | string | | path to segformer weight for cls, e.g. models/configs/segformer/pretrain/segformer_mit-b0.pth |

Output

| Parameter | Type | Default | Description |
|---|---|---|---|
| --output_no_html | flag | | do not save intermediate training results to [opt.checkpoints_dir]/[opt.name]/web/ |
| --output_print_freq | int | 100 | frequency of showing training results on the console |
| --output_update_html_freq | int | 1000 | frequency of saving training results to html |
| --output_verbose | flag | | if specified, print more debugging information |

Visdom display

| Parameter | Type | Default | Description |
|---|---|---|---|
| --output_display_G_attention_masks | flag | | |
| --output_display_aim_port | int | 53800 | aim port of the web display |
| --output_display_aim_server | string | http://localhost | aim server of the web display |
| --output_display_diff_fake_real | flag | | if true, x - G(x) is displayed |
| --output_display_env | string | | visdom display environment name (default is "main") |
| --output_display_freq | int | 400 | frequency of showing training results on screen |
| --output_display_id | int | 1 | window id of the web display |
| --output_display_ncols | int | 0 | if positive, display all images in a single visdom web panel with a certain number of images per row (if 0, ncols is computed automatically) |
| --output_display_networks | flag | | set to true if you want to display networks on port 8000 |
| --output_display_type | array | ['visdom'] | output display, either visdom or aim. Values: visdom, aim |
| --output_display_visdom_port | int | 8097 | visdom port of the web display |
| --output_display_visdom_server | string | http://localhost | visdom server of the web display |
| --output_display_winsize | int | 256 | display window size for both visdom and HTML |

Model

| Parameter | Type | Default | Description |
|---|---|---|---|
| --model_depth_network | string | DPT_Large | specify depth prediction network architecture. Values: DPT_Large, DPT_Hybrid, MiDaS_small, DPT_BEiT_L_512, DPT_BEiT_L_384, DPT_BEiT_B_384, DPT_SwinV2_L_384, DPT_SwinV2_B_384, DPT_SwinV2_T_256, DPT_Swin_L_384, DPT_Next_ViT_L_384, DPT_LeViT_224 |
| --model_init_gain | float | 0.02 | scaling factor for normal, xavier and orthogonal |
| --model_init_type | string | normal | network initialization. Values: normal, xavier, kaiming, orthogonal |
| --model_input_nc | int | 3 | # of input image channels: 3 for RGB and 1 for grayscale. Values: 1, 3 |
| --model_multimodal | flag | | multimodal model with random latent input vector |
| --model_output_nc | int | 3 | # of output image channels: 3 for RGB and 1 for grayscale. Values: 1, 3 |

Training

| Parameter | Type | Default | Description |
|---|---|---|---|
| --train_D_accuracy_every | int | 1000 | |
| --train_D_lr | float | 0.0002 | discriminator separate learning rate |
| --train_G_ema | flag | | whether to build G via exponential moving average |
| --train_G_ema_beta | float | 0.999 | exponential decay for ema |
| --train_G_lr | float | 0.0002 | initial learning rate for the generator |
| --train_batch_size | int | 1 | input batch size |
| --train_beta1 | float | 0.9 | momentum term of adam |
| --train_beta2 | float | 0.999 | momentum term of adam |
| --train_cls_l1_regression | flag | | if true, an l1 loss will be used to compute the regressor loss |
| --train_cls_regression | flag | | if true, cls will be a regressor and not a classifier |
| --train_compute_D_accuracy | flag | | |
| --train_compute_metrics | flag | | |
| --train_compute_metrics_test | flag | | |
| --train_continue | flag | | continue training: load the latest model |
| --train_epoch | string | latest | which epoch to load? set to latest to use the latest cached model |
| --train_epoch_count | int | 1 | the starting epoch count; we save the model by <epoch_count>, <epoch_count>+<save_latest_freq>, ... |
| --train_export_jit | flag | | whether to export the model in jit format |
| --train_gan_mode | string | lsgan | the type of GAN objective. The vanilla GAN loss is the cross-entropy objective used in the original GAN paper. Values: vanilla, lsgan, wgangp, projected |
| --train_iter_size | int | 1 | backward will be applied every iter_size iterations; this simulates a larger batch size of value batch_size*iter_size |
| --train_load_iter | int | 0 | which iteration to load? if load_iter > 0, the code will load models by iter_[load_iter]; otherwise, the code will load models by [epoch] |
| --train_lr_decay_iters | int | 50 | multiply by a gamma every lr_decay_iters iterations |
| --train_lr_policy | string | linear | learning rate policy. Values: linear, step, plateau, cosine |
| --train_metrics_every | int | 1000 | |
| --train_mm_lambda_z | float | 0.5 | weight for random z loss |
| --train_mm_nz | int | 8 | number of latent vectors |
| --train_n_epochs | int | 100 | number of epochs with the initial learning rate |
| --train_n_epochs_decay | int | 100 | number of epochs over which to linearly decay the learning rate to zero |
| --train_nb_img_max_fid | int | 1000000000 | maximum number of samples allowed per dataset to compute fid. If the dataset directory contains more than nb_img_max_fid, only a subset is used. |
| --train_optim | string | adam | optimizer (adam, radam, adamw, ...). Values: adam, radam, adamw, lion |
| --train_pool_size | int | 50 | the size of the image buffer that stores previously generated images |
| --train_save_by_iter | flag | | whether to save the model by iteration |
| --train_save_epoch_freq | int | 1 | frequency of saving checkpoints at the end of epochs |
| --train_save_latest_freq | int | 5000 | frequency of saving the latest results |
| --train_semantic_cls | flag | | if true, semantic class losses will be used |
| --train_semantic_mask | flag | | if true, semantic mask losses will be used |
| --train_temporal_criterion | flag | | if true, an MSE loss will be computed between successive frames |
| --train_temporal_criterion_lambda | float | 1.0 | lambda for the MSE loss computed between successive frames |
| --train_use_contrastive_loss_D | flag | | |
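
A sketch of a training schedule using the options above; note that with --train_iter_size the simulated batch size is batch_size*iter_size (here 2*4 = 8). The dataset path and experiment name are placeholders.

```bash
# 100 epochs at the initial learning rate, then 100 epochs of linear decay to zero.
python3 train.py \
  --dataroot /path/to/dataset \
  --name schedule_example \
  --train_batch_size 2 \
  --train_iter_size 4 \
  --train_G_lr 0.0002 \
  --train_D_lr 0.0002 \
  --train_optim adam \
  --train_lr_policy linear \
  --train_n_epochs 100 \
  --train_n_epochs_decay 100 \
  --train_save_epoch_freq 1
```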

Semantic training

| Parameter | Type | Default | Description |
|---|---|---|---|
| --train_sem_cls_B | flag | | if true, cls will be trained not only on domain A but also on domain B |
| --train_sem_cls_lambda | float | 1.0 | weight for the semantic class loss |
| --train_sem_cls_pretrained | flag | | whether to use a pretrained model, available for non "basic" models only |
| --train_sem_cls_template | string | basic | classifier/regressor model type, from torchvision (resnet18, ...); default is a custom simple model |
| --train_sem_idt | flag | | if true, apply the semantic loss on identity |
| --train_sem_lr_cls | float | 0.0002 | cls learning rate |
| --train_sem_lr_f_s | float | 0.0002 | f_s learning rate |
| --train_sem_mask_lambda | float | 1.0 | weight for the semantic mask loss |
| --train_sem_net_output | flag | | if true, apply the generator semantic loss on the network output for the real image rather than on the label |
| --train_sem_use_label_B | flag | | if true, domain B has labels too |

Semantic training with masks

| Parameter | Type | Default | Description |
|---|---|---|---|
| --train_mask_charbonnier_eps | float | 1e-06 | Charbonnier loss epsilon value |
| --train_mask_compute_miou | flag | | |
| --train_mask_disjoint_f_s | flag | | whether to use a disjoint f_s with the same exact structure |
| --train_mask_f_s_B | flag | | if true, f_s will be trained not only on domain A but also on domain B |
| --train_mask_for_removal | flag | | if true, object removal mode: domain B images with label 0, cut models only |
| --train_mask_lambda_out_mask | float | 10.0 | weight for the out-mask loss |
| --train_mask_loss_out_mask | string | L1 | loss for out-mask content (which should not change). Values: L1, MSE, Charbonnier |
| --train_mask_miou_every | int | 1000 | |
| --train_mask_no_train_f_s_A | flag | | if true, f_s won't be trained on domain A |
| --train_mask_out_mask | flag | | use the out-mask loss |

Data augmentation

| Parameter | Type | Default | Description |
|---|---|---|---|
| --dataaug_APA | flag | | if true, G will be used as augmentation during D training, adaptively to D overfitting between real and fake images |
| --dataaug_APA_every | int | 4 | how often to perform APA adjustment |
| --dataaug_APA_nimg | int | 50 | APA adjustment speed, measured in how many images it takes for p to increase/decrease by one unit |
| --dataaug_APA_p | int | 0 | initial value of the APA probability |
| --dataaug_APA_target | float | 0.6 | |
| --dataaug_D_diffusion | flag | | whether to apply diffusion noise augmentation to discriminator inputs, projected discriminator only |
| --dataaug_D_diffusion_every | int | 4 | how often to perform diffusion augmentation adjustment |
| --dataaug_D_label_smooth | flag | | whether to use one-sided label smoothing with the discriminator |
| --dataaug_D_noise | float | 0.0 | instance noise added to discriminator inputs |
| --dataaug_affine | float | 0.0 | if specified, apply random affine transforms to the images for data augmentation |
| --dataaug_affine_scale_max | float | 1.2 | if random affine is specified, max scale range value |
| --dataaug_affine_scale_min | float | 0.8 | if random affine is specified, min scale range value |
| --dataaug_affine_shear | int | 45 | if random affine is specified, shear range (0, value) |
| --dataaug_affine_translate | float | 0.2 | if random affine is specified, translation range (-value*img_size, +value*img_size) |
| --dataaug_diff_aug_policy | string | | choose the augmentation policy: color, randaffine, randperspective. If you want more than one, write them separated by a comma with no space (e.g. color,randaffine) |
| --dataaug_diff_aug_proba | float | 0.5 | probability of using each transformation |
| --dataaug_imgaug | flag | | whether to apply random image augmentation |
| --dataaug_no_flip | flag | | if specified, do not flip the images for data augmentation |
| --dataaug_no_rotate | flag | | if specified, do not rotate the images for data augmentation |
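
A sketch of augmentation settings; the numeric values are illustrative rather than recommendations, and the dataset path and experiment name are placeholders.

```bash
# Label smoothing and instance noise on D, random affine transforms on the images.
python3 train.py \
  --dataroot /path/to/dataset \
  --name augment_example \
  --dataaug_D_label_smooth \
  --dataaug_D_noise 0.01 \
  --dataaug_affine 0.5 \
  --dataaug_affine_scale_min 0.8 \
  --dataaug_affine_scale_max 1.2 \
  --dataaug_diff_aug_policy color,randaffine \
  --dataaug_no_rotate
```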

JoliGEN Models

Models

| Name | Paper |
|---|---|
| CycleGAN | https://arxiv.org/abs/1703.10593 |
| CyCADA | https://arxiv.org/abs/1711.03213 |
| CUT | https://arxiv.org/abs/2007.15651 |
| RecycleGAN | https://arxiv.org/abs/1808.05174 |
| StyleGAN2 | https://arxiv.org/abs/1912.04958 |

Generator architectures

| Architecture | Number of parameters |
|---|---|
| Resnet 9 blocks | 11.378M |
| Mobile resnet 9 blocks | 1.987M |
| Resnet attn | 11.823M |
| Mobile resnet attn | 2.432M |
| Segformer b0 | 4.158M |
| Segformer attn b0 | 4.60M |
| Segformer attn b1 | 14.724M |
| Segformer attn b5 | 83.016M |
| UNet with mha | ~60M, configurable |
| ITTR | ~30M, configurable |