Code of Honor

This project is intended to deepen your understanding and develop your skills, and it forms a substantial part of your final evaluation. It must be completed collaboratively as a group. Any form of academic dishonesty is a violation of the Code of Honor. You are encouraged to use publicly available resources, provided that all sources are clearly cited and your individual contributions are clearly explained. Failure to properly explain your contribution may be considered a lack of participation, and projects without meaningful individual contributions will be deemed incomplete.

The course project starts in earnest in the second half of the course. For this project, you choose a topic from the list of available topics and work through the semester to deliver the requested outcomes. Regardless of the topic, you will need to follow these steps:

  1. Form a group of 4. Due to the course size, smaller groups are accepted only under special circumstances, e.g., working on an open-ended topic of your own or a group member dropping the course in the middle of the semester.
  2. Choose your topic by the end of Week 6. You are strongly encouraged to choose as soon as possible so that you can get into the problem and start on the preliminaries.
  3. You will be allocated to a TA, who can help you throughout the project.
  4. Deliver the initial milestones of the project in a progress briefing. The progress briefing will serve as the basis for your final report.
  5. Deliver your final results by the end of the semester. This includes the final report, the source code, and a final presentation in our internal seminar.

Submission Procedure

The main body of work is submitted through Git. In addition, each group submits a final report and gives a presentation. To this end, please follow these steps.
  • Each group must maintain a Git repository, e.g., on GitHub or GitLab, for the project. By the time of final submission, the repository should contain:
    • A well-documented codebase
    • A clear README.md with setup and usage instructions
    • A requirements.txt file listing all required packages, or an environment.yaml file with a reproducible environment setup (a minimal example follows this list)
    • A demo script or notebook showing sample inputs and outputs
    • If applicable, a /doc folder with extended documentation
  • A final report (maximum 5 pages) must be submitted in PDF format. The report should be written using the provided template in a formal style, including an abstract, introduction, method, experiments, results, and conclusion.
    Important: Submissions that do not use the template are considered incomplete.
  • Each group gives a 5-minute presentation (maximum 5 slides, including the title slide) at the internal seminar in Week 14, i.e., Aug 4 to Aug 8. Any template can be used for the presentation.
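
For reference, here is a minimal environment.yaml sketch; the package names and versions are illustrative placeholders, not course requirements, so adjust them to your own project's stack.

```yaml
# Illustrative environment.yaml -- replace names/versions with your actual stack.
name: course-project
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.10
  - pytorch=2.2
  - numpy
  - pip
  - pip:
      - transformers
```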

Project Topics

Category A: Multimodal Generative Models

Topic A-1: Text-to-Image Generation using Pretrained LMs and Generative Architectures

  • See Complete Project Description
  • Objective: Design and implement a multimodal generative model that takes text descriptions as input and generates corresponding images. For language processing, a pretrained LM, e.g., BERT or RoBERTa, is used. The designed multimodal model should integrate this pretrained LM into a generative architecture such as a VAE, GAN, or diffusion model. A minimal conditioning sketch is given after this list.
  • Supervisor: Amir Hossein Mobasheri
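
As an illustration only (not the required architecture), the sketch below conditions a toy generator on frozen BERT embeddings; the generator structure, 32x32 resolution, and latent size are arbitrary placeholders.

```python
# Minimal sketch: condition a toy image generator on frozen BERT embeddings.
# The generator, image resolution, and latent size are illustrative choices.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_encoder = AutoModel.from_pretrained("bert-base-uncased").eval()
for p in text_encoder.parameters():
    p.requires_grad = False  # keep the pretrained LM frozen

class TextConditionedGenerator(nn.Module):
    """Maps (noise, text embedding) -> 32x32 RGB image, decoder-style."""
    def __init__(self, z_dim=64, txt_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + txt_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * 32 * 32), nn.Tanh(),
        )

    def forward(self, z, txt):
        return self.net(torch.cat([z, txt], dim=-1)).view(-1, 3, 32, 32)

tokens = tokenizer(["a red bird on a branch"], return_tensors="pt")
with torch.no_grad():
    txt_emb = text_encoder(**tokens).last_hidden_state[:, 0]  # [CLS] vector

gen = TextConditionedGenerator()
fake = gen(torch.randn(1, 64), txt_emb)  # (1, 3, 32, 32) untrained sample
```

The same frozen-LM conditioning works with a VAE decoder or a diffusion U-Net in place of this toy generator.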

Topic A-2: Image-to-Text Generation using Pretrained Vision Models and LMs

  • See Complete Project Description
  • Objective: Design and implement a multimodal generative model that takes an image as input and generates a descriptive caption or sentence. A pretrained vision model, e.g., ResNet, ViT, or CLIP, is used to extract image features, which are then passed into an LM to generate coherent textual descriptions. A minimal bridging sketch is given after this list.
  • Supervisor: Likun Cai
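
One common way to bridge the two modalities, sketched here under illustrative assumptions (ResNet-18 features, GPT-2 as the LM, a single learned prefix token), is to project image features into the LM's embedding space:

```python
# Minimal sketch: feed frozen ResNet features into GPT-2 as a learned prefix
# (ClipCap-style). Model choices and dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import GPT2LMHeadModel, GPT2Tokenizer

vision = resnet18(weights=None).eval()
vision.fc = nn.Identity()                  # expose the 512-d pooled feature
lm = GPT2LMHeadModel.from_pretrained("gpt2")
proj = nn.Linear(512, lm.config.n_embd)    # trainable image-to-LM bridge

image = torch.randn(1, 3, 224, 224)        # stand-in for a real image batch
prefix = proj(vision(image)).unsqueeze(1)  # (1, 1, 768) prefix "token"

tok = GPT2Tokenizer.from_pretrained("gpt2")
ids = tok("a photo of", return_tensors="pt").input_ids
tok_emb = lm.transformer.wte(ids)          # word embeddings of the prompt
out = lm(inputs_embeds=torch.cat([prefix, tok_emb], dim=1))  # next-token logits
```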

Topic A-3: Learning Cross-Modal Embeddings for Image-Text Alignment

  • See Complete Project Description
  • Objective: Build a model that learns a shared embedding space for text and image inputs. Given a text-image pair, the model should embed both modalities into a common space such that semantically aligned pairs are close together and misaligned pairs are distant. This is a foundational task for generative models and retrieval-based generation methods. A sketch of the usual contrastive objective follows this list.
  • Supervisor: Amir Hossein Mobasheri
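
The standard training signal for this task is a symmetric contrastive (InfoNCE) loss, as popularized by CLIP. A minimal sketch, assuming the image and text encoders are defined elsewhere and produce same-dimensional embeddings:

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss. The random
# tensors stand in for the outputs of your image and text encoders.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(len(logits))        # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```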

Category B: Applications of Generative Models

Topic B-1: Educational Code Generation using LLMs with Self-Refinement

  • See Complete Project Description
  • Objective: Design a simple intelligent agent that takes algorithmic problem descriptions, e.g., from an introductory programming course or LeetCode-style tasks, and generates not only the corresponding code but also an educational breakdown of the solution. The agent should aim to provide human-readable explanations alongside correct and runnable code, and include a self-refinement mechanism to debug and correct incorrect generations. A sketch of such a refinement loop follows this list.
  • Supervisor: Mohammadreza Safavi
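
A minimal sketch of the refinement loop, assuming a hypothetical call_llm client and simple script-based testing; both are placeholders you would replace with your actual LLM API and a properly sandboxed execution environment.

```python
# Minimal sketch of a self-refinement loop: generate code, run it against
# tests, and feed failures back into the prompt. `call_llm` is a placeholder.
import subprocess
import tempfile

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def run_tests(code: str, test_code: str) -> tuple[bool, str]:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    res = subprocess.run(["python", path], capture_output=True, text=True, timeout=30)
    return res.returncode == 0, res.stderr

def solve_with_refinement(problem: str, test_code: str, max_rounds: int = 3) -> str:
    prompt = f"Solve this problem in Python and explain each step:\n{problem}"
    code = ""
    for _ in range(max_rounds):
        code = call_llm(prompt)
        ok, err = run_tests(code, test_code)
        if ok:
            return code
        prompt += f"\nYour previous attempt failed with:\n{err}\nPlease fix it."
    return code  # best effort after max_rounds
```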

Topic B-2: Generative Adversarial Imitation Learning with Transformer-Based Policy Net

  • See Complete Project Description
  • Objective: Implement a modernized version of generative adversarial imitation learning (GAIL), where the generator, i.e., the policy network, is modeled using a Transformer. The goal is to train the generator to imitate expert behavior in a simple reinforcement learning (RL) environment through adversarial training. A sketch of the adversarial component follows this list.
  • Supervisor: Mohammadreza Safavi
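
For orientation, here is a minimal sketch of the adversarial half of GAIL: a discriminator that classifies expert versus policy state-action pairs, whose output defines the policy's reward. The Transformer policy and its RL update (e.g., PPO) are assumed to live elsewhere, and note that reward conventions vary across GAIL implementations.

```python
# Minimal sketch of the GAIL discriminator and the derived policy reward.
# The Transformer policy network and its RL update are assumed elsewhere.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.Tanh(),
            nn.Linear(128, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))  # logit: expert vs. policy

def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    bce = nn.BCEWithLogitsLoss()
    exp_logit = disc(expert_s, expert_a)
    pol_logit = disc(policy_s, policy_a)
    return (bce(exp_logit, torch.ones_like(exp_logit)) +   # expert -> 1
            bce(pol_logit, torch.zeros_like(pol_logit)))   # policy -> 0

def policy_reward(disc, s, a):
    # With the expert=1 convention above, -log(1 - D) rewards expert-like pairs.
    d = torch.sigmoid(disc(s, a))
    return -torch.log(1.0 - d + 1e-8)
```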

Topic B-3: Sequence Modeling for Reinforcement Learning with Decision Transformers

  • See Complete Project Description
  • Objective: Design and implement a Decision Transformer, a generative model that treats reinforcement learning (RL) as a sequence modeling task. The model should learn to predict the next action based on historical trajectories and a desired return-to-go. A sketch of the input layout follows this list.
  • Supervisor: Amirhosein Rostami
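
The core idea is an interleaved token sequence of (return-to-go, state, action) triples processed by a causal Transformer. A minimal sketch with placeholder dimensions, not the full published architecture:

```python
# Minimal sketch of the Decision Transformer input layout: interleave
# (return-to-go, state, action) embeddings and predict actions causally.
import torch
import torch.nn as nn

class TinyDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=2):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B,T,1), states: (B,T,state_dim), actions: (B,T,act_dim)
        B, T, _ = states.shape
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states), self.embed_act(actions)],
            dim=2,
        ).reshape(B, 3 * T, -1)            # (R1, s1, a1, R2, s2, a2, ...)
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.transformer(tokens, mask=mask)
        return self.head(h[:, 1::3])       # predict each action from its state token

dt = TinyDecisionTransformer(state_dim=4, act_dim=2)
actions = dt(torch.randn(1, 8, 1), torch.randn(1, 8, 4), torch.randn(1, 8, 2))
```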

Category C: Tiny AI Products

Topic C-1: Personalized Text-to-Speech using VAE or Diffusion Models

  • See Complete Project Description
  • Objective: Design and implement a simplified text-to-speech (TTS) system that generates speech audio conditioned on speaker identity. The project should use a generative model, e.g., a VAE or a diffusion-based model, to synthesize speech features, which can then be converted into audio using available vocoders. A sketch of one possible core model follows this list.
  • Supervisor: Amirhosein Rostami
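
As one possible starting point, the sketch below shows a speaker-conditioned VAE over mel-spectrogram frames; the layer sizes and speaker count are placeholders, and a real system would add a text encoder and a pretrained vocoder (e.g., HiFi-GAN) to produce audio.

```python
# Minimal sketch: a speaker-conditioned VAE over mel-spectrogram frames.
# Sizes are illustrative; text conditioning and the vocoder are omitted.
import torch
import torch.nn as nn

class SpeakerVAE(nn.Module):
    def __init__(self, n_mels=80, z_dim=16, n_speakers=10, spk_dim=8):
        super().__init__()
        self.spk = nn.Embedding(n_speakers, spk_dim)
        self.enc = nn.Linear(n_mels + spk_dim, 2 * z_dim)   # -> (mu, logvar)
        self.dec = nn.Linear(z_dim + spk_dim, n_mels)

    def forward(self, mel, speaker_id):
        s = self.spk(speaker_id)
        mu, logvar = self.enc(torch.cat([mel, s], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(torch.cat([z, s], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl

vae = SpeakerVAE()
recon, kl = vae(torch.randn(4, 80), torch.tensor([0, 1, 2, 3]))
```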

Topic C-2: Tiny Diffusion Model with Alternative Core for Image Generation

  • See Complete Project Description
  • Objective: Design and implement a tiny diffusion model for low-resolution image generation, with a focus on architectural simplification, ablation analysis, and experimentation. Rather than using existing denoising diffusion probabilistic model (DDPM) implementations, students are expected to build a minimal functional prototype from scratch, inspired by the original DDPM paper and recent simplifications. A sketch of the training objective follows this list.
  • Supervisor: Likun Cai
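
The heart of a from-scratch DDPM is small: sample a random timestep, noise the image with the closed-form forward process, and regress the noise. A minimal sketch of that training objective, where model is any network mapping a noisy image batch and timesteps to a noise estimate:

```python
# Minimal sketch of the DDPM training objective (Ho et al., 2020): corrupt
# x0 at a random timestep and train the model to predict the added noise.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise    # closed-form forward process
    return torch.mean((model(x_t, t) - noise) ** 2)
```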

Topic C-3: Text-guided Image Editing through Latent Modification in VAEs

  • See Complete Project Description
  • Objective: Design and implement an image editing pipeline that modifies visual content based on a given textual prompt. The goal is to build a lightweight but effective multimodal editing system that uses text embeddings to guide latent modifications in a VAE-based generative model. A sketch of the editing idea follows this list.
  • Supervisor: Likun Cai
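
A minimal sketch of the editing idea, with untrained stand-in modules and arbitrary sizes: encode the image into the VAE latent space, shift the latent along a direction predicted from the text embedding, then decode.

```python
# Minimal sketch of text-guided latent editing. All modules are untrained
# placeholders; a real pipeline uses a trained VAE and a frozen text encoder.
import torch
import torch.nn as nn

z_dim, txt_dim = 32, 768
encoder = nn.Linear(3 * 64 * 64, z_dim)         # stand-in VAE encoder (mean only)
decoder = nn.Linear(z_dim, 3 * 64 * 64)         # stand-in VAE decoder
direction = nn.Linear(txt_dim, z_dim)           # text embedding -> latent shift

image = torch.randn(1, 3, 64, 64)
txt_emb = torch.randn(1, txt_dim)               # e.g., a frozen BERT/CLIP embedding

z = encoder(image.flatten(1))
edited = decoder(z + 0.5 * direction(txt_emb))  # edit strength 0.5 is arbitrary
edited = edited.view(1, 3, 64, 64)
```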

Category D: Open-ended

  • Description: An open-ended project can be selected, provided that the project description is prepared similarly to the standard course projects (see the sample below). The project description should clearly specify the objective, motivation, requirements, and milestones. You can submit your topic along with the description through Crowdmark.
  • Sample Project Description
  • Supervisor: Amirhosein Rostami

Templates for Report and Presentation