Correct timestep for initial noise addition #533
Closed
patrickvonplaten wants to merge 1 commit into CompVis:main
This was referenced Dec 10, 2022
patrickvonplaten added a commit to patrickvonplaten/stablediffusion that referenced this pull request on Dec 10, 2022:
Analogous to CompVis/stable-diffusion#533, the timestep for the noise addition seems to be off by one here.
This was referenced Dec 27, 2022
Contributor
Author
Closing because of inactivity.
After some experimentation, I am pretty sure that the wrong timestep is used when adding noise for image-to-image (img2img).

Essentially what is happening here is that `t+1` is used as the timestep to add noise to the original image, but `[0, ..., t]` is used afterwards for the denoising process. We should, however, also use `t` when adding the noise to the original image.

This can be verified quite easily as follows. Run img2img with a small number of update steps and a very low strength, because then the differences between `t` and `t+1` become quite clear.
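To illustrate why a small number of steps makes the mismatch visible, here is a minimal NumPy sketch of the forward noising step. It is not the repository's code: the linear beta schedule, the function names, and the step counts are all assumptions chosen for illustration. It only shows that, on a coarsely subsampled schedule, the noise level at index `t+1` is noticeably higher than at index `t`, so noising at `t+1` and denoising over `[0, ..., t]` leaves residual noise.

```python
import numpy as np


def alphas_cumprod(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def subsampled_alpha_bars(num_inference_steps, num_train_steps=1000):
    """Pick evenly spaced training timesteps, as a few-step sampler would."""
    stride = num_train_steps // num_inference_steps
    timesteps = np.arange(0, num_train_steps, stride)
    return alphas_cumprod(num_train_steps)[timesteps]


def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def noise_std(alpha_bar_t):
    """Standard deviation of the noise mixed in at a given schedule index."""
    return np.sqrt(1.0 - alpha_bar_t)


# Hypothetical settings: few steps and low strength, as suggested above.
num_inference_steps = 10
strength = 0.2
num_denoising_steps = int(strength * num_inference_steps)  # denoiser visits [0, ..., t]
t = num_denoising_steps - 1

alpha_bars = subsampled_alpha_bars(num_inference_steps)
std_matching = noise_std(alpha_bars[t])        # noise level the denoiser starts from
std_off_by_one = noise_std(alpha_bars[t + 1])  # noise level actually added at t+1

print(f"noise std at t={t}: {std_matching:.4f}")
print(f"noise std at t+1={t + 1}: {std_off_by_one:.4f}")
```

With many inference steps, neighbouring indices have almost the same noise level and the difference is hard to see, which is why the check uses few steps and low strength.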
E.g. when I run:
With the current code, I get the following output:

After the fix, I get output with much less noise, which shows that the noise addition and the subsequent denoising process now match:
