Abstract: The existing text-to-image generation methods based on stable diffusion yield better results in low-semantic prompt but often neglect the generation quality of high-semantic prompt such as ...