Bad Result #6

uhSuiL · 2024-06-26T13:13:14Z

I used your model on my task, but the results don't seem that good.
I resized my image to (512, 128) to match your input size. The first image below is the original input, followed by the result.

Is there anything wrong?
Below is my code:

from PIL import Image
from omegaconf import OmegaConf

from model.IDM.utils.util import instantiate_from_config


if __name__ == '__main__':
    DiffTSR_yaml_config = './model/DiffTSR_config.yaml'
    DiffTSR_ckpt_config = './ckpt/DiffTSR.ckpt'
    DiffTSR_config = OmegaConf.load(DiffTSR_yaml_config)
    DiffTSR_model = instantiate_from_config(DiffTSR_config.model)
    DiffTSR_model.load_model(DiffTSR_ckpt_config)
    print("Model Loaded")
    
    lq_image_pil = Image.open('./test.png').convert('RGB')
    lq_image_pil = lq_image_pil.resize((512, 128))

    # Start sampling!
    sr_output = DiffTSR_model.DiffTSR_sample(lq_image_pil)
    # Save sr image!
    sr_image_pil = Image.fromarray(sr_output, 'RGB')
    sr_image_pil.save('./task2_new.png')

uhSuiL · 2024-06-26T13:43:58Z

I was worried that the input image might need to contain only a single line of text, so I ran three more tests, but the results still don't meet expectations:
(Test below: I masked the second line.)

(Test below: I cut out the second line and simply resized the image to (512, 128).)

(Test below: I cut out the second line and the white margins around the first line, then resized it to (512, 128) so it is less deformed.)

I'd guess the text in my images is not that hard for a human to read.
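The third test above (cutting away the white margins around the remaining line before resizing) can also be done automatically. The following is a minimal Pillow sketch, assuming dark text on a near-white background; `trim_white_margins` and its `threshold` default are hypothetical and not part of DiffTSR:

```python
from PIL import Image, ImageChops


def trim_white_margins(img: Image.Image, threshold: int = 10) -> Image.Image:
    """Crop away near-white borders around the content.

    Hypothetical helper: assumes dark text on a light (near-white) background.
    """
    rgb = img.convert("RGB")
    white = Image.new("RGB", rgb.size, (255, 255, 255))
    # Per-pixel distance from pure white, collapsed to one grayscale channel.
    diff = ImageChops.difference(rgb, white).convert("L")
    # Ignore faint differences (e.g. compression noise) at or below the threshold.
    mask = diff.point(lambda v: 255 if v > threshold else 0)
    bbox = mask.getbbox()  # None if the whole image is near-white
    return rgb.crop(bbox) if bbox else rgb


if __name__ == "__main__":
    # Synthetic stand-in for a text line: a black bar on a white page.
    page = Image.new("RGB", (300, 100), "white")
    page.paste((0, 0, 0), (50, 40, 250, 60))
    line = trim_white_margins(page)
    print(line.size)  # (200, 20)
    patch = line.resize((512, 128))  # PIL takes (width, height)
    print(patch.size)  # (512, 128)
```

Cropping to the content first means the later resize to (512, 128) stretches the text itself rather than the empty margins.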

uhSuiL · 2024-06-26T15:22:48Z

I conducted another test: padding the left, right, top, and bottom to reach the size (512, 128), leaving the text image centered and deformed.
This is the result:

YuzheZhang-1999 · 2024-06-27T02:58:35Z

Thank you for your interest in this work. There are a few key points that require clarification.

First, this project currently applies only to single-line text images: the input must be a 128x512 patch containing no more than 24 characters. For images that contain text in other layouts, first detect the text lines in the original image with a text detection method such as PaddleOCR, then crop each line, resize it to 128x512, and feed it to the DiffTSR model.
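The detect-crop-resize step described above might look roughly like the sketch below. It assumes the detector returns each text line as a 4-point polygon `[[x, y], ...]` (check your detector's actual output format) and leaves the detector call itself out; `quad_to_box` and `crop_text_patches` are hypothetical helpers, not DiffTSR or PaddleOCR functions:

```python
import math

from PIL import Image

# PIL sizes are (width, height), so this is a 128x512 (HxW) patch.
TARGET_SIZE = (512, 128)


def quad_to_box(quad):
    """Axis-aligned (left, top, right, bottom) box enclosing a 4-point polygon."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


def crop_text_patches(img, quads, size=TARGET_SIZE):
    """Crop each detected text line from img and resize it to `size`."""
    w, h = img.size
    patches = []
    for quad in quads:
        left, top, right, bottom = quad_to_box(quad)
        # Clamp to the image so boxes touching the border stay valid.
        left, top = max(0, left), max(0, top)
        right, bottom = min(w, right), min(h, bottom)
        if right <= left or bottom <= top:
            continue  # box lies outside the image or is degenerate
        patches.append(img.crop((left, top, right, bottom)).resize(size))
    return patches


if __name__ == "__main__":
    page = Image.new("RGB", (600, 200), "white")
    # Hypothetical detector output: two text-line polygons.
    quads = [
        [[10, 20], [300, 20], [300, 70], [10, 70]],
        [[-5, 100], [650, 100], [650, 180], [-5, 180]],
    ]
    for i, patch in enumerate(crop_text_patches(page, quads)):
        print(i, patch.size)
```

Each patch could then be passed to `DiffTSR_model.DiffTSR_sample(...)` one at a time, as in the code from the first post. The axis-aligned crop includes extra background for strongly rotated lines, and the 24-character limit is not checked here.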

Second, the text should occupy the center of the cropped image; patches produced by a text detection model usually satisfy this. Additionally, the DiffTSR model is robust to text deformation.
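If you crop lines by hand instead of using detector boxes, one way to keep the text centered without stretching it is to letterbox the crop: scale it to fit a 512x128 canvas and paste it in the middle. This is a hedged alternative to plain resizing, not the authors' preprocessing, and since DiffTSR is described as robust to deformation, a simple resize may work just as well. `letterbox` is a hypothetical helper and the white fill color is an assumption:

```python
from PIL import Image


def letterbox(img, size=(512, 128), fill=(255, 255, 255)):
    """Fit img inside `size` (width, height) without distortion, centered on a fill-colored canvas."""
    tw, th = size
    w, h = img.size
    scale = min(tw / w, th / h)  # largest scale at which the whole crop still fits
    nw, nh = max(1, round(w * scale)), max(1, round(h * scale))
    resized = img.convert("RGB").resize((nw, nh))
    canvas = Image.new("RGB", size, fill)
    # Equal margins on both sides keep the text in the middle of the patch.
    canvas.paste(resized, ((tw - nw) // 2, (th - nh) // 2))
    return canvas


if __name__ == "__main__":
    crop = Image.new("RGB", (100, 50), (0, 0, 0))  # stand-in for a cropped text line
    patch = letterbox(crop)
    print(patch.size)  # (512, 128)
    print(patch.getpixel((10, 64)), patch.getpixel((256, 64)))  # (255, 255, 255) (0, 0, 0)
```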

Third, the DiffTSR model focuses on scene text images. We have not fully tested its performance in other scenarios, but it can be easily adapted to them through fine-tuning.

For more details, please refer to the main manuscript and the supplementary materials. Thanks for your interest, and we are also working on developing methods that are more adaptable.

uhSuiL · 2024-06-27T08:43:03Z

Thanks for your reply; I appreciate your work.

I'll reread your paper and then follow your suggestions to see whether the model works in my case. Please keep this issue open so I can post my feedback here.

sailgu · 2025-01-18T15:29:01Z

Same here. Why does this feel like doing biology research....

After super-resolution:

YuzheZhang-1999 · 2025-05-08T08:51:54Z

> Same here. Why does this feel like doing biology research.... After super-resolution:

YuzheZhang-1999 · 2025-05-08T08:52:12Z

> I conducted another test: padding the left, right, top, and bottom to reach the size (512, 128), leaving the text image centered and deformed. This is the result:

YuzheZhang-1999 · 2025-05-08T09:24:01Z

> I used your model on my task, but the results don't seem that good. I resized my image to (512, 128) to match your input size. [...] Is there anything wrong?

The input image needs to be cropped into suitable text-line patches first. You can also pull the latest code and test again.
