Bad Result · Issue #6 · YuzheZhang-1999/DiffTSR · GitHub

Bad Result #6

Open
uhSuiL opened this issue Jun 26, 2024 · 8 comments
Comments

@uhSuiL
uhSuiL commented Jun 26, 2024

I tried your model on my task, but the results do not look good.
I resized my image to (512, 128) to match your input size. The original input image comes first below, followed by the result image.
test
task2_new
Is there anything wrong?
My code is below:

from PIL import Image
from omegaconf import OmegaConf

from model.IDM.utils.util import instantiate_from_config


if __name__ == '__main__':
    DiffTSR_yaml_config = './model/DiffTSR_config.yaml'
    DiffTSR_ckpt_config = './ckpt/DiffTSR.ckpt'
    DiffTSR_config = OmegaConf.load(DiffTSR_yaml_config)
    DiffTSR_model = instantiate_from_config(DiffTSR_config.model)
    DiffTSR_model.load_model(DiffTSR_ckpt_config)
    print("Model Loaded")
    
    lq_image_pil = Image.open('./test.png').convert('RGB')
    lq_image_pil = lq_image_pil.resize((512, 128))

    # Start sampling!
    sr_output = DiffTSR_model.DiffTSR_sample(lq_image_pil)
    # Save sr image!
    sr_image_pil = Image.fromarray(sr_output, 'RGB')
    sr_image_pil.save('./task2_new.png')
@uhSuiL
Author
uhSuiL commented Jun 26, 2024

I wondered whether the input image should contain only a single line of text, so I ran 3 more tests. The results still do not meet expectations:
(Test below: I masked the second line.)
test2
test2_new
(Test below: I cut out the second line and simply resized the image to (512, 128).)
test3
test3_new
(Test below: I cut out the second line and the white margin around the first line, then resized the image to (512, 128) so that it is less deformed.)
test4
test4_new
I think the text in my images is not hard for a human to read.

@uhSuiL
Author
uhSuiL commented Jun 26, 2024

I ran another test: padding the left, right, top, and bottom to keep the size at (512, 128), leaving the text centered and deformed.
This is the result:
test5
test5_new
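
For readers who want to reproduce this kind of padding test, here is a minimal sketch of one way to do it. It is not taken from the DiffTSR repository, and the helper names `letterbox_geometry` and `letterbox` are hypothetical. The sketch scales the image to fit inside the target without changing its aspect ratio and centers it on a blank canvas, so it is the undistorted variant of padding.

```python
def letterbox_geometry(src_size, target_size=(512, 128)):
    """Return (new_size, offset) for fitting src_size inside target_size.

    Sizes are (width, height), the order PIL uses. The aspect ratio is
    preserved and the offset centers the resized image on the canvas.
    """
    src_w, src_h = src_size
    tgt_w, tgt_h = target_size
    # The smaller ratio is the largest scale that still fits both dimensions.
    scale = min(tgt_w / src_w, tgt_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    offset = ((tgt_w - new_w) // 2, (tgt_h - new_h) // 2)
    return (new_w, new_h), offset


def letterbox(image, canvas):
    """Paste a PIL image, resized without distortion, onto the center of canvas.

    `canvas` is a blank PIL image of the target size supplied by the caller.
    """
    new_size, offset = letterbox_geometry(image.size, canvas.size)
    canvas.paste(image.resize(new_size), offset)
    return canvas
```

With Pillow installed, the canvas could be created with `Image.new('RGB', (512, 128), 'white')` and passed to `letterbox` together with the input image. The background color is a guess; nothing in this thread says which padding color, if any, suits the model.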

@YuzheZhang-1999
Owner

Thank you for your interest in this work. There are a few key points that require clarification.

First, this project currently applies only to single-line text images: the input is limited to 128x512 patches that contain no more than 24 text characters. For other images that contain text regions, you should first detect the text lines in the original image with a text detection method such as PaddleOCR, then crop each line, resize the patch to 128x512, and feed it to the DiffTSR model.

Second, the text area should occupy the center of the cropped image. Text patches produced by a text detection model usually meet this condition. Additionally, the DiffTSR model is robust to text deformation.

Third, the DiffTSR model focuses on scene text images. We have not fully tested its performance in other scenarios, but it can be easily adapted for other scenarios with fine-tuning.

For more details, please refer to the main manuscript and the supplementary materials. Thanks for your interest, and we are also working on developing methods that are more adaptable.
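
The crop-and-resize step described above can be sketched as follows. This is a rough illustration, not the repository's own preprocessing: the helper names `expand_box` and `crop_text_line` and the `margin` parameter are hypothetical, and the box is assumed to be an axis-aligned (left, top, right, bottom) rectangle that a text detector has already produced. It also assumes that 128x512 means height x width, which matches the `resize((512, 128))` call in the original post, since PIL takes (width, height).

```python
TARGET_SIZE = (512, 128)  # PIL order (width, height): a 128x512 patch as height x width


def expand_box(box, image_size, margin=0.05):
    """Grow a (left, top, right, bottom) box by a fraction of its own size.

    The result is clamped to the image bounds. A small margin keeps the
    text away from the patch border so it sits near the center.
    """
    left, top, right, bottom = box
    width, height = image_size
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    return (
        max(0, int(left - pad_x)),
        max(0, int(top - pad_y)),
        min(width, int(right + pad_x + 0.5)),
        min(height, int(bottom + pad_y + 0.5)),
    )


def crop_text_line(image, box, margin=0.05):
    """Crop one detected text line from a PIL image and resize it to TARGET_SIZE.

    The resize uses Pillow's default resampling filter and does not
    preserve the aspect ratio of the crop.
    """
    region = expand_box(box, image.size, margin)
    return image.crop(region).resize(TARGET_SIZE)
```

Each patch returned by `crop_text_line` could then be passed to `DiffTSR_model.DiffTSR_sample` as in the original post. Detector output formats differ (many return four-point polygons), so converting a detection to a rectangle is left to the caller.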

@uhSuiL
Author
uhSuiL commented Jun 27, 2024

Thanks for your reply; I appreciate your work.

I'm going to reread your paper and then follow your suggestions to see whether the model works in my case. Please keep this issue open; I'd like to post my feedback here.

@sailgu
sailgu commented Jan 18, 2025

Same here. Why does the output look like something out of a biology lab...
Image
After super-resolution:
Image

@YuzheZhang-1999
Owner

Same here. Why does the output look like something out of a biology lab... Image After super-resolution: Image

Image

@YuzheZhang-1999
Owner

I ran another test: padding the left, right, top, and bottom to keep the size at (512, 128), leaving the text centered and deformed. This is the result: test5 test5_new

Image

@YuzheZhang-1999
Owner

I tried your model on my task, but the results do not look good. I resized my image to (512, 128) to match your input size. The original input image comes first below, followed by the result image. test task2_new Is there anything wrong? My code is below:

from PIL import Image
from omegaconf import OmegaConf

from model.IDM.utils.util import instantiate_from_config

if __name__ == '__main__':
    DiffTSR_yaml_config = './model/DiffTSR_config.yaml'
    DiffTSR_ckpt_config = './ckpt/DiffTSR.ckpt'
    DiffTSR_config = OmegaConf.load(DiffTSR_yaml_config)
    DiffTSR_model = instantiate_from_config(DiffTSR_config.model)
    DiffTSR_model.load_model(DiffTSR_ckpt_config)
    print("Model Loaded")

    lq_image_pil = Image.open('./test.png').convert('RGB')
    lq_image_pil = lq_image_pil.resize((512, 128))

    # Start sampling!
    sr_output = DiffTSR_model.DiffTSR_sample(lq_image_pil)
    # Save sr image!
    sr_image_pil = Image.fromarray(sr_output, 'RGB')
    sr_image_pil.save('./task2_new.png')

The input image needs to be reasonably cropped into patches, and you can pull the latest code for testing.

Image
Image
Image

3 participants