Description
The steps of transcription and translation currently appear to be relatively tightly coupled. We can see that the subtitles generated by the transcription are processed in the translation step.
# file: openlrc/openlrc.py
def process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans):
    ...
    if skip_trans:
        shutil.copy(transcribed_opt_sub.filename, final_json_path)
        transcribed_opt_sub.filename = final_json_path
        return transcribed_opt_sub
    ...
The final subtitle files are then generated in the translation worker:
def translation_worker(self, transcription_queue, target_lang, skip_trans, bilingual_sub):
    ...
    # Handle translation
    final_subtitle = process_translation(base_name, target_lang, transcribed_opt_sub, skip_trans)
    # Generate and move subtitle files
    generate_subtitle_files(final_subtitle, base_name, subtitle_format)
    ...
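This seems to violate the single-responsibility principle (SRP): the translation code also handles untranslated subtitles and subtitle file generation.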
At the same time, even when skip_trans=True is specified, the translation thread is still started. Users pay the additional performance overhead even though they are not using translation.
I wish we could decouple the two steps of transcription and translation:
- The translation step no longer processes transcribed files.
- The translation thread is no longer started when skip_trans=True is specified.
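A minimal sketch of what the decoupled flow could look like. Every name here (`transcribe`, `translate`, `write_subtitles`, `run`, and this simplified `translation_worker`) is a hypothetical stand-in, not openlrc's actual API. It only illustrates the two points above: the worker does nothing but translate, and no translation thread is created when `skip_trans=True`.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the transcription stream


def transcribe(path):
    """Hypothetical stand-in for the transcription step."""
    return {"source": path, "text": f"transcript of {path}"}


def translate(sub, target_lang):
    """Hypothetical stand-in for the translation step; returns a new subtitle."""
    return {**sub, "text": f"[{target_lang}] {sub['text']}"}


def write_subtitles(sub, written):
    """Hypothetical stand-in for subtitle file generation."""
    written.append(sub)


def translation_worker(in_queue, out_queue, target_lang):
    # The worker only translates; it never copies or writes subtitle files.
    while True:
        sub = in_queue.get()
        if sub is _DONE:
            break
        out_queue.put(translate(sub, target_lang))


def run(paths, target_lang, skip_trans=False):
    """Return (written subtitles, whether a translation thread was started)."""
    written = []
    if skip_trans:
        # Transcription output goes straight to subtitle generation:
        # no queue and no translation thread are created.
        for path in paths:
            write_subtitles(transcribe(path), written)
        return written, False

    in_queue, out_queue = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=translation_worker, args=(in_queue, out_queue, target_lang)
    )
    worker.start()
    for path in paths:
        in_queue.put(transcribe(path))
    in_queue.put(_DONE)
    worker.join()

    # Subtitle generation is a separate step that runs after translation.
    while not out_queue.empty():
        write_subtitles(out_queue.get(), written)
    return written, True
```

With this shape, `run(paths, lang, skip_trans=True)` never touches `threading`, and subtitle generation is the same call in both branches, so it could later move out of the translation code entirely.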
I am not familiar with NLP, but if you agree, I can try to implement this improvement.