This Streamlit application lets users upload an image, extracts text from it using Azure Computer Vision, and generates a Word document containing the extracted text. The document is formatted as a step-by-step guide based on that text and includes the uploaded image as a screenshot. The application is built by Tony Esposito.
- Upload an image in PNG, JPEG, or JPG format.
- Extract text from the image using Azure Computer Vision.
- Generate a Word document that includes the extracted text and the image.
- Download the generated Word document.
- Python 3.x
- Streamlit
- Azure Cognitive Services SDK
- PIL (Pillow)
- python-docx
- OpenAI Python package
- Clone the repository: `git clone <repository-url>`
- Navigate to the project directory: `cd <project-directory>`
- Install the required packages: `pip install -r requirements.txt`
- Set up environment variables:
  - `AZURE_SUBSCRIPTION_KEY`: Your Azure subscription key for Computer Vision.
  - `AZURE_ENDPOINT`: Endpoint URL for Azure Computer Vision.
  - `OPENAI_API_KEY`: Your OpenAI API key.

  You can set these variables in a `.env` file or directly in your system's environment variables.
- Run the Streamlit application: `streamlit run <your-script-name>.py`
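
As a rough illustration of the environment-variable step, the sketch below reads `KEY=VALUE` pairs from a `.env` file using only the standard library and reports which of the three variables are still missing. The helper names `load_env_file` and `missing_variables` are hypothetical and not taken from this project, which may well rely on a package such as python-dotenv instead.

```python
import os

REQUIRED_VARIABLES = ("AZURE_SUBSCRIPTION_KEY", "AZURE_ENDPOINT", "OPENAI_API_KEY")


def load_env_file(path=".env", environ=os.environ):
    """Copy KEY=VALUE pairs from a .env file into `environ` (hypothetical helper).

    Variables that are already set are left untouched, so values from the
    system environment take precedence over the file.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop surrounding whitespace and optional quotes around the value.
            environ.setdefault(key.strip(), value.strip().strip("'\""))


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]
```

Calling `load_env_file()` followed by `missing_variables()` at startup would give an early, readable error instead of a failure deep inside an API call.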
- Open the Streamlit application in your web browser.
- Upload an image using the file uploader.
- The application will display the text extracted from the image.
- A Word document will be generated, and a download button will appear.
- Click the download button to get the generated Word document.
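
The text-extraction step in the workflow above could look roughly like the following. This is a minimal sketch, assuming the `azure-cognitiveservices-vision-computervision` package and its asynchronous Read API; the project's actual extraction code is not shown here and may use a different call. The function names `extract_text` and `join_read_lines` are hypothetical.

```python
import os
import time


def join_read_lines(read_results):
    """Flatten Read API page results into one newline-separated string."""
    return "\n".join(line.text for page in read_results for line in page.lines)


def extract_text(image_stream, poll_seconds=1.0):
    """Send an image stream to the Azure Read API and return the recognized text.

    Assumes the azure-cognitiveservices-vision-computervision package; the
    project's actual extraction code may differ.
    """
    # Imported here so join_read_lines stays usable without the Azure SDK.
    from azure.cognitiveservices.vision.computervision import ComputerVisionClient
    from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
    from msrest.authentication import CognitiveServicesCredentials

    client = ComputerVisionClient(
        os.environ["AZURE_ENDPOINT"],
        CognitiveServicesCredentials(os.environ["AZURE_SUBSCRIPTION_KEY"]),
    )

    # The Read API is asynchronous: submit the image, then poll for the result.
    response = client.read_in_stream(image_stream, raw=True)
    operation_id = response.headers["Operation-Location"].split("/")[-1]

    while True:
        result = client.get_read_result(operation_id)
        if result.status not in (OperationStatusCodes.running, OperationStatusCodes.not_started):
            break
        time.sleep(poll_seconds)

    if result.status != OperationStatusCodes.succeeded:
        raise RuntimeError(f"Text extraction failed with status: {result.status}")
    return join_read_lines(result.analyze_result.read_results)
```

In a Streamlit app, the file-like object returned by `st.file_uploader` should be usable as `image_stream`, and the returned string can then be shown with `st.text` or a similar call.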
- `generate_word_document(text, image_path, doc_path)`: Function to generate the Word document.
- Streamlit UI: Code for rendering the Streamlit interface.
- Azure Computer Vision: Code for extracting text from images.
- Word Document Generation: Code for creating and saving the Word document.
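
To make the Word document generation component more concrete, here is a minimal sketch of a function with the same signature, written with python-docx. It is an assumption-laden stand-in, not the project's implementation: it simply treats each non-empty line of extracted text as one step instead of using OpenAI to shape the guide, and the heading titles and the `split_into_steps` helper are hypothetical.

```python
def split_into_steps(text):
    """Treat each non-empty line of the extracted text as one step."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def generate_word_document(text, image_path, doc_path):
    """Write the steps and the screenshot to a .docx file (illustrative only)."""
    # Imported here so split_into_steps stays usable without python-docx.
    from docx import Document
    from docx.shared import Inches

    document = Document()
    document.add_heading("Step-by-Step Guide", level=1)

    # "List Number" is a built-in style in python-docx's default template.
    for step in split_into_steps(text):
        document.add_paragraph(step, style="List Number")

    # Embed the uploaded image below the steps, scaled to fit the page width.
    document.add_heading("Screenshot", level=2)
    document.add_picture(image_path, width=Inches(6))
    document.save(doc_path)
```

Keeping the text-to-steps logic in its own small function means it can be swapped for a model-generated guide later without touching the document-building code.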
Built by Tony Esposito.