WO2021221614A1 - Document orientation detection and correction - Google Patents
- Publication number
- WO2021221614A1 (PCT/US2020/030262)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- orientation
- electronic document
- classification
- rotated
- processor
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/242—Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- Documents may be scanned on flatbed scanners, mobile scanners, or the like. However, depending on the orientation in which documents were scanned, the scanned electronic document may not be stored, printed or displayed in the proper orientation. For example, a document may be scanned upside-down, resulting in the scanned electronic document being in an incorrect orientation by 180 degrees.
- FIG. 1 depicts a block diagram of an example apparatus that determines an orientation of an electronic document based on a machine-learning (ML) orientation classifier;
- FIG. 2 depicts a block diagram of an example of training the ML orientation classifier based on a training corpus of labeled documents that are labeled with respective orientations;
- FIG. 3 depicts an example of orientation classifications generated by the ML orientation classifier
- FIG. 4 depicts an example of a pipeline for detecting and correcting an orientation of an electronic document
- FIG. 5 depicts an example of illustrative details of using the pipeline for detecting and correcting an orientation of an electronic document
- FIG. 6 depicts a flow diagram of an example method of determining and adjusting an orientation of an electronic document
- FIG. 7 depicts a block diagram of an example non-transitory machine-readable storage medium of determining a corrective angle of rotation to correct an orientation of an electronic document based on an orientation classification of the electronic document determined by the ML orientation classifier.
- the terms “a” and “an” may be intended to denote at least one of a particular element.
- the term “includes” means includes but not limited to, the term “including” means including but not limited to.
- the term “based on” means based at least in part on.
- An electronic document may refer to a non-physical file that may be stored on and accessed from a computer readable medium (examples of computer readable media are disclosed later herein).
- Examples of an electronic document may include, without limitation, an imaged physical document (such as one obtained from a scanner device or camera), a photograph, and/or another non-physical file that may be stored on and accessed from a computer readable medium.
- an electronic document may be converted from another electronic document (such as saving an electronic document in one format into another format) without retaining orientation information of the original electronic document.
- An orientation of an electronic document may refer to a rotational angle of a document about a central axis of rotation that protrudes from a reading plane on which content is to be consumed.
- An electronic document may therefore have multiple orientations depending on how the document is scanned or otherwise created. It should be noted that the orientation as used herein is distinct from a “portrait” or “landscape” layout of a document.
- a “correct orientation” of an electronic document may refer to an orientation in which the electronic document is intended to be consumed.
- the correct orientation may include an orientation in which the human user is to read the electronic document on a display screen, or how the electronic document is to be printed as a physical document.
- An incorrect orientation of the electronic document may result not only in user frustration but also in downstream image processing issues, such as improper Optical Character Recognition (OCR), which may depend upon a correct orientation to correctly recognize text, or improper preparation of the electronic document for printing.
- an apparatus may determine an orientation of an electronic document using an ML orientation classifier that is trained on a training corpus of electronic documents. Electronic documents in the training corpus may be labeled to indicate their respective orientations. Electronic documents in the training corpus may be referred to as “labeled documents.”
- the ML orientation classifier may be trained using a Convolutional Neural Network (CNN), which is an ML computer vision technique.
- an apparatus may electronically rotate the electronic document by various angles of rotation clockwise and/or counter-clockwise (such as 0°, 90°, 180°, and 270°) to generate rotated images for analysis by the ML orientation classifier. The ML orientation classifier may output a set of orientation classifications for each rotated image.
- Each classification in the set of classifications may correspond to a predicted orientation of the rotated image.
- the ML orientation classifier may output probabilities that the rotated image is in a particular orientation.
- the ML orientation classifier may output a corrective angle of rotation that should be applied to the rotated image to achieve the correct orientation for the rotated image and a respective probability that a corresponding corrective angle of rotation should be applied to the rotated image.
- the apparatus may post-process outputs (the sets of classifications of all the rotated images) of the ML orientation classifier.
- the scores from the ML orientation classifier of each rotated image may be normalized according to the rotation applied to each of the rotated images to map the orientation classifications for each rotated image to the original electronic document. An average across each class normalized score may then be calculated to give a final orientation classification and confidence of the electronic document.
- the final orientation classification may indicate an orientation of the electronic document.
- the apparatus may determine a corrective angle of rotation to apply to the electronic document based on the final orientation classification. In these examples, the apparatus may correct the orientation of electronic document based on the corrective angle of rotation. Such correction may improve image processing capabilities, such as optical character recognition that may depend upon correct orientation of images (such as the electronic document) being analysed, preparing the electronic document for printing, and/or other image processing that may depend on a correct orientation of electronic documents.
- FIG. 1 depicts a block diagram of an example apparatus 100 that determines an orientation of an electronic document based on an ML orientation classifier.
- the apparatus 100 shown in FIG. 1 may be a computing device, a server, a device being tested for failure, and/or other devices.
- the apparatus 100 may include a processor 102 that may control operations of the apparatus 100.
- the processor 102 may be a semiconductor-based microprocessor, a central processing unit (CPU), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and/or other suitable hardware device.
- Although the apparatus 100 has been depicted as including a single processor 102, it should be understood that the apparatus 100 may include multiple processors, multiple cores, or the like, without departing from the scope of the apparatus 100 disclosed herein.
- the apparatus 100 may include a memory 110 that may have stored thereon machine-readable instructions (which may also be termed computer readable instructions) 112-120 that the processor 102 may execute.
- the memory 110 may be an electronic, magnetic, optical, or other physical storage device that includes or stores executable instructions.
- the memory 110 may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, and the like.
- the memory 110 may be a non-transitory machine-readable storage medium, where the term “non-transitory” does not encompass transitory propagating signals. It should be understood that the example apparatus 100 depicted in FIG. 1 may include additional features and that some of the features described herein may be removed and/or modified without departing from the scope of the example apparatus 100. In the examples that follow in FIG. 1 , reference will be made to FIG. 5 for illustrative clarity.
- the processor 102 may fetch, decode, and execute the instructions 112 to access an electronic document 501 , for which a correct orientation of the electronic document 501 may be unknown.
- the electronic document 501 may be scanned from a flatbed or portable scanner, imaged by a camera, converted from another electronic document, and/or be generated or stored by another source of electronic documents.
- the processor 102 may fetch, decode, and execute the instructions
- Each rotated image (501 A, 501 B, 501 C, and 501 D) among the plurality of rotated images 501 A-D may correspond to the electronic document 501 being electronically rotated by a corresponding angle of rotation.
- the electronic document 501 may be electronically rotated by 0° to generate rotated image 501 A (thus, digital rotation by 0° may correspond to not rotating the electronic document 501 ).
- the electronic document 501 may be electronically rotated by 90° to generate rotated image 501 B.
- the electronic document 501 may be electronically rotated by 180° to generate rotated image 501 C and electronically rotated by 270° to generate rotated image 501 D.
- An example of electronically rotating an electronic document is described further with respect to the image rotator 410 illustrated in FIG. 4. It should be noted that the angles of rotation may be in a first direction (such as a counter-clockwise direction) including a first set of angles comprising 0°, 90°, 180°, and 270° as shown in FIG. 5.
- angles of rotation may be in another direction (such as a clockwise direction) including a second set of angles comprising 0°, -90°, -180°, and -270°. So long as the angle of rotation and its direction are accounted for, any direction of rotation may be used.
- the arrow direction illustrated in each of the rotated images 501 A-D does not denote a correct orientation of the electronic document 501. Rather, the arrow direction is merely intended to convey digital rotation of the electronic document 501 by a corresponding angle of rotation.
- rotated image 501 D does not necessarily denote a correct orientation of the electronic document 501. Still further, although four angles of rotation are shown in FIG.
- any number of angles of rotation greater than two may be used.
- any number of rotated images 501A-D greater than two may be generated.
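The rotation step described above can be sketched as follows. This is a minimal illustration, assuming the electronic document is represented as a 2D grid (a list of pixel rows) and that rotation is counter-clockwise; the function names `rotate_ccw_90` and `make_rotated_images` are hypothetical and do not come from the source.

```python
def rotate_ccw_90(grid):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order yields a
    # counter-clockwise quarter turn.
    return [list(row) for row in zip(*grid)][::-1]


def make_rotated_images(document, angles=(0, 90, 180, 270)):
    """Return a dict mapping each angle of rotation to a rotated copy.

    Assumption: only multiples of 90 degrees are supported, so every
    pixel lands exactly on another pixel position.
    """
    rotated = {}
    for angle in angles:
        if angle % 90 != 0:
            raise ValueError("only multiples of 90 degrees are supported")
        image = [list(row) for row in document]  # 0 degrees: plain copy
        for _ in range((angle // 90) % 4):
            image = rotate_ccw_90(image)
        rotated[angle] = image
    return rotated


document = [[1, 2],
            [3, 4]]
rotated_images = make_rotated_images(document)
```

Here `rotated_images[0]` plays the role of rotated image 501 A and `rotated_images[90]` that of rotated image 501 B. A clockwise convention would work equally well, provided the same direction is used when the classifications are later normalized.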
- the processor 102 may fetch, decode, and execute the instructions 116 and 118.
- the processor 102 may fetch, decode, and execute the instructions 116 to provide the rotated image as an input to a machine-learning (ML) classifier (such as the ML orientation classifier 230 illustrated in FIG. 5).
- the ML classifier may be trained on a training corpus (such as training corpus 210) of images labeled with orientations.
- the processor 102 may further obtain, as an output of the ML orientation classifier 230, a plurality of orientation classifications (510A, 510B, 510C, or 510D) for the rotated image 501 A, 501 B, 501 C, or 501 D.
- Each rotated image (501 A, 501 B, 501 C, or 501 D) of the plurality of rotated images 501 A-D may therefore be assigned with a respective plurality of orientation classifications 510A-D for the rotated image 501 A, 501 B, 501 C, or 501 D.
- rotated image 501 A may be associated with a plurality of orientation classifications 510A
- rotated image 501 B may be associated with a plurality of orientation classifications 510B
- rotated image 501 C may be associated with a plurality of orientation classifications 510C
- rotated image 501 D may be associated with a plurality of orientation classifications 510D.
- The other pluralities of orientation classifications may include similar classifications.
- the plurality of classifications 510A-D may each be associated with a plurality of scores 512A-D.
- each orientation classification from among the plurality of orientation classifications of each rotated image may include a respective score 512 indicating a probability that the orientation classification is correct.
- the 0° orientation classification includes a score of 0.97
- the 90° orientation classification includes a score of 0.02
- the 180° orientation classification includes a score of 0.00
- the 270° orientation classification includes a score of 0.01 .
- the ML orientation classifier 230 may determine a 97% probability that the rotated image 501 A is in the 0° (correct) orientation, a 2% probability that the rotated image 501 A is rotated 90° relative to the correct orientation, a 0% probability that the rotated image 501 A is rotated 180° relative to the correct orientation, and a 1% probability that the rotated image 501 A is rotated 270° relative to the correct orientation.
- The other pluralities of orientation classifications 510B-D may include a respective plurality of scores 512B-D as described with respect to the plurality of classifications 510A.
- the processor 102 may fetch, decode, and execute the instructions 118 to determine a final orientation classification 540 of the electronic document 501 based on a plurality of orientation classifications (such as the plurality of classifications 510A-D and scores 512A-D) received from the ML orientation classifier 230.
- the instructions may further cause the processor to aggregate the respective classifications and scores of each orientation classification of the rotated images 501 A-D to generate a plurality of aggregate classifications 530 and corresponding plurality of respective aggregate scores 532.
- the processor may then determine the final orientation classification 540 based on the plurality of aggregate classifications 530 and corresponding plurality of respective aggregate scores 532.
- each 0° classification score associated with each of the rotated images 501 A-D may be aggregated together to create an aggregate score for the 0° classification.
- each 90° classification associated with each of the rotated images 501 A-D may be aggregated together to create an aggregate score for the 90° classification.
- each 180° classification associated with each of the rotated images 501 A-D may be aggregated together to create an aggregate score for the 180° classification.
- each 270° classification associated with each of the rotated images 501 A-D may be aggregated together to create an aggregate score for the 270° classification.
- the instructions may further cause the processor to normalize each orientation classification from among the plurality of orientation classifications based on a corresponding angle of rotation applied to the electronic document used to generate the corresponding rotated image, wherein the respective scores are aggregated after the respective scores of each orientation classification have been normalized.
- the ML orientation classifier 230 may classify the orientation of an input without respect to whether the input has been rotated by the image rotator 410.
- the classifications 510A-D may each be expressed relative to the frame of reference of the corresponding rotated image 501 A-D. To normalize the frame of reference back to the original orientation of the electronic document, each of the classifications 510 may be shifted by the same magnitude as, and in the opposite direction of, the angle of rotation used to create the rotated image 501 A-D. To illustrate, reference will be made to the plurality of classifications 510A-D and normalized plurality of classifications 520A-D.
- the plurality of classifications 510A are normalized by 0°, resulting in no changes.
- other rotated images 501 B-D were rotated by a non-zero angle of rotation and therefore may be normalized by the non-zero angle of rotation in an opposite direction.
- the plurality of classifications 510B may each be normalized by -90° (where the negative sign indicates an opposite direction) to generate corresponding ones of the normalized plurality of classifications 520B.
- each classification of the plurality of classifications 520B may be shifted by -90° to take the score of a classification 510B corresponding to the shifted classification.
- the 0° classification of the classifications 510B may be normalized by -90° to become the 270° classification (where 0-90 is 270 for rotational angle purposes) of the normalized classifications 520B
- the 90° classification of the classifications 510B may be normalized by -90° to become the 0° classification of the normalized classifications 520B
- the 180° classification of the classifications 510B may be normalized by -90° to become the 90° classification of the normalized classifications 520B
- the 270° classification of the classifications 510B may be normalized by -90° to become the 180° classification of the normalized classifications 520B.
- Other ones of the plurality of classifications 510C-D may be similarly normalized.
- each of the plurality of classifications 520A-D may refer to the original orientation of the electronic document 501.
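The normalization described above amounts to shifting each orientation label by the angle of rotation, modulo 360°. The following is a minimal sketch under that reading; the function name `normalize_scores` and the example scores for the 90° rotated image are hypothetical and are not taken from FIG. 5.

```python
def normalize_scores(scores, rotation):
    """Map classifier scores from a rotated image's frame of reference
    back to the frame of reference of the original document.

    scores:   dict of orientation label (degrees) -> score
    rotation: angle (degrees) by which the document was rotated
    """
    # Shifting by the opposite of the applied rotation: for a 90 degree
    # rotation, label 0 becomes 270, 90 becomes 0, and so on.
    return {(label - rotation) % 360: score for label, score in scores.items()}


# Hypothetical scores for the image rotated by 90 degrees.
scores_b = {0: 0.01, 90: 0.02, 180: 0.95, 270: 0.02}
normalized_b = normalize_scores(scores_b, 90)

# A 0 degree rotation leaves the classifications unchanged.
scores_a = {0: 0.97, 90: 0.02, 180: 0.00, 270: 0.01}
normalized_a = normalize_scores(scores_a, 0)
```

With these inputs, the 180° score of the rotated image becomes the 90° score of the original document, which matches the shift pattern listed above.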
- the instructions may further cause the processor to determine an average score of each normalized orientation classification.
- the normalized 0° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.2425
- the normalized 90° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.5125
- the normalized 180° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.005
- the normalized 270° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.24.
- the final orientation classification 540 may be selected based on a highest average. As shown in FIG. 5, for example, the highest average may correspond to the 90° classification.
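The averaging and selection step can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and the final call uses the four average scores quoted above for FIG. 5 rather than the per-image scores, which are not all given in the text.

```python
def average_scores(normalized_sets):
    """Average the normalized scores of each orientation label across
    all rotated images (each set is a dict of label -> score)."""
    labels = normalized_sets[0].keys()
    count = len(normalized_sets)
    return {label: sum(s[label] for s in normalized_sets) / count
            for label in labels}


def final_classification(averages):
    """Pick the orientation label with the highest average score and
    return it with that score as a confidence value."""
    label = max(averages, key=averages.get)
    return label, averages[label]


# Average scores as quoted in the description of FIG. 5.
averages = {0: 0.2425, 90: 0.5125, 180: 0.005, 270: 0.24}
orientation, confidence = final_classification(averages)
```

Here `orientation` is 90 and `confidence` is 0.5125, in line with the final orientation classification 540 described above.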
- the instructions may further cause the processor to determine, based on the final orientation classification 540, a corrective angle of rotation to apply to the electronic document to achieve a target orientation, the corrective angle of rotation being selected from among zero and non-zero values, wherein a zero value for the corrective angle of rotation indicates that the electronic document is already in the target orientation.
- the instructions may further cause the processor to electronically rotate the electronic document based on the corrective angle of rotation.
- the electronic document 501 may be determined to be at a 90° orientation based on the final orientation classification 540 and the angle of correction may be determined based on a magnitude and opposite direction of the final orientation classification 540.
- the electronic document 501 may be digitally rotated by -90° to correct the orientation of the electronic document 501.
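The relationship between the final orientation classification and the corrective angle of rotation can be written in a few lines. This is a sketch under the convention used above (same magnitude, opposite direction); the helper names are hypothetical.

```python
def corrective_angle(final_orientation):
    """Return the corrective angle of rotation in degrees: the same
    magnitude as the detected orientation, in the opposite direction.
    A result of 0 means the document is already in the target orientation."""
    return -final_orientation


def as_positive_angle(angle):
    """Express a signed angle as the equivalent angle in [0, 360)."""
    return angle % 360


angle = corrective_angle(90)           # -90
equivalent = as_positive_angle(angle)  # 270
```

A rotation by -90° and a rotation by 270° in the same direction are the same operation, so either form may be passed to whatever rotation routine is used.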
- a performance of such image processing may be improved based on the corrected orientation of the electronic document 501.
- the image processing may include OCR on the electronic document 501 , which may be improved by facilitating letter and word recognition when the correct orientation of the electronic document 501 is used.
- the image processing may include print formatting on the electronic document 501 , which may be improved by facilitating proper print orientations when the correct orientation of the electronic document 501 is used.
- the labeled documents 212A, 212B, 212C, 212D may respectively include a collection of electronic documents that have been labeled (such as by a human annotator) as being in a 0°, 90°, 180°, and 270° orientation.
- the training corpus 210 may include various labeled examples of electronic documents that are in various orientations.
- a labeled document 212A-D may include image objects, text objects, and/or other type of content.
- ML training 220 may include image object training 222, text object training 224, and/or other types of content training to correlate objects found in the labeled documents 212A-D with respective orientation labels.
- image object training 222 may include identifying image objects in labeled documents 212A-D. In some examples, such image identification may be automated or may be labeled by human annotators as well.
- text object training 224 may include identifying text objects in labeled documents 212A-D. In some examples, such text identification may be automated or may be labeled by human annotators as well.
- image object training 222 and text object training 224 may be merged to identify pixel features (regardless of whether such pixel features were from an image object or text object).
- ML training 220 may be based on a CNN.
- a CNN may assign learnable weights and biases to various objects, such as image and/or text objects in the labeled documents 212A-D in order to distinguish the objects from one another.
- the weights and biases may reflect distinctions in objects that are oriented one way or another.
- the ML orientation classifier 230 may output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels.
- the orientation classifications may include a set of orientations, where each orientation is associated with a respective probability that the orientation is correct.
- FIG. 3 depicts an example 300 of orientation classifications 310 generated by the ML orientation classifier 230.
- the ML orientation classifier 230 may take, as an input, an electronic document 301. The correct orientation of the electronic document 301 may be unknown.
- the ML orientation classifier 230 may output, based on the input electronic document 301 , a plurality of orientation classifications 310 that include or are associated with a corresponding plurality of scores 312.
- Each orientation classification of the plurality of classifications 310 may be a prediction by the ML orientation classifier 230 that the electronic document 301 is a corresponding orientation.
- Each score of the plurality of scores 312 may indicate a probability that the corresponding orientation is correct.
- the plurality of orientation classifications 310 may include four classifications output by the ML orientation classifier 230, although other numbers of classifications may be output.
- the plurality of orientation classifications 310 may include a 0° classification, a 90° classification, a 180° classification, and a 270° classification.
- One of the plurality of orientation classifications 310 may correspond to a correct orientation while other ones of the plurality of orientation classifications 310 may correspond to an incorrect orientation.
- the 0° classification may correspond to a prediction that the electronic document 301 is in a correct orientation
- the 90° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 90° relative to the correct orientation
- the 180° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 180° relative to the correct orientation
- the 270° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 270° relative to the correct orientation.
- the plurality of scores 312 may sum to a probability of 1.0 (or 100 percent).
- the ML orientation classifier 230 predicts that the electronic document 301 is in a 0° (correct) orientation with 8 percent probability, a 90° incorrect orientation with 76 percent probability, a 180° incorrect orientation with 16 percent probability, and a 270° incorrect orientation with 5 percent probability. It should be noted that the plurality of scores 312 may be represented using any scoring value (other than decimal-based probabilities).
- FIG. 4 depicts an example of a pipeline 400 for detecting and correcting an orientation of an electronic document.
- the pipeline 400 may include an electronic document generator 402, an image rotator 410, the ML orientation classifier 230, a score normalizer 420, and a pondered voting 430.
- the electronic document generator 402, the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430 may include instructions fetched and executed by a processor (such as processor 102 illustrated in FIG. 1) and/or hardware.
- the electronic document generator 402, the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430 may be under the control of a processor, such as the processor 102.
- the electronic document generator 402 may generate the electronic document 401.
- the electronic document generator 402 may include a scanner device, such as one incorporated in a flatbed or mobile type of scanner, a camera, such as one incorporated in a mobile phone, and/or other type of document generator, including applications that may convert an electronic document from one format to another format.
- the image rotator 410 may electronically rotate the electronic document 401 based on angles of rotation 0°, 90°, 180°, and 270° to generate four rotated images (conceptually depicted as four arrows). Other numbers of angles of rotation may be used instead.
- the image rotator 410 may electronically rotate the electronic document 401 based on interpolation. For example, the image rotator 410 may map a pixel of the electronic document 401 from an original position in a pixel grid to an interpolated position in the pixel grid based on the angle of rotation. In some examples, the angles of rotation may be in increments of 90° to avoid lossy interpolation. A rotation by a multiple of 90° may be lossless because each pixel may be repositioned onto another pixel position, while a rotation by any other angle may reposition the original pixel onto a border between two pixels. Thus, the original pixel may be divided, resulting in loss of data at the divided pixel.
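The point about lossless rotation can be checked numerically. The sketch below rotates an integer pixel coordinate about the origin and tests whether the result still falls on the integer pixel grid; the function names and the tolerance value are assumptions made for this illustration.

```python
import math


def rotate_point(x, y, degrees):
    """Rotate the point (x, y) counter-clockwise about the origin."""
    theta = math.radians(degrees)
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def lands_on_pixel_grid(x, y, degrees, tol=1e-9):
    """Return True if the rotated point coincides (within a small
    floating-point tolerance) with an integer pixel position."""
    rx, ry = rotate_point(x, y, degrees)
    return abs(rx - round(rx)) < tol and abs(ry - round(ry)) < tol


on_grid_90 = lands_on_pixel_grid(3, 1, 90)   # True: no interpolation needed
on_grid_45 = lands_on_pixel_grid(3, 1, 45)   # False: falls between pixels
```

For the pixel at (3, 1), a 90° rotation lands exactly on (-1, 3), whereas a 45° rotation lands at roughly (1.41, 2.83), so its value would have to be spread over neighbouring pixels.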
- the ML orientation classifier 230 may take as input each image of the rotated images from the image rotator 410.
- the ML orientation classifier 230 may output, for each rotated image, a plurality of classifications, including a corresponding plurality of scores.
- the ML orientation classifier 230 may be invoked four times, once for each rotated image from the image rotator 410.
- the ML orientation classifier 230 may generate four sets of a plurality of orientation classifications with corresponding scores.
- the score normalizer 420 may normalize the scores to account for the angle of rotation applied to the electronic document 401 . Such normalization may adjust the classifications and scores to correspond to the original (non-rotated) orientation of the electronic document 401.
- the pondered voting 430 may determine an aggregate (such as sum and/or average) of each classification score corresponding to each orientation classification to generate aggregate scores.
- a final orientation classification 440 may be generated.
- a corrective angle of rotation 450 may be determined based on the final orientation classification 440.
- the corrective angle of rotation 450 may have the same magnitude as, and the opposite direction of, the final orientation classification 440.
- a corrected electronic document 403 may be generated by invoking the image rotator 410 to electronically rotate the electronic document 401 based on the corrective angle of rotation 450.
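The stages of the pipeline 400 can be strung together as in the sketch below. It is a toy illustration, not the disclosed implementation: the document is a small pixel grid, rotation is counter-clockwise, and `toy_classifier` is a hypothetical stand-in for the ML orientation classifier 230 that simply looks for a bright marker pixel that sits in the top-left corner of an upright document.

```python
ANGLES = (0, 90, 180, 270)


def rotate(image, angle):
    """Image rotator: rotate a grid counter-clockwise by a multiple of 90."""
    for _ in range((angle // 90) % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return [list(row) for row in image]


def toy_classifier(image):
    """Hypothetical classifier: infers orientation from the corner that
    holds the brightest pixel and returns a score per orientation."""
    corners = {0: image[0][0], 90: image[-1][0],
               180: image[-1][-1], 270: image[0][-1]}
    detected = max(corners, key=corners.get)
    return {a: (0.97 if a == detected else 0.01) for a in ANGLES}


def detect_and_correct(document, classifier):
    """Rotate, classify, normalize, vote, and correct."""
    totals = {a: 0.0 for a in ANGLES}
    for rotation in ANGLES:
        scores = classifier(rotate(document, rotation))
        for label, score in scores.items():
            # Score normalizer: shift labels back to the original frame.
            totals[(label - rotation) % 360] += score
    # Pondered voting: average per class, then pick the highest average.
    averages = {a: total / len(ANGLES) for a, total in totals.items()}
    final = max(averages, key=averages.get)
    corrective = -final
    return final, corrective, rotate(document, corrective % 360)


upright = [[9, 0, 0],
           [0, 0, 0]]
scanned = rotate(upright, 90)  # simulate a document scanned sideways
final, corrective, corrected = detect_and_correct(scanned, toy_classifier)
```

In this run `final` is 90, `corrective` is -90, and `corrected` equals `upright`. Passing the classifier in as a parameter keeps the rotation, normalization, and voting logic independent of how the model itself is built.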
- the pipeline 400 may be implemented according to various architectures (not shown).
- the pipeline 400 may be implemented entirely within the apparatus 100 illustrated in FIG. 1.
- the apparatus 100 may generate the electronic document 401 , such as by scanning or taking a photograph of a physical document, and also detect and correct the orientation of the electronic document 401.
- the apparatus 100 may include a scanner device or a mobile phone that may detect and/or correct the orientation of the electronic document 401 using the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430.
- the pipeline 400 may be implemented in a distributed manner.
- the electronic document generator 402 may be separate from the apparatus 100.
- the apparatus 100 may include a server device that receives the electronic document 401 from the electronic document generator 402 and then detects and/or corrects the orientation of the electronic document 401 using the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430.
- FIG. 6 depicts a flow diagram that illustrates an example method 600 of determining and adjusting an orientation of an electronic document (such as electronic document 501).
- the method 600 may include accessing an electronic document.
- the method 600 may include electronically rotating the electronic document by a plurality of angles of rotation (such as 0, 90, 180, 270 illustrated in FIG. 5) to generate a plurality of rotated images (such as rotated images 501 A-D). Each rotated image among the plurality of rotated images may correspond to the electronic document being electronically rotated by a corresponding angle of rotation.
- the method 600 may include, for each rotated image from among the plurality of rotated images, determining a plurality of orientation classifications (such as a plurality of orientation classifications 510A-D) of the rotated image and a score (such as a score among the plurality of scores 520A- D) for each orientation classification indicating a probability that the orientation classification is correct.
- the method 600 may include determining an aggregate score based on the scores for the plurality of orientation classifications.
- the method 600 may include determining a final orientation classification (such as the final orientation classification 540) based on the aggregate score.
- the method 600 may include adjusting (such as electronically rotating) an orientation of the electronic document based on the final orientation classification.
- Some or all of the operations set forth in the method 600 may be included as utilities, programs, or subprograms, in any desired computer accessible medium.
- the method 600 may be embodied by computer programs, which may exist in a variety of forms.
- some operations of the method 600 may exist as machine-readable instructions, including source code, object code, executable code or other formats. Any of the above may be embodied on a non-transitory machine-readable (such as computer-readable) storage medium. Examples of non-transitory machine-readable storage media include computer system RAM, ROM, EPROM, EEPROM, and magnetic or optical disks or tapes. It is therefore to be understood that any electronic device capable of executing the above-described functions may perform those functions enumerated above.
- FIG. 7 depicts a block diagram of an example non-transitory machine-readable storage medium 700 of determining a corrective angle of rotation to correct an orientation of an electronic document based on an orientation classification of the electronic document determined by the ML orientation classifier 230.
- the machine-readable instructions 702 may cause the processor (such as processor 102 illustrated in FIG. 1) to access an electronic document (such as electronic document 501 ).
- the machine-readable instructions 704 may cause the processor to electronically rotate the electronic document by a plurality of angles of rotation (such as 0, 90, 180, and 270 illustrated in FIG. 5) to generate a plurality of rotated images (such as rotated images 501 A-D). Each rotated image among the plurality of rotated images may correspond to the electronic document being electronically rotated by a corresponding angle of rotation.
- the processor may fetch, decode, and execute the machine-readable instructions 706 to provide the rotated image as an input to a machine-learning (ML) classifier (such as the ML orientation classifier 230).
- the ML classifier may be trained on a training corpus of images (such as training corpus 210) labeled with orientations.
- the processor may be caused to obtain as an output of the ML orientation classifier a plurality of orientation classifications (such as the plurality of classifications 510A-D) for the rotated image.
- the machine-readable instructions 708 may cause the processor to determine, based on a plurality of orientation classifications received from the ML orientation classifier, a final orientation classification (such as the final orientation classification 540).
- the machine-readable instructions 710 may cause the processor to determine, based on the final orientation classification, a corrective angle of rotation to apply to the electronic document to achieve a target orientation (such as a correct orientation), the corrective angle of rotation being selected from among zero and non-zero values.
- a zero value for the corrective angle of rotation indicates that the electronic document is already in the target orientation.
- each orientation classification from among the plurality of orientation classifications of each rotated image may be associated with a respective score (such as a score from among each of the plurality of scores 512A-D) indicating the probability that the corresponding orientation classification is correct.
- the instructions when executed further cause the processor to determine an average of the respective scores of each orientation classification, and determine the final orientation classification based on the average of the respective scores.
- the instructions when executed further cause the processor to adjust (such as electronically rotate) an orientation of the electronic document based on the selected corrective angle if the selected corrective angle is non-zero, and perform image analysis on the electronic document based on the adjusted orientation.
- the instructions when executed further cause the processor to perform OCR on the electronic document to recognize text or prepare the electronic document for printing.
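The operations listed above (rotate, classify, normalize, aggregate, select, correct) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `toy_classifier` is a hypothetical stand-in for the trained ML orientation classifier that simply locates a marker pixel, the scores 0.97 and 0.01 are invented, the function names are illustrative, and rotation is assumed to be counter-clockwise in multiples of 90 degrees using NumPy.

```python
import numpy as np

ANGLES = (0, 90, 180, 270)

def toy_classifier(image):
    """Hypothetical stand-in for a trained classifier.

    Assumes an upright document has its brightest pixel in the top-left
    corner, and returns a made-up score for each orientation class.
    """
    corners = {
        0: image[0, 0],      # marker top-left: upright
        90: image[-1, 0],    # marker bottom-left: rotated 90 counter-clockwise
        180: image[-1, -1],  # marker bottom-right: rotated 180
        270: image[0, -1],   # marker top-right: rotated 270
    }
    detected = max(corners, key=corners.get)
    return {label: (0.97 if label == detected else 0.01) for label in ANGLES}

def detect_orientation(document, classify=toy_classifier, angles=ANGLES):
    """Rotate, classify, normalize, and average; return (final class, averages)."""
    totals = {label: 0.0 for label in angles}
    for applied in angles:
        rotated = np.rot90(document, k=applied // 90)
        for label, score in classify(rotated).items():
            # Map the class back to the frame of the non-rotated document.
            totals[(label - applied) % 360] += score
    averages = {label: total / len(angles) for label, total in totals.items()}
    final = max(averages, key=averages.get)
    return final, averages

def correct_orientation(document, classify=toy_classifier):
    """Apply the corrective angle: same magnitude, opposite direction."""
    final, _ = detect_orientation(document, classify)
    angle = -final
    return np.rot90(document, k=(angle // 90) % 4), angle

# A synthetic "upright" page with a marker pixel, then scanned rotated by 90.
upright = np.zeros((4, 6))
upright[0, 0] = 1.0
scanned = np.rot90(upright, k=1)

final, averages = detect_orientation(scanned)
corrected, angle = correct_orientation(scanned)
```

In this toy run every rotated copy votes for the same normalized class, so the average simply confirms it; with a real classifier the votes may disagree, which is where averaging across rotations is intended to help.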
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
An apparatus may include a processor that may be caused to access an electronic document and electronically rotate the electronic document by a plurality of angles of rotation to generate a plurality of rotated images. For each rotated image from among the plurality of rotated images, the processor may further provide the rotated image as an input to a machine-learning (ML) orientation classifier. The processor may further determine a final orientation classification of the electronic document based on a plurality of orientation classifications received from the ML orientation classifier.
Description
DOCUMENT ORIENTATION DETECTION AND CORRECTION
BACKGROUND
[0001] Documents may be scanned on flatbed scanners, mobile scanners, or the like. However, depending on the orientation in which documents were scanned, the scanned electronic document may not be stored, printed or displayed in the proper orientation. For example, a document may be scanned upside-down, resulting in the scanned electronic document being in an incorrect orientation by 180 degrees.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] Features of the present disclosure may be illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
[0003] FIG. 1 depicts a block diagram of an example apparatus that determines an orientation of an electronic document based on a machine-learning (ML) orientation classifier;
[0004] FIG. 2 depicts a block diagram of an example of training the ML orientation classifier based on a training corpus of labeled documents that are labeled with respective orientations;
[0005] FIG. 3 depicts an example of orientation classifications generated by the ML orientation classifier;
[0006] FIG. 4 depicts an example of a pipeline for detecting and correcting an orientation of an electronic document;
[0007] FIG. 5 depicts an example of illustrative details of using the pipeline for detecting and correcting an orientation of an electronic document;
[0008] FIG. 6 depicts a flow diagram of an example method of determining and adjusting an orientation of an electronic document; and
[0009] FIG. 7 depicts a block diagram of an example non-transitory machine-readable storage medium of determining a corrective angle of rotation to correct an orientation of an electronic document based on an orientation classification of the electronic document determined by the ML orientation classifier.
DETAILED DESCRIPTION
[0010] For simplicity and illustrative purposes, the present disclosure may be described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
[0011] Throughout the present disclosure, the terms “a” and “an” may be intended to denote at least one of a particular element. As used herein, the term “includes” means includes but not limited to, the term “including” means including but not limited to. The term “based on” means based at least in part on.
[0012] Disclosed herein are improved apparatuses, methods, and machine-readable media that may detect and correct an orientation of electronic documents. An electronic document may refer to a non-physical file that may be stored on and accessed from a computer readable medium (examples of computer readable media are disclosed later herein). Examples of an electronic document may include, without limitation, an imaged physical document (such as obtained from a scanner device or camera), a photograph, and/or other non physical file that may be stored on and accessed from a computer readable medium. In some examples, an electronic document may be converted from another electronic document (such as saving an electronic document in one format into another format) without retaining orientation information of the original electronic document.
[0013] An orientation of an electronic document (which may also be referred to as a document orientation) may refer to a rotational angle of a document about a central axis of rotation that protrudes from a reading plane on which content is to be consumed. An electronic document may therefore have multiple orientations depending on how the document is scanned or otherwise created. It should be noted that the orientation as used herein is distinct from a “portrait” or “landscape” layout of a document. A “correct orientation” of an electronic document may refer to an orientation in which the electronic document is intended to be consumed. For example, the correct orientation may include an orientation in which the human user is to read the electronic document on a display screen, or how the electronic document is to be printed as a physical document. An incorrect orientation of the electronic document may result in user frustration but also downstream image processing issues such as improper Optical Character Recognition (OCR), which may depend upon a correct orientation to correctly recognize text, or preparing the electronic document for printing.
[0014] In some examples, an apparatus may determine an orientation of an electronic document using an ML orientation classifier that is trained on a training corpus of electronic documents. Electronic documents in the training corpus may be labeled to indicate their respective orientations. Electronic documents in the training corpus may be referred to as “labeled documents.” In some examples, the ML orientation classifier may be trained using a Convolutional Neural Network (CNN), which is an ML computer vision technique.
[0015] In some examples, an apparatus may electronically rotate the electronic document by various angles of rotation clockwise and/or counter clockwise (such as 0, 90, 180, and 270) to generate rotated images for analysis by the ML orientation classifier. The ML orientation classifier may output orientation classifications for each rotated image. Each classification in the set of classifications may correspond to a predicted orientation of the rotated image. Thus, in some examples, for each rotated image, the ML orientation classifier may output probabilities that the rotated image is in a particular orientation. In other examples, the ML orientation classifier may output a corrective angle of rotation that should be applied to the rotated image to achieve the correct orientation for the rotated image and a respective probability that a corresponding corrective angle of rotation should be applied to the rotated image.
[0016] In some examples, the apparatus may post-process outputs (the sets of classifications of all the rotated images) of the ML orientation classifier. The scores from the ML orientation classifier of each rotated image may be normalized according to the rotation applied to each of the rotated images to map the orientation classifications for each rotated image to the original electronic document. An average of the normalized scores for each class may then be calculated to give a final orientation classification and confidence of the electronic document. The final orientation classification may indicate an orientation of the electronic document. In some examples, the apparatus may determine a corrective angle of rotation to apply to the electronic document based on the final orientation classification. In these examples, the apparatus may correct the orientation of the electronic document based on the corrective angle of rotation. Such correction may improve image processing capabilities, such as optical character recognition that may depend upon correct orientation of images (such as the electronic document) being analysed, preparing the electronic document for printing, and/or other image processing that may depend on a correct orientation of electronic documents.
[0017] FIG. 1 depicts a block diagram of an example apparatus 100 that determines an orientation of an electronic document based on an ML orientation classifier. The apparatus 100 shown in FIG. 1 may be a computing device, a server, a device being tested for failure, and/or other devices.
[0018] As shown in FIG. 1 , the apparatus 100 may include a processor 102 that may control operations of the apparatus 100. The processor 102 may be a semiconductor-based microprocessor, a central processing unit (CPU), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), and/or other suitable hardware device. Although the apparatus 100 has been depicted as including a single processor 102, it should be understood that the apparatus 100 may include multiple processors, multiple cores, or the like, without departing from the scope of the apparatus 100 disclosed herein.
[0019] The apparatus 100 may include a memory 110 that may have stored thereon machine-readable instructions (which may also be termed computer readable instructions) 112-120 that the processor 102 may execute. The memory 110 may be an electronic, magnetic, optical, or other physical storage device that includes or stores executable instructions. The memory 110 may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, and the like. The memory 110 may be a non-transitory machine-readable storage medium, where the term “non-transitory” does not encompass transitory propagating signals. It should be understood that the example apparatus 100 depicted in FIG. 1 may include additional features and that some of the features described herein may be removed and/or modified without departing from the scope of the example apparatus 100. In the examples that follow in FIG. 1, reference will be made to FIG. 5 for illustrative clarity.
[0020] The processor 102 may fetch, decode, and execute the instructions 112 to access an electronic document 501, for which a correct orientation of the electronic document 501 may be unknown. The electronic document 501 may be scanned from a flatbed or portable scanner, imaged by a camera, converted from another electronic document, and/or be generated or stored by another source of electronic documents.
[0021] The processor 102 may fetch, decode, and execute the instructions 114 to electronically rotate the electronic document 501 by a plurality of angles of rotation (illustrated in FIG. 5 as 0°, 90°, 180°, and 270°) to generate a plurality of rotated images (illustrated as 501A-D in FIG. 5). Each rotated image (501A, 501B, 501C, and 501D) among the plurality of rotated images 501A-D may correspond to the electronic document 501 being electronically rotated by a corresponding angle of rotation. For example, the electronic document 501 may be electronically rotated by 0° to generate rotated image 501A (thus, digital rotation by 0° may correspond to not rotating the electronic document 501). The electronic document 501 may be electronically rotated by 90° to generate rotated image 501B. Similarly, the electronic document 501 may be electronically rotated by 180° to generate rotated image 501C and electronically rotated by 270° to generate rotated image 501D. An example of electronically rotating an electronic document is described further with respect to the image rotator 410 illustrated at FIG. 4. It should be noted that the angles of rotation may be in a first direction (such as a counter-clockwise direction) including a first set of angles comprising 0°, 90°, 180°, and 270° as shown in FIG. 5.
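As a minimal sketch of the rotation step, assuming the electronic document is held as a NumPy array and that rotation is counter-clockwise in multiples of 90 degrees, the rotated images could be produced as shown below. The function name `make_rotated_images` and the sample array are illustrative assumptions and do not come from the disclosure.

```python
import numpy as np

ANGLES = (0, 90, 180, 270)

def make_rotated_images(document, angles=ANGLES):
    """Return a dict mapping each angle of rotation to a rotated copy.

    np.rot90 rotates counter-clockwise by k quarter turns, so an angle of 0
    leaves the document unchanged.
    """
    rotated = {}
    for angle in angles:
        if angle % 90 != 0:
            raise ValueError("this sketch only handles multiples of 90 degrees")
        rotated[angle] = np.rot90(document, k=(angle // 90) % 4)
    return rotated

# Illustrative 2x3 "document"; a real page would be a pixel array.
document = np.arange(6).reshape(2, 3)
rotated_images = make_rotated_images(document)
```

Negative angles such as -90 also work here, because the quarter-turn count is reduced modulo 4; this mirrors the point that either direction may be used so long as it is accounted for consistently.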
[0022] In another example, the angles of rotation may be in another direction (such as a clockwise direction) including a second set of angles comprising 0°, -90°, -180°, and -270°. So long as the angle of rotation and its direction are accounted for, any direction of rotation may be used. It should be further noted that the arrow direction illustrated in each of the rotated images 501A-D does not denote a correct orientation of the electronic document 501. Rather, the arrow direction is merely intended to convey digital rotation of the electronic document 501 by a corresponding angle of rotation. For example, rotated image 501D does not necessarily denote a correct orientation of the electronic document 501. Still further, although four angles of rotation are shown in FIG. 5, any number of angles of rotation greater than two may be used. In other words, any number of rotated images 501A-D greater than two may be generated.
[0023] For each rotated image (501A, 501B, 501C, 501D) of the plurality of rotated images 501A-D, the processor 102 may fetch, decode, and execute the instructions 116 and 118. For example, the processor 102 may fetch, decode, and execute the instructions 116 to provide the rotated image as an input to a machine-learning (ML) classifier (such as the ML orientation classifier 230 illustrated in FIG. 5). The ML classifier may be trained on a training corpus (such as training corpus 210) of images labeled with orientations.
[0024] In some examples, the processor 102 may further obtain, as an output of the ML orientation classifier 230, a plurality of orientation classifications (510A, 510B, 510C, or 510D) for the rotated image 501A, 501B, 501C, or 501D.
[0025] Each rotated image (501A, 501B, 501C, or 501D) of the plurality of rotated images 501A-D may therefore be assigned a respective plurality of orientation classifications 510A-D. For example, rotated image 501A may be associated with a plurality of orientation classifications 510A, rotated image 501B may be associated with a plurality of orientation classifications 510B, rotated image 501C may be associated with a plurality of orientation classifications 510C, and rotated image 501D may be associated with a plurality of orientation classifications 510D. To illustrate, reference is made to the plurality of orientation classifications 510A, in which each orientation classification may classify the rotated image 501A as being in a 0° (correct orientation), 90° (rotated 90 degrees relative to the correct orientation), 180° (rotated 180 degrees relative to the correct orientation), or 270° (rotated 270 degrees relative to the correct orientation) orientation. The other pluralities of orientation classifications (510B-D) may include similar classifications.
[0026] In some examples, the plurality of classifications 510A-D may each be associated with a plurality of scores 512A-D. For example, each orientation classification from among the plurality of orientation classifications of each rotated image may include a respective score 512 indicating a probability that the orientation classification is correct.
[0027] To illustrate, reference is again made to the plurality of orientation classifications 510A, in which the 0° orientation classification includes a score of 0.97, the 90° orientation classification includes a score of 0.02, the 180° orientation classification includes a score of 0.00, and the 270° orientation classification includes a score of 0.01. In other words, for the rotated image 501A, the ML orientation classifier 230 may determine a 97% probability that the rotated image 501A is in the 0° (correct) orientation, a 2% probability that the rotated image 501A is rotated 90° relative to the correct orientation, a 0% probability that the rotated image 501A is rotated 180° relative to the correct orientation, and a 1% probability that the rotated image 501A is rotated 270° relative to the correct orientation. The other pluralities of orientation classifications 510B-D may include a respective plurality of scores 512B-D as described with respect to the plurality of classifications 510A.
[0028] The processor 102 may fetch, decode, and execute the instructions 118 to determine a final orientation classification 540 of the electronic document 501 based on a plurality of orientation classifications (such as the plurality of classifications 510A-D and scores 512A-D) received from the ML orientation classifier 230.
[0029] In some examples, to determine the final orientation classification of the electronic document, the instructions may further cause the processor to aggregate the respective classifications and scores of each orientation classification of the rotated images 501A-D to generate a plurality of aggregate classifications 530 and corresponding plurality of respective aggregate scores 532. The processor may then determine the final orientation classification 540 based on the plurality of aggregate classifications 530 and corresponding plurality of respective aggregate scores 532. For example, each 0° classification score associated with each of the rotated images 501A-D may be aggregated together to create an aggregate score for the 0° classification. Likewise, each 90° classification associated with each of the rotated images 501A-D may be aggregated together to create an aggregate score for the 90° classification. Similarly, each 180° classification associated with each of the rotated images 501A-D may be aggregated together to create an aggregate score for the 180° classification. Still further, each 270° classification associated with each of the rotated images 501A-D may be aggregated together to create an aggregate score for the 270° classification.
[0030] In some examples, to aggregate the respective scores, the instructions may further cause the processor to normalize each orientation classification from among the plurality of orientation classifications based on a corresponding angle of rotation applied to the electronic document used to generate the corresponding rotated image, wherein the respective scores are aggregated after the respective scores of each orientation classification have been normalized. For example, the ML orientation classifier 230 may classify the orientation of an input without respect to whether the input has been rotated by the image rotator 410. Thus, each plurality of classifications 510A-D may be relative to the frame of reference of its own rotated image. To normalize the frame of reference back to the original orientation of the electronic document, each of the classifications 510 may be shifted by the same magnitude as, and in the direction opposite to, the angle of rotation applied to create the corresponding rotated image 501A-D. To illustrate, reference will be made to the plurality of classifications 510A-D and normalized plurality of classifications 520A-D. Because the image 501A was not rotated (its corresponding angle of rotation was 0°), the plurality of classifications 510A are normalized by 0°, resulting in no changes. Thus, the plurality of classifications 510A are the same as the normalized plurality of classifications 520A. On the other hand, other rotated images 501B-D were rotated by a non-zero angle of rotation and therefore may be normalized by the non-zero angle of rotation in an opposite direction.
[0031] For example, for rotated image 501B, which was rotated 90°, the plurality of classifications 510B may each be normalized by -90° (where the negative sign indicates an opposite direction) to generate corresponding ones of the normalized plurality of classifications 520B. In other words, each classification of the plurality of classifications 520B may take the score of the classification 510B that maps to it when shifted by -90°. For example, the 0° classification of the classifications 510B may be normalized by -90° to become the 270° classification (where 0-90 is 270 for rotational angle purposes) of the normalized classifications 520B, the 90° classification of the classifications 510B may be normalized by -90° to become the 0° classification of the normalized classifications 520B, the 180° classification of the classifications 510B may be normalized by -90° to become the 90° classification of the normalized classifications 520B, and the 270° classification of the classifications 510B may be normalized by -90° to become the 180° classification of the normalized classifications 520B. Other ones of the plurality of classifications 510C-D may be similarly normalized. After normalization, each of the plurality of classifications 520A-D may refer to the original orientation of the electronic document 501.
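The shift described above amounts to subtracting the applied angle of rotation from each class label, modulo 360, while keeping the score attached to it. A minimal sketch follows; the function name `normalize_scores` and the score values are hypothetical and are not taken from FIG. 5.

```python
def normalize_scores(scores, applied_rotation):
    """Re-label each class by (label - applied_rotation) mod 360.

    The scores themselves are unchanged; only the class each score is
    attached to moves, so that it refers to the non-rotated document.
    """
    return {(label - applied_rotation) % 360: score
            for label, score in scores.items()}

# Hypothetical classifier output for an image that was rotated by 90 degrees.
hypothetical_scores = {0: 0.01, 90: 0.02, 180: 0.95, 270: 0.02}
normalized = normalize_scores(hypothetical_scores, applied_rotation=90)
# 0 -> 270, 90 -> 0, 180 -> 90, 270 -> 180
```

With an applied rotation of 0 the mapping is the identity, which matches the case of the non-rotated image.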
[0032] In some examples, to aggregate the respective scores, the instructions may further cause the processor to determine an average score of each normalized orientation classification. For example, the normalized 0° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.2425, the normalized 90° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.5125, the normalized 180° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.005, and the normalized 270° classification of the plurality of classifications 520A-D may be averaged to obtain an average score of 0.24. The final orientation classification 540 may be selected based on a highest average. As shown in FIG. 5, for example, the highest average may correspond to the 90° classification.
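The averaging and selection step can be sketched as below. The per-image score values are hypothetical (the individual scores behind the averages quoted above are not reproduced here), and `average_and_select` is an illustrative name.

```python
def average_and_select(normalized_sets):
    """Average each class across all normalized score sets and pick the top one."""
    labels = normalized_sets[0].keys()
    count = len(normalized_sets)
    averages = {label: sum(s[label] for s in normalized_sets) / count
                for label in labels}
    final = max(averages, key=averages.get)
    return final, averages

# Hypothetical normalized scores, one dict per rotated image.
hypothetical_sets = [
    {0: 0.10, 90: 0.80, 180: 0.05, 270: 0.05},
    {0: 0.20, 90: 0.70, 180: 0.05, 270: 0.05},
    {0: 0.05, 90: 0.90, 180: 0.00, 270: 0.05},
    {0: 0.60, 90: 0.30, 180: 0.05, 270: 0.05},
]
final_classification, average_scores = average_and_select(hypothetical_sets)
```

Note that the fourth hypothetical set favors 0° on its own, yet the average across all four still selects 90°; a single outlying vote is outweighed by the others.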
[0033] In some examples, the instructions may further cause the processor to determine, based on the final orientation classification 540, a corrective angle of rotation to apply to the electronic document to achieve a target orientation, the corrective angle of rotation being selected from among zero and non-zero values, wherein a zero value for the corrective angle of rotation indicates that the electronic document is already in the target orientation. In some of these examples, the instructions may further cause the processor to electronically rotate the electronic document based on the corrective angle of rotation. For example, the electronic document 501 may be determined to be at a 90° orientation based on the final orientation classification 540, and the angle of correction may be determined as the same magnitude as, and the opposite direction of, the final orientation classification 540. In the illustrated example, the electronic document 501 may be digitally rotated by -90° to correct the orientation of the electronic document 501.
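A minimal sketch of deriving and applying the corrective angle is given below, assuming a NumPy array document, counter-clockwise positive angles, and multiples of 90 degrees. The names `corrective_angle` and `apply_correction` are illustrative assumptions.

```python
import numpy as np

def corrective_angle(final_orientation):
    """Same magnitude, opposite direction; 0 means no rotation is needed."""
    return -final_orientation

def apply_correction(document, angle):
    """Rotate by a signed multiple of 90 degrees (counter-clockwise positive)."""
    if angle == 0:
        return document
    return np.rot90(document, k=(angle // 90) % 4)

# A document detected at a 90 degree orientation is rotated by -90 degrees.
upright = np.arange(6).reshape(2, 3)
skewed = np.rot90(upright, k=1)
corrected = apply_correction(skewed, corrective_angle(90))
```

Reducing the quarter-turn count modulo 4 means a correction of -90° is carried out as three counter-clockwise quarter turns, which yields the same result as one clockwise quarter turn.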
[0034] In some examples, the processor 102 may perform image processing on the electronic document 501 after the electronic document 501 is rotated to correct the orientation of the electronic document 501. A performance of such image processing may be improved based on the corrected orientation of the electronic document 501. For example, the image processing may include OCR on the electronic document 501, which may be improved by facilitating letter and word recognition when the correct orientation of the electronic document 501 is used. In another example, the image processing may include print formatting on the electronic document 501, which may be improved by facilitating proper print orientations when the correct orientation of the electronic document 501 is used.
[0035] FIG. 2 depicts a block diagram 200 of an example of training the ML orientation classifier 230 based on a training corpus 210 of labeled documents 212A-D that are labeled with respective orientations. The labeled documents 212A, 212B, 212C, 212D may respectively include a collection of electronic documents that have been labeled (such as by a human annotator) as being in a 0°, 90°, 180°, and 270° orientation. Thus, the training corpus 210 may include various labeled examples of electronic documents that are in various orientations. In some examples, a labeled document 212A-D may include image objects, text objects, and/or other type of content. As such, ML training 220 may include image object training 222, text object training 224, and/or other types of content training to correlate objects found in the labeled documents 212A-D with respective orientation labels.
[0036] In some examples, image object training 222 may include identifying image objects in labeled documents 212A-D. In some examples, such image identification may be automated or may be labeled by human annotators as well. In some examples, text object training 224 may include identifying text objects in labeled documents 212A-D. In some examples, such text identification may be automated or may be labeled by human annotators as well. In some examples, image object training 222 and text object training 224 may be merged to identify pixel features (regardless of whether such pixel features were from an image object or text object).
[0037] In some examples, ML training 220 may be based on a CNN. A CNN may assign learnable weights and biases to various objects, such as image and/or text objects in the labeled documents 212A-D in order to distinguish the objects from one another. In the context of orientation classification, the weights and biases may reflect distinctions in objects that are oriented one way or another.
[0038] Based on this distinction of objects in the electronic document and the knowledge of the orientations of the labeled documents 212A-D obtained during ML training 220, the ML orientation classifier 230 may output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels. The orientation classifications may include a set of orientations, where each orientation is associated with a respective probability that the orientation is correct.
[0039] FIG. 3 depicts an example 300 of orientation classifications 310 generated by the ML orientation classifier 230. The ML orientation classifier 230 may take, as an input, an electronic document 301. The correct orientation of the electronic document 301 may be unknown. The ML orientation classifier 230 may output, based on the input electronic document 301, a plurality of orientation classifications 310 that include or are associated with a corresponding plurality of scores 312. Each orientation classification of the plurality of orientation classifications 310 may be a prediction by the ML orientation classifier 230 that the electronic document 301 is in a corresponding orientation. Each score of the plurality of scores 312 may indicate a probability that the corresponding orientation is correct.
[0040] To illustrate, the plurality of orientation classifications 310 may include four classifications output by the ML orientation classifier 230, although other numbers of classifications may be output. For example, the plurality of orientation classifications 310 may include a 0° classification, a 90° classification, a 180° classification, and a 270° classification. One of the plurality of orientation classifications 310 may correspond to a correct orientation while the others may correspond to incorrect orientations.
[0041] In the illustrated example, the 0° classification may correspond to a prediction that the electronic document 301 is in a correct orientation, the 90° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 90° relative to the correct orientation, the 180° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 180° relative to the correct orientation, and the 270° classification may correspond to a prediction that the electronic document 301 is in an incorrect orientation that is rotated 270° relative to the correct orientation. As also illustrated, the plurality of scores 312 may sum to a probability of 1.0 (or 100 percent). Thus, as illustrated, the ML orientation classifier 230 predicts that the electronic document 301 is in a 0° (correct) orientation with 8 percent probability, a 90° incorrect orientation with 76 percent probability, a 180° incorrect orientation with 16 percent probability, and a 270° incorrect orientation with 5 percent probability. It should be noted that the plurality of scores 312 may be represented using any scoring value, not only decimal-based probabilities.
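As a rough illustration of how four scores that sum to 1.0 might be produced, the sketch below assumes the classifier ends in a softmax layer, a common choice for CNN classifiers that the description does not specify. The function names and the raw input values are hypothetical and are not taken from FIG. 3.

```python
import math

ORIENTATIONS = (0, 90, 180, 270)  # degrees; the four labels used in the example


def softmax(raw_scores):
    """Convert raw classifier outputs into probabilities that sum to 1.0."""
    shift = max(raw_scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(raw_scores):
    """Pair each orientation label with its probability (assumed output format)."""
    return dict(zip(ORIENTATIONS, softmax(raw_scores)))


# Hypothetical raw outputs for one input document.
scores = classify([0.2, 2.5, 0.9, -0.3])
most_likely = max(scores, key=scores.get)
print(scores, most_likely)
```

Each value in `scores` lies between 0 and 1, and the label with the largest raw output receives the largest probability.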
[0042] FIG. 4 depicts an example of a pipeline 400 for detecting and correcting an orientation of an electronic document. The pipeline 400 may include an electronic document generator 402, an image rotator 410, the ML orientation classifier 230, a score normalizer 420, and a pondered voting 430. The electronic document generator 402, the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430 may include instructions fetched and executed by a processor (such as processor 102 illustrated in FIG. 1) and/or hardware. In some examples, the electronic document generator 402, the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430 may be under the control of a processor, such as the processor 102.
[0043] In some examples, the electronic document generator 402 may generate the electronic document 401. For example, the electronic document generator 402 may include a scanner device, such as one incorporated in a flatbed or mobile type of scanner, a camera, such as one incorporated in a mobile phone, and/or other type of document generator, including applications that may convert an electronic document from one format to another format.
[0044] In some examples, the image rotator 410 may electronically rotate the electronic document 401 based on angles of rotation 0°, 90°, 180°, and 270° to generate four rotated images (conceptually depicted as four arrows). Other numbers of angles of rotation may be used instead.
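The four-way rotation can be sketched on a toy pixel grid. This is a minimal illustration, assuming a clockwise rotation direction and a nested-list image representation; neither is stated in the description, and `rotate_cw` is a hypothetical helper.

```python
ANGLES = (0, 90, 180, 270)  # angles of rotation, in degrees


def rotate_cw(grid, angle):
    """Rotate a 2D pixel grid clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    out = [list(row) for row in grid]
    for _ in range((angle % 360) // 90):
        # Reversing the rows and transposing yields one clockwise quarter turn.
        out = [list(row) for row in zip(*out[::-1])]
    return out


document = [[1, 2, 3],
            [4, 5, 6]]
rotated_images = {angle: rotate_cw(document, angle) for angle in ANGLES}
for angle, image in rotated_images.items():
    print(angle, image)
```

Every pixel value survives each rotation unchanged, only its position moves, and rotating the 270° image by a further 90° restores the original grid.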
[0045] In some examples, the image rotator 410 may electronically rotate the electronic document 401 based on interpolation. For example, the image rotator 410 may map a pixel of the electronic document 401 from an original position in a pixel grid to an interpolated position in the pixel grid based on the angle of rotation. In some examples, the angles of rotation may be in increments of 90° to avoid lossy interpolation. A rotation by a multiple of 90° may be lossless because each pixel may be repositioned exactly onto another pixel position, while a rotation by any other angle may reposition the original pixel onto a border between two pixels. In the latter case, the original pixel may be divided, resulting in loss of data at the divided pixel.
[0046] In some examples, the ML orientation classifier 230 may take as input each image of the rotated images from the image rotator 410. The ML orientation classifier 230 may output, for each rotated image, a plurality of classifications, including a corresponding plurality of scores. In other words, in the illustrated example, the ML orientation classifier 230 may be invoked four times, once for each rotated image from the image rotator 410. Thus, the ML orientation classifier 230 may generate four sets of orientation classifications with corresponding scores.
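The reason rotations in 90° increments need no interpolation can be checked numerically. The sketch below rotates an integer pixel coordinate about the origin with the standard 2D rotation formula and tests whether the result lands on an integer grid position. The helper names and the tolerance are assumptions made for this illustration.

```python
import math


def rotate_point(x, y, angle_degrees):
    """Rotate the point (x, y) about the origin by the given angle."""
    theta = math.radians(angle_degrees)
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def lands_on_grid(x, y, angle_degrees, tol=1e-9):
    """Return True if the rotated point coincides with an integer pixel position."""
    rx, ry = rotate_point(x, y, angle_degrees)
    return abs(rx - round(rx)) < tol and abs(ry - round(ry)) < tol


for angle in (0, 90, 180, 270, 45):
    print(angle, lands_on_grid(2, 1, angle))
```

For the pixel at (2, 1), the multiples of 90° map onto integer positions, while 45° maps onto a fractional position that would have to be interpolated between neighboring pixels.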
[0047] In some examples, the score normalizer 420 may normalize the scores to account for the angle of rotation applied to the electronic document 401. Such normalization may adjust the classifications and scores to correspond to the original (non-rotated) orientation of the electronic document 401.
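One plausible reading of this normalization is a relabeling step: if a rotated image is classified as being in orientation c after a rotation of r was applied, the score is credited to orientation (c - r) mod 360 of the original document. The sketch below assumes that orientation labels and applied rotations use the same direction convention, which the description does not state; `normalize` and the score values are hypothetical.

```python
def normalize(scores, applied_angle):
    """Re-express scores for a rotated image relative to the non-rotated document.

    `scores` maps an orientation label (degrees) to a probability. The label
    shift assumes rotations and labels share one direction convention.
    """
    return {(label - applied_angle) % 360: score
            for label, score in scores.items()}


# Hypothetical scores for an image produced by rotating the document by 90 degrees.
scores_for_rotated = {0: 0.05, 90: 0.10, 180: 0.80, 270: 0.05}
normalized = normalize(scores_for_rotated, 90)
print(normalized)
```

Here the rotated image looks 180° off, so the original document is credited with a 90° orientation at the same 0.80 score. Normalizing with an applied angle of 0 leaves the scores unchanged.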
[0048] In some examples, the pondered voting 430 may determine an aggregate (such as a sum and/or an average) of the classification scores corresponding to each orientation classification to generate aggregate scores.
[0049] Based on the pondered voting 430, a final orientation classification 440 may be generated. A corrective angle of rotation 450 may be determined based on the final orientation classification 440. For example, the corrective angle of rotation 450 may have the same magnitude as the final orientation classification 440 but the opposite direction. In some examples, a corrected electronic document 403 may be generated by invoking the image rotator 410 to electronically rotate the electronic document 401 based on the corrective angle of rotation 450.
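The voting and corrective-angle steps can be sketched as follows, assuming the scores have already been normalized to the non-rotated document and that averaging is the chosen aggregate. The function names and score values are hypothetical.

```python
def aggregate(normalized_sets):
    """Average the score of each orientation label across all rotated images."""
    labels = normalized_sets[0].keys()
    count = len(normalized_sets)
    return {label: sum(s[label] for s in normalized_sets) / count
            for label in labels}


def final_classification(aggregate_scores):
    """Pick the orientation label with the highest aggregate score."""
    return max(aggregate_scores, key=aggregate_scores.get)


def corrective_angle(final_label):
    """Same magnitude as the detected orientation, opposite direction."""
    return -final_label


# Hypothetical normalized scores, one dictionary per rotated image.
normalized_sets = [
    {0: 0.10, 90: 0.70, 180: 0.15, 270: 0.05},
    {0: 0.05, 90: 0.80, 180: 0.10, 270: 0.05},
    {0: 0.20, 90: 0.40, 180: 0.30, 270: 0.10},
    {0: 0.10, 90: 0.60, 180: 0.20, 270: 0.10},
]
aggregate_scores = aggregate(normalized_sets)
final = final_classification(aggregate_scores)
print(final, corrective_angle(final))
```

A corrective angle of zero means the document is already upright. A negative angle such as -90 is equivalent to a 270° rotation in the positive direction.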
[0050] The pipeline 400 may be implemented according to various architectures (not shown). For example, the pipeline 400 may be implemented entirely within the apparatus 100 illustrated in FIG. 1. In these examples, the apparatus 100 may generate the electronic document 401, such as by scanning or taking a photograph of a physical document, and also detect and correct the orientation of the electronic document 401. In some of these examples, the apparatus 100 may include a scanner device or a mobile phone that may detect and/or correct the orientation of the electronic document 401 using the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430.
[0051] In other examples, the pipeline 400 may be implemented in a distributed manner. In these examples, the electronic document generator 402 may be separate from the apparatus 100. In some of these examples, the apparatus 100 may include a server device that receives the electronic document 401 from the electronic document generator 402 and then detects and/or corrects the orientation of the electronic document 401 using the image rotator 410, the ML orientation classifier 230, the score normalizer 420, and/or the pondered voting 430.
[0052] Various manners in which the apparatus 100 may operate to detect and correct an orientation of an electronic document are discussed in greater detail with respect to the method 600 depicted in FIG. 6. It should be understood that the method may include additional operations and that some of the operations described therein may be removed and/or modified without departing from the scope of the method. The description of the method may be made with reference to the features depicted in FIGS. 1 and 5 for purposes of illustration.
[0053] FIG. 6 depicts a flow diagram that illustrates an example method 600 of determining and adjusting an orientation of an electronic document (such as electronic document 501). At 602, the method 600 may include accessing an electronic document. At 604, the method 600 may include electronically rotating the electronic document by a plurality of angles of rotation (such as 0°, 90°, 180°, and 270° illustrated in FIG. 5) to generate a plurality of rotated images (such as rotated images 501A-D). Each rotated image among the plurality of rotated images may correspond to the electronic document being electronically rotated by a corresponding angle of rotation.
[0054] At 606, the method 600 may include, for each rotated image from among the plurality of rotated images, determining a plurality of orientation classifications (such as a plurality of orientation classifications 510A-D) of the rotated image and a score (such as a score among the plurality of scores 520A-D) for each orientation classification indicating a probability that the orientation classification is correct.
[0055] At 608, the method 600 may include determining an aggregate score based on the scores for the plurality of orientation classifications.
[0056] At 610, the method 600 may include determining a final orientation classification (such as the final orientation classification 540) based on the aggregate score. At 612, the method 600 may include adjusting (such as electronically rotating) an orientation of the electronic document based on the final orientation classification.
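Operations 602 through 612 can be strung together in a small end-to-end sketch. It is illustrative only: `toy_classifier` is a hypothetical stand-in for a trained ML orientation classifier that simply looks for a marker pixel in a corner, the image is a nested list, and rotations are assumed to be clockwise.

```python
ANGLES = (0, 90, 180, 270)


def rotate_cw(grid, angle):
    """Rotate a 2D pixel grid clockwise by a multiple of 90 degrees."""
    out = [list(row) for row in grid]
    for _ in range((angle % 360) // 90):
        out = [list(row) for row in zip(*out[::-1])]
    return out


def toy_classifier(image):
    """Hypothetical classifier: an upright page has its marker in the top-left corner."""
    corners = {0: image[0][0], 90: image[0][-1],
               180: image[-1][-1], 270: image[-1][0]}
    return {label: (0.7 if corners[label] == 1 else 0.1) for label in ANGLES}


def correct_orientation(document):
    totals = {label: 0.0 for label in ANGLES}
    for angle in ANGLES:                                  # 604: rotate
        scores = toy_classifier(rotate_cw(document, angle))  # 606: classify
        for label, score in scores.items():
            # Credit the score to the orientation of the non-rotated document.
            totals[(label - angle) % 360] += score
    aggregate = {label: total / len(ANGLES)               # 608: aggregate
                 for label, total in totals.items()}
    final = max(aggregate, key=aggregate.get)              # 610: final classification
    corrected = rotate_cw(document, (-final) % 360)        # 612: adjust
    return final, corrected


UPRIGHT = [[1, 0, 0],
           [0, 0, 0],
           [0, 0, 0]]
scanned = rotate_cw(UPRIGHT, 180)                          # 602: access a document
final, corrected = correct_orientation(scanned)
print(final, corrected == UPRIGHT)
```

Because every rotated copy votes for the same original orientation after relabeling, a single unreliable classification is less likely to decide the outcome than it would be with one classifier call.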
[0057] Some or all of the operations set forth in the method 600 may be included as utilities, programs, or subprograms, in any desired computer accessible medium. In addition, the method 600 may be embodied by computer programs, which may exist in a variety of forms. For example, some operations of the method 600 may exist as machine-readable instructions, including source code, object code, executable code or other formats. Any of the above may be embodied on a non-transitory machine-readable (such as computer-readable) storage medium. Examples of non-transitory machine-readable storage media include computer system RAM, ROM, EPROM, EEPROM, and magnetic or optical disks or tapes. It is therefore to be understood that any electronic device capable of executing the above-described functions may perform those functions enumerated above.
[0058] FIG. 7 depicts a block diagram of an example non-transitory machine-readable storage medium 700 storing instructions for determining a corrective angle of rotation to correct an orientation of an electronic document based on an orientation classification of the electronic document determined by the ML orientation classifier 230.
[0059] The machine-readable instructions 702 may cause the processor (such as processor 102 illustrated in FIG. 1) to access an electronic document (such as electronic document 501).
[0060] The machine-readable instructions 704 may cause the processor to electronically rotate the electronic document by a plurality of angles of rotation (such as 0°, 90°, 180°, and 270° illustrated in FIG. 5) to generate a plurality of rotated images (such as rotated images 501A-D). Each rotated image among the plurality of rotated images may correspond to the electronic document being electronically rotated by a corresponding angle of rotation.
[0061] For each rotated image of the plurality of rotated images, the processor may fetch, decode, and execute the machine-readable instructions 706 to provide the rotated image as an input to a machine-learning (ML) classifier (such as the ML orientation classifier 230). The ML classifier may be trained on a training corpus of images (such as training corpus 210) labeled with orientations.
[0062] In some examples, the processor may be caused to obtain as an output of the ML orientation classifier a plurality of orientation classifications (such as the plurality of classifications 510A-D) for the rotated image.
[0063] The machine-readable instructions 708 may cause the processor to determine, based on a plurality of orientation classifications received from the ML orientation classifier, a final orientation classification (such as the final orientation classification 540).
[0064] The machine-readable instructions 710 may cause the processor to determine, based on the final orientation classification, a corrective angle of rotation to apply to the electronic document to achieve a target orientation (such as a correct orientation), the corrective angle of rotation being selected from among zero and non-zero values. In these examples, a zero value for the corrective angle of rotation indicates that the electronic document is already in the target orientation. In some examples, each orientation classification from among the plurality of orientation classifications of each rotated image may be associated with a respective score (such as a score from among each of the plurality of scores 512A-D) indicating the probability that the corresponding orientation classification is correct.
[0065] In some examples, the instructions when executed further cause the processor to determine an average of the respective scores of each orientation classification, and determine the final orientation classification based on the average of the respective scores. In some examples, the instructions when executed further cause the processor to adjust (such as electronically rotate) an orientation of the electronic document based on the selected corrective angle if the selected corrective angle is non-zero, and to perform image processing on the electronic document based on the adjusted orientation. In some examples, to perform the image processing, the instructions when executed further cause the processor to perform OCR on the electronic document to recognize text or to prepare the electronic document for printing.
[0066] Although described specifically throughout the entirety of the instant disclosure, representative examples of the present disclosure have utility over a wide range of applications, and the above discussion is not intended and should not be construed to be limiting, but is offered as an illustrative discussion of aspects of the disclosure.
[0067] What has been described and illustrated herein is an example of the disclosure along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the scope of the disclosure, which is intended to be defined by the following claims - and their equivalents - in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Claims
1. An apparatus comprising: a processor; and a non-transitory machine-readable medium on which is stored instructions that when executed by the processor, cause the processor to: access an electronic document; electronically rotate the electronic document by a plurality of angles of rotation to generate a plurality of rotated images; for each rotated image from among the plurality of rotated images: provide the rotated image as an input to a machine-learning (ML) orientation classifier; and determine a final orientation classification of the electronic document based on a plurality of orientation classifications received from the ML orientation classifier.
2. The apparatus of claim 1, wherein the instructions further cause the processor to: electronically rotate the electronic document based on the final orientation classification to correct an orientation of the electronic document.
3. The apparatus of claim 2, wherein the instructions further cause the processor to: determine, based on the final orientation classification, a corrective angle of rotation to apply to the electronic document to achieve a target orientation, the corrective angle of rotation being selected from among zero and non-zero values, wherein a zero value for the corrective angle of rotation indicates that the electronic document is already in the target orientation, and wherein the electronic document is electronically rotated based on the corrective angle of rotation.
4. The apparatus of claim 3, wherein the instructions further cause the processor to: perform image processing on the electronic document after the electronic document is rotated to correct the orientation of the electronic document, a performance of the image processing being improved based on the corrected orientation of the electronic document.
5. The apparatus of claim 1, wherein each orientation classification from among the plurality of orientation classifications of each rotated image comprises a respective score indicating a probability that the orientation classification is correct.
6. The apparatus of claim 5, wherein to determine the final orientation classification of the electronic document, the instructions further cause the processor to: aggregate the respective scores of each orientation classification; and select a final orientation classification based on the aggregated respective scores.
7. The apparatus of claim 6, wherein to aggregate the respective scores, the instructions further cause the processor to: normalize each orientation classification from among the plurality of orientation classifications based on the corresponding angle of rotation applied to the electronic document used to generate the corresponding rotated image, wherein the respective scores are aggregated after the respective scores of each orientation classification have been normalized.
8. The apparatus of claim 7, wherein to aggregate the respective scores, the instructions further cause the processor to: determine an average score of each normalized orientation classification.
9. The apparatus of claim 1, wherein the plurality of angles of rotation is selected from a first set of angles and a second set of angles, the first set of angles comprising zero, 90, 180, and 270 degrees and the second set of angles comprising zero, -90, -180, and -270 degrees.
10. The apparatus of claim 1, wherein the instructions further cause the processor to: train the ML orientation classifier based on graphical and/or text features in the training corpus of images to operate on graphical images and/or text.
11. A computer-implemented method, comprising: accessing an electronic document; electronically rotating the electronic document by a plurality of angles of rotation to generate a plurality of rotated images; for each rotated image from among the plurality of rotated images determining a plurality of orientation classifications and a score for each orientation classification indicating a probability that the orientation classification is correct; determining an aggregate score based on the scores for the plurality of orientation classifications; determining a final orientation classification based on the aggregate score; and adjusting an orientation of the electronic document based on the final orientation classification.
12. The method of claim 11, further comprising: for each rotated image from among the plurality of rotated images: determining, by the processor, a third orientation classification of the rotated image and a third score indicating a third probability that the third orientation classification of the rotated image is correct; determining, by the processor, a fourth orientation classification of the rotated image and a fourth score indicating a fourth probability that the fourth orientation classification of the rotated image is correct; generating, by the processor, an aggregate third score based on each third score determined for each rotated image, the aggregate third score indicating an aggregate probability that the third orientation classification of the rotated image is correct; generating, by the processor, an aggregate fourth score based on each fourth score determined for each rotated image, the aggregate fourth score indicating an aggregate probability that the fourth orientation classification of the rotated image is correct; wherein determining the final orientation classification is based further on the aggregate third score and the aggregate fourth score.
13. The method of claim 11, further comprising: determining a corrective angle of rotation based on the final orientation classification.
14. A non-transitory machine-readable medium on which is stored machine-readable instructions that when executed by a processor, cause the processor to: access an electronic document; electronically rotate the electronic document by a plurality of angles of rotation to generate a plurality of rotated images; for each rotated image from among the plurality of rotated images: provide the rotated image as an input to a machine-learning (ML) orientation classifier; determine, based on a plurality of orientation classifications from the ML orientation classifier, a final orientation classification; and determine, based on the final orientation classification, a corrective angle of rotation to apply to the electronic document to achieve a target orientation.
15. The non-transitory machine-readable medium of claim 14, the instructions when executed further cause the processor to: adjust an orientation of the electronic document based on the final orientation classification; and perform image processing on the electronic document based on the adjusted orientation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2020/030262 WO2021221614A1 (en) | 2020-04-28 | 2020-04-28 | Document orientation detection and correction |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021221614A1 true WO2021221614A1 (en) | 2021-11-04 |
Family
ID=78373782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2020/030262 WO2021221614A1 (en) | 2020-04-28 | 2020-04-28 | Document orientation detection and correction |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021221614A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114120323A (en) * | 2021-11-05 | 2022-03-01 | 北京量子之歌科技有限公司 | Management method, device, equipment and storage medium for bill payment |
US20230162520A1 (en) * | 2021-11-23 | 2023-05-25 | Abbyy Development Inc. | Identifying writing systems utilized in documents |
WO2023123763A1 (en) * | 2021-12-31 | 2023-07-06 | 上海合合信息科技股份有限公司 | Direction correction method and apparatus for document image |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110258150A1 (en) * | 2010-01-15 | 2011-10-20 | Copanion, Inc. | Systems and methods for training document analysis system for automatically extracting data from documents |
US9141607B1 (en) * | 2007-05-30 | 2015-09-22 | Google Inc. | Determining optical character recognition parameters |
US20160105619A1 (en) * | 2014-10-10 | 2016-04-14 | Korea Advanced Institute Of Science And Technology | Method and apparatus for adjusting camera top-down angle for mobile document capture |
US10616443B1 (en) * | 2019-02-11 | 2020-04-07 | Open Text Sa Ulc | On-device artificial intelligence systems and methods for document auto-rotation |
-
2020
- 2020-04-28 WO PCT/US2020/030262 patent/WO2021221614A1/en active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20933217 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20933217 Country of ref document: EP Kind code of ref document: A1 |