CN114140314A - Face image processing method and device - Google Patents
Face image processing method and device
- Publication number
- CN114140314A (application CN202010820696.8A)
- Authority
- CN
- China
- Prior art keywords
- face
- face image
- image
- style
- subsets
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4084—Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/18—Image warping, e.g. rearranging pixels individually
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Abstract
The embodiments of the application provide a face image processing method and device, relating to face image processing technology in the field of Artificial Intelligence (AI). The method includes: acquiring a first face image and a second face image; determining face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining the offset vector corresponding to each face control point, where the first face image is different from the second face image, and the offset vector corresponding to a face control point includes the difference between the coordinates of the corresponding face key point in the second face image and the coordinates of the face control point; inputting the face control points, and their offset vectors multiplied by a weight coefficient, into an image warping algorithm to obtain a geometrically deformed first face image; and outputting the geometrically deformed first face image.
Description
Technical Field
The present application relates to the field of Artificial Intelligence (AI), and in particular, to a method and an apparatus for processing a face image.
Background
A caricature depicts a person (typically the person's face) in an exaggerated manner through sketching, pencil drawing, or other artistic forms. Caricatures are often used for entertainment, as a form of humorous or satirical expression. Techniques for automatically generating face caricatures have recently appeared. As shown in fig. 1, face caricature generation exaggerates the geometric features of a face while preserving the recognizability of the face in the original image (e.g., a self-portrait of a user), and at the same time performs style migration on the image.
Existing face caricature generation algorithms are basically implemented with an end-to-end generative adversarial network (GAN). For example, the currently mainstream face caricature generation networks include CariGANs, WarpGAN, MW-GAN, and the like. As shown in fig. 2, taking CariGANs as an example, after a face image is input into CariGANs, a face caricature can be output.
Even after an end-to-end face caricature generative adversarial network is well designed, it must be trained on a large amount of face caricature data to produce a good caricature effect. However, face caricatures are currently difficult to collect and the collected data sets are small, so the training effect is poor and it is difficult to generate satisfactory face caricatures.
Disclosure of Invention
The embodiment of the application provides a method and a device for processing a face image, and provides a scheme for generating a face cartoon of the face image.
In a first aspect, an embodiment of the present application provides a face image processing method, where the method includes: acquiring a first face image and a second face image; determining face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining the offset vector corresponding to each face control point; the first face image is different from the second face image, and the offset vector corresponding to a face control point includes the difference between the coordinates of the corresponding face key point in the second face image and the coordinates of the face control point; inputting the face control points, and their offset vectors multiplied by a weight coefficient, into an image warping algorithm to obtain a geometrically deformed first face image; when the weight coefficient is a first value, the distinguishing features by which the first face image differs from the second face image are strengthened in the geometrically deformed first face image; when the weight coefficient is a second value, the distinguishing features by which the first face image differs from the second face image are weakened in the geometrically deformed first face image; the distinguishing features include the sizes of the five sense organs; and outputting the geometrically deformed first face image.
The face image processing method provided by the embodiment of the application is equivalent to geometrically deforming the face in the first face image with a mathematical model, rather than with an adversarial network trained on a large amount of face cartoon data. The face geometric deformation effect is therefore explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher. In addition, the embodiment of the present application may directly warp (that is, geometrically deform) the first face image according to the offset vectors and an image warping algorithm (e.g., a Warp algorithm) to obtain the geometrically deformed first face image. The face image is warped directly; there is no need to first generate a face model and then warp the face image through the model.
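To make the control-point step concrete, the following Python sketch computes the offset vectors and applies the weight coefficient before handing the points to a warping routine. It is only an illustration under the assumption that both images have already been aligned and share the same key-point indexing (for example a 68-point landmark scheme); the function and variable names are not taken from the patent.

```python
import numpy as np

def control_point_offsets(src_landmarks, ref_landmarks, control_idx, weight):
    """Weighted offset vectors for the selected face control points.

    src_landmarks / ref_landmarks: (K, 2) key-point coordinates of the
    first and second face images, aligned and indexed identically.
    control_idx: indices of the key points chosen as control points.
    weight: the weight coefficient described above.
    """
    src = np.asarray(src_landmarks, dtype=np.float32)
    ref = np.asarray(ref_landmarks, dtype=np.float32)
    control_points = src[control_idx]            # coordinates of the control points
    offsets = ref[control_idx] - control_points  # second-image key point minus control point
    return control_points, weight * offsets      # what is fed to the image warping algorithm
```

With this sign convention a positive weight moves the control points toward the second face image (weakening the distinguishing features), while a negative weight moves them away from it (strengthening, i.e. exaggerating, those features).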
In one possible implementation, the face control points include partial face key points of the first face image. That is, the face control points are part of the face key points determined from all the face key points of the first face image.
In a possible implementation manner, determining the face control points of the first face image and the offset vectors corresponding to the face control points according to the face key points of the first face image and the face key points of the second face image includes: the coordinates corresponding to the face key points of the first face image are grouped into S first subsets corresponding to the five sense organs, and the mean and/or variance of each first subset is calculated; the coordinates corresponding to the face key points of the second face image are grouped into S second subsets corresponding to the five sense organs, and the mean and/or variance of each second subset is calculated; the S first subsets correspond one-to-one to the S second subsets, where S is an integer greater than or equal to 1; N target facial features of the first face image are determined according to the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets, where N is less than or equal to S; and the face control points of the first face image include the face key points corresponding, in the first face image, to each of the N target facial features. After the face control points and their offset vectors are determined, the face control points, and their offset vectors multiplied by the weight coefficient, can be input into an image warping algorithm to obtain the geometrically deformed first face image. This is equivalent to geometrically deforming the face in the first face image with a mathematical model rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher.
In one possible implementation, determining the N target facial features of the first face image from the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets includes: determining the difference between the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of the corresponding second subset, to obtain S differences; and determining the N first subsets corresponding to the N largest of the S differences, where the five sense organs corresponding to these N first subsets are the N target facial features. After the N target facial features are determined, the face control points and their offset vectors can be determined from them, and the face control points, with their offset vectors multiplied by the weight coefficient, can then be input into an image warping algorithm to obtain the geometrically deformed first face image. This is equivalent to geometrically deforming the face in the first face image with a mathematical model rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher.
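As one possible reading of the subset-statistics comparison, the sketch below selects the N facial features whose statistics differ most between the two images. It uses the variance of each feature's key-point coordinates as the per-feature statistic; the patent allows the mean and/or the variance, so this is just one choice, and all names are illustrative.

```python
import numpy as np

def select_target_features(first_subsets, second_subsets, n):
    """Indices of the N facial features whose statistics differ most.

    first_subsets / second_subsets: lists of (k_i, 2) coordinate arrays,
    one per facial feature, with the i-th entries describing the same
    feature in the first and second face images.
    """
    diffs = []
    for first, second in zip(first_subsets, second_subsets):
        stat_first = abs(np.var(np.asarray(first, dtype=np.float32)))
        stat_second = abs(np.var(np.asarray(second, dtype=np.float32)))
        diffs.append(abs(stat_first - stat_second))
    # the N largest differences identify the N target facial features
    return sorted(np.argsort(diffs)[-n:].tolist())
```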
In one possible implementation, the S first subsets corresponding to the five sense organs include first subsets corresponding to the eyebrows, eyes, nose, mouth, and face contour, respectively. Optionally, the eyebrows, the eyes, and the mouth may each be treated as a whole, in which case S is 5; or the eyes may be split into the left eye and the right eye, the eyebrows into the left eyebrow and the right eyebrow, and the mouth into the upper lip and the lower lip, in which case S is 8.
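For concreteness, a hypothetical grouping of the widely used 68-point landmark scheme into S = 5 subsets could look as follows; the index ranges belong to that scheme and are an assumption made here for illustration, not values given by the patent.

```python
# Hypothetical S = 5 grouping of a dlib-style 68-point landmark scheme.
FIVE_FEATURE_SUBSETS = {
    "face_contour": list(range(0, 17)),
    "eyebrows":     list(range(17, 27)),
    "nose":         list(range(27, 36)),
    "eyes":         list(range(36, 48)),
    "mouth":        list(range(48, 68)),
}
```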
In one possible implementation, the second face image is an average face image. The geometric characteristics of the human face in the first human face image can be exaggerated based on the average face, the generated human face cartoon has higher quality, the human face geometric deformation effect can be explained, and the human face geometric deformation degree is controllable.
In one possible implementation, the second face image is a cartoon face image. The geometric characteristics of the human face in the first human face image can be exaggerated based on the cartoon face, the generated human face cartoon has higher quality, the human face geometric deformation effect can be explained, and the human face geometric deformation degree is controllable.
In one possible implementation, the method further includes: respectively detecting the face positions of the first face image and the second face image; performing first processing on the first face image and the second face image, wherein the first processing comprises image cutting processing and/or image rotation processing; the first processed first facial image and the first processed second facial image are aligned. Therefore, the aligned first face image and the aligned second face image can be directly compared, and face models do not need to be generated respectively according to the first face image and the second face image for comparison.
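A rough sketch of the detection, cropping, and alignment steps is given below. It uses OpenCV's stock Haar-cascade detector and a similarity transform estimated from key points purely as stand-ins; the patent does not prescribe a particular detector or transform, so this is one possible realization under those assumptions.

```python
import cv2
import numpy as np

# Haar-cascade face detector shipped with OpenCV, used here as one possible
# realization of the "detect the face position" step.
_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(img, margin=0.2):
    """Detect the largest face and return the cropped region (image cutting)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, 1.1, 5)   # assumes at least one face is found
    x, y, w, h = max(faces, key=lambda b: b[2] * b[3])
    m = int(margin * w)
    return img[max(0, y - m):y + h + m, max(0, x - m):x + w + m]

def align_to(img_src, landmarks_src, landmarks_dst, size):
    """Rotate/scale/translate img_src so its key points best match landmarks_dst.

    size is the (width, height) of the output image.
    """
    M, _ = cv2.estimateAffinePartial2D(
        np.float32(landmarks_src), np.float32(landmarks_dst))
    return cv2.warpAffine(img_src, M, size)
```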
In one possible implementation, the image warping algorithm is a Warp algorithm. The Warp algorithm can Warp a plurality of face control points at the same time, and the warping efficiency is high.
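The patent does not spell out the internals of the Warp algorithm, so the toy function below only illustrates the general idea of control-point-driven warping: every pixel is displaced by an inverse-distance-weighted blend of the control-point offsets, and all control points are handled in a single pass. Treat it as an assumption-laden stand-in rather than the Warp algorithm itself.

```python
import cv2
import numpy as np

def toy_control_point_warp(img, control_pts, offsets, power=2.0, eps=1e-6):
    """Warp img so that each control point is pushed along its offset vector."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    disp = np.zeros((h, w, 2), dtype=np.float32)
    weights = np.zeros((h, w), dtype=np.float32)
    for (cx, cy), (dx, dy) in zip(control_pts, offsets):
        wgt = 1.0 / (((xs - cx) ** 2 + (ys - cy) ** 2) ** (power / 2) + eps)
        disp[..., 0] += wgt * dx
        disp[..., 1] += wgt * dy
        weights += wgt
    disp /= weights[..., None]
    # backward mapping: the output pixel at (x, y) samples the source at (x, y) - displacement
    map_x = xs - disp[..., 0]
    map_y = ys - disp[..., 1]
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
```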
In a possible implementation manner, before the face control points, and their offset vectors multiplied by the weight coefficient, are input into the image warping algorithm to obtain the geometrically deformed first face image, the method further includes: performing style migration on the first face image according to a style image and a style migration network. That is, the style of the first face image may be migrated first, and the style-migrated first face image may then be geometrically deformed.
In one possible implementation, the method further includes: and carrying out style migration on the first face image after the geometric deformation according to the style image and the style migration network. That is to say, the first facial image may be geometrically deformed first, and then the first facial image after the geometric deformation may be subjected to style migration.
In one possible implementation, the style migration network is a neural network trained on a plurality of style images belonging to at least one of landscape images, comic images, or art images. The style migration network is therefore not limited to being trained on face cartoon images, which avoids the problem that face cartoons are currently difficult to collect and too few in number, leading to a poor training effect. Because landscape images and the like are more abundant and easier to collect, the trained style migration network performs better.
In one possible implementation, the style migration network is an AdaIN style migration network.
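The core of an AdaIN style migration network is the adaptive instance normalization layer, which re-normalizes the content features to the channel-wise statistics of the style features. The NumPy sketch below shows only that layer; a complete network also needs an encoder (typically VGG) and a trained decoder, which are omitted here.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on (C, H, W) feature maps."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps
    # match the style's per-channel mean and standard deviation
    return s_std * (content_feat - c_mean) / c_std + s_mean
```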
In one possible implementation, the second face image is the same as the stylized image. That is, the geometric deformation can be performed according to the second face image, and the style migration can be performed according to the second face image. In this way, the user's operation can be simplified without the need for the user to select multiple times (select the second face image and the style image).
In a second aspect, an embodiment of the present application provides a face image processing method, where the method includes: responding to the operation of a user for carrying out first image processing on a first face image, and displaying a first interface; the first image processing comprises the steps of carrying out geometric deformation processing and style migration processing on the human face in the first human face image; displaying a first face image obtained by geometrically deforming a face in the first face image according to the second face image on a first interface; and responding to the style image selected by the user on the first interface, performing style migration on the geometrically deformed first face image according to the style image, and displaying the style migrated first face image on the first interface. The embodiment of the application provides a scheme for generating a face cartoon of a face image, which can be used for carrying out geometric deformation on a first face image and then carrying out style migration on the first face image after geometric deformation.
In a third aspect, an embodiment of the present application provides a face image processing method, where the method includes: responding to the operation of a user for carrying out first image processing on a first face image, and displaying a first interface; the first image processing comprises the steps of carrying out geometric deformation processing and style migration processing on the human face in the first human face image; performing style migration on the first face image according to the style image in response to the style image selected by the user on the first interface; and geometrically deforming the face in the first face image after the style migration according to the second face image, and displaying the geometrically deformed first face image on a first interface. The embodiment of the application provides a scheme for generating a face cartoon of a face image, which can be used for firstly carrying out style migration on a first face image and then carrying out geometric deformation on the first face image after the style migration.
In one possible implementation, the first interface includes a preview area and a style image selection area; the style image selection area displays style images to be selected or labels of the style images; and the preview area displays the geometrically deformed first face image or the style-migrated first face image. The user can thus select a style image, or a label of a style image, in the style image selection area and see the geometrically deformed or style-migrated first face image in the preview area, which improves the user experience.
In a possible implementation manner, the first interface further includes a reference face image selection area, and the content displayed in the reference face image selection area includes a caricature image to be selected or a label of the caricature image. The user can select the cartoon image or the label of the cartoon image in the reference face image selection area, and the first face image can be geometrically deformed according to the cartoon image or the label of the cartoon image selected by the user. It should be noted that the second face image may be one image of the reference face images.
In one possible implementation, the method further includes: determining face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining the offset vector corresponding to each face control point, where the first face image is different from the second face image, and the offset vector corresponding to a face control point includes the difference obtained by subtracting the coordinates of the face control point from the coordinates of the corresponding face key point in the second face image; and inputting the face control points, and their offset vectors multiplied by a weight coefficient, into an image warping algorithm to obtain a geometrically deformed first face image. The weight coefficient may be a first value, which may be less than 0, in which case the distinguishing features by which the first face image differs from the second face image are strengthened in the geometrically deformed first face image; or the weight coefficient may be a second value, which may be greater than 0, in which case those distinguishing features are weakened in the geometrically deformed first face image. The distinguishing features include the sizes of the five sense organs. Optionally, the distinguishing features may also include the layout of the five sense organs. The face image processing method provided by the embodiment of the application is equivalent to geometrically deforming the face in the first face image with a mathematical model, rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher. In addition, the embodiment of the present application may directly warp (that is, geometrically deform) the first face image according to the offset vectors and an image warping algorithm (e.g., a Warp algorithm) to obtain the geometrically deformed first face image. The face image is warped directly; there is no need to first generate a face model and then warp the face image through the model.
In one possible implementation, the face control points include partial face key points of the first face image.
In a possible implementation manner, determining face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining the offset vectors corresponding to the face control points, includes: the coordinates corresponding to the face key points of the first face image are grouped into S first subsets corresponding to the five sense organs, and the mean and/or variance of each first subset is calculated; the coordinates corresponding to the face key points of the second face image are grouped into S second subsets corresponding to the five sense organs, and the mean and/or variance of each second subset is calculated; the S first subsets correspond one-to-one to the S second subsets, where S is an integer greater than or equal to 1; N target facial features of the first face image are determined according to the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets, where N is less than or equal to S; and the face control points of the first face image include the face key points corresponding, in the first face image, to each of the N target facial features. After the face control points and their offset vectors are determined, the face control points, and their offset vectors multiplied by the weight coefficient, can be input into an image warping algorithm to obtain the geometrically deformed first face image. This is equivalent to geometrically deforming the face in the first face image with a mathematical model rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher.
In one possible implementation, determining the N target facial features of the first face image from the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets includes: determining the difference between the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of the corresponding second subset, to obtain S differences; and determining the N first subsets corresponding to the N largest of the S differences, where the five sense organs corresponding to these N first subsets are the N target facial features. After the N target facial features are determined, the face control points and their offset vectors can be determined from them, and the face control points, with their offset vectors multiplied by the weight coefficient, can then be input into an image warping algorithm to obtain the geometrically deformed first face image. This is equivalent to geometrically deforming the face in the first face image with a mathematical model rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher.
In one possible implementation, the S first subsets corresponding to the five sense organs include first subsets corresponding to eyebrows, eyes, a nose, a mouth, and a face contour, respectively.
In one possible implementation, the first interface further includes a facial feature region displaying images of the S five sense organs or labels of the five sense organs. When one or more facial features in the facial feature region are selected, the first face image with those facial features geometrically deformed is displayed in the preview region. The user can thus select, from the S five sense organs, the ones that need geometric deformation.
In one possible implementation, the N target facial features among the S facial features are selected by default; that is, the N target facial features are geometrically deformed by default. The electronic device can automatically select the N target facial features and geometrically deform them by default without requiring the user to select them, which simplifies the user's operation. In one possible approach, N may be equal to 1.
In one possible implementation, the method further includes: and receiving an operation of selecting or cancelling any one of the S five sense organs by the user, and responding to the operation, and performing or cancelling geometric deformation processing on the corresponding five sense organs in the first face image. Therefore, the user can independently select the five sense organs needing geometric deformation, so that the user has more participation and the user experience is improved.
In a possible implementation manner, the first interface further includes a first control, and the first control is used for controlling the geometric deformation degree of the selected facial features. Therefore, the user can control the geometric deformation degree of the selected five sense organs, so that the user has more participation and the user experience is improved.
In one possible implementation, the second face image is an average face image. The geometric characteristics of the human face in the first human face image can be exaggerated based on the average face, the generated human face cartoon has higher quality, the human face geometric deformation effect can be explained, and the human face geometric deformation degree is controllable.
In one possible implementation, the second face image is a cartoon face image. The geometric characteristics of the human face in the first human face image can be exaggerated based on the cartoon face, the generated human face cartoon has higher quality, the human face geometric deformation effect can be explained, and the human face geometric deformation degree is controllable.
In a fourth aspect, the present application provides a chip system that includes one or more interface circuits and one or more processors. The interface circuit and the processor are interconnected by a line.
The above chip system may be applied to an electronic device including a communication module and a memory. The interface circuit is configured to receive signals from a memory of the electronic device and to transmit the received signals to the processor, the signals including computer instructions stored in the memory. When executed by a processor, the computer instructions may cause an electronic device to perform the method as set forth in the first, second, or third aspect, and any of its possible designs.
Alternatively, the above-described chip system may be applied to a server (server apparatus) including a communication module and a memory. The interface circuit is configured to receive signals from the memory of the server and to send the received signals to the processor, the signals including computer instructions stored in the memory. The server may perform the method as described in the first, second or third aspect and any of its possible designs when the computer instructions are executed by the processor.
In a fifth aspect, the present application provides a computer-readable storage medium comprising computer instructions. The computer instructions, when executed on an electronic device (such as a mobile phone), cause the electronic device to perform the method of the first, second or third aspect and any of its possible designs.
Alternatively, the computer instructions, when executed on a server, cause the server to perform a method as set forth in the first, second or third aspect and any of its possible designs.
In a sixth aspect, the present application provides a computer program product for, when run on a computer, causing the computer to perform the method as set forth in the first, second or third aspect and any one of its possible designs.
In a seventh aspect, an embodiment of the present application provides an image processing apparatus, including a processor, a processor coupled to a memory, where the memory stores program instructions, and when the program instructions stored in the memory are executed by the processor, the apparatus implements the method according to the first aspect, the second aspect, or the third aspect, and any possible design manner thereof. The apparatus may be an electronic device or a server device; or may be an integral part of the electronic device or the server device, such as a chip.
In an eighth aspect, the present application provides an image processing apparatus, which may be divided into different logical units or modules according to functions, and each unit or module performs different functions, so that the apparatus performs the method described in the above first aspect, second aspect, or third aspect, and any possible design manner thereof.
In a ninth aspect, an embodiment of the present application provides an image processing system, which includes an electronic device and a server, where the electronic device and the server respectively perform part of the steps, and cooperate with each other to implement the method described in the first aspect, the second aspect, or the third aspect, and any possible design manner thereof.
It should be understood that, for the beneficial effects that can be achieved by the chip system according to the fourth aspect, the computer readable storage medium according to the fifth aspect, the computer program product according to the sixth aspect, the apparatus according to the seventh aspect, the eighth aspect, and the system according to the ninth aspect, reference may be made to the beneficial effects in the first aspect, the second aspect, or the third aspect, and any possible design manner thereof, and no further description is provided herein.
Drawings
FIG. 1 is a schematic diagram of an original image and a caricature of a human face;
fig. 2 is a schematic diagram of a human face cartoon generation network in the prior art;
fig. 3A is a schematic diagram of style migration according to an embodiment of the present application;
fig. 3B is a schematic diagram of a face key point according to an embodiment of the present application;
fig. 4A is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
fig. 4B is a schematic diagram of a software structure of an electronic device according to an embodiment of the present application;
fig. 5A is a schematic diagram of a processing process of a face image based on an average face according to an embodiment of the present application;
fig. 5B is a schematic diagram of another average face-based facial image processing process provided in the embodiment of the present application;
fig. 6A is a schematic diagram of a process of geometrically deforming a human face based on an average face according to an embodiment of the present application;
fig. 6B is a schematic diagram of another process for geometrically deforming a human face based on an average face according to an embodiment of the present application;
fig. 7 is a schematic diagram of a process for determining a face control point and an offset vector based on an average face according to an embodiment of the present application;
fig. 8 is a schematic diagram of a face control point and an offset vector according to an embodiment of the present application;
fig. 9 is a schematic diagram of a processing process of another average face-based facial image according to an embodiment of the present application;
fig. 10A is a schematic view of a processing process of a human face image based on a cartoon face according to an embodiment of the present application;
fig. 10B is a schematic diagram of a processing process of another cartoon-based face image according to an embodiment of the present application;
fig. 11A is a schematic view of a process of geometrically deforming a human face based on a cartoon face according to an embodiment of the present application;
fig. 11B is a schematic diagram of a process for determining a face control point and an offset vector based on a caricature provided in an embodiment of the present application;
FIG. 12A is a schematic diagram illustrating an input and an output of an image warping algorithm according to an embodiment of the present application;
fig. 12B is a schematic diagram of style migration according to an embodiment of the present application;
fig. 12C is a schematic view of a processing process of another human face image based on a cartoon face according to an embodiment of the present application;
FIG. 13 is a schematic illustration of a display provided by an embodiment of the present application;
FIG. 14 is a schematic illustration of a display provided by an embodiment of the present application;
FIG. 15 is a schematic illustration of a display provided by an embodiment of the present application;
FIG. 16 is a schematic view of a display provided by an embodiment of the present application;
FIG. 17 is a schematic illustration of a display provided by an embodiment of the present application;
FIG. 18A is a schematic view of a display provided in accordance with an embodiment of the present application;
FIG. 18B is a schematic view of a display provided in accordance with an embodiment of the present application;
fig. 19 is a schematic flowchart of a face image processing method according to an embodiment of the present application;
fig. 20 is a schematic flowchart of a face image processing method according to an embodiment of the present application;
fig. 21 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Detailed Description
For clarity and conciseness of the following description of the various embodiments, a brief introduction to related concepts or technologies is first presented:
style migration network: the style migration network may perform style migration on the raw image according to the style image. As shown in fig. 3A, the original image may be a generic image (e.g., a photograph taken by a user) and the stylized image may be an image having a characteristic, such as an abstract drawing created by an artist. The original image and the style image are input into a style migration network, the style migration network can perform style migration on the original image according to the style image, and finally the style migration image can be obtained. The style migration image has the content and layout of the original image, and has the color and texture of the style image, wherein the texture may include features such as lines or patterns on the image.
Average face: the average face may be a face image obtained by averaging facial features (the layout, size, and the like of five sense organs) of a plurality of faces and re-synthesizing the face image. As shown in fig. 3B, the average face may be calculated from a database (e.g., helln dataset) with multiple front, positive poses and facial expressions that are not too exaggerated for human faces.
Alternatively, different average faces may be calculated based on different dimensions, for example, a male average face and a female average face are calculated based on gender dimensions, respectively; or calculate asian average face, african average face, american average face, etc. based on the dimensions of the region. Optionally, multiple dimensions may be superimposed to calculate different average faces, for example, an asian male average face and an asian female average face are calculated based on the dimensions of the region and the gender, respectively, which is not limited in the present application.
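One simple way to obtain the landmark layout of such an average face is to average aligned key-point coordinates over a face database, optionally per dimension (gender, region, and so on). The sketch below only averages landmarks; re-synthesizing an average face image from the averaged features is not shown, and the function name and data layout are illustrative.

```python
import numpy as np

def average_face_landmarks(landmark_sets):
    """Element-wise mean of aligned (K, 2) landmark arrays, one per face image."""
    stack = np.stack([np.asarray(l, dtype=np.float32) for l in landmark_sets])
    return stack.mean(axis=0)

# Different average faces per dimension, e.g. gender (hypothetical data layout):
# female_avg = average_face_landmarks(landmarks_by_gender["female"])
# male_avg   = average_face_landmarks(landmarks_by_gender["male"])
```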
CariGANs: CariGANs is a face cartoon generation network composed of two major sub-networks, a style migration network (CariStyGAN) and a geometric deformation network (CariGeoGAN). The geometric deformation network is responsible for the exaggerated geometric deformation of the face, and the style migration network is responsible for arbitrary style migration of the face image; together they generate the face cartoon. The geometric deformation network and the style migration network are trained together on the same batch of face cartoon data. Because their training processes are closely related, the two networks are coupled and fused with each other, so the face cartoon generated by CariGANs has low interpretability and its result is uncontrollable (that is, the generated face cartoon cannot be controlled manually and can only be produced directly by the trained neural network).
The embodiment of the application provides a face image processing method, which includes: acquiring a first face image and a second face image; determining face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining the offset vectors corresponding to the face control points, where the first face image is different from the second face image; geometrically deforming the face in the first face image according to the face control points and the offset vectors corresponding to the face control points; and outputting the geometrically deformed first face image. The face image processing method provided by the embodiment of the application is equivalent to geometrically deforming the face in the first face image with a mathematical model, rather than with an adversarial network trained on a large amount of face cartoon data, so the face geometric deformation effect is explainable, the degree of geometric deformation is controllable, and the quality of the generated face cartoon can be higher.
The face image processing method provided by the embodiment of the application can be applied to an electronic device. The electronic device may be, for example, a mobile phone, a tablet computer, a desktop computer, a handheld computer, a notebook computer (laptop), an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, and the like; the embodiment of the present application does not limit the specific form of the electronic device. Alternatively, the face image processing method provided by the embodiment of the application can be applied to a server device.
As shown in fig. 4A, the electronic device may be a mobile phone 100. The mobile phone 100 may include a processor 110, an external memory interface 120, an internal memory 121, a USB interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a radio frequency module 150, a communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, a SIM card interface 195, and the like. The sensor module may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The structure illustrated in the embodiment of the present invention does not constitute a limitation on the mobile phone 100. The mobile phone 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The controller can be regarded as the decision maker that directs the various components of the mobile phone 100 to work in coordination according to instructions. It is the neural center and command center of the mobile phone 100. The controller generates an operation control signal according to the instruction operation code and the timing signal, and completes the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor is a cache. It may save instructions or data that the processor has just used or uses cyclically. If the processor needs to use the instruction or data again, it can be called directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor, and thereby improves system efficiency.
In some embodiments, the processor 110 may include an interface. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, the processor may include multiple sets of I2C buses. The processor may be coupled to the touch sensor, charger, flash, camera, etc. via different I2C bus interfaces. For example: the processor may be coupled to the touch sensor via an I2C interface, such that the processor and the touch sensor communicate via an I2C bus interface to implement the touch functionality of the cell phone 100.
The I2S interface may be used for audio communication. In some embodiments, the processor may include multiple sets of I2S buses. The processor may be coupled to the audio module via an I2S bus to enable communication between the processor and the audio module. In some embodiments, the audio module can transmit audio signals to the communication module through the I2S interface, so as to realize the function of answering the call through the bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module and the communication module may be coupled by a PCM bus interface. In some embodiments, the audio module may also transmit the audio signal to the communication module through the PCM interface, so as to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication, with different sampling rates for the two interfaces.
The UART interface is a universal serial data bus used for asynchronous communications. The bus is a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor with the communication module 160. For example: the processor communicates with the Bluetooth module through the UART interface to realize the Bluetooth function. In some embodiments, the audio module may transmit the audio signal to the communication module through the UART interface, so as to realize the function of playing music through the bluetooth headset.
The MIPI interface can be used to connect a processor with peripheral devices such as a display screen and a camera. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, the processor and the camera communicate through a CSI interface to implement the camera function of the handset 100. The processor and the display screen communicate through a DSI interface to implement the display function of the mobile phone 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor with a camera, display screen, communication module, audio module, sensor, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface may be used to connect a charger to charge the mobile phone 100, to transmit data between the mobile phone 100 and a peripheral device, or to connect an earphone and play audio through the earphone. It may also be used to connect other electronic devices such as AR devices.
The interface connection relationships between the modules illustrated in the embodiment of the present invention are only schematic and do not limit the structure of the mobile phone 100. The mobile phone 100 may also adopt interface connection modes different from those in the embodiment, or a combination of multiple interface connection modes.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module may receive charging input from a wired charger via a USB interface. In some wireless charging embodiments, the charging management module may receive a wireless charging input through a wireless charging coil of the cell phone 100. The charging management module can also supply power to the electronic device through the power management module 141 while charging the battery.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module receives the input of the battery and/or the charging management module and supplies power to the processor, the internal memory, the external memory, the display screen, the camera, the communication module and the like. The power management module may also be used to monitor parameters such as battery capacity, battery cycle number, battery state of health (leakage, impedance), etc. In some embodiments, the power management module 141 may also be disposed in the processor 110. In some embodiments, the power management module 141 and the charging management module may also be disposed in the same device.
The wireless communication function of the mobile phone 100 can be implemented by the antenna module 1, the antenna module 2, the rf module 150, the communication module 160, a modem, and a baseband processor.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the cellular network antenna may be multiplexed into a wireless local area network diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.
The RF module 150 may provide wireless communication solutions applied to the mobile phone 100, including second generation (2G), third generation (3G), fourth generation (4G), fifth generation (5G), and the like. The RF module may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and so on. The RF module receives electromagnetic waves through the antenna 1, performs processing such as filtering and amplification on the received electromagnetic waves, and transmits them to the modem for demodulation. The RF module can also amplify the signal modulated by the modem, which is then converted into an electromagnetic wave by the antenna 1 and radiated. In some embodiments, at least some of the functional modules of the RF module 150 may be located in the processor 110. In some embodiments, at least some functional modules of the RF module 150 may be disposed in the same device as at least some modules of the processor 110.
The modem may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker, a receiver, etc.) or displays an image or video through a display screen. In some embodiments, the modem may be a stand-alone device. In some embodiments, the modem may be separate from the processor, in the same device as the rf module or other functional module.
The communication module 160 may provide solutions for wireless communication applied to the mobile phone 100, including wireless local area network (WLAN) (e.g., WiFi), Bluetooth, global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The communication module 160 may be one or more devices integrating at least one communication processing module. The communication module receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on the electromagnetic wave signals, and transmits the processed signals to the processor. The communication module 160 may also receive a signal to be transmitted from the processor, frequency-modulate and amplify the signal, and convert it into electromagnetic waves via the antenna 2 for radiation.
In some embodiments, the antenna 1 of the handset 100 is coupled to the RF module and the antenna 2 is coupled to the communication module, so that the handset 100 can communicate with networks and other devices via wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), LTE, fifth generation new radio (5G NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The mobile phone 100 implements the display function through the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing and is connected with a display screen and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the handset 100 may include 1 or N display screens, with N being a positive integer greater than 1.
As also shown in fig. 4A, the cell phone 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen, an application processor, and the like.
The ISP is used for processing data fed back by the camera. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the handset 100 may include 1 or N cameras, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals; it can process digital image signals as well as other digital signals. For example, when the handset 100 performs frequency bin selection, the digital signal processor performs a Fourier transform or the like on the frequency bin energy.
Video codecs are used to compress or decompress digital video. Handset 100 may support one or more codecs. Thus, the handset 100 can play or record video in a variety of encoding formats, such as: MPEG1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent recognition of the mobile phone 100, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone 100. The external memory card communicates with the processor through the external memory interface to realize the data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the cellular phone 100 by executing the instructions stored in the internal memory 121. The memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like. The data storage area may store data (e.g., audio data, a phonebook, etc.) created during use of the handset 100, and the like. Further, the memory 121 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, another non-volatile solid-state storage device, a universal flash storage (UFS), and the like.
The mobile phone 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module is used for converting digital audio information into analog audio signals to be output and converting the analog audio input into digital audio signals. The audio module may also be used to encode and decode audio signals. In some embodiments, the audio module may be disposed in the processor 110, or some functional modules of the audio module may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The cellular phone 100 can listen to music through a speaker or listen to a hands-free call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the handset 100 receives a call or voice information, it can receive voice by placing the receiver close to the ear.
The microphone 170C, also called a "mic," is used to convert sound signals into electrical signals. When making a call or sending voice information, a user can input a voice signal into the microphone by speaking close to the microphone. The handset 100 may be provided with at least one microphone. In some embodiments, the handset 100 may be provided with two microphones to achieve a noise reduction function in addition to collecting sound signals. In some embodiments, the mobile phone 100 may further include three, four or more microphones to collect sound signals and reduce noise, and may further identify sound sources and implement a directional recording function.
The headphone interface 170D is used to connect a wired headphone. The earphone interface may be a USB interface, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and convert it into an electrical signal. In some embodiments, the pressure sensor may be disposed on the display screen. There are many types of pressure sensors, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, and the like. A capacitive pressure sensor may comprise at least two parallel plates of electrically conductive material. When a force acts on the pressure sensor, the capacitance between the electrodes changes, and the handset 100 determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen, the mobile phone 100 detects the intensity of the touch operation via the pressure sensor. The cellular phone 100 can also calculate the touched position based on the detection signal of the pressure sensor. In some embodiments, touch operations applied to the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
The gyro sensor 180B may be used to determine the motion attitude of the cellular phone 100. In some embodiments, the angular velocity of the handset 100 about three axes (i.e., the x, y, and z axes) may be determined by a gyroscope sensor. The gyro sensor may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyroscope sensor detects the shake angle of the mobile phone 100, and calculates the distance to be compensated for the lens module according to the shake angle, so that the lens can counteract the shake of the mobile phone 100 through reverse movement, thereby achieving anti-shake. The gyroscope sensor can also be used for navigation and body feeling game scenes.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the handset 100 calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by a barometric pressure sensor.
The magnetic sensor 180D includes a Hall sensor. The handset 100 may detect the opening and closing of a flip holster using the magnetic sensor. In some embodiments, when the handset 100 is a flip phone, the handset 100 may detect the opening and closing of the flip based on the magnetic sensor. Features such as automatic unlocking upon flip opening are then set according to the detected opening or closing state of the holster or the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the cellular phone 100 in various directions (typically along three axes). The magnitude and direction of gravity can be detected when the handset 100 is stationary. The acceleration sensor can also be used to recognize the terminal posture, and is applied to landscape/portrait switching, pedometers, and other applications.
A distance sensor 180F is used for measuring distance. The handset 100 may measure distance by infrared or laser. In some embodiments, in a shooting scene, the cell phone 100 may use the distance sensor to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. Infrared light is emitted outward through the light emitting diode, and infrared light reflected from nearby objects is detected using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the cell phone 100; when insufficient reflected light is detected, it can be determined that there is no object near the cellular phone 100. The mobile phone 100 can use the proximity light sensor to detect that the user is holding the mobile phone 100 close to the ear, so as to automatically turn off the screen and save power. The proximity light sensor can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. The mobile phone 100 may adaptively adjust the display screen brightness according to the perceived ambient light level. The ambient light sensor can also be used to automatically adjust the white balance when taking a picture. The ambient light sensor may also cooperate with the proximity light sensor to detect whether the cell phone 100 is in a pocket to prevent inadvertent contact.
The fingerprint sensor 180H is used to collect a fingerprint. The mobile phone 100 can utilize the collected fingerprint characteristics to unlock the fingerprint, access the application lock, take a photograph of the fingerprint, answer an incoming call with the fingerprint, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the handset 100 implements a temperature processing strategy using the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds the threshold, the mobile phone 100 performs a reduction in the performance of the processor located near the temperature sensor, so as to reduce power consumption and implement thermal protection.
The touch sensor 180K is also referred to as a "touch panel." It may be disposed on the display screen and is used to detect a touch operation acting on or near it. The detected touch operation may be passed to the application processor to determine the touch event type, and a corresponding visual output is provided via the display screen.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor may acquire the vibration signal of the bone block vibrated by the human voice. The bone conduction sensor can also contact the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor may also be disposed in the earpiece. The audio module 170 may parse out a voice signal from the vibration signal of the vocal-part bone block obtained by the bone conduction sensor, so as to implement a voice function. The application processor may parse out heart rate information from the blood pressure pulsation signal obtained by the bone conduction sensor, so as to implement a heart rate detection function.
The keys 190 include a power key, a volume key, and the like. The keys may be mechanical keys or touch keys. The cellular phone 100 receives key input and generates key signal input related to user settings and function control of the cellular phone 100.
The motor 191 may generate a vibration cue. The motor can be used for incoming call vibration prompt and can also be used for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The touch operation on different areas of the display screen can also correspond to different vibration feedback effects. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The SIM card interface 195 is used to connect a subscriber identity module (SIM) card. The SIM card can be attached to or detached from the cellular phone 100 by being inserted into or pulled out of the SIM card interface. The handset 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface can support a Nano SIM card, a Micro SIM card, a SIM card, and the like. Multiple cards can be inserted into the same SIM card interface at the same time; the types of the cards may be the same or different. The SIM card interface may also be compatible with different types of SIM cards, as well as with external memory cards. The mobile phone 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the handset 100 employs an eSIM, namely an embedded SIM card. The eSIM card can be embedded in the mobile phone 100 and cannot be separated from the mobile phone 100.
The software system of the mobile phone 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present invention uses an Android system with a layered architecture as an example to exemplarily illustrate a software structure of the mobile phone 100.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. And the layers communicate with each other through an interface. In some embodiments, the Android system is divided into four layers, which are an application layer, an application framework layer, an Android runtime and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages.
As shown in fig. 4B, the application package may include applications such as camera, gallery, calendar, phone call, map, navigation, WLAN, bluetooth, music, video, short message, etc. For example, the face image processing method provided by the embodiment of the present application may be executed in a gallery or a camera.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 4B, the application framework layer may include an activity manager, a window manager, a content provider, a view system, a resource manager, a notification manager, and the like, which is not limited in this embodiment.
Activity manager (ActivityManager): used to manage the lifecycle of each application. Applications typically run in the operating system in the form of an Activity. For each Activity, there is a corresponding application record (ActivityRecord) in the Activity manager, which records the state of that Activity. The Activity manager can use this ActivityRecord as an identifier to schedule the Activity processes of the application.
Window manager (WindowManagerService): used to manage the graphical user interface (GUI) resources used on the screen, and may specifically be used to: obtain the display screen size, create and destroy windows, display and hide windows, lay out windows, manage focus, manage the input method, manage wallpaper, and the like.
The system library and the kernel layer below the application framework layer may be referred to as an underlying system, and the underlying system includes an underlying display system for providing display services, for example, the underlying display system includes a display driver in the kernel layer and a surface manager in the system library. In addition, the bottom layer system in the application further comprises an identification module for identifying the physical form change of the flexible screen, and the identification module can be independently arranged in the bottom layer display system and also can be arranged in a system library and/or a kernel layer.
As shown in fig. 4B, the Android Runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and managing the Android system. The core library comprises two parts: one part is the function libraries that the java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, garbage collection, and the like.
As shown in fig. 4B, the system library may include a plurality of function modules. For example: surface manager (surface manager), Media Libraries (Media Libraries), OpenGL ES, SGL, and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
OpenGL ES is used to implement three-dimensional graphics drawing, image rendering, compositing, and layer processing, among others.
SGL is a drawing engine for 2D drawing.
As shown in fig. 4B, the kernel layer is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, an audio driver, and a sensor driver.
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, unless otherwise specified, "at least one" means one or more, and "a plurality" means two or more. In addition, in order to facilitate clear description of the technical solutions of the embodiments of the present application, the terms "first" and "second" are used in the embodiments of the present application to distinguish between identical or similar items having substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," etc. do not limit quantity or execution order, nor do they denote relative importance.
For convenience of understanding, the following specifically introduces the facial image processing method provided in the embodiment of the present application, taking the first facial image as the facial image selected by the user and the second facial image as the average facial image, with reference to the drawings.
In some embodiments, as shown in fig. 5A or fig. 5B, the flow of the face image processing method may include: inputting the first face image and the average face image into a face geometric deformation module based on the average face image, which outputs the geometrically deformed first face image (for example, the face shape, especially the chin, in the geometrically deformed first face image becomes thinner and sharper); and inputting the geometrically deformed first face image and the style image into a style migration network, finally obtaining the face cartoon.
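To make the decoupling of the two subtasks concrete, the following minimal Python sketch strings them together; the two stages are passed in as callables, and their names are illustrative assumptions rather than functions defined in this application.

```python
# Minimal sketch of the two-stage pipeline of fig. 5A/5B (assumed helper names).

def generate_face_cartoon(first_face, average_face, style_image,
                          geometric_deform, style_transfer):
    """Decoupled pipeline: geometric deformation first, then style migration."""
    # Stage 1: deform the first face image based on the average face image
    deformed = geometric_deform(first_face, average_face)
    # Stage 2: migrate the style of the style image onto the deformed face image
    return style_transfer(deformed, style_image)
```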
The human face geometric deformation module based on the average face image has the function of geometrically deforming the first human face image. The following specifically describes a processing flow of the first face image and the average face image by the face geometric deformation module based on the average face image:
as shown in fig. 6A or fig. 6B, the method flow includes: first, face position detection is performed on a first face image and an average face image. For example, the face position detection may be to detect five key points of the face (including the tip of the nose, the left eye corner of the left eye, the right eye corner of the right eye, the left mouth corner, and the right mouth corner) so as to determine the face position from the five key points of the face.
Then, the first face image and the average face image may be subjected to first processing based on the face position, and the first processing may include image cutting (processing). Taking the first face image as an example, when the first face image is subjected to image cutting, the first face image can be cut to a preset size with the nose tip of the face in the first face image as the center. For example, the first face image may be cut to a size of 256 × 256 pixels. Similarly, the average face image may be cut to a size of 256 × 256 pixels. The cut first face image and the cut average face image are aligned, that is, the nose tip of the face in the cut first face image coincides with the nose tip of the face in the average face image. The algorithm of the image cutting process may be, for example, an affine transformation function (cv::warpAffine).
Optionally, the first processing may further include image rotation (processing). That is, before image cutting is performed on the first face image and the average face image, if the face in the first face image or the average face image is deflected or skewed (for example, the person's head is tilted), the first face image or the average face image may be rotated by a corresponding angle to correct the face position (for example, so that the angle between the line connecting the two eyes of the face and the horizontal line is smaller than a preset threshold). Alternatively, the first face image or the average face image may be rotated to a consistent angle. The algorithm of the image rotation process may be, for example, an affine transformation function (cv::warpAffine).
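As one possible realization of the first processing, the sketch below rotates the image so that the eye line is horizontal and then cuts a 256 × 256 patch centered on the nose tip using OpenCV's affine transform; the rotation criterion and patch size follow the examples above, while the helper signature itself is an assumption.

```python
import cv2
import numpy as np

def first_processing(image, nose_tip, left_eye, right_eye, size=256):
    """Image rotation + image cutting around the nose tip (a sketch)."""
    # Rotation: level the line between the two eyes
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    rot = cv2.getRotationMatrix2D((float(nose_tip[0]), float(nose_tip[1])), angle, 1.0)
    rotated = cv2.warpAffine(image, rot, (image.shape[1], image.shape[0]))

    # Cutting: translate so that the nose tip lands at the patch center
    shift = np.float32([[1, 0, size / 2 - nose_tip[0]],
                        [0, 1, size / 2 - nose_tip[1]]])
    return cv2.warpAffine(rotated, shift, (size, size))
```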
After the first processing is performed on the first face image and the average face image, face keypoint detection may be performed on the first face image and the average face image. Illustratively, the first face image and the average face image may contain 68 face keypoints, respectively. Wherein, the eyebrow can include 10 key points (5 key points for each of the two eyebrows), the eye can include 12 key points (6 key points for each of the two eyes), the nose can include 9 key points, the mouth can include 20 key points, and the face shape (face contour) can include 17 key points. It should be noted that, when different face keypoint detection algorithms are used to perform face keypoint detection on the first face image and the average face image, the first face image or the average face image may include other numbers of face keypoints, for example, 100, 150, 200, and the like, which is not limited in this application.
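For reference, the 68 key point IDs used in the examples of this application can be grouped per five sense organs as sketched below; the face contour range (IDs 1-17) and the mouth range (IDs 49-68) are taken from the figures described later, while the sub-ranges for eyebrows, nose and eyes are assumptions based on the common 68-point convention.

```python
# Assumed 1-based grouping of the 68 face key points by five sense organs
# (17 + 10 + 9 + 12 + 20 = 68).
FIVE_SENSE_ORGANS = {
    "face_contour": list(range(1, 18)),   # 17 key points (IDs 1-17, per fig. 8)
    "eyebrows":     list(range(18, 28)),  # 10 key points, 5 per eyebrow (assumed range)
    "nose":         list(range(28, 37)),  # 9 key points (assumed range)
    "eyes":         list(range(37, 49)),  # 12 key points, 6 per eye (assumed range)
    "mouth":        list(range(49, 69)),  # 20 key points (IDs 49-68, per fig. 12A)
}
```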
In one possible design, after determining the face key points of the first face image and the average face image, the face control points (which may also be referred to as control points for short) of the first face image and the offset vectors corresponding to the face control points may be determined according to all the face key points of the first face image and all the face key points of the average face image.
As shown in fig. 7, when calculating the face control point of the first face image and the offset vector corresponding to the face control point, the facial features of the facial images may be calculated according to the face key point of the first face image and the face key point of the average face image. The facial features can be represented by the mean and/or variance of the coordinates corresponding to the facial key points corresponding to the facial features. Wherein the mean may represent the position of the five sense organs and the variance may represent the size of the five sense organs.
In one possible design, the coordinates corresponding to the face key points of the first face image comprise S first subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each first subset are respectively calculated; coordinates corresponding to the face key points of the second face image comprise S second subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each second subset are respectively calculated; the S first subsets correspond to the S second subsets one by one; wherein S is an integer of 1 or more. For example, the face keypoints of each face image may be divided into 5(S ═ 5) subsets by five sense organs such as eyebrows, eyes, nose, mouth, and face contour, and the mean and/or variance in each of the five sense organs keypoint subsets may be calculated. Then, the N target facial features of the first face image are determined according to the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets. For example, the difference between the absolute value of the mean and/or the variance of each of the S first subsets and the absolute value of the mean and/or the variance of the second subset corresponding to the first subset may be determined, resulting in S differences; and determining N first subsets corresponding to the maximum N differences in the S differences, wherein the corresponding five sense organs of the N first subsets are N target five sense organs, and N is less than or equal to S.
Wherein, the absolute value of the mean and/or the variance of each of the S first subsets may refer to the absolute value of the mean of each of the S first subsets, or may refer to the absolute value of the variance of each of the S first subsets, or may refer to the absolute value of the mean and the absolute value of the variance of each of the S first subsets. Similarly, the absolute value of the mean and/or the variance of each of the S second subsets may refer to the absolute value of the mean of each of the S second subsets, or may refer to the absolute value of the variance of each of the S second subsets, or may refer to the absolute value of the mean and the absolute value of the variance of each of the S second subsets. It should be noted that, when the absolute value of the mean and/or the variance of each of the S first subsets refers to the absolute value of the mean of each of the S first subsets, the absolute value of the mean and/or the variance of each of the S second subsets refers to the absolute value of the mean of each of the S second subsets; when the absolute value of the mean and/or the variance of each of the S first subsets refers to the absolute value of the variance of each of the S first subsets, the absolute value of the mean and/or the variance of each of the S second subsets refers to the absolute value of the variance of each of the S second subsets; when the absolute value of the mean and/or the variance of each of the S first subsets refers to the absolute value of the mean and the absolute value of the variance of each of the S first subsets, the absolute value of the mean and/or the variance of each of the S second subsets refers to the absolute value of the mean and the absolute value of the variance of each of the S second subsets.
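The per-organ statistics and the selection of the N target five sense organs can be sketched as follows; using the norm of the per-subset mean vector as the "absolute value of the mean" is one possible reading of the description above, and the variance could be scored in the same way.

```python
import numpy as np

def select_target_organs(first_kps, second_kps, organ_ids, n):
    """first_kps / second_kps: dict mapping key point ID -> (x, y);
    organ_ids: dict mapping organ name -> list of key point IDs (the S subsets);
    returns the names of the N organs whose statistics differ the most."""
    diffs = {}
    for organ, ids in organ_ids.items():
        a = np.array([first_kps[i] for i in ids], dtype=float)   # first subset
        b = np.array([second_kps[i] for i in ids], dtype=float)  # corresponding second subset
        # Difference of the "absolute values" of the per-subset means (one possible reading);
        # np.var(...) could be compared the same way to use the variance instead.
        diffs[organ] = abs(np.linalg.norm(a.mean(axis=0)) - np.linalg.norm(b.mean(axis=0)))
    # The N organs with the largest differences are the target (distinguishing) organs.
    return sorted(diffs, key=diffs.get, reverse=True)[:n]
```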
For example, as shown in fig. 8, for a face shape of a first face image, corresponding face key points may include face key points with Identifiers (IDs) of 1 to 17, each face key point may correspond to a coordinate (e.g., a coordinate based on a rectangular coordinate system), and a mean and/or variance of coordinates corresponding to the face key points corresponding to the face shape, that is, a mean and/or variance of coordinates corresponding to the face key points with IDs of 1 to 17.
After obtaining the variance and/or mean of each facial feature of the first face image and the average face image, the variance and/or mean corresponding to the same facial feature of the first face image and the average face image may be compared to determine the N target facial features (distinguishing five sense organs) of the first face image. For example, the absolute value of the difference in the variance and/or mean of each of the five sense organs of the first face image and the average face image may be calculated. Each of the five sense organs may correspond to one absolute value, and the absolute values corresponding to the plurality of (e.g., 5) five sense organs are sorted; the N five sense organs with the largest absolute values are those with the largest differences between the first face image and the average face image, and may be considered the distinguishing five sense organs (or distinctive five sense organs) of the first face image.
After the distinguishing five sense organs are determined, the face control points of the first face image can be determined according to the distinguishing five sense organs. For example, all the key points of each of the above N distinguishing five sense organs may be used as the face control points. Alternatively, part of the key points of each of the N distinguishing five sense organs may be used as the face control points of the first face image. N may be preset, and N is an integer greater than or equal to 1. As shown in fig. 6B, the N (e.g., one) distinguishing five sense organs of the first face image and the average face image may be the face shape, in which case some or all of the key points corresponding to the face shape (e.g., IDs 1-17) are the face control points of the first face image.
After the face control points of the first face image are determined, the offset vector corresponding to each face control point of the first face image can be further determined. For example, as shown in fig. 8, assuming that the N (e.g., 1) distinguishing five sense organs of the first face image are the face shape, all the key points corresponding to the face shape may be taken as face control points, that is, the control points corresponding to the face shape may include the face key points with IDs 1-17. For the control point with ID 2, the corresponding offset vector may be the coordinate of the key point with ID 2 on the average face image (for distinction, this key point may be denoted 2'), minus the coordinate of the face key point with ID 2 on the first face image. The offset vectors corresponding to the other control points are calculated in a similar manner, which is not described herein.
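A small sketch of the offset vector computation for the face-shape example above (offset = coordinate on the average face image minus coordinate on the first face image):

```python
def compute_offset_vectors(control_ids, first_kps, average_kps):
    """Returns dict: control point ID -> (dx, dy), e.g. for IDs 1-17 of the face shape."""
    return {
        i: (average_kps[i][0] - first_kps[i][0],   # x on the average face minus x on the first face
            average_kps[i][1] - first_kps[i][1])   # same for y
        for i in control_ids
    }
```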
In another possible design, the face control points (also referred to as control points for short) of the first face image and the offset vectors corresponding to the face control points may be determined according to part of the face key points of the first face image and part of the face key points of the average face image. For example, the face control points and their offset vectors may be determined from the face key points corresponding to the five sense organs selected by the user in the first face image. For a facial feature selected by the user, the variance and mean (denoted A) of the coordinates corresponding to the face key points of that facial feature in the first face image are calculated, the variance and mean (denoted B) of the coordinates corresponding to the face key points of that facial feature in the second face image are calculated, and it is determined whether the difference between the absolute value of A and the absolute value of B meets a preset threshold; if so, the face key points corresponding to the facial feature selected by the user in the first face image are determined as the face control points. For the offset vectors corresponding to the face control points, refer to the above description, which is not repeated herein.
As shown in fig. 6A, after the face control points of the first face image and their corresponding offset vectors are obtained through calculation, the offset vector corresponding to each face control point may be multiplied by a weight coefficient W, where W may be greater than 0 or less than 0. Then, the face in the first face image may be geometrically deformed based on the offset vectors multiplied by the weight coefficient W to obtain the geometrically deformed first face image. Assuming that the offset vector corresponding to a face control point is the coordinate of the corresponding face key point on the average face image minus the coordinate of the face control point on the first face image, when the weight coefficient is less than 0 (e.g., W = -1, -3, etc.), the distinguishing features between the first face image and the second face image are enhanced on the geometrically deformed first face image, that is, the similarity between the first face image and the average face image is weakened. When the weight coefficient W is greater than 0 (e.g., W = 1, 3, etc.), the distinguishing features between the first face image and the second face image are weakened on the geometrically deformed first face image, that is, the face in the first face image approaches the average face in the average face image. The distinguishing features include the size of the five sense organs. Optionally, the distinguishing features may also include the layout of the five sense organs.
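Scaling the offsets by W and computing the target position of each control point can then be sketched as below; with the offsets defined as (average face minus first face), W > 0 pulls the control points toward the average face and W < 0 pushes them away, matching the behavior described above.

```python
def weighted_control_targets(first_kps, offsets, w):
    """first_kps: dict ID -> (x, y) on the first face image;
    offsets: dict ID -> (dx, dy) as computed above; w: weight coefficient W.
    Returns the target coordinates used to drive the geometric deformation."""
    return {
        i: (first_kps[i][0] + w * dx, first_kps[i][1] + w * dy)
        for i, (dx, dy) in offsets.items()
    }
```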
It should be noted that the geometric face deformation may be performed on the first face image before image cutting (which may be referred to as the original image), or on the first face image after image cutting (which may be referred to as the cut image).
For example, taking the face shape as the distinguishing five sense organs between the first face image and the average face image, when the first face image after image cutting (the cut image) is geometrically transformed, the input of the image warping algorithm may be input 1 or input 2. Input 1 may include the cut first face image, the coordinates of the control points of the first face image (for example, some or all of the key points of the face shape, IDs 1-17), and the offset vectors (not 0) corresponding to the control points. It should be noted that, when the offset vector corresponding to a control point is not 0, the position of the control point may be changed according to the magnitude and direction of the offset vector, so as to warp (geometrically deform, or, equivalently, change the pixel positions of) the cut first face image. When the input of the image warping algorithm is input 1, the facial features of the face in the image output by the image warping algorithm may also be deformed, which is caused by the processing characteristics of the image warping algorithm itself. In order to prevent the image warping algorithm from warping the five sense organs other than the face shape as far as possible, the input of the image warping algorithm may be input 2, where input 2 includes the cut first face image, the coordinates of the control points of the first face image (for example, some or all of the key points of the face shape, IDs 1-17), and the offset vectors (not 0) corresponding to the control points. In addition, input 2 may further include the coordinates corresponding to the face key points of the five sense organs other than the face shape (the other five sense organs, IDs 18-68) in the first face image and their offset vectors (which are 0). It should be noted that, when the offset vectors corresponding to the face key points of the other five sense organs in the first face image are 0, the positions of these face key points are not changed; that is, the face key points of the other five sense organs in the cut first face image are "fixed" so that they are not warped by the image warping algorithm.
Still taking the face shape as the distinguishing five sense organs between the first face image and the average face image, when the first face image before image cutting (i.e., the original image) is geometrically transformed, the input of the image warping algorithm may be input 3 or input 4. Input 3 may include the face image before cutting (i.e., the original image), the coordinates of the control points of the original image (e.g., some or all of the key points of the face shape, IDs 1-17), and the offset vectors (not 0) corresponding to the control points; the control points of the original image and their offset vectors may be obtained by mapping from the control points of the cut first face image and their offset vectors. When the offset vector corresponding to a control point of the original image is not 0, the position of the control point may be changed according to the magnitude and direction of the offset vector, thereby warping the original image. When the input of the image warping algorithm is input 3, the facial features of the face in the image output by the image warping algorithm may also be deformed due to the processing characteristics of the image warping algorithm itself. In order to prevent the image warping algorithm from warping the five sense organs other than the face shape as far as possible, the input of the image warping algorithm may be input 4, where input 4 includes the original image, the coordinates of the control points of the original image (for example, some or all of the key points of the face shape, IDs 1-17), and the offset vectors (not 0) corresponding to the control points; the control points of the original image and their offset vectors may be obtained by mapping from the control points of the cut first face image and their offset vectors. Input 4 may further include the coordinates corresponding to the face key points of the five sense organs other than the face shape in the original image and their offset vectors (which are 0); the face key points of the five sense organs other than the face shape in the original image may be obtained by mapping from the face key points of the corresponding five sense organs of the cut first face image. When the offset vectors corresponding to the face key points of the other five sense organs in the original image are 0, the positions of these face key points are not changed; that is, the face key points of the other five sense organs in the original image are "fixed" so that they are not warped by the image warping algorithm. The image warping algorithm may be, for example, a warp algorithm.
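The assembly of "input 2"/"input 4"-style warp inputs, where the control points carry weighted non-zero offsets and the key points of the other five sense organs are pinned with zero offsets, can be sketched as follows; `warp_image` stands for any control-point-based image warping routine and is not a specific library call.

```python
def build_warp_correspondences(control_kps, other_kps, offsets, w):
    """control_kps: key points of the distinguishing organ(s) (ID -> (x, y));
    other_kps: key points of the remaining five sense organs (ID -> (x, y));
    offsets: ID -> (dx, dy) for the control points; w: weight coefficient W.
    Returns (src, dst) point lists for a control-point warping algorithm."""
    src, dst = [], []
    for i, (x, y) in control_kps.items():
        src.append((x, y))
        dst.append((x + w * offsets[i][0], y + w * offsets[i][1]))  # non-zero offset
    for x, y in other_kps.values():
        src.append((x, y))
        dst.append((x, y))  # zero offset: these key points stay fixed under the warp
    return src, dst

# Usage sketch (warp_image is a hypothetical control-point warping routine):
#   src, dst = build_warp_correspondences(face_shape_kps, other_organ_kps, offsets, w=-1.0)
#   deformed = warp_image(cut_first_face, src, dst)
```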
After the first face image is subjected to the geometric deformation processing, style migration can be performed on the geometrically deformed first face image. The style image and the geometrically deformed first face image may be input to a style migration network (e.g., an adaptive instance normalization (AdaIN) style migration network, as in "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization"), and the style migration network may perform style migration on the geometrically deformed first face image according to the style of the style image. The style migration network may be a neural network trained from a plurality of style images belonging to at least one of landscape images, comic images, or art images.
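The adaptive instance normalization operation at the core of such an AdaIN-style migration network is shown below as a sketch; this is the standard AdaIN formula, not an implementation of the full encoder-decoder network used in this application.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization on (N, C, H, W) feature maps: normalize the
    content features per channel, then re-scale and shift them with the channel-wise
    statistics of the style features."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True)
    return s_std * (content_feat - c_mean) / c_std + s_mean
```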
In other embodiments, as shown in fig. 9, the flow of the face image processing method may include: inputting the first face image and the style image into a style migration network, wherein the style migration network can output the first face image after style migration; and inputting the first face image and the average face image after the style migration into a face geometric deformation module based on the average face image, and finally outputting a face cartoon.
In a possible design, before the first face image is input into the style migration network, the face key points of the first face image need to be detected. This is because the style migration network may change the appearance of the first face image, which can make it difficult to detect the face key points afterwards and thus affect the subsequent geometric deformation of the first face image. For the functions and processing flows of the style migration network and the face geometric deformation module based on the average face image, refer to the above description, which is not repeated herein.
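The alternative order of fig. 9 only swaps the two stages and moves key point detection in front of the style migration; a sketch under the same assumed callables as before:

```python
def generate_face_cartoon_style_first(first_face, average_face, style_image,
                                      detect_keypoints, style_transfer, geometric_deform):
    """Sketch of the fig. 9 flow: detect key points on the original first face image
    (before style migration alters its appearance), run style migration first, then
    apply the average-face-based geometric deformation using those key points."""
    keypoints = detect_keypoints(first_face)              # detect before style migration
    stylized = style_transfer(first_face, style_image)    # style migration network
    return geometric_deform(stylized, average_face, keypoints)
```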
In the face image processing method based on the average face image, the face geometric deformation module based on the average face image can perform geometric deformation on the face in the first face image based on the average face image, and can exaggerate or weaken the characteristics of the face in the first face image, while the style migration network can perform style migration on the geometrically deformed first face image based on the style image, so as to finally obtain the face cartoon corresponding to the first face image. Compared with the prior art, the face cartoon generation task is decoupled into two completely independent subtasks: face geometric deformation and style migration. The face geometric deformation subtask adopts a face geometric deformation method based on the average face image, which amounts to a mathematical model rather than a generative adversarial network obtained by training on a large amount of face cartoon data, and can exaggerate the geometric characteristics of the face in the first face image based on the average face. As a result, the quality of the generated face cartoon is higher, the face geometric deformation effect is explainable, and the degree of face geometric deformation is controllable.
The following specifically introduces the face image processing method provided by the embodiment of the present application, taking the first face image as the face image selected by the user and the second face image as the cartoon face image, by way of example with reference to the accompanying drawings.
In some embodiments, as shown in fig. 10A or fig. 10B, the flow of the face image processing method may include: inputting the first face image and the cartoon face image into a face geometric deformation module based on the cartoon face image, wherein the face geometric deformation module can output the first face image after geometric deformation; and inputting the first face image and the style image after the geometric deformation into a style migration network, and finally outputting the face cartoon.
The cartoon face image-based face geometric deformation module has a function of geometrically deforming a first face image. The following specifically describes a processing flow of the first face image and the cartoon face image by the face geometric deformation module based on the cartoon face image:
as shown in fig. 11A, the method flow includes: first, face position detection is performed on the first face image, for example, the face position detection may be to detect five key points of a face (including a nose tip, a left eye corner of a left eye, a right eye corner of a right eye, a left mouth corner, and a right mouth corner) so as to determine a face position according to the five key points of the face. Then, a first processing may be performed on the first face image based on the face position, where the first processing may include image cutting and image rotation, and the specific process may refer to the above related description, which is not described herein again. After the first face image is subjected to the first processing, face key point detection can be performed on the first face image. Namely, the face key points of the first face image are obtained by prediction of a face key point detection algorithm. It should be noted that, in the embodiment of the present application, the face key points of the cartoon face image may be obtained by performing first processing on the cartoon face image and then performing manual calibration.
After the face key points of the first face image and the cartoon face image are determined, the face control points (also referred to as control points for short) of the first face image and the offset vectors corresponding to the face control points can be determined according to the face key points of the first face image and the face key points of the cartoon face image. As shown in fig. 11B, when calculating the face control points of the first face image and their corresponding offset vectors, the facial features of the face images may be calculated according to the face key points of the first face image and the face key points of the cartoon face image. The facial features can be represented by the mean and/or variance of the coordinates corresponding to the face key points of each facial feature. After obtaining the variance and/or mean of each facial feature of the first face image and the cartoon face image, the variance and/or mean corresponding to the same facial feature of the two images may be compared to determine the N target facial features (distinguishing five sense organs), where N may be preset and is an integer greater than or equal to 1. Alternatively or additionally, distinguishing five sense organs selected by the user may be received. For example, the user may add other five sense organs on the basis of the N distinguishing five sense organs determined according to the variance and/or mean, or may cancel some or all of the N distinguishing five sense organs determined according to the variance and/or mean and select other five sense organs as the distinguishing five sense organs. After the distinguishing five sense organs are determined, the face control points of the first face image can be determined according to the distinguishing five sense organs. For example, all the key points of the above N distinguishing five sense organs (that is, all the key points of each of the N distinguishing five sense organs) may be used as the face control points. Alternatively, part of the key points of each of the N distinguishing five sense organs may be used as the face control points of the first face image. After the face control points of the first face image are determined, the offset vector corresponding to each face control point of the first face image can be further determined.
As shown in fig. 11A, after the face control points of the first face image and their corresponding offset vectors are obtained through calculation, the offset vector corresponding to each face control point may be multiplied by a weight coefficient W, where W may be greater than 0 or less than 0. Assuming that the offset vector corresponding to a face control point is the coordinate of the face control point minus the coordinate of the corresponding face key point in the cartoon face image, when the weight coefficient W is less than 0 (for example, W = -3), the distinguishing features between the first face image and the cartoon face image are weakened on the geometrically deformed first face image, that is, the similarity between the first face image and the cartoon face image is enhanced, so that the face in the first face image approaches the cartoon face in the cartoon face image; when the weight coefficient is greater than 0 (e.g., W = 3), the distinguishing features between the first face image and the cartoon face image are enhanced on the geometrically deformed first face image, achieving the effect of reducing the similarity between the first face image and the cartoon face image. The distinguishing features include the size of the five sense organs and the layout of the five sense organs. Then, the face in the first face image may be geometrically deformed based on the offset vectors multiplied by the weight coefficient W to obtain the geometrically deformed first face image. It should be noted that the geometric face deformation may be performed on the first face image before cutting (i.e., the original image), or on the first face image after cutting.
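A short worked example of the sign convention in this embodiment, where the offset vector is defined as (first face key point minus cartoon face key point); the coordinates are illustrative only.

```python
# Illustrative coordinates for one control point.
p_first = (120.0, 200.0)     # key point on the first face image
p_cartoon = (110.0, 190.0)   # corresponding key point on the cartoon face image
offset = (p_first[0] - p_cartoon[0], p_first[1] - p_cartoon[1])   # (10.0, 10.0)

w = -1.0   # W < 0: weaken the distinguishing feature, approach the cartoon face
target = (p_first[0] + w * offset[0], p_first[1] + w * offset[1])  # (110.0, 190.0) == p_cartoon

w = 3.0    # W > 0: enhance the distinguishing feature, move away from the cartoon face
target = (p_first[0] + w * offset[0], p_first[1] + w * offset[1])  # (150.0, 230.0)
```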
Illustratively, as shown in fig. 12A, taking the different five sense organs of the first face image and the cartoon face image as the mouth as an example, when the input of the image warping algorithm is input 1, the image warping algorithm may output image 1. Wherein, input 1 comprises the first face image after cutting, the coordinates of the control points of the first face image (for example, part or all key points of the mouth, with the ID of 49-68) and the offset vector (not 0) corresponding to the control points. It should be noted that, when the offset vector corresponding to the control point is not 0, the position of the control point may be changed according to the magnitude and direction of the offset vector, so as to distort (geometrically deform) the cut first face image. When the input to the image warping algorithm is input 2, the image warping algorithm may output image 2. Wherein, the input 2 comprises the first face image after cutting, the coordinates of the control points of the first face image (for example, part or all key points of the mouth, the ID is 49-68) and the offset vector (not 0) corresponding to the control points. And, the input 2 may further include coordinates corresponding to face key points of five sense organs (other five sense organs, ID 1-48) except for the mouth in the first face image and an offset vector thereof (being 0). It should be noted that, when the offset vector corresponding to the face key points of other five sense organs in the first face image is 0, the positions of the face key points of other five sense organs are not changed, that is, the face key points of other five sense organs in the cut first face image are "fixed" so as not to be distorted by the image distortion algorithm. When the input to the image warping algorithm is input 3, the image warping algorithm may output image 3. The input 3 includes a face image (i.e. an original image) before cutting, coordinates of control points of the original image (e.g. some or all key points of the mouth, ID is 49-68) and offset vectors (not 0) corresponding to the control points, and the control points of the original image and the offset vectors thereof may be mapped according to the control points of the first face image after cutting and the offset vectors thereof. When the offset vector corresponding to the control point of the original image is not 0, the position of the control point may be changed according to the magnitude and direction of the offset vector, thereby distorting the original image. When the input to the image warping algorithm is input 4, the image warping algorithm may output image 4. The input 4 includes an original image and coordinates of control points of the original image (e.g., some or all key points of the mouth, ID is 49-68) and offset vectors (not 0) corresponding to the control points, and the control points of the original image and the offset vectors thereof may be mapped according to the control points of the cut first face image and the offset vectors thereof. When the offset vector corresponding to the control point is not 0, the position of the control point can be changed according to the size and the direction of the offset vector, so that the first face image before cutting is distorted. 
Input 4 may further include the coordinates corresponding to the face key points of the five sense organs other than the mouth in the original image and the offset vectors (which are 0) corresponding to those face key points; the face key points of the five sense organs other than the mouth in the original image may be obtained by mapping from the corresponding face key points of the cut first face image. When the offset vectors corresponding to the face key points of the other five sense organs in the original image are 0, the positions of those face key points are not changed; that is, the face key points of the other five sense organs in the original image are fixed, so that they are not distorted by the image warping algorithm.
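The way inputs 2 and 4 combine non-zero offsets for the mouth control points with zero offsets that "fix" the other five sense organs could be assembled roughly as below. This is a sketch under the assumption of a 68-point, 1-based key point numbering; it is not the exact input format of the image warping algorithm.

```python
import numpy as np

def build_warp_input(image, keypoints, mouth_offsets):
    """Assemble control points for the warp: mouth key points (IDs 49-68,
    1-based) get non-zero offset vectors, while the key points of the other
    five sense organs (IDs 1-48) are added with zero offsets so that the
    warping algorithm keeps them fixed.

    keypoints: (68, 2) array of face key point coordinates.
    mouth_offsets: (20, 2) array of offset vectors for the mouth key points.
    """
    mouth_ids = np.arange(48, 68)   # IDs 49-68 in 0-based indexing
    other_ids = np.arange(0, 48)    # IDs 1-48 in 0-based indexing

    control_points = np.concatenate([keypoints[mouth_ids], keypoints[other_ids]])
    offsets = np.concatenate([mouth_offsets, np.zeros((len(other_ids), 2))])
    return image, control_points, offsets
```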
After the first face image is geometrically deformed, style migration may be performed on the geometrically deformed first face image. For example, the style image and the geometrically deformed first face image may be input into a style migration network, and the style migration network may perform style migration on the geometrically deformed first face image according to the style of the style image. The style image may belong to at least one of a landscape image, a caricature image, or an art image. Alternatively, the style image may be the same as the second face image (the caricature face image).
For example, as shown in fig. 12B, with a style image input into the style migration network, when the first face image input into the style migration network is input 1, the output of the style migration network is output 1; when the first face image input into the style migration network is input 2, the output is output 2; when the first face image input into the style migration network is input 3, the output is output 3; and when the first face image input into the style migration network is input 4, the output is output 4. Outputs 1 to 4 are face caricatures that have undergone geometric deformation processing and style migration processing.
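Since claim 13 names an AdaIN style migration network as one possible choice, the core AdaIN operation is sketched below for reference. It is a generic description of adaptive instance normalization on (C, H, W) feature maps, not the specific network or parameters used in this application.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalization (AdaIN): align the per-channel mean and
    standard deviation of the content feature map with those of the style
    feature map. Both feature maps are assumed to have shape (C, H, W)."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True) + eps
    # Normalize the content statistics, then re-scale with the style statistics.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```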
In other embodiments, as shown in fig. 12C, the flow of the face image processing method may include: inputting the first face image and the style image into a style migration network, which may output the style-migrated first face image; and inputting the style-migrated first face image and the caricature face image into a face geometric deformation module based on the caricature face image, which finally outputs the face caricature. It should be noted that the face key points of the first face image need to be detected before the first face image is input into the style migration network, because the style migration network may change the features of the first face image, which may make it difficult to detect the face key points afterwards and thereby affect the subsequent geometric deformation of the first face image. For the functions and processing flows of the style migration network and the caricature-face-image-based face geometric deformation module, reference may be made to the above description, and details are not repeated herein.
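A minimal sketch of this alternative ordering (key point detection before style migration, then caricature-based geometric deformation) is shown below. The callables `detect_keypoints`, `style_migrate`, and `geometric_deform` are hypothetical stand-ins for the key point detector, the style migration network, and the face geometric deformation module; they are passed as parameters so the sketch is self-contained.

```python
def caricature_style_first(image, style_image, caricature_image,
                           detect_keypoints, style_migrate, geometric_deform):
    """Sketch of the flow in fig. 12C: key points are detected on the original
    first face image (before style migration changes its features), style
    migration runs first, and the geometric deformation then reuses the
    previously detected key points together with the caricature face image."""
    keypoints = detect_keypoints(image)            # detect before style migration
    stylized = style_migrate(image, style_image)   # style migration first
    return geometric_deform(stylized, keypoints, caricature_image)
```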
In the face image processing method based on the caricature face image, the face geometric deformation module may geometrically deform the face in the first face image based on the caricature face image, for example, so that the face in the first face image approaches the caricature face in the caricature face image, and the style migration network may perform style migration on the geometrically deformed first face image based on the style image, to finally obtain the face caricature. Compared with the prior art, the method decouples the face caricature generation task into two completely independent subtasks: face geometric deformation and style migration. The face geometric deformation subtask uses face geometric deformation based on the caricature face image, which is a mathematical model rather than a generative adversarial network obtained by training on a large amount of face caricature data, and can geometrically deform the face in the first face image based on the caricature face image, for example, so that the geometric features of the face in the first face image approach the geometric features of the caricature face. Therefore, the generated face caricature has higher quality, the face geometric deformation effect is explainable, and the degree of face geometric deformation is controllable.
The face image processing method provided by the embodiments of the application can be run in a target application of an electronic device (e.g., a mobile phone). The target application may be any application that has a picture editing function or may invoke a picture editing function, such as a gallery application, a cropping application, a camera application, an instant messaging application, a blog application, or a game application. It should be noted that the face image processing method provided by the embodiments of the present application may be executed by an electronic device and/or a server. That is to say, the face geometric deformation module and the style migration network may be integrated on the electronic device, so that the first face image selected by the user may be subjected to corresponding geometric deformation processing and style migration processing. Or, the face geometric deformation module and the style migration network may be integrated on the server, and the electronic device may send the first face image selected by the user to the server, so that the server performs corresponding geometric deformation processing and style migration processing on the first face image, and returns a processing result to the electronic device. The server may be a server corresponding to the target application.
As shown in fig. 13, an icon 302 of the gallery application is included in a main interface (i.e., a desktop) 301 of the mobile phone, and the mobile phone may receive a click operation of a user on the icon 302 of the gallery application, and in response to the click operation, the mobile phone may open the gallery application. In response to an operation of the user selecting one image (first face image) in the interface of the gallery application, the cellular phone may display an interface 401 as shown in (a) of fig. 14. In response to an operation of the user selecting "more" 402 on the interface 401, as shown in (b) of fig. 14, the cellular phone may display a prompt box 403. The prompt box 403 may include more functions such as generating a human face caricature (function) 404 and uploading a cloud album (function) 405, etc.
In one possible implementation, in response to the operation of the user selecting generation of the face caricature 404 in the prompt box 403, the mobile phone may display an interface 501 as shown in (a) of fig. 15. The interface 501 includes a preview box 502, a five sense organ selection area 504, and a style selection area 505. The five sense organ selection area 504 includes a plurality of five sense organs, each of which may be displayed in the form of an image, and may include the eyes, the mouth, the face shape (the contour of the face), and the like. The arrows on the left and right sides of the five sense organ selection area 504 may be used to show more five sense organs, such as the eyebrows and the nose. The style selection area 505 may be used to select a style of image, and may include pictures of different styles to represent different image styles; for example, the styles represented by the images from left to right in the style selection area 505 may be a geometric style, a sunlight style, an abstract style, and the like, and the arrows on the left and right sides of the style selection area 505 may be used to show more style images (i.e., images representing other styles).
In a possible case where the face in the first face image differs greatly from the average face image in face shape, as shown in (b) of fig. 15, a check box 5041 may be displayed to prompt the user that the face shape, among the five sense organs, is selected to be distinguished. Further, the mobile phone may receive an operation of the user selecting more five sense organs, or the mobile phone may receive an operation of the user canceling a selected five sense organ (for example, the face shape).
As shown in (c) of fig. 15, the mobile phone may receive an operation of the user selecting a style image; for example, a check box 5051 may be displayed to prompt that the current style image is selected, the mobile phone may perform style migration on the geometrically deformed first face image according to the style image in the check box 5051, and the style-migrated first face image is displayed in the preview box 502.
Optionally, as shown in fig. 16, the interface 501 may further include a slide bar 610 and a slider 611, where different positions of the slider 611 correspond to different magnitudes or degrees of deformation used to distinguish the five sense organs. For example, as shown in (a) of fig. 16, when the slider 611 is located at the current position on the slide bar 610, the degree of deformation for distinguishing the five sense organs may be "1"; as shown in (b) of fig. 16, when the slider 611 is located at the current position on the slide bar 610, the degree of deformation for distinguishing the five sense organs may be "3", that is, a higher degree of deformation for distinguishing the five sense organs.
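One plausible way to map the slider position to a deformation degree (i.e., the weight coefficient applied to the offset vectors) is sketched below; the linear mapping and the upper bound `max_weight` are assumptions for illustration, not values specified by this application.

```python
def slider_to_weight(position, max_position, max_weight=5.0):
    """Map the slider position on the slide bar to a weight coefficient that
    controls the degree of geometric deformation of the selected five sense
    organs. The linear mapping and max_weight are assumed for this sketch."""
    return max_weight * position / max_position

# Example: with a 5-step slide bar, position 1 gives degree 1.0 and
# position 3 gives degree 3.0, matching the degrees "1" and "3" above.
# slider_to_weight(1, 5) -> 1.0
# slider_to_weight(3, 5) -> 3.0
```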
In another possible implementation, in response to the operation of the user selecting generation of the face caricature 404 in the prompt box 403, the mobile phone may display an interface 601 as shown in (a) of fig. 17. The interface 601 includes a preview box 602, a five sense organ selection area 604, and a style selection area 605. The five sense organ selection area 604 may include a plurality of five sense organs, which may be displayed in the form of labels and may include, for example, the eyes, the mouth, the face shape (facial contour), and the like; the arrows on the left and right sides of the five sense organ selection area 604 may be used to show more five sense organs, such as the eyebrows and the nose. The style selection area 605 may be used to select a style of the image and may include different labels to represent different styles, for example, a geometric style, a sunlight style, an abstract style, and the like; the arrows on the left and right sides of the style selection area 605 may be used to show more style labels (i.e., labels representing other styles). In a possible case where the face in the first face image differs greatly from the average face image in face shape, as shown in (b) of fig. 17, a check box 6041 may be displayed to prompt the user that the face shape, among the five sense organs, is selected to be distinguished, and the first face image with the face shape geometrically deformed may be displayed in the preview box 602. As shown in (c) of fig. 17, the mobile phone may receive an operation of the user selecting a style image; for example, a check box 6051 may be displayed to indicate that the "geometric" style label is selected, and the mobile phone may perform style migration on the geometrically deformed first face image according to the style image corresponding to the style label in the check box 6051 and display the style-migrated first face image in the preview box 602.
In yet another possible implementation, in response to the operation of the user selecting generation of the face caricature 404 in the prompt box 403, the mobile phone may display an interface 701 as shown in (a) of fig. 18A. The interface 701 includes a preview box 702, a second face image (or reference face image) selection area 704, and a style selection area 705. The second face image selection area 704 may include an average face image and a caricature face image, and the first face image may be geometrically deformed according to the image selected by the user. The arrows on the left and right sides of the second face image selection area 704 may be used to present more second face images (more average face images or caricature face images). In a possible implementation, the interface 701 may further include the slide bar 610 and the slider 611, and the user may adjust the degree of geometric deformation of the corresponding five sense organs in the first face image by adjusting the position of the slider 611. The style selection area 705 includes a plurality of images of different styles; for example, the styles represented by the images from left to right in the style selection area 705 may be a geometric style, a sunlight style, an abstract style, and the like. The arrows on the left and right sides of the style selection area 705 may be used to present more style images (i.e., images representing other styles). Alternatively, the different styles in the style selection area 705 may be represented by different labels, which is not limited herein.
By default, the mobile phone may geometrically deform the first face image according to the average face image. Alternatively, the mobile phone may receive an operation of the user selecting a caricature face image, and as shown in (b) of fig. 18A, the mobile phone may display a check box 7041 to prompt that the current caricature face image is selected. The mobile phone may geometrically deform the first face image according to the caricature face image in the check box 7041 and display the geometrically deformed first face image in the preview box 702. The mobile phone may further receive an operation of the user selecting a style image; as shown in (c) of fig. 18A, the mobile phone may display a check box 7051 to prompt that the current style image is selected, and the mobile phone may perform style migration on the geometrically deformed first face image according to the style image in the check box 7051 and display the style-migrated first face image in the preview box 702. In one possible design, the style selection area 705 may not be included in the interface 701, and the mobile phone may perform geometric deformation processing and style migration processing on the first face image in response to the caricature face image selected by the user; that is, the style image may be the same as the second face image (the caricature face image).
In another possible design, as shown in fig. 18B, the style selection area 705 may display images obtained by performing different style migration processing, according to different style images, on the first face image shown in the preview box 702, so that the user can preview the effects of different style migration processing on the first face image and make a selection according to needs or preferences, which can improve the user experience.
In one possible design, the mobile phone may open the camera application in response to a touch operation on the camera application by the user. A face caricature of the face image captured by the camera may be displayed in real time on a preview interface of the camera application. In response to a photographing operation by the user, a face caricature similar to that shown in the preview box 502 in (c) of fig. 15 may be obtained, or a face caricature similar to that shown in the preview box 702 in (c) of fig. 18A may be obtained.
As shown in fig. 19, an embodiment of the present application provides a face image processing method, including:
1901. Acquire a first face image and a second face image.
1902. Determine face control points from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determine offset vectors corresponding to the face control points.
The first face image is different from the second face image. The offset vector corresponding to the face control point can be used to change the position of the face control point.
1903. Input the face control points and the offset vectors corresponding to the face control points multiplied by the weight coefficient into an image warping algorithm to obtain a geometrically deformed first face image.
When the weight coefficient is less than 0, the distinguishing characteristics of the first face image and the second face image are strengthened on the first face image after the geometric deformation; when the weight coefficient is larger than 0, the distinguishing characteristics of the first face image and the second face image are weakened on the first face image after geometric deformation; wherein the distinguishing characteristic includes the size of the five sense organs.
1904. Output the geometrically deformed first face image.
The first facial image may be a facial image selected by a user, and the second facial image may be an average facial image or a cartoon facial image. When the second face image is the average face image, reference may be made to the related description of the method embodiment shown in fig. 6A or fig. 6B in steps 1901 to 1904, which is not repeated herein. When the second face image is a cartoon face image, reference may be made to relevant description of the method embodiment shown in fig. 11A in steps 1901 to 1904, which is not described herein again.
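Steps 1901 to 1904 might be strung together as in the following sketch. The key point detector, the warping routine, and the threshold-based choice of control points are all assumed placeholders; in particular, the application itself selects control points per target five sense organs (as described in the claims), so the simple distance threshold below is only an illustration.

```python
import numpy as np

def process_face_image(first_img, second_img, detect_keypoints, warp, w):
    """End-to-end sketch of steps 1901-1904. detect_keypoints and warp are
    hypothetical callables standing in for the key point detector and the
    image warping algorithm."""
    kp1 = detect_keypoints(first_img)    # steps 1901/1902: key points of both images
    kp2 = detect_keypoints(second_img)
    offsets = kp2 - kp1                  # offsets pointing toward the second face image
    is_control = np.linalg.norm(offsets, axis=1) > 2.0   # assumed selection rule
    control_pts = kp1[is_control]
    scaled_offsets = w * offsets[is_control]             # step 1903: multiply by the weight coefficient
    return warp(first_img, control_pts, scaled_offsets)  # step 1904: geometrically deformed image
```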
As shown in fig. 20, an embodiment of the present application provides a face image processing method, including:
2001. Display a first interface in response to an operation of a user performing first image processing on a first face image.
The first image processing comprises geometric deformation processing and style migration processing of the face in the first face image. As shown in (b) in fig. 14, the operation of the user performing the first image processing on the first face image may be an operation of the user selecting generation of the face caricature 404 in the prompt box 403 in a case where the first face image is opened.
2002. Display, on the first interface, a first face image obtained by geometrically deforming the face in the first face image according to a second face image.
2003. In response to a style image selected by the user on the first interface, perform style migration on the geometrically deformed first face image according to the style image, and display the style-migrated first face image on the first interface.
Illustratively, as shown in fig. 15 (a), the first interface may be an interface 501, the first face image may be a face image selected by the user, and the second face image may be an average face image. As shown in (a) of fig. 18A, the first interface may be an interface 701, the first face image may be a face image selected by the user, and the second face image may be a comic face image selected by the user. Step 2001-step 2003 may refer to the description related to the method embodiment shown in fig. 15 or fig. 18A, which is not described herein again.
Embodiments of the present application further provide a chip system, as shown in fig. 21, which includes at least one processor 2101 and at least one interface circuit 2102. The processor 2101 and the interface circuit 2102 may be interconnected by wires. For example, the interface circuit 2102 may be used to receive signals from other devices (e.g., a memory of an electronic apparatus). As another example, the interface circuit 2102 may be used to send signals to other devices (e.g., the processor 2101).
For example, the interface circuit 2102 may read instructions stored in a memory in the electronic device and send the instructions to the processor 2101. The instructions, when executed by the processor 2101, may cause an electronic device, such as the cell phone 100 shown in fig. 4A, to perform the various steps in the embodiments described above.
Of course, the chip system may further include other discrete devices, which is not specifically limited in this embodiment of the present application.
Embodiments of the present application also provide a computer-readable storage medium, which includes computer instructions, and when the computer instructions are executed on an electronic device (such as the mobile phone 100 shown in fig. 4A), the mobile phone 100 executes various functions or steps performed by the electronic device in the above-described method embodiments.
Embodiments of the present application further provide a computer program product, which, when running on a computer, causes the computer to execute each function or step performed by the electronic device in the above method embodiments.
The embodiment of the present application further provides an image processing apparatus, where the apparatus may be divided into different logic units or modules according to functions, and each unit or module executes different functions, so that the apparatus executes each function or step executed by the electronic device in the above method embodiments.
From the above description of the embodiments, those skilled in the art can clearly understand that the above function allocation may be performed by different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, where the software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (32)
1. A face image processing method is characterized by comprising the following steps:
acquiring a first face image and a second face image;
determining a face control point from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining an offset vector corresponding to the face control point; the first face image is different from the second face image, and the offset vector corresponding to the face control point comprises a difference value between a coordinate corresponding to the identification of the face control point in a face key point corresponding to the second face image and a coordinate corresponding to the face control point;
inputting the face control points and offset vectors corresponding to the face control points multiplied by the weight coefficients into an image warping algorithm to obtain a first face image after geometric deformation; when the weight coefficient is a first value, the distinguishing feature of the first face image, which is different from the second face image, is strengthened on the first face image after the geometric deformation; when the weight coefficient is a second value, the distinguishing characteristic of the first face image, which is different from the second face image, is weakened on the first face image after the geometric deformation; wherein the distinguishing characteristic comprises the size of the five sense organs;
and outputting the first face image after the geometric deformation.
2. The method of claim 1,
the face control points comprise part of face key points of the first face image.
3. The method according to claim 1 or 2, wherein the determining a face control point from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining an offset vector corresponding to the face control point comprises:
coordinates corresponding to the face key points of the first face image comprise S first subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each first subset are respectively calculated; coordinates corresponding to the face key points of the second face image comprise S second subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each second subset are respectively calculated; the S first subsets correspond to the S second subsets one by one; wherein S is an integer greater than or equal to 1;
determining N target facial features of the first facial image according to the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets, wherein N is less than or equal to S;
the face control points of the first face image comprise face key points corresponding to each target facial feature in the N target facial features in the first face image.
4. A method according to claim 3 wherein determining the N target facial features of the first facial image from the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets comprises:
determining the difference between the absolute value of the mean and/or the variance of each of the S first subsets and the absolute value of the mean and/or the variance of the second subset corresponding to the first subset to obtain S differences; determining N first subsets corresponding to the largest N differences in the S differences, wherein the N first subsets correspond to the five sense organs which are the N target five sense organs.
5. The method according to any one of claims 3 to 4,
the S first subsets of corresponding five sense organs comprise first subsets corresponding to eyebrows, eyes, a nose, a mouth and a face contour respectively.
6. The method according to any one of claims 1 to 5,
the second face image is an average face image.
7. The method according to any one of claims 1 to 5,
the second face image is a cartoon face image.
8. The method according to any one of claims 1-7, further comprising:
respectively detecting the face positions of the first face image and the second face image;
performing first processing on the first face image and the second face image, wherein the first processing comprises image cutting processing and/or image rotation processing; the first processed first facial image and the first processed second facial image are aligned.
9. The method according to any one of claims 1 to 8,
the image warping algorithm is a Warp algorithm.
10. The method according to any one of claims 1-9, wherein before inputting the offset vectors corresponding to the face control points and the face control points multiplied by weighting coefficients into an image warping algorithm to obtain a geometrically deformed first face image, the method further comprises:
and carrying out style migration on the first face image according to the style image and a style migration network.
11. The method according to any one of claims 1-9, further comprising:
and carrying out style migration on the first face image after the geometric deformation according to the style image and a style migration network.
12. The method according to claim 10 or 11,
the style migration network is a neural network obtained by training according to a plurality of style images, and the plurality of style images belong to at least one of landscape images, cartoon images, or art images.
13. The method according to any one of claims 10 to 12,
the style migration network is an AdaIN style migration network.
14. The method according to any one of claims 10 to 13,
the second face image is the same as the style image.
15. A face image processing method is characterized by comprising the following steps:
responding to the operation of a user for carrying out first image processing on a first face image, and displaying a first interface; the first image processing comprises geometric deformation processing and style transfer processing of the human face in the first human face image;
displaying a first face image obtained by geometrically deforming a face in the first face image according to a second face image on the first interface;
and responding to the style image selected by the user on the first interface, performing style migration on the first geometrically deformed face image according to the style image, and displaying the first face image after the style migration on the first interface.
16. A face image processing method is characterized by comprising the following steps:
responding to the operation of a user for carrying out first image processing on a first face image, and displaying a first interface; the first image processing comprises geometric deformation processing and style transfer processing of the human face in the first human face image;
performing style migration processing on the first face image according to the style image selected by the user on the first interface;
and performing geometric deformation processing on the face in the first face image after the style migration according to the second face image, and displaying the first face image after the geometric deformation on the first interface.
17. The method according to claim 15 or 16,
the first interface comprises a preview area and a style image selection area; the style image selection area displays a style image to be selected or a label of the style image; and the preview area displays the first face image after the geometric deformation or the first face image after the style migration.
18. The method according to any one of claims 15 to 17,
the first interface further comprises a reference face image selection area, and the content displayed in the reference face image selection area comprises a cartoon image to be selected or a label of the cartoon image.
19. The method according to any one of claims 15-18, further comprising:
determining a face control point from the face key points of the first face image according to the face key points of the first face image and the face key points of the second face image, and determining an offset vector corresponding to the face control point; the first face image is different from the second face image, and the offset vector corresponding to the face control point comprises a difference value between a coordinate corresponding to the identification of the face control point in a face key point corresponding to the second face image and a coordinate corresponding to the face control point;
inputting the face control points and offset vectors corresponding to the face control points multiplied by the weight coefficients into an image warping algorithm to obtain a first face image after geometric deformation; when the weight coefficient is a first value, the distinguishing feature of the first face image, which is different from the second face image, is strengthened on the first face image after the geometric deformation; when the weight coefficient is a second value, the distinguishing characteristic of the first face image, which is different from the second face image, is weakened on the first face image after the geometric deformation; wherein the distinguishing characteristic comprises the size of the five sense organs.
20. The method of claim 19,
the face control points comprise part of face key points of the first face image.
21. The method according to claim 19 or 20, wherein determining face control points from the face key points of the first face image and the face key points of the second face image, and determining offset vectors corresponding to the face control points comprises:
coordinates corresponding to the face key points of the first face image comprise S first subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each first subset are respectively calculated; coordinates corresponding to the face key points of the second face image comprise S second subsets corresponding to the five sense organs, and the mean value and/or the variance corresponding to each second subset are respectively calculated; the S first subsets correspond to the S second subsets one by one; wherein S is an integer greater than or equal to 1;
determining N target facial features of the first facial image according to the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets, wherein N is less than or equal to S;
the face control points of the first face image comprise face key points corresponding to each target facial feature in the N target facial features in the first face image.
22. A method according to claim 21 wherein determining the N target facial features of the first face image from the absolute value of the mean and/or variance of each of the S first subsets and the absolute value of the mean and/or variance of each of the S second subsets comprises:
determining the difference between the absolute value of the mean and/or the variance of each of the S first subsets and the absolute value of the mean and/or the variance of the second subset corresponding to the first subset to obtain S differences; determining N first subsets corresponding to the largest N differences in the S differences, wherein the N first subsets correspond to the five sense organs which are the N target five sense organs.
23. The method of claim 21 or 22,
the S first subsets of corresponding five sense organs comprise first subsets corresponding to eyebrows, eyes, a nose, a mouth and a face contour respectively.
24. The method of any one of claims 15-23,
the first interface further comprises a five sense organ area, the five sense organ area displays images of S five sense organs or labels of the five sense organs, and when one or more five sense organs in the five sense organ area are selected, the first face image with the one or more five sense organs geometrically deformed is displayed in the preview area.
25. The method of any one of claims 21-24,
N target facial features of the S facial features are selected by default.
26. The method of claim 24 or 25, further comprising:
and receiving an operation of selecting or cancelling any one of the S five sense organs by the user, and responding to the operation to perform or cancel geometric deformation processing on the corresponding five sense organs in the first face image.
27. The method of claim 25 or 26,
the first interface further comprises a first control, and the first control is used for controlling the geometric deformation degree of the selected facial features.
28. The method of any one of claims 15-27,
the second face image is an average face image.
29. The method of any one of claims 15-27,
the second face image is a cartoon face image.
30. A chip system, comprising one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected through a line;
the chip system is applied to an electronic device comprising a communication module and a memory; the interface circuit to receive signals from the memory and to send the signals to the processor, the signals including computer instructions stored in the memory; the electronic device performs the method of any of claims 1-29 when the processor executes the computer instructions.
31. A computer-readable storage medium comprising computer instructions;
the computer instructions, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-29.
32. An image processing apparatus comprising a processor coupled to a memory, the memory storing program instructions that, when executed by the processor, cause the apparatus to implement the method of any of claims 1 to 29.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010820696.8A CN114140314A (en) | 2020-08-14 | 2020-08-14 | Face image processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114140314A true CN114140314A (en) | 2022-03-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |