
WO2024120031A1 - Method, apparatus, computer device and storage medium for processing video data (处理视频数据的方法、装置、计算机设备和存储介质) - Google Patents

Method, apparatus, computer device and storage medium for processing video data

Info

Publication number
WO2024120031A1
WO2024120031A1 · PCT/CN2023/126236 · CN2023126236W
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
data
video
texture
texture data
Prior art date
Application number
PCT/CN2023/126236
Other languages
English (en)
French (fr)
Inventor
杨学营
鲍金龙
刘世川
李正通
林钊武
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2024120031A1 publication Critical patent/WO2024120031A1/zh
Priority to US18/915,488 priority Critical patent/US20250039330A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0117 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving conversion of the spatial resolution of the incoming video signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/40 Analysis of texture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/60 Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image

Definitions

  • the present disclosure relates to the field of cloud technology, and in particular to a method, apparatus, computer equipment and storage medium for processing video data.
  • Super-resolution (SR) technology has been widely used in practical applications, such as medical image reconstruction, facial image reconstruction, ultra-high-definition television, and ultra-high-definition video playback.
  • super-resolution technology can improve the resolution of one or more frames in the original video through hardware or software methods, and reconstruct low-resolution video into high-resolution video.
  • the current super-resolution technology has high storage costs, large computational workload, and time-consuming algorithms, resulting in video freezes when mobile terminals using super-resolution technology play real-time videos.
  • a method for processing video data comprising: receiving compressed video data, the compressed video data having a first resolution; obtaining texture data having the first resolution based on one or more video frames in the compressed video data; generating texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and generating one or more video frames having the second resolution through a rendering operation based on the texture data having the second resolution.
  • a device for processing video data comprising: a receiving module, configured to receive compressed video data, the compressed video data having a first resolution; an extraction module, configured to obtain texture data having the first resolution based on one or more video frames in the compressed video data; a super-resolution processing module, configured to generate texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and a rendering module, configured to generate one or more video frames having the second resolution through a rendering operation based on the texture data having the second resolution.
  • a computer device comprising: one or more processors; and one or more memories, wherein a computer executable program is stored, and when the computer executable program is executed by the one or more processors, the method as described above is performed.
  • a computer program product or a computer program comprising computer instructions, the computer instructions being stored in a computer-readable storage medium.
  • a processor of a computer device reads the computer instructions from the computer-readable medium, and the processor executes the computer instructions, so that the computer device executes the methods provided in the above-mentioned various aspects or various optional implementations of the above-mentioned various aspects.
  • a computer-readable storage medium on which computer-executable instructions are stored, and the instructions are used to implement the above method when executed by a processor.
  • the various embodiments of the present disclosure save network transmission bandwidth and improve video transmission efficiency by compressing the video on the server side and sending videos with a smaller data volume and lower definition; at the same time, applying the lightweight super-resolution algorithm (also known as the SR algorithm) on the terminal reduces the video storage cost of the mobile terminal and ensures the video viewing effect.
  • FIG. 1 shows a schematic diagram of an application scenario according to an embodiment of the present disclosure.
  • FIG. 2 shows a user interface according to an embodiment of the present disclosure.
  • FIG. 3 shows a flowchart of a method for processing video data according to an embodiment of the present disclosure.
  • FIG. 4 shows an architecture diagram of a device for processing video data according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram illustrating a process of compressing original video data on a server according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram illustrating acquiring texture data having a first resolution according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram illustrating acquiring texture data having a second resolution according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram illustrating generating one or more video frames having a second resolution based on texture data having a second resolution according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram illustrating a rendering operation according to an embodiment of the present disclosure.
  • FIG. 10 shows a schematic diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram showing the architecture of an exemplary computing device according to an embodiment of the present disclosure.
  • FIG. 12 shows a schematic diagram of a storage medium according to an embodiment of the present disclosure.
  • the first data may be referred to as the second data, and similarly, the second data may be referred to as the first data.
  • Both the first data and the second data may be data, and in some cases, may be separate and different data.
  • the meaning of the term "at least one" in the present application refers to one or more, and the meaning of the term "multiple" in the present application refers to two or more; for example, multiple audio frames refer to two or more audio frames.
  • the size of the sequence number of each process does not mean the order of execution, and the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application. It should also be understood that determining B according to (based on) A does not mean determining B only according to (based on) A, but B can also be determined according to (based on) A and/or other information.
  • Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources within a wide area network or local area network to achieve data computing, storage, processing and sharing.
  • Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology, etc. based on the cloud computing business model. It can form a resource pool that can be used on demand and is flexible and convenient. Cloud computing technology will become an important support.
  • the background services of technical network systems require a large amount of computing and storage resources, such as video websites, image websites and more portal websites. With the rapid development and application of the Internet industry, in the future, each item may have its own identification mark, and all need to be transmitted to the background system for logical processing. Data of different levels will be processed separately. All kinds of industry data require strong system backing support, which can only be achieved through cloud computing.
  • Resolution refers to the ability of a measurement or display system to distinguish details; it indicates the ability to distinguish two points or lines in a video frame. Resolution can also characterize the clarity of an image: the higher the resolution, the better the image quality and the more details it can show, but conversely, because more information is recorded, the larger the file will be. Units for describing resolution include DPI (dots per inch), LPI (lines per inch) and PPI (pixels per inch). Among them, PPI is a commonly used unit, which describes the number of pixels per unit length; PPI is also called pixel density. The higher the pixel density, the denser the pixels: 5 PPI means 5 pixels per inch, and 500 PPI means 500 pixels per inch. The higher the PPI value, the higher the clarity of the picture and video.
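As a worked example (assuming, purely for illustration, a 6.1-inch 1920×1080 panel; these figures are not taken from the patent), PPI can be computed from the pixel dimensions and the physical diagonal:

$$
\mathrm{PPI} \;=\; \frac{\sqrt{1920^2 + 1080^2}}{6.1\ \text{inch}} \;\approx\; \frac{2203}{6.1} \;\approx\; 361 .
$$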
  • Convolutional Neural Networks are a type of feedforward neural networks that include convolution calculations and have a deep structure. They are one of the representative algorithms of deep learning. Convolutional neural networks have representation learning capabilities and can perform shift-invariant classification on input information according to their hierarchical structure. Therefore, they are also called "Shift-Invariant Artificial Neural Networks (SIANN)".
  • the industry has also proposed another solution, which is to pre-install a large number of neural network training models on mobile terminals and use deep learning algorithms (such as CNN algorithms) to super-resolution low-resolution videos on mobile terminals.
  • deep learning algorithms have high requirements on the size and quality of models and operators; in scenarios with large models, this will cause the mobile terminal to consume too many computing resources, resulting in lag when playing videos.
  • the present disclosure provides a method, apparatus, storage medium and computer device for processing video data, the method comprising: receiving compressed video data, the compressed video data having a first resolution; based on one or more video frames in the compressed video data, obtaining texture data having the first resolution; based on the texture data having the first resolution, generating texture data having a second resolution, the second resolution being higher than the first resolution; and based on the texture data having the second resolution, generating one or more video frames having the second resolution through a rendering operation.
  • the various embodiments of the present disclosure save network transmission bandwidth and improve video transmission efficiency by compressing the video on the server side and sending videos with a smaller data volume and lower definition; at the same time, applying the lightweight super-resolution algorithm (also known as the SR algorithm) on the terminal reduces the video storage cost of the mobile terminal and ensures the video viewing effect.
  • FIG. 1 shows a schematic diagram of an application scenario 100 according to an embodiment of the present disclosure, in which a server 110 and multiple terminals 120 (e.g., mobile terminals) are schematically shown.
  • Video data can be stored in the mobile terminal 120 or the server 110, and the terminal and the server can be directly or indirectly connected via wired or wireless communication, so that video data can be transmitted between the mobile terminal 120 and the server 110.
  • the terminal 120 may be a terminal with a storage unit and a microprocessor, such as a mobile phone, a tablet computer, a laptop computer, a desktop computer, a personal computer (PC), a smart speaker or a smart watch, but is not limited thereto.
  • the server 110 may be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content distribution networks (CDNs), and big data and artificial intelligence platforms.
  • the server 110 can process the original video data using various data compression techniques to generate compressed video.
  • the compressed video data can be stored in a smaller storage space and transmitted using fewer network resources.
  • after completing the compression of the original video data, the server 110 publishes the storage address of the compressed video on the server to a portal website; afterwards, the server 110 transmits the compressed video to one or more terminals 120 in response to the terminals' requests for the video data service.
  • the terminal 120 is installed with an application for playing videos
  • the server 110 may be a background server of the application deployed in the terminal, used to interact with the terminal running the application for playing videos to provide computing and application service support to the terminal (or the application deployed in the terminal).
  • the application for playing videos installed on the terminal 120 can be a product that integrates audio and video communication functions, providing live broadcast, on-demand playback, short video, real-time audio and video, beauty effects and other audio, video and communication capabilities across multiple platforms, such as mobile terminals, PC terminals, Web terminals, and mini-program terminals.
  • these modules can be integrated into a software development kit (SDK) that supports the above functions to achieve the effect of "access once and call everywhere".
  • a software development kit is a collection of development tools that software engineers use to build application software for a specific software package, software framework, hardware platform, operating system, etc.
  • modules related to super-resolution technology can be integrated into the application in the form of SDK plug-ins.
  • SDK plug-ins are programs written in accordance with certain standardized application program interfaces, which can support multiple platforms (such as iOS or Android) at the same time and call function libraries or data on these platforms to convert the compressed video data from the server 110 into high-definition video data.
  • the user can click the "Local Super-Resolution" button to call the module related to the super-resolution technology deployed on the terminal 120, and realize real-time conversion and playback of low-definition video to high-definition video through the lightweight and efficient super-resolution technology according to the embodiment of the present disclosure.
  • the present disclosure provides a method for displaying video data, the method comprising: receiving compressed video data, the compressed video data having a first resolution; displaying a button for controlling the resolution of the video data; in response to the button for controlling the resolution of the video data being triggered, generating one or more video frames having a second resolution based on one or more video frames in the compressed video data, the second resolution being higher than the first resolution; and displaying the one or more video frames having the second resolution.
  • the present disclosure also provides a device for displaying video data, the device comprising a receiver (receiving module), a processor and a display, wherein the receiver is configured to receive compressed video data, the compressed video data having a first resolution; the display is configured to display a button for controlling the resolution of the video data at a first moment; the processor is configured to generate one or more video frames with a second resolution based on one or more video frames in the compressed video data in response to the button for controlling the resolution of the video data being triggered, the second resolution being higher than the first resolution; and the display is also configured to display the one or more video frames with the second resolution at a second moment.
  • the present disclosure also provides a method for processing video data, the method comprising: receiving compressed video data, the compressed video data having a first resolution; obtaining texture data having the first resolution based on one or more video frames in the compressed video data; generating texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and generating one or more video frames having the second resolution through a rendering operation based on the texture data having the second resolution.
  • the present disclosure further provides a device for processing video data, the device comprising: a receiving module, configured to receive compressed video data, the compressed video data having a first resolution; an extraction module, configured to obtain texture data having the first resolution based on one or more video frames in the compressed video data; a super-resolution processing module, configured to generate texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and a rendering module, configured to generate one or more video frames having the second resolution through a rendering operation based on the texture data having the second resolution.
  • Compared with traditional technologies, the super-resolution technology according to the embodiments of the present disclosure abandons complex neural network models and instead makes full use of the terminal's hardware and software capabilities to decode video data, extract and process texture data, and render video frames; without consuming too many CPU computing resources, it efficiently completes the real-time conversion and playback of low-definition video to high-definition video in a lightweight manner.
  • the application scenario diagram shown in Figure 1 and the video playback interface shown in Figure 2 are merely examples.
  • the application scenario and video playback interface described in the embodiments of the present disclosure are intended to more clearly illustrate the technical solutions of the embodiments of the present disclosure, and do not constitute a limitation on the technical solutions provided by the embodiments of the present disclosure.
  • Persons of ordinary skill in the art will appreciate that with the evolution of super-resolution technology and the emergence of new business scenarios, the technical solutions provided by the embodiments of the present disclosure are equally applicable to similar technical problems.
  • Fig. 3 shows a flow chart of a method 30 for processing video data according to an embodiment of the present disclosure.
  • Fig. 4 shows an architecture diagram of an apparatus for implementing the method 30 according to an embodiment of the present disclosure.
  • method 30 includes the following operations S301 to S304.
  • method 30 may be executed by the terminal 120 described in detail above, but the present disclosure is not limited thereto.
  • compressed video data is received, the compressed video data having a first resolution.
  • the terminal 120 may receive compressed video data from the server 110.
  • the original video data before compression may be video data in MP4 or HLS format.
  • the original video data before compression may be 1080P (resolution 1920×1080) original video data, which indicates that the long side of each video frame in the original video data includes 1920 pixels, and the short side includes 1080 pixels.
  • the server 110 may compress the 1080P original video data into 720P video data with lower clarity and smaller data volume.
  • the resolution of 720P video data is 1280×720, that is, the long side of each video frame in the compressed video data includes 1280 pixels, and the short side includes 720 pixels.
  • because the compressed 720P video data has a lower definition, its data volume is smaller and it consumes less storage and bandwidth.
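A quick check of the saving in pixel terms (a rough estimate only; the actual reduction in compressed data volume depends on the codec and encoding settings):

$$
\frac{1280 \times 720}{1920 \times 1080} \;=\; \frac{921{,}600}{2{,}073{,}600} \;\approx\; 0.44 ,
$$

i.e. each compressed 720P frame carries roughly 44% of the pixels of a 1080P original, which is consistent with the sizeable storage and bandwidth reduction described here.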
  • the application for playing video carried on the terminal 120 can obtain the compressed video data from the receiver of the terminal 120 via the data receiving interface of the video player by calling the video player.
  • the application for playing video carried on the terminal 120 can also interface with the receiver of the terminal 120 to obtain the video stream in real time through streaming technology and continuously send it to the video player.
  • An example of compressing the original video data by the server 110 will be further described with reference to FIG. 5; of course, the present disclosure is not limited thereto.
  • a video frame refers to a single image picture that is the smallest unit in an image animation.
  • a video frame is a still picture, and continuous frames form an animation, such as a television image.
  • Each frame is a still image, and displaying multiple frames quickly and continuously creates the illusion of motion.
  • the video frame here can be any one of an I frame, a P frame, and a B frame.
  • an I frame can refer to an independent frame that can be independently decoded without referring to other images based on its own information.
  • a P frame can refer to an "inter-frame prediction coding frame", which requires reference to different parts of the previous I frame and/or P frame to be decoded.
  • a B frame can refer to a "bidirectional prediction coding frame", which is decoded using its previous and subsequent frames as reference frames.
  • the present disclosure is not limited to this.
  • the compressed video data may be decapsulated and decoded by a decoding engine in a video player to obtain one or more video frames.
  • the decoding engine will perform the following process: using the central processing unit of the terminal to decapsulate the compressed video data into a video code stream; using the graphics processing unit of the terminal to decode the video code stream into one or more video frames. These video frames will then be used to obtain texture data having the first resolution.
  • the present disclosure is not limited to this.
  • the decoding engine can be used to decapsulate and decode the video data.
  • the encapsulation format is also called a container, and the compressed video data will be encapsulated in the container.
  • "Decapsulation" refers to the process of obtaining video bitstreams, audio bitstreams, subtitles and metadata information from the "container". This process is usually handled by the terminal's central processing unit (CPU).
  • Decoding includes hardware decoding and software decoding, and its function is to decode the code stream into one or more video frames (YUV data or RGB data). These video frames include pixel values for each pixel displayed on the display of the terminal.
  • “hardware decoding” refers to processing video data based on GPU (graphics processing unit)
  • “software decoding” refers to processing video data based on CPU (central processing unit).
  • on the Android platform, the decoding engine can use MediaExtractor to decapsulate, and then use MediaCodec to hardware decode the decapsulated video data to obtain the above-mentioned one or more video frames.
  • FFmpeg can also be used on Android to software decode the decapsulated video data.
  • on the iOS platform, the decoding engine can use VideoToolbox to hardware decode the decapsulated video data, or use FFmpeg to software decode the decapsulated video data.
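As a concrete illustration of the decapsulate-then-decode flow on Android described above, a minimal sketch using MediaExtractor and MediaCodec might look as follows (videoPath and outputSurface are hypothetical names; error handling and the full decode loop are omitted):

```java
import android.media.MediaCodec;
import android.media.MediaExtractor;
import android.media.MediaFormat;
import android.view.Surface;
import java.io.IOException;

final class DecoderSketch {
    // Decapsulate the container with MediaExtractor, then hardware-decode with MediaCodec.
    static MediaCodec startDecoder(String videoPath, Surface outputSurface) throws IOException {
        MediaExtractor extractor = new MediaExtractor();
        extractor.setDataSource(videoPath);                      // e.g. the received 720P.mp4

        MediaFormat videoFormat = null;
        for (int i = 0; i < extractor.getTrackCount(); i++) {
            MediaFormat format = extractor.getTrackFormat(i);
            String mime = format.getString(MediaFormat.KEY_MIME);
            if (mime != null && mime.startsWith("video/")) {     // pick the video bitstream
                extractor.selectTrack(i);
                videoFormat = format;
                break;
            }
        }

        MediaCodec decoder = MediaCodec.createDecoderByType(
                videoFormat.getString(MediaFormat.KEY_MIME));
        decoder.configure(videoFormat, outputSurface, null, 0);  // decoded frames go to the Surface
        decoder.start();
        // ...feed input buffers from the extractor and release output buffers to the Surface...
        return decoder;
    }
}
```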
  • the super-resolution processing engine can obtain texture data from the above-mentioned video frame.
  • the texture data has the same resolution as the video frame.
  • Texture data is a structured data for describing the color information of the video frame, which can be stored and used in the form of a data object including multiple images with the same image format. These multiple images with the same image format are also called texture images. Texture data can not only provide input of texture information to various shaders (such as vertex shaders and fragment shaders), but can also be used as a rendering object.
  • Texture data with a first resolution is structured data for describing the color information of the video frame, which can include multiple first texture images with the same image format, and each first texture image has a first resolution.
  • each video frame includes a plurality of pixels and the size, format and dimension of each video frame are the same.
  • the super-resolution processing engine extracts some images with specific constraints from these video frames as texture images, and places them into a container identified by the texture identifier.
  • the texture data may also optionally include a texture type, a texture size and an image format of the above-mentioned image with specific constraints.
  • the texture type defines the arrangement of the texture image
  • the texture size defines the size of the texture image
  • the image format defines the format of the texture image.
  • the present disclosure is not limited to this.
  • the process of obtaining texture data can be briefly described as follows: first, the above video frame is bound to a surface (Surface) object (e.g., for rendering), and then a texture identifier (also known as a texture ID) is obtained through a surface texture (SurfaceTexture) object corresponding to the surface object, wherein the texture identifier is associated with the texture data having the first resolution, and the texture data having the first resolution is obtained based on the texture identifier.
  • a surface object is a data structure that provides a canvas (Canvas) object to an application (e.g., a video player) for subsequent video rendering and presentation, and maintains an image cache object inside the surface object for image display.
  • a surface texture object is an object that combines a surface object and a texture object, and is used to convert one or more video frames (also known as an image stream) into texture data so that the texture data can be processed by hardware (e.g., an OpenGL-based embedded system OpenGL ES).
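A minimal sketch of this Surface/SurfaceTexture wiring (illustrative only; texture parameter setup and threading are omitted, and the field names are not from the patent):

```java
import android.graphics.SurfaceTexture;
import android.opengl.GLES11Ext;
import android.opengl.GLES30;
import android.view.Surface;

final class SurfaceSetupSketch {
    int oesTextureId;          // the "texture identifier" associated with the first-resolution texture data
    SurfaceTexture surfaceTexture;
    Surface decoderSurface;    // handed to MediaCodec.configure(...) as its output surface

    void setUp() {
        int[] tex = new int[1];
        GLES30.glGenTextures(1, tex, 0);
        GLES30.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex[0]);
        oesTextureId = tex[0];

        surfaceTexture = new SurfaceTexture(oesTextureId); // SurfaceTexture bound to the OES texture
        decoderSurface = new Surface(surfaceTexture);      // the surface object the video frames are bound to
    }
}
```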
  • OpenGL ES is a cross-platform, fully functional 2D and 3D graphics application program interface (API), which is mainly designed for a variety of embedded systems, including consoles, mobile phones, handheld devices, home appliances, and automobiles.
  • OpenGL ES is a subset of the desktop OpenGL API, creating a flexible and powerful low-level interface between software and graphics acceleration.
  • OpenGL ES includes floating-point and fixed-point system descriptions as well as native window system specifications for portable devices.
  • texture data with a second resolution is generated based on the texture data with the first resolution, where the second resolution is higher than the first resolution.
  • various lightweight super-resolution algorithms can be used to generate texture data with a second resolution.
  • These lightweight super-resolution algorithms can rely on the ability of the software and hardware at the terminal to extract and process texture data. Without consuming too much CPU computing resources, they mainly rely on the GPU to complete the real-time conversion of low-definition texture data to high-definition texture data in a lightweight and efficient manner.
  • the following describes, by way of example, lightweight super-resolution algorithms suitable for GPU processing.
  • These super-resolution algorithms usually only perform linear or nonlinear transformations on each pixel value in the texture data with a first resolution, and do not rely on a neural network model. Of course, those skilled in the art should understand that the present disclosure is not limited to this.
  • texture data with the second resolution may be generated by detecting and removing distortion from the low-resolution signal; this process is also called anti-aliasing. Since the server 110 often uses a downsampling algorithm in the process of compressing the original video data, it may cause obvious errors in the texture data over 2-3 pixels somewhere in a video frame, and make some continuously changing line segments or color blocks become discontinuous pixels; at the same time, the texture data of the first resolution may have jagged edges along slanted edges.
  • the super-resolution processor in the super-resolution processing engine can detect these erroneous pixels and the sets of jagged pixels along slanted edges in the texture data with the first resolution, and sample and blend these pixels with their adjacent pixels to obtain texture data with the second resolution.
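One way to read the "sample and blend with adjacent pixels" step is as a local averaging of the flagged pixels; the sketch below is an interpretation of that idea, not the patent's algorithm, and assumes the Y channel is held as a flat int array:

```java
final class AntiAliasSketch {
    // 'luma' is assumed to be a width*height array of Y-channel values; the caller decides
    // which (x, y) positions were flagged as erroneous or jagged pixels.
    static int blendWithNeighbours(int[] luma, int width, int height, int x, int y) {
        int sum = 0;
        int count = 0;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                int nx = x + dx;
                int ny = y + dy;
                if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                    sum += luma[ny * width + nx];   // accumulate the 3x3 neighbourhood
                    count++;
                }
            }
        }
        return sum / count; // the flagged pixel is replaced by the local average
    }
}
```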
  • base texture data with the second resolution may be generated by an upsampling algorithm and/or an interpolation algorithm, and the base texture data may then be adjusted in detail (for example, using the anti-aliasing process described above to adjust some pixels of the base texture data) to obtain texture data with the second resolution.
  • algorithms with lower computational complexity, such as sinc, lanczos, and dcci 2pass, may be used to generate the base texture data, but the present disclosure is not limited thereto.
  • algorithms based on multi-channel conjugation or multi-frame sub-pixels may also be used to adjust some pixels of the base texture data to improve its quality; the present disclosure is not limited thereto.
  • the gradient distribution in the texture data with the first resolution can be adjusted by using an additional gradient transform algorithm to generate texture data with the second resolution.
  • An example of the process of adjusting the gradient distribution in the texture data with the first resolution using the additional gradient transform algorithm will be further described with reference to FIG. 7 , but of course, the present disclosure is not limited thereto.
  • one or more video frames with the second resolution are generated through a rendering operation based on the texture data with the second resolution.
  • Rendering refers to the process of generating a video frame based on a model by a terminal.
  • a model is a description of a video frame strictly defined in a language or data structure, and includes information such as geometry, viewpoint, texture, lighting, and shadow.
  • a GPU of the terminal 120 is used to complete the rendering operation to adapt to the scene of the real-time video.
  • the texture data with the second resolution can be used as data for describing surface details (including surface color details) in the above-mentioned model.
  • the terminal 120 will trigger multiple GPU units based on a vertical synchronization signal to render the picture in the video frame based on the texture data with the second resolution.
  • These GPU units can perform operations such as vertex shading, shape assembly, geometry shading, rasterization, and fragment shading in sequence to calculate the RGB (red, green, blue) value of each pixel in the video frame, and then obtain the video frame that the terminal 120 is about to display.
  • the vertical synchronization signal is a synchronization signal for the GPU on the terminal to calculate a frame of the picture, which indicates the end of the previous frame and the beginning of the next frame, that is, usually a frame of the picture will be rendered using the time interval between two adjacent vertical synchronization signals.
  • An example of a process of rendering texture data with a second resolution using the hardware environment of the terminal to obtain one or more video frames will be further described with reference to Figures 8 and 9. Of course, the present disclosure is not limited to this.
  • operations S301 to S304 may be performed sequentially, in parallel, or in other adjusted orders.
  • the embodiments of the present disclosure do not limit the execution order of the various steps, and may be adjusted according to actual conditions.
  • method 30 may selectively perform some of the operations in operations S301 to S304, or may perform some additional operations in addition to operations S301 to S304, and the embodiments of the present disclosure do not limit this.
  • operation S305 is optionally included.
  • one or more video frames with the second resolution are displayed. Specifically, these video frames may be displayed on the user interface shown in FIG. 2, and the present disclosure is not limited thereto.
  • the various embodiments of the present disclosure save network transmission bandwidth and improve video transmission efficiency by compressing the video on the server side and sending videos with a smaller data volume and lower definition; at the same time, applying the lightweight super-resolution algorithm (also known as the SR algorithm) on the terminal reduces the video storage cost of the mobile terminal and ensures the video viewing effect.
  • the present disclosure also provides a device for processing video data, the device comprising: a receiving module, configured to receive compressed video data, the compressed video data having a first resolution; an extraction module (such as the module for extracting texture data shown in FIG. 4 ), configured to obtain texture data having the first resolution based on one or more video frames in the compressed video data; a super-resolution processing module (such as the super-resolution processor in FIG. 4 ), configured to generate texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and a rendering module (such as the module for rendering on the screen in FIG. 4 ), configured to generate one or more video frames having the second resolution through a rendering operation based on the texture data having the second resolution.
  • the device for processing video data includes: a display and a processor, the display being configured to display a button for controlling the resolution of the video data at a first moment; the processor being configured to generate one or more video frames with a second resolution based on one or more video frames in the compressed video data in response to the button for controlling the resolution of the video data being triggered; and the display being further configured to display the one or more video frames with the second resolution at a second moment.
  • FIG. 5 is a schematic diagram illustrating a process of compressing original video data on a server according to an embodiment of the present disclosure.
  • the process in the embodiments of the present disclosure refers to a running activity of a program with certain independent functions on a certain data set; it is the basic unit of dynamic execution of the operating system, and in the operating system the process is both the basic allocation unit and the basic execution unit.
  • a process can be regarded as a container, which usually contains kernel objects, an address space, statistics, and several threads; the process itself may not actually execute code instructions, and program execution is instead handed over to the threads in the process.
  • FFmpeg is open-source software that can record, convert, and stream audio and video in multiple formats.
  • FFmpeg integrates audio and video decoder libraries that can be used in various projects, as well as format conversion libraries for audio and video.
  • the server 110 wakes up the process for video data compression, which compresses the original video data to a first resolution (e.g., 720P).
  • the thread returns the file name 720P.mp4 to the main thread so that the main thread can store the compressed video accordingly, for example, by overwriting the original video data.
  • the process of processing the user request by the server 110 will send the compressed video data to the corresponding terminal 120 after receiving the user request that meets the requirements.
  • the amount of data can be reduced by 40%, reducing the storage and network bandwidth costs. The above process is only an example and the present disclosure does not limit this.
  • FIG. 6 is a schematic diagram illustrating acquiring texture data having a first resolution according to an embodiment of the present disclosure.
  • the acquiring of texture data having the first resolution based on one or more video frames in the compressed video data includes: binding the video frame to a surface object for rendering, and acquiring a texture identifier based on a surface texture object corresponding to the surface object; and acquiring texture data having the first resolution based on the texture identifier.
  • the acquiring of texture data having the first resolution based on the texture identifier also includes: creating a frame buffer object, and binding two-dimensional texture data to the frame buffer object to use the two-dimensional texture data as a color buffer of the frame buffer object; and activating the frame buffer object, and based on the texture identifier, sampling the hardware texture data output by the graphics processing unit of the terminal into the two-dimensional texture data to use the two-dimensional texture data as texture data having the first resolution.
  • the present disclosure is not limited to this.
  • the super-resolution processing engine has a built-in surface object for obtaining one or more video frames and writing the one or more video frames into a buffer queue (for example, BufferQueue).
  • the decoding engine is MediaCodec, which supports directly inputting one or more video frames into the surface object.
  • the buffer queue in the surface object will act as a producer, and the surface texture object buffer (SurfaceTexture Buffer) will act as a consumer.
  • the two communicate based on the producer-consumer model and jointly process video frames in a limited shared storage area.
  • the buffer queue in the surface object can add video frames to the shared storage area in sequence according to the first-in-first-out principle, and the surface texture object extracts video frames from the shared storage area and stores them in the surface texture object buffer; these video frames are then converted into texture images to obtain hardware texture data.
  • the hardware texture data can be texture data of the TEXTURE_EXTERNAL_OES type, which can be directly processed by an OpenGL-based embedded system (OpenGL ES).
  • the present disclosure is not limited to this.
  • the super-resolution processing engine renders the hardware texture data into two-dimensional texture data through pipeline operation.
  • the process may include the following operations: first, create a frame buffer object (framebuffer object, FBO) mFrameBuffer and use two-dimensional texture data (also denoted as a Texture2D texture) as the color buffer of the frame buffer object mFrameBuffer; then, before hardware rendering, mFrameBuffer is set as the activated frame buffer object, so that the hardware texture data will be sampled into the frame buffer object mFrameBuffer.
  • because the frame buffer object mFrameBuffer uses the two-dimensional texture data as its color buffer, mFrameBuffer can sample the hardware texture data into a two-dimensional texture image. After the frame buffer object mFrameBuffer is deactivated, the two-dimensional texture data including the two-dimensional texture image is obtained for super-resolution processing. This process indirectly samples the hardware texture data onto the two-dimensional texture data through the rendering pipeline, thereby obtaining the required texture data.
  • the above function GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER,mFrameBuffer) binds the above frame buffer object mFrameBuffer to the target frame buffer object GLES30.GL_FRAMEBUFFER in the hardware environment, so that the hardware texture data will be sampled into the frame buffer.
  • the function GLES30.glFramebufferTexture2D(..) uses the two-dimensional texture data as the color buffer of the frame buffer object, so that the frame buffer object mFrameBuffer can sample the hardware texture data into a two-dimensional texture image.
  • the function GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER,0) deactivates the frame buffer object mFrameBuffer.
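Putting the three calls above together, a sketch of the render-to-texture pass might look like this (assumptions: an OpenGL ES 3.0 context is current; oesTexId, width, height and the drawOesQuad helper are hypothetical names, not from the patent):

```java
import android.opengl.GLES30;

final class FboSketch {
    static int renderOesToTexture2D(int oesTexId, int width, int height) {
        int[] mFrameBuffer = new int[1];
        int[] mTexture2D = new int[1];

        // create the two-dimensional texture that will serve as the FBO's color buffer
        GLES30.glGenTextures(1, mTexture2D, 0);
        GLES30.glBindTexture(GLES30.GL_TEXTURE_2D, mTexture2D[0]);
        GLES30.glTexImage2D(GLES30.GL_TEXTURE_2D, 0, GLES30.GL_RGBA, width, height, 0,
                GLES30.GL_RGBA, GLES30.GL_UNSIGNED_BYTE, null);

        // create and activate the frame buffer object, attaching the 2D texture as its color buffer
        GLES30.glGenFramebuffers(1, mFrameBuffer, 0);
        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, mFrameBuffer[0]);
        GLES30.glFramebufferTexture2D(GLES30.GL_FRAMEBUFFER, GLES30.GL_COLOR_ATTACHMENT0,
                GLES30.GL_TEXTURE_2D, mTexture2D[0], 0);

        drawOesQuad(oesTexId);   // hypothetical helper: draws a full-screen quad sampling the OES texture

        // deactivate the FBO; mTexture2D now holds the sampled two-dimensional texture data
        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, 0);
        return mTexture2D[0];
    }

    private static void drawOesQuad(int oesTexId) {
        // placeholder: shader and vertex setup for the quad draw are omitted in this sketch
    }
}
```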
  • the above texture data acquisition process realizes the rendering of video frames onto the externally set two-dimensional texture object. This achieves decoupling between texture data and video frames, provides an operable object decoupled from other modules of the video player for the subsequent application of the super-resolution algorithm, and reduces the computational complexity of the video player.
  • FIG. 7 is a schematic diagram illustrating acquiring texture data having a second resolution according to an embodiment of the present disclosure.
  • the generating of the texture data with the second resolution based on the texture data with the first resolution includes: converting the image format of the texture data with the first resolution from the RGBA format (first color model format) to the YUV format (second color model format) to extract the Y channel data (luminance component channel data) corresponding to the texture data with the first resolution; processing the Y channel data using an additional gradient algorithm to generate the Y channel data with the second resolution; and generating the texture data with the second resolution based on the Y channel data with the second resolution.
  • the present disclosure is not limited thereto.
  • the texture data with the first resolution has width pixels on the long side and height pixels on the short side. Width and height are both positive integers.
  • the texture data with the first resolution is usually in RGBA format.
  • RGBA is a color space model consisting of an RGB color space and an Alpha channel. RGBA represents red (Red), green (Green), blue (Blue) and an Alpha channel (Alpha).
  • the texture data with the first resolution can be converted into a YUV format first.
  • YUV is another model for representing color, in which Y refers to the brightness component (i.e., Y channel data), U refers to the blue chrominance component (i.e., U channel data), and V refers to the red chrominance component (i.e., V channel data).
  • Since the sensitivity of human eyes to Y channel data is much greater than that to U channel data and V channel data, in order to further reduce the computational complexity, only the Y channel data may be processed to adjust the details in the Y channel data and generate Y channel data with the second resolution. Then, the Y channel data with the second resolution, the U channel data with the first resolution, and the V channel data with the first resolution are fused to generate texture data with the second resolution.
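The patent does not prescribe a particular conversion matrix; as one common example, the BT.601 relation between RGB and YUV components is

$$
\begin{aligned}
Y &= 0.299\,R + 0.587\,G + 0.114\,B,\\
U &= -0.147\,R - 0.289\,G + 0.436\,B,\\
V &= \phantom{-}0.615\,R - 0.515\,G - 0.100\,B,
\end{aligned}
$$

so only the first row needs to be evaluated per pixel when extracting the Y channel data described above.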
  • the generating of texture data with a second resolution based on the texture data with the first resolution includes: generating Y channel data in a horizontal direction and Y channel data in a vertical direction based on the Y channel data with the first resolution; processing the Y channel data in the horizontal direction and the Y channel data in the vertical direction using an additional gradient algorithm to generate Y channel data in a horizontal direction with the second resolution and Y channel data in a vertical direction with the second resolution; generating Y channel data with the second resolution based on the Y channel data in the horizontal direction with the second resolution and the Y channel data in the vertical direction with the second resolution; and generating the texture data with the second resolution based on the Y channel data with the second resolution.
  • a gradient profile is a one-dimensional profile along the gradient direction of zero-crossing pixels in an image (e.g., the Y channel data of the first resolution), which can be represented by two parameters, namely a shape parameter λ and a profile sharpness σ.
  • the shape parameter λ is a parameter for controlling the overall shape of the gradient of the image.
  • the profile sharpness σ is a parameter for controlling the sharpness of the gradient profile; the smaller the profile sharpness σ, the higher the sharpness of the gradient profile. Then, according to the ratio r, the gradient field of the Y channel data of the first resolution is converted into the gradient field of the Y channel data with the second resolution, thereby obtaining the Y channel data with the second resolution.
  • the present disclosure is not limited to this.
  • the ratio r can be calculated in advance before applying the additional gradient algorithm, so that in the whole process, the super-resolution processing engine only needs to perform simple linear calculations on each pixel in the Y channel data of the first resolution to obtain the value of each pixel in the Y channel data with the second resolution.
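Written out, the per-pixel transform implied above (with the ratio r precomputed before the additional gradient algorithm is applied) can be sketched as

$$
\nabla I_{2}(x) \;=\; r(x)\,\nabla I_{1}(x),
$$

where \( \nabla I_{1} \) and \( \nabla I_{2} \) denote the gradient fields of the first- and second-resolution Y channel data, respectively; the second-resolution Y channel values are then reconstructed from the transformed gradient field. This is only a compact restatement of the description above, not an additional constraint from the patent.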
  • the whole process does not require the participation of a neural network, is lightweight and efficient, and can greatly save the amount of calculation of the terminal.
  • Fig. 8 is a schematic diagram showing the generation of one or more video frames with a second resolution based on texture data with a second resolution according to an embodiment of the present disclosure.
  • Fig. 9 is a schematic diagram showing a rendering operation according to an embodiment of the present disclosure.
  • generating one or more video frames with the second resolution through a rendering operation based on the texture data with the second resolution includes: binding the texture data with the second resolution to a surface object of an output video frame based on a registered callback function corresponding to the surface texture object; and generating one or more video frames with the second resolution through a rendering operation based on the surface object bound with the texture data with the second resolution.
  • the present disclosure is not limited thereto.
  • the hardware environment of the terminal 120 first needs to obtain the texture data with the second resolution generated in operation S303.
  • the example process of obtaining the texture data with the second resolution can be briefly described as follows: first, register a callback function for the above-mentioned surface texture object (SurfaceTexture).
  • the registered callback mechanism calls a function through a function pointer and passes the function pointer as a parameter, which decouples the caller and the callee and avoids establishing a separate thread to poll for texture data to be processed.
  • the function setOnFrameAvailableListener can be used to enable the surface texture object to listen for the decoded texture data with the first resolution, and the texture data with the second resolution is then obtained directly through the registered callback function. Since operation S303 has been decoupled from the native operations inside the video player MediaCodec, the texture data with the second resolution can no longer be rendered directly using MediaCodec, so it is necessary to design a rendering operation specific to the texture data with the second resolution (as shown in FIG. 9) to bind the texture data with the second resolution to the surface object of the output video frame and render these video frames with the second resolution.
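A sketch of this callback-driven flow (superResolve and renderToOutputSurface are hypothetical stand-ins for operations S303 and S304; threading details are omitted):

```java
import android.graphics.SurfaceTexture;

final class FrameCallbackSketch {
    private volatile boolean frameAvailable = false;

    void listen(final SurfaceTexture surfaceTexture) {
        surfaceTexture.setOnFrameAvailableListener(new SurfaceTexture.OnFrameAvailableListener() {
            @Override
            public void onFrameAvailable(SurfaceTexture st) {
                frameAvailable = true;   // flag consumed on the GL thread, no extra polling thread needed
            }
        });
    }

    void onDrawFrame(SurfaceTexture surfaceTexture, int oesTexId) {
        if (!frameAvailable) {
            return;
        }
        frameAvailable = false;
        surfaceTexture.updateTexImage();        // latch the latest decoded frame into the OES texture
        int srTexId = superResolve(oesTexId);   // hypothetical: operation S303, yields the second-resolution texture
        renderToOutputSurface(srTexId);         // hypothetical: the rendering operation of FIG. 9
    }

    private int superResolve(int oesTexId) { return oesTexId; }   // placeholder
    private void renderToOutputSurface(int srTexId) { }           // placeholder
}
```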
  • generating one or more video frames having the second resolution through a rendering operation includes: sequentially performing vertex transformation operations, primitive assembly operations, and rasterization operations on the texture data having the second resolution to generate one or more video frames having the second resolution.
  • the texture data with the second resolution is transformed using a vertex shader.
  • the graphics in all video frames are strings of data in the computer, which can be represented as an N*3 matrix in a three-dimensional coordinate system, where N is the number of vertices and 3 represents the x, y, and z position coordinates.
  • Vertex transformation is to translate, reduce, enlarge, rotate, and perform other operations on vertices in the coordinate system through a series of matrix transformations.
  • the vertex shader is a vertex shader program source code/executable file used to describe the model transformation, view transformation, projection transformation, and lighting (Transform and lighting) processing that needs to be performed on the vertex.
  • the above process can be summarized using the following pseudo code, but the present disclosure is certainly not limited to this.
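The patent's own pseudo code is not reproduced in this extract; the following is a minimal illustrative sketch of a vertex stage along the lines described above (GLSL source compiled through GLES30; uMvpMatrix, aPosition and aTexCoord are hypothetical names):

```java
import android.opengl.GLES30;

final class VertexStageSketch {
    // vertex shader: applies the model/view/projection (transform) step and forwards texture coordinates
    static final String VERTEX_SHADER =
            "#version 300 es\n"
            + "uniform mat4 uMvpMatrix;\n"
            + "in vec4 aPosition;\n"
            + "in vec2 aTexCoord;\n"
            + "out vec2 vTexCoord;\n"
            + "void main() {\n"
            + "    gl_Position = uMvpMatrix * aPosition;\n"
            + "    vTexCoord = aTexCoord;\n"
            + "}\n";

    static int compileVertexShader() {
        int shader = GLES30.glCreateShader(GLES30.GL_VERTEX_SHADER);
        GLES30.glShaderSource(shader, VERTEX_SHADER);
        GLES30.glCompileShader(shader);
        return shader;   // compile-status checking omitted in this sketch
    }
}
```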
  • the texture data (which has the second resolution) after the vertex transformation can be assembled by the primitive assembler.
  • the primitive assembly operation enables the terminal to connect the vertices after the matrix transformation; for example, given three points after the vertex transformation, the primitive assembly operation determines whether to draw the three points as a triangle or as two straight lines.
  • the texture data assembled by the primitives can be rasterized using a fragment shader.
  • the fragment shader is a fragment shader program source code/executable file used to describe operations (such as color mixing) performed on fragments.
  • the rasterization operation can perform color rendering or texture rendering on the primitives.
  • the rasterized primitives can be actually visible after being output to the frame buffer. That is, the rasterization operation converts the coordinates of the graphics into screen pixel coordinates, and finally converts the mathematical description of the primitives into fragments for display on the screen, and then the video frame can be displayed on the screen through the frame buffer.
  • the above process can be summarized using the following pseudo code, of course, the present disclosure is not limited to this.
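Again as an illustrative sketch rather than the patent's pseudo code, a matching fragment stage that samples the second-resolution texture and writes the colour of each screen pixel could look like this (uSrTexture is a hypothetical name; vertexShader is the shader compiled in the sketch above):

```java
import android.opengl.GLES30;

final class FragmentStageSketch {
    // fragment shader: samples the second-resolution texture for the colour of each fragment
    static final String FRAGMENT_SHADER =
            "#version 300 es\n"
            + "precision mediump float;\n"
            + "uniform sampler2D uSrTexture;\n"
            + "in vec2 vTexCoord;\n"
            + "out vec4 fragColor;\n"
            + "void main() {\n"
            + "    fragColor = texture(uSrTexture, vTexCoord);\n"
            + "}\n";

    static int linkProgram(int vertexShader) {
        int fragmentShader = GLES30.glCreateShader(GLES30.GL_FRAGMENT_SHADER);
        GLES30.glShaderSource(fragmentShader, FRAGMENT_SHADER);
        GLES30.glCompileShader(fragmentShader);

        int program = GLES30.glCreateProgram();
        GLES30.glAttachShader(program, vertexShader);
        GLES30.glAttachShader(program, fragmentShader);
        GLES30.glLinkProgram(program);
        return program;   // the program used by the rendering operation; link checks omitted
    }
}
```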
  • it may also include: displaying a button for controlling the resolution of video data; in response to the button for controlling the resolution of video data being triggered, generating one or more video frames with the second resolution based on one or more video frames in the compressed video data; and displaying the one or more video frames with the second resolution.
  • an electronic device for implementing the method according to the embodiment of the present disclosure.
  • Fig. 10 shows a schematic diagram of an electronic device 2000 according to an embodiment of the present disclosure.
  • the electronic device 2000 may include one or more processors 2010 and one or more memories 2020.
  • the memory 2020 stores computer-readable code, and when the computer-readable code is run by the one or more processors 2010, the method described above may be executed.
  • the processor in the embodiments of the present disclosure may be an integrated circuit chip with signal processing capabilities.
  • the above processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, operations, and logic block diagrams disclosed in the embodiments of the present disclosure may be implemented or executed by such a processor.
  • the general-purpose processor may be a microprocessor or any conventional processor, and may have an x86 or ARM architecture.
  • various example embodiments of the present disclosure may be implemented in hardware or dedicated circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor, or other computing device.
  • when various aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flowcharts, or some other graphical representation, it will be understood that the blocks, devices, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, dedicated circuits or logic, general-purpose hardware, a controller or other computing device, or some combination thereof.
  • the method or apparatus according to the embodiments of the present disclosure may also be implemented with the aid of the architecture of the computing device 3000 shown in FIG. 11.
  • the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, an input/output component 3060, a hard disk 3070, and the like.
  • the storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the method provided in the present disclosure, as well as the program instructions executed by the CPU.
  • the computing device 3000 may also include a user interface 3080.
  • the architecture shown in FIG. 11 is only exemplary. When implementing different devices, one or more components of the computing device shown in FIG. 11 may be omitted according to actual needs.
  • a computer-readable storage medium is also provided.
  • Fig. 12 shows a schematic diagram of a storage medium 4000 according to the present disclosure.
  • the computer storage medium 4020 stores computer readable instructions 4010.
  • the computer readable storage medium in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct memory bus random access memory (DR RAM).
  • the present disclosure also provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method according to the present disclosure.
  • each box in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • it should also be noted that, in some alternative implementations of the system, method, and computer program product, the functions noted in the blocks may occur in an order different from that noted in the accompanying drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flow chart, and combinations of blocks in the block diagram and/or flow chart may be implemented with a dedicated hardware-based system that performs the specified functions or operations, or may be implemented with a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Generation (AREA)

Abstract

The present disclosure provides a method and apparatus for processing video data, a method and apparatus for displaying video data, a computer device, and a computer-readable storage medium. The method includes: receiving compressed video data, the compressed video data having a first resolution; obtaining texture data having the first resolution based on one or more video frames in the compressed video data; generating texture data having a second resolution based on the texture data having the first resolution, the second resolution being higher than the first resolution; and generating, based on the texture data having the second resolution, one or more video frames having the second resolution through a rendering operation. When video is played on a terminal, the embodiments of the present disclosure enhance the picture quality of the video through a lightweight super-resolution algorithm, save the video storage cost of the mobile terminal, and guarantee the video viewing experience.

Description

处理视频数据的方法、装置、计算机设备和存储介质
本申请要求2022年12月09日提交的申请号为202211589458.6、发明名称为“处理视频数据的方法及装置”的中国专利申请的优先权。
技术领域
本公开涉及云技术领域,具体涉及一种处理视频数据的方法、装置、计算机设备和存储介质。
背景技术
超分辨率(Super-Resolution)技术(又简称为超分技术)已经具有广泛的实际应用,如医学图像重建、人脸图像重建、超高清电视、超高清视频播放等。针对视频应用,超分辨率(Super-Resolution)技术可以通过硬件或软件方法提高原视频中的一个或多个帧的分辨率,将低分辨率视频重建成高分辨率视频。
然而目前的超分辨率技术的存储成本高、运算量较大、算法耗时,导致使用超分辨率技术的移动终端在播放实时视频时,视频出现卡顿。
发明内容
根据本公开的第一方面,公开了一种处理视频数据的方法,该方法包括:接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
根据本公开的第二方面,公开了一种处理视频数据的装置,所述装置包括:接收模块,被配置为接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;提取模块,被配置为基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;超分处理模块,被配置为基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及渲染模块,被配置为基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
根据本公开的第三方面,公开了一种计算机设备,该计算机设备包括:一个或多个处 理器;以及一个或多个存储器,其中存储有计算机可执行程序,当由所述处理器和所述显示器执行所述计算机可执行程序时,执行如上所述的方法。
根据本公开的第四方面,公开了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行上述各个方面或者上述各个方面的各种可选实现方式中提供的方法。
根据本公开的第五方面,公开了一种计算机可读存储介质,其上存储有计算机可执行指令,所述指令在被处理器执行时用于实现上述方法。
本公开的各个实施例通过在服务端压缩视频,通过下发数据量较小且清晰度较低的视频节省了网络传输带宽,提高了视频的传输效率。同时,在终端播放视频时,通过轻量级的超分辨率算法(又称为超分算法),增强视频的画质,节省移动终端的视频存储成本,并保证了视频观看效果。
附图说明
此处所说明的附图用来提供对本公开的进一步理解,构成本公开的一部分,本公开的示意性实施例及其说明用于解释本公开,并不构成对本公开的不当限定。
图1示出了根据本公开实施例的应用场景示意图。
图2示出了根据本公开实施例的用户界面。
图3示出了根据本公开实施例的处理视频数据的方法的流程图。
图4示出了根据本公开实施例的处理视频数据的装置的架构图。
图5是示出根据本公开的实施例的在服务器上对原始视频数据进行压缩的进程的示意图。
图6是示出根据本公开的实施例的获取具有第一分辨率的纹理数据的示意图。
图7是示出根据本公开的实施例的获取具有第二分辨率的纹理数据的示意图。
图8是示出根据本公开的实施例的基于具有第二分辨率的纹理数据生成具有第二分辨率的一个或多个视频帧的示意图。
图9是示出根据本公开的实施例的渲染操作的示意图。
图10示出了根据本公开实施例的电子设备的示意图。
图11示出了根据本公开实施例的示例性计算设备的架构的示意图。
图12示出了根据本公开实施例的存储介质的示意图。
具体实施方式
为了使得本公开的目的、技术方案和优点更为明显,下面将参照附图详细描述根据本公开的示例实施例。显然,所描述的实施例仅仅是本公开的一部分实施例,而不是本公开的全部实施例,应理解,本公开不受这里描述的示例实施例的限制。
在本说明书和附图中,具有基本上相同或相似操作和元素用相同或相似的附图标记来表示,且对这些操作和元素的重复描述将被省略。同时,在本公开的描述中,术语“第一”“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。这些术语只是用于将一元素与另一元素区别分开。例如,在不脱离各种示例的范围的情况下,第一数据可以被称为第二数据,并且类似地,第二数据可以被称为第一数据。第一数据和第二数据都可以是数据,并且在某些情况下,可以是单独且不同的数据。本申请中术语“至少一个”的含义是指一个或多个,本申请中术语“多个”的含义是指两个或两个以上,例如,多个音频帧是指两个或两个以上的音频帧。
应理解,在本文中对各种示例的描述中所使用的术语只是为了描述特定示例,而并非旨在进行限制。如在对各种示例的描述和所附权利要求书中所使用的那样,单数形式“一个(“a”“an”)”和“该”旨在也包括复数形式,除非上下文另外明确地指示。
还应理解,本文中所使用的术语“和/或”是指并且涵盖相关联的所列出的项目中的一个或多个项目的任何和全部可能的组合。术语“和/或”,是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本申请中的字符“/”,一般表示前后关联对象是一种“或”的关系。
还应理解,在本申请的各个实施例中,各个过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。还应理解,根据(基于)A确定B并不意味着仅仅根据(基于)A确定B,还可以根据(基于)A和/或其它信息来确定B。
还应理解,术语“包括”(也称“includes”、“including”、“Comprises”和/或“Comprising”)当在本说明书中使用时指定存在所陈述的特征、整数、操作、操作、元素、和/或部件,但是并不排除存在或添加一个或多个其他特征、整数、操作、操作、元素、部件、和/或其分组。
还应理解,术语“如果”可被解释为意指“当...时”(“when”或“upon”)或“响应于确定”或“响应于检测到”。类似地,根据上下文,短语“如果确定...”或“如果检测到[所陈述的条件或事件]”可被解释为意指“在确定...时”或“响应于确定...”或“在检测到[所陈述的条件或事件] 时”或“响应于检测到[所陈述的条件或事件]”。
为便于描述本公开,以下介绍与本公开有关的概念。
在对本公开进行详细描述之前,为了帮助对本公开的技术方案的理解,下面首先对本公开要用到的术语进行解释。
云技术(Cloud technology):云技术是指在广域网或局域网内将硬件、软件、网络等系列资源统一起来,实现数据的计算、储存、处理和共享的一种托管技术。云技术基于云计算商业模式应用的网络技术、信息技术、整合技术、管理平台技术、应用技术等的总称,可以组成资源池,按需所用,灵活便利。云计算技术将变成重要支撑。技术网络系统的后台服务需要大量的计算、存储资源,如视频网站、图片类网站和更多的门户网站。伴随着互联网行业的高度发展和应用,将来每个物品都有可能存在自己的识别标志,都需要传输到后台系统进行逻辑处理,不同程度级别的数据将会分开处理,各类行业数据皆需要强大的系统后盾支撑,只能通过云计算来实现。
分辨率(resolution):分辨率泛指测量或显示系统对细节的分辨能力,其指示能够分辨一帧视频帧中的两个点或线的能力。分辨率还可以表征影像的清晰度。分辨率越高代表影像质量越好,越能表现出更多的细节;但相对的,因为纪录的信息越多,文件也就会越大。描述分辨率的单位包括DPI(点每英寸)、LPI(线每英寸)和PPI(像素每英寸)。其中PPI为常用的单位,其描述单位长度内的像素数量与单位长度的比值。PPI又称为像素密度,像素密度越高,说明像素越密集,5PPI表示每英寸有5个像素,500PPI表示每英寸有500个像素,PPI的数值高,图片和视频的清晰度就更高。
卷积神经网络(Convolutional Neural Networks,CNN):卷积神经网络是一类包含卷积计算且具有深度结构的前馈神经网络(Feedforward Neural Networks),是深度学习(deep learning)的代表算法之一。卷积神经网络具有表征学习(representation learning)能力,能够按其阶层结构对输入信息进行平移不变分类(shift-invariant classification),因此也被称为“平移不变人工神经网络(Shift-Invariant Artificial Neural Networks,SIANN)”。
目前,为使得移动终端能够播放高分辨率的视频,工业界通常会在服务器端,利用离线超分技术对分辨率较低的视频进行处理,以生成高分辨率的视频,然后再将该高分辨率的视频下发至移动终端。然而,该方案应用于实时视频时难度较高,并且经过超分处理后的视频将占用更多的存储资源,导致服务器端的存储成本增加。同时由于下发高分辨率的视频将占用过多的网络带宽,其也会导致网络的拥塞和网络资源的浪费。
工业界还提出了另一种解决思路,其尝试在移动终端预置大量的神经网络训练模型,利用深度学习算法(例如,CNN算法)来在移动终端处对低分辨率的视频进行超分处理。然而,深度学习算法对模型和算子的大小和质量的要求都比较高。在模型较大的场景下, 会导致移动终端消耗过多的计算资源,导致移动终端在播放视频时出现卡顿。
因此,为了弥补以上不足,本公开提供了一种处理视频数据的方法、装置、存储介质和计算机设备,该方法包括:接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
本公开的各项实施例通过在服务端压缩视频,通过下发数据量较小且清晰度较低的视频节省了网络传输带宽,提高了视频的传输效率。同时,在终端播放视频时,通过轻量级的超分辨率算法(又称为超分算法),增强视频的画质,节省移动终端的视频存储成本,并保证了视频观看效果。
图1示出了根据本公开实施例的应用场景100的示意图,其中示意性地示出了服务器110和多个终端120(例如移动终端)。视频数据可以存储于移动终端120或服务器110上,终端和服务器可以通过有线或无线通信方式进行直接或间接地连接,由此视频数据能够在移动终端120和服务器110间传输。
终端120可以是手机、平板电脑、笔记本电脑、台式计算机、个人计算机(PC,Personal Computer)、智能音箱或智能手表等具备储存单元、安装有微处理器的终端,但并不局限于此。服务器110可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(CDN,Content Delivery Network)、以及大数据和人工智能平台等基础云计算服务的云服务器。
服务器110可以采用各种数据压缩技术,对原始视频数据进行处理,以生成压缩后的视频。经过压缩后的视频数据可以由更小的存储空间进行存储,并使用更少的网络资源进行传输。服务器110将在完成原始视频数据的压缩之后将该压缩后的视频在服务器上的存储地址发布在门户网站上。之后,服务器110将根据终端120对视频数据服务的请求,向一个或多个终端120传输压缩后的视频。
终端120上安装了播放视频的应用,而服务器110可以是终端中部署的应用程序的后台服务器,用于与运行播放视频的应用的终端进行交互,以向该终端(或终端中部署的应用程序)提供计算和应用服务支持。
作为一个示例,终端120上搭载的用于播放视频的应用可以是集音视频通讯功能一体的产品,以提供移动端、PC端、Web端、小程序端等多平台直播、点播、短视频、实时音视频、美颜特效等音视频及通信能力。超分辨率技术(以下简称为超分技术)相关的模 块可以被集成在支持上述功能的软件开发工具包(Software Development Kit,SDK)中,以实现一次接入处处调用的效果。更具体的,软件开发工具包是一些软件工程师为特定的软件包、软件框架、硬件平台、操作系统等建立应用软件时的开发工具的集合。
例如,超分辨率技术相关的模块可以以SDK插件(Plug-in)的形态集成到应用中。SDK插件作为一种遵循一定规范的应用程序接口编写出来的程序,其可以同时支持多个平台(例如IOS平台或Android平台),并调用这些平台上的函数库或数据,将来自服务器110端的压缩后的视频数据转换为高清的视频数据。
如图2所示,在终端120播放视频的用户界面上,用户可以通过点击“本地超分”按钮,调用部署在终端120上的超分辨率技术相关的模块,通过根据本公开实施例的轻量高效的超分技术,实现低清视频到高清视频的实时转换和播放。
本公开提供了显示视频数据的方法,所述方法包括:接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;显示用于控制视频数据的分辨率的按钮;响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成具有第二分辨率的一个或多个视频帧,所述第二分辨率高于所述第一分辨率;以及显示所述具有第二分辨率的一个或多个视频帧。
对应地,本公开还提供一种显示视频数据的装置,所述装置包括接收器(接收模块)、处理器和显示器,其中,所述接收器,被配置为接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;所述显示器,被配置为在第一时刻,显示用于控制视频数据的分辨率的按钮;所述处理器,被配置为响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成具有第二分辨率的一个或多个视频帧,所述第二分辨率高于所述第一分辨率;以及所述显示器还被配置为在第二时刻,显示所述具有第二分辨率的一个或多个视频帧。
本公开还提供了处理视频数据的方法,所述方法包括:接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
对应地,本公开还提供一种处理视频数据的装置,所述装置包括:接收模块,被配置为接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;提取模块,被配置为基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;超分处理模块,被配置为基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及渲染模块,被配置为基于所述具有第 二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
相比于传统技术,根据本公开实施例的超分技术摒弃了复杂的神经网络模型,而是充分利用终端处软硬件的解码视频数据的能力、提取和处理纹理数据的能力、以及渲染视频帧的处理能力,在不消耗过多的CPU的计算资源的情况下,轻量高效地完成了低清视频到高清视频的实时转换和播放。
图1所示的应用场景示意图以及图2所示的视频播放界面仅仅是一个示例,本公开实施例描述的应用场景以及视频播放界面是为了更加清楚的说明本公开实施例的技术方案,并不构成对于本公开实施例提供的技术方案的限定,本领域普通技术人员可知,随着超分技术的演变和新业务场景的出现,本公开实施例提供的技术方案对于类似的技术问题,同样适用。
以下结合图3至图11对根据本公开实施例进行更详细介绍。
图3示出了根据本公开实施例的处理视频数据的方法30的流程图。图4示出了根据本公开实施例的实现方法30的装置的架构图。
如图3所示,方法30包括以下操作S301至操作S304。可选地,方法30可以由以上详述的终端120执行,当然本公开并不以此为限。
在操作S301中,接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率。
可选地,参考图4,终端120可以从服务器110中接收经压缩的视频数据。作为一个示例,压缩前的原始视频数据可以是MP4或HLS格式的视频数据。例如,压缩前的原始视频数据可以是一个1080P(分辨率为1920×1080)的原始视频数据,其指示该原始视频数据中的每个视频帧的长边包括1920个像素,而短边包括1080个像素。如果终端120要直接从服务器110接收该原始视频数据,那么终端120和服务器110都需要消耗大量的存储和带宽成本。为此,服务器110可以将该1080P的原始视频数据压缩成清晰度较低且数据量较小的720P视频数据。720P的视频数据的分辨率为1280×720,也即压缩后的视频数据中的每个视频帧的长边包括1280个像素,而短边包括720个像素。压缩后的720P的视频数据虽然清晰度较低,但其对应的数据量较小,仅需要消耗较少的存储和带宽成本。
作为一个示例,如图4所示,终端120上搭载的用于播放视频的应用可以通过调用视频播放器以经由该视频播放器的数据接收接口从终端120的接收器获取上述的经压缩的视频数据。对于实时视频播放的场景,终端120上搭载的用于播放视频的应用还可以与终端120的接收器接口以通过流式传输技术实时获取视频流并连续实时地发送至视频播放器。之后将参考图5进一步描述服务器110压缩原始视频数据的一个示例,当然,本公开并不以此为限。
在操作S302中,基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述 第一分辨率的纹理数据。
例如,视频帧是指影像动画中最小单位的单幅影像画面。其中,一帧的视频帧就是一副静止的画面,连续的帧就形成动画,如电视图像等。每一帧都是静止的图像,快速连续地显示多个帧便形成了运动的假象。这里的视频帧可以是I帧、P帧和B帧中的任意一项。其中,I帧可以是指通过自带的信息,无需参考其它图像便可独立进行解码的独立帧。P帧可以是指“帧间预测编码帧”,需要参考前面的I帧和/或P帧的不同部分,才能进行解码。B帧可以是指“双向预测编码帧”,以其前帧和后帧作为参考帧进行解码。当然本公开并不以此为限。
作为一个示例,如图4所示,可以通过视频播放器中的解码引擎对所述经压缩的视频数据进行解封装和解码来获取一个或多个的视频帧。在一个示例性的实施过程中,该解码引擎将执行以下过程:利用终端的中央处理单元,将所述经压缩的视频数据解封装成视频码流;利用终端的图形处理单元,将所述视频码流解码成一个或多个视频帧。之后将利用这些视频帧,获取具有所述第一分辨率的纹理数据。当然本公开并不以此为限。
其中,解码引擎可以用于对视频数据进行解封装和解码。封装格式也叫做容器,上述经压缩的视频数据将被封装在容器中。“解封装”是指从“容器”中获取视频码流、音频码流、字幕和元数据信息的过程。该过程通常使用终端的中央处理单元(CPU)处理。
“解码”包括硬件解码和软件解码,其作用是将码流解码成一个或多个的视频帧(YUV数据或RGB数据)。这些视频帧包括用于在终端的显示器上显示的每个像素点的像素值。其中,“硬件解码”是指基于GPU(图形处理单元)来处理视频数据,而“软件解码”是指基于CPU(中央处理单元)来处理视频数据。例如,在Android平台上,解码引擎可以使用MediaExtractor来进行解封装,然后使用MediaCodec来硬件解码经解封装后的视频数据,从而获取上述的一个或多个视频帧。此外,在Android也可以使用FFMpeg来软件解码经解封装后的视频数据。又例如,在IOS平台上,解码引擎可以使用VideoToolbox来硬件解码经解封装后的视频数据,或者使用FFMpeg来软件解码经解封装后的视频数据。本公开下文的各个实施例均以硬件解码为例进行详细说明,当然,本公开并不以此为限。
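A minimal, hedged sketch of the hardware-decoding path described above (MediaExtractor for demultiplexing on the CPU, MediaCodec decoding into a Surface) is given below as a Java fragment; the file path, the single-video-track assumption, and the method wrapper are illustrative only, not part of the disclosure.

```java
// Fragment of a hypothetical decoder class; imports from android.media assumed.
static MediaCodec startHardwareDecoder(String path, Surface surface) throws IOException {
    // Demultiplexing (un-packaging the container) runs on the CPU.
    MediaExtractor extractor = new MediaExtractor();
    extractor.setDataSource(path);

    MediaFormat videoFormat = null;
    for (int i = 0; i < extractor.getTrackCount(); i++) {
        MediaFormat f = extractor.getTrackFormat(i);
        String mime = f.getString(MediaFormat.KEY_MIME);
        if (mime != null && mime.startsWith("video/")) {
            extractor.selectTrack(i);
            videoFormat = f;
            break;
        }
    }

    // Hardware decoding: decoded frames are delivered directly to the Surface
    // instead of being copied back into application memory.
    MediaCodec decoder = MediaCodec.createDecoderByType(
            videoFormat.getString(MediaFormat.KEY_MIME));
    decoder.configure(videoFormat, surface, null, 0);
    decoder.start();
    return decoder;
}
```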
如图4所示,超分处理引擎可以从上述视频帧中获取纹理数据。纹理数据与视频帧具有相同的分辨率。纹理数据是一种用于描述视频帧的颜色信息的结构化的数据,其可以以一种包括了具有相同图像格式的多个图像的数据对象的形式被存储和使用。这些具有相同图像格式的多个图像又称为纹理图像。纹理数据不仅可以向各种着色器(例如顶点着色器和片段着色器)提供纹理信息的输入,还可以被用作渲染对象。具有第一分辨率的纹理数据为用于描述所述视频帧的颜色信息的结构化的数据,其可以包括具有相同图像格式的多个第一纹理图像,每个第一纹理图像具有第一分辨率。
根据本公开的实施例的纹理数据具有纹理标识符,其用于标识该纹理数据。正如上述,每个视频帧包括多个像素并且各视频帧的尺寸、格式和维度相同。而为了描述这些视频帧,超分处理引擎从这些视频帧中提取了部分具有特定约束的图像作为纹理图像,并将其放置入以纹理标识符为标识的容器内。此外,纹理数据还可选地包括纹理类型、纹理大小和上述具有特定约束的图像的图像格式。其中,纹理类型定义了纹理图像的排列方式,纹理大小定义了纹理图像的大小,而图像格式则限定纹理图像的格式。当然本公开并不以此为限。
作为一个示例,获取纹理数据的过程可以被简述如下:首先将上述视频帧绑定至(例如用于渲染的)表面(Surface)对象(Object),然后通过该表面对象对应的表面纹理(Surface)对象获取纹理标识符(又称为纹理ID),所述纹理标识符与具有第一分辨率的纹理数据相关联,并基于所述纹理标识符获取具有第一分辨率的纹理数据。其中,表面对象是一种数据结构体,其向应用(例如,视频播放器)提供了画布(Canvas)对象以用于后续的视频渲染和呈现,并在表面对象内部维护了一个图像缓存对象以用于图像的显示。而表面纹理对象是一种将表面对象和纹理对象进行组合的对象,其用于将一个或多个视频帧(又称为图像流)转换为纹理数据,以使得纹理数据可以被硬件处理(例如基于OpenGL的嵌入系统OpenGL ES)。本文主要以OpenGL ES为例进行说明,但是本领域技术人员应当理解本公开并不以此为限。其中,OpenGL ES是一种跨平台的功能完善的2D和3D图形应用程序接口API,主要针对多种嵌入式系统专门设计-包括控制台、移动电话、手持设备、家电设备和汽车等等。OpenGL ES是适配于台式电脑的OpenGL子集组成,创造了软件与图形加速间灵活的底层交互接口。OpenGL ES包含浮点运算和定点运算系统描述以及针对便携设备的本地视窗系统规范。
之后将参考图6进一步描述超分处理引擎获取纹理数据的过程的一个示例,当然,本公开并不以此为限。
在操作S303中,基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率。
根据本公开实施例的方法30可以采用各种轻量级的超分算法来生成具有第二分辨率的纹理数据。这些轻量级的超分算法可以依赖于终端处软硬件的提取和处理纹理数据的能力,在不消耗过多的CPU的计算资源的情况下,主要依赖于GPU,以轻量高效地完成了低清纹理数据到高清纹理数据的实时转换。以下以示例的形式给出适用于GPU处理的轻量级的超分算法。这些超分算法通常仅对具有第一分辨率的纹理数据中的各个像素值进行线性变换或非线性变换,而不依赖于神经网络模型。当然本领域技术人员应当理解本公开并不以此为限。
例如,可以通过检测和去除低分辨率的信号畸变的方式来生成具有第二分辨率的纹理 数据,这一过程又称为抗锯齿过程(anti-aliasing)。由于服务器110在对原始视频数据进行压缩的过程中,往往采用降采样算法,其可能导致视频帧中某处具有2-3个像素的纹理数据出现明显错误,并使得一些连续变化的线段或色块变为不连续的像素点。同时第一分辨率的纹理数据可能使得纹理数据中斜边的边缘存在锯齿。为了消除以上影响,如图4所示,超分处理引擎中的超分处理器可以通过检测具有第一分辨率的纹理数据中的这些错误像素点和斜边处呈锯齿样的像素点集合,并将这些像素点与这些像素点临近的像素点进行采样混合,以得到具有第二分辨率的纹理数据。
又例如,可以通过上采样算法和/或插值算法生成具有第二分辨率的底版数据,并将该底版数据进行细节上的调整(例如,利用以上描述的抗锯齿过程来调整底版数据上的部分像素)以获取具有第二分辨率的纹理数据。考虑到终端120的限制,可以使用sinc、lanczos、dcci 2pass等计算量较低算法来生成底版数据,当然本公开并不以此为限。又例如,除了利用抗锯齿过程来调整底版数据上的部分像素以外,还可以利用基于多通道共轭或多帧子像素的算法来调整底版数据上的部分像素以提升底版数据的质量。当然本公开并不以此为限。
又例如,可以通过附加梯度变换(Gradient Transform)算法,调整具有第一分辨率的纹理数据中的梯度分布,生成具有第二分辨率的纹理数据。之后将参考图7进一步描述利用附加梯度变换算法对第一分辨率的纹理数据中的梯度分布进行调整的过程的一个示例,当然,本公开并不以此为限。
在操作S304中,基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有第二分辨率的一个或多个视频帧。
渲染(render)是指以终端基于模型生成视频帧的过程。模型是用语言或者数据结构进行严格定义的视频帧的描述,它包括几何、视点、纹理、照明和阴影等信息。在本公开中,使用终端120的GPU来完成渲染操作,以适配实时视频的场景。所述具有第二分辨率的纹理数据可以作为上述的模型中用于描述表面细节(包括表面颜色细节)的数据。终端120将基于垂直同步信号来触发多个GPU单元来基于所述具有第二分辨率的纹理数据渲染视频帧中的画面。这些GPU单元可以依次执行顶点着色(Vertex Shadering)、形状装配(Shape Assembly)、几何着色(Geometry Shader)、光栅化(Rasterization)、和片段着色器(Fragment Shadder)等操作,计算出视频帧中的每个像素的RGB(Red,Green,blue)值,进而获取终端120即将显示的视频帧。
垂直同步信号是终端上的GPU计算一帧画面的同步信号,其指示着前一帧的结束和下一帧的开始,也即通常一帧画面将使用两个相邻的垂直同步信号间间隔的时间来完成渲染。之后将参考图8至图9进一步描述利用终端的硬件环境渲染具有第二分辨率的纹理数据进而得到一个或多个视频帧的过程的一个示例,当然,本公开并不以此为限。
在本公开至少一个实施例中,操作S301至操作S304可以顺序执行,可以并行执行,也可以按调整后的其他次序执行,本公开的实施例对各个步骤的执行顺序不作限制,可以根据实际情况调整。在本公开至少一个实施例中,方法30可以选择地执行操作S301至操作S304中的部分操作,也可以执行除了操作S301至操作S304以外的一些附加操作,本公开的实施例对此不作限制。例如,在本公开的一些示例实施例中,还可选地包括操作S305。例如,在操作S305中,显示所述具有第二分辨率的一个或多个视频帧。具体地,可以在图2所示的用户界面上显示这些视频帧,本公开并不以此为限。
由此,本公开的各个实施例通过在服务端压缩视频,通过下发数据量较小且清晰度较低的视频节省了网络传输带宽,提高了视频的传输效率。同时,在终端播放视频时,通过轻量级的超分辨率算法(又称为超分算法),增强视频的画质,节省移动终端的视频存储成本,并保证了视频观看效果。
此外,本公开还提供了一种处理视频数据的装置,所述装置包括:接收模块,被配置为接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;提取模块(例如图4中示出的用于提取纹理数据的模块),被配置为基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;超分处理模块(例如图4中的超分处理器),被配置为基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及渲染模块(例如图4中的用于渲染上屏的模块),被配置为基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
所述处理视频数据的装置包括:显示器和处理器,所述显示器,被配置为在第一时刻,显示用于控制视频数据的分辨率的按钮;所述处理器,被配置为响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成所述具有第二分辨率的一个或多个视频帧;以及所述显示器还被配置为在第二时刻,显示所述具有第二分辨率的一个或多个视频帧。
接下结合图5至图9来对根据本公开实施例的处理视频数据的方法30的一些可选细节进行进一步说明。
图5是示出根据本公开的实施例的在服务器上对原始视频数据进行压缩的进程的示意图。
值得注意的是,本公开的实施例中的进程(Process)是指具有一定独立功能的程序(关于某个数据集合的一次运行活动)。它是操作系统动态执行的基本单元,在操作系统中,进程既是基本的分配单元,也是基本的执行单元。进程可以是容器,通常包含内核对象、地址空间、统计信息和若干线程。它本身可以并不真正执行代码指令,而是交由进程内的线 程执行。
本公开的实施例的进程可以调用各种压缩工具以对原始视频数据进行压缩。以下以ffmpeg工具为例来进行说明,但是本领域技术人员应当理解本公开并不以此为限。ffmpeg是一种开放源代码的软件,其可以执行音频和视频多种格式的录影、转换、串流功能。ffmpeg集成了可用于各种项目的音频和视频的解码器库以及用于音频和视频的格式转换库。
作为一个示例,如图5所示,服务器110在接收到视频发布者上传的高清原始视频数据后,处理视频数据压缩的进程被唤醒。进程内的主线程将该高清原始视频数据传递至origin.mp4文件,并调用线程池中空闲的线程,使该空闲的线程执行ffmpeg命令(例如,ffmpeg-i origin.mp4-vf scale=1280:720 720P.mp4),以将该高清原始视频数据压缩为具有第一分辨率(例如,720P)的视频数据,并将经压缩的视频命名为720P.mp4。接着,该线程将名称为720P.mp4传回至主线程,以使得主线程能够对应地存储经压缩的视频,例如以覆盖原始视频数据。之后,服务器110处理用户请求的进程将在接收到符合要求的用户请求之后,将该经压缩的视频数据发送至对应的终端120。通过图5示出的压缩过程,可以减少40%的数据量,减少存储和网络带宽成本。以上过程仅为示例,本公开对此不进行限制。
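The following self-contained Java sketch illustrates how a worker thread from the thread pool might run the ffmpeg command quoted above; the class name and error handling are assumptions, not part of the disclosure.

```java
import java.io.IOException;

// Hypothetical worker task executing the compression command on the server.
public final class CompressTask implements Runnable {
    @Override
    public void run() {
        ProcessBuilder pb = new ProcessBuilder(
                "ffmpeg", "-i", "origin.mp4",
                "-vf", "scale=1280:720",
                "720P.mp4");
        pb.inheritIO();                       // forward ffmpeg output to the service log
        try {
            int exitCode = pb.start().waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("ffmpeg exited with " + exitCode);
            }
            // Hand the name "720P.mp4" back to the main thread for storage.
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```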
图6是示出根据本公开的实施例的获取具有第一分辨率的纹理数据的示意图。
在本公开的一个示例中,所述基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据包括:将所述视频帧绑定至用于渲染的表面对象,并基于与所述表面对象对应的表面纹理对象获取纹理标识符;以及基于所述纹理标识符,获取具有所述第一分辨率的纹理数据。而所述基于所述纹理标识符,获取具有所述第一分辨率的纹理数据还包括:创建帧缓冲对象,并将二维纹理数据绑定至所述帧缓冲对象以将所述二维纹理数据作为所述帧缓冲对象的颜色缓冲;以及将所述帧缓冲对象激活,并基于所述纹理标识符,将终端的图形处理单元输出的硬件纹理数据采样至所述二维纹理数据中,以将所述二维纹理数据作为具有所述第一分辨率的纹理数据。当然本公开并不以此为限。
如图6所示,超分处理引擎内置了一个表面对象,以用于获取一个或多个视频帧,并将所述一个或多个视频帧写入至缓冲队列(例如,BufferQueue)中。假设解码引擎为MediaCodec,其支持将一个或多个视频帧直接输入至该表面对象中。在表面对象中的缓冲队列将作为生产者,而表面纹理对象缓冲(SurfaceTexture Buffer)将作为消费者,二者基于生产者-消费者模型进行通信,共同处理有限共享存储区域中的视频帧。例如,表面对象中的缓冲对象可以将视频帧按照先进先出的原则依序将视频帧添加至共享存储区域中,而表面纹理对象从该共享存储区域中提取视频帧并将其存储至表面纹理对象缓冲,然后将这 些视频帧转换为纹理图像以获取硬件纹理数据。硬件纹理数据可以是TEXTURE_EXTERNAL_OES类型的纹理数据,其可以由基于opengl的嵌入式系统直接处理。当然本公开并不以此为限。
接着,超分处理引擎通过管道流水线操作将硬件纹理数据渲染成二维纹理数据。如图6所示,该过程可以包括以下操作:首先,创建一个帧缓冲对象(framebuffer object,FBO)mFrameBuffer并将二维纹理数据(又记为Texture2D纹理)作为帧缓冲对象mFrameBuffer的颜色缓冲。然后,在硬件渲染之前,将该帧缓冲对象设置为激活的帧缓冲对象mFrameBuffer,从而硬件纹理数据将被采样到该帧缓冲对象mFrameBuffer中。由于该帧缓冲对象mFrameBuffer是以二维纹理数据作为颜色缓冲的,因此该帧缓冲对象mFrameBuffer将可以将该硬件纹理数据采样成二维纹理图像。在帧缓冲对象mFrameBuffer关闭之后,将获取包括二维纹理图像的二维纹理数据用于超分处理。这个过程通过管道流水线,间接地将硬件纹理数据采样到了二维纹理数据上,由此获取了所需的纹理数据。
上述过程可以使用以下伪代码进行概括,当然本公开并不以此为限。
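The pseudocode is not reproduced in this text; the Java fragment below reconstructs it from the GLES30 calls discussed in the next paragraph, with the texture IDs and the drawOesTexture helper being illustrative assumptions.

```java
// Create the frame buffer object (FBO) mFrameBuffer.
int[] fbo = new int[1];
GLES30.glGenFramebuffers(1, fbo, 0);
mFrameBuffer = fbo[0];

// Activate the FBO and attach the 2D texture as its color buffer.
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, mFrameBuffer);
GLES30.glFramebufferTexture2D(GLES30.GL_FRAMEBUFFER, GLES30.GL_COLOR_ATTACHMENT0,
        GLES30.GL_TEXTURE_2D, mTexture2dId, 0);

// Draw the hardware (OES) texture; it is thereby sampled into the 2D texture.
drawOesTexture(mOesTextureId);   // hypothetical helper running the copy shader

// Deactivate the FBO; mTexture2dId now holds the first-resolution texture data,
// decoupled from the player's internal pipeline.
GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, 0);
```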
上述函数GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER,mFrameBuffer)将上述的帧缓冲对象mFrameBuffer绑定到硬件环境中的目标帧缓冲对象GLES30.GL_FRAMEBUFFER中,从而硬件纹理数据将被采样到该帧缓冲中。函数GLES30.glFramebufferTexture2D(..)则将二维纹理数据作为帧缓冲对象的颜色缓冲,以使得帧缓冲对象mFrameBuffer将可以将该硬件纹理数据采样成二维纹理图像。函数GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER,0)则将帧缓冲对象mFrameBuffer去激活。
上述的纹理数据的获取过程实现了将视频帧渲染到了外部设置的二维纹理对象上,从 而实现了纹理数据与视频帧间的解耦,为后续的超分算法的应用提供了与视频播放器的其它模块解耦的可操作对象,降低了视频播放器的计算复杂度。
图7是示出根据本公开的实施例的获取具有第二分辨率的纹理数据的示意图。
在本公开的一个示例中,在操作S303中,所述基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据包括:将所述具有第一分辨率的纹理数据的图像格式从RGBA格式(第一颜色模型格式)转换为YUV格式(第二颜色模型格式),以提取与具有第一分辨率的纹理数据相对应的Y通道数据(亮度分量通道数据);利用附加梯度算法,对所述Y通道数据进行处理,以生成具有第二分辨率的Y通道数据;基于所述具有第二分辨率的Y通道数据,生成所述具有第二分辨率的纹理数据。当然,本公开并不以此为限。
如图7所示,假设具有第一分辨率的纹理数据在长边上具有width个像素,在短边上具有height个像素。Width和height均为正整数。具有第一分辨率的纹理数据通常为RGBA格式。RGBA是一种色彩空间的模型,由RGB色彩空间和Alpha通道组成。RGBA代表红(Red)、绿(Green)、蓝(Blue)和Alpha通道(Alpha)。为了进一步降低运算量,可以先将该具有第一分辨率的纹理数据转换为YUV格式。YUV是另一种表示颜色的模型,其中,Y是指亮度分量(也即Y通道数据),U指蓝色色度分量(也即U通道数据),而V指红色色度分量(也即V通道数据)。
由于人眼对Y通道数据的敏感度远远超过对U通道数据和V通道数据的敏感度,为了进一步减少运算复杂度,可以仅对Y通道数据进行处理,以调整Y通道数据中的细节,以生成具有第二分辨率的Y通道数据。然而再将具有第二分辨率的Y通道数据、具有第一分辨率的U通道数据和具有第一分辨率的V通道数据进行融合,从而生成具有第二分辨率的纹理数据。
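As a small illustration of extracting the luminance (Y) component before super-resolution, the sketch below assumes the common BT.601 weights; the disclosure does not fix a particular RGBA-to-YUV conversion, so the coefficients are an assumption.

```java
// Hedged sketch: BT.601 luma weights assumed purely for illustration.
static void extractLuma(float[] rgba, float[] yOut, int width, int height) {
    for (int i = 0, p = 0; i < width * height; i++, p += 4) {
        float r = rgba[p], g = rgba[p + 1], b = rgba[p + 2]; // alpha ignored
        yOut[i] = 0.299f * r + 0.587f * g + 0.114f * b;      // luminance (Y) channel
    }
}
```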
具体地,在本公开的一个示例中,在操作S303中,所述基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据包括:基于所述具有第二分辨率的Y通道数据,生成水平方向的Y通道数据和垂直方向的Y通道数据;利用附加梯度算法,对水平方向的Y通道数据和垂直方向的Y通道数据进行处理,以生成具有第二分辨率的水平方向的Y通道数据和具有第二分辨率的垂直方向的Y通道数据;基于具有第二分辨率的水平方向的Y通道数据和具有第二分辨率的垂直方向的Y通道数据,生成具有第二分辨率的Y通道数据;以及基于所述具有第二分辨率的Y通道数据,生成所述具有第二分辨率的纹理数据。
在一个示例中,在Y通道数据(或水平方向或垂直方向上的Y通道数据)上应用附加梯度算法的过程可以被简述如下:首先,计算第一分辨率的Y通道数据的梯度轮廓p1= {λ11}与第二分辨率的Y通道数据的梯度轮廓p2={λ22}之间的比例r。梯度轮廓(gradient profile)是沿着某个图像(例如,第一分辨率的Y通道数据)中过零像素的梯度方向的一维轮廓,其可以用两个参数进行表示,分别为形状参数λ和轮廓锐度σ。形状参数λ是用于控制图像的梯度的整体形状的参数,轮廓锐度σ是用于控制梯度轮廓的锐度的参数。轮廓锐度σ越小,梯度轮廓的锐度越高。接着,根据该比例r,将第一分辨率的Y通道数据的梯度场转换为具有第二分辨率的Y通道数据的梯度场,由此得到具有第二分辨率的Y通道数据。当然本公开并不以此为限。
上述过程中,比例r可以在应用附加梯度算法前就提前计算好,从而在整个过程中,超分处理引擎仅需要对第一分辨率的Y通道数据中的各个像素进行简单的线性计算,就能得到具有第二分辨率的Y通道数据中的各个像素的值。整个过程无需神经网络的参与,轻量高效,能够大大节省终端的计算量。
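The exact gradient-field mapping is not spelled out above, so the sketch below only mirrors the shape of the computation described: a precomputed ratio r applied as a simple per-pixel linear transform of the Y channel, with the local-mean input treated as a given. It is an assumption for illustration, not the patented algorithm itself.

```java
// Hedged sketch of a per-pixel linear gradient adjustment using a precomputed ratio r.
static float[] sharpenLuma(float[] upsampledY, float[] localMeanY, float r) {
    float[] out = new float[upsampledY.length];
    for (int i = 0; i < upsampledY.length; i++) {
        // Linear per-pixel transform: scale the local gradient by the ratio r.
        out[i] = localMeanY[i] + r * (upsampledY[i] - localMeanY[i]);
    }
    return out;
}
```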
图8是示出根据本公开的实施例的基于具有第二分辨率的纹理数据生成具有第二分辨率的一个或多个视频帧的示意图。图9是示出根据本公开的实施例的渲染操作的示意图。
在本公开的一个示例中,在操作S304中,所述基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧包括:基于与所述表面纹理对象对应的注册回调函数,将所述具有第二分辨率的纹理数据绑定至输出视频帧的表面对象上;以及基于绑定有具有第二分辨率的纹理数据的表面对象,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。当然本公开并不以此为限。
如图8所示,终端120的硬件环境首先需要获取在操作S303中生成的具有第二分辨率的纹理数据。结合图6,获取具有第二分辨率的纹理数据示例过程可以被简要描述如下:首先,对上述的表面纹理对象(SurfaceTexture)设定注册回调函数。注册回调函数通过使用函数指针来调用函数,并将函数指针作为参数进行传递以使得调用者和被调用者解耦,以避免单独建立一个线程来确定纹理数据的处理情况。例如,可以使用函数setOnFrameAvailableListener,以使得表面纹理对象能够听到(listen)解码后的具有第一分辨率的纹理数据,并通过注册回调函数直接获取到具有第二分辨率的纹理数据。由于操作S303与视频播放器Mediacodc内部原生的操作已经解耦了,具有第二分辨率的纹理数据已经无法再直接利用Mediacodc渲染了,因此需要设计特定于具有第二分辨率的纹理数据的渲染操作(如图9所示),以将具有第二分辨率的纹理数据绑定到输出视频帧的表面对象上,并渲染这些具有第二分辨率的视频帧。
如图9所示,在本公开的一个示例中,在操作S304中,所述通过渲染操作生成具有所述第二分辨率的一个或多个视频帧包括:对所述具有第二分辨率的纹理数据依次进行顶 点变换操作、图元装配操作、光栅化操作,以生成具有所述第二分辨率的一个或多个视频帧。
首先,利用顶点着色器将具有第二分辨率的纹理数据进行顶点变换。具体地,所有视频帧中的图形在计算机中都是一串串数据,在三维坐标系中可以表示为一个N*3的矩阵,N为顶点的数量,3分别代表x、y、z位置坐标。顶点变换就是通过一系列矩阵变换在坐标系中对顶点进行平移、缩小\放大、旋转等操作。顶点着色器则是用来描述顶点需要执行的模型变换、视变换、投影变换、光照(Transform and lighting)处理的顶点着色器程序源代码/可执行文件。作为一个示例,上述过程可以使用以下伪代码进行概括,当然本公开并不以此为限。
接着,可以利用图元装配器将经顶点变换后的纹理数据(其具有第二分辨率)进行图元装配。图元装配操作将使得终端能够连接这些经过矩阵变换后的顶点。例如,假设经过顶点变换后的三个点,在经过图元装配操作之后,便可以确定是将这三个点绘制成一个三角形,还是绘制成两条直线。
然后,可以利用片段着色器将经图元装配后的纹理数据进行光栅化。其中,片段着色器是用来描述片段上执行操作(如颜色混合)的片元着色器程序源代码/可执行文件。光栅化操作能够对图元进行颜色渲染或者纹理渲染。光栅化后的图元输出帧缓冲后便能实际可见。也即,光栅化操作通过将图形的坐标转化为屏幕像素坐标,最终将图元的数学描述转化为用于显示在屏幕上的片段,然后通过帧缓存即可在屏幕上显示视频帧。作为一个示例,上述过程可以使用以下伪代码进行概括,当然本公开并不以此为限。

上述过程中,通过对具有第二分辨率的纹理图像进行上述各项操作以渲染上屏,解决了经超分处理引擎处理后的纹理数据无法直接由视频播放器直接处理的问题,并通过各个GPU单元以较低的计算资源实现了渲染操作,进一步降低了终端的计算成本。
在上述方法中,还可以包括:显示用于控制视频数据的分辨率的按钮;响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成所述具有第二分辨率的一个或多个视频帧;以及显示所述具有第二分辨率的一个或多个视频帧。
上述所有可选技术方案,可以采用任意结合形成本申请的可选实施例,在此不再一一赘述。
根据本公开的又一方面,还提供了一种电子设备,用于实施根据本公开实施例的方法。图10示出了根据本公开实施例的电子设备2000的示意图。
如图10所示,所述电子设备2000可以包括一个或多个处理器2010,和一个或多个存储器2020。其中,所述存储器2020中存储有计算机可读代码,所述计算机可读代码当由所述一个或多个处理器2010运行时,可以执行如上所述的方法。
本公开实施例中的处理器可以是一种集成电路芯片,具有信号的处理能力。上述处理器可以是通用处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本公开实施例中的公开的各方法、操作及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等,可以是X86架构或ARM架构的。
一般而言,本公开的各种示例实施例可以在硬件或专用电路、软件、固件、逻辑,或其任何组合中实施。某些方面可以在硬件中实施,而其他方面可以在可以由控制器、微处理器或其他计算设备执行的固件或软件中实施。当本公开实施例的各方面被图示或描述为框图、流程图或使用某些其他图形表示时,将理解此处描述的方框、装置、系统、技术或方法可以作为非限制性的示例在硬件、软件、固件、专用电路或逻辑、通用硬件或控制器 或其他计算设备,或其某些组合中实施。
例如,根据本公开实施例的方法或装置也可以借助于图11所示的计算设备3000的架构来实现。如图11所示,计算设备3000可以包括总线3010、一个或多个CPU 3020、只读存储器(ROM)3030、随机存取存储器(RAM)3040、连接到网络的通信端口3050、输入/输出组件3060、硬盘3070等。计算设备3000中的存储设备,例如ROM 3030或硬盘3070可以存储本公开提供的方法的处理和/或通信使用的各种数据或文件以及CPU所执行的程序指令。计算设备3000还可以包括用户界面3080。当然,图11所示的架构只是示例性的,在实现不同的设备时,根据实际需要,可以省略图11示出的计算设备中的一个或多个组件。
根据本公开的又一方面,还提供了一种计算机可读存储介质。图12示出了根据本公开的存储介质4000的示意图。
如图12所示,所述计算机存储介质4020上存储有计算机可读指令4010。当所述计算机可读指令4010由处理器运行时,可以执行参照以上附图描述的根据本公开实施例的方法。本公开实施例中的计算机可读存储介质可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。非易失性存储器可以是只读存储器(ROM)、可编程只读存储器(PROM)、可擦除可编程只读存储器(EPROM)、电可擦除可编程只读存储器(EEPROM)或闪存。易失性存储器可以是随机存取存储器(RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(SDRAM)、双倍数据速率同步动态随机存取存储器(DDRSDRAM)、增强型同步动态随机存取存储器(ESDRAM)、同步连接动态随机存取存储器(SLDRAM)和直接内存总线随机存取存储器(DR RAM)。应注意,本文描述的方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。应注意,本文描述的方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本公开实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行根据本公开实施例的方法。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,所述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替 换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
一般而言,本公开的各种示例实施例可以在硬件或专用电路、软件、固件、逻辑,或其任何组合中实施。某些方面可以在硬件中实施,而其他方面可以在可以由控制器、微处理器或其他计算设备执行的固件或软件中实施。当本公开实施例的各方面被图示或描述为框图、流程图或使用某些其他图形表示时,将理解此处描述的方框、装置、系统、技术或方法可以作为非限制性的示例在硬件、软件、固件、专用电路或逻辑、通用硬件或控制器或其他计算设备,或其某些组合中实施。
在上面详细描述的本公开的示例实施例仅仅是说明性的,而不是限制性的。本领域技术人员应该理解,在不脱离本公开的原理和精神的情况下,可对这些实施例或其特征进行各种修改和组合,这样的修改应落入本公开的范围内。

Claims (15)

  1. 一种处理视频数据的方法,所述方法包括:
    接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;
    基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;
    基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及
    基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
  2. 如权利要求1所述的方法,所述基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据包括:
    利用终端的中央处理单元,将所述经压缩的视频数据解封装成视频码流;
    利用终端的图形处理单元,将所述视频码流解码成一个或多个视频帧;以及
    利用终端的图形处理单元,根据所述一个或多个视频帧获取具有所述第一分辨率的纹理数据。
  3. 如权利要求1或2所述的方法,其中,
    所述具有第一分辨率的纹理数据为用于描述所述视频帧的颜色信息的结构化的数据,其包括具有相同图像格式的多个第一纹理图像,每个第一纹理图像具有第一分辨率;并且
    所述具有第二分辨率的纹理数据为用于描述所述视频帧的颜色信息的结构化的数据,其包括具有相同图像格式的多个第二纹理图像,每个第二纹理图像具有第二分辨率。
  4. 如权利要求1至3任一项所述的方法,所述基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据包括:
    将所述视频帧绑定至用于渲染的表面对象,并基于与所述表面对象对应的表面纹理对象获取纹理标识符;以及
    基于所述纹理标识符,获取具有所述第一分辨率的纹理数据。
  5. 如权利要求4所述的方法,所述基于所述纹理标识符,获取具有所述第一分辨率的纹理数据还包括:
    创建帧缓冲对象,并将二维纹理数据绑定至所述帧缓冲对象以将所述二维纹理数据作为所述帧缓冲对象的颜色缓冲;以及
    将所述帧缓冲对象激活,并基于所述纹理标识符,将终端的图形处理单元输出的硬件纹理数据采样至所述二维纹理数据中,以将所述二维纹理数据作为具有所述第一分辨率的 纹理数据。
  6. 如权利要求1至5任一项所述的方法,所述基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据包括:
    将所述具有第一分辨率的纹理数据的图像格式从第一颜色模型格式转换为第二颜色模型格式,以提取与具有第一分辨率的纹理数据相对应的亮度分量通道数据;
    利用附加梯度算法,对所述亮度分量通道数据进行处理,以生成具有第二分辨率的亮度分量通道数据;以及
    基于所述具有第二分辨率的亮度分量通道数据,生成所述具有第二分辨率的纹理数据。
  7. 如权利要求6所述的方法,所述基于所述具有第二分辨率的亮度分量通道数据,生成所述具有第二分辨率的纹理数据包括:
    基于所述具有第二分辨率的亮度分量通道数据,生成水平方向的亮度分量通道数据和垂直方向的亮度分量通道数据;
    利用附加梯度算法,对水平方向的亮度分量通道数据和垂直方向的亮度分量通道数据进行处理,以生成具有第二分辨率的水平方向的亮度分量通道数据和具有第二分辨率的垂直方向的亮度分量通道数据;
    基于具有第二分辨率的水平方向的亮度分量通道数据和具有第二分辨率的垂直方向的亮度分量通道数据,生成具有第二分辨率的亮度分量通道数据;以及
    基于所述具有第二分辨率的亮度分量通道数据,生成所述具有第二分辨率的纹理数据。
  8. 如权利要求4所述的方法,所述基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧包括:
    基于与所述表面纹理对象对应的注册回调函数,将所述具有第二分辨率的纹理数据绑定至输出视频帧的表面对象上;以及
    基于绑定有具有第二分辨率的纹理数据的表面对象,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
  9. 如权利要求1至8任一项所述的方法,所述通过渲染操作生成具有所述第二分辨率的一个或多个视频帧包括:
    对所述具有第二分辨率的纹理数据依次进行顶点变换操作、图元装配操作、光栅化操作,以生成具有所述第二分辨率的一个或多个视频帧。
  10. 如权利要求1至9任一项所述的方法,所述方法还包括:在终端的显示器上显示所述具有第二分辨率的一个或多个视频帧。
  11. 如权利要求1至9任一项所述的方法,所述方法包括:
    显示用于控制视频数据的分辨率的按钮;
    响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成所述具有第二分辨率的一个或多个视频帧;以及
    显示所述具有第二分辨率的一个或多个视频帧。
  12. 一种处理视频数据的装置,所述装置包括:
    接收模块,被配置为接收经压缩的视频数据,所述经压缩的视频数据具有第一分辨率;
    提取模块,被配置为基于所述经压缩的视频数据中的一个或多个视频帧,获取具有所述第一分辨率的纹理数据;
    超分处理模块,被配置为基于所述具有第一分辨率的纹理数据,生成具有第二分辨率的纹理数据,所述第二分辨率高于所述第一分辨率;以及
    渲染模块,被配置为基于所述具有第二分辨率的纹理数据,通过渲染操作生成具有所述第二分辨率的一个或多个视频帧。
  13. 如权利要求12所述的处理视频数据的装置,所述设备包括处理器和显示器,其中,
    所述显示器,被配置为在第一时刻,显示用于控制视频数据的分辨率的按钮;
    所述处理器,被配置为响应于所述用于控制视频数据的分辨率的按钮被触发,基于所述经压缩的视频数据中的一个或多个视频帧,生成所述具有第二分辨率的一个或多个视频帧;以及
    所述显示器还被配置为在第二时刻,显示所述具有第二分辨率的一个或多个视频帧。
  14. 一种计算机设备,包括:
    一个或多个处理器;以及
    一个或多个存储器,其中存储有计算机可执行程序,当由所述处理器执行所述计算机可执行程序时,执行权利要求1-11中任一项所述的方法。
  15. 一种计算机可读存储介质,其上存储有计算机可执行指令,所述指令在被处理器执行时用于实现权利要求1-11中任一项所述的方法。
PCT/CN2023/126236 2022-12-09 2023-10-24 处理视频数据的方法、装置、计算机设备和存储介质 WO2024120031A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/915,488 US20250039330A1 (en) 2022-12-09 2024-10-15 Method and apparatus for processing video data, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211589458.6 2022-12-09
CN202211589458.6A CN116996741A (zh) 2022-12-09 2022-12-09 处理视频数据的方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/915,488 Continuation US20250039330A1 (en) 2022-12-09 2024-10-15 Method and apparatus for processing video data, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2024120031A1 true WO2024120031A1 (zh) 2024-06-13

Family

ID=88523849

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/126236 WO2024120031A1 (zh) 2022-12-09 2023-10-24 处理视频数据的方法、装置、计算机设备和存储介质

Country Status (3)

Country Link
US (1) US20250039330A1 (zh)
CN (1) CN116996741A (zh)
WO (1) WO2024120031A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119255047A (zh) * 2024-12-03 2025-01-03 天翼云科技有限公司 视频流数据处理方法、装置、计算机设备、存储介质和程序产品

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223228A (zh) * 2019-05-16 2019-09-10 北京奇艺世纪科技有限公司 图像处理方法、装置、终端设备及存储介质
CN110868625A (zh) * 2019-11-22 2020-03-06 北京金山云网络技术有限公司 一种视频播放方法、装置、电子设备及存储介质
US20200193566A1 (en) * 2018-12-12 2020-06-18 Apical Limited Super-resolution image processing
CN114501141A (zh) * 2022-01-04 2022-05-13 杭州网易智企科技有限公司 视频数据处理方法、装置、设备和介质
CN114746902A (zh) * 2020-04-02 2022-07-12 索尼集团公司 用于纹理映射应用的块压缩纹理的超分辨率

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200193566A1 (en) * 2018-12-12 2020-06-18 Apical Limited Super-resolution image processing
CN110223228A (zh) * 2019-05-16 2019-09-10 北京奇艺世纪科技有限公司 图像处理方法、装置、终端设备及存储介质
CN110868625A (zh) * 2019-11-22 2020-03-06 北京金山云网络技术有限公司 一种视频播放方法、装置、电子设备及存储介质
CN114746902A (zh) * 2020-04-02 2022-07-12 索尼集团公司 用于纹理映射应用的块压缩纹理的超分辨率
CN114501141A (zh) * 2022-01-04 2022-05-13 杭州网易智企科技有限公司 视频数据处理方法、装置、设备和介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119255047A (zh) * 2024-12-03 2025-01-03 天翼云科技有限公司 视频流数据处理方法、装置、计算机设备、存储介质和程序产品

Also Published As

Publication number Publication date
CN116996741A (zh) 2023-11-03
US20250039330A1 (en) 2025-01-30

Similar Documents

Publication Publication Date Title
CN109983757B (zh) 全景视频回放期间的视图相关操作
US11418832B2 (en) Video processing method, electronic device and computer-readable storage medium
CN109983500B (zh) 重新投影全景视频图片的平板投影以通过应用进行渲染
US10666863B2 (en) Adaptive panoramic video streaming using overlapping partitioned sections
US11941748B2 (en) Lightweight view dependent rendering system for mobile devices
EP3804349B1 (en) Adaptive panoramic video streaming using composite pictures
JP6333858B2 (ja) 複数の視覚コンポーネントを有する画面を共有するためのシステム、装置、および方法
WO2021197157A1 (zh) 视频流的处理方法、装置、电子设备及计算机可读介质
US20250039330A1 (en) Method and apparatus for processing video data, computer device, and storage medium
CN110428382A (zh) 一种用于移动终端的高效视频增强方法、装置和存储介质
CN111885346A (zh) 画面码流合成方法、终端、电子设备和存储介质
CN109658488B (zh) 一种虚实融合系统中通过可编程gpu加速解码摄像头视频流的方法
US20150117515A1 (en) Layered Encoding Using Spatial and Temporal Analysis
CN112511896A (zh) 一种视频渲染方法及装置
US12079924B2 (en) System and method for dynamic images virtualization
US9335964B2 (en) Graphics server for remotely rendering a composite image and method of use thereof
CN110049347B (zh) 在直播界面配置图像的方法、系统、终端和装置
CN113205599B (zh) 一种视频三维融合时gpu加速的视频纹理更新方法
CN111935483B (zh) 一种视频图像无损编解码方法及系统
CN114245137A (zh) 由gpu执行的视频帧处理方法和包括gpu的视频帧处理装置
CN114567784A (zh) 一种用于飞腾显卡的vpu视频解码输出方法及系统
CN117063473A (zh) 3d对象的流传输方法、装置和程序
CN112995134A (zh) 一种三维视频流媒体传输方法与可视化方法
CN117370696A (zh) 小程序页面的加载方法、装置、电子设备及存储介质
WO2023193524A1 (zh) 直播视频处理方法、装置、电子设备、计算机可读存储介质及计算机程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23899612

Country of ref document: EP

Kind code of ref document: A1