Tencent America LLC patent applications published on November 30th, 2023

From WikiPatents

Summary of the patent applications from Tencent America LLC on November 30th, 2023

Tencent America LLC has recently filed several patents related to video processing and coding, as well as speech recognition. These patents describe methods and apparatuses for processing three-dimensional visual content, predicting movement of vertices in a mesh, performing chroma from luma intra prediction in video coding, coding video with motion vectors, decoding video using temporal interpolated prediction, and coding video data with flags for referencing and outputting. Additionally, there are patents related to decoding and blending picture areas in images, as well as automatic speech recognition for bilingual code-switched and monolingual speech.

Notable Applications:

  • Method and apparatus for processing three-dimensional visual content.
  • Method for predicting movement of vertices in a mesh.
  • Method and apparatus for performing chroma from luma intra prediction in video coding.
  • Method and system for coding video with motion vectors.
  • Video decoder for decoding video using temporal interpolated prediction.
  • Method and apparatus for processing video bitstreams.
  • Method and apparatus for coding video data with flags for referencing and outputting.
  • Decoding device for predicting and decoding picture areas in images.
  • Method, apparatus, and computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech.



Patent applications for Tencent America LLC on November 30th, 2023

TOOLS FOR CONFORMANCE OF NETWORK-BASED MEDIA PROCESSING (NBMP) DOCUMENTS AND ENTITIES (18301657)

Main Inventor

Iraj SODAGAR


Brief explanation

The patent application describes methods, apparatus, and computer readable storage medium for verifying NBMP documents and entities.
  • The invention involves using an Application Programming Interface (API) to interact with NBMP entities.
  • The API operations can include creating, updating, retrieving, or deleting NBMP entities.
  • The method includes invoking the API operation and receiving a response from the NBMP entity.
  • Based on the response, the method determines whether the NBMP entity passes an API test corresponding to the API operation.
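
As a rough illustration of the conformance flow in the bullets above, the Python sketch below invokes one hypothetical create/update/retrieve/delete operation against an NBMP workflow endpoint and decides pass or fail from the response. The endpoint URL, payloads, and status-code pass criteria are assumptions for illustration, not details taken from the filing.

```python
import requests

# Hypothetical NBMP workflow-manager endpoint; not taken from the filing.
NBMP_BASE_URL = "http://nbmp-workflow-manager.example/v1/workflows"

def run_api_test(operation, document=None, resource_id=None):
    """Invoke one NBMP API operation and decide pass/fail from the response."""
    if operation == "create":
        resp = requests.post(NBMP_BASE_URL, json=document, timeout=10)
        expected = {201}
    elif operation == "update":
        resp = requests.patch(f"{NBMP_BASE_URL}/{resource_id}", json=document, timeout=10)
        expected = {200}
    elif operation == "retrieve":
        resp = requests.get(f"{NBMP_BASE_URL}/{resource_id}", timeout=10)
        expected = {200}
    elif operation == "delete":
        resp = requests.delete(f"{NBMP_BASE_URL}/{resource_id}", timeout=10)
        expected = {200, 204}
    else:
        raise ValueError(f"unsupported operation: {operation}")
    # The entity passes the API test when the response matches the expectation
    # for the invoked operation (status code here; a real tool would also
    # validate the returned NBMP document against its schema).
    return resp.status_code in expected
```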

Abstract

Methods, apparatus, and computer readable storage medium for verifying NBMP documents and entities. One method may include invoking an Application Programming Interface (API) corresponding to an API operation supported by an NBMP entity, the API operation being related to at least one of: a create operation; an update operation; a retrieve operation; or a delete operation; receiving a response from the NBMP entity; and determining, based on the response, whether the NBMP entity passes an API test corresponding to the API operation.

MULTI-AGENT PATH PLANNING METHOD AND DEVICE, AND STORAGE MEDIUM (17827602)

Main Inventor

Xifeng GAO


Brief explanation

The patent application describes a method for multi-agent path planning.
  • The method involves generating a 2D floor plan of a space and optimizing it to find a target mesh that maximizes a metric function.
  • The metric function takes into account the number of agents that the mesh can hold and the number of agents in the largest connected component of the graph corresponding to the mesh.
  • The target mesh is then converted into a target graph, where each vertex represents a position for an agent and each edge represents a possible path for an agent to travel on.
  • Paths are planned for multiple agents based on their origins, destinations, and the graph, ensuring that the agents do not collide with each other.
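
A minimal Python sketch of the metric function described above, assuming the candidate mesh has already been converted to a graph given as an adjacency list; the one-agent-per-vertex capacity term and the equal weighting of the two terms are illustrative assumptions.

```python
from collections import deque

def largest_connected_component_size(adjacency):
    """Size of the largest connected component, found with BFS."""
    seen, best = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        queue, component = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            component += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, component)
    return best

def metric(adjacency, weight=1.0):
    """Metric combining how many agents the graph can hold (one per vertex,
    an assumption here) with the size of its largest connected component."""
    capacity_term = len(adjacency)                      # agents the graph can hold
    connectivity_term = largest_connected_component_size(adjacency)
    return capacity_term + weight * connectivity_term

# Example: a 4-vertex candidate graph with one isolated vertex.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: []}
print(metric(graph))  # 4 positions + 3 vertices in the largest component -> 7.0
```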

Abstract

A multi-agent path planning method is provided. The method includes: generating an initial mesh of a 2D floor plan of a space; optimizing the initial mesh to find a target mesh maximizing a value of a metric function, the metric function including a term reflecting a number of agents that a graph corresponding to a candidate mesh can hold and a term reflecting a number of agents in a largest connected component of the graph corresponding to the candidate mesh, the candidate mesh being a mesh; converting the target mesh into a target graph, each vertex of the target graph representing a position that an agent can reside at, and each edge of the target graph representing a path that an agent can travel on; and planning paths for the plurality of agents according to origins and destinations of the agents and the graph, wherein the agents traveling on the planned paths do not collide with each other.

TECHNIQUES FOR IMPROVED ZERO-SHOT VOICE CONVERSION WITH A CONDITIONAL DISENTANGLED SEQUENTIAL VARIATIONAL AUTO-ENCODER (17826987)

Main Inventor

Chunlei ZHANG


Brief explanation

The patent application describes a method for voice conversion using a conditional disentangled sequential variational auto-encoder (C-DSVAE).
  • The method involves receiving input speech segments and encoding them using a shared encoder to generate a speaker embedding and a content embedding.
  • The speaker embedding and content embedding are further encoded using separate encoders to obtain encoded results.
  • A content bias is enabled, and the content embedding is reshaped using the content bias.
  • Finally, a reconstructed speech output is generated based on the encoded results and the reshaped content embedding.
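
The PyTorch sketch below traces only the data flow named above (shared encoder, speaker and content branches, content bias, decoder); it omits the variational posterior modeling, and all layer choices and dimensions are assumptions, not the architecture from the filing.

```python
import torch
import torch.nn as nn

class CDSVAESketch(nn.Module):
    """Illustrative flow only: shared encoder -> speaker/content branches ->
    content bias added -> decoder."""

    def __init__(self, feat_dim=80, hidden=256, spk_dim=64, content_dim=64):
        super().__init__()
        self.shared = nn.GRU(feat_dim, hidden, batch_first=True)
        self.speaker_enc = nn.Linear(hidden, spk_dim)      # utterance-level branch
        self.content_enc = nn.Linear(hidden, content_dim)  # frame-level branch
        self.content_bias = nn.Parameter(torch.zeros(content_dim))
        self.decoder = nn.Linear(spk_dim + content_dim, feat_dim)

    def forward(self, speech):                 # speech: (batch, frames, feat_dim)
        shared, _ = self.shared(speech)        # shared encoding of the segments
        speaker = self.speaker_enc(shared.mean(dim=1))   # one vector per utterance
        content = self.content_enc(shared)               # one vector per frame
        content = content + self.content_bias            # "content bias" reshaping step
        speaker = speaker.unsqueeze(1).expand(-1, content.size(1), -1)
        return self.decoder(torch.cat([speaker, content], dim=-1))  # reconstructed speech

recon = CDSVAESketch()(torch.randn(2, 100, 80))
print(recon.shape)  # torch.Size([2, 100, 80])
```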

Abstract

A method, system, apparatus, and computer-readable medium for voice conversion using a conditional disentangled sequential variational auto-encoder (C-DSVAE) is provided. The method, performed by at least one processor, includes receiving input speech segments, encoding the input speech segments via a shared encoder to generate a speaker embedding and a content embedding, and encoding a posterior distribution of the speaker embedding via a speaker encoder and encoding a posterior distribution of the content embedding via a content encoder to obtain encoded results. The method further includes enabling a content bias, reshaping the content embedding using the content bias, and generating a reconstructed speech output based on the encoded results and the reshaped content embedding.

CONDITIONAL FACTORIZATION FOR JOINTLY MODELING CODE-SWITCHED AND MONOLINGUAL ASR (17828240)

Main Inventor

Chunlei ZHANG


Brief explanation

This patent application describes a method, apparatus, and computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech.
  • The approach involves receiving an audio observation sequence that contains audio in either a first language or a second language.
  • The audio observation sequence is then mapped into two separate sequences of hidden representations using encoders specific to each language.
  • A label-to-frame sequence is generated based on the hidden representations from both languages using a joint neural network model.
  • This method allows for accurate speech recognition in bilingual code-switched and monolingual speech scenarios.
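
A minimal PyTorch sketch of the structure described above: two language-specific encoders map the same audio observation sequence to hidden representations, and a joint network produces per-frame label scores. The layer types, dimensions, and concatenation-based joint are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BilingualJointSketch(nn.Module):
    """Two language-specific encoders feed a joint label-to-frame model."""

    def __init__(self, feat_dim=80, hidden=256, vocab_size=500):
        super().__init__()
        self.encoder_lang1 = nn.GRU(feat_dim, hidden, batch_first=True)
        self.encoder_lang2 = nn.GRU(feat_dim, hidden, batch_first=True)
        self.joint = nn.Linear(2 * hidden, vocab_size)  # joint neural network model

    def forward(self, audio):                      # audio: (batch, frames, feat_dim)
        h1, _ = self.encoder_lang1(audio)          # hidden representations, language 1
        h2, _ = self.encoder_lang2(audio)          # hidden representations, language 2
        return self.joint(torch.cat([h1, h2], dim=-1))  # per-frame label scores

scores = BilingualJointSketch()(torch.randn(1, 200, 80))
print(scores.shape)  # torch.Size([1, 200, 500])
```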

Abstract

A method, apparatus, and non-transitory computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech may include receiving an audio observation sequence comprising a plurality of frames, the audio observation sequence including audio in a first language or a second language. The approach may further include mapping the audio observation sequence into a first sequence of hidden representations, the mapping being generated by a first encoder corresponding to the first language and mapping the audio observation sequence into a second sequence of hidden representations, the mapping being generated by a second encoder corresponding to the second language. The approach may further include generating a label-to-frame sequence based on the first sequence of hidden representations and the second sequence of hidden representations, using a joint neural network based model.

METHOD AND APPARATUS FOR ASYMMETRIC BLENDING OF PREDICTIONS OF PARTITIONED PICTURES (17983017)

Main Inventor

Han GAO


Brief explanation

The patent application describes a decoding device that predicts and decodes a picture area of an input image.
  • The picture area is divided into two parts by a partitioning boundary.
  • The pixels of each part are predicted using suitable measures.
  • Blending masks are applied to generate blended regions for the predicted pixels.
  • The blending masks modify the pixels to generate a complete prediction of the picture area based on both parts.
  • The blending mask is based on first and second thresholds, which are defined relative to the partitioning boundary.
  • The first and second thresholds may have different values to create an asymmetrical blending relative to the boundary.
  • The prediction, combination, and decoding of the picture area are adaptive to the differing contents of the parts within it.
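
A small NumPy sketch of the asymmetric blending idea, assuming a vertical partitioning boundary and a linear blending ramp; the two thresholds extend different distances on either side of the boundary, which is what makes the blend asymmetric. The ramp shape and threshold values are illustrative assumptions.

```python
import numpy as np

def asymmetric_blend(pred_a, pred_b, boundary_col, t1, t2):
    """Blend two predictions of one picture area across a vertical partitioning
    boundary. t1 (toward part A) and t2 (toward part B) may differ, producing a
    blended region that is asymmetric about the boundary."""
    h, w = pred_a.shape
    cols = np.arange(w) - boundary_col          # signed distance to the boundary
    # Weight of part B: 0 well inside part A, 1 well inside part B,
    # ramping linearly over [-t1, +t2] around the boundary.
    weight_b = np.clip((cols + t1) / float(t1 + t2), 0.0, 1.0)
    weight_b = np.tile(weight_b, (h, 1))
    return (1.0 - weight_b) * pred_a + weight_b * pred_b

part_a = np.full((8, 16), 100.0)   # prediction of the first part
part_b = np.full((8, 16), 200.0)   # prediction of the second part
blended = asymmetric_blend(part_a, part_b, boundary_col=8, t1=2, t2=5)
print(blended[0, 6:14].round(1))   # asymmetric ramp around column 8
```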

Abstract

A decoding device is provided for decoding of a bitstream, and more specifically for predicting a picture area of an input image for decoding. The picture area has been divided into at least first and second parts by a partitioning boundary, and the pixels of each part are predicted according to suitable measures. Blending masks are then applied to generate blended regions for the predicted pixels, modifying the pixels to generate a complete prediction of the picture area based on both parts. The blending mask is based on first and second thresholds, which are defined relative to the partitioning boundary, and which may have different values to produce an asymmetrical blending relative to the boundary. Based on suitable threshold values, the prediction, combination, and decoding of the picture area is more adaptive to differing contents of the parts therein.

SUBBLOCK BASED MOTION VECTOR PREDICTOR DISPLACEMENT VECTOR REORDERING USING TEMPLATE MATCHING (17985127)

Main Inventor

Han GAO


Brief explanation

The patent application describes a method and apparatus for video encoding/decoding.
  • The apparatus includes processing circuitry for receiving prediction information of a current coding block in a current picture from a coded video bitstream.
  • The prediction information indicates that the current coding block is coded using a subblock-based temporal motion vector prediction (SbTMVP) mode.
  • The apparatus derives multiple displacement vector (DV) candidates by applying multiple DV offset candidates to a fixed DV predictor of the current coding block.
  • It compares a template of the current coding block with each of multiple templates located at positions specified by the DV candidates.
  • The apparatus calculates a cost value associated with each DV offset candidate based on the comparison.
  • It reorders the DV offset indices of the multiple candidates based on their calculated cost values.
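
The NumPy sketch below illustrates the reordering step: each DV candidate is the fixed DV predictor plus one offset, a template above and to the left of the current block is compared (SAD) with the template at the displaced position, and the offset indices are sorted by cost. The template shape and SAD cost are illustrative assumptions.

```python
import numpy as np

def reorder_dv_offsets(current_picture, ref_picture, block_pos, block_size,
                       dv_predictor, dv_offsets, template_size=2):
    """Order DV offset indices by template-matching cost (lowest cost first)."""
    y, x = block_pos
    h, w = block_size

    def template(picture, top, left):
        above = picture[top - template_size:top, left:left + w]
        left_ = picture[top:top + h, left - template_size:left]
        return np.concatenate([above.ravel(), left_.ravel()])

    cur_template = template(current_picture, y, x)
    costs = []
    for dy, dx in dv_offsets:
        ry = y + dv_predictor[0] + dy            # DV candidate = fixed DVP + offset
        rx = x + dv_predictor[1] + dx
        ref_template = template(ref_picture, ry, rx)
        costs.append(np.abs(cur_template - ref_template).sum())  # SAD cost
    return sorted(range(len(dv_offsets)), key=lambda i: costs[i])

rng = np.random.default_rng(0)
cur = rng.integers(0, 255, (64, 64)).astype(np.float64)
ref = cur.copy()                                  # toy case: identical pictures
offsets = [(0, 0), (0, 4), (4, 0), (-4, -4)]
print(reorder_dv_offsets(cur, ref, (16, 16), (8, 8), (0, 0), offsets))
# Offset index 0, i.e. (0, 0), comes first because its templates match exactly.
```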

Abstract

Aspects of the disclosure provide a method and an apparatus for video encoding/decoding. The apparatus includes processing circuitry for: receiving prediction information of a current coding block in a current picture from a coded video bitstream, the prediction information indicating that the current coding block is coded using a subblock-based temporal motion vector prediction (SbTMVP) mode; deriving multiple displacement vector (DV) candidates by applying multiple DV offset candidates to a fixed DV predictor of the current coding block; comparing a template of the current coding block with each of multiple templates, each template of the multiple templates being located at a position specified by a corresponding one of the multiple DV candidates; calculating a cost value associated with each one of the multiple DV offset candidates based on the comparing; and reordering DV offset indices of the multiple DV offset candidates based on their calculated cost values.

METHOD FOR DERIVATION OF PICTURE OUTPUT FOR NON-REFERENCED PICTURE IN CODED VIDEO STREAM (18310058)

Main Inventor

Byeongdoo CHOI


Brief explanation

The patent application describes a method, computer program, and computer system for coding video data.
  • The video data consists of a current picture and one or more other pictures.
  • The method checks a first flag to determine if the current picture is referenced by the other pictures in a decoding order.
  • The method also checks a second flag to determine if the current picture is output.
  • Based on the values of the first and second flags, the video data is decoded.
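
A minimal Python sketch of how a decoder might act on the two flags; the reference buffer and output queue are illustrative data structures, not details from the filing.

```python
def handle_decoded_picture(picture, is_referenced_flag, output_flag,
                           reference_buffer, output_queue):
    """Route a decoded picture according to the two flags described above."""
    if is_referenced_flag:
        reference_buffer.append(picture)   # later pictures may reference it
    if output_flag:
        output_queue.append(picture)       # picture is output for display
    # A picture that is neither referenced nor output can be discarded.

refs, out = [], []
handle_decoded_picture("pic0", True, False, refs, out)
handle_decoded_picture("pic1", False, True, refs, out)
print(refs, out)  # ['pic0'] ['pic1']
```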

Abstract

A method, computer program, and computer system is provided for coding video data. Video data including a current picture and one or more other pictures is received. A first flag corresponding to whether the current picture is referenced by the one or more other pictures in a decoding order is checked. A second flag corresponding to whether the current picture is output is checked. The video data is decoded based on values corresponding to the first flag and the second flag.

SUBBLOCK LEVEL TEMPORAL MOTION VECTOR PREDICTION WITH MULTIPLE DISPLACEMENT VECTOR PREDICTORS AND AN OFFSET (17984107)

Main Inventor

Lien-Fei CHEN


Brief explanation

The patent application describes a method and apparatus for processing video bitstreams.
  • The processing circuitry receives a coded video bitstream that includes a current picture with a current block.
  • The circuitry determines that the current block is coded in a subblock-based temporal motion vector prediction (SbTMVP) mode.
  • It determines a plurality of displacement vector (DV) predictor (DVP) candidates and receives a base index and DV offset for the current block.
  • Based on this information, the circuitry determines a DV that indicates a block collocated with the current block in a reference picture.
  • The circuitry then reconstructs a subblock in the current block based on motion information from a corresponding subblock in the collocated block.

Simplified Explanation:
  • The patent application describes a method and device for processing video data.
  • It determines how a current block in a video is coded using a specific mode.
  • It calculates displacement vectors based on predictor candidates and offsets.
  • These displacement vectors help reconstruct subblocks in the current block using motion information from a reference picture.
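
A NumPy sketch of the derivation described above: the DV is the signaled DVP candidate plus the DV offset, and each subblock copies the motion information of the corresponding subblock in the collocated block. The 4x4 motion-field granularity is an assumption for illustration.

```python
import numpy as np

def sbtmvp_subblock_motion(dvp_candidates, base_index, dv_offset,
                           block_pos, subblock_grid, collocated_motion_field):
    """Fetch per-subblock motion information from the collocated block."""
    dvp = dvp_candidates[base_index]
    dv = (dvp[0] + dv_offset[0], dvp[1] + dv_offset[1])   # DV = DVP + offset
    rows, cols = subblock_grid
    motion = np.zeros((rows, cols, 2))
    for r in range(rows):
        for c in range(cols):
            # Position of the corresponding subblock in the collocated picture,
            # expressed in motion-field (4x4 sample) units.
            y = (block_pos[0] + dv[0]) // 4 + r
            x = (block_pos[1] + dv[1]) // 4 + c
            motion[r, c] = collocated_motion_field[y, x]
    return motion

field = np.random.default_rng(1).normal(size=(32, 32, 2))  # toy collocated motion field
mvs = sbtmvp_subblock_motion([(8, 0), (0, 8)], base_index=0, dv_offset=(4, 4),
                             block_pos=(16, 16), subblock_grid=(2, 2),
                             collocated_motion_field=field)
print(mvs.shape)  # (2, 2, 2): one motion vector per subblock
```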

Abstract

Aspects of the disclosure provide a method and an apparatus including processing circuitry that receives a coded video bitstream comprising a current picture that includes a current block. The processing circuitry determines, based on a syntax element in the coded video bitstream, that the current block including a plurality of subblocks is coded in a subblock-based temporal motion vector prediction (SbTMVP) mode. The processing circuitry determines a plurality of displacement vector (DV) predictor (DVP) candidates and receives a base index indicating a DVP in the plurality of DVP candidates and a DV offset of the current block. The processing circuitry determines a DV based on the DVP and the DV offset. The DV indicates a block collocated with the current block in a collocated reference picture. The processing circuitry reconstructs a subblock in the plurality of subblocks based on motion information of a corresponding subblock in the collocated block.

METHOD AND APPARATUS FOR TEMPORAL INTERPOLATED PREDICTION IN VIDEO BITSTREAM (17982071)

Main Inventor

Han GAO


Brief explanation

The patent application is for a video decoder that can decode a video bitstream encoded in a temporal interpolated prediction (TIP) mode.
  • The decoder generates first and second motion vectors for a block of a current picture, which point to reference frames or reference pictures within those frames.
  • The motion vectors are then refined using a decoder-side motion vector refinement (DMVR) process, which is based on a bilateral matching process.
  • The refined motion vectors are used to decode the block of the current picture.
  • The refinement process involves considering candidates for the refined motion vectors, selected through bilateral matching.
  • The refinement can be applied at both block and sub-block divisions of the current picture.
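
The NumPy sketch below shows one plausible form of the bilateral-matching refinement: mirrored integer offsets are applied to the two motion vectors, and the pair whose motion-compensated blocks differ least (SAD) is kept. The integer-pel search and SAD cost are illustrative assumptions.

```python
import numpy as np

def dmvr_refine(ref0, ref1, block_pos, block_size, mv0, mv1, search_range=2):
    """Refine an MV pair with bilateral matching over mirrored offsets."""
    y, x = block_pos
    h, w = block_size

    def fetch(ref, mv):
        return ref[y + mv[0]:y + mv[0] + h, x + mv[1]:x + mv[1] + w]

    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            cand0 = (mv0[0] + dy, mv0[1] + dx)       # offset applied to MV0
            cand1 = (mv1[0] - dy, mv1[1] - dx)       # mirrored offset on MV1
            cost = np.abs(fetch(ref0, cand0) - fetch(ref1, cand1)).sum()
            if best is None or cost < best[0]:
                best = (cost, cand0, cand1)
    return best[1], best[2]

rng = np.random.default_rng(2)
ref0 = rng.normal(size=(64, 64))
ref1 = np.roll(ref0, shift=(2, 0), axis=(0, 1))       # ref1 is ref0 shifted down by 2
mv0, mv1 = dmvr_refine(ref0, ref1, (16, 16), (8, 8), (0, 0), (0, 0))
print(mv0, mv1)  # (-1, 0) (1, 0): the 2-sample shift is split between the two MVs
```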

Abstract

A video decoder is provided for the decoding of a video bitstream encoded in a temporal interpolated prediction (TIP) mode. First and second motion vectors pointing to respective reference frames, or reference pictures within those frames, are generated for a block of a current picture. The motion vectors are then refined by application of a decoder-side motion vector refinement (DMVR) process, based on a bilateral matching process, and the refined motion vectors are used to decode the block. The refinement may more specifically involve consideration of candidates for the refined motion vectors, selected by the bilateral matching. The refinement may be applied at both block and sub-block divisions of the current picture.

SYSTEMS AND METHODS FOR COMBINING SUBBLOCK MOTION COMPENSATION AND OVERLAPPED BLOCK MOTION COMPENSATION (18142192)

Main Inventor

Liang ZHAO


Brief explanation

The patent application describes methods and systems for coding video.
  • The method involves receiving a current frame with a coding block that has multiple subblocks.
  • Each subblock is associated with a different motion vector; the subblocks include a first subblock located at the boundary of the coding block.
  • The method includes determining the motion vector of the coding block and the first motion vector of the first subblock.
  • Motion compensation data of the first subblock is generated based on the motion vector and the first motion vector.
  • This is achieved by identifying a prediction block based on the motion vector of the coding block and identifying a first prediction block based on the first motion vector of the first subblock.
  • The prediction block and the first prediction block are then combined to generate the motion compensation data of the first subblock.
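
A minimal NumPy sketch of the combination step: one prediction is fetched with the coding block's motion vector, another with the boundary subblock's own motion vector, and the two are blended to form the subblock's motion compensation data. The fixed blending weight is an illustrative assumption; practical codecs use position-dependent weights.

```python
import numpy as np

def blend_boundary_subblock(ref, subblock_pos, subblock_size, block_mv, subblock_mv,
                            weight_subblock=0.75):
    """Combine block-MV and subblock-MV predictions for a boundary subblock."""
    y, x = subblock_pos
    h, w = subblock_size

    def fetch(mv):
        return ref[y + mv[0]:y + mv[0] + h, x + mv[1]:x + mv[1] + w]

    block_pred = fetch(block_mv)           # prediction from the coding block MV
    subblock_pred = fetch(subblock_mv)     # prediction from the subblock's own MV
    return weight_subblock * subblock_pred + (1.0 - weight_subblock) * block_pred

ref = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
mc = blend_boundary_subblock(ref, (8, 8), (4, 4), block_mv=(0, 2), subblock_mv=(1, 0))
print(mc.shape)  # (4, 4) motion compensation data for the boundary subblock
```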

Abstract

The implementations described herein include methods and systems for coding video. In one aspect, a method includes receiving a current frame including a current coding block. The current coding block has multiple subblocks. The subblocks are associated with different motion vectors, and include a first subblock located at a boundary of the current coding block. The method includes determining a motion vector of the current coding block, determining a first motion vector of the first subblock, and generating motion compensation data of the first subblock based on the motion vector and the first motion vector of the first subblock, e.g., by identifying a prediction block based on the motion vector of the current coding block, identifying a first prediction block based on the first motion vector of the first subblock, and combining the prediction block and the first prediction block to generate the motion compensation data of the first subblock.

SIGNALING OF DOWNSAMPLING FILTERS FOR CHROMA FROM LUMA INTRA PREDICTION MODE (18054054)

Main Inventor

Jing YE


Brief explanation

This patent application describes methods and apparatuses for performing chroma from luma (CfL) intra prediction in video coding. 
  • The invention involves receiving a current block from a coded video bitstream.
  • The invention obtains a syntax element from the bitstream, which indicates the downsampling filter used for predicting the current block in a CfL intra prediction mode.
  • If the syntax element indicates the use of a first downsampling filter, the invention determines a set of filter coefficients based on that filter and downsamples the current block using a specific number of sampling positions.
  • If the syntax element indicates the use of a second downsampling filter, the invention determines a different set of filter coefficients based on that filter and downsamples the current block using a different number of sampling positions.
  • The invention then reconstructs the current block after downsampling it.
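
A NumPy sketch of the filter selection: the signaled syntax element picks one of two downsampling filters that use different numbers of luma sampling positions per chroma sample. The specific coefficients and positions shown are assumptions, not the filters defined in the filing.

```python
import numpy as np

def cfl_downsample(luma, filter_index):
    """Downsample reconstructed luma for CfL according to a signaled filter index."""
    if filter_index == 0:
        # First filter: four sampling positions per chroma sample (2x2 average).
        return 0.25 * (luma[0::2, 0::2] + luma[0::2, 1::2] +
                       luma[1::2, 0::2] + luma[1::2, 1::2])
    elif filter_index == 1:
        # Second filter: two sampling positions per chroma sample (horizontal average).
        return 0.5 * (luma[0::2, 0::2] + luma[0::2, 1::2])
    raise ValueError("unknown downsampling filter index")

luma_block = np.arange(64, dtype=np.float64).reshape(8, 8)
print(cfl_downsample(luma_block, 0).shape, cfl_downsample(luma_block, 1).shape)  # (4, 4) (4, 4)
```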

Abstract

Methods and apparatuses for performing chroma from luma (CfL) intra prediction, including: receiving a current block from a coded video bitstream; obtaining, from the coded video bitstream, a syntax element indicating which of two or more downsampling filters is used for predicting the current block in a CfL intra prediction mode; in response to the syntax element indicating that a first downsampling filter is used for the current block: determining a plurality of filter coefficients according to the first downsampling filter; and downsampling the current block based on the determined plurality of coefficients using a first number of sampling positions; in response to the syntax element indicating that a second downsampling filter is used for the current block: determining the plurality of filter coefficients according to the second downsampling filter; downsampling the current block based on the determined plurality of coefficients using a second number of sampling positions, wherein the second number of sampling positions are different from the first number of sampling positions; and reconstructing the current block after downsampling the current block.

TEMPORAL PREDICTION BASED VERTEX POSITION COMPRESSION (18127487)

Main Inventor

Jun TIAN


Brief explanation

The abstract describes a method for predicting the movement of vertices in a mesh over time.
  • The method determines neighboring vertices of a current vertex in a mesh at a specific time.
  • Each neighboring vertex is connected to the current vertex through an edge in the mesh.
  • The method calculates the estimation errors of the neighboring vertices by comparing their positions in a reference frame (at a different time) with their positions in the current frame.
  • A prediction residue is determined for the current vertex based on the neighboring estimation errors.
  • Prediction information for the current vertex is generated using the prediction residue.
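
A small NumPy sketch of the prediction step, assuming the neighboring estimation errors are simply averaged to predict the current vertex's motion; the exact combination rule in the filing may differ.

```python
import numpy as np

def vertex_prediction_residue(current_positions, reference_positions,
                              vertex_index, neighbor_indices):
    """Prediction residue for one vertex from its neighbors' estimation errors."""
    # Estimation error of each neighbor: reference position minus current position.
    errors = [reference_positions[n] - current_positions[n] for n in neighbor_indices]
    predicted_motion = np.mean(errors, axis=0)
    # Residue: actual change of the current vertex minus the prediction.
    actual_motion = reference_positions[vertex_index] - current_positions[vertex_index]
    return actual_motion - predicted_motion

cur = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # mesh at time t1
ref = cur + np.array([0.1, 0.0, 0.0])                                # mesh at time t2, shifted in x
print(vertex_prediction_residue(cur, ref, vertex_index=0, neighbor_indices=[1, 2]))
# [0. 0. 0.] - the neighbors' errors predict the vertex motion exactly
```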

Abstract

A plurality of neighboring vertices of a current vertex in a current frame of a mesh is determined. The current frame corresponds to the mesh at a first time instance. Each of the plurality of neighboring vertices is connected to the current vertex through a respective edge in the mesh. A plurality of neighboring estimation errors of the plurality of neighboring vertices is determined. Each of the plurality of neighboring estimation errors indicates a difference between a reference vertex of a corresponding one of the plurality of neighboring vertices in a reference frame and the corresponding one of the plurality of neighboring vertices in the current frame. The reference frame corresponds to the mesh at a second time instance. A prediction residue of the current vertex is determined based on the plurality of neighboring estimation errors. Prediction information of the current vertex is generated based on the determined prediction residue.

DYNAMIC MESH COMPRESSION USING INTER AND INTRA PREDICTION (18303129)

Main Inventor

Xiaozhong XU


Brief explanation

The patent application describes a method and apparatus for processing three-dimensional visual content.
  • Obtaining volumetric data of three-dimensional visual content.
  • Dividing a plurality of three-dimensional meshes from the volumetric data to obtain a patch.
  • The patch includes vertices of the three-dimensional meshes.
  • Forming a prediction group by selecting a subset of vertices from the patch.
  • Signaling a prediction mode for the prediction group collectively.
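
A minimal Python sketch of the grouping and collective signaling idea; the fixed group size and the mode-selection callback are illustrative assumptions.

```python
def signal_prediction_groups(patch_vertices, group_size, choose_mode):
    """Split a patch's vertices into prediction groups and signal one mode per group."""
    groups = []
    for start in range(0, len(patch_vertices), group_size):
        subset = patch_vertices[start:start + group_size]
        mode = choose_mode(subset)          # e.g. "intra" or "inter"
        groups.append({"vertices": subset, "mode": mode})  # one mode signaled collectively
    return groups

vertices = list(range(10))                  # indices of the patch's vertices
groups = signal_prediction_groups(vertices, group_size=4,
                                  choose_mode=lambda subset: "inter" if len(subset) == 4 else "intra")
print([(g["mode"], len(g["vertices"])) for g in groups])
# [('inter', 4), ('inter', 4), ('intra', 2)]
```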

Abstract

There is included a method and apparatus comprising computer code configured to cause a processor or processors to perform obtaining volumetric data of at least one three-dimensional (3D) visual content, obtaining a patch by dividing a plurality of 3D meshes from the volumetric data, the patch including vertices of at least one of the 3D meshes, forming a prediction group comprising a subset of the vertices of the patch, and signaling a prediction mode of the prediction group collectively for the subset of the plurality of vertices of the patch.