17957486. DETERMINISTIC BROADCASTING FROM SHARED MEMORY simplified abstract (Intel Corporation)

From WikiPatents

DETERMINISTIC BROADCASTING FROM SHARED MEMORY

Organization Name

Intel Corporation

Inventor(s)

Fangwen Fu of Folsom CA (US)

Chunhui Mei of San Diego CA (US)

Maxim Kazakov of San Diego CA (US)

Biju George of Folsom CA (US)

Jorge Parra of El Dorado Hills CA (US)

Supratim Pal of Folsom CA (US)

DETERMINISTIC BROADCASTING FROM SHARED MEMORY - A simplified explanation of the abstract

This abstract first appeared for US patent application 17957486, titled 'DETERMINISTIC BROADCASTING FROM SHARED MEMORY'.

Simplified Explanation

Embodiments described herein provide a technique to enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load requests from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.

  • Graphics processor with cache memory and graphics core
  • Plurality of hardware threads in the graphics core
  • Memory access circuitry for memory access by hardware threads
  • Processing of load requests and detection of duplicates
  • Single read from cache memory for duplicate load requests
  • Transmission of data to requesting hardware threads
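The flow in the points above can be modeled in a few lines of code. Below is a minimal Python sketch (not from the patent; the dict-based cache, the `process_load_requests` function, and all names are illustrative assumptions) showing how duplicate load requests can be detected, satisfied with a single cache read per unique address, and broadcast to every requesting thread:

```python
from collections import defaultdict

def process_load_requests(requests, cache):
    """Service per-thread load requests, deduplicating cache reads.

    requests: list of (thread_id, address) pairs, one per hardware thread.
    cache: dict mapping address -> data, standing in for the L1 cache
           or shared local memory.
    Returns: (results, cache_reads), where results maps thread_id to the
             data delivered to that thread's register file.
    """
    # Detect duplicate load requests: group requesting threads by address.
    threads_by_addr = defaultdict(list)
    for thread_id, addr in requests:
        threads_by_addr[addr].append(thread_id)

    results = {}
    cache_reads = 0
    for addr, thread_ids in threads_by_addr.items():
        data = cache[addr]        # one read per unique address
        cache_reads += 1
        for tid in thread_ids:    # broadcast to all requesting threads
            results[tid] = data
    return results, cache_reads
```

In this toy model, four threads issuing three duplicate requests for the same address trigger only two cache reads instead of four, which is the redundancy the claimed hardware eliminates.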

Potential Applications

This technology can be applied in graphics processing units (GPUs), gaming consoles, virtual reality systems, and other devices requiring high-performance graphics processing.

Problems Solved

1. Efficient data transfer from cache memory to register files
2. Reduction of redundant memory accesses and improved performance

Benefits

1. Faster data processing and improved overall system performance
2. Reduced power consumption due to optimized memory access
3. Enhanced graphics rendering capabilities for gaming and multimedia applications

Potential Commercial Applications

Optimized graphics processors for gaming consoles, virtual reality systems, and high-performance computing devices.

Possible Prior Art

Possible prior art includes techniques for optimizing memory access in graphics processors, such as cache hierarchies and memory-coalescing methods.

Unanswered Questions

How does this technique compare to existing methods of data transfer in graphics processors?

This article does not provide a direct comparison to existing methods of data transfer in graphics processors. It would be beneficial to understand the specific advantages and disadvantages of this technique compared to traditional approaches.

What impact could this technology have on the gaming industry in terms of graphics performance and user experience?

While the benefits of this technology are outlined, the specific impact on the gaming industry in terms of graphics performance and user experience is not discussed. Further exploration into this area could provide valuable insights into the potential market impact of this innovation.


Original Abstract Submitted

Embodiments described herein provide a technique enable a broadcast load from an L1 cache or shared local memory to register files associated with hardware threads of a graphics core. One embodiment provides a graphics processor comprising a cache memory and a graphics core coupled with the cache memory. The graphics core includes a plurality of hardware threads and memory access circuitry to facilitate access to memory by the plurality of hardware threads. The graphics core is configurable to process a plurality of load request from the plurality of hardware threads, detect duplicate load requests within the plurality of load requests, perform a single read from the cache memory in response to the duplicate load requests, and transmit data associated with the duplicate load requests to requesting hardware threads.