The buffers from the new Draw Manager increase their size as needed,
but they never shrink.
Add `StorageArrayBuffer::trim_to_next_power_of_2` function that can
downsize the buffer following the same heuristic as `get_or_resize`.
Add `StorageVectorBuffer::trim_and_clear`, which calls
`trim_to_next_power_of_2` automatically.
Pull Request: https://projects.blender.org/blender/blender/pulls/114857
PR Introduces GPU_storagebuf_sync_to_host as an explicit routine to
flush GPU-resident storage buffer memory back to the host within the
GPU command stream.
The previous implmentation relied on implicit synchronization of
resources using OpenGL barriers which does not match the
paradigm of explicit APIs, where indiviaul resources may need
to be tracked.
This patch ensures GPU_storagebuf_read can be called without
stalling the GPU pipeline while work finishes executing. There are
two possible use cases:
1) If GPU_storagebuf_read is called AFTER an explicit call to
GPU_storagebuf_sync_to_host, the read will be synchronized.
If the dependent work is still executing on the GPU, the host
will stall until GPU work has completed and results are available.
2) If GPU_storagebuf_read is called WITHOUT an explicit call to
GPU_storagebuf_sync_to_host, the read will be asynchronous
and whatever memory is visible to the host at that time will be used.
(This is the same as assuming a sync event has already been signalled.)
This patch also addresses a gap in the Metal implementation where
there was missing read support for GPU-only storage buffers.
This routine now uses a staging buffer to copy results if no
host-visible buffer was available.
Reading from a GPU-only storage buffer will always stall
the host, as it is not possible to pre-flush results, as no
host-resident buffer is available.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/113456
This PR is contains the initial capture pipeline for planar probes.
It requires work to generate the correct view to capture and to include
the result during ray tracing. These will be developed in a separate PR.
This PR detects if a planar probe is active in the scene. If this is
the case the planar probe pipeline will be activated. During rendering
this is done by querying the depsgraph, during viewport drawing this
is done during sync. If an planar probe is detected and the pipeline
wasn't activated. The pipeline will be activated and the sampling
will be reset to ensure the pipeline is filled with all objects.
Per object the user can set the visibility of the object in planar
reflections.
![image](/attachments/fcfb40f9-f174-491c-bfba-e7f00f49aa1c)
For a reflection plane the resolution and clipping offset can be set.
EDIT: Resolution option was removed because too complex to
implement with the little time we have at the moment.
![image](/attachments/e42ad9ce-8af8-45d1-aa3a-630db1901ad3)
Related to #112966
Co-authored-by: Clément Foucault <foucault.clem@gmail.com>
Pull Request: https://projects.blender.org/blender/blender/pulls/113203
Split shadow rendering per LOD per tilemap and improve
fragment shader invocation rate by using multi-viewport.
Also changes the layout of the atlas to be 4 x 4 x Layers.
This allow to grow the atlas while keeping the content
and page indirection correct, but this isn't implemented
in this patch.
# First attempt
Shadow rendering using atomic proved to be less than ideal
and performance were not quite to an acceptable level.
The previous method had issue with atomic contention when
a lot of triangle would overlap and too many fragment shader
invocations with quite complex indirection rules and biases
which made the technique costly.
The new implementation leverage multi viewport and
layered rendeing to effectively replace the need for atomic
and render directly to the shadow atlas. Using the well
supported extension these are free on modern hardware and
do not need a geometry shader.
One view per tile is needed since we use the viewport index
and the layer index as a way to index a specific tile in the
array.
# Geometric Complexity Problem
The counterpart of this is that we need to draw one geometry
instance per tile which is 32x32 time more instances (at most)
than with the previous method.
This means that we will have to find a way to mitigate this
geometry cost by either reducing the number of tiles per
tilemaps (in other words, making the system less memory efficient)
or splitting complex objects' geometry into smaller, more
cull friendly chunks (for example, like the sculpt PBVH nodes).
The later seems to be a longer term solution as it requires
way too much engineering time we have right now.
# Update Lag Problem
This also mean we can only update up to 64 tile per redraw
which is not enough even in the most basic cases. This leads
to missing or over shadowing when a light updates until there
is no updates and the shadow rendering can catch up.
One possible solution is to update a lower LODs first waiting
until there is no update to render. This would allow no artifact
during the transforms (unless there is too many light updates
even for lowest LOD, but that was an issue also for the
previous implementation). This could also help with the
geometric complexity.
# Solution
In the end, we decided to have one view per lod. This limits
the complexity of the fragment shader (improve speed),
reduces the number of views per tilemap (fix update lag),
and reduces the number of instances.
This also mean we cannot render directly to the atlas anymore
and reverted to the atomic solution. Using the smallest
possible viewport, we assure that there isn't that much fragment
shader invocations which was one of the bottleneck. And also
reduces the amount of geometry instances that pass the
clipping test.
Pull Request: https://projects.blender.org/blender/blender/pulls/110979
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
This is a full rewrite of the raytracing denoise pipeline. It uses the
same principle as before but now uses compute shaders for every stages
and a tile base approach. More aggressive filtering is needed since we
are moving towards having no prefiltered screen radiance buffer. Thus
we introduce a temporal denoise and a bilateral denoise stage to the
denoising. These are optionnal and can be disabled.
Note that this patch does not include any tracing part and only samples
the reflection probes. It is focused on denoising only. Tracing will
come in another PR.
The motivation for this is that having hardware raytracing support
means we can't prefilter the radiance in screen space so we have to
have better denoising. Also this means we can have better surface
appearance with support for other BxDF model than GGX. Also GGX support
is improved.
Technically, the new denoising fixes some implementation mistake the
old pipeline did. It separates all 3 stages (spatial, temporal,
bilateral) and use random sampling for all stages hoping to create
a noisy enough (but still stable) output so that the TAA soaks the
remaining noise. However that's not always the case. Depending on the
nature of the scene, the input can be very high frequency and might
create lots of flickering. That why another solution needs to be found
for the higher roughness material as denoising them becomes expensive
and low quality.
Pull Request: https://projects.blender.org/blender/blender/pulls/110117
When using `Texture.ensure_cube_array` the resulting texture wasn't actually
layered (array) and when used resulted into incorrect behavior.
Until now this function isn't used, but will be in when Eevee-next
world reflective light PR lands #108149 .
Pull Request: https://projects.blender.org/blender/blender/pulls/109497
A lot of files were missing copyright field in the header and
the Blender Foundation contributed to them in a sense of bug
fixing and general maintenance.
This change makes it explicit that those files are at least
partially copyrighted by the Blender Foundation.
Note that this does not make it so the Blender Foundation is
the only holder of the copyright in those files, and developers
who do not have a signed contract with the foundation still
hold the copyright as well.
Another aspect of this change is using SPDX format for the
header. We already used it for the license specification,
and now we state it for the copyright as well, following the
FAQ:
https://reuse.software/faq/
Adds stencil texture view support for Metal, allowing reading of
stencil component during texture sample/read.
Stencil view creation refactored to use additional parameter in
textureview creation function, due to deferred stencil parameter
causing double texture view creation in Metal, when this should
ideally be provided upfront.
Authored by Apple: Michael Parkin-White
Pull Request: https://projects.blender.org/blender/blender/pulls/107971
This function was not exposed outside of internal GPU module.
Renaming `draw::Texture::depth()` to `is_depth` for consistency
and removing the ambiguity.
For example
```
OIIOOutputDriver::~OIIOOutputDriver()
{
}
```
becomes
```
OIIOOutputDriver::~OIIOOutputDriver() {}
```
Saves quite some vertical space, which is especially handy for
constructors.
Pull Request: https://projects.blender.org/blender/blender/pulls/105594
For every other texture types this is expected to be implicitly
`GPU_DATA_FLOAT`. There is only one case where this is not the case.
I believe this was previously needed because the data type was
conditionning the texture creation. This is not the case anymore.
Implements virtual shadow mapping for EEVEE-Next primary shadow solution.
This technique aims to deliver really high precision shadowing for many
lights while keeping a relatively low cost.
The technique works by splitting each shadows in tiles that are only
allocated & updated on demand by visible surfaces and volumes.
Local lights use cubemap projection with mipmap level of detail to adapt
the resolution to the receiver distance.
Sun lights use clipmap distribution or cascade distribution (depending on
which is better) for selecting the level of detail with the distance to
the camera.
Current maximum shadow precision for local light is about 1 pixel per 0.01
degrees.
For sun light, the maximum resolution is based on the camera far clip
distance which sets the most coarse clipmap.
## Limitation:
Alpha Blended surfaces might not get correct shadowing in some corner
casses. This is to be fixed in another commit.
While resolution is greatly increase, it is still finite. It is virtually
equivalent to one 8K shadow per shadow cube face and per clipmap level.
There is no filtering present for now.
## Parameters:
Shadow Pool Size: In bytes, amount of GPU memory to dedicate to the
shadow pool (is allocated per viewport).
Shadow Scaling: Scale the shadow resolution. Base resolution should
target subpixel accuracy (within the limitation of the technique).
Related to #93220
Related to #104472
Rewrite of the Workbench engine using C++ and the new Draw Manager API.
The new engine can be enabled in Blender `Preferences > Experimental > Workbench Next`.
After that, the engine can be selected in `Properties > Scene > Render Engine`.
When `Workbench Next` is the active engine, it also handles the `Solid` viewport mode rendering.
The rewrite aims to be functionally equivalent to the current Workbench engine, but it also includes some small fixes/tweaks:
- `In Front` rendered objects now work correctly with DoF and Shadows.
- The `Sampling > Viewport` setting is actually used when the viewport is in `Render Mode`.
- In `Texture` mode, textured materials also use the material properties. (Previously, only non textured materials would)
To do:
- Sculpt PBVH.
- Volume rendering.
- Hair rendering.
- Use the "no_geom" shader versions for shadow rendering.
- Decide the final API for custom visibility culling (Needed for shadows).
- Profile/optimize.
Known Issues:
- Matcaps are not loaded until they’re shown elsewhere. (e.g. when opening the `Viewort Shading` UI)
- Outlines are drawn between different materials of the same object. (Each material submesh has its own object handle)
Reviewed By: fclem
Maniphest Tasks: T101619
Differential Revision: https://developer.blender.org/D16826
Texture usage flags can now be provided during texture creation specifying
the ways in which a texture can be used. This allows the GPU backends to
perform contextual optimizations which were not previously possible. This
includes enablement of hardware lossless compression which can result in
a 15%+ performance uplift for bandwidth-limited scenes on hardware such
as Apple-Silicon using Metal.
GPU_TEXTURE_USAGE_GENERAL can be used by default if usage is not known
ahead of time. Patch will also be relevant for the Vulkan backend.
Authored by Apple: Michael Parkin-White
Ref T96261
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D15967
This is a new implementation of the draw manager using modern
rendering practices and GPU driven culling.
This only ports features that are not considered deprecated or to be
removed.
The old DRW API is kept working along side this new one, and does not
interfeer with it. However this needed some more hacking inside the
draw_view_lib.glsl. At least the create info are well separated.
The reviewer might start by looking at `draw_pass_test.cc` to see the
API in usage.
Important files are `draw_pass.hh`, `draw_command.hh`,
`draw_command_shared.hh`.
In a nutshell (for a developper used to old DRW API):
- `DRWShadingGroups` are replaced by `Pass<T>::Sub`.
- Contrary to DRWShadingGroups, all commands recorded inside a pass or
sub-pass (even binds / push_constant / uniforms) will be executed in order.
- All memory is managed per object (except for Sub-Pass which are managed
by their parent pass) and not from draw manager pools. So passes "can"
potentially be recorded once and submitted multiple time (but this is
not really encouraged for now). The only implicit link is between resource
lifetime and `ResourceHandles`
- Sub passes can be any level deep.
- IMPORTANT: All state propagate from sub pass to subpass. There is no
state stack concept anymore. Ensure the correct render state is set before
drawing anything using `Pass::state_set()`.
- The drawcalls now needs a `ResourceHandle` instead of an `Object *`.
This is to remove any implicit dependency between `Pass` and `Manager`.
This was a huge problem in old implementation since the manager did not
know what to pull from the object. Now it is explicitly requested by the
engine.
- The pases need to be submitted to a `draw::Manager` instance which can
be retrieved using `DRW_manager_get()` (for now).
Internally:
- All object data are stored in contiguous storage buffers. Removing a lot
of complexity in the pass submission.
- Draw calls are sorted and visibility tested on GPU. Making more modern
culling and better instancing usage possible in the future.
- Unit Tests have been added for regression testing and avoid most API
breakage.
- `draw::View` now contains culling data for all objects in the scene
allowing caching for multiple views.
- Bounding box and sphere final setup is moved to GPU.
- Some global resources locations have been hardcoded to reduce complexity.
What is missing:
- ~~Workaround for lack of gl_BaseInstanceARB.~~ Done
- ~~Object Uniform Attributes.~~ Done (Not in this patch)
- Workaround for hardware supporting a maximum of 8 SSBO.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D15817
This new implementation does all downsampling in a single compute shader
dispatch, removing a lot of complexity from the previous recursive
downsampling.
This is heavilly inspired by the Single-Pass-Downsampler from GPUOpen:
https://github.com/GPUOpen-Effects/FidelityFX-SPD
However I do not implement all the optimization bits as they require
vulkan (GL_KHR_shader_subgroup) and is not as versatile (it is only
for HiZ).
Timers inside renderdoc report ~0.4ms of saving on a 2048*1024 render for
the whole downsampling. Note that the previous implementation only
processed 6 mips where the new one processes 8 mips.
```
EEVEE ~1.0ms
EEVEE-Next ~0.6ms
```
Padding has been bumped to be of 128px for processing 8 mips.
A new debug option has been added (debug value 2) to validate the HiZ.
This removes the quirk of having to call the sync function for each new
render loop.
# Conflicts:
# source/blender/draw/engines/eevee_next/eevee_view.cc