This patch unifies the Defocus node between the CPU and GPU compositors.
Both nodes now use a variable sized bokeh kernel which is always odd
sized for a center pixel guarantee. Further the CPU implementation now
properly handles half pixel offsets when doing interpolation, and always
sets the threshold to zero similar to the GPU implementation.
This patch ports the GLSL SMAA library to the CPU compositor in order to
unify the anti-aliasing behavior between the CPU and GPU compositor.
Additionally, the SMAA texture generator was removed since it is now
unused.
Previously, we used an external C++ library for SMAA anti-aliasing,
which is itself a port of the GLSL SMAA library. However, the code
structure and results of the library were different, which made it quite
difficult to match results between CPU and GPU, hence the decision to
port the library ourselves.
The port was performed through a complete copy of the library to C++,
retaining the same function and variable names, even if they are
different from Blender's naming conversions. The necessary code changes
were done to make it work in C++, including manually doing swizzling
which changes the code structure a bit.
Even after porting the library, there were still major differences
between CPU and GPU, due to different arithmetic precision. To fix this
some of the bilinear samplers used in branches and selections were
carefully changed to use point samplers to avoid discontinuities around
branches, also resulting in a nice performance improvement. Some slight
differences still exist due to different bilinear interpolation, but
they shall be looked into later once we have a baseline implementation.
The new implementation is slower than the existing implementation, most
likely due to the liberal use of bilinear interpolation, since it is
quite cheap on GPUs and the code even does more work to use bilinear
interpolation to avoid multiple texture fetches, except this causes a
slow down on CPUs. Some of those were alleviated as mentioned in the
previous section, but we can probably look into optimizing it further.
Pull Request: https://projects.blender.org/blender/blender/pulls/119414
This patch implements the GPU Bloom glare for the CPU compositor, and
adds a new option for it, leaving the Fog Glow option unimplemented once
again for the GPU compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/119128
This patch unifies the anti-aliasing of plane deforms between the CPU
and GPU compositors. The CPU used a multi-sample approach, where the
mask was computed 8 times with a jitter, then averaged to get smooth
edges. The GPU relied on the anisotropic filtering with zero boundaries
to smooth the edges.
Furthermore, the CPU implementation ignored the anti-aliasing for the
deformed image and also relied anisotropic filtering like the GPU, so
its outputs were different.
To unify both implementation, we use the existing SMAA anti-aliasing
algorithm instead, and use the anti-aliased mask for the image output as
well. This affects both the Corner Pin and Plane Deform nodes.
A consequence of this change for the Plane Deform node is that motion
blur will appear to have less samples, that's because it was sampled
8-times more in the previous implementation. But users can just increase
the samples in the node to account for that.
Pull Request: https://projects.blender.org/blender/blender/pulls/118853
Currently, the CPU compositor smoothes its input size in variable size
mode. It is unclear why this is the case, but it seems the logic is that
sharp transitions in the size input are undesirable. Alternatively, this
is similar to the morphological blurring step in the Defocus node. But
it does not use standard weights and it is not morphological in nature
at all.
This patch removes the smoothing step and uses the original size
provided by the user. Looking at resources online, it seems users almost
always expect the size inputs to be used directly, so there is no reason
for force smooth their inputs.
Pull Request: https://projects.blender.org/blender/blender/pulls/118757
New ("fullframe") CPU compositor backend is being used now, and all the code
related to "tiled" CPU compositor is just never used anymore. The new backend
is faster, uses less memory, better matches GPU compositor, etc.
TL;DR: 20 thousand lines of code gone.
This commit:
- Removes various bits and pieces related to "tiled" compositor (execution
groups, one-pixel-at-a-time node processing, read/write buffer operations
related to node execution groups).
- "GPU" (OpenCL) execution device, that was only used by several nodes of
the tiled compositor.
- With that, remove CLEW external library too, since nothing within Blender
uses OpenCL directly anymore.
Pull Request: https://projects.blender.org/blender/blender/pulls/118819
This patch adjusts the Variable Size Bokeh Blur node such that it
matches between CPU and GPU. The GPU implementation is mostly followed
for the reasons stated below.
The first difference is a bug in the CPU implementation, where the upper
limit of the blur window is not considered, but the lower limit is.
The second difference is due to CPU ignoring outside pixels instead of
clamping them to border, which is done until an option is added to the
node to control the boundary condition.
The third difference is due to CPU ignoring the bounding box input.
The fourth difference is that CPU doesn't allow zero maximum blur
radius, which is a valid option.
The fifth difference is that the threshold option, which is only used
for the Defocus node, was considered in a greater than manner, while it
should be greater than or equal. Since the default threshold of one
should allow a blur size of one.
The GPU implementation now considers the maximum size of its input,
following the CPU implementation.
Pull Request: https://projects.blender.org/blender/blender/pulls/117947
This patch ports the newly redesigned GPU Inpaint node to the CPU. The
code is mostly identical to the GPU code with the necessary adjustments
to make it work with CPU.
See #114849 for more information.
Pull Request: https://projects.blender.org/blender/blender/pulls/117716
This patch rewrites and optimizes the Double Edge Mask node to be orders
of magnitude faster. For a 1k complex mask, it is 650x faster, while for
a 1k simple mask, it is only 50x faster.
This improvement is attributed to the use of a new Jump Flooding
algorithm as well as multi-threading, matching the GPU implementation.
The result of the new implementation differs in that the boundaries of
the masks are now identified as the last pixels inside the mask.
Pull Request: https://projects.blender.org/blender/blender/pulls/117545
This patch changes all anti-aliasing operations to use SMAA instead of
the Scale3x-based operation. That's because SMAA is more accurate while
the Scale3x one is more a hack.
When using the MapUV node for certain NPR workflows (for example palette
based remapping of colors) it can be useful to not use the default
anisotropic filtering.
In preparation of potentially adding more filter modes at a later stage
and to keep things consistent with the 'transform' node we use the full
set of interpolation modes in the enum, but expose only the implemented
ones in RNA..
The old implementation was a simple rounding operation and was not
implemented for full-frame compositor.
The issue with the old implementation is that it will not give
satisfactory results for images with high frequency details,
including cases when is used for a preview on Cycles render with
low number of samples. Additionally, when applied on animated
footage it produces very noisy result.
The new algorithm uses an explicit pixel size setting, which allows
the node to be used on its own, without need to have scale-down and
scale-up nodes. It also uses neighbour averaging, which produces
better looking result during animation and noisy input images.
The old tiled compositor setup will render without changes with
this change. This commit does not include modifications in the GPU
compositor implementation.
Ref #88150
Pull Request: https://projects.blender.org/blender/blender/pulls/117223
For high radii Kuwahara, we use a Summed Area Table (SAT) implementation
to accelerate the classic variant of the algorithm. The problem is that
due to limited floating point precision, the SAT can produce artifacts
in its output.
An attempt to fix this was implemented in #114191, and while that patch
improved precision by 10x, the artifacts still existed, albeit less
noticeable. But since the improved precision also meant a performance
penalty, it was decided that the improvement is not worth it.
Since the artifacts are only noticeable for scenes with very high
values, this patch adds a High Precision option that defaults to false
and can be enabled by the user upon noticing any artifacts. The option
simply uses direction convolution instead of SAT in this case. The
downside, of course, is that it can be orders of magnitude slower.
An alternative to using this option is for the user to clamp the input
or downsample the image. Both methods should be documented in the
documentation.
Fixes: #113578.
Pull Request: https://projects.blender.org/blender/blender/pulls/115763
Along with the 4.1 libraries upgrade, we are bumping the clang-format
version from 8-12 to 17. This affects quite a few files.
If not already the case, you may consider pointing your IDE to the
clang-format binary bundled with the Blender precompiled libraries.
This patch unifies the behavior of the scale node by removing the arbitrary limit of scaling images. The change applies to scale and transform nodes.
Note: with this change, Blender can crash for large resizing factors (around 10 000 x 10 000 relative factors). We think it's very unlikely users will run into this issue, so we agreed errors coming from failed memory allocation won't be handled, as is the case for GPU compositor.
Pull Request: https://projects.blender.org/blender/blender/pulls/114764
This patches refactors the compositor File Output mechanism and
implements the file output node for the Realtime Compositor. The
refactor was done for the following reasons:
1. The existing file output mechanism relied on a global EXR image
resource where the result of each compositor execution for each
view was accumulated and stored in the global resource, until the
last view is executed, when the EXR is finally saved. Aside from
relying on global resources, this can cause effective memory leaks
since the compositor can be interrupted before the EXR is written and
closed.
2. We need common code to share between all compositors since we now
have multiple compositor implementations.
3. We needed to take the opportunity to fix some of the issues with the
existing implementation, like lossy compression of data passes,
and inability to save single values passes.
The refactor first introduced a new structure called the Compositor
Render Context. This context stores compositor information related to
the render pipeline and is persistent across all compositor executions
of all views. Its extended lifetime relative to a single compositor
execution lends itself well to store data that is accumulated across
views. The context currently has a map of File Output objects. Those
objects wrap a Render Result structure and can be used to construct
multi-view images which can then be saved after all views are executed
using the existing BKE_image_render_write function.
Minor adjustments were made to the BKE and RE modules to allow saving
using the BKE_image_render_write function. Namely, the function now
allows the use of a source image format for saving as well as the
ability to not save the render result as a render by introducing two new
default arguments. Further, for multi-layer EXR saving, the existent of
a single unnamed render layer will omit the layer name from the EXR
channel full name, and only the pass, view, and channel ID will remain.
Finally, the Render Result to Image Buffer conversion now take he number
of channels into account, instead of always assuming color channels.
The patch implements the File Output node in the Realtime Compositor
using the aforementioned mechanisms, replaces the implementation of the
CPU compositor using the same Realtime Compositor implementation, and
setup the necessary logic in the render pipeline code.
Pull Request: https://projects.blender.org/blender/blender/pulls/113982
The Cryptomatte node considers preview channels as part of the
Cryptomatte layers, producing bad pick outputs at best and bad mattes at
worst.
The Cryptomatte specification specifies that a layer with the same name
as the typename is a (Now deprecated) preview layer, so it should be
ignored, which is what the patch does.
Pull Request: https://projects.blender.org/blender/blender/pulls/115879
Changes:
- Renamed Split Viewer Node to Split Node
- Split Node is now under `Utilities` (similar to Switch node)
- Versioning: split viewer from 4.0 and before is replaced with the new split node connected to a new viewer node.
Pull Request: https://projects.blender.org/blender/blender/pulls/114245
Blender freezes when adding a Cryptomatte node and using the GPU
compositor, even if the node is later removed.
This was because the CPU compositor acquired a render result mutex and
never released it due to an early exit that missed that release. This
patch fixes that by adding an appropriate release in that early exist.
This patch changes the interpolation algorithm utilized by the Keying
Screen node to a Gaussian Radial Basis Function Interpolation. This is
proposed because the current Voronoi triangulation based interpolation
has the following properties:
- Not temporally stable since the triangulation can abruptly change as
tracking markers change position.
- Not smooth in the mathematical sense, which is also readily visible in
the artists sense.
- Computationally expensive due to the triangulation and naive
rasterization algorithm.
On the other hand, the RBF interpolation method is temporally stable and
continuous, smooth and infinitely differentiable, and relatively simple
to compute assuming low number of markers, which is typically the case
for keying screen objects.
This breaks backward compatibility, but the keying screen is only used
as a secondary input for keying in typical compositor setups, so one
should expect minimal difference in outputs.
Pull Request: https://projects.blender.org/blender/blender/pulls/112480
The hash tables and vector blenlib headers were pulling many more
headers than they actually need, including the C base math header,
our C string API header, and the StringRef header. All of this
potentially slows down compilation and polutes autocomplete
with unrelated information.
Also remove the `ListBase` constructor for `Vector`. It wasn't used
much, and making it easy to use `ListBase` isn't worth it for the
same reasons mentioned above.
It turns out a lot of files depended on indirect includes of
`BLI_string.h` and `BLI_listbase.h`, so those are fixed here.
Pull Request: https://projects.blender.org/blender/blender/pulls/111801
This patch implements the Anisotropic Kuwahara filter for the Realtime
compositor and replaces the existing CPU implementation with a new one to be
compatible with the GPU implementation. The implementation is based on three
papers on Anisotropic Kuwahara filtering, presented and detailed in the code.
The new implementation exposes two extra parameters that control the sharpness
and directionality of the output, giving more artistic freedom.
While the implementation is different from the existing CPU implementation, it
is a higher quality one that is also faster and conforms better to the methods
described in the papers.
Examples can be seen in the pull request description.
Pull Request: https://projects.blender.org/blender/blender/pulls/110786
Include counts of some headers while making full blender build:
- BLI_color.hh 1771 -> 1718
- BLI_math_color.h 1828 -> 1783
- BLI_math_vector.hh 496 -> 405
- BLI_index_mask.hh 1341 -> 1267
- BLI_task.hh 958 -> 903
- BLI_generic_virtual_array.hh 509 -> 435
- IMB_colormanagement.h 437 -> 130
- GPU_texture.h 806 -> 780
- FN_multi_function.hh 331 -> 257
Note: DNA_node_tree_interface_types.h needs color include only
for the currently unused (but soon to be used) socket_color function.
Future step is to figure out how to include
DNA_node_tree_interface_types.h less.
Pull Request: #111113
Listing the "Blender Foundation" as copyright holder implied the Blender
Foundation holds copyright to files which may include work from many
developers.
While keeping copyright on headers makes sense for isolated libraries,
Blender's own code may be refactored or moved between files in a way
that makes the per file copyright holders less meaningful.
Copyright references to the "Blender Foundation" have been replaced with
"Blender Authors", with the exception of `./extern/` since these this
contains libraries which are more isolated, any changed to license
headers there can be handled on a case-by-case basis.
Some directories in `./intern/` have also been excluded:
- `./intern/cycles/` it's own `AUTHORS` file is planned.
- `./intern/opensubdiv/`.
An "AUTHORS" file has been added, using the chromium projects authors
file as a template.
Design task: #110784
Ref !110783.
Both the `Math` node and the `Vector Math` currently only explicitly
support modulo using truncated division which is oftentimes not the
type of modulo desired as it behaves differently for negative numbers
and positive numbers.
Floored Modulo can be created by either using the `Wrap` operation or
a combination of multiple `Math` nodes. However both methods obfuscate
the actual intend of the artist and the math operation that is actually
used.
This patch adds modulo using floored division to the scalar `Math` node,
explicitly stating the intended math operation and renames the already
existing `"Modulo"` operation to `"Truncated Modulo"` to avoid confusion.
Only the ui name is changed, so this should not break compatibility.
Pull Request: https://projects.blender.org/blender/blender/pulls/110728
It was only used by OpenEXR and Iris images, and saving the Z Buffer
in those formats was disabled by default. This option comes from the
times prior to the addition of the Multilayer EXR.
It also worth noting that it was not possible to save Iris with Depth
pass from Blender as internally it is called IRIZ format and it was
not exposed. But even after exposing this format option something still
was missing as saving and loading ITIZ did not show up the Depth pass.
The reason of removal is to make it a more clear match of the ImBuf
with a render pass, and use it instead of a custom type in the render
result and render pass API. This will simplify the API and also avoid
stealing buffers and making shallow copies when showing the render
result.
For the cases when Depth is needed a Multilayer EXR is to be used,
as most likely more than just the Depth will be needed.
On a user level this change:
- Removes the "Z Buffer" option from the interface.
- It preserves existing sockets in compositor nodes, but it will
output black image. Also changing the image data-block will
remove the socket unless a Multilayer EXR with Depth pass image
is selected.
- Removes "Depth" socket of the Viewer and Composite nodes.
Ref #108618
Pull Request: https://projects.blender.org/blender/blender/pulls/109687
Compute edges of image once based on luminance instead of all 3 channels.
This also gives a modest performance improvement of 8%. Measured on intel i9 CPU using a 1920 x 3199 image.
Pull Request: https://projects.blender.org/blender/blender/pulls/108858
The filter is used to reduce noise while preserving edges. It can be used to create a cartoon effect from photorealistic images.
It offers two variations:
1) Classic aka isotropic kuwahara filter: simple and faster computation. Algorithm splits an area around a single pixel in four parts and computes the mean of the region with the lowest standard deviation.
2) Anisotropic Kuwahara filter: improves the classical approach by considering the direction of structures of regions
This patch implements both approaches above as multi-threaded operations for the full-frame and tiled compositor.
Co-authored-by: Sergey Sharybin <sergey@blender.org>
Pull Request: https://projects.blender.org/blender/blender/pulls/107015