Basically, make re-alloc and memcpy from the same lock, otherwise one
thread might be re-allocating thread while another one is trying to
copy data there.
Reported by Mohamed Sakr in IRC, thanks!
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
The issue here was mainly coming from minimal pixel width feature
which is quite commonly enabled in production shots.
This feature will use some probabilistic heuristic in the curve
intersection function to check whether we need to return intersection
or not. This probability is calculated for every intersection check.
Now, when we use multiple BVH nodes for curve primitives we increase
probability of that primitive to be considered a good intersection
for us. This is similar to increasing minimal width of curve.
What is worst here is that change in the intersection probability
fully depends on exact layout of BVH, meaning probability might
change differently depending on a view angle, the way how builder
binned the primitives and such. This makes it impossible to do
simple check like dividing probability by number of BVH steps.
Other solution might have been to split BVH into fully independent
trees, but that will increase memory usage of all the static
objects in the scenes, which is also not something desirable.
For now used most simple but robust approach: store BVH primitives
time and test it in curve intersection functions. This solves the
regression, but has two downsides:
- Uses more memory.
which isn't surprising, and ANY solution to this problem will
use more memory.
What we still have to do is to avoid this memory increase for
cases when we don't use BVH motion steps.
- Reduces number of maximum available textures on pre-kepler cards.
There is not much we can do here, hardware gets old but we need
to move forward on more modern hardware..
This way we can stop traversing BVH node early on.
Gives about 2-2.5x times render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs
and GPU do not benefit from this change.
Similar to the previous commit, the statistics goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 46 260
1 27 373
2 18 598
3 15 826
Scene used for the tests is the agent's body from one of the barber
shop scenes (no textures or anything, just a diffuse material).
Once again this is limited to regular (non-spatial split) BVH,
Support of spatial split to this feature will come later.
The idea is to create several smaller BVH nodes for each of the motion
curve primitives. This acts as a forced spatial split for the single
primitive.
This gives up render time speedup of motion blurred hair in the cost
of extra memory usage. The numbers goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 258 191
1 123 278
2 69 453
3 43 627
Scene used for the tests is the agent's hair from one of the barber
shop scenes.
Currently it's only limited to scenes without spatial split enabled,
since the spatial split builder requires some changes to work properly
with motion steps coordinates.
This avoids intersection AABB of different curve primitives
which makes it less ray-to-primitive intersections.
This gives about 30% speedup of hair rendering in the barber
shop scenes here. There is still some work to be done on those
files to solve major speed issues on certain frames.
This way we can have different limits for regular and motion curves
which we'll do in one of the upcoming commits in order to gain some
percents of speedup.
The reasoning here is that motion curves are usually intersecting
lots of others bounding boxes, which makes it inefficient to have
single primitive in the leaf node.
Maximal number of elements is supposed to be inclusive. That is what
it was always meant in this file and what @brecht considered still
the case in 6974b69c61.
In fact, the commit message to that change mentions that we allowed
up to 2 curve primitives per leaf while in fact it was doing up to 1
curve primitive.
Making it real 2 primitives at a max gives about 5% slowdown for the
koro.blend scene. This is a reason why BVHParams.max_curve_leaf_size
was changed to 1 by this change.
It was possible to have non-initialized unaligned BVH split
to be used when regular BVH split SAH was inf. Now we ensure
that unaligned splitter is only used when it's really initialized.
It's a regression and should be in 2.78a.
This reverts commit ecbfa31caa.
Original commit broke logic in nodes re-fitting. That area can
access non-existing children momentarely. Not sure what would
be best solution here, for now simply reverting the change/
The issue was caused by some false-positive empty non-AABB intersection.
Tried to tweak it a bit so it does not record intersection anymore.
Hopefully will work for all platforms. Tested here on iMac and Debian.
For proper indexing to work we need to use unaligned node with
identity transform instead of aligned nodes when doing refit.
To be backported to 2.78 release.
This is a special builder type which is allowed to orient nodes to
strands direction, hence minimizing their surface area in comparison
with axis-aligned nodes. Such nodes are much more efficient for hair
rendering.
Implementation of BVH builder is based on Embree, and generally idea
there is to calculate axis-aligned SAH and oriented SAH and if SAH
of oriented node is smaller than axis-aligned SAH we create unaligned
node.
We store both aligned and unaligned nodes in the same tree (which
seems to be different from what Embree is doing) so we don't have
any any extra calculations needed to set up hair ray for BVH
traversal, hence avoiding any possible negative effect of this new
BVH nodes type.
This new builder is currently not in use, still need to make BVH
traversal code aware of unaligned nodes.
This seems to be straightforward way to support heterogeneous nodes
in the same tree.
There is some penalty related on 4gig limit of the address space now,
but here's are the thing:
Traversal code was already using ints to store final offset, so
there can't be regressions really.
This is a required commit to make it possible to encode both aligned
and unaligned nodes in the same array. Also, in the future we can use
this to get rid of __leaf_nodes array (which is a bit tricky to do since
trickery in pack_instances().
There are several internal changes for this:
First idea is to make __tri_verts to behave similar to __tri_storage,
meaning, __tri_verts array now contains all vertices of all triangles
instead of just mesh vertices. This saves some lookup when reading
triangle coordinates in functions like triangle_normal().
In order to make it efficient needed to store global triangle offset
somewhere. So no __tri_vindex.w contains a global triangle index which
can be used to read triangle vertices.
Additionally, the order of vertices in that array is aligned with
primitives from BVH. This is needed to keep cache as much coherent as
possible for BVH traversal. This causes some extra tricks needed to
fill the array in and deal with True Displacement but those trickery
is fully required to prevent noticeable slowdown.
Next idea was to use this __tri_verts instead of __tri_storage in
intersection code. Unfortunately, this is quite tricky to do without
noticeable speed loss. Mainly this loss is caused by extra lookup
happening to access vertex coordinate.
Fortunately, tricks here and there (i,e, some types changes to avoid
casts which are not really coming for free) reduces those losses to
an acceptable level. So now they are within couple of percent only,
On a positive site we've achieved:
- Few percent of memory save with triangle-only scenes. Actual save
in this case is close to size of all vertices.
On a more fine-subdivided scenes this benefit might become more
obvious.
- Huge memory save of hairy scenes. For example, on koro.blend
there is about 20% memory save. Similar figure for bunny.blend.
This memory save was the main goal of this commit to move forward
with Hair BVH which required more memory per BVH node. So while
this sounds exciting, this memory optimization will become invisible
by upcoming Hair BVH work.
But again on a positive side, we can add an option to NOT use Hair
BVH and then we'll have same-ish render times as we've got currently
but will have this 20% memory benefit on hairy scenes.
It was initially unsupported because initial idea of checking visibility
of all children was slowing scenes down a lot. Now the idea has changed
and we only perform visibility check of current node. This avoids huge
slowdown (from tests here it seems to be withing 1-2%, but more tests
would never hurt) and gives nice speedup of ray traversal for complex
scenes which utilized ray visibility.
Here's timing of koro.blend:
Without visibility check With visibility check
Original file 4min 20sec 4min 23sec
Camera rays only 1min 43 sec 55sec
Unfortunately, this doesn't come for free and requires extra data in
BVH node, which increases memory usage of BVH nodes by 15%. This we
can solve with some future trickery of avoiding __tri_storage created
for curve segments.
In certain types of animation it's possible to have some objects
scaling to zero. In this case we can save render times by avoid
traversing such instances.
Better to do ti ahead of a time, so traversal stays simple.
Reviewers: lukasstockner97, dingto, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2048
Some of these values can get quite large and are hard to read, adding this
makes it easy to read them at a glance.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2039
This commits implements threaded sorting of references when looking for
object spatial split. It mainly useful when doing initial binning, which
happens from main thread.
Gives nice speedup of BVH build for Bunny.blend: 36sec vs. 55sec for
the Rabbit mesh BVH build.
On more complex scenes the speedup is probably minimal, but still nice
to have more instant rendering for simplier scenes.
Some further tests with production scenes would be interesting.
Reviewers: juicyfruit, dingto, lukasstockner97, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1928
Simple fix: release lock earlier.
Reduces spatial split build time from 96 to 53sec on the Bunny.blend
(using studio Intel for benchmark).
NOTE: Timing difference is not that spectacular when comparing numbers
with builds before memory optimization, but even then it's about 20%
faster build.
This was only visible on systems with lots of threads and root of the issue
was that we've been pre-allocating too much memory for all the threads.
Now we only pre-allocate data for the main thread and rest of the threads
does allocation on-demand.
This brings down memory usage from 36Gig to 6.9Gig when building spatial
split for the Bunny.blend file on our Intel beast.
Originally regression was happened by the threaded spacial split builder
commit.
The title actually covers it all, This commit exploits all the work
being done in previous changes to make it possible to build spatial
splits in threads.
Works quite nicely, but has a downside of some extra memory usage.
In practice it doesn't seem to be a huge problem and that we can
always look into later if it becomes a real showstopper.
In practice it shows some nice speedup:
- BMW27 scene takes 3 now (used to be 4)
- Agent shot takes 5 sec (used to be 80)
Such non-linear speedup is most likely coming from much less amount
of heap re-allocations. A a downside, there's a bit of extra memory
used by BVH arrays. From the tests amount of extra memory is below
0.001% so far, so it's not that bad at all.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1820
Policy here is a bit more complicated, if tree becomes too deep we're
forced to create a leaf node and size of that leaf wouldn't be so well
predicted, which means it's quite tricky to use single stack array for
that.
Made it more official feature that StackAllocator will fall-back to
heap when running out of stack memory.
It's still much better than always using heap allocator.
Currently they're staying at 1 (actual size over capacity), but we
will be changing it quite soon in order to avoid having too much
memory re-allocation happening at a BVH build time and will be
playing with different policies for that.
In some files stack memory was overruning the pre-allocated stack.
Perhaps we should fall-back to a hep-allocated stack so release builds
don't crash in works case but just becoming slower.
There are in fact some missing parts to it (Split BVH builder should
be creating bins from result of Object Split constructor).
Doable, but need to quickly fix issue for the studio here, easier to
revert for now.
This has following advantages:
- Localizes all the run-time storage into a single structure,
which could easily be extended further.
- Storage could be created per-thread, so once builder is
threaded we wouldn't have any conflicts between threads.
- Global nature of the storage avoids memory re-allocation
on the runtime, keeping builder as fast as possible.
Currently it's just API changes, which don't affect user at all.
Uses new StackAllocator from util_stack_allocator. Some tweaks to the stack
storage size are possible, read notes in the code about this.
At this point we might want to rename allocator files to util_allocator_foo.c,
so the stay nicely grouped in the folder.
This commit makes it so casting subsurface rays will totally ignore all
the BVH nodes and primitives which do not belong to a current object,
making it much simpler traversal code and reduces number of intersection
tests.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1823
Notes:
- There is still some bvh cache code, but that is from the engines initial commit, we might clean this up further or keep it.
- Changes in util_cache.h/.c are kept, this might be re-used in the future.
Avoid memmove() happening on every insert of duplicated node to the references
list. Temporary pre-allocated vector is used for new references which is then
being inserted into actual array in one go later.
Gives around 4x speedup building spatially split BVH for the grass field in the
cassette player shot from Gooseberry.
This commit implements object reference node spatial split making it possible
to use spatial split for top-level BVH.
The code is not in use yet because enabling spatial split on top level BVH is
not coming for free and it needs to be investigated if it's worth in terms of
improved render times.
This way it's easy to add more reference types allowed for splitting to the
BVH reference split function without making this function too much big. This
way it's possible to experiment with such features as splitting object instance
references.
So far should not be any functional changes.
Previous idea behind having vector during building and array for actual storage
was needed in order to minimize amount of re-allocations happening during the
build, but it lead to double memory overhead used by those arrays at the vector
to array conversion stage.
Issue with such approach was that for BVH without spatial split size of arrays
is known in advance and it never changes, which made vector to array conversion
totally redundant.
Also after testing with several rather complex from spatial split scenes (such
as trees) it seems even conservative approach of reallocation (when we perform
re-allocation when leaf does not fit into the memory) doesn't give measurable
difference in time.
This makes it so we can switch to array, which will avoid unneeded memory
re-allocations when spatial split is disabled without harming other cases.
it's a bit difficult to measure exact benefit of this change on our production
files here, but depending on the scene it might give quite reasonable memory
save.
This way we can get rid of inefficient memory usage caused by BVH boundbox
part being unused by leaf nodes but still being allocated for them. Doing
such split allows to save 6 of float4 values for QBVH per leaf node and 3
of float4 values for regular BVH per leaf node.
This translates into following memory save using 01.01.01.G rendered
without hair:
Device memory size Device memory peak Global memory peak
Before the patch: 4957 5051 7668
With the patch: 4467 4562 7332
The measurements are done against current master. Still need to run speed tests
and it's hard to predict if it's faster or not: on the one hand leaf nodes are
now much more coherent in cache, on the other hand they're not so much coherent
with regular nodes anymore.
Reviewers: brecht, juicyfruit
Subscribers: venomgfx, eyecandy
Differential Revision: https://developer.blender.org/D1236
It was an issue with what bounds to use for BVH node during construction.
Also corrected case when there are all 4 primitive types in the range and
also there're objects in the same range.
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
The issue was caused by mismatch in how aligned triangles storage was
filled in during BVH construction and how it was used during rendering.
Basically, i was leaving uninitialized storage for triangles when
there was deformation motion blur detected for the mesh. Was likely
some sort of optimization, but in fact it's still possible that regular
triangles would be needed for rendering.
So now we're storing aligned storage for all triangle primitives and
only skipping motion triangles (the deformation motion blur flag from
mesh is now ignored).
Ideally we should get rid of those temporary vectors anyway, but
it's not so trivial because of the alignment. For untl then we'll
just have a bit worse solution. This part of code is not the root
of the issue of memory spikes for now anyway.
But since we're getting rid of temporary memory earlier actual spike
is a bit smaller as now. For example in franck_sheep file it's now
5489.69MB vs. previously 5599.90MB.
This way we save 3 bytes per BVH node while building BVH, which overall
gives 100Mb memory save when preparing Frank for render.
It's not really much comparing to overall memory usage (which is 11Gb
during scene preparation here) but still doesn't harm to have solved.
When doing BVH leaf node split we can't rely on leaf size limit from
BVH parameters in case there's spatial split enabled.
This commit basically reverts previous optimization change here which
used stack-allocated memory and uses heap-allocated vector now.
It's possible to boost this code up again by using own allocator.
This commit basically makes it so statistics print from different BVH trees are not
being interleaved with each other. Glog ensures this when debug print is done as a
single put to stream operator.
Since leaf node gets split further into per-primitive type leaves old check
for number of curves became a bit ridiculous -- it might lead to two leaf nodes
each of which would contain only one curve primitive (one motion curve and one
regular curve).
This lead to quite dramatic slowdown for Victor model -- around 40%, which is
totally unacceptable.
This commit is aimed to prevent such situation and from quick render test it
seems victor is now back to normal render time. Further testing is needed tho.
There are also other ideas about splitting the node, will need to look into
them next.
This commit enables BVH leaf nodes split by the primitive type and makes it
so BVH traversal code is now aware and benefits from this.
As was mentioned in original commit, this change is crucial to be able to do
single ray to multiple triangle intersection. But it also appears to give
barely visible speedup in some scene.
In any case there should be no noticeable slowdown, and this change is what
we need to have anyway.
The idea of this change is make it possible to split leaf nodes by primitive
type, making leaf containing primitives of the same type.
This would become handy when working on a single ray to multiple triangles
intersection code, plus with careful implementation it might give some extra
benefits on BVH traversal code by avoiding primitive type fetch and check for
each primitive in the node. But that's a bit tricky to have benefits on this
change only because depth of BVH increases.
This option is not exposed to the interface at all and not used even secretly,
the commit is only needed to help working further in this direction without
messing around with local patches and worrying of them running out of date.
This is harmless for now because tail of the node is zero in there, but better
to fix it early so in the case of extending BVH nodes this code doesn't give
issues.
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing visibility check for every node cost quite reasonable amount of time
and in most cases those checks are true-positive.
Other idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again doing curve visibility check is quite costly for each of the nodes and
those checks returns truth for most of the hierarchy anyway.
There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.
Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).
The code is enabled for compilation in kernel. but blender wouldn't use it
still.
Using this paper: Sven Woop, Watertight Ray/Triangle Intersection
http://jcgt.org/published/0002/01/05/paper.pdf
This change is expected to address quite reasonable amount of reports from the
bug tracker, plus it might help reducing the noise in some scenes.
Unfortunately, it's currently about 7% slower than the previous solution with
pre-computed triangle plane equations, but maybe with some smart tweaks to the
code (tests reshuffle, using SIMD in a nice way or so) we can avoid the speed
regression.
But perhaps smartest thing to do here would be to change single triangle / ray
intersection with multiple triangles / ray intersections. That's how Embree does
this and it's watertight single ray intersection is not any faster that this.
Currently only triangle intersection is modified accordingly to the paper, in
the future we would also want to modify the node / ray intersection.
Reviewers: brecht, juicyfruit
Subscribers: dingto, ton
Differential Revision: https://developer.blender.org/D819
The idea is to store visibility flags for leaf nodes only since visibility check
for inner nodes costs too much for QBVH hence it is not optimal to perform.
Leaf QBVH nodes have plenty of space to store all sort of flags, so we can make
nodes one element smaller, saving noticeable amount of memory.
Previously offsets were calculated based on the BVH node size,
which is wrong and real PITA in cases when some extra data is
to be added into (or removed from) the node.
Now use offsets which are not calculated form the node size.
This solves quite an over-allocation in BVH instances packing code,
unfortunately, it's not a magic bullet to solve memory bump caused
by the recent QBVH changes.
For that we'll likely need to decouple storage for leaf and inner
nodes. However, it's not really clear for now if it's something
important since that'd still be just a fraction of memory comparing
to all the hi-res textures.
Title says it all, quite straightforward implementation.
Would only mention that there's a bit of code duplication around packing node
into pack.nodes. Trying to de-duplicate it ends up in quite hairy code (like
functions with loads of arguments some of which could be NULL in certain
circumstances etc..). Leaving solving this duplication for later.
Before all the nodes were counted and allocated, leading to situations when
bunch of allocated memory is not used because reasonable amount of nodes are
simply ignored.
The issue was noticed with gcc-4.7 (used by the release build environment)
which didn't generate optimal enough code for BVH references swap. Seems it
looked up for the assign operator for each of the reference structure members
even though nothing special was required for assignment.
Forcing compiler to use simple memory copy gives speedup of like 2-3 times.
The issue doesn't happen with OSX's clang and new gcc-4.9, but since we're
gonna to stick to gcc-4.7 for official releases for quite some time still it's
nice to have performance issues resolved for all the compilers.
Didn't put the code into #ifdef so if in the future some issues appears with
alignment or assignment which need to happen as an operator we notice this
earlier.
Before this Cycles used to try using the cache even so it knew for the
fact that reading it from the disk failed. This change doesn't make it
more stable if someone will try to trick Cycles and give malformed data
but it solves general cases when Blender crashed during the cache write
and will preserve rendering from crashing when trying to use that partial
cache.
- Removed the cycles subdivision and interpolation of hairkeys.
- Removed the parent settings.
- Removed all of the advanced settings and presets.
- This simplifies the UI to a few settings for the primitive type and a shape mode.
* Increase the maximum amount of closures per shader from 16 to 64, so more complex closure trees can be rendered.
I measured performance on CPU and GPU (Geforce 540M) and couldn't find a performance impact, but if someone encounters a noticeable impact on his system, please report.
On the BMW scene, this gives roughly a 10% speedup overall with clang/gcc, and 30%
speedup with visual studio (2008). It turns out visual studio was optimizing the
existing code quite poorly compared to pretty good autovectorization by clang/gcc,
but hand written SSE code also gives a smaller speed boost there.
This code isn't enabled when using the hair minimum width feature yet, need to
make that work with the SSE code still.
* Revert r57203 (len() renaming)
There seems to be a problem with nVidia OpenCL after this and I haven't figured out the real cause yet.
Better to selectively enable native length() later, after figuring out what's wrong.
This fixes [#35612].
* Rename some math functions:
len -> length
len_squared -> length_squared
normalize_len -> normalize_length
* This way OpenCL uses its inbuilt length() function, rather than our own. The other two functions have been renamed for consistency.
* Tested CPU, CUDA and OpenCL compile, should be no functional changes.
The problem was (again) the x86 extended precision float register being used for
one float value while the other was rounded to lower precision. This caused the
strictly weak order requirement for std::sort to be broken.
Code is added to restrict the pixel size of strands in cycles. It works best with ribbon primitives and a preset for these is included. It uses distance dependent expansion of the strands and then stochastic strand removal to give a fading. To prevent a slowdown for triangle mesh objects in the BVH an extra visibility flag has been added. It is also only applied for camera rays.
The strand width settings are also changed, so that the particle size is not included in the width calculation. Instead there is a separate particle system parameter for width scaling.
The curve segment primitive has been added. This includes an intersection function and changes to the BVH.
A few small errors in the line segment intersection routine are also fixed.
should be no functional changes yet. UV, tangent and intercept are now stored
as attributes, with the intention to add more like multiple uv's, vertex
colors, generated coordinates and motion vectors later.
Things got a bit messy due to having both triangle and curve data in the same
mesh data structure, which also gives us two sets of attributes. This will get
cleaned up when we split the mesh class.
Patch [#33445] - Experimental Cycles Hair Rendering (CPU only)
This patch allows hair data to be exported to cycles and introduces a new line segment primitive to render with.
The UI appears under the particle tab and there is a new hair info node available.
It is only available under the experimental feature set and for cpu rendering.
- move object_iterators.c --> view3d_iterators. (ED_object.h had to include ED_view3d.h which isn't so nice)
- move projection functions from view3d_view.c --> view3d_project.c (view3d_view was becoming a mishmash of utility functions and operators).
- some some cmake includes as system-includes.
* Multithreaded image loading, each thread can load a separate image.
* Better multithreading for multiple instanced meshes, different threads can now
build BVH's for different meshes, rather than all cooperating on the same mesh.
Especially noticeable for dynamic BVH building for the viewport, gave about
2x faster build on 8 core in fairly complex scene with many objects.
* The main thread waiting for worker threads can now also work itself, so
(num_cores + 1) threads will be working, this supposedly gives better
performance on some operating systems, but did not measure performance for
this very detailed yet.
=== BVH build time optimizations ===
* BVH building was multithreaded. Not all building is multithreaded, packing
and the initial bounding/splitting is still single threaded, but recursive
splitting is, which was the main bottleneck.
* Object splitting now uses binning rather than sorting of all elements, using
code from the Embree raytracer from Intel.
http://software.intel.com/en-us/articles/embree-photo-realistic-ray-tracing-kernels/
* Other small changes to avoid allocations, pack memory more tightly, avoid
some unnecessary operations, ...
These optimizations do not work yet when Spatial Splits are enabled, for that
more work is needed. There's also other optimizations still needed, in
particular for the case of many low poly objects, the packing step and node
memory allocation.
BVH raytracing time should remain about the same, but BVH build time should be
significantly reduced, test here show speedup of about 5x to 10x on a dual core
and 5x to 25x on an 8-core machine, depending on the scene.
=== Threads ===
Centralized task scheduler for multithreading, which is basically the
CPU device threading code wrapped into something reusable.
Basic idea is that there is a single TaskScheduler that keeps a pool of threads,
one for each core. Other places in the code can then create a TaskPool that they
can drop Tasks in to be executed by the scheduler, and wait for them to complete
or cancel them early.
=== Normal ====
Added a Normal output to the texture coordinate node. This currently
gives the object space normal, which is the same under object animation.
In the future this might become a "generated" normal so it's also stable for
deforming objects, but for now it's already useful for non-deforming objects.
=== Render Layers ===
Per render layer Samples control, leaving it to 0 will use the common scene
setting.
Environment pass will now render environment even if film is set to transparent.
Exclude Layers" added. Scene layers (all object that influence the render,
directly or indirectly) are shared between all render layers. However sometimes
it's useful to leave out some object influence for a particular render layer.
That's what this option allows you to do.
=== Filter Glossy ===
When using a value higher than 0.0, this will blur glossy reflections after
blurry bounces, to reduce noise at the cost of accuracy. 1.0 is a good
starting value to tweak.
Some light paths have a low probability of being found while contributing much
light to the pixel. As a result these light paths will be found in some pixels
and not in others, causing fireflies. An example of such a difficult path might
be a small light that is causing a small specular highlight on a sharp glossy
material, which we are seeing through a rough glossy material. With path tracing
it is difficult to find the specular highlight, but if we increase the roughness
on the material the highlight gets bigger and softer, and so easier to find.
Often this blurring will be hardly noticeable, because we are seeing it through
a blurry material anyway, but there are also cases where this will lead to a
loss of detail in lighting.
disk to be reused by the next render.
This is useful for rendering animations where only the camera or materials change.
Note that saving the BVH to disk only to be removed for the next frame is slower
if this is not the case and the meshes do actually change.
For a render, it will save bvh files to the cache user directory, and remove all
cache files from other renders. The files are named using a MD5 hash based on the
mesh, to verify if the meshes are still the same.
* Fix crash in light path node
* Fix struct alignment issue for cuda
* Fix issue with instances taking up too much memory
* Fix issue with ray visibility working incorrect on some objects
* Enable OpenCL always and remove option, it has no dependencies so may as well
* Refuse to load kernel if OpenCL version < 1.1, recent drivers are needed
* Better error handling for OpenCL device
* 3D views with rendered draw mode will now revert to wireframe on file load
* Add max diffuse/glossy/transmission bounces
* Add separate min/max for transparent depth
* Updated/added some presets that use these options
* Add ray visibility options for objects, to hide them from
camera/diffuse/glossy/transmission/shadow rays
* Is singular ray output for light path node
Details here:
http://wiki.blender.org/index.php/Dev:2.5/Source/Render/Cycles/LightPaths
* add some (disabled) test code for using OpenImageIO in imbuf
* link cycles, openimageio and boost into blender instead of a shared library
* some cmakefile changes to simplify the code and follow conventions better
* this may solve running cycles problems on windows XP, or give a different
and hopefully more useful error message