Forums: https://software.intel.com/en-us/view/forum-page-default/36940

Intel(R) TBB 2017 released!
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/684925
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><ul>
<li><a href="https://software.intel.com/en-us/articles/whats-new-intel-threading-building-blocks-2017">What's new</a> in Intel® TBB 2017 and <a href="https://software.intel.com/en-us/articles/intel-threading-building-blocks-release-notes/">Release notes</a>.</li>
<li>Intel® TBB commercial standalone: <a href="https://software.intel.com/en-us/intel-tbb">https://software.intel.com/en-us/intel-tbb</a> </li>
<li>Intel® TBB in Community Licensing: <a href="https://software.intel.com/sites/campaigns/nest/">https://software.intel.com/sites/campaigns/nest/</a> </li>
<li>Intel® TBB open source version: <a href="https://www.threadingbuildingblocks.org/" rel="nofollow">https://www.threadingbuildingblocks.org/</a></li>
<li>Updated and improved Intel® TBB 2017 <a href="https://software.intel.com/en-us/tbb-tutorial">Tutorial</a> and <a href="https://software.intel.com/en-us/tbb-documentation">General documentation</a></li>
</ul>
</div></div></div>
Thu, 08 Sep 2016 12:51:14 +0000 | Alexey M. (Intel) | 684925 at https://software.intel.com

Patterns of when and where to use this_task_arena::isolate
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/703652
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Hi,</p>
<p>my question is about <strong>when and where </strong>to use the TBB 2017 preview feature <em>this_task_arena::isolate</em>.</p>
<p>We hit the problem described in <a href="https://software.intel.com/en-us/node/684814">https://software.intel.com/en-us/node/684814</a>:</p>
<ul>
<li>we have a parallel_for in high-level code</li>
<li>one worker thread processes a task from this outer loop</li>
<li>the thread triggers lazy initialization of a data structure (an internal optimization). The thread
<ul>
<li>takes a lock,</li>
<li>then creates a data structure using an inner parallel_for, while still working on the task from the outer loop.</li>
<li>The worker thread processes a task from this inner loop.</li>
<li>The thread then becomes available again before the inner loop is finished, while still holding the lock.
<ul>
<li>It processes a task from the outer loop.</li>
<li>This again triggers the lazy initialization of the same data structure, for which the thread already holds the lock.</li>
<li>Deadlock</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Now, based on <a href="https://software.intel.com/en-us/node/684814">https://software.intel.com/en-us/node/684814</a> and other threads (e.g. <a href="https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/401006">https://software.intel.com/en-us/forums/intel-threading-building-blocks/...</a>, <a href="https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/611256">https://software.intel.com/en-us/forums/intel-threading-building-blocks/...</a>, <a href="https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/285550">https://software.intel.com/en-us/forums/intel-threading-building-blocks/...</a>) the best options seem to be:</p>
<ul>
<li><strong>Do not nest parallel_for</strong>. In this particular case we moved the lazy initialization before the outer loop, so the problem is solved. But it surprised us, because the two loops occur at very different levels of the application: high-level logic, low-level library code. And we are not sure if we have other cases where such a potentially fatal nesting could happen.</li>
<li><strong>Use a <em>task_arena </em>for starting the inner loop</strong>. This supposedly has some non-trivial overhead.</li>
<li><strong>Or use the preview feature <em>this_task_arena::isolate</em></strong>. Supposedly this has very little overhead. And I think that in the case of our inner loop it is sufficient that only the thread that starts the inner loop is prevented from taking tasks from the outer loop.</li>
</ul>
<p>Now my question is:</p>
<ul>
<li><strong>Shouldn't parallel_for loops in libraries always be within a <em>task_arena</em> or <em>this_task_arena::isolate</em>?</strong> At least for code like ours, where you don't know whether your code might be called from within another parallel_for.</li>
<li>If so, where should the <em>task_arena</em> or <em>this_task_arena::isolate</em> scope start? Should it be <strong>close to the critical code, containing just the parallel_for call?</strong> Or, in our case with the lazy initialization guarded by a lock, should it be where it <strong>conceptually makes sense, right after the lock is taken?</strong></li>
</ul>
<p>Thanks,</p>
<p>Alexander</p>
</div></div></div>
Tue, 22 Nov 2016 15:45:43 +0000 | Alexander F. | 703652 at https://software.intel.com

When does TBB scalable allocator clean its buffers
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/703487
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>If I am NOT setting a soft heap limit with scalable_allocation_mode (TBBMALLOC_SET_SOFT_HEAP_LIMIT, &lt;size&gt;), what is the default limit, if there is one? And when does TBB actually start cleaning up its internal buffers? I am using TBB 4.4.5.</p>
</div></div></div><section class="field field-name-field-zone field-type-taxonomy-term-reference field-label-above clearfix">
<h2 class="field-label">Zone:&nbsp;</h2>
<ul class="field-items">
<li class="field-item even">
<a href="/en-us/taxonomy/term/20798" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Server</a> </li>
<li class="field-item odd">
<a href="/en-us/taxonomy/term/20800" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Windows*</a> </li>
</ul>
</section>
Fri, 18 Nov 2016 20:51:44 +0000 | Ritwik D. | 703487 at https://software.intel.com

How to consume an overwrite node?
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/703212
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>I have a task graph whose body has a maximum concurrency of 1, similar to the pipeline described in <a href="https://software.intel.com/en-us/blogs/2011/09/14/how-to-make-a-pipeline-with-an-intel-threading-building-blocks-flow-graph">How to make a pipeline with an Intel® Threading Building Blocks flow graph</a>.</p>
<p>As in the pipeline example, I want to automatically start another evaluation of the body of the pipeline as soon as the body ends, if an input is already available. I want the input to my graph to be an <a href="https://software.intel.com/en-us/node/506228">overwrite_node</a>, since that allows me to always keep the "freshest" input possible for the task graph to consume when it's ready. However, when the task graph reads from the <a href="https://software.intel.com/en-us/node/506228">overwrite_node</a>, the contents of the <a href="https://software.intel.com/en-us/node/506228">overwrite_node</a> are not invalidated. That means the task graph's body might run with the same input multiple times, which is undesirable for me. I'd like the <a href="https://software.intel.com/en-us/node/506228">overwrite_node</a>'s contents to be invalidated when they are passed to a successor, so that each input causes at most one evaluation of the graph. Is this possible?</p>
<p>Some background detail: I'm writing D3D12 commands and submitting them within task nodes in order to render a frame of animation. With the current design, I can't be submitting more than one frame of animation concurrently, since that would cause the order of submission of rendering commands from multiple frames to become quasi-randomly interlaced (= likely nonsense). Since the inputs to frames of rendering can be produced faster than the speed at which frames are rendered, I keep only the "freshest" input, and drop stale frames. I just don't want to render the same frame multiple times in a row, since the results are already displayed on the screen meaning that recomputing them would be wasteful.</p>
<p>My current workaround is to manually implement the buffering and limiting with custom multi-threaded code. It works, but it might be overly complicated, so I hope that figuring out this problem will simplify things.</p>
<p>Nicolas</p>
</div></div></div><section class="field field-name-field-thread-topic field-type-list-text field-label-above"><h2 class="field-label">Thread Topic:&nbsp;</h2><div class="field-items"><div class="field-item even">Help Me</div></div></section>
Tue, 15 Nov 2016 01:02:35 +0000 | Nicolas G. | 703212 at https://software.intel.com

Invoking parallel_for while holding a lock
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/703114
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Hello,</p>
<p>The following code usually does not finish on my system. Am I doing something I shouldn't (i.e., invoking parallel_for while holding a lock)? I think that if parallel_for is invoked while all the other workers are blocked, the parallel_for routine should be executed single-threaded, but it seems that is not what's happening...</p>
<pre class="brush:cpp;">#include &lt;iostream&gt;
#include &lt;tbb/tbb.h&gt;
using namespace std;
int main( int argc, char* ap_args[] ) {
cout &lt;&lt; "test start." &lt;&lt; endl;
tbb::task_scheduler_init init( 12 );
tbb::mutex* p_mutex;
p_mutex = new tbb::mutex();
for( int step = 0 ; step &lt; 1000 ; step++ ) {
cout &lt;&lt; "step " &lt;&lt; step &lt;&lt; " start." &lt;&lt; endl;
tbb::parallel_for( tbb::blocked_range&lt;int&gt; ( 0, 10, 1 ), [&amp;]( const tbb::blocked_range&lt;int&gt;&amp; r ) {
for( int i = r.begin() ; i &lt; r.end() ; i++ ) {
cout &lt;&lt; "i=" &lt;&lt; i &lt;&lt; " before lock." &lt;&lt; endl;
p_mutex-&gt;lock();
cout &lt;&lt; "i=" &lt;&lt; i &lt;&lt; " after lock." &lt;&lt; endl;
tbb::parallel_for( tbb::blocked_range&lt;int&gt; ( 0, 100, 1 ), [&amp;]( const tbb::blocked_range&lt;int&gt;&amp; r2 ) {
int localSum = 0;
for( int j = r2.begin() ; j &lt; r2.end() ; j++ ) {
localSum += j;
}
cout &lt;&lt; "localSum=" &lt;&lt; localSum &lt;&lt; endl;
} );
cout &lt;&lt; "i=" &lt;&lt; i &lt;&lt; " before unlock." &lt;&lt; endl;
p_mutex-&gt;unlock();
cout &lt;&lt; "i=" &lt;&lt; i &lt;&lt; " after unlock." &lt;&lt; endl;
}
} );
cout &lt;&lt; "step " &lt;&lt; step &lt;&lt; " end." &lt;&lt; endl;
}
delete p_mutex;
cout &lt;&lt; "test end." &lt;&lt; endl;
return 0;
}</pre></div></div></div>Sat, 12 Nov 2016 03:32:17 +0000Seunghwa Kang703114 at https://software.intel.comtbb::concurrent_lru_cache designhttps://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/702900
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>I'm using the concurrent_lru_cache and generally like it, but it appears that there is no possibility for the callback function to fail: operator[] allocates a new bucket and calls the function that produces the value when is_new_value_needed() returns true.</p>
<p>So far, so good, but what if the function fails? It cannot throw an exception as any other thread waiting on that cache slot will now spin forever.</p>
<p>The only option when the function fails is to return an invalid value (by convention) and have the users of the cache check for that value. In my case, I store pointers in the cache, so storing a nullptr when the function fails makes sense.</p>
<p>Still, when this occurs, the cache slot is now polluted with that invalid value, and there is no way for the caller to retry, as it will get the same value over and over (until more usage on the cache eventually causes that particular slot to be discarded, but that's not the point).</p>
<p>I think one simple way to work around this would be to allow forcibly discarding a cache entry: something like .discard(k) that would free up the slot and allow retries.</p>
<p>An alternate design could allow the callback to throw exceptions that would bubble up to the thread(s) waiting on that cache slot (and leave the slot empty), or yet another design would use a container-like interface like .push_front(k, val) and bool .lookup(k, &amp;val) and let the caller manage the item's creations entirely.</p>
<p>Additionally, a Boolean .hit() method on the handle_object, returning true on cache hits, would be useful, for example to produce hit ratio stats helping to understand the cache efficiency and pick an optimal cache size - this one is easy and I managed to implement it.</p>
<p>Thoughts? I see the LRU cache has been in preview for a long time, what are the plans for its evolution?</p>
<p>-- Axel</p>
</div></div></div><section class="field field-name-field-thread-topic field-type-list-text field-label-above"><h2 class="field-label">Thread Topic:&nbsp;</h2><div class="field-items"><div class="field-item even">Question</div></div></section>
Wed, 09 Nov 2016 21:37:42 +0000 | Axel R. | 702900 at https://software.intel.com

Unicode still compile error
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/701984
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>I compiled the latest libtbb (also tbbmalloc and tbbmalloc_proxy) with VS2013 on Win10 x64 in Unicode mode. It failed with several errors, such as "const char* type is not compatible with const unicode_char_t*" at line 632 of proxy.cpp. I just wonder why the macro _T or _TEXT was not used to implement Unicode support; alternatively, you could just use char* and the 'A'-suffixed functions to avoid these problems. By the way, a semicolon is missing on line 601 of proxy.cpp.</p>
</div></div></div><section class="field field-name-field-zone field-type-taxonomy-term-reference field-label-above clearfix">
<h2 class="field-label">Zone:&nbsp;</h2>
<ul class="field-items">
<li class="field-item even">
<a href="/en-us/taxonomy/term/20800" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Windows*</a> </li>
</ul>
</section>
<section class="field field-name-field-thread-topic field-type-list-text field-label-above"><h2 class="field-label">Thread Topic:&nbsp;</h2><div class="field-items"><div class="field-item even">Bug Report</div></div></section>
Thu, 03 Nov 2016 02:21:31 +0000 | Horson L. | 701984 at https://software.intel.com

s390x 64-bit support in tbb
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/701983
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>Would you, please, add the following to build/linux.inc in tbb?</p>
<pre class="brush:bash;">ifeq ($(arch),s390x)
def_prefix = lin64
endif</pre><p>This addition would let us build for that 64-bit platform in Fedora without patching tbb. Thank you.</p>
</div></div></div>
Thu, 03 Nov 2016 01:21:25 +0000 | Jerry J. | 701983 at https://software.intel.com

Using Vodafone SIM (other than AIRTEL) with Telit module over gsm/gprs communication on windriver Linux MI 3.1
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/701270
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"></div></div></div><div class="field field-name-field-attachments field-type-file field-label-hidden"><div class="field-items"><div class="field-item even"><table class="sticky-enabled">
<thead><tr><th>Attachment</th><th>Size</th> </tr></thead>
<tbody>
<tr class="odd"><td><span class="file"><a href="https://software.intel.com/sites/default/files/managed/d3/0b/sim_card_operator_default.doc" class="button-cta" type="application/msword; length=4364">Download</a><img class="file-icon" typeof="foaf:Image" src="https://software.intel.com/sites/all/themes/isn3/css/images/attachment_icon.png" alt="application/msword" title="application/msword" /> <a href="https://software.intel.com/sites/default/files/managed/d3/0b/sim_card_operator_default.doc" type="application/msword; length=4364">sim_card_operator_default.doc</a></span></td><td>4.26 KB</td> </tr>
<tr class="even"><td><span class="file"><a href="https://software.intel.com/sites/default/files/managed/4a/4a/network.doc" class="button-cta" type="application/msword; length=813">Download</a><img class="file-icon" typeof="foaf:Image" src="https://software.intel.com/sites/all/themes/isn3/css/images/attachment_icon.png" alt="application/msword" title="application/msword" /> <a href="https://software.intel.com/sites/default/files/managed/4a/4a/network.doc" type="application/msword; length=813">network.doc</a></span></td><td>813 bytes</td> </tr>
</tbody>
</table>
</div></div></div><section class="field field-name-field-zone field-type-taxonomy-term-reference field-label-above clearfix">
<h2 class="field-label">Zone:&nbsp;</h2>
<ul class="field-items">
<li class="field-item even">
<a href="/en-us/taxonomy/term/42773" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Code for Good</a> </li>
<li class="field-item odd">
<a href="/en-us/taxonomy/term/68597" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Intel® RealSense™ Technology</a> </li>
<li class="field-item even">
<a href="/en-us/taxonomy/term/45744" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Internet of Things</a> </li>
<li class="field-item odd">
<a href="/en-us/taxonomy/term/82948" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Artificial Intelligence</a> </li>
<li class="field-item even">
<a href="/en-us/taxonomy/term/82622" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Modern Code</a> </li>
<li class="field-item odd">
<a href="/en-us/taxonomy/term/79579" typeof="skos:Concept" property="rdfs:label skos:prefLabel">Networking</a> </li>
</ul>
</section>
<section class="field field-name-field-thread-topic field-type-list-text field-label-above"><h2 class="field-label">Thread Topic:&nbsp;</h2><div class="field-items"><div class="field-item even">How-To</div></div></section>
Thu, 27 Oct 2016 06:58:17 +0000 | Swapnil C. | 701270 at https://software.intel.com

question about memory leak by using pipeline
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/700968
<div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even" property="content:encoded"><p>When I initialize like this: tbb::task_scheduler_init parallel, the program has a memory leak, but when I set it as task_scheduler_init init(1), there is no memory leak.</p>
<p>I want to know the reason.</p>
<p>ps: the Visual Leak Detector (VLD) can't detect the memory leak, but when I run the program in a loop and monitor the memory usage delta at the same time, the memory does indeed leak.</p>
</div></div></div>
Wed, 26 Oct 2016 03:47:34 +0000 | Yuan Y. | 700968 at https://software.intel.com