
Better performance through threading

Making adept use of threads on Android can help you boost your app’s performance. This page discusses several aspects of working with threads: working with the UI, or main, thread; the relationship between the app lifecycle and thread priority; and methods that the platform provides to help manage thread complexity. In each of these areas, this page describes potential pitfalls and strategies for avoiding them.

Main thread

When the user launches your app, Android creates a new Linux process along with an execution thread. This main thread, also known as the UI thread, is responsible for everything that happens onscreen. Understanding how it works can help you design your app to use the main thread for the best possible performance.

The main thread has a very simple design: Its only job is to take and execute blocks of work from a thread-safe work queue until its app is terminated. The framework generates some of these blocks of work from a variety of places. These places include callbacks associated with lifecycle information, user events such as input, or events coming from other apps and processes. In addition, apps can explicitly enqueue blocks of work on their own, without using the framework.
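The take-and-execute loop described above can be sketched with plain java.util.concurrent types. This is a hedged illustration only, not the platform’s actual Looper/MessageQueue implementation; MainLoopSketch and the QUIT sentinel are invented names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MainLoopSketch {
    // Sentinel that tells the loop to stop (stands in for process termination).
    static final Runnable QUIT = () -> {};

    // The main-thread design: take and execute blocks of work until told to stop.
    static int runLoop(BlockingQueue<Runnable> queue) throws InterruptedException {
        int executed = 0;
        while (true) {
            Runnable work = queue.take(); // blocks until work is enqueued
            if (work == QUIT) break;
            work.run();
            executed++;
        }
        return executed;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        List<String> log = new ArrayList<>();
        // Producers enqueue work, just as the framework enqueues lifecycle
        // callbacks and input events, and as apps enqueue their own blocks.
        queue.put(() -> log.add("input event"));
        queue.put(() -> log.add("layout inflation"));
        queue.put(QUIT);
        int n = runLoop(queue);
        System.out.println("executed " + n + " blocks: " + log);
    }
}
```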

Nearly any block of code your app executes is tied to an event callback, such as input, layout inflation, or draw. When something triggers an event, the thread where the event occurred pushes the event onto the main thread’s message queue. The main thread can then service the event.

While an animation or screen update is occurring, the system tries to execute a block of work (which is responsible for drawing the screen) every 16ms or so, in order to render smoothly at 60 frames per second. For the system to reach this goal, the UI/View hierarchy must update on the main thread. However, when the main thread’s message queue contains tasks that are either too numerous or too long for the main thread to complete the update fast enough, the app should move this work to a worker thread. If the main thread cannot finish executing blocks of work within 16ms, the user may observe hitching, lagging, or a lack of UI responsiveness to input. If the main thread blocks for approximately five seconds, the system displays the Application Not Responding (ANR) dialog, allowing the user to close the app directly.

Moving numerous or long tasks from the main thread, so that they don’t interfere with smooth rendering and fast responsiveness to user input, is the biggest reason for you to adopt threading in your app.

Threads and UI object references

By design, Android View objects are not thread-safe. An app is expected to create, use, and destroy UI objects, all on the main thread. If you try to modify or even reference a UI object in a thread other than the main thread, the result can be exceptions, silent failures, crashes, and other undefined misbehavior.

Issues with references fall into two distinct categories: explicit references and implicit references.

Explicit references

Many tasks on non-main threads have the end goal of updating UI objects. However, if one of these threads accesses an object in the view hierarchy, application instability can result: If a worker thread changes the properties of that object at the same time that any other thread is referencing the object, the results are undefined.

For example, consider an app that holds a direct reference to a UI object on a worker thread. The object on the worker thread may contain a reference to a View; but before the work completes, the View is removed from the view hierarchy. When these two actions happen simultaneously, the reference keeps the View object in memory and sets properties on it. However, the user never sees this object, and the app deletes it once the reference to it is gone.

In another example, View objects contain references to the activity that owns them. If that activity is destroyed, but there remains a threaded block of work that references it—directly or indirectly—the garbage collector will not collect the activity until that block of work finishes executing.

This scenario can cause a problem in situations where threaded work may be in flight while some activity lifecycle event, such as a screen rotation, occurs. The system wouldn’t be able to perform garbage collection until the in-flight work completes. As a result, there may be two Activity objects in memory until garbage collection can take place.

With scenarios like these, we suggest that your app not include explicit references to UI objects in threaded work tasks. Avoiding such references helps you avoid these types of memory leaks, while also steering clear of threading contention.

In all cases, your app should only update UI objects on the main thread. This means that you should craft a negotiation policy that allows multiple threads to communicate work back to the main thread, which tasks the topmost activity or fragment with the work of updating the actual UI object.
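One minimal way to sketch such a hand-off policy on a plain JVM is shown below. On Android you would typically post to the main Looper via a Handler, or use coroutines; here a single-threaded executor stands in for the main thread, and MainThreadHandoff is an invented name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class MainThreadHandoff {
    // Runs background work on a worker, hands the result to the "main" thread,
    // and returns the final UI state once both executors have drained.
    static String loadAndDisplay() throws InterruptedException {
        ExecutorService mainThread = Executors.newSingleThreadExecutor(); // stand-in for the UI thread
        ExecutorService worker = Executors.newSingleThreadExecutor();
        AtomicReference<String> uiState = new AtomicReference<>("empty");

        worker.execute(() -> {
            String result = "loaded data"; // long-running, non-UI work
            // Communicate the result back: only the "main" thread touches UI state.
            mainThread.execute(() -> uiState.set(result));
        });

        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
        mainThread.shutdown();
        mainThread.awaitTermination(5, TimeUnit.SECONDS);
        return uiState.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(loadAndDisplay()); // loaded data
    }
}
```

The key property of the design is that the worker never mutates UI state directly; it only enqueues a block of work for the thread that owns that state.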

Implicit references

A common code-design flaw with threaded objects can be seen in the snippet of code below:
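A hedged reconstruction of the kind of snippet described, in plain Java: MyActivity is an invented stand-in for an Activity, and a Thread subclass replaces the deprecated AsyncTask. The point is the same: a non-static inner class carries a hidden reference to its enclosing instance.

```java
public class LeakySketch {
    // Stand-in for an Activity. Declaring the threaded object as a non-static
    // inner class gives every instance an implicit reference to the enclosing
    // "activity", keeping it in memory until the work finishes.
    static class MyActivity {
        class MyAsyncTask extends Thread {
            @Override public void run() {
                // long-running work; MyActivity.this is implicitly reachable here
            }
        }
    }

    public static void main(String[] args) {
        // The compiler adds a synthetic this$… field holding the implicit
        // reference to the enclosing MyActivity instance.
        for (java.lang.reflect.Field f : MyActivity.MyAsyncTask.class.getDeclaredFields()) {
            System.out.println(f.getName());
        }
    }
}
```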

The flaw in this snippet is that the code declares the threading object MyAsyncTask as a non-static inner class of some activity (or an inner class in Kotlin). This declaration creates an implicit reference to the enclosing Activity instance. As a result, the object contains a reference to the activity until the threaded work completes, causing a delay in the destruction of the referenced activity. This delay, in turn, puts more pressure on memory.

A direct solution to this problem would be to define your overloaded class instances either as static classes, or in their own files, thus removing the implicit reference.
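A sketch of that fix under the same invented names: declared static, the nested class compiles with no hidden field referencing the enclosing instance, so it no longer delays the activity’s destruction.

```java
public class StaticFixSketch {
    static class MyActivity {
        // Declared static: no implicit reference to the enclosing MyActivity.
        static class MyAsyncTask extends Thread {
            @Override public void run() { /* long-running work */ }
        }
    }

    public static void main(String[] args) {
        // No synthetic this$… field is generated for a static nested class.
        System.out.println(MyActivity.MyAsyncTask.class.getDeclaredFields().length); // 0
    }
}
```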

Another solution would be to always cancel and clean up background tasks in the appropriate Activity lifecycle callback, such as onDestroy. This approach can be tedious and error-prone, however. As a general rule, you should not put complex, non-UI logic directly in activities. In addition, AsyncTask is now deprecated and is not recommended for use in new code. See Threading on Android for more details on the concurrency primitives that are available to you.

Threads and app activity lifecycles

The app lifecycle can affect how threading works in your application. You may need to decide that a thread should, or should not, persist after an activity is destroyed. You should also be aware of the relationship between thread prioritization and whether an activity is running in the foreground or background.

Persisting threads

Threads persist past the lifetime of the activities that spawn them. Threads continue to execute, uninterrupted, regardless of the creation or destruction of activities, although they will be terminated together with the application process once there are no more active application components. In some cases, this persistence is desirable.

Consider a case in which an activity spawns a set of threaded work blocks, and is then destroyed before a worker thread can execute the blocks. What should the app do with the blocks that are in flight?

If the blocks were going to update a UI that no longer exists, there’s no reason for the work to continue. For example, if the work is to load user information from a database, and then update views, the thread is no longer necessary.

By contrast, the work packets may have some benefit not entirely related to the UI. In this case, you should persist the thread. For example, the packets may be waiting to download an image, cache it to disk, and update the associated View object. Although the object no longer exists, the acts of downloading and caching the image may still be helpful, in case the user returns to the destroyed activity.
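For the first case, where the in-flight work only serves a UI that no longer exists, cancelling the block is enough. A plain-Java sketch using Future cancellation follows; on Android you might instead scope the work to a lifecycle-aware component, and CancelSketch is an invented name:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CancelSketch {
    // Submits UI-bound work, then cancels it as if the activity were destroyed.
    static boolean destroyBeforeDone() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> uiBound = worker.submit(() -> {
            try {
                Thread.sleep(10_000); // simulate a slow load that would update views
            } catch (InterruptedException e) {
                // cancellation interrupts the sleep; nothing left worth doing
            }
        });
        uiBound.cancel(true); // the UI is gone, so stop the work
        worker.shutdownNow();
        return uiBound.isCancelled();
    }

    public static void main(String[] args) {
        System.out.println(destroyBeforeDone()); // true
    }
}
```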

Managing lifecycle responses manually for all threading objects can become extremely complex. If you don’t manage them correctly, your app can suffer from memory contention and performance issues. ViewModel objects are one solution to this problem: a ViewModel is retained across configuration changes, which provides an easy way to persist your view data, and combining ViewModel with LiveData allows you to load data and be notified when it changes without having to worry about the lifecycle. For more information, see the ViewModel guide and the LiveData guide. If you would like more information about application architecture, read the Guide to App Architecture.

Thread priority

As described in Processes and the Application Lifecycle, the priority that your app’s threads receive depends partly on where the app is in the app lifecycle. As you create and manage threads in your application, it’s important to set their priority so that the right threads get the right priorities at the right times. If set too high, your thread may interrupt the UI thread and RenderThread, causing your app to drop frames. If set too low, you can make your async tasks (such as image loading) slower than they need to be.

Every time you create a thread, you should call setThreadPriority(). The system’s thread scheduler gives preference to threads with high priorities, balancing those priorities with the need to eventually get all the work done. Generally, threads in the foreground group get about 95% of the total execution time from the device, while the background group gets roughly 5%.

The system also assigns each thread its own priority value, using the Process class.

By default, the system sets a thread’s priority to the same priority and group memberships as the spawning thread. However, your application can explicitly adjust thread priority by using setThreadPriority() .

The Process class helps reduce complexity in assigning priority values by providing a set of constants that your app can use to set thread priorities. For example, THREAD_PRIORITY_DEFAULT represents the default value for a thread. Your app should set the thread's priority to THREAD_PRIORITY_BACKGROUND for threads that are executing less-urgent work.
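android.os.Process.setThreadPriority() and the THREAD_PRIORITY_* constants are Android-specific, so as a plain-JVM illustration of the same idea, here is the equivalent move with java.lang.Thread priorities. This is an analogy, not the platform API:

```java
public class PrioritySketch {
    public static void main(String[] args) throws InterruptedException {
        Thread background = new Thread(() -> {
            // less-urgent work (the analogue of THREAD_PRIORITY_BACKGROUND)
        });
        // A new thread inherits the spawning thread's priority by default;
        // lower it explicitly before the work starts competing for CPU.
        background.setPriority(Thread.MIN_PRIORITY);
        background.start();
        background.join();
        System.out.println(background.getPriority()); // 1, i.e. Thread.MIN_PRIORITY
    }
}
```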

For more information on managing threads, see the reference documentation about the Thread and Process classes.

Helper classes for threading

For developers using Kotlin as their primary language, we recommend using coroutines . Coroutines provide a number of benefits, including writing async code without callbacks as well as structured concurrency for scoping, cancellation and error handling.

The framework also provides the standard Java classes and primitives to facilitate threading, such as the Thread, Runnable, and Executors classes, as well as additional ones such as HandlerThread. For further information, refer to Threading on Android.

The HandlerThread class

A handler thread is effectively a long-running thread that grabs work from a queue and operates on it.

Consider a common challenge: getting preview frames from your Camera object. When you register for Camera preview frames, you receive them in the onPreviewFrame() callback, which is invoked on the event thread from which Camera.open() was called. If this callback were invoked on the UI thread, the task of dealing with the huge pixel arrays would interfere with rendering and event-processing work.

In this example, when your app delegates the Camera.open() command to a block of work on the handler thread, the associated onPreviewFrame() callback lands on the handler thread, rather than the UI thread. So, if you’re going to be doing long-running work on the pixels, this may be a better solution for you.

When your app creates a thread using HandlerThread, don’t forget to set the thread’s priority based on the type of work it’s doing. Remember, CPUs can only handle a small number of threads in parallel. Setting the priority helps the system know the right ways to schedule this work when all other threads are fighting for attention.

The ThreadPoolExecutor class

There are certain types of work that can be reduced to highly parallel, distributed tasks. One such task, for example, is calculating a filter for each 8x8 block of an 8 megapixel image. With the sheer volume of work packets this creates, HandlerThread isn’t the appropriate class to use.

ThreadPoolExecutor is a helper class that makes this process easier. This class manages the creation of a group of threads, sets their priorities, and manages how work is distributed among those threads. As the workload increases or decreases, the class spins up additional threads or tears them down to adjust.

This class also helps your app spawn an optimum number of threads. When it constructs a ThreadPoolExecutor object, the app sets a minimum and maximum number of threads. As the workload given to the ThreadPoolExecutor increases, the class will take the initialized minimum and maximum thread counts into account, and consider the amount of pending work there is to do. Based on these factors, ThreadPoolExecutor decides on how many threads should be alive at any given time.
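A sketch of such a pool for the image-filter workload above. The core/maximum sizes, queue bound, and saturation policy are illustrative assumptions, not recommendations:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSketch {
    // Processes `tasks` independent work packets on a bounded pool; returns
    // true if every packet completed within the timeout.
    static boolean processAll(int tasks) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                cores,                          // minimum (core) thread count
                cores * 2,                      // maximum thread count under load
                60, TimeUnit.SECONDS,           // idle time before excess threads die
                new ArrayBlockingQueue<>(64),   // bounded queue of pending work
                new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of rejection

        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                // e.g. compute the filter for one 8x8 block of the image
                done.countDown();
            });
        }
        pool.shutdown();
        return done.await(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(100)); // true
    }
}
```

Note that the maximum size only comes into play with a bounded work queue; with an unbounded queue, ThreadPoolExecutor never grows past the core size.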

How many threads should you create?

Although, at the software level, your code can create hundreds of threads, doing so can create performance issues. Your app shares limited CPU resources with background services, the renderer, the audio engine, networking, and more. CPUs really only have the ability to handle a small number of threads in parallel; everything above that runs into priority and scheduling issues. As such, it’s important to only create as many threads as your workload needs.

Practically speaking, a number of variables are responsible for this, but picking a value (like 4, for starters) and testing it with Systrace is as solid a strategy as any other. You can use trial and error to discover the minimum number of threads you can use without running into problems.

Another consideration in deciding how many threads to have is that threads aren’t free: they take up memory. Each thread costs a minimum of 64 KB of memory. This adds up quickly across the many apps installed on a device, especially in situations where the call stacks grow significantly.

Many system processes and third-party libraries often spin up their own threadpools. If your app can reuse an existing threadpool, this reuse may help performance by reducing contention for memory and processing resources.

Content and code samples on this page are subject to the licenses described in the Content License . Java and OpenJDK are trademarks or registered trademarks of Oracle and/or its affiliates.

Last updated 2024-01-03 UTC.

Set Thread Priority In QtConcurrent

The QThread class has a setPriority() method, but QtConcurrent does not. So I wrote my own implementation using a call_once function. The current C++ standard does not have this function; it exists only in C++0x and Boost. After some googling I found a few implementations and selected one. QAtomicInt is not an ideal class for writing thread-safe code, and I had to use a few undocumented features such as QBasicAtomicInt, because it is a POD type and can be initialized statically inside the executable file before any potentially parallel initializations in concurrent threads.

This is an example of how to use these functions to set the thread priority once per thread at run time:

P.S.: This implementation is faster than boost::call_once, but slower than std::call_once.


Boost C++ Libraries

...one of the most highly regarded and expertly designed C++ library projects in the world. — Herb Sutter and Andrei Alexandrescu, C++ Coding Standards


Chapter 38. Thread 4.8.0

Anthony Williams, Vicente J. Botet Escriba

Copyright © 2007-11 Anthony Williams

Copyright © 2011-17 Vicente J. Botet Escriba

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)


Boost.Thread enables the use of multiple threads of execution with shared data in portable C++ code. It provides classes and functions for managing the threads themselves, along with others for synchronizing data between the threads or providing separate copies of data specific to individual threads.

The Boost.Thread library was originally written and designed by William E. Kempf (version 1).

Anthony Williams’s version (version 2) was a major rewrite designed to closely follow the proposals presented to the C++ Standards Committee, in particular N2497, N2320, N2184, N2139, and N2094.

Vicente J. Botet Escriba started (version 3) the adaptation to comply with the accepted C++11 thread library (making use of Boost.Chrono and Boost.Move) and Howard Hinnant’s shared locking proposal, except for the upward conversions. Some minor non-standard features have also been added, such as thread attributes, reverse_lock, and shared_lock_guard.

In order to use the classes and functions described here, you can either include the specific headers specified by the descriptions of each class or function, or include the master thread library header, boost/thread.hpp, which includes all the other headers in turn.




SetThreadPriority function (processthreadsapi.h)

Sets the priority value for the specified thread. This value, together with the priority class of the thread's process, determines the thread's base priority level.

[in] hThread

A handle to the thread whose priority value is to be set.

The handle must have the THREAD_SET_INFORMATION or THREAD_SET_LIMITED_INFORMATION access right. For more information, see Thread Security and Access Rights . Windows Server 2003:   The handle must have the THREAD_SET_INFORMATION access right.

[in] nPriority

The priority value for the thread. This parameter can be one of the following values.

If the thread has the REALTIME_PRIORITY_CLASS base class, this parameter can also be -7, -6, -5, -4, -3, 3, 4, 5, or 6. For more information, see Scheduling Priorities .

Return value

If the function succeeds, the return value is nonzero.

If the function fails, the return value is zero. To get extended error information, call GetLastError .

Windows Phone 8.1:   Windows Phone Store apps may call this function but it has no effect. The function will return a nonzero value indicating success.

Every thread has a base priority level determined by the thread's priority value and the priority class of its process. The system uses the base priority level of all executable threads to determine which thread gets the next slice of CPU time. Threads are scheduled in a round-robin fashion at each priority level, and only when there are no executable threads at a higher level does scheduling of threads at a lower level take place.

The SetThreadPriority function enables setting the base priority level of a thread relative to the priority class of its process. For example, specifying THREAD_PRIORITY_HIGHEST in a call to SetThreadPriority for a thread of an IDLE_PRIORITY_CLASS process sets the thread's base priority level to 6. For a table that shows the base priority levels for each combination of priority class and thread priority value, see Scheduling Priorities .

For IDLE_PRIORITY_CLASS , BELOW_NORMAL_PRIORITY_CLASS , NORMAL_PRIORITY_CLASS , ABOVE_NORMAL_PRIORITY_CLASS , and HIGH_PRIORITY_CLASS processes, the system dynamically boosts a thread's base priority level when events occur that are important to the thread. REALTIME_PRIORITY_CLASS processes do not receive dynamic boosts.

All threads initially start at THREAD_PRIORITY_NORMAL . Use the GetPriorityClass and SetPriorityClass functions to get and set the priority class of a process. Use the GetThreadPriority function to get the priority value of a thread.

Use the priority class of a process to differentiate between applications that are time critical and those that have normal or below normal scheduling requirements. Use thread priority values to differentiate the relative priorities of the tasks of a process. For example, a thread that handles input for a window could have a higher priority level than a thread that performs intensive calculations for the CPU.

When manipulating priorities, be very careful to ensure that a high-priority thread does not consume all of the available CPU time. A thread with a base priority level above 11 interferes with the normal operation of the operating system. Using REALTIME_PRIORITY_CLASS may cause disk caches to not flush, cause the mouse to stop responding, and so on.

The THREAD_PRIORITY_* values affect the CPU scheduling priority of the thread. For threads that perform background work such as file I/O, network I/O, or data processing, it is not sufficient to adjust the CPU scheduling priority; even an idle CPU priority thread can easily interfere with system responsiveness when it uses the disk and memory. Threads that perform background work should use the THREAD_MODE_BACKGROUND_BEGIN and THREAD_MODE_BACKGROUND_END values to adjust their resource scheduling priorities; threads that interact with the user should not use THREAD_MODE_BACKGROUND_BEGIN .

When a thread is in background processing mode, it should minimize sharing resources such as critical sections, heaps, and handles with other threads in the process, otherwise priority inversions can occur. If there are threads executing at high priority, a thread in background processing mode may not be scheduled promptly, but it will never be starved.

Windows Server 2008 and Windows Vista:   While the system is starting, the SetThreadPriority function returns a success return value but does not change thread priority for applications that are started from the system Startup folder or listed in the HKEY_LOCAL_MACHINE \ SOFTWARE \ Microsoft \ Windows \ CurrentVersion \ Run registry key. These applications run at reduced priority for a short time (approximately 60 seconds) to make the system more responsive to user actions during startup.

Windows 8.1 and Windows Server 2012 R2 : This function is supported for Windows Store apps.

Windows Phone 8.1: Windows Phone Store apps may call this function but it has no effect.

The following example demonstrates the use of thread background mode.


GetPriorityClass

GetThreadPriority

Process and Thread Functions

Scheduling Priorities

SetPriorityClass


Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science Laboratory of the Université du Québec (TÉLUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate.

Reusing a thread in C++ for better performance

In a previous post, I measured the time necessary to start a thread, execute a small job and return.

The answer is thousands of nanoseconds. Importantly, that is the time as measured by the main thread. That is, sending the query, and getting back the result, takes thousands of nanoseconds and thousands of cycles. The work in my case is just incrementing a counter: any task more involved will increase the overall cost. The C++ standard API also provides an async function to call one function and return: it is practically equivalent to starting a new thread and joining it, as I just did.

Creating a new thread each time is fine if you have a large task that needs to run for milliseconds. However, if you have tiny tasks, it won’t do.

What else could you do? Instead of creating a thread each time, you could create a single thread. This thread loops and periodically sleeps, waiting to be notified that there is work to be done. I am using the C++11 standard approach.

It should be faster and overall more efficient. You should expect gains ranging from 2x to 5x. If you use a C++ library with thread pools and/or workers, it is likely to adopt such an approach, albeit with more functionality and generality. However, the operating system is in charge of waking up the thread and may not do so immediately so it is not likely to be the fastest approach.

What else could you do? You could simply avoid as much as possible system dependencies and just loop on an atomic variable. The downside of the tight loop (spin lock) approach is that your thread might fully use the processor while it waits. However, you should expect it to get to work much quicker.

The results will depend crucially on your processor and on your operating system. Let me report the rough numbers I get with an Intel-based Linux box and GNU GCC 8.

My source code is available.


15 thoughts on “Reusing a thread in C++ for better performance”

Hi Daniel, (Been following this blog for years, yet this is my first comment.)

Every now and then, and for a very long time, this subject has intrigued me. I have similar results to yours, but that is not the question. The question is why CPU technology and industry are, and were, mostly driven by the need for more speed and more cores, yet seem to ignore this exact point: switching between threads. GPUs have hundreds of cores and CPUs already have tens, yet a specific instruction, similar to HLT (halt), that could be woken by another instruction, or a dedicated instruction set to time very short sleeps to save power, might be very useful: it would boost speed in some cases and save power in others. Why does switching between threads in an efficient way seem to be unimportant, or not a priority?

To me it looks as if this has been decided to be a software issue, to solve or to live with, and yet CPU technologies do evolve to speed up specific software problems. Maybe it is hard, or wrong, to do in hardware. On the other hand, seeing what was considered hard or impossible 15 or 20 years ago (or even more) now running on a device you can hold in one hand means one thing: hard and impossible are relative matters, not absolute. Was it wrong to begin with, or just wrong relative to our time, and might it be seen differently in a few years?

Daniel, I would love to read your opinion and thoughts about that, maybe as a blog post.

switching between threads in an efficient way seems to be not important or not a priority

It is very application dependent. In HPC (scientific computing), programs typically pin one thread to each core so they don’t disturb each other; meanwhile, operating systems are optimized to minimize the noise introduced by other applications taking CPU time.

yet a specific instruction, similar to HLT (halt), that could be woken by another instruction, or a dedicated instruction set to time very short sleeps to save power

In Intel processors you already have something like that. The monitor and mwait instructions track a memory location and put the core in a low-power state. The problem is that this is processor specific and not portable to other platforms.

There’s definitely a lot going on in CPU technology to reduce the cost of concurrency and context switching:

Hyperthreads are definitely the most well-known: the CPU exposes a single core (with a single set of execution ports) as a pair of “logical” cores to the OS, which can schedule 2 different tasks on it; the CPU executes both tasks interleaved, and whenever one task blocks (for instance, due to a cache miss or atomic memory operation, or if it’s spinning over a lock and signals it with mm_pause ) the other task can run. In a more-traditional system (no HT, software scheduler) the cycles that the task spent blocking would simply be “lost” (no useful work happening). New concurrency-related hardware features (lock elision, hardware transactional memory, …) enable faster implementations of locks/semaphores, work queues, etc… Those hardware features are not really consumed directly by most software engineers, as they require very specialised knowledge to use effectively, but libraries of high-performance concurrency primitives tend to leverage them. On ARMv8 CPUs, the NVIC (Nested Vectored Interrupt Controller) supports fairly complex/flexible task configurations. For instance, the RTIC (Real-Time, Interrupt-driven Concurrency) framework reduces a program’s scheduling policy (i.e. the relative priorities of various tasks) to an NVIC configuration at compile time, meaning that all context switching and task management is managed by the hardware, rather than having a software scheduler. Cherry on top, RTIC extracts information about which resources are used by each task, to both avoid unnecessary locks (if a task uses a given shared resource, but no higher-priority task does, it can safely avoid taking-and-releasing the lock) and avoid unnecessarily blocking (when a task A is in a critical section, only tasks which use some of the same resources are blocked; higher-priority tasks that do not interact with A can still preempt it as needed). I’m not aware of any general-purposed OS doing this, though. 🙁

Thank you Nicolas,

What you described about ARMv8 is in fact very interesting (I didn't know that). Also, reading that Apple will release Macs with ARM processors in 2021 indicates that the processor-technology race is not slowing down; on the contrary, it is picking up pace.

The cherry you mentioned, IMHO, makes sense as a way to simplify a multi-reader single-writer implementation (maybe even multi-writer with atomic behaviour!), providing a higher level of efficiency with lower power consumption.

Thank you again for replying with this information.

Hope you will do optimization research on JavaScript 😭 plz

Why implement your own 1-thread thread pool? Just use an existing library (DuckDuckGo search).

Why implement your own 1-thread thread pool

To run benchmarks so that we can understand what the trade-offs are.

In our case, since the operating system closes a thread down in its own time, we quickly ran out of threads using the first approach. Re-using the thread was the only workable solution.

It is intriguing. Did you join your threads and still get the problem? I am hoping that once the call to join succeeds, the thread is gone. Calling detach would be something else… but I hope that "join" actually cleans the thread up…

The spinlock approach is something that should be avoided at all costs. Especially on single-core machines, it will effectively kill the performance of the whole system. I would never do that!

I like to use a ThreadPool for such circumstances.

This is a nice implementation.

https://github.com/progschj/ThreadPool

PS – subscribing without comment is broken.

PS – subscribing without comment is broken.

I am not sure what this means. Can you elaborate?

Some quick observations you might not be aware of:

  • When spinning on a lock, it's usually a good idea to emit an instruction signalling that to the CPU (_mm_pause on x86/amd64, yield on Arm): it enables optimisations such as switching to another hyperthread on the same core while waiting for the lock, or going low-power (modern CPUs are often bottlenecked by heat management, so going low-power can let other, useful work happen at a higher clock frequency).
  • Good mutex and work-queue implementations already spin for a short while (to optimise away the context switch when the duty cycle is high) before parking the thread (typically using a futex, so the OS scheduler knows exactly when to wake up a thread as work becomes available). I wasn't quite capable of figuring out what the GNU libstdc++ does from reading the relevant code, but it seems not to do spin-then-futex for some reason.
  • In more general work-queue use cases, using a spinlock alone is susceptible to priority inversion: if some thread gets interrupted in the critical section, the OS might schedule the other threads (which are spinning uselessly) instead of the one holding the lock.

I couldn’t get it to work. The validation logic required a value in the message.


Limiting CPU Threads for Better Game Performance


Many PC games are designed around an eight-core console, with the assumption that their software threading system "just works" on all PCs, especially regarding the number of threads in the worker thread pool. This was a reasonable assumption not long ago, when most PCs had core counts similar to consoles: the CPUs were simply faster, and performance scaled accordingly.

In recent years though, the CPU landscape has changed and there is now a complex matrix of performance variables to navigate:

  • Higher core counts
  • The introduction of heterogeneous P/E cores from Intel
  • Asymmetrical caches from AMD
  • More complex scheduling algorithms 
  • Power management techniques from OS vendors such as Microsoft 

This complexity means that the previous thread count determination algorithm (and its derivatives) is no longer sufficient:

This traditional thread count determination algorithm was based on logical core counts and reserved two cores for critical threads.

Many CPU-bound games actually degrade in performance when the core count increases beyond a certain point, as the benefits of the extra parallelism are outweighed by the threading overhead.

On high-end desktop systems with more than eight physical cores, for example, some titles can see performance gains of up to 15% by reducing the thread count of their worker pools below the core count of the CPU.

The reasons for the performance drop are complex and varied. Where one title may see a performance drop of 10%, another may see a performance gain of 10% on the same system, thus highlighting the difficulty in providing a one-size-fits-all solution across all titles and all systems. 

Instead, a game’s thread count should be tailored to fit the workload. Light CPU workloads should use fewer threads.

Performance solutions

If the performance of your game is not scaling as expected on higher-core-count machines, or even degrades on them, there are several common reasons:

  • Hardware performance: Higher-core-count CPUs sometimes have lower clock speeds. Reducing the number of threads may enable the active cores to boost their frequency.
  • Logical-core sharing: Executing threads on both logical cores of a single physical core (hyperthreading, or simultaneous multi-threading) can add latency, as both threads must share the physical resources (caches, instruction pipelines, and so on). If a critical thread shares a physical core, its performance may decrease. Targeting physical core counts instead of logical core counts can help reduce this on larger-core-count systems.
  • Software resource contention: Locks and atomics can have much higher latency when accessed by many threads concurrently, adding to memory pressure. False sharing can exacerbate this.
  • P/E core scheduling: On systems with P/E cores, work is scheduled first to physical P cores, then to E cores, and then to hyperthreaded logical P cores. Using fewer threads than the total number of physical cores lets background threads, such as OS threads, execute on the E cores without disrupting critical threads running on P cores by executing on their sibling logical cores.
  • Core parking: Core parking has been seen to be sensitive to high thread counts: short, bursty threads can fail to trigger the heuristic that unparks cores. Having fewer, longer-running threads helps the core-parking algorithms.

There are several solutions to this scaling issue, depending on the root cause of the problem:

  • Dynamic load balancing of thread counts
  • Lockless threading models that scale with core count
  • Using QoS and thread-priority APIs to help steer threads to specific cores
  • Other solutions…

The simplest method may be to find how many threads your game actually needs and then let the OS schedule the threads effectively. 

Figure 1 shows that reducing the number of threads your game uses may reduce some of the overhead, often from critical threads, which may directly improve the performance of your game.

Figure 1. Halving the thread count decreases overall execution time due to reduced per-thread overheads.

Test your game on different systems at different settings and with different thread counts. You will likely find a thread count sweet spot or a small number of sweet spots that work for your game. 

Ensure that you test hyperthreading to see whether you should align to physical or logical cores when enumerating your threads on the different systems. Hyperthreading often helps on low-core-count systems that don’t have enough physical cores to efficiently execute your workload but can hinder performance on larger core-count systems.

Your testing may produce a modified algorithm where you tailor max_thread_count to suit your workload. The following thread-count determination algorithm is modified to limit the thread count to a predefined maximum:

If max_thread_count is added to your game .ini file, it is easy for your IHV partners, QA teams, and gamers alike to find the right number of threads for their own PC setup to ensure that maximum performance is achieved.

CPU performance matters and worker thread count is an integral part of the performance equation. Measuring your game’s CPU performance on a matrix of CPUs and adjusting the thread count to fit the workload are simple optimizations that can produce large double-digit performance gains. 

Providing an override for thread count in an .ini file ensures that gamers can find the right value to maximize the performance on their PC.
