Optimization using batching

Introduction

Game engines have to send a set of instructions to the GPU to tell the GPU what to draw and where. These instructions are sent through common interfaces called graphics APIs. Examples of graphics APIs are OpenGL, OpenGL ES, and Vulkan.

Different APIs incur different costs when drawing objects. OpenGL handles a lot of work for the user in the GPU driver at the cost of more expensive draw calls. As a result, applications can often be sped up by reducing the number of draw calls.

Draw calls

In 2D, we need to tell the GPU to render a series of primitives (rectangles, lines, polygons, etc.). The most obvious technique is to tell the GPU to render one primitive at a time, giving it some information such as the texture used, the material, the position, size, etc., then saying "Draw!" (this is called a draw call).

While this is conceptually simple from the engine side, GPUs operate very slowly when used in this manner. GPUs work much more efficiently if you tell them to draw a number of similar primitives all in one draw call, which we will call a "batch".

It turns out that they don't just work a bit faster when used in this manner; they work a lot faster.

As Godot is designed to be a general-purpose engine, the primitives coming into the Godot renderer can be in any order, sometimes similar, and sometimes dissimilar. To match Godot's general-purpose nature with the batching preferences of GPUs, Godot features an intermediate layer which can automatically group together primitives wherever possible and send these batches on to the GPU. This can increase rendering performance while requiring few (if any) changes to your Godot project.

How it works

Instructions come into the renderer from your game in the form of a series of items, each of which can contain one or more commands. The items correspond to Nodes in the scene tree, and the commands correspond to primitives such as rectangles or polygons. Some items, such as TileMaps and text, can contain a large number of commands (tiles and glyphs respectively). Others, such as sprites, may only contain a single command (a rectangle).

The batcher uses two main techniques to group together primitives:

  • Consecutive items can be joined together.

  • Consecutive commands within an item can be joined to form a batch.

Breaking batching

Batching can only take place if the items or commands are similar enough to be rendered in one draw call. Certain changes (or techniques), by necessity, prevent the formation of a contiguous batch; this is referred to as "breaking batching".

Batching will be broken by (amongst other things):

  • Change of texture.

  • Change of material.

  • Change of primitive type (say, going from rectangles to lines).

Note

For example, if you draw a series of sprites each with a different texture, there is no way they can be batched.
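To make the idea concrete, here is a minimal Python sketch (not Godot's C++ batcher) that walks a list of draw commands in order and starts a new batch whenever the texture, material, or primitive type changes. The `Command` class and `build_batches()` function are hypothetical; Godot's real batcher checks more state than these three fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Command:
    """Hypothetical draw command; only the state that affects batching is modeled."""
    texture: str
    material: str
    primitive: str  # e.g. "rect", "line", "polygon"


def build_batches(commands: List[Command]) -> List[List[Command]]:
    """Join consecutive commands that share texture, material and primitive type.

    Any change in that state "breaks" the current batch and starts a new one,
    so len(result) is the number of draw calls this simple model would issue.
    """
    batches: List[List[Command]] = []
    for cmd in commands:
        if batches:
            last = batches[-1][-1]
            same_state = (
                last.texture == cmd.texture
                and last.material == cmd.material
                and last.primitive == cmd.primitive
            )
            if same_state:
                batches[-1].append(cmd)
                continue
        batches.append([cmd])
    return batches


# Ten sprites sharing one texture fit in a single batch...
same = [Command("atlas.png", "default", "rect")] * 10
# ...while ten sprites with ten different textures need ten draw calls.
different = [Command(f"tex{i}.png", "default", "rect") for i in range(10)]
print(len(build_batches(same)), len(build_batches(different)))  # 1 10
```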

Determining the rendering order

The question arises: if only similar items can be drawn together in a batch, why don't we look through all the items in a scene, group together all the similar items, and draw them together?

In 3D, this is often exactly how engines work. However, in Godot's 2D renderer, items are drawn in "painter's order", from back to front. This ensures that items at the front are drawn on top of earlier items when they overlap.

This also means that if we try to draw objects on a per-texture basis, this painter's order may break and objects will be drawn in the wrong order.

In Godot, this back-to-front order is determined by:

  • The order of objects in the scene tree.

  • The Z index of objects.

  • The canvas layer.

  • YSort nodes.
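The ordering rules above can be approximated as a sort key. In this simplified Python sketch (an assumption, not Godot's actual sorting code), items are ordered by canvas layer, then Z index, then their position in a depth-first walk of the scene tree; YSort is left out for brevity. The `CanvasItem` fields here are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CanvasItem:
    """Hypothetical 2D item; field names are illustrative only."""
    name: str
    canvas_layer: int = 0
    z_index: int = 0
    tree_order: int = 0  # position in a depth-first walk of the scene tree


def painters_order(items: List[CanvasItem]) -> List[str]:
    """Return item names from back (drawn first) to front (drawn last)."""
    ordered = sorted(items, key=lambda it: (it.canvas_layer, it.z_index, it.tree_order))
    return [it.name for it in ordered]


items = [
    CanvasItem("ui", canvas_layer=1, tree_order=0),
    CanvasItem("player", z_index=1, tree_order=1),
    CanvasItem("ground", tree_order=2),
]
print(painters_order(items))  # ['ground', 'player', 'ui']
```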

Note

You can group similar objects together for easier batching. While doing so is not a requirement on your part, think of it as an optional approach that can improve performance in some cases. See the Diagnostics section to help you make this decision.

A trick

And now, a sleight of hand. Even though the idea of painter's order is that objects are rendered from back to front, consider 3 objects A, B and C, that contain 2 different textures: grass and wood.

In painter's order they are ordered:

    A - wood
    B - grass
    C - wood

Because of the texture changes, they can't be batched and will be rendered in 3 draw calls.

However, painter's order is only needed on the assumption that they will be drawn on top of each other. If we relax that assumption, i.e. if none of these 3 objects are overlapping, there is no need to preserve painter's order. The rendered result will be the same. What if we could take advantage of this?

Item reordering

It turns out that we can reorder items. However, we can only do this if the items satisfy the conditions of an overlap test, to ensure that the end result will be the same as if they were not reordered. The overlap test is very cheap in performance terms, but not absolutely free, so there is a slight cost to looking ahead to decide whether items can be reordered. The number of items to look ahead for reordering can be set in the project settings (see below), in order to balance the costs and benefits in your project.

    A - wood
    C - wood
    B - grass

Since the texture only changes once, we can render the above in only 2 draw calls.
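The following Python sketch illustrates the idea of reordering under an overlap test. It is a hypothetical, greedy simplification, not Godot's implementation: after drawing an item, it scans up to `lookahead` later items (a stand-in for the role of item_reordering_lookahead) and pulls a same-texture item forward only if it overlaps none of the items it would jump over. The axis-aligned bounding box test is an assumption; Godot's exact test may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


@dataclass
class Item:
    name: str
    texture: str
    rect: Rect


def overlaps(a: Rect, b: Rect) -> bool:
    """Axis-aligned bounding box test; rectangles that only touch do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def reorder(items: List[Item], lookahead: int) -> List[Item]:
    """Greedy sketch: after drawing an item, pull later items with the same
    texture forward, but only if they overlap none of the items they jump over."""
    pending = list(items)
    result: List[Item] = []
    while pending:
        current = pending.pop(0)
        result.append(current)
        i = 0
        # Only inspect the next `lookahead` pending items (the cost/benefit knob).
        while i < min(lookahead, len(pending)):
            candidate = pending[i]
            jumped = pending[:i]  # items that would now be drawn after the candidate
            if candidate.texture == current.texture and not any(
                overlaps(candidate.rect, other.rect) for other in jumped
            ):
                result.append(pending.pop(i))  # index i now holds the next item
            else:
                i += 1
    return result


def count_batches(items: List[Item]) -> int:
    """Draw calls in this simplified model: one per run of identical textures."""
    count = 0
    previous: Optional[str] = None
    for item in items:
        if item.texture != previous:
            count += 1
            previous = item.texture
    return count


a = Item("A", "wood", (0, 0, 10, 10))
b = Item("B", "grass", (20, 0, 10, 10))
c = Item("C", "wood", (40, 0, 10, 10))
reordered = reorder([a, b, c], lookahead=4)
print([it.name for it in reordered], count_batches(reordered))  # ['A', 'C', 'B'] 2
```

If C were moved so that it overlaps B, the overlap test would fail and the order (and the 3 draw calls) would stay as they were.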

Lights

Although the batching system's job is normally quite straightforward, it becomes considerably more complex when 2D lights are used. This is because lights are drawn using additional passes, one for each light affecting the primitive. Consider 2 sprites A and B, with identical texture and material. Without lights, they would be batched together and drawn in one draw call. But with 3 lights, they would be drawn as follows, each line being a draw call:

    A
    A - light 1
    A - light 2
    A - light 3
    B
    B - light 1
    B - light 2
    B - light 3

That is a lot of draw calls: 8 for only 2 sprites. Now, consider we are drawing 1,000 sprites. The number of draw calls quickly becomes astronomical and performance suffers. This is partly why lights have the potential to drastically slow down 2D rendering.

However, if you remember our magician's trick from item reordering, it turns out we can use the same trick to get around painter's order for lights!

If A and B are not overlapping, we can render them together in a batch, so the drawing process is as follows:

    AB
    AB - light 1
    AB - light 2
    AB - light 3

That is only 4 draw calls. Not bad, as that is a 2× reduction. However, consider that in a real game, you might be drawing closer to 1,000 sprites.

  • Before: 1000 × 4 = 4,000 draw calls.

  • After: 1 × 4 = 4 draw calls.

That is a 1000× decrease in draw calls, and should give a huge increase in performance.
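The arithmetic above can be written as a one-line cost model. This Python sketch assumes, as the example does, that every light affects every sprite and that no sprites overlap; `light_draw_calls()` is a hypothetical helper, not a Godot API.

```python
def light_draw_calls(batches: int, lights: int) -> int:
    """Each batch is drawn once unlit, then once more for every light affecting it."""
    return batches * (1 + lights)


for sprites in (2, 1000):
    before = light_draw_calls(batches=sprites, lights=3)  # no joining: one batch per sprite
    after = light_draw_calls(batches=1, lights=3)         # all sprites joined into one batch
    print(f"{sprites} sprites: {before} -> {after} draw calls")
# 2 sprites: 8 -> 4 draw calls
# 1000 sprites: 4000 -> 4 draw calls
```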

Overlap test

However, as with item reordering, things are not that simple. We must first perform the overlap test to determine whether we can join these primitives. This overlap test has a small cost. Again, you can choose the number of primitives to look ahead in the overlap test to balance the benefits against the cost. With lights, the benefits usually far outweigh the costs.

Also consider that, depending on the arrangement of primitives in the viewport, the overlap test will sometimes fail (because the primitives overlap and therefore shouldn't be joined). In practice, the decrease in draw calls may be less dramatic than in a perfect situation with no overlapping at all. However, performance is usually far higher than without this lighting optimization.
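To see why overlaps reduce the gain, here is a hypothetical Python sketch of joining consecutive items before lighting. An item joins the current group only if it overlaps none of the group's members and the group is still smaller than `max_group_size`; that parameter is an illustrative stand-in for the cost bound that max_join_items provides, not its exact semantics.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def overlaps(a: Rect, b: Rect) -> bool:
    """Axis-aligned bounding box test; rectangles that only touch do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def join_for_lighting(rects: List[Rect], max_group_size: int) -> List[List[Rect]]:
    """Greedily group consecutive items that can be lit together.

    Each new item is tested against every member of the current group, so the
    group size also bounds the cost of the overlap tests.
    """
    groups: List[List[Rect]] = []
    for rect in rects:
        if (
            groups
            and len(groups[-1]) < max_group_size
            and not any(overlaps(rect, other) for other in groups[-1])
        ):
            groups[-1].append(rect)
        else:
            groups.append([rect])
    return groups


# Four sprites in a row; the second and third overlap.
row = [(0, 0, 10, 10), (20, 0, 10, 10), (25, 0, 10, 10), (40, 0, 10, 10)]
groups = join_for_lighting(row, max_group_size=8)
lights = 3
print(len(groups) * (1 + lights), "draw calls instead of", len(row) * (1 + lights))
# 8 draw calls instead of 16
```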

Light scissoring

Batching can make it more difficult to cull out objects that are not affected or only partially affected by a light. This can increase the fill rate requirements quite a bit and slow down rendering. Fill rate is the rate at which pixels are colored. It is another potential bottleneck, unrelated to draw calls.

In order to counter this problem (and speed up lighting in general), batching introduces light scissoring. This enables the use of the OpenGL command glScissor(), which identifies an area outside of which the GPU won't render any pixels. We can greatly optimize fill rate by identifying the intersection area between a light and a primitive, and limiting rendering of the light to that area only.
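The scissor rectangle is simply the intersection of the light's area and the primitive's area. This Python sketch computes that intersection and, under the assumption that "pixels saved" means primitive area minus intersection area, how many pixels the light pass no longer has to shade. The helper names are hypothetical. In OpenGL the resulting rectangle would be passed to glScissor(x, y, width, height) with GL_SCISSOR_TEST enabled; note that OpenGL window coordinates start at the bottom-left, which this sketch ignores.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in pixels


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping area of two rectangles, or None if they do not overlap."""
    x0 = max(a[0], b[0])
    y0 = max(a[1], b[1])
    x1 = min(a[0] + a[2], b[0] + b[2])
    y1 = min(a[1] + a[3], b[1] + b[3])
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def pixels_saved(primitive: Rect, light: Rect) -> int:
    """Pixels of the primitive that the scissored light pass no longer shades."""
    scissor = intersect(primitive, light)
    lit_area = 0 if scissor is None else scissor[2] * scissor[3]
    return primitive[2] * primitive[3] - lit_area


primitive = (0, 0, 400, 300)
light = (300, 200, 256, 256)
print(intersect(primitive, light))     # (300, 200, 100, 100)
print(pixels_saved(primitive, light))  # 110000
```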

Light scissoring is controlled with the scissor_area_threshold project setting. This value is between 1.0 and 0.0, with 1.0 being off (no scissoring), and 0.0 being scissoring in every circumstance. The reason for the setting is that there may be some small cost to scissoring on some hardware. That said, scissoring should usually result in performance gains when you're using 2D lighting.

The relationship between the threshold and whether a scissor operation takes place is not always straightforward. Generally, it represents the pixel area that is potentially "saved" by a scissor operation (i.e. the fill rate saved). At 1.0, the entire screen's pixels would need to be saved, which rarely (if ever) happens, so it is switched off. In practice, the useful values are close to 0.0, as only a small percentage of pixels need to be saved for the operation to be useful.

The exact relationship is probably not necessary for users to worry about, but is included in the appendix out of interest: Light scissoring threshold calculation

Light scissoring example diagram

Bottom right is a light; the red area is the pixels saved by the scissoring operation. Only the intersection needs to be rendered.

Vertex baking

The GPU shader receives instructions on what to draw in 2 main ways:

  • Shader uniforms (e.g. modulate color, item transform).

  • Vertex attributes (vertex color, local transform).

However, within a single draw call (batch), we cannot change uniforms. This means that naively, we would not be able to batch together items or commands that change final_modulate or an item's transform. Unfortunately, that happens in an awful lot of cases. For instance, sprites are typically individual nodes with their own item transform, and they may have their own color modulate as well.

To get around this problem, the batcher can "bake" some of the uniforms into the vertex attributes.

  • The item transform can be combined with the local transform and sent in a vertex attribute.

  • The final modulate color can be combined with the vertex colors, and sent in a vertex attribute.

In most cases, this works fine, but this shortcut breaks down if a shader expects these values to be available individually rather than combined. This can happen in custom shaders.
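Here is a minimal Python sketch of what baking means, under simplifying assumptions: a 2D affine transform stored as (a, b, c, d, tx, ty) and RGBA colors in the 0..1 range. The function names are hypothetical, not Godot internals. It also shows why a shader reading VERTEX or COLOR would see combined values after baking.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Color = Tuple[float, float, float, float]  # r, g, b, a in the 0..1 range
# Affine 2D transform (a, b, c, d, tx, ty): x' = a*x + c*y + tx, y' = b*x + d*y + ty
Transform2D = Tuple[float, float, float, float, float, float]


def apply(t: Transform2D, p: Vec2) -> Vec2:
    a, b, c, d, tx, ty = t
    x, y = p
    return (a * x + c * y + tx, b * x + d * y + ty)


def bake(
    vertices: List[Vec2],
    vertex_colors: List[Color],
    item_transform: Transform2D,
    modulate: Color,
) -> Tuple[List[Vec2], List[Color]]:
    """Fold per-item values (normally uniforms) into per-vertex attributes."""
    positions = [apply(item_transform, v) for v in vertices]
    mr, mg, mb, ma = modulate
    colors = [(r * mr, g * mg, b * mb, a * ma) for (r, g, b, a) in vertex_colors]
    return positions, colors


quad = [(0, 0), (16, 0), (16, 16), (0, 16)]
white = [(1.0, 1.0, 1.0, 1.0)] * 4

# Two sprites with different transforms and modulate colors: as uniforms they would
# need two draw calls, but once baked their vertices can share one vertex buffer.
a_pos, a_col = bake(quad, white, (1, 0, 0, 1, 100, 50), (1.0, 0.5, 0.5, 1.0))
b_pos, b_col = bake(quad, white, (1, 0, 0, 1, 300, 50), (0.5, 1.0, 0.5, 1.0))
vertex_buffer = list(zip(a_pos + b_pos, a_col + b_col))
print(len(vertex_buffer))  # 8
print(a_pos[2], a_col[0])  # (116, 66) (1.0, 0.5, 0.5, 1.0)
```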

Custom shaders

As a result of the limitation described above, certain operations in custom shaders will prevent vertex baking and therefore decrease the potential for batching. While we are working to decrease these cases, the following caveats currently apply:

  • Reading or writing COLOR or MODULATE disables vertex color baking.

  • Reading VERTEX disables vertex position baking.

Project Settings

To fine-tune batching, a number of project settings are available. You can usually leave these at their defaults during development, but it's a good idea to experiment to ensure you are getting maximum performance. Spending a little time tweaking parameters can often give considerable performance gains for very little effort. See the on-hover tooltips in the Project Settings for more information.

rendering/batching/options

  • use_batching - Turns batching on or off.

  • use_batching_in_editor - Turns batching on or off in the Godot editor. This setting doesn't affect the running project in any way.

  • single_rect_fallback - This is a faster way of drawing unbatchable rectangles. However, it may lead to flicker on some hardware, so it's not recommended.

rendering/batching/parameters

  • max_join_item_commands - One of the most important ways of achieving batching is to join suitable adjacent items (nodes) together; however, they can only be joined if the commands they contain are compatible. The system must therefore do a lookahead through the commands in an item to determine whether it can be joined. This has a small cost per command, and items with a large number of commands are not worth joining, so the best value may be project dependent.

  • colored_vertex_format_threshold - Baking colors into vertices results in a larger vertex format. This is not necessarily worth doing unless there are a lot of color changes going on within a joined item. This parameter represents the proportion of commands containing color changes / the total commands, above which it switches to baked colors.

  • batch_buffer_size - This determines the maximum size of a batch. It doesn't have a huge effect on performance, but can be worth decreasing for mobile if RAM is at a premium.

  • item_reordering_lookahead - Item reordering can help especially with interleaved sprites using different textures. The lookahead for the overlap test has a small cost, so the best value may change per project.
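The colored_vertex_format_threshold rule described above amounts to a simple proportion check. This Python sketch is a hypothetical illustration of that decision, not Godot's code; the threshold of 0.25 used in the example is arbitrary and not necessarily Godot's default.

```python
def use_baked_colors(color_change_commands: int, total_commands: int, threshold: float) -> bool:
    """Return True if a joined item should switch to the larger colored vertex format."""
    if total_commands == 0:
        return False
    proportion = color_change_commands / total_commands
    return proportion > threshold


# 0.25 is an arbitrary example threshold, not necessarily Godot's default.
print(use_baked_colors(1, 10, 0.25))  # False: only 10% of commands change color
print(use_baked_colors(5, 10, 0.25))  # True: 50% of commands change color
```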

rendering/batching/lights

  • scissor_area_threshold - See light scissoring.

  • max_join_items - Joining items before lighting can significantly increase performance. This requires an overlap test, which has a small cost, so the costs and benefits, and hence the best value to use here, may be project dependent.

rendering/batching/debug

  • flash_batching - This is purely a debugging feature to identify regressions between the batching and legacy renderer. When it is switched on, the batching and legacy renderer are used alternately on each frame. This will decrease performance, and should not be used for your final export, only for testing.

  • diagnose_frame - This will periodically print a diagnostic batching log to the Godot IDE / console.

rendering/batching/precision

  • uv_contract - On some hardware (notably some Android devices) there have been reports of tilemap tiles drawing slightly outside their UV range, leading to edge artifacts such as lines around tiles. If you see this problem, try enabling uv_contract. This makes a small contraction in the UV coordinates to compensate for precision errors on devices.

  • uv_contract_amount - Hopefully, the default amount should cure artifacts on most devices, but this value remains adjustable just in instance.
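As a rough illustration of what a UV contraction does, this Python sketch moves each edge of a tile's UV rectangle inward by a small fixed amount. It is an assumption about the general technique, not Godot's implementation: `contract_uv()` is hypothetical, and `amount` here is a plain UV offset, not the units or scaling used by uv_contract_amount.

```python
from typing import Tuple

UVRect = Tuple[float, float, float, float]  # u0, v0, u1, v1 in the 0..1 range


def contract_uv(rect: UVRect, amount: float) -> UVRect:
    """Move each edge of a UV rectangle inward by `amount`.

    Sampling slightly inside the tile keeps small precision errors from
    reaching neighboring texels in the atlas, which show up as lines.
    """
    u0, v0, u1, v1 = rect
    return (u0 + amount, v0 + amount, u1 - amount, v1 - amount)


tile = (0.25, 0.0, 0.5, 0.25)  # one tile of a 4x4 atlas
print(contract_uv(tile, 0.0001))
```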

Diagnostics

Although you can change parameters and examine the effect on frame rate, this can feel like working blindly, with no idea of what is going on under the hood. To help with this, batching offers a diagnostic mode, which will periodically print out (to the IDE or console) a list of the batches that are being processed. This can help pinpoint situations where batching isn't occurring as intended, and help you fix these situations to get the best possible performance.

Reading a diagnostic

    canvas_begin FRAME 2604
    items
        joined_item 1 refs
                batch D 0-0
                batch D 0-2 n n
                batch R 0-1 [0 - 0] {255 255 255 255 }
        joined_item 1 refs
                batch D 0-0
                batch R 0-1 [0 - 146] {255 255 255 255 }
                batch D 0-0
                batch R 0-1 [0 - 146] {255 255 255 255 }
        joined_item 1 refs
                batch D 0-0
                batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
                batch D 0-0
                batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
                batch D 0-0
                batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
    canvas_end

This is a typical diagnostic.

  • joined_item: A joined item can contain 1 or more references to items (nodes). Generally, joined_items containing many references are preferable to many joined_items containing a single reference. Whether items can be joined will be determined by their contents and compatibility with the previous item.

  • batch R: A batch containing rectangles. The second number is the number of rects. The second number in square brackets is the Godot texture ID, and the numbers in curly braces are the color. If the batch contains more than one rect, MULTI is added to the line to make it easy to identify. Seeing MULTI is good, as it indicates successful batching.

  • batch D: A default batch, containing everything else that is not currently batched.
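When a log gets long, a small script can tally it for you. This Python sketch is a hypothetical helper, not part of Godot: its regular expression assumes the line format of the diagnostic shown above, which may differ between Godot versions. The sample is an excerpt assembled from lines of that diagnostic.

```python
import re
from typing import Dict

# Matches lines such as "batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI".
BATCH_RE = re.compile(r"^\s*batch\s+(?P<kind>[RD])\s+\d+-(?P<count>\d+)(?P<rest>.*)$")


def summarize(log: str) -> Dict[str, int]:
    """Tally joined items and batches in a diagnose_frame log excerpt."""
    stats = {"joined_items": 0, "rect_batches": 0, "multi_batches": 0,
             "default_batches": 0, "rects": 0}
    for line in log.splitlines():
        if line.strip().startswith("joined_item"):
            stats["joined_items"] += 1
            continue
        match = BATCH_RE.match(line)
        if match is None:
            continue
        if match["kind"] == "R":
            stats["rect_batches"] += 1
            stats["rects"] += int(match["count"])  # second number = number of rects
            if "MULTI" in match["rest"]:
                stats["multi_batches"] += 1
        else:
            stats["default_batches"] += 1
    return stats


SAMPLE = """\
canvas_begin FRAME 2604
items
    joined_item 1 refs
            batch D 0-2 n n
            batch R 0-1 [0 - 0] {255 255 255 255 }
    joined_item 1 refs
            batch D 0-0
            batch R 0-2560 [0 - 144] {158 193 0 104 } MULTI
canvas_end
"""

print(summarize(SAMPLE))
# {'joined_items': 2, 'rect_batches': 2, 'multi_batches': 1, 'default_batches': 2, 'rects': 2561}
```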

Default batches

The second number following default batches is the number of commands in the batch, and it is followed by a brief summary of the contents:

    l - line
    PL - polyline
    r - rect
    n - ninepatch
    PR - primitive
    p - polygon
    m - mesh
    MM - multimesh
    PA - particles
    c - circle
    t - transform
    CI - clip_ignore

You may see "dummy" default batches containing no commands; you can ignore those.
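The summary codes in the table above can be expanded mechanically. This Python sketch is a hypothetical helper that splits a default batch line on whitespace and looks up each code after the command count; unknown codes are reported rather than guessed.

```python
from typing import List

# Summary codes listed above for default ("D") batches.
DEFAULT_BATCH_CODES = {
    "l": "line", "PL": "polyline", "r": "rect", "n": "ninepatch",
    "PR": "primitive", "p": "polygon", "m": "mesh", "MM": "multimesh",
    "PA": "particles", "c": "circle", "t": "transform", "CI": "clip_ignore",
}


def decode_default_batch(line: str) -> List[str]:
    """Expand the summary codes of a line such as "batch D 0-2 n n"."""
    fields = line.split()
    # fields[0:3] are "batch", "D" and the "<first>-<command count>" pair.
    return [DEFAULT_BATCH_CODES.get(code, f"unknown({code})") for code in fields[3:]]


print(decode_default_batch("batch D 0-2 n n"))  # ['ninepatch', 'ninepatch']
print(decode_default_batch("batch D 0-0"))      # [] (a "dummy" batch)
```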

Frequently asked questions

I don't get a large performance increase when enabling batching.

  • Try the diagnostics, see how much batching is occurring, and whether it can be improved.

  • Try changing batching parameters in the Project Settings.

  • Consider that batching may not be your bottleneck (see bottlenecks).

I get a decrease in performance with batching.

  • Try the steps described above to increase the number of batching opportunities.

  • Try enabling single_rect_fallback.

  • The single rect fallback method is the default used without batching, and it is approximately twice as fast. However, it can result in flickering on some hardware, so its use is discouraged.

  • After trying the above, if your scene is still performing worse, consider turning off batching.

I use custom shaders and the items are not batching.

  • Custom shaders can be problematic for batching; see the custom shaders section.

I am seeing line artifacts appear on certain hardware.

  • See the uv_contract project setting, which can be used to solve this problem.

I use a large number of textures, so few items are being batched.

  • Consider using texture atlases. As well as allowing batching, these reduce the need for state changes associated with changing textures.

Appendix

Batched primitives

Not all primitives can be batched. Batching is not guaranteed either, especially with primitives using an antialiased border. The following primitive types are currently available:

  • RECT

  • NINEPATCH (depending on wrapping mode)

  • POLY

  • LINE

With non-batched primitives, you may be able to get better performance by drawing them manually with polys in a _draw() function. See Custom drawing in 2D for more information.

Light scissoring threshold calculation

The actual proportion of screen pixel area used as the threshold is the scissor_area_threshold value to the power of 4.

For example, on a screen size of 1920×1080, there are 2,073,600 pixels.

At a threshold of 1,000 pixels, the proportion would be:

    1000 / 2073600 = 0.00048225
    0.00048225 ^ (1/4) = 0.14819

So a scissor_area_threshold of 0.15 would be a reasonable value to try.

Going the other way, for instance with a scissor_area_threshold of 0.5:

    0.5 ^ 4 = 0.0625
    0.0625 * 2073600 = 129600 pixels

If the number of pixels saved is greater than this threshold, the scissor is activated.
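The two calculations above, plus the final comparison, can be sketched in Python as follows. The function names are hypothetical; the formula (threshold to the power of 4, times the screen's pixel count) is the one described in this appendix.

```python
def threshold_for_pixels(pixels: float, width: int, height: int) -> float:
    """scissor_area_threshold whose 4th power equals pixels / screen area."""
    return (pixels / (width * height)) ** 0.25


def pixels_for_threshold(threshold: float, width: int, height: int) -> float:
    """Number of saved pixels that a given threshold corresponds to."""
    return threshold ** 4 * width * height


def should_scissor(saved_pixels: float, threshold: float, width: int, height: int) -> bool:
    """Scissor only if more pixels would be saved than the threshold represents."""
    return saved_pixels > pixels_for_threshold(threshold, width, height)


print(round(threshold_for_pixels(1000, 1920, 1080), 5))  # 0.14819
print(pixels_for_threshold(0.5, 1920, 1080))             # 129600.0
```

At 1.0 the threshold equals the whole screen, so should_scissor() can never return True, which matches the setting being "off".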