Scroll Profiling: Large Input List Drag

Scenario

100 multiline inputs, each with 100 lines of random text.
Scroll from top to bottom over 120 frames (slow drag).
Creation/layout done once before profiling; profiler captures scrolling only.
Renderer uploads (GL bufferData) are not executed; CPU-side VectorBuffer writes are included.

Profiler Output Summary

Top stacks (abridged):

assets/lua/layout.fnl:update:457 (LayoutRoot update loop)
assets/lua/bucket-queue.fnl:iterate:35 (layout queue walk)
assets/lua/list-view.fnl:114 (ListView layouter: per-child positioning)
assets/lua/layout.fnl:layouter:241 (layout traversal)
assets/lua/input.fnl:383 (Input layouter)
assets/lua/text.fnl:55 (Text glyph layout + VectorBuffer writes)

Interpretation: scrolling dirties layout, which re-lays out the list and each input widget every frame. The heaviest cost comes from ListView iterating all children and Input/Text layouters updating geometry.

Current Glyph Optimization (post-profile)

assets/lua/text.fnl now caches glyph metadata and avoids per-glyph table allocations during layout.
This reduces CPU overhead when text content is unchanged, but it does not remove the per-vertex position writes that happen on every layout pass.
Advances are computed per layout from the current TextStyle.scale (scale is not part of the cache). This keeps label LOD scaling correct today; if/when we move to a transform-only pass, we may want a scale-aware cache key.

Recent Profiler Runs

Baseline (pre-glyph-cache)

Total samples: ~601M
text.fnl layouter: ~3.4%
input.fnl layouter: ~5.2%

After glyph cache + allocation-free vertex emission

Total samples: ~484M
text.fnl layouter: ~2.8%
input.fnl layouter: ~4.6%

After scale-safe cache fix (current)

Total samples: ~449M
text.fnl layouter: ~2.8%
input.fnl layouter: ~4.6%
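
The percentages in the last two runs are identical, but the totals differ, so the absolute cost still moved. A quick way to compare runs is to convert each share into an approximate sample count. The snippet below assumes each percentage is a share of that run's total samples and that the three runs cover the same scenario; the inputs are the rounded figures listed above, so the results are rough.

```python
# (total samples, text.fnl layouter %, input.fnl layouter %) per run, from the notes above.
runs = {
    "baseline": (601e6, 3.4, 5.2),
    "glyph cache": (484e6, 2.8, 4.6),
    "scale-safe fix": (449e6, 2.8, 4.6),
}


def absolute_samples(total: float, percent: float) -> float:
    """Convert a percentage share of a run into an approximate sample count."""
    return total * percent / 100.0


for name, (total, text_pct, input_pct) in runs.items():
    text_m = absolute_samples(total, text_pct) / 1e6
    input_m = absolute_samples(total, input_pct) / 1e6
    print(f"{name}: text ~{text_m:.1f}M, input ~{input_m:.1f}M")
```

Under those assumptions the Text layouter goes from roughly 20.4M to 13.6M to 12.6M samples, and the Input layouter from roughly 31.3M to 22.3M to 20.7M.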

Key Hotspots

1) LayoutRoot pass cost

Location: assets/lua/layout.fnl:update:457
Cause: every scroll tick enqueues layout work across a large subtree.
Effect: repeated per-frame layout of all list items.

2) ListView layout of every child

Location: assets/lua/list-view.fnl:114 (layout-children loop)
Cause: changing the scroll offset marks the scroll container layout-dirty, and the ListView layouter then recomputes position/size for all items.
Effect: O(N) layout work for N = 100, even though only a small visible subset changes.

3) Input layout and text geometry

Locations:
- assets/lua/input.fnl:383 (Input layouter)
- assets/lua/text.fnl:55 (Text layouter)
Cause: Input layouter drives child layouts and Text layouter rewrites glyph quads to VectorBuffers.
Effect: expensive per-item CPU work that repeats on every scroll tick.

4) Layout bookkeeping overhead

Locations:
- assets/lua/layout.fnl:116 (clip/visibility bookkeeping)
- assets/lua/layout.fnl:layouter:241 (tree traversal)
Cause: clip visibility checks and traversal across a large subtree on each scroll.

Suggested Optimizations

A) Split layout vs. transform/cull passes (unified for scroll + drag)

Goal: keep culling correct while avoiding full layout and glyph rebuilds on movement.

Approach:
- Layout pass computes local sizes/positions only when content/size changes.
- Transform/cull pass composes parent transforms, updates world bounds, and recomputes clip visibility.
- Scroll and Movables update only parent transforms and trigger transform/cull pass.
Expected impact:
- Scroll/drag stays O(N), but with a small per-item cost (transform + bounds) instead of a large one (full layout + glyph rebuilds).
- Culling stays correct because clip visibility is recomputed.
Implementation notes:
- Add a transform-dirty queue in LayoutRoot.
- Store local-position/rotation/size separate from world values.
- Renderers consume world transforms via a per-batch uniform or a lightweight matrix stack.
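
A minimal sketch of the two-pass idea, in illustrative Python rather than the project's Fennel. It assumes translation-only transforms along one axis (no rotation) and a single clip range; `Node`, `layout_pass`, and `transform_cull_pass` are hypothetical names, not existing APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    local_y: float
    height: float
    children: List["Node"] = field(default_factory=list)
    world_y: float = 0.0
    visible: bool = True
    layout_count: int = 0  # how many times the expensive pass touched this node


def layout_pass(node: Node) -> None:
    """Expensive pass: stack children vertically. Run only on content/size change."""
    node.layout_count += 1
    y = 0.0
    for child in node.children:
        child.local_y = y
        layout_pass(child)
        y += child.height


def transform_cull_pass(node: Node, parent_world_y: float,
                        clip_top: float, clip_bottom: float) -> None:
    """Cheap pass: compose world position and recompute clip visibility."""
    node.world_y = parent_world_y + node.local_y
    node.visible = (node.world_y < clip_bottom
                    and node.world_y + node.height > clip_top)
    for child in node.children:
        transform_cull_pass(child, node.world_y, clip_top, clip_bottom)


# 100 rows of height 50 inside a 200-unit viewport.
content = Node(local_y=0.0, height=5000.0,
               children=[Node(0.0, 50.0) for _ in range(100)])
layout_pass(content)                          # once, before scrolling

content.local_y = -1000.0                     # a scroll only moves the parent
transform_cull_pass(content, 0.0, 0.0, 200.0)
visible_rows = [i for i, c in enumerate(content.children) if c.visible]
```

In this sketch a scroll tick changes one local position and walks the tree with additions and comparisons only; `layout_count` stays at 1 for every row.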

B) Virtualize list items

Goal: reduce list size that participates in layout and text updates.

Approach: keep widgets only for visible rows + small overscan buffer.
Implementation idea:
- VirtualListView that reuses widget instances as scroll index changes.
- Adjust content height without instantiating off-screen inputs.
Expected impact: layout + text costs scale with the visible row count (plus overscan), not the total count.
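
The index math behind such a list can be sketched as below. This is illustrative Python that assumes a fixed row height, which the real inputs may not have; `visible_range`, `content_height`, and `assign_slots` are hypothetical helpers, not part of the existing ListView.

```python
import math
from typing import Dict, Tuple


def visible_range(scroll_offset: float, viewport_height: float,
                  row_height: float, total_rows: int,
                  overscan: int = 2) -> Tuple[int, int]:
    """Return the half-open range [first, end) of rows that need live widgets."""
    first = math.floor(scroll_offset / row_height)
    end = math.ceil((scroll_offset + viewport_height) / row_height)
    return max(0, first - overscan), min(total_rows, end + overscan)


def content_height(total_rows: int, row_height: float) -> float:
    """Scrollable height, computed without instantiating off-screen rows."""
    return total_rows * row_height


def assign_slots(first: int, end: int, pool_size: int) -> Dict[int, int]:
    """Map each live row to a pooled widget slot; rows reuse slots as they scroll."""
    return {row: row % pool_size for row in range(first, end)}


first, end = visible_range(1000.0, 200.0, 50.0, 100)
slots = assign_slots(first, end, pool_size=end - first)
```

With these numbers only 8 of the 100 rows hold widgets at any time. The modulo mapping keeps a row on the same slot while it stays in range, so only rows entering the range need their content rebound, provided the pool is at least as large as the live range.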

C) Skip Input/Text work on pure scroll

Goal: avoid per-frame glyph rebuild when content and viewport are unchanged.

Approach:
- In Input layouter, only run apply-viewport and refresh-virtual-text when viewport size or model content changes.
- In Text layouter, reuse existing geometry and update transform/clip only; avoid rewrites of all glyph vertices.
Expected impact: reduces VectorBuffer writes and layout work per item.
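
One way to express "rebuild only when something relevant changed" is a layout key that is compared on each pass. The sketch below is illustrative Python; `CachedTextLayout` and its fields are hypothetical, and the quad list is a stand-in for VectorBuffer contents.

```python
from typing import List, Optional, Tuple


class CachedTextLayout:
    """Rebuilds glyph geometry only when text, viewport size, or scale changes."""

    def __init__(self) -> None:
        self._key: Optional[tuple] = None
        self.quads: List[Tuple[float, float]] = []  # local-space glyph origins
        self.offset: Tuple[float, float] = (0.0, 0.0)
        self.rebuild_count = 0

    def _rebuild(self, text: str, scale: float) -> None:
        # Stand-in for the per-glyph vertex writes; 10.0 is a placeholder advance.
        self.rebuild_count += 1
        self.quads = [(i * 10.0 * scale, 0.0) for i in range(len(text))]

    def layout(self, text: str, viewport_size: Tuple[float, float],
               scale: float, position: Tuple[float, float]) -> None:
        # Position is intentionally left out of the key: moving is not a rebuild.
        # A content version counter would be cheaper to compare than the text itself.
        key = (text, viewport_size, scale)
        if key != self._key:
            self._rebuild(text, scale)
            self._key = key
        self.offset = position  # cheap path: transform-only update


widget = CachedTextLayout()
for frame in range(120):
    widget.layout("hello", (300.0, 200.0), 1.0, (0.0, -float(frame)))
```

Over the 120 scroll frames in this sketch the geometry is built once. Including scale in the key is an assumption that matches the note above about a scale-aware cache key.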

D) Cache measure results

Goal: avoid running measurers when sizes are stable.

Approach:
- Cache child measurements in ListView; only re-measure on text/model change.
Expected impact: reduces measure-dirty propagation and downstream layout.
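
A measurement cache of this kind might look like the following. The sketch is illustrative Python; `MeasureCache`, the per-child version counter, and the fixed 8x16 cell size in `measure_lines` are all assumptions made for the example.

```python
from typing import Callable, Dict, Hashable, Tuple

Size = Tuple[float, float]


class MeasureCache:
    """Caches one measurement per child, invalidated by a content version."""

    def __init__(self, measure_fn: Callable[[str], Size]) -> None:
        self._measure_fn = measure_fn
        self._entries: Dict[Hashable, Tuple[int, Size]] = {}
        self.measure_calls = 0

    def measure(self, child_id: Hashable, version: int, content: str) -> Size:
        entry = self._entries.get(child_id)
        if entry is not None and entry[0] == version:
            return entry[1]  # size is stable: skip the measurer
        self.measure_calls += 1
        size = self._measure_fn(content)
        self._entries[child_id] = (version, size)
        return size


def measure_lines(text: str) -> Size:
    """Toy measurer: 8 units per character of the longest line, 16 units per line."""
    lines = text.split("\n")
    return (max(len(line) for line in lines) * 8.0, len(lines) * 16.0)


cache = MeasureCache(measure_lines)
size_a = cache.measure("input-1", 0, "a\nbcd")
size_b = cache.measure("input-1", 0, "a\nbcd")   # cache hit
size_c = cache.measure("input-1", 1, "a\nbcde")  # version bump forces re-measure
```

The version counter would be bumped wherever the text model changes, so scroll-only frames never reach the measurer.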

E) Aggressive culling for off-screen children

Goal: early-out of layout work for items outside clip.

Approach:
- Ensure ScrollArea clip visibility is evaluated before children; set parent-culled? so child layouters can early return.
Expected impact: lower layout time for large off-screen sets.
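
The ordering described above (clip test first, then children with an early return) can be sketched like this. It is illustrative Python; the `parent_culled` attribute mirrors the `parent-culled?` flag named above, but its exact semantics in the codebase are assumed, and `Widget`, `layout_widget`, and `layout_scroll_area` are hypothetical.

```python
from typing import List, Optional


class Widget:
    def __init__(self, y: float = 0.0, height: float = 0.0,
                 children: Optional[List["Widget"]] = None) -> None:
        self.y = y
        self.height = height
        self.children = children or []
        self.parent_culled = False
        self.layout_runs = 0


def layout_widget(widget: Widget) -> None:
    if widget.parent_culled:
        return  # early out: skip this widget and everything below it
    widget.layout_runs += 1
    for child in widget.children:
        child.parent_culled = False
        layout_widget(child)


def layout_scroll_area(rows: List[Widget], scroll_offset: float,
                       viewport_height: float) -> None:
    # Evaluate clip visibility for every row before any child layouter runs.
    for row in rows:
        top = row.y - scroll_offset
        row.parent_culled = top >= viewport_height or top + row.height <= 0.0
    for row in rows:
        layout_widget(row)


rows = [Widget(y=i * 50.0, height=50.0, children=[Widget(height=50.0)])
        for i in range(100)]
layout_scroll_area(rows, scroll_offset=1000.0, viewport_height=200.0)
laid_out = [i for i, row in enumerate(rows) if row.layout_runs > 0]
```

In this sketch culled rows keep whatever geometry they had last; they are laid out again on the first frame where the clip test passes.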

Notes on Measurer During Scroll

The scroll path (ScrollView:set-scroll-offset -> ScrollArea:set-scroll-offset) marks layout-dirty only. Measurers run during scroll only if:

- viewport/scrollbar visibility changes and marks measure-dirty, or
- content changes (e.g., text updates) mark measure-dirty.

Next Experiments

Re-run profiler after (A) to confirm ListView layout cost drops.
Add a run including renderer uploads to quantify GL bufferData costs if needed.
