#146 Rendering and caret positioning for complex scripts 9 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

To summarize, the "Write a cgo wrapper around HarfBuzz" step in the original issue has several possible alternatives:

  • Write a cgo wrapper around HarfBuzz.
  • Link to a pre-compiled HarfBuzz using #38917 or the .syso method.
  • Convert the HarfBuzz shaping rules into native Go code, or a Graphite font (or some other encoding).

The github.com/grisha/hbshape package is a simplified form of the original suggestion: a cgo binding around HarfBuzz. If we take that approach, it would probably be best to expose more of the hb_shape functionality and write more code to handle buffer lifetimes. This is the easiest path forward, requiring only a few hundred lines of Go code at most and almost no maintenance. The main downside is that it requires a system dependency for linking with HarfBuzz, which we'd ideally like to avoid since it compromises Gio's "killer feature".
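To make this concrete, here is a rough sketch of what a fuller cgo binding might look like. All of the Go names (Glyph, Shape) are mine, not from hbshape, and the code assumes the HarfBuzz development headers and library are installed on the system; a real binding would also manage font object lifetimes.

```go
// Package hb is a hypothetical minimal cgo binding around hb_shape.
package hb

/*
#cgo pkg-config: harfbuzz
#include <stdlib.h>
#include <hb.h>
*/
import "C"
import "unsafe"

// Glyph mirrors the parts of hb_glyph_info_t and hb_glyph_position_t
// that a shaper client needs.
type Glyph struct {
	ID                 uint32 // glyph index in the font
	Cluster            uint32 // cluster this glyph belongs to
	XAdvance, YAdvance int32
	XOffset, YOffset   int32
}

// Shape runs hb_shape on text with the given font and returns the glyphs.
func Shape(font *C.hb_font_t, text string) []Glyph {
	buf := C.hb_buffer_create()
	defer C.hb_buffer_destroy(buf)

	ctext := C.CString(text)
	defer C.free(unsafe.Pointer(ctext))
	C.hb_buffer_add_utf8(buf, ctext, -1, 0, -1)
	// Let HarfBuzz guess direction, script, and language from the content.
	// A real binding would let the caller supply these instead.
	C.hb_buffer_guess_segment_properties(buf)

	C.hb_shape(font, buf, nil, 0)

	n := int(C.hb_buffer_get_length(buf))
	infos := unsafe.Slice(C.hb_buffer_get_glyph_infos(buf, nil), n)
	poss := unsafe.Slice(C.hb_buffer_get_glyph_positions(buf, nil), n)
	glyphs := make([]Glyph, n)
	for i := 0; i < n; i++ {
		glyphs[i] = Glyph{
			ID:       uint32(infos[i].codepoint),
			Cluster:  uint32(infos[i].cluster),
			XAdvance: int32(poss[i].x_advance),
			YAdvance: int32(poss[i].y_advance),
			XOffset:  int32(poss[i].x_offset),
			YOffset:  int32(poss[i].y_offset),
		}
	}
	return glyphs
}
```

The buffer create/destroy pairing is the "buffer lifetimes" code mentioned above; everything else is mechanical translation of the hb_shape output arrays.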

The two harder paths forward are: getting the #38917 or .syso approaches to work (which would require a lot of C++ linking knowledge and Go toolchain hacking, if it's possible at all) so that we can distribute pre-compiled versions of HarfBuzz, or programmatically translating the HarfBuzz algorithms into Go or Graphite rules (which would require experience with static analysis, lots of tweaking, and manually porting some parts).

To give some more context around this last option, here's a rough breakdown of the HarfBuzz control flow based on my cursory observations:

  1. The main entry point is hb_shape_full in hb-shape.cc. The first thing that this does is initialize a shape plan based on the buffer properties. More on this later. [Code link]
  2. When the buffer is configured to use OpenType fonts (which is probably what we'll do), hb_shape_full eventually calls hb_ot_shape_internal in hb-ot-shape.cc with the shape plan. hb_ot_shape_internal contains the high-level logic for shaping. [Code link]
  3. Multiple shaping operations make use of the shape plan: masking, replacement characters, complex positioning, and more. In order to do complex script positioning, the plan is initialized (and cached) with table data based on the buffer's script back in the first step. The relevant part of the call stack for that is hb_ot_shape_complex_categorize in hb-ot-shape-complex.hh. [Code link]. There you can find a script-based switch that chooses a complex script shaper.
  4. Complex script shapers are stored in files called hb-ot-shape-complex-*. There are quite a few of them. As an example of how they work, look at hb-ot-shape-complex-arabic.cc. [Code link]. The main thing to note here is that complex shapers have a lot of power. They can specify font "features" to load (this corresponds to "table" tags in the SFNT format) in order to implement their responsibilities, which can be seen in struct hb_ot_complex_shaper_t in hb-ot-shape-complex.hh. [Code link]. They can use arbitrary C++ code to implement these features; looking at the Arabic shaper reveals that this logic can be quite complex in practice.
  5. The complex shaper implementations often import data tables from Unicode. For example, hb-ot-shape-complex-arabic-table.hh contains information about glyph joining and positioning. [Code link].
  6. The table data header files are generated using small Python programs that read Unicode standard text files. These programs are stored in files named gen-*.py. For Arabic, the table generator is gen-arabic-table.py. [Code link].

The main takeaway from this for me is that HarfBuzz is complicated! Moreover, most of the source files referenced above were updated between a few hours and a few days before the time of this writing; this is what I mean by the project being high maintenance. This is why I think that the only reasonable approach to transcoding the rules (e.g., into Go, Graphite, or something else) is automated translation; otherwise, the resources required to essentially duplicate the effort would be substantial. Unfortunately, as the summary above points out, this would require static analysis of C++ code, which is non-trivial. It is not possible to extract the rules using dynamic analysis alone (such as a cgo wrapper). The major advantage of the automated translation approach is that we wouldn't need to link with C++ code at all, while still having access to the latest text shaping features.

Hopefully that gives a good starting point for anyone who wants to explore some of these methods.

#146 Rendering and caret positioning for complex scripts 9 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Thanks all for the kind words and Elias for the thoughtful analysis. In the interest of full disclosure, I am also not an expert in typography (my day job is cryptography), and most of these topics are things I've only learned in the past week; I'm certain that I remain ignorant of much. Apologies if I accidentally misrepresented my knowledge level. Additionally, beyond gio#104 this feature is in "nice-to-have" territory for my use-case, and thus I don't really have the institutional backing to heavily contribute to a resolution. I created this issue because I think it's an important feature to track, and because I felt that I could provide a general roadmap for others who may be in a better position to contribute.

The Rustaceans did it

I had not heard of the Allsorts project. It looks quite promising! I hope that the Go community could eventually create something similar, because it would be very useful to many projects.

Have you considered the other way round: working through the issues with the Go team and exporting their implementations? It may take longer, but the result will be reviewed by experts and more widely useful. I'd love for other Go toolkits such as Fyne to have great text layout as well.

That is definitely a possible approach. I'm quite curious about why the package is in its current state and what might be holding it back. If it is because there's a problem with the API design, that would be useful information to know before starting a similar effort. I personally cannot contribute code to the Go project itself, but maybe somebody without these constraints would be interested in solving the mystery and completing the package. Luckily we have the algorithm fully specified in the Unicode Annex to fall back on if needed.

It's really not great that the local machine's language(s) influence how text is laid out. I suspect Pango goes to a great deal of trouble because it wants to be useful in every situation, no matter how little information it has. It would be OK for Gio to require language hints from the programmer to properly support ambiguous scripts.

I agree. I went looking for this algorithm in Pango because it was not clear to me how it would be possible to provide the language to HarfBuzz at all, and I was rather surprised to discover how Pango does it. Altering the text shaper based on system settings seems unexpected and undesirable. There are certainly cases where the developer knows the language that a string is written in, so it would be nice to have a version of LayoutString and friends that allows the language to be specified. Getting this right will be tricky for strings that contain multiple languages, though. I can imagine that being done either by passing a slice of string offsets and languages, or by having the app developer call the layout functions for each substring (which would come close to bubbling responsibility for forming HarfBuzz runs up to the app).

In any case we'll still probably need a function that does best-effort guessing of languages, because there are a lot of cases in which the app also won't know what language a string is written in (particularly in the case of user input). In such cases, the most sensible approach might be to guess using the "representative language" approach in a more generous way than Pango. This would mean that we'd have failure cases like always guessing that Han characters are Chinese, thereby possibly rendering Japanese text incorrectly. These cases are more forgivable when the solution is for the app developer to explicitly specify the language with the other function.

I'm not sure of a better approach, because it seems that in the general case, using the HarfBuzz API essentially requires solving the Natural Language Processing problem, which will not be happening anytime soon (and certainly not at real-time speeds in any case).

My main worry is the hard C++ and Cgo dependency on HarfBuzz.

Yes, I share this worry. It seems to me that the best approach in the short term would be your first suggestion of a cgo package depending on HarfBuzz (which would need to be installed in the development environment at least) that could optionally be swapped out with the default shaper. This is something that would work today, but brings along all of the associated disadvantages. I don't know enough about the #38917 and .syso approaches to know if those would work, but they do sound preferable if they are possible. Depending on how long this issue remains open, it might make sense to start with the optional cgo package and then switch to / release another package using the more advanced methods if they become viable.

It seems to me that translating the HarfBuzz rules into Go would require the same infrastructure as embedding the rules in a Graphite font. I don't see how the rules could be reasonably extracted from HarfBuzz without a code analysis in either case, and so I'd lean toward the Go translation as a better option. This would also be preferable to using cgo in my opinion, if it is possible. After browsing the HarfBuzz sources for a while, I see that there are many table-based rules and code to generate these tables from Unicode-provided text files, but there is also a significant amount of logic in the code that gives me doubts about an automated translation. Still, it is worth investigating more deeply.

#146 Rendering and caret positioning for complex scripts 9 months ago

Ticket created by ~tainted-bit on ~eliasnaur/gio

With the fix for gio#104, we can now render glyphs for all languages. However, certain languages like Arabic are not rendered properly, to the degree that they are essentially still unsupported. This applies to all "complex scripts": writing systems that require advanced processing operations like context-dependent substitutions, context-dependent mark positioning, glyph-to-glyph joining, glyph reordering, or glyph stacking. The current font shaper also does not support bidirectional text. Because of this issue, users who only read/write languages with complex scripts are excluded from using Gio apps. This is a feature request to add full support for all languages supported by Unicode. Resolving these limitations will require significant changes to the font shaping system; supporting fonts with broad unicode coverage (gio#104) was a necessary but insufficient step.

Other GUI toolkits, such as Qt and GTK+, as well as browser rendering engines, all solve this problem using the same software stack, discussed below.

The first piece that we need is a system to convert Go strings (containing runes / unicode codepoints) into a set of glyphs from the font file and their positions. This is by far the most difficult step of proper text rendering, because this is where all of the complexities of human writing systems arise. There is exactly one open-source project that accomplishes this goal: HarfBuzz. At its core, HarfBuzz provides a function called hb_shape that is responsible for translating unicode strings into glyph sequences. This is the library that all of the GUI toolkits use.

It is always preferable to avoid cgo if possible. However, I believe that there is no way that Gio will be able to provide reasonable support for complex scripts without using a cgo wrapper around HarfBuzz. This is because the project is necessarily massive, is constantly receiving fixes and updates, and requires a large and active community to keep it running. These factors are extreme enough that there are essentially no viable alternatives to HarfBuzz in any programming language. Porting it or replicating only the necessary components in Go is a monumental task that is certainly beyond the development resources of Gio, and possibly even beyond the resources of the Go subcommunity that needs this support, and yet it is necessary for supporting complex scripts. Moreover, I believe it is in the best interest of the wider open-source community to concentrate such efforts in the HarfBuzz project. This is why the first step to resolving this issue is writing a cgo wrapper around HarfBuzz (which will be useful not just to Gio, but to other Go projects as well).

The output of a call to HarfBuzz hb_shape is a sequence of glyphs to render from the font. For each glyph, we are given the codepoint to look up in the font file, the X and Y advances (i.e., the values to add to the current rendering cursor position after drawing the glyph), and X and Y offsets (i.e., values to add to the current cursor position when placing the glyph, but these should not affect the cursor position). HarfBuzz also outputs a set of "clusters" for the string: this is the proper way to handle caret positioning and text selection in widget.Editor. The caret position should be based on clusters rather than bytes or runes, because fonts may merge multiple codepoints into individual glyphs.

HarfBuzz requires several pieces of information as input: the string to shape, the script (e.g., Cyrillic), the language (e.g., Russian), the direction (e.g., left-to-right), the font file, and the shaper model (e.g., OpenType shaper). All of this information must be constant for the whole string. This means that rendering something like an Arabic passage with a left-to-right English proper noun in the middle will require multiple calls to HarfBuzz. Additionally, toolkits like Gio want to expose a simple function like LayoutString from text.Shaper to the developer without requiring them to specify things like the language that the string is written in. This means that we need another component that takes an arbitrary string as input, and produces a sequence of HarfBuzz hb_shape calls as output. In other toolkits, this is the job of the Pango library from the GNOME project.

In contrast to HarfBuzz, the relevant algorithms from Pango are small and simple enough to rewrite in pure Go, either as part of Gio or as part of a separate project. Pango supports a lot of extra functionality that we do not need, and the pieces that we do need are buried within a lot of C code that is unnecessary in Go. To replicate the relevant Pango functionality (breaking a string into "runs" such that each "run" will be fed into a single HarfBuzz hb_shape call), we'll need to implement an algorithm like this:

  1. Break up the string into runs with the same text direction (either left-to-right or right-to-left) using the Unicode Bidirectional Algorithm. This algorithm is fully specified in Unicode Standard Annex #9. Strangely, although it is implemented in the Go extended library (golang.org/x/text/unicode/bidi), the code is in unexported functions, and the exported functions all panic with "unimplemented". Implementing this part will likely involve copying code from the unexported functions, combined with new code written to follow the Unicode Annex.
  2. Within each run, break the string into further runs based on when the script changes. Pango does this by simply checking the unicode ranges for each codepoint to figure out which script it belongs to, and starting a new run whenever that changes. The information in the Go standard library (the RangeTable values in unicode) is sufficient for this.
  3. For each run, we now have the direction and script. We now need the language. In general, there is no algorithm for this. Pango does its best with the following heuristic (which is clearly not great, but it is good enough to be adopted by all of the GUI toolkits):
       • Scan the languages installed on the system (e.g., specified in environment variables PANGO_LANGUAGE or LANGUAGE) in order of priority. For each language, check to see if the script can be used to write that language. If it can, then return that language.
       • Otherwise, look up a "representative language" for the script from a hard-coded table (using the pango_script_get_sample_language function). If a representative language is defined, then return it.
       • Otherwise, return the default language for the locale of the process (whether or not that makes sense).
  4. Set the shaper model to OpenType (we probably don't need to support other formats) and set the font file based on the current Gio theme. Note that for OpenType Collections like we used to fix gio#104, HarfBuzz requires a specific font index from the collection to use (it cannot scan the collection on its own). This means that we'll need to scan the collection to find the font index that supports the script for the run; we should probably scan the font data and cache this information when the collection is first loaded.

To summarize, my suggested path to supporting complex scripts in Gio is:

  1. Write a cgo wrapper around HarfBuzz.
  2. Implement the above algorithm to convert strings into runs.
  3. Update Gio's font rendering to: convert the string to runs, feed the runs into HarfBuzz, and then render the glyphs with the existing rendering system.
  4. Update the text as necessary to remove implicit shaping assumptions that are only valid for non-complex scripts.
  5. Update widget.Editor to base the caret position on clusters output by HarfBuzz, instead of runes.

#104 Support fonts with broad unicode coverage 9 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Thanks for your guidance with this patch series. As promised, I've uploaded the fonts example so that others can see the new feature in action: github.com/Nik-U/gioexamples/fonts.

Through this process I've learned that supporting all languages is more difficult than just loading a broad-coverage font collection and that fixing this issue was only the first step. I'll open a new issue shortly with what remains to be done and a potential path forward.

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Patch 11495 omits the sfnt.Buffer lock and includes a test with the aforementioned shapings. To determine whether or not two font shapes are equal, it compares the Ops as opaque byte slices. This testing method is very fast and does not require any rendering, but it breaks if the shaper produces non-deterministic Ops or if any of the operations use Refs. I think that these are reasonable assumptions, but I don't know much about the operations subsystem. The test uses two TTF files that I have prepared. As hoped, they are both very small: 1071 bytes combined. The tests also include a rudimentary OTC merging function, as discussed.

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Rendering text is great for a demo but overkill for a test. It seems to me a test that inspects the result of Layout (and/or Shape) for a string containing fallback character(s) would suffice.

This will mostly work, but the tricky part is testing that the replacement glyph comes from the first font in the collection. This is not visible in the []text.Line returned by Layout because the glyph rune will be the same. Distinguishing the replacement characters will require at least a call to Shape and potentially examining the returned op.CallOp to identify the differences.

Alternatively, we could just not test that part of the code and leave it up to Collection to decide where replacement characters come from. After all, if all goes well we'll almost never see them anyway. :)

I'd like to keep any checked-in TTF/OTC files tiny, < 1kb or even < 512 bytes like the other reference files in refs/. I'm ok with the test relying on the Go fonts not changing, and I'm also ok with non-general OTC merging code (since it's only for the test).

I'm pretty sure that sizes like those are attainable with pyftsubset and potentially some other tools / manual massaging, but I won't know the final size for sure until I try. The plan is to have two TTFs that each contain two glyphs (replacement glyph and one identifiable symbol) and nothing else.

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

If it's alright with you, I'd like to avoid the mutex for now, deferring optimization to when it has a bigger impact.

Alright. Perhaps a short comment should be left around the sfnt.Buffer declaration to discourage optimization before the time is right?

I see your point. Let's keep the cool demo in a separate repository (yours?) and I'll link to that from the newsletter and such.

Sure. I'll make a repository for it somewhere and post the link here once the patch is merged.

Without an example, is there a way to add a test that doesn't need large dependencies? I'd like for the new feature not to bit-rot in the future.

That makes sense. My thought is to add internal/rendertest/shaper_test.go with TestCollectionAsFace to do some black-box testing. This would generate an OTC of two fonts, then render the following:

  1. Using font 1, a valid glyph in font 1.
  2. Using font 1, an invalid glyph in font 1 that is also invalid in font 2.
  3. Using font 2, a valid glyph in font 2 that is invalid in font 1.
  4. Using font 2, the same invalid glyph from render 2.
  5. Using the OTC, the glyph from render 1.
  6. Using the OTC, the glyph from render 3.
  7. Using the OTC, the glyph from render 2 / 4.

The tests are then as follows:

  • Renders 1, 2, 3, and 4 are distinct.
  • Render 5 == render 1.
  • Render 6 == render 3.
  • Render 7 == render 2.

The main question is which fonts to use for the test, and how to make the OTC. Right now I'm leaning towards storing sample TTFs and the merged OTC as raw files in the repository (probably under internal/rendertest/refs/ ?). This would avoid the test becoming brittle if the Go fonts change (although admittedly that is unlikely, given that the last update was 3 years ago) and also avoid dependency on Roboto (which is not currently used elsewhere). It also avoids the need to use OTCMerge as a dependency or to include rudimentary OTC merging code (which is admittedly only a few lines if generality is not required).

Does that approach seem sensible to you?

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

I can certainly add an example for v3 of this patch. However, if it uses the actual Noto OTCs, that will add a 120 MB download as a dependency. For that reason I'm thinking of adding it as a nested module instead of directly in the gioui.org/example module, if that's okay with you.

I've submitted an example program in patch 11458. It doesn't compile without the opentype.Collection patch applied first, of course.

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Until benchmarks show a definite performance improvement, I prefer the simple approach of just not sharing buffers.

BenchmarkDrawUI in internal/rendertest is a good place to benchmark the approaches, since it calls Layout and Shape on an opentype.Font, but not concurrently (so it is similar to how a single window performs). The lock-vs-no-lock approach we're talking about here is going to be a tiny cost for most applications for the same reason that lock contention would be very low: text.faceCache will cache the layout and paths for most text drawing calls. Consequently the approach we take mainly affects GUIs that make poor use of the cache, such as a program that changes the text it draws very rapidly (e.g., a stopwatch GUI), or a GUI that is being resized by the user. To simulate this, I modified the drawText function in internal/rendertest/bench_test.go to bypass the cache so that it could test the performance of Layout and Shape:

txt.Text = textRows[x] + strconv.Itoa(rand.Int())

Here are the results of 3 runs with -test.benchtime=5s on my i7-6700K:

This patch (using sync.Mutex):

BenchmarkDrawUI-8   	     399	  14885780 ns/op	 5600746 B/op	    1571 allocs/op
BenchmarkDrawUI-8   	     398	  15011455 ns/op	 5601157 B/op	    1571 allocs/op
BenchmarkDrawUI-8   	     399	  15005305 ns/op	 5600769 B/op	    1571 allocs/op

7bbe0da (not sharing sfnt.Buffer):

BenchmarkDrawUI-8   	     400	  15011438 ns/op	 5794994 B/op	    1891 allocs/op
BenchmarkDrawUI-8   	     398	  14971487 ns/op	 5795127 B/op	    1891 allocs/op
BenchmarkDrawUI-8   	     399	  15030301 ns/op	 5794782 B/op	    1891 allocs/op

So in this degenerate case of constantly changing text, both approaches have roughly the same CPU time, but reusing sfnt.Buffer saves many allocs, as expected. The amount that it saves will be proportional to the number of uncached calls to text.Font. The difference between the approaches will be negligible for most GUIs (where the code is rarely called), but measurable for others. So the sync.Mutex approach has superior or equal performance, but slightly more code complexity. Personally I think that using a sync.Mutex and reusing the buffers is simple enough to justify this performance improvement, but I can see why you might disagree.

I would love an example/fonts example showing off the new feature. Can you add it? I believe you can reuse example/hello.

I can certainly add an example for v3 of this patch. However, if it uses the actual Noto OTCs, that will add a 120 MB download as a dependency. For that reason I'm thinking of adding it as a nested module instead of directly in the gioui.org/example module, if that's okay with you.

#104 Support fonts with broad unicode coverage 10 months ago

Comment by ~tainted-bit on ~eliasnaur/gio

Patch 11445 contains the revised code. I was initially confused about why it was okay for opentype.Font to use an sfnt.Buffer in multi-window settings when it was not okay for opentype.Collection to do the same. It turns out that it's not okay for either: there is an existing race condition in opentype.Font. It is triggered very rarely, because it only occurs when multiple windows are rendering strings using the same font simultaneously. Moreover, clobbering the sfnt.Buffer may not actually be noticeable, depending on what the sfnt package does with it. In any case, this patch also fixes the existing race condition by adding a lock around the buffers for both Font and Collection. The locks will have very low contention, so the fast-path mutex overhead is almost certainly worth paying in order to keep the buffers around and avoid GC pressure.

I've also packaged the Noto font family in public repositories under github.com/gonoto. This makes it easy to test this patch with a simple change to example/hello:

otc, _ := opentype.ParseCollection(notosans.OTC())
th := material.NewTheme([]text.FontFace{{Face: otc}})