Copyright 1999 The Perl Journal. Reprinted with permission.

=head1 Braille Contractions and Regular Expressions

by Sean M. Burke with Sheri Wells-Jensen

This article is about how I used regular expressions to solve a particular problem: how to take common English text and turn it into Braille, correctly using all the possible shorthand-like contractions and abbreviations that Braille provides. This is basically a case study in how a messy problem got solved straightaway with Perl -- but along the way I'll mention some of the new features to be found in Perl 5.005's regular expressions, which I used; and in the end, I find a surprising commonality between regexps and natural language writing systems.

=head2 Braille, and Braille Contractions

When I was a little kid, I had a children's book about Helen Keller. I don't remember reading it; I just remember the back cover, which had the Braille alphabet printed on it -- well, embossed, actually. They had the Roman letter "a" in ink, and then below that the embossed dot pattern for Braille "a", and so on up to "z". So I got the idea that Braille printing is just like a letter-for-letter substitution cipher for the Roman alphabet. Then I started noticing on men's room doors that below "MEN" in big Roman letters, there'd be presumably the same word in Braille -- but sometimes the word would have three Braille characters, and sometimes just two. And I<that> I found perplexing. I couldn't imagine how the word "men" could end up only two characters long. So I asked my friend Sheri, who's been reading and writing Braille since she was a kid, how "men" could be just two characters long. She explained that Braille has contractions: the first character in "men" is "m", and the second character is a contraction for "en".
(For the rest of this article, I'll use wedges to denote contractions, so "<en>" will mean the I<single> Braille character {{ Footnote: Braille readers typically use the term "sign" to mean a Braille character. }} that is a contraction for the two Roman letters "en".) Moreover, some contractions are context-specific -- there's a contraction for "com", but it can apply only at the start of words, as in "computer". These shorthand-like contractions make reading Braille faster, and they make books in Braille less expensive and less bulky. There are contractions for "en", "st", "ea", "er", "the" (the letter sequence, not just the word "the"), and lots more.

It occurred to me that there were cases where I could imagine more than one way to apply contractions. Consider the two ways to encode the word "leather":

  l e a t h e r
  .  \_/ \_/ \_/    -- contract "ther" as "<th><er>"

or

  l e a t h e r
  .  \_/ \___/ .    -- contract "ther" as "<the>r"

In each case, you end up with a word four characters long, so brevity can't be a tie-breaker. I asked Sheri whether both were acceptable spellings -- she said no; the second spelling, with "<the>r", is the only correct way to encode "leather". This reminded me of English hyphenation. For example, you I<could> hyphenate "boggling" as "bogg-ling" (sort of like "gos-ling", maybe), but you could also hyphenate it as "bog-gling", and that seems the I<better> way to do it, so that gets to be the I<correct> way. So I was curious how this would be implemented as an encoding algorithm. Sheri explained that programs do exist for taking conventional English text, and applying all the contractions possible, in the right places -- what I'll just call "Braille encoding". Braille encoding programs can be embedded in tactile displays for computers, so that when I send Sheri email (unencoded), it shows on her display in encoded Braille.
Or a Braille encoder can be a program that encodes a whole file at a time, typically for getting an unencoded text file ready for sending to a Braille embosser, a kind of impact printer. One encoder program is from the National Federation of the Blind, and is called NFBTRANS. {{ Footnote: NFBTRANS is by no means the only such program around. Another widely used program is DBT, the Duxbury Braille Translator. DBT, however, is a commercial product, with no source available. NFBTRANS is free, and comes with source. }} I hunted down a copy of NFBTRANS, and looked at the source code that came with it, and discovered that it was not just a Braille encoder, but that it also had many features like those of a typesetting system like troff. It was about seven thousand lines of C, and in the guts of it, I did manage to find the encoding routines -- and I couldn't make heads or tails of the source for them. NFBTRANS reads the list of contractions from an external data file that comes with the program, but I couldn't figure out exactly how the rules were applied -- namely, in what order or priority. In other words, the rules file would include rules like:

  "the" contracts in any context
  "er" contracts in any context
  "th" contracts in any context

...but how or why would NFBTRANS manage to always contract the end of the word "leather" to "<the>r" and not "<th><er>"? Were there other "-ther-" words that could get "<th><er>" instead of "<the>r"? To fill in the blanks in my understanding of Braille, and to try my hand at modeling a linguistic system in Perl, I decided to try to develop an algorithm in Perl to correctly apply these rules, and to check them against NFBTRANS's output.

=head2 Kinds of Linguistic Rule Systems

I'd had some experience with formal linguistic models of the subsystems that make up natural languages.
Every time I'd seen a linguist create a formal model of a subsystem of a language (whether English phonology, or Hindi syntax, or something less abstract, like hyphenation), the rules that model the system have been in one of the two kinds of frameworks I'll explain below -- either the kind I call a "generativity system", or the kind I call an "optimality system". So I expected Braille encoding to be one of these two sorts of rule systems, and the first task was to figure out which it was.

=head2 Generativity Systems

In a generativity system, there's a list of rules, to be applied in a particular order. Linguists often use this kind of rule system to model phonology, the study of the way the sounds of language are represented in the brain. Phonological models assume that words are stored in a somewhat simplified, abstracted form, and that to work out the details of how these are actually pronounced, we apply a series of rules. For example, a phonological model of my dialect of American English would assume that the "long i" sound is always stored as simply that. But I, like most Americans, pronounce it differently in "kite" than in "fly". To account for this difference, phonological models say that this sound starts out as just itself (here written "/aj/"), but that rules may apply to change it in some contexts. This isn't just about vowels -- the two different-sounding t's in "write" and in "writing" supposedly both start out as just /t/, but a rule happens to change the latter one to sound like a /d/. The rules describing these two phonological changes could look like:

* the diphthong /aj/ (as in "fly") changes to /Uj/ (as in "kite") when it's before /p/, /t/, or /k/.

* the consonant /t/ changes to /d/ when it's between two vowels.
These rules, when applied in that order, are supposed to take words in a representation that doesn't reflect the finer distinctions (like the difference between /aj/ and /Uj/), and change those representations, to flesh them out with just those distinctions. Now, translating rules like these into regular expressions used to be hard, but now, with Perl 5.005 regexps, it's a snap, with "lookbehind" and "lookahead":

  s/aj          # target : the vowel aj
    (?=[ptk])   # following context : p, t, or k
   /Uj/gx;

  s/(?<=[aeiouIUj])   # preceding context : any vowel
    t                 # target : t
    (?=[aeiouIUj])    # following context : any vowel
   /d/gx;

What this gets you over /aj[ptk]/ or /[aeiouIUj]t[aeiouIUj]/ is that with lookbehind (as in "(?<=[aeiouIUj])") or lookahead (as in "(?=[aeiouIUj])"), what matches the lookahead and lookbehind parts of the regexp isn't part of what gets matched by the regexp as a whole. That is, text matching the lookbehind expression is free to precede the "cursor" (the part of the regexp mechanism that pos() reports the position of), and text matching the lookahead expression doesn't move the cursor forward, as normal matching does. (Also, the lookbehind text doesn't end up in C<$&>, which you probably know as the "what matched?" regexp variable, but which you can better think of as "now that we've matched something, what was between where the point started, and where it ended up?".)

Now, the above phonetic rules are greatly simplified for sake of discussion, but you can see their application to "write + ing", starting from its simplified abstract form /rajt + IN/:

  $_ = 'rajtIN';
  print "Input to system: $_\n";
  s/aj (?=[ptk]) /Uj/gx
    && print "R1 applies. Output: $_\n";
  s/(?<=[aeiouIUj]) t (?=[aeiouIUj]) /d/gx
    && print "R2 applies. Output: $_\n";
  print "Output of system: $_\n";

This prints:

  Input to system: rajtIN
  R1 applies. Output: rUjtIN
  R2 applies. Output: rUjdIN
  Output of system: rUjdIN

And this gives the correct pronunciation for "writing" in my dialect, /rUjdIN/. Change the first line to "rajdIN" ("riding"), and neither rule applies, and you get "rajdIN" out. (So, yes, in my dialect, "writing" and "riding" sound different.) And, importantly, if you swap the rules so that R2 applies I<before> R1, you get:

  Input to system: rajtIN
  R2 applies. Output: rajdIN
  Output of system: rajdIN

(So, if I<your> dialect has "writing" and "riding" sounding the same, it might be that rule-ordering is the only difference between your dialect and mine, on this point.) The ordering of the rules in generativity systems is crucial; if you have the right rules in the wrong order, you get the wrong answer. If Braille encoding were a generativity system, I'd need to figure out how to order the rules from the NFBTRANS data table.

=head2 Optimality Systems

If a generativity system is one that gives rules that get you from input to correct output, then an optimality system is one that takes all kinds of even remotely conceivable possible output, and ranks them in order of desirability ("optimality"); then the highest-ranked one is the "correct form" that's the output of the system. (And if it seems to you that generativity is like imperative programming, and optimality is like logical programming -- à la Prolog -- then you're basically right.) The algorithms I've seen that implement English hyphenation are basically optimality systems. I encourage interested readers to look at the hyphenation algorithm in TeX (which you can see reimplemented in Perl in CPAN's TeX::Hyphen), but for sake of discussion, suppose you can model English hyphenation with these rules for ranking candidate forms:

* Hyphenating between consonant letters is good. (As in "gos-ling".)

* Hyphenating between a double consonant is good. (As in "bit-ter".)

* Hyphenating between a consonant and a vowel is bad. (As in "gosl-ing".)
* If hyphenating leaves a word-fragment of just consonants, that's I<really> bad. (As in "g-osling" or "gosli-ng".)

In Perl, you could implement this as:

  use strict;
  my $in = 'boggling';
  my $best = $in;
  my $best_score = 0;
  my $Cons = 'bcdfghjklmnpqrstvwxz';
  my $Vowel = 'aeiouy';
  foreach my $i (1 .. (length($in) - 1)) {
    $_ = $in;
    my $score = 0;
    substr($_, $i, 0) = '-';
    ++$score if /[$Cons]-[$Cons]/oi;
    ++$score if /([$Cons])-\1/oi;
    --$score if /[$Cons]-[$Vowel]/oi;
    $score -= 10 if /^[$Cons]+-/oi || /-[$Cons]+$/oi;
    print " \"$_\" : score: $score\n";
    if($score > $best_score) { $best_score = $score; $best = $_; }
  }
  print "Best: \"$best\" with score of $best_score\n";

The output of this is:

   "b-oggling" : score: -11
   "bo-ggling" : score: 0
   "bog-gling" : score: 2
   "bogg-ling" : score: 1
   "boggl-ing" : score: -1
   "boggli-ng" : score: -10
   "bogglin-g" : score: -9
  Best: "bog-gling" with score of 2

...so these rules seem to work right -- the highest-ranked form ("bog-gling") is the best; and incidentally, the second best ("bogg-ling") is not too bad, and from there on out it's all quite bad ("bo-ggling", "boggli-ng", etc.). Note that it doesn't matter here what I<order> you apply the rules in; but it does matter what I<weight> gets attached to each rule. In the above example, I've kept it simple, but suppose we now add a rule that means "the closer to the middle of the word you hyphenate, the better", such as:

  $score += .5 * (length($in) - abs($i - length($in) / 2));

If we leave that weighting constant at .5, you still get "bog-gling" coming out on top. Change it to a 1, and we have a tie between "bog-gling" and "bogg-ling", since the point that "bog-gling" gets for hyphenating between a double consonant is offset by the point that it loses for the hyphen not being at the exact middle of the word. And if we change the constant to 1.5, we get "bogg-ling" coming out on top.
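To see the effect of that weighting constant without editing the program each time, here's a small variation (my own sketch, not part of the original scorer) that takes the constant as a parameter and returns every candidate tied for the top score -- so the 0.5 / 1 / 1.5 behavior described above can be checked directly:

```perl
use strict;
use warnings;

# A parameterized version of the hyphenation scorer above (my own
# sketch): same four ranking rules, plus the midpoint-weighting rule
# with its constant passed in as an argument.
my $Cons  = 'bcdfghjklmnpqrstvwxz';
my $Vowel = 'aeiouy';

sub best_hyphenations {
    my ($in, $mid_weight) = @_;
    my %score_for;
    foreach my $i (1 .. length($in) - 1) {
        local $_ = $in;
        substr($_, $i, 0) = '-';
        my $score = 0;
        ++$score if /[$Cons]-[$Cons]/i;       # between consonants: good
        ++$score if /([$Cons])-\1/i;          # between a double consonant: good
        --$score if /[$Cons]-[$Vowel]/i;      # consonant-vowel: bad
        $score -= 10 if /^[$Cons]+-/i || /-[$Cons]+$/i;  # all-consonant fragment: very bad
        # "the closer to the middle of the word, the better":
        $score += $mid_weight * (length($in) - abs($i - length($in) / 2));
        $score_for{$_} = $score;
    }
    my ($top) = sort { $b <=> $a } values %score_for;
    # Return every candidate tied for the top score.
    return sort grep { $score_for{$_} == $top } keys %score_for;
}

print join(', ', best_hyphenations('boggling', 0.5)), "\n";  # bog-gling
print join(', ', best_hyphenations('boggling', 1)),   "\n";  # bog-gling, bogg-ling
print join(', ', best_hyphenations('boggling', 1.5)), "\n";  # bogg-ling
```

With the constant at 1, "bog-gling" (base score 2, midpoint bonus 7) and "bogg-ling" (base score 1, midpoint bonus 8) both come out at 9 -- the tie described above.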
If NFBTRANS's rules somehow interacted with an optimality system with ranking-rules like "give K points for every letter this contraction saves", or possibly "give (or subtract?) L points if this word ends up with contractions next to each other", or "subtract a half-point if the word ends in a contraction", then I'd need to first puzzle out what these ranking-rules were, and then figure out what their values for K and L were.

=head2 Regexp Replacement as a First Hack

I did intuitively feel that Braille encoding somehow had a lot in common with hyphenation (except that it was about contracting letters instead of sticking hyphens in between them), suggesting that optimality was at least part of the system. On the other hand, optimality systems often do have some generative component to them (since you need to generate I<some> candidates to apply the ranking-rules to), so I figured that whether the real answer would eventually be in a generativity system or an optimality system, I'd probably have to work up something generativity-based. So I decided to take the rules from NFBTRANS's rules file, and cook them up into a regular expression, and use that to perform all the substitutions in one pass with a single C<s///eg> replacement, an approach I borrowed from Kevin Lenzo's article in TPJ #12. I figured that this approach would by no means behave correctly, but that the cases where it didn't behave correctly would give me some kind of hint as to whether a generativity system or an optimality system was called for, and then from there I could start worrying about the ordering or weighting of rules. Granted, next to either generativity systems or optimality systems, a big one-pass regular expression replacement seems pretty strange, but Perl makes regexps so easy that it seemed the path of least resistance for a first hack.
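Here's a minimal sketch of that first hack (my own toy rule set of six contractions -- the parenthesized replacements like "(the)" are placeholders standing in for the single Braille characters written as wedges in this article; the real table comes from NFBTRANS's rules file):

```perl
use strict;
use warnings;

# Toy contraction table; "(the)" etc. are placeholders for single
# Braille characters (assumption for illustration -- not NFBTRANS data).
my %contraction_for = (
    'the' => '(the)',
    'ear' => 'e(ar)',
    'ea'  => '(ea)',
    'er'  => '(er)',
    'th'  => '(th)',
    'ar'  => '(ar)',
);

# Longest literals first, so "the" gets a chance before "th",
# and "ear" before "ea".
my $alternation = join '|',
    sort { length($b) <=> length($a) } keys %contraction_for;
my $re = qr/($alternation)/;

sub encode_word {
    my $word = shift;
    $word =~ s/$re/$contraction_for{$1}/eg;  # the whole job, in one pass
    return $word;
}

print encode_word('leather'), "\n";  # l(ea)(the)r
print encode_word('heart'),   "\n";  # he(ar)t
```

Even this toy version gets "leather" right -- it produces the "<ea>" + "<the>" spelling, not "<th>" + "<er>" -- purely because of how a one-pass, leftmost, first-alternate-wins scan works.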
=head2 Contexts in Regular Expressions

Suppose the rule file consists of:

  "the" contracts to "<the>" in any context
  "er" contracts to "<er>" in any context
  "th" contracts to "<th>" in any context
  "ea" contracts to "<ea>" only in the middle of words
  "ar" contracts to "<ar>" in any context
  "ear" contracts to "e<ar>" in any context

Now, a regexp for "'the' in any context" would be a simple C<the>. But how to express "'ea' only in the middle of words"? You could do it with something like C<\wea\w> to assert that there be a word character before and after the "ea", but Perl already provides a metacharacter for matching a word boundary (C<\b>), or the absence of one (C<\B>). In other words, what's needed here is the absence of word boundaries before and after the "ea", and you can do that with simply C<\Bea\B>. Translating the above mini-list of Braille contraction rules into a regexp gives:

  %contraction_for = (
    'the' => '<the>',  # Remember: I'm using "<the>" here
                       # in place of the Braille ASCII, or Unicode,
                       # code for the single Braille character
                       # that contracts "the"
    'er'  => '<er>',
    'th'  => '<th>',
    'ea'  => '<ea>',
    'ar'  => '<ar>',
    'ear' => 'e<ar>',
  );
  s/(the|ear|ar|\Bea\B|er|th)/$contraction_for{$1}/eg;

Now, notice that I moved the longer strings to the start of the regexp. This is a crucial point in understanding Perl's implementation of regular expressions: given alternates, Perl will match the first one it can match, regardless of whether there may be other, later alternates that are longer. So if "C<th>" came before "C<the>" in that regexp, the "C<th>" would always match, never letting "C<the>" match. Moreover, if there were a "C<\Bea\B>" and an "C<ea>" in the regexp, and the "C<ea>" came first, it would block the "C<\Bea\B>" from ever matching. So, in producing this regular expression, I had to make sure that the alternates started out with the longest match strings, in the most specific contexts (like "C<\Bfoo\B>"), and worked their way down to the shortest match strings, in the most general contexts (like just plain "C<foo>").
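That first-alternate-wins behavior is easy to demonstrate in a couple of lines (a sketch, using "then" as the test string):

```perl
use strict;
use warnings;

# Perl tries alternates left to right and takes the first one that can
# match at a given position -- not the longest one.
my $word = 'then';

my ($short_first) = $word =~ /(th|the)/;  # "th" is tried first, and matches
my ($long_first)  = $word =~ /(the|th)/;  # "the" is tried first, and matches

print "$short_first\n";  # prints "th"
print "$long_first\n";   # prints "the"
```

So the same string matched against the same two alternates captures a different substring depending purely on the order of the alternates.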
This was a simple matter of a Schwartzian transform, where the sorter function looked like:

  ... sort {
        length($b->[0]) <=> length($a->[0])
          # sort first by length of the literal
        or
        $b->[1] <=> $a->[1]
          # then by context "precedence", where "\B_" gets
          # a higher "precedence" score than plain "_"
      } ...

By this time, my code was about 180 lines for reading the contraction rules file and making a regexp out of it, and only about 20 lines to perform the contractions, basically as a wrapper around this:

  $word =~ s/($my_re)/$contraction_for{$1}/oeg;

=head2 Embedding Code in Regular Expressions

I'd forgotten about one thing -- there could be rules that say something like:

  "foo" contracts to "<1>" at word-start
  "foo" contracts to "<2>" elsewhere

Now, you could model this pair of (fictitious!) rules with a regexp that contains C<(...|\bfoo|foo|...)>, but then whether C<\bfoo> or C<foo> matches, you end up simply with C<$1> holding C<"foo">. In other words, while you can use "C<\b>" or real lookbehind/lookahead to select for certain contexts, you don't end up knowing which context actually matched. However, a new feature in Perl 5.005 is exactly what I needed here. The feature is still somewhat experimental, and a full discussion of it would require an article of its own; but to make a long story short, having C<(?{ CODE })> in a regular expression will cause I<CODE> (any Perl code) to be evaluated whenever the regular expression engine hits that point in trying to match the regular expression.
So, to differentiate between "C<\bfoo>" and "C<foo>" matching, I add to the regexp a bit of code to be executed at the end of each:

  $word =~ s/
    (
      \bfoo (?{ $x = 1 })
      |
      foo   (?{ $x = 2 })
    )
  /$contract[$x]{$1}/xeg;

Here, if "C<\bfoo>" is the bit of the regexp that matches, the last thing that the match on it does is to execute the code C<$x=1> (I'm just using $x as a flag), so then the string that got matched (say, the "foo" in "foolish") gets replaced with the value of C<$contract[1]{'foo'}>. But if it's the second bit of the regexp ("C<foo>") that matches (as in the "foo" in "buffoon"), C<$x=2> will get executed, and so "foo" will get replaced with the value of C<$contract[2]{'foo'}>. This array-of-hashes C<@contract> (accessed like C<$contract[I<context>]{I<literal>}>) will have been filled by the same part of the program that read the rules in and created the regular expression.

=head2 Rules as Exceptions

Most linguistic models deal with exceptional cases (like, say, irregular verbs) by simply exempting certain forms from the rules. However, NFBTRANS's rules table knows only rules and more rules, where more specific rules stop more general rules from applying. For example, given:

  "the" contracts to "<the>" in any context
  "er" contracts to "<er>" in any context
  "th" contracts to "<th>" in any context
  "ea" contracts to "<ea>" in the middle of words
  "ar" contracts to "<ar>" in any context
  "ear" contracts to "e<ar>" in any context

the rule that replaces "the" with the single character "<the>" blocks the more general rule of replacing "th" with "<th>" from applying. Likewise, the rule that targets "ear" blocks the rule that targets "ea" from applying -- which is why "heart" contracts to "he<ar>t", not "h<ea>rt". Now, for some reason, it's incorrect to contract the "ear" in "tearoom" at all. This I<could> be modelled by saying that "tearoom" is on a list of exception words that are exempt from contractions, but then you have I<two> kinds of data in the system -- the rules table, and the list of exceptions.
The way NFBTRANS models this is to simply add another rule:

  "tearoom" contracts to "tearoom" in any context

A rule that replaces a word with itself seems entirely pointless, but the point of it is to block the more general rules. {{ Footnote: Actually, the rule says "replace 'tearo' with 'tearo' in any context", not specifying the whole word -- but I think the people who thought up that rule thought of "tearoom" as the only word it could apply to, and were not aware of the rarer "-tearo-" words: "Aotearoa" or "stearolactone". }} The NFBTRANS rules file basically consists of:

* a few dozen general rules that implement the regular contractions, like "ea",

* a few dozen more rules that implement more specific (yet still pretty general) contractions like "ear",

* about a hundred rules for whole-word abbreviations like "qk" for "quick", and then

* about a I<thousand> rules that are there simply to block the more general rules.

The above "tearoom" rule is an example that happens to block I<all> contractions for that word, but the average rule is meant to stop only I<some> of the contractions from applying. For example, applying the normal contraction rules to "pineapple" would contract it as "p<in><ea>pple". However, this spelling is considered incorrect, so there's a rule specifically to fix this with the correct spelling:

  "pineapple" contracts to "p<in>eapple" in any context

And so on for "phoenix" (to block the "en" from contracting), "coworker" (to block the "ow" from contracting, lest they become cow orkers), and all the way up to the longest rule, which replaces "psychedelic" with "psy<ch>edelic" (blocking the "<ed>"). Incidentally, as with exceptions in hyphenation, most of these exceptions to the normal Braille contraction rules are motivated by wanting not to contract (or not to hyphenate) between letters that belong to different syllables or to different morphemes (roots, prefixes, or suffixes, as in "tea-room", "co-worker", etc.).
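You can watch a self-replacing rule do its blocking in miniature (again a sketch, with toy rules and parenthesized placeholders for the Braille characters): because the alternates are sorted longest-first, "tearo" matches before "ear" ever gets a chance, replaces itself with itself, and the scan moves on past it.

```perl
use strict;
use warnings;

# Build an encoder closure from a rule table, longest literals first --
# the same one-pass approach as earlier, with a toy rule set (placeholders,
# not real NFBTRANS data).
sub make_encoder {
    my %rule = @_;
    my $alternation = join '|',
        sort { length($b) <=> length($a) } keys %rule;
    return sub {
        my $word = shift;
        $word =~ s/($alternation)/$rule{$1}/eg;
        return $word;
    };
}

my %general = ('ear' => 'e(ar)', 'ea' => '(ea)', 'ar' => '(ar)');

my $without_block = make_encoder(%general);
my $with_block    = make_encoder(%general, 'tearo' => 'tearo');

print $without_block->('tearoom'), "\n";  # te(ar)oom
print $with_block->('tearoom'),    "\n";  # tearoom -- left alone
```

So the "exception" needs no separate mechanism at all: it's just one more alternate, and the longest-first ordering does the rest.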
But the "ea" in "create" I<does> contract to the single "<ea>" character, even though it's clearly across two syllables. Since the conventions for Braille contractions evolved naturally -- just as general English spelling did -- it's full of patterns, and messy exceptions, and messy exceptions to the messy exceptions.

=head2 Testing It

When I wrote the code that generated the big regular expression to match everything that could be contracted, I initially fed it just a few rules, then a few dozen, and when I did feed it the whole rule file, it ended up making a single regular expression 14KB long. The longest regexp I'd ever heard of anyone ever compiling was the 6.5KB regexp for matching RFC822-valid email addresses (in Appendix B of Jeffrey Friedl's I<Mastering Regular Expressions>), and this was more than twice as long -- worrisomely long. I anticipated a screaming "OUT OF MEMORY" error, or that the regexp would take two minutes of swapping to compile. But it compiled imperceptibly fast, and without errors. I took the "UNIX wordlist" (a motley wordlist of 25,000 English words, one per line), and encoded it with a copy of NFBTRANS, so that I could compare my algorithm's output to it. I wrote a small program to use my algorithm to encode the wordlist, print all cases where my algorithm's output disagreed with the output of the inscrutable NFBTRANS algorithm, and then report my algorithm's accuracy. Guessing wildly, I expected no better than 50% similarity between the output of my algorithm and the output of NFBTRANS.

=head2 It's Alive -- Alive!

My first-hack one-pass regexp algorithm turned out to agree with NFBTRANS 99.59% of the time. I was boggled that it was over 50%, much less over 99%. Most of the cases of disagreement were words like "R&D" or "O'Leary" -- since this was a first hack, I had not bothered to properly deal with words with internal punctuation or complex capitalization, which require somewhat special treatment in Braille encoding.
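The comparison program boils down to something like this sketch (the two word lists here are toy stand-ins with placeholder encodings; the real run compared the two encoders' output over the whole 25,000-word list):

```perl
use strict;
use warnings;

# Given parallel lists of my encoder's output and NFBTRANS's output for
# the same word list, collect the disagreements and compute percent
# agreement. (The sample data below is invented for illustration.)
sub agreement {
    my ($mine, $theirs) = @_;  # two refs to parallel arrays
    my $same = 0;
    my @disagreements;
    for my $i (0 .. $#$mine) {
        if ($mine->[$i] eq $theirs->[$i]) {
            ++$same;
        } else {
            push @disagreements, "$mine->[$i] vs $theirs->[$i]";
        }
    }
    return ($same / @$mine * 100, \@disagreements);
}

my @mine   = ('l(ea)(the)r', 'he(ar)t', 'r&d');
my @theirs = ('l(ea)(the)r', 'he(ar)t', ',r&,d');  # capitalization handled differently

my ($percent, $diffs) = agreement(\@mine, \@theirs);
printf "%.2f%% agreement\n", $percent;
print "  $_\n" for @$diffs;
```

Run over the real wordlist, the reported figure was the 99.59% mentioned above, with the disagreement list dominated by exactly these punctuation-and-capitalization words.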
The remaining dozen words where my algorithm encoded differently than NFBTRANS were because I'd misimplemented the regexp context for one of the more obscure rules. (And incidentally, I've only implemented my encoder for words I<in isolation> -- in encoding of actual sentences, there's punctuation to deal with; and a few Braille contractions in words are sensitive to the context of the word, like whether there's punctuation after the word, which makes things a bit messier.) This one-pass regexp approach was meant to be just a stab in the dark, and I expected to use the list of its failures to refine the algorithm -- say, developing a generativity system that would make one pass to handle word-initial contractions, then another to handle word-final contractions, then a final pass for everything left over in between. Or maybe the answer would lie in an optimality system that would consider all I<possible> ways to contract a word, and select the one that had the fewest characters, maybe the fewest dots total, maybe the most contractions closest to the beginning or end of the word, or the like. But it was nothing so complicated -- this big regexp, in one pass, did the whole job. Not only did this seem much too easy (although this is a not unfamiliar sensation in programming Perl), but it was quite unprecedented -- I'd never heard of a phenomenon in natural language that could be treated with a regular expression. I'd always considered regular expressions to be essentially abstract and mathematical. After all, the formalism of regular expressions was devised just this century, coming out of the work of mathematician Stephen Kleene; and the workings of the regexp engine (as in Mark-Jason Dominus's article in TPJ #9) are explained in terms of automata.
And whatever an automaton is (a big scary robot?), it sure doesn't sound like anything to do with something that people do unconsciously when they write Braille. (And, incidentally, people have been writing English in Braille this way for decades longer than regexps have been around.) But in spite of the very formalist origins of regular expressions, I think that the basic mechanism underlying regexp replacement -- scanning left to right, matching as much as you can, context permitting -- is not so unnatural and abstract. That's what we do when we use systems of contraction or abbreviation, whether in Braille, Gregg shorthand, or our own idiosyncratic abbreviations: we start at the beginning of the word (as opposed to inking in the end, and working backwards from there), and we abbreviate as much, and as early, as possible. Granted, Perl regexps have plenty of features that have no parallel in this process of Braille contraction (backtracking, notably). But there's the commonality, in that what's called "greedy matching" for regexps is effectively the same as what we do when we abbreviate or contract, because of a mix of economy (wanting to save space) and laziness (not having to operate on the whole word at a time, or to consider lots of alternatives, but just contracting in one pass, as you write, starting at the beginning of the word). Perl programmers know from experience that regexps are immensely useful, and my experience with Braille encoding is just another case of that. But the fact that regexps have a basic commonality with a linguistic phenomenon like Braille encoding suggests that not only are regexps a useful tool, but that they're also linguistically natural and, hopefully, more learnable and valuable because of it.

__END__

Sean M. Burke (C) likes making linguists think he's a programmer, and programmers think he's a linguist.
Sheri Wells-Jensen (C) is a linguist who used to be a programmer, way back when everyone thought FORTH was a pretty neat idea.

(Sean adds: thanks to Tack-Tiles (tack-tiles.com) for lending me a set of the Braille blocks they make, and thanks to David Ondrik for taking the picture of them. And I'd also like to thank the conferees at this year's YAPC for their encouraging response to my talk on Braille encoding with regular expressions.)

The final Braille-encoding code is available, along with some other Braille links, at C

----

Caption for the photo:

Alphabet blocks for Braille! The text spelled out is: "The PerlEnumber>5 pilEer> r lEin>e 850 diEed>." [I think that's what it says... I'll have to check.]

----

Text for the box:

=head2 Sample Text in Braille

The first paragraph of Chapter 3 of Sinclair Lewis's I<Babbitt>:

It was the best of nationally advertised and quantitatively produced alarm-clocks, with all modern attachments, including cathedral chime, intermittent alarm, and a phosphorescent dial. Babbitt was proud of being awakened by such a rich device. Socially it was almost as creditable as buying expensive cord tires.

[GIF HERE: from http://www.speech.cs.cmu.edu/~sburke/braille/babbit.gif ]

Under each Braille line is the "Braille ASCII" version -- simply the Braille characters mapped onto 7-bit ASCII codes. You'll notice that the character mapped to "5" is the contraction "<en>"; the character mapped to "/" is the contraction "<st>"; and the character mapped to "," means "the next word is capitalized" when it precedes a word, but on the inside of words, it's a prefix for particular abbreviations (e.g., it plus "y" means "-ally"). The e-text of all of I<Babbitt> is 717K unencoded (ASCII), and 581K encoded (grade-2 Braille, in Braille ASCII) -- a size savings of 19%.