Module stringsext::mission

Parse and convert command-line arguments into static MISSION structures, which are mainly used to initialize ScannerState objects.

Structs

Mission represents the instruction parameters used mainly in scanner::scan(). Each thread gets its own instance and stores it in ScannerState.

A collection to bundle all Mission-objects.

When the decoder finds a valid Unicode character, it decodes it into UTF-8. The leading byte of this UTF-8 multi-byte character must then pass an additional filter before being printed: the so-called Utf8Filter. It comes with three independent filter criteria.

Constants

ASCII filter: Let all ASCII pass the filter (0x01..0x80) except Null (0x00), which is the “end of string” marker. See: Null character - Wikipedia

ASCII filter: Controls: (0x00..0x20, 0x7F). See: C0 and C1 control codes - Wikipedia. Unlike traditional strings, we exclude “Space” (0x20) here, as it can appear in filenames. Instead, we consider “Space” to be a regular character.

ASCII filter: Set defaults close to those in traditional strings.

ASCII filter: Nothing passes the ASCII filter.

ASCII filter: White-space (0x09..=0x0c, 0x20). See: C0 and C1 control codes - Wikipedia. It does not include “Carriage Return” (0x0d). This way strings are divided into shorter chunks and we get more location information.
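The ASCII filters above can be pictured as bitmasks with one bit per ASCII byte. The following is only a sketch under that assumption: the 128-bit layout, the helper names `mask_from_ranges` and `passes`, and the way the default is combined are hypothetical and not taken from the crate.

```rust
/// Build a 128-bit mask with one bit per ASCII byte:
/// bit `n` set means byte `n` passes the filter.
/// Ranges are given as inclusive (start, end) pairs.
fn mask_from_ranges(ranges: &[(u8, u8)]) -> u128 {
    let mut mask = 0u128;
    for &(start, end) in ranges {
        for b in start..=end {
            assert!(b < 0x80, "only ASCII bytes have a bit in the mask");
            mask |= 1u128 << b;
        }
    }
    mask
}

/// True if the byte `b` is ASCII and its bit is set in `mask`.
fn passes(mask: u128, b: u8) -> bool {
    b < 0x80 && (mask >> b) & 1 == 1
}

fn main() {
    // Hypothetical masks mirroring the descriptions above.
    let all = mask_from_ranges(&[(0x01, 0x7f)]); // all ASCII except Null
    let ctrl = mask_from_ranges(&[(0x00, 0x1f), (0x7f, 0x7f)]); // controls, Space excluded
    let whitespace = mask_from_ranges(&[(0x09, 0x0c), (0x20, 0x20)]); // no Carriage Return

    // One plausible "strings-like" default (an assumption):
    // everything except controls, plus whitespace.
    let default = (all & !ctrl) | whitespace;

    assert!(passes(default, b'A'));
    assert!(passes(default, b'\t'));
    assert!(!passes(default, b'\r'));
    println!("{:#034x}", default);
}
```

With this layout, splitting strings at Carriage Return falls out naturally: 0x0d is a control byte that the whitespace mask does not add back.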

Unicode-block-filter: Accents: (U+300..U+380).

Unicode-block-filter: Armenian: (U+0540..), Hebrew: (U+0580..), Arabic: (U+0600..), Syriac: (U+0700..), Arabic: (U+0740..), Thaana: (U+0780..), N’Ko: (U+07C0..U+800)

Unicode-block-filter: A filter that lets all valid Unicode codepoints pass, except for ASCII, where it behaves like the original strings. No leading bytes are filtered.

Unicode-block-filter: No leading bytes are filtered.

Unicode-block-filter: Arabic: (U+600..U+700, U+740..U+780)

Unicode-block-filter: Armenian: (U+540..U+580)

Unicode-block-filter: Kana: (U+3000..), CJK: (U+4000..), Asian: (U+A000..), Hangul: (U+B000..U+E000).

Unicode-block-filter: CJK: (U+3000..U+A000).

Unicode-block-filter: All 2-byte UTF-8 (U+07C0..U+800)

Unicode-block-filter: Cyrillic: (U+400..U+540)

Unicode-block-filter: Greek: (U+380..U+400).

Unicode-block-filter: Hangul: (U+B000..U+E000).

Unicode-block-filter: Hebrew: (U+580..U+600)

Unicode-block-filter: These leading bytes are always invalid in UTF-8.

Unicode-block-filter: IPA: (U+240..U+300).

Unicode-block-filter: Kana: (U+3000..U+4000).

Unicode-block-filter: Latin: (U+80..U+240). Usually used together with UBF_ACCENTS.

Unicode-block-filter: Misc: (U+1000..), Symbol: (U+2000..U+3000), Forms: (U+F000..U+10000).

Unicode-block-filter: No leading byte > 0x7F is accepted. Therefore no UTF-8 multi-byte characters pass, which means this is an ASCII filter.

Unicode-block-filter: Private use area (U+E00..U+F00), (U+10_0000..U+14_0000).

Unicode-block-filter: Syriac: (U+700..U+740)

Unicode-block-filter: Besides PUA, more very uncommon planes: (U+10_000-U+C0_000).
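Because the filter inspects only the leading byte of the UTF-8 encoding, each Unicode-block-filter above corresponds to a set of leading bytes. The sketch below illustrates this. It assumes a hypothetical 64-bit mask with one bit per leading byte in 0xC0..=0xFF (bit 0 for 0xC0); the function names are illustrative and not the crate's API.

```rust
/// Return the first byte of the UTF-8 encoding of `c`.
fn leading_byte(c: char) -> u8 {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes()[0]
}

/// Bit position of a multi-byte leading byte in the assumed 64-bit mask.
/// Bytes below 0xC0 (ASCII and continuation bytes) have no bit.
fn bit_of(lead: u8) -> Option<u32> {
    if lead >= 0xc0 {
        Some(u32::from(lead - 0xc0))
    } else {
        None
    }
}

/// Build a mask that accepts every leading byte occurring in the
/// codepoint range `start..end` (end exclusive, as in the ranges above).
fn block_mask(start: u32, end: u32) -> u64 {
    let mut mask = 0u64;
    for cp in start..end {
        // Skip values that are not valid `char`s (surrogates).
        if let Some(c) = char::from_u32(cp) {
            if let Some(bit) = bit_of(leading_byte(c)) {
                mask |= 1u64 << bit;
            }
        }
    }
    mask
}

/// True if the leading byte `lead` is accepted by `mask`.
fn passes(mask: u64, lead: u8) -> bool {
    match bit_of(lead) {
        Some(bit) => (mask >> bit) & 1 == 1,
        None => false,
    }
}

fn main() {
    let cyrillic = block_mask(0x400, 0x540);
    let greek = block_mask(0x380, 0x400);
    println!("cyrillic = {:#x}, greek = {:#x}", cyrillic, greek);

    assert!(passes(cyrillic, leading_byte('Ж')));
    assert!(!passes(cyrillic, leading_byte('é')));
    // Filters combine with a bitwise OR.
    assert!(passes(cyrillic | greek, leading_byte('λ')));
}
```

This also explains why the block boundaries above are so coarse: a 2-byte leading byte covers 0x40 codepoints and a 3-byte leading byte covers 0x1000, so a filter cannot distinguish blocks that share a leading byte.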

Shortcuts for the hexadecimal representation of a Unicode block filter. The array is defined as (key, value) tuples. For the values, see the chapter “Codepage layout” in UTF-8 - Wikipedia.
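A lookup over such an array of (key, value) tuples might look like the sketch below. The alias names, the mask values, and the `parse_block_filter` function are hypothetical. The values assume a 64-bit mask with one bit per UTF-8 leading byte in 0xC0..=0xFF, bit 0 standing for 0xC0.

```rust
/// Hypothetical alias table: (key, value) tuples mapping a short name
/// to the hexadecimal mask it stands for.
const ALIASES: [(&str, u64); 2] = [
    ("greek", 0xc000),       // leading bytes 0xCE..=0xCF
    ("cyrillic", 0x1f_0000), // leading bytes 0xD0..=0xD4
];

/// Resolve a command-line value: first try the alias table
/// (case-insensitive), then fall back to parsing a hexadecimal number
/// with an optional "0x" prefix.
fn parse_block_filter(arg: &str) -> Option<u64> {
    if let Some(entry) = ALIASES
        .iter()
        .find(|entry| entry.0.eq_ignore_ascii_case(arg))
    {
        return Some(entry.1);
    }
    let hex = arg.strip_prefix("0x").unwrap_or(arg);
    u64::from_str_radix(hex, 16).ok()
}

fn main() {
    for arg in ["Greek", "0x1f0000", "nonsense"] {
        match parse_block_filter(arg) {
            Some(mask) => println!("{arg} -> {mask:#x}"),
            None => println!("{arg} -> not a filter"),
        }
    }
}
```

Checking the alias table before parsing hex keeps a name that happens to consist of hex digits from being misread as a number.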

A filter for ASCII-encoding searches only. No control character passes, but whitespace is allowed. This works like the traditional stringsext mode. Unless otherwise specified on the command line, this filter is the default for ASCII-encoding searches.

A default filter for all non-ASCII-encoding searches. For single-byte characters (af-filter), no control character passes, but whitespace is allowed. This works like the traditional stringsext mode. For multi-byte characters we allow only Latin characters with all kinds of accents. Unless otherwise specified on the command line, this filter is the default for non-ASCII-encoding searches.