Module stringsext::mission

Parse and convert command-line arguments into static MISSION structures that are mainly used to initialize ScannerState-objects.

Structs

MISSIONS

Mission

Mission represents the instruction parameters used mainly in scanner::scan(). Each thread gets its own instance and stores it in ScannerState.

Missions

A collection to bundle all Mission-objects.

Utf8Filter

When the decoder finds a valid Unicode character, it decodes it into UTF-8. The leading byte of this UTF-8 multi-byte character must then pass an additional filter before being printed: the so-called Utf8Filter. It comes with three independent filter criteria:

Constants

AF_ALL

ASCII filter: Lets all ASCII pass the filter (0x01..0x100) except Null (0x00), which is the “end of string” marker. See: Null character - Wikipedia.

AF_CTRL

ASCII filter: Controls: (0x00..0x20, 0x7F). See: C0 and C1 control codes - Wikipedia. Unlike traditional strings, we exclude “Space” (0x20) here, as it can appear in filenames. Instead, we consider “Space” to be a regular character.

AF_DEFAULT

ASCII filter: Set defaults close to those in traditional strings.

AF_NONE

ASCII filter: Nothing passes the ASCII filter.

AF_WHITESPACE

ASCII filter: White-space (0x09..=0x0c, 0x20). See: C0 and C1 control codes - Wikipedia. It does not include “Carriage Return” (0x0d). This way strings are divided into shorter chunks and we get more location information.
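
The byte ranges documented for the ASCII filters above can be illustrated with a small bit-mask sketch. This is only an illustration under an assumption: it supposes that an ASCII filter is a 128-bit mask in which bit `n` stands for byte value `n`, which this page does not state. The names `mask_of`, `is_set`, `sketch_ctrl` and `sketch_whitespace` are hypothetical and not part of the stringsext API.

```rust
/// Sets bit `n` for every byte value `n` yielded by `bytes`.
/// ASSUMPTION: one bit per ASCII byte value, which is not confirmed by the docs.
fn mask_of(bytes: impl IntoIterator<Item = u8>) -> u128 {
    bytes.into_iter().fold(0u128, |m, b| {
        assert!(b < 128, "not an ASCII byte");
        m | (1u128 << b)
    })
}

/// True if the ASCII byte `b` is a member of the class described by `mask`.
fn is_set(mask: u128, b: u8) -> bool {
    b < 128 && (mask >> b) & 1 == 1
}

/// Control codes as documented for AF_CTRL: 0x00..0x20 plus 0x7F.
/// "Space" (0x20) is deliberately not part of this class.
fn sketch_ctrl() -> u128 {
    mask_of((0x00u8..0x20).chain(std::iter::once(0x7F)))
}

/// White-space as documented for AF_WHITESPACE: 0x09..=0x0c plus 0x20.
/// "Carriage Return" (0x0d) is deliberately not part of this class.
fn sketch_whitespace() -> u128 {
    mask_of((0x09u8..=0x0c).chain(std::iter::once(0x20)))
}
```

With these helpers, `is_set(sketch_ctrl(), 0x0d)` is true while `is_set(sketch_whitespace(), 0x0d)` is false, so under this sketch a carriage return ends a string, whereas a tab (0x09) belongs to both classes.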

ASCII_FILTER_ALIASSE

UBF_ACCENTS

Unicode-block-filter: Accents: (U+300..U+380).

UBF_AFRICAN

Unicode-block-filter: Armenian: (U+0540..), Hebrew: (U+0580..), Arabic: (U+0600..), Syriac: (U+0700..), Arabic: (U+0740..), Thaana: (U+0780..), N’Ko: (U+07C0..U+800)

UBF_ALL

Unicode-block-filter: A filter that lets all valid Unicode code points pass, except for ASCII, where it behaves like the original strings. No leading bytes are filtered.

UBF_ALL_VALID

Unicode-block-filter: No leading bytes are filtered.

UBF_ARABIC

Unicode-block-filter: Arabic: (U+600..U+700, U+740..U+780)

UBF_ARMENIAN

Unicode-block-filter: Armenian: (U+540..U+580)

UBF_ASIAN

Unicode-block-filter: Kana: (U+3000..), CJK: (U+4000..), Asian: (U+A000..), Hangul: (U+B000..U+E000).

UBF_CJK

Unicode-block-filter: CJK: (U+3000..U+A000).

UBF_COMMON

Unicode-block-filter: All 2-byte UTF-8 (U+07C0..U+800)

UBF_CYRILLIC

Unicode-block-filter: Cyrillic: (U+400..U+540)

UBF_GREEK

Unicode-block-filter: Greek: (U+380..U+400).

UBF_HANGUL

Unicode-block-filter: Hangul: (U+B000..U+E000).

UBF_HEBREW

Unicode-block-filter: Hebrew: (U+580..U+600)

UBF_INVALID

Unicode-block-filter: These leading bytes are always invalid in UTF-8.

UBF_IPA

Unicode-block-filter: IPA: (U+240..U+300).

UBF_KANA

Unicode-block-filter: Kana: (U+3000..U+4000).

UBF_LATIN

Unicode-block-filter: Latin: (U+80..U+240). Usually used together with UBF_ACCENTS.

UBF_MISC

Unicode-block-filter: Misc: (U+1000..), Symbol: (U+2000..U+3000), Forms: (U+F000..U+10000).

UBF_NONE

Unicode-block-filter: No leading byte > 0x7F is accepted. Therefore no multi-byte characters in UTF-8 are accepted, which means this is an ASCII filter.

UBF_PUA

Unicode-block-filter: Private use area (U+E000..U+F000), (U+10_0000..U+14_0000).

UBF_SYRIAC

Unicode-block-filter: Syriac: (U+700..U+740)

UBF_UNCOMMON

Unicode-block-filter: Besides PUA, more very uncommon planes: (U+10_000-U+C0_000).

UNICODE_BLOCK_FILTER_ALIASSE

Shortcuts for the hexadecimal representation of a Unicode block filter. The array is defined as (key, value) tuples. For the values, see the chapter “Codepage layout” in UTF-8 - Wikipedia.
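
To make the link between a Unicode block and its UTF-8 leading bytes concrete, here is a minimal sketch. It rests on an assumption that this page does not confirm: that a block filter is a 64-bit mask with one bit per leading byte in 0xC0..=0xFF. The function names `leading_byte`, `block_mask` and `passes` are hypothetical, not the stringsext API.

```rust
/// First byte of the UTF-8 encoding of `c`.
fn leading_byte(c: char) -> u8 {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes()[0]
}

/// Builds a mask covering every leading byte used by the code points
/// `first..=last`.
/// ASSUMPTION: bit `i` of the mask stands for leading byte `0xC0 + i`.
fn block_mask(first: char, last: char) -> u64 {
    let (lo, hi) = (leading_byte(first), leading_byte(last));
    assert!(lo >= 0xC0 && lo <= hi, "range must be multi-byte and ordered");
    (lo..=hi).fold(0u64, |m, b| m | (1u64 << (b - 0xC0)))
}

/// True if `c` is a multi-byte character whose leading byte is set in `mask`.
fn passes(mask: u64, c: char) -> bool {
    let b = leading_byte(c);
    b >= 0xC0 && (mask >> (b - 0xC0)) & 1 == 1
}
```

Under this assumption, the Cyrillic range listed above (U+400..U+540) maps to the leading bytes 0xD0..=0xD4, and the Greek range (U+380..U+400) to 0xCE..=0xCF. A 2-byte leading byte always covers 64 consecutive code points, which is why the 2-byte block limits on this page are multiples of 0x40.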

UTF8_FILTER_ASCII_MODE_DEFAULT

A filter for ASCII-encoding searches only. No control characters pass, but whitespace is allowed. This works like the traditional stringsext mode. Unless otherwise specified on the command line, this filter is the default for ASCII-encoding searches.

UTF8_FILTER_NON_ASCII_MODE_DEFAULT

A default filter for all non-ASCII-encoding searches. For single-byte characters (af-filter), no control characters pass, but whitespace is allowed. This works like the traditional stringsext mode. For multi-byte characters we allow only Latin characters with all kinds of accents. Unless otherwise specified on the command line, this filter is the default for non-ASCII-encoding searches.
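
The two-part behaviour described above (one rule for single-byte characters, another for multi-byte characters) might be sketched as follows. Everything here is illustrative: the type `SketchFilter`, its fields `af` and `ubf`, the helper functions and the bit layout (one bit per ASCII byte, one bit per leading byte in 0xC0..=0xFF) are assumptions, and the chosen ranges only approximate the description "no control characters, whitespace allowed, Latin with accents".

```rust
/// Hypothetical two-part filter.
/// ASSUMPTION: `af` has one bit per ASCII byte value,
/// `ubf` has one bit per UTF-8 leading byte in 0xC0..=0xFF.
struct SketchFilter {
    af: u128,
    ubf: u64,
}

impl SketchFilter {
    /// Decides whether `c` would be printed, based on its leading byte only.
    fn passes(&self, c: char) -> bool {
        let mut buf = [0u8; 4];
        let lead = c.encode_utf8(&mut buf).as_bytes()[0];
        if lead < 0x80 {
            // Single-byte character: consult the ASCII mask.
            (self.af >> lead) & 1 == 1
        } else {
            // Multi-byte character: a valid leading byte is always >= 0xC2.
            (self.ubf >> (lead - 0xC0)) & 1 == 1
        }
    }
}

/// One bit per ASCII byte value in `r`.
fn ascii_bits(r: std::ops::RangeInclusive<u8>) -> u128 {
    r.fold(0u128, |m, b| m | (1u128 << b))
}

/// One bit per leading byte in `r` (all values must be >= 0xC0).
fn lead_bits(r: std::ops::RangeInclusive<u8>) -> u64 {
    r.fold(0u64, |m, b| m | (1u64 << (b - 0xC0)))
}

/// A default-like filter in the spirit of the description above.
fn sketch_default() -> SketchFilter {
    SketchFilter {
        // Printable ASCII 0x21..=0x7E plus white-space 0x09..=0x0c and 0x20.
        af: ascii_bits(0x21..=0x7E) | ascii_bits(0x09..=0x0c) | ascii_bits(0x20..=0x20),
        // Latin U+80..U+240 uses leading bytes 0xC2..=0xC8,
        // Accents U+300..U+380 uses leading bytes 0xCC..=0xCD.
        ubf: lead_bits(0xC2..=0xC8) | lead_bits(0xCC..=0xCD),
    }
}
```

In this sketch `sketch_default().passes('é')` is true because U+00E9 has the leading byte 0xC3, while an IPA letter such as U+0250 (leading byte 0xC9) and a carriage return (0x0d) are both rejected.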