Crate stringsext

stringsext searches for multi-byte encoded strings in binary data.
stringsext is a Unicode enhancement of the GNU strings tool with additional functionality: it recognizes Cyrillic, CJKV characters and other scripts in all supported multi-byte encodings, whereas GNU strings fails to find any of these scripts in UTF-16 and many other encodings.
The role of the main module is to launch the processing of the input stream in batches with threads. It also receives, merges, sorts and prints the results.

Operating principle

  1. The iterator input::Slicer concatenates the input-files and cuts the input stream into slices called main::slice.

  2. In main::run() these slices are fed in parallel to threads, each of which has its own Mission configuration.

  3. Each thread runs a search in main::slice == scanner::input_buffer. The search is performed by scanner::scan(), which cuts the scanner::input_buffer into smaller chunks of size 2*output_line_char_nb_max bytes hereafter called input_window.

  4. The Decoder runs through the input_window, searches for valid strings and decodes them into UTF-8-chunks.

  5. Each UTF-8-chunk is then fed into the filter helper::SplitStr, which checks whether parts of it satisfy certain filter conditions.

  6. Doing so, the helper::SplitStr cuts the UTF-8-chunk into even smaller SplitStr-chunks not longer than output_line_char_nb_max and sends them back to the scanner::scan() loop.

  7. There the SplitStr-chunk is packed into a finding::Finding object and then successively added to a finding::FindingCollection.

  8. After finishing its run through the input_window, the search continues with the next input_window. Goto 5.

  9. When all input_windows are processed, scanner::scan() returns the finding::FindingCollection to main::run() and exits.

  10. main::run() waits for all threads to return their finding::FindingCollections. Then all Findings are merged, sorted and finally printed by finding::print().

  11. While the print is still running, the next main::slice == scanner::input_buffer is sent to all threads for the next search. Goto 3.

  12. main::run() exits when all main::slices are processed.
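
The steps above can be illustrated with a heavily simplified sketch. It is not the crate's implementation: the types Finding and Mission and the functions scan and run below are hypothetical stand-ins, the "decoder" only looks for runs of printable ASCII, and a string that spans a slice boundary is simply split in two. It only shows the overall shape: cut the input into slices, scan each slice with one thread per mission, then merge and sort the results.

```rust
use std::thread;

/// Hypothetical stand-in for a finding: byte offset, mission and decoded text.
/// The derived ordering sorts by offset first, then by mission.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Finding {
    offset: usize,
    mission_id: usize,
    text: String,
}

/// Hypothetical stand-in for a mission: here only a minimum string length.
#[derive(Debug, Clone, Copy)]
struct Mission {
    id: usize,
    chars_min: usize,
}

/// Scans one slice for runs of printable ASCII (a simplification of a real decoder).
fn scan(slice: &[u8], slice_offset: usize, mission: Mission) -> Vec<Finding> {
    let mut findings = Vec::new();
    let mut start: Option<usize> = None;
    // Iterate one position past the end so that a trailing run is flushed too.
    for i in 0..=slice.len() {
        let printable = slice
            .get(i)
            .map_or(false, |b| b.is_ascii_graphic() || *b == b' ');
        match (printable, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                if i - s >= mission.chars_min {
                    findings.push(Finding {
                        offset: slice_offset + s,
                        mission_id: mission.id,
                        text: String::from_utf8_lossy(&slice[s..i]).into_owned(),
                    });
                }
                start = None;
            }
            _ => {}
        }
    }
    findings
}

/// Cuts `input` into slices, scans every slice with one thread per mission,
/// then merges and sorts the findings of each slice.
fn run(input: &[u8], slice_len: usize, missions: &[Mission]) -> Vec<Finding> {
    let mut all = Vec::new();
    let mut offset = 0;
    for slice in input.chunks(slice_len) {
        let mut batch: Vec<Finding> = thread::scope(|s| {
            // Spawn all threads first, then wait for every one of them.
            let handles: Vec<_> = missions
                .iter()
                .map(|&m| s.spawn(move || scan(slice, offset, m)))
                .collect();
            handles
                .into_iter()
                .flat_map(|h| h.join().expect("scanner thread panicked"))
                .collect()
        });
        batch.sort();
        all.extend(batch);
        offset += slice.len();
    }
    all
}
```

Calling run(input, 64, &missions) with two missions that differ only in chars_min returns the findings of both missions interleaved in offset order. Unlike the description above, this sketch does not overlap printing with the scan of the next slice.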

Modules

finding

Store string-findings and prepare them for output.

help

Help the user with command-line-arguments.

helper

Small functions of general use, mainly used in module scanner.

input

Cut the input stream in chunks for batch processing.

mission

Parse and convert command-line-arguments into static MISSION structures that are mainly used to initialize ScannerState-objects.

options

This module deals with command-line arguments and directly related data structures.

scanner

Find encoded strings in some input chunk, apply a filter (defined by a Mission-object) and store the filtered strings as UTF-8 in Finding-objects.
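
To make the chunking done between scanner and helper more concrete, here is a minimal sketch of cutting a decoded UTF-8 string into pieces of bounded length. The function name split_by_chars is hypothetical and is not part of the crate; the real helper::SplitStr additionally applies the filter conditions of a Mission, which are omitted here. The point illustrated is only that cuts must fall on character boundaries, not on arbitrary byte offsets.

```rust
/// Splits `s` into pieces of at most `max_chars` characters,
/// always cutting on a UTF-8 character boundary.
/// Hypothetical helper for illustration; not the crate's SplitStr.
fn split_by_chars(s: &str, max_chars: usize) -> Vec<&str> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut pieces = Vec::new();
    let mut start = 0; // byte index where the current piece begins
    let mut count = 0; // characters collected in the current piece
    for (idx, _) in s.char_indices() {
        if count == max_chars {
            // `idx` is the byte index of a character start, so slicing is safe.
            pieces.push(&s[start..idx]);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < s.len() {
        pieces.push(&s[start..]);
    }
    pieces
}
```

For example, split_by_chars("Привет, мир", 4) yields three pieces of 4, 4 and 3 characters, although each Cyrillic character occupies two bytes.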

Macros

as_mut_slice_no_borrow_check

A macro for reusing an existing buffer while ignoring any existing borrows. Make sure that this buffer is no longer used before applying this! Buffer reuse helps to avoid additional memory allocations.

as_mut_str_unchecked_no_borrow_check

This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the mutable version.

as_str_unchecked_no_borrow_check

This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the immutable version.

ascii_enc_label

Encoding name literal used when simulating the non-built-in ASCII decoder.

chars_min_default

Default value, when no --chars-min command-line-argument is given. Must be u8.

counter_offset_default

Default value, when no --counter-offset command-line-argument is given. Must be of type ByteCounter.

encoding_default

If no command-line argument --chars-min is given and none is specified in --encoding, use this. Must be one of --list-encodings.

output_line_char_nb_max_default

Default value when no --output-line-len command-line-argument is given. Must be usize.

parse_filter_parameter

Parses a filter expression from some hexadecimal string or from some filter-alias-name in $list to a filter-integer value.

parse_integer

Parses a filter expression from some hexadecimal string or number string to an integer value.
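
As an illustration of what parsing "some hexadecimal string or number string" can look like, here is a minimal sketch. It is not the macro's actual expansion: the function name parse_hex_or_decimal, the 0x prefix convention and the u64 result type are assumptions made for this example.

```rust
use std::num::ParseIntError;

/// Parses either a hexadecimal string with a `0x`/`0X` prefix or a plain
/// decimal string into an integer.
/// Hypothetical function for illustration; not the crate's parse_integer macro.
fn parse_hex_or_decimal(s: &str) -> Result<u64, ParseIntError> {
    let s = s.trim();
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        // Hexadecimal: parse the digits after the prefix in base 16.
        Some(hex) => u64::from_str_radix(hex, 16),
        // Otherwise fall back to decimal.
        None => s.parse::<u64>(),
    }
}
```

With this sketch, "0xff" parses to 255, "42" parses to 42, and a malformed input such as "0xzz" returns an error instead of panicking.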

Constants

AUTHOR

(c) Jens Getreu

VERSION

Use the version-number defined in ../Cargo.toml.

Functions

main

Application entry point.

run

Process the input stream in batches with threads. Then receive, merge, sort and print the results.