Crate stringsext


stringsext searches for multi-byte encoded strings in binary data.
stringsext is a Unicode enhancement of the GNU strings tool with additional functionality: it recognizes Cyrillic, CJKV characters and other scripts in all supported multi-byte encodings, whereas GNU strings fails to find any of these scripts in UTF-16 and many other encodings.
The main module launches the processing of the input stream in batches using threads. It also receives, merges, sorts and prints the results.
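To illustrate why a byte-oriented scanner misses such text, here is a minimal sketch that finds runs of printable characters in UTF-16LE data using only the standard library (`char::decode_utf16`). It is not stringsext's implementation: the function name `find_utf16le_strings`, the `min_chars` threshold and the rule "printable means not a control character" are assumptions made for illustration.

```rust
/// Illustrative only (hypothetical helper): returns (byte offset, text) for every
/// run of at least `min_chars` non-control characters in UTF-16LE encoded `data`.
pub fn find_utf16le_strings(data: &[u8], min_chars: usize) -> Vec<(usize, String)> {
    // Read the input as little-endian 16-bit code units; an odd trailing byte is ignored.
    let units = data
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    let mut found = Vec::new();
    let mut current = String::new();
    let mut run_len = 0; // characters in the current run
    let mut run_start = 0; // byte offset where the current run began
    let mut unit_pos = 0; // position in 16-bit code units

    for decoded in char::decode_utf16(units) {
        // A surrogate pair occupies two code units; an unpaired surrogate (Err) one.
        let width = decoded.as_ref().map_or(1, |c| c.len_utf16());
        match decoded {
            Ok(c) if !c.is_control() => {
                if run_len == 0 {
                    run_start = unit_pos * 2;
                }
                current.push(c);
                run_len += 1;
            }
            // Control characters and unpaired surrogates terminate a run.
            _ => {
                if run_len > 0 && run_len >= min_chars {
                    found.push((run_start, std::mem::take(&mut current)));
                }
                current.clear();
                run_len = 0;
            }
        }
        unit_pos += width;
    }
    // Flush a run that reaches the end of the input.
    if run_len > 0 && run_len >= min_chars {
        found.push((run_start, current));
    }
    found
}
```

With `min_chars = 4`, the Cyrillic string "Привет, мир" embedded in UTF-16LE between zero bytes is reported with its byte offset, while a two-character run is not.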

Operating principle

  1. The iterator input::Slicer concatenates the input files and cuts the input stream into slices called main::slice.

  2. In main::run() these slices are fed in parallel to threads, each of which has its own Mission configuration.

  3. Each thread runs a search in main::slice == scanner::input_buffer. The search is performed by scanner::FindingCollection::scan(), which cuts the scanner::input_buffer into smaller chunks of 2*output_line_char_nb_max bytes each, hereafter called input_window.

  4. The Decoder runs through the input_window, searches for valid strings and decodes them into UTF-8-chunks.

  5. Each UTF-8-chunk is then fed into the filter helper::SplitStr, which checks whether parts of it satisfy certain filter conditions.

  6. In doing so, helper::SplitStr cuts the UTF-8-chunk into even smaller SplitStr-chunks no longer than output_line_char_nb_max and sends them back to the scanner::FindingCollection::scan() loop.

  7. There the SplitStr-chunk is packed into a finding::Finding object and then successively added to a finding::FindingCollection.

  8. After finishing its run through the input_window, the search continues with the next input_window. Go to step 5.

  9. When all input_windows are processed, scanner::FindingCollection::scan() returns the finding::FindingCollection to main::run() and exits.

  10. main::run() waits for all threads to return their finding::FindingCollections. Then all Findings are merged, sorted and finally printed by finding::print().

  11. While printing is still in progress, the next main::slice == scanner::input_buffer is sent to all threads for the next search. Go to step 3.

  12. main::run() exits when all main::slices are processed.
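The batch-and-thread scheme above can be sketched in miniature with scoped threads (`std::thread::scope`, Rust 1.63+). This is a simplified sketch under stated assumptions, not stringsext's code: `Mission`, `Finding`, `scan` and `run` are hypothetical stand-ins, `scan` only looks for printable ASCII runs, printing is replaced by collecting the results, and the overlap of printing with the next search (step 11) is left out. Unlike a real implementation, this sketch would also split a string that crosses a slice boundary.

```rust
use std::thread;

/// Hypothetical stand-in for a per-thread `Mission` configuration.
#[derive(Clone, Debug)]
pub struct Mission {
    pub id: usize,
    pub min_chars: usize,
}

/// Hypothetical finding: absolute byte position in the input stream,
/// the mission that produced it, and the text found.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Finding {
    pub position: usize,
    pub mission_id: usize,
    pub text: String,
}

/// Placeholder search: printable ASCII runs of at least `mission.min_chars` bytes.
fn scan(slice: &[u8], offset: usize, mission: &Mission) -> Vec<Finding> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    // A trailing non-printable sentinel flushes a run that ends at the slice end.
    let bytes = slice.iter().copied().chain(std::iter::once(0u8));
    for (i, b) in bytes.enumerate() {
        let printable = b.is_ascii_graphic() || b == b' ';
        match (printable, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                if i - s >= mission.min_chars {
                    out.push(Finding {
                        position: offset + s,
                        mission_id: mission.id,
                        text: String::from_utf8_lossy(&slice[s..i]).into_owned(),
                    });
                }
                start = None;
            }
            _ => {}
        }
    }
    out
}

/// Feeds each slice to one scoped thread per mission, waits for all of them,
/// then merges and sorts the per-thread results before moving on.
pub fn run(slices: &[&[u8]], missions: &[Mission]) -> Vec<Finding> {
    let mut all = Vec::new();
    let mut offset = 0;
    for slice in slices {
        let mut merged: Vec<Finding> = thread::scope(|s| {
            let handles: Vec<_> = missions
                .iter()
                .map(|m| s.spawn(move || scan(slice, offset, m)))
                .collect();
            // Joining every handle is the "wait for all threads" step.
            handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
        });
        merged.sort(); // by position, then mission id, then text
        all.extend(merged); // stand-in for printing
        offset += slice.len();
    }
    all
}
```

Scoped threads let every mission borrow the same read-only slice without copying it or wrapping it in an `Arc`.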
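Steps 3 and 6 can also be illustrated on their own: cutting an input_buffer into input_windows of 2*output_line_char_nb_max bytes, and cutting a UTF-8-chunk into pieces no longer than output_line_char_nb_max without breaking a multi-byte character. The function names are hypothetical, the filtering part of SplitStr is omitted, and counting Unicode scalar values as "characters" is an assumption about what the length limit measures.

```rust
/// Hypothetical helper for step 3: windows of `2 * output_line_char_nb_max` bytes.
/// Panics if `output_line_char_nb_max` is 0, as `chunks(0)` does.
pub fn input_windows(input_buffer: &[u8], output_line_char_nb_max: usize) -> std::slice::Chunks<'_, u8> {
    input_buffer.chunks(2 * output_line_char_nb_max)
}

/// Hypothetical helper for step 6: splits `s` into pieces of at most
/// `max_chars` characters, always cutting on a character boundary.
pub fn split_chunk(s: &str, max_chars: usize) -> Vec<&str> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut pieces = Vec::new();
    let mut start = 0; // byte index where the current piece begins
    let mut count = 0; // characters in the current piece
    for (i, _) in s.char_indices() {
        if count == max_chars {
            pieces.push(&s[start..i]);
            start = i;
            count = 0;
        }
        count += 1;
    }
    if start < s.len() {
        pieces.push(&s[start..]);
    }
    pieces
}
```

Because `char_indices` yields byte offsets of character starts, every slice taken is guaranteed to be valid UTF-8.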

Modules

Store string-findings and prepare them for output.

Help the user with command-line arguments.

Small functions of general use, mainly used in module scanner.

Cut the input stream into chunks for batch processing.
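Concatenating several inputs and cutting the result into fixed-size slices can be sketched with the standard `Read::chain` adapter. This `Slicer` is a hypothetical simplification, not the crate's `input::Slicer`: it accepts any `Read`, allocates one owned `Vec<u8>` per slice, and stops after the first I/O error.

```rust
use std::io::{self, Read};

/// Hypothetical slicer: yields consecutive slices of at most `slice_size` bytes.
pub struct Slicer<R: Read> {
    reader: R,
    slice_size: usize,
    done: bool,
}

impl<R: Read> Slicer<R> {
    pub fn new(reader: R, slice_size: usize) -> Self {
        assert!(slice_size > 0, "slice_size must be positive");
        Slicer { reader, slice_size, done: false }
    }
}

impl<R: Read> Iterator for Slicer<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut slice = vec![0u8; self.slice_size];
        let mut filled = 0;
        // `read` may return fewer bytes than requested (e.g. at a boundary between
        // chained inputs), so keep reading until the slice is full or input ends.
        while filled < slice.len() {
            match self.reader.read(&mut slice[filled..]) {
                Ok(0) => {
                    self.done = true;
                    break;
                }
                Ok(n) => filled += n,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.done = true;
                    return Some(Err(e));
                }
            }
        }
        if filled == 0 {
            return None;
        }
        slice.truncate(filled);
        Some(Ok(slice))
    }
}
```

To concatenate two files, one could pass `File::open(a)?.chain(File::open(b)?)` as the reader; slices then span the file boundary seamlessly.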

Parse and convert command-line arguments into static MISSION structures, which are mainly used to initialize ScannerState objects.

This module deals with command-line arguments and directly related data structures.

Find encoded strings in some input chunk, apply a filter (defined by a Mission-object) and store the filtered strings as UTF-8 in Finding-objects.

Macros

A macro for reusing an existing buffer while ignoring any borrows that may still exist. Make sure that the buffer is no longer in use before applying it! Reusing buffers avoids additional memory allocations.
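The allocation savings of buffer reuse can also be had in safe Rust, where `String::clear()` empties a buffer but keeps its capacity. The sketch below shows that idiom; it is not the macro described above (which bypasses borrow checking), and `format_all` is a hypothetical function.

```rust
use std::fmt::Write;

/// Hypothetical example: formats every item into one reused buffer and hands it
/// to `sink`. Returns how often the buffer's capacity changed (i.e. reallocated).
pub fn format_all<F: FnMut(&str)>(items: &[u32], mut sink: F) -> usize {
    let mut buf = String::with_capacity(16);
    let mut reallocations = 0;
    for item in items {
        let before = buf.capacity();
        buf.clear(); // length becomes 0, capacity is kept: no new allocation
        write!(buf, "item {}", item).expect("writing to a String cannot fail");
        if buf.capacity() != before {
            reallocations += 1;
        }
        sink(&buf);
    }
    reallocations
}
```

Since "item " plus any `u32` fits in 16 bytes, the buffer allocated once up front is reused for every item.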

This macro is useful for zero-cost conversion from &[u8] to &str. Use it with care: make sure that the byte-slice boundaries always fall on character boundaries and that the slice contains only valid UTF-8. Also check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the mutable version.

This macro is useful for zero-cost conversion from &[u8] to &str. Use it with care: make sure that the byte-slice boundaries always fall on character boundaries and that the slice contains only valid UTF-8. Also check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the immutable version.
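A zero-cost conversion of this kind typically rests on `std::str::from_utf8_unchecked` and `std::str::from_utf8_unchecked_mut`. The two macros below are hypothetical sketches, not the crate's definitions. Unlike the originals, they keep normal borrow checking and re-validate the UTF-8 precondition in debug builds.

```rust
/// Hypothetical immutable variant: `&[u8]` -> `&str` without validation in release builds.
macro_rules! as_str_unchecked {
    ($slice_u8:expr) => {{
        let bytes: &[u8] = $slice_u8;
        debug_assert!(std::str::from_utf8(bytes).is_ok(), "slice is not valid UTF-8");
        // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }};
}

/// Hypothetical mutable variant: `&mut [u8]` -> `&mut str`.
macro_rules! as_mut_str_unchecked {
    ($slice_u8:expr) => {{
        let bytes: &mut [u8] = $slice_u8;
        debug_assert!(std::str::from_utf8(bytes).is_ok(), "slice is not valid UTF-8");
        // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
        unsafe { std::str::from_utf8_unchecked_mut(bytes) }
    }};
}
```

If a slice boundary cuts through a multi-byte character, release builds would produce an invalid `&str`, which is undefined behavior; this is why the caller must check boundaries.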

Parses a filter expression, given either as a hexadecimal string or as a filter alias name from $list, into an integer filter value.

Parses a filter expression, given as a hexadecimal string or a number string, into an integer value.
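The parsing described by these two macros can be sketched as one ordinary function. Its name, the `0x` prefix convention, decimal fallback and the alias table are assumptions for illustration; the alias names in the usage example are made up and are not stringsext's actual filter aliases.

```rust
/// Hypothetical parser: accepts "0x"-prefixed hex, a decimal number,
/// or an alias name looked up in `list`.
pub fn parse_filter(expr: &str, list: &[(&str, u64)]) -> Result<u64, String> {
    let expr = expr.trim();
    if let Some(hex) = expr.strip_prefix("0x").or_else(|| expr.strip_prefix("0X")) {
        return u64::from_str_radix(hex, 16).map_err(|e| format!("invalid hex `{expr}`: {e}"));
    }
    if let Ok(n) = expr.parse::<u64>() {
        return Ok(n);
    }
    list.iter()
        .find(|(name, _)| *name == expr)
        .map(|&(_, value)| value)
        .ok_or_else(|| format!("unknown filter `{expr}`"))
}
```

Trying the numeric forms first means an alias can never shadow a number; a different precedence would be an equally valid design choice.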

Constants

(c) Jens Getreu

Uses the version number defined in ../Cargo.toml.
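Cargo exposes the `version` key of Cargo.toml to the compiler as the `CARGO_PKG_VERSION` environment variable. The common idiom is `env!("CARGO_PKG_VERSION")`; the sketch below uses `option_env!` so that it also compiles outside Cargo. Whether stringsext defines its constant exactly this way is an assumption.

```rust
/// Version string from Cargo.toml when built by Cargo; "unknown" otherwise.
pub const VERSION: &str = match option_env!("CARGO_PKG_VERSION") {
    Some(v) => v,
    None => "unknown",
};
```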

Functions

Application entry point.

Processes the input stream in batches using threads, then receives, merges, sorts and prints the results.