Crate stringsext
stringsext searches for multi-byte encoded strings in binary data.
stringsext is a Unicode enhancement of the GNU strings tool with
additional functionality: stringsext recognizes Cyrillic, CJKV characters
and other scripts in all supported multi-byte encodings, while GNU strings
fails to find any of these scripts in UTF-16 and many other encodings.
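To illustrate why a byte-oriented ASCII scan misses such text, here is a minimal sketch. It is not taken from the stringsext sources: the helper names ascii_runs, decode_utf16le and utf16le_bytes are hypothetical, and the ASCII scan is only a rough stand-in for a scanner that knows no multi-byte encodings.

```rust
/// Collect runs of at least `min_len` printable ASCII bytes.
/// Hypothetical stand-in for an ASCII-only string scanner.
fn ascii_runs(data: &[u8], min_len: usize) -> Vec<String> {
    let mut found = Vec::new();
    let mut run = String::new();
    for &b in data {
        if (0x20..0x7f).contains(&b) {
            run.push(b as char);
        } else {
            if run.len() >= min_len {
                found.push(run.clone());
            }
            run.clear();
        }
    }
    if run.len() >= min_len {
        found.push(run);
    }
    found
}

/// Decode a byte slice as UTF-16LE.
/// Returns None for an odd length or invalid UTF-16.
fn decode_utf16le(data: &[u8]) -> Option<String> {
    if data.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = data
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16(&units).ok()
}

/// Encode a string as UTF-16LE bytes (test data helper).
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

fn main() {
    let bytes = utf16le_bytes("Привет");
    // Each Cyrillic code unit has 0x04 as its high byte, so no
    // run of printable ASCII bytes reaches the minimum length.
    println!("ASCII scan:   {:?}", ascii_runs(&bytes, 4));
    println!("UTF-16LE:     {:?}", decode_utf16le(&bytes));
}
```

The same bytes yield nothing under the ASCII scan but decode cleanly once the encoding is taken into account.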
The role of the main-module is to launch the processing of the input stream in
batches with threads. It also receives, merges, sorts and prints the results.
Operating principle
1. The iterator input::Slicer concatenates the input files and cuts the input stream into slices called main::slice.
2. In main::run() these slices are fed in parallel to threads, where each has its own Mission configuration.
3. Each thread runs a search in main::slice == scanner::input_buffer. The search is performed by scanner::FindingCollection::scan(), which cuts the scanner::input_buffer into smaller chunks of size 2*output_line_char_nb_max bytes, hereafter called input_window.
4. The Decoder runs through the input_window, searches for valid strings and decodes them into UTF-8-chunks.
5. Each UTF-8-chunk is then fed into the filter helper::SplitStr, which checks whether parts of it satisfy certain filter conditions.
6. Doing so, helper::SplitStr cuts the UTF-8-chunk into even smaller SplitStr-chunks, none longer than output_line_char_nb_max, and sends them back to the scanner::FindingCollection::scan() loop.
7. There the SplitStr-chunk is packed into a finding::Finding object and then added to a finding::FindingCollection.
8. After finishing its run through the input_window, the search continues with the next input_window. Goto 5.
9. When all input_windows are processed, scanner::FindingCollection::scan() returns the finding::FindingCollection to main::run() and exits.
10. main::run() waits for all threads to return their finding::FindingCollections. Then all Findings are merged, sorted and finally printed by finding::print().
11. While the print is still running, the next main::slice == scanner::input_buffer is sent to all threads for the next search. Goto 3.
12. main::run() exits when all main::slices are processed.
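The flow above can be sketched in a few lines of Rust. This is a simplified illustration under stated assumptions, not the crate's implementation: the Finding and Mission types, the scan and run functions and the constant value are hypothetical stand-ins. The decoder is reduced to printable-ASCII detection, strings that cross an input_window boundary are not handled, and printing does not overlap with the next search.

```rust
use std::thread;

/// Assumed small value, for illustration only.
const OUTPUT_LINE_CHAR_NB_MAX: usize = 8;

/// Hypothetical, simplified finding: sorted by position, then mission.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Finding {
    position: usize,
    mission_id: usize,
    text: String,
}

/// Hypothetical, simplified mission: the filter is a minimum length.
struct Mission {
    id: usize,
    min_chars: usize,
}

/// One thread's work: cut the buffer into windows, "decode" each
/// window, split the decoded runs and collect what passes the filter.
fn scan(input_buffer: &[u8], mission: &Mission) -> Vec<Finding> {
    let window_len = 2 * OUTPUT_LINE_CHAR_NB_MAX;
    let mut findings = Vec::new();
    for (w, input_window) in input_buffer.chunks(window_len).enumerate() {
        let mut offset = w * window_len;
        // Decoder stand-in: runs of printable ASCII bytes.
        for run in input_window.split(|b| !(0x20..0x7f).contains(b)) {
            // Splitter stand-in: pieces of bounded length.
            for (i, piece) in run.chunks(OUTPUT_LINE_CHAR_NB_MAX).enumerate() {
                if piece.len() >= mission.min_chars {
                    findings.push(Finding {
                        position: offset + i * OUTPUT_LINE_CHAR_NB_MAX,
                        mission_id: mission.id,
                        text: String::from_utf8_lossy(piece).into_owned(),
                    });
                }
            }
            offset += run.len() + 1; // skip the run and its separator byte
        }
    }
    findings
}

/// Feed every slice to one thread per mission, then merge and sort.
fn run(slices: &[&[u8]], missions: &[Mission]) -> Vec<String> {
    let mut output = Vec::new();
    for (slice_no, slice) in slices.iter().enumerate() {
        let mut merged: Vec<Finding> = thread::scope(|s| {
            let handles: Vec<_> = missions
                .iter()
                .map(|m| s.spawn(move || scan(slice, m)))
                .collect();
            handles
                .into_iter()
                .flat_map(|h| h.join().expect("scanner thread panicked"))
                .collect()
        });
        merged.sort();
        for f in merged {
            output.push(format!("{}:{} m{} {}", slice_no, f.position, f.mission_id, f.text));
        }
    }
    output
}

fn main() {
    let missions = [Mission { id: 0, min_chars: 4 }, Mission { id: 1, min_chars: 2 }];
    let slices: [&[u8]; 1] = [b"abcdefghijkl\x00xy"];
    for line in run(&slices, &missions) {
        println!("{line}");
    }
}
```

Scoped threads are used here so that each thread can borrow the slice and its mission without cloning; how the crate itself shares data between threads is not shown in this description.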
Modules
- finding: Store string-findings and prepare them for output.
- finding_collection
- help: Help the user with command-line-arguments.
- helper: Small functions of general use, mainly used in module scanner.
- input: Cut the input stream in chunks for batch processing.
- mission: Parse and convert command-line-arguments into static MISSION structures, that are mainly used to initialize ScannerState-objects.
- options: This module deals with command-line arguments and directly related data structures.
- scanner: Find encoded strings in some input chunk, apply a filter (defined by a Mission-object) and store the filtered strings as UTF-8 in Finding-objects.
Macros
- A macro useful to reuse an existing buffer while ignoring any existing borrows. Make sure that this buffer is not used anymore before applying this! Buffer reuse helps to avoid additional memory allocations.
- This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the mutable version.
- This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8. This is the immutable version.
- Parses a filter expression from some hexadecimal string or from some filter-alias-name in $list to a filter-integer value.
- Parses a filter expression from some hexadecimal string or number string to an integer value.
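The two parsing macros can be pictured with plain functions. The following is a minimal sketch under assumptions, not the macros' actual expansion: the function names parse_integer and parse_filter, the "0x" prefix convention, the u64 result type and the alias names in the list are all hypothetical.

```rust
/// Parse a hexadecimal string (assumed to carry a "0x" prefix)
/// or a decimal number string into an integer.
fn parse_integer(s: &str) -> Option<u64> {
    let s = s.trim();
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16).ok(),
        None => s.parse::<u64>().ok(),
    }
}

/// Resolve an alias name from `list` first; otherwise fall back
/// to parsing the string as a number.
fn parse_filter(s: &str, list: &[(&str, u64)]) -> Option<u64> {
    let key = s.trim();
    list.iter()
        .find(|(name, _)| *name == key)
        .map(|&(_, value)| value)
        .or_else(|| parse_integer(key))
}

fn main() {
    // Hypothetical alias table, for illustration only.
    let list = [("alias-a", 0x1), ("alias-b", 0xff00)];
    println!("{:?}", parse_filter("alias-b", &list));
    println!("{:?}", parse_filter("0x1F", &list));
    println!("{:?}", parse_filter("not-a-filter", &list));
}
```

Returning Option keeps the sketch short; an implementation that reports to the user would more likely return a Result with an error describing the rejected input.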