stringsext is a Unicode enhancement of the GNU strings tool with additional functionality: stringsext recognizes Cyrillic, CJKV characters and other scripts in all supported multi-byte encodings, while GNU strings fails to find any of these scripts in UTF-16 and many other encodings.

1. The iterator input::Slicer concatenates the input files and cuts the input stream into slices called main::slice.
2. In main::run(), these slices are fed in parallel to threads, each of which has its own Mission configuration.
3. Each thread runs a search in main::slice == scanner::input_buffer. The search is performed by scanner::FindingCollection::scan(), which cuts the scanner::input_buffer into smaller chunks of 2*output_line_char_nb_max bytes, hereafter called input_window.
4. The Decoder runs through the input_window, searches for valid strings and decodes them into UTF-8 chunks.
5. Each UTF-8 chunk is then fed into the filter helper::SplitStr, which analyzes whether parts of it satisfy certain filter conditions.
6. In doing so, helper::SplitStr cuts the UTF-8 chunk into even smaller SplitStr chunks no longer than output_line_char_nb_max and sends them back to the scanner::FindingCollection::scan() loop.
7. There, each SplitStr chunk is packed into a finding::Finding object, which is then added to a finding::FindingCollection.
8. After finishing its run through the input_window, the search continues with the next input_window. Go to step 4.
9. When all input_windows are processed, scanner::FindingCollection::scan() returns the finding::FindingCollection to main::run() and exits.
10. main::run() waits for all threads to return their finding::FindingCollections. Then all Findings are merged, sorted and finally printed by finding::print().
11. While printing is still running, the next main::slice == scanner::input_buffer is sent to all threads for the next search. Go to step 3.
12. main::run() exits when all main::slices are processed.
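The control flow above can be sketched with scoped threads from the Rust standard library (std::thread::scope, Rust 1.63+). This is a minimal sketch under stated assumptions, not stringsext's implementation: Mission, Finding, scan_slice() and run() are hypothetical stand-ins, and the toy "decoder" only recognizes printable ASCII. It shows one thread per mission scanning the same slice in windows of 2*output_line_char_nb_max bytes, followed by merging and sorting the findings of all threads.

```rust
use std::thread;

/// Hypothetical stand-in for a Mission: one search configuration per thread.
struct Mission {
    mission_id: u8,
    /// Minimum number of printable bytes for a run to count as a finding.
    min_len: usize,
    /// Maximum output line length; the window size is twice this value in bytes (must be > 0).
    output_line_char_nb_max: usize,
}

/// Hypothetical finding: byte position in the whole input, mission id, text.
/// Field order matters: the derived `Ord` sorts by position first.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Finding {
    position: usize,
    mission_id: u8,
    s: String,
}

/// Toy scanner for one slice: cuts it into windows of 2 * output_line_char_nb_max
/// bytes and reports runs of printable ASCII. Unlike the real decoder, it knows
/// only ASCII and splits runs at window and slice edges.
fn scan_slice(slice: &[u8], offset: usize, m: &Mission) -> Vec<Finding> {
    let window_len = 2 * m.output_line_char_nb_max;
    let mut findings = Vec::new();
    for (w_idx, window) in slice.chunks(window_len).enumerate() {
        let w_start = offset + w_idx * window_len;
        let mut run_start: Option<usize> = None;
        // A trailing non-printable sentinel closes a run that reaches the window end.
        for (i, b) in window.iter().copied().chain(std::iter::once(0u8)).enumerate() {
            let printable = b.is_ascii_graphic() || b == b' ';
            match (printable, run_start) {
                (true, None) => run_start = Some(i),
                (false, Some(start)) => {
                    if i - start >= m.min_len {
                        // Printable ASCII is valid UTF-8, so nothing is replaced here.
                        let s = String::from_utf8_lossy(&window[start..i]).into_owned();
                        findings.push(Finding { position: w_start + start, mission_id: m.mission_id, s });
                    }
                    run_start = None;
                }
                _ => {}
            }
        }
    }
    findings
}

/// Cuts `input` into slices, scans each slice with one scoped thread per mission,
/// then merges and sorts the findings of that slice. Unlike the real program,
/// this sketch does not overlap printing with the next search.
fn run(input: &[u8], slice_len: usize, missions: &[Mission]) -> Vec<Finding> {
    let mut all = Vec::new();
    for (s_idx, slice) in input.chunks(slice_len).enumerate() {
        let offset = s_idx * slice_len;
        let mut merged: Vec<Finding> = thread::scope(|scope| {
            // Spawn all threads first, then join them, so they run in parallel.
            let handles: Vec<_> = missions
                .iter()
                .map(|m| scope.spawn(move || scan_slice(slice, offset, m)))
                .collect();
            handles
                .into_iter()
                .flat_map(|h| h.join().expect("scanner thread panicked"))
                .collect()
        });
        merged.sort();
        all.extend(merged);
    }
    all
}

fn main() {
    let input = b"\x00\x01hello world\x00\xffstringsext\x00ab\x00";
    let missions = [
        Mission { mission_id: 0, min_len: 4, output_line_char_nb_max: 16 },
        Mission { mission_id: 1, min_len: 2, output_line_char_nb_max: 16 },
    ];
    for f in run(input, 64, &missions) {
        println!("{:>4} {} {}", f.position, f.mission_id, f.s);
    }
}
```

Sorting the merged findings by position (then mission id) keeps the output in input order no matter which thread finished first.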
Store string-findings and prepare them for output.
Help the user with command-line-arguments.
Small functions of general use, mainly used in module scanner.
Cut the input stream into chunks for batch processing.
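The idea of concatenating several inputs and cutting them into fixed-size slices can be sketched with std::io::Read::chain and Read::take. This is a hedged sketch, not the crate's API: the name Slicer is borrowed from the overview, while its fields, the slice size and the use of in-memory byte slices as "files" are assumptions for illustration.

```rust
use std::io::{self, Read};

/// Hypothetical iterator yielding the concatenated input in slices of at most `slice_len` bytes.
struct Slicer<R: Read> {
    reader: R,
    slice_len: usize,
}

impl<R: Read> Iterator for Slicer<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = Vec::with_capacity(self.slice_len);
        // `take` limits this read to one slice; `read_to_end` keeps reading across
        // short reads and across the boundary between chained readers.
        match (&mut self.reader).take(self.slice_len as u64).read_to_end(&mut buf) {
            Ok(0) => None,
            Ok(_) => Some(Ok(buf)),
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() -> io::Result<()> {
    // Two in-memory "files"; `chain` concatenates them into one stream.
    let file1: &[u8] = b"first file|";
    let file2: &[u8] = b"second file";
    let slicer = Slicer { reader: file1.chain(file2), slice_len: 8 };
    for slice in slicer {
        println!("{:?}", String::from_utf8_lossy(&slice?));
    }
    Ok(())
}
```

Note that a slice boundary may fall in the middle of a string (here "first fi" / "le|secon"); a real scanner has to handle strings that straddle slices.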
Parse and convert command-line arguments into static MISSION structures that are mainly used to initialize ScannerState objects.
This module deals with command-line arguments and directly related data structures.
Find encoded strings in some input chunk, apply a filter (defined by a Mission object) and store the filtered strings as UTF-8 in Finding objects.
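Filtering decoded UTF-8 and cutting long results into output-sized pieces can be illustrated on character boundaries with the standard library. This is a minimal sketch under assumptions: Filter and filter_and_split() are hypothetical, and the filter is reduced to a character predicate plus minimum and maximum lengths in characters; the crate's actual filter conditions are richer.

```rust
/// Hypothetical filter settings; in stringsext these would come from a Mission.
struct Filter {
    /// Decides whether a character may be part of a finding.
    accept: fn(char) -> bool,
    /// Runs with fewer characters than this are discarded.
    min_chars: usize,
    /// Longer runs are cut into pieces of at most this many characters (must be > 0).
    max_chars: usize,
}

/// Returns (byte offset, piece) pairs for every accepted run in `s`.
fn filter_and_split<'a>(s: &'a str, f: &Filter) -> Vec<(usize, &'a str)> {
    // 1. Collect maximal runs of accepted characters as byte ranges.
    let mut runs = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in s.char_indices() {
        match ((f.accept)(c), start) {
            (true, None) => start = Some(i),
            (false, Some(st)) => {
                runs.push((st, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(st) = start {
        runs.push((st, s.len()));
    }

    // 2. Drop short runs; cut long ones on character (not byte) boundaries.
    let mut out = Vec::new();
    for (st, end) in runs {
        let run = &s[st..end];
        if run.chars().count() < f.min_chars {
            continue;
        }
        let mut piece_start = st;
        let mut n = 0;
        for (i, _) in run.char_indices() {
            if n == f.max_chars {
                out.push((piece_start, &s[piece_start..st + i]));
                piece_start = st + i;
                n = 0;
            }
            n += 1;
        }
        out.push((piece_start, &s[piece_start..end]));
    }
    out
}

fn main() {
    let f = Filter { accept: char::is_alphabetic, min_chars: 3, max_chars: 4 };
    for (pos, piece) in filter_and_split("ab, Привет мир!", &f) {
        println!("{pos:>3} {piece}");
    }
}
```

Counting in characters rather than bytes matters here: each Cyrillic letter takes two bytes in UTF-8, so "Прив" is four characters but eight bytes.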
A macro for reusing an existing buffer while ignoring any existing borrows. Make sure that the buffer is no longer in use before applying it! Buffer reuse helps to avoid additional memory allocations.
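The allocation-saving effect of buffer reuse can also be obtained without bypassing the borrow checker, by clearing and refilling one String whose capacity survives clear(). This is a safe alternative shown for illustration, not the macro itself; fill_upper() is a hypothetical workload.

```rust
/// Hypothetical workload: writes the upper-case form of `input` into `buf`,
/// reusing whatever capacity `buf` already has.
fn fill_upper(buf: &mut String, input: &str) {
    buf.clear(); // drops the contents but keeps the allocation
    buf.extend(input.chars().flat_map(char::to_uppercase));
}

fn main() {
    let mut buf = String::with_capacity(64);
    let cap = buf.capacity();
    for word in ["alpha", "beta", "gamma"] {
        fill_upper(&mut buf, word);
        println!("{buf}");
        // No reallocation: every result fits into the initial capacity.
        assert_eq!(buf.capacity(), cap);
    }
}
```

Because the buffer is passed as &mut, the compiler still guarantees that no other borrow is alive while it is overwritten, which is the condition the macro leaves to the caller.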
This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8.
This is the mutable version.
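For comparison, the standard library offers the mutable conversion as the unsafe function std::str::from_utf8_unchecked_mut. A hedged sketch of how to keep such a conversion sound: state the UTF-8 requirement as a safety contract, check it in debug builds, and only then skip validation. The helper name as_str_mut is hypothetical; this is not the crate's macro.

```rust
/// Hypothetical helper: views a mutable byte slice as `&mut str` without validation.
///
/// # Safety
/// `bytes` must contain valid UTF-8.
unsafe fn as_str_mut(bytes: &mut [u8]) -> &mut str {
    // Checked only in debug builds; release builds trust the caller.
    debug_assert!(std::str::from_utf8(bytes).is_ok());
    // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
    unsafe { std::str::from_utf8_unchecked_mut(bytes) }
}

fn main() {
    let mut bytes = *b"stringsext";
    // SAFETY: the array holds ASCII only, which is valid UTF-8.
    let s = unsafe { as_str_mut(&mut bytes) };
    s.make_ascii_uppercase();
    println!("{s}");
}
```

In-place mutation through the &mut str must keep the bytes valid UTF-8; ASCII case changes do, which is why str offers make_ascii_uppercase.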
This macro is useful for zero-cost conversion from &[u8] to &str. Use this with care. Make sure that the byte-slice boundaries always fit character boundaries and that the slice only contains valid UTF-8. Also, check for potential race conditions yourself, because this disables borrow checking for $slice_u8.
This is the immutable version.
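The immutable counterpart in the standard library is std::str::from_utf8_unchecked. Where safety matters more than the validation cost, std::str::from_utf8 performs the same conversion with an O(n) check and still copies nothing, since it returns a borrowed &str. The sketch below shows why the character-boundary requirement matters; to_str_checked is a hypothetical name.

```rust
use std::str;

/// Checked, zero-copy view: validates the bytes but allocates nothing.
fn to_str_checked(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let bytes = "Привет".as_bytes();
    // Each Cyrillic letter is 2 bytes, so cutting after 3 bytes splits "р" in half.
    assert!(to_str_checked(&bytes[..3]).is_none());
    assert_eq!(to_str_checked(&bytes[..4]), Some("Пр"));
    // SAFETY: the range ends on a character boundary, verified above.
    let s = unsafe { str::from_utf8_unchecked(&bytes[..4]) };
    println!("{s}");
}
```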
Parses a filter expression, given either as a hexadecimal string or as a filter alias name in $list, into an integer filter value.
Parses a filter expression, given as a hexadecimal string or a number string, into an integer value.
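Both parsers can be sketched as plain functions. This is a minimal sketch under assumptions, not the crate's macros: parse_integer and parse_filter are hypothetical names, hexadecimal input is assumed to carry a "0x" prefix, a "number string" is assumed to be decimal, the filter value is assumed to be a u64, and the alias table is made up for illustration.

```rust
use std::num::ParseIntError;

/// Parses "0x"/"0X"-prefixed hexadecimal or plain decimal into an integer.
fn parse_integer(s: &str) -> Result<u64, ParseIntError> {
    match s.strip_prefix("0x").or_else(|| s.strip_prefix("0X")) {
        Some(hex) => u64::from_str_radix(hex, 16),
        None => s.parse::<u64>(),
    }
}

/// Resolves `s` either as an alias name from `list` or as a "0x"-prefixed hex number.
fn parse_filter(s: &str, list: &[(&str, u64)]) -> Result<u64, String> {
    if let Some(&(_, value)) = list.iter().find(|(name, _)| *name == s) {
        return Ok(value);
    }
    let hex = s
        .strip_prefix("0x")
        .ok_or_else(|| format!("unknown filter alias: {s}"))?;
    u64::from_str_radix(hex, 16).map_err(|e| format!("invalid filter {s:?}: {e}"))
}

fn main() {
    // Made-up alias table for illustration only.
    let aliases = [("ascii", 0x01_u64), ("cyrillic", 0x02)];
    println!("{:?}", parse_integer("0x1f"));
    println!("{:?}", parse_filter("cyrillic", &aliases));
    println!("{:?}", parse_filter("greek", &aliases));
}
```

Looking up aliases before trying the hexadecimal form means an alias name can never be shadowed by a number.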