`stringsext` is a Unicode enhancement of the GNU strings tool with additional functionalities: `stringsext` recognizes Cyrillic, CJKV characters and other scripts in all supported multi-byte encodings, while GNU strings fails to find any of these scripts in UTF-16 and many other encodings.

1. The iterator `input::Slicer` concatenates the input files and cuts the input stream into slices called `main::slice`.

2. In `main::run()` these slices are fed in parallel to threads, each of which has its own `Mission` configuration.

3. Each thread runs a search in `main::slice` == `scanner::input_buffer`. The search is performed by `scanner::FindingCollection::scan()`, which cuts the `scanner::input_buffer` into smaller chunks of size 2*`output_line_char_nb_max` bytes, hereafter called `input_window`.

4. The `Decoder` runs through the `input_window`, searches for valid strings and decodes them into UTF-8-chunks.

5. Each UTF-8-chunk is then fed into the filter `helper::SplitStr`, which checks whether parts of it satisfy certain filter conditions.

6. In doing so, `helper::SplitStr` cuts the UTF-8-chunk into even smaller `SplitStr`-chunks not longer than `output_line_char_nb_max` and sends them back to the `scanner::FindingCollection::scan()` loop.

7. There, each `SplitStr`-chunk is packed into a `finding::Finding` object and then added to a `finding::FindingCollection`.

8. After finishing its run through the `input_window`, the search continues with the next `input_window`. Goto 5.

9. When all `input_window`s are processed, `scanner::FindingCollection::scan()` returns the `finding::FindingCollection` to `main::run()` and exits.

10. `main::run()` waits for all threads to return their `finding::FindingCollection`s. Then all `Finding`s are merged, sorted and finally printed by `finding::print()`.

11. While printing is still running, the next `main::slice` == `scanner::input_buffer` is sent to all threads for the next search. Goto 3.

12. `main::run()` exits when all `main::slice`s are processed.
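
The flow described above can be illustrated with a small, self-contained sketch. It is not the crate's code: the `Finding` and `Mission` types and the `scan` and `run_slice` functions below are simplified, hypothetical stand-ins that only borrow the names used above. In particular, the decoder is replaced by a detector for runs of printable ASCII, the `helper::SplitStr` filter is reduced to a minimum-length check, and a string that crosses an `input_window` boundary is simply cut in two, which the real scanner may well handle differently.

```rust
use std::thread;

/// Hypothetical stand-in for `finding::Finding`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Finding {
    /// Byte offset of the `input_window` the chunk was found in.
    pub window_offset: usize,
    /// Id of the mission (thread) that produced the finding.
    pub mission_id: usize,
    /// The chunk, stored as UTF-8.
    pub text: String,
}

/// Hypothetical stand-in for a `Mission`: only an id and a minimum length.
#[derive(Debug, Clone, Copy)]
pub struct Mission {
    pub id: usize,
    pub min_chars: usize,
}

/// Scans one `input_buffer` window by window and collects the findings.
pub fn scan(input_buffer: &[u8], mission: Mission, output_line_char_nb_max: usize) -> Vec<Finding> {
    assert!(output_line_char_nb_max > 0, "chunk size must be positive");
    // An `input_window` is 2 * output_line_char_nb_max bytes long.
    let window_len = 2 * output_line_char_nb_max;
    let mut findings = Vec::new();

    for (i, input_window) in input_buffer.chunks(window_len).enumerate() {
        // Assumed decoder: every run of printable ASCII counts as a valid string.
        for run in input_window.split(|b| !(0x20..0x7f).contains(b)) {
            let utf8_chunk = String::from_utf8_lossy(run);
            // Assumed filter condition: drop chunks shorter than `min_chars`.
            if utf8_chunk.chars().count() < mission.min_chars {
                continue;
            }
            // Cut the UTF-8-chunk into pieces of at most
            // `output_line_char_nb_max` characters and pack each one.
            let chars: Vec<char> = utf8_chunk.chars().collect();
            for piece in chars.chunks(output_line_char_nb_max) {
                findings.push(Finding {
                    window_offset: i * window_len,
                    mission_id: mission.id,
                    text: piece.iter().collect(),
                });
            }
        }
    }
    findings
}

/// Feeds one slice to one thread per mission, then merges and sorts the results.
pub fn run_slice(slice: &[u8], missions: &[Mission], output_line_char_nb_max: usize) -> Vec<Finding> {
    let mut merged: Vec<Finding> = thread::scope(|s| {
        // Spawn all scanner threads first so that they run in parallel.
        let handles: Vec<_> = missions
            .iter()
            .map(|&m| s.spawn(move || scan(slice, m, output_line_char_nb_max)))
            .collect();
        // Wait for every thread and concatenate the returned collections.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scanner thread panicked"))
            .collect()
    });
    merged.sort();
    merged
}
```

Calling `run_slice` once per slice and printing the returned vector mirrors the outer loop of `main::run()`. The sketch uses scoped threads so that every scanner can borrow the same slice without copying it. It does not overlap printing with the next search as step 11 describes.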

… `scanner`.
… `MISSION` structures, that are mainly used to initialize `ScannerState`-objects.
… `Mission`-object) and store the filtered strings as UTF-8 in `Finding`-objects.
… `$slice_u8`. This is the mutable version.
… `$slice_u8`. This is the immutable version.
… `$list` to a filter-integer value.