Module stringsext::scanner

This module encapsulates and abstracts the interface to the encoding crate and spawns the worker threads, called scanners, that search for valid strings.

Scanner algorithm

  1. A scanner is a thread with an individual search Mission containing the encoding it searches for.

  2. The input data is divided into consecutive, overlapping memory chunks. A chunk consists of a few 4 KB memory pages and is WIN_LEN bytes in size.

  3. Scanners remain paused until they receive a pointer to a memory chunk together with a dedicated search Mission.

  4. All scanner threads search simultaneously in the same memory chunk. This prevents the threads from drifting too far apart.

  5. Every scanner thread searches for its encoding consecutively, byte by byte, from lower to higher memory.

  6. When a scanner finds a valid string, it converts it into a UTF-8 copy. Valid strings are composed of control characters and graphical characters.

  7. The UTF-8 copy of the valid string is split into one or several graphical strings, omitting all control characters. The graphical strings are then concatenated and the result is stored in a Finding object. A Finding object also carries the memory location of the finding and a label describing the search mission. Go to step 5.

  8. A scanner stops when it passes the upper border WIN_STEP of the current memory chunk.

  9. The scanner stores its Finding objects in a vector referred to as Findings. The vector is sorted in ascending order of memory location.

  10. Every scanner sends its Findings to the merger-printer thread. In order to resume later, it updates a marker in its Mission object pointing to the exact byte where it stopped scanning. Apart from this marker, the scanner is stateless. Finally, the scanner pauses and waits for the next memory chunk and mission.

  11. After all scanners have finished their search in the current chunk, the merger-printer thread receives the Findings and collects them in a vector.

  12. The merger-printer thread merges all Findings from all threads into one timeline and prints the formatted result through the output channel.

  13. To prepare the next iteration, the pointers are set to the beginning of the next chunk. Every scanner resumes exactly where it stopped before.

  14. Go to step 3.

  15. Repeat until the last chunk is reached.
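
The steps above can be sketched in a few dozen lines of Rust. This is a simplified illustration under stated assumptions, not the module's actual implementation: the window sizes are tiny, a byte predicate stands in for a real encoding, and the names `accepts`, `offset`, `scan_all` and `demo_missions` are hypothetical. Scoped threads replace the paused, long-lived scanner threads, and the control-character handling of steps 6 and 7 is left out.

```rust
use std::thread;

// Hypothetical sizes; the real values are multiples of 4 KB pages.
const WIN_STEP: usize = 8; // upper border at which a scanner stops
const WIN_LEN: usize = 12; // chunk size: WIN_STEP plus a 4 byte overlap

#[derive(Debug, Clone, PartialEq)]
struct Finding {
    ptr: usize,            // memory location of the finding
    mission: &'static str, // label describing the search mission
    s: String,             // the string that was found
}

struct Mission {
    label: &'static str,
    accepts: fn(u8) -> bool, // assumption: stand-in for "valid in this encoding"
    offset: usize,           // resume marker, relative to the chunk start
}

// Steps 5 to 10: scan one chunk and update the resume marker.
fn scan_window(win: &[u8], base: usize, last: bool, m: &mut Mission) -> Vec<Finding> {
    let limit = if last { win.len() } else { WIN_STEP };
    let mut findings = Vec::new();
    let mut i = m.offset;
    while i < limit {
        if (m.accepts)(win[i]) {
            let start = i;
            // A string may run past WIN_STEP into the overlap area.
            while i < win.len() && (m.accepts)(win[i]) {
                i += 1;
            }
            let s = String::from_utf8_lossy(&win[start..i]).into_owned();
            findings.push(Finding { ptr: base + start, mission: m.label, s });
        } else {
            i += 1;
        }
    }
    // Remember where scanning stopped, expressed relative to the next chunk.
    m.offset = i.saturating_sub(WIN_STEP);
    findings
}

// Steps 2 to 4 and 11 to 15: feed every chunk to all scanners, then merge.
fn scan_all(data: &[u8], missions: &mut [Mission]) -> Vec<Finding> {
    let mut out = Vec::new();
    let mut base = 0;
    loop {
        let end = (base + WIN_LEN).min(data.len());
        let last = end == data.len();
        let win = &data[base..end];
        let mut merged: Vec<Finding> = thread::scope(|sc| {
            // One thread per mission, all working on the same chunk.
            let handles: Vec<_> = missions
                .iter_mut()
                .map(|m| sc.spawn(move || scan_window(win, base, last, m)))
                .collect();
            handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
        });
        merged.sort_by_key(|f| f.ptr); // one timeline, ascending in memory
        out.extend(merged);
        if last {
            break;
        }
        base += WIN_STEP;
    }
    out
}

// Two toy missions used for illustration only.
fn demo_missions() -> Vec<Mission> {
    vec![
        Mission { label: "letters", accepts: |b: u8| b.is_ascii_alphabetic(), offset: 0 },
        Mission { label: "digits", accepts: |b: u8| b.is_ascii_digit(), offset: 0 },
    ]
}
```

Calling `scan_all(b"abc..1234567..xy", &mut demo_missions())` yields three findings: "abc" at 0, "1234567" at 5 and "xy" at 14. The digit string crosses the WIN_STEP border of the first chunk, and the resume marker keeps the digit scanner from reporting it again in the second chunk. In this sketch a string longer than the overlap would be cut at the end of the chunk.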



Holds the runtime environment for ScannerPool::launch_scanner().


As the ScannerPool.scan_window() function itself is stateless, the following variables store the data that is transferred from iteration to iteration. Each thread has its own ScannerState, which holds a reference to its own Mission.