The multifast Tool
The Multifast package provides a command-line tool called multifast. It is a text-processing tool that uses the Aho-Corasick library to search for and replace multiple patterns in text.
Build
Before building the multifast program, you must build the Aho-Corasick library. See the library page. Then run the following commands:
$ cd multifast
$ make
Manual
NAME
multifast - stream editor for searching and substituting text
SYNOPSIS
multifast -P pattern-file [-R out-dir [-l] | -n[d|x]rpvfi] [-h] input-file1 [input-file2 ...]
DESCRIPTION
The program receives a pattern file and one or more input files, and searches the input files for the given patterns. It works in one of two modes: search or replace. In search mode it reports the matched items on the output. In replace mode it writes the result of the substitution to the output directory. The default mode is search.
- -d
- Used in search mode to show the start positions of the matched patterns in decimal form.
- -f
- Used in search mode in order to find only the first matched item.
- -h
- Prints usage information.
- -i
- Used in search mode for case insensitive search.
- -l
- Used in replace mode to perform replacement in lazy mode. See the explanation below.
- -n
- Used in search mode in order to show the item numbers in the output.
- -P pattern-file
- Specifies the pattern file.
- -p
- Used in search mode to print out patterns in the output. For non-printable characters it shows the hex equivalent.
- -R out-dir
- Used in replace mode to specify the output directory to save the replacement result.
- -r
- Used in search mode to show the representative string for the patterns.
- -v
- Used in search mode to show the verbose output.
- -x
- Used in search mode to show the start positions of the matched patterns in hexadecimal form.
Input can be a single file name, multiple file names, standard input, or a single directory name. In the last case only the regular files in the directory are searched. Directories are not accepted in replace mode. Instead of passing a directory name, it is recommended to use the find and xargs utilities to generate the list of file names for multifast.
You can feed multifast from standard input; to do so, write a single dash (-) in place of the file name.
In replace mode you need to specify an output directory. The replacement result is saved in the given directory with the same hierarchy as the input.
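The option rules above can be checked before the tool is ever invoked. The following Python sketch builds an argument list that respects the synopsis. It is only an illustration: the function name build_argv and its parameters are hypothetical, and the checks reflect one reading of the synopsis (for example, that -d and -x are alternatives), not the tool's own argument parser.

```python
import os

# Search-mode flags listed in the synopsis: -n[d|x]rpvfi
SEARCH_FLAGS = set("ndxrpvfi")


def build_argv(pattern_file, inputs, search_flags="", out_dir=None, lazy=False):
    """Build a multifast argument list (hypothetical helper, not part of multifast)."""
    if not inputs:
        raise ValueError("at least one input is required")
    argv = ["multifast", "-P", pattern_file]
    if out_dir is not None:
        # Replace mode: -R out-dir, optionally followed by -l.
        if search_flags:
            raise ValueError("search flags cannot be combined with -R")
        if any(os.path.isdir(path) for path in inputs):
            raise ValueError("directories are not accepted in replace mode")
        argv += ["-R", out_dir]
        if lazy:
            argv.append("-l")
    else:
        # Search mode (the default).
        if lazy:
            raise ValueError("-l is only valid together with -R")
        unknown = set(search_flags) - SEARCH_FLAGS
        if unknown:
            raise ValueError("unknown search flags: " + "".join(sorted(unknown)))
        if "d" in search_flags and "x" in search_flags:
            raise ValueError("-d and -x are alternatives; choose one")
        if search_flags:
            argv.append("-" + search_flags)
    return argv + list(inputs)


if __name__ == "__main__":
    print(build_argv("cities.pat", ["dir1/input1.txt"], search_flags="nxp"))
```

The resulting list can be passed to subprocess.run(argv, check=True), assuming multifast is on the PATH.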
There are two replacement modes: normal and lazy. In normal mode every pattern occurrence is replaced, except that factor patterns (patterns that occur inside a longer matched pattern) are ignored. In lazy mode the first occurrence is replaced, and if it has a common factor with a subsequent pattern occurrence, the successor is ignored. For example, consider the following pattern file:
a {abc} {x}
a {cb} {y}
a {b} {z}
The results in each replacement mode are as follows.
In the normal mode:
- Input: abc; output: x
- Input: abcb; output: xy
In the lazy mode:
- Input: abc; output: azc
- Input: abcb; output: azy
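To make the difference between the two modes concrete, here is a small Python simulation. It is one possible reading of the rules that reproduces the four outputs listed above, using naive string comparison. It is not multifast's implementation. The function names, and the assumptions that matches are discovered in order of their end position and that only the longest pattern ending at a position is considered, are ours.

```python
def occurrences(text, rules):
    """Yield (start, end, replacement) for the longest pattern ending at each position.

    Matches come out ordered by end position, the order in which a
    left-to-right scan would discover them. Patterns are assumed non-empty.
    """
    for end in range(1, len(text) + 1):
        best = None
        for pattern, replacement in rules:
            start = end - len(pattern)
            if start >= 0 and text[start:end] == pattern:
                if best is None or start < best[0]:
                    best = (start, end, replacement)
        if best is not None:
            yield best


def replace(text, rules, lazy=False):
    kept = []  # accepted matches, in order of discovery
    for start, end, replacement in occurrences(text, rules):
        if lazy:
            # Lazy: the earliest match wins; a later match overlapping it is ignored.
            if kept and start < kept[-1][1]:
                continue
        else:
            # Normal: a new match discards earlier matches that lie inside it.
            while kept and start <= kept[-1][0]:
                kept.pop()
        kept.append((start, end, replacement))

    out, pos = [], 0
    for start, end, replacement in kept:
        out.append(text[pos:start])  # empty when two kept matches overlap
        out.append(replacement)
        pos = max(pos, end)
    out.append(text[pos:])
    return "".join(out)


RULES = [("abc", "x"), ("cb", "y"), ("b", "z")]

if __name__ == "__main__":
    for word in ("abc", "abcb"):
        print(word, replace(word, RULES), replace(word, RULES, lazy=True))
```

In normal mode the match for b is dropped once abc is found around it. In lazy mode b is replaced as soon as it is seen, so the later abc match, which overlaps it, is ignored.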
PATTERN FILE
The pattern file is a text file that contains patterns. Each pattern is defined by a four-part expression:
AX (ID) {PATTERN} {REPLACEMENT}
- The first part (called AX) takes one of two values: 'a' or 'x'. 'a' stands for ASCII and 'x' stands for hexadecimal. This part is mandatory. The interpretation of the third and fourth parts depends on the value of AX.
- The second part, ID, defines a meaningful identifier for the pattern. For large patterns or patterns with non-printable characters, it makes the output easier to read. This part is optional; for patterns without an identifier the program assigns one automatically. The identifier is enclosed in parentheses and may contain only 0-9, a-z, A-Z and _ (no spaces).
- The third part is the main part and defines the string of characters. A string can be in ASCII (e.g. abc) or in hexadecimal representation (e.g. 61 62 63), as determined by the first part (AX). Strings must be enclosed in braces ({}). Braces inside the string must be escaped, e.g. {abc\{dd\}g}. Backslashes (\) inside the string must be escaped as well, e.g. {dro\\des} is equal to "dro\des". No other escape sequences are provided. If the string contains non-printable characters, it must be given in hexadecimal form. Be careful with leading and trailing spaces between the string and the braces: they are taken into account, e.g. { lily} is equal to " lily", not "lily". In ASCII mode everything inside the braces (including line breaks) is taken into account. In hex mode only hex digits (0-9, a-f, A-F) are allowed inside the braces, and the number of digits must be even. There may be spaces between digits. No other constraints are applied.
- The fourth part defines the replacement string and follows the same rules as the third part.
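The escaping and hex rules above are easy to get wrong when writing entries by hand, so it can help to generate them. The following Python sketch formats one entry from raw bytes. The name format_entry is hypothetical, and the choice to switch to hex mode whenever a byte falls outside the printable ASCII range 0x20-0x7e is our assumption, made to stay on the safe side.

```python
import re
from typing import Optional

_ID_RE = re.compile(r"[0-9A-Za-z_]+")  # characters allowed in the ID part


def _is_printable(data: bytes) -> bool:
    return all(0x20 <= byte <= 0x7E for byte in data)


def _braced(data: bytes, hex_mode: bool) -> str:
    if hex_mode:
        # Two hex digits per byte, separated by spaces: always an even digit count.
        return "{" + data.hex(" ") + "}"
    text = data.decode("ascii")
    # Escape backslashes first so the backslashes added for braces are not doubled.
    for char in ("\\", "{", "}"):
        text = text.replace(char, "\\" + char)
    return "{" + text + "}"


def format_entry(pattern: bytes,
                 replacement: Optional[bytes] = None,
                 ident: Optional[str] = None) -> str:
    """Format one pattern-file entry (hypothetical helper, not part of multifast)."""
    if ident is not None and not _ID_RE.fullmatch(ident):
        raise ValueError("ID may contain only 0-9, a-z, A-Z and _")
    parts = [pattern] if replacement is None else [pattern, replacement]
    # AX governs both the pattern and the replacement, so choose it for both.
    hex_mode = not all(_is_printable(part) for part in parts)
    fields = ["x" if hex_mode else "a"]
    if ident is not None:
        fields.append("(" + ident + ")")
    fields.extend(_braced(part, hex_mode) for part in parts)
    return " ".join(fields)


if __name__ == "__main__":
    print(format_entry(b"abc{dd}g"))
    print(format_entry(b"\x00\x00\x00\x2a", ident="magic"))
```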
Here is a sample pattern file:
# comments
a (n1) {cinnamon} # n1 is the representative for cinnamon
a {apple } {orange } # Replace "apple " with "orange " (no representative)
x (sunflower) { 56 10 23 Ef EB 1D e9 09 d3 7c a4} {00 00 00 2A}
#
# comments
a (ice) {tip of the iceberg}
x {20 b3 7e 0a 40 97 79 ff ac 2d 84 2c 0c 3d 60 8d} # comments
x(poppy) {50 55 42 5 1 6 c c c 0 a} { 00 00 00 00 }
x (rose) {00 00 00 fe002345 e3} { ff ff ff ff ff}
NOTE:
- You can define a pattern in several lines.
- Multiple patterns can be defined in one line.
- You can add comments to the pattern file using #.
- You cannot put comments inside {} or ().
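The format described above can be read back with a short hand-written scanner. The Python sketch below is an illustration of the documented grammar, not the parser used by multifast. The name parse_patterns is hypothetical, and two details are assumptions: a comment is taken to run to the end of its line, and a backslash followed by anything other than {, } or \ is kept literally.

```python
def _decode(raw, kind):
    """Turn the text between braces into bytes according to AX ('a' or 'x')."""
    if kind == "a":
        return raw.encode("utf-8")
    digits = "".join(raw.split())  # spaces between hex digits are allowed
    if len(digits) % 2:
        raise ValueError("odd number of hex digits: %r" % raw)
    return bytes.fromhex(digits)   # raises ValueError on non-hex characters


def parse_patterns(text):
    """Return a list of (kind, ident, pattern, replacement) tuples."""
    entries, i, n = [], 0, len(text)

    def skip(i):
        # Skip whitespace and # comments (outside {} and () only).
        while i < n:
            if text[i].isspace():
                i += 1
            elif text[i] == "#":
                while i < n and text[i] != "\n":
                    i += 1
            else:
                break
        return i

    def braced(i):
        # text[i] is '{'; return (content, index just after the closing '}').
        out = []
        i += 1
        while i < n and text[i] != "}":
            if text[i] == "\\" and i + 1 < n and text[i + 1] in "{}\\":
                i += 1  # drop the escaping backslash, keep the escaped character
            out.append(text[i])
            i += 1
        if i >= n:
            raise ValueError("unterminated '{'")
        return "".join(out), i + 1

    i = skip(i)
    while i < n:
        kind = text[i]
        if kind not in ("a", "x"):
            raise ValueError("expected 'a' or 'x' at offset %d" % i)
        i = skip(i + 1)
        ident = None
        if i < n and text[i] == "(":
            close = text.find(")", i)
            if close < 0:
                raise ValueError("unterminated '('")
            ident = text[i + 1:close]
            i = skip(close + 1)
        if i >= n or text[i] != "{":
            raise ValueError("expected '{' at offset %d" % i)
        raw, i = braced(i)
        pattern = _decode(raw, kind)
        i = skip(i)
        replacement = None
        if i < n and text[i] == "{":  # the replacement part is optional
            raw, i = braced(i)
            replacement = _decode(raw, kind)
            i = skip(i)
        entries.append((kind, ident, pattern, replacement))
    return entries
```

Because whitespace and line breaks between parts are skipped, the same function handles several patterns on one line as well as a pattern spread over several lines.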
EXAMPLES
Examples are based on the following files in the current working directory.
.
├── cities.pat
├── cities_r.pat
├── dir1
│   ├── input1.txt
│   └── input2.txt
└── dir2
    └── file1.txt
Working in Search Mode
The following are typical search examples that demonstrate the different display options.
$ multifast -P cities.pat dir1/input1.txt
$ multifast -P cities.pat -nxp dir1/input1.txt
$ multifast -P cities.pat -ndrp dir1/input1.txt
$ multifast -P cities.pat dir1/input1.txt dir1/input2.txt
$ multifast -P cities.pat dir1
$ multifast -P cities.pat -ndrpf dir1
$ multifast -P cities.pat -ndrpf /var/www/
To feed multifast from standard input, use a single dash (-) instead of a file name.
$ cat dir1/input1.txt | multifast -P cities.pat -dp -
Using find and xargs utilities you can generate multiple file names for multifast:
$ find dir* -type f -print0 | xargs -0 multifast -P cities.pat -xrp
$ find /var/www/ -type f -print0 | xargs -0 multifast -P cities.pat -ndrpf
Working in Replace Mode
The following examples replace the given patterns in the input files and save the results in the given directory, preserving the hierarchy of the input.
$ multifast -P cities_r.pat -R out1 dir1/*
$ multifast -P cities_r.pat -R out2 -l dir1/input1.txt dir2/file1.txt
Warning: multifast replaces existing files without asking.