Preface
Linux offers many text-processing tools, for example: sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail, and more. The lazy way to learn Linux text processing (not necessarily the best way) may be to learn just grep, sed, and awk.
With these three tools you can solve 99% of the text-processing problems on a Linux system without memorizing the different commands and options above.
And once you have learned and used all three, you will know the difference between them; in practice, the difference is which tool is good at solving which kind of problem.
An even lazier way might be to learn a scripting language (Python, Perl, or Ruby) and use it for every text-processing task.
Overview
awk, grep, and sed are the three sharpest tools for text manipulation on Linux, and they are among the Linux commands you must master.
All three process text, but each has a different focus. awk is the most powerful, but also the most complicated. grep is best suited to simply searching or matching text, sed to editing the text it matches, and awk to formatting text, including more complex formatting.
A brief summary:
-
grep: searching for and locating data
-
awk: slicing data
-
sed: modifying data
grep = global regular expression print
In the simplest terms, grep (global regular expression print) finds the lines in a file that match a given pattern. Starting with the first line of the file, grep copies a line into a buffer, compares it against the search string, and, if it matches, prints the line to the screen. grep repeats this process until it has searched every line of the file.
Note: at no point in this process does grep store lines, change lines, or search only part of a line.
Sample data file
Please cut and paste the following data into a file named "sampler.log":
-
boot
-
book
-
booze
-
machine
-
boots
-
bungie
-
bark
-
aardvark
-
broken$tuff
-
robots
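If copy and paste is inconvenient, the file can also be generated from the shell. This is only a sketch, assuming a POSIX-style shell with mktemp available; the scratch directory and the names workdir and line_count are illustrative and not part of the original article.

```shell
# Work in a scratch directory so an existing sampler.log is not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Single quotes keep the shell from expanding $tuff as a variable.
printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots > sampler.log

# Sanity check: the file should contain 10 lines.
line_count=$(wc -l < sampler.log | tr -d ' ')
echo "$line_count"
```

Run the later examples from the same directory so that they find sampler.log.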
A simple example
The simplest grep example is:
-
grep "boo" sampler.log
In this case, grep goes through every line of the file "sampler.log" and prints every line that contains the string "boo":
-
boot
-
book
-
booze
-
boots
But if you are working with large files, it may be more useful to know which lines of the file matched, so that you can easily find a specific string when you open the file in an editor to make changes. You can get this by adding the -n option:
-
grep -n "boo" sampler.log
This produces a more useful result, showing which lines match the search string:
-
1:boot
-
2:book
-
3:booze
-
5:boots
Another interesting option is -v, which inverts the result. In other words, grep prints all lines that do not match the search string instead of the lines that match it.
In the following case, grep prints every line that does not contain the string "boo", along with its line number, as in the previous example:
-
grep -vn "boo" sampler.log
-
4:machine
-
6:bungie
-
7:bark
-
8:aardvark
-
9:broken$tuff
-
10:robots
The -c option tells grep to suppress printing of the matching lines and show only the number of lines that match the query. For example, the command below prints the number 4, because "boo" appears on 4 lines of sampler.log.
-
grep -c "boo" sampler.log
-
4
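Because -c counts matching lines rather than individual matches, the two numbers differ when a line contains the string more than once. The sketch below illustrates this with a made-up line; it assumes a grep that supports -o (GNU and BSD grep do, but it is not required by POSIX).

```shell
# One line that contains "boo" three times (illustrative data).
line='boo boot book'

# -c counts matching lines, so the result is 1.
lines=$(printf '%s\n' "$line" | grep -c "boo")

# -o prints each match on its own line; wc -l then counts the matches.
matches=$(printf '%s\n' "$line" | grep -o "boo" | wc -l | tr -d ' ')

echo "$lines $matches"
```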
The -l option prints only the names of the files that contain at least one line matching the search. This is very useful if you want to search multiple files for the same string, like this:
-
grep -l "boo" *
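To see what -l reports, here is a small sketch with three throwaway files. The file names a.log, b.log, and c.log and the scratch directory are assumptions made up for the illustration.

```shell
# Create three small files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' boot machine > a.log
printf '%s\n' bark robots > b.log
printf '%s\n' book > c.log

# Only files with at least one matching line are listed, one name per line.
matching_files=$(grep -l "boo" a.log b.log c.log)
echo "$matching_files"
```

Here b.log is omitted because none of its lines contains "boo".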
For searching non-code files, a more useful option is -i, which ignores case: upper and lower case are treated as equal when matching the search string. In the following example, the lines containing "boo" are printed even though the search string is uppercase.
-
grep -i "BOO" sampler.log
-
boot
-
book
-
booze
-
boots
The -x option matches only whole lines. In other words, the following command finds nothing, because no line consists of exactly "boo":
-
grep -x "boo" sampler.log
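For contrast, a pattern that does cover a whole line produces output. The sketch below feeds the sample data through a helper function (sample is a name invented here) so that it runs without the file.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# The whole line must equal "boot"; "boots" does not qualify.
exact=$(sample | grep -x "boot")

# With a regular expression, the whole line must match it: "boo" plus one character.
four_letters=$(sample | grep -x "boo.")

echo "$exact"
echo "$four_letters"
```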
Finally, -A lets you print additional lines of context after each match (-B does the same for lines before a match, and -C for both), so you get the matching line plus the lines that follow it, for example:
-
grep -A2 "mach" sampler.log
-
machine
-
boots
-
bungie
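The companion options -B and -C can be tried the same way. A brief sketch, assuming a grep with context options (GNU or BSD grep); sample is a helper name invented for this example.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# -B1: one line of context before the match.
before=$(sample | grep -B1 "mach")

# -C1: one line of context on each side of the match.
around=$(sample | grep -C1 "mach")

echo "$before"
echo "$around"
```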
Regular expressions
Regular expressions are a compact way to describe complex patterns in text.
With grep you can search using patterns. Other tools use regular expressions (regexps) in more complex ways, and the plain strings used with grep so far are in fact very simple regular expressions. If you have used wildcards such as '*' or '?', for example when listing file names, you already know the idea; grep lets you search with basic regular expressions, although their syntax differs from that of shell wildcards.
For example, to search a file for the lines that end with the letter e:
-
grep "e$" sampler.log
-
booze
-
machine
-
bungie
If you need the more extensive regular expression operators, you must use grep -E.
For example, the regular expression operator ? matches the preceding character 0 or 1 times:
-
grep -E "boots?" sampler.log
-
boot
-
boots
You can also use the pipe character (|), which means "or", to combine multiple searches, so you can do this:
-
grep -E "boot|boots" sampler.log
-
boot
-
boots
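Anchors and grouping combine with alternation in the same way. The following sketch shows two further extended patterns on the sample data; the patterns themselves are illustrations chosen here, not taken from the original article.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# ^ anchors to the start of the line and $ to the end: starts with "b", ends with "s".
b_to_s=$(sample | grep -E "^b.*s$")

# Parentheses group the alternatives: lines ending in "ook" or "ark".
ends_ok=$(sample | grep -E "(oo|ar)k$")

echo "$b_to_s"
echo "$ends_ok"
```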
Special characters
What if you want to search for a special character? If you want to find all the lines that contain the dollar character "$", you cannot simply run grep "$" a_file, because "$" is interpreted as a regular expression that matches the end of a line; you would get every line that has an end, which is all of them. The solution is to "escape" the symbol, so you would use:
-
grep '\$' sampler.log
-
broken$tuff
You can also use the "-F" option, which stands for "fixed string" (or "fast"), because it searches only for literal strings, not regular expressions.
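As a quick check of the difference, the sketch below runs the same '$' search with and without -F. The helper function sample is a name made up for this example.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# -F treats "$" as a literal character, so only one line matches.
fixed=$(sample | grep -F '$')

# Without -F, "$" means end of line, so all 10 lines match.
regex_count=$(sample | grep -c '$')

echo "$fixed"
echo "$regex_count"
```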
More regexp examples
Reference: http://gnosis.cx/publish/programming/regular_expressions.html
AWK
A text pattern scanning and processing language, created by Aho, Weinberger, and Kernighan.
AWK is quite complex, so this is not a complete guide, but it should give you an idea of what awk can do. The basics are easy to pick up, and it is strongly recommended.
AWK basics
An awk program operates on each line of the input file. It can have an optional BEGIN { } section of commands that run before anything in the file is processed, then the main { } section runs on every line of the file, and finally there is an optional END { } section whose commands run after the whole file has been read:
-
BEGIN { …. initialization awk commands …}
-
{ …. awk commands for each line of the file…}
-
END { …. finalization awk commands …}
For each line of the input file, awk checks whether any pattern-matching instructions are present; if so, the commands run only on lines that match the pattern, otherwise they run on all lines. These pattern-matching commands can contain the same regular expressions as grep.
awk commands can perform some quite sophisticated math and string operations, and awk also supports associative arrays. AWK treats each line as a sequence of fields separated by a field separator. By default this is one or more whitespace characters, so the line:
-
this is a line of text
contains 6 fields. In awk, the first field is called $1, the second $2, and so on, and the whole line is called $0.
The field separator is set by the awk built-in variable FS, so if you set FS=":" the line is split on ':', which is very useful for files like /etc/passwd. Other useful built-in variables are NR, the current record number (line number), and NF, the number of fields in the current line.
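The following sketch shows FS, NR, and NF together. To avoid depending on the real /etc/passwd, it uses two made-up entries in the same format; the user names and the helper passwd_sample are hypothetical.

```shell
# Two made-up entries in /etc/passwd format (hypothetical users).
passwd_sample() {
  printf '%s\n' 'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
                'bob:x:1001:1001:Bob:/home/bob:/bin/sh'
}

# -F: sets FS to ":"; NR is the record number, NF the field count,
# and $NF is the last field, here the login shell.
report=$(passwd_sample | awk -F: '{print NR, $1, NF, $NF}')
echo "$report"
```

Setting the separator with -F: on the command line is equivalent to assigning FS in a BEGIN section.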
AWK can operate on any file, including standard input, in which case it is usually used with the '|' pipe, for example in combination with grep or other commands.
for example , If I list all the files in the current directory
-
ls -l
-
total 140
-
-rw-r--r-- 1 root root 55121 Jan  3 17:03 combined_log_format.log
-
-rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
-
-rw-r--r-- 1 root root 71 Jan  3 17:55 sampler.log
I can see that the file size is reported in the 5th column of data. If I want to know the total size of the files in this directory, I can do:
-
ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
-
135836
Note that 'print sum' prints the value of the variable sum, so if sum = 2 then 'print sum' outputs '2', whereas 'print $sum' prints '1', because the second field contains the value '1'.
It is therefore quite simple to write an awk command that calculates the mean and standard deviation of a column of numbers: accumulate 'sumx' and 'sumx2' in the main section, then use the standard formulas in the END section to compute the mean and standard deviation.
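One possible version of that calculation is sketched below. The input numbers are illustrative, and the choice of the population formula (dividing by N rather than N-1) is an assumption, since the text does not say which standard formula is meant.

```shell
# Sample column of numbers (illustrative data, not from the article).
stats=$(printf '%s\n' 2 4 4 4 5 5 7 9 | awk '
  { sumx += $1; sumx2 += $1 * $1 }   # accumulate the sum and the sum of squares
  END {
    mean = sumx / NR
    sd = sqrt(sumx2 / NR - mean * mean)   # population standard deviation
    printf "%.2f %.2f\n", mean, sd
  }')
echo "$stats"
```

For this input the sum is 40 and the sum of squares is 232, so the mean is 5 and the variance is 232/8 - 25 = 4.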
AWK supports loops ('for' and 'while') and branching (using 'if'). So if you want to trim a file and operate only on every 3rd line, you can do this:
-
ls -l | awk '{for (i=1;i<3;i++) {getline}; print NR,$0}'
-
3 -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
-
4 -rw-r--r-- 1 root root 71 Jan  3 17:55 sampler.log
The for loop uses the "getline" command to step through the file, so only every 3rd line is printed.
Note that because the file has 4 lines, which is not divisible by 3, the last getline hits the end of the input early, so the final "print" command prints the 4th line. You can see that the line number is printed as well, using the NR variable.
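An alternative that avoids getline is to test the record number directly. This is a different technique from the loop above, shown here as a sketch on the sample data; unlike the getline version, it does not print a leftover line at the end of the input.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# The pattern NR % 3 == 0 is true only for records 3, 6, 9, ...
every_third=$(sample | awk 'NR % 3 == 0 {print NR, $0}')
echo "$every_third"
```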
AWK pattern matching
AWK is a line-oriented language. The pattern comes first, then the action. Action statements are enclosed in { and }. Either the pattern or the action may be omitted, but of course not both. If the pattern is missing, the action is performed for every input record. If the action is missing, the entire record is printed.
AWK patterns include regular expressions (using the same syntax as "grep -E") and combinations built with the special symbols "&&" for logical AND, "||" for logical OR, and "!" for logical NOT.
You can also use relational patterns, pattern groups, ranges, and so on.
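The pattern types described above can be sketched on the sample data as follows. The specific patterns are examples chosen for illustration, and sample is a helper name invented here.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# Regular expressions combined with && and !: starts with "b" but has no "boo".
# No action is given, so the whole record is printed.
no_boo=$(sample | awk '/^b/ && !/boo/')

# Relational pattern with an explicit action: records after the 8th.
tail_part=$(sample | awk 'NR > 8 {print NR ": " $0}')

# Range pattern: from the line matching /machine/ through the line matching /bark/.
range=$(sample | awk '/machine/,/bark/')

echo "$no_boo"
echo "$tail_part"
echo "$range"
```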
AWK control statements
-
if (condition) statement [ else statement ]
-
while (condition) statement
-
do statement while (condition)
-
for (expr1; expr2; expr3) statement
-
for (var in array) statement
-
break
-
continue
-
exit [ expression ]
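A short sketch using two of these statements, if/else and for (var in array), counts the sample lines by first letter. The variable names first, b_lines, and other are made up for the example.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

counts=$(sample | awk '
  {
    first = substr($0, 1, 1)
    if (first == "b")
      b_lines++
    else
      other[first]++          # associative array keyed by first letter
  }
  END {
    print "b", b_lines
    for (letter in other)     # iteration order is unspecified, so sort afterwards
      print letter, other[letter]
  }' | sort)
echo "$counts"
```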
AWK input/output statements
Note: the printf command lets you specify the output format more precisely, using C-like format strings; for example, you can specify an integer of a given width, a floating-point number, a string, and so on.
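As an illustration of printf, the sketch below prints each sample line left-aligned in a 12-character column, followed by its length right-aligned in 3 characters. The column widths are arbitrary choices for the example.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# %-12s left-aligns a string in 12 columns; %3d right-aligns an integer in 3.
table=$(sample | awk '{ printf "%-12s %3d\n", $0, length($0) }')
echo "$table"
```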
AWK mathematical functions
AWK string functions
AWK command line and usage
You can use the '-v' flag as many times as you need to pass variables to an awk program, for example:
-
awk -v skip=3 '{for (i=1;i<skip;i++) {getline}; print $0}' sampler.log
-
booze
-
bungie
-
broken$tuff
-
robots
You can also write an awk program in an editor and save it as a script file, for example:
-
$ cat awk_strip
-
#!/usr/bin/awk -f
-
#only print out every 3rd line of input file
-
BEGIN {skip=3}
-
{for (i=1;i<skip;i++)
-
{getline};
-
print $0}
You can then use it as a new command:
-
chmod u+x awk_strip
-
./awk_strip sampler.log
sed = stream editor
sed performs basic text transformations on an input stream (a file or input from a pipeline) in a single pass over the stream, so it is very efficient. It is sed's ability to filter text in a pipeline, however, that particularly distinguishes it from other types of editors.
sed basics
sed can be used on the command line or in shell scripts to edit files non-interactively. Perhaps its most useful feature is "search and replace", which changes one string into another. You can embed sed commands in the command line that invokes sed using the '-e' option, or put them in a separate file such as 'sed.in' and invoke sed with the '-f sed.in' option. The latter is most common when the sed commands are complex and involve a lot of regexps. For example:
sed -e 's/input/output/' sampler.log
echoes every line of sampler.log to standard output, changing the first occurrence of 'input' on each line to 'output'. Note that sed is line-oriented, so if you want to change every occurrence on every line, you need to make it a 'global' search and replace by adding the g flag, as shown below (sampler.log contains no 'input', so here the output is the unchanged file):
-
sed -e 's/input/output/g' sampler.log
-
boot
-
book
-
booze
-
machine
-
boots
-
bungie
-
bark
-
aardvark
-
broken$tuff
-
robots
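Since the sample file contains no 'input', the effect of the g flag is easier to see on a line with repeated matches. A minimal sketch with a made-up input line:

```shell
# Without g, only the first occurrence on each line is replaced.
first_only=$(printf '%s\n' 'boo boo boo' | sed -e 's/boo/BOO/')

# With g, every occurrence on the line is replaced.
all=$(printf '%s\n' 'boo boo boo' | sed -e 's/boo/BOO/g')

echo "$first_only"
echo "$all"
```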
The expression inside /.../ can be a literal string or a regular expression. Note that by default the output is written to stdout. You can redirect it to a new file or, if you want to edit the existing file, use the '-i' flag:
-
sed -e 's/input/output/' sampler.log > new_file
-
sed -i -e 's/input/output/' sampler.log
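In-place editing overwrites the file, so keeping a backup can be worthwhile. The sketch below attaches a suffix to -i, a form that both GNU and BSD sed accept as far as I know; the file name demo.log and the scratch directory are made up for the example.

```shell
# Work on a throwaway file in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' boot book > demo.log

# -i.bak edits demo.log in place and keeps the original as demo.log.bak.
sed -i.bak -e 's/boot/BOOT/' demo.log

edited=$(cat demo.log)
backup=$(cat demo.log.bak)
echo "$edited"
echo "$backup"
```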
sed and regular expressions
What if a character you want to use in a search command is a special symbol, such as '/' (for example in a file name) or '*'? Then you have to escape the symbol, just as with grep (and awk). Say you want to edit a shell script so that it refers to /usr/local/bin instead of /bin; then you can do this:
-
sed -e 's/\/bin/\/usr\/local\/bin/' my_script > new_script
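When the pattern is full of slashes, another option is to pick a different delimiter for the s command, which sed allows. The sketch below applies both forms to a single made-up shebang line and produces the same result.

```shell
# Escaped slashes, as in the command above.
escaped=$(printf '%s\n' '#!/bin/sh' | sed -e 's/\/bin/\/usr\/local\/bin/')

# Using | as the delimiter, so the slashes need no escaping.
rewritten=$(printf '%s\n' '#!/bin/sh' | sed -e 's|/bin|/usr/local/bin|')

echo "$escaped"
echo "$rewritten"
```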
What if you want to use a wildcard in your search; how do you write the output string? You need the special symbol "&", which stands for the text that the pattern matched. So if you want to take the number at the start of each line in your file and put brackets around it:
-
sed -e 's/^[0-9][0-9]*/(&)/'
where [0-9] is a regexp range matching any single digit, '^' anchors the match to the start of the line, and '*' is a repeat count meaning any number of repetitions (zero or more) of the preceding item, so [0-9][0-9]* matches one or more digits. A bare [0-9]* would also match the empty string and insert '()' at the start of lines that have no leading number. You can also use positional anchors in a regexp, and you can even save parts of the match in a pattern buffer so that they can be reused elsewhere.
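Saving parts of a match for reuse is done with \( and \) in the pattern and \1, \2, and so on in the replacement. A brief sketch on a made-up input line, swapping two words:

```shell
# \(...\) saves a part of the match; \1 and \2 refer back to the saved parts.
swapped=$(printf '%s\n' 'john smith' | sed -e 's/\([a-z]*\) \([a-z]*\)/\2, \1/')
echo "$swapped"
```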
Other sed commands
The general form is
-
sed -e '/pattern/ command' sampler.log
where 'pattern' is a regular expression and 'command' can be 's' = search and replace, 'p' = print, 'd' = delete, 'i' = insert, 'a' = append, and so on. Note that by default sed prints every line whether or not it matches, so if you want to suppress that you need to invoke sed with the '-n' flag; you can then use the 'p' command to control what is printed. So if you want to list all subdirectories, you can use
-
ls -l | sed -n -e '/^d/ p'
Because each line of a long listing starts with a 'd' if the entry is a directory, this prints only the lines that begin with 'd'. Similarly, if you want to delete all comment lines, which begin with the symbol '#', you can use
-
sed -e '/^#/ d' sampler.log
You can also use the line-range form
-
sed -e '1,100 command' sampler.log
which runs "command" on lines 1-100. You can also use the special line number $ to mean the end of the file. So if you want to delete all lines except the first 10 lines of the file, you can use
-
sed -e '11,$ d' sampler.log
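Since sampler.log has only 10 lines, that command leaves it unchanged. The sketch below uses smaller line ranges on the sample data so that the effect is visible; the ranges are chosen only for illustration.

```shell
# Recreate the sample data so the example is self-contained.
sample() {
  printf '%s\n' boot book booze machine boots bungie bark aardvark 'broken$tuff' robots
}

# Delete from line 4 to the end, keeping only the first 3 lines.
first_three=$(sample | sed -e '4,$ d')

# With -n, print only lines 2 through 4.
middle=$(sample | sed -n -e '2,4 p')

echo "$first_three"
echo "$middle"
```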
You can also use the pattern-range form, in which the first regular expression defines the start of the range and the second the end. So, for example, if you want to print all the lines from 'boot' to 'machine', you can do this:
-
sed -n -e '/boot$/,/mach/p' sampler.log
-
boot
-
book
-
booze
-
machine
This prints (because of -n) only the lines within the given regexp range.
Summary
The three Linux swordsmen, awk, sed, and grep, are widely used in performance modeling, performance monitoring, and performance analysis. They are also a frequent interview topic for testing positions at major Internet companies, and one of the essential skills for mid-level and senior test engineers.