
The Three Swordsmen of Linux Performance Tools: awk, grep, and sed


Preface

Linux has many tools for text processing, for example: sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail, and so on. The lazy way to learn text processing on Linux (not necessarily the best way) might be to learn just grep, sed, and awk.

With these three tools you can solve 99% of the text-processing problems on a Linux system, instead of memorizing all the different commands and parameters above.

And once you have learned and used all three, you will know the difference between them; that is, which tool is best at solving which kind of problem.

An even lazier way might be to learn a scripting language (Python, Perl, or Ruby) and use it for every text-processing task.

Overview

awk, grep, and sed are the three sharpest tools for text manipulation on Linux, and they are among the Linux commands you must master.

All three are for processing text, but each has a different focus: awk is the most powerful but also the most complex; grep is best suited to simply searching or matching text; sed is best suited to editing the matched text; and awk is best suited to formatting text and to more complex transformations.

In brief (a combined example follows the list):

  • grep: searching and locating data

  • awk: slicing and formatting data

  • sed: modifying data
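As a quick taste of how the three combine in a pipeline, here is a minimal sketch using the sampler.log file introduced in the grep section below: grep selects the matching lines, sed edits them, and awk numbers and reformats them.

  grep "boo" sampler.log | sed -e 's/boo/BOO/' | awk '{print NR": "$0}'
  1: BOOt
  2: BOOk
  3: BOOze
  4: BOOts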

grep = global regular expression print

In the simplest terms, grep (global regular expression print) is used to find lines in a file that match a given string. Starting with the first line of the file, grep copies a line into a buffer, compares it with the search string, and, if they match, prints the line to the screen. grep repeats this process until it has searched every line of the file.

Note: at no point in this process does grep store lines, change lines, or search only part of a line.

Sample data file

Cut and paste the following data into a file named "sampler.log":

  boot
  book
  booze
  machine
  boots
  bungie
  bark
  aardvark
  broken$tuff
  robots
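If you prefer to create the file from the shell, a quoted heredoc works too (a minimal sketch; quoting EOF keeps the $ in broken$tuff literal):

  cat > sampler.log <<'EOF'
  boot
  book
  booze
  machine
  boots
  bungie
  bark
  aardvark
  broken$tuff
  robots
  EOF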

A simple example

The simplest grep example is:

  grep "boo" sampler.log

In this case, grep traverses every line of the file "sampler.log" and prints each line that contains the string "boo":

  boot
  book
  booze
  boots

But if you are working with a large file, it is often more useful to know which lines the matches are on, for example so that you can open the file in an editor and jump straight to a specific string to change it. You can do this by adding the -n option:

  grep -n "boo" sampler.log

This gives a more useful result, showing which lines matched the search string:

  1:boot
  2:book
  3:booze
  5:boots

Another interesting option is -v, which inverts the result. In other words, grep prints all the lines that do not match the search string, rather than the lines that do.

In the following case, grep prints every line that does not contain the string "boo", and shows the line numbers as in the previous example:

  grep -vn "boo" sampler.log
  4:machine
  6:bungie
  7:bark
  8:aardvark
  9:broken$tuff
  10:robots

The -c option tells grep to suppress the printing of matching lines and instead show only the number of lines that match the query. For example, the following prints the number 4, because "boo" occurs on 4 lines of sampler.log:

  grep -c "boo" sampler.log
  4

The -l option prints only the names of the files that contain a line matching the search string. This is very useful if you want to search multiple files for the same string, like this:

  grep -l "boo" *

For searching non-code files, a very useful option is -i, which ignores case when matching the search string. In the following example, the lines containing "boo" are printed even though the search string is uppercase:

  grep -i "BOO" sampler.log
  boot
  book
  booze
  boots

The -x option matches only whole lines exactly. In other words, the following search returns nothing, because no line consists solely of "boo":

  grep -x "boo" sampler.log

Finally, -A lets you print a given number of extra lines of trailing context, so you get the matching line plus the lines that follow it, for example:

  grep -A2 "mach" sampler.log
  machine
  boots
  bungie
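grep also has -B for lines of leading context and -C for context on both sides; for example, a small sketch showing the line before the match:

  grep -B1 "mach" sampler.log
  booze
  machine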

Regular expressions

Regular expressions are a compact way to describe complex patterns in text .

The search patterns you use with grep are themselves regular expressions; the plain strings we have used so far are just very simple ones, while other tools use regular expressions (regexps) in more complex ways. If you have ever used wildcards such as '*' or '?' to list file names, you already have the basic idea, and you can use basic regular expressions in grep searches.

For example, to search a file for lines ending with the letter e:

  grep "e$" sampler.log
  booze
  machine
  bungie
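Similarly, '^' anchors a match to the beginning of a line; a small sketch searching for lines that start with the letter a:

  grep "^a" sampler.log
  aardvark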

If you need the more extensive, extended regular expression syntax, you must use grep -E.

For example, the regular expression operator ? matches the preceding character 0 or 1 times:

  grep -E "boots?" sampler.log
  boot
  boots

You can also combine multiple searches with the pipe character (|), which means "or", so you can do this:

  grep -E "boot|boots" sampler.log
  boot
  boots

Special characters

What if you want to search for a special character? Say you want to find all lines containing the dollar character "$". You cannot simply run grep "$" sampler.log, because '$' is interpreted as the regular expression for "end of line", and since every line has an end you would get every line back. The solution is to "escape" the symbol, so you would use:

  grep '\$' sampler.log
  broken$tuff

You can also use the "-F" option, which stands for "fixed strings" (or think of it as "fast"), because it searches only for literal strings, not regular expressions.
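With -F the dollar sign needs no escaping, because the pattern is treated as a literal string; a small sketch:

  grep -F '$' sampler.log
  broken$tuff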

More regexp examples

Reference: http://gnosis.cx/publish/programming/regular_expressions.html

AWK

awk is a pattern scanning and text processing language created by Aho, Weinberger, and Kernighan.

AWK is very rich, so this is not a complete guide, but it should give you a feel for what awk can do. It is easy to get started with and strongly recommended.

AWK basics

An awk program operates on each line of the input file. It can have an optional BEGIN{ } section of commands that run before anything in the file is processed, then the main { } section runs on every line of the file, and finally an optional END{ } section runs after the whole file has been read; a concrete example follows the outline:

  BEGIN { .... initialization awk commands .... }
  { .... awk commands for each line of the file .... }
  END { .... finalization awk commands .... }
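For example, a minimal sketch that uses all three sections to count the lines and characters in sampler.log (the totals shown assume the sample file above):

  awk 'BEGIN { lines = 0; chars = 0 }           # runs once, before any input
       { lines++; chars += length($0) }         # runs for every line
       END { print lines " lines, " chars " characters" }' sampler.log
  10 lines, 60 characters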

For each line of the input file, awk checks whether a command has a pattern-matching prefix; if it does, the command runs only on lines that match the pattern, otherwise it runs on every line. These 'pattern-matching' prefixes can contain the same regular expressions used with grep.

awk commands can perform quite complex mathematical and string operations, and awk also supports associative arrays. AWK sees each line as made up of a number of fields, separated by a "field separator". By default this is one or more whitespace characters, so the line:

  this is a line of text

contains 6 fields. Within awk, the first field is referred to as $1, the second as $2, and so on, while the whole line is $0.

The field separator is set by the awk internal variable FS, so if you set FS=":" then awk splits lines on ':', which is very useful for files like /etc/passwd. Other useful internal variables are NR, the current record number (i.e. line number), and NF, the number of fields in the current line.
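A small sketch of these variables in action, splitting /etc/passwd on ':' and printing the line number, the first field (the username), and the number of fields (the exact output depends on your system):

  awk 'BEGIN { FS=":" } { print NR, $1, NF }' /etc/passwd
  1 root 7
  2 daemon 7
  ...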

AWK can operate on any file, including stdin, in which case it is often used after a '|' in a pipeline, for example in combination with grep or other commands.

For example, if I list all the files in the current directory:

  ls -l
  total 140
  -rw-r--r-- 1 root root 55121 Jan  3 17:03 combined_log_format.log
  -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
  -rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

I can see the file sizes reported in the fifth column of the data. If I want to know the total size of the files in this directory, I can do:

  ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
  135836

Note that 'print sum' prints the value of the variable sum, so if sum = 2 then 'print sum' outputs '2', whereas 'print $sum' would print '1', because the second field of these lines contains the value '1'.

It is therefore very simple to write an awk command that calculates the mean and standard deviation of a column of numbers: accumulate 'sumx' and 'sumx2' in the main section, then apply the standard formulas in the END section, as in the sketch below.
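A minimal sketch of that idea, applied to the size column of the ls -l listing above (the $5 field and the NR > 1 guard that skips the 'total' line are assumptions about that listing's layout; the formula used is the population standard deviation):

  ls -l | awk 'NR > 1 { sumx += $5; sumx2 += $5*$5; n++ }
               END { mean = sumx/n
                     print "mean:", mean
                     print "stddev:", sqrt(sumx2/n - mean*mean) }'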

AWK supports loops ('for' and 'while') and branching (using 'if'). So, if you want to trim a file and operate only on every 3rd line, you can do this:

  ls -l | awk '{for (i=1;i<3;i++) {getline}; print NR,$0}'
  3 -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
  4 -rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

The for loop uses the "getline" command to step through the file, and only every 3rd line is printed.

Note that because the input has 4 lines, which is not divisible by 3, the loop runs out of input early and the final "print $0" still prints line 4. You can also see that we printed the line number using the NR variable.

AWK Pattern matching

AWK is a line-oriented language: first comes the pattern, then the action. Action statements are enclosed in { and }. Either the pattern or the action may be missing, but not both. If the pattern is missing, the action is performed on every input record; a missing action prints the entire record.

AWK patterns include regular expressions (using the same syntax as "grep -E") and combinations built with the special symbols "&&" for "logical AND", "||" for "logical OR", and "!" for "logical NOT".

There are also relational patterns, pattern groups, ranges, and so on, as in the sketches below.
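Two small sketches of such patterns on sampler.log: the first combines a regexp with logical NOT, the second uses a relational pattern on NR (and, with no action given, prints the whole record):

  awk '/boo/ && !/boot/ { print NR": "$0 }' sampler.log
  2: book
  3: booze

  awk 'NR >= 4 && NR <= 6' sampler.log
  machine
  boots
  bungie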

AWK control statements

  if (condition) statement [ else statement ]
  while (condition) statement
  do statement while (condition)
  for (expr1; expr2; expr3) statement
  for (var in array) statement
  break
  continue
  exit [ expression ]
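For example, a small sketch that uses 'if' in the main block to record matches and a 'for' loop in END to report them:

  awk '{ if ($0 ~ /boo/) hits[++n] = NR }
       END { for (i = 1; i <= n; i++) print "match on line " hits[i] }' sampler.log
  match on line 1
  match on line 2
  match on line 3
  match on line 5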

AWK input/output statements

Note: the printf command lets you control the output format much more precisely, in a C-like way; for example, you can print an integer with a given width, a floating-point number, a string, and so on.
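For example, a small sketch that reformats the earlier ls -l listing with a fixed-width name and a right-aligned size (the $9 and $5 field positions, and the NR > 1 guard for the 'total' line, are assumptions about that listing's layout):

  ls -l | awk 'NR > 1 { printf "%-40s %8d bytes\n", $9, $5 }'
  combined_log_format.log                     55121 bytes
  combined_log_format_w_resp_time.log         80644 bytes
  sampler.log                                    71 bytes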

AWK Mathematical functions
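A few of awk's standard built-in math functions in a minimal sketch (run from BEGIN, so no input file is needed):

  awk 'BEGIN { print sqrt(2), int(3.9), exp(1), log(10), sin(0), 2^10 }'
  1.41421 3 2.71828 2.30259 0 1024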

AWK String functions
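And a small sketch of some standard string functions (length, toupper, substr) applied to each line of sampler.log:

  awk '{ print length($0), toupper($0), substr($0, 1, 3) }' sampler.log
  4 BOOT boo
  4 BOOK boo
  5 BOOZE boo
  7 MACHINE mac
  ...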

AWK Command line and usage

You can pass variables into an awk program with the '-v' flag, using it as many times as you need, for example:

  awk -v skip=3 '{for (i=1;i<skip;i++) {getline}; print $0}' sampler.log
  booze
  bungie
  broken$tuff

You can also write an awk program in an editor and save it as a script file, for example:

  $ cat awk_strip
  #!/usr/bin/awk -f
  #only print out every 3rd line of input file
  BEGIN {skip=3}
  {for (i=1;i<skip;i++)
     {getline};
   print $0}

You can then use it as a new command:

  chmod u+x awk_strip
  ./awk_strip sampler.log

sed = stream editor

sed performs basic text transformations on an input stream (a file, or input from a pipeline) in a single pass through the stream, so it is very efficient. It is sed's ability to filter text in a pipeline, however, that particularly distinguishes it from other types of editors.

sed Basics

sed can be used on the command line or in a shell script to edit files non-interactively. Perhaps its most useful feature is the "search and replace" of one string with another. You can embed sed commands in the command line that invokes sed using the '-e' option, or put them in a separate file, e.g. 'sed.in', and invoke sed with the '-f sed.in' option. The latter is most common when the sed commands are complex and involve a lot of regexps. For example:

  sed -e 's/input/output/' sampler.log

will echo every line of sampler.log to standard output, changing the first 'input' on each line to 'output'. Note that sed is line oriented, so if you want to change every occurrence on each line, you need to make the search and replace 'greedy' (global), as shown below:

  sed -e 's/input/output/g' sampler.log
  boot
  book
  booze
  machine
  boots
  bungie
  bark
  aardvark
  broken$tuff
  robots
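Since sampler.log does not actually contain the string 'input', the command above echoes the file unchanged. A substitution that visibly changes it might look like this (a small sketch):

  sed -e 's/boo/BOO/g' sampler.log
  BOOt
  BOOk
  BOOze
  machine
  BOOts
  bungie
  bark
  aardvark
  broken$tuff
  robots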

The expression inside /.../ can be a literal string or a regular expression. Note that by default the output is written to stdout; you can redirect it to a new file, or, if you want to edit the existing file in place, use the '-i' flag:

  sed -e 's/input/output/' sampler.log > new_file
  sed -i -e 's/input/output/' sampler.log

sed and regular expressions

What if a character you want to use in your search is a special symbol, such as '/' (in a file path, for example) or '*'? Then, as with grep (and awk), you have to escape the symbol. Say you want to edit a shell script so that it references /usr/local/bin instead of /bin; you can do this:

  sed -e 's/\/bin/\/usr\/local\/bin/' my_script > new_script
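Alternatively, sed accepts any character as the delimiter after the 's' command, which avoids escaping the slashes altogether; a sketch of the same edit:

  sed -e 's|/bin|/usr/local/bin|' my_script > new_script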

What if you use a wildcard in your search and want to reuse whatever it matched in the output string? You can refer to the matched text with the special symbol "&". Say each line of your file starts with a number and you want to put brackets around that number:

  sed -e 's/[0-9]*/(&)/'

Here [0-9] is the regexp range for all single digits, and '*' is a repeat count meaning any number of digits. You can also use positional anchors in the regexp, and you can even save parts of the match in the pattern buffer so they can be reused elsewhere, as in the sketches below.
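Two small sketches: the first uses '&' to bracket the matched number, the second saves part of the match with \( \) and reuses it as \1:

  echo "12 monkeys" | sed -e 's/[0-9][0-9]*/(&)/'
  (12) monkeys

  echo "boot strap" | sed -e 's/\(boo\)t/\1m/'
  boom strap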

Other sed commands

The general form is

  sed -e '/pattern/ command' sampler.log

where 'pattern' is a regular expression and 'command' can be 's' (search & replace), 'p' (print), 'd' (delete), 'i' (insert), 'a' (append), and so on. Note that by default sed prints every line whether it matches or not, so if you want to suppress that you must invoke sed with the '-n' flag and then use the 'p' command to control what is printed. So, if you want a list of all the subdirectories, you can use:

  ls -l | sed -n -e '/^d/ p'

because in a long listing the line for a directory starts with a 'd', so this prints only the lines that begin with a 'd'. Similarly, if you want to delete all comment lines that begin with the '#' symbol, you can use:

  sed -e '/^#/ d' sampler.log

You can also use the range form:

  sed -e '1,100 command' sampler.log

to perform "command" on lines 1-100. You can also use the special line number $ to mean "end of file". So, if you want to delete all lines except the first 10, you can use:

  sed -e '11,$ d' sampler.log

You can also use a pattern range, where the first regular expression defines the start of the range and the second the end. So, for example, if you want to print all the lines from 'boot' through 'machine', you can do this:

  sed -n -e '/boot$/,/mach/p' sampler.log
  boot
  book
  booze
  machine

which prints only (-n) the lines in the range given by the regexps.

Summary

The three swordsmen of Linux, awk, sed, and grep, are widely used in performance modeling, performance monitoring, and performance analysis. They are also a frequent interview topic for test-engineering positions at the major Internet companies, and one of the essential skills for mid-level and senior test engineers.
