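[linux] grep vs. a [perl] script implementing grep: the difference in running time
Date: 2022-04-03 09:43
The code below is based on examples found online. I did not expect the gap to be this large. I currently have a project that has to scan 50GB~70GB of code for 260 keywords, so I urgently need a reasonably fast approach.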
[gzhy@nearby stat]$ wc -l 1
234033 1
[gzhy@nearby stat]$ perl 1.pl
cost 1 seconds
zjtel : 32606
[gzhy@nearby stat]$ perl 2.pl
cost 111 seconds
zjtel : 32606
1.pl
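#!/usr/bin/perl
# Count the matching lines in pure Perl and time the scan.
my $time = time();
my $zjtel = 0;
open(my $fh, '<', '1') or die "Can't open 1: $!";
while (<$fh>) {
    chomp;
    if (m/:zjtel:/) {
        $zjtel++;
    }
}
close($fh);
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $zjtel\n";

2.pl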
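#!/usr/bin/perl
# Count the matching lines by shelling out to grep and wc, and time the pipeline.
my $time = time();
my $count = `grep zjtel 1 | wc -l`;
chomp $count;    # wc output ends with a newline
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $count\n";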
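Both scripts time themselves with the built-in time(), which returns whole seconds, so "cost 1 seconds" can stand for anything between just over 0 and just under 2 seconds. The following is a minimal sketch of finer-grained timing with the core module Time::HiRes. The helper names timed and count_matches are made up for illustration, and the match is a plain substring test rather than the regex used in 1.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # core module; this time() returns fractional seconds

# Run a code reference and return (elapsed seconds, its result).
# "timed" is a hypothetical helper name.
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Count the lines of a file that contain a literal string.
# "count_matches" is also a hypothetical helper name.
sub count_matches {
    my ($path, $needle) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if index($line, $needle) >= 0;
    }
    close($fh);
    return $count;
}

# Only runs when called as: perl timing.pl <file> <string>
if (@ARGV == 2) {
    my ($path, $needle) = @ARGV;
    my ($elapsed, $count) = timed(sub { count_matches($path, $needle) });
    printf "cost %.3f seconds\n", $elapsed;
    print "$needle : $count\n";
}
```

Wrapping the grep pipeline from 2.pl in the same timed(sub { ... }) call would put both measurements on the same scale.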
My own code, still waiting to be tested:
pattern-match:
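use strict;
use warnings;
# Search the files in a directory for lines that contain a keyword.
# Output format: <file name>:<line number>:<line content>
my ($dir, $keywords) = @ARGV;
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @filenames = sort readdir($dh);
closedir($dh);
open(my $key_fh, '<', $keywords) or die "Can't open $keywords: $!";
chomp(my @keywords = <$key_fh>);    # strip newlines, otherwise no chomped line can match
close($key_fh);
my $num_key = scalar @keywords;
my @match_lines;
my $time = time();
foreach my $file (@filenames) {
    my $path = "$dir/$file";    # readdir returns bare names, not paths
    next unless -f $path;       # skip ".", ".." and subdirectories
    open(my $fh, '<', $path) or next;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++;
        chomp $line;
        foreach my $key (@keywords) {
            if ($line =~ m/$key/) {
                my $context = "$path:$n:$line\n";
                push @match_lines, $context;
            }
        }
    }
    close($fh);
}
open(my $rs, '>', 'result_file_pattern') or die "Can't open result_file_pattern: $!";
print {$rs} @match_lines;
close($rs);
$time = time() - $time;
print "Pattern-match ($num_key keywords) end: $time seconds\n";
# Would printing $context straight to the result handle differ from the current approach?

grep: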
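use strict;
use warnings;
# Search the files in a directory for lines that contain a keyword, using grep.
# Output format: <file name>:<line number>:<line content>
my ($dir, $keywords) = @ARGV;
opendir(my $dh, $dir) or die "Can't open $dir: $!";
my @filenames = sort readdir($dh);
closedir($dh);
open(my $key_fh, '<', $keywords) or die "Can't open $keywords: $!";
chomp(my @keywords = <$key_fh>);
close($key_fh);
my $num_key = scalar @keywords;
my @match_lines;
my $time1 = time();
foreach my $file (@filenames) {
    my $path = "$dir/$file";    # readdir returns bare names, not paths
    next unless -f $path;       # skip ".", ".." and subdirectories
    foreach my $key (@keywords) {
        # -H and -n add the file name and line number to every match.
        # Assumes keywords and paths contain no single quotes.
        my @sub_match_lines = `grep -Hn -- '$key' '$path'`;
        push @match_lines, @sub_match_lines;
    }
}
open(my $rs, '>', 'result_file_grep') or die "Can't open result_file_grep: $!";
print {$rs} @match_lines;
close($rs);
my $time2 = time();
print "Grep ($num_key keywords) end: ", $time2 - $time1, " seconds\n";
# Would printing the matches straight to the result handle differ from the current approach?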
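On the question in the closing comment: printing each match as it is found writes the same bytes to the result file, but it avoids holding every matching line in memory, which matters for a 50GB~70GB input. The sketch below does that and also folds all keywords into one compiled alternation, so each line is tested once instead of once per keyword. It rests on assumptions that are not in the original scripts: keywords are treated as literal strings (hence quotemeta), blank keyword lines are ignored, and the names read_keywords and scan_dir are made up for illustration. Unlike the scripts above, it writes a line once even if several keywords match it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read keywords, one per line, dropping line endings and blank lines.
# "read_keywords" is a hypothetical helper name.
sub read_keywords {
    my ($keywords_file) = @_;
    open(my $kh, '<', $keywords_file) or die "Can't open $keywords_file: $!";
    chomp(my @keywords = <$kh>);
    close($kh);
    return grep { length($_) } @keywords;
}

# Scan the regular files directly inside $dir in a single pass each and
# print every matching line to $out_file as soon as it is found.
# Returns the number of matching lines. "scan_dir" is a hypothetical name.
sub scan_dir {
    my ($dir, $keywords_file, $out_file) = @_;
    my @keywords = read_keywords($keywords_file);
    die "No keywords in $keywords_file\n" unless @keywords;

    # Assumption: keywords are literal strings, so regex metacharacters are escaped.
    my $alternation = join '|', map { quotemeta($_) } @keywords;
    my $re = qr/$alternation/;

    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my @paths = grep { -f $_ } map { "$dir/$_" } sort readdir($dh);
    closedir($dh);

    open(my $out, '>', $out_file) or die "Can't open $out_file: $!";
    my $hits = 0;
    for my $path (@paths) {
        open(my $fh, '<', $path) or next;
        while (my $line = <$fh>) {
            next unless $line =~ $re;
            chomp $line;
            print {$out} "$path:$.:$line\n";    # $. is the current line number
            $hits++;
        }
        close($fh);    # an explicit close resets $. for the next file
    }
    close($out) or die "Can't write $out_file: $!";
    return $hits;
}

# Only runs when called as: perl scan.pl <dir> <keyword file>
if (@ARGV == 2) {
    my $hits = scan_dir(@ARGV, 'result_file_pattern');
    print "$hits matching lines\n";
}
```

On the grep side, the comparable change would be a single run such as grep -rnF -f with the keyword file and the directory, rather than one grep process per keyword per file. Note that -r also descends into subdirectories, which this sketch does not.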