
[Linux] grep vs. grep functionality implemented in a [Perl] script: the difference in run time

Date: 2022-04-03 09:43

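This is based on code I found online, and I did not expect the difference to be this large. I currently have a project that has to scan 50 GB to 70 GB of code for 260 keywords, so I urgently need a reasonably fast approach.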

[gzhy@nearby stat]$ wc -l 1
 234033 1
[gzhy@nearby stat]$ perl 1.pl 
cost 1 seconds
zjtel : 32606
[gzhy@nearby stat]$ perl 2.pl 
cost 111 seconds
zjtel :   32606

1.pl
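#!/usr/bin/perl
use strict;
use warnings;

# Count the lines of file "1" that contain ":zjtel:", using Perl's own regex engine.
my $time  = time();
my $zjtel = 0;
open(my $fh, '<', '1') or die "Can't open 1: $!";
while (<$fh>)
{
  chomp;
  if (m/:zjtel:/)
  {
    $zjtel++;
  }
}
close($fh);
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $zjtel\n";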
2.pl
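#!/usr/bin/perl
use strict;
use warnings;

# Count the lines of file "1" that contain "zjtel" by shelling out to grep and wc.
my $time  = time();
my $count = `grep zjtel 1 | wc -l`;   # $count keeps wc's padding and trailing newline
$time = time() - $time;
print "cost $time seconds\n";
print "zjtel : $count\n";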
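
A note on the measurement: Perl's built-in time() returns whole seconds, so "cost 1 seconds" could stand for almost any duration under two seconds. The following is a minimal sketch of the same kind of timing with sub-second resolution, using the core module Time::HiRes. The helper names timed and count_matches are made up for this illustration and are not part of the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);   # overrides the built-in time() with a sub-second version

# Run a code reference once and return (elapsed seconds, its result).
# "timed" is a hypothetical helper name used only in this sketch.
sub timed {
    my ($code) = @_;
    my $start  = time();
    my $result = $code->();
    return (time() - $start, $result);
}

# Count the lines of $path that match the compiled regex $re.
sub count_matches {
    my ($path, $re) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $re;
    }
    close($fh);
    return $count;
}

# Usage: perl timing.pl <file> <pattern>
if (@ARGV == 2) {
    my ($path, $pattern) = @ARGV;
    my ($elapsed, $count) = timed(sub { count_matches($path, qr/$pattern/) });
    printf "cost %.3f seconds\n", $elapsed;
    print  "$pattern : $count\n";
}
```

The same timed() wrapper can be put around the backtick call from 2.pl, so that both variants are measured in the same way.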

My own code, still waiting to be tested:

pattern-match:


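use strict;
use warnings;

# Search every file in a directory for lines containing a keyword and
# record each hit as <file name>:<line number>:<line content>.
my ($dir, $keywords) = @ARGV;
opendir(my $dh, $dir) or die "Can't open $dir: $!";
# readdir returns bare names (including "." and ".."), so prepend $dir
# and keep only regular files.
my @filenames = grep { -f $_ } map { "$dir/$_" } sort readdir($dh);
closedir($dh);
open(my $key_fh, '<', $keywords) or die "Can't open $keywords: $!";
chomp(my @keywords = <$key_fh>);   # strip the newlines, otherwise no keyword can match a chomped line
close($key_fh);
my $num_key = scalar @keywords;
my @match_lines;
my $time = time();
foreach my $file (@filenames) {
    open(my $fh, '<', $file) or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        foreach my $key (@keywords) {
            if ($line =~ m/$key/) {   # each keyword is interpreted as a regex
                push @match_lines, "$file:$.:$line\n";   # $. is the current line number
            }
        }
    }
    close($fh);
}
open(my $rs, '>', 'result_file_pattern') or die "Can't open result_file_pattern: $!";
foreach (@match_lines) {
    print $rs $_;
}
close($rs);
$time = time() - $time;
print "Pattern-match ($num_key keywords) end: $time seconds\n";
# Question: would printing each match directly to the result file handle,
# instead of collecting it in @match_lines first, make any difference?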
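
On the question at the end of the pattern-match script: as far as I can tell, the result file comes out the same either way as long as the matches are written in the same order. The difference is memory, because the array holds every matching line until the end, which may matter with 50 GB to 70 GB of input. Below is a minimal sketch that streams matches straight to the output handle and also folds all keywords into one precompiled regex, so each line is tested once instead of once per keyword. The names build_keyword_regex and scan_file are hypothetical, and quotemeta() treats keywords as literal strings, which is an assumption about the keyword file. Unlike the script above, a line that contains several keywords is reported once, not once per keyword.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one regex that matches any of the keywords.
# quotemeta() makes every keyword a literal string (an assumption of this
# sketch); drop it if the keywords are meant to be regular expressions.
sub build_keyword_regex {
    my @keywords = grep { length($_) } @_;
    die "no keywords given\n" unless @keywords;   # an empty pattern would match every line
    my $alternation = join '|', map { quotemeta($_) } @keywords;
    return qr/$alternation/;
}

# Write "<path>:<line number>:<line>" to $out for every matching line of
# $path and return the number of matches. Nothing is accumulated in memory.
sub scan_file {
    my ($path, $re, $out) = @_;
    open(my $in, '<', $path) or die "Can't open $path: $!";
    my $matches = 0;
    while (my $line = <$in>) {
        next unless $line =~ $re;
        chomp $line;
        print {$out} "$path:$.:$line\n";   # $. is the line number within $in
        $matches++;
    }
    close($in);
    return $matches;
}

# Usage: perl scan.pl <keyword file> <file to scan>
if (@ARGV == 2) {
    my ($keyword_file, $path) = @ARGV;
    open(my $kf, '<', $keyword_file) or die "Can't open $keyword_file: $!";
    chomp(my @keywords = <$kf>);
    close($kf);
    scan_file($path, build_keyword_regex(@keywords), \*STDOUT);
}
```

Passing the output handle as an argument keeps scan_file usable for a result file, STDOUT, or an in-memory buffer.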
grep:


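use strict;
use warnings;

# Search every file in a directory for lines containing a keyword by calling
# grep; each hit is recorded as <file name>:<line number>:<line content>.
my ($dir, $keywords) = @ARGV;
opendir(my $dh, $dir) or die "Can't open $dir: $!";
# readdir returns bare names (including "." and ".."), so prepend $dir
# and keep only regular files.
my @filenames = grep { -f $_ } map { "$dir/$_" } sort readdir($dh);
closedir($dh);
open(my $key_fh, '<', $keywords) or die "Can't open $keywords: $!";
chomp(my @keywords = <$key_fh>);
close($key_fh);
my $num_key = scalar @keywords;
my @match_lines;
my $time1 = time();
foreach my $file (@filenames) {
    foreach my $key (@keywords) {
        # The list form of a pipe open bypasses the shell, so keywords and
        # file names need no quoting. -H and -n make grep print the file
        # name and line number (supported by GNU and BSD grep).
        open(my $grep, '-|', 'grep', '-H', '-n', '--', $key, $file)
            or die "Can't run grep: $!";
        push @match_lines, <$grep>;
        close($grep);
    }
}
open(my $rs, '>', 'result_file_grep') or die "Can't open result_file_grep: $!";
foreach (@match_lines) {
    print $rs $_;
}
close($rs);
my $time2 = time();
print "Grep ($num_key keywords) end: ", $time2 - $time1, " seconds\n";
# Question: would printing each match directly to the result file handle,
# instead of collecting it in @match_lines first, make any difference?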
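
The grep script above starts one grep process per file and keyword, which means 260 processes per file with the planned keyword list. A possible alternative, sketched here under the assumption that GNU grep is installed, is to hand grep the whole keyword file with -f and let it walk the directory itself with -r, so that a single process does all the work. The sub name grep_tree is made up for this sketch. -F treats the keywords as fixed strings, which is an assumption; leave it out if they are regular expressions. Unlike the readdir loop, -r also descends into subdirectories.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run grep once over a directory tree and copy its output
# ("<file>:<line number>:<line>") to $out. Returns the number of matching lines.
# Note: an empty line in the keyword file would make grep match every line.
sub grep_tree {
    my ($dir, $keyword_file, $out) = @_;
    # List-form pipe open: no shell is involved, so names need no quoting.
    # -r recurse, -H print file names, -n print line numbers,
    # -F fixed strings (assumption), -f read the patterns from a file.
    open(my $grep, '-|', 'grep', '-r', '-H', '-n', '-F',
         '-f', $keyword_file, '--', $dir)
        or die "Can't run grep: $!";
    my $matches = 0;
    while (my $line = <$grep>) {
        print {$out} $line;
        $matches++;
    }
    close($grep);   # grep exits with status 1 when nothing matched; ignored here
    return $matches;
}

# Usage: perl grep_tree.pl <directory> <keyword file>
if (@ARGV == 2) {
    my ($dir, $keyword_file) = @ARGV;
    open(my $rs, '>', 'result_file_grep') or die "Can't open result_file_grep: $!";
    my $count = grep_tree($dir, $keyword_file, $rs);
    close($rs);
    print "Grep end: $count matching lines\n";
}
```

Keep the keyword file and the result file outside the scanned directory, otherwise grep will search them as well.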





