Perl Squirrel: Perl

Tuesday, January 1, 2013

Perl one-liners: File Extension Frequency

Perl Squirrel was curious about what types of files were on the Linux system.

A series of Linux commands isolates the extension of each file. That output is piped to a Perl one-liner that stores each file extension as a hash key, with the occurrence count as its value.
The output is sorted by key (the file extension, in alphabetical order).

This is all done as a one-liner.

Description and One-Liner Code:

## For all files,
## get basename of each file (meaning drop the directory portion of the string)
## except files starting with dot (meaning don't look at hidden files)
## except files containing comma (no strange files)
## awk uses period as field separator ( -F\. )
## for lines that have more than 1 field (NF > 1 meaning there is at least one period in the file name)
## print the last field $NF (meaning the file extension)
## Display unique extensions and number of occurrences
find . -type f -print | xargs -I {} basename {} | grep -v "^\." | grep -v ',' | awk -F\. 'NF > 1 {print $NF}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s  %12i\n",$_,$f{$_}) foreach sort keys %f;'
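For comparison, here is a minimal sketch of the same count done entirely in Perl, assuming the core modules File::Find and File::Basename are available. The sub name count_extensions is made up for this example; it applies the same filters as the pipeline (no hidden files, no commas, at least one period).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Basename;

# Count extensions for a list of file paths (hypothetical helper).
# Same filters as the pipeline: skip hidden files, skip names that
# contain a comma, skip names with no period.
sub count_extensions {
    my @paths = @_;
    my %count;
    for my $path (@paths) {
        my $name = basename($path);          # drop the directory portion
        next if $name =~ /^\./;              # hidden file
        next if $name =~ /,/;                # "strange" file
        next unless $name =~ /\.([^.]*)$/;   # text after the last period
        $count{$1}++;
    }
    return %count;
}

# Collect every regular file (not symlinks) under the current directory.
my @files;
find(sub { push @files, $File::Find::name if -f $_ && !-l $_ }, '.');

my %f = count_extensions(@files);
printf("%-40s  %12i\n", $_, $f{$_}) foreach sort keys %f;
```

This avoids starting one basename process per file, which may matter on a large directory tree.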

Bonus: Frequency counter for numeric values

To get frequency values for numeric values, you'll want to use a slightly different kind of sort when outputting the contents of the hash keys and values so that the keys display in numerical order not alphabetical order.

This is just an example to show the numeric sort.
In a long directory listing (ls -l), the 7th field contains the day of the month.
Let's get a frequency distribution of the day of the month for each item in the directory.

One-Liner Code:

ls -l | awk '$7 >= 1 {print $7}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s  %12i\n",$_,$f{$_}) foreach sort {$a<=>$b} keys %f;'
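To see why the sort block matters, here is a small sketch that sorts the same made-up list of day numbers both ways. The values in @days are just sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @days = (1, 10, 2, 21, 3);             # sample day-of-month values

my @alpha   = sort @days;                 # default sort compares as strings
my @numeric = sort { $a <=> $b } @days;   # <=> compares as numbers

print "alphabetical: @alpha\n";           # 1 10 2 21 3
print "numeric:      @numeric\n";         # 1 2 3 10 21
```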

Happy New Year!!

Thursday, May 31, 2012

Perl Power - Remove files fast

Perl Squirrel has been very busy collecting nuts ( about 1.5 million ) over the past years and his store room was about to burst. If the nuts were actually files in a directory, how would Nugget get rid of them fast? Nugget wants to keep only the nuts that were collected in the past 90 days ( they would taste better ). First get a count of the files that would get removed using the Unix find command and mtime option to count files older than 90 days...

cd /perlsquirrel/storeroom
find . \( ! -name . -prune -name "*nuts.*" \) -type f -mtime +90 -print | wc -l

Now let's say the number of files returned was 1 million. In order to remove these, Nugget did the following...

cd /perlsquirrel/storeroom
find . \( ! -name . -prune -name "*nuts.*" \) -type f -mtime +90 -print | perl -lne unlink

This removed 1 million files in about 9 minutes.
Note: This find command prevents descending into subdirectories.
Perl Squirrel advises...measure twice...cut once.
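In the spirit of measuring twice, here is a rough sketch of the same selection done in Perl only, so the list can be counted before anything is removed. The sub name old_files and its arguments are made up for this example, and the regular expression qr/nuts\./ stands in for the "*nuts.*" wildcard. The age test uses Perl's -M operator (age in days, measured from script start), truncated to whole days to approximate find's -mtime +90.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the regular files directly inside $dir (no subdirectories)
# whose names match $pattern and that are more than $days days old.
sub old_files {
    my ($dir, $pattern, $days) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @old;
    while (defined(my $name = readdir $dh)) {
        next unless $name =~ $pattern;
        my $path = "$dir/$name";
        next unless -f $path && !-l $path;   # regular files only
        my $age = -M $path;                  # age in days (fractional)
        push @old, $path if int($age) > $days;
    }
    closedir $dh;
    return @old;
}

my @candidates = old_files('.', qr/nuts\./, 90);
printf("%d file(s) would be removed\n", scalar @candidates);

# Uncomment only after checking the count above:
# unlink @candidates;
```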

Wednesday, November 30, 2011

Perl Power - Sequential Iteration in shell scripts

Let's say you want to process a bunch of files containing a numeric sequence like "inventory000297.txt" through "inventory000302.txt". You would like to print the following:

inventory000297.txt
inventory000298.txt
inventory000299.txt
inventory000300.txt
inventory000301.txt
inventory000302.txt

I did not see a simple way to do this using a plain shell script, but adding some perl is a different story...

for F in $(perl -e 'foreach $num (297 .. 302) {printf("%s%06i%s\n","inventory",$num,".txt");}')
do
    echo ${F}
done

Notes to keep in mind:

$( ) contains the perl one-liner that generates the output to print
The perl script is between the single quotes
The foreach loop block is within braces { }
The numerical range is in parentheses ( 297 .. 302 ), change this as necessary
The printf has 3 format specifiers:

%s corresponds with "inventory", change prefix as necessary

%06i corresponds with $num, a fixed width 6 digit integer, change as necessary

%s corresponds with the ".txt", change extension as necessary

Adjust the size of your sequence number - if 15 digits is needed you would use %015i for the format. That leading 0 says fill in with leading zeros on output to match the specified width.
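If the prefix, range, width, and extension change often, the notes above can be wrapped in a small sub. This is only a sketch; the name sequence_names and its argument order are made up for this example. It uses the * form of a printf width, which takes the field width from the argument list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build zero-padded file names from a prefix, a first and last number,
# a digit width, and a suffix.
sub sequence_names {
    my ($prefix, $first, $last, $width, $suffix) = @_;
    # %0*d : the 0 pads with zeros, the * reads the width from $width
    return map { sprintf("%s%0*d%s", $prefix, $width, $_, $suffix) }
           $first .. $last;
}

print "$_\n" for sequence_names("inventory", 297, 302, 6, ".txt");
```

Changing the 6 to 15 gives the 15 digit form without editing the format string.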

Enjoy!

Monday, August 1, 2011

Perl One-liners - Deal with the "too many files" issue

In UNIX, you sometimes get a "too many files" error (the shell typically reports "Argument list too long") when using the cp, mv, or rm commands with a wildcard in a directory that holds thousands of files. The usual workaround is to use the find command.

I don't have a newer (GNU) version of find, so I have to resort to this type of syntax to list only the files in the current directory (I don't want it to descend into subdirectories). Let's say there are tons of "xml" files in the current directory.

#Example using find that will count the number of xml files
find . \( ! -name . -prune -name "*.xml" \) -type f -print | wc -l

Here is an alternative way to count the files in Perl:

perl -e '@a=<*.xml>;map {print "$_\n";} @a;' | wc -l
#Even quicker version
perl -e '@a=<*.xml>;printf("%s\n", scalar(@a));'

The caveat for the Perl version is that you would need enough memory to hold the names of the files in the array @a .
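One way around that caveat is to read the directory one entry at a time instead of building an array. Here is a minimal sketch using opendir and readdir; the sub name count_by_extension is made up for this example. Unlike the glob version, it also checks that each entry is a regular file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count regular files in $dir whose names end with $ext,
# holding only one file name in memory at a time.
sub count_by_extension {
    my ($dir, $ext) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my $count = 0;
    while (defined(my $name = readdir $dh)) {
        next if $name =~ /^\./;   # <*.xml> skips dot files too
        # \Q...\E makes the period in ".xml" match literally
        $count++ if $name =~ /\Q$ext\E$/ && -f "$dir/$name";
    }
    closedir $dh;
    return $count;
}

print count_by_extension('.', '.xml'), "\n";
```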

Saturday, December 25, 2010

Count Each ASCII Character Occurrence in a Given File

Ever get strange characters in a file and want to know what they are? Using this code you can see what characters would be causing the issue. The normal printing characters are ASCII 32 through 126, so any characters outside this range might be causing you trouble. Consult an online ASCII chart and look for the decimal values to verify the output from this tool.

#!/usr/bin/perl -w
#Script:  ord_frequency.pl
#Frequency of ASCII characters in file or standard input
#this can read from stdin or from a passed filename
#Reads file one character at a time in binary mode
#gets frequency of the ordinal value of each character.
#Display frequency distribution of ordinal values.
#----------------------------------------------------------
$file = shift;                          #optional file
$file = '-' unless defined $file;       #use STDIN if no file given
open FILE, "$file" or die $!;           #Open file
my %freq;
binmode FILE;                           #set binary mode
my ($buf, $data, $n);
while (($n = read FILE, $data, 1) != 0) {   #read 1 char at a time
   $freq{ord($data)}++;                 #record ordinal frequency
}
close(FILE);
#Display frequency Sorted Numerically by ordinal value
printf("%12i: %s\n", $freq{$_}, $_) foreach sort { $a <=> $b; } keys %freq;
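If you only care about the troublemakers, a variation is to report just the values outside the 32 through 126 range. This is a sketch, not part of ord_frequency.pl; the sub name suspicious_bytes is made up, and it works on a string already in memory rather than reading a file. Keep in mind that tab (9), line feed (10), and carriage return (13) fall outside the range and will show up even though they are usually harmless.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given a string of raw bytes, return ordinal => count for every
# byte outside the printable ASCII range 32..126.
sub suspicious_bytes {
    my ($data) = @_;
    my %freq;
    for my $ord (unpack('C*', $data)) {   # one ordinal value per byte
        $freq{$ord}++ if $ord < 32 || $ord > 126;
    }
    return %freq;
}

my %bad = suspicious_bytes("tab\there\r\n");   # sample data
printf("%12i: %s\n", $bad{$_}, $_) foreach sort { $a <=> $b } keys %bad;
```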

Enjoy!

Open Multiple Text Files With One Click using ActiveState Perl

This is a convenient way of opening many text files at once in an editor that supports multiple tabs. For those who use a text editor for taking notes, logs, work files, journaling, etc. and refer to these files frequently, this technique is very easy to use.

Requirements are:

An editor that supports multiple files open at once in separate tabs.
ActiveState Perl installed (and in your PATH)
The source code below
A Windows shortcut to the Perl code

In the code below you will see that it is configured to open 4 files when "log.pl" runs.
The files are:
YYYY_MM.txt (each month it will create a new file)
JOURNAL.txt
INFO.txt
TIPS.txt

You can change the names and the number of files you wish to open and the script will build the appropriate command line to run (see lines 20-23 below).

Make sure you specify the correct editor you are using... In this case it is using TextPad.exe, but it can use other editors if you specify the correct path to your editor (see line 14 below).

Source Code:

#!/usr/bin/perl
#Script : log.pl
#Purpose: To open multiple files at once in the editor that supports tabs
#Requirements: ActivePerl (in your path)
#Create a shortcut for this script
# Target: path to your script
# Start in: directory your script is in
use POSIX qw(strftime);

$scnt = 0; #string count
@strings = (); #array of strings
$cmd = ""; #command to run

$strings[$scnt++] = "C:\\Progra~1\\TextPa~1\\TextPad.exe"; #Use Textpad editor
#$strings[$scnt++] = "C:\\Progra~1\\EditPl~1\\editplus.exe"; #Use Editplus editor

#--------------------------------------------------------------------
# FILES YOU ENTER HERE WILL BE OPENED IN SEPARATE TABS IN YOUR EDITOR
#--------------------------------------------------------------------
$strings[$scnt++] = strftime("c:\\dir1\\notes\\%Y_%m.txt", localtime);
$strings[$scnt++] = "c:\\dir1\\notes\\JOURNAL.txt";
$strings[$scnt++] = "c:\\dir1\\notes\\INFO.txt";
$strings[$scnt++] = "c:\\dir1\\notes\\TIPS.txt";

foreach $str (@strings) { #Build Command string
    $cmd .= chr(34) . $str . chr(34) . " ";
}

system ($cmd); #Open files in editor
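The command string can also be built without the counter variable. Here is a sketch of that idea; the sub name build_command is made up for this example, and the paths are the same sample paths used in log.pl. It only prints the command so you can check the quoting before handing it to system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Wrap the editor path and every file name in double quotes and
# join them with spaces into one command string.
sub build_command {
    my ($editor, @files) = @_;
    return join(' ', map { sprintf('"%s"', $_) } $editor, @files);
}

# strftime fills in %Y and %m, giving a new file name each month
my $monthly = strftime("c:\\dir1\\notes\\%Y_%m.txt", localtime);

my $cmd = build_command("C:\\Progra~1\\TextPa~1\\TextPad.exe",
                        $monthly,
                        "c:\\dir1\\notes\\JOURNAL.txt");
print "$cmd\n";   # pass $cmd to system() once it looks right
```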

Enjoy!

Monday, November 30, 2009

Perl - A compact multiple-value IF statement using grep

Here is a simple way to compare multiple string values against a variable.
Normally one might use multiple OR conditions such as:

$pet = "rabbit";
if ( uc($pet) eq "CAT" || 
     uc($pet) eq "DOG" ||
     uc($pet) eq "HAMSTER" ||
     uc($pet) eq "RABBIT" ||
     uc($pet) eq "RACCOON" ||
     uc($pet) eq "MONKEY" ||
     uc($pet) eq "HORSE" ) {
     printf("A %s is a Mammal.\n", ucfirst($pet) );
}
elsif ( uc($pet) eq "ALLIGATOR" ||
        uc($pet) eq "FROG" ||
        uc($pet) eq "SALAMANDER" ||
        uc($pet) eq "SNAKE" ||
        uc($pet) eq "TOAD" ) {
    printf("A %s is an Amphibian.\n", ucfirst($pet) );
}
else {
    printf("What is a %s ?\n", ucfirst($pet) );
}

Here is an easier way (Full string match, not case-sensitive):

#Search word
$pet = "rabbit";

#Regular Expression: / ^=beginning of string, $pet variable, $=end of string / i=ignore case
#Using a List:  ("cat", "dog", "...as-many-as-you-like...", "horse")
if ( grep /^$pet$/i, ("cat","dog","hamster","RABBIT","Raccoon","Monkey","horse") ) {
    printf("A %s is a Mammal.\n", ucfirst($pet) );
} 
elsif ( grep /^$pet$/i, ("Alligator", "frog", "salamander", "Snake", "toad") ) {
    printf("A %s is an Amphibian.\n", ucfirst($pet) );
}
else {
    printf("What is a %s ?\n", ucfirst($pet) );
}

Some notes:

Removing the "i" option makes the match case-sensitive: /^$pet$/

Altering the regular expression slightly will match the beginning or end of the strings.
Matching the beginning of the strings (remove the ending $ symbol): /^$pet/i
Matching the end of the strings (remove the leading ^ caret symbol): /$pet$/i
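One thing to watch for: the search word is interpolated into the regular expression, so a word containing characters like "." or "*" would be treated as a pattern. Here is a sketch that wraps the test in a sub and quotes the word with \Q...\E so it is matched literally. The sub name in_list is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the number of items in @list that equal $word, ignoring case.
# \Q...\E quotes any regex metacharacters in $word.
sub in_list {
    my ($word, @list) = @_;
    return scalar grep { /^\Q$word\E$/i } @list;
}

my @mammals = ("cat", "dog", "hamster", "RABBIT", "Raccoon", "Monkey", "horse");

print "full match\n"     if in_list("rabbit", @mammals);
print "begins with rac\n" if grep { /^rac/i } @mammals;   # no ending $
print "ends with key\n"   if grep { /key$/i } @mammals;   # no leading ^
```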

Sunday, November 22, 2009

Perl One-liners - An easy way to begin using Perl

A Perl one-liner is a small perl script run from the command-line prompt or a shell script as one command. You can pack a lot of power in a small command. By embedding Perl into a shell script you can enhance the capabilities of your shell script a great deal.

Here are some examples:

Fixed Length Data File - Show records where a column range matches a given string.

General Command:

perl -ne 'print if substr($_, StartCol, Stringlen) =~ "searchstring";'  infile.txt > outfile.txt

Example: Find records where positions 488-489 are equal to "CA". Keep in mind that for StartCol, the first character of the record begins at position 0.

perl -ne 'print if substr($_, 487, 2) =~ "CA";' infile.txt > outfile.txt
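The one-liner uses =~, which is a regular expression match. When you want a strict "is equal to" test, eq is an alternative. Here is a sketch of that comparison as a sub; the name field_equals and the sample record are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the $len characters starting at offset $start (0-based)
# are exactly equal to $want.
sub field_equals {
    my ($record, $start, $len, $want) = @_;
    return 0 if length($record) < $start + $len;   # record too short
    return substr($record, $start, $len) eq $want ? 1 : 0;
}

# Sample record with "CA" in positions 488-489 (offset 487)
my $record = ('x' x 487) . "CA" . ('x' x 11);
print "match\n" if field_equals($record, 487, 2, "CA");
```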

Pipe-Delimited Data File - Show records where the column 2 value equals 2

123|2|Oak
456|4|Cedar
789|2|Willow

perl -ne '@line = split(/\|/); print if $line[1] == 2;' in.txt > out.txt
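The same column test can be written as a small script, which is handy when the column number or value changes. This is a sketch using the three sample records above; the sub name column_equals is made up for this example, and it assumes the column holds numeric data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True if the given column (counting from 1) of a pipe-delimited
# line is numerically equal to $value.
sub column_equals {
    my ($line, $column, $value) = @_;
    chomp $line;
    my @fields = split /\|/, $line;     # the pipe must be escaped
    return 0 unless defined $fields[$column - 1];
    return $fields[$column - 1] == $value ? 1 : 0;
}

my @records = ("123|2|Oak\n", "456|4|Cedar\n", "789|2|Willow\n");
print grep { column_equals($_, 2, 2) } @records;   # Oak and Willow lines
```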

Korn shell - Embed Perl for loop inside Korn shell for loop

Example: Loop to count from 1 to 99 (by Odd numbers)
Note: everything between $( ... ) is the perl one-liner

for i in $(perl -e 'for ($i=1; $i<100; $i+=2) {print "$i\n";}')
do
  echo $i
done

Frequency Counter - Pipe Standard Input into this one-liner

File of votes with a name on each line - in.txt

cat in.txt | perl -e '%freq;while (<>) {chomp;$freq{$_}++;} printf("%-40s %10i\n", $_, $freq{$_}) foreach sort keys %freq;'

Output:
Al               3
Fred             5
Mary             6