Tuesday, January 1, 2013

Perl one-liners: File Extension Frequency

Perl Squirrel was curious about what types of files were on the Linux system.

A series of Linux commands isolates the extension of each file, and that output is piped to a Perl one-liner that stores each file extension in a hash as the key, with its occurrence count as the value.
The output is sorted by key (the file extension, in alphabetical order).

This is all done as a one-liner.

Description and One-Liner Code:
## For all files,
## get basename of each file (meaning drop the directory portion of the string)
## except files starting with dot (meaning don't look at hidden files)
## except files containing comma (no strange files)
## awk uses period as field separator ( -F\. )
## for lines that have more than 1 field (NF > 1 meaning there is at least one period in the file)
## print the last field $NF (meaning the file extension)
## Display unique extension and number of occurrences
find . -type f -print | xargs -I {} basename {} | grep -v "^\." | grep -v ',' | awk -F\. 'NF > 1 {print $NF}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s %12i\n",$_,$f{$_}) foreach sort keys %f;'

Bonus:   Frequency counter for numeric values

To get frequency values for numeric values, you'll want to use a slightly different kind of sort when outputting the contents of the hash keys and values so that the keys display in numerical order not alphabetical order.

This is just an example to show the numeric sort.
In a long directory listing (ls -l), the 7th field contains the day of the month.
Let's get a frequency distribution of the day of the month for each item in the directory.

One-Liner Code:
ls -l | awk '$7 >= 1 {print $7}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s %12i\n",$_,$f{$_}) foreach sort {$a<=>$b} keys %f;'

Happy New Year!!

Sunday, July 8, 2012

Shell script - Capture snippets from vi or Vim

Here is a utility that may be useful to people who use vi or Vim.
Perl Squirrel tends to collect useful code samples, and this tool makes that quick and easy.  The general idea is this... While editing code, mark the lines worth keeping and file them in a text file without leaving the vi or Vim environment.

Here are the requirements for this utility:
  • You use 'vi' or 'Vim'
  • Create a subdirectory called "notes" under your home directory
  • Place the "snip.sh" utility in your personal 'bin' directory or somewhere in your PATH

Here is the code: (See Instructions in the code comments)
#!/bin/bash
#Script: snip.sh
#Purpose: Grab certain lines of text from the file
#         you are editing/viewing in vi/vim,
#         and append this snippet to a notes file.
#
#How to use in vi/vim session:
#  Position cursor on 1st line to save, press the letters
#     ma
#  Position cursor on last line to save, press the letters
#     mb
#  Capture the snippet to the default ~/notes/snippet.txt file
#     :'a,'b w !snip.sh
#  Or, Capture the snippet to a specified file ~/notes/favorite.txt
#     :'a,'b w !snip.sh favorite ["Optional Short description"]
#
#This code assumes you will store the snippets in a "notes"
#subdirectory under your home directory.
#Place this code in your personal bin directory, or in your PATH.
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SNIP=${1:-snippet}
DESC="${2:-UNNAMED SNIPPET}"
echo '#[SNIP]------------------------------------' >> ~/notes/${SNIP}.txt
echo "#[SNIP] ${DESC} " >> ~/notes/${SNIP}.txt
echo '#[SNIP]------------------------------------' >> ~/notes/${SNIP}.txt
cat - >> ~/notes/${SNIP}.txt
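
Because the script only reads standard input, the same logic can be tried from a shell prompt before wiring it into vi or Vim. The sketch below is not the script itself: snip_to is a hypothetical function that takes the notes directory as an extra first argument, so the experiment can write to a scratch directory instead of ~/notes.

```shell
# snip_to (hypothetical): same steps as snip.sh, but the target directory
# is passed in so a throwaway directory can be used for testing.
snip_to() {
    dir="$1"
    snip="${2:-snippet}"
    desc="${3:-UNNAMED SNIPPET}"
    {
        echo '#[SNIP]------------------------------------'
        echo "#[SNIP] ${desc} "
        echo '#[SNIP]------------------------------------'
        cat -                      # the snippet itself arrives on stdin
    } >> "${dir}/${snip}.txt"
}

scratch=$(mktemp -d)
printf 'for i in 1 2 3\ndo\n    echo $i\ndone\n' | snip_to "$scratch" favorite "Simple for loop"
cat "$scratch/favorite.txt"
```

The resulting file holds the three header lines followed by the four lines of the sample loop.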

Thursday, May 31, 2012

Perl Power - Remove files fast

Perl Squirrel has been very busy collecting nuts ( about 1.5 million ) over the past years and his store room was about to burst. If the nuts were actually files in a directory, how would Nugget get rid of them fast? Nugget wants to keep only the nuts that were collected in the past 90 days ( they would taste better ). First get a count of the files that would get removed using the Unix find command and mtime option to count files older than 90 days...

cd /perlsquirrel/storeroom
find . \( ! -name . -prune -name "*nuts.*" \) -type f -mtime +90 -print | wc -l
Now let's say the number of files returned was 1 million. In order to remove these, Nugget did the following...
cd /perlsquirrel/storeroom
find . \( ! -name . -prune -name "*nuts.*" \) -type f -mtime +90 -print | perl -lne unlink
This removed 1 million files in about 9 minutes.
Note: This find command prevents descending into subdirectories.
Perl Squirrel advises...measure twice...cut once.

Sunday, February 5, 2012

Source Code Search Browser for Command-line

Here is a handy utility ( viewsrc.sh ) for searching through your existing code base for code examples.  Let's say you are developing some new code and need to add a feature, but you cannot quite remember the syntax you used before.  No problem... without even leaving the code you are in, you can search for that example.

Syntax:  viewsrc.sh  <searchstring>  <fileExtension>
Where: searchstring is a regular expression (one accepted by "grep", which is what the script calls)
and fileExtension is the file extension of the files to look at... for example... "pl" or "c" or "sh" or "py" or "html", ... you get the idea.

Features:
  • Search-word can be a simple regular expression
  • Search within files of certain file extensions
  • The search is from the current directory and also within sub-directories
  • Search is not case-sensitive
  • Every search match results in the full contents of the matching file being appended to one temporary file.
  • The temporary file is opened with vi/vim in "view" read-only mode.
  • The cursor is placed at the first occurrence of the search-word.
  • Press "n" for next match, or "N" for previous match.
  • The filename where each match occurred, is placed at the top of each matching file that was appended to the temporary file.
  • No changes are made to original code, only to a temporary file.
  • The advantage of the "viewsrc.sh" utility vs grep is that you see the full context of the match by seeing the entire source file.
  • When you find a match you like, just go backwards in the file until you see the source file name, if you want to know which file the match occurred in.
Example 1:
You would like to find examples of Perl programs that use the "foreach" keyword

viewsrc.sh foreach pl

You will be placed in view mode, and if there are any matches, just press "n" to go to the next match, "N" for previous match, and when you are done, just quit like you would in vi/vim ...

:q!

Example 2:
You are editing some Bash code...


vi mycode.sh

Without leaving your editing session, search for example code containing the word "case"


:!viewsrc.sh case sh

Press 'n' to view the next occurrence of the word "case".
Then exit with ":q!".

Note:  You can search multiple times in succession.  Each search needs its own ":q!" to return to the previous environment.

Here is the source for Bash Shell:


#!/bin/bash
#Script:  viewsrc.sh
#Purpose: This script is used to search through the current directory
# and down for any files that contain the search word (param 1) within files
# with the given file extension (param 2).
#
# Every file that contains the searched for word is
# concatenated into one temp file and viewed with the cursor on the
# first occurrence.
# Press "n" to see the next occurrence of the searchword.
# You exit with the command    :q!
#
# Example:
# You are editing a file in vi and you want to see korn shell code examples
# using the word "grep".
# To view these examples do the following:
#    1)  Press ESC to make sure you are not in edit mode
#
#    2)     :!viewsrc.sh grep ksh  <press enter>
#
#    3)  A read-only temporary file will be displayed with the cursor on
#        the first match of the searchword you searched for.
#
#    4)     n    to see the next match if one exists.
#           N    to see the previous match
#
#    5)  To quit the view session enter the following:
#           :q   <press enter>  or
#           :q!  <press enter>
#
#    6)  You will see  "[Hit return to continue]"
#                       <press enter>
#
#    7)  You are back where you started.
#
#    Note:  Step 2 can be done multiple times to drill down into
#           procedure calls.  Just do Step 5 to go back a level.
#           Press CTRL-G to verify what file you are currently in
#           (whether it is your original file or the temp file)
#
#    Note:  If you don't get a match you will see something like:
#           "/tmp/srch.030130_095607" [Read only] No lines in the buffer
#                   Do step 5 to go back to your original file.

#Function Name :  fullpath()
#Purpose     : Accepts a filename
#              Returns the full pathname of the file
#-----------------------------------------------------------------
# NOTE:  fullpath function works relative to the current directory.
#        Either the full path or the basename will work ok.
#-----------------------------------------------------------------

#Function to construct a full pathname for the given filename
fullpath ()
{
s=$1                                    #s is a string containing the filename
ts=`echo $s  | awk '{print substr($0,1,1)}'`    #ts is the 1st char of string
if [[ "$ts" = "/" ]]                    #Is leading char a slash
then
  #No change is necessary because it is already a full path
  echo $s
else
  #Tack on current directory without the leading ./
  ts=`echo $s  | awk '{print substr($0,1,2)}'`          #1st two char of string
  if [[ "$ts" = "./" ]]                 # does it start with "./"
  then
    # Replace the dot with the current working directory and tack on
    # the string from the second character to the end of the string
    s=$(echo `pwd``echo $s | awk '{printf ("%s", substr($0,2))}'`)
    echo $s
  else
    #Just a filename was given so build a string from the current working
    #directory, a slash and then the filename
    s=$(echo `pwd`/`echo $s | awk '{printf ("%s", substr($0,1))}'`)
    echo $s
  fi
fi
}


#- - - MAIN SCRIPT STARTS HERE - - -
USAGE="\n\nUsage:  viewsrc.sh searchword FileExtension\n"
if (($# != 2))                  #Two parameters required
then
    echo -e "$USAGE"
    exit 1
fi

SEARCHWORD=$1

# Declare Variables
TIMESTAMP=`date +%y%m%d_%H%M%S`
MATCHFILES=/tmp/temp.$TIMESTAMP
SEARCHFILE=/tmp/srch.$TIMESTAMP
DIRFILE=/tmp/dirfile.$TIMESTAMP

#If the searchfile already exists remove it
if [[ -f $SEARCHFILE ]]
then
   rm $SEARCHFILE
fi

#Create an empty file
touch $SEARCHFILE

#If the dirfile already exists remove it
if [[ -f $DIRFILE ]]
then
   rm $DIRFILE
fi

#Create an empty file
touch $DIRFILE

# Find Files That Contain The Searchword in various subdirectories
find . -name "*.${2}" -print > $DIRFILE

#List the files that contain the searchword (case-insensitive match)
cat $DIRFILE | xargs grep -i -l "${1}" > $MATCHFILES

for i in `cat $MATCHFILES`
do
    # Consolidate The Files That Contain The Searchword
    echo "===========================================" >> $SEARCHFILE
    FP=$(fullpath ${i})
    echo "File: `hostname`:${FP}" >> $SEARCHFILE
    echo "===========================================" >> $SEARCHFILE
    cat $i >> $SEARCHFILE
done

# Display the SEARCHFILE at the first Searchword
EXINIT='set ic'
export EXINIT
view +/$SEARCHWORD/ $SEARCHFILE

# Cleanup Temporary Files
rm $MATCHFILES
rm $SEARCHFILE
rm $DIRFILE
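
The three cases handled by fullpath() (already absolute, starting with "./", or a bare name) can also be expressed with a case statement and parameter expansion instead of awk. This is an alternative sketch, not the script's own code, and the name fullpath_alt is hypothetical.

```shell
# fullpath_alt (hypothetical name): print an absolute path for a file name
# given relative to the current directory.
fullpath_alt() {
    case "$1" in
        /*)  echo "$1" ;;                 # already a full path
        ./*) echo "$(pwd)/${1#./}" ;;     # drop the leading "./"
        *)   echo "$(pwd)/$1" ;;          # bare name or relative path
    esac
}

fullpath_alt /etc/hosts
fullpath_alt ./lib/util.sh
fullpath_alt util.sh
```

Quoting "$1" throughout also keeps file names with spaces in one piece, which the unquoted echo calls in the original function do not guarantee.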

TIPS:

  • To search on more than one word, put a period between words.
    Like....
    viewsrc.sh Searching.on.four.words sh
  • Special characters like brackets need to be backslash escaped, and the pattern quoted so that the shell does not strip the backslashes.
    Like....
    viewsrc.sh 'if.\[\[' sh
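
Both tips rest on how grep reads the pattern, which can be checked directly at the prompt. A small sketch with invented sample lines:

```shell
# A period matches any single character, so it also matches the spaces
# between the words (-i ignores case, -c prints the number of matching lines).
echo 'Searching on four words' | grep -i -c 'searching.on.four.words'

# Brackets have to reach grep with their backslashes intact; single quotes
# keep the shell from removing them.
echo 'if [[ -f myfile ]]' | grep -c 'if.\[\['
```

Each command prints 1, meaning the sample line matched.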


Enjoy!

Wednesday, November 30, 2011

Perl Power - Sequential Iteration in shell scripts

Let's say you want to process a bunch of files containing a numeric sequence like "inventory000297.txt" through "inventory000302.txt". You would like to print the following:

inventory000297.txt
inventory000298.txt
inventory000299.txt
inventory000300.txt
inventory000301.txt
inventory000302.txt

I did not see a simple way to do this using a plain shell script, but adding some perl is a different story...

for F in $(perl -e 'foreach $num (297 .. 302) {printf("%s%06i%s\n","inventory",$num,".txt");}')
do
    echo ${F}
done

Notes to keep in mind:
  • $( ) contains the perl one-liner that generates the output to print
  • The perl script is between the single quotes
  • The foreach loop block is within braces { }
  • The numerical range is in parentheses ( 297 .. 302 ), change this as necessary
  • The printf has 3 format specifiers:
      %s corresponds with "inventory", change prefix as necessary
      %06i corresponds with $num, a fixed width 6 digit integer, change as necessary
      %s corresponds with the ".txt", change extension as necessary

Adjust the size of your sequence number - if 15 digits is needed you would use %015i for the format. That leading 0 says fill in with leading zeros on output to match the specified width.
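
As an alternative to the Perl one-liner (not the approach used above), the shell's own printf understands the same zero-padded width, so the sequence can also be generated with a counting loop. A sketch, where gen_names is a hypothetical helper name:

```shell
# gen_names (hypothetical helper): print inventoryNNNNNN.txt for every
# number from $1 through $2, zero-padded to 6 digits.
gen_names() {
    num=$1
    while [ "$num" -le "$2" ]
    do
        printf '%s%06d%s\n' "inventory" "$num" ".txt"
        num=$((num + 1))
    done
}

for F in $(gen_names 297 302)
do
    echo "${F}"
done
```

The format string has the same three parts as the Perl version: prefix, padded number, extension.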

Enjoy!

Monday, August 1, 2011

Perl One-liners - Deal with the "too many files" issue

In UNIX, you sometimes get a "too many files" error (typically reported as "Argument list too long") when using the "cp", "mv", or "rm" commands with a wildcard in a directory that holds thousands of files. The usual workaround is to use the "find" command.

I don't have one of the newer (GNU) versions of find, so I have to resort to this type of syntax to display files only in the current directory (I don't want it to descend into sub-directories). Let's say there are tons of "xml" files in the current directory.

#Example using find that will count the number of xml files
find . \( ! -name . -prune -name "*.xml" \) -type f -print | wc -l

Here is an alternative way to count the files in Perl:

perl -e '@a=<*.xml>;map {print "$_\n";} @a;' | wc -l
#Even quicker version
perl -e '@a=<*.xml>;printf("%s\n", scalar(@a));'

The caveat for the Perl version is that you would need enough memory to hold the names of the files in the array @a .

Display all Standard Perl Modules available

Here is a quick way to see all names of the Standard Perl Modules that are installed on your system:

perl -MFile::Find=find -MFile::Spec::Functions -Tlw -e 'find { wanted => sub { print canonpath $_ if /\.pm\z/ }, no_chdir => 1 }, @INC'