Showing posts with label Awk. Show all posts

Tuesday, January 1, 2013

Perl one-liners: File Extension Frequency

Perl Squirrel was curious about what types of files were on the Linux system.

A series of Linux commands isolates the extension of each file. That output is piped to a Perl one-liner that stores each file extension as a hash key, with the occurrence count as the value.
The output is sorted by key (the file extension, in alphabetical order).

This is all done as a one-liner.

Description and One-Liner Code:
## For all files,
## get the basename of each file (meaning drop the directory portion of the string)
## except files starting with a dot (meaning don't look at hidden files)
## except files containing a comma (no strange files)
## awk uses the period as field separator ( -F\. )
## for lines that have more than 1 field (NF > 1 meaning there is at least one period in the file name)
## print the last field $NF (meaning the file extension)
## Display each unique extension and its number of occurrences
find . -type f -print | xargs -I {} basename {} | grep -v "^\." | grep -v ',' | awk -F\. 'NF > 1 {print $NF}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s %12i\n",$_,$f{$_}) foreach sort keys %f;'
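
As a rough illustration of the counting step, the sketch below feeds a fixed list of file names (invented for this example) through an awk-only variant of the counter. The function name count_ext is hypothetical, and awk's associative array stands in for the Perl hash. It is not the one-liner above, just the same idea in a form that is easy to try.

```shell
# count_ext: hypothetical helper. Reads file names on stdin and prints
# each extension with its occurrence count, sorted alphabetically.
count_ext() {
  awk -F. 'NF > 1 { n[$NF]++ }
           END { for (e in n) printf "%-40s %12d\n", e, n[e] }' | sort
}

# Sample input (invented names); README has no period, so it is skipped.
printf '%s\n' a.txt b.txt c.pl README | count_ext
```

The trailing sort is needed because awk's "for (e in n)" visits keys in no particular order, much like an unsorted "keys %f" in Perl.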

Bonus:   Frequency counter for numeric values

To get frequency values for numeric values, you'll want to use a slightly different kind of sort when outputting the contents of the hash keys and values so that the keys display in numerical order not alphabetical order.

This is just an example to show the numeric sort.
In a long directory listing (ls -l), the 7th field contains the day of the month.
Let's get a frequency distribution of the day of the month for each item in the directory.

One-Liner Code:
ls -l | awk '$7 >= 1 {print $7}' | perl -e 'while(<>){chomp;$f{$_}++;}printf("%-40s %12i\n",$_,$f{$_}) foreach sort {$a<=>$b} keys %f;'
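
To see why the numeric comparison matters, here is a small sketch using sort(1) on a few invented day numbers. Plain "sort" compares the values as text, like Perl's default sort, while "sort -n" compares them as numbers, like sort {$a<=>$b}. The variable names are only for illustration.

```shell
# Sample day-of-month values (invented for the demonstration).
days='10
9
2
21'

# Text comparison: "10" sorts before "2" because "1" < "2".
lexical=$(printf '%s\n' "$days" | sort | paste -sd' ' -)

# Numeric comparison: values come out in counting order.
numeric=$(printf '%s\n' "$days" | sort -n | paste -sd' ' -)

echo "lexical: $lexical"
echo "numeric: $numeric"
```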

Happy New Year!!

Sunday, February 5, 2012

Source Code Search Browser for Command-line

Here is a handy utility ( viewsrc.sh ) for searching through your existing code base for code examples.  Let's say you are developing some new code and need to add a feature, but you cannot quite remember the syntax you used before.  No problem: without even leaving the code you are in, you can search for that example.

Syntax:  viewsrc.sh  <searchstring>  <fileExtension>
Where: searchstring is a regular expression (one accepted by "egrep")
and fileExtension is the file extension of the files to look at, for example "pl" or "c" or "sh" or "py" or "html".

Features:
  • Search-word can be a simple regular expression
  • Search within files of certain file extensions
  • The search is from the current directory and also within sub-directories
  • Search is not case-sensitive
  • Every search match results in the full contents of the matching file being appended to one temporary file.
  • The temporary file is opened with vi/vim in "view" read-only mode.
  • The cursor is placed at the first occurrence of the search-word.
  • Press "n" for next match, or "N" for previous match.
  • The name of each matching file is placed above that file's contents in the temporary file.
  • No changes are made to original code, only to a temporary file.
  • The advantage of the "viewsrc.sh" utility vs grep is that you see the full context of the match by seeing the entire source file.
  • When you find a match you like, just go backwards in the file until you see the source file name, if you want to know which file the match occurred in.
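
The consolidation step described in the list above can be sketched roughly as follows. This is a compact variant written for illustration, not the script's actual code: the function name collect_matches and its third argument (the output file) are assumptions, and it reads file names line by line so that names containing spaces survive.

```shell
# collect_matches: hypothetical helper.
#   $1 = search pattern (extended regex), $2 = file extension,
#   $3 = output file that receives every matching file in full.
collect_matches() {
  pattern=$1 ext=$2 outfile=$3
  : > "$outfile"                       # start with an empty output file
  # grep -l prints only the names of files that contain a match.
  find . -type f -name "*.$ext" -exec grep -E -i -l -e "$pattern" {} + |
  while IFS= read -r f; do
    {
      echo "==========================================="
      echo "File: $f"
      echo "==========================================="
      cat "$f"
    } >> "$outfile"
  done
}

# Usage (commented out so nothing is written by accident):
#   collect_matches foreach pl /tmp/srch.demo
#   view +/foreach/ /tmp/srch.demo
```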
Example 1:
You would like to find examples of Perl programs that use the "foreach" keyword

viewsrc.sh foreach pl

You will be placed in view mode, and if there are any matches, just press "n" to go to the next match, "N" for previous match, and when you are done, just quit like you would in vi/vim ...

:q!

Example 2:
You are editing some Bash code...


vi mycode.sh

Without leaving your editing session, search for example code containing the word "case"


:!viewsrc.sh case sh

Press 'n' to view the next occurrence of the word "case".
Then exit with ":q!".

Note:  You can search multiple times in succession.  Each search needs its own ":q!" to return to the previous environment.

Here is the source for Bash Shell:


#!/bin/bash
#Script:  viewsrc.sh
#Purpose: This script is used to search through the current directory
# and down for any files that contain the search word (param 1) within files
# with the given file extension (param 2).
#
# Every file that contains the searched for word is
# concatenated into one temp file and viewed with the cursor on the
# first occurrence.
# Press "n" to see the next occurrence of the searchword.
# You exit with the command    :q!
#
# Example:
# You are editing a file in vi and you want to see korn shell code examples
# using the word "grep".
# To view these examples do the following:
#    1)  Press ESC to make sure you are not in edit mode
#
#    2)     :!viewsrc.sh grep ksh  <press enter>
#
#    3)  A read-only temporary file will be displayed with the cursor on
#        the first match of the searchword you searched for.
#
#    4)     n    to see the next match if one exists.
#           N    to see the previous match
#
#    5)  To quit the view session enter the following:
#           :q   <press enter>  or
#           :q!  <press enter>
#
#    6)  You will see  "[Hit return to continue]"
#                       <press enter>
#
#    7)  You are back where you started.
#
#    Note:  Step 2 can be done multiple times to drill down into
#           procedure calls.  Just do Step 5 to go back a level.
#           Press CTRL-G to verify what file you are currently in
#           (whether it is your original file or the temp file)
#
#    Note:  If you don't get a match you will see something like:
#           "/tmp/srch.030130_095607" [Read only] No lines in the buffer
#                   Do step 5 to go back to your original file.

#Function Name :  fullpath()
#Purpose     : Accepts a filename
#              Returns the full pathname of the file
#-----------------------------------------------------------------
# NOTE:  fullpath function works relative to the current directory.
#        Either the full path or the basename will work ok.
#-----------------------------------------------------------------

#Function to construct a full pathname for the given filename
fullpath ()
{
s=$1                                    #s is the filename passed as the first argument
ts=`echo $s  | awk '{print substr($0,1,1)}'`    #ts is the 1st char of string
if [[ "$ts" = "/" ]]                    #Is leading char a slash
then
  #No change is necessary because it is already a full path
  echo $s
else
  #Tack on current directory without the leading ./
  ts=`echo $s  | awk '{print substr($0,1,2)}'`          #1st two char of string
  if [[ "$ts" = "./" ]]                 # does it start with "./"
  then
    # Replace the dot with the current working directory and tack on
    # the string from the second character to the end of the string
    s=$(echo `pwd``echo $s | awk '{printf ("%s", substr($0,2))}'`)
    echo $s
  else
    #Just a filename was given so build a string from the current working
    #directory, a slash and then the filename
    s=$(echo `pwd`/`echo $s | awk '{printf ("%s", substr($0,1))}'`)
    echo $s
  fi
fi
}


#- - - MAIN SCRIPT STARTS HERE - - -
USAGE="\n\nUsage:  viewsrc.sh searchword FileExtension\n"
if (($# != 2))                  #Two parameters required
then
    echo -e "$USAGE"            #-e so that bash expands the \n sequences
    exit 1
fi

SEARCHWORD=$1

# Declare Variables
TIMESTAMP=`date +%y%m%d_%H%M%S`
MATCHFILES=/tmp/temp.$TIMESTAMP
SEARCHFILE=/tmp/srch.$TIMESTAMP
DIRFILE=/tmp/dirfile.$TIMESTAMP

#If the searchfile already exists remove it
if [[ -f $SEARCHFILE ]]
then
   rm $SEARCHFILE
fi

#Create an empty file
touch $SEARCHFILE

#If the dirfile already exists remove it
if [[ -f $DIRFILE ]]
then
   rm $DIRFILE
fi

#Create an empty file
touch $DIRFILE

# Find Files That Contain The Searchword in various subdirectories
find . -name "*.${2}" -print > $DIRFILE

#List the files that contain the searchword (case-insensitive, extended
#regular expression as accepted by egrep)
cat $DIRFILE | xargs grep -E -i -l "${1}" > $MATCHFILES

for i in `cat $MATCHFILES`
do
    # Consolidate The Files That Contain The Searchword
    echo "===========================================" >> $SEARCHFILE
    FP=$(fullpath ${i})
    echo "File: `hostname`:${FP}" >> $SEARCHFILE
    echo "===========================================" >> $SEARCHFILE
    cat $i >> $SEARCHFILE
done

# Display the SEARCHFILE at the first Searchword
EXINIT='set ic'
export EXINIT
view +/$SEARCHWORD/ $SEARCHFILE

# Cleanup Temporary Files
rm $MATCHFILES
rm $SEARCHFILE
rm $DIRFILE
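
For comparison, the path-building logic of fullpath() can also be written with shell parameter expansion instead of awk. This is only a sketch under the same assumptions as the original (paths are taken relative to the current directory). The name fullpath_pe is hypothetical and not part of the script.

```shell
# fullpath_pe: hypothetical variant of fullpath() that uses a case
# statement and ${var#prefix} instead of awk substr() calls.
fullpath_pe() {
  case $1 in
    /*)  printf '%s\n' "$1" ;;              # already a full path
    ./*) printf '%s\n' "$PWD/${1#./}" ;;    # strip "./", prepend cwd
    *)   printf '%s\n' "$PWD/$1" ;;         # bare name or relative path
  esac
}

# Example: prints the current directory followed by /viewsrc.sh
fullpath_pe ./viewsrc.sh
```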

TIPS:

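  • To search on more than one word, put a period between words (a period matches any single character, including a space).
    Like....
    viewsrc.sh Searching.on.four.words sh
  • Special characters like brackets need to be backslash escaped, and the search string quoted so that the shell passes the backslashes through.
    Like....
    viewsrc.sh 'if.\[\[' sh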
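
A quick way to check a search string before handing it to viewsrc.sh is to try it with grep directly. The sketch below uses two invented sample lines. The single quotes keep the shell from stripping the backslashes, and the period stands in for the space between "if" and the brackets.

```shell
# Two sample lines (invented); only the first contains "if [[".
printf '%s\n' 'if [[ -f x ]]; then' 'if [ -f x ]; then' |
  grep -c -i -E 'if.\[\['      # counts the matching lines: prints 1
```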


Enjoy!

Sunday, November 22, 2009

AWK - Sort lines from shortest to longest

Display the filenames in a directory, shortest names first and longest last:

ls -1 | awk '{printf "%d\t%s\n", length($0), $0}' | sort -n -k1,1 | cut -f2-
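
The same decorate, sort, undecorate idea can be tried on any list of lines. The sketch below wraps it in a function (by_length is a hypothetical name) and feeds it a few invented names. It assumes a sort that accepts -k, and it uses cut to drop the length column because the length and the name are separated by a tab.

```shell
# by_length: hypothetical helper. Reads lines on stdin and prints them
# ordered from shortest to longest.
by_length() {
  awk '{ printf "%d\t%s\n", length($0), $0 }' |   # prefix each line with its length
    sort -n -k1,1 |                               # numeric sort on the length
    cut -f2-                                      # remove the length column
}

# Sample input (invented names).
printf '%s\n' ccc a bb | by_length
```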