Saturday, December 25, 2010

Count Each ASCII Character Occurrence in a Given File

Ever find strange characters in a file and want to know what they are? This script counts how often each character value occurs, so you can see which ones may be causing the problem. The normal printing characters are ASCII 32 through 126, so any value outside that range is a likely suspect (although tab, line feed, and carriage return, values 9, 10, and 13, are usually harmless). Consult an online ASCII chart and look up the decimal values to interpret the output from this tool.


#!/usr/bin/env perl
use warnings;                           #replaces -w, which many systems do not pass through env
#Script:  ord_frequency.pl
#Frequency of ASCII characters in file or standard input
#this can read from stdin or from a passed filename
#Reads file one character at a time in binary mode
#gets frequency of the ordinal value of each character.
#Display frequency distribution of ordinal values.
#----------------------------------------------------------
my $file = shift;                       #optional file
$file = '-' unless defined $file;       #use STDIN if no file given
open(FILE, $file) or die "Cannot open '$file': $!\n";   #two-arg open treats '-' as STDIN
my %freq;
binmode FILE;                           #set binary mode
my $data;
while (read(FILE, $data, 1)) {          #read 1 byte at a time; stops at EOF or on error
   $freq{ord($data)}++;                 #record ordinal frequency
}
close(FILE);
#Display frequency sorted numerically by ordinal value (each line is "count: ordinal")
printf("%12i: %s\n", $freq{$_}, $_) foreach sort { $a <=> $b } keys %freq;
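
If you would rather not look up every value in an ASCII chart, the listing can label the suspicious values itself. The following is only a rough sketch, not part of ord_frequency.pl: the helpers describe_ord and count_ords, the small table of control-character names, and the sample string are all made up for illustration. It counts the bytes of a string instead of a file and prints only the values outside 32 through 126.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical lookup table: names for a few common control characters.
my %control_name = (
    0 => 'NUL', 9 => 'TAB', 10 => 'LF', 13 => 'CR', 27 => 'ESC', 127 => 'DEL',
);

# Hypothetical helper: return a readable label for an ordinal value.
sub describe_ord {
    my ($ord) = @_;
    return sprintf("'%s'", chr($ord)) if $ord >= 32 && $ord <= 126;  # printable ASCII
    return $control_name{$ord} if exists $control_name{$ord};        # named control char
    return sprintf('0x%02X', $ord);                                  # anything else, as hex
}

# Hypothetical helper: count ordinal values in a string,
# the same way the script above counts them in a file.
sub count_ords {
    my ($bytes) = @_;
    my %freq;
    $freq{ord($_)}++ for split //, $bytes;
    return %freq;
}

# Sample data, purely illustrative: a tab, a CR/LF pair, and one high byte.
my %freq = count_ords("a\tb\r\n\xE9");

# Print only the values outside the normal printing range.
for my $ord (sort { $a <=> $b } keys %freq) {
    next if $ord >= 32 && $ord <= 126;
    printf("%12i: %3d  %s\n", $freq{$ord}, $ord, describe_ord($ord));
}
```

To use this with real data, you could replace the sample string with the %freq hash built by the script above and keep only the final loop.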

Enjoy!