
Charting Data at the Bottom of the World

by Alex Gough
May 04, 2006

I have an odd job: I'm the only programmer for about 500 miles. I look after the experiments on a remote Antarctic research station, along with the data they produce. As well as the scientific stuff knocking about, we have between 20 and 70 people, most of them keen on the weather. Either because we can't work if it's too windy, or because we can enjoy a spot of kite skiing if it's just windy enough, everyone here wants to know what's going on outside.

Luckily we have a few climate science experiments running, including a weather station. For a few years now, data from the weather station has been available on people's computers through a Perl Tk application and some slightly baroque shuttling of records between three different data servers and the network the office computers run on. All is well and good, and we leave it well alone, as it's worked well. Recently, a new experiment installed on the station provides an up-to-the-minute profile of wind speeds over the first 30 meters of the air. It's there to support research into interactions between snow and air in Antarctica, but it's also crucial information if you want to head out and whiz about behind a kite.

The data from this mast goes to a remote machine that logs it to a binary format of its own making and allows users to VNC in to check its health. People around the station have taken to logging in to this machine before heading out, which is probably not the best way to keep the data rolling in without interruption. Rather than forbidding access to this useful source of local data, we decided to upgrade our weather display system to include the major parameters recorded by the mast.

Alas, while fairly nice to use, Tk is a bit fiddly and not exactly my cup of tea. Adding new displays to an existing application can be time-consuming, as you must re-learn the relations among each different widget, pane, and button. On top of this programming burden, we'd have to track down and update every copy of the application scattered around our network, and do so again every time we added some other source of data. We settled instead on a complete rewrite as a CGI script and some automatically generated graphs. A fancier man than me might call that a three-tier application, but then, he'd probably be selling you something at the same time.

Mountains of Data

Before you can see what the weather is doing (beyond looking out of the window), you need to get at the raw numbers somehow. Ours are provided by state-of-the-art scientific instruments in state-of-the-art data formats; that is to say, partly as lines of ASCII data in columns, and partly as fixed-length records in a binary file. No matter, though. Perl and some friends from CPAN make fast work of building meaning from tumbled piles of data.

Before doing anything, I set up a couple of objects to hold some data values. Each set of observations has a class corresponding to the experiment that generated it. The classes also contain from_file factory methods that read a file and produce a list of observations. To make things as quick (to write) as possible, I used Class::Accessor to autogenerate get and set methods for my objects:

 # Current weather data
 package Z::Weather;
 use base qw(Class::Accessor);
 Z::Weather->mk_accessors( qw(time temp pressure wind dir) );

This automatically creates a new() method for Z::Weather. Call it as:

 my $weather = Z::Weather->new({time     => $time,
                                temp     => $temp,
                                pressure => $pres,
                                wind     => $wind,
                                dir      => $dir});

It also generates get and set accessors for each field:

 # set
 $weather->temp(30);
 
 # get
 my $temp = $weather->temp();

(The "codename" used when shipping items to our station is Z, so I've used that as my little local namespace, too.)

From our mast, we have a number of observations taken at different heights, so I wanted a slightly more complicated representation, using a class to represent the mast and another to represent each level on the mast.

 package Z::Mast;
 use base qw(Class::Accessor);
 
 Z::Mast->mk_accessors(qw(time values));
 
 package Z::Mast::Level;
 use base qw(Class::Accessor);
 Z::Mast::Level->mk_accessors(qw(wind dir level));

Remember that Z::Mast::values will set and get a reference to an array of ::Level objects. If I wanted to enforce that, I could override the methods provided by Class::Accessor, but that would create work that I can get away without doing for this simple case.
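If you did want that enforcement, one way to get it is sketched below. This is not the code running on the station: it hand-rolls the accessor instead of hooking into Class::Accessor, the package name Z::Mast::Checked is made up for illustration, and the tiny Z::Mast::Level stand-in exists only to keep the example self-contained.

```perl
use strict;
use warnings;

{
    package Z::Mast::Level;
    # Stand-in for the Class::Accessor version, so this sketch runs on its own.
    sub new {
        my ($class, $fields) = @_;
        my %copy = %{ $fields || {} };
        return bless \%copy, $class;
    }
}

{
    package Z::Mast::Checked;    # hypothetical name, not part of the station code
    use Carp qw(croak);
    use Scalar::Util qw(blessed);

    sub new {
        my ($class, $fields) = @_;
        my $self = bless { time => $fields->{time} }, $class;
        $self->values($fields->{values}) if exists $fields->{values};
        return $self;
    }

    # Combined getter/setter that refuses anything but an array of levels.
    sub values {
        my $self = shift;
        if (@_) {
            my ($levels) = @_;
            croak "values() expects an array reference"
                unless ref($levels) eq 'ARRAY';
            for my $level (@$levels) {
                croak "values() expects Z::Mast::Level objects"
                    unless blessed($level) && $level->isa('Z::Mast::Level');
            }
            $self->{values} = $levels;
        }
        return $self->{values};
    }
}

my $mast = Z::Mast::Checked->new({
    values => [ Z::Mast::Level->new({ wind => 23.5, dir => 260.1, level => 30 }) ],
});
print scalar(@{ $mast->values }), " level(s) stored\n";
```

Passing anything other than an array reference of Z::Mast::Level objects makes values() croak, so a bad record fails where it is built rather than later, when something tries to plot it.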

Now that I know what the data will look like in Perl, I can wrench it from the covetous hands of our data loggers and turn it into something I can use.

First, I decided to deal with the plain ASCII file. This contains single lines, with the time of observation first, then whitespace-separated values for temperature, pressure, wind speed, direction, and a few others that I don't care about. Z::Weather needs to load a couple of modules and gain a couple of methods:

 use IO::All;
 
 sub from_file {
     my $class = shift;
     my $io    = io(shift);
     my @recs  = ();
     
     while (my $line = $io->readline()) {
         chomp($line);
         push @recs, $class->_line($line);
     }
     return @recs;
 }

I expect to call this as:

 my @weather_records = Z::Weather->from_file("weather.data");

Using the IO::All module to access the files both makes it very easy to read the file and also allows calling code to instead supply an IO::All object of its own, or to call this method with a filehandle already opened to the data source. This will make it easy to obtain data from some other source; for instance, if the experiment changes to provide a socket from which to read the current values.
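For anyone without IO::All to hand, the same flexibility can be roughed out with core Perl alone. The sketch below is only an illustration of the idea, not a replacement for IO::All: the helper name read_records is invented, and an in-memory filehandle stands in for the socket or pipe a real experiment might provide.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either a filename or an already-open filehandle.
sub read_records {
    my ($source) = @_;

    my $fh;
    if (ref $source) {
        $fh = $source;                      # caller supplied a handle
    }
    else {
        open $fh, '<', $source
            or die "Cannot open $source: $!";
    }

    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# The article's sample line, repeated, read through an in-memory handle.
my $data = "2006 02 06 01 25  -10.4  983.2  23.5 260.1\n" x 2;
open my $mem, '<', \$data or die "Cannot open in-memory handle: $!";
my @records = read_records($mem);
print scalar(@records), " records read\n";
```

The calling code does not change when the data source does, which is the property the IO::All version gives you for free.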

Parsing the data is the responsibility of another method, _line(), which expects lines like:

 2006 02 06 01 25  -10.4  983.2  23.5 260.1

 use DateTime;
 sub _line {
     my ($class, $line) = @_;
     my @vals = split /\s+/, $line;

     # extract time fields and turn into DateTime object
     my($y, $m, $d, $h, $min)
        = $line =~ /^(\d{4}) (\d\d) (\d\d) (\d\d) (\d\d)/;
 
     my $t = DateTime->new(year=>$y,month=>$m,day=>$d,hour=>$h,minute=>$min);
 
     # return a new Z::Weather record, using the magic new() method
     return $class->new({time => $t,
                         temp     => $vals[5],
                         pressure => $vals[6],
                         wind     => $vals[7],
                         dir      => $vals[8],  });
 }

split and Perl's magic make sense of the data points, and the DateTime module takes care of the details of when the record was produced. I find it much easier to turn any time-related value into a DateTime object at the soonest possible moment, so that the rest of my code can expect DateTime objects. That also makes the code easier to reuse in other projects. If you find yourself writing code to handle leap years every other day, then make using DateTime your number one new habit.
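To make the field positions concrete, here is a minimal sketch that parses the sample line using only core modules. It is not the article's code: Time::Local's timegm() stands in for DateTime, the helper name parse_weather_line is made up, and treating the timestamp as UTC is an assumption.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: split one whitespace-separated record into named fields.
sub parse_weather_line {
    my ($line) = @_;

    # split ' ' skips leading whitespace and splits on runs of whitespace
    my @vals = split ' ', $line;
    die "short record: $line" if @vals < 9;

    my ($y, $m, $d, $h, $min) = @vals[0 .. 4];

    return {
        # timegm() wants a zero-based month; UTC is assumed here
        time     => timegm(0, $min, $h, $d, $m - 1, $y),
        temp     => $vals[5],
        pressure => $vals[6],
        wind     => $vals[7],
        dir      => $vals[8],
    };
}

my $rec = parse_weather_line("2006 02 06 01 25  -10.4  983.2  23.5 260.1");
printf "%s: temp %.1f, pressure %.1f, wind %.1f, dir %.1f\n",
    scalar gmtime($rec->{time}), @{$rec}{qw(temp pressure wind dir)};
```

The first five fields are the timestamp, so the measurements start at index 5, which is why _line() reads $vals[5] through $vals[8].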

I deal with the mast data in a similar way, except that this format is fixed-length binary records. The time of the recording is stored in the first four bytes as the number of seconds into an arbitrary epoch. I correct this into Unix time when creating its DateTime object. Values are two-byte, network-endian unsigned shorts holding hundredths of the recorded values, so each one must be divided by 100 after unpacking. unpack() comes to my aid here.

 sub from_file {
   my $class = shift;
   my $io    = io(shift);
   my ($rec, @recs);
 
   while ($io->read($rec, 62) == 62) {
     push @recs, $class->_record($rec);
   }
   return @recs;
 }

 # map height of reading to offsets in binary record
 our %heights = qw(1 24  2 28 4 32  8 36  15 40  30 44);
 use constant MAST_EPOCH => 2082844800;
 
 sub _record {
   my ($class, $rec) = @_;

   # extract the time as a 4 byte network order integer, and correct epoch
   my $crazy_time = unpack("N", $rec);
   my $time       = DateTime->from_epoch(epoch=>$crazy_time-MAST_EPOCH);

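   # then a series of (speed, dir) 2 byte pairs further into the record
   my @vals;
   # %heights maps height => offset, so walk the heights in numeric order
   foreach my $height (sort { $a <=> $b } keys %heights) {
     my ($speed, $dir) = unpack("nn", substr($rec, $heights{$height}));
     # stored values are hundredths of the real readings
     push @vals,
       Z::Mast::Level->new({wind  => $speed/100,
                            dir   => $dir/100,
                            level => $height});
   }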
   return $class->new({time => $time,
                       values => \@vals});
 }

Again, I can call this using any one of the types supported by IO::All. Again, I wield DateTime to my advantage to turn a time stored in an unusual epoch quickly into an object which anything or anyone else can understand. There are a few magic numbers here, but that's what you end up with when you deal with other people's crazy file formats. The key thing is to record magic numbers in one place, to allow other people to change them if they need to, both in your code and from their own code (hence the our variable), and finally, to let values pass from undocumented darkness into visible, named objects as soon as possible.
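A quick way to convince yourself that the offsets, byte order, and scaling all line up is to build a synthetic record with pack() and decode it again. The sketch below assumes only what is described above (62-byte records, a four-byte time, pairs of unsigned shorts in hundredths). The helper names make_record and decode_record are invented, and the record it builds is test data, not real logger output.

```perl
use strict;
use warnings;

use constant MAST_EPOCH  => 2082844800;  # same offset as the constant above
use constant RECORD_SIZE => 62;

# height => byte offset of its (speed, dir) pair, as in %heights above
my %offset_for = (1 => 24, 2 => 28, 4 => 32, 8 => 36, 15 => 40, 30 => 44);

# Build a synthetic record; unused bytes are left as zeros.
sub make_record {
    my ($unix_time, $obs) = @_;        # $obs: { height => [speed, dir] }
    my $rec = "\0" x RECORD_SIZE;
    substr($rec, 0, 4) = pack("N", $unix_time + MAST_EPOCH);
    for my $height (keys %$obs) {
        # round to the nearest hundredth before storing as an integer
        my @hundredths = map { int($_ * 100 + 0.5) } @{ $obs->{$height} };
        substr($rec, $offset_for{$height}, 4) = pack("nn", @hundredths);
    }
    return $rec;
}

# Reverse the process: Unix time plus { height => [speed, dir] }.
sub decode_record {
    my ($rec) = @_;
    my $unix_time = unpack("N", $rec) - MAST_EPOCH;
    my %levels;
    for my $height (keys %offset_for) {
        my ($speed, $dir) = unpack("nn", substr($rec, $offset_for{$height}, 4));
        $levels{$height} = [ $speed / 100, $dir / 100 ];
    }
    return ($unix_time, \%levels);
}

my $raw = make_record(1139189100, { 1 => [3.2, 250.0], 30 => [23.5, 260.1] });
my ($when, $levels) = decode_record($raw);
printf "height %d: speed %.2f, dir %.2f\n", $_, @{ $levels->{$_} }
    for sort { $a <=> $b } keys %$levels;
```

Because an unsigned short tops out at 65535, this scheme can only hold readings up to 655.35, which is plenty for a direction in degrees.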





