MARC Tutorial
NAME
MARC::Doc::Tutorial - A documentation-only module for new users of MARC::Record
SYNOPSIS
perldoc MARC::Doc::Tutorial
INTRODUCTION
What is MARC? The MAchine Readable Cataloging format was designed by the Library of Congress in the late 1960s in order to allow libraries to convert their card catalogs into a digital format. The advantages of having computerized card catalogs were soon realized, and now MARC is being used by all sorts of libraries around the world to provide computerized access to their collections. MARC data in transmission format is optimized for processing by computers, so it's not very readable for the normal human. For more about the MARC format, visit the Library of Congress at http://www.loc.gov/marc/
What is this Tutorial?
The document you are reading is a beginner's guide to using Perl to
process MARC data, written in the 'cookbook' style. Inside, you
will find recipes on how to read, write, update and convert MARC
data using the MARC::Record CPAN package. As with any cookbook, you
should feel free to dip in at any section and use the recipes you
find interesting. If you are new to Perl, you may want to read from
the beginning. This document is distributed with the MARC::Record
package; if you are reading it somewhere else, you can find the
latest version at CPAN: http://www.cpan.org/modules/by-module/MARC/.
You'll notice that some sections aren't filled in yet, because this
document is a work in progress. If you have ideas for new sections,
please suggest them to perl4lib: http://www.rice.edu/perl4lib/.
History of MARC on CPAN
In 1999, a group of
developers began working on MARC.pm to provide a Perl module for
working with MARC data. MARC.pm was quite successful since it grew
to include many new options that were requested by the Perl/library
community. However, in adding these features the module swiftly
outgrew its own clothes, and maintenance and addition of new
features became extremely difficult. In addition, as libraries
began using MARC.pm to process large MARC data files (more than 1000
records) they noticed that memory consumption would skyrocket. Memory
consumption became an issue for large batches of records because
MARC.pm's object model was based on the 'batch' rather than the
record, so every record in the file would often be read into
memory at once. There were ways of getting around this, but they were not
obvious. Some effort was made to reconcile the two approaches
(batch and record), but with limited success. In mid-2001, Andy
Lester released MARC::Record and MARC::Field, which provided a much
simpler and more maintainable package for processing MARC data with
Perl. As its name suggests, MARC::Record treats an individual MARC
record as the primary Perl object, rather than having the object
represent a given set of records. Instead of forking the two
projects, the developers agreed to encourage use of the
MARC::Record framework, and to work on enhancing MARC::Record
rather than extending MARC.pm further. Soon afterwards, MARC::Batch
was added, which allows you to read in a large data file without
having to worry about memory consumption. In December 2004, the
MARC::Lint module, an extension for checking the validity of MARC
records, was split out of the MARC::Record distribution to become
a separately distributed package. This tutorial contains examples
for using MARC::Lint.
Brief Overview of MARC Classes
The MARC::Record distribution is made up of
several separate packages. This can be somewhat confusing to people
new to Perl or to object-oriented programming. However, this framework
allows easy extension, and is built to support new input/output
formats as the need arises. For a good introduction to using the
object-oriented features of Perl, see the perlboot documentation
that came with your version of Perl. Here are the packages that get
installed with MARC::Record:
MARC::Batch
A convenience class for accessing MARC data contained in an external file.
MARC::Field
An object for representing the indicators and subfields of a single MARC field.
MARC::Record
This primary class represents a MARC record, being a container for multiple MARC::Field objects.
MARC::Doc::Tutorial
This document!
MARC::File
A superclass for representing files of MARC data.
MARC::File::MicroLIF
A subclass of MARC::File for working with data encoded in the MicroLIF format.
MARC::File::USMARC
A subclass of MARC::File for working with data encoded in the USMARC format.
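To see how these classes fit together, here is a minimal sketch rather than a definitive recipe. It assumes MARC::Record is installed from CPAN. The bibliographic data and the temporary file are invented purely for illustration. The script builds a MARC::Record out of MARC::Field objects, writes it out in USMARC transmission format, and then reads it back one record at a time with MARC::Batch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use MARC::Record;
use MARC::Field;
use MARC::Batch;

# A MARC::Record is a container for MARC::Field objects.
# The field contents below are made-up sample data.
my $record = MARC::Record->new();
$record->append_fields(
    MARC::Field->new( '100', '1', ' ', a => 'Wall, Larry.' ),
    MARC::Field->new( '245', '1', '0',
        a => 'Programming Perl /',
        c => 'Larry Wall.' ),
);

# Serialize the record in USMARC transmission format to a temporary file.
my ( $fh, $filename ) = tempfile( SUFFIX => '.mrc', UNLINK => 1 );
binmode $fh;
print {$fh} $record->as_usmarc();
close $fh;

# MARC::Batch hands back one MARC::Record per call to next(),
# so the whole file never has to be held in memory at once.
my $batch = MARC::Batch->new( 'USMARC', $filename );

my ( @titles, @authors );
while ( my $r = $batch->next() ) {
    push @titles, $r->title();    # contents of the 245 field as a string

    # Reach into a single field and pull out one subfield.
    my $field = $r->field('100');
    push @authors, $field->subfield('a') if $field;
}

print "Title:  $_\n" for @titles;
print "Author: $_\n" for @authors;
```

MARC::Batch selects the appropriate MARC::File subclass from its first argument, so passing 'MicroLIF' instead of 'USMARC' would read MicroLIF data through the same loop.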
Help Wanted!
It's already been mentioned but it's worth mentioning again: MARC::Doc::Tutorial is a work in progress, and you are encouraged to submit any suggestions for additional recipes via the perl4lib mailing list at http://www.rice.edu/perl4lib. Also, the development group is always looking for additional developers with good ideas; if you are interested you can sign up at SourceForge: http://sourceforge.net/projects/marcpm/.