Getting Out of the HTML Business
Library Web sites have grown in size and complexity over the last several years without a corresponding growth in the sophistication of the underlying technology. Web managers are struggling to control their sites using only the primitive tool of HTML. Under this constraint, it is hard for the library to deliver information with multiple access points and via user-defined displays. CGI (common gateway interface) scripting, the tool traditionally used to deliver dynamic content, finds limited use on most library sites due to the programming skills necessary to support it. Fortunately, there are new tools available that allow Web managers with minimal technical skills to create database-driven Web sites and, at the same time, streamline the Web management process within their organization.
The Challenge
Today's library Web site plays a central role in meeting the library's mission of delivering information and services, a role it did not play even three years ago. This is true for all types of libraries. Like most Web sites, library sites undergo a major redesign about every two years. Significant resources and organizational commitment are being invested in current efforts to revamp Web sites, which is indicative of the Web's prominent role. One new area to which resources are being devoted is usability testing, which tends to reveal a range of navigational and other problems. By creating "Web manager" and "electronic resources" positions, the library acknowledges that managing its site is no longer a one-person job. Thus, whether or not they are conscious of this evolving dynamic, libraries are taking steps to address the substantial technical and organizational challenges posed by the second-generation Web.
As the library continues to replace traditional resources and services with their electronic counterparts, the webmaster model of Web site management has become inefficient. The webmaster model fails because it lacks flexibility and scalability. From the early days of the Web, staff have been trained (or have taught themselves) to code HTML and then write pages for the Web site. HTML editors play a greater or lesser role here, but the picture is basically the same: the author submits pages to the webmaster, who links them into the site. Either by design or de facto, the webmaster has become responsible for soliciting content, ensuring stylistic conformity, and handling other coordination tasks. Inevitably, some staff resist learning HTML or learn it poorly, resulting in time-consuming recoding. As sites grow both in size and in the number of people involved, the sheer volume of HTML-coded pages, and of the links they contain, has become unmanageable. Templates, validators, and link-checking utilities can stave off the chaos only so long.
The role of a Web site in the library is also continually evolving. Library Web sites circa 1995 gave the library a presence within its larger context (e.g., the university, the community) and provided basic information about the library such as its hours, links to locally held electronic resources such as the online catalog and citation databases, and links to selected Internet resources. Now, at the end of the decade, the library's Web site is on the ascent, while the catalog is in decline.1 The Web is the logical point of integration for nearly all library resources and services, and serves as the preferred access point for local as well as remote users. One reason catalogs are being "relegated to a smaller and smaller role"2 is that their data are not easily interchangeable with other data, particularly Web-based data. The Holy Grail for users of journal literature is the direct link to the full text from the online citation, which is a rapidly emerging reality. Online books are not far behind. With delivery of library resources highly focused around a single medium, it becomes incumbent upon libraries to utilize the tools necessary to adapt their expertise to that medium.
User expectations add another dimension to the problem. Fresh from their experiences at Yahoo! and Amazon.com, users expect to view information in a variety of intuitive configurations. They come to a library site reasonably expecting to be able to browse electronic resources by title and subject, search for them by keyword, or even view them by publisher or some other variable. Some of these displays can be extracted from the online catalog by a skilled searcher, but others are beyond the capability of the catalog entirely. They may also reasonably expect that the site should "know" them and present them with a display tailored to their information needs.
The dynamic Web is not a novel idea. From the technical perspective, the Web has been dynamic from its inception. Every search-engine query and every form submission relies on behind-the-scenes technology that serves up dynamic content, typically CGI scripts written in Perl. Most CGI scripts are custom written, but repositories of scripts are also available for free public download. Although many library Web managers download and modify those scripts to provide dynamic content on their sites (typically back-end support for forms), the absence of local programming expertise often limits their use.
To move forward, libraries must stop thinking of their Web sites as collections of HTML pages and view them as dynamic resources for information and services that patrons will use in highly individualized ways. Achieving such a site is not possible with HTML alone; it requires use of tools that can support dynamic content. Tools that libraries can afford and use quickly to create useful applications are now available, but the tools are only part of the picture. The hardest part is the reconceptualization of the Web site that must take place and the development of the data models that will underlie the delivery of dynamic content.
The Database-Driven Web Site Solution
Tools
A number of Web technologies fall under the rubric "dynamic."3 Dynamic HTML (DHTML) and JavaScript both allow a Web page to change after it has been loaded in the browser, typically by relying on a combination of style sheets and scripts, but they both deliver a dynamic look, not dynamic content. Some Web management and database packages support database-linking capabilities that are based on a "publish to the Web" model. One example is FileMaker's FileMaker Pro, which, although severely limited in its customizability and scalability, can quickly and easily make databases available on the Web.4 Another example is Microsoft Access 2000, which supports Web database publishing. These tools do not produce dynamic content, however; "publish to the Web" software writes static HTML pages to the Web server. In addition, some HTML editors (e.g., Microsoft FrontPage, NetObjects Fusion, and Allaire HomeSite) offer "database" capabilities in the form of wizards or shortcuts for creating ColdFusion or Active Server Pages code, but they require that the database technology be present on the server side.
True dynamic content is created only through server-side processing. One standard feature of Web servers is the server-side include (SSI). SSIs take advantage of the Web server software's ability to pull "macros" from files and insert them into pages as they are delivered. They are typically used to deliver standardized headers and footers. The primary tool for creating dynamic content on the Web, however, has always been CGI. CGI, an open standard supported by all Web servers, lets scripts create pages on the fly, incorporating user- and database-provided input to deliver dynamic content. CGI scripting continues to be a solution for environments that value its platform independence and have access to staff with the requisite technical skills. It does have a drawback, however: because it spawns a new program for each user request, it can be slow. The final option for server-side processing is software that communicates with the Web server via a vendor-specific application programming interface (API). This solution offers speed (only one instance of the program is active at a given time), as well as powerful development tools that allow nonprogrammers to create useful applications.
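A minimal sketch of the CGI mechanism, written in Python rather than Perl purely for illustration. It assumes a GET request, for which the Web server passes form input in the `QUERY_STRING` environment variable; the `title` field is hypothetical. A production script would also validate input and, typically, consult a database.

```python
import html
import os
from urllib.parse import parse_qs


def render_page(environ):
    """Build a complete CGI response (headers + HTML body) from the request environment."""
    # For GET requests the Web server hands form input to the program in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # "title" is a hypothetical form field used only for illustration.
    title = params.get("title", ["(none)"])[0]
    body = (
        "<html><body>"
        f"<p>You searched for: {html.escape(title)}</p>"
        "</body></html>"
    )
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server launches this program anew for every request, which is why CGI can be slow.
    print(render_page(os.environ), end="")
```

Because the page exists only as the program's output, the same script can answer any number of different queries without a single static HTML file.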
"Application server" software belongs to this category of software that communicates with the Web server through a vendor-specific API. Application servers can support many client/server services, including authentication, session management, and load balancing. Their core functionality, however, is data access. In the classic "three-tier" model, application server software plays the role of middleware, sitting between the Web server and a database to support delivery of dynamic content to the browser (figure 1).5 The application server accesses the database either through a native database interface or through a standard protocol, open database connectivity (ODBC), together with a common query language, structured query language (SQL).
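The three-tier model can be sketched in a few lines. In this illustration (not how any particular application server is implemented), an in-memory SQLite database stands in for an ODBC data source, the hypothetical `handle_request` function plays the middleware role, and the returned HTML is what the browser would receive. The table and column names are assumptions.

```python
import html
import sqlite3


def setup_database():
    """Tier 3 (data): an in-memory SQLite database standing in for an ODBC data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resources (title TEXT, subject TEXT, url TEXT)")
    conn.executemany(
        "INSERT INTO resources VALUES (?, ?, ?)",
        [
            ("MEDLINE", "medicine", "http://example.org/medline"),
            ("CINAHL", "nursing", "http://example.org/cinahl"),
            ("Journal of Nursing", "nursing", "http://example.org/jn"),
        ],
    )
    return conn


def handle_request(conn, subject):
    """Tier 2 (middleware): turn a request into an SQL query and the result rows into HTML."""
    rows = conn.execute(
        "SELECT title, url FROM resources WHERE subject = ? ORDER BY title",
        (subject,),  # parameter binding keeps user input out of the SQL text
    ).fetchall()
    items = "".join(
        f'<li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in rows
    )
    # Tier 1 (browser) receives ordinary HTML; it never talks to the database directly.
    return f"<ul>{items}</ul>"


conn = setup_database()
print(handle_request(conn, "nursing"))
```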
Figure 1. Three-Tier Model
The application server market offers a number of solutions, many of which are designed and priced for large-scale business applications.6 The most popular application server in the "develop-and-deploy" category for small to medium-sized organizations is Allaire's ColdFusion.7 It provides a well-integrated development environment, including a client based on the HTML editor HomeSite that allows quick development of Web applications. ColdFusion's markup language is integrated into the HTML coding of pages and functions much as SSIs do: the server expands the codes into SQL queries to the database and delivers the results to the browser in HTML. ColdFusion also comes with the Verity search engine.
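To illustrate the general idea of markup that the server expands into a database query and HTML output, here is a toy template expander. The `<dbloop>` tag and `#field#` placeholders are hypothetical and are not actual ColdFusion syntax; the named query and the `journals` table are likewise assumptions.

```python
import html
import re
import sqlite3

# Named queries a page author can refer to; the name and SQL are hypothetical.
QUERIES = {
    "journals": "SELECT title, publisher FROM journals ORDER BY title",
}

# A hypothetical <dbloop query="name">...</dbloop> block embedded in ordinary HTML.
LOOP = re.compile(r'<dbloop query="(\w+)">(.*?)</dbloop>', re.DOTALL)
FIELD = re.compile(r"#(\w+)#")


def expand(template, conn):
    """Replace each dbloop block with its body repeated once per result row."""

    def run_loop(match):
        cursor = conn.execute(QUERIES[match.group(1)])
        columns = [d[0] for d in cursor.description]
        body = match.group(2)
        out = []
        for row in cursor:
            record = dict(zip(columns, row))
            # Substitute each #column# placeholder with the HTML-escaped value.
            out.append(FIELD.sub(lambda m: html.escape(str(record[m.group(1)])), body))
        return "".join(out)

    return LOOP.sub(run_loop, template)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journals (title TEXT, publisher TEXT)")
conn.executemany("INSERT INTO journals VALUES (?, ?)",
                 [("Science", "AAAS"), ("Cell", "Cell Press")])
page = '<ul><dbloop query="journals"><li>#title# (#publisher#)</li></dbloop></ul>'
print(expand(page, conn))
# prints <ul><li>Cell (Cell Press)</li><li>Science (AAAS)</li></ul>
```

The page author writes only HTML plus a few tags; the SQL lives on the server side, and static text outside the tags passes through untouched.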
Another option in the "develop-and-deploy" category is Everyware Tango.8 Tango offers a drag-and-drop development environment. Like ColdFusion, Tango uses its own markup language, which is integrated into the HTML coding of the page. Tango is scalable and compatible, but its main distinguishing feature is its easy-to-learn visual interface.
A category related to the application server is the "page server." Microsoft's Active Server Pages (ASP) is an example of a page server application, meaning the coded pages are loaded and executed inside Microsoft's IIS Web server.9 ASP is an object-oriented scripting environment that uses the VBScript language (although it can support others). As with ColdFusion and Tango markup, ASP scripting is embedded in an HTML page. It can define and connect to an ODBC data source, execute SQL queries, and deliver the results to the browser in HTML. In practice, a developer could implement the same applications in ASP as with an application server, but should have some programming experience.
Another increasingly popular page server solution is the Open Source PHP.10 PHP is a server-side, cross-platform scripting language similar to ASP that interfaces with the free MySQL database, as well as others. Like ASP, it is designed for developers with programming experience but has the advantage of being free and platform independent. Other Open Source options include AOLServer and Meta-HTML.11
The technical skill threshold necessary to utilize these tools is not much higher than the skills required to use "publish to the Web" database tools, and the payoff in flexibility and scalability is much greater. Experience with relational databases is important (and not hard to find or acquire for the ubiquitous Microsoft Access), as is an understanding of the Web and HTML. It is undeniable, however, that some programming skills, or at least an aptitude for application development, will reduce the learning curve.
Applications
As librarians, most of our collective experience organizing information into databases has been focused on creating and maintaining a shared bibliographic database (OCLC) and implementing local integrated library systems. The structure of online catalog data is the same as it was in the paper catalog, including the subject classifications we use. These highly structured classification systems have been successful in organizing the information in the catalog because of the homogeneity of the data. Web sites are not analogous.
Information on the Web is heterogeneous, containing links that range from meta-sites to single Web pages and encompassing a wide range of formats and data types. It is difficult to impose highly structured organization on that range of content.12 In their book, Information Architecture for the World Wide Web, Rosenfeld and Morville describe the different data structures that are appropriate at different levels of a Web site. At the top level, some type of hierarchical organization is natural and essential. At the level of individual pages, where information is the least structured, a hypertext link between pages is the most appropriate data structure. In between the home page and individual pages lie collections of structured, homogeneous information. That is the content that is best organized according to a database model.13 As our electronic collections develop, that segment is the fastest growing and most heavily accessed information on our Web sites. A display solution that works for a collection of one hundred electronic journals is not likely to work for a collection of ten thousand. The application server is a solution designed to support scalable applications.
Library Web sites, probably more than most, contain lists of all kinds: directories of local information, Internet resources, and various library holdings reproduced in list format for the convenience of library users and librarians. It does not take long to realize that coding and maintaining lists is an activity that does not scale. That reality is particularly stark when one considers the desirability of viewing the same data in multiple ways: an alphabetic list by title; a list sorted by subject category, format, or publisher; the whole database searchable by keyword. The current trend in the electronic information marketplace toward vendor-aggregated and commercial sites striving to be your "portal" of choice does not necessarily serve the best interests of the library user. A patron may know that they want x journal, or information on y topic, but is unlikely to know that x journal is contained in aggregator a and not b, or that aggregator b and not a is the best choice for searching on y topic. Of course, as always, it is the librarian's job to help match the user's need with the appropriate resources. The database-driven Web site can make that task easier. Providing title-level access to journals in aggregator collections is one way. Another strategy is to implement site-wide subject searching, or browsing, that would result in a single page displaying links to all types of electronic information on a given topic drawn from many places on the Web site. Once all appropriate substantive content on a Web site is database-driven, powerful search capabilities such as site-wide fielded searching become possible, greatly improving upon the generic site-wide keyword index.
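The point about multiple views is easy to see in code: one table, maintained once, yields an alphabetic list, a subject grouping, and a keyword search. This is a minimal sketch using SQLite; the `eresources` table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE eresources (title TEXT, subject TEXT, format TEXT, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO eresources VALUES (?, ?, ?, ?)",
    [
        ("PubMed", "medicine", "database", "NLM"),
        ("JAMA", "medicine", "journal", "AMA"),
        ("Nursing Research", "nursing", "journal", "Lippincott"),
    ],
)


def by_title():
    """View 1: alphabetic list by title."""
    return [r[0] for r in conn.execute("SELECT title FROM eresources ORDER BY title")]


def by_subject():
    """View 2: titles grouped under each subject heading."""
    groups = {}
    for subject, title in conn.execute(
        "SELECT subject, title FROM eresources ORDER BY subject, title"
    ):
        groups.setdefault(subject, []).append(title)
    return groups


def keyword_search(word):
    """View 3: keyword match against title, subject, and publisher."""
    pattern = f"%{word}%"  # SQLite's LIKE is case-insensitive for ASCII letters
    rows = conn.execute(
        "SELECT title FROM eresources "
        "WHERE title LIKE ? OR subject LIKE ? OR publisher LIKE ? ORDER BY title",
        (pattern, pattern, pattern),
    )
    return [r[0] for r in rows]
```

Adding a title means inserting one row; every view picks it up automatically, which is exactly what hand-coded HTML lists cannot do.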
There are many useful applications in the area of delivering customized Web pages and increasing the interactivity between the user and a Web site. The appeal of sites that "know" something about you has been amply demonstrated by Amazon.com, eBay ("My eBay"), and others. Libraries are now developing prototype projects that deliver customized "My Library" pages.14 If that idea proves successful in the library context, application and page server tools could bring such features within the reach of most libraries. Another useful twist on the idea of customized pages is to deliver different pages to users based on where they are coming from (as an indication of their access rights) or on other profiling information they may provide while using the site (for instance, membership in a group or demographic information).
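Delivering different pages according to where a user is coming from can be as simple as testing the client's address against the campus network. In this sketch the address block (a range reserved for documentation), the group name, and the page-variant names are all hypothetical, and an IP address is treated only as a rough indication of access rights.

```python
import ipaddress

# Hypothetical campus address block; 192.0.2.0/24 is reserved for documentation.
CAMPUS = ipaddress.ip_network("192.0.2.0/24")


def choose_page(client_ip, groups=()):
    """Pick a page variant from the client's address and any self-reported profile."""
    on_campus = ipaddress.ip_address(client_ip) in CAMPUS
    # Profile information (here, a hypothetical group membership) selects the content.
    if "nursing" in groups:
        variant = "nursing-resources"
    else:
        variant = "general-resources"
    # Off-campus users get the same content plus a note on remote access.
    return variant + ("" if on_campus else "+remote-access-note")
```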
Most libraries support online forms for such services as reference questions, interlibrary loan requests, and comments. Usually CGI scripts, often freely distributed by their authors, support those forms. The output of the form is generally sent as e-mail to the appropriate person or department. But the information contained in the e-mail is flat and cannot be integrated with other information unless it is copied into another application. A form supported by an application server can instantly update a database with the submitted information, while sending an e-mail notifying the appropriate person that the database has been updated. That database may then be able to communicate with other applications, e.g., the interlibrary loan system.
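A sketch of the form-to-database pattern described above, assuming a hypothetical interlibrary loan table and staff address. The submission becomes a row in the database, and the e-mail merely announces it; actually sending the message (for example, with Python's `smtplib`) is omitted.

```python
import sqlite3
from email.message import EmailMessage

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ill_requests ("
    " id INTEGER PRIMARY KEY, patron TEXT, citation TEXT,"
    " status TEXT DEFAULT 'new')"
)


def handle_ill_form(form):
    """Store a submitted form as a database row, then build a notification message."""
    with conn:  # commits the insert, or rolls it back on error
        cur = conn.execute(
            "INSERT INTO ill_requests (patron, citation) VALUES (?, ?)",
            (form["patron"], form["citation"]),
        )
    request_id = cur.lastrowid
    msg = EmailMessage()
    msg["To"] = "ill-staff@example.org"  # hypothetical address
    msg["Subject"] = f"New ILL request #{request_id}"
    # The e-mail only points to the record; the data itself lives in the database.
    msg.set_content(f"Request {request_id} has been added to the ILL database.")
    return request_id, msg
```

Because the request is a structured row rather than free text in a mailbox, its `status` can later be updated, counted, or passed to another system.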
Other potential applications are simply a by-product of putting useful data into a relational database: namely, the ability to answer questions about your electronic resources. As long as the content of a typical library's Web site is locked into HTML pages, this is not possible. Some questions a database could answer would be: Which electronic journals do we have from x publisher?15 How many journals do we get free online with print? Which resources still require ID/password access? Some integrated library systems support questions like these, but unless the records are enhanced and updated, the desired information is incorrect or not present. If use statistics, which can be gathered in a uniform way using application server technology, are incorporated into the database, it becomes an even more useful management tool. Additionally, once the content is freed of HTML and put into the flexible database format, it can be used to output information in other formats, such as brochures, user documentation, or printed directories.
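The management questions listed above become one-line SQL queries once the data sit in a relational table. The schema and sample rows here are hypothetical, with `access` recording whether a title is available by IP address or only by ID/password.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ejournals ("
    " title TEXT, publisher TEXT,"
    " free_with_print INTEGER,"  # 1 = free online with the print subscription
    " access TEXT)"              # 'ip' or 'password'
)
conn.executemany(
    "INSERT INTO ejournals VALUES (?, ?, ?, ?)",
    [
        ("Science", "AAAS", 1, "ip"),
        ("Cell", "Cell Press", 0, "password"),
        ("Neuron", "Cell Press", 1, "ip"),
    ],
)

# Which electronic journals do we have from a given publisher?
from_publisher = [r[0] for r in conn.execute(
    "SELECT title FROM ejournals WHERE publisher = ? ORDER BY title", ("Cell Press",))]

# How many journals do we get free online with print?
(free_count,) = conn.execute(
    "SELECT COUNT(*) FROM ejournals WHERE free_with_print = 1").fetchone()

# Which resources still require ID/password access?
needs_password = [r[0] for r in conn.execute(
    "SELECT title FROM ejournals WHERE access = 'password' ORDER BY title")]

print(from_publisher, free_count, needs_password)
# prints ['Cell', 'Neuron'] 2 ['Cell']
```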
Planning and Implementation
Creating a dynamic Web site requires careful thinking about the site as a whole. While parts of a site can be made dynamic within an existing structure, the benefits will be far greater if the project can be integrated into a site-wide redesign. The redesign process will address fundamental issues about content and functionality, navigation and searching, and plans for growth, all of which pose problems that database-driven content can solve. In addition, developing dynamic content is an iterative process, and the redesign effort, itself intrinsically iterative, offers the framework within which experimentation can take place.
The database or databases should not be designed with only Web display in mind. Like the integrated library system, these databases should be created with the intent to use them for multiple purposes. A well-designed relational database describing any collection will find multiple uses and users. The battle over whether or not to maintain duplicate databases (that is, in addition to the catalog) is essentially over; because of the Web's flexibility as a display medium, as well as its intense popularity with users, the data have made their way to the Web already. Some libraries have found ways to extricate some of the data from their catalog, but all have created substantial independent content for the Web.16 Whether or not data are extracted from the catalog, it is useful to be mindful of ways in which technical services and collection development staff could use the database and include those people in the design process. They have expertise in the bibliographic and acquisitions issues that will surely arise. In fact, the logical outcome of the project will be that they are responsible for all or part of the maintenance of a database pertaining to collections of electronic resources. Likewise, public service staff should be involved in database design and maintenance of forms, Internet resource guides, and other service-oriented projects.
Because it is impossible to predict in advance all the information associated with an item that should be recorded in the database, work on the Web display should occur in tandem with the database design. Certain fields and logic will inevitably have to be added to the database in order to achieve the desired display, and there will be a dozen or more iterations of both database and Web display before all parties are satisfied. Self-evidently, the richer the database, the more useful it is (and, of course, the more work it is to create and maintain). Designers should think about new ways to describe items: instead of height and number of pages, an item could be coded "free" or "includes full text." Metadata elements could be coded as well. For Internet resources, there is much information relating to licensing and access to include. Which of your communities has access, and how? Does the library obtain the item free, as part of a print subscription, or as part of consortial access to a collection? What are the electronic holdings? What are the IDs and passwords associated with accessing or managing the item online? In creating these data, it is important to consider where controlled vocabulary is necessary (fields that will be searched or browsed, as well as fields for which global updating or querying makes sense) and where it is not (note fields). Database input forms can be designed to "enforce" the controlled vocabulary by providing drop-down lists of choices. Additionally, database designers gain another measure of control through effective use of the database software's data type definitions (e.g., numeric, Boolean, yes/no, time/date, text).
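Drop-down lists on input forms enforce controlled vocabulary at the interface; the database itself can enforce it as well. This sketch, with hypothetical field names, uses SQLite `CHECK` constraints to restrict an acquisition field to a fixed list and a full-text field to yes/no, while leaving holdings and notes as free text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE eresource (
        title       TEXT NOT NULL,
        -- controlled vocabulary: only these values are accepted
        acquisition TEXT NOT NULL
            CHECK (acquisition IN ('free', 'with_print', 'consortial', 'purchased')),
        full_text   INTEGER NOT NULL CHECK (full_text IN (0, 1)),  -- yes/no flag
        holdings    TEXT,  -- free text, e.g. '1995-'
        note        TEXT   -- uncontrolled note field
    )
    """
)


def add_resource(title, acquisition, full_text, holdings=None, note=None):
    """Insert a record; the database rejects values outside the controlled lists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO eresource VALUES (?, ?, ?, ?, ?)",
                (title, acquisition, int(full_text), holdings, note),
            )
        return True
    except sqlite3.IntegrityError:  # raised when a CHECK constraint fails
        return False
```

Enforcing the vocabulary in the database as well as on the form means that a later batch import or a hand edit cannot introduce a stray value that would break browsing or global updates.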
The Web interface design issues for dynamic content include all the standard design issues, but application and page server tools provide the flexibility to make choices that are not practical with HTML. For instance, it may be desirable to have a "gateway" page before linking to an off-site resource, in order to provide the user with information about access restrictions, scope notes, and so on. In an HTML-based site, a separate gateway page must be created for each resource; in a dynamic site, a single page is coded once. Similarly, in an HTML-based site, chunks of information (e.g., "reference resources on x topic") must be hard-coded for every topic and on every page where they appear; in a dynamic site, a "reference resources" page is coded once. If there are thousands of such resources, the efficiency of the database model is clear.
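The single gateway page can be one template filled from whichever record the user selected. This is a minimal sketch using Python's `string.Template`; the field names and sample record are hypothetical, and in practice the record would come from a database query.

```python
import html
from string import Template

# One template serves every resource; the field names are hypothetical.
GATEWAY = Template(
    "<h1>$title</h1>\n"
    "<p>Access: $access</p>\n"
    "<p>Scope: $scope</p>\n"
    '<p><a href="$url">Continue to $title</a></p>'
)


def gateway_page(record):
    """Render the gateway page for any resource record (a dict of field values)."""
    # Escape every value so text such as '&' displays correctly in HTML.
    safe = {key: html.escape(str(value)) for key, value in record.items()}
    return GATEWAY.substitute(safe)
```

For example, `gateway_page({"title": "CINAHL", "access": "Campus only", "scope": "Nursing & allied health", "url": "http://example.org/cinahl"})` produces the CINAHL gateway; a thousand resources still need only this one template.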
Once the framework for dynamic content is in place, it is important to be aware of all the ways in which it can be exploited. Application server environments can support SQL queries across databases as well as within a single database. In addition, they can reference queries that have been created on the database side in order to take advantage of the full query-building features of the database software. Database queries encoded in URLs are stable, which means that Web page authors as well as catalog records can link reliably to the local database, while the appropriate person maintains the URL, holdings statement, access restrictions, and so on. Furthermore, database query URLs don't have to be linked to a specific item; a URL can contain a query requesting all items with a given characteristic, for instance, all items of a given format on a given subject. Because the query is run each time the URL is requested, an embedded query always returns the most up-to-date results.
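A query URL can be translated into a parameterized SQL statement each time it is requested, which is why its results stay current. In this sketch the URL path, the column names, and the whitelist are hypothetical; only whitelisted parameters become `WHERE` clauses, and user values are passed as bound parameters rather than spliced into the SQL.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit

# Only these URL parameters may become WHERE clauses (column names are hypothetical).
ALLOWED = {"format", "subject", "publisher"}


def query_from_url(url):
    """Translate e.g. /resources?format=journal&subject=nursing into SQL plus parameters."""
    params = parse_qs(urlsplit(url).query)
    clauses, values = [], []
    for field in sorted(params):
        if field in ALLOWED:  # whitelist column names; never splice raw input into SQL
            clauses.append(f"{field} = ?")
            values.append(params[field][0])
    sql = "SELECT title FROM eresources"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY title", values


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eresources (title TEXT, format TEXT, subject TEXT, publisher TEXT)")
conn.execute("INSERT INTO eresources VALUES ('Nursing Research', 'journal', 'nursing', 'Lippincott')")

# The same stable URL is re-run against the live table on every request.
sql, values = query_from_url("/resources?format=journal&subject=nursing")
print(conn.execute(sql, values).fetchall())
```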
The gains in efficiency and scalability for Web site management are tremendous. The number of files to maintain will drop dramatically, even as the site grows in size and complexity. Instead of attempting to police the use of templates across the Web site, one file is created for each display page and kept in a secure place. Content is then added through forms in the database itself, which are designed to be appropriate for the users' level of knowledge as well as their authorization to modify the data. The need for HTML training is greatly reduced. Also lessened is the need to ensure that each staff person involved in creating Web content continually updates his or her skills to accommodate changes in Web technology. Under the database model, those training resources can be focused on the technical staff who will be coding the dynamic pages while the others are free to focus on creating good content.
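Matching maintenance forms to each staff member's authorization can be modeled as a table of which fields each role may edit. The roles and field lists below are hypothetical; a real system would also authenticate users and store the records in the database.

```python
# Hypothetical roles and the fields each may change through the maintenance forms.
EDITABLE = {
    "selector": {"title", "subject", "note"},
    "serials": {"title", "holdings", "url", "access"},
    "webmaster": {"title", "subject", "note", "holdings", "url", "access"},
}


def apply_edit(record, role, changes):
    """Return an updated copy of record, applying only changes the role may make."""
    allowed = EDITABLE.get(role, set())
    rejected = sorted(set(changes) - allowed)
    if rejected:
        raise PermissionError(f"{role} may not edit: {', '.join(rejected)}")
    updated = dict(record)  # leave the original record untouched
    updated.update(changes)
    return updated
```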
Summary
Selling the idea of a database-driven Web site is not hard. Neither is choosing the appropriate tool. The challenge is to fully integrate the potential of the technology into the design and management of a site. Making a Web site dynamic will completely transform what information can be delivered and how it is delivered. The logic of a good database design, and then the application server's linking of its various components for display, will pull the site together and add to its overall coherence and ease of use. A dynamic site is tighter, more accurate and timely, and far easier to manage. The process itself will naturally bring together the people in the library who should be involved in organizing and delivering information and services in the electronic environment.
HTML has created an artificial symbiosis of form and content. The Web is good at displaying data; a relational database is good at managing data. Implementing a database-driven Web site allows each tool to do what it does best. It also allows the library to move one step closer to a new, open infrastructure that, together with developments like XML, will continue to separate content from container, thus opening up new possibilities for flexible display and intelligent manipulation of information.
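Separating content from container means one stored record can be rendered into whatever form a given use requires. This sketch, with a hypothetical record, produces an HTML list item for the Web and an XML element for interchange from the same data.

```python
import html
import xml.etree.ElementTree as ET

# One record, held as data; the field names are hypothetical.
record = {"title": "Nursing Research", "publisher": "Lippincott", "holdings": "1996-"}


def to_html(rec):
    """Web display: the record as an HTML list item."""
    return (f"<li>{html.escape(rec['title'])} "
            f"({html.escape(rec['publisher'])}, {html.escape(rec['holdings'])})</li>")


def to_xml(rec):
    """Interchange: the same data as XML, independent of any page layout."""
    root = ET.Element("resource")
    for field, value in rec.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")
```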
Many readers will be uncomfortable with the idea of creating new databases that will inevitably duplicate, to some extent, the catalog to which so much time and effort have been devoted. Librarians have long known that patron access drives demand and use. For many of our current users, if the resource is not available online, it is of no interest. Web sites that provide effective access to a large selection of electronic resources serve that need, while Web sites that ineptly present such resources or focus on describing paper resources do not. A near-term goal should be to integrate catalog records and electronic "records" into a single flexible display.17 Before long, one should be able to integrate MARC content into the local database, structure the data in XML, and link one's database with another, all of which are tasks that are impossible within the constraints of the current generation of integrated library systems.18 Only at that time will the library collection be unified rather than fragmented by the presence of electronic resources. With this destination in mind, it can easily be envisioned that the databases created to drive a dynamic Web site today will serve as the foundation for our primary library databases tomorrow. Thus, one can view these databases as transitional objects, but not as wasted effort. Some of the most transforming trends now visible in electronic publishing (the movement away from the "journal" as container, for instance) will take at least a few years to fully develop and will affect different types of libraries within different time frames.19 In the meantime, library Web sites should be as flexible and useful as possible, especially when the effort required to make them so is less than that required to maintain the status quo.
References and Notes
1. Virginia Ortiz-Repiso and Purificacion Moscoso, "Web-Based OPACs: Between Tradition and Innovation," Information Technology and Libraries 18, no. 2 (1999): 68-77.
2. David W. Lewis, "What if Libraries Are Artifact-Bound Institutions?" Information Technology and Libraries 17, no. 4 (1998): 190.
3. For an extended discussion of choosing a database solution, including a detailed discussion of the tools outlined here, see John Paul Ashenfelter, Choosing a Database for Your Web Site (New York: Wiley, 1999). For a more technical, but lighthearted, perspective that is critical of the application server, see Philip Greenspun, Philip and Alex's Guide to Web Publishing (San Francisco: Morgan Kaufmann, 1999). Accessed Aug. 3, 1999, www.photo.net/wtr/thebook/.
4. FileMaker, Inc. Accessed Aug. 3, 1999, www.filemaker.com/.
5. John Paul Ashenfelter, "Getting Started with Web Databases," Webreview.com. Accessed Feb. 26, 1999, http://webreview.com/wr/pub/1999/02/26/webdb/index.html.
6. John Paul Ashenfelter, "The Application Server Marketplace," Webreview.com. Accessed Feb. 26, 1999, http://webreview.com/wr/pub/1999/02/26/webdb/index3.html.
7. Allaire Corp. Accessed Aug. 3, 1999, www.allaire.com.
8. Pervasive Software, Inc. Accessed Aug. 3, 1999, www.everyware.com/products/tango/.
9. Ron Wodaski, "ASP Technology Overview." Accessed Nov. 1, 1999, http://msdn.microsoft.com/workshop/server/asp/aspfeat.asp?RLD=22. Several solutions are available to run ASP on non-Microsoft platforms, for instance, Chilisoft (www.chilisoft.com) and Halcyon (www.halcyonsoft.com/index.htm).
10. For more on PHP, see www.php.net/usage.php3, www.php.net, and www.hotwired.com/webmonkey/99/21/. For more on MySQL, see www.mysql.net/.
11. "Welcome to the World of AOL Server." Accessed Nov. 1, 1999, www.aolserver.com/; Meta-Html.Com. Accessed Nov. 1, 1999, www.metahtml.com/.
12. Louis Rosenfeld and Peter Morville, Information Architecture for the World Wide Web (Sebastopol, Calif.: O'Reilly, 1998), 24-25.
13. Ibid., 46.
14. Roy Tennant, "Personalizing the Digital Library," Library Journal 124, no. 12 (July 1999): 36-38.
15. One problem is the publisher field in our MARC serial records, which according to standard cataloging practice is not updated: "With the exception of the final date of publication, significant changes appearing on later issues are recorded in notes, when considered desirable. Do not clutter the record with minor changes, particularly those that involve commercial publishers" (CONSER Cataloging Manual, 10.6). A periodical's current publisher is useful information for managing electronic journals because, for example, if access to one title is "broken," typically all other titles from that publisher will be broken and need to be coded not to display on the Web.
16. One option is to use the MARC record as the central database record. Such a project at Los Alamos National Laboratory is described in Frances L. Knudson et al., "Creating Electronic Journal Web Pages from OPAC Records," Issues in Science and Technology Librarianship 15 (Summer 1997). Accessed Aug. 3, 1999, www.library.ucsb.edu/istl/97summer/article2.html. Information pertaining to the electronic version of a journal was coded in a local 956 field. Customized CGI scripts were written and run nightly to extract records from the catalog and reformat them into various HTML displays.
17. Such research is already underway in the Stanford Lane Medical Library "Medlane" project. Dick Miller described the research at the Medical Library Association annual meeting, May 15-19, 1999, Chicago, Ill. His presentation is available at http://krypton.Stanford.EDU:8080/~dmiller/.
18. Ortiz-Repiso and Moscoso, 74-75.
19. Declan Butler, "The Writing Is on the Web for Science Journals in Print," Nature 397, no. 6716 (Jan. 21, 1999): 195-200. Projects such as E-biomed (www.nih.gov/welcome/director/ebiomed/ebiomed.htm) and BioOne (www.arl.org/sparc/bio1.html) are taking significant steps in this direction.
Kristin Antelman (kaa@ahsl.arizona.edu) is Head, Systems and Networking, at the University of Arizona Health Sciences Library.
http://www.lita.org/ital/1804_antelman.html
Copyright 1999, American Library Association