A Brief History of Searching

Brenda Bell
6 min read · Feb 4, 2024

And its counterpart, Indexing

Back in the days before the Internet, one of our weekly grade-school special periods was “Library”. We started with stories read aloud; as our reading skills (and responsibility levels) improved, we learned what the general sections of the library were and how to check out and return books. Still later, we were introduced to the Card Catalog.

AI-generated image of a library catalog (via DALL-E)

Every library had three separate card catalogs — something like three views of a modern database, except that everything was generated by hand and typewriter. These were the Subject, Title, and Author indexes. (The process of creating these cards — similar to records in a database — is called indexing. It is similar to, but not identical to, the process of creating an index you might find at the back of a book.) If you knew any one of these three categories of information about a book, its catalog cards would tell you its Dewey Decimal number (or its Library of Congress number), which would then tell you its location in “the stacks” (the shelves where books are located).
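For readers who think in code, here is a minimal sketch of the "three views" idea: one set of records, with three indexes that each lead to a call number. The book records, call numbers, and function names are invented for illustration; a real catalog card carried many more fields and cross-references.

```python
from collections import defaultdict

# Hypothetical records; the call numbers are illustrative, not real catalog data.
BOOKS = [
    {"author": "Smith, Jane", "title": "Bridges of Steel",
     "subjects": ["Engineering", "Bridges"], "call_number": "624.2 SMI"},
    {"author": "Smith, Jane", "title": "Rivers and Roads",
     "subjects": ["Geography"], "call_number": "910 SMI"},
    {"author": "Jones, Alan", "title": "Bridge Design Basics",
     "subjects": ["Engineering", "Bridges"], "call_number": "624.2 JON"},
]


def build_catalogs(books):
    """Build three indexes (author, title, subject) over the same records."""
    by_author = defaultdict(list)
    by_title = defaultdict(list)
    by_subject = defaultdict(list)
    for book in books:
        by_author[book["author"]].append(book["call_number"])
        by_title[book["title"]].append(book["call_number"])
        # A book gets one subject card per subject heading.
        for subject in book["subjects"]:
            by_subject[subject].append(book["call_number"])
    return by_author, by_title, by_subject


by_author, by_title, by_subject = build_catalogs(BOOKS)

print(by_author["Smith, Jane"])          # ['624.2 SMI', '910 SMI']
print(by_title["Bridge Design Basics"])  # ['624.2 JON']
print(by_subject["Bridges"])             # ['624.2 SMI', '624.2 JON']
```

Whichever of the three you start from, you end up with the same thing: a call number that tells you where to look on the shelves.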

The setup at the school library was similar to that at the local public library, except that it was only books (no magazines or media), and the books were general-interest books geared to students of that school’s age group. Once we started getting assigned papers that required actual research, the school library often got left behind for the public library, where books would be supplemented by magazines, journals, newspapers, and other sorts of periodical literature.

For those who need a refresher: the Card Catalog only indexed books. To search for something that might have appeared in a periodical, you went into the Reference section and looked for the H. W. Wilson Company’s “Readers’ Guide to Periodical Literature” (“Wilson’s Guide” for short). You would use the same search techniques (author name, title, subject) as you did with the card catalog, but you would have to search these books year by year, or month by month. Once you found the paper or article you were looking for, you would submit a slip of paper to the reference librarian, who would then go into the back and bring you the periodical itself, a reel of microfilm, or one or more sheets of microfiche, and set you up at the appropriate reader. (More about microfilm and microfiche can be found at the Wikipedia entry for “Microform”.)

If you needed something very technical (say, something out of a journal aimed at metallurgical engineers), you might need to go to a technical library, often hosted by a university or a professional society, to find what you were looking for. Much as with the big New York Public Library, technical libraries don’t let patrons wander through the stacks. Just as with periodicals in the public library, you have to submit a request for the particular books, papers, conference proceedings, or periodicals you wish to read. That said, periodicals relating to a specific field of study would be indexed in a special abstracts journal, which was something like Wilson’s Guide but focused on material of interest to that field. As their name implies, abstracts journals included abstracts of most of the books, papers, and articles they indexed. (And just like today, sometimes the abstract of a paper, along with information about its source, is all you need.)

With the advent of large-scale digital storage and computerized relational databases, abstracts journals began getting online counterparts. In the beginning these had to be accessed by dial-up modem. Later on, parts of these databases might be published on CD-ROM. One of the advantages of these online databases (or abstracts databases) was that you could use multiple key terms, additional fields, and Boolean logic to refine your search.

[Basic Boolean logic: if you search for “this that”, it is generally interpreted as “this OR that”, and you will get responses with only “this” and responses with only “that” as well as responses that have both “this” and “that”. Many search systems have worked this way; it is called a “default OR search”. If you want responses that have both “this” and “that”, you need to search for “this AND that”. You could also search for “this NOT that”, use quotation marks to look for exactly “this that”, and use parentheses in an algebraic style to further refine your search.]
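For the programmers in the audience, here is a minimal sketch of how those operators can be evaluated, assuming a tiny word-by-word index in which each word points to the set of records containing it. The sample records and the function names (`search_or`, `search_and`, and so on) are made up for this example; real search systems add ranking, stemming, stop words, and a proper query parser.

```python
import re

# Hypothetical mini-collection of abstracts, keyed by record number.
DOCS = {
    1: "corrosion of steel in sea water",
    2: "fatigue of aluminum alloys",
    3: "corrosion fatigue of steel bridges",
    4: "welding of aluminum to steel",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of record numbers that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Records containing a single word (empty set if the word is unknown)."""
    return INDEX.get(word.lower(), set())


def search_or(a, b):
    return term(a) | term(b)   # set union: either word


def search_and(a, b):
    return term(a) & term(b)   # set intersection: both words


def search_not(a, b):
    return term(a) - term(b)   # set difference: a without b


def search_phrase(phrase):
    """Records in which the words of the phrase appear side by side, in order."""
    words = tokenize(phrase)
    if not words:
        return set()
    hits = set()
    for doc_id, text in DOCS.items():
        tokens = tokenize(text)
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                hits.add(doc_id)
                break
    return hits


print(sorted(search_or("corrosion", "fatigue")))    # [1, 2, 3]
print(sorted(search_and("corrosion", "fatigue")))   # [3]
print(sorted(search_not("corrosion", "fatigue")))   # [1]
print(sorted(search_phrase("corrosion fatigue")))   # [3]

# Parentheses map onto ordinary grouping: (corrosion OR fatigue) AND steel
print(sorted((term("corrosion") | term("fatigue")) & term("steel")))  # [1, 3]
```

Notice how much the default matters: with a default OR, typing "corrosion fatigue" returns three records, while the AND and the quoted phrase each return one.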

I spent seventeen years of my life working in an A&I (Abstracting and Indexing) house, curating content for an abstracts journal and online database. One of the ways we indexed content to make it easier to find was to standardize the words and phrases we used for a particular material or concept. The book (later, database) with the standardized words and phrases is called a thesaurus, and while it does provide synonyms like Roget’s does, its purpose is to list all the allowable terms and to direct you from common alternate terms to the ones we were required to use.
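A rough sketch of that kind of lookup, assuming the thesaurus can be reduced to a set of allowed terms plus a table that points alternate terms to them. The entries below are invented for illustration and are not taken from any real thesaurus, which would also carry broader, narrower, and related terms.

```python
# Hypothetical thesaurus entries: each alternate term points to the allowed term.
USE = {
    "aluminium": "aluminum",
    "stainless": "stainless steels",
    "rust": "corrosion",
    "rusting": "corrosion",
}

# Hypothetical list of allowed (preferred) indexing terms.
PREFERRED = {"aluminum", "stainless steels", "corrosion", "welding"}


def normalize(term):
    """Return the allowed term for `term`, or None if the thesaurus doesn't know it."""
    key = term.strip().lower()
    if key in PREFERRED:
        return key
    return USE.get(key)


def index_terms(raw_terms):
    """Map raw terms to allowed terms; collect the ones that need a human decision."""
    accepted, rejected = [], []
    for raw in raw_terms:
        preferred = normalize(raw)
        if preferred is None:
            rejected.append(raw)
        elif preferred not in accepted:
            accepted.append(preferred)  # skip duplicates such as rust/rusting
    return accepted, rejected


accepted, rejected = index_terms(["Rust", "rusting", "Aluminium", "welding", "brazing"])
print(accepted)  # ['corrosion', 'aluminum', 'welding']
print(rejected)  # ['brazing']
```

The payoff comes at search time: someone who looks up "corrosion" finds the record whether the original paper said "rust", "rusting", or "corrosion".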

While we were curating, indexing, and abstracting highly technical material for a government agency and a professional organization, and selling subscriptions to this very labor-intensive information to those corporate and university libraries that needed (and could afford) it, the nascent Internet was finding ways to search itself. “archie”, which indexed the file listings of public FTP archives, is generally regarded as the first Internet search tool. WAIS (“Wide Area Information Servers”) let you run full-text searches against databases on remote servers. “gopher” was a menu-driven protocol for browsing documents across servers. “veronica” (Very Easy Rodent-Oriented Netwide Index to Computerized Archives) and “jughead” (Jonzy’s Universal Gopher Hierarchy Excavation and Display) were search tools for gopher menus: veronica covered gopher servers broadly, while jughead searched a single server. Finally, more graphical (and logical) interfaces arrived with the World Wide Web, along with directories such as Yahoo.

By the turn of the millennium, not only did our online database have a number of free (if more general and less inclusive) competitors, but so did Yahoo, and a tier of hybrid (free to a certain point, then paid) search sites existed as well. Almost all of these expected the user to use Boolean logic in their searches. Spiders and crawlers (programs that query every site and server for web pages and automatically index them word by word) were already part of the digital landscape, but they weren’t yet the exclusive method for indexing and archiving web sites. At the time, I expected the future of online search-and-retrieval to be:

better query processing (context-sensitive filtering; pattern-matching; neural nets; inference engines; fuzzy logic)

more for-pay indexes; fewer free search sites (database industry shakeout: online v. traditional; intellectual property laws)[1]

Of course, things didn’t quite turn out that way. While there was a bit of a shakeout (the major winners: Google, Microsoft Bing, Yahoo), instead of putting all search behind paywalls, providers fund the costs of their services through advertisements and SEO (Search Engine Optimization). A lot of Boolean logic has fallen by the wayside on public search engines, making the increasingly huge Internet increasingly hard to search successfully. Even the paid search sites have consolidated, many under the umbrella of either ProQuest or the original publisher. For reasons of intellectual property management and “infochunking” (making each paragraph of a paper and each image a separate “publication” for the purposes of increased monetization and limiting fair use), a number of primary publishers have rearranged their websites in such a way that it takes a bit of clicking around to even find their publications.

The result is that it has become more difficult to search the Internet effectively.

— — — — — — — — — — — — — — — — — -

Some of the historical data in this story was taken from presentations I gave at the Trenton Computer Festival during the time I was working in the A&I field. These are still available online, although many of the links are outdated, and some of the companies are no longer in existence.

[1] Searching the Internet: How to find what you’re looking for, presented at Trenton Computer Festival, May 6–7, 2000. (slide deck)

Finding Technical Information Over the Internet, May 3, 2003. (paper)

— — — — — — — — — — — — — — — — — -

Inspired by the comments on George Dillard’s story, “Let Them Eat Ads”.

