Commercial Patent Databases

Text searching for pharmaceutical patents has serious limitations. Because drugs are usually expressed as molecular structures, a patent may or may not contain a text description of the compounds of interest. The earliest patent application on a new class of drugs is often submitted during the discovery research phase, long before a generic name has been assigned and before the systematic chemical nomenclature becomes consistent. There are many ways to write the name of a single organic compound, and variations in spelling and punctuation are too unpredictable to find by searching the text of a patent. For reliable retrieval it is necessary to refer to the index of a value-added database, such as Chemical Abstracts®, that uses standardized nomenclature. In addition, compounds are often illustrated graphically or embedded in complex generic chemical structure drawings without being named. Patent claims often point to a specific compound within a generic structure by naming only the unique features of the structure drawing, e.g., "a compound of Claim 1 wherein X is chlorine." This disadvantage is less important late in the life cycle of a drug, when patents on new dosage forms are likely to contain a long list of the generic names of active ingredients suitable for administration. For complete retrieval of patent information on pharmaceuticals, it is necessary to search by chemical structure. And because the chemical structures in patents are often generic Markush structures, in which an individual compound of interest may be embedded, it is necessary to search in a database where the generic structures are indexed in full. This is particularly true when looking for the earliest patent covering a drug molecule, since patent applications are frequently filed before the lead compounds (the ones that will ultimately become marketed drugs) are identified.
Structure-searchable patent databases are quite expensive, because indexing all of the chemical substances in a patent and making the structures searchable is extremely labor intensive. The most important producers of chemical structure-searchable databases are Chemical Abstracts Service (CAS) and Thomson Derwent, each of which covers patents from most of the industrialized countries of the world. (For a more detailed explanation of chemical structure searching resources, please refer to the chapter on Chemistry.)
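
As a rough illustration of why a generic structure must be indexed in full, the following minimal sketch models a Markush claim as a fixed core with lists of permitted substituents. The claim, the position labels (X, R1), and the substituent names are hypothetical and are not taken from any patent or database; real Markush indexing is topological and far more elaborate.

```python
from itertools import product

# Hypothetical Markush claim: a fixed core plus two variable positions.
markush = {
    "core": "phenyl-piperidine",
    "X": ["chlorine", "fluorine", "bromine"],
    "R1": ["methyl", "ethyl"],
}

def enumerate_compounds(claim):
    """Yield every specific compound (one choice per position) the claim covers."""
    positions = [key for key in claim if key != "core"]
    for choice in product(*(claim[p] for p in positions)):
        yield {"core": claim["core"], **dict(zip(positions, choice))}

def is_covered(claim, compound):
    """Return True if every variable position of the compound is allowed by the claim."""
    if compound.get("core") != claim["core"]:
        return False
    return all(
        compound.get(pos) in options
        for pos, options in claim.items()
        if pos != "core"
    )

compounds = list(enumerate_compounds(markush))
print(len(compounds))  # 3 choices for X times 2 for R1 = 6 compounds

# "A compound of Claim 1 wherein X is chlorine" names none of these in full,
# yet a structure-level check still finds it inside the generic claim.
target = {"core": "phenyl-piperidine", "X": "chlorine", "R1": "methyl"}
print(is_covered(markush, target))  # True
```

Even this toy claim covers six compounds, none of which appears by name in the text, which is why a text search alone cannot be relied on to find them.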

CAS has been indexing chemically related patents, including pharmaceutical patents, since 1907. CAS provides summary abstracts, focusing on chemical aspects of the patent, and deep controlled indexing of the technology described in the patent. Specific compounds are searchable topologically in the Registry File on the STN International search service, and can be transferred into the bibliographic Chemical Abstracts CA or CAPlus databases to retrieve both patent and nonpatent records. The Registry File incorporates the biosequence structure databases of GenBank®, European Molecular Biology Laboratory (EMBL) and DNA Data Bank of Japan (DDBJ) with the biosequences indexed by CAS abstractors.

Although the Registry File can be searched with a generic query, the structures retrieved by the generic search are always specific compounds from the patent claims and examples. CAS has been indexing generic chemical structures from patents since the late 1980s in the Marpat database. Marpat structure searches retrieve the bibliographic record of a patent rather than a chemical structure record, so no crossfile searching is needed to retrieve the bibliographic record. A convenient multifile search environment on STN is called CASLINK. Entering a chemical structure search in CASLINK initiates a search of the Registry, Marpat and Marpat Previews structure databases, transfers the Registry hits to the CAPlus database, and deduplicates with the Marpat and Marpat Previews bibliographic records.
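
The multifile workflow described above can be sketched in a few lines. This is only a schematic model with made-up identifiers (RN-1, DOC-10, and so on) and a hypothetical function name; it does not reproduce STN commands or the actual CASLINK implementation.

```python
# Hypothetical data: compound hits from a structure search of specific compounds,
# a cross-reference from compounds to bibliographic records, and bibliographic
# hits returned directly by a generic-structure search.
registry_hits = ["RN-1", "RN-2"]
caplus_by_compound = {
    "RN-1": ["DOC-10", "DOC-11"],
    "RN-2": ["DOC-11"],
    "RN-3": ["DOC-12"],
}
marpat_hits = ["DOC-11", "DOC-13"]

def multifile_search(registry_hits, caplus_by_compound, marpat_hits):
    """Cross compound hits into bibliographic records, then merge and deduplicate."""
    crossed = [
        doc
        for compound in registry_hits
        for doc in caplus_by_compound.get(compound, [])
    ]
    merged, seen = [], set()
    for doc in crossed + marpat_hits:
        if doc not in seen:  # keep the first occurrence of each record
            seen.add(doc)
            merged.append(doc)
    return merged

answer_set = multifile_search(registry_hits, caplus_by_compound, marpat_hits)
print(answer_set)  # ['DOC-10', 'DOC-11', 'DOC-13']
```

The specific-compound hits need the extra crossover step, whereas the generic-structure hits are already bibliographic records and only need deduplication.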

Derwent World Patents Index (DWPI). Produced by Thomson Derwent, it covers pharmaceutical patents from around the world from 1963 to the present. Derwent produces its own abstracts of the patents and offers several different chemical structure retrieval tools. Broad chemical groupings and non-chemical aspects of a patent are assigned Manual Codes. Manual Codes are similar to patent classification codes in that they are applied as a general screening tool to divide a file of patent records into subsets that can be screened "manually," and are not intended to identify a specific patent. Derwent's Manual Codes are alphanumeric symbols, subdivided according to the technical field of the indexed patents, and based on a hierarchy of chemical structures and utilities. The chemical hierarchy gives the greatest weight to fused heterocyclic ring systems, with monocyclic heterocyclic rings and carbocyclic and noncyclic structures lower in the hierarchy, ranked according to the functional groups present in the molecule. Hierarchies of Manual Codes are available for therapeutic uses and other features of patents in Derwent's 13 technologically based sections. For generic structures, all appropriate Manual Codes are applied to a patent, but the codes merely classify the patent rather than defining the compounds it covers. Derwent uses a chemical fragmentation code to index both specific structures and Markush structures in patents. The fragments define substructural units of the molecules such as ring systems, functional groups, and relationships among atoms. They incorporate essential group codes, which are indexed only when a group is required rather than optional and which can therefore safely be negated in a search strategy. The fragment coding is stored in the bibliographic record of the patent. Chemical fragmentation codes are assigned for each fragment that can be present in a compound.
Because the fragmentation code does not define all of the connections in a molecule, patents retrieved by a search will include both relevant and irrelevant answers. A separate polymer code is used to find nonbiological polymers.
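
The source of these irrelevant answers can be seen in a minimal sketch that treats each patent's fragment coding as an unordered set. The code labels and patent identifiers below are invented for illustration and are not actual Derwent fragmentation codes.

```python
# Hypothetical fragment codes and patent identifiers, for illustration only.
index = {
    "PAT-A": {"ring:pyridine", "group:chloro", "group:amide"},   # chloro on the pyridine ring
    "PAT-B": {"ring:pyridine", "ring:benzene", "group:chloro"},  # chloro on the benzene ring
    "PAT-C": {"ring:benzene", "group:amide"},
}

def fragment_search(index, required):
    """Return patents whose fragment codes include every required code."""
    return sorted(pid for pid, codes in index.items() if required <= codes)

# Looking for a chloro-substituted pyridine: the codes record that both
# fragments are present, but not that they are attached to each other.
hits = fragment_search(index, {"ring:pyridine", "group:chloro"})
print(hits)  # ['PAT-A', 'PAT-B']
```

PAT-B satisfies the query although its chlorine sits on a different ring, so it must be eliminated by reviewing the retrieved record; a topological search, which stores the connections, would not retrieve it.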

Derwent also provides a topological retrieval system, the Merged Markush Service (MMS), which contains chemical structures from the PharmSearch database created by INPI, the French Patent Office. MMS is available only on the Questel-Orbit search service, and uses the Markush DARC retrieval system. Topological representations of specific or generic structures may be searched in the MMS file and the resulting chemical structure records transferred to the bibliographic DWPI and PharmSearch files. The Derwent Chemistry Resource (DCR) is a newer topological structure search and retrieval system for DWPI, which contains structure records for specific compounds. DCR is embedded within the MMS for searching on Questel-Orbit and is a segment of the DWPI databases on STN. On STN, the DCR is searched with the same query representation one would use to search the structure in the Registry File, and it retrieves compound numbers that must be converted to search terms to identify the corresponding bibliographic records within the DWPI file.

GeneSeq™. Thomson Derwent's biosequence database provides information on nucleic and amino acid sequences found in the patent literature. It has biosequence indexing for the patents included in the DWPI database beginning with the very first patents to carry protein and nucleotide sequence descriptions. GeneSeq structure searches retrieve records with the sequence, a short abstract directed to that sequence, and bibliographic information about the patent. Each sequence has its own record. GeneSeq can be searched on the STN service in file DGENE, and it can be purchased as a flat file for searching on a company's own computers.

Protein and nucleic acid sequences are submitted electronically to the United States Patent and Trademark Office (USPTO) to avoid the introduction of errors in printed documents and to simplify the job of examining patent claims that include biosequences. Short sequence listings are printable in the USPTO's full text database, but for longer sequences the electronic sequence records are stored in the Publication Site for Issued and Published Sequences (PSIPS), located at http://

The IFI CLAIMS® Comprehensive Database (IFICDB) covers only U.S. patents, using its own controlled indexing and chemical fragment codes. The fragment codes are applied to both specific compounds and Markush structures. Compounds that appear frequently in patents are indexed with a unique compound number, so a complete search must take into account both the specific and the generic indexing. IFICDB does not index biosequences. A companion file, the IFI Current Legal Status database, indexes changes in the status of U.S. patents that occur after the patent is granted, and is the most complete of the patent status databases.

Thomson Current Patents. This is one company that is completely devoted to pharmaceutical and biotechnology patents. The Current Patents Gazette is a rapid current awareness publication covering newly published patent applications from the PCT, EP, U.K., and U.S., classified according to claim types, then discussed and put into context by the Current Patents editorial team. Unlike most patent current awareness publications, which objectively reproduce the information printed on the patent, the Current Patents Gazette attempts to evaluate the importance of the inventions in newly published patents, both to the patentee and to the pharmaceutical industry in general. Thomson Current Patents also produces DOLPHIN, the Database of all Pharmaceutical Inventions. This database uses bibliographic, patent family, and status data from the INPADOC database and integrates abstracts from the Patent fast alert service, commentary from the Current Patents Gazette, and information from the IDdb3 database of drugs in development. Statistical analysis tools are integrated into the service to generate graphical pictures of such things as company holdings and trends in the patenting of a particular drug or area of therapy.

IFI CLAIMS. IFI CLAIMS Patent Service, Wilmington, DE. United States patents and published applications from 1950. Contains searchable patent front page information, abstract, and claims, with standardized patentee information and enhanced titles. There are three versions of the CLAIMS database: the simple bibliographic/abstract database IFI PATENT (IFIPAT), the IFI Uniterm Database (IFIUDB) with controlled subject term indexing and a simple chemical fragmentation code, and the IFI Comprehensive Database (IFICDB), with the same controlled subject indexing and a much more specific chemical and polymer fragmentation code. The IFI Current Legal Status Database covers postgrant status changes for U.S. patents. Available online.

Derwent World Patents Index® (WPI). Thomson Derwent, London. Available online. Abstracts, patent family information, and proprietary indexing for patents from 40 patent issuing authorities. Beginning with pharmaceutical patents in 1963, the technological and country coverage has increased over time. Chemical indexing is accessible only with a corporate subscription. Available from STN, Questel-Orbit and Dialog® search services, and Delphion. The Derwent Patents Citation Index indexes patents cited by patent examiners, combining the citing patents and cited patents and nonpatent literature in WPI patent family records. Available online and in other formats.

INPADOC. EPO, Vienna. A patent family and status database produced by the EPO from data provided by about 70 patenting authorities. Bibliographic information is available from many countries, with legal status information provided by a smaller but growing number, and abstracts from a few. Available from STN, Questel-Orbit and Dialog search services, MicroPatent, and Delphion.

PLUSPAT and FAMPAT. Questel-Orbit, Paris. Created by merging INPADOC, several national patent databases, and the EPO's DOCDB database, this version of INPADOC additionally has all of the text and status information abstracts from Questel-Orbit's U.S., French, European, and PCT patent databases. In addition to access through the Questel-Orbit search service, PLUSPAT is the basis of the end-user subscription QPAT service. Available online.

Patent Abstracts of Japan (PAJ). JAPIO, Tokyo. English language abstracts of published Japanese patent applications. Widely available as a standalone database on fee-based search services and the Internet and as a resource for Web-based search systems. Most full-text patent search services include Japanese information from PAJ without emphasizing that only a short abstract is searchable.

CAPlus. Chemical Abstracts Service, Columbus, OH. In addition to abstracts, patent family information and proprietary indexing for chemically related patent and nonpatent literature beginning in 1907, CAPlus on STN has topological indexing of chemical substances in the companion Registry File. Generic structures from patents since the late 1980s are searchable in the Marpat database. CAPlus, but not Marpat, is available through the SciFinder end-user interface, and a version of Chemical Abstracts without searchable chemical structures, patent families, or records before 1967 is available on other search services. Available online.

PharmSearch. Institut National de Propriete Industrielle, Paris. Pharmaceutical patents indexed by INPI, the French Patent Office, with chemical structures searchable in the Markush DARC system on Questel-Orbit. Abstracts, bibliographic information, and proprietary indexing from one patent per family. PharmSearch shares the topological chemical structure database, the MMS, with DWPI. Nonstructural indexing was discontinued at the end of 1999. Available online.

AUREKA Online Service. MicroPatent, East Haven, CT. URL: Web-based subscription search service intended for sharing of patent information within an enterprise. Has full-text searchable U.S., German, British, French, European and WO patent documents and Patent Abstracts of Japan. Patent images are available as annotatable Smart Patents. Search results can be saved, annotated, and shared by workgroups. Aureka includes tools for mapping patents graphically.

GeneSeq. Thomson Derwent, London. Virtually all patents covering polypeptides and polynucleotides, searchable by bio-sequence and keywords. Bibliographic records are customized to the sequence, and correlate to patent family records from the DWPI. Available online.

MicroPatent Patent Web. MicroPatent, East Haven, CT. URL: Web-based subscription search and document delivery service. Has full-text searchable U.S., EP, and WO patent documents, Patent Abstracts of Japan, and INPADOC patent family information. Patent images of the searchable patents and a large collection of patents from other countries are available as PDF files without subscription payments.

DELPHION. Thomson Delphion, Lisle, IL. URL: http://www. Full text searchable U.S., German, EP, and WO patent documents, INPADOC records and Patent Abstracts of Japan. DWPI is searchable and Derwent abstracts are displayable for an additional charge. PDF records are available for U.S., German, EP, WO, and Swiss documents; other patents identified by INPADOC are available in PDF format for an additional fee. Patent information is shown as an integrated view, combining bibliographic data and links to INPADOC family and status tables. Delphion has tools for saving search results, mapping patents, and creating current awareness searches.

DOLPHIN. Thomson Current Patents, London. Subscription service with searchable bibliographic data, controlled indexing, legal status, and patent family information about patents on all phases of pharmaceutical technologies. Business information and news about the companies and institutes that own the patents are integrated, and graphical displays of patents by company and therapeutic area are available.

