 Compound Searching:

The name of a particular compound, e.g., 2-hydroxybenzoic acid, is searchable just like any other subject. There are, however, two problems with this approach.

1. Selecting the "correct" name for a compound might not be all that easy. For example, 2-hydroxybenzoic acid could also be named o-hydroxybenzoic acid, 2-carboxyphenol, o-carboxyphenol, and salicylic acid (its common name). Different authors might refer to this relatively simple compound by different names. So to be complete, one would need to construct a list of all of the possible names. This is no simple task!

2. Not all papers that refer to a particular compound include the name of the compound in the commonly searched fields like the title or the abstract.

So, even with a "complete" list of all of the possible names of a given compound, you might well miss some crucial references. Fortunately, there is a better way to search for specific compounds that avoids both of the above problems. The solution is the use of Registry Numbers. Registry Numbers (RN) are the ONLY way that compounds should be searched if you want a comprehensive search.

 

Registry Numbers:

A Registry Number is a unique number that is assigned by Chemical Abstracts to each unique chemical compound or unique formulation. They are assigned to a particular structure, and hence don't suffer from the vagaries of different nomenclature systems and author preferences. Whenever a compound is mentioned in a paper, the registry number of the compound is included in the searchable bibliographic record for that paper, irrespective of whether the compound is mentioned in the title or the abstract of the paper.

So, Registry Numbers (RN) are the ONLY way that compounds should be searched if you want a comprehensive search. Chemical names are simply used by the searcher as a means to locate the registry number of the compound in question. Once the RN has been located, it will be used in constructing the search.

The registry number is searchable in file ca and in file caold. It operates like any other field, but does not use a field delimiter (STN automatically recognizes the unique "dash-separated" format of a registry number).

Entering a registry number: Registry numbers are searched simply by typing in the registry number (including dashes) as you would in any search command. For example:

=> s 1632-16-2

Do not include the /rn field delimiter; it will generate an error!
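Because a registry number is typed with no field delimiter, a mistyped digit simply searches the wrong number. As a minimal sketch (this is not part of STN, and the function name is_valid_cas_rn is hypothetical), the following Python checks a number copied from a handbook before you search it. It assumes the standard CAS check-digit rule: the digits before the final digit are weighted 1, 2, 3, ... from right to left, and the weighted sum modulo 10 must equal the final digit.

```python
import re

# Dash-separated shape: 2-7 digits, then 2 digits, then 1 check digit.
_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the registry-number shape and a consistent check digit."""
    match = _RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


for rn in ("1632-16-2", "98-95-3", "98-95-4"):
    print(rn, is_valid_cas_rn(rn))
```

A number that fails this check cannot be a registry number, so it is worth re-reading the source before going online. A number that passes may still belong to a different compound than the one you intended.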

 

Finding a Registry Number:

To find a Registry Number, one can try some non-online resources (FREE!!!):

1. newer editions of The Handbook of Chemistry and Physics include the CAS RN.

2. the latest edition of the Dictionary of Organic Compounds also includes RN's.

3. current editions of the Aldrich Chemical Catalog also include RN's.

If these fail, there is always the online chemical dictionary, the registry file. Searching in the registry file is VERY EXPENSIVE, and we receive NO discount when we use this file! There is a learning file for the registry file (LREG) and you should definitely practice there first.

LREG File

To enter the learning version of the registry file, type:

=> file LREG

 

In the registry files (reg and lreg), each unique compound has its own "bibliographic record." Here, for example, is the record for nitrobenzene:

RN 98-95-3 LREGISTRY

CN Benzene, nitro- (8CI, 9CI) (CA INDEX NAME)

OTHER NAMES:

CN Essence of Mirbane

CN Essence of Myrbane

CN Mirbane oil

CN Nitrobenzene

CN Nitrobenzol

CN Oil of Mirbane

CN Oil of Myrbane

FS 3D CONCORD

MF C6 H5 N O2

CI COM

LC STN Files: AGRICOLA, ANABSTR, APILIT, APILIT2, APIPAT, APIPAT2,

BEILSTEIN*, BIOBUSINESS, BIOSIS, CA, CANCERLIT, CAOLD, CAPLUS,

CASREACT, CEN, CHEMCATS, CHEMINFORMRX, CHEMLIST, CBNB, CHEMSAFE,

CIN, CSCHEM, CSNB, DETHERM*, DDFU, DIPPR*, DRUGU, EMBASE, GMELIN*,

HODOC*, HSDB*, IFICDB, IFIPAT, IFIUDB, IPA, MEDLINE, MRCK*,

MSDS-OHS, NIOSHTIC, PDLCOM*, PIRA, PROMT, RTECS*, SPECINFO,

TOXLINE, TOXLIT, TULSA, ULIDAT, USPATFULL, VTB

(*File contains numerically searchable property data)

Other Sources: DSL**, EINECS**, TSCA**

(**Enter CHEMLIST File for up-to-date regulatory information)

 

STRUCTURE WAS SHOWN HERE

12174 REFERENCES IN FILE CA (1967 TO DATE)

238 REFERENCES TO NON-SPECIFIC DERIVATIVES IN FILE CA

12188 REFERENCES IN FILE CAPLUS (1967 TO DATE)

11 REFERENCES IN FILE CAOLD (PRIOR TO 1967)

 

How to find the Registry File record for a compound?

To locate the record for a particular compound in the Registry File, there are several different approaches that might seem reasonable. Three will be discussed, but only the third should be used. The others are fraught with problems.

1. If we happened to know the complete name of the compound, we could use the complete name (/CN) field delimiter. But this requires that we know one of the approved names/synonyms exactly (note the OTHER NAMES field in the above record). In general this will not be the case, so DON'T USE THIS METHOD.

Natural products and biochemistry exception: Particularly in fields such as these, "identified" substances, e.g. specific named proteins, are often reported and assigned registry numbers years in advance of the experimental determination of the exact structure and/or molecular formula of the substance. In these instances, the only way to find the registry number of such substances is by way of their chemical names (/CN) or name fragments (no field specified).

2. You could try finding this record by typing in parts of the name, e.g.

=> s nitro

L7 5609 NITRO

But you get LOTS of hits! "AND"ing with other parts of the name would reduce this number. This approach, by itself, is very laborious, inefficient, and expensive. DON'T USE THIS METHOD either.

3. The best (really the only) approach relies upon the molecular formula (MF). The molecular formula is part of the record (see above), and it is one of the searchable fields in the Registry File (reg) and its learning equivalent (lreg). Though a molecular formula does not generally describe one single compound (recall the existence of isomers), it is not subject to the whims of authors or catalogers. Combining the molecular formula (using AND) with the parts of the name that you know (from your experience in nomenclature) should give only one or a few hits. These can then be DISPLAYed to find the correct record and hence the unique Registry Number for your compound. So, DO USE MOLECULAR FORMULA COMBINED WITH NAME FRAGMENTS.
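To see why the third approach narrows the answer set so well, here is a small offline illustration in Python. It is only a sketch: RECORDS is a toy in-memory table standing in for a few Registry File records, find_records is a hypothetical helper, and the plain substring match on names is a simplification of how STN actually indexes name fragments.

```python
# Toy stand-in for a handful of Registry File records (NOT an STN interface).
RECORDS = [
    {"rn": "98-95-3", "mf": "C6H5NO2",
     "names": ["Benzene, nitro-", "Nitrobenzene", "Oil of Mirbane"]},
    {"rn": "59-67-6", "mf": "C6H5NO2",
     "names": ["3-Pyridinecarboxylic acid", "Nicotinic acid"]},
    {"rn": "55-22-1", "mf": "C6H5NO2",
     "names": ["4-Pyridinecarboxylic acid", "Isonicotinic acid"]},
    {"rn": "71-43-2", "mf": "C6H6", "names": ["Benzene"]},
]


def find_records(records, mf, fragment=None):
    """Return records matching the formula and, optionally, a name fragment."""
    wanted_mf = mf.replace(" ", "").upper()  # spaces and case are ignored
    hits = [r for r in records if r["mf"].upper() == wanted_mf]
    if fragment is not None:
        frag = fragment.lower()
        hits = [r for r in hits if any(frag in n.lower() for n in r["names"])]
    return hits


by_formula = find_records(RECORDS, "c6 h5 n o2")
by_both = find_records(RECORDS, "c6 h5 n o2", "nitro")
print(len(by_formula), "hits by formula alone")
print([r["rn"] for r in by_both], "after adding the name fragment")
```

The formula alone returns the isomers as well as nitrobenzene. Adding one name fragment leaves a single record, and that record supplies the Registry Number.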

 

Molecular Formula:

The molecular formula is part of the "bibliographic" record (see above) for a compound in the Registry File and is a searchable field (/mf) in the Registry File (reg) (and in lreg, its learning equivalent).

There are, however, a couple tricks to entering molecular formulas:

1. You must use the /MF field delimiter, otherwise STN will retrieve not only the molecular formula, but also all other molecular formulas that contain that combination of atoms. For example, entering simply C2H4 (instead of C2H4/MF) will retrieve C2CL2H4 as well as C2H4.

2. The alphabetical order is a bit odd in molecular formulas (you probably already know that). Alphabetization (and hence the writing of a molecular formula) uses the Hill System (i.e., C, H, followed by everything else in alphabetical order).

3. STN does not use subscripts. In normal molecular formulas, subscripts are used immediately following an element to indicate the number of atoms of that element in the compound. These subscripts are simply entered as normal numbers following the atomic symbol. So C₆H₆ becomes C6H6. NOTE: spaces can be added or not, as you choose, in the molecular formula (e.g. c6 h5 n o2 is the same as c6h5no2).

4. STN does not use upper and lower case letters, so there will be some ambiguity in interpreting what you have entered, for example CO (the formula for carbon monoxide) is typed the same as Co (the elemental symbol for cobalt). Fortunately, the presence of numbers following elemental symbols generally separates letters that otherwise might create ambiguities.

5. Complexes of two compounds are written in a "dot formula" notation, e.g. C6H5NO2.1/2C6H7N. In this hypothetical complex, the "molecule" consists of 2 molecules of C6H5NO2 complexed with 1 molecule of C6H7N. The "." indicates a complex, and the 1/2 shows the molecular ratio. In some cases, salts of organic acids are written in this fashion: C7H5O2.NA = sodium benzoate, though you may also find it listed in the more conventional C7H5NAO2 form.
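The ordering and typing rules above can be sketched in a few lines of Python. This is an illustration only, not an STN tool: hill_formula and stn_mf_term are hypothetical helper names, the sketch handles single-component formulas (no dot notation), and it assumes the usual Hill convention that a formula without carbon is ordered purely alphabetically.

```python
from typing import Dict


def hill_formula(counts: Dict[str, int]) -> str:
    """Write element counts as a Hill-order formula, e.g. C6H5NO2."""
    symbols = sorted(counts)
    if "C" in counts:
        # Carbon first, hydrogen second, everything else alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        ordered = symbols
    # A count of 1 is left implicit, as in ordinary formulas.
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in ordered)


def stn_mf_term(counts: Dict[str, int]) -> str:
    """Upper-case the formula and append the /MF field delimiter."""
    return hill_formula(counts).upper() + "/MF"


nitrobenzene = {"O": 2, "N": 1, "H": 5, "C": 6}
sodium_benzoate = {"Na": 1, "O": 2, "C": 7, "H": 5}
print(stn_mf_term(nitrobenzene))
print(stn_mf_term(sodium_benzoate))
```

Note that sodium (NA) sorts before oxygen (O), which is why the conventional form of sodium benzoate is written C7H5NAO2.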

Enzymes

Specific enzymes, like all other "pure" substances, have been assigned Registry Numbers by CAS and are most effectively searched in databases like ca and medline via those numbers. The Registry Number for an enzyme can be found by searching the Registry File (file reg). In doing so, one needs to be as specific as possible (or as general as necessary) in using the name of an enzyme. For example:

=> s esterase

L1 1826 ESTERASE

(many enzymes have the word esterase somewhere in one or more of their names)

=> s esterase/cn

L2 2 ESTERASE/CN

(two enzymes go by the complete name esterase)

=> s .alpha.-Carboxylesterase/cn

L3 1 .ALPHA.-CARBOXYLESTERASE/CN

(note how Greek characters are handled)

What follows is the Registry File listing for the above enzyme:

d L3

L3 ANSWER 1 OF 1 REGISTRY COPYRIGHT 1999 ACS

RN 9016-18-6 REGISTRY

CN Esterase, carboxyl (8CI, 9CI) (CA INDEX NAME)

OTHER NAMES:

CN .alpha.-Carboxylesterase

CN 1,4-Butanediol diacrylate esterase

CN 7-Amino-3-methoxy-3-cephem-4-carboxyl ester hydrolase

CN Aliesterase

CN Aminoacyl esterase

CN B-Esterase

CN Butyrate esterase

CN Butyryl esterase

CN Carbonic esterase

CN Carboxyesterase

CN Carboxyl ester hydrolase

CN Carboxyl ester lipase

CN Carboxyl esterase

CN Carboxylate esterase

CN Carboxylesterase B

CN Carboxylesterase ES-1

CN Carboxylic acid esterase

CN Carboxylic ester hydrolase

CN Carboxylic esterase

CN Chirazyme E 1

CN Cinnamate esterase

CN Cinnamic acid esterase

CN Cinnamoyl esterase

CN E.C. 3.1.1.1

CN E.C. 3.1.1.12

CN Egasyn

CN Esterase

CN Esterase 29

CN Esterase EP10

CN Esterase, B-

CN Fluazifop-butyl esterase

CN Ketoprofen alkyl esterase

CN Ketoprofen choline esterase

CN Methyl farnesoate esterase

CN Methylbutyrase

CN Methylbutyrate esterase

CN Monobutyrase

CN Naproxen esterase

CN Neutral esterase

CN Nonspecific carboxylesterase

CN Procaine esterase

CN Propionyl esterase

CN Proteins (specific proteins and subclasses), egasyns

CN Short chain fatty acid esterase

CN Sterase

CN Thiazopyr esterase

CN Triacetin esterase

CN Vitamin A esterase

DR 9025-97-2, 9027-84-3, 114514-18-0, 139074-54-7

MF Unspecified

CI MAN

LC STN Files: AGRICOLA, ANABSTR, BIOBUSINESS, BIOSIS, CA, CABA, CAPLUS,

CASREACT, CEN, CHEMCATS, CHEMINFORMRX, CHEMLIST, CIN, CSCHEM, CSNB,

EMBASE, IFICDB, IFIPAT, IFIUDB, MSDS-OHS, PIRA, PROMT, TOXLINE, TOXLIT,

USPATFULL

Other Sources: EINECS**, TSCA**

(**Enter CHEMLIST File for up-to-date regulatory information)

 

*** STRUCTURE DIAGRAM IS NOT AVAILABLE ***

The above listing should be enough to convince you that a given enzyme can have MANY different names. Thus, searches under one or even a couple of those names are likely to miss a large number of pertinent articles. The Registry Number (RN) of the enzyme (for the above substance, RN = 9016-18-6) is the only way to search comprehensively for references related to a particular enzyme. Remember, the command for such a search would be:

s 9016-18-6

EC Numbers of Enzymes

Some, but not all, enzymes are assigned EC (Enzyme Commission, an international organization) numbers according to the type of reaction that they catalyze and the substrate upon which they operate. EC numbers are based on a decimal hierarchy system. For example, the above enzyme is E.C. 3.1.1.1. You can explore the details of this system at a variety of sites:

http://prowl.rockefeller.edu/enzymes/enzymes.htm

http://www.expasy.ch/cgi-bin/enzyme-search-cl.

There are six classes of enzymes in the system:

E.C. 1... Oxido-reductases
...

E.C. 2... Transferases

...

E.C. 3... Hydrolases

E.C. 3.1... Esters (are the substrate)
3.1.1 carboxylic esters
3.1.1.1 Carboxylesterase.

3.1.1.2 Arylesterase.

3.1.1.3 Triacylglycerol lipase.

3.1.1.4 Phospholipase A2.

...

3.1.1.12 Deleted entry

...

3.1.2 thioesters

3.1.3 phosphoric monoesters

3.1.4 phosphoric diesters

3.1.5 triphosphoric monoesters

...

E.C. 3.2... Glycosidic bonds are the substrate

E.C. 3.4... Peptide bonds are the substrate

...

E.C. 4... Lyases

...

E.C. 5... Isomerases

...

E.C. 6... Ligases

...

If your enzyme has an EC number and you happen to know it, you can search it in the Registry File to find the registry number. (Note from the above file that a given enzyme may list more than one EC class. The above enzyme was previously included in the now deleted EC 3.1.1.12 class). The format for searching an EC number in the Registry File is:

s E.C. 3.1.1.1/CN

The four-place EC number is reasonably specific, but remember: not all articles will include the EC number of an enzyme, and not all enzymes have EC numbers. Bottom line: find the Registry Number and use it in searches.
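The decimal hierarchy of an EC number, and the search term built from it, can be illustrated with a short Python sketch. The helper names parse_ec, ec_levels and ec_search_term are hypothetical, and the output simply follows the "E.C. n.n.n.n/CN" pattern shown above rather than any official STN tool.

```python
# The six top-level classes listed in the outline above.
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases",
}


def parse_ec(ec: str):
    """Split 'E.C. 3.1.1.1', 'EC 3.1.1.1' or '3.1.1.1' into four integers."""
    text = ec.strip()
    for prefix in ("E.C.", "EC"):
        if text.upper().startswith(prefix):
            text = text[len(prefix):]
            break
    parts = tuple(int(p) for p in text.strip().split("."))
    if len(parts) != 4 or parts[0] not in EC_CLASSES:
        raise ValueError("not a four-place EC number: " + repr(ec))
    return parts


def ec_levels(ec: str):
    """Return each level of the hierarchy, from class down to the enzyme."""
    parts = [str(p) for p in parse_ec(ec)]
    return [".".join(parts[:i]) for i in range(1, 5)]


def ec_search_term(ec: str) -> str:
    """Build the name-field term used to look up the enzyme's record."""
    return "E.C. " + ".".join(str(p) for p in parse_ec(ec)) + "/CN"


print(EC_CLASSES[parse_ec("3.1.1.1")[0]])
print(ec_levels("E.C. 3.1.1.1"))
print("s " + ec_search_term("ec 3.1.1.1"))
```

Each successive level in ec_levels corresponds to one step down the outline above, from hydrolases to carboxylic ester hydrolases to the specific enzyme.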

File REG

When you are ready to do a search in the full Chemical Dictionary, the Registry File, simply type:

=> file reg

Remember it is very expensive here, so have your search strategy well planned, and if you get stuck, do a "logoff hold" to temporarily save your L-numbered sets (or if you simply need a small amount of time to think, enter either file lreg or file lca).
