
I want to build lists of prefixes and suffixes of a given length from all the IUPAC names in the PubChem database, so that I can use them as features later in my project. So I want all the IUPAC chemical names in a text file, or in some other format from which I can extract these lists.




It sounds like you need something like the NIST species list.

You can also search for most of them in the NIST Chemistry WebBook, but I couldn't find a download link for the complete set.

In our lab we had a CD (?) with the mass spectral database, which included the database (complete? It had about 250,000 substances) as a text file. Maybe you can get that from one of the vendors.

The PubChem site lets you download a dump of its data via FTP. Why not use that?

PubChem data can be downloaded via FTP from the PubChem site. A complete description of the available data can be found here:

For IUPAC names in particular, the data can be downloaded from the "Compound Extras" section of the FTP site:

The README-Extras file in this location describes the data in detail. For the IUPAC names, the following information is provided:


This is a listing of all CIDs with their computed IUPAC names. It is a gzipped text file with CID, tab, IUPAC on each line. Note that the names may contain UTF8 characters.
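Since the file is about 100 million lines, it is easiest to stream it rather than load it all at once. Below is a minimal Python sketch, assuming the layout quoted above (CID, tab, name, UTF-8, gzipped). The function name `iter_cid_iupac` and the local filename used in the usage line are hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def iter_cid_iupac(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (CID, IUPAC name) pairs from a gzipped CID<TAB>IUPAC file.

    Assumption: one record per line, CID and name separated by the first
    tab, UTF-8 text, as described in README-Extras.
    """
    # "rt" mode decompresses on the fly and decodes text line by line.
    with gzip.open(path, mode="rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            cid, sep, name = line.partition("\t")
            if not sep:
                continue  # skip malformed lines that have no tab
            yield int(cid), name
```

Usage (the filename is an assumption about where you saved the download): `for cid, name in iter_cid_iupac("CID-IUPAC.gz"): ...`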

A download today (23-Apr-2020) contains 102,586,778 rows. An excerpt of the information is shown below.

> head CID-IUPAC
1       3-acetyloxy-4-(trimethylazaniumyl)butanoate
2       (2-acetyloxy-3-carboxypropyl)-trimethylazanium
3       5,6-dihydroxycyclohexa-1,3-diene-1-carboxylic acid
4       1-aminopropan-2-ol
5       (3-amino-2-oxopropyl) dihydrogen phosphate
6       1-chloro-2,4-dinitrobenzene
7       9-ethylpurin-6-amine
8       2,3-dihydroxy-3-methylpentanoic acid
9       (2,3,4,5,6-pentahydroxycyclohexyl) dihydrogen phosphate
11      1,2-dichloroethane
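To get from these names to the prefix and suffix lists the question asks for, one option is to count fixed-length character prefixes and suffixes with `collections.Counter`. This is a hypothetical sketch: `affix_counts`, `top_affixes`, and the choice of raw character affixes (rather than chemically meaningful parts such as "chloro" or "oic acid") are assumptions, not something PubChem provides.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def affix_counts(names: Iterable[str], n: int) -> Tuple[Counter, Counter]:
    """Count length-n character prefixes and suffixes over IUPAC names.

    Hypothetical helper: names shorter than n are skipped, case is kept.
    """
    prefixes: Counter = Counter()
    suffixes: Counter = Counter()
    for name in names:
        if len(name) < n:
            continue
        prefixes[name[:n]] += 1   # first n characters
        suffixes[name[-n:]] += 1  # last n characters
    return prefixes, suffixes


def top_affixes(counter: Counter, k: int) -> List[str]:
    """Return the k most frequent affixes, most frequent first."""
    return [affix for affix, _ in counter.most_common(k)]
```

Because `affix_counts` accepts any iterable, it can consume names lazily from the full file; the counters grow with the number of distinct affixes, not with the number of names. On the excerpt above with `n=4`, for example, "acid" and "hate" (from "phosphate") each occur twice as suffixes.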
Licensed under: CC-BY-SA with attribution