Question

I was tasked with creating a program that takes a hydrocarbon molecule, e.g. Pentene, C5H10, and outputs, preferably graphically, all of the possible permutations of its molecular chain. I'm not asking for it to be written for me (in fact, please don't), but could someone give me a good starting point? As for the language, I can use C#, Java or JavaScript and HTML (I'd prefer to make it using the latter, since it makes the graphical bit significantly easier).

The rules are as follows:

  • Each Carbon atom must make 4 bonds, and each Hydrogen atom must make one.
  • All of the chains must have the same amount of Carbon and Hydrogen atoms (so that they're all the same molecule).
  • Two chains are considered "different" when one cannot be altered into the other one without making or breaking any bonds. For example, all these chains are different.
  • Double bonds exist, and count as 2 bonds (this may sound like common sense, but I'm including it to remove any chance of a misunderstanding).
  • Single bonds can be "twisted", i.e. the chain on the other side of the single bond can be flipped over. However, double bonds cannot. So, 2 chains with a "twisted" single bond are the same, but 2 chains with a "twisted" double bond are not.

Please ask me if anything above is in any way unclear.

EDIT: from what I've figured out so far, if there are no "rings" of Carbon atoms and no double bonds (a double bond is essentially a "ring" with only 2 atoms), the formula of the compound is CnH2n + 2. If the formula is CnH2n, as with Pentene, then there must be one and only one double bond or one ring (as mentioned, the two are the same).

EDIT 2: as per the last rule, the following 2 chains are the same:

      H               C-H

     /               /
…-C-C       and …-C-C
     \               \
      C-H             H
Since the single bond on the left can be twisted.

EDIT 3: In response to David, here is an example of 2 chains that aren't identical:

H H H
 \|/
H C H
 \|/
H C H
 \|/
H C H
 \|/
  C     H
  |    /
  C-C=C
  | |  \
  H H   C-H
        |\
        H H

and

H H H
 \|/
H C H
 \|/
H C H
 \|/
H C H   H H
 \|/    |/
  C     C-H
  |    /
  C-C=C
  | |  \
  H H   H

This is so because to make them identical, the C=C bond would have to be twisted, which isn't allowed, as it is a double bond.

Was it helpful?

Solution

The algorithms for the ringless case are significantly easier, so I would start there. My suggested plan is as follows.

  1. A rooted chain is a chain with one H (the root) distinguished from the others. Generate all inequivalent rooted chains.

  2. For each generated rooted chain, retain it if and only if the root has been chosen "canonically" (more on this later). Now we have the inequivalent chains. (Steps 1 and 2 would be called something like "enumerating non-isomorphic unrooted 4-ary trees" in the algorithms literature.)

  3. Draw the chains. As a first cut, use Graphviz, but there are many, many algorithms that embed trees "nicely" in the Euclidean plane. This is sort of a separate question.

Step 1

Instead of generating complete chains, generate incomplete chains with one open bond. We recover complete chains by bonding the root H.

An incomplete chain is an H, or a C with three bonded incomplete chains. This definition is recursive, and so is the natural generation strategy. Generate an incomplete chain with n Cs as follows.

  • If n == 0, then output H.
  • Otherwise, for every triple n1, n2, n3 >= 0 such that n1 + n2 + n3 == n, for every incomplete chain c1 with n1 Cs, for every incomplete chain c2 with n2 Cs, for every incomplete chain c3 with n3 Cs, if c1 <= c2 <= c3 (more on this later), then output the incomplete chain consisting of a C bonded with c1, c2, c3.

The condition c1 <= c2 <= c3 is because the order of bonds doesn't seem to matter here (please correct me if I'm wrong), and we don't want to generate multiple permutations. I haven't said what c <= c' means, though, so let's fix that.

If c == H, then c <= c'. Otherwise, if c' != H, then let c1 <= c2 <= c3 be the incomplete chains bonded with the open C of c and let c1' <= c2' <= c3' be the incomplete chains bonded with the open C of c'. I'm using <= before I've finished the definition, but that's OK because this definition is structurally inductive (i.e., recursive). If c1 <= c1' and ((not c1' <= c1) or (c2 <= c2' and ((not c2' <= c2) or c3 <= c3'))), then c <= c', otherwise not. In other words, we're comparing (c1, c2, c3) lexicographically to (c1', c2', c3'). Haskellers, this is what the derived Eq/Ord instances do for

data IncompleteChain = H | C IncompleteChain IncompleteChain IncompleteChain .

Step 2

We're going to use the order again. For each incomplete chain c, generate all c' obtained by completing c with a root H, then removing a different H and reforming the incomplete chain (remember to ensure sorted order of the bonded incomplete chains before comparing). If c is less than or equal to all of these c', then retain c, otherwise don't.

One double bond

I'm leaving this section as a warmup. Let one of the doubly bonded carbons have incomplete singly bonded chains c1 and c2. Let the other have incomplete singly bonded chains c3 and c4. If I understand correctly, then there are four ways to generate equivalent chains (where one of the "ways" is to keep the existing chain): optionally swap (c1, c2) with (c3, c4), then optionally swap (c1, c3) with (c2, c4). To generate all n-carbon chains with a double bond, consider all n1, n2, n3, n4 >= 0 such that n1 + n2 + n3 + n4 == n, then all incomplete chains c1, c2, c3, c4 with n1, n2, n3, n4 carbons respectively. Output the chain if c1 <= c2 and (c1, c2) <= (min(c3, c4), max(c3, c4)).

One ring

We can generate all n-carbon chains with one ring as follows. Iterate over how many carbons are in the ring (a number k between 2 and n). Iterate over partitions n1, n1', n2, n2', ..., nk, nk' >= 0 such that n1 + n1' + n2 + n2' + ... + nk + nk' = n - k. Iterate over all incomplete chains c1, c1', c2, c2', ..., ck, ck' with n1, n1', n2, n2', ..., nk, nk' carbons. Chains ci and ci' will be bonded to the ith carbon in the ring. To determine whether given c1, c1', c2, c2', ..., ck, ck' should be retained, generate isomorphic presentations as follows. Start with the list [c1, c1', c2, c2', ..., ck, ck']. Create a second list [ck, ck', ..., c2, c2', c1, c1']. Create a third list [c1', c1, c2', c2, ..., ck', ck] and a fourth [ck', ck, ..., c2', c2, c1', c1]. Create 4 (k - 1) more lists by permuting the carbons in a cycle (e.g., [c2, c2', c3, c3', ..., c1, c1']). If the original list is the lexicographic minimum of all of these lists, then retain it.

OTHER TIPS

Obviously, forall C atoms it holds that there must be at least one bound to another C molecule. For something like

C - H - C

is impossible. Therefore, I would start with the simpler task to make the C skeleton first.

This is relatively simple, as an C skeleton of n molecules can only be created from a skeleton with (n-1) molecules, and then there are so many possibilities to bind the last C.

Once you have the C skeleton, simply put H atoms on the not yet fully bound C atoms.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top