Question

I am stumped on how to attempt this. I have read through some posts here, but I'm quite a novice at Excel, so I'm not sure which way to go.

I am trying to add metadata to sound effects files. They came with a PDF which I've converted into an xlsx. Normally you get one line for each sound file and it's as easy as copying the description column and pasting it into the metadata writing program. The problem I have with this CD is that multiple sounds are recorded in one file, so the description is spread over multiple rows.

I need to combine all the descriptions for each file into one cell, then remove duplicate words, so I can then paste it into each file's description.

Column B shows it's the same file by its leading number.

File 1 = 1-1, 1-2
File 2 = 2-1, 2-2, 2-3, 2-4
File 3 = 3-1, 3-2, 3-3, 3-4, 3-5
and so on for 990 files

So for File 1 my output would be C2 + D2 + C3 + D3 = AIR, JET STRONG STREAM THROUGH ANIMAL, FOOTSTEP ANIMAL FOOTSTEPS IN DIRT: VARIOUS MOVEMENTS

Does this seem like it's doable? I tried banging my head on the wall but it didn't help ;)

You can see from the first few entries that certain files, like File 3, won't matter, but I really need the others to have searchable metadata.

FS01    1-1 AIR, JET STRONG AIR STREAM THROUGH JET
FS01    1-2 ANIMAL, FOOTSTEP ANIMAL FOOTSTEPS IN DIRT: VARIOUS MOVEMENTS
FS01    2-1 APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE WITH SLIGHT BUILD
FS01    2-2 APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE
FS01    2-3 APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE
FS01    2-4 APPLAUSE, CROWD SMALL OUTDOOR CROWD APPLAUSE WITH VOICES
FS01    3-1 APPLAUSE, CROWD SMALL CROWD APPLAUSE
FS01    3-2 APPLAUSE, CROWD SMALL CROWD APPLAUSE
FS01    3-3 APPLAUSE, CROWD SMALL CROWD APPLAUSE

Solution

There are several problems to contend with here:

  • Files can have variable numbers of "parts"
  • Several "parts" within the same file can have the same "description"
  • And possibly, the same "description" can occur in different files

One solution to these problems is to

  1. Construct a unique list of the files in your data
  2. Construct a unique list of the descriptions in your data
  3. Construct a table showing for each possible combination of file and description whether that combination is present in your data or not
  4. Use the table to construct, for each file, a list of those descriptions for which the file/description combination exists
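For readers who want to check the logic outside Excel, the four steps above can be sketched in Python. This is only an illustration under assumptions: the sample rows are invented for the sketch (they are not the data from the screenshot), and the names rows, present and full_desc are hypothetical.

```python
# Hypothetical sample data: (file, description) pairs.
# Invented for illustration; NOT the data from the screenshot.
rows = [
    (1, "A"), (1, "B"),
    (2, "B"), (2, "C"), (2, "D"),
    (3, "C"), (3, "C"), (3, "E"),
    (4, "A"), (4, "D"), (4, "E"),
]

# Step 1: unique list of files.
files = sorted({f for f, _ in rows})

# Step 2: unique list of descriptions.
descs = sorted({d for _, d in rows})

# Step 3: table recording, for every possible file/description
# combination, whether it is present in the data.
present = {(f, d): False for f in files for d in descs}
for f, d in rows:
    present[(f, d)] = True

# Step 4: for each file, list the descriptions whose combination exists.
full_desc = {
    f: "/".join(d for d in descs if present[(f, d)])
    for f in files
}

for f in files:
    print(f, full_desc[f])
```

Because step 3 records only presence or absence, a description repeated within one file (here 3 C) appears once in that file's list.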

An Excel Pivot Table can be used for steps 1-3 above, and a bit of twiddling with formulae takes care of step 4.

The image below provides an illustration.

Excel screenshot

There are three elements in the worksheet.

Range A1:B12 provides the data. There are 11 data points and each point comprises a File and a Desc (short for 'Description') value. I have deliberately kept both parts short and simple, though there is no reason why they could not be much longer than 1 character each.

Range D1:I6 is the Pivot Table. (This feature of Excel is extremely powerful and really worth getting to know.) The Pivot Table has been constructed so that its rows correspond to File values and its columns to Desc values. The Pivot Table shows that the data contains 4 distinct File values (1, 2, 3 and 4) and 5 distinct Desc values (A, B, C, D and E). (The File and Desc values showing in the Pivot Table are placed there by Excel.) The entries in the body of the Pivot Table count the File/Desc combinations that occur in the data, and an empty cell indicates that the combination does not occur. So, for example, cell E4 is empty, meaning that the combination 2 A does not occur in the data, whilst cell G5 has a value of 2, signifying that 3 C occurs twice. If the data is changed, the Pivot Table can be refreshed with a single click and will then reflect the newly changed data.

Range D10:I13 completes the task and is derived from the values in the Pivot Table. There are three formulae involved:

Cell D10: =D3

Cell E10: =IF(E3>0,E$2,"")

Cell F10: =E10 & IF(AND(LEN(E10)>0,F3>0),"/","") & IF(F3>0,F$2,"")

F10 is copied to range G10:I10 and then D10:I10 is copied down to D11:I13.

The first formula (used in D10:D13) simply reproduces the list of File values from the Pivot Table. The second formula (used in E10:E13) places either the first Desc value or the zero-length string "" according to whether the value in the corresponding cell of the Pivot Table is positive or blank. The third formula (used in F10:I13) concatenates three strings. The third (rightmost) string is created in a similar way to the second formula, using the Desc at the top of the corresponding Pivot Table column. The first (leftmost) string is simply the value in the cell to the left. The second (middle) string is either the zero-length string or a delimiter. The delimiter is used only when neither the first nor the third string is the zero-length string. I have used the forward slash character / as a delimiter to separate successive Desc values, but the delimiter could be pretty much anything, such as a comma or the word "and", simply by modifying the third formula appropriately.

Each row of the third area effectively performs a cumulative concatenation of the Desc values present in the data for the corresponding File. The final column, labelled FullDesc, contains the full list of Desc values for each File.
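The cumulative concatenation can also be traced in a short Python sketch that mirrors the three formulae cell by cell. It is an illustration only: the sample rows are invented (not the screenshot's data), counts stands in for the body of the Pivot Table, and cumulative_row is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical sample data, invented for illustration.
rows = [
    (1, "A"), (1, "B"),
    (2, "B"), (2, "C"), (2, "D"),
    (3, "C"), (3, "C"), (3, "E"),
    (4, "A"), (4, "D"), (4, "E"),
]

# Stand-in for the Pivot Table body: occurrences of each combination.
# A Counter returns 0 for combinations that never occur.
counts = Counter(rows)
files = sorted({f for f, _ in rows})
descs = sorted({d for _, d in rows})


def cumulative_row(file, delimiter="/"):
    """Return the running concatenations for one file, left to right."""
    cells = []
    running = ""  # plays the role of "the cell to the left"
    for desc in descs:
        # Like IF(F3>0,F$2,""): the Desc if present, else empty.
        piece = desc if counts[(file, desc)] > 0 else ""
        # Like IF(AND(LEN(E10)>0,F3>0),"/",""): the delimiter is
        # added only when both neighbouring strings are non-empty.
        sep = delimiter if running and piece else ""
        running = running + sep + piece
        cells.append(running)
    return cells


for f in files:
    row = cumulative_row(f)
    print(f, row, "FullDesc:", row[-1])
```

The last element of each row corresponds to the FullDesc column; changing the delimiter argument has the same effect as editing the delimiter in the third formula.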

Although I have shown the data listed in File order, this is not necessary for the approach to work. You should be able to use the approach set out here as the basis of your own solution.
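The approach above removes duplicate descriptions, whereas the question asks for duplicate words to be removed. As a rough sketch of that variation outside Excel, the Python below groups rows by the leading number in column B and keeps the first occurrence of each word. The assumptions are mine: words are compared ignoring punctuation and case, and the function name combine is hypothetical. Under that rule "ANIMAL" and "ANIMAL," count as the same word, so the result for File 1 differs slightly from the hand-written example in the question.

```python
import re
from collections import defaultdict

# Rows taken from the question: (column B value, description text).
rows = [
    ("1-1", "AIR, JET STRONG AIR STREAM THROUGH JET"),
    ("1-2", "ANIMAL, FOOTSTEP ANIMAL FOOTSTEPS IN DIRT: VARIOUS MOVEMENTS"),
    ("2-1", "APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE WITH SLIGHT BUILD"),
    ("2-2", "APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE"),
    ("2-3", "APPLAUSE, CROWD SMALL INDOOR CROWD APPLAUSE"),
    ("2-4", "APPLAUSE, CROWD SMALL OUTDOOR CROWD APPLAUSE WITH VOICES"),
    ("3-1", "APPLAUSE, CROWD SMALL CROWD APPLAUSE"),
    ("3-2", "APPLAUSE, CROWD SMALL CROWD APPLAUSE"),
    ("3-3", "APPLAUSE, CROWD SMALL CROWD APPLAUSE"),
]


def combine(rows):
    """Merge descriptions per file, keeping each word's first occurrence."""
    words_by_file = defaultdict(list)
    seen_by_file = defaultdict(set)
    for part, text in rows:
        file_no = part.split("-")[0]  # leading number, e.g. "2" from "2-3"
        for word in text.split():
            # Assumption: compare words ignoring punctuation and case.
            key = re.sub(r"[^\w]", "", word).upper()
            if key not in seen_by_file[file_no]:
                seen_by_file[file_no].add(key)
                words_by_file[file_no].append(word)
    return {f: " ".join(ws) for f, ws in words_by_file.items()}


for file_no, text in combine(rows).items():
    print(file_no, text)
```

Keeping the original spelling of the first occurrence (including its trailing comma or colon) preserves the look of the source text; stripping punctuation from the output as well would be a small change to the append line.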

Licensed under: CC-BY-SA with attribution