Method to strip C comments from patch files

https://stackoverflow.com/questions/10555777

07-06-2021

Question

I'm trying to strip C comments from our patch files. I have looked at numerous regexes, but removing lines from a patch would break it.

How would you write a regex or sed command that searches diff patch files for comments and replaces comment lines with blank lines?

This sed regex works for C files, but for patches I need something different:

sed '/^\/\*/,/\*\//d'

An example patch excerpt would be:

@@ -382,7 +391,109 @@
        return len;
 }

+/**********************************************************************************
+ * Some patch
+ * Author: Mcdoomington
+ * Do somethimg
+ * 
+ * fix me
+ **********************************************************************************/

Anyone have ideas?

Edit:

Using this filter:

sed -e 's,^+ \*.*,+ \/\/Comment removed,' mypatch.patch > output


+/**********************************************************************************
+ //Comment removed
+ //Comment removed
+ //Comment removed

How do I add a condition that skips a line if it ends with \?

Edit: Solution

While it is not the cleanest way, I used sed with a jury-rigged regex.

sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output

Note that the second command can be a bit too greedy, but for the purpose of sanitizing comments, this works!

How it works:

1.) First command: the address /[^\*\/]$/ selects lines whose last character is not * or / (and, because a backslash is literal inside a bracket expression, not \ either), that is, lines that do not end a comment. For each such line, {N;s,^+ \* .*,+ \* Comment removed,;} appends the next line, finds + * (whatever) and replaces it with + * Comment removed.

2.) Second command: the same address /[^\*\/]$/ selects the lines, then {N;s,^+\\\* .*,+ \/\* Comment removed,;} is meant to find the line that opens a comment and replace it with + /* Comment removed.
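
The same sanitizing idea can also be sketched outside sed. The Python below is a minimal illustration, not the asker's method: blank_added_comments and its placeholder parameter are hypothetical names, and it assumes a unified diff in which comments on added lines do not continue across context lines and /* never appears inside a string literal. It rewrites comment text on + lines while leaving every line in place, so the hunk line counts stay valid.

```python
def blank_added_comments(patch_text, placeholder="Comment removed"):
    """Replace /* ... */ comment text on added ('+') lines of a unified diff.

    Every input line yields exactly one output line, so hunk headers
    such as '@@ -382,7 +391,109 @@' remain correct.
    """
    out = []
    in_comment = False  # True while inside a block comment on added lines
    for line in patch_text.split("\n"):
        # Simplification: any line starting with '+++' is treated as a file header.
        is_added = line.startswith("+") and not line.startswith("+++")
        if not is_added:
            in_comment = False  # assumption: comments do not span context lines
            out.append(line)
            continue
        body = line[1:]
        if in_comment:
            end = body.find("*/")
            if end == -1:
                out.append("+ * " + placeholder)      # inner comment line
            else:
                in_comment = False
                out.append("+ */" + body[end + 2:])   # closing line, keep trailing code
            continue
        start = body.find("/*")
        if start == -1:
            out.append(line)                          # no comment on this line
            continue
        end = body.find("*/", start + 2)
        if end == -1:
            in_comment = True                         # comment continues on later lines
            out.append("+" + body[:start] + "/* " + placeholder)
        else:
            # Comment opens and closes on one line (only the first one is handled).
            out.append("+" + body[:start] + "/* " + placeholder + " */" + body[end + 2:])
    return "\n".join(out)
```

Applied to the excerpt in the question, the opening line becomes +/* Comment removed, each inner line becomes + * Comment removed, and the closing line becomes + */.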

Solution 2

I just used a quick and dirty hack job that canned most of the comments:

sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output

OTHER TIPS

Regular expressions are wonderful, but not that wonderful.

I would remove the comments before creating the patch.

If you can't do this, I would apply the patch, remove the comments from both the patched and unpatched files, then re-create the patch.

So starting with x.h we edit it to x1.h and create a patch:

diff -u x.h x1.h > patch

Then we publish the patch to someone who has x.h. They can then run:

cp x.h xnc.h
sed -e '/^\/\*/,/\*\//d' -i xnc.h
patch x.h patch
cp x.h xnc2.h
sed -e '/^\/\*/,/\*\//d' -i xnc2.h
diff -u xnc.h xnc2.h > patchnc

should create the comment-free patch.

But if I have patched and unpatched source trees, then

find unpatched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
find patched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
diff -urN unpatched patched > patch
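
The strip-then-rediff workflow can also be sketched in Python with the standard difflib module. This is an illustration under stated assumptions, not the answer's own code: strip_block_comments and comment_free_patch are hypothetical names, and the regex removes only the comment text (it does not delete whole lines as the sed range does, and it would also match /* inside a string literal).

```python
import difflib
import re

# Simplification: matches any /* ... */ span, even inside string literals.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(source):
    """Return source with every /* ... */ span removed."""
    return BLOCK_COMMENT.sub("", source)


def comment_free_patch(old_source, new_source, old_name="x.h", new_name="x1.h"):
    """Strip comments from both versions, then build a unified diff."""
    old_lines = strip_block_comments(old_source).splitlines(keepends=True)
    new_lines = strip_block_comments(new_source).splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile=old_name, tofile=new_name)
    return "".join(diff)
```

Because the comments are gone before the diff is computed, two versions that differ only in their comments produce an empty patch.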

I would not use regular expressions. In general they work within a line, and your files will contain comments that run over multiple lines.

I would write a simple parser in C/C++ or Java.

Start with state 0.

In state 0 just read character by character (and output it) until you find a sequence of /*

Then switch to state 1.

In state 1 just read character by character (and DO NOT output it) until you find a sequence of */

Then switch back to state 0.
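
The two states described above can be sketched as follows. The sketch is in Python rather than the C/C++ or Java suggested, strip_comments is a hypothetical name, and it assumes that /* never appears inside a string literal and that // comments can be ignored.

```python
def strip_comments(text):
    """Remove /* ... */ comments with a two-state scanner."""
    out = []
    state = 0  # 0: outside a comment, 1: inside a comment
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if state == 0:
            if pair == "/*":
                state = 1      # comment opens: stop emitting
                i += 2
            else:
                out.append(text[i])
                i += 1
        else:
            if pair == "*/":
                state = 0      # comment closes: resume emitting
                i += 2
            else:
                i += 1         # swallow comment text
    return "".join(out)
```

For patch files, a variation that still emits newline characters while in state 1 would keep the line count unchanged.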

Licensed under: CC-BY-SA with attribution
