Method to strip C comments from patch files
Question
I'm trying to strip C comments from our patch files and have looked at numerous regexes, but removing lines from a patch would break it.
How would you write a regex or sed command that searches diff patch files for comments and replaces the comment lines with blanks?
This sed regex works for C files, but for patches I need something different:
sed '/^\/\*/,/\*\//d'
An example patch excerpt would be:
@@ -382,7 +391,109 @@
return len;
}
+/**********************************************************************************
+ * Some patch
+ * Author: Mcdoomington
+ * Do somethimg
+ *
+ * fix me
+ **********************************************************************************/
Anyone have ideas?
Edit:
Using this filter:
sed -e 's,^+ \*.*,+ \/\/Comment removed,' mypatch.patch > output
I get:
+/**********************************************************************************
+ //Comment removed
+ //Comment removed
+ //Comment removed
How do I add a condition so that a line is skipped if it ends with \?
Edit: Solution
While not the cleanest way, I used sed with a jury-rigged regex.
sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output
Note that the second command can be a bit too greedy, but for the purposes of sanitizing comments, this works!
How it works:
1.) First command: the address /[^\*\/]$/ selects only lines whose last character is not * or /, which is how the command decides that the line is not the end of a comment. For those lines, {N;s,^+ \* .*,+ \* Comment removed,;} appends the next line with N, then finds "+ * (whatever)" and replaces it with "+ * Comment removed".
2.) Second command: the same address /[^\*\/]$/ is used to skip the end of a comment. Then {N;s,^+\\\* .*,+ \/\* Comment removed,;} finds "+\* (whatever)" and replaces it with "+ /* Comment removed".
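For comparison, here is a minimal Python sketch of the same idea (blank the comment text, keep every line) that avoids the regex juggling. It is not the sed solution above, just an alternative under some stated assumptions: the input is a unified diff, only /* ... */ comments matter, string literals containing /* are not recognised, and any line starting with +++ or --- is treated as a file header. The name blank_added_comments is made up for this example.

```python
def blank_added_comments(patch_text):
    """Blank the text of /* ... */ comments on added ('+') lines of a unified diff.

    No line is added or removed, so the hunk headers stay valid.
    Context lines are scanned to track comment state but are never changed.
    """
    out = []
    in_comment = False
    for line in patch_text.split("\n"):
        if line.startswith("@@"):
            in_comment = False  # assumption: each hunk starts outside a comment
        is_header = line.startswith(("+++", "---", "@@"))
        if is_header or not line.startswith(("+", " ")):
            # Headers, removed lines and anything else pass through untouched.
            out.append(line)
            continue
        body = line[1:]
        kept = []
        i = 0
        while i < len(body):
            pair = body[i:i + 2]
            if not in_comment and pair == "/*":
                kept.append(pair)      # keep the opener so the C stays valid
                in_comment = True
                i += 2
            elif in_comment and pair == "*/":
                kept.append(pair)      # keep the closer as well
                in_comment = False
                i += 2
            else:
                if not in_comment:
                    kept.append(body[i])
                i += 1                 # comment text is silently dropped
        if line[0] == "+":
            out.append("+" + "".join(kept))
        else:
            out.append(line)           # context line: state tracked, text kept
    return "\n".join(out)
```

Usage would be something like reading mypatch.patch into a string, passing it through blank_added_comments, and writing the result to output. Removed ('-') lines are deliberately left alone, since changing them would stop the patch from applying.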
Solution 2
I just used a quick and dirty hack job that canned most of the comments:
sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output
OTHER TIPS
Regular expressions are wonderful, but not that wonderful.
I would remove the comments before creating the patch.
If you can't do this, I would apply the patch, remove the comments from both the patched and unpatched files, then re-create the patch.
So starting with x.h we edit it to x1.h and create a patch:
diff -u x.h x1.h > patch
Then we publish the patch to someone who has x.h.
cp x.h xnc.h
sed -e '/^\/\*/,/\*\//d' -i xnc.h
patch x.h patch
cp x.h xnc2.h
sed -e '/^\/\*/,/\*\//d' -i xnc2.h
diff -u xnc.h xnc2.h > patchnc
This should create the comment-free patch, patchnc.
If I have patched and unpatched source trees instead, then:
find unpatched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
find patched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
diff -urN unpatched patched > patch
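If sed and diff are not convenient, the strip-both-sides-then-rediff idea can also be sketched in Python with the standard difflib module. This is only a rough equivalent under assumptions: drop_comment_lines and comment_free_diff are names invented here, and the line-dropping rule imitates the sed range above (from a line starting with /* through the line containing */), except that it also copes with a comment that opens and closes on the same line.

```python
import difflib


def drop_comment_lines(lines):
    """Drop every line from one starting with '/*' through the line containing '*/'."""
    kept = []
    inside = False
    for line in lines:
        if not inside and line.startswith("/*"):
            inside = True
            rest = line[2:]   # look for the closer after the opener itself
        else:
            rest = line
        if inside:
            if "*/" in rest:
                inside = False
            continue          # the whole line is dropped, as with sed's d
        kept.append(line)
    return kept


def comment_free_diff(old_text, new_text, old_name="x.h", new_name="x1.h"):
    """Strip comment lines from both versions, then build a unified diff."""
    old = drop_comment_lines(old_text.splitlines(keepends=True))
    new = drop_comment_lines(new_text.splitlines(keepends=True))
    return "".join(
        difflib.unified_diff(old, new, fromfile=old_name, tofile=new_name)
    )
```

Because comments are removed from both files before diffing, the resulting patch only applies cleanly to a copy of the original that has been stripped the same way.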
I would not use regular expressions. They generally work within a single line, and your files will contain comments that run over multiple lines.
I would write a simple parser in C/C++ or Java.
Start in state 0.
In state 0, read character by character (and output each one) until you find the sequence /*
Then switch to state 1.
In state 1, read character by character (and DO NOT output anything) until you find the sequence */
Then switch back to state 0.
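The two states above can be sketched in a few lines. This version is in Python rather than C/C++ or Java purely for brevity, and the function name strip_block_comments is made up for the example. It assumes that after the closing */ the parser returns to state 0, and it makes no attempt to recognise /* inside string literals or to handle // comments.

```python
def strip_block_comments(source):
    """Remove /* ... */ comments using a two-state, character-by-character scan."""
    OUTSIDE, INSIDE = 0, 1   # state 0 and state 1 from the description
    state = OUTSIDE
    out = []
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if state == OUTSIDE:
            if pair == "/*":
                state = INSIDE          # opener found: stop emitting
                i += 2
            else:
                out.append(source[i])   # ordinary character: emit it
                i += 1
        else:
            if pair == "*/":
                state = OUTSIDE         # closer found: resume emitting
                i += 2
            else:
                i += 1                  # comment text: skip it
    return "".join(out)
```

Note that this removes the newlines inside a comment too, so it changes the line count. That is fine for source files, but it is exactly what breaks a patch if applied to the patch file directly.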