How to use sed to replace only the first occurrence in a file?
02-07-2019
Question
I want to update a large number of C++ source files with an extra include directive before any existing #includes. For this sort of task, I normally use a small bash script with sed to rewrite the file.
How do I get sed to replace just the first occurrence of a string in a file rather than replacing every occurrence?
If I use
sed 's/#include/#include "newfile.h"\n#include/'
it replaces all the #includes.
Alternative suggestions for achieving the same goal are also welcome.
Solution
# sed script to change "foo" to "bar" only on the first occurrence
1{x;s/^/first/;x;}
1,/foo/{x;/first/s///;x;s/foo/bar/;}
#---end of script---
or, if you prefer (editor's note: works with GNU sed only):
sed '0,/RE/s//to_that/' file
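Applied to the question's task, the GNU form might look like the sketch below. It assumes GNU sed and that only lines starting with #include should count; add_include is a hypothetical helper name. The empty regex // reuses /^#include/, & stands for the matched text, and \n in the replacement is a GNU extension that produces a newline.

```shell
#!/bin/sh
# Sketch, GNU sed only. add_include is a hypothetical helper name.
# 0,/^#include/ ends the range at the first #include line, even on line 1;
# s// reuses that regex, & is the matched "#include", \n is a newline (GNU).
add_include() {
  sed -e '0,/^#include/s//#include "newfile.h"\n&/' "$@"
}

printf '%s\n' '// header' '#include <stdio.h>' '#include <stdlib.h>' | add_include
```

With GNU sed, running it once per file, e.g. for f in *.cpp; do add_include -i "$f"; done, would edit the files in place.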
Other tips
Write a sed script that will replace only the first occurrence of "Apple" with "Banana".
Example input:   Output:
Apple            Banana
Orange           Orange
Apple            Apple
Here is the simple script (editor's note: works with GNU sed only):
sed '0,/Apple/{s/Apple/Banana/}' filename
sed '0,/pattern/s/pattern/replacement/' filename
This worked for me.
Example:
sed '0,/<Menu>/s/<Menu>/<Menu><Menu>Sub menu<\/Menu>/' try.txt > abc.txt
Editor's note: both work with GNU sed only.
An overview of the many helpful existing answers, complemented with explanations:
The examples here use a simplified use case: replace the word 'foo' with 'bar' in the first matching line only.
Due to the use of ANSI C-quoted strings ($'...') to provide the sample input lines, bash, ksh, or zsh is assumed as the shell.
GNU sed only:
Ben Hoffstein's answer shows us that GNU provides an extension to the POSIX specification for sed that allows the following 2-address form: 0,/re/ (re represents an arbitrary regular expression here).
0,/re/ allows the regex to match on the very first line also. In other words, such an address will create a range from the 1st line up to and including the line that matches re, whether re occurs on the 1st line or on any subsequent line.
- Contrast this with the POSIX-compliant form 1,/re/, which creates a range that matches from the 1st line up to and including the line that matches re on subsequent lines; in other words, this will not detect the first occurrence of an re match if it happens to occur on the 1st line, and it also prevents the use of the shorthand // for reuse of the most recently used regex (see next point).[1]
If you combine a 0,/re/ address with an s/.../.../ (substitution) call that uses the same regular expression, your command will effectively only perform the substitution on the first line that matches re.
sed provides a convenient shortcut for reusing the most recently applied regular expression: an empty delimiter pair, //.
$ sed '0,/foo/ s//bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar # only 1st match of 'foo' replaced
Unrelated
2nd foo
3rd foo
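To make the contrast with the POSIX 1,/re/ form concrete, the sketch below runs both forms on illustrative input whose very first line already matches foo. The 0,/re/ line needs GNU sed; sample is a hypothetical helper that just prints the input.

```shell
#!/bin/sh
# sample is a hypothetical helper producing illustrative input whose
# very first line already contains foo.
sample() { printf '%s\n' '1st foo' '2nd foo' '3rd foo'; }

# GNU sed: with 0,/foo/ the end regex is tried on line 1 too,
# so the range is line 1 only.
sample | sed '0,/foo/ s//bar/'      # 1st bar, 2nd foo, 3rd foo

# POSIX 1,/foo/: the end regex is only tried from line 2 on,
# so the range covers lines 1-2 and both are changed.
sample | sed '1,/foo/ s/foo/bar/'   # 1st bar, 2nd bar, 3rd foo
```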
A POSIX-features-only sed such as BSD (macOS) sed (will also work with GNU sed):
Since 0,/re/ cannot be used and the form 1,/re/ will not detect re if it happens to occur on the very first line (see above), special handling for the 1st line is required.
MikhailVS's answer mentions the technique, put into a concrete example here:
$ sed -e '1 s/foo/bar/; t' -e '1,// s//bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar # only 1st match of 'foo' replaced
Unrelated
2nd foo
3rd foo
Note:
- The empty regex // shortcut is employed twice here: once for the endpoint of the range, and once in the s call; in both cases, regex foo is implicitly reused, which saves us from duplicating it and makes for both shorter and more maintainable code.
- POSIX sed needs actual newlines after certain functions, such as after the name of a label or even its omission, as is the case with t here; strategically splitting the script into multiple -e options is an alternative to using actual newlines: end each -e script chunk where a newline would normally need to go.
- 1 s/foo/bar/ replaces foo on the 1st line only, if found there. If so, t branches to the end of the script, skipping the remaining commands for that line. (The t function branches to a label only if a substitution has been made since the current input line was read or since the last t; without a label, as is the case here, it branches to the end of the script.)
- When that happens, the range address 1,//, which normally finds the first occurrence starting from line 2, will not match, and the range will not be processed: its start address is next evaluated only when the current line is already line 2, which does not match line number 1.
- Conversely, if there's no match on the 1st line, 1,// will be entered and will find the true first match.
The net effect is the same as with GNU sed's 0,/re/: only the first occurrence is replaced, whether it occurs on the 1st line or any other.
NON-range approaches
potong's answer demonstrates loop techniques that bypass the need for a range; since he uses GNU sed syntax, here are the POSIX-compliant equivalents:
Loop technique 1: On first match, perform the substitution, then enter a loop that simply prints the remaining lines as-is:
$ sed -e '/foo/ {s//bar/; ' -e ':a' -e '$!{n;ba' -e '};}' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar
Unrelated
2nd foo
3rd foo
Loop technique 2, for smallish files only: read the entire input into memory, then perform a single substitution on it.
$ sed -e ':a' -e '$!{N;ba' -e '}; s/foo/bar/' <<<$'1st foo\nUnrelated\n2nd foo\n3rd foo'
1st bar
Unrelated
2nd foo
3rd foo
[1] 1.61803 provides examples of what happens with 1,/re/, with and without a subsequent s//:
- sed '1,/foo/ s/foo/bar/' <<<$'1foo\n2foo' yields $'1bar\n2bar'; i.e., both lines were updated, because line number 1 matches the 1st line, and regex /foo/, the end of the range, is then only looked for starting on the next line. Therefore, both lines are selected in this case, and the s/foo/bar/ substitution is performed on both of them.
- sed '1,/foo/ s//bar/' <<<$'1foo\n2foo\n3foo' fails with sed: first RE may not be empty (BSD/macOS) and sed: -e expression #1, char 0: no previous regular expression (GNU), because, at the time the 1st line is being processed (due to line number 1 starting the range), no regex has been applied yet, so // doesn't refer to anything.
With the exception of GNU sed's special 0,/re/ syntax, any range that starts with a line number effectively precludes the use of //.
You could use awk to do something similar:
awk '/#include/ && !done { print "#include \"newfile.h\""; done=1;}; 1;' file.c
Explanation:
/#include/ && !done
Runs the action statement between {} when the line matches "#include" and we haven't already processed it.
{print "#include \"newfile.h\""; done=1;}
This prints #include "newfile.h" (the inner quotes need to be escaped). Then we set the done variable to 1, so we don't add more includes.
1;
This means "print out the line": a pattern with no action defaults to print $0, which prints out the whole line. A one-liner, and easier to understand than sed IMO :-)
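Since the question is about many files, here is a sketch that applies the same awk program to each file given on the command line and rewrites it in place through a temporary file. The helper name add_include_awk is hypothetical, and mktemp, while widely available, is not specified by POSIX. Running awk once per file keeps done from carrying over from one file to the next.

```shell
#!/bin/sh
# Hypothetical helper: add_include_awk FILE...
# Prints #include "newfile.h" before the first line containing #include,
# then writes the result back over each file.
add_include_awk() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # One awk run per file, so "done" starts out unset for every file.
    if awk '/#include/ && !done { print "#include \"newfile.h\""; done = 1 }; 1' "$f" > "$tmp"; then
      cat "$tmp" > "$f"   # overwrite in place, keeping the file's permissions
    fi
    rm -f "$tmp"
  done
}

# Example (file names are illustrative): add_include_awk *.cpp *.h
```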
There is quite a comprehensive collection of answers in the linuxtopia sed FAQ. It also highlights that some answers people provided won't work with non-GNU versions of sed, e.g.
sed '0,/RE/s//to_that/' file
in a non-GNU version would have to be
sed -e '1s/RE/to_that/;t' -e '1,/RE/s//to_that/'
However, this version won't work with GNU sed.
Here's a version that works with both:
sed -e '/RE/{s//to_that/;:a' -e '$!N;$!ba' -e '}' file
Example:
sed -e '/Apple/{s//Banana/;:a' -e '$!N;$!ba' -e '}' filename
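Run on the Apple/Banana sample from earlier, the portable command behaves as in the sketch below (replace_first_apple is a hypothetical wrapper name). After the first substitution, the N loop appends every remaining line to the pattern space, so no later line is ever tested against /Apple/.

```shell
#!/bin/sh
# Portable first-match replacement: after the first substitution, N appends
# each remaining line to the pattern space until the last line is reached.
replace_first_apple() {
  sed -e '/Apple/{s//Banana/;:a' -e '$!N;$!ba' -e '}' "$@"
}

printf '%s\n' Apple Orange Apple | replace_first_apple   # Banana, Orange, Apple
```

Because everything after the first match is accumulated in the pattern space, memory use grows with the amount of input that follows that match.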
Just add the occurrence number at the end:
sed 's/#include/#include "newfile.h"\n#include/1'
#!/bin/sed -f
1,/^#include/ {
/^#include/i\
#include "newfile.h"
}
How this script works: for lines between line 1 and the first #include (after line 1), if the line starts with #include, then prepend the specified line.
However, if the first #include is on line 1, then both line 1 and the next subsequent #include will have the line prepended. If you are using GNU sed, it has an extension where 0,/^#include/ (instead of 1,/^#include/) will do the right thing.
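The difference is easy to see on input whose first line is already an #include. The sketch below runs both range forms of the script on such illustrative input; insert_with_0 needs GNU sed, and both function names (like the sample helper) are hypothetical.

```shell
#!/bin/sh
# Illustrative input whose very first line is an #include.
sample() { printf '%s\n' '#include <a.h>' 'int x;' '#include <b.h>'; }

# 1,/^#include/: the range runs from line 1 to the NEXT #include (line 3),
# so the new line is inserted twice.
insert_with_1() {
  sed '1,/^#include/{
/^#include/i\
#include "newfile.h"
}'
}

# GNU sed only: 0,/^#include/ lets the range end on line 1 itself.
insert_with_0() {
  sed '0,/^#include/{
/^#include/i\
#include "newfile.h"
}'
}

sample | insert_with_1
sample | insert_with_0
```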
A possible solution:
/#include/!{p;d;}
i\
#include "newfile.h"
:a
n
ba
Explanation:
- read lines until we find the #include, printing each of these lines and then starting a new cycle
- insert the new include line
- enter a loop that just reads lines (by default sed will also print these lines); we never get back to the first part of the script from here
I know this is an old post but I had a solution that I used to use:
grep -E -m 1 -n 'old' file | sed 's/:.*$//' - | sed 's/$/s\/old\/new\//' - | sed -f - file
Basically, use grep to find the first occurrence and stop there, also printing the line number, i.e. 5:line. Pipe that into sed and remove the : and everything after it, so you are left with just a line number. Pipe that into sed, which appends s/old/new/ to it, giving a one-line script that is piped into the last sed to run as a script on file.
So if old is #include, new is blah, and the first occurrence grep finds is on line 5, then the script piped to the last sed would be 5s/#include/blah/.
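The same idea can be wrapped in a small function that also copes with files containing no match. This is a sketch: the name replace_first_via_grep is hypothetical, the pattern is treated as a basic regular expression by both grep and sed, neither argument may contain /, and grep -m is a common GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: replace_first_via_grep PATTERN REPLACEMENT FILE
# PATTERN is used as a basic regular expression by both grep and sed; neither
# argument may contain "/", since both are pasted into s/.../.../.
# grep -m 1 (stop after the first match) is a GNU/BSD extension, not POSIX.
replace_first_via_grep() {
  n=$(grep -n -m 1 -e "$1" "$3" | cut -d: -f1)   # e.g. "5:line" -> "5"
  if [ -n "$n" ]; then
    sed "${n}s/$1/$2/" "$3"   # e.g. runs 5s/old/new/ on the file
  else
    cat "$3"                  # no match: print the file unchanged
  fi
}

f=$(mktemp)
printf '%s\n' 'keep' 'old one' 'old two' > "$f"
replace_first_via_grep 'old' 'new' "$f"   # keep, new one, old two
rm -f "$f"
```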
If you came here (like me) to replace the first occurrence of a character on every line, use this:
sed '/old/s/old/new/1' file
-bash-4.2$ cat file
123a456a789a
12a34a56
a12
-bash-4.2$ sed '/a/s/a/b/1' file
123b456a789a
12b34a56
b12
By changing 1 to 2, for example, you can replace only the second a on each line instead.
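For example, here is a sketch using the same sample lines with the flag changed to 2 (second_a_to_b is a hypothetical name). It also shows that the numeric flag counts matches within each line, not within the file.

```shell
#!/bin/sh
# The numeric flag selects which match ON EACH LINE is replaced:
# s/a/b/2 changes only the second "a" of every line that has one.
# (The /a/ address used above is optional, since s/// only changes
# lines that contain a match anyway.) second_a_to_b is a hypothetical name.
second_a_to_b() { sed 's/a/b/2' "$@"; }

printf '%s\n' 123a456a789a 12a34a56 a12 | second_a_to_b
# 123a456b789a
# 12a34b56
# a12
```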
I would do this with an awk script:
BEGIN {i=0}
(i==0) && /#include/ {print "#include \"newfile.h\""; i=1}
{print $0}
END {}
Then run it with awk:
awk -f awkscript headerfile.h > headerfilenew.h
Might be sloppy; I'm new to this.
As an alternative suggestion, you may want to look at the ed command:
man 1 ed
teststr='
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
'
# for in-place file editing use "ed -s file" and replace ",p" with "w"
# cf. http://wiki.bash-hackers.org/howto/edit-ed
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s <(echo "$teststr")
H
/# *include/i
#include "newfile.h"
.
,p
q
EOF
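For the in-place variant mentioned in the comment, a sketch could look like this. It assumes an ed that reads its commands from standard input (as POSIX ed does) and a file containing at least one matching line; without one, the search fails and ed reports an error. add_include_ed and the sample file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: add_include_ed FILE (hypothetical name).
# ed starts at the last line, so the forward search /# *include/ wraps around
# to line 1 and finds the first matching line; i inserts text before it,
# "." ends the inserted text, w writes the file back and q quits.
add_include_ed() {
  printf '%s\n' '/# *include/i' '#include "newfile.h"' '.' 'w' 'q' | ed -s "$1"
}

f=$(mktemp)
printf '%s\n' '/* demo */' '#include <stdio.h>' '#include <stdlib.h>' > "$f"
add_include_ed "$f"
cat "$f"
rm -f "$f"
```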
I finally got this to work in a Bash script used to insert a unique timestamp in each item in an RSS feed:
sed "1,/====RSSpermalink====/s/====RSSpermalink====/${nowms}/" \
production-feed2.xml.tmp2 > production-feed2.xml.tmp.$counter
It changes the first occurrence only.
${nowms} is the time in milliseconds set by a Perl script, $counter is a counter used for loop control within the script, and \ allows the command to be continued on the next line.
The file is read in and stdout is redirected to a work file.
The way I understand it, 1,/====RSSpermalink====/ tells sed when to stop by setting a range limitation, and then s/====RSSpermalink====/${nowms}/ is the familiar sed command to replace the first string with the second.
In my case, I put the command in double quotation marks because I am using it in a Bash script with variables (single quotes would prevent ${nowms} from being expanded).
Using FreeBSD ed, and avoiding ed's "no match" error in case there is no include statement in a file to be processed:
teststr='
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
'
# using FreeBSD ed
# to avoid ed's "no match" error, see
# http://codesnippets.joyent.com/posts/show/11917
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s <(echo "$teststr")
H
,g/# *include/u\
u\
i\
#include "newfile.h"\
.
,p
q
EOF
This might work for you (GNU sed):
sed -si '/#include/{s//& "newfile.h"\n&/;:a;$!{n;ba}}' file1 file2 file...
or, if memory is not a problem:
sed -si ':a;$!{N;ba};s/#include/& "newfile.h"\n&/' file1 file2 file...
With GNU sed's -z option, you can process the whole file as if it were only one line (-z separates input lines with NUL characters instead of newlines, so a text file without NUL bytes is read as a single line). Remember: s/…/…/ only replaces the first match in each line, so with -z it only replaces the first match in the whole file.
sed -z 's/#include/#include "newfile.h"\n#include/' file
In the general case you have to rewrite your sed expression since the pattern space now holds the whole file instead of just one line. Some examples:
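- s/text.*// can be rewritten as s/text[^\n]*//. [^\n] matches everything except the newline character, so [^\n]* will match all symbols after text until a newline is reached.
- s/^text// can be rewritten as s/(^|\n)text/\1/ (using -E for extended regex syntax; \1 puts back the newline that was matched, so lines are not joined).
- s/text$// can be rewritten as s/text(\n|$)/\1/ (again with -E).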
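Applied to the question's task, with the match anchored to the start of a line via (^|\n), a sketch could look like this. It assumes GNU sed (-z and \n are GNU features); add_include_z is a hypothetical name and the sample input is illustrative, chosen so that the first line mentions #include mid-line and is correctly skipped.

```shell
#!/bin/sh
# GNU sed only. With -z the whole (NUL-free) input is one pattern space, so a
# plain s/// changes only the first match in the file. (^|\n) anchors the
# match to a line start; \1 puts back the newline that was consumed.
add_include_z() {
  sed -E -z 's/(^|\n)#include/\1#include "newfile.h"\n#include/' "$@"
}

printf '%s\n' '// no #include here' '#include <a.h>' '#include <b.h>' | add_include_z
```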
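The following command removes the first occurrence of a string within a file, and it removes the line left empty, too. It is demonstrated on an XML file, but it would work with any file. Note that the last two -e expressions strip trailing spaces and delete empty lines throughout the whole file, not only on the edited line.
This is useful if you work with XML files and want to remove a tag. In this example, it removes the first occurrence of the "isTag" tag.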
Command:
sed -e '0,/<isTag>false<\/isTag>/{s/<isTag>false<\/isTag>//}' -e 's/ *$//' -e '/^$/d' source.txt > output.txt
Source file (source.txt)
<xml>
<testdata>
<canUseUpdate>true</canUseUpdate>
<isTag>false</isTag>
<moduleLocations>
<module>esa_jee6</module>
<isTag>false</isTag>
</moduleLocations>
<node>
<isTag>false</isTag>
</node>
</testdata>
</xml>
Result file (output.txt)
<xml>
<testdata>
<canUseUpdate>true</canUseUpdate>
<moduleLocations>
<module>esa_jee6</module>
<isTag>false</isTag>
</moduleLocations>
<node>
<isTag>false</isTag>
</node>
</testdata>
</xml>
PS: It didn't work for me on Solaris SunOS 5.10 (quite old), but it works on Linux 2.6 with sed version 4.1.5.
Nothing new, but perhaps a more concrete answer: sed -rn '0,/foo(bar).*/ s%%\1%p'
Example: xwininfo -name unity-launcher produces output like:
xwininfo: Window id: 0x2200003 "unity-launcher"
Absolute upper-left X: -2980
Absolute upper-left Y: -198
Relative upper-left X: 0
Relative upper-left Y: 0
Width: 2880
Height: 98
Depth: 24
Visual: 0x21
Visual Class: TrueColor
Border width: 0
Class: InputOutput
Colormap: 0x20 (installed)
Bit Gravity State: ForgetGravity
Window Gravity State: NorthWestGravity
Backing Store State: NotUseful
Save Under State: no
Map State: IsViewable
Override Redirect State: no
Corners: +-2980+-198 -2980+-198 -2980-1900 +-2980-1900
-geometry 2880x98+-2980+-198
Extracting the window ID with xwininfo -name unity-launcher|sed -rn '0,/^xwininfo: Window id: (0x[0-9a-fA-F]+).*/ s%%\1%p' produces:
0x2200003
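Since real xwininfo output depends on the running X session, the sketch below feeds a few sample lines copied from the output above into the same sed command. It assumes GNU sed (-r and 0,/re/); window_id is a hypothetical wrapper name.

```shell
#!/bin/sh
# GNU sed. The range 0,/re/ ends on the first line matching the regex;
# s%%\1%p reuses that regex (empty pattern) and prints only the captured ID.
window_id() {
  sed -rn '0,/^xwininfo: Window id: (0x[0-9a-fA-F]+).*/ s%%\1%p'
}

printf '%s\n' \
  'xwininfo: Window id: 0x2200003 "unity-launcher"' \
  '  Absolute upper-left X:  -2980' | window_id     # 0x2200003
```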
POSIXly (also valid in GNU sed): only one regex is used, and memory is needed for only one line (as usual):
sed '/\(#include\).*/!b;//{h;s//\1 "newfile.h"/;G};:1;n;b1'
Explained:
sed '
/\(#include\).*/!b # Only one regex used. On lines not matching
# the text `#include` yet,
# branch to the end, causing the default print, and restart.
//{ # On the first line matching the previous regex:
h # hold (copy) the line,
s//\1 "newfile.h"/ # replace the match with `#include "newfile.h"`,
G # append a newline plus the held original line.
} # end of replacement.
:1 # Once one replacement has been done (the first match),
n # loop, reading a line each time
b1 # and printing it by default.
' # end of sed script.