Find/Replace special characters in text file using Bash script
I'm looking for some guidance on creating a script to find and replace special characters inside a text file.
I've come up with this piece of pseudocode, but filling in the blanks is a bit harder:
- Find newline & replace by space.
- Find CP & replace by newline.
- Find Mr. Mime (with space) & replace by Mr.Mime (without space).
- Find tab & replace by space.
- Find double space & replace by single space.
- Find % & replace with nothing (i.e. just leave it out).
- Find " ATK DEF STA IV " & replace by space.
"Find" stands for "Find All Instances".
I've been looking into sed, but I can't seem to find out how to handle these special characters. Any ideas are much appreciated.
EDIT: As requested, here is a small snippet of the input:
CP 1593
SSS
SudowoodoâÂÂ
ATK DEF STA IV
15 15 15 100.0%
counter
rock slide
CP 1262
SSS
TangrowthâÂÂ4
ATK DEF STA IV
15 15 15 100.0%
vine whip
grass knot
CP 1077
SSS
Mr. MimeâÂÂ
ATK DEF STA IV
15 15 15 100.0%
confusion
psychic
And the expected output:
1593 SSS Sudowoodoâ 15 15 15 100.0 counter rock slide
1262 SSS TangrowthâÂÂ4 15 15 15 100.0 vine whip grass knot
1077 SSS Mr.Mimeâ 15 15 15 100.0 confusion psychic
scripts text-processing sed
asked Feb 21 at 11:44 by zotteken (edited Feb 21 at 14:55)
It would be helpful for you to post some sample input and output.
– glenn jackman, Feb 21 at 11:59
What about longer sequences of spaces? Should 3 spaces become 2? Or 1? Should CP be replaced even if it appears in the middle of a word, or only if it is surrounded by whitespace? Or at a word boundary?
– steeldriver, Feb 21 at 13:17
From experience, I found that there are no sequences longer than 2 spaces, but if there were, they should be reduced to just one. CP should only be replaced if it has a space before and after it.
– zotteken, Feb 21 at 14:44
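The two rules settled in these comments can be checked in isolation before building a full script. The following is only a sketch: the helper names `squeeze_blanks` and `mark_cp` are made up for illustration, and a `|` stands in for the newline so the effect of the CP rule is visible on a single line.

```shell
#!/bin/sh
# squeeze_blanks (hypothetical name): turn tabs into spaces, then collapse
# every run of spaces into a single space, however long the run is.
squeeze_blanks() {
    tr -s ' \t' '  '
}

# mark_cp (hypothetical name): replace CP only when it has a space on both
# sides, so "CPU" and "MCP" are left alone. "|" is a visible stand-in for
# the newline that the real script would insert.
mark_cp() {
    sed 's/ CP /|/g'
}

printf 'a  b   c\t\td\n' | squeeze_blanks
printf 'CPU load CP 12 MCP CP 7\n' | mark_cp
```

The first command should print `a b c d` and the second `CPU load|12 MCP|7`.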
1 Answer
sed's text processing is strictly line-oriented, so it's pretty difficult to replace newlines with sed alone.
Untested:
    cat file |
    tr '\n\t' '  ' |
    sed -e 's/ CP /\n/g' \
        -e 's/Mr[.] Mime/Mr.Mime/g' \
        -e 's/  */ /g' \
        -e 's/%//g'
answered Feb 21 at 11:58 by glenn jackman (edited Feb 21 at 12:12 by dessert)
Useless use of cat. Otherwise +1.
– David Foerster, Feb 21 at 12:29
@zotteken: If this answer was helpful to you, then please consider marking it as the accepted answer (by clicking on the grey tick to the left of it) so others may more easily find it in the future. This is also a polite way to thank the person answering your question for helping you out.
– pa4080, Feb 24 at 8:38
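For reference, here is one way the answer's tr-then-sed idea and the remark about cat could be combined into a small script. It is a sketch under stated assumptions, not the answerer's own code: the function name `flatten_records` is hypothetical, GNU sed is assumed (for `\n` in the replacement), and the sample names are written in plain ASCII. Three steps go beyond the answer as posted: dropping the " ATK DEF STA IV " header from the question's list, stripping the very first CP (which has no space in front of it), and trimming the trailing space left by the final newline.

```shell
#!/bin/sh
# flatten_records FILE  (hypothetical helper name)
# Prints one line per "CP" record. Assumes GNU sed, where "\n" in the
# replacement part of s/// inserts a newline.
flatten_records() {
    # tr reads the file directly (no cat) and joins it into one long line:
    # newlines and tabs become spaces. The echo terminates that line.
    # sed then, in order: squeezes runs of spaces, drops the column header,
    # closes up "Mr. Mime", removes "%", strips the leading "CP", trims
    # trailing spaces, and finally starts a new line at every " CP ".
    { tr '\n\t' '  ' < "$1"; echo; } |
    sed -e 's/  */ /g' \
        -e 's/ ATK DEF STA IV / /g' \
        -e 's/Mr[.] Mime/Mr.Mime/g' \
        -e 's/%//g' \
        -e 's/^ *CP //' \
        -e 's/ *$//' \
        -e 's/ CP /\n/g'
}

# Demo on a two-record sample shaped like the question's input.
sample=$(mktemp)
printf '%s\n' 'CP 1593' 'SSS' 'Sudowoodo' 'ATK DEF STA IV' '15 15 15 100.0%' \
    'counter' 'rock slide' 'CP 1077' 'SSS' 'Mr. Mime' 'ATK DEF STA IV' \
    '15 15 15 100.0%' 'confusion' 'psychic' > "$sample"
flatten_records "$sample"
rm -f "$sample"
```

The " CP " substitution runs last so that every other rule operates on the single joined line; with a sed that does not understand `\n` in the replacement, a literal backslash-newline would be needed instead.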