Find/Replace special characters in text file using Bash script

I'm looking for some guidance on creating a script to find and replace special characters inside a text file.



I've come up with this pseudocode, but filling in the blanks is proving harder:



  • Find newline & replace with space.
  • Find CP & replace with newline.
  • Find Mr. Mime (with space) & replace with Mr.Mime (without space).
  • Find tab & replace with space.
  • Find double space & replace with single space.
  • Find % & replace with nothing (i.e. just leave it out).
  • Find " ATK DEF STA IV " & replace with space.

"Find" stands for "Find All Instances".



I've been looking into sed, but I can't work out how to handle these special characters. Any ideas are much appreciated.



EDIT: As requested, here is a small snippet of the input:



CP 1593
SSS
Sudowoodo♀
ATK DEF STA IV
15 15 15 100.0%
counter
rock slide
CP 1262
SSS
Tangrowth♀4
ATK DEF STA IV
15 15 15 100.0%
vine whip
grass knot
CP 1077
SSS
Mr. Mime♀
ATK DEF STA IV
15 15 15 100.0%
confusion
psychic


And the expected output:



1593 SSS Sudowoodo♀ 15 15 15 100.0 counter rock slide
1262 SSS Tangrowth♀4 15 15 15 100.0 vine whip grass knot
1077 SSS Mr.Mime♀ 15 15 15 100.0 confusion psychic









  • It would be helpful for you to post some sample input and output.
    – glenn jackman
    Feb 21 at 11:59










  • What about longer sequences of spaces? Should 3 spaces become 2, or 1? Should CP be replaced even if it appears in the middle of a word, or only if it is surrounded by whitespace, or at a word boundary?
    – steeldriver
    Feb 21 at 13:17










  • From experience, there are no sequences longer than 2 spaces, but if there were, they should be reduced to just one. CP should only be replaced if it has a space before and after.
    – zotteken
    Feb 21 at 14:44
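The two clarifications above map onto sed fairly directly. A small sketch, assuming the text has already been flattened onto one line: `s/  */ /g` (two spaces, then `*`) collapses any run of spaces into a single one, and the pattern ` CP ` with a literal space on each side leaves words that merely contain "CP" alone. The sample string and the ` | ` marker are made up for illustration.

```shell
# Hypothetical one-line sample: a 3-space run, and "CP" inside a word (CPU).
line='CPU   test CP 1593 SSS'

# Collapse runs of spaces, then replace a space-delimited CP with a marker.
printf '%s\n' "$line" | sed -e 's/  */ /g' -e 's/ CP / | /g'
# prints: CPU test | 1593 SSS
```

The "CPU" at the start is untouched because it has no space before the C and a U directly after the P.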















scripts text-processing sed






edited Feb 21 at 14:55

























asked Feb 21 at 11:44









zotteken

1 Answer
accepted












sed processes text strictly line by line, so it's pretty difficult to replace newlines with sed alone.

Untested:



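# Turn newlines and tabs into spaces, then apply the remaining replacements.
# The \n in the sed replacement relies on GNU sed (the default on Ubuntu).
cat file |
tr '\n\t' '  ' |
sed -e 's/ CP /\n/g' \
    -e 's/Mr[.] Mime/Mr.Mime/g' \
    -e 's/  */ /g' \
    -e 's/%//g'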
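A fuller sketch of the same tr + sed approach, extended to cover two cases the snippet above leaves alone: the " ATK DEF STA IV " header from the question, and the very first CP, which has no space in front of it once the newlines are flattened. It assumes GNU sed (for `\n` in the replacement); the function name `flatten_records` is made up for illustration, and the here-document just reuses two records from the question's sample.

```shell
# Sketch only: assumes GNU sed, where \n in the replacement inserts a newline.
flatten_records() {
    tr '\n\t' '  ' |                    # newlines and tabs become spaces
    sed -e 's/ ATK DEF STA IV / /g' \
        -e 's/Mr[.] Mime/Mr.Mime/g' \
        -e 's/%//g' \
        -e 's/  */ /g' \
        -e 's/^ *CP //' \
        -e 's/ *$//' \
        -e 's/ CP /\n/g'
    echo                                # put back the final newline tr removed
}

flatten_records <<'EOF'
CP 1593
SSS
Sudowoodo♀
ATK DEF STA IV
15 15 15 100.0%
counter
rock slide
CP 1077
SSS
Mr. Mime♀
ATK DEF STA IV
15 15 15 100.0%
confusion
psychic
EOF
```

With this input the function prints:

1593 SSS Sudowoodo♀ 15 15 15 100.0 counter rock slide
1077 SSS Mr.Mime♀ 15 15 15 100.0 confusion psychic

For a real file, `flatten_records < file` should behave the same way. The ` CP ` substitution runs last so that the leading and trailing clean-up still sees the whole text as a single line.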





  • Useless use of cat. Otherwise +1.
    – David Foerster
    Feb 21 at 12:29










  • @zotteken: If this answer was helpful to you, then please consider marking it as the accepted answer (by clicking on the grey tick ✓ to the left of it) so others may more easily find it in the future. This is also a polite way to thank the person answering your question for helping you out.
    – pa4080
    Feb 24 at 8:38
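On the "useless use of cat" remark: tr reads standard input, so the shell can redirect the file into it directly and the extra cat process is not needed. A minimal sketch of the two equivalent forms, using a throwaway temp file created here purely for the demonstration:

```shell
# Create a small throwaway input file (illustrative only).
tmp=$(mktemp)
printf 'CP 1593\nSSS\tSudowoodo\n' > "$tmp"

# With cat: one extra process whose only job is to feed the pipe.
with_cat=$(cat "$tmp" | tr '\n\t' '  ')

# With redirection: the shell opens the file as tr's standard input.
with_redirect=$(tr '\n\t' '  ' < "$tmp")

[ "$with_cat" = "$with_redirect" ] && echo "same output"
rm -f "$tmp"
```

Both forms produce the same text, so the choice is mostly about style and one fewer process.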










edited Feb 21 at 12:12









dessert

answered Feb 21 at 11:58









glenn jackman
