How do I make a conditional search and replace that will add a line between two lines with different match criteria?

I have a text file many thousands of lines long with ASCII and non-ASCII characters. It is supposed to follow a pattern of:
First line: only non-ASCII characters
Second line: only non-ASCII characters
Third line: only ASCII characters
Fourth line: mix of ASCII and non-ASCII characters
Unfortunately, the reality is that it looks something like the following example, where in the middle it is missing the line that mixes ASCII and non-ASCII characters:
日本語のみ
日本語のみ
English words only
English and 日本語
日本語のみ
日本語のみ
English words only
日本語のみ
日本語のみ
English words only
English and 日本語
Fortunately, as far as I can tell, it is only the line that mixes ASCII and non-ASCII characters that is sometimes absent, meaning that what should be groups of 4 lines are sometimes groups of only 3.
To fix the file, I need to:
- Search for every line with only ASCII characters.
- Test the line following to see if it contains only non-ASCII.
- If so, insert a placeholder line following the ASCII only line.
The result should be:
日本語のみ
日本語のみ
English words only
English and 日本語
日本語のみ
日本語のみ
English words only
+AãÂÂ+
日本語のみ
日本語のみ
English words only
English and 日本語
(I chose to make the placeholder +AãÂÂ+ so that it will conform to the mix of ASCII and non-ASCII as the lines it is standing in for.)
I've found I can use sed to insert new lines: sed -e "/this is existing text/a'this is a new line'" < file.text. I've also learned I can search for ASCII characters with sed using LC_ALL=C and [\d0-\d127].
However, I'm unclear on how to make a conditional separate from the search. I mean, I could insert a line after every instance of ASCII only characters, but how do I make a search that inserts a line when an all ASCII line is found and the next line is only non-ASCII?
Please note that I am not particular about using sed. If an answer can be provided using Gedit, LibreOffice, or any command line operation, that would be great.
command-line text-processing
edited Apr 27 at 5:44 by muru
asked Apr 27 at 3:00 by Questioner
2 Answers
accepted
Based on your recent questions, it sounds like you have an XY problem.
Here's a sed solution based on @Zanna's answer to your previous question, "How do I search for lines in a file that only contain ASCII characters and then act on them?":
$ LC_ALL=C sed -E '/^[\d0-\d127]+$/ {$!N; s/\n[^\d0-\d127]+$/\n+AãÂÂ+&/;}' file
日本語のみ
日本語のみ
English words only
English and 日本語
日本語のみ
日本語のみ
English words only
+AãÂÂ+
日本語のみ
日本語のみ
English words only
English and 日本語
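For anyone who wants to see what a sed script of this kind does step by step, here is a commented sketch of the same idea spread over several lines. It assumes GNU sed (for the \dNNN escapes and for \n in the replacement). The function name add_missing_lines and the +PLACEHOLDER+ text are made up for illustration, so substitute your own placeholder.

```shell
#!/bin/sh
# Hypothetical helper, not part of any tool: reads the named files
# (or standard input) and writes the repaired text to standard output.
add_missing_lines() {
    LC_ALL=C sed -E '
        # Act only on lines made up entirely of ASCII bytes (0-127).
        /^[\d0-\d127]+$/ {
            # Append the next input line to the pattern space,
            # unless this is already the last line of input.
            $!N
            # If the appended line contains no ASCII bytes at all,
            # put the placeholder line in front of it. "&" stands for
            # the matched text: the newline plus the non-ASCII line.
            # +PLACEHOLDER+ is an illustrative stand-in.
            s/\n[^\d0-\d127]+$/\n+PLACEHOLDER+&/
        }
    ' "$@"
}
```

It could be called as add_missing_lines file > file.fixed, which leaves the original file untouched so the result can be checked with diff before replacing anything.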
answered Apr 27 at 12:21 by steeldriver
Thank you for this help. Sometimes one doesn't know all the problems one will face until one takes a few steps into the task. Even after applying this, I still had to make many manual edits for exceptions and further search and replace tricks for conditions that had not been previously visible. Just the way it goes sometimes.
– Questioner, May 5 at 4:41
Using awk:
awk '1; ! /^[\x01-\x7F]*$/ {next} {getline} !/[\x01-\x7F]/ {print "+AãÂÂ+"} 1'
- Print the input line unconditionally: 1 is a true condition, and the default action in that case is to print.
- Then, if it isn't (!) entirely ASCII (/^[\x01-\x7F]*$/), skip processing more rules (proceeding to the next line and starting again from the first rule).
- If it is entirely ASCII, we get the next line (getline), and if that doesn't (!) have any ASCII characters (/[\x01-\x7F]/) in it, print your placeholder.
- Finally, print the line we read using getline.
I'm assuming that your 日本語のみ lines don't have half-width spaces or punctuation (for example, ASCII . or ! rather than their full-width equivalents).
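As a variation on the same idea, here is a rough sketch that remembers whether the previous line was entirely ASCII instead of calling getline. This is a different technique from the one-liner above, not a rewrite of it. One reason to consider it: when getline hits end of input it leaves the current line in place, so a file that ends on an ASCII-only line would get that line printed twice. The function name fix_with_awk, the variable names, and passing the placeholder as the first argument are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: fix_with_awk PLACEHOLDER [FILE...]
# Writes the repaired text to standard output.
fix_with_awk() {
    ph=$1
    shift
    LC_ALL=C awk -v ph="$ph" '
        # prev_ascii is 1 when the previous line was entirely ASCII.
        # A non-empty line with no ASCII bytes right after such a line
        # means the mixed line is missing, so print the placeholder.
        prev_ascii && length($0) > 0 && !/[\001-\177]/ { print ph }

        # Print every line, then record whether it was ASCII-only.
        { print; prev_ascii = (/^[\001-\177]+$/) }
    ' "$@"
}
```

Usage would look something like fix_with_awk "$placeholder" file > file.fixed, with placeholder set to whatever stand-in line you have chosen. Note that awk interprets backslash escapes in values passed with -v, so a placeholder containing backslashes would need extra care.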
answered Apr 27 at 6:01 by muru