How do I make a conditional search and replace that will add a line between two lines with different match criteria?

I have a text file many thousands of lines long with ASCII and non-ASCII characters. It is supposed to follow this pattern:



First line: only non-ASCII characters
Second line: only non-ASCII characters
Third line: only ASCII characters
Fourth line: mix of ASCII and non-ASCII characters


Unfortunately, the actual file looks something like the following example, where the middle group is missing the line that mixes ASCII and non-ASCII characters:



日本語のみ
日本語のみ
English words only
English and 日本語
日本語のみ
日本語のみ
English words only
日本語のみ
日本語のみ
English words only
English and 日本語


Fortunately, as far as I can tell, it is only the line that mixes ASCII and non-ASCII characters that is sometimes absent, meaning that what should be groups of 4 lines are sometimes groups of only 3.



To fix the file, I need to:



  1. Search for every line that contains only ASCII characters.

  2. Test the following line to see if it contains only non-ASCII characters.

  3. If so, insert a placeholder line after the ASCII-only line.

The result should be:



日本語のみ
日本語のみ
English words only
English and 日本語
日本語のみ
日本語のみ
English words only
+Aあ+
日本語のみ
日本語のみ
English words only
English and 日本語


(I chose +Aあ+ as the placeholder so that it conforms to the same mix of ASCII and non-ASCII characters as the lines it stands in for.)



I've found I can use sed to insert new lines: sed -e "/this is existing text/a'this is a new line'" < file.text. I've also learned I can search for ASCII characters with sed using LC_ALL=C and [\d0-\d127].



However, I'm unclear on how to make a conditional separate from the search. I mean, I could insert a line after every ASCII-only line, but how do I make a search that inserts a line only when an all-ASCII line is found and the next line is only non-ASCII?



Please note that I am not particular about using sed. If an answer can be provided using Gedit, LibreOffice, or any command line operation, that would be great.







asked Apr 27 at 3:00 by Questioner
edited Apr 27 at 5:44 by muru




















          2 Answers



            accepted






            Based on your recent questions, it sounds like you have an XY problem.

            Here's a sed solution based on @Zanna's answer to your previous question, "How do I search for lines in a file that only contain ASCII characters and then act on them?":

            $ LC_ALL=C sed -E '/^[\d0-\d127]+$/ {$!N; s/\n[^\d0-\d127]+$/\n+Aあ+&/;}' file
            日本語のみ
            日本語のみ
            English words only
            English and 日本語
            日本語のみ
            日本語のみ
            English words only
            +Aあ+
            日本語のみ
            日本語のみ
            English words only
            English and 日本語
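
For readers who prefer a script to a one-liner, the same look-ahead idea can be sketched in Python. This is only an illustration and not part of the sed answer: the function names are made up here, and it assumes the file has already been read into a list of lines without trailing newlines.

```python
def is_ascii_only(line: str) -> bool:
    """True if the line is non-empty and every character is ASCII."""
    return bool(line) and line.isascii()


def has_no_ascii(line: str) -> bool:
    """True if the line is non-empty and contains no ASCII character."""
    return bool(line) and not any(ch.isascii() for ch in line)


def insert_placeholders(lines, placeholder="+Aあ+"):
    """Return a new list with the placeholder added after every ASCII-only
    line that is directly followed by a line containing no ASCII."""
    fixed = []
    for index, line in enumerate(lines):
        fixed.append(line)
        # Look ahead one line; the last line has no successor.
        following = lines[index + 1] if index + 1 < len(lines) else None
        if is_ascii_only(line) and following is not None and has_no_ascii(following):
            fixed.append(placeholder)
    return fixed


# Sample input taken from the question (the middle group is one line short).
sample = [
    "日本語のみ", "日本語のみ", "English words only", "English and 日本語",
    "日本語のみ", "日本語のみ", "English words only",
    "日本語のみ", "日本語のみ", "English words only", "English and 日本語",
]

print("\n".join(insert_placeholders(sample)))
```

On the sample from the question this prints the 12-line result the question asks for. Unlike the sed command, the sketch tests every line against its successor, so edge cases such as two consecutive ASCII-only lines may be handled differently.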
















            answered Apr 27 at 12:21 by steeldriver











            • Thank you for this help. Sometimes one doesn't know all the problems one will face until one takes a few steps into the task. Even after applying this, I still had to make many manual edits for exceptions and further search and replace tricks for conditions that had not been previously visible. Just the way it goes sometimes.
              – Questioner
              May 5 at 4:41
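
One way to locate leftover exceptions like these is to check every group of four lines against the expected pattern. The following is a minimal Python sketch, assuming the pattern described in the question (non-ASCII, non-ASCII, ASCII, mixed) and lines already stripped of their newlines. The names classify, EXPECTED and find_bad_groups are illustrative only.

```python
def classify(line: str) -> str:
    """Label a line as 'ascii', 'non-ascii', 'mixed', or 'empty'."""
    if not line:
        return "empty"
    ascii_count = sum(ch.isascii() for ch in line)
    if ascii_count == len(line):
        return "ascii"
    if ascii_count == 0:
        return "non-ascii"
    return "mixed"


# Pattern of one well-formed group, as described in the question.
EXPECTED = ("non-ascii", "non-ascii", "ascii", "mixed")


def find_bad_groups(lines):
    """Return (first_line_number, kinds) for each 4-line group that
    does not match EXPECTED. Line numbers are 1-based."""
    bad = []
    for start in range(0, len(lines), 4):
        kinds = tuple(classify(line) for line in lines[start:start + 4])
        if kinds != EXPECTED:
            bad.append((start + 1, kinds))
    return bad


broken = [
    "日本語のみ", "日本語のみ", "English words only", "English and 日本語",
    "日本語のみ", "日本語のみ", "English words only",
    "日本語のみ", "日本語のみ", "English words only", "English and 日本語",
]

for line_number, kinds in find_bad_groups(broken):
    print(line_number, kinds)
```

Because a single missing line shifts every later group, the first reported group is usually the one to inspect. After fixing it, the check can be run again.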
















                Using awk:

                awk '1; ! /^[\x01-\x7F]*$/ {next} {getline} !/[\x01-\x7F]/ {print "+Aあ+"} 1'

                1. Print the input line unconditionally: 1 is a true condition, and the default action in that case is to print.

                2. Then, if the line isn't (!) entirely ASCII (/^[\x01-\x7F]*$/), skip the remaining rules (next moves on to the next line and restarts processing from rule 1).

                3. If it is entirely ASCII, read the next line with getline, and if that line doesn't (!) contain any ASCII characters (/[\x01-\x7F]/), print your placeholder.

                4. Finally, print the line that was read with getline.

                I'm assuming that your 日本語のみ lines don't have half-width spaces or punctuation (. ! vs 。 !).
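
That assumption matters because a 日本語のみ line containing even one half-width space or punctuation mark would look like a mixed line, and no placeholder would be inserted before it. A rough Python sketch for listing such lines before running the fix is shown below. It is an illustration only: the helper names are hypothetical, and "noise" is assumed to mean ASCII punctuation, spaces and tabs.

```python
import string

# ASCII characters that may slip into otherwise Japanese-only lines.
ASCII_NOISE = set(string.punctuation + " \t")


def only_ascii_noise(line: str) -> bool:
    """True if the line has non-ASCII text and its only ASCII characters
    are half-width spaces, tabs or punctuation."""
    ascii_chars = [ch for ch in line if ch.isascii()]
    has_non_ascii = len(ascii_chars) < len(line)
    return (
        has_non_ascii
        and bool(ascii_chars)
        and all(ch in ASCII_NOISE for ch in ascii_chars)
    )


def suspicious_lines(lines):
    """Return (line_number, line) pairs that break the assumption.
    Line numbers are 1-based."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if only_ascii_noise(line)
    ]


example = ["日本語のみ", "日本語 のみ!", "English and 日本語", "English words only"]
for number, line in suspicious_lines(example):
    print(number, line)
```

Lines reported this way can be corrected by hand (for example by switching to full-width punctuation) so that the ASCII test classifies them as intended.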

















                answered Apr 27 at 6:01 by muru



























                     
