How to delete lines where certain pattern appears on the specific position

Clash Royale CLAN TAG#URR8PPP up vote
0
down vote
favorite
I am having a file that looks like this:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
PDB; 5LZW; EM; 3.53 A; ii=1-385.
PDB; 5LZX; EM; 3.67 A; ii=1-385.
PDB; 5LZY; EM; 3.99 A; ii=1-385.
PDB; 5LZZ; EM; 3.47 A; ii=1-385.
From this file I want to match all EM; elements that are found just after PDB; (four letter code); EM;. So under this column either X-ray;, NMR; or EM; can be found. For those lines that have EM; remove them. Is there some bash command that I can use to match these elements and remove these lines?
Importantly when matching it put the space before EM, so match it with space, like EM;.
Expected result is:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
command-line bash text-processing sed awk
add a comment |Â
up vote
0
down vote
favorite
I am having a file that looks like this:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
PDB; 5LZW; EM; 3.53 A; ii=1-385.
PDB; 5LZX; EM; 3.67 A; ii=1-385.
PDB; 5LZY; EM; 3.99 A; ii=1-385.
PDB; 5LZZ; EM; 3.47 A; ii=1-385.
From this file I want to match all EM; elements that are found just after PDB; (four letter code); EM;. So under this column either X-ray;, NMR; or EM; can be found. For those lines that have EM; remove them. Is there some bash command that I can use to match these elements and remove these lines?
Importantly when matching it put the space before EM, so match it with space, like EM;.
Expected result is:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
command-line bash text-processing sed awk
add a comment |Â
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am having a file that looks like this:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
PDB; 5LZW; EM; 3.53 A; ii=1-385.
PDB; 5LZX; EM; 3.67 A; ii=1-385.
PDB; 5LZY; EM; 3.99 A; ii=1-385.
PDB; 5LZZ; EM; 3.47 A; ii=1-385.
From this file I want to match all EM; elements that are found just after PDB; (four letter code); EM;. So under this column either X-ray;, NMR; or EM; can be found. For those lines that have EM; remove them. Is there some bash command that I can use to match these elements and remove these lines?
Importantly when matching it put the space before EM, so match it with space, like EM;.
Expected result is:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
command-line bash text-processing sed awk
I am having a file that looks like this:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
PDB; 5LZW; EM; 3.53 A; ii=1-385.
PDB; 5LZX; EM; 3.67 A; ii=1-385.
PDB; 5LZY; EM; 3.99 A; ii=1-385.
PDB; 5LZZ; EM; 3.47 A; ii=1-385.
From this file I want to match all EM; elements that are found just after PDB; (four letter code); EM;. So under this column either X-ray;, NMR; or EM; can be found. For those lines that have EM; remove them. Is there some bash command that I can use to match these elements and remove these lines?
Importantly when matching it put the space before EM, so match it with space, like EM;.
Expected result is:
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
command-line bash text-processing sed awk
command-line bash text-processing sed awk
edited Mar 6 at 12:26
dessert
19.9k55795
19.9k55795
asked Mar 6 at 12:06
sergio
786
786
add a comment |Â
add a comment |Â
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
awk can do that:
awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <yourfile
This tests if the first column (by default whitespaces are taken as the delimiter) of the current line is PDB; and the third column is EM; and prints the line only if not both are true.
Output
$ awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
up vote
1
down vote
You could do something like this - using perl's paragraph mode:
$ perl -F'n' -00le 'print join "n", grep !/PDB; ....; EM;/ @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
awk can do that:
awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <yourfile
This tests if the first column (by default whitespaces are taken as the delimiter) of the current line is PDB; and the third column is EM; and prints the line only if not both are true.
Output
$ awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
up vote
1
down vote
accepted
awk can do that:
awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <yourfile
This tests if the first column (by default whitespaces are taken as the delimiter) of the current line is PDB; and the third column is EM; and prints the line only if not both are true.
Output
$ awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
up vote
1
down vote
accepted
up vote
1
down vote
accepted
awk can do that:
awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <yourfile
This tests if the first column (by default whitespaces are taken as the delimiter) of the current line is PDB; and the third column is EM; and prints the line only if not both are true.
Output
$ awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
awk can do that:
awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <yourfile
This tests if the first column (by default whitespaces are taken as the delimiter) of the current line is PDB; and the third column is EM; and prints the line only if not both are true.
Output
$ awk 'if(!($1=="PDB;"&&$3=="EM;"))print' <test
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
edited Mar 6 at 12:47
answered Mar 6 at 12:17
dessert
19.9k55795
19.9k55795
add a comment |Â
add a comment |Â
up vote
1
down vote
You could do something like this - using perl's paragraph mode:
$ perl -F'n' -00le 'print join "n", grep !/PDB; ....; EM;/ @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
up vote
1
down vote
You could do something like this - using perl's paragraph mode:
$ perl -F'n' -00le 'print join "n", grep !/PDB; ....; EM;/ @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
add a comment |Â
up vote
1
down vote
up vote
1
down vote
You could do something like this - using perl's paragraph mode:
$ perl -F'n' -00le 'print join "n", grep !/PDB; ....; EM;/ @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
You could do something like this - using perl's paragraph mode:
$ perl -F'n' -00le 'print join "n", grep !/PDB; ....; EM;/ @F' file
PEBP1_HUMAN Homo sapiens P30086 PDB; 1BD9; X-ray; 2.05 A; A/B=1-187.
PDB; 1BEH; X-ray; 1.75 A; A/B=1-187.
PDB; 2L7W; NMR; -; A=1-187.
PDB; 2QYQ; X-ray; 1.95 A; A=1-187.
PECA1_HUMAN Homo sapiens P16284 PDB; 2KY5; NMR; -; A=686-738.
PDB; 5C14; X-ray; 2.80 A; A/B=28-229.
PDB; 5GEM; X-ray; 3.01 A; A/B=28-232.
PELO_HUMAN Homo sapiens Q9BRX2 PDB; 1X52; NMR; -; A=261-371.
PDB; 5EO3; X-ray; 2.60 A; A/B=265-385.
answered Mar 6 at 13:18
steeldriver
63.3k1198167
63.3k1198167
add a comment |Â
add a comment |Â
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1012393%2fhow-to-delete-lines-where-certain-pattern-appears-on-the-specific-position%23new-answer', 'question_page');
);
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
var $window = $(window),
onScroll = function(e)
var $elem = $('.new-login-left'),
docViewTop = $window.scrollTop(),
docViewBottom = docViewTop + $window.height(),
elemTop = $elem.offset().top,
elemBottom = elemTop + $elem.height();
if ((docViewTop elemBottom))
StackExchange.using('gps', function() StackExchange.gps.track('embedded_signup_form.view', location: 'question_page' ); );
$window.unbind('scroll', onScroll);
;
$window.on('scroll', onScroll);
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password