Checking if each 'block' has less than 5 lines with a specific character
I have a file that has 5 'blocks' and looks like this:
AACP_AGRFC Agrobacterium fabrum A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.
AADB1_KLEPN Klebsiella pneumoniae. P0AE05 PDB; 4WQK_GOL.pdb; X-ray; 1.48 A; A=1-177.
PDB; 4WQL_GOL.pdb; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ; NMR; -; A=1-177.
AAKB2_RAT Rattus norvegicus Q9QZH4 PDB; 2LU3; NMR; -; A=67-163.
PDB; 2LU4; NMR; -; A=67-163.
PDB; 4Y0G_GOL.pdb; X-ray; 1.60 A; A/B=74-155.
PDB; 4YEE_GOL.pdb; X-ray; 2.00 A; A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R=74-155.
AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.
ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
Each line is a different length, but we only care about one column: the one that contains X-ray or NMR (they are always in the same column). For each 'block' we want to check whether there are >=5 lines that have X-ray in that column. If so, we want to print that block; if not, we want to remove the whole block. So the expected result should look like this:
AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.
ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
PS. We cannot use ; as a column delimiter, but we know that the column containing X-ray or NMR always follows the pattern PDB; XXXX(.pdb); X-ray or NMR.
Does anybody have an idea how this can be done in bash? Thanks
Have you considered a Python script? It's probably better suited for this.
- Katu
Mar 8 at 9:00
Hm, I have no experience in Python at all. I have some in Perl, but still not enough to deal with this problem.
- sergio
Mar 8 at 9:02
@steeldriver thanks for your comment. Do you mean "under each block"? About the columns: I am only looking at the specific one that holds X-ray and NMR. I am only interested in counting the number of X-ray lines within each block. If the number of X-ray lines within a block is >=5, then print the whole block as it was, with no changes; if it is less than 5, then remove the whole block, like in the example.
- sergio
Mar 8 at 9:18
asked Mar 8 at 8:43
sergio
1 Answer
Assuming your criterion can be expressed as the number of lines matching the regular expression /PDB; [^;]*; X-ray/, you could do something like
awk -vRS= -F'\n' '
  {c=0; for(i=1;i<=NF;i++) c += $i ~ /PDB; [^;]*; X-ray/ ? 1 : 0} c >= 5
' file
or (slightly neater, IMHO)
perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5'
Both commands read the input in paragraph mode (RS= in awk, -00 in perl), so they assume the blocks are separated by blank lines.
Ex.
$ perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5' file
AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.
ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
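For readers who prefer something more spelled out than a one-liner, here is a minimal sketch of the same awk idea wrapped in a shell function. The function name filter_blocks and its min argument are illustrative assumptions, not part of the answer above. It also assumes, like the one-liners, that blocks are separated by blank lines.

```shell
#!/usr/bin/env bash
# filter_blocks MIN [FILE...]   (hypothetical helper, name chosen for illustration)
# Print only the blank-line-separated blocks that contain at least MIN
# lines matching "PDB; <id>; X-ray". Kept blocks are printed unchanged,
# with a single blank line between them.
filter_blocks() {
  local min=$1
  shift
  awk -v min="$min" '
    # Paragraph mode: one record per block, one field per line.
    BEGIN { RS = ""; FS = "\n" }
    {
      count = 0
      for (i = 1; i <= NF; i++)
        if ($i ~ /PDB; [^;]*; X-ray/) count++
      if (count >= min) {
        if (printed++) print ""   # blank line between kept blocks
        print
      }
    }
  ' "$@"
}
```

With the sample file from the question, filter_blocks 5 file should behave like the commands above, except that it puts a blank line between the kept blocks so the output can be filtered again in the same way.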
This worked perfectly. Thanks!
- sergio
Mar 8 at 10:05
answered Mar 8 at 9:28
steeldriver