Checking if each 'block' has fewer than 5 lines with a specific string

I have a file that has 5 'blocks' and looks like this:



AACP_AGRFC Agrobacterium fabrum A9CHM9 PDB; 2JQ4; NMR; -; A=1-83.
PDB; 4H2W_5GP.pdb; X-ray; 1.95 A; C/D=1-83.
PDB; 4H2X_G5A.pdb; X-ray; 2.15 A; C/D=1-83.
PDB; 4H2Y; X-ray; 2.10 A; C/D=1-83.

AADB1_KLEPN Klebsiella pneumoniae. P0AE05 PDB; 4WQK_GOL.pdb; X-ray; 1.48 A; A=1-177.
PDB; 4WQL_GOL.pdb; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ; NMR; -; A=1-177.

AAKB2_RAT Rattus norvegicus Q9QZH4 PDB; 2LU3; NMR; -; A=67-163.
PDB; 2LU4; NMR; -; A=67-163.
PDB; 4Y0G_GOL.pdb; X-ray; 1.60 A; A/B=74-155.
PDB; 4YEE_GOL.pdb; X-ray; 2.00 A; A/B/C/D/E/F/G/H/I/J/K/L/M/N/O/P/Q/R=74-155.

AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.


The lines differ in length, but we are only interested in one column: the one that holds either X-ray or NMR (these always appear in the same position within a PDB entry). For each 'block' we want to check whether at least 5 lines have X-ray in that column. If so, we want to print the whole block unchanged. If not, we want to remove the whole block. So the expected result should look like this:



AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.


PS. We cannot use ; as a column delimiter, but we know that the column holding X-ray or NMR always follows the pattern PDB; XXXX(.pdb); X-ray or NMR.

Does anybody have an idea how this can be done in bash? Thanks










  • Have you considered a Python script? It's probably better suited for this.
    – Katu
    Mar 8 at 9:00

  • Hm, I have no experience in Python at all. I have some in Perl, but still not enough to deal with this problem.
    – sergio
    Mar 8 at 9:02

  • @steeldriver Thanks for your comment. Do you mean "under each block"? About the columns: I am looking only at the specific one that holds X-ray or NMR. I am only interested in counting the number of X-ray entries within each block. If that number is >=5, print the whole block exactly as it was, with no changes. If it is less than 5, remove the whole block, as in the example.
    – sergio
    Mar 8 at 9:18














asked Mar 8 at 8:43 by sergio











1 Answer
accepted










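Assuming your criterion can be expressed as the number of lines matching the regular expression /PDB; [^;]*; X-ray/, you could do something like

awk -vRS= -F'\n' '
  # RS= reads each blank-line-separated block as one record; -F"\n" makes each line a field
  {c=0; for(i=1;i<=NF;i++) c += $i ~ /PDB; [^;]*; X-ray/ ? 1 : 0}
  c >= 5   # print the block only if at least 5 of its lines matched
'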


or (slightly neater, IMHO)



perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5'


Ex.



$ perl -F'\n' -00ne 'print unless (grep { /PDB; [^;]*; X-ray/ } @F) < 5' file
AAPK2_HUMAN Homo sapiens P54646 PDB; 2H6D; X-ray; 1.85 A; A=6-279.
PDB; 2LTU; NMR; -; A=282-339.
PDB; 2YZA; X-ray; 3.02 A; A=6-279.
PDB; 3AQV_TAK.pdb; X-ray; 2.08 A; A=6-279.
PDB; 4CFE; X-ray; 3.02 A; A/C=1-552.
PDB; 4CFF; X-ray; 3.92 A; A/C=1-552.
PDB; 4ZHX_4O7_C1V_C2Z.pdb; X-ray; 2.99 A; A/C=2-552.
PDB; 5EZV_C1V_C2Z_STU.pdb; X-ray; 2.99 A; A/C=2-347, A/C=397-552.
PDB; 5ISO_992_STU.pdb; X-ray; 2.63 A; A/C=1-552.

ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
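
For readers who would rather use Python, as suggested in the comments, here is a minimal sketch of the same idea. It is not part of the original answer. It assumes that blocks are separated by blank lines and reuses the regular expression above; the names `filter_blocks`, `minimum` and `sample` are made up for illustration.

```python
import re

# Same pattern as in the awk/perl commands: a PDB entry whose method is X-ray.
XRAY = re.compile(r"PDB; [^;]*; X-ray")


def filter_blocks(text, minimum=5):
    """Return only the blank-line-separated blocks with >= `minimum` X-ray lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    kept = [
        block
        for block in blocks
        if sum(1 for line in block.splitlines() if XRAY.search(line)) >= minimum
    ]
    # Blocks are returned unchanged, separated by a single blank line.
    return "\n\n".join(kept)


# Two blocks copied from the question: the first has 2 X-ray lines, the second has 6.
sample = """\
AADB1_KLEPN Klebsiella pneumoniae. P0AE05 PDB; 4WQK_GOL.pdb; X-ray; 1.48 A; A=1-177.
PDB; 4WQL_GOL.pdb; X-ray; 1.73 A; A=1-177.
PDB; 5KQJ; NMR; -; A=1-177.

ABC3B_HUMAN Homo sapiens Q9UH17 PDB; 2NBQ; NMR; -; A=187-382.
PDB; 5CQD_GOL.pdb; X-ray; 2.08 A; A/C=187-378.
PDB; 5CQH; X-ray; 1.73 A; A=187-378.
PDB; 5CQI; X-ray; 1.68 A; A=187-378.
PDB; 5CQK_GOL_PGE.pdb; X-ray; 1.88 A; A=187-378.
PDB; 5TD5; X-ray; 1.72 A; A=187-378.
PDB; 5TKM; X-ray; 1.90 A; A/B=1-191.
"""

result = filter_blocks(sample)
print(result)
```

Only the ABC3B_HUMAN block should be printed. To run the sketch on a real file, read the file's contents into a string and pass that to `filter_blocks` instead of `sample`.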





  • this worked perfectly. Thanks!
    – sergio
    Mar 8 at 10:05










answered Mar 8 at 9:28 by steeldriver










