The file dna1.fasta contains records like the following, repeated many times:
>gi|142022655|gb|EQ086233.1|91 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
CTCGCGTTGCAGGCCGGCGTGTCGCGCAACGACGTGTGGGGCCTGACGGGCAGGGAGGATCTCGGCGGCG
CCAACTATGCGGTCTTTCGGCTCGAAAGCCAGTTCCAGACCTCCGACGGCGCGCTGACCGTGCCCGGCTC
CGCATTCAGTTCGCAAGCCTACGTCGGGCTCGGCGGCGACTGGGGGACCGTGACGCTCGGGCGCCAGTTC
Code: open the file, loop over it line by line, and print every line.
file_handler = open("dna1.fasta", "r")
for file_contents in file_handler:
    # file_contents holds one line of the file per iteration
    print(file_contents)
Output
>gi|142022655|gb|EQ086233.1|527 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
GAGAACCGGGAACCGGAACCATGACAGCCCCGCGCCGGTTTTACGCGAGATAGCCGGAAACGCCGTCCCA
GAGCAGTTTCAATGCGGTCACCGCCAGCAATCCGTAGCAGCTCCGGTAGATCAGGCGCTGGTCCAGCCTG
CCGTGAAGCCGCCAGCCGAACACCACGCCGGCCGGAATGGCAAGCAGGCACACCGCCATCAACGCCCAGA
CGTTCGCGGTCGGCTGCACGATCAGCAGCCACGGCACTGCCTTGATCGCATTGCCCACGGTGAAGAACAG
GCTCGTCGTTCCCGCGTACATCTCCTTGCTGAGGCCAAGCGGCAGCAGATACATCGCGAGCGGCGGCCCG
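The loop above never closes the file, and each line it prints still carries its trailing newline. A minimal sketch of the same idea using a with block, so the file is closed automatically, and rstrip to drop the newline. The helper name read_lines and the throwaway sample file are assumptions made for illustration; they are not part of the original notes.

```python
import os
import tempfile


def read_lines(path):
    """Return every line of the file at `path` without its trailing newline."""
    lines = []
    # `with` closes the file when the block ends, even if an error occurs.
    with open(path, "r") as file_handler:
        for file_contents in file_handler:
            lines.append(file_contents.rstrip("\n"))
    return lines


# Demo on a small throwaway file standing in for dna1.fasta
# (the header is shortened from the sample shown earlier).
sample = ">gi|142022655|gb|EQ086233.1|91 marine metagenome\nCTCGCGTTGC\nCCAACTATGC\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(sample)

for line in read_lines(tmp.name):
    print(line)

os.remove(tmp.name)
```

With the real file, read_lines("dna1.fasta") would be called instead of the temporary path.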
Code: print only the lines that start with ">gi" (the FASTA header lines). No other content is included.
file_handler = open("dna1.fasta", "r")
for file_contents in file_handler:
    # Keep only the header lines
    if file_contents.startswith('>gi'):
        print(file_contents)
Output
>gi|142022655|gb|EQ086233.1|91 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
>gi|142022655|gb|EQ086233.1|304 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
>gi|142022655|gb|EQ086233.1|255 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
>gi|142022655|gb|EQ086233.1|45 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
>gi|142022655|gb|EQ086233.1|396 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence
Code: print every line that does not start with ">gi" (the sequence lines). The keyword "not" negates the condition, so the test is true exactly when startswith is false.
file_handler = open("dna1.fasta", "r")
for file_contents in file_handler:
    # Skip the header lines and keep everything else
    if not file_contents.startswith('>gi'):
        print(file_contents)
Output
AATTACCGTCGCCGCCAAGGAGCAGAGCACGGGGATCGAGCAGGTGAATCAGGCTGTGTCGCAACTCGAC
AATGCGACGCAGCAGAACGCGGCGCTCGTCGAGCAGTCGGCGGCGGCCGCGACATTGCTGCGCGAGCAGG
CCGCCAGGCTCGCGCAGACGGTCGGCGAGTTCAAGCTCGAGGACCGCCGCGCGATGACGTTGCAG
CCGCGAAGGCCGCGTTCGCCACGCCCGCTGCCAACAGCGATCTCGCCGGCACCACGTTGCGTGTCGCAAC
CTACAAGGGTGGCTGGCGCGCGCTGCTGCAGGCGGCCGGGCTGGCCGACACACCGTACCGGATCGACTGG
CGCGAGCTGAACAACGGCGTGCTGCATATCGAGGCGCTCAACGCGGATGCGCTCGACATCGGTTCGGGAA
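Since the two conditions are exact opposites, both groups can be collected in a single pass with if/else instead of reading the file twice. A minimal sketch under that assumption; split_fasta_lines is a hypothetical helper name, and the sample list reuses the lines shown at the top of these notes in place of the real file.

```python
def split_fasta_lines(lines):
    """Separate header lines (starting with '>gi') from all other lines."""
    headers = []
    sequences = []
    for file_contents in lines:
        file_contents = file_contents.rstrip("\n")
        if file_contents.startswith(">gi"):
            headers.append(file_contents)
        else:
            # Same as the `not ...startswith('>gi')` branch
            sequences.append(file_contents)
    return headers, sequences


# In-memory stand-in for the open file; any iterable of lines works.
sample = [
    ">gi|142022655|gb|EQ086233.1|91 marine metagenome JCVI_SCAF_1096627390048 genomic scaffold, whole genome shotgun sequence\n",
    "CTCGCGTTGCAGGCCGGCGTGTCGCGCAACGACGTGTGGGGCCTGACGGGCAGGGAGGATCTCGGCGGCG\n",
    "CCAACTATGCGGTCTTTCGGCTCGAAAGCCAGTTCCAGACCTCCGACGGCGCGCTGACCGTGCCCGGCTC\n",
]
headers, sequences = split_fasta_lines(sample)
print(len(headers))    # 1
print(len(sequences))  # 2
```

Because an open file is itself an iterable of lines, split_fasta_lines(open("dna1.fasta")) would work the same way on the real data.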
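Making the entire block more compact: the file can be opened directly in the for statement, and a one-line if body can sit on the same line as its condition.
for file_contents in open("dna1.fasta"):
    if file_contents.startswith('>gi'): print(file_contents)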
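Another compact form is a list comprehension, which builds the filtered list in one expression rather than printing inside the loop. This is a sketch of that alternative, not the form used in the notes; header_lines is a hypothetical helper name and the sample input is a shortened stand-in for the file.

```python
def header_lines(lines):
    """Return the lines that start with '>gi', without trailing newlines."""
    return [line.rstrip("\n") for line in lines if line.startswith(">gi")]


# Shortened stand-in for the lines of dna1.fasta.
sample = [
    ">gi|142022655|gb|EQ086233.1|91 marine metagenome\n",
    "CTCGCGTTGC\n",
    ">gi|142022655|gb|EQ086233.1|304 marine metagenome\n",
    "CCAACTATGC\n",
]
for header in header_lines(sample):
    print(header)
```

Returning a list keeps the filtering separate from the printing, so the same result can be counted, sorted, or written to another file.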