Introduction:
In this project, a nucleic acid sequence from a genomic library of the yellow fever mosquito, Aedes aegypti, will be annotated. Biologically significant information will be derived from the sequence using up-to-date bioinformatics tools available on the internet. These programs are updated regularly to provide the most relevant information possible. The results will be used to identify which proteins the sequence may encode, along with exon-intron boundaries and promoter sites.
Methods and Materials:
BLASTn:
GENSCAN:
BLASTn/BLASTx:
CpGPlot/CpGReport:
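CpG islands are typically scored on GC content and on the ratio of observed to expected CpG dinucleotides. The sketch below illustrates that calculation for a single window. It is not the CpGPlot implementation, and the thresholds shown (GC content above 50%, observed/expected ratio above 0.6) are commonly cited values that should be treated as assumptions here. The function names are hypothetical.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (percent GC, observed/expected CpG ratio) for one window."""
    seq = window.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / N
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return gc_percent, ratio


def looks_like_cpg_island(window: str, min_gc: float = 50.0,
                          min_ratio: float = 0.6) -> bool:
    """Apply the assumed thresholds to a single window (illustrative only)."""
    gc_percent, ratio = cpg_stats(window)
    return gc_percent > min_gc and ratio > min_ratio
```

A real scan would slide this window along the sequence and also require a minimum island length, which is omitted here for brevity.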
BLASTn:
When the entire DNA sample was submitted to BLASTn, many significant hits were returned. One was a reverse transcriptase from Aedes aegypti; another was a ferritin heavy chain-like protein, also from Aedes aegypti. The BLAST results showed that bases 13236 to 14631 of the sample correspond to a section of an Aedes aegypti DNA sequence that codes for transposable elements. The matching section is part of the coding region for a reverse transcriptase domain (Z86117). This match produced a bit score of 2409 and an E value of 0.0.
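An E value of 0.0 does not mean that the expected number of chance hits is literally zero; the value is simply too small to represent. Using the standard relation between bit score and E value, E = m × n × 2^(−S'), the sketch below shows why a bit score of 2409 is reported as 0.0. The query length is taken from the matched region (bases 13236 to 14631). The database size is a made-up placeholder, not the size of the database that was actually searched, and the function name is hypothetical.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'), where S' is the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


query_len = 14631 - 13236 + 1  # length of the matched region: 1396 bases
db_len = 10**11                # hypothetical database size (assumption)

# 2**-2409 is far below the smallest positive double, so it underflows to 0.0
print(expected_hits(2409, query_len, db_len))  # prints 0.0
```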
GENSCAN:
The GENSCAN output showed four possible coding sequences. Each sequence was submitted to BLASTn to search for similar sequences.
When the first sequence predicted by GENSCAN was submitted to BLASTn, the results were not significant. The sequence was then used in a BLASTx search, which returned significant hits to proteins from Anopheles gambiae. The functions of these proteins are unknown, so they could not help identify the unknown sequence; no useful data was obtained from the BLAST searches of sequence 1. The GENSCAN results for this first predicted coding sequence show that it lies on the negative strand of the DNA and that it may contain four introns and a promoter.
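Because this predicted coding sequence lies on the negative strand, it reads in the forward direction only after the sample is reverse-complemented. A minimal sketch of that operation follows; the function name and the toy sequence are illustrative and are not taken from the sample.

```python
# Translation table mapping each base to its complement (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # prints GGCCAT
```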
Figure 2: Results for the BLASTn of sequence 1
Figure 3: Results for the BLASTx of sequence 1
Figure 4: Results of the BLASTn for sequence 2
Figure 5: Results of the BLASTx for sequence 2
Sequence 3 of the GENSCAN output gave interesting results. Only a few significant hits came back, and all of them involved an Abdominal-B protein, although these proteins came from a variety of different insects. The BLASTx of sequence 3 also returned hits to Abdominal-B proteins, although these were less significant.
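BLASTx differs from BLASTn in that it translates the nucleotide query in all six reading frames and compares the resulting protein sequences against a protein database. The sketch below shows a six-frame translation with the standard genetic code. It only illustrates the idea and is not how BLAST itself is implemented; the function names and the toy input are hypothetical.

```python
from itertools import product

# Standard genetic code, laid out in TCAG order for each codon position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> dict[str, str]:
    """Return the translations of frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```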
Figure 6: Results of the BLASTn from sequence 3
Figure 7: Results of the BLASTx of sequence 3
The BLASTn of the final sequence predicted by GENSCAN produced highly significant results, which corresponded to transposable elements found in Aedes aegypti. The BLASTx of sequence 4 found significant hits to unnamed Anopheles gambiae proteins and to synthetic reverse transcriptases.
Figure 9: Results of the BLASTx for sequence 4
GENSCAN also produced an image that depicts the possible gene locations along the sample sequence.
50..1490 |
1822..2253 |
2567..2766 |
4117..4418 |
7436..7699 |
8802..9450 |
11715..12048 |
12177..12408 |
12538..12824 |
12877..13147 |
13336..13866 |
14116..14364 |
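The coordinate ranges listed above use a start..end notation. As a small sketch, they can be parsed into integer pairs so that the length of each span can be computed. The helper name is hypothetical, and the coordinates are assumed to be 1-based and inclusive.

```python
import re

# Ranges copied from the listing above
RANGES = """
50..1490 |
1822..2253 |
2567..2766 |
4117..4418 |
7436..7699 |
8802..9450 |
11715..12048 |
12177..12408 |
12538..12824 |
12877..13147 |
13336..13866 |
14116..14364 |
"""


def parse_ranges(text: str) -> list[tuple[int, int]]:
    """Extract every 'start..end' pair as a tuple of integers."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", text)]


spans = parse_ranges(RANGES)
for start, end in spans:
    # Inclusive coordinates, so the length is end - start + 1 (assumption)
    print(start, end, end - start + 1)
```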
The original BLASTn search using this sequence produced significant hits to the copia-like transposable element ZebedeeI, the LINE-like element JAM1, and a ferritin heavy chain-like protein. The GENSCAN results showed the presence of four predicted genes in the sequence. CpGPlot/CpGReport showed many CpG islands in the regions of the GENSCAN-predicted genes. There are a few
The first gene predicted by GENSCAN showed no significant hits using BLASTn, but BLASTx found significant hits to proteins of unknown function in Anopheles gambiae. These hits do not aid in annotating the sequence. The GENSCAN results, on the other hand, show that this gene has no predicted poly-A tail or terminal exon; it may therefore be a pseudogene carried along by the retrovirus gene, or it may have been fragmented during library construction.
For the second gene predicted by GENSCAN, BLASTn and BLASTx produced significant results pointing toward a ferritin heavy chain-like protein from Aedes aegypti. This predicted gene would lie on the antisense strand of the sample DNA. According to the GENSCAN output, it has an initiation exon, a termination exon, a promoter, and a poly-A tail, all of which are needed for an expressible gene.
The third gene predicted by GENSCAN returned a hit to an Abdominal-B protein, the product of a gene that aids in the development of the most posterior abdominal segments (2). According to GENSCAN, this proposed gene also has an initiation exon, a termination exon, a promoter, and a poly-A tail. The Abdominal-B BLAST hits came from an assortment of insects, possibly because Abdominal-B is a conserved protein that many insects use in development. The presence of this gene could also be the result of its being carried along by the reverse transcriptase.
The significant hits to the copia-like transposable element ZebedeeI and the LINE-like element JAM1 were repeated with the fourth gene predicted by GENSCAN. Both the BLASTn and BLASTx results pointed toward reverse transcriptase genes. According to the GENSCAN results, this gene has an internal exon, a termination exon, and a poly-A tail, but it lacks a promoter and an initiation exon.
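The completeness argument applied to the predicted genes can be expressed as a simple check over GENSCAN feature types. The sketch below assumes GENSCAN's feature labels (Prom, Init, Intr, Term, PlyA) and encodes only the two genes whose features are fully stated in the text, the second and the fourth. The function and variable names are hypothetical, and the "poly-A tail" of the text corresponds to the PlyA label here.

```python
# Features a multi-exon gene needs to look complete (assumed criteria;
# a single-exon gene would carry a "Sngl" exon instead of Init/Term)
REQUIRED = ("Prom", "Init", "Term", "PlyA")


def missing_features(features: set[str]) -> list[str]:
    """List the required feature types that a prediction lacks."""
    return [f for f in REQUIRED if f not in features]


# Feature sets transcribed from the discussion of genes 2 and 4
predicted = {
    "gene 2": {"Prom", "Init", "Term", "PlyA"},
    "gene 4": {"Intr", "Term", "PlyA"},
}

for name, feats in predicted.items():
    gaps = missing_features(feats)
    print(name, "complete" if not gaps else "missing: " + ", ".join(gaps))
```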
This DNA sample could contain three genes: a ferritin heavy chain-like protein, an Abdominal-B protein, and a reverse transcriptase. The ferritin and reverse transcriptase genes appear to be the only regions of importance. The Abdominal-B protein is most likely an artifact carried along by the reverse transcriptase.
The probable gene locations in this Aedes aegypti genomic library sequence are as follows. A ferritin heavy chain-like protein gene lies in the 5000 to 7000 base pair range on the antisense strand, with an initiation exon, a termination exon, a promoter, and a poly-A tail. The second possible gene is a reverse transcriptase that starts at approximately 1200 base pairs and continues to the end of the sequence. It contains an internal exon, a terminal exon, and a poly-A tail; however, it is missing a promoter and an initiation exon. This truncation could have occurred during library construction.
Reference:
[1] National
[2] Homeobox Genes DataBase (200). [Online]. Available: URL http://www.iephb.nw.ru/labs/lab38/spirov/hox_pro/abd-b.html