Search for sequence responsible for trans-splicing using deep learning and linear classification
Splicing generally refers to the removal of introns from precursor mRNA to produce mature mRNA. In trans-splicing, by contrast, splicing occurs between different RNA molecules; it is known in a wide range of organisms but not in vertebrates. The sequences around the splice site are very similar in the two types of splicing, and although experiments have attempted to distinguish them, no clear distinguishing features have been identified. In ascidians, which are phylogenetically positioned near the boundary between invertebrates and vertebrates, about half of all genes undergo trans-splicing, in which the 5' end of the precursor, called the outron, is replaced by a leader sequence. In this study, we used ascidians to explore the mechanism, biological role, and evolutionary significance of trans-splicing, and searched for sequences that distinguish the two types of splicing using our own deep convolutional learning model and linear classification. We extracted RNA from ascidian tail bud embryos and performed RNA-seq that exploits the 5' cap to identify trans-spliced genes and their acceptor sites. For genes that undergo conventional splicing, we extracted the acceptor sequence of the first intron from public data. From these data, we built training and test sets containing position and frequency information and tested whether the two classes could be separated by deep learning and by linear classification. The two splicing types could be distinguished with 90% recognition accuracy by deep learning and 80% by linear classification. To identify the features driving the classification, we repeated the linear classification while shifting the position of the sequence window used for training, and found that the region up to 30 bases downstream of the splice site plays a significant role in the distinction.
This study suggests that the two types of splicing have distinct sequence features and that an RNA sequence consisting of three specific nucleotides is important in distinguishing them.
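The position-shifting analysis described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a plain perceptron stands in for the linear classifier, the sequences are synthetic, and the motif `GCA`, its offset of 12, the window width of 3, and all function names are hypothetical choices made only for the example.

```python
import random

BASES = "ACGU"  # RNA alphabet


def one_hot(seq):
    """Encode a sequence as position-specific one-hot features (4 per base)."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def predict(model, x):
    """Return 1 if the linear score is positive, otherwise 0."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(X, y, epochs=200):
    """Fit a perceptron, a basic linear classifier (labels are 0 or 1)."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            err = label - predict((weights, bias), x)
            if err:
                weights = [w + err * xi for w, xi in zip(weights, x)]
                bias += err
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


def accuracy(model, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(model, x) == label for x, label in zip(X, y)) / len(y)


def window_scan(train, test, width):
    """Train one classifier per window start and score it on held-out data."""
    (train_seqs, train_y), (test_seqs, test_y) = train, test
    scores = {}
    for start in range(len(train_seqs[0]) - width + 1):
        def encode(seqs):
            return [one_hot(s[start:start + width]) for s in seqs]
        model = train_perceptron(encode(train_seqs), train_y)
        scores[start] = accuracy(model, encode(test_seqs), test_y)
    return scores


def make_dataset(n_per_class, length, motif, offset, rng):
    """Toy data: class 1 carries `motif` at `offset`, class 0 never does."""
    seqs, labels = [], []
    while len(seqs) < 2 * n_per_class:
        label = len(seqs) % 2
        seq = "".join(rng.choice(BASES) for _ in range(length))
        if label == 1:
            seq = seq[:offset] + motif + seq[offset + len(motif):]
        elif seq[offset:offset + len(motif)] == motif:
            continue  # resample so that class 0 lacks the motif
        seqs.append(seq)
        labels.append(label)
    return seqs, labels


# Hypothetical toy settings, not values taken from the study.
MOTIF, OFFSET, WIDTH = "GCA", 12, 3
rng = random.Random(0)
train = make_dataset(60, 20, MOTIF, OFFSET, rng)
test = make_dataset(60, 20, MOTIF, OFFSET, rng)
scores = window_scan(train, test, width=WIDTH)
for start, acc in scores.items():
    print(f"window {start:2d}-{start + WIDTH - 1:2d}: accuracy {acc:.2f}")
```

In this toy setting, windows that overlap the planted motif should tend to score higher than windows covering only random bases, which is the kind of signal a position-shifting scan is meant to expose. With real data, the labels would come from the trans-spliced and conventionally spliced acceptor sequences, and a regularized classifier with cross-validation would likely be a sounder choice than this bare perceptron.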