Paper details
Title: Quantification of complexity of membrane protein structure and its application in the selection of a non-redundant set of sequences of a higher quality
Year: 2024
Authors: JADRANKO BATISTA, ŽELJKO MARUŠIĆ, BONO LUČIĆ
Abstract: To train protein secondary structure prediction methods more robustly, it is necessary to define the sets that will be the most demanding for the prediction models. The algorithms commonly used to choose non-redundant sets of proteins do not take the complexity of the secondary structure into account. Therefore, we defined a procedure for determining the complexity of the protein secondary structure. It estimates the number of possible positions of regular secondary structure segments within the protein sequence (binomial model) by counting the total number of possible random redistributions of the segments along the sequence. We then wrote two improved algorithms (Algorithms 1 and 2) for extracting a non-redundant set of membrane proteins, in which the highest possible complexity is used as a selection criterion in addition to the similarity between sequences. Such sets are more demanding for secondary structure prediction models, which allows for better model training. The improved algorithms favor proteins that have multiple segments in their secondary structure and are therefore more difficult to model. With the necessary adaptations, they can also be used for other data and problems in bioinformatics and, more generally, for selecting non-redundant sequence sets, data for text mining, or other machine learning (ML) data sets.
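The abstract gives no formulas or pseudocode, so the following Python sketch is only one possible reading of it, not the authors' Algorithms 1 or 2. It assumes that complexity can be approximated by the number of ways to place k ordered, non-overlapping segments of fixed lengths in a sequence of length n, which by a stars-and-bars argument is C(n - L + k, k), where L is the total segment length. It also assumes a greedy selection that visits proteins in order of decreasing complexity. The names `placement_count`, `similarity`, `select_non_redundant`, and the `max_similarity` threshold are hypothetical, and `difflib.SequenceMatcher` is only a stand-in for a real sequence alignment score.

```python
from difflib import SequenceMatcher
from math import comb


def placement_count(seq_len, segment_lengths):
    """Count placements of ordered, non-overlapping segments in a sequence.

    Assumption: the (seq_len - total segment length) free residues are
    distributed over the k + 1 gaps around k segments, which gives
    C(free + k, k) arrangements (adjacent segments are allowed).
    """
    k = len(segment_lengths)
    free = seq_len - sum(segment_lengths)
    if free < 0:
        raise ValueError("segments are longer than the sequence")
    return comb(free + k, k)


def similarity(seq_a, seq_b):
    """Crude stand-in for sequence similarity, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def select_non_redundant(proteins, max_similarity=0.3):
    """Greedy selection that prefers higher-complexity proteins.

    Each protein is a dict with keys "name", "seq", and "segments"
    (a list of segment lengths). A candidate is kept only if it is not
    too similar to any protein that has already been kept.
    """
    ranked = sorted(
        proteins,
        key=lambda p: placement_count(len(p["seq"]), p["segments"]),
        reverse=True,
    )
    kept = []
    for cand in ranked:
        if all(similarity(cand["seq"], k["seq"]) <= max_similarity for k in kept):
            kept.append(cand)
    return kept
```

Under these assumptions, when two entries have identical sequences, the one annotated with more segments is visited first and kept, and the other is rejected as redundant.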
Type of work: Original scientific article
Publication: 3rd Serbian International Conference on Applied Artificial Intelligence (SICAAI)
Scientific fields: INFORMATION SCIENCES AND BIOINFORMATICS (SOFTWARE DEVELOPMENT, INFORMATION TECHNOLOGIES, ARTIFICIAL INTELLIGENCE, INFORMATION PROCESSING), BIOPHYSICS AND MEDICAL PHYSICS
Links:
AAI2024