Title Phylogenetic analysis of protein sequences based on a novel k-mer natural vector method.
Author Zhang, YuYan; Wen, Jia; Yau, Stephen S-T
Journal Genomics Publication Year/Month 2019-Dec
PMID 30195069 PMCID -N/A-
Affiliation 1. School of Agriculture and Hydraulic Engineering, Suihua University, Suihua 152061, China.

Based on the k-mer model for protein sequences, a novel k-mer natural vector method is proposed to characterize the features of the k-mers in a protein sequence, taking into account both their counts and their distributions. It is proved that the correspondence between a protein sequence and its k-mer natural vector is one-to-one. Phylogenetic analysis of protein sequences can therefore be performed easily, without requiring evolutionary models or human intervention. In addition, no criterion exists for choosing a suitable k, and k strongly influences both the results obtained and the computational complexity. In this paper, a compound k-mer natural vector is used to quantify each protein sequence. The results of phylogenetic analysis on three protein datasets demonstrate that the new method can precisely describe the evolutionary relationships of proteins and greatly improve computational efficiency.
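
The abstract gives no formulas, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes the conventional natural vector descriptors for each k-mer (occurrence count, mean position, and a normalized second central moment of the positions), and it assumes that "compound" means concatenating the vectors for several values of k. The function names, the choice of k values, and the use of Euclidean distance are all hypothetical.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_natural_vector(seq, k, alphabet=AMINO_ACIDS):
    """Return counts, mean positions, and second moments for every k-mer.

    Assumed definitions (not taken from the abstract): positions are
    1-based start indices, and the second central moment of a k-mer is
    normalized by (its count * total number of k-mer positions).
    """
    total = len(seq) - k + 1  # number of k-mer start positions
    positions = {}
    for i in range(total):
        positions.setdefault(seq[i:i + k], []).append(i + 1)

    counts, means, moments = [], [], []
    # Enumerate all possible k-mers in a fixed order so that vectors
    # from different sequences are directly comparable.
    for kmer in map("".join, product(alphabet, repeat=k)):
        pos = positions.get(kmer, [])
        n = len(pos)
        mu = sum(pos) / n if n else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n * total) if n else 0.0
        counts.append(float(n))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def compound_natural_vector(seq, ks=(1, 2), alphabet=AMINO_ACIDS):
    """Concatenate k-mer natural vectors for several k (assumed meaning
    of 'compound'; the values in ks are illustrative only)."""
    vec = []
    for k in ks:
        vec.extend(kmer_natural_vector(seq, k, alphabet))
    return vec


def euclidean_distance(u, v):
    """Distance between two vectors of equal length (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Pairwise distances computed this way, for example `euclidean_distance(compound_natural_vector(s1), compound_natural_vector(s2))`, could then be passed to any standard tree-building routine. Note that the vector length grows as 3 × 20^k for protein sequences, which is one reason the choice of k affects the computational cost.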
