C15orf52


Chromosome 15 open reading frame 52 is a human protein encoded by the C15orf52 gene. Its function is poorly understood.

Gene

C15orf52 is located on the reverse strand of human (Homo sapiens) chromosome 15 at locus 15q15.1. The gene spans 9,516 base pairs, including introns and exons.[1] It contains 12 distinct introns and 11 exons, and its transcription produces 7 different mRNAs, including 6 alternatively spliced variants.[2]

Promoter

The promoter region upstream of the gene contains binding sites for several transcription factors that regulate the expression of the C15orf52 gene.

mRNA

The linear mRNA is 5344 base pairs long.[3] It contains a short 5' untranslated region of 15 base pairs and a long 3' untranslated region of 3782 base pairs. Within the 3' untranslated region, three specific miRNA binding sites were found, for the miRNAs hsa-miR-147b, hsa-miR-203a-3p.1, and hsa-miR-214-5p.

Specific nucleotide binding sites for known miRNAs in 3' UTR of C15orf52 mRNA conserved among vertebrates.
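The article does not say how the miRNA binding sites were located. As a rough illustration only, the sketch below assumes the common seed-matching approach, in which a candidate site is a stretch of the 3' UTR that is the reverse complement of miRNA nucleotides 2 to 8. The function names and both sequences are hypothetical toy values, not the real hsa-miR or C15orf52 sequences.

```python
# Minimal seed-match sketch. All names and sequences here are illustrative
# assumptions; they are not taken from the C15orf52 mRNA or any real miRNA.

def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions that pair with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                      # nucleotides 2-8 of the miRNA
    target = reverse_complement_rna(seed)  # what the UTR must contain
    width = len(target)
    return [i for i in range(len(utr) - width + 1) if utr[i:i + width] == target]


toy_mirna = "UACGGAUCCAGUUACGGAUCUA"  # hypothetical 22-nt miRNA
toy_utr = "AAGAUCCGUUUCCGAUCCGUAA"    # hypothetical 3' UTR fragment

print(seed_sites(toy_mirna, toy_utr))
```

With real data, the toy strings would be replaced by the 3' UTR of the transcript and the mature miRNA sequences, and candidate sites would still need filtering by conservation, as the caption above suggests.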

Protein

General Properties

The human C15orf52 protein is 534 amino acids long and has a molecular weight of 57.325 kDa.[4] It contains a domain of unknown function, DUF4594, spanning amino acids 185 to 350.[1]

Structure

Primary

Comparison of the amino acid composition of C15orf52 with that of other human proteins revealed several amino acids at unusual frequencies.[5] Phenylalanine, tyrosine, and asparagine occur at lower frequencies than in other human proteins, while glycine and arginine occur at higher frequencies. The isoelectric point of the protein is 9.457, so the protein is basic and carries a net positive charge at the normal physiological pH of 7.4.
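The link between composition and charge can be sketched in code. The example below computes residue frequencies and estimates net charge at a given pH with the Henderson-Hasselbalch relation. The pKa values are approximate textbook-style assumptions (published scales differ), the function names are illustrative, and the sequence is a hypothetical glycine- and arginine-rich toy peptide, not the C15orf52 sequence.

```python
from collections import Counter

# Approximate pKa values (assumed; different published scales vary slightly).
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.0}


def composition(seq: str) -> dict[str, float]:
    """Fraction of the sequence made up by each amino acid."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def net_charge(seq: str, ph: float) -> float:
    """Estimated net charge of a peptide at the given pH."""
    counts = Counter(seq)
    counts["N_term"] = 1  # one free amino terminus
    counts["C_term"] = 1  # one free carboxyl terminus
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


toy_seq = "MGRGRKAGGRSEDGR"  # hypothetical peptide, rich in G and R

print(composition(toy_seq)["G"])           # glycine fraction
print(round(net_charge(toy_seq, 7.4), 2))  # positive for a basic peptide
```

A sequence whose estimated charge is still positive at pH 7.4 has an isoelectric point above 7.4, which is the situation described for C15orf52.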

Secondary

C15orf52 has a coiled-coil domain, composed of alpha helices, spanning amino acids 60 to 97.[6]

Tertiary

The tertiary structure of this protein has not yet been determined.

Subcellular Localization

No transmembrane sequences were detected in the C15orf52 protein.[7] It is predicted to be a soluble, non-cytoplasmic protein[7] and is likely to be localized to the nucleus.[8]

Post-Translational Modifications

The protein has been experimentally observed to be phosphorylated at two serines, S201[9] and S392.[10] N-terminal acetylation, C-glycosylation, glycation, leucine-rich nuclear export signals, sumoylation, and PEST motifs were all predicted across orthologs of this protein.[11]

Interacting Proteins

Two proteins, THO complex subunit 1 (THOC1) and THO complex subunit 7 (THOC7), were found to interact with C15orf52 using anti-tag coimmunoprecipitation.[12] THOC1 is a component of the THO subcomplex of the TREX complex, which is thought to couple mRNA transcription, processing, and nuclear export. It is also involved in an apoptotic pathway characterized by activation of caspase-6. THOC7 is part of the same subcomplex and is required for efficient export of polyadenylated RNA.

Ring finger protein 2 (RNF2) and SUZ12 polycomb repressive complex 2 subunit (SUZ12) were also indicated as interacting proteins.[13] RNF2 belongs to the polycomb group of proteins, which are important for transcriptional repression of various genes. It also possesses ubiquitin ligase activity. SUZ12 is also a polycomb group protein and is part of a complex that methylates lysines of histones and is involved in gene repression.

Homology

Paralogs

There are no known complete paralogs of the C15orf52 protein. However, a domain of Coiled Coil Domain Containing Protein 9 (CCDC9), spanning amino acids 9 to 55 of CCDC9, is paralogous to part of the C15orf52 protein. This domain is found in organisms ranging from primates to mollusks. It is not found in unicellular organisms or in multicellular organisms more distantly related to humans than mollusks.

Orthologs

Orthologs of the C15orf52 protein were traced back as far as cartilaginous fishes. None were found in unicellular organisms or in multicellular organisms more distantly related to humans than cartilaginous fishes.

| Common Name | Genus & Species | Date of Divergence from Humans (MYA) | Accession Number | Sequence Length (amino acids) | Sequence Identity to Humans | Sequence Similarity to Humans |
|---|---|---|---|---|---|---|
| Human | Homo sapiens | 0 | NP_997263.2 | 534 | 100% | 100% |
| Brandt's bat | Myotis brandtii | 97.5 | XP_005860303.2 | 564 | 76% | 79% |
| Cattle | Bos taurus | 97.5 | XP_015328613.1 | 577 | 75% | 77% |
| Mouflon | Ovis musimon | 97.5 | XP_014962253.1 | 513 | 69% | 74% |
| House mouse | Mus musculus | 90.5 | NP_001001982.2 | 545 | 63% | 71% |
| Gekko | Gekko japonicus | 320.5 | XP_015282702.1 | 591 | 41% | 55% |
| Zebra finch | Taeniopygia guttata | 320.5 | XP_012429790.1 | 625 | 41% | 57% |
| Carolina anole | Anolis carolinensis | 320.5 | XP_008115041.1 | 496 | 39% | 55% |
| Green sea turtle | Chelonia mydas | 320.5 | XP_007069465.1 | 743 | 39% | 57% |
| Chicken | Gallus gallus | 320.5 | XP_004941352.2 | 637 | 38% | 54% |
| Golden eagle | Aquila chrysaetos canadensis | 320.5 | XP_011595804.1 | 647 | 38% | 53% |
| Western clawed frog | Xenopus tropicalis | 355.7 | XP_004917355.1 | 507 | 37% | 54% |
| Common garter snake | Thamnophis sirtalis | 320.5 | XP_013925154.1 | 586 | 37% | 52% |
| Mexican tetra | Astyanax mexicanus | 429.6 | XP_007230442.1 | 354 | 37% | 52% |
| Spotted gar | Lepisosteus oculatus | 429.6 | XP_015206400.1 | 674 | 37% | 52% |
| Common starling | Sturnus vulgaris | 320.5 | XP_014734365.1 | 646 | 37% | 53% |
| Chinese alligator | Alligator sinensis | 320.5 | XP_014372849.1 | 504 | 37% | 54% |
| Burmese python | Python bivittatus | 320.5 | XP_007429068.1 | 587 | 37% | 53% |
| Zebrafish | Danio rerio | 429.6 | XP_001337385.3 | 516 | 32% | 51% |
| Australian ghostshark | Callorhinchus milii | 482.9 | XP_007891400.1 | 692 | 29% | 45% |
| Pufferfish | Takifugu rubripes | 429.6 | XP_011614636.1 | 525 | 35% | 51% |

Divergence

The corrected distances of C15orf52 were compared with those of the rapidly mutating fibrinogen alpha protein and the slowly mutating cytochrome c protein, as well as with those of the paralogous domain in CCDC9. Overall, C15orf52 as a whole changes fairly rapidly, but the paralogous domain does not. Because this domain is well conserved, it may be important for the function of the protein.
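The article does not state which correction was applied. As a sketch under that caveat, the example below assumes the simple Poisson correction, d = -ln(q), where q is the fraction of identical residues, and applies it to a few identity values and divergence dates taken from the ortholog table. The function name and the choice of rows are illustrative.

```python
import math

# (common name, divergence from humans in MYA, sequence identity to humans),
# copied from a subset of rows in the ortholog table.
orthologs = [
    ("Brandt's bat", 97.5, 0.76),
    ("House mouse", 90.5, 0.63),
    ("Chicken", 320.5, 0.38),
    ("Western clawed frog", 355.7, 0.37),
    ("Zebrafish", 429.6, 0.32),
    ("Australian ghostshark", 482.9, 0.29),
]


def corrected_distance(identity: float) -> float:
    """Poisson-corrected distance (an assumed correction, not the article's stated method)."""
    return -math.log(identity)


for name, mya, identity in orthologs:
    d = corrected_distance(identity)
    print(f"{name:22s} {mya:6.1f} MYA  d = {d:.3f}")
```

Plotting d against divergence date for several proteins gives one line per protein, and a steeper slope indicates faster change. Comparing such slopes is the kind of analysis this section describes for C15orf52, fibrinogen alpha, and cytochrome c.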

Expression

The origins of C15orf52 cDNAs show that the gene is expressed in numerous locations, including primary and secondary digestive organs (pancreas, stomach, liver, etc.), the nervous system (brain, retina, lens), skin, reproductive organs, bones, and many other tissues, suggesting a fairly nonspecialized function.[3] However, the C15orf52 protein is relatively over-expressed in the colon, peripheral blood mononuclear cells, testis, and rectum.[14] Application of RNA-seq to plasma extracellular RNA profiles identified C15orf52 as the most abundant mRNA present, possibly indicating some role outside the cell.[15] In mice, the expression pattern of C15orf52, as well as that of TCEA3 and FHOD3, two other genes studied, was found to be similar to that of well-characterized genes known to be associated with heart development, such as BVES and CXCL12.[16] However, C15orf52 was not detected before embryonic day 9.5 in the tail area, and its exact function is not yet known.[16]

Clinical Significance

Diseases associated with C15orf52 include colorectal cancer, in which the protein was found to be over-expressed in tumor cells.[14]

References

  1. NCBI (National Center for Biotechnology Information) gene entry on C15orf52
  2. AceView entry on C15orf52 gene
  3. NCBI (National Center for Biotechnology Information) nucleotide entry on C15orf52
  4. NCBI (National Center for Biotechnology Information) protein entry on C15orf52
  5. SDSC Biology WorkBench 3.2 - Statistical Analysis of Primary Structure tool. http://seqtool.sdsc.edu/CGI/BW.cgi#
  6. UniProtKB entry on C15orf52. www.uniprot.org/uniprot/Q6ZUT6
  7. SOSUI. http://harrier.nagahama-i-bio.ac.jp/sosui/sosui_submit.html
  8. Reinhardt's method
  9. Bian, Yangyang; Song, Chunxia; Cheng, Kai; Dong, Mingming; Wang, Fangjun; Huang, Junfeng; Sun, Deguang; Wang, Liming; Ye, Mingliang; Zou, Hanfa (2014-01-16). "An enzyme assisted RP-RPLC approach for in-depth analysis of human liver phosphoproteome". Journal of Proteomics. 96: 253–262. doi:10.1016/j.jprot.2013.11.014. ISSN 1876-7737. PMID 24275569.
  10. Olsen, Jesper V.; Blagoev, Blagoy; Gnad, Florian; Macek, Boris; Kumar, Chanchal; Mortensen, Peter; Mann, Matthias (2006-11-03). "Global, in vivo, and site-specific phosphorylation dynamics in signaling networks". Cell. 127 (3): 635–648. doi:10.1016/j.cell.2006.09.026. ISSN 0092-8674. PMID 17081983.
  11. ExPASy proteomic tools. http://www.expasy.org/proteomics
  12. Hein, Marco Y.; Hubner, Nina C.; Poser, Ina; Cox, Jürgen; Nagaraj, Nagarjuna; Toyoda, Yusuke; Gak, Igor A.; Weisswange, Ina; Mansfeld, Jörg; Buchholz, Frank; Hyman, Anthony A. (2015-10-22). "A human interactome in three quantitative dimensions organized by stoichiometries and abundances". Cell. 163 (3): 712–723. doi:10.1016/j.cell.2015.09.053. ISSN 1097-4172. PMID 26496610.
  13. Cao, Qi; Wang, Xiaoju; Zhao, Meng; Yang, Rendong; Malik, Rohit; Qiao, Yuanyuan; Poliakov, Anton; Yocum, Anastasia K.; Li, Yong; Chen, Wei; Cao, Xuhong (2014). "The central role of EED in the orchestration of polycomb group complexes". Nature Communications. 5: 3127. doi:10.1038/ncomms4127. ISSN 2041-1723. PMC 4073494. PMID 24457600.
  14. GeneCards® entry on C15orf52
  15. Yuan, Tiezheng; Huang, Xiaoyi; Woodcock, Mark; Du, Meijun; Dittmar, Rachel; Wang, Yuan; Tsai, Susan; Kohli, Manish; Boardman, Lisa; Patel, Tushar; Wang, Liang (2016-01-20). "Plasma extracellular RNA profiles in healthy and cancer patients". Scientific Reports. 6 (1): 19413. doi:10.1038/srep19413. ISSN 2045-2322. PMC 4726401.
  16. Xu, Xiu Qin; Soo, Set Yen; Sun, William; Zweigerdt, Robert (September 2009). "Global expression profile of highly enriched cardiomyocytes derived from human embryonic stem cells". Stem Cells. 27 (9): 2163–2174. doi:10.1002/stem.166. ISSN 1549-4918. PMID 19658189.