From early prognosis and progression monitoring to individual therapeutic targets and drug therapy, proteomics has become an emerging field in cancer research [1,2]. Numerous machine learning (ML) models detect cancers from image data [3], but ML models that predict cancer prognosis from complex protein datasets are lacking. Although diverse protein datasets are available online, they must first be evaluated with established statistical tools before they can be used for feature extraction and for developing ML models for cancer prognostication. Our study aims to evaluate different protein-based cancer datasets statistically (PCA, k-means clustering, etc.) and to propose a well-annotated protein dataset for future models.
The study has implications for cancer diagnosis and prognosis and for biomarker development across numerous cancers. It will also open the way to training and testing ML models on vast, well-annotated, publicly available proteomic datasets and to validating them for numerous human diseases.
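As a rough illustration of the kind of statistical evaluation mentioned above, the following minimal sketch standardizes a samples-by-proteins abundance matrix, projects it onto two principal components, and clusters the projected samples with k-means. Everything in it is an assumption for illustration only: the matrix is synthetic random data (not a real proteomic dataset), the helper names `standardize`, `pca`, and `kmeans` are hypothetical, and the choice of two components and two clusters is arbitrary. In practice a library implementation (for example, scikit-learn's `PCA` and `KMeans`) would likely be used instead of these hand-written NumPy versions.

```python
import numpy as np


def standardize(X):
    """Z-score each protein (column); constant columns are mapped to zero."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sd


def pca(X, n_components=2):
    """Project centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every sample to every centre, then nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for a (samples x proteins) matrix: NOT real data.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(30, 50))
group_b = rng.normal(0.0, 1.0, size=(30, 50))
group_b[:, :10] += 4.0  # hypothetical shift in the first 10 "proteins"

X = standardize(np.vstack([group_a, group_b]))
scores, explained = pca(X, n_components=2)
labels = kmeans(scores, k=2)

print("explained variance ratio:", explained)
print("cluster sizes:", np.bincount(labels, minlength=2))
```

On a real dataset, the explained variance ratios and the agreement between cluster labels and the dataset's annotations (for example, cancer type or stage) would give a first indication of whether the dataset carries enough structure to be useful for later feature extraction.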
- Proficiency in Python.
- Working knowledge of HPC facilities and deep learning models.
- Knowledge of protein chemistry/biochemistry is a plus.
- Srinivas, P. R., Srivastava, S., Hanash, S., & Wright Jr, G. L. (2001). Proteomics in early detection of cancer. Clinical chemistry, 47(10), 1901-1911.
- Shruthi, B. S., & Palani Vinodhkumar, S. (2016). Proteomics: A new perspective for cancer. Advanced biomedical research, 5.
- Yassin, N. I., Omran, S., El Houby, E. M., & Allam, H. (2018). Machine learning techniques for breast cancer computer aided diagnosis using different image modalities: A systematic review. Computer methods and programs in biomedicine, 156, 25-45.