High-throughput biological data generation has driven the adoption of automated bioinformatics pipelines on high-performance computing (HPC) systems and supercomputers. This systematic review synthesizes 101 studies published between 2018 and 2025, following PRISMA guidelines, to examine workflow management systems (WfMSs) deployed in HPC environments across genomics, transcriptomics, proteomics, and metagenomics domains. We analyzed prominent frameworks including Nextflow, Snakemake, WDL, and CWL, documenting their implementation challenges and emerging solutions. Key challenges identified include scheduler saturation from massive parallelism, I/O bottlenecks on shared file systems, heterogeneous resource allocation, and reproducibility across diverse computing environments. Containerization through Docker and Singularity has emerged as the dominant solution for ensuring portability and reproducibility. Community-driven initiatives like nf-core have accelerated adoption by providing curated, best-practice pipelines. Advanced solutions include HPC-aware scheduling strategies, hybrid cloud-HPC architectures, and GPU integration for machine learning-augmented analyses. While significant progress has been made in automating complex multi-step analyses, continued co-evolution of workflow systems and HPC infrastructure remains essential for handling exascale data volumes and achieving fully reproducible computational biology at scale.
ASHIMGALIYEV M.
PhD, Lecturer, Department of Computer and Software Engineering, Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan.
E-mail: ashimgaliyev.medet@gmail.com, https://orcid.org/0009-0003-9829-6187
MUSSABEK M.
Senior Lecturer, School of Artificial Intelligence and Data Science, Astana IT University, Astana, Kazakhstan.
E-mail: miras.k@astanait.edu.kz, https://orcid.org/0009-0009-2353-3524
MATKARIMOV B.
Doctor of Technical Sciences, Professor, Lecturer and Researcher, Department of Artificial Intelligence Technology, Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan.
E-mail: bakhyt.matkarimov@gmail.com, https://orcid.org/0000-0003-0775-7324
ZHUMADILLAYEVA A.K.
Candidate of Technical Sciences, Associate Professor, Department of Computer and Software Engineering, Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Astana, Kazakhstan.
E-mail: Ainur.Zhumadillayeva@astanait.edu.kz, https://orcid.org/0000-0003-1042-0415
