Evaluation and Comparison of Hadoop Technologies for Genetic Data Analyses
Thesis Type | Master |
Thesis Status | Finished |
Student | Clemens Banas |
Final | |
Start | |
Thesis Supervisor | |
Contact | |
Research Field | |
As data volumes in genetics keep growing, scalable big data technologies are key to processing large genomic studies. Choosing the right technology is crucial, and Apache Hadoop and Apache Spark are two promising candidates for meeting these demands. The aim of this thesis is to compare the advantages and disadvantages of these state-of-the-art technologies and to evaluate them on the three most important genetic data formats: FASTQ, BAM, and VCF.