What are the goals of analyzing Big Data?
According to P. Bickel, the two main goals of high-dimensional data analysis are to develop effective methods that can accurately predict future observations and at the same time to gain insight into the relationship between the features and response for scientific purposes.
Big Data Characteristics
The volume of health and medical data is expected to rise sharply in the years ahead and is usually measured in terabytes, petabytes, or even yottabytes. Volume refers to the amount of data, while velocity refers to data in motion as well as the speed and frequency of data creation, processing, and analysis. Variety refers to the complexity and heterogeneity of multiple datasets, which can be structured, semi-structured, or unstructured. Veracity refers to data quality, relevance, uncertainty, reliability, and predictive value, while variability concerns the consistency of the data over time. The value of big data refers to its coherent analysis, which should be valuable to patients and clinicians.
What are the challenges of analyzing Big Data?
First, high dimensionality combined with a large sample size creates issues such as heavy computational cost and algorithmic instability.
Second, it is difficult to extract useful information in real time and to determine which data should be stored and which should be discarded.
Third, multiple datasets are complex and heterogeneous: they can be structured, semi-structured, or unstructured.
Fourth, the consistency of the data over time, as well as its quality, relevance, uncertainty, reliability, and predictive value, is hard to guarantee.
To handle the challenges of Big Data, we need new statistical thinking and computational methods.
Different data mining techniques can be applied to these heterogeneous biomedical data sets, such as anomaly detection, clustering, classification, and association rules, as well as summarization and visualization of those big data sets.
Data pre-processing allows statistical techniques and data mining methods to be applied, which can improve the quality and outcomes of big data analytics and lead to the discovery of novel knowledge.
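As a rough illustration of how pre-processing feeds a data mining step, the sketch below drops missing values and then runs a simple anomaly detection pass. It is not taken from the referenced paper: the heart-rate readings are made-up data, the function names `preprocess` and `mad_outliers` are hypothetical, and the modified z-score with the 0.6745 constant and 3.5 cutoff is one common convention assumed here, not a method the source prescribes.

```python
from statistics import median

def preprocess(readings):
    """Drop missing values (None) and convert the rest to float."""
    return [float(r) for r in readings if r is not None]

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The score is based on the median absolute deviation (MAD), which is
    less sensitive to extreme values than the mean and standard deviation.
    The 0.6745 constant and 3.5 cutoff are assumed conventions.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread around the median, so nothing is flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical resting heart-rate readings (beats per minute) with gaps.
raw = [72, 75, None, 71, 74, 190, 73, None, 70, 76]
clean = preprocess(raw)         # pre-processing: remove missing entries
outliers = mad_outliers(clean)  # data mining step: anomaly detection
print(outliers)                 # prints [190.0]
```

A median-based score is used here because a single extreme reading inflates the standard deviation of a small sample and can hide itself from an ordinary z-score test.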
In conclusion, big data analytics in medicine and healthcare is a very promising process of integrating, exploring, and analyzing large amounts of complex, heterogeneous data of different natures: biomedical data, experimental data, electronic health records data, and social media data.
Reference: Blagoj Ristevski and Ming Chen. 2018. Big Data Analytics in Medicine and Healthcare. De Gruyter, Berlin/Boston.