Introduction
Despite the importance attached to big data, there are inherent dangers in relying entirely on it to make decisions. The use of big data has expanded from private business applications to government allocation of critical resources. Although big data is often portrayed as advantageous because of the inferences that data analysis can generate, those inferences carry a significant risk of bias and misrepresentation (O'Neill 3). Big data lacks objectivity, and overreliance on it can result in the misallocation of resources, which is detrimental to the proper use of scarce public funds. This paper assesses critical scenarios in which the use of big data poses a threat due to lack of credibility, misrepresentation, the absence of a qualitative dimension in the collected data, and the potential for preprogrammed bias that algorithmic applications can inherit from their creators.
Dangers of Relying on Big Data and Machine Learning Algorithms for Making Decisions
Despite today's heavy dependence on big data and machine learning algorithms for decision making, structural, technical, and human-induced biases pose a significant threat to the credibility of the data and of the decisions drawn from it (O'Neill 4).
Lack of Big Data Objectivity
Big data and machine learning algorithms operate on data sets programmed to surface common themes; they have no objectivity of their own. This lack of objectivity makes it difficult to use such analysis to reach objective decisions (Crawford 2). The world is highly subjective, and reliance on data inferences that do not align with today's dynamic context makes them insufficient for producing objective, problem-focused solutions (O'Neill 6). Many social scientists have turned to big data and machine learning to inform social and political decisions, but the concepts used to divide data into objects and establish relations among them reflect the views of the people who programmed the algorithms, which increases the overall bias (Crawford 1). The lack of real-world objectivity therefore makes big data a biased basis for decision making, especially in unrelated or only loosely related areas. For instance, using the number of people on social media as a measure of technology acceptance will be biased because of the many externalities the algorithm cannot capture when drawing inferences from the data.
Bias in Collection and Analysis of Data
Despite the high regard in which big data is held, the way data is collected and analyzed continues to raise serious reliability questions about the results (Crawford 2). This poses a risk to any decision based on data sets gathered with bias-prone methods. For instance, data collection is often incomplete, focusing on specific hashtags to decipher what people are posting on social media sites. Because people do not use a common vocabulary to express similar information, such methods leave a substantial amount of relevant data out of the resulting data sets. The data becomes less representative of the real event or situation, contributing to the irrelevance of the data used to make decisions on serious societal problems (Crawford 1). Another example of collection and analysis bias is the assumption, when gathering data from smartphones and similar channels, that the majority of people use such technology. This misrepresents the true context of society: the data collected will be biased, and so will the outcomes (O'Neill 6).
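The hashtag problem described above can be sketched in a few lines of Python. The posts, hashtag, and event are hypothetical; the point is only that filtering by a single tag silently discards on-topic posts phrased differently.

```python
# Hypothetical sketch: collecting social-media posts by a single hashtag
# misses posts that discuss the same event in different words.

posts = [
    "Flooding downtown #flood",
    "Water is rising on Main Street",      # same event, no hashtag
    "Stay safe everyone #flood",
    "Roads closed because of high water",  # same event, no hashtag
]

# Hashtag-based collection keeps only posts explicitly tagged #flood.
collected = [p for p in posts if "#flood" in p]

coverage = len(collected) / len(posts)
print(f"captured {coverage:.0%} of relevant posts")  # prints "captured 50% of relevant posts"
```

Half of the relevant posts never enter the data set, so any decision built on it describes only the subpopulation that happened to use the chosen tag.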
Three Ways Biases Can Find Their Way Into Algorithms and How These Biases Can Reinforce Systemic Inequalities
Sampling Bias in Machine Learning Code
During automated analysis with machine learning, which mines the collected data set to extract meaning, there is a significant risk that human bias is transferred into the software during programming. Prejudices appear not only in the data collected but also in the cognitive choices made while implementing the algorithms (Crawford 2). During sampling, for instance, a machine learning algorithm can overlook specific colors or representations and leave them out of the resulting data set. Image-based sampling algorithms can ignore or fail to identify people dressed in a particular manner or with a particular skin pigmentation, producing biases in both collection and analysis. Sampling bias caused by such narrow algorithmic coding can lead cities and governments to make racially insensitive decisions by failing to sample a particular group at all (O'Neill 7).
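A minimal sketch of this failure mode, with entirely invented numbers: a toy "detector" tuned with a brightness threshold silently drops darker images, so one group vanishes from the sampled data before any downstream analysis even begins.

```python
# Hypothetical sketch: a detector tuned on bright images silently drops
# darker ones, so the collected sample underrepresents one group entirely.

# Each record: (group label, image brightness on a 0-1 scale). Toy values.
images = [("A", 0.90), ("A", 0.80), ("A", 0.85),
          ("B", 0.40), ("B", 0.35), ("B", 0.45)]

def detect(brightness, threshold=0.5):
    # The detector only "sees" images above the brightness threshold --
    # a design choice baked in by whoever wrote the sampling code.
    return brightness > threshold

sampled = [(group, b) for group, b in images if detect(b)]
groups_in_sample = {group for group, _ in sampled}
print(groups_in_sample)  # prints {'A'}: group B never enters the data set
```

The bias here is not in any statistical model; it is a single threshold chosen during coding, which is exactly how a programmer's assumptions become the data's blind spots.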
Confirmation Bias
Data analytics algorithms are designed and programmed by human beings, which makes them susceptible to social prejudices, biases, and misunderstandings. Confirmation bias arises in machine learning when the designers' beliefs and opinions shape the data sample fed to the algorithm (O'Neill 8). For instance, mining data from smartphones on the assumption that the majority of people own one is a form of confirmation bias: the resulting data will not capture children and senior citizens who may not use smartphones, yet the findings will be generalized to the entire population.
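The smartphone example can be made concrete with a toy population (the ages and ownership pattern are illustrative assumptions, not data): sampling only phone owners shifts the statistic that is then generalized to everyone.

```python
# Hypothetical sketch: surveying only smartphone owners skews the sample,
# yet the result gets generalized to the whole population.

# (age, owns_smartphone) pairs; the ownership pattern is an assumption
# chosen to mirror the essay's example (children and seniors excluded).
population = [(8, False), (15, True), (30, True),
              (45, True), (70, False), (82, False)]

pop_mean = sum(age for age, _ in population) / len(population)

sample = [age for age, owns in population if owns]
sample_mean = sum(sample) / len(sample)

print(round(pop_mean, 1), sample_mean)  # prints 41.7 30.0
```

The phone-only sample looks a full decade younger than the population it claims to describe; every conclusion drawn from it inherits that skew.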
Overfitting and Underfitting
Overfitting and underfitting are biases in which an algorithm relies on too many or too few data trends to make an inference, overemphasizing or ignoring specific patterns when forming its assumptions. For instance, applying big data to hiring and other critical business decisions can produce underfitting bias (Crawford 2). When a business uses the creditworthiness of job applicants as a hiring signal, equating paying debt on time with good work habits such as punctuality, the result is potential discrimination and misrepresentation: some people are poor at honoring their debts yet are very hardworking (O'Neill 7). In such cases, underfitting in the algorithm's design leads to unfair discrimination against potential employees.
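An underfit hiring rule of this kind can be sketched directly. The applicants, scores, and threshold below are all hypothetical; the sketch shows how a one-feature rule errs in both directions at once.

```python
# Hypothetical sketch: an underfit, one-feature hiring rule. Credit score
# is the only signal, so it mislabels applicants in both directions.

applicants = [
    # (name, credit_score, actual_work_quality) -- invented records
    ("Ada", 720, "good"),
    ("Ben", 540, "good"),   # hardworking, but struggling with debt
    ("Cam", 700, "poor"),   # pays debts on time, weak worker
]

def credit_only_rule(credit_score):
    # Underfit model: one threshold, one feature, no other context.
    return "good" if credit_score >= 650 else "poor"

misclassified = [name for name, score, actual in applicants
                 if credit_only_rule(score) != actual]
print(misclassified)  # prints ['Ben', 'Cam']
```

The rule rejects a strong candidate and accepts a weak one because it compresses a complex trait (work quality) into a single unrelated trend, which is the essence of underfitting bias.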
Possible Solution to the Bias and Dangers Posed by Big Data and Machine Learning Algorithms in Decision Making
Combination of Subjective and Objective Data Elements
For data to fully represent the people or area for which it is meant to support decisions, it is essential to incorporate a subjective element into data collection. Big data consists mainly of numbers aggregated into data sets and does not include the personal sentiments of the sample (Crawford 3). Combining objective inputs with subjective ones, for example through semi-structured interviews, can add significant depth to the data and improve its alignment with the characteristics of the sample.
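One way this combination could look in practice is pairing each respondent's objective metric with a coded response from their interview. The respondent IDs, metric, and interview codes below are invented for illustration.

```python
# Hypothetical sketch: pairing an objective metric with a coded subjective
# response from a semi-structured interview, per respondent.

objective = {"r1": 42, "r2": 15}   # e.g. weekly hours of app usage (invented)
subjective = {"r1": "feels isolated", "r2": "feels connected"}

# Join the two sources; flag anyone missing an interview rather than
# silently generalizing from numbers alone.
combined = {rid: (objective[rid], subjective.get(rid, "no interview"))
            for rid in objective}

print(combined["r1"])  # prints (42, 'feels isolated')
```

The same usage number carries opposite meanings for the two respondents; only the subjective dimension reveals that, which is the depth the essay argues pure big data lacks.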
Conclusion
Big data and machine learning algorithms have become a common approach to social research. Yet big data analytics is subject to multiple sources of bias, such as misrepresentation, lack of objectivity, and failure to capture the subjective elements of data, which makes it dangerous to rely on the results when allocating public resources or drawing other social inferences. The collection and analytics algorithms used in machine learning absorb the subjective beliefs and values of the people who design them, increasing bias at both stages. Using both subjective and objective elements in data collection, and incorporating the views of many people in the design of machine learning algorithms, can improve objectivity and introduce a personal dimension into big data, significantly reducing bias.
Works Cited
Crawford, Kate. "The Hidden Biases in Big Data." Harvard Business Review, 2013.
O'Neill, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown Publishing Group, 2016.