To determine whether a dataset is valid, the analyst must understand the difference between outliers and invalid data. Curran (2016) explained that invalid data can be traced to a specific cause, whereas outliers arise when the cause of the error cannot be established and calculation shows that it is highly improbable that the figure belongs with the bulk of the data. A data analyst may face various challenges while validating data (Sirota et al., 2011). First, the data may be distributed across multiple databases in the organization; in that case, it may be siloed or outdated, which makes it difficult for an analyst to validate (Lamma et al., 2001). Second, formatting the data can be extremely time-consuming, especially if the analyst has large databases and the validation is done manually.
Various scenarios can lead to invalid data. One is falsification, which is the intentional misrepresentation or manipulation of research materials, equipment, or procedures, or the changing or omitting of data such that the results of the research or data collection are not accurately documented in the research record (Sandford & Handel, 2000). Another cause is errors in transferring data from the source, especially when any form of rekeying is necessary. Invalid data may also result from a failure to follow the stipulated guidelines for collecting, recording, or transferring information (Potter & Levine-Donnerstein, 1999). However, there are ways to pinpoint errors in a dataset. For instance, when analysing the number of individuals in a dataset, you would expect the values to be whole numbers, so a fractional value would indicate invalid data. The most effective procedure for finding such discrepancies is a descriptive analysis.
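The whole-number check and the descriptive analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample values, the column meaning (a headcount per household), and the function names `describe` and `find_invalid_counts` are hypothetical and do not come from any of the cited sources. The rule that negative values are also invalid for a headcount is an added assumption.

```python
import statistics


def describe(values):
    """Return simple descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def find_invalid_counts(values):
    """Return (index, value) pairs that cannot be a count of individuals.

    A headcount should be a whole number; treating negative values as
    invalid too is an assumption made for this sketch.
    """
    invalid = []
    for index, value in enumerate(values):
        is_whole = float(value).is_integer()
        if not is_whole or value < 0:
            invalid.append((index, value))
    return invalid


# Hypothetical sample: number of individuals recorded per household.
household_sizes = [3, 4, 2.5, 5, -1, 4, 2]

summary = describe(household_sizes)
invalid = find_invalid_counts(household_sizes)

print(summary)   # a negative minimum already hints at a problem
print(invalid)   # [(2, 2.5), (4, -1)]
```

The descriptive summary flags that something is wrong (a minimum below zero), while the type check points to the exact records that need to be traced back to their source.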
To confirm the validity of the project, I would examine the testing process and the environment, screen out any outcomes that did not follow the stipulated testing process, and report the inconsistencies identified (Beardsley et al., 1996). I agree with the post that acting on invalid data can have adverse consequences, such as the loss of the organization's resources and wasted time. It is also true that using invalid or incorrect data can worsen the problem that the analyst was trying to solve in the first place (Cook & Wolf, 1996). Moreover, analysing invalid data would produce incorrect findings and misinformed conclusions, resulting in decisions that lack basis.
Hence, I agree with my peers' postings.
References
Cook, J. E., & Wolf, A. L. (1996). Process discovery and validation through event-data analysis (Doctoral dissertation, University of Colorado). Retrieved 13 February 2020, from https://www.researchgate.net/publication/298144668_Using_Process_Mining_to_Bridge_the_Gap_between_BI_and_BPM
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19. doi: 10.1016/j.jesp.2015.07.006
Lamma, E., Manservigi, M., Mello, P., Nanetti, A., Riguzzi, F., & Storari, S. (2001, September). The automatic discovery of alarm rules for the validation of microbiological data. In Proceedings of Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP). Retrieved 13 February 2020, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6473086/
Mounteney, J., Fry, C., McKeganey, N., & Haugland, S. (2010). Challenges of reliability and validity in the identification and monitoring of emerging drug trends. Substance Use & Misuse, 45(1-2), 266-287. doi: 10.3109/10826080903368598
Potter, W. J., & Levine-Donnerstein, D. (1999). Rethinking validity and reliability in content analysis. Retrieved 13 February 2020, from https://www.researchgate.net/profile/Sandra_Richardson2/post/Does_anyone_have_any_advice_on_reliability_and_validity_using_content_analysis_for_a_lone_researcher_as_part_of_PhD/attachment/59d63541c49f478072ea344c/AS%3A273662152773640%401442257549913/download/rethink_validity.pdf
Sandford, I. M. T., & Handel, T. G. (2000). U.S. Patent No. 6,065,119. Washington, DC: U.S. Patent and Trademark Office. Retrieved 13 February 2020, from https://www.commerce.gov/bureaus-and-offices/uspto
Sirota, M., Dudley, J. T., Kim, J., Chiang, A. P., Morgan, A. A., Sweet-Cordero, A., ... & Butte, A. J. (2011). Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Translational Medicine, 3(96), 96ra77. doi: 10.1126/scitranslmed.3003215
Cite this page
Essay Example on Validating Data: Outliers vs. Invalid Data. (2023, Apr 05). Retrieved from https://proessays.net/essays/essay-example-on-validating-data-outliers-vs-invalid-data