Before analyzing any dataset, an analyst must first understand it. After carrying out the assessment, I conclude that running the summary function on a dataset is vital because it gives a general overview of the data (Davis, 1976). The function describes the data in terms of the mean, median, and variance (Friedman et al., 2000), and it also reports the minimum and maximum values. The standard deviation and mode can likewise be established through descriptive analysis (Wigginton and Abecasis, 2005). When comparing one dataset with another, the summary function lets us view background information about the second dataset against which we are analyzing our own data (Balci, 1998). In this case, the summary provides information about the dispersion and central tendency of Apple's and Microsoft's stock prices (Smailovic et al., 2013). With this information, one can convert the raw data into a meaningful representation (Brownlee, 2016).

The summary reports the five-number summary (the minimum, the first quartile, the median, the third quartile, and the maximum) together with the mean. The quartiles describe the variation in the data more fully and make low and high values easy to identify (Ganti et al., 1999). The function is therefore essential, since it also lets one screen for extra or missing data. Besides, the mode and class reported for character strings support their imputation and keep them from being treated as numerical values. The maximum and minimum values of either dataset give the range, and together with the first, second, and third quartiles they describe the shape of the dataset (Kotsiantis et al., 2008). The range shows how widely the data are spread (Groebner et al., 2013). Changes in outliers, such as the addition of a new value, may or may not affect the minimum and maximum values.
The average, on the other hand, indicates the center around which the data fall (Tibshirani et al., 2001). When a dataset is less spread out, the mean and the individual values are almost the same. Outliers, however, easily affect the average of any dataset.
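The statistics named above can be computed by hand to see what a summary function reports. The following is a minimal sketch in Python using only the standard library (Python 3.8 or later), not the summary function used in the essay. The name `summarize`, the choice of the "inclusive" quartile method, and the price values are all assumptions made for illustration.

```python
import statistics


def summarize(values):
    """Return a five-number summary plus mean, range, and spread measures.

    Hypothetical helper: entries equal to None are counted as missing
    and left out of every statistic.
    """
    present = [v for v in values if v is not None]
    # Quartiles via linear interpolation over the observed min..max
    # ("inclusive" method); other quartile conventions give other values.
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(present),
        "mean": statistics.mean(present),
        "range": max(present) - min(present),
        "variance": statistics.variance(present),  # sample variance (n - 1)
        "stdev": statistics.stdev(present),        # sample standard deviation
    }


# Made-up closing prices with one missing entry, not real stock data.
prices = [10, 12, None, 14, 16, 18]
for name, value in summarize(prices).items():
    print(f"{name}: {value}")
```

Running the same helper on two price series, for example one per company, would give the side-by-side view of central tendency and dispersion that the essay describes.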
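The effect of an added value on the mean, the median, and the range can be checked with a small experiment. This is an illustrative sketch under stated assumptions: the helper name `center_and_range` and the price values are hypothetical and do not come from the datasets discussed in the essay.

```python
import statistics


def center_and_range(values):
    """Return (mean, median, range) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        max(values) - min(values),
    )


# Made-up prices, for illustration only.
prices = [10, 12, 14, 16, 18]

baseline = center_and_range(prices)
# A new value inside the existing bounds leaves min, max, and range alone.
with_inlier = center_and_range(prices + [13])
# A new extreme value moves the maximum, the range, and the mean.
with_outlier = center_and_range(prices + [100])

for label, result in [
    ("baseline", baseline),
    ("added 13", with_inlier),
    ("added 100", with_outlier),
]:
    mean, median, value_range = result
    print(f"{label}: mean={mean:.2f} median={median} range={value_range}")
```

In this sketch the added value of 100 roughly doubles the mean while the median moves only from 14 to 15, which is why the median is often reported alongside the mean when outliers are suspected.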
References
Balci, O. (1998). Verification, validation, and testing. Handbook of simulation, 10(8), 335-393. Retrieved from http://www.academia.edu/download/56728205/simulation_handbook.pdf#page=347
Brownlee, J. (2016). Master Machine Learning Algorithms: discover how they work and implement them from scratch. Machine Learning Mastery. Retrieved from https://books.google.com/books?hl=en&lr=&id=n--oDwAAQBAJ&oi=fnd&pg=PP1&dq=Brownlee+(2016)+data+understanding&ots=3jnx04kGwb&sig=oOPQk_9jiR5EUIlxIIJeyX0SPjc
Davis, R. E. (1976). Predictability of sea surface temperature and sea level pressure anomalies over the North Pacific Ocean. Journal of Physical Oceanography, 6(3), 249-266. Retrieved from https://journals.ametsoc.org/doi/pdf/10.1175/1520-0485(1976)006%3C0249%3APOSSTA%3E2.0.CO%3B2
Friedman, N., Linial, M., Nachman, I., & Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7(3-4), 601-620. Retrieved from https://www.ics.uci.edu/~xhx/courses/references/Fridman_BN_JCB.pdf
Ganti, V., Gehrke, J., & Ramakrishnan, R. (1999, August). CACTUS-clustering categorical data using summaries. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 73-83). Retrieved from http://staff.icar.cnr.it/manco/Teaching/2005/datamining/articoli/ganti99cactus.pdf
Groebner, D. F., Shannon, P. W., Fry, P. C., & Smith, K. D. (2013). Business statistics. Pearson Education, UK. Retrieved from http://admin.umt.edu.pk/Media/Site/SBE/SubSites/dqm/FileManager/Courses/Fall14/BBA/Business%20Statistics.docx
Kotsiantis, S., Kostoulas, A., Lykoudis, S., Argiriou, A., & Menagias, K. (2008). Using data mining techniques for estimating minimum, maximum, and average daily temperature values. International Journal of Mathematical, Physical and Engineering Sciences, 1(1), 16-20. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.120.4923&rep=rep1&type=pdf
Smailovic, J., Grcar, M., Lavrac, N., & Znidarsic, M. (2013, July). Predictive sentiment analysis of tweets: A stock market application. In International Workshop on Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data (pp. 77-88). Springer, Berlin, Heidelberg. Retrieved from http://first.ijs.si/FirstShowcase/Content/pub/HCI-KDD-2013.pdf
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423. Retrieved from http://web.stanford.edu/~hastie/Papers/gap.pdf
Wigginton, J. E., & Abecasis, G. R. (2005). PEDSTATS: descriptive statistics, graphics, and quality assessment for gene mapping data. Bioinformatics, 21(16), 3445-3447. Retrieved from https://academic.oup.com/bioinformatics/article/21/16/3445/215339
Cite this page
Essay Example on Analyzing Data: Summary Function Is Vital. (2023, Apr 06). Retrieved from https://proessays.net/essays/essay-example-on-analyzing-data-summary-function-is-vital