Essay Example on Analyzing Data: Summary Function Is Vital

Paper Type: Essay

Pages: 3

Wordcount: 750 Words

Date: 2023-04-06

Anyone analyzing a dataset must first understand it. After carrying out the assessment, I conclude that running the summary function on any dataset is vital because it provides a general overview of the data (Davis, 1976). The function provides detailed information about the data; for instance, it describes the data in terms of the median, mean, and variance (Friedman et al., 2000). The minimum and maximum values of the dataset can also be determined, and the standard deviation and mode can be established through descriptive analysis (Wigginton & Abecasis, 2005). When comparing one dataset with another, the summary function lets us view the background information of the dataset against which we are analyzing our own data (Balci, 1998). In this case, the summary provides information about the dispersion and central tendency of Apple's and Microsoft's stock prices (Smailovic et al., 2013). With this information, one can convert the raw data into a meaningful representation (Brownlee, 2016).

The summary reports the five-number summary (the minimum, first quartile, median, third quartile, and maximum) and also includes the mean. With the quartiles, the variation in the data is better described, and low and high values can be identified (Ganti et al., 1999). The function is thus essential because it lets one screen for extra or missing data. Besides, the mode and class reported for each variable show whether it holds character strings or numerical values, which guides how missing entries should be imputed. The maximum and minimum values of either dataset give the range and, together with the first, second, and third quartiles, describe the shape of the dataset (Kotsiantis et al., 2008). The range shows how widely the data are spread (Groebner et al., 2013). A change in the outliers, such as the addition of a new value, may or may not affect the minimum and maximum values, depending on whether the new value falls outside the existing range.

The average, on the other hand, tells us where the center of the data lies (Tibshirani et al., 2001). When a dataset is less spread out, the mean and the individual values are close to one another. Outliers easily affect the average of any dataset.
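The effect of an outlier on these statistics can be illustrated with a short sketch. The essay does not say which software produced its summary, so the example below uses Python's standard `statistics` module as a stand-in; the `summarize` helper and the price values are hypothetical and are not the Apple or Microsoft data discussed above.

```python
import statistics


def summarize(values):
    """Return the five-number summary plus mean, range, and standard deviation."""
    ordered = sorted(values)
    # Quartiles computed with the "inclusive" method, which treats the
    # data as the whole population of interest (an assumption made here).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": q2,
        "mean": statistics.mean(ordered),
        "q3": q3,
        "max": ordered[-1],
        "range": ordered[-1] - ordered[0],
        "stdev": statistics.stdev(ordered),
    }


# Made-up closing prices, for illustration only (not real stock data).
prices = [150.0, 152.0, 151.0, 149.0, 153.0]
with_outlier = prices + [300.0]

base = summarize(prices)
shifted = summarize(with_outlier)

# Print each statistic before and after the outlier is added.
for name in ("min", "median", "mean", "max", "range"):
    print(f"{name:>6}: {base[name]:8.2f} -> {shifted[name]:8.2f}")
```

With these made-up numbers, adding the single value 300 moves the mean from 151 to roughly 175.83, while the median only moves from 151 to 151.5. The maximum and the range change because 300 lies outside the original range, but the minimum stays at 149.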

References

Balci, O. (1998). Verification, validation, and testing. Handbook of simulation, 10(8), 335-393. Retrieved from http://www.academia.edu/download/56728205/simulation_handbook.pdf#page=347

Brownlee, J. (2016). Master Machine Learning Algorithms: discover how they work and implement them from scratch. Machine Learning Mastery. Retrieved from https://books.google.com/books?hl=en&lr=&id=n--oDwAAQBAJ&oi=fnd&pg=PP1&dq=Brownlee+(2016)+data+understanding&ots=3jnx04kGwb&sig=oOPQk_9jiR5EUIlxIIJeyX0SPjc

Davis, R. E. (1976). Predictability of sea surface temperature and sea level pressure anomalies over the North Pacific Ocean. Journal of Physical Oceanography, 6(3), 249-266. Retrieved from https://journals.ametsoc.org/doi/pdf/10.1175/1520-0485(1976)006%3C0249%3APOSSTA%3E2.0.CO%3B2

Friedman, N., Linial, M., Nachman, I., & Pe'er, D. (2000). Using Bayesian networks to analyze expression data. Journal of computational biology, 7(3-4), 601-620. Retrieved from https://www.ics.uci.edu/~xhx/courses/references/Fridman_BN_JCB.pdf

Ganti, V., Gehrke, J., & Ramakrishnan, R. (1999, August). CACTUS-clustering categorical data using summaries. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 73-83). Retrieved from http://staff.icar.cnr.it/manco/Teaching/2005/datamining/articoli/ganti99cactus.pdf

Groebner, D. F., Shannon, P. W., Fry, P. C., & Smith, K. D. (2013). Business statistics. Pearson Education, UK. Retrieved from http://admin.umt.edu.pk/Media/Site/SBE/SubSites/dqm/FileManager/Courses/Fall14/BBA/Business%20Statistics.docx

Kotsiantis, S., Kostoulas, A., Lykoudis, S., Argiriou, A., & Menagias, K. (2008). Using data mining techniques for estimating minimum, maximum, and average daily temperature values. International Journal of Mathematical, Physical and Engineering Sciences, 1(1), 16-20. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.120.4923&rep=rep1&type=pdf

Smailovic, J., Grcar, M., Lavrac, N., & Znidarsic, M. (2013, July). Predictive sentiment analysis of tweets: A stock market application. In International Workshop on Human-Computer Interaction and Knowledge Discovery in Complex, Unstructured, Big Data (pp. 77-88). Springer, Berlin, Heidelberg. Retrieved from http://first.ijs.si/FirstShowcase/Content/pub/HCI-KDD-2013.pdf

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423. Retrieved from http://web.stanford.edu/~hastie/Papers/gap.pdf

Wigginton, J. E., & Abecasis, G. R. (2005). PEDSTATS: descriptive statistics, graphics, and quality assessment for gene mapping data. Bioinformatics, 21(16), 3445-3447. Retrieved from https://academic.oup.com/bioinformatics/article/21/16/3445/215339

Cite this page

Essay Example on Analyzing Data: Summary Function Is Vital. (2023, Apr 06). Retrieved from https://proessays.net/essays/essay-example-on-analyzing-data-summary-function-is-vital
