Paper Example on Company Transitions to Big Data: Reducing Costs With Hadoop

Paper Type:  Case study
Pages:  7
Wordcount:  1722 Words
Date:  2023-01-29


A real-world example of a company transitioning from an enterprise data warehouse (EDW) to big data arises when the company takes on big data analytics. Offloading rarely used data and ETL workloads is a common activity in this company. Before the transition, the company did not use Hadoop, a platform that offers an economical way to store data and bulk-process large data sets. By adopting it, the company aimed to reduce its data-processing costs.


The company's main service was administration for its clients. It handled large volumes of data that had to be extracted from source systems, transformed, and loaded into the data warehouse (Tian, Zou, Ozcan, Goncalves, & Pirahesh, 2015). In the traditional data warehousing world, the company had clearly defined transformations. Moving to the big data world removes the requirement that data be stored in a structured format.
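The extract-transform-load flow described above can be sketched in plain Python. This is illustrative only: a real pipeline would run on Hadoop-ecosystem tools, and the record fields here are hypothetical, not taken from the source.

```python
# Minimal ETL sketch: extract raw rows from a source system, transform
# them into a structured shape, and load them into a target store
# (a dict stands in for the warehouse table).

def extract(source_rows):
    """Extract: pull raw comma-separated rows from a source system."""
    return [row.split(",") for row in source_rows]

def transform(rows):
    """Transform: cast types and give the fields names."""
    return [{"client_id": int(r[0]), "amount": float(r[1])} for r in rows]

def load(records, warehouse):
    """Load: append the structured records into the warehouse table."""
    warehouse.setdefault("transactions", []).extend(records)
    return warehouse

warehouse = load(transform(extract(["101,250.00", "102,75.50"])), {})
```

In the traditional EDW world these three steps are fixed and schema-bound; the point of the migration is that the "transform into a structured format" step no longer has to happen before storage.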

The main disadvantage of the company's Enterprise Data Warehouse is its lack of scalability with respect to the three defining aspects of big data: volume, variety, and velocity (Joe, 2017). The company has not been able to manage these aspects, so the migration has not yet succeeded. The business generates an enormous quantity of data, and the data it creates and collects continues to grow at an exponential rate.

The growth in data volume became unmanageable, prompting the move to a big data plan. The company is migrating because EDW databases cannot quickly and cost-effectively adapt to the cumulative increase in the volume and velocity of incoming data (Tian, Ozcan, Zou, Goncalves, & Pirahesh, 2016).

Data Flows

Data flows in the company form a lifecycle with three main stages: data ingestion, data integration, and data delivery. During ingestion, the company maps the existing data flow to understand which modifications may be needed. The data is then ingested and stored, with partitioning and incremental updates applied where necessary. Data backups and long-term storage also happen at this stage (Michael, 2019).
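Partitioned ingestion with incremental updates can be sketched as follows; this is a single-process illustration under assumed event fields (`id`, `date`, `value`), not the company's actual schema.

```python
from collections import defaultdict

def ingest(partitions, events):
    """Ingest events into date-based partitions. Re-ingesting an event
    with a known id overwrites the old copy: an incremental update."""
    for event in events:
        day = event["date"]
        partitions[day][event["id"]] = event
    return partitions

partitions = defaultdict(dict)
ingest(partitions, [{"id": 1, "date": "2023-01-01", "value": 10}])
# A corrected version of the same event arrives later and replaces it:
ingest(partitions, [{"id": 1, "date": "2023-01-01", "value": 12}])
```

Partitioning by date also makes the backup and long-term-storage step cheap, since only the newest partitions need to be copied.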

In data integration, the company uses the EDW databases to build data models for its customers. The technique provides a compact relational view of the data, which is then combined into a centralized model (Michael, 2019). This data can then be denormalized, cubed, aggregated, and interpreted according to the needs of specific applications.
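Denormalizing and aggregating a normalized model might look like the following sketch, where the customer and transaction tables are hypothetical stand-ins.

```python
# A normalised pair of tables: a customer lookup and a transaction list.
customers = {1: "Acme", 2: "Globex"}
transactions = [(1, 100.0), (1, 50.0), (2, 75.0)]

# Aggregate: total spend per customer id.
totals = {}
for cust_id, amount in transactions:
    totals[cust_id] = totals.get(cust_id, 0.0) + amount

# Denormalise: join the aggregate back to the customer names so an
# application can consume it without further lookups.
denormalised = [
    {"customer": customers[cid], "total": total}
    for cid, total in sorted(totals.items())
]
```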

Data delivery is also a crucial step in the data flow because, at some point, customers need to surface data to end users. Delivery ensures that accurate information is provided after careful processing, often using full programming languages (Michael, 2019). It also covers data provenance, which may be needed in addition to regular activity and communication logs. These plans are to be integrated with big data analytical tools to enhance existing capabilities.
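The delivery-plus-provenance idea can be sketched in a few lines; the record fields and log shape here are hypothetical, chosen only to show the pattern of logging what was served and from where.

```python
import json

def deliver(record, provenance_log):
    """Serve a record to an end user and append a provenance entry
    recording which record was served and from which source system."""
    provenance_log.append({"served": record["id"], "source": record["source"]})
    return json.dumps({"id": record["id"], "payload": record["payload"]})

provenance = []
response = deliver(
    {"id": 7, "source": "edw", "payload": "report"}, provenance
)
```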

Suitable Platforms

A suitable platform for general big data management is cloud-based infrastructure, since the company is large and can benefit from cloud computing in its big data plans. Cloud-based solutions can spare the enterprise many of the problems that arise from relying on on-premises systems. Cloud infrastructure reduces the cost and complexity of owning and operating networks, and it frees up resources so the company can focus more on innovation and product development.

A suitable vendor platform is the Hadoop distribution supplied by Cloudera (CDH). Hadoop is an open-source framework that supports the processing of very large data sets, which is essential for a company handling large volumes of data. Its computing environment is distributed, making processing efficient. Hadoop is also designed to scale from a single server out to many machines, each providing both computation and storage. It suits this organization in particular because the volume of business data continues to grow.
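Hadoop's processing model, MapReduce, splits bulk work into a map phase and a reduce phase that can run across many machines. A single-process Python sketch of the idea (a real job would be distributed by the framework):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts for each word. Hadoop shuffles the pairs
    across the cluster first; here everything runs in one process."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

counts = reduce_phase(map_phase(["big data", "big plans"]))
```

Because each machine contributes both computation and storage, adding servers grows processing capacity and storage capacity at the same time, which is why the model scales with data volume.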

Addressing Security and Privacy Issues

The company needs a security plan to protect customer data, given the rise in high-profile data breaches and the requirements of the General Data Protection Regulation (GDPR) (Puppala, He, Yu, Chen, Ogunti, & Wong, 2016). The organization must attend to security and data privacy and ensure that its business partners also protect customer data. The company will address security and privacy issues through digital monitoring, which will also help streamline business operations.

Data backups will help the company avoid losing crucial customer information it can use to improve. Privacy is considered a fundamental right, and every internet user should control their personal data and how it is used. Regular backups will be crucial for recovering lost files that may still be essential to the company. The organization also needs secure backup storage to address privacy concerns. More generally, the organization must do a better job of cybersecurity and personal-data protection by enacting security policies and systems that meet the enterprise's objectives.
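One concrete element of secure backup storage is verifying on restore that a copy has not been corrupted or tampered with. A minimal sketch using a SHA-256 checksum (the in-memory `store` dict is a stand-in for real backup storage):

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 digest used to verify a backup copy is intact."""
    return hashlib.sha256(data).hexdigest()

def backup(data: bytes, store: dict, key: str) -> str:
    """Store a copy alongside its checksum for later verification."""
    digest = checksum(data)
    store[key] = (data, digest)
    return digest

def restore(store: dict, key: str) -> bytes:
    """Return the backed-up data, refusing a corrupted copy."""
    data, digest = store[key]
    if checksum(data) != digest:
        raise ValueError("backup corrupted")
    return data

store = {}
backup(b"customer records", store, "daily")
recovered = restore(store, "daily")
```

In practice the backed-up personal data would also be encrypted at rest, which the sketch omits.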

Data Management Policy

Crucial data management policies start with accountability and stewardship: every data set must have a defined data steward, responsible for the integrity, accuracy, and security of its data. Data stewards ensure that all legal, regulatory, and policy requirements are met for the specific data sets they oversee (University of the Sunshine Coast, 2019). Another policy discourages data duplication and encourages data re-use. For integrity, data should be entered only once, and any duplication or additional storage of data requires the approval of the associated data steward. Staff must collaborate to prevent duplicated data from being stored.
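The "enter data only once, duplicate only with steward approval" rule can be expressed as a small guard. This is a hypothetical enforcement sketch, not a policy engine from the source.

```python
def add_record(table, record, steward_approved=False):
    """Enforce the single-entry policy: reject a record whose id already
    exists unless the data steward has approved the duplication."""
    if record["id"] in table and not steward_approved:
        raise ValueError("duplicate record requires steward approval")
    table[record["id"]] = record
    return table

table = {}
add_record(table, {"id": 1, "name": "client A"})

# An unapproved duplicate is rejected...
try:
    add_record(table, {"id": 1, "name": "client A copy"})
    duplicate_stored = True
except ValueError:
    duplicate_stored = False

# ...but a steward-approved update goes through.
add_record(table, {"id": 1, "name": "client A v2"}, steward_approved=True)
```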

The top big data challenges identified by the Cloud Security Alliance include real-time security monitoring, secure storage of data and transaction logs, secure computations in distributed programming frameworks, security best practices for non-relational data stores, granular access control, scalable privacy-preserving data mining, and cryptographically enforced data-centric security (Cloud Security Alliance, 2013). These challenges have hindered organizations from effectively implementing solutions for the secure management of customer data across a range of fields.

Big Data Analytics and Visualization Methods

The main big data analytics and visualization methods to be used include charts, which show the relationships between elements and illustrate components and proportions (Dastjerdi, Gupta, Calheiros, Ghosh, & Buyya, 2016). Charts make it easy to show the patterns the data follows and how it develops. Maps will help describe the distribution of data and position elements on relevant objects and areas; they will be essential for depicting the organization's plans. Diagrams will also be important for demonstrating the relationships and links between data.
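The proportion-showing role of a chart can be illustrated with a toy text bar chart; real charting would be done in a visualization tool, and the category labels here are made up for the example.

```python
def bar_chart(values):
    """Render category counts as text bars, so relative proportions
    are visible at a glance (a stand-in for a real chart)."""
    return {category: "#" * count for category, count in values.items()}

chart = bar_chart({"EDW": 3, "Hadoop": 5})
```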

A data visualization tool that will serve the organization well is Tableau, a platform that helps derive meaning from data (Digiteum, 2018). It will be particularly useful for handling big data sets and revealing the meaning of the patterns they show. Tableau has a user-friendly interface and a rich library of interactive visualizations, which gives it powerful capabilities. It also offers broad options for interpretation, including integration with Hadoop services that, when incorporated, will support professional data analytics.


The purpose of an Enterprise Data Warehouse is envisioned in large enterprises with many departments, such as marketing, products, sales, finance, and administration (Evangelista, 2017). All these departments generate information in various data formats, which can easily become disorganized without a common format. The challenge of managing big data appears when the company starts deploying more computing power to support these processes. This paper has examined the migration plan from Enterprise Data Warehouse (EDW) systems to a big-data-driven enterprise (BDDE).


References

Cloud Security Alliance (2013). CSA Releases the Expanded Top Ten Big Data Security & Privacy Challenges.

Dastjerdi, A. V., Gupta, H., Calheiros, R. N., Ghosh, S. K., & Buyya, R. (2016). Fog computing: Principles, architectures, and applications. In Internet of Things (pp. 61-75). Morgan Kaufmann.

Digiteum (2018). Data Visualization Techniques and Tools.

Evangelista, P. (2017). Information and Communication Technologies: a Key Factor in Freight Transport and Logistics. In Training in Logistics and the Freight Transport Industry (pp. 29-50). Routledge.

Joe, O. (2017). Migrating Enterprise Data Warehouse (EDW) to Big Data.

Michael, F. (2019). Best practices on migration from a data warehouse to a big data platform. MapR Technologies.

Puppala, M., He, T., Yu, X., Chen, S., Ogunti, R., & Wong, S. T. (2016, February). Data security and privacy management in healthcare applications and clinical data warehouse environment. In 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) (pp. 5-8). IEEE.

Tian, Y., Ozcan, F., Zou, T., Goncalves, R., & Pirahesh, H. (2016). Building a hybrid warehouse: efficient joins between data stored in HDFS and enterprise warehouse. ACM Transactions on Database Systems (TODS), 41(4), 21.

Tian, Y., Zou, T., Ozcan, F., Goncalves, R., & Pirahesh, H. (2015, March). Joins for Hybrid Warehouses: Exploiting Massive Parallelism in Hadoop and Enterprise Data Warehouses. In EDBT (pp. 373-384).

University of the Sunshine Coast (2019). Data Management Procedures.
