Big Data: A modern problem

Prakhar Lad
2 min read · Sep 16, 2020

Big data is a field that deals with ways to analyze, systematically extract information from, or otherwise work with data sets that are too large or complex to be handled by traditional data-processing software.

Google is an undisputed champion when it comes to big data. It has developed several open-source tools and techniques that are used extensively across the big data ecosystem. With their help, Google can explore millions of websites and fetch the right answer or piece of information within milliseconds. The first question that comes to mind is: how can Google perform such complex operations so efficiently? The answer is simple: big data analytics.

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This capability allows organizations to replace the traditional Hadoop Distributed File System (HDFS) with Google Cloud Storage. Columnar file formats such as Parquet and ORC may see increased throughput, and customers also benefit from Cloud Storage's directory isolation, lower latency, increased parallelization, and intelligent defaults.
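As a rough illustration, the sketch below shows how a Spark job might read a Parquet file directly from Cloud Storage once the connector is available on the cluster; the bucket, path, and column name are hypothetical, not taken from Google's announcement.

```python
# Minimal PySpark sketch: reading Parquet straight from Google Cloud Storage.
# Assumes the Cloud Storage connector is on the cluster's classpath and that
# credentials are available (e.g. via a service account).
# The bucket, path, and "status" column below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-parquet-example").getOrCreate()

# gs:// URIs are handled by the Cloud Storage connector instead of HDFS.
logs = spark.read.parquet("gs://example-bucket/logs/2020/09/")

# A simple aggregation to show the data is queryable like any other DataFrame.
logs.groupBy("status").count().show()

spark.stop()
```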

Let’s take a look at Apple. How does Apple manage its big data problem? Apple is often on the cutting edge of technological advances, so it probably shouldn’t be a surprise that the company uses big data extensively. That said, it wasn’t always this way. Apple remains highly secretive about how it uses big data in many cases, but that hasn’t prevented some interesting insights from being divulged. By learning how Apple uses big data analytics, other companies can get a better view of how best to apply this incredibly versatile technology.

One area in particular that has received a boost from big data analytics is application design. Applications are the tools many people rely on on their smartphones and tablets, and those tools can collect data on exactly how people actually use them. This is an important shift: in the past, designs were made with the intention of forcing people to use applications a certain way, whereas usage data now lets the design follow real behavior.

Amazon EMR is a managed service that makes it fast, easy, and cost-effective to run Apache Hadoop and Spark to process vast amounts of data. EMR also supports powerful, proven Hadoop tools such as Presto, Hive, Pig, and HBase. With it, you can deploy a fully functional Hadoop cluster, ready to analyze log data, in just a few minutes.
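To give a sense of what "a few minutes" looks like in practice, the boto3 sketch below requests a small EMR cluster with Hadoop, Spark, and Hive installed. The cluster name, region, release label, instance types, and log bucket are assumptions chosen for illustration, not a tested recipe.

```python
# Sketch: launching a small Amazon EMR cluster with boto3.
# Release label, instance types, roles, and the log bucket are assumptions;
# adjust them to your own account before running.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="log-analysis-demo",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}, {"Name": "Hive"}],
    LogUri="s3://example-bucket/emr-logs/",  # hypothetical bucket
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",   # default EMR instance profile
    ServiceRole="EMR_DefaultRole",       # default EMR service role
)

print("Cluster requested:", response["JobFlowId"])
```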

Amazon Web Services uses the open-source Apache Hadoop distributed-computing framework to make it easier for users to access large amounts of computing power for data-intensive tasks. Hadoop, an open-source implementation of Google’s MapReduce, is already used by companies such as Yahoo and Facebook.
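To make the MapReduce idea concrete, here is a minimal word-count mapper and reducer in Python, in the style a Hadoop Streaming job would use. It is an illustrative sketch wired together in-process so it runs on its own; it is not how any particular company deploys MapReduce.

```python
# Minimal word-count MapReduce sketch in the Hadoop Streaming style:
# the mapper emits (word, 1) pairs and the reducer sums counts per word.
# Hadoop would shuffle the pairs between the two phases; here we simply
# sort them so the sketch is self-contained and runnable.
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts for each word; input must be grouped by key."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    text = ["big data is a modern problem", "big data needs big tools"]
    for word, count in reducer(mapper(text)):
        print(word, count)
```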
