36 Big Data
Ms. Vinodini Kapoor
- Learning Outcome:
After completing this module the students will be able to:
- Develop an understanding of the concept of Big Data
- List various Sources of Big Data.
- Discuss how various industries are utilizing Big Data.
- Understand how Big Data is changing Information Systems.
- Understand the benefits of Big Data.
- Introduction
The amount of data in today’s world has been exploding, resulting in what is popularly known as Big Data. Big Data refers to our ability to collect and analyze vast amounts of data. The ability to harness these large volumes of data is transforming our ability to understand the world and everything within it.
The size of the data, along with the underlying purpose of deriving benefit from it, has led to the emergence of a new class of technologies. Organizations are racing to accumulate, store, and analyze data that has high volume, velocity, and variety and that comes from a range of new sources. These may take the form of social media activity, log files, video, text, images, and global positioning system (GPS) data. Such sources exhaust the capabilities of traditional relational database management systems and have given rise to a host of new technologies, approaches, and platforms.
Interestingly, the idea behind the phrase ‘Big Data’ is that everything we do in our lives leaves a digital trace (or data), which we can use and analyze. Advances in capturing and analyzing big data are credited with helping to decode human DNA in minutes, search for cures for cancer, predict human behavior more accurately, foil terrorist attacks, pinpoint marketing efforts, prevent diseases, and much more. The same data can be used constructively or, if it is not secured, misused.
The dominant Big Data technologies in commercial use today are Apache Hadoop and NoSQL databases. Hadoop is a software framework for data-intensive distributed applications. It was inspired by Google’s MapReduce, a programming model in which a job is broken down into numerous small parts that can be processed in parallel.
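The idea of breaking a job into small parts can be illustrated with a word count, the customary introductory MapReduce example. The sketch below is plain Python running on one machine, not Hadoop code; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are chosen here for illustration only.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # On a real cluster each chunk could be mapped on a different node;
    # here the chunks are processed one after another for illustration.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle_phase(pairs))


# Two small "chunks" stand in for blocks of a much larger file.
print(word_count(["big data is big", "data is everywhere"]))
```

Because each chunk is mapped independently, the map work can in principle be spread over many machines, with only the grouped results brought together in the reduce step.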
- The Concept of Big Data
Three major forces drive the interest and growth in Big Data:
1. Enormous growth in the amount of data being generated on the internet.
2. The evolving strategy of firms to collect data from internal and external sources throughout the product and process lifecycle.
3. The phenomenal outreach of social media, mobile applications, and sensor-based technologies as well as the Internet.
All of these forces are generating a flood of data which is increasing in volume, variety and velocity.
Big Data refers both to the type of data being managed and to the technology used to store and process it. Most of these technologies originated at companies such as Google, Amazon, Facebook, and LinkedIn, where they were developed for each company’s own use in order to analyze the massive amounts of social media data they were dealing with.
Big Data is increasingly being defined by the “Three Vs” stated in exhibit 6, which provide a reasonable test of whether a Big Data approach is the right one to adopt for a new area of analysis. The Vs are:
- Volume – This refers to the size of the data. With technology it is limiting to talk about data volume in any absolute sense, because numbers quickly become outdated; volume is better understood in a relative sense. If the data volume is an order of magnitude or more larger than anything previously encountered in your industry, then you are probably dealing with Big Data. For some companies this might mean tens of terabytes; for others, tens of petabytes.
Walmart is estimated to accumulate more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text. An exabyte is 1,000 times that amount, or one billion gigabytes.
- Velocity – This refers to the rate at which data is received and acted upon, for example the time between a broadcast message being received and the user reacting to it. A discount offered to a customer on the basis of location updates or a traffic forecast may be rendered useless if the user has already moved out of the relevant area.
- Variety – There are two aspects of variety: one pertaining to syntax and the other to semantics. Together these determine how easily data can be structured into a relational database and its content exposed for analysis. Modern tools are capable of dealing with data arriving in virtually any format or syntax. However, they are less able to deal with semantically rich data such as free text.
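The storage units mentioned under Volume can be checked with a little arithmetic. The sketch below assumes decimal (SI) units, in which each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are illustrative names, and the 2.5 petabytes per hour figure is the estimate quoted above.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,   # one quadrillion bytes
    "exabyte": 10**18,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two of the units in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


per_hour_pb = 2.5  # estimate quoted in the text, in petabytes per hour

print(convert(per_hour_pb, "petabyte", "terabyte"))      # terabytes per hour
print(convert(per_hour_pb * 24, "petabyte", "exabyte"))  # exabytes per day
print(convert(1, "exabyte", "gigabyte"))                 # gigabytes in one exabyte
```

At the quoted rate, a single day of transactions is already a noticeable fraction of an exabyte, which is why volume is best judged relative to what an industry has handled before.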
- Sources of Big Data
It is important to understand where this large volume of data comes from.
Big data is an aggregation of data generated by machines, people, and organizations. By machine-generated (automated) data we mean data from real-time sensors in industrial machinery, logs that monitor and capture user behavior online, environmental sensors, personal health trackers, and many other sensor data sources.
By people-generated data we mean the vast amount of social media data: profile updates, tweets, check-ins, and photos. By organization-generated data we mean more traditional types of data, including transaction information in databases and structured data often stored in data warehouses.
As more activities move from manual to digital platforms, new sources of information and low-cost equipment combine to create a world in which large amounts of digital information exist on virtually any topic of interest to a business.
Mobile phones, online shopping portals, social networks, and electronic communication all produce torrents of data as a by-product of their ordinary operations.
Over time, data-driven companies have brainstormed about what they should do with the data they collect. Much of it, such as text and web logs, does not fit neatly into a relational database. Big Data offers the promise of unlocking the potential of this data and opens up new avenues for value creation, for example by correlating social network activity with purchase behavior to form a complete profile of every customer.
Big Data consists of huge volumes of structured, unstructured, and semi-structured data from both internal and external sources, from which insight and actionable intelligence are sought.
- Unstructured data – This refers to data that does not conform to a predefined data model: there is no relational model, and it cannot be queried directly with SQL. It is mostly anything that we do not store in a traditional relational database management system. An estimated 80 to 90% of all data in the world is unstructured, and this share is growing rapidly. Unstructured data generated by people includes images, videos, audio, internet searches, and emails.
The cost and time involved in acquiring, storing, retrieving, and processing unstructured data may add up to quite an investment before we can start reaping value from it. It can also be hard to find the tools and people needed to implement such a process.
- Structured data – Each organization has distinct operating practices and business models, which result in a variety of data generation platforms. Organizational big data comes from online e-commerce transactions, government institution websites, banking or stock records, medical records, sensors, and so on.
Almost every event can be potentially digitally stored. Organizations build and apply processes to record and monitor business events of interest, such as registering a customer, manufacturing a product or taking an order. These processes streamline data in a structured format which includes transactions, reference tables, and relationships, as well as the metadata that sets its context.
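The contrast between the two kinds of data can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `orders` table, the customer names, and the review text are all invented for illustration.

```python
import sqlite3

# Structured data: a fixed schema, so SQL can answer questions directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("asha", "laptop", 900.0), ("ravi", "phone", 400.0), ("asha", "mouse", 20.0)],
)
totals = dict(
    conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
)
print(totals)

# Unstructured data: free text with no predefined model. Even a simple
# question ("which products are mentioned?") needs custom processing,
# and this keyword match says nothing about what was said about them.
review = "Asha said the laptop is great but the mouse stopped working."
mentions = [p for p in ("laptop", "phone", "mouse") if p in review.lower()]
print(mentions)
```

The structured query is a single declarative statement, whereas extracting meaning from the review would need far more than keyword matching, which is part of the investment described above.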
Various Sources from which Big Data can be captured are listed below and showcased in exhibit 12.
- Media exists both inside and outside the organization; it may be accessed through APIs (Application Programming Interfaces) and is moderately structured. It includes media files such as images, videos, audio, Flash, live streams, and podcasts.
- Business apps are structured, and using APIs you can pull data from both inside and outside the organization. For example, internally, a CRM or SCM tool may be integrated with an e-commerce system; externally, AccuWeather updates may be used for local personalization.
- Public web data also supports some very useful applications. For example, a business affected by daily fluctuations in currency rates, share prices, or gold rates can pull this information from public sources such as Google Trends. Government, traffic, health care, and other web services are all examples of public web-based data.
- Sensor data follows the quadruple characteristic of high velocity, volume, variety, and value when used correctly to understand user context and predict behavior. Examples of sensor readings include temperature, noise, pollution levels, traffic updates, and biometrics. Further, car sensors, traffic recording devices, office buildings, cell towers, and jet engines provide information that can be analyzed for various technical aspects.
- Machine log data – This refers to data recorded by mobile or third-party services. Event logs, server data, application logs, call logs, mobile location, and mobile app usage are all machine log data that can build a repository of information.
- Social media is high-velocity, high-volume data that can be used to analyze reviews, brand popularity, visitor counts, and customer ratings, and to target campaigns to social accounts that match the email addresses in your customer file. Twitter, Facebook, LinkedIn, blogs, YouTube, and Google+ are some of the many examples.
- Docs can exist inside or outside your organization and, like archived data, do not use APIs. These include xls, pdf, csv, email, Word, ppt, and plain text files.
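As an illustration of how one of these sources becomes usable, the sketch below turns raw machine log lines into structured records and summarizes them. The log layout (`<date> <time> <LEVEL> <service>: <message>`) and the sample lines are hypothetical; real systems use many different formats.

```python
import re
from collections import Counter

# Hypothetical log layout: "<date> <time> <LEVEL> <service>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<service>[\w-]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def count_by_level(lines):
    """Count parsed log records per severity level, skipping malformed lines."""
    records = (parse_line(line) for line in lines)
    return Counter(r["level"] for r in records if r is not None)


sample = [
    "2024-01-15 10:00:01 INFO checkout: order placed",
    "2024-01-15 10:00:02 ERROR payment: card declined",
    "not a log line",
    "2024-01-15 10:00:05 INFO search: query served",
]
print(count_by_level(sample))
```

Once each line is a record with named fields, the same data can be loaded into a database or aggregated further, which is how log files contribute to a repository of information.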
4.1 Who uses Big Data?
Big data affects organizations across every industry. Various applications are highlighted in exhibit 13. Each industry can benefit from this onslaught of information.
- Banking – With large amounts of information pooled in from countless sources, banks are faced with finding new and innovative ways to manage big data. While it is imperative to understand customers and ensure their satisfaction, it is equally important to minimize risk and fraud while maintaining regulatory compliance. Big data requires financial institutions to stay abreast of advanced analytics.
- Education – Academicians armed with data-driven analytics can make a significant impact on school systems, students, and curricula. By analyzing big data, they can assess student performance at various levels and on various parameters, identify students needing more attention, make sure students are making adequate progress, and implement a better system for the evaluation and support of teachers and principals.
- Government – When governments use data analytics, they gain significant ground in managing utilities, running agencies, dealing with traffic congestion, and preventing crime. But while there are many advantages to big data, governments must also address issues of transparency, data security, and privacy.
- Health Care – In health care everything needs to be done quickly and accurately and, in some cases, with enough transparency to satisfy stringent industry regulations. It is imperative that highly accurate and timely information is available to the medical practitioner. When big data is managed effectively, health care providers can create patient profiles and maintain medical histories.
- Manufacturing – Manufacturers focus on enhancing quality and substantially reducing wastage. Such processes are key in today’s highly competitive market. Manufacturers today apply business analytics to implement Lean Six Sigma, just-in-time production, quality control, and other such benchmarks to ensure minimum defects and maximum return.
- Retail – Maintaining customer relationships is of paramount importance to the retail industry, and the best way to manage them is by filtering and segmenting information from big data. Retailers can know customers better by identifying them through the database, maintaining purchase histories, cross-selling, and managing transactions.
The benefit of big data comes from how it is used and how it is analyzed. With management focus and creative analytical ability big data can be helpful for multiple use cases.
- Big Data changing the face of Information Systems
The following are the reasons why Big Data is changing Information Systems and corporate information technology:
1. Move away from traditional RDBMS – Since the start of electronic storage and processing of data, the Relational Database Management System (RDBMS) has been central to most computerized corporate information systems. Information systems such as ERP or CRM are tightly integrated with an RDBMS. NoSQL databases, by contrast, help to run queries that fetch information on a real-time basis.
2. Unstructured data handling capability – The ability to handle data in various formats is an essential capability of any information system. Variety implies that Big Data is not necessarily text or numbers (alphanumeric fields), but also unstructured data.
3. Real-Time Data Processing – Harnessing the power of big data also requires the ability to take action immediately when various events occur. This may mean responding to a query or customer complaint, reacting to a review or a tweet, or handling negative or damaging publicity on social media. In fact, batch processing, nightly or weekly updates, and even near-real-time data processing are not good enough when dealing with data velocity of the kind seen with Big Data.
4. Predictive analytics and in-memory analytics – If data is being generated in a variety of formats (structured and unstructured), in high volume, and at high velocity, the only way it can be used effectively for decision making is through predictive analytics and in-memory data analytics. Information systems in future will have to be designed with this in mind.
5. Most data are either user or machine generated – Most big data is captured from multiple touch points. It may be generated by users or customers (such as social media data) or by machines and sensors outside the confines or firewall of a company. This is unlike the past, when most data were generated within the firewall of a corporation (such as transaction data, inventory data, or factory production data) with very little coming from outside.
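The difference between batch and real-time processing described in point 3 can be sketched with a sliding-window counter held in memory. Each event updates the result as it arrives instead of waiting for a nightly job. The class name `SlidingWindowCounter`, the 60-second window, and the sample timestamps are assumptions made for this illustration, and the sketch assumes events arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count the events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the count inside the current window."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical complaint events, given as seconds since the start of monitoring.
counter = SlidingWindowCounter(window_seconds=60)
counts = [counter.add(t) for t in (0, 10, 20, 70)]
print(counts)
```

A rule such as "alert when the count in the window exceeds a threshold" can then fire within moments of a spike in complaints, which a weekly batch report could not do.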
6. Benefits of capturing Big Data
1. Cost Reductions – Organizations that compete on cost effectiveness adopt big data tools largely on technical and economic criteria. Cost reduction can also be a secondary objective after others have been achieved.
2. Time Reduction from Big Data – Faster computers and high-speed processors have helped reduce the cycle time for complex, large-scale analytical calculations from hours or even days to minutes or seconds.
3. Building Customer Decision Trees – Big data makes it possible for companies to better understand customers’ shopping behavior at each stage of the “consumer purchase cycle.” By analyzing online browsing and searching histories, for example, companies can learn the alternatives customers look at when considering buying a product, the important factors in their final purchasing decision, and how they put together their shopping baskets—information that can help companies identify valuable up-selling and cross-selling opportunities.
Companies can also monitor how customers talk about a product on social media, including why they purchased it, which features they like and dislike, and what would prompt them to purchase it again.
4. Building Customer Satisfaction – A common use case for Big Data is the intelligent use of CRM. Beyond this, it is essential to manage the customer experience. Companies can gauge customer sentiment and adapt service delivery across their channels accordingly to offer the best possible customer experience.
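One simple way to look at how customers put together their shopping baskets, as mentioned under customer decision trees, is to count which products are bought together. The sketch below is a minimal co-occurrence count, not a full market-basket analysis; the `pair_counts` function and the sample baskets are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk"],
]
for pair, count in pair_counts(baskets).most_common(2):
    print(pair, count)
```

Pairs that occur together often are natural candidates for cross-selling suggestions, although a real analysis would also account for how popular each product is on its own.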
- Summary
Big Data has been a game changer in the field of knowledge management and data mining. This revolution has fundamentally changed how information is collected, stored, managed, and consumed, thereby transforming the way we work, live, and play. The use of Big Data is emerging as a crucial way for leading companies to gain a competitive edge and outperform their counterparts. Established firms and new entrants accumulate data or buy databases to pursue data-driven strategies. Their aim is to compete and capture value. Big Data helps to create multiple use cases, avenues for business and growth opportunities, and an entirely new category of companies, such as data aggregators who accumulate and analyze industry data. Big data technologies help to provide accurate analysis, which may lead to more concrete decision making, resulting in greater operational efficiencies, cost reductions, and reduced risks for the business. Large organizations across industries are joining the data economy. They are not keeping traditional analytics and big data separate, but are combining them to form a new synthesis.
Web Resources
- http://aisel.aisnet.org/cais/vol34/iss1/65/
- http://www.sas.com/en_us/insights/big-data/what-is-big-data.html#modal3