Given the bewildering impact of Big Data on businesses of every size and kind, it’s no wonder that the space is complex and morphing before our eyes. In 2012 entrepreneur, adviser, and investor David Feinleib visually represented the companies inhabiting this booming tech space by mapping the space as it looked at the time. In our continuing series of blogs outlining what is terra incognita for many companies (especially small and mid-sized tech innovators), let’s take a broad view of the various sectors of Feinleib’s Big Data landscape, one at a time.
Drastic data sets require drastic technologies. These comprise the application software necessary to process the humongous data sets we call Big Data: the technologies needed to capture, store, analyze, curate, integrate, cleanse, search, share, transfer, visualize, query, update and ensure the privacy of the data. They include the complementary NoSQL, Hadoop, and MPP (Massively Parallel Processing) databases, as well as other important Big Data technologies such as predictive analytics software and hardware solutions, stream analytics software, and much more.
“To say that what can’t be easily measured really doesn’t exist […] is suicide.”
— Daniel Yankelovich, “Corporate Priorities: A continuing study of the new demands on business” (1972)
Analytics, generally speaking, refers to the analysis of internal and external data to obtain valuable, actionable insights that improve decision making. Gartner points out that “analytics has emerged as a catch-all term for a variety of different business intelligence- and application-related initiatives.” In some cases this could be website analytics, or predictive or prescriptive analytics. In others, “it is applying the breadth of BI capabilities to a specific content area (for example, sales, service, supply chain and so on).” The range of integrated applications, processes, software, equipment and other resources that makes up the underlying framework of an organization’s analytics needs is its analytics infrastructure.
Operational infrastructure refers to the underlying framework of an organization’s big-data operations. This framework may include operational technology (the hardware and software that detects or causes a change through the direct monitoring or control of physical devices, processes and events in the enterprise), operational resource management (a method for acquiring a better view into the cost of goods and services to yield enterprise-wide financial control that streamlines the maintenance, repair and operations procurement process and supply-chain control), and operational resilience (a set of techniques that allow people, processes and informational systems to adapt to changing patterns).
Infrastructure as a Service
Infrastructure as a Service (IaaS) “is a form of cloud computing that provides virtualized computing resources over the Internet. IaaS is one of three main categories of cloud computing services, alongside Software as a Service (SaaS) and Platform as a Service (PaaS).” IaaS “is a standardized, highly automated offering, where compute resources, complemented by storage and networking capabilities, are owned and hosted by a service provider and offered to customers on demand.”
“For the most part,” BrightPlanet.com states, “structured data refers to information with a high degree of organization, such that inclusion in a relational database is seamless and readily searchable by simple, straightforward search engine algorithms or other search operations; whereas unstructured data is essentially the opposite.” Structured databases are used to store and process structured data.
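To make the contrast concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The table, the customer names and the free-text note are all invented for illustration; they do not come from BrightPlanet or any real data set. The structured rows can be totaled with a single query, while the unstructured note only supports crude keyword matching.

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration.
orders = [
    ("Alice", "laptop", 1200.0),
    ("Bob", "monitor", 300.0),
    ("Alice", "keyboard", 80.0),
]
note = "Alice called to say the laptop arrived late but the keyboard was fine."

# Structured data fits naturally into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

# One declarative query answers "how much has each customer spent?"
total_by_customer = dict(
    conn.execute("SELECT customer, SUM(price) FROM orders GROUP BY customer")
)
conn.close()

# Unstructured text has no columns to query; without further processing,
# about all we can do is look for keywords.
mentions_laptop = "laptop" in note.lower()

print(total_by_customer)
print(mentions_laptop)
```

The point of the sketch is the asymmetry: the same facts are easy to aggregate once they sit in rows and columns, and hard to get at when they are buried in a sentence.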
Data as a Service
The new kid on the block, a cousin of Software as a Service, is data as a service. DaaS builds upon the concept of data as a product made available, on demand, to customers regardless of geographic or organizational separation. A great benefit of DaaS is the availability of clean, rich data stored by a centralized provider and offered to different systems, applications or users, regardless of where they are, providing agility, cost effectiveness and data quality. A good example of major organizations now wading into the DaaS waters is the collaboration between the American Heart Association and Amazon Web Services to create the AHA Precision Medicine Platform.
Log Data Apps
A great challenge for data analysis in the age of distributed systems and cloud computing is gathering machine log data residing on local hard drives: the records of every event and transaction, every error and usage statistic, that all modern software generates. Clickstream logs, for instance, record which web pages people visit and in which order they visit them. Examples of current log-data apps include Splunk, Sumo Logic, Papertrail, and Loggly, as well as the open-source apps Logstash and Graylog.
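As a rough illustration of what a clickstream log holds, the Python sketch below rebuilds each visitor's page order from a handful of log lines. The "timestamp visitor path" format, the visitor IDs and the `page_paths` helper are assumptions made up for this example; real log formats vary widely, and the tools named above do far more than this.

```python
from collections import defaultdict

# Hypothetical log lines in a made-up "timestamp visitor_id path" format.
LOG_LINES = [
    "2017-03-01T10:00:05 v1 /home",
    "2017-03-01T10:00:09 v2 /home",
    "2017-03-01T10:00:17 v1 /pricing",
    "2017-03-01T10:00:42 v1 /signup",
    "2017-03-01T10:01:03 v2 /blog",
]


def page_paths(lines):
    """Group page views by visitor, in the order they happened."""
    visits = defaultdict(list)
    # Each line starts with an ISO timestamp, so sorting the lines as
    # plain text also sorts them chronologically.
    for line in sorted(lines):
        _timestamp, visitor, path = line.split()
        visits[visitor].append(path)
    return dict(visits)


paths = page_paths(LOG_LINES)
for visitor, pages in paths.items():
    print(visitor, " > ".join(pages))
```

Even this toy version shows why log data is valuable: the order of page views tells you how a visitor moved from the home page toward (or away from) signing up.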
Techopedia explains, “A vertical application is software that is defined and built according to a user’s specific requirements in order to achieve specific functions and processes that are unique to that user. It is usually customized for a target enterprise or organization in order to meet its own special needs. These applications may support the business or organization in different business units like sales, marketing, inventory and overall management, but may not work for another business that [does] not have very similar processes to the one for which it was built. Vertical applications are simply targeted for specific users or a niche, unlike horizontal applications, which are created with a broader audience in mind.”
Examples of vertical apps are enterprise applications like enterprise resource planning (ERP) and customer relationship management (CRM).
As Forbes pointed out a couple of years ago, Big Data has become “Media’s Blockbuster Business Tool.” Media organizations have long collected scads of data, from every song streamed to every minute of video viewed and every web page visited. But most traditional media companies have not excelled at understanding all that data. As companies like Netflix and Amazon move more aggressively to become content creators, Big Data helps them determine what audiences want and when, retain customers, target advertising, and monetize their content. Apps currently available for analyzing advertising and media data include MapR, Qubole,
As defined by Gartner, “Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.” OLAP.com elaborates by saying that “Business Intelligence systems are data-driven Decision Support Systems (DSS). Business Intelligence is sometimes used interchangeably with briefing books, report and query tools and executive information systems.”
Analytics and Visualization
Big Data analytics has been defined by TechTarget as “the process of examining large and varied data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions.”
Data visualization has come a long way from treemaps, charts, and word clouds. But the intention has remained the same: To present data “in a pictorial or graphical format [enabling] decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns” (SAS.com). The Big Data Landscape image at the top of this article is an example of visualizing the Big Data tech space.
The data-visualization-software market is crowded, so you’re likely to come across many “Best of” lists such as this, this and this.
Since Dave Feinleib mapped the landscape in 2012, other observers have contributed updates. A notable example comes from Matt Turck, whose 2016 landscape (“v18 Final”), despite its simpler six-part structure, captures the dramatic diversification of the space over the intervening four years. For instance, you’ll see Amazon Web Services listed in several sectors across three major segments:
- Hadoop in the Cloud
- Cluster Services
- NoSQL Databases
- Cloud EDW
- BI Platforms
- Machine Learning
P.S. Matt Turck asserts that “the term ‘Big Data’ continues to gradually fade away” and that “the froth has indisputably moved to the machine learning and artificial intelligence side of the ecosystem.” In other words, it’s no longer cool to talk about Big Data. It’s time to move on to AI, IoT, and IPOs.
That’s where tekMountain comes in — not just to preserve your cool quotient (although that would be nice too) but, as an important center of tech innovation, to guide you across today’s rugged tech landscape, and connect you to resources, investors and ideas.