An Insider’s Look:
The Ultimate List of Big Data Problems

As the volume of collected raw data continues to grow, so too do the seemingly endless possibilities of what can be done with it once it’s transformed into actionable insight.  With the digital revolution in full swing, big data and its processes, tools and technologies are rapidly becoming just another part of day-to-day business life.  In fact, according to the 2018 NewVantage Partners Big Data Executive Survey, 97.2% of the Fortune 1000 businesses surveyed said they are investing in big data and AI initiatives.

But just because big data is becoming normal and common doesn’t mean that it’s easy.  Only 40.8% of those Fortune 1000 businesses believe that their big data and AI investments have helped their company more effectively position itself for the future.  Furthermore, while 98.6% of respondents said their organization aspires to a data-driven culture, only 32.4% said they have achieved it.

Data is gold in today’s business arena.  It’s the currency of the digital age, and while it’s clearly an asset, it can also become a liability when there are obstacles standing in the way of your ability to properly leverage it.  The road to extracting maximum value from your information resources is often peppered with problems, hitches and hurdles.

There’s no denying that organizations face a number of complex challenges when it comes to leveraging big data optimally.  Gaining a thorough understanding of potential problems associated with big data can help you prepare for how to deal with them if and when they occur, and arm yourself with the solutions you need to overcome them.


So what are those potential problems?
And what can you do to overcome them?
Here’s a list of big data problems that are currently impacting organizations across the globe:


Data Privacy

The volume of information collected from a person can be processed to paint a surprisingly comprehensive picture of who that person is as a consumer.  Where you go, what you read, what you write, who you communicate with, what you watch, what you buy, what websites you visit and more, are all data points that, in the hands of marketers, employers, financial institutions and government, can have a major impact on your life.

In many ways, that impact can be a positive one.  There’s no denying the benefits of the conveniences offered by big data-powered apps and services, but with those benefits comes a risk to privacy.  In order to provide those conveniences and make them most effective, organizations use the sensitive data they collect to tailor their offerings to consumers.  But how much control do consumers have over how their personal information is used?  Moreover, is it being used in ways that make it more vulnerable to exposure?

Additionally, cloud computing, which is heavily utilized to support big data initiatives, can potentially lack the capabilities required to ensure data privacy.  In many cases, a dedicated data protection framework for cloud computing services is needed to implement specific data privacy controls with well-defined responsibilities for both the cloud provider and the user.

As big data processes and technologies continue to mature, grow and become commonplace in all areas of business, concern over data privacy will continue to grow in parallel.

Legislative Response to Privacy Concerns

With over 80 countries and independent territories adopting some form of data protection laws or information privacy laws, legislators across the globe are realizing the importance of big data privacy.

In May of 2018, for example, the European Union (EU) enacted the General Data Protection Regulation (GDPR), which is aimed at giving control to individuals over their personal data and simplifying the regulatory environment for international business.

Data privacy is not as comprehensively regulated in the United States, however.  As of now, there is no comprehensive federal law controlling the collection, storage or use of personal data in the U.S., only sectoral laws covering limited areas.

Data Transparency

Ultimately, the best way for an organization to enable better data privacy practices is to be open about the data that’s been collected.

While there may be a hesitation to allow consumers to see how much of their detailed personal data has been collected, this level of transparency is a valuable way to achieve consumer trust and to build confidence in the decisions being made by utilizing big data.

Organizations that are honest and build trust will also build loyalty within their customer base.  Those that don’t, or that allow the use of their customers’ personal data without consent, are likely to experience customer distrust and a damaged reputation.


Data Security

As the volume of big data continues to increase and the number of connected devices continues to grow, so too does the potential for the information that drives their insight to become the target of a security breach.  Securing big data is an endless concern because its deployments are prize targets for potential attackers.

If successful, these increasingly sophisticated attacks can cause an organization severe financial and reputational damage.  Unauthorized users could gain access and sell critical information, or hold it for ransom, which could lead to hefty fines from regulators, along with loss of customer trust and loyalty.

Because of the seriousness of these threats, it’s important to understand where any gaps and vulnerabilities may lie.  With big data, the only way to truly safeguard your critical information is to enhance your security intelligence by monitoring and analyzing activity across your ecosystem and throughout your IT infrastructure.  Understanding where your data comes from and maximizing your visibility into it is of paramount importance.

Security Challenges

The challenges of securing big data span on-premises big data platforms as well as the cloud.  Here are just a few typical challenges involved with securing big data:

  • Suspicious Activity

    Suspicious activity within your network could come from anywhere.  For example, an employee with appropriate access may perform data mining activities without approval or without notifying his or her superiors.  Whether that stems from simple forgetfulness or from more nefarious motives, security tools need to identify and alert on suspicious activity, whatever the cause and wherever it comes from.

  • Size

    The sheer size of a big data environment is too large for standard IT security tools and processes to manage effectively.  This introduces multiple vulnerabilities across multiple access points, servers, devices and more.  It can also undermine the effectiveness of routine security audits.

  • Tools

    Analytics tools, cloud computing, artificial intelligence and machine learning are just a few newer technologies that are in active development.  While these tools certainly work well with big data and increase the effectiveness of its use, traditional security software and processes may have difficulty appropriately integrating with these tools, or protecting them if necessary.
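In practice, the suspicious-activity monitoring described above often reduces to baselining each user’s normal behavior and flagging deviations from it.  Here’s a minimal sketch of that idea — the log format, user names and threshold are all made up for illustration:

```python
from statistics import mean, stdev

def flag_suspicious(access_log, threshold=3.0):
    """Flag users whose latest daily query count deviates sharply
    from their own historical baseline.

    access_log: {user: [daily query counts]} -- hypothetical format.
    """
    flagged = set()
    for user, counts in access_log.items():
        history, latest = counts[:-1], counts[-1]
        if len(history) < 2:
            continue  # not enough baseline to judge
        mu, sigma = mean(history), stdev(history)
        # Floor sigma so perfectly steady histories don't divide to zero tolerance.
        if latest > mu + threshold * max(sigma, 1.0):
            flagged.add(user)
    return flagged

log = {
    "alice": [10, 12, 11, 9, 500],   # sudden spike -- flagged
    "bob":   [40, 38, 42, 41, 39],   # steady -- fine
}
print(flag_suspicious(log))          # {'alice'}
```

Real security tooling layers far more signal on top of this (time of day, data volume, destination), but the baseline-and-deviate pattern is the same.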

Security Technologies

There are many big data security technologies that can help secure a big data ecosystem.  Most technologies aren’t new, but they’ve been adapted to align with big data requirements.  Here are a few examples:

  • User Access Control

    Robust user access control necessitates a policy-based approach that automates access based on role and/or user settings.

  • Intrusion Detection

    Protects big data platforms from intrusion and, should an intrusion succeed, quarantines it before significant damage is done.

  • Encryption

    Secures massive amounts of user and machine-generated data in flight and at rest while working with various analytics tools, their output data and common storage formats.
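The policy-based, role-driven access control mentioned above can be sketched as a simple role-to-permission mapping — the role and permission names here are purely illustrative:

```python
# Minimal role-based access control sketch; roles and permissions are illustrative.
ROLE_PERMISSIONS = {
    "analyst":  {"read:reports"},
    "engineer": {"read:reports", "read:raw", "write:pipeline"},
    "admin":    {"read:reports", "read:raw", "write:pipeline", "manage:users"},
}

def is_allowed(user_roles, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)

print(is_allowed(["analyst"], "read:raw"))    # False -- analysts can't read raw data
print(is_allowed(["engineer"], "read:raw"))   # True
```

The point of the policy-based approach is that access decisions live in one table that can be audited and changed centrally, rather than being scattered through application code.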

Security Problem Potential

When it comes to protecting against big data security incidents, the stakes are higher than ever.  As more and more organizations adopt digital practices and more people get connected, the more potential there is for security breach incidents large and small.

When a breach occurred in a professional development system at Arkansas State University in 2014, 50,000 people were impacted.  While this is a large number, it can’t compare with the 145 million people whose birth dates, home addresses, email addresses and more were stolen in a data breach at eBay that very same year.

Securing your big data platform from any threats in all directions will serve you and your customers well for many years to come.

Data Quality, Accuracy and Reliability

Big data is not immune to inaccuracies. For example, according to a recent report from Experian Data Quality, 75% of businesses believe their customer contact information is incorrect.

When organizations use big data that houses bad data as part of their strategy to strengthen customer relationships, it can lead to big problems.  From small embarrassments to complete customer dissension, overconfidence in the accuracy of data can lead to:

  • Overall poor business decisions
  • Predicting outcomes that never come to pass
  • Not capitalizing on, or a misunderstanding of, customer purchase trends and habits
  • Moving a customer relationship along at an improper pace
  • Conveying a wrong or misguided message to a customer
  • Decreased customer loyalty and trust that, in turn, leads to customer retention issues and revenue loss
  • Wasted marketing efforts
  • Inaccurately assessing various risks

Not only can big data hold wrong information, it can also contain contradictions and duplicate records.  A database full of inaccurate data can’t provide the precise insight needed to support innovation and growth initiatives.  But given the massive volume of data involved, arriving from so many sources, it would be a bit surprising if big data were 100% accurate 100% of the time.

Reasons for bad data

How does big data wind up in such bad shape?  There are countless possible reasons given that there could be multiple causes in combination that result in a specific error.  While human error, criminal behavior and collection errors stand as examples of general reasons for data errors, here are some more targeted examples:

  • Incorrect conclusions about customer interests
  • Usage of biased sample populations
  • Lack of proper big data governance processes that would identify data inconsistencies
  • Evaluative or leading survey questions that skew true opinion, behavior or belief
  • Usage of outdated or incomplete information
  • Multiple data sources improperly linking data sets
  • Cybercrime activity that alters or corrupts data

So while it’s no secret that big data can be inaccurate, it doesn’t mean that you shouldn’t do whatever you can to control the accuracy and reliability of your data.  Eliminating or minimizing the various ways data inaccuracy festers within your network is key to combating this issue.

While many factors can contribute to the quality, accuracy and reliability of your data, here are a few common problem areas to consider:

Data Silos

A data silo is a warehouse of information under the control of a single department, closed off from outside visibility and isolated from the rest of an organization.  It’s not unlike a farm silo.  We can all see it from the road and we know it’s there, but those without a key have no idea what’s inside.  Instead of grain or corn, however, a data silo houses business-critical information.

The issue with data silos is their isolation.  They store data in disparate units that can’t share information with each other.  There is simply no integration on the back end, and therefore the data you’ve collected can’t provide the meaningful, comprehensive insights that you should gain from it.

Essentially, data silos are catalysts for inefficiency and redundancy that cause resources to be misused and productivity to be reduced.  They’re a breeding ground for inaccurate data that prevent you from seeing the big picture.

What impact do data silos have on your organization?

Basically, data silos produce one of two outcomes: either the same data is stored by multiple teams, or teams store complementary, but separate, data.  Neither situation yields positive results.

There is obviously cost associated with the storage of data, and paying extra to store the same data in multiple areas is not only inefficient, but it also soaks up valuable resources that could be better utilized in other areas of your business.

There’s also risk involved.  There is the possibility that the “same” data collected in two different data silos can vary slightly.  How would you decide which dataset is correct?  Or more appropriately, how would you decide which dataset is the most accurate or up-to-date?  If the wrong one is chosen, you risk relying on insight driven by outdated information.
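One common, if imperfect, heuristic for the “which copy is correct” question is to treat the most recently updated record as authoritative.  A minimal sketch, assuming each record carries a key and a timestamp field (the field names are made up):

```python
from datetime import date

def reconcile(records, key="customer_id", ts="updated"):
    """Keep, per key, only the record with the latest timestamp."""
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[ts] > latest[k][ts]:
            latest[k] = rec
    return list(latest.values())

# The "same" customer, stored slightly differently in two silos:
silo_a = [{"customer_id": 1, "email": "old@example.com", "updated": date(2017, 1, 5)}]
silo_b = [{"customer_id": 1, "email": "new@example.com", "updated": date(2018, 3, 9)}]

merged = reconcile(silo_a + silo_b)
print(merged[0]["email"])   # new@example.com -- the fresher silo wins
```

Last-write-wins is only a starting point; in practice you’d also want lineage metadata so you can tell *why* one copy is trusted over another.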

Data Silos – An overwhelming challenge

In a 2016 survey, F5 Networks, Inc. asked organizations how many applications were in their portfolio.  54% of respondents said they have as many as 200 on their networks, 23% said as many as 500, 15% said as many as 1,000, and 9% said between 1,001 and 3,000.  Forbes, reporting on a separate study by Netskope, noted that the typical enterprise has more than 500 applications in place.

With numbers that staggeringly high, the thought of investigating a data problem by checking each data silo to make sense of the relevant information is overwhelming at best.

In this very real scenario, issue resolution is dreadfully slow not only because each silo must be sifted through, but also because you must determine which fragments of information are relevant to the problem at hand.

How do you solve the data silo problem?

Adding new big data initiatives typically heightens isolation issues, thereby increasing data silos and the problems that come with them.

But adding agnostic big data architecture can enable access to data across your organizational silos and provide comprehensive visibility of that segmented information.  This essentially breaks down the data silos and eliminates their negative impact, while providing you with the ability to effectively leverage all your data investments across any deployment platform or technology stack.

Data Cleanliness

As you know, data isn’t always usable as it’s received.  Preparing it for use, a process known as data cleaning or data cleansing, is normally slow and difficult.

There are some estimates that state poor-quality data costs the U.S. economy up to $3.1 trillion per year. That’s certainly a high number, but not necessarily a surprising one given that weak data quality can lead to incorrect results from big data analytics and can also lead to unwise decision making.  Additionally, it can potentially open businesses up to issues with compliance because the regulatory requirements of some industries require data to be as accurate and current as possible.

Appropriate design and management of processes can help lessen the potential for poor data quality at the front end, but they can’t wipe it out.  The solution is to make bad data usable through the removal or correction of errors and inconsistencies in a dataset.  More specifically, the solution is data cleansing.

The Data Cleansing Challenge

Data cleansing is a tedious, time-consuming task that requires multiple complex steps.  According to a survey by CrowdFlower, data scientists spend nearly 80% of their time preparing and managing data for analysis.

A detailed analysis of the data must be performed to uncover existing data errors or inconsistencies that ultimately need to be resolved.  While this can be done manually, it typically requires the help of analytics tools and programs to streamline the process and make things more efficient.

Depending on the number and type of data sources, part of the data cleansing process may also include:

  • Steps to format the data to gain a consistent structure
  • Transforming bad data into better quality, usable data
  • Evaluation and testing of formatting and transformation definitions and workflows
  • Repetition of analysis, design and verification steps
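The analysis, formatting and transformation steps above can be sketched as a small cleansing pass — here over contact records, with made-up field names and validation rules:

```python
import re

def clean_record(rec):
    """Normalize one raw contact record; return None if it is unusable."""
    email = rec.get("email", "").strip().lower()
    # Drop records whose email fails a basic format check.
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        return None
    return {
        "name": rec.get("name", "").strip().title(),  # consistent casing
        "email": email,                               # consistent structure
    }

raw = [
    {"name": "  jane DOE ", "email": "Jane.Doe@Example.COM"},
    {"name": "broken", "email": "not-an-email"},      # rejected
]
cleaned = [r for r in (clean_record(x) for x in raw) if r]
print(cleaned)   # [{'name': 'Jane Doe', 'email': 'jane.doe@example.com'}]
```

Real cleansing pipelines are, of course, far larger, but they follow the same shape: validate, normalize and either repair or reject each record.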

To minimize the potential for working the same data twice, once the data is cleaned it should be placed back into the original sources to replace its inaccurate, error-ridden counterpart.

To be effective, the process of data cleansing must be repeated each time your data is accessed or anytime values change, making it far from a one-off task.

Best Practices to Clean and Preserve Your Data

While we’ve established that data cleansing is a labor-intensive process, there are some best practices you can use up front to help minimize the workload.  Here are a few to consider:

  • Keep Your Data Updated

    Set standards and policies for updating data and utilize technology to simplify this task, such as the use of parsing tools to scan incoming emails and automatically update contact information.

  • Validate Any Newly Captured Data

    Set organizational standards and policies to verify all new data that is captured before it enters your database.

  • Reliable Data Entry

    Implement policies to ensure all necessary data points are captured at the applicable time and ensure all employees are aware of these standards.

  • Duplicate Data Removal

    Utilize tools to help remove any potential duplicate data generated by data silos or various other data sources.
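The last practice above, duplicate removal, can be as simple as keying records on a normalized identifier.  A sketch with assumed fields:

```python
def dedupe(records, key="email"):
    """Remove duplicates, keeping the first record seen per normalized key."""
    seen, unique = set(), []
    for rec in records:
        k = rec[key].strip().lower()   # normalize before comparing
        if k not in seen:
            seen.add(k)
            unique.append(rec)
    return unique

records = [
    {"email": "a@example.com",  "source": "crm"},
    {"email": "A@Example.com ", "source": "web"},   # same person, different silo
    {"email": "b@example.com",  "source": "crm"},
]
print(len(dedupe(records)))   # 2
```

Production-grade deduplication tools go further with fuzzy matching (typos, name variants), but exact matching on a normalized key already catches the silo-generated duplicates described above.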

Noise Discovery

Every insight potentially has value, but the challenge of finding the right one at the right time within a huge (and growing) lump of data often proves to be quite difficult.

If uncovered, a few bits of information could provide the invaluable business intelligence you need to push past your competitors.  But those bits often get lost amongst the irrelevant information that surrounds them.  Knowing that the information you need to establish dominance within your industry is right at your fingertips, but you’re unable to grab it, can be frustrating and maddening.

Maksim Tsvetovat, author of the book “Social Network Analysis for Startups”, points out that in order to use big data, “There has to be a discernible signal in the noise that you can detect and sometimes, there just isn’t one.  You approach (big data) carefully and behave like a scientist, which means if you fail at your hypothesis, you come up with a few other hypotheses and maybe one of them turns out to be correct.”

Leaning on the expertise of a seasoned data scientist can help you discover the source of the noise within your big data ecosystem more quickly, giving you the chance to gain the actionable insight you need to make better business decisions and capitalize on growth opportunities.

Ethics and Data Discrimination

While there’s no question that big data helps create opportunities for businesses to become better marketers and better providers, the deep knowledge they gain about their customers and prospects also has the potential to lead to harmful discrimination.

Consumer data is now being collected at nearly all points along the purchasing journey.  And while that data is analyzed and assessed in ever greater detail, most consumers accept this because they receive a payout in the form of customized consumer experiences, better services and tailored marketing efforts that are specific to their wants, needs and interests.  But what if all the insight gained from big data were to make it more difficult for someone to, say, secure a loan, because of the story their personal information tells?

To gain an understanding of this potential issue and more, the Federal Trade Commission (FTC) developed and released the report, “Big Data: A Tool for Inclusion or Exclusion?”, which reviews the risks and benefits of utilizing big data for marketing purposes.

Intentional Data Discrimination

The FTC report shows that there are consumer protection laws in place that protect against intentional discrimination and which are applicable to big data analysis, including:

  • Fair Credit Reporting Act

    Ensures that data used with regard to credit, employment, housing, insurance or other benefits is reported with maximum accuracy and provided only for appropriate purposes.

  • Equal Opportunity Laws

    Through the many laws currently in place, these prohibit discrimination based on protected characteristics such as race, gender, age, religion, marital status and more.

  • Federal Trade Commission Act

    Prohibits unfair or deceptive acts or practices that affect commerce and is generally applicable to most companies acting in commerce.

Unintentional Data Discrimination

Additionally, the FTC report identified that if your use of big data results in a marketing campaign that excludes a protected class based on personal data, you could be subject to legal action.  This is known as disparate impact.

Furthermore, it is not necessary for disparate impact to be intentional for a claim to be founded.

Best Practices to Take Advantage of Big Data, But Not Risk a Discrimination Lawsuit

Understanding the laws that could impact big data practices is an advantageous first step.  To take it a step further, here are a few big data best practices that can help ensure you’re targeting your customers without violating any data discrimination or disparate impact rules:

  • Fairness Above Analytical Results

    Err on the side of caution.  Even if you’re unsure whether discrimination issues exist or not, favoring fairness is not only best for consumers, but also for your reputation.

  • Accurate Consumer Representation

    Be sure to have a representative dataset and exhaustive consumer representation when investigating marketing or purchasing trends.

  • Understand the Biases in Your Data

    Be aware that hidden biases exist in collected datasets.  Understand why they exist so the potential sources can be identified and eliminated where applicable.

Ethical Issues of Big Data

Businesses today are entrusted with an enormous amount of personal data.  Ethical concerns come into play if those businesses use that data for monetary gain, or for purposes other than those for which it was initially collected.

While laws and regulations are in place to guide organizations on what is ethically acceptable, the advancements in analytics and in the technology itself have widened the gap between what is possible and what is lawfully permitted.

Additionally, different people may have different opinions about the definition of “ethical use” of collected data.  For instance, a stakeholder of a business and a customer of that business may have different opinions about what the ethical use of that customer’s personal data is.

While traditional ethics principles may not be tailored to the ease and speed of today’s data collection and analytics, there are a few principles that organizations should follow:

  • Data shared with third party companies should have restrictions on how and if it is shared further.
  • People should be provided with a transparent view of how their data is being used.
  • A person’s private data that’s consensually provided to an organization should not be offered up to other organizations for use with any traceability back to their identity.

The People Problem:  Lack of Skill and Organizational Buy-In

The task of finding and retaining skilled data scientists or data analysis experts is daunting, to say the least.  There is a worldwide shortage that continues to grow as the demand for talent greatly exceeds the supply.  One study by McKinsey projected that the U.S. alone may face a 50% to 60% gap between the supply of analytic talent and the demand for it.

This shortage of skill has been a noticeable stumbling block for many organizations seeking to better utilize big data and develop more effective data analysis systems.  So much so that 43% of companies have cited their lack of appropriate analytics skills as a central challenge to their success.  Without talent in place to identify and extract valuable data, the already difficult task of gaining actionable insight becomes even more problematic.

As if it weren’t already difficult enough to find talent, the number of data science and analytics job openings is projected to grow 15% by 2020.  Moreover, the Bureau of Labor Statistics estimates that the number of roles that include this skill set will grow by 30% by 2025.  While university analytics and data science programs are growing, they simply can’t produce enough sufficiently trained graduates to keep pace with the explosive demand for big data talent.  And as more and more organizations explore machine learning and AI adoption, this talent gap becomes even more pressing.

In addition to understanding data from a scientific perspective, in a perfect world, you really need people that understand your business and your customers, along with how your data could best be applied to benefit both.  Additionally, as fast-paced as the IT environment is, you’d also want people well-versed in new technology.

What can be done to alleviate the big data talent gap?

  • Encourage everyone to be a data scientist

    There are loads of training programs available for citizen data scientists to gain data science skills alongside their existing jobs and business experience.  Many schools have even re-configured their data science programs to accommodate students who are employed full time.

    • What is a citizen data scientist?

      Gartner defines this as a ‘person who creates or generates models that leverage predictive or prescriptive analytics, but whose primary job function is outside of the field of statistics and analytics.’

  • Give way to automation

    Gartner estimates that by 2020, over 40% of data science tasks will be automated.  While this could be a mix of expert-level tasks, simple tasks and everything in between, automation could reduce the demand for data scientists while also increasing the supply as the position shifts from implementation to more integration.

  • Better Understanding of Roles

    Big data is still a fairly new practice and the role of data scientist is even newer.  Because of this, job descriptions can differ from organization to organization and are often written by people who are uncertain about what they need, but certain that they need someone to do it.  With muddled expectations, many say they need a data scientist when, in fact, they need an analyst.

Insufficient buy-in

In some cases, big data initiatives fail without ever really having a chance to prove out their worth.  Sure, there are technological challenges that must be overcome, but people can present a challenge too.

According to the 2018 NewVantage Partners Big Data Executive Survey, 48.5% of respondents reported that the greatest challenge to achieving their goal of a data-driven culture is people issues.  Additionally, 64.7% indicated that business adoption of big data initiatives remains a major challenge.  When asked what the biggest challenge is to successful business adoption, respondents identified the top three roadblocks as:

  • Cultural resistance to change: 5%
  • Understanding of data as an asset: 30%
  • Insufficient organizational alignment and business agility: 25%

For organizations to capitalize on the big opportunities offered by big data, things often have to be done differently.  Big data is a colossal change.  Without a clear understanding of what big data is, what its benefits are, what infrastructure is needed and most importantly, without buy-in from top management and then on down the ladder, a big data initiative is doomed for failure.

If personnel resist necessary changes to existing processes as part of a big data adoption, they can easily obstruct the progress of the initiative and cause their organization to waste valuable time and resources.

To ensure an acceptance and understanding of big data, trainings, workshops and presentations are typically good places to start.  These events help to alleviate concerns over big data and provide up-front, detailed information on big data’s day-to-day impact.  Only once organizational buy-in is gained can a big data initiative realize its full potential.

The Next Steps

Big data is an exceptionally valuable resource that, if used properly, inspires true innovation and delivers insights we never thought possible.  But as we continue to tap into this resource, we must consider the risks and problems that come along with it.

Given the trajectory of big data usage, if you’re wondering what a big-data-insight-driven future may look like, it may also be worth considering what future problems or issues could arise. For just that reason, we’ve created a quick reference list compiling 8 Problems Big Data Will Face in the Future. Click the link to receive your free copy and Contact Us if you have any questions.

After considering what the future of big data may hold, the next step is to not only think about how to implement a big data strategy, but how to fast-track that strategy so you can start reaping its benefits as quickly as possible. The key is to roadmap the best route for your unique business needs while targeting high impact areas of your business where your return on investment will come swiftly.

When you’re ready to take the next step to discover how a big data strategy might benefit your organization, check out these 3 Ways to Fast-Track Your Big Data Strategy.
