The Nexus of Big Data: Context

The Swamp: Choose Your Own Adventure



Big data has a big problem. You know that data lake you recently started filling? Its data ecosystem is already well on its way to a catastrophic tipping point. Has a platform vendor convinced you to dive in with visions of a serene data-lake-side cabin in the woods? They probably forgot to mention how many horror stories have the same setting.

Don’t hold a grudge. Vendors were just responding to the market created by a steady stream of people eager to claim the title of Big Data Thought Leader.

Let’s also give those thought leaders the benefit of the doubt. Many of them really believed the data lake would be the answer to the big data problem. After all, the data lake finally made it so the well-intentioned but much-maligned database administrator wasn’t the bottleneck in the ad hoc analytics pipeline. “Analysts, the data is finally yours!” they proclaimed.

Now, the problem: it turns out data lakes quickly turn into data swamps if massive preservation efforts aren’t undertaken from the start. If only you had a data preservationist who could keep the data lake all neat and orderly. By the way, what’s your well-intentioned but much-maligned DBA up to these days?

So, there you have it. You thought you invested in a data lake, but now you have a data swamp. And it just sits there, a primordial soup, waiting to spawn intelligent lifeforms.

Your vendor may eventually show up with a decent solution if you’re willing to wait (and wait, and wait) for evolution to take its course, but even then, who knows what you’ll end up with? Besides, you really need to get something–something useful–out of the swamp now.

Context can do that for you by turning data into information, information into knowledge, and knowledge into action–all informed by the people who know your data best.

To make this possible, context borrowed a few concepts from the human brain. Like the unconscious brain, context accepts data pouring in from countless sources (at MIOsoft, we started with thousands), combines it with existing knowledge derived from past experience, and acts–immediately and efficiently. Like the conscious brain, context remains active and provides an ongoing focus for situations requiring deeper analysis.

But unlike the human brain, context isn’t fooled by illusions of light (team white and gold here), and its memory isn’t impaired by stress. Context doesn’t forget where data came from, or what form it arrived in. It also never forgets what it’s been taught by human experts.

Context provides concurrency out of the box. A context is a group of closely related objects: technically, a graph. Although one context can have a relationship to another, no two contexts share parts, a property called boundedness. Because updates to different contexts never touch the same objects, many contexts can be updated at the same time.
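To make boundedness a little more concrete, here is a minimal Python sketch. It is an illustration only, not MIOsoft's implementation: the dictionary layout, the context names, and the `record_visit` and `update_all` helpers are all assumptions invented for this example. The point is simply that when no object belongs to two contexts, each update can be handed to a separate worker without any coordination between contexts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical model: each context is its own small graph of objects,
# stored here as {object_name: attributes}. No object belongs to two contexts.
contexts = {
    "customer-1": {"person": {"visits": 0}, "account": {"balance": 100}},
    "customer-2": {"person": {"visits": 0}, "account": {"balance": 250}},
}

def record_visit(context):
    """Update one context; only objects inside that context are touched."""
    context["person"]["visits"] += 1
    return context["person"]["visits"]

def update_all(ctxs):
    # Each task receives a different context, so no two tasks ever write
    # to the same object and no cross-context locking is needed.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(record_visit, ctxs.values()))

results = update_all(contexts)
```

If two contexts did share an object, the same code would need a lock around that object, which is exactly the coordination cost that boundedness avoids.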

Context is empowering. Code adds behavior to context. Even inefficient code, when run in context, is still guaranteed to be massively parallelizable. In fact, pairing logical and physical structure means a context can be ready to fly after a single disk seek: speed without expensive hardware or complicated infrastructure design. Now your best people can finally solve business problems again!

Context is inductive. It learns like science–incrementally, based on observation. It uses unsupervised machine learning to discover when two fragments of data are about the same person, or how a group of people are connected to one another. It learns by revising existing beliefs when new information contradicts previous observations.
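For a rough feel of what it means to discover that two fragments of data are about the same person, consider the simplified Python sketch below. It deliberately swaps the unsupervised machine learning described above for a much cruder stand-in: fragments are linked whenever they share an exact email or phone value, and linked fragments are grouped with a union-find structure. The `resolve` function, the field names, and the sample records are all hypothetical.

```python
from collections import defaultdict

def resolve(fragments, keys=("email", "phone")):
    """Group fragment indexes that share any identifying value (toy rule)."""
    parent = list(range(len(fragments)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    seen = {}  # (field, value) -> index of the first fragment carrying it
    for i, frag in enumerate(fragments):
        for key in keys:
            value = frag.get(key)
            if value is None:
                continue
            if (key, value) in seen:
                union(i, seen[(key, value)])
            else:
                seen[(key, value)] = i

    groups = defaultdict(list)
    for i in range(len(fragments)):
        groups[find(i)].append(i)
    return sorted(groups.values())

fragments = [
    {"name": "A. Smith", "email": "a@example.com"},
    {"name": "Alice Smith", "email": "a@example.com", "phone": "555-0100"},
    {"name": "Alice S.", "phone": "555-0100"},
    {"name": "Bob Jones", "email": "b@example.com"},
]
clusters = resolve(fragments)
```

Here the first and third fragments have no field in common, yet they end up in the same cluster because the second fragment connects them. That is the incremental flavor of the idea: each new observation can revise which fragments are believed to belong together.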

Context is prescriptive. It can push knowledge and recommended actions to connected systems without human intervention. It can be given predictive models that allow it to understand how and when to act. And complex decision-making is context’s bread and butter. Because it acts in the moment, context enables smart applications that aim to influence behavior.

Context is operational and analytical. It is always ready to make the best decision, informed by the most up-to-date knowledge at a given moment in time. Yet it also facilitates deep analytics on that same knowledge store. There’s no need to maintain multiple copies of your data.

Context is bigger data.

Big data isn’t new for big enterprise–that’s how you ended up mired in the data swamp, after all–but IoT promises a volume, variety, and velocity of data that is unprecedented. Ten years from now, every business will be thinking about bigger data.

IoT startups are creating sensors that produce data at a rate that will, for example, dwarf all the transactions flowing through today’s credit card systems. Some companies are already thinking about solutions that will need to process a billion events per second. Just look at the data created by the sensors on one self-driving car for a taste of where things are going. Bigger data is the norm in an IoT world.

So why doesn’t every business have context, especially when the innovation-promising collision course of IoT and big data pretty much requires it?

For a start, only a small handful of vendors (caution: shameless self-promotion ahead) currently have the technology portfolio necessary to attack context efficiently. MIOsoft has been operating petabyte-scale systems with all these capabilities since 2009. We’re already where we need to be as IoT accelerates the adoption timeline for these kinds of systems for everyone.

Why are there so few vendors if context systems are going to be so important? It really boils down to a flaw in how people have been thinking about big data solutions. Most of the technological parts needed to create context have long been considered independent IT solutions for separate problems. Different people work on implementing different systems under the assumption they can all eventually be glued together.

The reality is that successful context systems will not be able to afford the design torque inherent in piecemeal solutions: the transaction costs are just too high to create a solution that is both performant and cost-effective at scale.

Though we’re eager for you to have the superpowers that come with a unified data platform, we’re concerned about how some vendors are going to offer those powers to you. Many of them rely exclusively on an in-memory database to be able to provide both analytical and transactional capabilities. Memory is orders of magnitude more expensive than disk. We don’t think superpowers should require super expenses.

And keep in mind that, in the years ahead, data volumes may grow faster than Moore’s Law can make hardware cheaper.

MIOsoft saw this coming–that’s why our context technology was purpose-built to deliver in-memory performance using spinning disks inside commodity hardware clustered using commodity networking. Superpowers can be affordable!

Getting context just right took a lot of time and a lot of work, but it’s what we do. The good news is that because context is ready now, instead of spending the next decade pivoting from one incomplete data swamp solution to the next, you can leap ahead and start using the future’s data technologies today.