Steps to Implementation
Once the right infrastructure and skills are in place, and the business problem to be solved is chosen, it’s time to begin implementing.
Organizations often face a choice between taking the long view and solving short-term problems. Consulting a trusted partner with the expertise to help design a viable solution may help you settle on a good starting point and identify possible immediate wins.
Where and how to store the new data the business will use is a major consideration. On-premises and cloud solutions both have their merits, and a hybrid of the two often accomplishes the goals of speed, security, stability, and scalability.
Likewise, new software platforms may be worthy of consideration. Open source solutions continue to play a major role in analytics clusters, and having a community innovating on a shared platform may spur significant advances for your organization. Apache* Spark* and Hadoop* are two essential open source analytics solutions, and more options are likely to emerge as needs become better defined.
Soon your organization will have more actionable information than it ever imagined. Much of it will be unstructured, and dealing with unstructured data is wholly different from working with traditional information stored in a relational database. Cloud Platform-as-a-Service (PaaS) solutions can help build skills and accelerate progress early on.
Information will end up in what’s often called a data “lake,” because so much of it is gathered in one place. In practice, however, this “lake” is a series of “puddles,” and it’s the job of the organization’s data scientists and architects to connect the “puddles” and make sense of it all. This includes discerning which pieces of information are related enough to examine together, as well as which data sets seem dissimilar but could reveal valuable insights under the lens of analytics. Software-defined storage initiatives may be necessary to aggregate data in a single logical “place” and format it for a variety of analytics engines. Some storage solutions are even capable enough to run analytics engines directly on the storage node itself.
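As a rough illustration of what connecting two “puddles” can mean in practice, the sketch below combines two small data sets through a key they share. Everything in it is hypothetical: the `sales` and `support_tickets` records, the `customer_id` field, and the `connect_puddles` helper are assumptions made for the example, not part of any specific product or platform. A real deployment would more likely do this at scale in an engine such as Spark.

```python
from collections import defaultdict

# Hypothetical "puddles": two small data sets from separate systems.
sales = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
    {"customer_id": 1, "amount": 30.0},
]
support_tickets = [
    {"customer_id": 1, "topic": "billing"},
    {"customer_id": 3, "topic": "login"},
]


def connect_puddles(sales_rows, ticket_rows):
    """Build one per-customer view from both data sets.

    Assumes both data sets carry a shared customer_id field.
    """
    combined = defaultdict(lambda: {"total_spend": 0.0, "ticket_topics": []})
    for row in sales_rows:
        combined[row["customer_id"]]["total_spend"] += row["amount"]
    for row in ticket_rows:
        combined[row["customer_id"]]["ticket_topics"].append(row["topic"])
    return dict(combined)


view = connect_puddles(sales, support_tickets)
for customer_id in sorted(view):
    print(customer_id, view[customer_id])
```

The combined view keeps customers that appear in only one source, which is often where seemingly dissimilar data sets start to show a relationship worth analyzing.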