Understand whether you need structured or unstructured data, or data in voice, text, or photographic formats. Once the size and scope of the required data are clear, start investigating where to find it.
Collecting, annotating, and curating relevant, compliant data can be difficult and costly. Many organizations are investing in their own tools and processes, while others purchase data sets or pre-trained models from third parties.
Training data often comes from archived data reserves and is processed on cloud infrastructure, whereas inference may run on live sensor data at the network edge. This has implications both for the size of the data set used and for the infrastructure required.
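The training/inference split described above can be sketched in a few lines. This is an illustrative toy, not a production pattern: the "model" is just a threshold, and the function and variable names are hypothetical. The point is the access pattern: training scans the whole archive in one batch job, while edge inference scores one live reading at a time.

```python
# Illustrative sketch: batch training on an archived data set versus
# one-at-a-time inference on a live sensor stream. All names are
# hypothetical; a real system would use an ML framework and a message bus.

from statistics import mean

def train_on_archive(archive):
    """Batch phase (cloud-scale): scan the full historical data set
    and fit a trivial threshold 'model' from it."""
    return {"threshold": mean(archive)}

def infer_on_stream(model, readings):
    """Streaming phase (edge): score each incoming reading as it
    arrives, without access to the full archive."""
    for value in readings:
        yield value > model["threshold"]

# Training touches the whole archive at once...
model = train_on_archive([10.0, 12.0, 11.0, 13.0])
# ...while inference sees each live reading exactly once.
alerts = list(infer_on_stream(model, [9.0, 15.0]))
print(alerts)  # [False, True]
```

In practice the batch side favors high-throughput storage and scheduling, while the streaming side favors low-latency, resource-constrained hardware, which is why the two phases often run on different infrastructure.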
The Internet of Things (IoT) is projected to comprise 200 billion devices by 2020, producing an expected 40 zettabytes of data by that time. Developing AI applications that can mine data at this scale will require advanced infrastructure, effective job scheduling, and storage-management technology.