
The Three Main Data Challenges of Machine Learning

by Tech Mainstream Staff


May 6, 2019




According to the Western Digital Blog article "3 Key Data Challenges of Machine Learning," machine learning faces three critical data challenges: quality, sparsity, and integrity.

Quality concerns data from external sources, where there is "no quality control or guarantee on how the original data is captured" and "you need to understand the quality of the data and how to prepare it." Data from experiments and examples must be checked for errors and cleaned before any analysis is conducted.
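As a rough illustration of that preparation step, the sketch below separates records that are ready for analysis from those that need attention. The function name `split_by_quality`, the field names, and the sample readings are all hypothetical; a real pipeline would apply whatever checks suit its own data.

```python
from typing import Any

Record = dict[str, Any]


def split_by_quality(records: list[Record], required: list[str]) -> tuple[list[Record], list[Record]]:
    """Return (clean, rejected); a record is rejected if any required field is missing or empty."""
    clean: list[Record] = []
    rejected: list[Record] = []
    for record in records:
        has_gap = any(record.get(field) in (None, "") for field in required)
        (rejected if has_gap else clean).append(record)
    return clean, rejected


# Hypothetical sample data, invented for this example.
readings = [
    {"sensor": "a", "temp_c": 21.5},
    {"sensor": "", "temp_c": 19.0},   # empty sensor name
    {"sensor": "b"},                  # temp_c missing
]
clean, rejected = split_by_quality(readings, ["sensor", "temp_c"])
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records rather than discarding them makes it possible to review what was filtered out and why.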

Sparsity involves incomplete metadata, especially when data comes from diverse sources that share no standard metadata definition. When data sources are combined, fields often do not correspond. "How do you correlate and filter data" when the same type of data arrives with different metadata fields populated? The answer is "through the metadata disclosing when it was captured. When scientists are doing historical analysis they need metadata in order to be able to adjust their models accordingly."
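One way to picture this is a small normalization step that maps each source's field names onto a shared set, then filters on capture time. The sketch below is only an assumption about how that might look: the alias table, the field names, and both helper functions are hypothetical, and timestamps are assumed to be ISO 8601 strings.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from a shared field name to the names each source might use.
FIELD_ALIASES = {
    "captured_at": ("captured_at", "timestamp", "date_recorded"),
    "device": ("device", "sensor_id"),
}


def normalize_metadata(meta: dict) -> dict:
    """Map source-specific field names onto the shared names; absent fields become None."""
    return {
        shared: next((meta[name] for name in aliases if name in meta), None)
        for shared, aliases in FIELD_ALIASES.items()
    }


def captured_since(records: list[dict], cutoff: datetime) -> list[dict]:
    """Keep records whose capture time is known and not earlier than the cutoff."""
    kept = []
    for record in records:
        stamp: Optional[str] = record.get("captured_at")
        if stamp is not None and datetime.fromisoformat(stamp) >= cutoff:
            kept.append(record)
    return kept
```

Records with no capture time are excluded from the filtered set rather than guessed at, since a model cannot be adjusted for a date that is unknown.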

Integrity is the assurance of data accuracy and consistency:

"The chain of data custody is critical to prove that data is not compromised as it moves through pipelines and locations."

When the capture and ingestion of data are controlled, data veracity is not an issue. Issues arise, however, when one cannot verify that the data was originally recorded as intended, or that the data obtained is the same as when it was originally recorded. Data integrity is therefore contingent on a combination of security technologies and policies, such as HTTPS and encryption. Policy-driven access control helps prevent human error.
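A common building block for checking that data is "the same as when it was originally recorded" is a cryptographic digest taken at capture time and compared again later. The sketch below uses Python's standard `hashlib` and `hmac` modules. The function names and the sample bytes are hypothetical, and the approach is an illustration, not something the source article prescribes.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded_digest: str) -> bool:
    """Compare the data's current digest with the one recorded at capture time."""
    return hmac.compare_digest(fingerprint(data), recorded_digest)


# Hypothetical sample: digest recorded at capture, data checked after transfer.
original = b"sensor=a,temp_c=21.5\n"
digest_at_capture = fingerprint(original)
print(is_unchanged(original, digest_at_capture))
print(is_unchanged(b"sensor=a,temp_c=21.6\n", digest_at_capture))
```

A digest only helps if it is stored or transmitted separately from the data and protected from tampering itself, which is where the access policies and encrypted transport mentioned above come in.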

In summary, organizations and businesses should begin improving their machine learning environments by defining a data collection policy, settling on a metadata format, and applying standard security techniques.

 




 

 


Tech Definitions in the News

Arduino is an open-source electronics platform based on easy-to-use hardware and software. It's intended for anyone making interactive projects. Arduino boards are able to read inputs - light on a sensor, a finger on a button, or a Twitter message - and turn it into an output - activating a motor, turning on an LED, publishing something online.

Source: https://www.arduino.cc/en/Guide/Introduction/

Augmented Reality is an enhanced version of reality where live direct or indirect views of physical real-world environments are augmented with superimposed computer-generated images over a user’s view of the real-world, thus enhancing one’s current perception of reality.

Source: https://www.realitytechnologies.com/augmented-reality/

Chatbot is a piece of software that interacts with users in a conversational way.

Source: https://snatchbot.me/insight/250/intelligent-chatbots


Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

Source: http://deeplearning.net/


DevOps is a set of software development practices that combines software development (Dev) and information technology operations (Ops) to shorten the systems development life cycle while delivering features, fixes, and updates frequently in close alignment with business objectives. Different disciplines collaborate, making quality everyone's job.

Source: https://en.wikipedia.org/wiki/DevOps


Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. Containers allow a developer to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package.

Source: https://opensource.com/resources/what-docker


Edge computing is a distributed computing paradigm which brings computation and data storage closer to the location where it is needed, to improve response times and save bandwidth.

Source: https://en.wikipedia.org/wiki/Edge_computing


FogHorn is an intelligent Internet of Things (IoT) edge solution that delivers data processing and real-time inference where data is created.

Source: https://aws.amazon.com/blogs/architecture/foghorn-edge-to-edge-communication-and-deep-learning/


Hybrid Cloud is a computing environment that combines a public cloud and a private cloud by allowing data and applications to be shared between them.

Source: https://azure.microsoft.com/en-us/overview/what-is-hybrid-cloud-computing/


Kubernetes (k8s) is an open-source system for automating deployment, scaling, and management of containerized applications.

Source: https://kubernetes.io/blog/



WWW2 and WWW3 are hostnames or subdomains, typically used to identify a series of closely related websites within a domain, such as www.example.com, www2.example.com, and www3.example.com; the series may be continued with additional numbers: WWW4, WWW5, WWW6, etc.

Source: https://en.wikipedia.org/wiki/WWW2


Did You Know?

The DuckDuckGo search engine offers six different themes to choose from for its search interface.

