The extraordinary volume of data that is nowadays produced in different contexts and by various platforms and devices (such as social networks and multi-sensor systems) has given rise to the Big Data movement and the data economy. Big Data is one of the main pillars of the information society and a cornerstone for the next generation of smart systems, which act automatically and intelligently to optimize productivity, business processes and managerial decisions. Despite significant advances in our ability to collect, store, manage and process data characterized by the 4Vs (Volume, Velocity, Variety and Veracity), Big Data’s business value still lies in the analytics. Raw data tends to be useless unless it is properly processed and transformed into actual insights for the business. Applications such as disease diagnosis, electricity demand forecasting, prediction of a machine’s end of life and identification of the driving context all rely on processing large volumes of data and deriving knowledge from them.
Big Data Analytics and Knowledge Extraction
There are many different ways and techniques for extracting knowledge from raw Big Data. In most cases, data scientists employ statistics to test knowledge-related hypotheses and machine learning to build high-performance software agents that are able to learn from the data. As part of the data mining and knowledge discovery process, scientists combine statistics and machine learning in a way that integrates theory and heuristics. They also undertake other prerequisite activities for extracting and using knowledge, such as data cleaning and data transformation, as well as visualization of the extracted knowledge.
A data mining and knowledge extraction process may have different targets depending on the business problem at hand. Some of the most common tasks include:
- Classification, which aims at predicting the class of an item (e.g., automatically predicting whether a loan application will be accepted or rejected).
- Clustering, which automatically groups the items of a dataset into distinct clusters (e.g., grouping customers into different market segments).
- Association, which identifies relations between two or more variables in a large dataset (e.g., identifying whether a customer who buys a shirt and a pair of trousers is likely to buy a jacket as well).
- Summarization, which summarizes the properties of a dataset (e.g., automatically summarizing a natural-language document).
- Deviation detection, which is about identifying events, observations or items that do not follow an expected pattern.
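Association rules are commonly scored by their support (how often all items of the rule appear together) and confidence (how often the consequent appears given the antecedent). A minimal sketch for the shirt/trousers/jacket example, assuming a small, made-up list of shopping baskets; the function names and data are hypothetical, and a full rule miner such as Apriori would search for all rules above given thresholds rather than scoring a single one:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical shopping baskets, made up purely for illustration.
baskets = [
    {"shirt", "trousers", "jacket"},
    {"shirt", "trousers"},
    {"shirt", "trousers", "jacket", "tie"},
    {"shirt"},
    {"trousers", "shoes"},
]

# Rule under test: {shirt, trousers} -> {jacket}
rule_support = support(baskets, {"shirt", "trousers", "jacket"})
rule_confidence = confidence(baskets, {"shirt", "trousers"}, {"jacket"})
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With these baskets, two of the three customers who bought a shirt and trousers also bought a jacket, so the sketch prints `support=0.40, confidence=0.67`.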
For each of the above tasks, data mining experts have several tools and techniques at their disposal (e.g., decision trees, Bayesian methods, linear regression, k-means clustering), which need to be appropriately configured and parameterized according to the business requirements at hand. Identifying, validating and ultimately deploying an optimal data mining model involves a series of tasks that are carried out iteratively.
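k-means, one of the techniques listed above, can be illustrated with the customer segmentation example. The following is a minimal pure-Python sketch of Lloyd's algorithm, assuming each customer is described by two hypothetical numeric features; the data and function names are made up for illustration. In practice a library implementation (e.g., scikit-learn's `KMeans`, which uses k-means++ initialization by default) would normally be preferred:

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's k-means: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    # Final labels always match the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, segments = kmeans(customers, k=2)
print(segments)
```

The three low-spend customers end up in one segment and the three high-spend customers in the other; which segment receives label 0 depends on the random initialization. Choosing `k` is itself part of configuring the technique to the business requirements.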
These tasks are carried out as part of a broader data mining process.
The CRISP-DM Data Mining Process
Data mining processes analyze the datasets and evaluate alternative data mining models in order to identify and select the most suitable ones for deployment. The most widely used data mining process is CRISP-DM (Cross-Industry Standard Process for Data Mining), which comprises the following six phases:
- Business Understanding: This initial phase is about understanding the purpose and scope of the data mining process. It identifies the requirements and the ultimate target of the knowledge extraction process, such as what has to be classified or predicted, how fast and with what accuracy. Moreover, this phase creates an initial plan for dealing with the problem given the available datasets, including a list of methods to be explored.
- Data Understanding: This phase focuses on collecting and inspecting the available datasets. It aims at identifying data quality problems while also gaining some insight into the methods that are likely to be effective. These insights are based on the experience of the data scientists, who in most cases are able to identify the main properties of the available datasets simply by reviewing them.
- Data Preparation: In this phase, the datasets are prepared to be used as input to a data mining model. Preparation entails different transformation steps, such as filtering out fields that are not useful, homogenizing data formats, and cleaning the data of empty or incomplete attributes.
- Modeling: This phase focuses on selecting and calibrating the data mining models that will be used for the target problem (e.g., decision trees or linear regression models). It is closely tied to the data preparation activities, given that different models may require differently prepared input datasets.
- Evaluation: During this phase, the candidate models are evaluated in terms of their ability to solve the problem at hand. A successful model should solve the target business problem (e.g., classifying a customer into the proper segment) while also respecting non-functional constraints (e.g., performance). If the business requirements cannot be met, the first phase of CRISP-DM is revisited in order to (re)formulate the business problem.
- Deployment: This is the final phase of the CRISP-DM process. It entails the actual deployment of the successful models. As part of this phase, the developed system needs to be integrated into its operational environment (e.g., with other business information systems), and the extracted knowledge needs to be appropriately visualized.
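The data preparation steps described above (filtering out fields, homogenizing formats, removing incomplete records) can be sketched in plain Python. The record layout, field names and accepted date formats below are hypothetical; dropping incomplete records is only one option, and imputing missing values is a common alternative:

```python
from datetime import datetime

# Hypothetical raw records; field names and formats are assumptions for illustration.
RAW_RECORDS = [
    {"customer_id": "17", "signup": "2023-01-05", "segment": "retail", "debug_flag": "x"},
    {"customer_id": "18", "signup": "05/02/2023", "segment": "", "debug_flag": ""},
    {"customer_id": "19", "signup": "2023/03/10", "segment": "corporate", "debug_flag": "y"},
]

KEEP_FIELDS = ("customer_id", "signup", "segment")    # fields judged useful for the model
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")   # formats assumed to occur in the source


def normalize_date(text):
    """Return the date in ISO 8601 form (YYYY-MM-DD), or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def prepare(records):
    prepared = []
    for record in records:
        # Filter out fields that are not useful.
        row = {field: record.get(field, "").strip() for field in KEEP_FIELDS}
        # Homogenize the date format.
        row["signup"] = normalize_date(row["signup"])
        # Drop records with empty or unparseable attributes.
        if all(row.values()):
            prepared.append(row)
    return prepared


clean = prepare(RAW_RECORDS)
print(clean)
```

Customer 18 is dropped because its segment is empty, and the remaining dates are rewritten in a single ISO format.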
Other Data Mining Methodologies
CRISP-DM is not the only data mining methodology in use. Other popular methodologies include KDD (Knowledge Discovery in Databases) and SEMMA (Sample, Explore, Modify, Model, and Assess). These methodologies comprise slightly different phases and activities compared to CRISP-DM. However, they share similar characteristics with CRISP-DM:
- They are iterative: their phases can be executed repeatedly until results that meet the business requirements are produced.
- They are sector-agnostic: they can be applied to knowledge extraction regardless of the application domain of the business problem at hand.
Moreover, they comprise similar phases. For example, KDD includes selection, pre-processing, transformation, data mining, and interpretation/evaluation, while SEMMA comprises the sampling, exploration, modification, modeling and assessment phases. Although there is no one-to-one mapping between these phases, their names indicate a clear correspondence to the structure and phases of the CRISP-DM data mining process.
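The iterative character shared by CRISP-DM, KDD and SEMMA can be illustrated with a small loop: candidate models are calibrated and evaluated against a business requirement, and if none meets it, the process returns to earlier phases before trying again. In the sketch below, the loan-approval data, the single-threshold "model", the 0.9 accuracy requirement and all function names are hypothetical simplifications:

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical validation data: (applicant income in thousands, loan accepted?)
VALIDATION: List[Tuple[float, bool]] = [
    (20, False), (35, False), (48, True), (52, True), (61, True), (30, False),
]


def make_threshold_model(threshold: float) -> Callable[[float], bool]:
    """A deliberately simple candidate model: accept when income >= threshold."""
    return lambda income: income >= threshold


def accuracy(model: Callable[[float], bool], data: Sequence[Tuple[float, bool]]) -> float:
    """Share of validation examples the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def model_and_evaluate(
    thresholds: Sequence[float], data: Sequence[Tuple[float, bool]], required: float
) -> Optional[Tuple[float, float]]:
    """One modeling/evaluation pass: return the best candidate if it meets the requirement."""
    scored = [(t, accuracy(make_threshold_model(t), data)) for t in thresholds]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= required else None


# Each round stands for one pass through the process; after a failed round the earlier
# phases are revisited and a new set of candidates is produced (hypothetical values).
rounds = [[25, 55], [25, 40, 55]]
for round_number, candidates in enumerate(rounds, start=1):
    result = model_and_evaluate(candidates, VALIDATION, required=0.9)
    if result is not None:
        print(f"round {round_number}: deploy threshold={result[0]} (accuracy {result[1]:.2f})")
        break
    print(f"round {round_number}: requirement not met, revisiting earlier phases")
```

With these made-up numbers, the first round fails the requirement and the second round, which adds a threshold of 40, meets it. A real project would also fold non-functional constraints into the requirement and evaluate on genuinely held-out data.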
In the Big Data era, it is very important to employ experts who have a very good understanding of data mining processes, as the business value of Big Data lies mainly in the analytics. Given the widely discussed talent gap in Big Data experts, it is a good idea to look for reliable and knowledgeable business partners who can help you derive knowledge and maximize the value of your data.