Abstract
Data farming is the process of growing data by applying statistical, predictive, machine learning, and data mining approaches to the available data. Because data collection is costly, data mining projects often reuse existing data collected for other purposes, such as data gathered daily for processing or data required for monitoring and control. Sometimes the available dataset is large or wide and sufficient for knowledge extraction; at other times it is narrow and insufficient to extract meaningful knowledge, or the data may not exist at all. Mining wide datasets has received considerable attention in the literature, and many models and algorithms for data reduction and feature selection have been developed for them. Extracting knowledge from a narrow dataset (partial availability of data), or in the absence of an existing dataset, has not been sufficiently addressed. In this paper we propose a data farming algorithm that farms sufficient data from a small amount of available seed data. J48 classification achieves better accuracy on the farmed data than on the seed data, which indicates that the proposed data farming algorithm is effective.
Original language | English |
---|---|
Title of host publication | Not Known |
Pages | 114-118 |
Publication status | E-pub ahead of print - 23 Aug 2018 |
Event | 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence) - India. Duration: 11 Jan 2018 → 12 Jan 2018 |
Conference
Conference | 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence) |
---|---|
Country/Territory | India |
Period | 11/01/18 → 12/01/18 |
Keywords
- Interactive data exploration and discovery
- Methodologies and Tools
- Data Farming
- J48 Classification
- Cardiac Patient data
- Missing value estimation