Knowledge discovery in databases thesis

Among these are the discovery of associations, groupings, classifications, forecasting rules, classification hierarchies, sequential patterns, and patterns in categorized, segmented and time series, as described in Alvares. Theses that had Brazil as their research topic were extracted.

The total sample was 1, theses bibliographic records, which also included all theses written by Brazilians and defended in France between and We chose to study various tendencies in procedure, created and chosen by applying the Dataview bibliometric software, which will be the object of commentary and analysis in subsequent sections of this study. After the data preparation stage in which Infotrans Version 4. Dataview is based on bibliometric methods whose ultimate objective is to turn data into intelligence for decision-making by creating elements for statistical analysis.

To achieve this, reformatting the data is a basic condition for bibliometric treatment. After statistical analysis, the information retrieved will have a decisive influence on generating knowledge and intelligence, a process in which two aspects are considered: the value and the validity of the information. Both have a decisive influence on the search for knowledge in databases (KDD).

This is the philosophy that must direct any study concerning data mining and the generation of knowledge. When applying Dataview, the importance of the previous phase, the preparation of data (data cleaning) done with Infotrans, became obvious. The quality of the data generated by Infotrans led to clear results from the bibliometric analysis.

In Figure 5 we present the place of Dataview in a bibliometric study. Another important characteristic of the Dataview software relates to the measurement aspect of bibliometry, which rests on numerical bases that are in turn built from occurrences. Thus, for each unit of bibliographic element, occurrence must be dealt with in three ways: (a) primary state - simple location of occurrences, that is, presence or absence of reference elements; (b) condensed state - expansion of these occurrences into frequencies; and (c) co-occurrence, which represents the combination of primary and condensed states.
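The three occurrence states can be illustrated with a small sketch. This is not Dataview's implementation; the records, keywords and function names below are hypothetical and only show what a presence/absence frame, a frequency list and a co-occurrence list look like for a handful of bibliographic records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: each thesis is described by a set of keywords.
records = [
    {"economics", "brazil"},
    {"sociology", "brazil"},
    {"economics", "sociology", "brazil"},
    {"history"},
]


def primary_state(records, vocabulary):
    """Presence/absence frame: one row per record, one 0/1 entry per term."""
    return [[1 if term in rec else 0 for term in vocabulary] for rec in records]


def condensed_state(records):
    """Frequency list: number of records in which each term occurs."""
    return Counter(term for rec in records for term in rec)


def co_occurrence(records):
    """Number of records in which each unordered pair of terms appears together."""
    pairs = Counter()
    for rec in records:
        pairs.update(combinations(sorted(rec), 2))
    return pairs


vocabulary = sorted(set().union(*records))
presence = primary_state(records, vocabulary)
frequency = condensed_state(records)
pairs = co_occurrence(records)
```

Sorting each record before pairing means every pair is stored under a single key, so ("brazil", "economics") and ("economics", "brazil") are counted together.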



In this way, lists (occurrence frequency and co-occurrence) and frames (frameworks of presences and absences) are created (Rostaing). Bradford chose periodicals for his analysis because of their characteristic occurrence of themes and tendencies, and found that few periodicals produce many articles and many periodicals produce few articles. It is sub-divided into Zipf's First Law, which relates to the frequency of words appearing in a text (the number of occurrences of words). It is controlled by the following mathematical expression:
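The expression itself is missing from this copy. Zipf's first law is commonly stated as rank times frequency being roughly constant (r × f ≈ C), and the sketch below assumes that usual form. The function name and the sample sentence are hypothetical; the code builds a rank-frequency table so the product can be inspected on any text.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Toy input for illustration only; the regularity only shows up on large corpora.
table = rank_frequency("the cat and the dog and the bird")
for row in table:
    print(row)
```

On a text this short the product is not stable; the law describes a tendency observed on long texts.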

Zone II - Interesting information: found between Zones I and III and showing both peripheral topics and potentially innovative information. It is here that technology transfers related to new ideas should be considered.

Zone III - Noise: characterized by containing concepts that have not yet emerged, for which it is impossible to say whether they will emerge or remain merely statistical noise.

A third of the total of theses which had Brazil as either the researcher's country of origin or as the topic of research were found in the areas of economics, sociology and technological sciences, closely followed by and 98 theses in the areas of geography and biology respectively, as may be seen in Graph I. This corresponds to Zone I - Trivial information.

As France, together with Germany, has one of the most important and longstanding schools of sociology, it is a favorable location for academic studies in this area, as is shown in Graph I. The same is true of economics, where we also find a strong interest in Latin American topics. These are topics that students researching Brazil look for and that are of constant interest. In the area of technology, France is one of the world leaders in technological development, with an efficient system of technological innovation that justifies its position in the rankings of this field of research.

It should be pointed out that the areas analyzed were those in which the numbers of Brazilians in France grew during the period up to , after which time there was a decline in demand. Zone II - Interesting information, in turn, represents the emerging areas: education, medical sciences, Latin American studies and history, which have been increasing in popularity since Some of the interest in these areas stems from the influence of the new scientific and technological dimension, as in the case of education and medicine, which are constantly affected by new discoveries and technologies that move them forward in the field of human knowledge.

In the case of history, the fact that we live in a period of abrupt transition in this type of society forces us to re-read constantly and to search continually for explanations of new aspects of this society. This performance results fundamentally from the strong influence of French historiography in Brazilian academic life. As a result of the facts mentioned above, we may state that the number of theses supervised by Mauro accounts for the significant number of works noted in the area of history, and also that French historiography has been the main catalyst for the interest of Brazilian historians seeking training abroad.

By analyzing Table 3 we find that between and , only the area of Law achieved high levels of interest, with the greatest concentration relative to all the other areas. This situation is noticeable and may be explained in part by the political circumstances prevailing in Brazil during the s. The coincidence of the high concentration of theses defended in France with the peak of the military dictatorship in Brazil from raised the level of interest in understanding the state of law imposed there, especially in relation to the citizen's basic rights and guarantees.


In the area of linguistics, it will be seen that it peaked between and , with a tendency to recapture interest after By and large, in relation to the number of theses defended during the period in question, we may note that since the number has been falling rapidly, as may be seen in Graph 4. The reason for this is perhaps that since there has been uncertainty about grants for overseas study in the humanities and social studies, which has meant that the area of technology alone does not reach high levels as a whole. It is interesting to note that in the period of relative equilibrium in the curve, which oscillates between 36 and 58 theses defended between and , an average of about 47 theses were defended each year, with the field of economics being especially prominent during this period.

In the field of information sciences, twelve theses were defended between and The golden age was between and , with a total of five theses. Prominent among the supervisors are F. Ballet, followed by J. The other five, each responsible for one thesis, were P. Albert, M. Menou, M. Mouillard, J. Perriault and G. Thibault, the latter, based in Bordeaux, being the only one working outside Paris. Menou has worked in information sciences as an international consultant in Canada, where he has developed several lines of research on the impact of information on development.


With regard to Zone III, the so-called zone of noise: although it has not yet established emerging concepts and is not a very conclusive area, it must be systematically monitored, since the analysis of weak signals can show, or at least allow the inference of, future interests in training and research. Thus we should not dismiss it a priori. In this zone are found art and archaeology, literature, political science, science and technology, philosophy, administration, information science and communication studies, among others. Discovery of knowledge occurred gradually as the data mining process took shape.

In the first stage - defining the problem - it was decided to explore the database related to Brazil both by keyword and by origin of supervisor. The second stage - cleaning the data - brought the first contact with the data, extracting only those records of potential interest for discovering a pattern. In the third stage - carrying out the data mining per se - it was decided to use the Dataview software, which already had statistical rules embedded in its system and the ability to visualize data in order to find knowledge.

The first analyses and findings come from this phase, in line with the aim of the research. In the fourth stage - analysis of data - new associations were created and knowledge emerged. On the academic side, the Brazilian Federal Universities have already started to use data mining in laboratory research and consultancy work with several software packages, among them Clementine SPSS. Although the use of data mining in Brazil is still in its initial phase, there are signs of its application in the governmental and productive sectors.

In the data stream model, data arrive at high speed, and the algorithms that must process them have very strict constraints of space and time. In the first part of this thesis we propose and illustrate a framework for developing algorithms that can adaptively learn from data streams that change over time.


Our methods are based on using change detectors and estimator modules at the right places. We propose an adaptive sliding window algorithm, ADWIN, for detecting change and keeping updated statistics from a data stream, and use it as a black box in place of counters or accumulators in algorithms not initially designed for drifting data.
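The idea of an adaptive window can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it stores every value and checks every split point, whereas the published ADWIN compresses the window into buckets. The class name `SimpleAdwin` is hypothetical, inputs are assumed to lie in [0, 1], and the cut threshold is assumed to be the Hoeffding-style bound from the ADWIN description, with m the harmonic mean of the two sub-window sizes and the confidence divided by the window length.

```python
import math


class SimpleAdwin:
    """Simplified adaptive window: no bucket compression, linear scan per check."""

    def __init__(self, delta=0.01):
        self.delta = delta  # confidence parameter (assumed default)
        self.window = []    # most recent values, oldest first

    def _cut_found(self):
        """True if some split into old/new parts has clearly different means."""
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for i in range(1, n):
            head_sum += self.window[i - 1]
            n0, n1 = i, n - i
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps_cut:
                return True
        return False

    def update(self, value):
        """Add one value; drop old values while a change is detected."""
        self.window.append(value)
        changed = False
        while len(self.window) > 1 and self._cut_found():
            self.window.pop(0)  # forget the oldest observation
            changed = True
        return changed

    @property
    def mean(self):
        """Average of the values currently kept in the window."""
        return sum(self.window) / len(self.window) if self.window else 0.0


# Usage: a stream whose mean jumps from 0 to 1 halfway through.
detector = SimpleAdwin(delta=0.01)
before = [detector.update(0.0) for _ in range(100)]
after = [detector.update(1.0) for _ in range(100)]
print(any(before), any(after), round(detector.mean, 3))
```

Because the window shrinks only when two sub-windows disagree, the estimator keeps a long history while the stream is stable and forgets quickly after a change, which is what lets it stand in for a plain counter or accumulator.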


Since ADWIN has rigorous performance guarantees, this opens the possibility of extending such guarantees to learning and mining algorithms. We test our methodology with several learning methods, such as Naive Bayes, clustering, decision trees and ensemble methods. We build an experimental framework for data stream mining with concept drift, based on the MOA framework, similar to WEKA, so that it will be easy for researchers to run experimental data stream benchmarks.

Trees are connected acyclic graphs and they are studied as link-based structures in many cases.