Title: A new propagation model coupling the offline and online social networks
Speaker: Jiaxing Chen
Modeling information diffusion through both online and offline interactions has become a challenging topic that has attracted much attention in the field of industrial informatics. To this end, we characterize the coupling effect of online and offline communication by developing a multilayer network propagation model that considers three novel aspects: two distinct contagions taking place on two types of social software; the competition and dynamic processes between them under the joint influence of individual selection and information dissemination; and the information coupling between offline and online social networks.
Supervisor: Ying Liu, Qi Zeng
Title: The Gradual Resampling Ensemble for mining imbalanced data streams with concept drift
Speaker: Dan Deng
Knowledge extraction from data streams has received increasing interest in recent years. However, most existing studies assume that the class distribution of data streams is relatively balanced. Reacting to concept drift is more difficult when a data stream is class imbalanced. Current oversampling methods generally absorb previously received minority examples into the current minority set selectively, by evaluating the similarity between past minority examples and the current minority set. However, this similarity evaluation is easily affected by data difficulty factors. Moreover, these oversampling techniques ignore the majority class distribution and thus risk class overlap.
To overcome these issues, we propose an ensemble classifier called the Gradual Resampling Ensemble (GRE), which handles data streams that exhibit both concept drift and class imbalance. On the one hand, a selective resampling method that avoids drifting data is applied to select a subset of previous minority examples for amplifying the current minority set. Disjuncts are discovered by DBSCAN clustering, so the influence of small disjuncts and outliers on the similarity evaluation can be avoided. Only those minority examples with a low probability of overlapping with the current majority set are selected for resampling the current minority set. On the other hand, previous component classifiers are updated using the latest instances, so the ensemble can quickly adapt to a new condition regardless of the type of concept drift. Through the gradual oversampling of previous chunks using the current minority examples, the class distribution of past chunks can be balanced. Favorable results in comparison with other algorithms suggest that GRE can maintain good performance on the minority class without sacrificing majority class performance.
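The selective resampling step described above can be illustrated with a small sketch. It is not the GRE implementation: the abstract does not specify the similarity or overlap measures, so everything here is an assumption made for illustration. The function names (`dbscan`, `select_past_minority`), the parameters (`eps`, `min_pts`, `k`, `max_overlap`), and the overlap estimate (the fraction of current majority examples among the `k` nearest current examples) are all hypothetical. Past minority examples that DBSCAN marks as noise are dropped, and the remainder are kept only if they are unlikely to overlap with the current majority set.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep growing the cluster
    return labels


def select_past_minority(past_minority, cur_minority, cur_majority,
                         eps=0.5, min_pts=3, k=3, max_overlap=0.34):
    """Pick past minority examples that are safe to reuse (assumed criteria)."""
    labels = dbscan(past_minority, eps, min_pts)
    # Current chunk as (point, is_minority) pairs.
    current = [(p, True) for p in cur_minority] + [(p, False) for p in cur_majority]
    selected = []
    for x, label in zip(past_minority, labels):
        if label == -1:
            continue  # skip outliers so they cannot distort the selection
        nearest = sorted(current, key=lambda item: math.dist(x, item[0]))[:k]
        # Assumed overlap estimate: share of majority examples among the k nearest.
        overlap = sum(1 for _, is_min in nearest if not is_min) / len(nearest)
        if overlap <= max_overlap:
            selected.append(x)
    return selected


# Toy data: a past cluster near the current minority region, one outlier,
# and a past cluster that now lies inside the current majority region (drift).
past_minority = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
                 (5, 5),
                 (9, 9), (9.1, 9), (9, 9.1)]
cur_minority = [(0.05, 0.05), (0.2, 0.2), (0, 0.2)]
cur_majority = [(9, 9.05), (9.1, 9.1), (8.9, 9), (10, 10)]

selected = select_past_minority(past_minority, cur_minority, cur_majority)
print(selected)
```

In this toy run only the four points near the origin are kept: the isolated point is filtered as noise, and the cluster around (9, 9) is rejected because its nearest current examples all belong to the majority class. The selected examples would then be added to the current minority set before training a new component classifier.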
Supervisor: Yuan Zhong, Tai Zhang