BNCOD 2010 - the 27th International Information Systems Conference
2010 Special Theme - Data Security & Security Data
Accepted Full Papers
- Oleg Chertov and Dan Tavrov. Providing Group Anonymity Using Wavelet Transform: Providing public access to unprotected digital data can pose a threat of unwanted disclosure of restricted information.
The problem of protecting restricted information can be divided into two main subclasses, namely, individual and group data anonymity. We define group anonymity as the protection of important data patterns, distributions, and collective features that cannot be determined by analyzing individual records alone.
An effective and comparatively simple way of solving the group anonymity problem is to apply a wavelet transform. It is easy to implement, sufficiently powerful, and can give acceptable results when applied correctly.
In the paper, we present a novel method of using the wavelet transform to provide group anonymity, which is achieved by redistributing wavelet approximation values while simultaneously fixing the data mean value and leaving wavelet details unchanged (or proportionally altering them). Moreover, we provide a comprehensive illustrative example.
- Raman Adaikkalavan and Sharma Chakravarthy. Access Control using Active Rules: Access to authorized resources is provided by access control mechanisms. Active or Event-Condition-Action rules, on the other hand, make the underlying systems and applications active by detecting and reacting to changes. In this paper, we show how active rules can be used for enforcing the Role-Based Access Control (RBAC) standard without modifying the functional specification. First, we analyze different components of active rules and their mappings for enforcing the RBAC standard. Second, we discuss how the RBAC standard is enforced using active rules. Finally, we discuss how active rules extend the RBAC standard to cater to a large class of applications.
- Srikumar Krishnamoorthy, Krupa Benhur Gadde, Durga Prasad Muni, Gladbin David, Sunil Kumar, Sudesna Maharathy and Thej Kishore. Complex Schema Match Identification using Genetic Programming Technique: We present SASTRA, a schema matching system that semi-automatically identifies complex schema matches between a target schema and one or more source schemas. Unlike traditional approaches that largely rely on schema structure and element-level information to identify simple matches, SASTRA utilizes a combination of schema and instance-level details to derive complex schema matches. While the proposed system utilizes prior-art techniques to identify simple matches, it introduces the novel use of Genetic Programming (GP) techniques to identify complex schema matches from instance data. Efficient problem-specific heuristic schemes are recommended and implemented to limit the combinatorial search space evaluation in GP. A detailed experimental evaluation is also conducted on a host of synthetic and real-world schemas to demonstrate the usefulness of the system.
- Reza Kalantari and Chris Bryant. Comparing the Performance of Object and Object Relational Database Systems on Objects of Varying Complexity: This is the first published work to compare the performance of object and object relational database systems based on object complexity. The findings of this research show that the performance of object and object relational database systems is related to the complexity of the objects in use. Object relational databases have better performance compared to object databases for fundamental database operations, with the exception of insert operations, on objects with low and medium complexity. For objects with high complexity, the object relational databases have better performance for update and delete operations.
- Zahid Islam. EXPLORE: A Novel Decision Tree Classification Algorithm: Decision tree algorithms such as See5 (or C5) are typically used in data mining for classification and prediction. In this study we propose EXPLORE, a novel decision tree algorithm that is a modification of See5. The modifications are made to improve the capability of a tree in extracting hidden patterns. Justification of the proposed modifications is also presented. We then experimentally compare EXPLORE with existing algorithms such as See5, REPTree and J48 on several factors, including classification accuracy, quality of extracted rules/patterns and simplicity of the tree. Our initial experimental results indicate some advantages of EXPLORE over existing algorithms.
- Greg Hamerly and Greg Speegle. Efficient Model Selection for Large-Scale Nearest-Neighbor Data Mining: One of the most widely used models for large-scale data mining is the k-nearest neighbor (k-nn) algorithm. It can be used for classification, regression, density estimation, and information retrieval. To use k-nn, a practitioner must first choose k, usually selecting the k with the minimal loss estimated by cross-validation. In this work, we begin with an existing but little-studied method that greatly accelerates the cross-validation process for selecting k from a range of user-provided possibilities. The result is that a much larger range of k values may be examined more quickly. Next, we extend this algorithm with an additional optimization to provide improved performance for locally linear regression problems. We also show how this method can be applied to automatically select the range of k values when the user has no a priori knowledge of appropriate bounds. Furthermore, we apply statistical methods to reduce the number of examples examined while still finding a likely best k, greatly improving performance for large data sets. Finally, we present both analytical and experimental results that demonstrate these benefits.
- Uday Kiran and Krishna Reddy Polepalli. A Pattern Growth Approach to Mine Rare Association Rules Using Maximum Items' Support Constraints: A rare association rule is an association rule (or rule) consisting of rare items. It is difficult to mine rare association rules using a single minimum support (minsup) framework because of the dilemma called the "rare item problem". That is, at high minsup, rules consisting of rare items will be missed, and at low minsup, the number of rules explodes. Hence, efforts have been made to find rules using a multiple minsup framework. In this framework, a model called the Maximum Constraint Model has been discussed for discovering rare association rules efficiently. At present, this model is used with an Apriori-like approach, which leads to performance problems. Also, the characteristic nature of this model has yet to be explored. In this paper, we explore the characteristic nature of this model and extend the FP-growth approach to use the Maximum Constraint Model to discover rare association rules. The motivation for using FP-growth is that it is more efficient than Apriori. Experimental results show that the proposed approach is efficient.
- Sumit Kumar Bose and Rashi Malviya. A Memory Efficient Bottom-up Query Processing Scheme for XML Twigs with Arbitrary Boolean-Predicates: Existing query processing schemes for XML twigs are not generic enough to handle all types of Boolean predicates. Additionally, such query processing schemes can be inefficient as they require large intermediate storage. To address this challenge we propose a hierarchical twig encoding scheme using a structure called L-blocks. We show that a combination of L-blocks with the post-order tree traversal scheme is powerful and generic enough to evaluate twigs involving all types of Boolean operations in a holistic manner. The proposed approach helps in reducing storage requirements. The initial results of our approach are extremely promising.
- Gabriele Pozzani and Esteban Zimanyi. Defining Spatio-Temporal Granularities for Raster Data: The notion of granularity is used in several areas of computing. In the temporal research community, granularity relates to the fact that the time frame associated with an event of interest (e.g., an accident) can be envisaged at several levels of detail (e.g., hour, day, month, etc.). Similarly, granularity in data warehousing is the level of detail at which facts (e.g., sales) are captured in dimensions (e.g., product, store, and day), and hierarchies are used for examining these data at different abstraction levels (e.g., total sales in a province during March 2009). However, there is no commonly agreed definition of spatial or spatio-temporal granularities. Sometimes, the term spatial granularity is confused with multiple resolutions. Further, the few proposals about them are mainly focused on vector data models. The raster model is an alternative representation to the vector one, used, for example, in environmental information systems. In this paper, we extend the approach already proposed for vector-based granularities and define spatial and spatio-temporal granularities for raster data models. In our framework, relations and operations between spatial and spatio-temporal granularities are defined.
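The "rare item problem" mentioned in the Kiran and Polepalli abstract above can be illustrated with a minimal sketch. It is not the authors' method: it uses brute-force enumeration rather than FP-growth or the Maximum Constraint Model, and the transactions, item names, and function names are hypothetical, chosen only to show how a single minsup threshold either misses the rare item or admits many more itemsets.

```python
from itertools import combinations

# Hypothetical toy transactions; "caviar" plays the role of the rare item.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"caviar", "butter"},
    {"bread", "milk"},
    {"caviar", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, minsup, max_size=2):
    """Brute-force enumeration of itemsets whose support is at least minsup."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(set(combo), transactions)
            if s >= minsup:
                result[combo] = s
    return result


# A high threshold drops every itemset containing the rare item;
# a low threshold keeps it but also admits more itemsets overall.
high = frequent_itemsets(TRANSACTIONS, minsup=0.5)
low = frequent_itemsets(TRANSACTIONS, minsup=0.2)
print(len(high), len(low))
```

On this toy data the itemset count doubles when minsup is lowered far enough to keep the rare item; on realistic data the growth is far steeper, which is the motivation the abstract gives for multiple-minsup frameworks.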
Accepted Short Poster Papers
- Raman Adaikkalavan. Load Shedding in Data Stream Management Systems using Application Semantics: Data Stream Management Systems (DSMSs) process highly bursty streams in real time and are used in diverse application domains. Satisfying Quality of Service (QoS) requirements and providing accurate results are critical to the success of DSMSs and the applications that use them. In order to maintain QoS, various approaches have been proposed in the literature, including capacity planning, scheduling strategies, load shedding, and others. Existing load shedding approaches drop tuples either randomly or based on the characteristics of data or continuous queries. On the other hand, utilizing application characteristics for dropping tuples would increase the accuracy of the results while maintaining QoS. In this paper we introduce load shedding schemes that are based on application semantics. The techniques presented in this paper complement existing load shedding approaches.
- Klaus Haller. Test Data Provisioning for Database-driven Applications: Most of today's business applications rely on database management systems (Database-Driven Applications or DBAPs). Testing DBAPs requires not only knowing the functions to be invoked, the input values, and the expected results; testers also need an initial database state (database test data). Popular methods for obtaining database test data include generators for synthetic test data, manual test data design, and live system snapshots. Our first goal is to make the test data of these methods comparable; therefore, we introduce our concept of test data compliance levels. We analyze the major impact factors driving the process of choosing a database test data provisioning method in commercial projects. Furthermore, we provide qualitative guidelines linking impact factors and methods.
- Jose Luis Navarro Galindo and Jose Samos Jimenez. Flexible Range Semantic Annotations Based on RDFa: In this paper, a new flexible range markup technique for Web documents is presented as an alternative to XPointer technology, based on the RDFa standard. The principal objective is to define semantic annotations that support the evolution of annotated Web documents more effectively than alternative techniques. The term Flexible Range indicates that annotations can be defined over different ranges of text within a Web page and multimedia objects within it, independently of its HTML tags.
- Jianing Wang. A Quality Framework for Data Integration: In the context of heterogeneous Data Integration (DI), distributed information conforming to different data models can be accessed through an integrated schema using mappings between this schema and the data sources. Although many DI tools have been designed to assist integrators in DI tasks (semi-)automatically, e.g. similarity matching, mapping generation, etc., the quality of the generated DI solutions is still difficult to determine and control. One reason is the limitations of current DI tools in reflecting user requirements that can affect the quality of the DI solutions in aspects such as correctness, accuracy and performance. In this paper, we aim to determine and improve DI quality by presenting an extendable quality framework focusing on the user's requirements, based on emerging ontology techniques extended with quality criteria and factors defined specifically in the DI context. Some of the associated measurement methods using metadata and knowledge extracted from the DI setting are also introduced in this paper. We also outline how, by using ontology reasoning capabilities, the framework reports on the extent to which the user's quality requirements are met by the DI solutions.
- Nafees-Ur-Rehman and Marc H. Scholl. Enabling Decision Tree Classification in Database Systems through pre-computations: Integration of data mining techniques in database systems is an open research topic. The DBMS's power of dealing with large amounts of data and maintaining data integrity adds to the motivation for integrating it with data mining. Improvements in database systems and in data mining go in parallel. We propose a method to integrate decision tree classification that performs the required pre-computations and stores the results in database objects for later use. These values are updated when new data is introduced or existing data changes. Decision tree classification can readily make use of these pre-computed values to build classification models. Our approach is based on a column database, which it uses effectively for feature-oriented calculations. This improves performance when classification is performed on high-dimensional data.
- Yi Ou and Theo Harder. Issues of Flash-Aware Buffer Management for Database Systems: Classical buffer replacement policies, e.g., LRU, are suboptimal for database systems having flash disks for persistence, because they are not aware of the distinctive characteristics of those storage devices. We present CFDC (Clean-First Dirty-Clustered), a flash-aware buffer management algorithm, which emphasizes that clean buffer pages are first considered for replacement and that modified buffer pages are clustered for better spatial locality of page flushes. Our algorithm is complementary to and can be integrated with conventional replacement policies. Our DBMS-based performance studies using both synthetic and real-life OLTP traces reveal that CFDC significantly outperforms previous proposals with a performance gain up to 53%.
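The "clean pages first" idea in the Ou and Harder abstract above can be illustrated with a rough sketch. It is not CFDC itself: the class and method names are hypothetical, the policy is a plain LRU list with a clean-page preference, and the clustering of dirty pages for flushing that the abstract describes is omitted entirely.

```python
from collections import OrderedDict


class CleanFirstBuffer:
    """Toy buffer pool: LRU order, but clean pages are evicted before dirty ones."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> dirty flag, least recently used first
        self.flushed = []           # ids of dirty pages written back, in flush order

    def access(self, page_id, write=False):
        """Read or write a page, evicting another page if the pool is full."""
        if page_id in self.pages:
            # A page stays dirty once written, until it is flushed on eviction.
            dirty = self.pages.pop(page_id) or write
        else:
            if len(self.pages) >= self.capacity:
                self._evict()
            dirty = write
        self.pages[page_id] = dirty  # re-insert as most recently used

    def _evict(self):
        # Prefer the least recently used clean page: dropping it costs no write.
        for page_id, dirty in self.pages.items():
            if not dirty:
                del self.pages[page_id]
                return
        # No clean page is left, so flush the least recently used dirty page.
        page_id, _ = self.pages.popitem(last=False)
        self.flushed.append(page_id)


buf = CleanFirstBuffer(capacity=3)
buf.access(1, write=True)  # page 1 becomes dirty
buf.access(2)
buf.access(3)
buf.access(4)              # pool is full: clean page 2 is evicted, not dirty page 1
print(list(buf.pages), buf.flushed)
```

In the usage above, plain LRU would evict page 1 and pay for a write; the clean-first preference evicts page 2 instead, so no flush occurs.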
Sponsors