Enrico Gallinucci

Junior assistant professor (fixed-term)

Department of Computer Science and Engineering

Academic discipline: ING-INF/05 Information Processing Systems


Keywords: Big Data, NoSQL Databases, Social Data Analysis, Trajectory Data Analysis, Precision Agriculture, Business Intelligence, Machine Learning, OLAP Analysis, Data Warehouse, Semantic Web

Big Data: the term refers to datasets so large and complex (especially in terms of volume, variety, and velocity) that they require innovative techniques and methodologies to manage data storage and analysis. A big data platform is built upon clusters of common workstations, where the complexity of the distributed system is managed at the software level by frameworks that deal with data distribution and computation (e.g., Apache Hadoop and related tools). In this context, the research activity focuses primarily on the development of smart approaches to support the data scientist in the identification, management, and analysis of data. The goal is to implement a data platform capable of recognizing and reconciling data within the data lake, offering the data scientist a level of abstraction to support the governance and querying of the data of interest. Another research topic consists in the design of analysis and data mining algorithms that exploit the parallelism offered by the platform to process big data in a scalable way; this topic is addressed in particular in the analysis of trajectory data (i.e., trajectory mining).

NoSQL databases: the inability of relational DBMSs to scale easily in a big data context has led to the spread of a new generation of databases, which are based on non-relational data models and are built from the ground up for distributed architectures. One of the main features of NoSQL databases is the adoption of a soft-schema approach to data modeling, which allows data instances with different structures to coexist within the same collection without a predefined schema. A first research problem consists in resolving schema heterogeneity within a collection by using schema profiling techniques, which support the analyst in recognizing the main attributes and in understanding the rules that govern the use of the different schemas. Currently, NoSQL systems are often used in polyglot contexts, where different technologies (including relational ones) are used to store different and potentially overlapping parts of an information system. In this context, the research activity consists in the realization of a multistore system that automates the integration of the data available in the various DBMSs and offers the analyst transparent querying mechanisms, while also managing the problems related to overlapping data and to heterogeneity in technologies and schemas.
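As a rough illustration of what profiling a soft-schema collection can involve, the sketch below counts the distinct sets of top-level fields found in a collection and measures how often each field appears. It is only a minimal example under stated assumptions, not the schema profiling technique developed in this research: the function name `profile_schemas` and the sample documents are hypothetical, and nested fields and value types are ignored.

```python
from collections import Counter


def profile_schemas(documents):
    """Summarize the top-level schemas found in a list of dict documents.

    Returns a Counter mapping each distinct set of field names to the number
    of documents using it, and a dict mapping each field to the fraction of
    documents in which it appears.
    """
    # Each document's "schema" is approximated by its set of top-level keys.
    schema_counts = Counter(frozenset(doc.keys()) for doc in documents)

    # Count in how many documents each field occurs.
    field_counts = Counter(field for doc in documents for field in doc)
    total = len(documents)
    field_support = {field: count / total for field, count in field_counts.items()}

    return schema_counts, field_support


# Hypothetical sample collection with three schema variants.
docs = [
    {"id": 1, "name": "Ada", "email": "ada@example.org"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cleo", "phone": "555-0100"},
    {"id": 4, "name": "Dan", "email": "dan@example.org"},
]

schema_counts, field_support = profile_schemas(docs)

for schema, count in schema_counts.most_common():
    print(sorted(schema), count)
for field, support in sorted(field_support.items()):
    print(field, support)
```

Fields with full support (here `id` and `name`) are candidates for the main attributes of the collection, while the less frequent ones point to schema variants whose usage rules would need further investigation.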

Non-conventional Business Intelligence: the term Business Intelligence (BI) indicates a broad category of IT applications designed to collect, manage, and analyze data to support the decision-making process. At the heart of a BI system usually lies a Data Warehouse, i.e., a repository that stores information according to a multidimensional schema and is optimized for supporting analysis queries. The research topic refers to the extension of traditional BI techniques to integrate and analyze data obtained from sources that go beyond corporate boundaries. In this direction, one of the main sources is the content published on social networks; to this end, research activities include the proposal of a Social BI methodology and an innovative extension of the traditional data warehouse modeling techniques. Several research projects have been carried out, with a particular focus on user profiling activities in the fields of politics and vaccination. A further research topic concerns the integration of the data warehouse with data from the world of the semantic web, with particular reference to linked open data.

Precision agriculture: the term refers to the optimization of operational and decision-making processes in the world of agriculture through the use of advanced technologies for data collection and analysis. The innovation process begins with the use of smart devices in the field, requires the integration of data collected on-site with corporate data and open data (such as satellite images and weather data), and culminates in the design of an analytical system to extract knowledge on different aspects of the production chain. In this context, the specific research topic concerns the creation of a big data platform to collect and integrate data from various sources, as well as the application of prescriptive analysis techniques to model the spread of water in the ground and to support the farmer in optimizing the use of water resources.

Latest news

No news is available at the moment.