INSPECTr Project

Blogs

INSPECTr Ethics Workshops January 2021

Across January, the INSPECTr project held two ethics workshops designed to give both ethical and technical partners the opportunity to engage in an in-depth and informal dialogue, together with the Ethics Advisory Board and external ethics experts, on two 'spotlight' issues that partner Trilateral's ethics governance processes had identified as important for ethical design. The first workshop took place on 19 January 2021 and focused on the integration of publicly available data, typically online data, into the INSPECTr Platform. The project was joined by ethics expert Dr Thilo Gottschalk and discussed the importance of data minimisation and storage limitation in this regard. Design solutions, such as the use of search filters and default settings, were identified as granular ethics requirements for the Platform. The second workshop took place on 26 January 2021 and focused on artificial intelligence systems within INSPECTr. The project was joined by ethics expert Phil Booth and discussed the importance of bias mitigation and of making the artificial intelligence output understandable for LEA investigators. Design solutions, such as adjustments to datasets, weightings for certainty, and the importance of error identification, were discussed with a view to adoption in the Platform. These workshops inform the ethics requirements for the Platform, which accompany the functional requirements. The project looks forward to hosting a third project workshop, on gender, in Spring 2021.

INSPECTr Ethics Workshop June 2021

As part of the Ethics-by-Design approach being taken by the INSPECTr project, the partners held an ethics workshop on Gender and AI in June 2021. It was attended by both ethical and technical partners, along with external experts. Attendees worked together to discuss gender-related issues relevant to both the project and the platform being researched, and to develop design solutions for these issues. Gender is a sometimes-overlooked issue in data science projects, and overlooking it can have significant negative impacts. Not incorporating gender perspectives can mean that important issues are missed. For example, a gender bias in the data used to train machine learning algorithms, or amongst the members of a project, could mean that issues affecting different gender groups are overlooked.

The workshop discussed the impacts that can occur when gender is treated as an unchangeable binary category, and how this can create harm both for people who are mis-categorised and for law enforcement dealing with potentially inaccurate data. This led to consideration of where such issues could be avoided or dealt with during development of the INSPECTr tools. Further, with respect to the tools being researched, a major point of discussion was whether the data used to train machine learning tools could have gendered effects, and what some of those effects could be. Participants noted that the project has already done substantial work on bias in its tools, so that they give more accurate results than they would if gender were not considered. Further discussions covered topics such as whether the project should include or exclude image search tools that use emotion detection, and the biases associated with particular datasets, as noted by external expert Dr Allison Gardner, who joined the discussion. Looking forward, the project is taking into account work on bias by the IEEE and ISO, and is looking at bias audits for the tools and algorithms it develops.
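One simple check that a bias audit of a classifier might include is comparing error rates across groups. The sketch below is a minimal illustration under stated assumptions, not the audit procedure INSPECTr has adopted: the records, the group names (hypothetical placeholders, deliberately not a binary category), and the choice of plain error rate as the metric are all assumptions. An audit following IEEE or ISO guidance would look at more than one metric.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (group, true_label, predicted_label). Group names are
# hypothetical placeholders and need not be binary categories.
Record = Tuple[str, int, int]


def error_rates_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of wrong predictions within each group."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        errors[group] += int(truth != predicted)
    return {group: errors[group] / totals[group] for group in totals}


def max_disparity(rates: Dict[str, float]) -> float:
    """Gap between the worst-served and best-served groups."""
    return max(rates.values()) - min(rates.values())


records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
rates = error_rates_by_group(records)
print(rates)                 # {'group_a': 0.25, 'group_b': 0.5}
print(max_disparity(rates))  # 0.25
```

A large disparity would not settle anything on its own; it flags where the training data or the model deserves a closer look.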

Complying with Ethical and Legal Standards in the INSPECTr Project

As a project focused on researching data-analysis tools and data-exchange infrastructure to help law enforcement agencies in their daily work, it is imperative that the INSPECTr project abides by the standards of research ethics and data protection. To ensure this, the European Commission (EC) set a number of ethics requirements through which we demonstrate our compliance with these standards. We have worked hard to comply with them, and the EC has responded positively to our submissions.

A key part of Trilateral’s work is the ethics and data protection monitoring of the Living Lab research. This is where the tools developed in the INSPECTr project will be tested by our law enforcement partners. The work of partners includes completing data protection impact assessments to evaluate and mitigate risks to the rights and freedoms of data subjects relating to the processing of personal data. We have also worked to determine the most appropriate data processing relationships between Data Controllers, Data Processors, and Joint Controllers.

The impact of Brexit on international data transfers between the EU and UK-based partners has been an additional concern for the project. Following the expiry of transition arrangements, the EU has agreed an adequacy decision, meaning that the UK’s data protection regime is judged to provide an adequate level of protection for EU data-subjects. This provides for a continuation of smooth data transfers between all partners in the project. However, bearing in mind recent judicial decisions, the partners are implementing appropriate back-up options in the event that the adequacy decision is subject to a successful legal challenge.

The Ethical Approach to Research in the INSPECTr Project

The INSPECTr project has opted to take ethics-by-design and privacy-by-design approaches, managed by Trilateral, to the development of the INSPECTr tools and platform. This means that ethical and privacy issues are considered at all points of the INSPECTr project, and choices are made so that outcomes are as ethical and privacy-respecting as possible. A key part of this is sensitising the consortium to ethical and privacy issues that could arise during projects like INSPECTr; to do this, we held a series of webinars, along with continuous communication between ethics and technical partners, to highlight particular issues to technical partners. A key result of these discussions was an understanding that balanced the potential technical benefit of using personal data to train and test machine learning tools against the privacy and data protection concerns that such processing entails. It was agreed that technology research should use as little personal data as possible, and only where necessary to fulfil the research purposes. Major work also took place on the most appropriate legal basis for law enforcement partners to use when testing the INSPECTr tools. It was decided that law enforcement partners should act under a legal regime that treats them as researchers, and that they should not process data from ongoing investigations. This helps to draw a clear line between research and operational police work.

Another crucial aspect of the ethics and privacy-by-design approaches is the development of a series of recommendations to mitigate risks and enhance opportunities related to ethics, legal, and societal impacts. These were co-developed through discussions and workshops with technology and ethics experts. These are being implemented across the remainder of the INSPECTr project through collaboration with the technical partners to ensure that the final results of the project are ethical, privacy respecting, legally compliant, and societally acceptable.

Further, as technology advances at a rapid rate, so do ethical, legal, and societal concerns associated with it. So, Trilateral engages in continuous horizon scanning for issues that could affect the project itself, or adoption of the tools after the project. Recently, this work has focused on the European Commission’s proposed AI Regulation. The project wants to understand the nature and scope of the proposed Regulation to try and ‘future-proof’ the tools ahead of changes to the legislative environment. Happily, Trilateral already planned to incorporate many of the key aspects of the proposed Regulation into the INSPECTr project, albeit in a slightly different format. Consequently, Trilateral will be monitoring the development of the proposed Regulation and adapting its approach to minimise barriers to adoption of the INSPECTr tools in future.

Explanation of CASE Language and Reasons for its Adoption for the INSPECTr Platform

With the CASE language having such a central role in the platform, the handling of the CASE format becomes crucial. To accommodate the different data needs on the platform, the CASE language is used in three different forms, each stored in a different storage engine.

First, it exists as a binary JSON-LD file which, depending on the amount of investigative information it represents, could be several hundred megabytes or possibly larger (e.g., if describing the file contents of a large disk drive). This form is the output of the platform's parsers and the primary data input of the platform. It is used when, for example, an LEA user wishes to import into the platform evidential material stored elsewhere, parse reports produced by other tools in their organisation, or directly process evidential material via one of the platform's "wrappers" that carry out this functionality.

The binary form is essentially a collection of interlinked nodes forming a graph. For example, information about a (mobile) phone call between two individuals could, loosely speaking, be represented as a graph of three nodes: the phone call node, holding information about the call itself (duration, application used, etc.), linked with two person nodes, each providing information about one of the individuals (phone number, name, etc.). Moreover, each person node could also be linked with other nodes, representing other calls the individual took part in, messages they sent or received, etc. These binary files are stored in INSPECTr's HDFS storage system, along with all other binary files (photographs, videos, etc.), which are also described by CASE files (e.g., a photo node can list the EXIF data and provide the HDFS location of the actual photograph).
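To make the phone call example concrete, here is a minimal sketch of such a graph as a list of nodes that reference each other by "@id", in the style of JSON-LD. The node types and property names (PhoneCall, from, to, durationSeconds, and so on) are hypothetical simplifications; the actual CASE/UCO ontology uses its own namespaced classes and properties. The helper functions show how following "@id" links recovers the other calls a person took part in.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified node layout. Real CASE/UCO documents use
# namespaced classes and facets rather than these plain keys.
graph: List[dict] = [
    {"@id": "person-1", "@type": "Person", "name": "Person A", "phoneNumber": "+10000000001"},
    {"@id": "person-2", "@type": "Person", "name": "Person B", "phoneNumber": "+10000000002"},
    {"@id": "person-3", "@type": "Person", "name": "Person C", "phoneNumber": "+10000000003"},
    {"@id": "call-1", "@type": "PhoneCall", "durationSeconds": 95,
     "application": "native dialler", "from": {"@id": "person-1"}, "to": {"@id": "person-2"}},
    {"@id": "call-2", "@type": "PhoneCall", "durationSeconds": 12,
     "application": "messaging app", "from": {"@id": "person-3"}, "to": {"@id": "person-1"}},
]


def index_by_id(nodes: List[dict]) -> Dict[str, dict]:
    """Look up any node by its identifier."""
    return {node["@id"]: node for node in nodes}


def calls_involving(nodes: List[dict], person_id: str) -> List[str]:
    """Ids of every call in which the person was caller or callee."""
    return [
        node["@id"] for node in nodes
        if node.get("@type") == "PhoneCall"
        and person_id in (node["from"]["@id"], node["to"]["@id"])
    ]


def participants(nodes: List[dict], call_id: str) -> Tuple[str, str]:
    """Follow a call's links to the names of caller and callee."""
    by_id = index_by_id(nodes)
    call = by_id[call_id]
    return by_id[call["from"]["@id"]]["name"], by_id[call["to"]["@id"]]["name"]


print(calls_involving(graph, "person-1"))  # ['call-1', 'call-2']
print(participants(graph, "call-2"))       # ('Person C', 'Person A')
```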

With the binary form being the output of all tools on the platform, and hence the common denominator of all investigation-related data, it became evident that handling CASE data in a clear and efficient way was of crucial importance to the platform. To achieve this, binary files are "flattened", in other words transformed from a large JSON-LD file describing all interlinked nodes (a large hash table) into a collection (an array) of much smaller hash tables, each describing a single flat node. The graph structure of the binary CASE form is retained by including meta-information on each flat node. During this process, each flat node is individually stored in INSPECTr's Elasticsearch storage system. This enables tools to retrieve the information they need directly and efficiently, instead of first parsing a potentially very large binary CASE file. For example, the natural language processing tool can directly access the required SMS messages (a few hundred kilobytes) without first parsing the binary CASE file describing the drive of the mobile phone from which they were retrieved (hundreds of megabytes, possibly larger).
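The flattening step can be sketched as follows, under the assumption that the meta-information retained on each flat node takes the form of JSON-LD-style {"@id": ...} references to neighbouring nodes; the platform's actual meta-information fields are not described here. Embedded nodes are pulled out into their own small dictionaries and replaced by references, so each node can be stored and retrieved on its own while the graph can still be rebuilt. The JSON-LD 1.1 specification defines a complete flattening algorithm; this dependency-free version only handles nested objects and lists.

```python
from typing import Any, Dict, List


def _ref_or_value(value: Any, nodes: Dict[str, dict]) -> Any:
    """Replace an embedded node by an @id reference, collecting it first."""
    if isinstance(value, dict) and "@id" in value:
        if len(value) > 1:  # a full embedded node, not already a bare reference
            _collect(value, nodes)
        return {"@id": value["@id"]}
    if isinstance(value, list):
        return [_ref_or_value(item, nodes) for item in value]
    return value


def _collect(node: dict, nodes: Dict[str, dict]) -> None:
    """Build the flat version of one node (children are collected first)."""
    flat = {key: _ref_or_value(value, nodes) for key, value in node.items()}
    nodes.setdefault(node["@id"], flat)  # keep the first copy of a repeated node


def flatten(root: dict) -> List[dict]:
    """Turn one nested node tree into a list of small, flat nodes."""
    nodes: Dict[str, dict] = {}
    _collect(root, nodes)
    return list(nodes.values())


nested = {
    "@id": "call-1", "@type": "PhoneCall", "durationSeconds": 95,
    "from": {"@id": "person-1", "@type": "Person", "name": "Person A"},
    "to": {"@id": "person-2", "@type": "Person", "name": "Person B"},
}
for flat_node in flatten(nested):
    print(flat_node)
```

Each dictionary in the result would then be indexed as its own document, which is what allows a tool to fetch a single SMS node without touching the rest of the file.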

Finally, so that the structured and interlinked representation of cyber-investigation information, as well as the evolution of an investigation, can be easily studied and manipulated by LEAs, CASE data is also converted into knowledge graph form and stored in INSPECTr's Neo4j storage service. This allows CASE data to be queried in a more "investigative" way; for example, a query to "retrieve all individuals who contacted suspect X via SMS during period Y" can be expressed in terms of CASE concepts and run on Neo4j.
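The example query can be illustrated with a small in-memory sketch. On the platform it would run on Neo4j; the Cypher in the comment assumes hypothetical labels (Person, SMS) and relationship types (FROM, TO), since the actual graph schema is not given here. The Python function evaluates the same logic over a list of flat nodes so the example runs without a database. Node ids, names, and timestamps are made-up test data.

```python
from datetime import datetime
from typing import List

# Roughly equivalent Cypher, assuming a hypothetical graph schema:
#   MATCH (p:Person)<-[:FROM]-(m:SMS)-[:TO]->(s:Person {id: $suspect})
#   WHERE m.sentTime >= $start AND m.sentTime <= $end
#   RETURN DISTINCT p.id

nodes = [
    {"@id": "p1", "@type": "Person", "name": "Suspect X"},
    {"@id": "p2", "@type": "Person", "name": "Contact 1"},
    {"@id": "p3", "@type": "Person", "name": "Contact 2"},
    {"@id": "m1", "@type": "SMS", "from": {"@id": "p2"}, "to": {"@id": "p1"},
     "sentTime": "2021-03-02T10:15:00"},
    {"@id": "m2", "@type": "SMS", "from": {"@id": "p3"}, "to": {"@id": "p1"},
     "sentTime": "2021-05-20T08:00:00"},
    {"@id": "m3", "@type": "SMS", "from": {"@id": "p1"}, "to": {"@id": "p3"},
     "sentTime": "2021-03-03T09:00:00"},
]


def contacted_via_sms(nodes: List[dict], suspect_id: str,
                      start: datetime, end: datetime) -> List[str]:
    """Ids of people who sent an SMS to the suspect within [start, end]."""
    senders = set()
    for node in nodes:
        if node.get("@type") != "SMS" or node["to"]["@id"] != suspect_id:
            continue
        sent = datetime.fromisoformat(node["sentTime"])
        if start <= sent <= end:
            senders.add(node["from"]["@id"])
    return sorted(senders)


print(contacted_via_sms(nodes, "p1", datetime(2021, 3, 1), datetime(2021, 3, 31)))
```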

Handling of Standardised Evidence (CASE) by the Platform

One of the call’s requirements was to adopt a common format for data homogenisation, data discovery (linked cases) and data exchange. The INSPECTr project opted for the open-source Cyber-investigation Analysis Standard Expression (CASE) language, a community-developed ontology designed to serve as a standard for interchange, interoperability, and analysis of investigative information in a broad range of cyber-investigation domains, including digital forensic science, incident response, counter-terrorism, criminal justice, forensic intelligence, and situational awareness. The CASE Community is a consortium of for-profit, academic, government and law enforcement, and non-profit organisations that have created a new specification for the exchange of cyber investigation data between tools.

CASE provides a structured specification for representing information that is analysed and exchanged during investigations involving digital evidence. To perform digital investigations effectively, there is a pressing need to harmonise how information relevant to cyber-investigations is represented and exchanged. CASE enables the merging of information from different data sources and forensic tool outputs to allow more comprehensive and cohesive analysis. The main benefits of using CASE are:

Fostering interoperability:
to enable the exchange of cyber-investigation information between tools, organisations, and countries. For example, standardising how cyber-information is represented addresses the current problem of investigators receiving the same kind of information from different sources in a variety of formats.

Establishing authenticity and trustworthiness:
based on the clear representation of the Chain of Evidence (provenance) and the Chain of Custody. A fundamental requirement in digital forensics is to maintain information about evidence provenance while it is exchanged and processed.

Enabling more advanced and comprehensive correlation and analysis:
in addition to searching for specific keywords or characteristics within a single case or across multiple cases, having a structured representation of cyber-investigation information allows more sophisticated processing, such as data mining or NLP techniques. This can help, for instance, to overcome linkage blindness, that is, the failure to recognise a pattern that links one crime to another, such as crimes committed by the same offender in different jurisdictions.

Helping in dual/multiple tool validation of results:
in order to evaluate their completeness and correctness/accuracy.

Automating normalisation and combination of differing data sources:
to facilitate analysis and exploration of investigative questions (who, when, how long, where).

An investigation generally involves many different tools and data sources, creating separate silos of information. Manually pulling together information from these various data sources and tools is time-consuming and error-prone.

Tools that support CASE can extract and ingest data, along with their context, in a standard format that can be automatically combined into a unified collection to strengthen correlation and analysis. This offers new opportunities for searching, contextual analysis, pattern recognition, machine learning, and visualisation.
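Here is a minimal sketch of what "automatically combined into a unified collection" can mean in practice. Two hypothetical tools report the same phone contact in different formats, and normalising the identifier lets the records merge while keeping track of which tool contributed what (a small nod to provenance). The tool names, field names, and the phone number normalisation rule (strip separators, turn a leading 00 into +) are assumptions for illustration. They are not CASE properties, and real number normalisation needs far more care with national formats.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical outputs of two tools describing the same contact differently.
SOURCES: Dict[str, List[dict]] = {
    "tool_a": [{"Number": "+33 6 12 34 56 78", "Contact": "Alice"}],
    "tool_b": [{"msisdn": "0033612345678", "display_name": "A."},
               {"msisdn": "0044 7700 900123", "display_name": "Bob"}],
}
# Which field holds the number and which holds the name, per tool (assumed).
FIELD_MAP: Dict[str, Tuple[str, str]] = {
    "tool_a": ("Number", "Contact"),
    "tool_b": ("msisdn", "display_name"),
}


def normalise_number(raw: str) -> str:
    """Simplistic rule: drop separators, rewrite a leading 00 as +."""
    cleaned = re.sub(r"[^\d+]", "", raw)
    if cleaned.startswith("00"):
        cleaned = "+" + cleaned[2:]
    return cleaned


def combine(sources: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Merge rows from all tools into one record per normalised number."""
    unified: Dict[str, dict] = {}
    for tool, rows in sources.items():
        number_field, name_field = FIELD_MAP[tool]
        for row in rows:
            key = normalise_number(row[number_field])
            record = unified.setdefault(key, {"names": set(), "sources": set()})
            record["names"].add(row[name_field])
            record["sources"].add(tool)  # remember where each fact came from
    return unified


for number, record in combine(SOURCES).items():
    print(number, sorted(record["names"]), sorted(record["sources"]))
```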

Artificial Intelligence (AI) Research Methodology and Application

Blog provided by INSPECTr partner Daniel Camara of the French Gendarmerie (GN)

Artificial Intelligence (AI) is an interesting research field and, depending on whom you ask, it either has no formal definition or has hundreds. Even specialists do not fully agree, with different researchers having different definitions and interpretations of AI, which stems from the fact that intelligence itself has different interpretations. What is intelligence? Some believe it must be linked to rationality, critical thinking, and higher-order brain functions; others see intelligence in things that may not even have a brain at all. However, a relatively well-accepted intuition is that a computer program belongs to the AI class if it is capable of behaving, or giving answers, close to the way a human would if presented with the same kind of situation or question. Another concept intrinsically linked to AI is the notion of error. If you have a way to solve a problem that gives a perfect solution every time, that most probably does not fit the general concept of AI. In some sense, this does not contradict the previous intuition, as humans solve problems in an instinctive way and, yes, from time to time we make mistakes: “to err is human”. AI methods are heuristics, intuitions, that are not necessarily foolproof. The word heuristic originates from the Greek word “heuriskein”, meaning “to find” or “discover”. A heuristic is a practical method which comes from, for example, an intuition, an abstraction, or even a pattern generalisation. It is an idea that is not guaranteed to work every time but that, in practice, behaves pretty well.

As the definition of the AI field is “flexible”, to say the least, many different methods and heuristics are used to try to reach this “similar to human” kind of answer. This implies that the AI field is constantly evolving, and new methods are created to explore ways to solve general problems or to work well in a specific domain. One of the most iconic artificial intelligence methods is the neural network. It is based on abstractions of how human neurons work and are organised. Rosenblatt proposed the formalisation of the perceptron (the abstraction of a neuron) in the 1950s, and even modern deep learning approaches rely on his concepts. A perceptron is a simple function that adds up the different stimuli it receives, each multiplied by a weight, and applies an activation function to decide whether to propagate a signal. A neural network is an organisation of these basic units in series and layers. Modern deep networks are called deep because they have many layers of such units. Considered individually, perceptrons may not be that impressive, but when organised in a network and trained, the answers they provide can be quite impressive, even though there is no guarantee that those answers will always be correct. Deep learning methods are currently used in INSPECTr, for example, to perform facial recognition, to detect child pornography and cars in images, and to translate text, among other tasks.
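The perceptron described above can be written in a few lines. This sketch is an illustration, not anything used in INSPECTr. It computes a weighted sum of its inputs plus a bias, applies a step activation, and learns with Rosenblatt's perceptron learning rule on the logical AND function, which a single unit can represent because it is linearly separable. Modern deep networks replace the step with differentiable activations and train many layers with gradient descent, so this is only the historical building block.

```python
from typing import List, Sequence


class Perceptron:
    """A single unit: weighted sum of inputs plus bias, then a step activation."""

    def __init__(self, n_inputs: int, learning_rate: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> int:
        total = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if total > 0 else 0  # step activation: fire or stay silent

    def train(self, inputs: Sequence[Sequence[float]], targets: Sequence[int],
              epochs: int = 20) -> None:
        # Perceptron learning rule: shift weights towards the correct answer
        # whenever the prediction is wrong (error is -1, 0 or +1).
        for _ in range(epochs):
            for x, target in zip(inputs, targets):
                error = target - self.predict(x)
                self.weights = [w + self.learning_rate * error * xi
                                for w, xi in zip(self.weights, x)]
                self.bias += self.learning_rate * error


# Learn logical AND.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
p = Perceptron(n_inputs=2)
p.train(X, y)
print([p.predict(x) for x in X])  # [0, 0, 0, 1]
```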

Even if deep learning has been quite in vogue in recent years, and with good reason (it is a general method that can be used to solve a wide range of problems), it is not the only AI method. A broad range of other methods exists. For example, the crime forecasting implemented in INSPECTr is based on the analysis of time series and on the intuition that crimes have an opportunity factor, which is basically random, but also a facilitator factor, which means that not all places and times are suitable for the commission of a crime. If we can map these places, we can have a good intuition of where crimes may happen in the future. Three things are required for a crime: a victim, a perpetrator, and a place where the two meet. They need not be in the same place at the same time, but their paths must have crossed at some point. However, some places are more suitable for committing a crime than others, for example, a dark alley or a tourist attraction with many distracted people. Moreover, crimes have a seasonal component; e.g., not much happens in a winter ski resort during the summer. There is also a trend component; e.g., a team committing residential burglaries tends to spread its activities across a region over a period of time. So, when we analyse the criminality of different regions, taking seasonality and trends into account, to forecast criminal activity over the next days or months, it is not possible to guarantee that the criminal pattern in a given region will be the same tomorrow as it was today; but tomorrow's criminal profile for that region will most probably be closer to today's than to that of any day two months earlier. Moreover, in general, it can be assumed that this week will present more similarities with the same week last year than with the week six months ago.
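The intuition that both the recent past and the same period last year are informative can be turned into a toy forecaster. This is a hypothetical sketch, not the INSPECTr forecasting model: it blends last week's count with the count from the same week one year earlier, shifted by how much the recent weeks differ from the same weeks a year before (a crude trend term). The 52-week season, the 4-week window, and the 50/50 blend weight alpha are all assumptions.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def forecast_next_week(weekly_counts: Sequence[float], season: int = 52,
                       window: int = 4, alpha: float = 0.5) -> float:
    """Blend 'last week' with 'same week last year, shifted by the trend'.

    weekly_counts[-1] is the most recent week. The week being forecast has
    index len(weekly_counts), so its counterpart one season earlier is
    weekly_counts[-season].
    """
    if len(weekly_counts) < season + window:
        raise ValueError("need at least one season plus one window of history")
    recent_level = weekly_counts[-1]
    same_week_last_year = weekly_counts[-season]
    # Trend: how much the latest window differs from the same window a year ago.
    trend = mean(weekly_counts[-window:]) - mean(weekly_counts[-season - window:-season])
    seasonal_estimate = same_week_last_year + trend
    return alpha * recent_level + (1 - alpha) * seasonal_estimate


history = [float(week) for week in range(60)]  # made-up, steadily rising counts
print(forecast_next_week(history))  # 59.5
```

Tuning alpha trades responsiveness to recent activity against reliance on seasonality; established methods such as Holt-Winters exponential smoothing model level, trend, and season more carefully.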

Hundreds of other heuristics exist; some are better suited to solving LEAs’ problems, some less so. Some people could even argue that we would need LEA-specific heuristics to solve some of the specific LEA-related problems, and they are probably right. However, AI methods should be used on “hard to solve” problems, those for which we cannot distinguish a clear pattern linked to the problem or to its solution. If one can distinguish a pattern that can be transformed into a set of defined rules, AI methods may not be required. Since AI-based methods imply a probability of error, a problem that can be solved using standard, exact algorithms will always be better solved that way.
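As an illustration of a problem with a clear, rule-based pattern, consider checking whether a digit string found in seized data passes the Luhn checksum used by payment card numbers. No model or training data is needed: the rule is exact, so the answer is never a probabilistic guess (although passing the check only shows the check digit is consistent, not that the string really is a card number). The example is ours, chosen to illustrate the point above, and is not taken from INSPECTr.

```python
def luhn_valid(number: str) -> bool:
    """Exact, rule-based check: no training data and no error probability."""
    cleaned = number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or any(c not in "0123456789" for c in cleaned):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("7992 7398 713"))  # True
print(luhn_valid("79927398710"))    # False
```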

Natural Language Processing (NLP)

Blog provided by INSPECTr partner Panos Protopapas of Inlecom (ILS)

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and linguistics concerned with making machines "understand" the text found in documents and automatically extract the information it contains, so that this information can be used to automate various processes. Some common NLP tasks are described below, ordered from those considered relatively easy for a machine to perform to those considered more difficult:

Part-of-speech tagging (POST): to determine the part of speech of words in a document, given the context. For example, in the phrases "click on this link" and "some languages include click sounds", the word click should be recognised as a verb and as a noun modifying "sounds", respectively.
Lemmatisation: to remove the inflectional endings of words appearing in a document (due to different tenses, cases, voices, genders, etc.) and retrieve a "base" word (lemma).
Named entity recognition (NER): to classify named entities found in a document into pre-defined categories (locations, persons, organisations, dates, etc.), with further categories possible through additional training.
Sentiment analysis: to classify the polarity of a given document, that is, whether it is of a positive, negative, or neutral sentiment.
Topic modelling: to infer "abstract" topics occurring within a document. The main idea is that when a document is about a particular topic, some words will appear more often than others. For example, a document about cars is more likely to contain the words engine and transmission, and less likely to contain the words bone and bark, which are more likely to be present in a document about dogs.
Summarisation: to sum up the main points of information contained in a document and create a short, coherent, and fluent summary outlining the text’s major points.
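The intuition behind topic modelling, that certain words cluster by topic, can be illustrated with a toy keyword scorer. Note the difference: real topic models such as Latent Dirichlet Allocation learn the word groups from a collection of documents without predefined lists, whereas the two vocabularies here are hypothetical and hand-picked. This therefore only demonstrates the word-frequency idea, not a topic model.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical, hand-picked vocabularies; a real topic model learns these.
TOPIC_WORDS = {
    "cars": {"engine", "transmission", "brake", "wheel"},
    "dogs": {"bone", "bark", "leash", "puppy"},
}


def topic_scores(text: str) -> Dict[str, int]:
    """Count how often each topic's words appear in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {topic: sum(counts[w] for w in vocab) for topic, vocab in TOPIC_WORDS.items()}


def likely_topic(text: str) -> str:
    """Pick the topic whose words occur most often."""
    scores = topic_scores(text)
    return max(scores, key=scores.get)


print(topic_scores("The engine stalled and the transmission slipped."))
# {'cars': 2, 'dogs': 0}
```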

Although the output of these tasks might not seem very useful on a per-task basis, especially considering that machine performance in these tasks is not yet on a par with human performance, one should not overlook the speed increase that NLP can provide in situations where the number of documents to check is too large for manual checking to be realistic or economically viable. To this end, the synergies between these NLP outputs, and the positive effects they can have when post-processing, searching, or filtering through these documents, are of interest.

Furthermore, the outputs of the NER, sentiment analysis, and topic modelling techniques can greatly enhance the filtering available to users going through NLP-analysed documents. If the documents have been tagged with the outputs of the NLP models, a user could, for example, filter for documents that mention a particular location and/or date, have a negative sentiment, and have a firearm-related word among the words defining their topic.
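Filtering on NLP tags can be sketched as a simple predicate over tagged documents. The TaggedDocument fields, the example tags, and the short list of firearm-related terms are hypothetical; in practice the tags would come from the NER, sentiment analysis, and topic modelling outputs described above, stored alongside each document.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class TaggedDocument:
    doc_id: str
    locations: Set[str] = field(default_factory=set)    # from NER
    dates: Set[str] = field(default_factory=set)        # from NER
    sentiment: str = "neutral"                          # from sentiment analysis
    topic_words: Set[str] = field(default_factory=set)  # from topic modelling


FIREARM_TERMS = {"gun", "pistol", "rifle", "firearm"}  # hypothetical list


def filter_documents(docs: Iterable[TaggedDocument], location: Optional[str] = None,
                     date: Optional[str] = None, sentiment: Optional[str] = None,
                     topic_terms: Optional[Set[str]] = None) -> List[str]:
    """Return ids of documents matching every criterion that is given."""
    matches = []
    for doc in docs:
        if location is not None and location not in doc.locations:
            continue
        if date is not None and date not in doc.dates:
            continue
        if sentiment is not None and doc.sentiment != sentiment:
            continue
        if topic_terms is not None and not (doc.topic_words & topic_terms):
            continue
        matches.append(doc.doc_id)
    return matches


docs = [
    TaggedDocument("doc-1", {"Lyon"}, {"2021-03-04"}, "negative", {"pistol", "meeting"}),
    TaggedDocument("doc-2", {"Lyon"}, {"2021-03-04"}, "negative", {"football", "match"}),
    TaggedDocument("doc-3", {"Paris"}, {"2021-03-04"}, "negative", {"rifle", "car"}),
]
print(filter_documents(docs, location="Lyon", sentiment="negative", topic_terms=FIREARM_TERMS))
# ['doc-1']
```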

Finally, the summarisation results can also aid the human processing of documents by letting users read a brief, accurate summary of a document's contents before deciding whether to invest the time to read it in full.

Image processing in INSPECTr

Blog provided by Ben Roques of the CCI project coordination team (CCI/UCD)

The INSPECTr platform implements an image processing system aimed at reducing the workload of investigators when dealing with a very large amount of media content. After media extraction using the INSPECTr platform’s specific tools, or after ingesting media directly into the platform, law enforcement officers can automatically annotate pictures using different image recognition models. Those annotations can then be filtered and searched depending on the needs of the investigation.

The system needs to be able to ingest a large amount of data. Oftentimes, storage media linked to an investigation contain thousands of media files. Law enforcement officers should be able to send all the files to be processed in the background and then search the results and prioritise investigative actions. A model is a machine learning classifier that was trained on specific data. For the purpose of INSPECTr, the image processing pipeline only uses convolutional neural networks, as these have proven very effective for image processing tasks. There are many different architectures, with new ones frequently published in scientific journals. The system holds many different neural networks, each trained for a specific task, and it was designed to be very modular: it must be easy to add and remove models. This is because the field of machine learning is always evolving and moves really fast. That is why the system tries to be model-agnostic. Model-specific pipelines are pushed to the very end of the data flow.
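The model-agnostic design can be sketched as a registry in which every model is simply a callable that takes image bytes and returns label confidences, so the rest of the pipeline never needs to know which architecture sits behind a name. The ModelRegistry class and its method names are hypothetical, and the lambdas stand in for real convolutional networks.

```python
from typing import Callable, Dict, List

# Any model is just "image bytes in, label confidences out".
Model = Callable[[bytes], Dict[str, float]]


class ModelRegistry:
    """Hypothetical plug-in registry: models can be added or removed at will."""

    def __init__(self) -> None:
        self._models: Dict[str, Model] = {}

    def register(self, name: str, model: Model) -> None:
        self._models[name] = model

    def unregister(self, name: str) -> None:
        self._models.pop(name, None)

    def names(self) -> List[str]:
        return sorted(self._models)

    def annotate(self, image: bytes) -> Dict[str, Dict[str, float]]:
        """Run every registered model; callers never see model internals."""
        return {name: model(image) for name, model in self._models.items()}


registry = ModelRegistry()
registry.register("vehicle_detector", lambda image: {"car": 0.9})  # stand-in model
registry.register("size_probe", lambda image: {"bytes": float(len(image))})
print(registry.annotate(b"abc"))
```

Swapping in a newly published architecture then only means registering another callable under a new name.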

The system is organised around micro-services. Pictures to be analysed are sent to a main endpoint that forwards tasks to a queuing system. A worker collects the tasks from the queue and forwards them to a model micro-service. Models can be swapped, added, or removed without bringing the system down. Once the model has finished, a webhook is sent to the main endpoint, and the result is then cached in a database.
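The micro-service flow (endpoint, queue, worker, model, webhook, cache) can be approximated inside a single process with Python's standard queue and threading modules. This is a sketch under clear assumptions: real micro-services would communicate over the network, the webhook is modelled as a plain method call (on_result), a dictionary stands in for the results database, and the class and method names are hypothetical. The swap_model method shows how a model can be replaced while workers keep running.

```python
import queue
import threading
from typing import Callable, Dict

Labels = Dict[str, float]
Model = Callable[[bytes], Labels]


class ImagePipeline:
    def __init__(self, model: Model) -> None:
        self.tasks = queue.Queue()          # the queuing system
        self.cache: Dict[str, Labels] = {}  # stands in for the results database
        self._lock = threading.Lock()
        self._model = model

    def submit(self, image_id: str, image: bytes) -> None:
        """Main endpoint: accept a picture and enqueue a task."""
        self.tasks.put((image_id, image))

    def swap_model(self, model: Model) -> None:
        """Replace the model without stopping the workers."""
        with self._lock:
            self._model = model

    def on_result(self, image_id: str, labels: Labels) -> None:
        """Webhook handler: the model reports back and the result is cached."""
        with self._lock:
            self.cache[image_id] = labels

    def worker(self) -> None:
        """Take tasks from the queue until a None stop signal arrives."""
        while True:
            task = self.tasks.get()
            if task is None:
                break
            image_id, image = task
            with self._lock:
                model = self._model
            self.on_result(image_id, model(image))


def run(pipeline: ImagePipeline, images: Dict[str, bytes], n_workers: int = 2) -> None:
    workers = [threading.Thread(target=pipeline.worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for image_id, image in images.items():
        pipeline.submit(image_id, image)
    for _ in workers:
        pipeline.tasks.put(None)  # one stop signal per worker, queued after all tasks
    for w in workers:
        w.join()


pipeline = ImagePipeline(model=lambda image: {"size": float(len(image))})
run(pipeline, {"a": b"abc", "b": b"hello"})
print(pipeline.cache)
```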

The system is not designed to make automated decisions. The idea is to triage a large number of pictures to lighten the investigator’s workflow. For instance, cases related to child sexual exploitation oftentimes contain a lot of media files. The system is able to detect the presence of children and/or nudity in a set of pictures. The investigator can then, for example, search for the face of a suspect inside the set of pictures containing both nudity and children using face recognition technologies. This example is only a small use case. It is important to note that because many more models can be added, the possibilities will vastly increase over time.
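Triage over cached annotations can be sketched as a filter that keeps only pictures where every required label reaches a confidence threshold, producing a shortlist for the investigator to review (for example with face recognition) rather than any automated decision. The annotation format, the label names, and the 0.8 threshold are assumptions.

```python
from typing import Dict, Iterable, List

Annotations = Dict[str, Dict[str, float]]  # image id -> {label: confidence}


def triage(annotations: Annotations, required_labels: Iterable[str],
           threshold: float = 0.8) -> List[str]:
    """Shortlist images where every required label meets the threshold.

    The output is a review queue for an investigator, not a decision.
    """
    required = list(required_labels)
    return sorted(
        image_id for image_id, labels in annotations.items()
        if all(labels.get(label, 0.0) >= threshold for label in required)
    )


annotations = {
    "img1": {"child": 0.95, "nudity": 0.90},
    "img2": {"child": 0.97},
    "img3": {"child": 0.50, "nudity": 0.99},
}
print(triage(annotations, ["child", "nudity"]))  # ['img1']
```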

Ethical, Legal, and Societal Impact Assessment Overview

Blog provided by INSPECTr partner Joshua Hughes of Trilateral Research (TRI)

As part of the ethical approach to research in INSPECTr, Trilateral conducted an impact assessment of ethical, legal, and societal issues that could be raised by the INSPECTr project and technologies. As has been discussed in other blogs, privacy and data protection issues are also key considerations. It is important that these issues are considered so that the final version of the INSPECTr platform developed in the project is as ethical, legally sound, socially acceptable, and privacy-respecting as possible. This work is crucial because the intended recipients of the INSPECTr platform are law enforcement agencies whose officers come from, and contribute to, the society that is being policed.

As noted in other blogs, Trilateral analysed both the project and technologies being researched to develop a list of requirements, and some of these were discussed in more detail during workshops (the list of requirements is provided in D8.5 Ethical, Legal and Societal requirements for the INSPECTr platform and tools). The workshops helped to specify some of the details needed to fulfil some of the requirements and, for others, provided space where options for fulfilling the requirement could be discussed. In addition, through the Ethics and Privacy-by-Design work carried out by Trilateral, some additional requirements have been developed.

How all of these requirements can be fulfilled is an evolving conversation that will continue until the end of the project. As the technologies are still being researched, it is not possible to give a final overview of how the requirements are being fulfilled. However, in general, the INSPECTr platform is being researched in a way that enables data to be collected, analysed, and shared in appropriate ways that respect the applicable standards, and end-users will be fully informed, through detailed training, about what the tools can and cannot do, what their limitations are, and how to use them. Overall, the ethical, legal, and societal requirements developed in the first part of the project are likely to be fulfilled in the second half, leading to a platform that will enable law enforcement to analyse large amounts of data from complex investigations in an ethical, legal, and socially acceptable way.

Introduction to INSPECTr Living Labs Experimentation

Blog provided by INSPECTr partner Centre for Cybersecurity and Cybercrime Investigation

A network of LEA Living Labs was to be set up, both organisationally and technically, to specify requirements, experiment with and test the project outputs, and create the nucleus for the sustainable use of the INSPECTr platform by LEAs throughout Europe. Each LEA Living Lab was to provide an experimentation and test-bench environment, primarily and initially using mocked evidence from scenarios created by LEA partners, to test the requirements, accuracy, and user acceptance of the platform. Mocked but realistic use-case evidence was then prepared by our law enforcement project partners for experimenting with and testing the INSPECTr platform's functional and non-functional characteristics, and was developed to exercise the majority of the technological developments. Numerous forensic analysers and intelligence-gathering tools are required to process both digital and non-digital items.

Early Milestones: An early milestone in the project was to define the common processes and baseline resources for experimentation in the LEA Living Labs, with a view to producing a detailed requirements pipeline. This required comprehensively detailing the LEAs' coordination and experimentation environment and each of the investigative tools used, as well as an in-depth analysis of a 3-stage questionnaire process completed with the support of our LEA partners. All of the efforts throughout this task were geared towards establishing a solid collaborative environment for LEAs. Based on LEA experimentation and feedback, the technical partners were able to extract the initial requirements and specifications for the forthcoming INSPECTr platform, which was being developed in parallel within the other relevant work packages. This process provided invaluable knowledge to inform the Consortium of what is expected of the INSPECTr platform, regarding the performance and functionalities that are most relevant to LEAs’ investigative workflow.

Key Approaches: Using both structured and unstructured data as input, the developed platform will facilitate the ingestion and homogenisation of this data with increased levels of automation, allowing for interoperability between outputs in multiple data formats. Various knowledge discovery techniques will allow the investigator to visualise and bookmark important evidential material and export it to an investigative report. In addition to providing basic and advanced (cognitive) cross-correlation analysis with existing case data, these techniques will aim to improve knowledge discovery across exhibit analysis within a case, between separate cases and, ultimately, between inter-jurisdictional investigations. INSPECTr will deploy big data analytics, cognitive machine learning, and blockchain approaches to significantly improve the digital and forensic capabilities of pan-European Law Enforcement Agencies (LEAs).

Data Formatting: The appropriate formatting and structure of data has been an important consideration from the outset, and the project ultimately opted for the open-source Cyber-investigation Analysis Standard Expression (CASE) language. CASE is a community-developed ontology designed to serve as a standard for the interchange, interoperability, and analysis of investigative information across a broad range of cyber-investigation domains. It provides a structured specification for representing information that is analysed and exchanged during investigations involving digital evidence, and it enables information from different data sources and forensic tool outputs to be merged, allowing more comprehensive and cohesive analysis.
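A minimal sketch, in Python, of what a CASE-style JSON-LD object and a naive merge of two tool outputs might look like. The `uco-core`/`uco-observable` prefixes, `File`, `FileFacet`, `hasFacet`, `fileName` and `sizeInBytes` follow the public CASE/UCO 1.x examples as we understand them and should be checked against the ontology version in use; the `kb:` namespace and identifiers are hypothetical. The merge combines nodes that share an `@id` and is a deliberate simplification, not a full JSON-LD or RDF graph merge (a library such as rdflib would normally be used for that).

```python
# Prefixes as used in public CASE/UCO 1.x examples; "kb" is a hypothetical namespace.
CONTEXT = {
    "kb": "http://example.org/kb/",
    "uco-core": "https://ontology.unifiedcyberontology.org/uco/core/",
    "uco-observable": "https://ontology.unifiedcyberontology.org/uco/observable/",
}


def file_node(node_id, facet_id, **facet_props):
    """Build a CASE/UCO-style File object carrying one FileFacet."""
    facet = {"@id": facet_id, "@type": "uco-observable:FileFacet"}
    facet.update({f"uco-observable:{key}": value for key, value in facet_props.items()})
    return {"@id": node_id, "@type": "uco-observable:File", "uco-core:hasFacet": [facet]}


def merge_graphs(*graphs):
    """Combine @graph node lists, merging nodes that share an @id.

    Simplified: the first source wins on conflicting scalar values, and facets
    are de-duplicated by their @id. This is not a full JSON-LD/RDF merge.
    """
    merged = {}
    for graph in graphs:
        for node in graph:
            target = merged.setdefault(node["@id"], {"@id": node["@id"]})
            for key, value in node.items():
                if key == "uco-core:hasFacet":
                    facets = target.setdefault(key, [])
                    known = {facet["@id"] for facet in facets}
                    facets.extend(f for f in value if f["@id"] not in known)
                else:
                    target.setdefault(key, value)
    return {"@context": CONTEXT, "@graph": list(merged.values())}
```

If two tools both report `kb:file-1`, the merged document holds a single node with one facet, while a node seen only by the second tool is carried over unchanged; the result can then be serialised with `json.dumps`.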

Gadgets: In INSPECTr, our aim was always to be able to use our own tool outputs and to develop a ‘Toolbox’ of data enrichment analysers, which we call ‘Gadgets’. The idea was to have free forensic tools whose outputs would all be fed into the storage layers and made accessible through the Case Management System.
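The Toolbox idea can be sketched as a registry that maps data types to gadgets and files every gadget's output under a case ID, so that a case management layer can retrieve it later. The decorator-based registry, the gadget signature (raw bytes in, report dictionary out) and the two toy gadgets below are hypothetical illustrations, not INSPECTr's gadget interface.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical gadget signature: raw artefact bytes in, report dictionary out.
Gadget = Callable[[bytes], dict]


@dataclass
class Toolbox:
    gadgets: dict = field(default_factory=dict)   # data type -> [(name, gadget)]
    reports: dict = field(default_factory=dict)   # case id -> [report, ...]

    def register(self, data_type: str, name: str):
        """Decorator that registers a gadget for one data type."""
        def wrap(fn: Gadget) -> Gadget:
            self.gadgets.setdefault(data_type, []).append((name, fn))
            return fn
        return wrap

    def run_all(self, case_id: str, data_type: str, payload: bytes) -> list:
        """Run every gadget registered for data_type and store outputs under the case."""
        outputs = []
        for name, fn in self.gadgets.get(data_type, []):
            outputs.append({"gadget": name, "case_id": case_id, **fn(payload)})
        self.reports.setdefault(case_id, []).extend(outputs)
        return outputs


toolbox = Toolbox()


@toolbox.register("text", "word_count")
def word_count(payload: bytes) -> dict:
    return {"words": len(payload.decode("utf-8", errors="replace").split())}


@toolbox.register("text", "email_finder")
def email_finder(payload: bytes) -> dict:
    tokens = payload.decode("utf-8", errors="replace").split()
    return {"emails": sorted({t.strip(".,;") for t in tokens if "@" in t})}
```

Running `toolbox.run_all("CASE-A", "text", b"Contact bob@example.org or alice@example.org today.")` executes both gadgets and stores two reports under `CASE-A`: a word count of 5 and the two email addresses, sorted. New gadgets are added by registering another function, without changing the storage or retrieval code.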

Introduction to INSPECTr Living Labs Experimentation

Mocked use case development: 3-part scenarios scheduled to match the technology agenda

A challenge that arose early in the project was how to test the technology while respecting data privacy and the GDPR. Using existing evidential data from historical cases to test our platform would clearly contravene both, so the decision was taken early on to use mocked data for our experiments.

To replicate “real-life” investigations, our experienced law enforcement partners were tasked with developing three unique scenarios to be investigated, each with fake suspects and a rich history of communication across numerous sources of evidence. The evidence they created also reflects the volume of information that investigators see in real life and the links between the actors under investigation.
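Volume and linkage in mocked evidence can also be supplemented programmatically. The sketch below is not how our LEA partners built their scenarios; it is a hypothetical Python example of generating fake communication records between invented personas, using a fixed random seed so that every run produces exactly the same data. The persona names, channels and 90-day window are assumptions.

```python
import random
from datetime import datetime, timedelta

# Hypothetical fake personas and channels; no real data is involved.
SUSPECTS = ["Suspect A", "Suspect B", "Suspect C", "Associate D"]
CHANNELS = ["SMS", "email", "messaging app"]


def mock_messages(n, seed=2021, start=datetime(2021, 1, 1)):
    """Generate n fake messages between personas, reproducibly via a fixed seed."""
    rng = random.Random(seed)  # private generator: results do not depend on global state
    messages = []
    for i in range(n):
        sender, recipient = rng.sample(SUSPECTS, 2)  # two distinct personas
        messages.append({
            "id": f"MSG-{i:05d}",
            "sender": sender,
            "recipient": recipient,
            "channel": rng.choice(CHANNELS),
            # Random timestamp within a 90-day window after `start`.
            "sent_at": start + timedelta(minutes=rng.randint(0, 60 * 24 * 90)),
        })
    return sorted(messages, key=lambda m: m["sent_at"])
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the data set reproducible across machines and test runs, which matters when developers and testers need to discuss the same records.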

Because the evidence is mocked, our law enforcement partners can discuss issues with the platform openly with developers while respecting ethical considerations. The use cases have been developed in three parts, each addressing features of the platform as it develops. Since the platform has six phases of technology development, we asked the LEAs to map the technology agenda to each part of their use case, with the parts becoming more technical as the platform matures.

Living Labs Phase 1 - April 2021

In April 2021, the INSPECTr technical partners demonstrated the features of the platform to law enforcement partners, to familiarise them with it and to gather feedback on the developments to date. Many services, which had so far been developed in isolation, were ready to be integrated in preparation for the next phase of development. It was essential to get the views and observations of our law enforcement partners before moving to this next stage, and to use their valuable feedback to guide the continued development of the platform technology.

Further Development of the INSPECTr Platform

Until this point, the INSPECTr project's primary focus had been on developing the platform rather than on running it in a live environment, but the Living Labs were set to change this. The work carried out early in the project to understand the hardware, software and service requirements gave us a good foundation for deployment, including hardware suited to those requirements. We had several criteria in mind during hardware planning: virtualisation-capable hardware, easy expandability, and hardware uniformity, so that any issue with one device would appear in the same way on all the others. We also understood that asking partners to install multiple devices was not feasible, so virtualisation became a critical requirement; much infrastructure today follows the same one-device, many-services model. The hardware was selected with expandability in mind and has an upgrade path if the system needs more resources or the project requires more performance in future.

With the hardware in place, the next consideration was software and technologies, and virtualisation of services was the most important of these. Virtualisation allows multiple services carrying out various functions to run on one machine; each node is a network in a box, providing everything required to run a node on the INSPECTr network. To further divide the hardware resources into parts that allow multiple services to work independently of one another, we used Docker. Docker offers more than resource segregation: for example, individual services can be updated live, instead of the whole system being shut down and redeployed. The platform's development process required such technologies, and they became fundamental during deployment for the Living Labs.
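The resource segregation described above can be pictured with a toy model: a node with fixed CPU and memory capacity admits a service only if its reserved share still fits, and updating one service swaps only that service. This is a hypothetical Python sketch of the idea, not Docker's API or INSPECTr's configuration; with Docker itself, per-container limits are set with options such as `--cpus` and `--memory` on `docker run`, and a service is updated by recreating its container from a new image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """One containerised service and the share of the node it may use (illustrative)."""
    name: str
    image: str       # hypothetical image reference, e.g. "cms:1.0"
    cpus: float
    memory_mb: int


class Node:
    """Toy model of a Living Lab node: one machine, many isolated services."""

    def __init__(self, cpus: float, memory_mb: int):
        self.cpus = cpus
        self.memory_mb = memory_mb
        self.services = {}  # service name -> Service

    def add(self, svc: Service) -> None:
        """Admit a service only if its reserved resources still fit on the node."""
        used_cpus = sum(s.cpus for s in self.services.values())
        used_mem = sum(s.memory_mb for s in self.services.values())
        if used_cpus + svc.cpus > self.cpus or used_mem + svc.memory_mb > self.memory_mb:
            raise ValueError(f"not enough capacity on node for {svc.name}")
        self.services[svc.name] = svc

    def update(self, name: str, new_image: str) -> None:
        """Replace one service's image; every other service is left untouched."""
        old = self.services[name]
        self.services[name] = Service(name, new_image, old.cpus, old.memory_mb)
```

Because each service's resources are reserved up front, one misbehaving or updating service cannot starve the others, which is what makes per-service updates safe on a shared machine.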

With the technology and hardware chosen, the next question was how to deploy them. The platform's deployment involves numerous steps, but because every site used the same hardware, much of it could be automated. We carried out the deployment with an automation framework that installed the operating systems and software and set up each node as a replica of the others without being a direct copy. This strategy allowed us to deploy the infrastructure for all Living Labs within two days and to keep the Living Labs up to date with the current pace of development. When developers need to update or modify code after deployment, changes made in the code management system are pushed to all Living Lab nodes, and after a few minutes they are live on every node. Rather than waiting days for software updates or, in the case of some commercial offerings, potentially months, changes can be made much faster and with less downtime. More rapid deployment is not a silver bullet for software updates, but it allows the technical team to move code from developer to platform with fewer delays and minimal platform downtime.
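The automation framework used is not named here, so the sketch below only illustrates the principle of nodes that are replicas without being direct copies: every node's configuration is rendered from one shared template, with node-specific values such as the LEA and VPN address filled in, and pushing a new commit re-renders all of them. The template keys, node names and addresses are hypothetical.

```python
from string import Template

# Hypothetical settings template shared by every Living Lab node.
NODE_TEMPLATE = Template(
    "node_name=$node\n"
    "lea=$lea\n"
    "vpn_address=$vpn_address\n"
    "platform_version=$version\n"
)


def render_configs(nodes: dict, version: str) -> dict:
    """Render one config per node: same template, node-specific values."""
    return {
        node: NODE_TEMPLATE.substitute(node=node, version=version, **values)
        for node, values in nodes.items()
    }


def push_update(deployed: dict, nodes: dict, new_version: str) -> list:
    """Re-render every node's config for a new commit; return the nodes that changed."""
    updated = render_configs(nodes, new_version)
    changed = [node for node in nodes if deployed.get(node) != updated[node]]
    deployed.update(updated)
    return changed
```

Re-running the push with an unchanged commit returns an empty list, the kind of idempotence that lets updates be applied repeatedly without disturbing nodes that are already current.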

Living Labs Phase 2 – March - April 2022

In Phase 2, in March and April 2022, our law enforcement partners (LEAs) tested the INSPECTr platform using the mocked evidence Use Cases.

For this testing phase, formal feedback was gathered from our LEAs through a survey, and informal feedback was gathered at a meeting of the Law Enforcement Steering Group once testing was complete. Feedback on the overall approach to the platform’s development was positive: our law enforcement partners clearly understood the ‘vision’ of the INSPECTr platform and how it could be used, and be useful, for investigative purposes. Overall, satisfaction was good with new case creation, which was found to be simple and intuitive, as well as with logging in, the widgets interface, and executing INSPECTr gadgets (although some aspects of using gadgets still require further clarification).

Our LEA testers also provided more detailed feedback on individual elements of the platform where issues had arisen, highlighting the need for further development and refinement in these areas. These elements included the following:

Datatypes and gadget names need some work and refinement, so that the platform is more intuitive in future
Numerous tasks should be grouped to improve the usability of the interface; this will be addressed later in the project using workflow programming
Some minor adjustments to the data will be required to simplify what will be a very intuitive analytic system

LEA feedback on data security, data deletion, and data sharing and exchange raised issues that were expected, as these features are still in the development pipeline and were not scheduled to be ready for this testing stage. The main focus of this stage was basic data processing and analysis, and here LEA participants gave very positive feedback on the way data can be ingested and processed and on the visual reports (widgets) for each gadget.

Summary

This was the first opportunity in the project to test the Use Cases in this much depth, and it proved to be an extremely useful exercise and a great learning experience in terms of how the concerns of law enforcement can be addressed. The engagement of the INSPECTr law enforcement partners in developing the mocked evidence Use Cases, participating in the testing phase, and providing feedback has been invaluable. It has also been very encouraging that the technical team was able to fix bugs and resolve issues raised during testing quite quickly. The issues raised during testing were tracked and categorised as minor, major or critical. Only one critical issue was recorded: evidence was not yet being segregated by case ID. This was known before testing and is scheduled to be remedied before the next Living Lab. The technical team will also continue to refine gadgets, widgets and dashboards, and the way SIREN accesses the data, aiming for faster access, greater linkage and better analytics overall. A more detailed demo will be prepared and provided to participating law enforcement partners ahead of the next Living Lab, Living Lab 3, scheduled for 21st-23rd June 2022 and hosted by the INSPECTr Coordinator at UCD.
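The one critical issue, evidence not yet segregated by case ID, is easiest to see in code. The hypothetical Python sketch below scopes every read and search to a case ID, so evidence from one case is never returned for another, and adds a small helper that counts issues in the three severity categories used during testing. The class and function names are illustrative and are not the platform's storage layer.

```python
from collections import Counter


class EvidenceStore:
    """Hypothetical store in which every read and write is scoped to a case ID."""

    def __init__(self):
        self._by_case = {}  # case id -> {item id -> item}

    def add(self, case_id: str, item_id: str, item: dict) -> None:
        self._by_case.setdefault(case_id, {})[item_id] = item

    def get(self, case_id: str, item_id: str) -> dict:
        # Never falls back to other cases: an item from another case raises KeyError.
        return self._by_case[case_id][item_id]

    def search(self, case_id: str, term: str) -> list:
        """Return IDs of items in this case only whose content mentions term."""
        items = self._by_case.get(case_id, {})
        return sorted(item_id for item_id, item in items.items() if term in str(item))


SEVERITIES = ("minor", "major", "critical")


def triage(issues: list) -> Counter:
    """Count issues per severity, rejecting labels outside the three categories."""
    for issue in issues:
        if issue["severity"] not in SEVERITIES:
            raise ValueError(f"unknown severity: {issue['severity']}")
    return Counter(issue["severity"] for issue in issues)
```

Making the case ID a required argument of every access method, rather than an optional filter, means that forgetting to filter by case is impossible rather than merely unlikely.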

Update on INSPECTr Living Labs Experimentation Phase 3

From the outset of the project, the INSPECTr platform was to be built with the active participation of a broad and inclusive pan-European LEA community, including in the definition of requirements, feedback on and involvement in research and development tasks, and the demonstrable use of the outputs.

Living Lab 3 took place at UCD from 21st to 23rd June 2022. This was an in-house meeting where developers and law enforcement participants were able to meet in person. This proved invaluable for communication, as the two groups could interact easily during live testing and effectively communicate the problems and usability issues that needed to be resolved or improved in the platform.

For example, law enforcement participants highlighted usability issues that developers had until then been unaware of but, once these were brought to their attention, could easily understand. Developers have since taken these on board and made them main priorities for improvement or incorporation.

During Living Lab 3, law enforcement participants worked alongside developers as live updates were made to the platform. Developers worked late nights to fix reported bugs, and found it much easier to arrive at effective solutions with law enforcement feedback available immediately and in person than when working remotely. The meeting also gave law enforcement participants a more immediate view of how INSPECTr could fit their needs as potential future users of the platform.

Although many bugs were fixed at the live testing event, developers also logged a long list of issues to work on after the meeting. While working through this list, developers noticed and logged further relevant platform issues for longer-term work, some of which may not have been flagged as requiring attention before the live event.

Since Living Lab 3 the CMS (Case Management System) has been integrated on the platform which should resolve a significant amount of the usability issues that were raised. Further work on the graphical representation of data for users continues and is being directed based on law enforcement feedback from the Living Lab.

Update on INSPECTr Platform Development During Living Labs Experimentation: Phases 3.5, 4, and 5.

Living Lab experimentation continues to be an integral part of INSPECTr platform development. There has been a great deal of highly focussed activity during the most recent project quarter where the INSPECTr development team have been working intensively through several phases of the Living Lab Experimentation schedule to develop, troubleshoot and provide solutions to the ever-evolving INSPECTr platform.

Living Labs Experimentation - Phase 3.5

During Living Lab 3.5 (LL3.5), INSPECTr project LEA participants were invited to practise several tasks remotely, using an updated version of the INSPECTr framework on their individual server nodes via a secured VPN connection. Participants were asked to create and link a new user account and then perform a series of practical tasks, including the following:

Populating tables with Tokens, Countries, and Crime Types
Checking available INSPECTr analysers (gadgets) and related data types
Searching existing artefacts and running INSPECTr analysers (to run reports)
Examining the INSPECTr analyser generated reports
Selecting items in the report related to a mocked use case (Terror, CSAM, or Fraud) and creating new artefacts as appropriate.

As well as testing the platform, the LEAs were tasked with providing live feedback regarding issues that they encountered as they worked through the testing process. This feedback was also provided more formally via feedback surveys. Both feedback formats served to inform the development team of the technical capabilities of the platform and to provide suggested improvements to help the development team best tailor the platform for LEA end-users.

Examples of the feedback from LEA testers included syncing issues across a subset of components, artefact creation within the Case Management System (CMS), report visualisation and its integration across components, and running analysers on certain nodes. The INSPECTr development team addressed these issues as they arose during the Living Lab, and they have provided further direction for subsequent platform software development.

INSPECTr Technical Team Meeting Athens 24th-28th October 2022

An INSPECTr developer team meeting was held in Athens in October to focus on the ongoing technical development of the platform. The meeting began with big-picture planning, reviewing the issues being experienced in the platform and the functionality yet to be implemented.

Pressing issues related to forthcoming Living Labs were considered; following this review, any issues that could be addressed in the available time were fixed, and the platform was tested to ensure everything worked as expected. In particular, the issues examined and fixed included running analysers; creating artefacts from the reports of previously run analysers; viewing and downloading images; and account management, including the population of the Elasticsearch indices responsible for it.

Issues with the Case Management System (CMS) were also discussed, following feedback received from LEA testers during Living Lab 3.5, which was held concurrently with this technical meeting. Discussions were also held on widgets and SIREN, including issues in the sorting and pagination of widgets and switching the widgets' codebase from JavaScript to TypeScript. Another topic was adding functionality to SIREN so that it can offer NLP results within its platform; to support this, the NLP toolbox would be extended with a new set of endpoints that return the models' results.
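As a sketch of what a new NLP results endpoint could look like, the Python example below uses only the standard library's `http.server` module to return stored model output as JSON for a path such as `/nlp/results/doc-1`. The path layout, the in-memory `RESULTS` table and the entity format are assumptions for illustration; they do not describe the actual NLP toolbox or how SIREN would call it.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical stand-in for stored model output: document ID -> named entities.
RESULTS = {"doc-1": [{"text": "Dublin", "label": "LOC"}]}
PREFIX = "/nlp/results/"


def results_payload(path: str):
    """Map a request path like /nlp/results/<doc_id> to (HTTP status, JSON body)."""
    if not path.startswith(PREFIX):
        return 404, {"error": "unknown endpoint"}
    doc_id = path[len(PREFIX):]
    if doc_id not in RESULTS:
        return 404, {"error": f"no results for {doc_id}"}
    return 200, {"document": doc_id, "entities": RESULTS[doc_id]}


class NLPResultsHandler(BaseHTTPRequestHandler):
    """Serves GET requests by delegating routing to results_payload."""

    def do_GET(self):
        status, body = results_payload(self.path)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

For local testing the handler can be served with `HTTPServer(("127.0.0.1", 8080), NLPResultsHandler).serve_forever()`, importing `HTTPServer` from `http.server`. Keeping the routing logic in the pure `results_payload` function means it can be tested without starting a server.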

The security of the platform was discussed from various angles. In most cases, it was decided that, due to time restrictions and to push further with the development of other functionality, these issues would be examined at a later point in the project.

A discussion was also held regarding the envisioned usage of the AI tools in the future, how the CASE standard could be used to describe their results, and how all this data could be later harnessed from the Knowledge Graph stored in Neo4j.
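One way AI results could be harnessed from a Neo4j knowledge graph is to upsert each result as a node linked to the artefact it describes. The sketch below pairs a parameterised Cypher `MERGE` query with a Python function that flattens a result record into the query's parameters. The node labels, relationship type and simplified result fields are hypothetical, and the result dictionary is not a real CASE serialisation.

```python
# Hypothetical graph schema: (:Artefact)-[:HAS_RESULT]->(:AnalysisResult)
MERGE_RESULT = """
MERGE (a:Artefact {id: $artefact_id})
MERGE (r:AnalysisResult {id: $result_id})
SET r.tool = $tool, r.label = $label, r.confidence = $confidence
MERGE (a)-[:HAS_RESULT]->(r)
"""


def result_to_params(result: dict) -> dict:
    """Flatten a simplified, hypothetical AI result record into Cypher parameters."""
    return {
        "artefact_id": result["object"],
        "result_id": result["@id"],
        "tool": result["tool"],
        "label": result["label"],
        "confidence": float(result["confidence"]),
    }
```

With the official Neo4j Python driver, the query would be executed in a session, for example `session.run(MERGE_RESULT, result_to_params(result))`. Using `MERGE` rather than `CREATE` means that re-ingesting the same result does not create duplicate nodes, and passing values as parameters avoids building queries by string concatenation.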

On the closing day of the meeting the PubSub component of the platform was discussed and specifically:

The usage of the eCodex framework (Domibus, Domibus Connector, and Domibus Connector Client components) and the need to deploy this on a Windows machine.
The existence of Linux containers for the Domibus and Domibus Connector components, which unfortunately were provided to us (ILS) late in the project, so that time restrictions required us to proceed with the Windows machine solution. However, in any future development of this component, a move to a Linux-compatible solution will certainly be achievable and beneficial.
The development of the Information Request Management Engine (IRME) component, which will act as a UI-enabled "middleware" between the CMS and the PubSub functionality, allowing users to request information and view the results of previous requests, and giving administrators full control over the requests, thereby enhancing the security of the information transmitted.
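The IRME's role of giving administrators full control over requests can be illustrated as a small state machine in which a request must be approved by an administrator before it is published over PubSub. The states, the allowed transitions and the admin-only rule below are hypothetical, chosen to illustrate the idea; they are not taken from the IRME design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"
    ANSWERED = "answered"


# Allowed transitions; anything not listed here is refused (hypothetical workflow).
TRANSITIONS = {
    Status.SUBMITTED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.PUBLISHED},
    Status.PUBLISHED: {Status.ANSWERED},
}

# Transitions that only an administrator may trigger.
ADMIN_ONLY = {Status.APPROVED, Status.REJECTED}


@dataclass
class InformationRequest:
    requester: str
    query: str
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)  # audit trail of (from, to, actor)

    def move_to(self, new: Status, actor: str, is_admin: bool = False) -> None:
        """Apply a transition if it is allowed and the actor has the right role."""
        if new not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        if new in ADMIN_ONLY and not is_admin:
            raise PermissionError(f"{actor} may not set {new.value}")
        self.history.append((self.status, new, actor))
        self.status = new
```

Encoding the workflow as an explicit transition table keeps the rules in one place and gives an audit trail of who moved each request, which suits a component whose purpose is control over what leaves the platform.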

This summary illustrates the extensive technical development under way on the platform and how LEA user testing in the Living Labs environment is helping to inform the direction of the evolving platform.

Living Labs Experimentation - Phase 4

For Living Lab 4 (LL4), LEA participants travelled to CCI, University College Dublin, from 7th to 11th November 2022, where they reviewed and tested the latest version of the INSPECTr framework locally. Accessing their individual INSPECTr server nodes, LEAs created admin and user accounts to test the functionality and usability of the Case Management System (CMS), as well as the SIREN digital visualisation environment, on selected mocked use case artefacts. LL4 tasks included:

Creating new admin and user accounts in the CMS
Ingesting mocked data sets, and running INSPECTr analysers on them
Examining the INSPECTr analyser generated reports for investigative purposes
Running the SIREN digital visualisation component of the INSPECTr platform

As usual during a Living Lab, as well as testing the platform, LEAs were asked to provide live feedback on issues they encountered as they worked through the testing process. This gives the software development team prioritised bug reports as well as suggested usability improvements for LEA end-users. Examples of issues raised by LEA testers included difficulty finding investigative links in the mocked use case data, the digital visualisation of evidence in the SIREN component, and running analysers on some nodes. The INSPECTr development team addressed these issues as they were raised during the Living Lab, and they continue to provide direction for subsequent software upgrades and usability improvements for end-users.

Living Labs Experimentation – Phase 5 – Live Demonstration by GN

INSPECTr partner GN (French Gendarmerie) presented an online live demonstration of some of the platform's AI tools to the project’s LEA partners. One of these is the Gendarmerie's crime forecasting tool, which uses data from past crime reports, such as geolocation, date and type of crime, to predict future tension zones and displays them as heatmaps. This allows LEAs to distribute resources according to the zones of tension and manage them better in time and space. GN also presented tools for speech recognition, OCR and image annotation.
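The forecasting method used by GN's tool is not described here. As a generic illustration of turning past reports (location, date and crime type) into heatmap cells, the Python sketch below bins reports into a latitude/longitude grid and weights each one by recency with an exponential half-life, so recent incidents count more. The grid size, half-life and report format are assumptions, and this is a simple hotspot-scoring technique rather than a predictive model.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical report format: (latitude, longitude, date, crime_type).


def hotspot_scores(reports, today: date, cell_deg=0.01, half_life_days=30.0, crime_type=None):
    """Score grid cells by recency-weighted counts of past reports."""
    scores = defaultdict(float)
    for lat, lon, when, kind in reports:
        if crime_type is not None and kind != crime_type:
            continue  # optional filter on one type of crime
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        age_days = (today - when).days
        # A report's weight halves every half_life_days.
        scores[cell] += 0.5 ** (age_days / half_life_days)
    return dict(scores)


def top_cells(scores, k=3):
    """Return the k highest-scoring cells, i.e. candidate tension zones."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A cell size of 0.01 degrees is roughly 1 km in latitude; the scores can be drawn directly as a heatmap, and `top_cells` returns the highest-scoring cells as candidate tension zones.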

A Series of CEPOL Webinars to Demonstrate the INSPECTr Platform

In February 2023 the INSPECTr project presented a series of lunchtime webinars, courtesy of CEPOL, with the target audience being Law Enforcement Officers, Judicial Authorities, and EU Public Security Entities fighting cybercrime.

The main goal of the INSPECTr project is to create a proof-of-concept platform, but future development will aim to mature the technology towards operational use and adoption by European LEAs. The platform can be used for a wide range of LEA activities, such as digital forensics and open-source intelligence gathering, and it also addresses major issues that LEAs experience, such as big data management and collaboration with other jurisdictions. This webinar series aimed to show the target audience how the INSPECTr platform can be accessed, installed, and configured, how to use INSPECTr gadgets to acquire and process digital evidence and intelligence sources, and how the platform’s case management system and analytics services work. Comprehensive practical demonstrations were presented throughout the week-long series. Recordings of the webinar series are available to those registered on LEEd, CEPOL’s online education and training platform.

The webinar series covered the following topics:

Project Overview Including Platform Setup and Usage
Featured Tools and Basic Data Visualisation
Data Standardisation, Chain of Evidence/Custody and Analytics
Integrated AI/ML Tools
Evidence Discovery and Exchange
Other Features and Future Exploitation

A comprehensive blog covering in greater detail what was presented throughout the full CEPOL webinar series is available in our most recent INSPECTr Project Newsletter.

INSPECTr Project

Recent Articles and Publications

Enterprise Ireland Article, June 2020.

EU Researcher Article, July 2022.

Newsletter

INSPECTr Newsletter, First Issue (March 2021)

INSPECTr Newsletter, Second Issue (June 2021)

INSPECTr Newsletter, Third Issue (September 2021)

INSPECTr Newsletter, Fourth Issue (December 2021)

INSPECTr Newsletter, Fifth Issue (April 2022)

INSPECTr Newsletter, Sixth Issue (July 2022)

INSPECTr Newsletter, Seventh Issue (November 2022)

INSPECTr Newsletter, Eighth Issue (February 2023)

Blogs

INSPECTr Ethics Workshops January 2021

INSPECTr Ethics Workshop June 2021

Complying with Ethical and Legal Standards in the INSPECTr Project

The Ethical Approach to Research in the INSPECTr Project

Explanation of CASE Language and Reasons for its Adoption for the INSPECTr Platform

Handling of Standardised Evidence (CASE) by the Platform

Artificial Intelligence (AI) Research Methodology and Application

Natural Language Processing (NLP)

Image processing in INSPECTr

Ethical, Legal, and Societal Impact Assessment Overview

Introduction to INSPECTr Living Labs Experimentation

Living Labs Phase 1 - April 2021

Further Development of the INSPECTr Platform

Living Labs Phase 2 – March - April 2022

Summary

Update on INSPECTr Living Labs Experimentation Phase 3

Update on INSPECTr Platform Development During Living Labs Experimentation: Phases 3.5, 4, and 5.

Living Labs Experimentation - Phase 3.5

INSPECTr Technical Team Meeting Athens 24th- 28th October 2022

Living Labs Experimentation - Phase 4

Living Labs Experimentation – Phase 5 – Live Demonstration by GN

A Series of CEPOL Webinars to Demonstrate the INSPECTr Platform

Project Kick-Off Meeting, UCD, Dublin, September 2019

Second Project General Assembly, June 2022

Contact Us
