Intelligence Network & Secure Platform for Evidence Correlation and Transfer
Project News
The project coordinator on INSPECTr and supporting the fight against cybercrime.
Project Coordinator, Ray Genoe:
"An intelligent platform for effective policing".
We are pleased to share our INSPECTr Project newsletters, in which we provide the latest updates on the activities of the INSPECTr Project. The INSPECTr consortium will periodically provide a newsletter containing information about the project and events to stakeholders on the project’s contact list. If you would like to receive such information, please subscribe to our newsletter here.
Across January, the INSPECTr project held two ethics workshops designed to provide both ethical and technical partners with the opportunity to engage in an in-depth and informal dialogue, together with the Ethics Advisory Board and external ethics experts, on two 'spotlight' issues identified through partner Trilateral's ethics governance processes as important for ethical design. The first workshop took place on 19 January 2021 and focused on the integration of publicly available data, typically online data, into the INSPECTr Platform. The project was joined by ethics expert Dr Thilo Gottschalk and discussed the importance of data minimisation and data storage limitations in this regard. Design solutions, such as the use of search filters and default settings, were identified as granular ethics requirements for the Platform. The second workshop took place on 26 January 2021 and focused on artificial intelligence systems within INSPECTr. The project was joined by ethics expert Phil Booth and discussed the importance of bias mitigation and the understandability of the artificial intelligence output for LEA investigators. Design solutions, such as adjustments to datasets, weightings for certainty and the importance of error identification, were discussed with a view to adoption in the Platform. These workshops inform the ethics requirements for the Platform, which accompany the functional requirements. The project looks forward to hosting a third project workshop on gender in Spring 2021.
As part of the Ethics-by-Design approach being taken by the INSPECTr
project, the partners held an ethics workshop on Gender and AI during June 2021. This was
attended
by both ethical and technical partners, along with external experts. Attendees worked together to
discuss gender-related issues that are relevant to both the project and the platform that is being
researched, and to develop design solutions for these issues. Gender is a sometimes overlooked issue in data science projects, and overlooking it can have significant negative impacts. Not incorporating gender perspectives can mean that important issues are missed: for example, a gender bias in the data used to train machine learning algorithms, or amongst the members of a project, could mean that issues affecting different gender groups are never considered.
The workshop discussed the impacts that can occur when gender is treated as an unchangeable binary category, and how this can create harm both for people who are mis-categorised and for law enforcement dealing with potentially inaccurate data. This led to consideration of where such issues could be avoided or dealt with during development of the INSPECTr tools. Further, with respect to the tools being researched, a major point of discussion was whether data used for training machine learning tools could have gendered effects, and what some of the impacts could be. The partners discussed the work the project has already done to deal with bias issues in tools and to ensure that they give more accurate results than if gender were not considered. Further discussions covered topics such as whether the project should include or exclude image search tools that use emotion detection, and biases associated with particular datasets, as noted by external expert Dr Allison Gardner, who joined the discussion. Looking forward, the project is taking into account work on bias by the IEEE and ISO, and is looking at bias audits for tools and algorithms that are developed.
As a project focused on researching data-analysis tools and data-exchange infrastructure to support law enforcement agencies in their daily work, it is imperative that the INSPECTr project abides by the standards of research ethics and data protection. To ensure this,
the European Commission (EC) provided a number of ethics requirements for us to demonstrate our
compliance with such standards. We have worked hard to ensure compliance with these standards,
and we have had a positive response from the EC on our submissions to them.
A key part of Trilateral’s work is the ethics and data protection monitoring of the Living Lab
research.
This is where the tools developed in the INSPECTr project will be tested by our law enforcement
partners.
The work of partners includes completing data protection impact assessments to evaluate and mitigate
risks to the rights and freedoms of data subjects relating to the processing of personal data. We
have
also worked to determine the most appropriate data processing relationships between Data
Controllers,
Data Processors, and Joint Controllers.
The impact of Brexit on international data transfers between the EU and UK-based partners has been
an
additional concern for the project. Following the expiry of transition arrangements, the EU has
agreed
an adequacy decision, meaning that the UK’s data protection regime is judged to provide an adequate
level of protection for EU data-subjects. This provides for a continuation of smooth data transfers
between all partners in the project. However, bearing in mind recent judicial decisions, the
partners
are implementing appropriate back-up options in the event that the adequacy decision is subject to a
successful legal challenge.
The INSPECTr project has opted to take ethics-by-design and privacy-by-design approaches to the development of the INSPECTr tools and platform, an effort managed by Trilateral. This means that ethical and privacy issues are considered at all points of the INSPECTr project, and choices are made so that outcomes are as ethical and privacy-respecting as possible. A key part of this is sensitising the consortium to ethical and privacy issues that could arise during projects like INSPECTr; to do this, we held a series of webinars, along with continuous communication between ethics and technical partners, to highlight particular issues to technical partners. A key result of these discussions was an understanding that balanced the potential technical benefit of using personal data to train and test machine learning tools against the privacy and data protection concerns that such processing entails. It was agreed that technology research should take place using as little personal data as possible, and only where
necessary to fulfil the research purposes. Major work also took place around the most
appropriate legal basis for law enforcement partners to use when testing the INSPECTr
tools. It was decided that law enforcement partners should act under a legal regime
that treats them as researchers and should not be engaged in processing of data from
ongoing investigations. This helps to draw a clear line between research and operational
policework.
Another crucial aspect of the ethics and privacy-by-design approaches is the development
of a series of recommendations to mitigate risks and enhance opportunities related to
ethics, legal, and societal impacts. These were co-developed through discussions and
workshops with technology and ethics experts. These are being implemented across the
remainder of the INSPECTr project through collaboration with the technical partners to
ensure that the final results of the project are ethical, privacy respecting, legally
compliant, and societally acceptable.
Further, as technology advances at a rapid rate, so do ethical, legal, and societal
concerns associated with it. So, Trilateral engages in continuous horizon scanning
for issues that could affect the project itself, or adoption of the tools after the
project. Recently, this work has focused on the European Commission’s proposed AI
Regulation. The project wants to understand the nature and scope of the proposed
Regulation to try and ‘future-proof’ the tools ahead of changes to the legislative
environment. Happily, Trilateral already planned to incorporate many of the key
aspects of the proposed Regulation into the INSPECTr project, albeit in a slightly
different format. Consequently, Trilateral will be monitoring the development of the
proposed Regulation and adapting its approach to minimise barriers to adoption of the
INSPECTr tools in future.
With the CASE language having such a central role in the platform, handling of the CASE format becomes crucial. To accommodate the platform's different data needs, the CASE language is used in three different forms and, depending on the form, is stored in three different storage engines.
The first form is a binary json-ld file that, depending on the amount of investigative information it represents, can be several hundred megabytes or larger (e.g., if describing the file contents of a large disk drive). It is the output of the platform's parsers and the primary data input of the platform, used when, for example, an LEA user wishes to import into the platform evidential material stored elsewhere, parse reports produced by other tools in their organisation, or directly process evidential material via one of the platform's "wrappers" carrying out this functionality.
The binary form is essentially a collection of interlinked nodes forming a graph. For example,
information
about a (mobile) phone call between two individuals could loosely speaking be represented via a
graph by
three nodes; the phone call node holding information about the call itself (duration, application
used, etc.),
linked with two person nodes, each providing information about the individuals (phone number, name,
etc.).
Moreover, each person node could also be linked with other nodes, representing other calls that the
individual
took part in, messages they sent or received, etc. These binary files are stored in INSPECTr's HDFS
storage
system, along with all other binary files (photographs, videos, etc.), which are also described by
CASE files
(e.g., a photo node can list the EXIF data and provide the HDFS location of the actual
photograph).
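As a loose illustration of this interlinked structure, the sketch below builds a small json-ld-style bundle for the phone-call example; the property and type names are simplified placeholders rather than the exact CASE/UCO vocabulary.

```python
import json

# Illustrative sketch of interlinked nodes describing a phone call between two
# individuals. Names here are simplified placeholders, not the CASE/UCO ontology.
bundle = {
    "@context": {"ex": "http://example.org/case-sketch#"},
    "@graph": [
        {"@id": "ex:person-1", "@type": "ex:Person",
         "ex:name": "Alice", "ex:phoneNumber": "+353-00-000-0001"},
        {"@id": "ex:person-2", "@type": "ex:Person",
         "ex:name": "Bob", "ex:phoneNumber": "+353-00-000-0002"},
        {"@id": "ex:call-1", "@type": "ex:PhoneCall",
         "ex:durationSeconds": 187, "ex:application": "dialer",
         # The links below are what give the data its graph structure:
         "ex:from": {"@id": "ex:person-1"},
         "ex:to": {"@id": "ex:person-2"}},
    ],
}

print(json.dumps(bundle, indent=2))
```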
With the binary form being the output of all tools on the platform and hence the common denominator
of all
investigation-related data, it became evident that handling CASE data in a clear and efficient way
was of
crucial importance to the platform. To achieve this, binary files are "flattened", or in other words
transformed
from a large json-ld file describing all interlinked nodes (a large hash table) into a collection
(an array)
of much smaller hash tables, each describing a single flat node. The graph structure of the binary
CASE form is
retained via the inclusion of meta-information on each flat node. During this process, each flat
node is
individually stored in INSPECTr's Elasticsearch storage system. This enables tools to directly and
efficiently
retrieve the information needed, instead of first parsing a perhaps very large binary CASE file in
order to
access this information. For example, the natural language processing tool can directly access the required sms messages (a few hundred kilobytes) without first parsing the binary CASE file describing the mobile phone's drive from which these messages were retrieved (hundreds of megabytes, possibly larger).
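As a rough sketch of this flattening idea (not the platform's actual code), the snippet below turns a tiny "@graph" of interlinked nodes into an array of flat nodes and indexes each one individually, assuming the elasticsearch Python client (8.x-style API) and an illustrative index name.

```python
from elasticsearch import Elasticsearch  # assumes the elasticsearch-py client (8.x-style API)

# Tiny illustrative '@graph': one call node linked to two person nodes.
graph_nodes = [
    {"@id": "ex:person-1", "@type": "ex:Person", "ex:phoneNumber": "+353-00-000-0001"},
    {"@id": "ex:person-2", "@type": "ex:Person", "ex:phoneNumber": "+353-00-000-0002"},
    {"@id": "ex:call-1", "@type": "ex:PhoneCall", "ex:durationSeconds": 187,
     "ex:from": {"@id": "ex:person-1"}, "ex:to": {"@id": "ex:person-2"}},
]

def flatten(nodes):
    """Turn interlinked nodes into small flat hash tables, keeping the graph
    structure as meta-information (the ids of the nodes each node links to)."""
    flat_nodes = []
    for node in nodes:
        flat = {k: v for k, v in node.items() if not isinstance(v, dict)}
        flat["links"] = [v["@id"] for v in node.values()
                         if isinstance(v, dict) and "@id" in v]
        flat_nodes.append(flat)
    return flat_nodes

# Connection details and the index name are illustrative only.
es = Elasticsearch("http://localhost:9200")
for flat in flatten(graph_nodes):
    es.index(index="case-flat-nodes", id=flat["@id"], document=flat)
```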
Finally, in order for the structured and interlinked representation of cyber-investigation information, as well as the evolution of an investigation, to be easily studied and manipulated by LEAs, CASE data is also converted into knowledge graph form and stored in INSPECTr's Neo4j storage service. This allows
CASE data to
be queried in a more "investigative" way; for example, a query to "retrieve all individuals having
contacted
suspect X via sms during period Y" can be created using the CASE language and run on Neo4j.
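The sketch below illustrates the kind of query this enables, using the official neo4j Python driver with illustrative node labels, relationship types and property names; the actual CASE-derived graph schema may differ.

```python
from neo4j import GraphDatabase  # assumes the official neo4j Python driver

# Labels, relationships and properties are illustrative; the real
# CASE-derived graph schema may differ.
CYPHER = """
MATCH (sender:Person)-[:SENT]->(m:SmsMessage)-[:TO]->(suspect:Person {name: $suspect})
WHERE m.timestamp >= $start AND m.timestamp <= $end
RETURN DISTINCT sender.name AS contact, count(m) AS messages
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(CYPHER, suspect="X",
                              start="2021-01-01", end="2021-03-31"):
        print(record["contact"], record["messages"])
driver.close()
```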
One of the call’s requirements was to adopt a common format
for data homogenisation, data discovery (linked cases) and data exchange. The INSPECTr project
opted for the open-source Cyber-investigation Analysis Standard Expression
(CASE) language, a community-developed ontology
designed to serve as a standard for interchange, interoperability, and analysis of investigative
information in a broad range of cyber-investigation domains, including digital forensic science,
incident response, counter-terrorism, criminal justice, forensic intelligence, and
situational awareness. The CASE Community is a consortium of for-profit, academic,
government and law enforcement, and non-profit organisations that have
created a new specification for the exchange of cyber investigation data between tools.
CASE provides a structured specification for representing information that is analysed and exchanged during investigations involving digital evidence. To perform digital investigations effectively, there is a pressing need to harmonise how information relevant to cyber-investigations is represented and exchanged. CASE enables the merging of information from different data sources and forensic tool outputs to allow more comprehensive and cohesive analysis. The main benefits of using CASE are:
Fostering interoperability:
to enable the exchange of cyber-investigation information between tools,
organisations, and countries. For example, standardising how cyber-information is
represented addresses the current problem of investigators receiving the same
kind of information from different sources in a variety of formats
Establishing authenticity and trustworthiness:
based on the clear representation of the Chain of Evidence (provenance) and the
Chain of Custody. A fundamental requirement in digital forensics is to maintain
information about evidence provenance while it is exchanged and processed
Enabling more advanced and comprehensive correlation and analysis
In addition to searching for specific keywords or characteristics within a single
case or across multiple cases, having a structured representation of
cyber-investigation information allows more sophisticated processing such as
data mining or NLP techniques. This can help, for instance, to overcome linkage blindness, that is, the failure to recognise a pattern that links one crime to another, such as crimes committed by the same offender in different jurisdictions
Helping in dual/multiple tool validation of results
in order to evaluate their completeness and correctness/accuracy
Automating normalisation
and combination of differing data sources to facilitate analysis and exploration
of investigative questions (who, when, how long, where).
An investigation generally involves many different tools and data sources, creating separate silos of information. Manually pulling together information from these various data sources and tools is time-consuming and error-prone.
Tools that support CASE can extract and ingest data, along with their
context, in a standard format that can be automatically combined into a unified
collection to strengthen correlation and analysis. This offers new opportunities
for searching, contextual analysis, pattern recognition, machine learning, and
visualisation.
Blog provided by INSPECTr partner Daniel Camara of the French Gendarmerie (GN)
Artificial Intelligence (AI) is an interesting research field and, depending on whom you ask, it either has no formal definition or has hundreds. Even specialists do not fully agree, with different researchers having different definitions and interpretations of AI, which comes down to the fact that intelligence itself has different interpretations. What is intelligence? Some believe it must be linked to rationality, critical thinking and higher-order brain functions; others look for intelligence in things that may not even have a brain at all. However, a relatively well-accepted intuition is that a computer program belongs to the AI class if it is capable of behaving, or giving answers, close to the way a human would if presented with the same kind of situation or question. Another concept that is intrinsically linked to AI is the notion of error. If you have a way to solve a problem that gives a perfect solution every time, it most probably does not fit the general concept of AI. In some sense, this does not contradict the previous intuition, as humans solve problems in an instinctive way and, yes, from time to time we make mistakes: "to err is human". AI methods are heuristics, intuitions, that are not necessarily foolproof methods. The word heuristic originates from the Greek word "heuriskein", meaning "to find" or "discover". A heuristic is a practical method, which comes from, for example, an intuition, an abstraction, or even a pattern generalisation. It is an idea that is not certain to work every time but that, in reality, behaves pretty well.
As the definition of the AI field is "flexible", to say the least, many different methods/heuristics are used to try to reach this "similar to human" kind of answer. This implies that the AI field is constantly evolving, and new methods are created to explore ways to solve general problems or to work well in a specific domain. One of the most iconic artificial intelligence methods is the neural network. It is based on abstractions of how human neurons work and are organised. Rosenblatt proposed the formalisation of a perceptron (the abstraction of a neuron) in the 50s, and even modern deep learning approaches rely on his concepts. A perceptron is a simple sum function that adds up the different received stimuli and has an activation function to propagate a signal. A neural network is an organisation of these basic units in series and layers. Modern deep networks are called deep because they have many layers of perceptrons. When considered individually, perceptrons may not be that impressive, but when organised in a network and trained, the answers they provide may be quite impressive, even though there is no guarantee that the answers will always be correct. Deep learning methods are currently used in INSPECTr, for example, to perform facial recognition, detection of child pornography and cars in images, and text translation, among other tasks.
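As a toy illustration of the idea described above (not INSPECTr code), the snippet below implements a single perceptron as a weighted sum of the received stimuli followed by a step activation function.

```python
def perceptron(inputs, weights, bias):
    """A single perceptron: sum up the weighted stimuli, then apply a
    step activation function to decide whether to propagate a signal."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Example: a perceptron computing a logical AND of two binary inputs.
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights, bias))
```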
Even if deep learning has been quite in vogue over the last few years, and with good reason (it is a general method that can be used to solve an important range of problems), it is not the only AI method. A broad range of other methods exist. For example, the crime forecasting implemented in INSPECTr is based on the analysis of time series and on the intuition that crimes have an opportunity factor, which is basically random, but also a facilitator factor, meaning that not all places and times are fit for the commission of a crime. If we can map these places, we can have a good intuition of where crimes may happen in the future. Three things are required for a crime: a victim, a perpetrator, and a place where these two meet. Not necessarily at the same place at the same time, but somehow the paths of these have crossed at some point. However, some places are more fit for the commission of a crime than others, for example a dark alley or a tourist attraction with many distracted people. Moreover, crime has a seasonal component, e.g., not much happens in a winter ski resort during the summer. There is also a tendency factor, e.g., a team committing residential burglaries tends to spread its activities across a region over a period of time. So we can analyse the criminality of different regions, considering criminal seasonality and tendencies, to forecast the levels of criminal activity over the coming days and months. It is not possible to guarantee that the criminal pattern in a given region will be the same tomorrow as it was today, but most probably tomorrow's criminal profile for that region will be closer to today's than to that of any day two months back. Moreover, in general, it can be assumed that this week will present more similarities with the same week of last year than with the same week six months ago.
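The toy sketch below is not the Gendarmerie's forecasting model; it simply illustrates the intuition above by blending the most recent observation with the value seen one season earlier, using made-up daily incident counts.

```python
def naive_seasonal_forecast(daily_counts, season_lag=365, recent_weight=0.6):
    """Toy forecast blending the most recent observation with the value seen
    one season (e.g. one year) earlier, reflecting the intuition that tomorrow
    resembles today and that this week resembles the same week of last year."""
    recent = daily_counts[-1]
    seasonal = daily_counts[-season_lag] if len(daily_counts) >= season_lag else recent
    return recent_weight * recent + (1 - recent_weight) * seasonal

# Illustrative use with made-up counts of incidents per day for one region.
history = [3, 4, 2, 5] * 120          # roughly 480 days of fake data
print(round(naive_seasonal_forecast(history), 1))
```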
Hundreds of other heuristics exist; some are more suited to solving LEAs’ problems, some less. Some people could even argue that we would need to apply LEA-specific heuristics to solve some of the specific LEA-related problems, and they are probably right. However, AI methods should be used on “hard to solve” problems, the ones where we cannot distinguish a clear pattern linked to the problem or to its solution. If one can distinguish a pattern that can be transformed into a set of defined rules, AI methods may not be required to solve it. Considering that AI-based methods imply a probability of error, if you can solve the problem using standard, precise algorithms, that will always be better.
Blog provided by INSPECTr partner Panos Protopapas of Inlecom (ILS)
Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) and linguistics concerned with making machines "understand" text found in documents and automatically extract the information they contain, so that it can be used to automate various processes. Some common NLP tasks are described below, ordered from those considered relatively easier for a machine to perform to those considered more difficult:
Although the output of these tasks might not seem very useful on a per-task basis, especially considering that machine performance in these tasks is not yet on a par with human performance, one should not overlook the speed increase that NLP can provide in situations where the number of documents to check is too large for manual checking to be realistically possible or economically viable. To this effect, the synergies between these NLP outputs, and the positive effects they can have when post-processing, searching, or filtering through these documents, are of interest.
Furthermore, the outputs of the NER, sentiment analysis and topic modelling techniques can greatly enhance the filtering options of users going through NLP-analysed documents. If said documents have been tagged with the outputs of the NLP models, a user could, for example, filter for documents mentioning a particular location and/or date, carrying a negative sentiment, and having some firearm-related term appear in the list of words defining the topic.
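As a small sketch of this kind of filtering (with hypothetical field names and made-up documents, not the INSPECTr data model), the snippet below selects documents by location entity, sentiment label and firearm-related topic words.

```python
# Each document is assumed to carry hypothetical fields produced by the NLP
# models: named entities, a sentiment label and a list of topic words.
documents = [
    {"id": 1, "entities": {"LOC": ["Dublin"], "DATE": ["2021-06-12"]},
     "sentiment": "negative", "topic_words": ["pistol", "sale", "meet"]},
    {"id": 2, "entities": {"LOC": ["Paris"]},
     "sentiment": "positive", "topic_words": ["holiday", "hotel"]},
]

FIREARM_TERMS = {"pistol", "rifle", "firearm", "gun"}

def matches(doc, location, sentiment):
    """True if the document mentions the location, carries the sentiment,
    and has a firearm-related term among its topic words."""
    return (location in doc["entities"].get("LOC", [])
            and doc["sentiment"] == sentiment
            and bool(FIREARM_TERMS & set(doc["topic_words"])))

hits = [d["id"] for d in documents if matches(d, "Dublin", "negative")]
print(hits)  # -> [1]
```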
Finally, the summarisation results can also aid the human processing of documents by enabling users to read an accurate and brief summary of a document's contents before deciding whether to invest the time to read it in full.
Blog provided by Ben Roques of the CCI project coordination team (CCI/UCD)
The INSPECTr platform implements an image processing system aimed at reducing the workload of investigators when dealing with a very large amount of media content. After media extraction through the use of the INSPECTr platform’s specific tools, or after ingesting media directly into the platform, law enforcement officers can automatically annotate pictures using different image recognition models. Those annotations can then be filtered and searched depending on the needs of the investigation.
The system needs to be able to ingest a large amount of data. Oftentimes, storage media linked to an investigation contain thousands of media files. Law enforcement officers should be able to send all the files to be processed in the background and then search for and prioritise investigative actions. A model is a machine learning classifier that was trained on specific data. For the purposes of INSPECTr, the image processing pipeline only uses convolutional neural networks, as those have proven very effective for image processing tasks. There are many different architectures, with new ones frequently published in scientific journals. The system possesses many different neural networks, each trained for a specific task. It was designed to be very modular: adding and removing models needs to be easy, because the field of machine learning is always evolving and moves very fast. That is why the system tries to be model agnostic. Model-specific pipelines are pushed to the very end of the data flow.
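As an illustration of how one pre-trained convolutional network could be wrapped as a single swappable model (not the INSPECTr pipeline itself), the sketch below uses torchvision's ResNet-50 weights, assuming torchvision 0.13 or later; the real INSPECTr models are trained for their own specific tasks, and the image path is purely illustrative.

```python
import torch
from PIL import Image
from torchvision import models

# Wrap one pre-trained CNN as a single, swappable "model" in the pipeline.
# The ResNet-50 architecture and its ImageNet labels are illustrative only.
weights = models.ResNet50_Weights.DEFAULT          # requires torchvision >= 0.13
preprocess = weights.transforms()
model = models.resnet50(weights=weights).eval()

def annotate(image_path: str, top_k: int = 3):
    """Return the top-k (label, confidence) pairs for one picture."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = model(img).softmax(dim=1)[0]
    conf, idx = probs.topk(top_k)
    return [(weights.meta["categories"][i], float(c)) for c, i in zip(conf, idx)]

print(annotate("evidence/IMG_0001.jpg"))  # illustrative path
```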
The system is organised around micro-services. Pictures to be analysed are sent to a main endpoint
that forwards tasks to a queuing system. A worker collects the tasks from the queue and forwards
them to a model micro-service. Models can be swapped, added, or removed without bringing the system
down. Once the model has finished, a webhook is sent to the main endpoint, and the result is then cached in a database.
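A minimal sketch of this flow is shown below, with an in-process queue standing in for the real queuing system and hypothetical service URLs; it is only meant to show the submit, queue, worker, model and webhook steps.

```python
import queue
import requests  # HTTP calls to the (hypothetical) micro-service endpoints

MODEL_SERVICE = "http://models.local/classify"   # illustrative URLs, not real endpoints
WEBHOOK = "http://main-endpoint.local/webhook"

tasks = queue.Queue()                            # stands in for the real queuing system

def submit(image_ref: str, model: str) -> None:
    """Main endpoint: accept a picture reference and enqueue it for analysis."""
    tasks.put({"image": image_ref, "model": model})

def worker() -> None:
    """Worker: pull tasks from the queue, forward each to a model micro-service,
    then report the result back to the main endpoint via a webhook."""
    while not tasks.empty():
        task = tasks.get()
        result = requests.post(MODEL_SERVICE, json=task, timeout=60).json()
        requests.post(WEBHOOK, json={"task": task, "result": result}, timeout=60)
        tasks.task_done()

submit("hdfs://evidence/IMG_0001.jpg", "nudity-detector")
worker()
```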
The system is not designed to make automated decisions. The idea is to triage a large number of pictures to lighten the investigator’s workload. For instance, cases related to child sexual
exploitation oftentimes contain a lot of media files. The system is able to detect the presence of
children and/or nudity in a set of pictures. The investigator can then, for example, search for the
face of a suspect inside the set of pictures containing both nudity and children using face
recognition technologies. This example is only a small use case. It is important to note that
because many more models can be added, the possibilities will vastly increase over time.
Blog provided by INSPECTr partner Joshua Hughes of Trilateral Research (TRI)
As part of the ethical approach to research in INSPECTr, Trilateral conducted an impact assessment
of ethical, legal, and societal issues that could be raised by the INSPECTr project and
technologies. As has been discussed in other blogs, privacy and data protection issues are also key
considerations. It is important that these issues are considered so that the final version of the INSPECTr platform developed in the project is as ethical, legally optimal, socially acceptable, and privacy-respecting as possible. It is crucial that this work is done because the
intended recipients of the INSPECTr platform are law enforcement agencies whose officers come from,
and contribute to, the society that is being policed.
As noted in other blogs, Trilateral analysed both the project and technologies being researched to
develop a list of requirements, and some of these were discussed in more detail during workshops
(the list of requirements is provided in D8.5 Ethical, Legal and Societal requirements for the
INSPECTr platform and tools). The workshops helped to specify some of the details needed to fulfil
some of the requirements and, for others, provided space where options for fulfilling the
requirement could be discussed. In addition, through the Ethics and Privacy-by-Design work carried
out by Trilateral, some additional requirements have been developed.
How all of these requirements can be fulfilled is an evolving conversation that will continue until
the end of the project. As the technologies are still being researched, it is not possible to give a
final overview as to how the requirements are being fulfilled. However, generally, it can be said
that the INSPECTr platform is being researched in a way that enables data to be collected, analysed,
and shared in appropriate ways that respect the applicable standards; end-users will be fully
informed about what tools can and cannot do, what their limitations are, and how to use them through
a detailed training provision. Overall, the ethical, legal, and societal requirements developed in
the first part of the project are likely to be fulfilled in the second half, leading to a platform
that will enable law enforcement to engage in analysing large amounts of data from complex
investigations in an ethical, legal, and socially acceptable way.
Blog provided by INSPECTr partner Centre for Cybersecurity and Cybercrime Investigation
A network of LEA Living Labs was to be set up both organisationally and technically to specify requirements, experiment with and test the project outputs, and create the nucleus for the sustainable use of the INSPECTr platform by LEAs throughout Europe. Each LEA Living Lab was to provide an experimentation and test-bench environment, primarily and initially with mocked evidence from scenarios created by LEA partners, to test the requirements, accuracy, and user acceptance of the platform. Mocked but realistic Use Case evidence was then prepared by our Law Enforcement project partners for experimentation and testing of the INSPECTr platform's functional and non-functional characteristics, and has been developed to utilise the majority of the technological developments. Numerous forensic analysers and intelligence-gathering tools are required to process both digital and non-digital items.
Early Milestones:
An early milestone in the project was to define the common processes and baseline resources in LEA Living Labs to experiment towards producing a detailed requirements pipeline. This required comprehensively detailing the LEAs' coordination and experimentation environment, each of the investigative tools used, and an in-depth analysis of a 3-stage questionnaire process completed with the support of our LEA partners. All of the efforts throughout this task were geared towards establishing a solid collaborative environment for LEAs. Based on LEA experimentation and feedback, the technical partners were able to extract the initial requirements and specifications for the forthcoming INSPECTr platform, which was being developed in parallel within the other relevant work packages. This process provided invaluable knowledge, informing the Consortium of what is expected of the INSPECTr platform regarding the performance and functionalities that are most relevant to LEAs' investigative workflow.
Key Approaches:
Using both structured and unstructured data as input, the developed platform will facilitate the ingestion and homogenisation of this data with increased levels of automation, allowing for interoperability between outputs from multiple data formats. Various knowledge discovery techniques will allow the investigator to visualise and bookmark important evidential material and export it to an investigative report. In addition to providing basic and advanced (cognitive) cross-correlation analysis with existing case data, this technique will aim to improve knowledge discovery across exhibit analysis within a case, between separate cases and, ultimately, between inter-jurisdictional investigations. INSPECTr will deploy big data analytics, cognitive machine learning and blockchain approaches to significantly improve digital and forensic capabilities for pan-European Law Enforcement Agencies (LEAs).
Data Formatting:
The appropriate formatting and structure of data has been an important consideration in the project from the outset, with the project ultimately opting for the open-source Cyber-investigation Analysis Standard Expression (CASE) language. This is a community-developed ontology designed to serve as a standard for interchange, interoperability, and analysis of investigative information in a broad range of cyber-investigation domains. CASE provides a structured specification for representing information that is analysed and exchanged during investigations involving digital evidence. CASE also enables the merge of information from different data sources and forensic tool outputs to allow more comprehensive and cohesive analysis.
Gadgets:
In INSPECTr our aim was always to be able to use our own tool outputs and to develop a ‘Toolbox’ of data enrichment analysers, or as we call them, ‘Gadgets’. The idea is that we would have free forensic tools, with all of the outputs of these tools fed into the storage layers and accessible through the Case Management System.
Mocked use case development: 3-part scenarios scheduled to match the technology agenda
A challenge that presented itself early in the project was how we could test the technology while respecting data privacy and the GDPR. The use of existing evidential data from historical cases for testing our platform would clearly contravene these. Therefore, the decision was taken early to use mocked data for our experiments.
To replicate “real-life” investigations, our experienced law enforcement partners were tasked with developing three unique scenarios to be investigated, each with fake suspects and a rich history of communication across numerous sources of evidence. The evidence they created also reflects the volume of information that investigators see in real life and the linkage between actors under investigation.
By using mocked evidence, our law enforcement partners can openly discuss issues with the platform with developers, while respecting ethical considerations. The use cases have been developed in three parts and address features of the platform as it develops. There are six phases of technology developments on the platform, so we wanted the LEAs to map the technology agenda to each part of their use case and get more technical as they develop.
In April 2021, the INSPECTr technical partners demonstrated the features of the platform to law enforcement partners for the purposes of familiarisation and for gathering feedback on the developments to date. A lot of services, which had so far been developed in isolation, were ready to be integrated in preparation for the next phase of development. It was essential to get the views and observations of our law enforcement partners prior to moving to this next stage and use their valuable feedback to guide the continued development of the platform technology.
Until this point, the INSPECTr project's primary focus had been on developing the platform rather than on a live environment, but with the Living Labs this was all set to change. The work carried out in the early parts of the project to understand the hardware, software and service requirements meant we had a good foundation for deployment. This initial planning phase meant that we had hardware applicable to the requirements mentioned above. We had several conditions in mind for the hardware planning phase: virtualisation-capable hardware, easy expandability, and hardware uniformity (if we had an issue with one device, we would have the same issue with the other devices). We also understood that asking partners to install multiple devices wasn’t feasible and determined that virtualisation was a critical aspect of our needs. Most infrastructure today is developed to be one device for many services. The hardware was selected with extensibility in mind and had an upgrade path available if the system needed more resources or if the project required more performance in future.
The above gives a brief overview of the hardware, but next comes the software and technologies aspect of the project. Virtualisation of services was the most important element. Virtualisation allows us to have multiple services carrying out various functions; each node is a network in a box, providing all the requirements for running a node on the INSPECTr network. To further segregate the hardware resources and separate them into parts that allow multiple services to work independently of one another, we used Docker. Docker provides more functionality than just segregation of resources, such as live updates of services, instead of requiring redeployment and shutting down systems to update. The platform's development process required such technologies, which became fundamental during the deployment process for the Living Labs.
So we had the technology and the hardware, but next came the question of how we would deploy it. There are numerous steps in the platform's deployment, but a degree of automation was available due to having the same hardware in each case. We carried out the deployment with an automation framework that allowed us to deploy operating systems and software and to set up each node as a replica of the others without direct replication. This strategy allowed us to deploy all Living Lab infrastructures within two days and to keep the Living Labs up to date with the current pace of development. So, when developers need to update or modify code after deployment, changes made in the code management system are pushed to all Living Lab nodes, and after a few minutes these changes are live on all of them. Rather than waiting days for updates to software or, in the case of some commercial offerings, potentially months, these changes can be made much faster and with less downtime. More rapid deployment is not a silver bullet for software updates, but it allows the technical team to send code from developer to platform with fewer delays and minimal platform downtime.
Phase 2 testing of the INSPECTr platform, using mocked evidence Use Cases, allowed our Law Enforcement partners (LEAs) to test our software in March and April 2022.
For this testing phase, formal feedback was gathered from our LEAs by means of a survey and informal feedback was gathered at a meeting of the Law Enforcement Steering Group following the completion of the testing phase. Feedback on the overall approach of the platform’s development was positive, with our law enforcement partners clearly understanding the ‘vision’ of the INSPECTr platform and how it could be used and useful for investigative purposes. Overall, there was a good level of satisfaction with the ease of new case creation, which was found to be simple and intuitive, as well as the process of logging in, the widgets interface, and how to execute INSPECTr gadgets (but with a few issues still requiring more clarification on how to use them).
More detailed feedback was also provided by our LEA testers on individual elements of the platform where issues had arisen, highlighting the need for further platform development and refinement in these areas. These elements included the following:
This was the first opportunity in the project to test the Use Cases in this much depth, and it proved to be an extremely useful exercise and a great learning experience in terms of how the concerns of law enforcement can be addressed. The engagement of the INSPECTr law enforcement partners in developing the mocked evidence Use Cases, participating in the testing phase, and providing feedback has been invaluable. It has also been very encouraging from the technical side that the technical team were able to fix bugs and resolve issues raised during testing quite quickly. The issues raised during testing were tracked and categorised under the headings of minor, major and critical. Only one critical issue was recorded: the fact that evidence was not yet being segregated by case ID. This was a known factor prior to testing and is scheduled to receive remedial action before the next Living Lab. The technical team will also continue to work on the refinement of gadgets, widgets and dashboards and on the way SIREN accesses the data, speeding that up to provide greater linkage and better analytics overall. A more detailed demo will be prepared and provided to participating law enforcement partners ahead of the next Living Lab, Living Lab 3, which has been scheduled for 21st-23rd June 2022. This will be held in UCD, hosted by the INSPECTr Coordinator.
From the outset of the project the INSPECTr platform was to be built with the active
participation of a broad and inclusive pan-European LEA community including the definition of
requirements, feedback and involvement on research and development tasks and the demonstrable
use of the outputs.
Living Lab 3 took place in UCD on 21st to 23rd June 2022. This was an in-house meeting where
both developers and law enforcement participants were able to meet in person. This proved
invaluable from a communications point of view as both groups were able to easily interact
with one another during the live testing, allowing both groups to effectively communicate problems
and usability issues that needed to be resolved and/or improved in the platform.
An example of this was how the law enforcement participants highlighted usability issues which developers had until this time been unaware of but, once these were brought to their attention, could easily understand. These issues have since been taken on board and elevated to main priorities for developers to improve or incorporate.
During Living Lab 3 law enforcement participants worked alongside developers as live updates
were made to the platform. Developers worked late nights to fix reported bugs, and it was found to
be much easier to find effective solutions with law enforcement feedback being available immediately
and in person rather than working remotely. The meeting also allowed law enforcement participants a
more immediate view of how INSPECTr could fit for them as potential future users of the platform.
Despite fixing many bugs at the live testing event, developers also logged a long list of issues
to work on after the meeting. From this work, developers noticed and logged other relevant platform
issues to work on in the long term that had possibly not been flagged as requiring attention prior
to the live event.
Since Living Lab 3 the CMS (Case Management System) has been integrated on the platform which
should resolve a significant amount of the usability issues that were raised. Further work on the
graphical representation of data for users continues and is being directed based on law enforcement
feedback from the Living Lab.
Living Lab experimentation continues to be an integral part of INSPECTr platform development.
There has been a great deal of highly focussed activity during the most recent project quarter
where the INSPECTr development team have been working intensively through several phases of the
Living Lab Experimentation schedule to develop, troubleshoot and provide solutions to the
ever-evolving INSPECTr platform.
During Living Lab 3.5 (LL3.5) INSPECTr project LEA participants were invited to practice several tasks remotely using an updated version of the INSPECTr framework on their individual server nodes via a secured VPN connection. Participants were asked to create and link a new user account to perform a series of practical tasks, including the following:
As well as testing the platform, the LEAs were tasked with providing live feedback regarding issues
that they encountered as they worked through the testing process. This feedback was also provided
more formally via feedback surveys. Both feedback formats served to inform the development team of
the technical capabilities of the platform and to provide suggested improvements to help the
development team best tailor the platform for LEA end-users.
Examples of feedback presented by LEA testers were syncing issues across a subset of components,
artifact creation within the Case Management System (CMS), report visualisation and its integration
across components, and the running of analysers on certain nodes. These issues received attention from the INSPECTr development team as they occurred during the Living Lab and served to provide further direction for subsequent platform software development.
An INSPECTr developer team meeting was held in Athens during October to focus on the ongoing
technical development of the platform. The meeting commenced with some big-picture planning with
a review undertaken of issues being experienced in the platform and on functionality that was yet
to be implemented.
Pressing issues related to forthcoming Living Labs were considered and following this review any
issues that could be addressed within the available time were fixed and the platform tested to
ensure everything was working as expected. In particular, some of the issues examined and fixed pertained to the running of analysers, the creation of artefacts using reports from previously run analysers, the viewing and downloading of images, and account management and the population of the Elasticsearch indices responsible for it.
Issues regarding the Case Management System [CMS] were also discussed following feedback received
from LEA testers during Living Lab 3.5, a Living Lab that was held concurrently with this
technical meeting. Discussions were also held regarding widgets and SIREN, including issues in the sorting and pagination of widgets and switching the language used in the widgets' codebase from JavaScript to TypeScript. Discussions also covered SIREN adding functionality to offer NLP results within its platform. Towards this, the NLP toolbox would be developed to offer a new set of endpoints so that the models' results could be returned.
The security of the platform was discussed from various angles. In most cases it was decided that, due to time restrictions and in an effort to push further with the development of other functionality, these issues would be examined at a later point in the project.
A discussion was also held regarding the envisioned usage of the AI tools in the future, how the
CASE standard could be used to describe their results, and how all this data could be later
harnessed from the Knowledge Graph stored in Neo4j.
On the closing day of the meeting the PubSub component of the platform was discussed and
specifically:
The above summary illustrates the extensive work being undertaken in the technical development of the platform and how LEA user testing in the Living Labs environment is helping to inform the direction of the evolving platform.
For Living Lab 4 (LL4), LEA participants travelled to CCI, University College Dublin from November 7th to November 11th 2022 where they reviewed and tested the latest version of the INSPECTr framework locally in Dublin. Accessing their individual INSPECTr server nodes, LEAs created admin and user accounts to test the functionality and usability of the Case Management System (CMS) as well as the SIREN digital visualisation environment on selected mocked use case artefacts. LL4 tasks included:
As usual during a Living Lab, as well as testing the platform, LEAs were asked to provide live
feedback regarding issues that they encountered as they worked through the testing process.
This provides the software development team with priority bug reporting as well as suggested
usability improvements for LEA end-users. Examples of issues presented by LEA testers were issues
finding investigative links in the use case mocked data, digital visualisation of evidence in the
SIREN component, and the running of analysers on some nodes. These issues received attention from the INSPECTr development team as they were presented during the Living Lab and continue to provide direction for subsequent software upgrades and usability improvements for end-users.
INSPECTr partner, GN (French Gendarmerie), presented an online live demonstration to the project’s LEA partners of some of the AI platform tools. One of those is the Gendarmerie's crime forecasting tool. The tool uses past criminal reports' data like geolocation, date and type of crime to predict future tension zones and displays them using heatmaps. This allows LEAs to distribute resources according to the zones of tension and better manage them in time and space. GN also presented tools such as speech recognition, OCR and image annotator tools.
In February 2023 the INSPECTr project presented a series of lunchtime webinars, courtesy of CEPOL,
with the target audience being Law Enforcement Officers, Judicial Authorities, and EU Public
Security Entities fighting cybercrime.
The main goal of the INSPECTr project is to create a proof-of-concept platform, but future
development will aim to improve the technology towards operational use so that it will be adopted
by European LEAs. The platform can be used for a wide range of LEA activities, such as digital
forensics and open source intelligence gathering. However, it also addresses major issues that
LEAs experience, such as big data management and collaboration with other jurisdictions. The aim
of this webinar series was to inform the target audience on how the INSPECTr platform can be
accessed, installed, and configured, how to use INSPECTr gadgets for the acquisition and
processing of digital evidence and intelligence sources, and to demonstrate the platform’s case
management system and analytics services. Comprehensive practical demonstrations were presented
throughout the week-long webinar series. Access to the recordings of the webinar series is
available to those registered on the LEEd platform, CEPOL’s online education and training platform.
The webinar series covered the following topics:
Project Overview Including Platform Setup and Usage
Featured Tools and Basic Data Visualisation
Data Standardisation, Chain of Evidence/Custody and Analytics
Integrated AI/ML Tools
Evidence Discovery and Exchange
Other Features and Future Exploitation.
A comprehensive Blog that covers in greater detail what was presented throughout the full CEPOL
webinar series is available in our most recent
INSPECTr Project Newsletter.
INSPECTr Project Coordinator (UCD-CCI)
UCD Centre for Cybersecurity and Cybercrime Investigation
UCD School of Computer Science
University College Dublin
Belfield, Dublin 4, Ireland
+353 1 716 2934
+353 1 716 2923