Invisible metadata, real risks: what anyone can see on your website

Every time you upload a document to your website, you may be sharing more information than you realise. Beyond the visible content, metadata can reveal details such as who created a file, which software was used, when it was created and even where. If you want to learn more about metadata, you can visit our blog.

For businesses and organisations, this data can represent a security breach. From employee names to geolocation data, poorly managed metadata can lead to privacy and compliance issues, especially under regulations such as GDPR.

In this analysis, we explore how exposed metadata can affect a company and what kind of information leaks without most people noticing.

The study: what we have analysed and why it matters

To understand the impact of exposed metadata, we have analysed public documents from two different websites:

Website with a high volume of content: thousands of documents accessible in multiple formats (PDF, Word, Excel). (Link to the report)
Website with little content: only a few files available, but enough to assess risk. (Link to the report)

For both websites, we have anonymised all personal data and replaced the links to the original files with links to sample files hosted on our own server to ensure privacy.

This approach allows us to compare two extreme scenarios and demonstrate how, regardless of the size of a website, metadata can represent a risk if not properly managed.

Here you can see the personal names (small circles) that have been found on each website (larger central circles).

Most interesting is the central section, which shows the names that appear on both sites. This suggests either that a person has collaborated with both organisations or that one organisation has published a document created by the other, exposing its metadata.
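
The central section of the graph is, in essence, a set intersection. The following is a minimal sketch in Python, assuming the personal names have already been extracted from each site's documents. The names and the normalise helper are hypothetical and for illustration only; this is not the tooling used in the study.

```python
# Hypothetical names extracted from the metadata of each site's documents.
site_a_names = {"Alice Example", "Bob Sample", "Carol Placeholder"}
site_b_names = {"carol  placeholder", "Dave Demo"}


def normalise(name: str) -> str:
    """Collapse whitespace and ignore case so trivial variants compare equal."""
    return " ".join(name.split()).casefold()


def shared_names(first: set[str], second: set[str]) -> set[str]:
    """Return the normalised names that appear in both collections."""
    return {normalise(n) for n in first} & {normalise(n) for n in second}


print(shared_names(site_a_names, site_b_names))
```

A real comparison would probably need fuzzier matching (initials, user names versus full names), which this sketch does not attempt.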

If you are interested, you can explore the original graph here:

Throughout this article, we will show you what information is commonly found in metadata, which data can lead to a data breach and how you can reduce the risks to comply with regulations such as GDPR.

Summary of the analysis: an X-ray of the exposed metadata

This report presents a detailed analysis of the metadata exposed in public documents on a website. Our objective is to detect what information is exposed in the metadata of publicly accessible documents and to produce a report that a privacy professional can use to evaluate the situation and advise the company.

To make the results easier to interpret, we compare the findings for the site with those for other sites of similar size. This comparison helps to show whether the level of exposure is within the norm or whether it is higher than expected and needs further investigation.

What can you see in these metadata reports?

The usual data found in this type of analysis are:

Devices used to create the files (cameras, smartphones, printers, …).
Personal names or user names.
GPS coordinates (sometimes of offices or events… and private homes!).
Software tools that have been used to create or manipulate the documents.
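
To make this concrete, here is a minimal sketch of how names and dates can be read from a Word or Excel file. It relies on the fact that .docx and .xlsx files are ZIP archives that store author and date fields in docProps/core.xml. The function name and the choice of fields are our assumptions for illustration; PDFs and images store their metadata differently and are not covered.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> element to look up (an illustrative selection, not exhaustive).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(data: bytes) -> dict[str, str]:
    """Return the author and date fields found in docProps/core.xml, if any."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for key, path in FIELDS.items():
        element = root.find(path, NS)
        if element is not None and element.text:
            found[key] = element.text
    return found
```

Given the bytes of a downloaded file (for example, a hypothetical report.docx), the function returns a dictionary containing only the fields that are actually present.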

Summary and comparison with websites of similar size

To better understand the impact of the exposed metadata, we will compare the results obtained for each website analysed. We will look at what types of information appear, how many times they appear and how these results compare with similar websites.

In the first case, the measurements indicate that the website leaks much more information than websites of similar size. In the second case, the website still leaks information, but the volume is slightly below average.

Geographical information

One of the most sensitive leaks is related to GPS coordinates, which can be embedded in photos taken with smartphones, depending on their settings. In the right context, this information is useful: it allows us to organise travel memories or locate events on maps and create beautiful visualisations.

However, when this data appears in documents or images published on a corporate website, it can pose a risk to the privacy and security of employees. It is not uncommon to find geographic information in photos of business activities such as events, trade fairs or team dinners, which is usually not problematic.

But what about the images in the ‘Meet our team’ section – were they taken at the company’s offices or in private homes? What were the privacy settings on the smartphone used to capture them?

These questions can make the difference between a simple piece of information and an unintentional security breach.
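
To see how precise this information is, it helps to know that Exif stores each GPS coordinate as degrees, minutes and seconds plus a reference letter (N, S, E or W). The sketch below converts such a value into the decimal degrees that mapping tools accept. The function name and the sample values are hypothetical, and reading the tags out of the image file is left to an Exif library.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert an Exif-style degrees/minutes/seconds value to decimal degrees.

    ref is the hemisphere letter: "N" or "E" is positive, "S" or "W" negative.
    """
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)


# Arbitrary sample values, not taken from the analysed websites.
latitude = dms_to_decimal(10, 30, 0, "N")
longitude = dms_to_decimal(20, 15, 0, "W")
print(latitude, longitude)
```

One second of latitude corresponds to roughly 30 metres on the ground, which is why a single photo can be enough to identify a building.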

In this example, the larger website contains a lot of geographical information in its metadata. It all points to the same city, which reveals information about the organisation's activities and the dates on which they take place.

If this metadata was not published deliberately, there is no guarantee that none of these coordinates point to a private address.

On the second website, we see one of the side effects of using a content management system (WordPress, Joomla, …). To optimise the website, these systems create copies of each uploaded image in different sizes. If the original image contains GPS information, it can be carried over into those copies and spread across the website, making it much easier to find.
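
A quick way to gauge this effect is to group image URLs by the original upload they derive from. The sketch below assumes the WordPress naming convention, where generated sizes get a "-WIDTHxHEIGHT" suffix before the file extension; other systems name their copies differently. The URLs are hypothetical, and the sketch only groups the copies: whether each one still carries GPS tags has to be checked per file.

```python
import re
from collections import defaultdict

# Matches a "-300x200" style suffix directly before the file extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def group_by_original(urls):
    """Map each presumed original image URL to all URLs derived from it."""
    groups = defaultdict(list)
    for url in urls:
        original = SIZE_SUFFIX.sub("", url)
        groups[original].append(url)
    return dict(groups)


# Hypothetical URLs for illustration.
urls = [
    "https://example.com/wp-content/uploads/team.jpg",
    "https://example.com/wp-content/uploads/team-300x200.jpg",
    "https://example.com/wp-content/uploads/team-1024x683.jpg",
    "https://example.com/wp-content/uploads/logo.png",
]

for original, copies in group_by_original(urls).items():
    print(original, len(copies))
```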

What details are exposed about an employee?

Document metadata contains information that associates a person with a company, its activity and its working tools. A metadata analysis allows us to find things like:

Personal name or alias: This helps to identify people or to deduce their email address or internal user name.
Number of associated documents: This indicates how many files on the analysed website have been created or edited by this person.
Period of activity: From the creation and modification dates of documents, it is possible to estimate the period during which someone worked on certain files, which could reveal changes in the workforce or internal movements in a company.
Software used: Metadata includes details of the programs used to create or edit files, giving clues about the most commonly used tools within the organisation.
Common devices: In some cases, documents may contain information about the type of device used (PC, Mac, or even mobile), which could be useful for assessing technology standards within a company.
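
The first four items in the list above can be derived with a simple aggregation. The following is a minimal sketch, assuming each document's metadata has already been reduced to an (author, date, software) record. The Profile class, the record layout and the sample data are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Profile:
    documents: int = 0
    first_seen: Optional[date] = None
    last_seen: Optional[date] = None
    software: set = field(default_factory=set)


def build_profiles(records):
    """Aggregate (author, iso_date, software) records into one profile per author."""
    profiles = {}
    for author, iso_date, software in records:
        day = date.fromisoformat(iso_date)
        profile = profiles.setdefault(author, Profile())
        profile.documents += 1
        if profile.first_seen is None or day < profile.first_seen:
            profile.first_seen = day
        if profile.last_seen is None or day > profile.last_seen:
            profile.last_seen = day
        profile.software.add(software)
    return profiles


# Invented sample records, not taken from the analysed websites.
records = [
    ("Alice Example", "2019-03-01", "Word Processor A"),
    ("Alice Example", "2021-11-15", "PDF Tool B"),
    ("Bob Sample", "2020-06-30", "Word Processor A"),
]

for author, profile in build_profiles(records).items():
    print(author, profile.documents, profile.first_seen, profile.last_seen)
```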

While this data may seem harmless, taken together it can profile a worker’s activity, identifying patterns of use and potential privacy risks.

Moreover, exposing all this information gives away personal details that attackers can use to craft highly targeted phishing campaigns against the people concerned.