
 What Metadata Reveal About You


Metadata are traffic data generated by users during their internet usage. They are often stored over long periods, frequently without the users' knowledge or active consent. These data can contain a variety of information about the users' online behavior, such as visited websites, communication times or IP addresses.


Metadata are frequently addressed in political discussions, especially in connection with privacy in the digital space. A recurring topic is the potential erosion of data protection rights, for example through the weakening of encryption or the introduction of backdoors into systems for the purpose of state surveillance. Legal obligations for internet providers to disclose user data or to enforce filtering mechanisms for certain online content are also frequently debated.


A well-known example of the use of metadata by intelligence agencies is the disclosure of information through the Edward Snowden leaks in 2013. It was revealed that institutions like the NSA (National Security Agency) of the United States and the GCHQ (Government Communications Headquarters) of the United Kingdom were collecting metadata on a large scale and using it for their surveillance programs.


ℹ️ Content or metadata? Metadata matrix from 2010 (Metadata Slides 1 and 2)

Metadata provide a means of surveillance without the need to directly capture the content of communication itself. The extensive collection and analysis of such data by government entities raises important questions regarding privacy and individual rights.
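To illustrate how much metadata alone can reveal, here is a minimal Python sketch using made-up call records. The record layout, the names and the `contact_profile` helper are hypothetical. The code never looks at any content, yet it surfaces the closest contact, the total talk time and the typical hours of activity, and even a call to a clinic.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical metadata records: (caller, callee, ISO timestamp, duration in seconds).
# No content is stored, only who, whom, when and how long.
RECORDS = [
    ("alice", "bob", "2024-03-01T23:10:00", 1200),
    ("alice", "bob", "2024-03-02T23:45:00", 900),
    ("alice", "clinic", "2024-03-03T09:05:00", 300),
    ("bob", "carol", "2024-03-03T14:00:00", 60),
    ("alice", "bob", "2024-03-04T22:30:00", 1500),
]

def contact_profile(records, person):
    """Summarize a person's contacts and active hours from metadata alone."""
    contacts = Counter()            # number of calls per contact
    total_seconds = defaultdict(int)  # accumulated talk time per contact
    hours = Counter()               # hour of day -> number of calls
    for caller, callee, ts, duration in records:
        if person not in (caller, callee):
            continue  # record does not involve this person
        other = callee if caller == person else caller
        contacts[other] += 1
        total_seconds[other] += duration
        hours[datetime.fromisoformat(ts).hour] += 1
    return contacts, dict(total_seconds), hours

contacts, seconds, hours = contact_profile(RECORDS, "alice")
print(contacts.most_common(1))  # [('bob', 3)]
print(seconds["bob"])           # 3600
print(sorted(hours))            # [9, 22, 23]
```

Repeated late-night calls to one person and a single call to a clinic already suggest a close relationship and a health matter, without a single word of the conversations being recorded.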


Use of Metadata by Palantir Technologies Inc.


Palantir offers software solutions like Gotham and Foundry, which can integrate and analyze large amounts of structured and unstructured data. These platforms enable the collection, organization, and visualization of metadata to identify patterns and interpersonal relationships.
An example is the collaboration with the U.S. immigration agency ICE, which commissioned Palantir to develop a system called "ImmigrationOS". This system is designed to help collect information on individuals who have violated immigration regulations, including those who overstay their visas or self-deport.


In the past, Palantir has also been involved in projects that led to the surveillance of journalists and activists. For instance, it became known that Palantir software was used in Germany to analyze communication networks, which could also include people such as defense lawyers or journalists who were in contact with suspects.



ℹ️ Palantir Foundry based on Amazon Web Services (AWS)  


Metadata in Games


Metadata in video games are used not only to analyze player behavior but also to capture psychological aspects and the moral decisions players make. According to past revelations, intelligence agencies used such data mainly to identify potential terrorist activities or illegal communications.


As part of the Snowden leaks, it became known that the NSA had infiltrated virtual worlds to collect user data and monitor potential threats. A prominent example was the popular online game World of Warcraft, developed by Blizzard Entertainment, Inc.


Other games and platforms that were also monitored by the NSA include:

  1. Second Life:
    The virtual world *Second Life* was a focus of the NSA, as the platform could potentially be used for secret meetings and undetected communication.
  2. Xbox Live:
    The online gaming platform *Xbox Live* was monitored, particularly because players might exchange confidential information via the voice chat function.
  3. PlayStation Network:
    Similar to *Xbox Live*, the NSA focused on the *PlayStation Network*, particularly on its in-game communication features.
  4. EverQuest:
    The MMORPG *EverQuest* was also monitored by intelligence agencies due to its popularity and complex social structures.
  5. Angry Birds:
    The NSA and GCHQ collected a variety of personal information through the game *Angry Birds*. This included location, age, gender, and sexual orientation, which were shared through the app and its advertising networks.

Since Edward Snowden's revelations in 2013, data collection in the gaming industry has evolved further, both in scope and in the technologies used.


Modern games collect an enormous amount of (meta)data on user behavior, technical specifications, personal interests and preferences, financial activity and, in some cases, even biometric characteristics. This information is used not only to optimize gameplay but also for targeted monetization through personalized advertising and offers.

  1. Unity Technologies (Unity Ads):
    Unity generated revenue of USD 457 million in Q4 2024, with revenue declining in both its engine and its advertising business. The platform recorded around 8 billion daily user ad interactions, maintaining its position as one of the largest ad networks in the gaming industry.
  2. Amazon.com, Inc. (AWS):
    Amazon generated advertising revenue of USD 56.2 billion in 2024, making it one of the leading providers in the digital advertising market. Its cloud business, AWS, generated USD 107.6 billion in revenue and was the most profitable segment, with an operating profit of nearly USD 40 billion.
  3. Microsoft Corporation (Xbox & PlayFab):
    Following the acquisition of Activision Blizzard, Microsoft's gaming revenue rose to approximately USD 21.5 billion. Microsoft's overall advertising revenue grew by 21% in Q4 2024 and was estimated at between USD 13 and 14 billion annually.
  4. Electronic Arts, Inc. (EA):
    Total revenue in 2024 was around USD 7.6 billion, part of it generated through in-game advertising and partnerships. EA integrates ads into games like FIFA and Madden NFL, particularly via virtual advertising boards and sponsorships.
  5. Activision Blizzard, Inc.:
    Annual revenue of approximately USD 8 billion before its full integration into Microsoft. Games like Call of Duty and Candy Crush contribute significantly to revenue through in-game advertising and microtransactions.

In many cases, users are left unaware of how the product actually behaves and are sometimes even misled. Tracking what data is collected, requesting its deletion or gaining access to it is often difficult and, despite regulations like the GDPR or other laws, in some cases practically impossible.


On these pages, you will find numerous examples of unauthorized data collection, privacy violations and questionable mechanisms in games and software.


Meta- and Other Data: An Unmanageable Mountain of Data


In games and regular software, almost every detail is captured. Metadata and other data are collected through product analytics, user profiling, advertising and crash reports (e.g., via Crashlytics). This leads to an ever-growing mountain of data whose scale is almost impossible to comprehend.


Companies often justify their data collection by pointing to anonymized statistics. However, the IP address is frequently captured in the process, and it can identify a person and is generally treated as personal data under the GDPR.
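One common way to "anonymize" IP addresses in statistics is to replace them with a hash. The following Python sketch assumes an unsalted SHA-256 hash (the function names are hypothetical, and real systems may use other schemes) and shows why this alone does not anonymize: the IPv4 address space is small enough to hash exhaustively, so the original address can be recovered from the logged value.

```python
import hashlib
import ipaddress

def pseudonymize_ip(ip: str) -> str:
    """Hypothetical 'anonymization': an unsalted SHA-256 hash of the IP address."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()

def reidentify(digest: str, network: str):
    """Recover the original address by hashing every host address in a range."""
    for addr in ipaddress.ip_network(network).hosts():
        candidate = str(addr)
        if pseudonymize_ip(candidate) == digest:
            return candidate
    return None  # address is not in the searched range

# A "pseudonymized" value as it might appear in an analytics log
# (203.0.113.0/24 is a documentation address range).
logged = pseudonymize_ip("203.0.113.42")

# Searching one /24 range takes 254 hashes; the whole IPv4 space has about
# 4.3 billion addresses, which is still small enough to search exhaustively.
print(reidentify(logged, "203.0.113.0/24"))  # 203.0.113.42
```

A secret salt or key makes this brute-force search harder, but the value still works as a stable identifier that links all records of the same user.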


Unique Identifiers: More than Just IP Addresses

In addition to IP addresses, there are a variety of other unique identifiers and data sources that can provide insights into individuals. They are used in conjunction with metadata. Some examples include:
  1. Heatmaps:
    Heatmaps are used to analyze user behavior in games, on websites, and in programs. They show which areas are used particularly frequently or intensively.
  2. Technologies for Image, Handwriting, Text, and Speech Recognition:
    Technologies like Amazon Alexa, Google Vision, and Microsoft Speech capture and analyze images, handwriting, text, and speech to understand and process user interactions.
  3. Persistent Tracking Cookies:
    Tracking cookies are used over long periods to track user behavior and deliver personalized ads or content.
  4. Telemetry and Diagnostic Data as well as Crash Reports (Crashlytics):
    These data are used to analyze software problems, fix errors, and improve the user experience.
  5. Profile Building through User-Generated Content:
    User-generated content such as screenshots, artwork, and texts, which can be analyzed, contributes to profile building by providing insights into the user's interests, preferences, and life.
  6. Hardware Data:
    Information about the device's hardware, such as the processor, graphics card, RAM, and hard drive identifiers, is collected to optimize performance or analyze user behavior.
  7. Software Data:
    Details about the operating system, installed programs, and their versions are collected to optimize software and troubleshoot errors.
  8. Language Settings and Folder Structures on Local Hard Drives:
    The language settings and folder structures on local hard drives provide insight into the user's individual preferences and work environment.
  9. Affiliate Marketing and Funnel Analysis:
    Tracking customers all the way to a goal (e.g., completing a purchase) enables companies to develop targeted marketing strategies and measure the success of campaigns.
  10. Account Names and Unique Identifiers:
    Account names and unique identifiers like UUIDs and GUIDs help identify and track individual users.
  11. User-Assigned System and Computer Names:
    User-assigned system and computer names can serve as identifiers that uniquely associate a device.
  12. Player Names and Login Information:
    Player names and associated login data are unique identifiers that create user profiles in online games and make players' interactions traceable.
  13. User Interactions:
    By capturing clicks, viewed areas, and time spent, user behavior is analyzed to personalize content and advertisements.
  14. Financial Data:
    Currency information, purchasing behavior, and banking data are used to analyze consumer behavior and create personalized offers.
  15. Player Decisions in Games:
    The decisions players make in games, including moral ones, are analyzed to draw conclusions about their behavior and interests.
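Several of the data points above (hardware data, software data, language settings, computer names) become a unique identifier once they are combined. This minimal Python sketch hashes a handful of such attributes into one value; the attribute names, the values and the `device_fingerprint` helper are hypothetical, and real fingerprinting schemes vary. The result stays the same across sessions without any cookie or account.

```python
import hashlib
import json

def device_fingerprint(attributes: dict) -> str:
    """Derive a stable identifier by hashing a canonical form of the attributes."""
    # sort_keys makes the result independent of the order in which values were read
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values of the kind a game client could read from the system.
session_monday = {
    "cpu": "ExampleCPU 8-Core",
    "gpu": "ExampleGPU X1",
    "ram_gb": 32,
    "os": "Windows 11 23H2",
    "language": "de-DE",
    "computer_name": "MAX-GAMING-PC",
}
# The same machine days later, with the attributes collected in a different order.
session_friday = dict(reversed(list(session_monday.items())))
# Another device that differs in a single attribute.
other_device = {**session_monday, "computer_name": "LIVINGROOM-PC"}

print(device_fingerprint(session_monday) == device_fingerprint(session_friday))  # True
print(device_fingerprint(session_monday) == device_fingerprint(other_device))    # False
```

Note that the user-assigned computer name in this example also leaks a first name, which shows how an apparently technical value can carry personal information.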

The Problem of Comprehensive Profiles: Manipulation and Reading of Data Sets


The main issue with comprehensive data collection is that it enables the creation of detailed long-term profiles and allows existing data sets to be manipulated or read. Even if some of this data is considered "pseudonymized," it can be linked to specific individuals over time. Such linked profiles are far more valuable than simple, static data and provide deeper insights into the behavior, preferences, and lives of citizens.
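The following minimal Python sketch illustrates how pseudonymized records can be linked back to individuals. All records, field names and the `link` helper are hypothetical. A pseudonymized telemetry data set is joined with a named data set on attributes that both contain, a technique usually called a linkage attack; a unique match re-identifies the pseudonym.

```python
# Hypothetical "pseudonymized" game telemetry: no names, only a random player ID
# plus attributes that seem harmless on their own.
telemetry = [
    {"player_id": "p-7f3a", "zip": "10115", "birth_year": 1994, "device": "PC", "avg_session_h": 3.5},
    {"player_id": "p-91c2", "zip": "80331", "birth_year": 2001, "device": "Console", "avg_session_h": 1.2},
]

# Hypothetical second data set with names, e.g. from a shop or a data broker.
customers = [
    {"name": "Max Example", "zip": "10115", "birth_year": 1994, "device": "PC"},
    {"name": "Erika Sample", "zip": "20095", "birth_year": 1988, "device": "PC"},
]

# Attributes present in both data sets (so-called quasi-identifiers).
QUASI_IDENTIFIERS = ("zip", "birth_year", "device")

def link(pseudonymized, identified, keys=QUASI_IDENTIFIERS):
    """Join two data sets on shared attributes and return re-identified pseudonyms."""
    index = {}
    for row in identified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = {}
    for row in pseudonymized:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # only an unambiguous match counts as re-identification
            matches[row["player_id"]] = names[0]
    return matches

print(link(telemetry, customers))  # {'p-7f3a': 'Max Example'}
```

Once the pseudonym is resolved, every past and future record carrying that player ID, such as the average session length here, belongs to a named person.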


Another significant problem is that users and citizens are practically defenseless against the data collection frenzy. The majority of users are unaware of the extensive data collection, which often occurs without their explicit consent.


Although legal regulations like the General Data Protection Regulation (GDPR) exist, their practical enforcement is often insufficient. This is particularly true for companies or institutions that provide little to no transparency. In this context, privacy is increasingly viewed as a risk, while at the same time more and more personal data is being collected and processed by companies and government agencies, including for mass surveillance.


Efficiency Increase through AI


Modern AI models, especially Large Language Models (LLMs), can process and analyze metadata in real time. This leads to improved content personalization, optimized user experiences, and more efficient business processes. For example, AI systems can analyze user data to provide personalized recommendations or adjust content.


Risks and Potential for Abuse


Despite the benefits, AI-powered systems carry significant risks, particularly when metadata from various sources are combined. The resulting comprehensive profiles can be misused for manipulative purposes or for unauthorized mass surveillance or chat control.


One example is the use of AI for automated facial recognition in public spaces in real time. Technologies developed by companies like Palantir enable large-scale collection and analysis of personal data, contributing to the global increase in mass surveillance, the restriction of core democratic values, and the invasion of privacy.


Regulation and Ethical Considerations


To address these challenges, the EU AI Act was introduced. It regulates the use of AI systems within the EU and distinguishes between different risk categories. Strict requirements apply to high-risk AI systems, including risk management, transparency, and human oversight. Additionally, AI-generated content, such as texts or images, must be clearly labeled to ensure transparency.

