The Hidden Dangers of MathML - How It Enables Fingerprinting

2023-09-02

MathML (Mathematical Markup Language) is a powerful tool for displaying mathematical equations on the web. However, MathML can also be exploited to fingerprint users in subtle ways that allow tracking across sites. In this post, we’ll take a close look at how client rects, the element measurements returned by getClientRects(), are used to compute fingerprints from MathML.

What is MathML?

MathML is an XML-based markup language for describing mathematical notation. It allows creators to encode both the structure and presentation of formulas for high-quality rendering in browsers and other applications.

MathML has tags that indicate mathematical structures like fractions, roots, and matrices. Browsers render it with OpenType math fonts such as STIX, which supply the glyphs and layout metrics needed to get the symbols, spacing, and layout right for academic and technical writing.

For example, the quadratic formula can be encoded like this in MathML:

<math xmlns="http://www.w3.org/1998/Math/MathML">

  <mrow>
    <mi>x</mi>
    <mo>=</mo>
    <mfrac>
      <mrow>
        <mo>-</mo>
        <mi>b</mi>
        <mo>±</mo>
        <msqrt>
          <msup><mi>b</mi><mn>2</mn></msup>
          <mo>-</mo>
          <mn>4</mn><mi>a</mi><mi>c</mi>
        </msqrt>
      </mrow>
      <mrow>
        <mn>2</mn><mi>a</mi>
      </mrow>
    </mfrac>
  </mrow>

</math>

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

As you can see above, this code generates the quadratic formula in a nicely rendered, easy-to-read format that is often used on educational and scientific websites.

Why is MathML Vulnerable to Fingerprinting?

There are a few key properties of MathML that open it up to potential fingerprinting:

Rendering differences - MathML renders slightly differently across varying browsers, devices, operating systems and MathML engines. These subtle differences can be detected.

Fonts - MathML can use any OpenType math font with a MATH table, and openly licensed fonts like STIX are recommended for better rendering and Unicode support. Checking for these fonts does not produce a fingerprint on its own, but the math font that is available affects the layout and appearance of MathML formulas, and that layout can be measured.

Varying Support - Some older browser versions do not support MathML at all (Chromium-based browsers only added native support in version 109), and whether it is supported is itself a data point in the fingerprinting process.

The complex nature of MathML means many data points can be gathered and combined into a highly distinctive fingerprint that is difficult to spoof consistently.
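The support check mentioned above can be sketched as a small layout probe. This is a minimal sketch, not a definitive detector: the helper name `supportsMathML`, the 100px probe size, and the 1px tolerance are assumptions made for illustration. It relies on the fact that an engine without MathML support lays out an `mspace` element as an unknown, empty inline element with no width.

```javascript
// Namespace URI taken from the <math> example earlier in this post.
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

// Hypothetical helper: returns true when `doc` lays out MathML natively.
// The probe is an <mspace> with an explicit size. An engine that does not
// support MathML ignores the width attribute and reports a width of 0.
function supportsMathML(doc) {
  const math = doc.createElementNS(MATHML_NS, "math");
  const space = doc.createElementNS(MATHML_NS, "mspace");
  space.setAttribute("width", "100px");
  space.setAttribute("height", "20px");
  math.appendChild(space);

  // The probe must be attached to the document to be laid out.
  doc.body.appendChild(math);
  const rect = space.getBoundingClientRect();
  doc.body.removeChild(math);

  // Allow one pixel of slack for rounding (assumed tolerance).
  return Math.abs(rect.width - 100) <= 1;
}

// Only runs in a browser; `document` does not exist elsewhere.
if (typeof document !== "undefined") {
  console.log("MathML supported:", supportsMathML(document));
}
```

The document is passed in as a parameter so the probe can also be run against an iframe's document or a test stub.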

Detecting MathML Differences with ClientRects

The key to fingerprinting with MathML is detecting the subtle rendering differences between devices and browsers. This is accomplished from JavaScript using the DOM method Element.getClientRects().

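When getClientRects() is called on a visible DOM element, it returns a DOMRectList (called ClientRectList in older browsers) containing the size and position of each box that the element generates. Calling it on a math element and on each of its child elements yields a measurement for every part of the formula. These measurements will vary between devices and browsers based on factors like: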

  • Viewport size, zoom level, and device pixel ratio
  • Browser engine
  • MathML engine
  • Fonts installed
  • OS text rendering and display scaling

This allows websites to calculate highly detailed fingerprints that can be used to track users across sites. Because the measurements come from the way MathML renders rather than from cookies or other stored state, the fingerprint can often persist across browser updates.
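The measurement step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code of any particular tracking script: the helper names `rectToObject` and `collectRects` are hypothetical, and the choice to measure the math element together with all of its descendants is an assumption about how such a script might gather its data points.

```javascript
// Copy a DOMRect into a plain object so it can be serialized later.
function rectToObject(r) {
  return {
    x: r.x, y: r.y, width: r.width, height: r.height,
    top: r.top, right: r.right, bottom: r.bottom, left: r.left,
  };
}

// Hypothetical helper: measure a <math> element and every descendant.
// Each entry records the tag name and the rects that element generated.
function collectRects(mathElement) {
  const elements = [mathElement, ...mathElement.querySelectorAll("*")];
  return elements.map((el) => ({
    tag: el.localName,
    rects: Array.from(el.getClientRects(), rectToObject),
  }));
}

// Only runs in a browser, and only if the page contains a <math> element.
if (typeof document !== "undefined") {
  const math = document.querySelector("math");
  if (math) {
    console.log(JSON.stringify(collectRects(math), null, 2));
  }
}
```

Copying the fields into plain objects matters because the properties of a DOMRect are accessors on its prototype, so the values have to be read out explicitly before they can be compared or stored.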

Example

MathML (rendered): $$\left( \frac{x + y}{x - y} \right) e^{x}$$

ClientRects:

  [
    {
      "x": 271.66668701171875,
      "y": 3342.20849609375,
      "width": 720,
      "height": 32,
      "top": 3342.20849609375,
      "right": 991.6666870117188,
      "bottom": 3374.20849609375,
      "left": 271.66668701171875
    }
  ]

Hash: 72b8dd288015a5c63b96a31d3fac9a983700350d6873ba4464cc031364a540eb

This example renders a MathML equation and then calls getClientRects() to measure the rendered output. It extracts measurements like width, height, x position, and y position. These values vary from user to user based on factors like viewport size and how MathML is rendered by their specific browser.

The example takes these ClientRect measurements and creates a hash value. This hash acts as a fingerprint that identifies users because of the distinct way MathML renders on their device and browser combination.

Even tiny differences in the ClientRect values caused by screen resolution, installed fonts, browser engines, etc. will result in a completely different hash. This allows MathML to be used to silently fingerprint users without their knowledge.
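The hashing step can be sketched with the sample measurements from the example. Several things here are assumptions: the post does not name the hash function, so SHA-256 is used because it matches the 64 hexadecimal characters of the hash shown, and the rects are serialized with JSON.stringify, which may differ from whatever serialization produced that hash. The digest printed by this sketch is therefore not expected to match it. The helper name `hashRects` and the altered width are hypothetical. The sketch uses Node's crypto module so it can be run from the command line; in a browser, crypto.subtle.digest("SHA-256", ...) offers the same hash function.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: serialize the measurements and hash them.
// SHA-256 and JSON serialization are assumptions, see the text above.
function hashRects(rects) {
  return createHash("sha256").update(JSON.stringify(rects)).digest("hex");
}

// Measurements copied from the example above.
const sample = [{
  x: 271.66668701171875,
  y: 3342.20849609375,
  width: 720,
  height: 32,
  top: 3342.20849609375,
  right: 991.6666870117188,
  bottom: 3374.20849609375,
  left: 271.66668701171875,
}];

// The same measurements with the width changed by 1/64 of a pixel,
// standing in for a device that renders the formula slightly differently.
const shifted = [{ ...sample[0], width: 720.015625 }];

console.log(hashRects(sample));
console.log(hashRects(shifted));
```

The two printed digests share no visible relationship even though the inputs differ by a fraction of a pixel, which is the property that makes the hash usable as an identifier.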

Preventing MathML Based Fingerprinting

Here are some tips to help mitigate MathML fingerprinting:

  • Use privacy extensions like Privacy Badger and uBlock Origin to block unnecessary third party scripts.
  • Reduce what your browser reveals by enabling its built-in anti-fingerprinting settings, such as Firefox’s privacy.resistFingerprinting preference.
  • Use Tor Browser or Brave, which are designed to normalize or randomize many fingerprinting vectors.
  • Advocate for standardized MathML implementations in all browsers to close loopholes.
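To illustrate why normalization helps, here is a minimal sketch of one possible approach: snapping measurements to a coarse grid so that small rendering differences disappear. This is an assumption made for illustration and not a description of how Tor Browser, Brave, or any extension implements its defenses. The helper name `quantizeRect` and the two sets of device measurements are hypothetical.

```javascript
// Hypothetical defense: snap every measurement to a grid of `step` pixels.
// Expects a plain object of numbers, not a live DOMRect.
function quantizeRect(rect, step = 1) {
  const out = {};
  for (const [key, value] of Object.entries(rect)) {
    out[key] = Math.round(value / step) * step;
  }
  return out;
}

// Made-up measurements of the same formula on two different devices.
const deviceA = { width: 720.02, height: 32.01 };
const deviceB = { width: 719.98, height: 31.97 };

console.log(JSON.stringify(quantizeRect(deviceA))); // {"width":720,"height":32}
console.log(JSON.stringify(quantizeRect(deviceB))); // {"width":720,"height":32}
```

Once both devices report identical values, any hash computed from them is identical too. Rounding is only a partial defense, since two values that fall on opposite sides of a rounding boundary still differ after quantization.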

Conclusion

MathML is clearly a game-changer for presenting mathematical equations on the web. The ability to render complex formulas for science, engineering, and education is an immense benefit. However, as with any powerful technology, we must be cognizant of how it could potentially be exploited for questionable purposes like fingerprinting users.

By showing how client rects can expose subtle MathML rendering differences and turn them into fingerprints, we take the first step towards protecting against misuse. Forewarned is forearmed. Now developers, educators, and policy makers can take appropriate steps to ensure MathML is used responsibly while keeping all the benefits it brings.

There are many promising options, like pushing for standardized MathML implementations across browsers and platforms. In this way, we can eliminate loopholes and inconsistencies that might be abused while fully unlocking MathML’s potential. With care and wisdom, we can overcome any short-term growing pains to realize the full value of MathML as a force for knowledge and understanding. Our shared goal should be upholding an ethical framework that allows MathML’s benefits to improve society while curtailing any harmful applications that undermine user privacy and freedom.