Birth Certificate Debate Caused by OCR Software and Digital Optimization?


Long-form Birth Certificate Image

There are a number of inconsistencies within the President’s scanned long-form birth certificate, posted on the White House website. The presence of layers, kerning, different pixel sizes, unusual variations in color, areas with and without noise and aliasing, pixel-by-pixel reproductions of certain blocks and letters, a misspelled word, and mismatching thresholding patterns throughout the document are all consistent with a scan made with an OCR option enabled, followed by optimization of the text. Without a “fresh” copy of the document, one that has not been subject to optimization or other digital manipulation, further forensic analysis will not reveal conclusive details.

Layers in the President’s Birth Certificate

When you scan a page into your computer and open it with editing software, you will normally see only one layer. You can then break it into layers for editing or manipulating if you want to change things around. If you scanned the page using OCR (Optical Character Recognition) settings, or an optimization program, however, it will already be in layers when you open it on your computer. Here are a few facts about scanning, OCR and optimization:

  • The OCR software compares each character on the page to an internal database, matches them when possible, and separates everything that it can match into another layer for potential editing.
  • Optimization software tries to ensure that the images are as clear as possible.
  • Most scanners are set to scan either with or without OCR as a default for each scanning job.
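
The separation into layers described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used by any particular scanner or by the software that produced the White House file: the function name split_layers, the cutoff of 128 and the tiny sample scan are all hypothetical, and real OCR software matches characters rather than simply picking out dark pixels.

```python
# Hypothetical 3x4 grayscale scan: low values are dark ink, high values are paper.
scan = [
    [230, 228, 231, 229],
    [229,  20,  25, 230],
    [231,  18, 227, 228],
]


def split_layers(image, cutoff=128, fill=255):
    """Split a grayscale image into a 1-bit text mask and a background layer.

    Pixels darker than `cutoff` are treated as text (an assumption made for
    this sketch). They are marked in the mask and painted over with `fill`
    in the background layer.
    """
    mask = [[1 if px < cutoff else 0 for px in row] for row in image]
    background = [[fill if px < cutoff else px for px in row] for row in image]
    return mask, background


mask, background = split_layers(scan)
for row in mask:
    print(row)
```

Opening the result in an editor would show two objects, the text mask and the background, instead of one flat image, which is the kind of layering described in this section.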

The PDF of President Obama’s birth certificate was created on a Mac, with Quartz PDFContext, and clearly has a number of layers. It is impossible, however, to determine whether the layers are solely the result of OCR software during the scanning process coupled with optimization after the fact, or if additional layers were created at some later point.

Kerning and Document Authentication

Kerning is the process of adjusting the spacing between letters in text to make them fit together more evenly. Older typewriters simply place all letters an equal distance from one another, but a computer will nest letters together to ensure that the text is easy to read without taking up too much space. The difference between text created by an old typewriter and output from a computer can identify a forgery in the eyes of a forensic document examiner.
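
To make the difference concrete, here is a minimal sketch that computes where each letter starts under the two schemes. All numbers are invented for illustration: MONO_WIDTH, WIDTHS and KERN_PAIRS are hypothetical values, not measurements from any real typewriter or font.

```python
MONO_WIDTH = 8  # hypothetical: every typewriter character advances the same amount

# Hypothetical advance widths for a proportional (computer) font.
WIDTHS = {"A": 7, "V": 7, "I": 3}

# Hypothetical kerning adjustments that pull specific letter pairs together.
KERN_PAIRS = {("A", "V"): -2, ("V", "A"): -2}


def typewriter_positions(text):
    """Start position of each letter when every letter has the same width."""
    return [i * MONO_WIDTH for i in range(len(text))]


def kerned_positions(text):
    """Start position of each letter with per-letter widths and pair kerning."""
    positions = []
    x = 0
    for i, ch in enumerate(text):
        if i > 0:
            # Apply the pair adjustment, if any, before placing this letter.
            x += KERN_PAIRS.get((text[i - 1], ch), 0)
        positions.append(x)
        x += WIDTHS[ch]
    return positions


print(typewriter_positions("AVI"))  # [0, 8, 16]
print(kerned_positions("AVI"))      # [0, 5, 12]
```

The typewriter positions are evenly spaced, while the kerned positions are not. An examiner looks for exactly this kind of uneven, pair-dependent spacing.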

What does this have to do with the birth certificate? OCR software translates the images into text, and treats the translation as regular computerized text. During the process, it formats and arranges the letters for optimum readability, which affects kerning and the alignment of letters and words in any document. In the case of the President’s birth certificate, the use of OCR and optimization software during scanning prevents the definitive evaluation of kerning and typesetting in the online birth-certificate-long-form.pdf document.

Other Effects of OCR and Optimization Software

The use of OCR software and image optimization has a number of other effects on documents. Each of these issues, which can result from OCR or optimization processing, may have led to the appearance of tampering and manipulation, and accusations of forgery.

Pixel size: In any scanned image, pixels are all the same size. Pixels in the President’s birth certificate, however, are not. The pixels around the optimized text are much smaller than the background pixels.
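
One way such a mismatch can arise is if an optimizer stores the background at a lower resolution than the text to save space. The sketch below assumes that is what happened, which this article cannot confirm. The helper names downsample and upsample, the factor of 2 and the sample values are hypothetical, and the image dimensions are assumed to be divisible by the factor.

```python
# Hypothetical 4x4 background layer at full scan resolution.
background = [
    [200, 202, 210, 212],
    [204, 206, 214, 216],
    [100, 100,  50,  50],
    [100, 100,  50,  50],
]


def downsample(image, factor=2):
    """Average each factor-by-factor block into a single stored pixel."""
    height, width = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)) // (factor * factor)
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]


def upsample(image, factor=2):
    """Repeat each stored pixel factor-by-factor times for display."""
    return [
        [px for px in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


stored = downsample(background)
displayed = upsample(stored)
for row in displayed:
    print(row)
```

When the stored background is shown at the original size, each of its values covers a 2x2 block, so the background appears to be made of larger pixels than a text layer kept at full resolution.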

Color variations: There are variations in the colors of the text, ranging from a very dark black to gray and even green. This is not a normal result for a document that is simply scanned as an image; a simple scan would be true to the original.

Noise: In any scanned document, there are small dots called “noise” scattered throughout the document, particularly in areas of high contrast. In President Obama’s birth certificate, noise around the letters is inconsistent.

Aliasing: The term “aliasing” refers to the jagged, stair-stepped look of an edge drawn in pixels. An aliased image is choppy, while an anti-aliased image is artificially smoothed by the computer to produce a more pleasing line. President Obama’s long-form birth certificate contains both aliased and anti-aliased areas.

Pixel-by-Pixel Twins: During the process of scanning, translation and optimization, the software searches for ways to create a document with the best possible appearance with the least required resources. One method of reducing effort is to duplicate similar characters from the first character identified, rather than re-forming each subsequent character from scratch. This results in a document, such as this one, with bits that are identical on a pixel-by-pixel basis.
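
The duplication described above resembles the symbol dictionaries used by bitonal image compressors such as JBIG2. The following is a simplified sketch of that idea, not the algorithm of any specific product: the function names, the max_diff tolerance and the 3x3 glyph bitmaps are hypothetical, and all glyphs are assumed to be the same size.

```python
def differing_pixels(a, b):
    """Count the pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def build_symbol_dictionary(glyphs, max_diff=1):
    """Store each distinct shape once and reuse it for close matches.

    Returns the stored symbols and, for every input glyph, the index of the
    stored symbol that will be drawn in its place.
    """
    symbols = []
    placements = []
    for glyph in glyphs:
        for index, symbol in enumerate(symbols):
            if differing_pixels(glyph, symbol) <= max_diff:
                placements.append(index)  # reuse the earlier shape
                break
        else:
            symbols.append(glyph)         # first time this shape is seen
            placements.append(len(symbols) - 1)
    return symbols, placements


# Hypothetical 3x3 bitmaps: two scans of the same letter differ by one pixel.
first_e = ("###", "#..", "###")
letter_t = (".#.", ".#.", ".#.")
second_e = ("###", "#.#", "###")

symbols, placements = build_symbol_dictionary([first_e, letter_t, second_e])
rendered = [symbols[i] for i in placements]
print(placements)                # [0, 1, 0]
print(rendered[0] == rendered[2])  # True
```

The two scanned letters were slightly different on input, yet the rendered output draws the first shape in both places, which produces pixel-by-pixel twins without any manual copying.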

TXE and OCR Software: Optical character recognition software is not, by any means, perfect. When a document is translated using OCR, misspellings and other odd errors are the rule, not the exception. In a highly complex document, such as the President’s birth certificate, there would naturally be spelling or formatting errors. A single error of this nature is an unusually good result, but within the realm of possibility. In this case, the software would have been unable to fully match the “H” in “THE” and substituted an “X” instead, producing “TXE.”

Thresholding Patterns: During the optimization process, high-contrast areas are clarified with thresholding. Each pixel is evaluated and assigned a value based on a threshold set up by the program. Some areas of the President’s birth certificate have been optimized in this manner, while others have not.
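
A minimal sketch of thresholding applied to only part of an image is shown below. The function name threshold_region, the cutoff of 128, the box coordinates and the sample values are assumptions chosen for illustration; actual optimization software may pick its threshold and its regions quite differently.

```python
# Hypothetical 3x3 grayscale scan (0 = black, 255 = white).
scan = [
    [40, 120, 200],
    [90, 130, 250],
    [40, 120, 200],
]


def threshold_region(image, box, cutoff=128):
    """Force every pixel inside `box` to pure black or pure white.

    `box` is (top, left, bottom, right) with bottom and right exclusive.
    Pixels outside the box keep their original gray values.
    """
    top, left, bottom, right = box
    result = [row[:] for row in image]  # copy so the input is not modified
    for y in range(top, bottom):
        for x in range(left, right):
            result[y][x] = 0 if image[y][x] < cutoff else 255
    return result


result = threshold_region(scan, (0, 0, 2, 3))
for row in result:
    print(row)
```

The first two rows come out as pure black and white, while the third row keeps its shades of gray. A document processed this way shows the mixed thresholding patterns described in this section.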

President Barack Obama’s Birth Certificate: Conclusion

The changes made to the original document by OCR software and image optimization have rendered it impossible to determine whether these inconsistencies are due to manual tampering, or are simply the result of the optimization and scanning process.
