Data extraction
This section explains what a data extraction check entails, for integrations that need to request document details from the user. Yoti extracts data from thousands of ID documents from 200+ countries using Optical Character Recognition (OCR), with the option of falling back to a manual data entry process if OCR is not successful.
This section will describe:
- What a data extraction check is.
- Outcome report with recovery suggestions.
The following checks related to data extraction are available at Yoti:
Name | Description | Resources | Manual check available |
---|---|---|---|
Text extraction | A request to obtain the data printed visually on a document, in structured form. | 1x Document Resource | ✅ |
If machine data extraction is not successful, there is an option, set at session creation, to fall back to manual data extraction. This generates a ‘text data check’ automatically, and the document is reviewed by one of our document processing experts. The automated check is instantaneous.
Yoti will attempt to extract as much data as possible, depending on the document type.
Enhance your check
Add these features to enhance your check for a stronger verification.
Name | Description | Resources | Manual check available |
---|---|---|---|
Mobile hand off | Start your session on the web and allow the user to smoothly move to mobile. | N/A | N/A |
NFC | Enable Near Field Communication (NFC) for native integrations. | 1x Document Resource | ❌ |
Data extracted
See below for the data that Yoti will attempt to extract. Please note that every document has different fields; Yoti will extract what it can.
Field name | Type | Description | Example |
---|---|---|---|
full_name | string | FullName contains given names and family name. | "Jon Jim Fred Foo" |
date_of_birth | string | DateOfBirth is the date of birth in the form yyyy-mm-dd, yyyy-mm or yyyy. A small percentage of documents do not provide the full DoB. If the full DoB is available in the visual zone or barcode data, Yoti will always return it. | "2000-12-01" |
nationality | string | Nationality is the nationality expressed as an ISO/ICAO alpha-3 country code. Not all documents provide this. | "GBR" |
given_names | string | GivenNames contains first and middle names. Not all documents provide split given names and family name; in those cases only the full name field will be returned. | "Jon Jim Fred" |
first_name | string | FirstName is the first name only. Only for those documents that provide first name and middle names split. | "Jon" |
middle_name | string | MiddleName contains the middle names only. Only for those documents that provide first name and middle names split. | "Jim Fred" |
family_name | string | FamilyName is the family name. Not all documents provide split given names and family name; in those cases only the full name field will be returned. | "Foo" |
place_of_birth | string | PlaceOfBirth is the place of birth. It may contain a country name, a city name or both. Not all documents provide this. | "London" |
country_of_birth | string | CountryOfBirth is the country of birth, as an ISO/ICAO alpha-3 country code. Not all documents provide this. | "GBR" |
gender | string | Gender is the gender and can be any of the following values: "MALE", "FEMALE", "TRANSGENDER" or "OTHER". | "MALE" |
name_prefix | string | NamePrefix is the name prefix, for example a title. Only certain documents provide this. | "Dr" |
name_suffix | string | NameSuffix is the name suffix. Only certain documents provide this. | "Jr" |
first_name_alias | string | FirstNameAlias is the alias for the first name. Only certain documents provide this. | |
middle_name_alias | string | MiddleNameAlias is the alias for the middle name. Only certain documents provide this. | |
family_name_alias | string | FamilyNameAlias is the alias for the family name. Only certain documents provide this. | |
weight | string | Weight is the weight as displayed on the document. Should contain weight and the unit. Only certain documents provide this. | |
height | string | Height is the height as displayed on the document. Should contain height and the unit. Only certain documents provide this. | |
eye_color | string | EyeColor is the eye color as displayed on the document. Only certain documents provide this. | |
structured_postal_address | object | StructuredPostalAddress is the postal address with the breakdown in address lines, post code and so on as well as the formatted address all in one line. See details for | |
document_type | string | DocumentType specifies the type of document. | "DRIVING_LICENCE" |
issuing_country | string | IssuingCountry is the country the document was issued in. Defined as an ISO/ICAO alpha-3 country code. | "GBR" |
document_number | string | DocumentNumber is the document number. | "EF1523467" |
expiration_date | string | ExpirationDate defines the date of expiry of the document in the form yyyy-mm-dd. Note: A few ID documents do not contain a visible expiration date. In these cases, the value may be returned as "LIFETIME", "NOT_PRESENT" or null. | "2030-12-01", "LIFETIME", "NOT_PRESENT", null |
date_of_issue | string | DateOfIssue is the date of issue of the document in the form yyyy-mm-dd. | "2001-12-01" |
issuing_authority | string | IssuingAuthority is the authority that issued the document. | "DVLA" |
mrz | object | MRZ provides the content of the machine readable zone, as displayed on the document. | |
mrz.type | number | Type is the type of MRZ: 1 = TD1 (3 lines), 2 = TD3 (2 lines). | |
mrz.line1 | string | MRZ line 1. | |
mrz.line2 | string | MRZ line 2. | |
mrz.line3 | string | MRZ line 3. | |
organisation | string | Used primarily for Supplementary documents. | "Thames Water" |
personal_identification_number | string | Identification number of the document holder. | |
place_of_issue | string | If present, the place where the ID document was issued. | |
document_template | string | Contains the fields of The relevant field will only appear if available/detected. |
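
Because date_of_birth may arrive as yyyy-mm-dd, yyyy-mm or yyyy, and expiration_date may carry a non-date value, client code usually needs a tolerant parser. The following is a minimal Python sketch, not part of any Yoti SDK: the names `PartialDate`, `parse_partial_date` and `parse_expiration` are hypothetical, and the sentinel values are taken from the expiration_date examples in the table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Non-date values listed in the expiration_date examples above.
EXPIRY_SENTINELS = {"LIFETIME", "NOT_PRESENT"}


@dataclass
class PartialDate:
    """A date that may lack the day, or the month and day."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    def is_complete(self) -> bool:
        return self.month is not None and self.day is not None


def parse_partial_date(value: str) -> PartialDate:
    """Parse 'yyyy', 'yyyy-mm' or 'yyyy-mm-dd' (hypothetical helper)."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3 or len(parts[0]) != 4:
        raise ValueError(f"unsupported date format: {value!r}")
    parsed = PartialDate(*(int(p) for p in parts))
    # Range-check by building a real date, defaulting missing parts to 1.
    date(parsed.year, parsed.month or 1, parsed.day or 1)
    return parsed


def parse_expiration(value: Optional[str]):
    """Return a date, one of the sentinel strings, or None if absent."""
    if value is None:
        return None
    if value in EXPIRY_SENTINELS:
        return value
    return date.fromisoformat(value)


dob = parse_partial_date("2000-12")
print(dob, dob.is_complete())
print(parse_expiration("LIFETIME"))
```

Keeping partial dates in their own type avoids silently inventing a day or month that the document did not provide.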
Address extraction
Field name | Type | Description |
---|---|---|
address_format | number | AddressFormat is used to identify which fields may be present in the JSON object. See table below that defines what format is used for each country. |
udprn | string | Udprn is the Unique Delivery Point Reference Number that identifies a property throughout its lifecycle. |
care_of | string | CareOf identifies the owner of the premises. |
sub_building | string | SubBuilding is used when the building is divided into smaller units (e.g. a block of flats) to identify the sub unit. |
building_number | string | BuildingNumber is the number of the building. |
building | string | Building is the name/number of the building. |
street | string | Street is the name/number of the street the building is on. |
landmark | string | Landmark is a description used to describe the location of the building. |
address_line1 | string | AddressLine1 is the first line of the address. |
address_line2 | string | AddressLine2 is the second line of the address. |
address_line3 | string | AddressLine3 is the third line of the address. |
address_line4 | string | AddressLine4 is the fourth line of the address. |
address_line5 | string | AddressLine5 is the fifth line of the address. |
address_line6 | string | AddressLine6 is the sixth line of the address. |
locality | string | Locality is the area the building is in. |
town_city | string | TownCity is the town/city/village/hamlet/community/etc. that the building is in. |
subdistrict | string | Subdistrict is the sub-district the building is in. |
district | string | District is the district the building is in. |
state | string | State is the state/county the building is in. |
postal_code | string | PostalCode is a code used by the country's postal service to aid in sorting and delivering mail (e.g. postcode, zipcode, pincode). |
post_office | string | PostOffice is the post office that serves the area the building is in. |
country_iso | string | CountryIso is the country the building is in. In ISO-3166-1 alpha-3 format. |
country | string | Country is the country the building is in. Localised. |
formatted_address | string | FormattedAddress is the full address in a single human readable string in a format that is suitable for printing onto an envelope. This field is not required when providing address information. |
The table above defines the fields of the JSON structure used for all addresses. Only a subset of the fields is present in each case, and address_format can be used to ascertain which ones to expect for any given address. The country_iso field should not be used for this purpose.
Four address formats are available and detailed below:
Countries that use this format | GBR, JEY, IMN | IND | USA, AUS | All other countries |
---|---|---|---|---|
address_format | 1 | 2 | 3 | 4 |
udprn | Optional | | | |
care_of | Optional | | | |
sub_building | Optional* | | | |
building_number | Optional* | | | |
building | Optional* | Optional | | |
street | Optional | | | |
landmark | Optional | | | |
address_line1 | Mandatory | Mandatory | Mandatory | |
address_line2 | Optional | Optional | Optional | |
address_line3 | Optional | Optional | | |
address_line4 | Optional | | | |
address_line5 | Optional | | | |
address_line6 | Optional | | | |
locality | Optional | | | |
town_city | Mandatory | Optional | Mandatory | |
subdistrict | Optional | | | |
district | Optional | | | |
state | Optional | Optional | Mandatory | |
postal_code | Mandatory | Mandatory | Mandatory | Optional |
post_office | Optional | | | |
country_iso | Mandatory | Mandatory | Mandatory | Mandatory |
country | Mandatory | Mandatory | Mandatory | Mandatory |
formatted_address** | Mandatory | Mandatory | Mandatory | Mandatory |
* At least one must be present
** Will always be returned in the data extraction, but is not mandatory when configuring an applicant profile
{
  "address_format": 1,
  "building_number": "15a",
  "address_line1": "15a North Street",
  "town_city": "CARSHALTON",
  "postal_code": "SM5 2HW",
  "country_iso": "GBR",
  "country": "UK",
  "formatted_address": "15a North Street\nCARSHALTON\nSM5 2HW\nUK"
}
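
The example address can be checked against the format 1 requirements with a few lines of Python. This is a rough sketch, not an official schema: the field sets are read from the first column of the format table as written above, the function name `check_format_1` is hypothetical, and the check branches on address_format rather than country_iso, as recommended above.

```python
import json

# Assumption: requirements for address_format 1, read from the table above.
FORMAT_1_MANDATORY = (
    "address_line1", "town_city", "postal_code",
    "country_iso", "country", "formatted_address",
)
# Fields marked "Optional*": at least one of them must be present.
FORMAT_1_ONE_OF = ("sub_building", "building_number", "building")


def check_format_1(address: dict) -> list:
    """Return a list of problems; an empty list means the address passes."""
    if address.get("address_format") != 1:
        return ["address_format is not 1"]
    problems = [
        f"missing mandatory field: {name}"
        for name in FORMAT_1_MANDATORY
        if not address.get(name)
    ]
    if not any(address.get(name) for name in FORMAT_1_ONE_OF):
        problems.append("need at least one of: " + ", ".join(FORMAT_1_ONE_OF))
    return problems


# Raw string so that JSON, not Python, interprets the \n escapes.
example = json.loads(r"""
{
  "address_format": 1,
  "building_number": "15a",
  "address_line1": "15a North Street",
  "town_city": "CARSHALTON",
  "postal_code": "SM5 2HW",
  "country_iso": "GBR",
  "country": "UK",
  "formatted_address": "15a North Street\nCARSHALTON\nSM5 2HW\nUK"
}
""")
print(check_format_1(example))
```

The sketch treats formatted_address as mandatory because it describes extracted data. When configuring an applicant profile, that field could be dropped from the mandatory set.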