Tesseract.Net parameters

„Tesseract is extremely flexible, if you know how to control it. There is a large number of control parameters to modify its behaviour. While these change from time to time, most of them are fairly stable.“ (Tesseract ControlParams wiki)

There are two ways to set a parameter: through a config file or through the API.

Config file

A config file is a plain text file without a BOM and with Unix (LF) line endings. On Windows, an advanced text editor such as Notepad++ can save a file in this form.

The config file should be located in your tessdata/configs directory. Your tessdata directory should look like this:
tessdata\
    configs\
        config.cfg
    pdf.ttf
    pdf.ttx
    ...
    language files
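
The file format described above (no BOM, Unix line endings) can also be produced from code instead of a text editor. The following is a minimal sketch, not part of Tesseract.Net: the class name TessConfigWriter is hypothetical, and it assumes the usual Tesseract config layout of one parameter per line, written as the parameter name, whitespace, then the value.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the Tesseract.Net API.
public static class TessConfigWriter
{
    // Writes "name value" lines using LF line endings and UTF-8 without a BOM.
    public static void Write(string path, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var text = new StringBuilder();
        foreach (var p in parameters)
        {
            // One parameter per line: name, a space, then the value.
            // '\n' is appended explicitly so Windows never emits "\r\n".
            text.Append(p.Key).Append(' ').Append(p.Value).Append('\n');
        }

        // UTF8Encoding(false) suppresses the byte order mark.
        File.WriteAllText(path, text.ToString(), new UTF8Encoding(false));
    }
}
```

For example, passing the pairs ("tessedit_create_hocr", "1") and ("preserve_interword_spaces", "1") with a path ending in tessdata\configs\config.cfg would produce a two-line file suitable for the directory layout shown above.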

C#
public void InitWithConfig()
{
    using (var api = OcrApi.Create())
    {
        // Names of the config files located in the tessdata/configs directory.
        string[] configs = { "config.cfg" };
        api.Init(@"path_to_tessdata_folder", "eng", OcrEngineMode.OEM_DEFAULT, configs);
    }
}

Tesseract-OCR API

You can set a single parameter with the API function SetVariable, e.g.:

C#
public void Init()
{
    using (var api = OcrApi.Create())
    {
        api.Init(@"path_to_tessdata_folder", "eng");
        // Set a parameter after initialization; the value is passed as a string.
        api.SetVariable("editor_image_xpos", "590");
    }
}

If you need to set parameters during Tesseract initialization, create one array for the parameter names and another for their values. Here is an example:

C#
public void InitWithVariables()
{
    using (var api = OcrApi.Create())
    {
        // Parameter names and their values; entries are matched by index.
        string[] variables = { "editor_image_xpos", "editor_dbwin_width" };
        string[] values = { "590", "80" };
        api.Init(@"path_to_tessdata_folder", "eng", OcrEngineMode.OEM_DEFAULT, null, variables, values);
    }
}
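
Keeping two parallel arrays in sync by hand becomes error-prone as the number of parameters grows. One possible approach, sketched here under the assumption that you keep your settings as name/value pairs, is to derive both arrays from a single collection. The class name InitParameters is hypothetical and not part of Tesseract.Net.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the Tesseract.Net API.
public static class InitParameters
{
    // Splits name/value pairs into two arrays whose elements match by index.
    public static void ToArrays(
        IEnumerable<KeyValuePair<string, string>> parameters,
        out string[] variables,
        out string[] values)
    {
        // Materialize once so both arrays are built from the same ordering.
        var list = parameters.ToList();
        variables = list.Select(p => p.Key).ToArray();
        values = list.Select(p => p.Value).ToArray();
    }
}
```

The resulting variables and values arrays can then be passed to api.Init in the same positions as in InitWithVariables above.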

You can also use the API function ReadConfigFiles(String) (or ReadDebugConfigFiles(String)) to read Tesseract config files that contain non-init parameters.
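
To make the config file format concrete, the sketch below parses such a file by hand and forwards each entry to a callback. It is an illustrative alternative to ReadConfigFiles, not a description of how that function works. The class name TessConfigReader is hypothetical, and the parsing rules (one "name value" entry per line, blank lines and lines starting with '#' skipped) are assumptions about the usual Tesseract config layout.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of the Tesseract.Net API.
public static class TessConfigReader
{
    // Calls setVariable(name, value) for every parameter line in the file
    // and returns the number of parameters that were forwarded.
    public static int Apply(string path, Action<string, string> setVariable)
    {
        int applied = 0;
        foreach (var rawLine in File.ReadLines(path))
        {
            var line = rawLine.Trim();

            // Skip blank lines and lines starting with '#', treated here as comments.
            if (line.Length == 0 || line[0] == '#') continue;

            // The name ends at the first space or tab; the rest is the value.
            int split = line.IndexOfAny(new[] { ' ', '\t' });
            string name = split < 0 ? line : line.Substring(0, split);
            string value = split < 0 ? string.Empty : line.Substring(split + 1).Trim();

            setVariable(name, value);
            applied++;
        }
        return applied;
    }
}
```

With an initialized api object, the callback could be a lambda such as (name, value) => api.SetVariable(name, value), which forwards each parsed entry to the SetVariable function shown earlier.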

Parameters in version 4.1

Variable

Default value

Type

Description

allow_blob_division

1

Boolean

Use divisible blobs chopping

applybox_learn_chars_and_char_frags_mode

0

Boolean

Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.

applybox_learn_ngrams_mode

0

Boolean

Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.

assume_fixed_pitch_char_segment

0

Boolean

include fixed-pitch heuristics in char segmentation

bland_unrej

0

Boolean

unrej potential with no checks

chop_enable

1

Boolean

Chop enable

chop_new_seam_pile

1

Boolean

Use new seam_pile

chop_vertical_creep

0

Boolean

Vertical creep

classify_bln_numeric_mode

0

Boolean

Assume the input is numbers [0-9].

classify_debug_character_fragments

0

Boolean

Bring up graphical debugging windows for fragments training

classify_enable_adaptive_debugger

0

Boolean

Enable match debugger

classify_enable_adaptive_matcher

1

Boolean

Enable adaptive classifier

classify_enable_learning

1

Boolean

Enable adaptive classifier

classify_nonlinear_norm

0

Boolean

Non-linear stroke-density normalization

classify_save_adapted_templates

0

Boolean

Save adapted templates to a file

classify_use_pre_adapted_templates

0

Boolean

Use pre-adapted classifier templates

crunch_accept_ok

1

Boolean

Use acceptability in okstring

crunch_early_convert_bad_unlv_chs

0

Boolean

Take out ~^ early?

crunch_early_merge_tess_fails

1

Boolean

Before word crunch?

crunch_include_numerals

0

Boolean

Fiddle alpha figures

crunch_leave_accept_strings

0

Boolean

Don't pot crunch sensible strings

crunch_leave_ok_strings

1

Boolean

Don't touch sensible strings

crunch_terrible_garbage

1

Boolean

As it says

devanagari_split_debugimage

0

Boolean

Whether to create a debug image for split shiro-rekha process.

disable_character_fragments

1

Boolean

Do not include character fragments in the results of the classifier

edges_children_fix

0

Boolean

Remove boxy parents of char-like children

edges_debug

0

Boolean

turn on debugging for this module

edges_use_new_outline_complexity

0

Boolean

Use the new outline complexity module

enable_noise_removal

1

Boolean

Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise

equationdetect_save_bi_image

0

Boolean

Save input bi image

equationdetect_save_merged_image

0

Boolean

Save the merged image

equationdetect_save_seed_image

0

Boolean

Save the seed image

equationdetect_save_spt_image

0

Boolean

Save special character image

force_word_assoc

0

Boolean

force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.

gapmap_debug

0

Boolean

Say which blocks have tables

gapmap_no_isolated_quanta

0

Boolean

Ensure gaps not less than 2 quanta wide

gapmap_use_ends

0

Boolean

Use large space at start and end of rows

hocr_char_boxes

0

Boolean

Add coordinates for each character to hocr output

hocr_font_info

0

Boolean

Add font info to hocr output

interactive_display_mode

0

Boolean

Run interactively?

language_model_ngram_on

0

Boolean

Turn on/off the use of character ngram model

language_model_ngram_space_delimited_language

1

Boolean

Words are delimited by space

language_model_ngram_use_only_first_uft8_step

0

Boolean

Use only the first UTF8 step of the given string when computing log probabilities.

language_model_use_sigmoidal_certainty

0

Boolean

Use sigmoidal score for certainty

load_bigram_dawg

1

Boolean

Load dawg with special word bigrams.

load_freq_dawg

1

Boolean

Load frequent word dawg.

load_number_dawg

1

Boolean

Load dawg with number patterns.

load_punc_dawg

1

Boolean

Load dawg with punctuation patterns.

load_system_dawg

1

Boolean

Load system word dawg.

load_unambig_dawg

1

Boolean

Load unambiguous word dawg.

lstm_use_matrix

1

Boolean

Use ratings matrix/beam search with lstm

matcher_debug_separate_windows

0

Boolean

Use two different windows for debugging the matching: One for the protos and one for the features.

merge_fragments_in_matrix

1

Boolean

Merge the fragments in the ratings matrix and delete them after merging

oldbl_corrfix

1

Boolean

Improve correlation of heights

oldbl_xhfix

0

Boolean

Fix bug in modes threshold for xheights

pageseg_apply_music_mask

1

Boolean

Detect music staff and remove intersecting components

paragraph_text_based

1

Boolean

Run paragraph detection on the post-text-recognition (more accurate)

poly_allow_detailed_fx

0

Boolean

Allow feature extractors to see the original outline

poly_debug

0

Boolean

Debug old poly

poly_wide_objects_better

1

Boolean

More accurate approx on wide things

preserve_interword_spaces

0

Boolean

Preserve multiple interword spaces

prioritize_division

0

Boolean

Prioritize blob division over chopping

rej_1Il_trust_permuter_type

1

Boolean

Don't double check

rej_1Il_use_dict_word

0

Boolean

Use dictword test

rej_alphas_in_number_perm

0

Boolean

Extend permuter check

rej_trust_doc_dawg

0

Boolean

Use DOC dawg in 11l conf. detector

rej_use_good_perm

1

Boolean

Individual rejection control

rej_use_sensible_wd

0

Boolean

Extend permuter check

rej_use_tess_accepted

1

Boolean

Individual rejection control

rej_use_tess_blanks

1

Boolean

Individual rejection control

save_alt_choices

1

Boolean

Save alternative paths found during chopping and segmentation search

save_doc_words

0

Boolean

Save Document Words

segment_nonalphabetic_script

0

Boolean

Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch

stopper_no_acceptable_choices

0

Boolean

Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations

stream_filelist

0

Boolean

Stream a filelist from stdin

suspect_constrain_1Il

0

Boolean

UNLV keep 1Il chars rejected

tess_bn_matching

0

Boolean

Baseline Normalized Matching

tess_cn_matching

0

Boolean

Character Normalized Matching

tessedit_adaption_debug

0

Boolean

Generate and print debug information for adaption

tessedit_ambigs_training

0

Boolean

Perform training for ambiguities

tessedit_create_alto

0

Boolean

Write .xml ALTO file

tessedit_create_boxfile

0

Boolean

Output text with boxes

tessedit_create_hocr

0

Boolean

Write .html hOCR output file

tessedit_create_lstmbox

0

Boolean

Write .box file for LSTM training

tessedit_create_pdf

0

Boolean

Write .pdf output file

tessedit_create_tsv

0

Boolean

Write .tsv output file

tessedit_create_txt

0

Boolean

Write .txt output file

tessedit_create_wordstrbox

0

Boolean

Write WordStr format .box output file

tessedit_debug_block_rejection

0

Boolean

Block and Row stats

tessedit_debug_doc_rejection

0

Boolean

Page stats

tessedit_debug_fonts

0

Boolean

Output font info per char

tessedit_debug_quality_metrics

0

Boolean

Output data to debug file

tessedit_display_outwords

0

Boolean

Draw output words

tessedit_do_invert

1

Boolean

Try inverting the image in `LSTMRecognizeWord`

tessedit_dont_blkrej_good_wds

0

Boolean

Use word segmentation quality metric

tessedit_dont_rowrej_good_wds

0

Boolean

Use word segmentation quality metric

tessedit_dump_choices

0

Boolean

Dump char choices

tessedit_dump_pageseg_images

0

Boolean

Dump intermediate images made during page segmentation

tessedit_enable_bigram_correction

1

Boolean

Enable correction based on the word bigram dictionary.

tessedit_enable_dict_correction

0

Boolean

Enable single word correction based on the dictionary.

tessedit_enable_doc_dict

1

Boolean

Add words to the document dictionary

tessedit_fix_fuzzy_spaces

1

Boolean

Try to improve fuzzy spaces

tessedit_fix_hyphens

1

Boolean

Crunch double hyphens?

tessedit_flip_0O

1

Boolean

Contextual 0O O0 flips

tessedit_good_quality_unrej

1

Boolean

Reduce rejection on good docs

tessedit_init_config_only

0

Boolean

Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.

tessedit_make_boxes_from_boxes

0

Boolean

Generate more boxes from boxed chars

tessedit_minimal_rej_pass1

0

Boolean

Do minimal rejection on pass 1 output

tessedit_minimal_rejection

0

Boolean

Only reject tess failures

tessedit_override_permuter

1

Boolean

According to dict_word

tessedit_prefer_joined_punct

0

Boolean

Reward punctuation joins

tessedit_preserve_blk_rej_perfect_wds

1

Boolean

Only rej partially rejected words in block rejection

tessedit_preserve_row_rej_perfect_wds

1

Boolean

Only rej partially rejected words in row rejection

tessedit_reject_bad_qual_wds

1

Boolean

Reject all bad quality wds

tessedit_rejection_debug

0

Boolean

Adaption debug

tessedit_resegment_from_boxes

0

Boolean

Take segmentation and labeling from box file

tessedit_resegment_from_line_boxes

0

Boolean

Conversion of word/line box file to char box file

tessedit_row_rej_good_docs

1

Boolean

Apply row rejection to good docs

tessedit_test_adaption

0

Boolean

Test adaption criteria

tessedit_timing_debug

0

Boolean

Print timing stats

tessedit_train_from_boxes

0

Boolean

Generate training data from boxed chars

tessedit_train_line_recognizer

0

Boolean

Break input into lines and remap boxes if present

tessedit_unrej_any_wd

0

Boolean

Don't bother with word plausibility

tessedit_use_primary_params_model

0

Boolean

In multilingual mode use params model of the primary language

tessedit_use_reject_spaces

1

Boolean

Reject spaces?

tessedit_word_for_word

0

Boolean

Make output have exactly one word per WERD

tessedit_write_block_separators

0

Boolean

Write block separators in output

tessedit_write_images

0

Boolean

Capture the image from the IPE

tessedit_write_rep_codes

0

Boolean

Write repetition char code

tessedit_write_unlv

0

Boolean

Write .unlv output file

tessedit_zero_kelvin_rejection

0

Boolean

Don't reject ANYTHING AT ALL

tessedit_zero_rejection

0

Boolean

Don't reject ANYTHING

test_pt

0

Boolean

Test for point

textonly_pdf

0

Boolean

Create PDF with only one invisible text layer

textord_all_prop

0

Boolean

All doc is proportional text

textord_biased_skewcalc

1

Boolean

Bias skew estimates with line length

textord_blockndoc_fixed

0

Boolean

Attempt whole doc/block fixed pitch

textord_blocksall_fixed

0

Boolean

Moan about prop blocks

textord_blocksall_prop

0

Boolean

Moan about fixed pitch blocks

textord_blocksall_testing

0

Boolean

Dump stats when moaning

textord_chopper_test

0

Boolean

Chopper is being tested.

textord_debug_baselines

0

Boolean

Debug baseline generation

textord_debug_blob

0

Boolean

Print test blob information

textord_debug_pitch_metric

0

Boolean

Write full metric stuff

textord_debug_pitch_test

0

Boolean

Debug on fixed pitch test

textord_debug_printable

0

Boolean

Make debug windows printable

textord_debug_xheights

0

Boolean

Test xheight algorithms

textord_disable_pitch_test

0

Boolean

Turn off dp fixed pitch algorithm

textord_equation_detect

0

Boolean

Turn on equation detector

textord_fast_pitch_test

0

Boolean

Do even faster pitch algorithm

textord_fix_makerow_bug

1

Boolean

Prevent multiple baselines

textord_fix_xheight_bug

1

Boolean

Use spline baseline

textord_force_make_prop_words

0

Boolean

Force proportional word segmentation on all rows

textord_fp_chopping

1

Boolean

Do fixed pitch chopping

textord_heavy_nr

0

Boolean

Vigorously remove noise

textord_interpolating_skew

1

Boolean

Interpolate across gaps

textord_new_initial_xheight

1

Boolean

Use test xheight mechanism

textord_no_rejects

0

Boolean

Don't remove noise blobs

textord_noise_debug

0

Boolean

Debug row garbage detector

textord_noise_rejrows

1

Boolean

Reject noise-like rows

textord_noise_rejwords

1

Boolean

Reject noise-like words

textord_ocropus_mode

0

Boolean

Make baselines for ocropus

textord_old_baselines

1

Boolean

Use old baseline algorithm

textord_old_xheight

0

Boolean

Use old xheight algorithm

textord_oldbl_debug

0

Boolean

Debug old baseline generation

textord_oldbl_merge_parts

1

Boolean

Merge suspect partitions

textord_oldbl_paradef

1

Boolean

Use para default mechanism

textord_oldbl_split_splines

1

Boolean

Split stepped splines

textord_parallel_baselines

1

Boolean

Force parallel baselines

textord_pitch_cheat

0

Boolean

Use correct answer for fixed/prop

textord_pitch_scalebigwords

0

Boolean

Scale scores on big words

textord_really_old_xheight

0

Boolean

Use original wiseowl xheight

textord_restore_underlines

1

Boolean

Chop underlines and put back

textord_show_blobs

0

Boolean

Display unsorted blobs

textord_show_boxes

0

Boolean

Display unsorted blobs

textord_show_expanded_rows

0

Boolean

Display rows after expanding

textord_show_final_blobs

0

Boolean

Display blob bounds after pre-ass

textord_show_final_rows

0

Boolean

Display rows after final fitting

textord_show_fixed_cuts

0

Boolean

Draw fixed pitch cell boundaries

textord_show_fixed_words

0

Boolean

Display forced fixed pitch words

textord_show_initial_rows

0

Boolean

Display row accumulation

textord_show_initial_words

0

Boolean

Display separate words

textord_show_new_words

0

Boolean

Display separate words

textord_show_page_cuts

0

Boolean

Draw page-level cuts

textord_show_parallel_rows

0

Boolean

Display page correlated rows

textord_show_row_cuts

0

Boolean

Draw row-level cuts

textord_show_tables

0

Boolean

Show table regions

textord_single_height_mode

0

Boolean

Script has no xheight, so use a single mode

textord_space_size_is_variable

0

Boolean

If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.

textord_straight_baselines

0

Boolean

Force straight baselines

textord_tabfind_find_tables

1

Boolean

run table detection

textord_tabfind_force_vertical_text

0

Boolean

Force using vertical text page mode

textord_tabfind_only_strokewidths

0

Boolean

Only run stroke widths

textord_tabfind_show_blocks

0

Boolean

Show final block bounds

textord_tabfind_show_columns

0

Boolean

Show column bounds

textord_tabfind_show_finaltabs

0

Boolean

Show tab vectors

textord_tabfind_show_initial_partitions

0

Boolean

Show partition bounds

textord_tabfind_show_initialtabs

0

Boolean

Show tab candidates

textord_tabfind_show_reject_blobs

0

Boolean

Show blobs rejected as noise

textord_tabfind_show_vlines

0

Boolean

Debug line finding

textord_tabfind_vertical_text

1

Boolean

Enable vertical detection

textord_tablefind_recognize_tables

0

Boolean

Enables the table recognizer for table layout and filtering.

textord_tablefind_show_mark

0

Boolean

Debug table marking steps in detail

textord_tablefind_show_stats

0

Boolean

Show page stats used in table finding

textord_test_landscape

0

Boolean

Tests refer to land/port

textord_test_mode

0

Boolean

Do current test

textord_use_cjk_fp_model

0

Boolean

Use CJK fixed pitch model

tosp_all_flips_fuzzy

0

Boolean

Pass ANY flip to context?

tosp_block_use_cert_spaces

1

Boolean

Only stat OBVIOUS spaces

tosp_flip_fuzz_kn_to_sp

1

Boolean

Default flip

tosp_flip_fuzz_sp_to_kn

1

Boolean

Default flip

tosp_force_wordbreak_on_punct

0

Boolean

Force word breaks on punct to break long lines in non-space delimited langs

tosp_fuzzy_limit_all

1

Boolean

Don't restrict kn->sp fuzzy limit to tables

tosp_improve_thresh

0

Boolean

Enable improvement heuristic

tosp_narrow_blobs_not_cert

1

Boolean

Only stat OBVIOUS spaces

tosp_old_to_bug_fix

0

Boolean

Fix suspected bug in old code

tosp_old_to_constrain_sp_kn

0

Boolean

Constrain relative values of inter and intra-word gaps for old_to_method.

tosp_old_to_method

0

Boolean

Space stats use prechopping?

tosp_only_small_gaps_for_kern

0

Boolean

Better guess

tosp_only_use_prop_rows

1

Boolean

Block stats to use fixed pitch rows?

tosp_only_use_xht_gaps

0

Boolean

Only use within xht gap for wd breaks

tosp_recovery_isolated_row_stats

1

Boolean

Use row alone when inadequate cert spaces

tosp_row_use_cert_spaces

1

Boolean

Only stat OBVIOUS spaces

tosp_row_use_cert_spaces1

1

Boolean

Only stat OBVIOUS spaces

tosp_rule_9_test_punct

0

Boolean

Don't chng kn to space next to punct

tosp_stats_use_xht_gaps

1

Boolean

Use within xht gap for wd breaks

tosp_use_pre_chopping

0

Boolean

Space stats use prechopping?

tosp_use_xht_gaps

1

Boolean

Use within xht gap for wd breaks

unlv_tilde_crunching

0

Boolean

Mark v.bad words for tilde crunch

use_ambigs_for_adaption

0

Boolean

Use ambigs for deciding whether to adapt to a character

use_only_first_uft8_step

0

Boolean

Use only the first UTF8 step of the given string when computing log probabilities.

wordrec_blob_pause

0

Boolean

Blob pause

wordrec_debug_blamer

0

Boolean

Print blamer debug messages

wordrec_display_all_blobs

0

Boolean

Display Blobs

wordrec_display_splits

0

Boolean

Display splits

wordrec_enable_assoc

1

Boolean

Associator Enable

wordrec_run_blamer

0

Boolean

Try to set the blame for errors

wordrec_skip_no_truth_words

0

Boolean

Only run OCR for words that had truth recorded in BlamerBundle

certainty_scale

20

Double

Certainty scaling factor

chop_center_knob

0.15

Double

Split center adjustment

chop_good_split

50

Double

Good split limit

chop_ok_split

100

Double

OK split limit

chop_overlap_knob

0.9

Double

Split overlap adjustment

chop_sharpness_knob

0.06

Double

Split sharpness adjustment

chop_split_dist_knob

0.5

Double

Split length adjustment

chop_width_change_knob

5

Double

Width change adjustment

classify_adapted_pruning_factor

2.5

Double

Prune poor adapted results this much worse than best result

classify_adapted_pruning_threshold

-1

Double

Threshold at which classify_adapted_pruning_factor starts

classify_char_norm_range

0.2

Double

Character Normalization Range ...

classify_character_fragments_garbage_certainty_threshold

-3

Double

Exclude fragments that do not look like whole characters from training and adaption

classify_cp_angle_pad_loose

45

Double

Class Pruner Angle Pad Loose

classify_cp_angle_pad_medium

20

Double

Class Pruner Angle Pad Medium

classify_cp_angle_pad_tight

10

Double

Class Pruner Angle Pad Tight

classify_cp_end_pad_loose

0.5

Double

Class Pruner End Pad Loose

classify_cp_end_pad_medium

0.5

Double

Class Pruner End Pad Medium

classify_cp_end_pad_tight

0.5

Double

Class Pruner End Pad Tight

classify_cp_side_pad_loose

2.5

Double

Class Pruner Side Pad Loose

classify_cp_side_pad_medium

1.2

Double

Class Pruner Side Pad Medium

classify_cp_side_pad_tight

0.6

Double

Class Pruner Side Pad Tight

classify_max_certainty_margin

5.5

Double

Veto difference between classifier certainties

classify_max_rating_ratio

1.5

Double

Veto ratio between classifier ratings

classify_max_slope

2.41421

Double

Slope above which lines are called vertical

classify_min_slope

0.414214

Double

Slope below which lines are called horizontal

classify_misfit_junk_penalty

0

Double

Penalty to apply when a non-alnum is vertically out of its expected textline position

classify_norm_adj_curl

2

Double

Norm adjust curl ...

classify_norm_adj_midpoint

32

Double

Norm adjust midpoint ...

classify_pico_feature_length

0.05

Double

Pico Feature Length

classify_pp_angle_pad

45

Double

Proto Pruner Angle Pad

classify_pp_end_pad

0.5

Double

Proto Pruner End Pad

classify_pp_side_pad

2.5

Double

Proto Pruner Side Pad

crunch_del_cert

-10

Double

POTENTIAL crunch cert lt this

crunch_del_high_word

1.5

Double

Del if word gt xht x this above bl

crunch_del_low_word

0.5

Double

Del if word gt xht x this below bl

crunch_del_max_ht

3

Double

Del if word ht gt xht x this

crunch_del_min_ht

0.7

Double

Del if word ht lt xht x this

crunch_del_min_width

3

Double

Del if word width lt xht x this

crunch_del_rating

60

Double

POTENTIAL crunch rating lt this

crunch_poor_garbage_cert

-9

Double

crunch garbage cert lt this

crunch_poor_garbage_rate

60

Double

crunch garbage rating lt this

crunch_pot_poor_cert

-8

Double

POTENTIAL crunch cert lt this

crunch_pot_poor_rate

40

Double

POTENTIAL crunch rating lt this

crunch_small_outlines_size

0.6

Double

Small if lt xht x this

crunch_terrible_rating

80

Double

crunch rating lt this

doc_dict_certainty_threshold

-2.25

Double

Worst certainty for words that can be inserted into the document dictionary

doc_dict_pending_threshold

0

Double

Worst certainty for using pending dictionary

edges_boxarea

0.875

Double

Min area fraction of grandchild for box

edges_childarea

0.5

Double

Min area fraction of child outline

fixsp_small_outlines_size

0.28

Double

Small if lt xht x this

gapmap_big_gaps

1.75

Double

xht multiplier

language_model_ngram_nonmatch_score

-40

Double

Average classifier score of a non-matching unichar.

language_model_ngram_rating_factor

16

Double

Factor to bring log-probs into the same range as ratings when multiplied by outline length

language_model_ngram_scale_factor

0.03

Double

Strength of the character ngram model relative to the character classifier

language_model_ngram_small_prob

0.000001

Double

To avoid overly small denominators use this as the floor of the probability returned by the ngram model.

language_model_penalty_case

0.1

Double

Penalty for inconsistent case

language_model_penalty_chartype

0.3

Double

Penalty for inconsistent character type

language_model_penalty_font

0

Double

Penalty for inconsistent font

language_model_penalty_increment

0.01

Double

Penalty increment

language_model_penalty_non_dict_word

0.15

Double

Penalty for non-dictionary words

language_model_penalty_non_freq_dict_word

0.1

Double

Penalty for words not in the frequent word dictionary

language_model_penalty_punc

0.2

Double

Penalty for inconsistent punctuation

language_model_penalty_script

0.5

Double

Penalty for inconsistent script

language_model_penalty_spacing

0.05

Double

Penalty for inconsistent spacing

matcher_avg_noise_size

12

Double

Avg. noise blob length

matcher_bad_match_pad

0.15

Double

Bad Match Pad (0-1)

matcher_clustering_max_angle_delta

0.015

Double

Maximum angle delta for prototype clustering

matcher_good_threshold

0.125

Double

Good Match (0-1)

matcher_perfect_threshold

0.02

Double

Perfect Match (0-1)

matcher_rating_margin

0.1

Double

New template margin (0-1)

matcher_reliable_adaptive_result

0

Double

Great Match (0-1)

min_orientation_margin

7

Double

Min acceptable orientation margin

noise_cert_basechar

-8

Double

Hingepoint for base char certainty

noise_cert_disjoint

-1

Double

Hingepoint for disjoint certainty

noise_cert_factor

0.375

Double

Scaling on certainty diff from Hingepoint

noise_cert_punc

-3

Double

Threshold for new punc char certainty

oldbl_dot_error_size

1.26

Double

Max aspect ratio of a dot

oldbl_xhfract

0.4

Double

Fraction of est allowed in calc

pitsync_joined_edge

0.75

Double

Dist inside big blob for chopping

pitsync_offset_freecut_fraction

0.25

Double

Fraction of cut for free cuts

quality_blob_pc

0

Double

good_quality_doc gte good blobs limit

quality_char_pc

0.95

Double

good_quality_doc gte good char limit

quality_outline_pc

1

Double

good_quality_doc lte outline error limit

quality_rej_pc

0.08

Double

good_quality_doc lte rejection limit

quality_rowrej_pc

1.1

Double

good_quality_doc gte good char limit

rating_scale

1.5

Double

Rating scaling factor

rej_whole_of_mostly_reject_word_fract

0.85

Double

if >this fract

segment_penalty_dict_case_bad

1.3125

Double

Default score multiplier for word matches, which may have case issues (lower is better).

segment_penalty_dict_case_ok

1.1

Double

Score multiplier for word matches that have good case (lower is better).

segment_penalty_dict_frequent_word

1

Double

Score multiplier for word matches which have good case and are frequent in the given language (lower is better).

segment_penalty_dict_nonword

1.25

Double

Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).

segment_penalty_garbage

1.5

Double

Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better).

segsearch_max_char_wh_ratio

2

Double

Maximum character width-to-height ratio

speckle_large_max_size

0.3

Double

Max large speckle size

speckle_rating_penalty

10

Double

Penalty to add to worst rating for noise

stopper_allowable_character_badness

3

Double

Max certainty variation allowed in a word (in sigma)

stopper_certainty_per_char

-0.5

Double

Certainty to add for each dict char above small word size.

stopper_nondict_certainty_base

-2.5

Double

Certainty threshold for non-dict words

stopper_phase2_certainty_rejection_offset

1

Double

Reject certainty offset

subscript_max_y_top

0.5

Double

Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript.

superscript_bettered_certainty

0.97

Double

What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40%

superscript_min_y_bottom

0.3

Double

Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript.

superscript_scaledown_ratio

0.4

Double

A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size.

superscript_worse_certainty

2

Double

How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline?

suspect_accept_rating

-999.9

Double

Accept good rating limit

suspect_rating_per_ch

999.9

Double

Don't touch bad rating limit

tessedit_certainty_threshold

-2.25

Double

Good blob limit

tessedit_class_miss_scale

0.00390625

Double

Scale factor for features not used

tessedit_good_doc_still_rowrej_wd

1.1

Double

rej good doc wd if more than this fraction rejected

tessedit_lower_flip_hyphen

1.5

Double

Aspect ratio dot/hyphen test

tessedit_reject_block_percent

45

Double

%rej allowed before rej whole block

tessedit_reject_doc_percent

65

Double

%rej allowed before rej whole doc

tessedit_reject_row_percent

40

Double

%rej allowed before rej whole row

tessedit_upper_flip_hyphen

1.8

Double

Aspect ratio dot/hyphen test

tessedit_whole_wd_rej_row_percent

70

Double

Number of row rejects in whole word rejects which prevents whole row rejection

test_pt_x

100000

Double

xcoord

test_pt_y

100000

Double

ycoord

textord_ascheight_mode_fraction

0.08

Double

Min pile height to make ascheight

textord_ascx_ratio_max

1.8

Double

Max cap/xheight

textord_ascx_ratio_min

1.25

Double

Min cap/xheight

textord_balance_factor

1

Double

Ding rate for unbalanced char cells

textord_blshift_maxshift

0

Double

Max baseline shift

textord_blshift_xfraction

9.99

Double

Min size of baseline shift

textord_chop_width

1.5

Double

Max width before chopping

textord_descheight_mode_fraction

0.08

Double

Min pile height to make descheight

textord_descx_ratio_max

0.6

Double

Max desc/xheight

textord_descx_ratio_min

0.25

Double

Min desc/xheight

textord_excess_blobsize

1.3

Double

New row made if blob makes row this big

textord_expansion_factor

1

Double

Factor to expand rows by in expand_rows

textord_fp_chop_snap

0.5

Double

Max distance of chop pt from vertex

textord_fp_min_width

0.5

Double

Min width of decent blobs

textord_fpiqr_ratio

1.5

Double

Pitch IQR/Gap IQR threshold

textord_initialasc_ile

0.9

Double

Ile of sizes for xheight guess

textord_initialx_ile

0.75

Double

Ile of sizes for xheight guess

textord_linespace_iqrlimit

0.2

Double

Max iqr/median for linespace

textord_max_pitch_iqr

0.2

Double

Xh fraction noise in pitch

textord_min_blob_height_fraction

0.75

Double

Min blob height/top to include blob top into xheight stats

textord_min_linesize

1.25

Double

* blob height for initial linesize

textord_minxh

0.25

Double

fraction of linesize for min xheight

textord_noise_area_ratio

0.7

Double

Fraction of bounding box for noise

textord_noise_hfract

0.015625

Double

Height fraction to discard outlines as speckle noise

textord_noise_normratio

2

Double

Dot to norm ratio for deletion

textord_noise_rowratio

6

Double

Dot to norm ratio for deletion

textord_noise_sizelimit

0.5

Double

Fraction of x for big t count

textord_noise_sxfract

0.4

Double

xh fract width error for norm blobs

textord_noise_syfract

0.2

Double

xh fract height error for norm blobs

textord_occupancy_threshold

0.4

Double

Fraction of neighbourhood

textord_oldbl_jumplimit

0.15

Double

X fraction for new partition

textord_overlap_x

0.375

Double

Fraction of linespace for good overlap

textord_pitch_rowsimilarity

0.08

Double

Fraction of xheight for sameness

textord_projection_scale

0.2

Double

Ding rate for mid-cuts

textord_skew_ile

0.5

Double

Ile of gradients for page skew

textord_skew_lag

0.02

Double

Lag for skew on row accumulation

textord_spacesize_ratiofp

2.8

Double

Min ratio space/nonspace

textord_spacesize_ratioprop

2

Double

Min ratio space/nonspace

textord_spline_outlier_fraction

0.1

Double

Fraction of line spacing for outlier

textord_spline_shift_fraction

0.02

Double

Fraction of line spacing for quad

textord_tabfind_aligned_gap_fraction

0.75

Double

Fraction of height used as a minimum gap for aligned blobs.

textord_tabfind_vertical_text_ratio

0.5

Double

Fraction of textlines deemed vertical to use vertical page mode

textord_tabvector_vertical_box_ratio

0.5

Double

Fraction of box matches required to declare a line vertical

textord_tabvector_vertical_gap_fraction

0.5

Double

max fraction of mean blob width allowed for vertical gaps in vertical text

textord_underline_offset

0.1

Double

Fraction of x to ignore

textord_underline_threshold

0.5

Double

Fraction of width occupied

textord_underline_width

2

Double

Multiple of line_size for underline

textord_width_limit

8

Double

Max width of blobs to make rows

textord_width_smooth_factor

0.1

Double

Smoothing width stats

textord_words_def_fixed

0.016

Double

Threshold for definite fixed

textord_words_def_prop

0.09

Double

Threshold for definite prop

textord_words_default_maxspace

3.5

Double

Max believable third space

textord_words_default_minspace

0.6

Double

Fraction of xheight

textord_words_default_nonspace

0.2

Double

Fraction of xheight

textord_words_definite_spread

0.3

Double

Non-fuzzy spacing region

textord_words_initial_lower

0.25

Double

Max initial cluster size

textord_words_initial_upper

0.15

Double

Min initial cluster spacing

textord_words_maxspace

4

Double

Multiple of xheight

textord_words_min_minspace

0.3

Double

Fraction of xheight

textord_words_minlarge

0.75

Double

Fraction of valid gaps needed

textord_words_pitchsd_threshold

0.04

Double

Pitch sync threshold

textord_words_width_ile

0.4

Double

Ile of blob widths for space est

textord_wordstats_smooth_factor

0.05

Double

Smoothing gap stats

textord_xheight_error_margin

0.1

Double

Accepted variation

textord_xheight_mode_fraction

0.4

Double

Min pile height to make xheight

tosp_dont_fool_with_small_kerns

-1

Double

Limit use of xht gap with odd small kns

tosp_enough_small_gaps

0.65

Double

Fract of kerns reqd for isolated row stats

tosp_flip_caution

0

Double

Don't autoflip kn to sp when large separation

tosp_fuzzy_kn_fraction

0.5

Double

New fuzzy kn alg

tosp_fuzzy_sp_fraction

0.5

Double

New fuzzy sp alg

tosp_fuzzy_space_factor

0.6

Double

Fract of xheight for fuzz sp

tosp_fuzzy_space_factor1

0.5

Double

Fract of xheight for fuzz sp

tosp_fuzzy_space_factor2

0.72

Double

Fract of xheight for fuzz sp

tosp_gap_factor

0.83

Double

gap ratio to flip sp->kern

tosp_ignore_big_gaps

-1

Double

xht multiplier

tosp_ignore_very_big_gaps

3.5

Double

xht multiplier

tosp_init_guess_kn_mult

2.2

Double

Thresh guess - mult kn by this

tosp_init_guess_xht_mult

0.28

Double

Thresh guess - mult xht by this

tosp_kern_gap_factor1

2

Double

gap ratio to flip kern->sp

tosp_kern_gap_factor2

1.3

Double

gap ratio to flip kern->sp

tosp_kern_gap_factor3

2.5

Double

gap ratio to flip kern->sp

tosp_large_kerning

0.19

Double

Limit use of xht gap with large kns

tosp_max_sane_kn_thresh

5

Double

Multiplier on kn to limit thresh

tosp_min_sane_kn_sp

1.5

Double

Don't trust spaces less than this times kn

tosp_narrow_aspect_ratio

0.48

Double

narrow if w/h less than this

tosp_narrow_fraction

0.3

Double

Fract of xheight for narrow

tosp_near_lh_edge

0

Double

Don't reduce box if the top left is non blank

tosp_old_sp_kn_th_factor

2

Double

Factor for defining space threshold in terms of space and kern sizes

tosp_pass_wide_fuzz_sp_to_context

0.75

Double

How wide fuzzies need context

tosp_rep_space

1.6

Double

rep gap multiplier for space

tosp_silly_kn_sp_gap

0.2

Double

Don't let sp minus kn get too small

tosp_table_fuzzy_kn_sp_ratio

3

Double

Fuzzy if less than this

tosp_table_kn_sp_ratio

2.25

Double

Min difference of kn and sp in table

tosp_table_xht_sp_ratio

0.33

Double

Expect spaces bigger than this

tosp_threshold_bias1

0

Double

how far between kern and space?

tosp_threshold_bias2

0

Double

how far between kern and space?

tosp_wide_aspect_ratio

0

Double

wide if w/h less than this

tosp_wide_fraction

0.52

Double

Fract of xheight for wide

words_default_fixed_limit

0.6

Double

Allowed size variance

words_default_fixed_space

0.75

Double

Fraction of xheight

words_default_prop_nonspace

0.25

Double

Fraction of xheight

words_initial_lower

0.5

Double

Max initial cluster size

words_initial_upper

0.15

Double

Min initial cluster spacing

xheight_penalty_inconsistent

0.25

Double

Score penalty (0.1 = 10%) added if an xheight is inconsistent.

xheight_penalty_subscripts

0.125

Double

Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.

ambigs_debug_level

0

Integer

Debug level for unichar ambiguities

applybox_debug

1

Integer

Debug level

applybox_page

0

Integer

Page number to apply boxes from

bidi_debug

0

Integer

Debug level for BiDi

chop_centered_maxwidth

90

Integer

Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.

chop_debug

0

Integer

Chop debug

chop_inside_angle

-50

Integer

Min Inside Angle Bend

chop_min_outline_area

2000

Integer

Min Outline Area

chop_min_outline_points

6

Integer

Min Number of Points on Outline

chop_same_distance

2

Integer

Same distance

chop_seam_pile_size

150

Integer

Max number of seams in seam_pile

chop_split_length

10000

Integer

Split Length

chop_x_y_weight

3

Integer

X / Y length weight

classify_adapt_feature_threshold

230

Integer

Threshold for good features during adaptive 0-255

classify_adapt_proto_threshold

230

Integer

Threshold for good protos during adaptive 0-255

classify_class_pruner_multiplier

15

Integer

Class Pruner Multiplier 0-255:

classify_class_pruner_threshold

229

Integer

Class Pruner Threshold 0-255

classify_cp_cutoff_strength

7

Integer

Class Pruner CutoffStrength:

classify_debug_level

0

Integer

Classify debug level

classify_integer_matcher_multiplier

10

Integer

Integer Matcher Multiplier 0-255:

classify_learning_debug_level

0

Integer

Learning Debug Level:

classify_norm_method

1

Integer

Normalization Method ...

classify_num_cp_levels

3

Integer

Number of Class Pruner Levels

crunch_debug

0

Integer

As it says

crunch_leave_lc_strings

4

Integer

Don't crunch words with long lower case strings

crunch_leave_uc_strings

4

Integer

Don't crunch words with long upper case strings

crunch_long_repetitions

3

Integer

Crunch words with long repetitions

crunch_pot_indicators

1

Integer

How many potential indicators needed

crunch_rating_max

10

Integer

For adj length in rating per ch

dawg_debug_level

0

Integer

Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages

debug_fix_space_level

0

Integer

Contextual fixspace debug

debug_noise_removal

0

Integer

Debug reassignment of small outlines

debug_x_ht_level

0

Integer

Reestimate debug

devanagari_split_debuglevel

0

Integer

Debug level for split shiro-rekha process.

edges_children_count_limit

46

Integer

Max holes allowed in blob

edges_children_per_grandchild

9

Integer

Importance ratio for chucking outlines

edges_max_children_layers

4

Integer

Max layers of nested children inside a character outline

edges_max_children_per_outline

15

Integer

Max number of children inside a character outline

edges_min_nonhole

14

Integer

Min pixels for potential char in box

edges_patharea_ratio

40

Integer

Max lensq/area for acceptable child outline

fixsp_done_mode

1

Integer

What constitutes done for spacing

fixsp_non_noise_limit

1

Integer

How many non-noise blobs on either side?

hyphen_debug_level

0

Integer

Debug level for hyphenated words.

jpg_quality

85

Integer

Set JPEG quality level

language_model_debug_level

0

Integer

Language model debug level

language_model_min_compound_length

3

Integer

Minimum length of compound words

language_model_ngram_order

8

Integer

Maximum order of the character ngram model

language_model_viterbi_list_max_num_prunable

10

Integer

Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs

language_model_viterbi_list_max_size

500

Integer

Maximum size of viterbi lists recorded in BLOB_CHOICEs

lstm_choice_mode

0

Integer

Includes alternative symbol choices in the hOCR output. Valid input values are 0, 1, 2 and 3; 0 is the default. With 1, the alternative symbol choices per timestep are included. With 2, the alternative symbol choices are accumulated per character.

matcher_debug_flags

0

Integer

Matcher Debug Flags

matcher_debug_level

0

Integer

Matcher Debug Level

matcher_min_examples_for_prototyping

3

Integer

Reliable Config Threshold

matcher_permanent_classes_min

1

Integer

Min # of permanent classes

matcher_sufficient_examples_for_prototyping

5

Integer

Enable adaption even if the ambiguities have not been seen

max_permuter_attempts

10000

Integer

Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.

min_characters_to_try

50

Integer

Specify minimum characters to try during OSD

min_sane_x_ht_pixels

8

Integer

Reject any x-height less than or equal to this

multilang_debug_level

0

Integer

Print multilang debug info.

noise_maxperblob

8

Integer

Max diacritics to apply to a blob

noise_maxperword

16

Integer

Max diacritics to apply to a word

ocr_devanagari_split_strategy

0

Integer

Whether to use the top-line splitting process for Devanagari documents while performing ocr.

oldbl_holed_losscount

10

Integer

Max lost before fallback line used

pageseg_devanagari_split_strategy

0

Integer

Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.

paragraph_debug_level

0

Integer

Print paragraph debug info.

pitsync_fake_depth

1

Integer

Max advance fake generation

pitsync_linear_version

6

Integer

Use new fast algorithm

ptg_pdf_resolution

300

Integer

PPI of image in scanned PDF

quality_min_initial_alphas_reqd

2

Integer

alphas in a good word

repair_unchopped_blobs

1

Integer

Fix blobs that aren't chopped

segsearch_debug_level

0

Integer

SegSearch debug level

segsearch_max_futile_classifications

20

Integer

Maximum number of pain point classifications per chunk that did not result in finding a better word choice.

segsearch_max_pain_points

2000

Integer

Maximum number of pain points stored in the queue

stopper_debug_level

0

Integer

Stopper debug level

stopper_smallword_size

2

Integer

Size of dict word to be treated as non-dict word

superscript_debug

0

Integer

Debug level for sub and superscript fixer

suspect_level

99

Integer

Suspect marker level

suspect_short_words

2

Integer

Don't suspect dict words longer than this

tessedit_bigram_debug

0

Integer

Amount of debug output for bigram correction.

tessedit_image_border

2

Integer

Reject blobs near image edge limit

tessedit_ocr_engine_mode

2

Integer

Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available.

tessedit_page_number

-1

Integer

-1 -> All pages, else specific page to process

tessedit_pageseg_mode

6

Integer

Page seg mode: 0=osd only, 1=auto+osd, 2=auto_only, 3=auto, 4=column, 5=block_vert, 6=block, 7=line, 8=word, 9=word_circle, 10=char, 11=sparse_text, 12=sparse_text+osd, 13=raw_line (Values from PageSegMode enum in publictypes.h)

tessedit_parallelize

0

Integer

Run in parallel where possible

tessedit_preserve_min_wd_len

2

Integer

Only preserve words longer than this

tessedit_reject_mode

0

Integer

Rejection algorithm

tessedit_tess_adaption_mode

39

Integer

Adaptation decision algorithm for tess

tessedit_truncate_wordchoice_log

10

Integer

Max words to keep in list

textord_baseline_debug

0

Integer

Baseline debug level

textord_debug_block

0

Integer

Block to do debug on

textord_debug_bugs

0

Integer

Turn on output related to bugs in tab finding

textord_debug_tabfind

0

Integer

Debug tab finding

textord_dotmatrix_gap

3

Integer

Max pixel gap for broken fixed pitch

textord_fp_chop_error

2

Integer

Max allowed bending of chop cells

textord_lms_line_trials

12

Integer

Number of line fits to do

textord_max_blob_overlaps

4

Integer

Max number of blobs a big blob can overlap

textord_max_noise_size

7

Integer

Pixel size of noise

textord_min_blobs_in_row

4

Integer

Min blobs before gradient counted

textord_min_xheight

10

Integer

Min credible pixel xheight

textord_noise_sizefraction

10

Integer

Fraction of size for maxima

textord_noise_sncount

1

Integer

super norm blobs to save row

textord_noise_translimit

16

Integer

Transitions for normal blob

textord_pitch_range

2

Integer

Max range test on pitch

textord_skewsmooth_offset

4

Integer

For smooth factor

textord_skewsmooth_offset2

1

Integer

For smooth factor

textord_spline_medianwin

6

Integer

Size of window for spline segmentation

textord_spline_minblobs

8

Integer

Min blobs in each spline segment

textord_tabfind_show_images

0

Integer

Show image blobs

textord_tabfind_show_partitions

0

Integer

Show partition bounds, waiting if >1

textord_tabfind_show_strokewidths

0

Integer

Show stroke widths

textord_test_x

-2147483647

Integer

coord of test pt

textord_test_y

-2147483647

Integer

coord of test pt

textord_testregion_bottom

2147483647

Integer

Bottom edge of debug rectangle

textord_testregion_left

-1

Integer

Left edge of debug reporting rectangle

textord_testregion_right

2147483647

Integer

Right edge of debug rectangle

textord_testregion_top

-1

Integer

Top edge of debug reporting rectangle

textord_words_veto_power

5

Integer

Rows required to outvote a veto

tosp_debug_level

0

Integer

Debug data

tosp_enough_space_samples_for_median

3

Integer

or should we use mean

tosp_few_samples

40

Integer

Number of gaps required with 1 large gap to treat as a table

tosp_redo_kern_limit

10

Integer

Number of samples required to reestimate for row

tosp_sanity_method

1

Integer

How to avoid being silly

tosp_short_row

20

Integer

Number of gaps required with few cert spaces to use certs

user_defined_dpi

0

Integer

Specify DPI for input image

wordrec_debug_level

0

Integer

Debug level for wordrec

wordrec_display_segmentations

0

Integer

Display Segmentations

wordrec_max_join_chunks

4

Integer

Max number of broken pieces to associate

x_ht_acceptance_tolerance

8

Integer

Max allowed deviation of blob top outside of font data

x_ht_min_change

8

Integer

Min change in xht before actually trying it

applybox_exposure_pattern

.exp

String

Exposure value follows this pattern in the image filename. The names of the image files are expected to be in the form [lang].[fontname].exp[num].tif

chs_leading_punct

('`"

String

Leading punctuation

chs_trailing_punct1

).,;:?!

String

1st Trailing punctuation

chs_trailing_punct2

)'`"

String

2nd Trailing punctuation

classify_font_name

UnknownFont

String

Default font name to be used in training

classify_learn_debug_str

String

Class str to debug learning

conflict_set_I_l_1

Il1[]

String

Il1 conflict set

debug_file

String

File to send tprintf output to

document_title

String

Title of output document (used for hOCR and PDF output)

dotproduct

auto

String

Function used for calculation of dot product

file_type

.tif

String

Filename extension

numeric_punctuation

.,

String

Punctuation characters expected within numbers

ok_repeated_ch_non_alphanum_wds

-?*=

String

Allow NN to unrej

outlines_2

ij!?%":;

String

Non standard number of outlines

outlines_odd

%|

String

Non standard number of outlines

output_ambig_words_file

String

Output file for ambiguities found in the dictionary

page_separator

String

Page separator (default is form feed control character)

tessedit_char_blacklist

String

Blacklist of chars not to recognize

tessedit_char_unblacklist

String

List of chars to override tessedit_char_blacklist

tessedit_char_whitelist

String

Whitelist of chars to recognize

tessedit_load_sublangs

String

List of languages to load with this one

tessedit_write_params_to_file

String

Write all parameters to the given file.

unrecognised_char

|

String

Output char for unidentified blobs

user_patterns_file

String

A filename of user-provided patterns.

user_patterns_suffix

String

A suffix of user-provided patterns located in tessdata.

user_words_file

String

A filename of user-provided words.

user_words_suffix

String

A suffix of user-provided words located in tessdata.

word_to_debug

String

Word for which stopper debug information should be printed to stdout
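
As a rough illustration of how the parameters above can be used, a config file placed in tessdata/configs lists one parameter per line as a name followed by its value, separated by whitespace. The sketch below assumes the config.cfg file name used earlier in this document. The three parameters come from the table above, but the values are illustrative choices only (single-line page segmentation, digits only, a 300 DPI hint), not recommended settings.

```
tessedit_pageseg_mode 7
tessedit_char_whitelist 0123456789
user_defined_dpi 300
```

The same names and values can be passed as strings to SetVariable, for example api.SetVariable("tessedit_pageseg_mode", "7").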

Parameters in 3.05 version

textord_debug_tabfind0Debug tab finding
textord_debug_bugs0Turn on output related to bugs in tab finding
textord_testregion_left-1Left edge of debug reporting rectangle
textord_testregion_top-1Top edge of debug reporting rectangle
textord_testregion_right2147483647Right edge of debug rectangle
textord_testregion_bottom2147483647Bottom edge of debug rectangle
textord_tabfind_show_partitions0Show partition bounds, waiting if >1
devanagari_split_debuglevel0Debug level for split shiro-rekha process.
edges_max_children_per_outline16Max number of children inside a character outline
edges_max_children_layers4Max layers of nested children inside a character outline
edges_children_per_grandchild9Importance ratio for chucking outlines
edges_children_count_limit46Max holes allowed in blob
edges_min_nonhole14Min pixels for potential char in box
edges_patharea_ratio40Max lensq/area for acceptable child outline
textord_fp_chop_error2Max allowed bending of chop cells
textord_tabfind_show_images0Show image blobs
classify_num_cp_levels3Number of Class Pruner Levels
textord_skewsmooth_offset4For smooth factor
textord_skewsmooth_offset21For smooth factor
textord_test_x-2147483647coord of test pt
textord_test_y-2147483647coord of test pt
textord_min_blobs_in_row4Min blobs before gradient counted
textord_spline_minblobs8Min blobs in each spline segment
textord_spline_medianwin6Size of window for spline segmentation
textord_max_blob_overlaps4Max number of blobs a big blob can overlap
textord_min_xheight10Min credible pixel xheight
textord_lms_line_trials12Number of linew fits to do
oldbl_holed_losscount10Max lost before fallback line used
editor_image_xpos590Editor image X Pos
editor_image_ypos10Editor image Y Pos
editor_image_menuheight50Add to image height for menu bar
editor_image_word_bb_color7Word bounding box colour
editor_image_blob_bb_color4Blob bounding box colour
editor_image_text_color2Correct text colour
editor_dbwin_xpos50Editor debug window X Pos
editor_dbwin_ypos500Editor debug window Y Pos
editor_dbwin_height24Editor debug window height
editor_dbwin_width80Editor debug window width
editor_word_xpos60Word window X Pos
editor_word_ypos510Word window Y Pos
editor_word_height240Word window height
editor_word_width655Word window width
pitsync_linear_version6Use new fast algorithm
pitsync_fake_depth1Max advance fake generation
textord_tabfind_show_strokewidths0Show stroke widths
textord_dotmatrix_gap3Max pixel gap for broken pixed pitch
textord_debug_block0Block to do debug on
textord_pitch_range2Max range test on pitch
textord_words_veto_power5Rows required to outvote a veto
textord_debug_images0Use greyed image background for debug
textord_debug_printable0Make debug windows printable
stream_filelist0Stream a filelist from stdin
textord_space_size_is_variable0If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch.
textord_tabfind_show_initial_partitions0Show partition bounds
textord_tabfind_show_reject_blobs0Show blobs rejected as noise
textord_tabfind_show_columns0Show column bounds
textord_tabfind_show_blocks0Show final block bounds
textord_tabfind_find_tables1run table detection
textord_tabfind_show_color_fit0Show stroke widths
devanagari_split_debugimage0Whether to create a debug image for split shiro-rekha process.
textord_show_fixed_cuts0Draw fixed pitch cell boundaries
edges_use_new_outline_complexity0Use the new outline complexity module
edges_debug0turn on debugging for this module
edges_children_fix0Remove boxy parents of char-like children
equationdetect_save_bi_image0Save input bi image
equationdetect_save_spt_image0Save special character image
equationdetect_save_seed_image0Save the seed image
equationdetect_save_merged_image0Save the merged image
gapmap_debug0Say which blocks have tables
gapmap_use_ends0Use large space at start and end of rows
gapmap_no_isolated_quanta0Ensure gaps not less than 2quanta wide
textord_heavy_nr0Vigorously remove noise
textord_show_initial_rows0Display row accumulation
textord_show_parallel_rows0Display page correlated rows
textord_show_expanded_rows0Display rows after expanding
textord_show_final_rows0Display rows after final fitting
textord_show_final_blobs0Display blob bounds after pre-ass
textord_test_landscape0Tests refer to land/port
textord_parallel_baselines1Force parallel baselines
textord_straight_baselines0Force straight baselines
textord_old_baselines1Use old baseline algorithm
textord_old_xheight0Use old xheight algorithm
textord_fix_xheight_bug1Use spline baseline
textord_fix_makerow_bug1Prevent multiple baselines
textord_debug_xheights0Test xheight algorithms
textord_biased_skewcalc1Bias skew estimates with line length
textord_interpolating_skew1Interpolate across gaps
textord_new_initial_xheight1Use test xheight mechanism
textord_debug_blob0Print test blob information
textord_really_old_xheight0Use original wiseowl xheight
textord_oldbl_debug0Debug old baseline generation
textord_debug_baselines0Debug baseline generation
textord_oldbl_paradef1Use para default mechanism
textord_oldbl_split_splines1Split stepped splines
textord_oldbl_merge_parts1Merge suspect partitions
oldbl_corrfix1Improve correlation of heights
oldbl_xhfix0Fix bug in modes threshold for xheights
textord_ocropus_mode0Make baselines for ocropus
poly_debug0Debug old poly
poly_wide_objects_better1More accurate approx on wide things
wordrec_display_all_blobs0Display Blobs
wordrec_display_all_words0Display Words
wordrec_blob_pause0Blob pause
wordrec_display_splits0Display splits
textord_tabfind_only_strokewidths0Only run stroke widths
textord_tabfind_show_initialtabs0Show tab candidates
textord_tabfind_show_finaltabs0Show tab vectors
textord_dump_table_images0Paint table detection output
textord_show_tables0Show table regions
textord_tablefind_show_mark0Debug table marking steps in detail
textord_tablefind_show_stats0Show page stats used in table finding
textord_tablefind_recognize_tables0Enables the table recognizer for table layout and filtering.
textord_all_prop0All doc is proportial text
textord_debug_pitch_test0Debug on fixed pitch test
textord_disable_pitch_test0Turn off dp fixed pitch algorithm
textord_fast_pitch_test0Do even faster pitch algorithm
textord_debug_pitch_metric0Write full metric stuff
textord_show_row_cuts0Draw row-level cuts
textord_show_page_cuts0Draw page-level cuts
textord_pitch_cheat0Use correct answer for fixed/prop
textord_blockndoc_fixed0Attempt whole doc/block fixed pitch
textord_show_initial_words0Display separate words
textord_show_new_words0Display separate words
textord_show_fixed_words0Display forced fixed pitch words
textord_blocksall_fixed0Moan about prop blocks
textord_blocksall_prop0Moan about fixed pitch blocks
textord_blocksall_testing0Dump stats when moaning
textord_test_mode0Do current test
textord_pitch_scalebigwords0Scale scores on big words
textord_restore_underlines1Chop underlines and put back
textord_fp_chopping1Do fixed pitch chopping
textord_force_make_prop_words0Force proportional word segmentation on all rows
textord_chopper_test0Chopper is being tested.
classify_font_nameUnknownFontDefault font name to be used in training
fx_debugfileFXDebugName of debugfile
editor_image_win_nameEditorImageEditor image window name
editor_dbwin_nameEditorDBWinEditor debug window name
editor_word_nameBlnWordsBL normalized word window
editor_debug_config_fileConfig file to apply to single words
classify_training_fileMicroFeaturesTraining file
debug_fileFile to send tprintf output to
textord_underline_threshold0.5Fraction of width occupied
edges_childarea0.5Min area fraction of child outline
edges_boxarea0.875Min area fraction of grandchild for box
textord_fp_chop_snap0.5Max distance of chop pt from vertex
gapmap_big_gaps1.75xht multiplier
classify_cp_angle_pad_loose45Class Pruner Angle Pad Loose
classify_cp_angle_pad_medium20Class Pruner Angle Pad Medium
classify_cp_angle_pad_tight10CLass Pruner Angle Pad Tight
classify_cp_end_pad_loose0.5Class Pruner End Pad Loose
classify_cp_end_pad_medium0.5Class Pruner End Pad Medium
classify_cp_end_pad_tight0.5Class Pruner End Pad Tight
classify_cp_side_pad_loose2.5Class Pruner Side Pad Loose
classify_cp_side_pad_medium1.2Class Pruner Side Pad Medium
classify_cp_side_pad_tight0.6Class Pruner Side Pad Tight
classify_pp_angle_pad45Proto Pruner Angle Pad
classify_pp_end_pad0.5Proto Prune End Pad
classify_pp_side_pad2.5Proto Pruner Side Pad
textord_spline_shift_fraction0.02Fraction of line spacing for quad
textord_spline_outlier_fraction0.1Fraction of line spacing for outlier
textord_skew_ile0.5Ile of gradients for page skew
textord_skew_lag0.02Lag for skew on row accumulation
textord_linespace_iqrlimit0.2Max iqr/median for linespace
textord_width_limit8Max width of blobs to make rows
textord_chop_width1.5Max width before chopping
textord_expansion_factor1Factor to expand rows by in expand_rows
textord_overlap_x0.375Fraction of linespace for good overlap
textord_minxh0.25fraction of linesize for min xheight
textord_min_linesize1.25* blob height for initial linesize
textord_excess_blobsize1.3New row made if blob makes row this big
textord_occupancy_threshold0.4Fraction of neighbourhood
textord_underline_width2Multiple of line_size for underline
textord_min_blob_height_fraction0.75Min blob height/top to include blob top into xheight stats
textord_xheight_mode_fraction0.4Min pile height to make xheight
textord_ascheight_mode_fraction0.08Min pile height to make ascheight
textord_descheight_mode_fraction0.08Min pile height to make descheight
textord_ascx_ratio_min1.25Min cap/xheight
textord_ascx_ratio_max1.8Max cap/xheight
textord_descx_ratio_min0.25Min desc/xheight
textord_descx_ratio_max0.6Max desc/xheight
textord_xheight_error_margin0.1Accepted variation
classify_min_slope0.414214Slope below which lines are called horizontal
classify_max_slope2.41421Slope above which lines are called vertical
classify_norm_adj_midpoint32Norm adjust midpoint ...
classify_norm_adj_curl2Norm adjust curl ...
oldbl_xhfract0.4Fraction of est allowed in calc
oldbl_dot_error_size1.26Max aspect ratio of a dot
textord_oldbl_jumplimit0.15X fraction for new partition
classify_pico_feature_length0.05Pico Feature Length
pitsync_joined_edge0.75Dist inside big blob for chopping
pitsync_offset_freecut_fraction0.25Fraction of cut for free cuts
textord_tabvector_vertical_gap_fraction0.5max fraction of mean blob width allowed for vertical gaps in vertical text
textord_tabvector_vertical_box_ratio0.5Fraction of box matches required to declare a line vertical
textord_projection_scale0.2Ding rate for mid-cuts
textord_balance_factor1Ding rate for unbalanced char cells
textord_wordstats_smooth_factor0.05Smoothing gap stats
textord_width_smooth_factor0.1Smoothing width stats
textord_words_width_ile0.4Ile of blob widths for space est
textord_words_maxspace4Multiple of xheight
textord_words_default_maxspace3.5Max believable third space
textord_words_default_minspace0.6Fraction of xheight
textord_words_min_minspace0.3Fraction of xheight
textord_words_default_nonspace0.2Fraction of xheight
textord_words_initial_lower0.25Max inital cluster size
textord_words_initial_upper0.15Min initial cluster spacing
textord_words_minlarge0.75Fraction of valid gaps needed
textord_words_pitchsd_threshold0.04Pitch sync threshold
textord_words_def_fixed0.016Threshold for definite fixed
textord_words_def_prop0.09Threshold for definite prop
textord_pitch_rowsimilarity0.08Fraction of xheight for sameness
words_initial_lower0.5Max inital cluster size
words_initial_upper0.15Min initial cluster spacing
words_default_prop_nonspace0.25Fraction of xheight
words_default_fixed_space0.75Fraction of xheight
words_default_fixed_limit0.6Allowed size variance
textord_words_definite_spread0.3Non-fuzzy spacing region
textord_spacesize_ratiofp2.8Min ratio space/nonspace
textord_spacesize_ratioprop2Min ratio space/nonspace
textord_fpiqr_ratio1.5Pitch IQR/Gap IQR threshold
textord_max_pitch_iqr0.2Xh fraction noise in pitch
textord_fp_min_width0.5Min width of decent blobs
textord_underline_offset0.1Fraction of x to ignore
ambigs_debug_level0Debug level for unichar ambiguities
tessedit_single_match0Top choice only from CP
classify_debug_level0Classify debug level
classify_norm_method1Normalization Method ...
matcher_debug_level0Matcher Debug Level
matcher_debug_flags0Matcher Debug Flags
classify_learning_debug_level0Learning Debug Level:
matcher_permanent_classes_min1Min # of permanent classes
matcher_min_examples_for_prototyping3Reliable Config Threshold
matcher_sufficient_examples_for_prototyping5Enable adaption even if the ambiguities have not been seen
classify_adapt_proto_threshold230Threshold for good protos during adaptive 0-255
classify_adapt_feature_threshold230Threshold for good features during adaptive 0-255
classify_class_pruner_threshold229Class Pruner Threshold 0-255
classify_class_pruner_multiplier15Class Pruner Multiplier 0-255:
classify_cp_cutoff_strength7Class Pruner CutoffStrength:
classify_integer_matcher_multiplier10Integer Matcher Multiplier 0-255:
il1_adaption_test0Dont adapt to i/I at beginning of word
dawg_debug_level0Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages
hyphen_debug_level	0	Debug level for hyphenated words.
max_viterbi_list_size	10	Maximum size of viterbi list.
stopper_smallword_size	2	Size of dict word to be treated as non-dict word
stopper_debug_level	0	Stopper debug level
tessedit_truncate_wordchoice_log	10	Max words to keep in list
fragments_debug	0	Debug character fragments
max_permuter_attempts	10000	Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options.
repair_unchopped_blobs	1	Fix blobs that aren't chopped
chop_debug	0	Chop debug
chop_split_length	10000	Split Length
chop_same_distance	2	Same distance
chop_min_outline_points	6	Min Number of Points on Outline
chop_seam_pile_size	150	Max number of seams in seam_pile
chop_inside_angle	-50	Min Inside Angle Bend
chop_min_outline_area	2000	Min Outline Area
chop_centered_maxwidth	90	Width of (smaller) chopped blobs above which we don't care that a chop is not near the center.
chop_x_y_weight	3	X / Y length weight
segment_adjust_debug	0	Segmentation adjustment debug
wordrec_debug_level	0	Debug level for wordrec
wordrec_max_join_chunks	4	Max number of broken pieces to associate
segsearch_debug_level	0	SegSearch debug level
segsearch_max_pain_points	2000	Maximum number of pain points stored in the queue
segsearch_max_futile_classifications	20	Maximum number of pain point classifications per chunk that did not result in finding a better word choice.
language_model_debug_level	0	Language model debug level
language_model_ngram_order	8	Maximum order of the character ngram model
language_model_viterbi_list_max_num_prunable	10	Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs
language_model_viterbi_list_max_size	500	Maximum size of viterbi lists recorded in BLOB_CHOICEs
language_model_min_compound_length	3	Minimum length of compound words
wordrec_display_segmentations	0	Display Segmentations
tessedit_pageseg_mode	6	Page seg mode: 0=osd only, 1=auto+osd, 2=auto, 3=col, 4=block, 5=line, 6=word, 7=char (Values from PageSegMode enum in publictypes.h)
tessedit_ocr_engine_mode	0	Which OCR engine(s) to run (Tesseract, Cube, both). Defaults to loading and running only Tesseract (no Cube, no combiner). (Values from OcrEngineMode enum in tesseractclass.h)
pageseg_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation.
ocr_devanagari_split_strategy	0	Whether to use the top-line splitting process for Devanagari documents while performing ocr.
bidi_debug	0	Debug level for BiDi
applybox_debug	1	Debug level
applybox_page	0	Page number to apply boxes from
tessedit_bigram_debug	0	Amount of debug output for bigram correction.
debug_noise_removal	0	Debug reassignment of small outlines
noise_maxperblob	8	Max diacritics to apply to a blob
noise_maxperword	16	Max diacritics to apply to a word
debug_x_ht_level	0	Reestimate debug
quality_min_initial_alphas_reqd	2	alphas in a good word
tessedit_tess_adaption_mode	39	Adaptation decision algorithm for tess
tessedit_test_adaption_mode	3	Adaptation decision algorithm for tess
paragraph_debug_level	0	Print paragraph debug info.
cube_debug_level	0	Print cube debug info.
tessedit_preserve_min_wd_len	2	Only preserve wds longer than this
crunch_rating_max	10	For adj length in rating per ch
crunch_pot_indicators	1	How many potential indicators needed
crunch_leave_lc_strings	4	Don't crunch words with long lower case strings
crunch_leave_uc_strings	4	Don't crunch words with long upper case strings
crunch_long_repetitions	3	Crunch words with long repetitions
crunch_debug	0	As it says
fixsp_non_noise_limit	1	How many non-noise blobs either side?
fixsp_done_mode	1	What constitutes done for spacing
debug_fix_space_level	0	Contextual fixspace debug
x_ht_acceptance_tolerance	8	Max allowed deviation of blob top outside of font data
x_ht_min_change	8	Min change in xht before actually trying it
superscript_debug	0	Debug level for sub and superscript fixer
suspect_level	99	Suspect marker level
suspect_space_level	100	Min suspect level for rejecting spaces
suspect_short_words	2	Don't suspect dict wds longer than this
tessedit_reject_mode	0	Rejection algorithm
tessedit_image_border	2	Rej blobs near image edge limit
min_sane_x_ht_pixels	8	Reject any x-ht lt or eq than this
tessedit_page_number	-1	-1 -> All pages, else specific page to process
tessdata_manager_debug_level	0	Debug level for TessdataManager functions.
tessedit_parallelize	0	Run in parallel where possible
tessedit_ok_mode	5	Acceptance decision algorithm
segment_debug	0	Debug the whole segmentation process
language_model_fixed_length_choices_depth	3	Depth of blob choice lists to explore when fixed length dawgs are on
tosp_debug_level	0	Debug data
tosp_enough_space_samples_for_median	3	or should we use mean
tosp_redo_kern_limit	10	No.samples reqd to reestimate for row
tosp_few_samples	40	No.gaps reqd with 1 large gap to treat as a table
tosp_short_row	20	No.gaps reqd with few cert spaces to use certs
tosp_sanity_method	1	How to avoid being silly
textord_max_noise_size	7	Pixel size of noise
textord_baseline_debug	0	Baseline debug level
textord_noise_sizefraction	10	Fraction of size for maxima
textord_noise_translimit	16	Transitions for normal blob
textord_noise_sncount	1	super norm blobs to save row
use_definite_ambigs_for_classifier	0	Use definite ambiguities when running character classifier
use_ambigs_for_adaption	0	Use ambigs for deciding whether to adapt to a character
allow_blob_division	1	Use divisible blobs chopping
prioritize_division	0	Prioritize blob division over chopping
classify_enable_learning	1	Enable adaptive classifier
tess_cn_matching	0	Character Normalized Matching
tess_bn_matching	0	Baseline Normalized Matching
classify_enable_adaptive_matcher	1	Enable adaptive classifier
classify_use_pre_adapted_templates	0	Use pre-adapted classifier templates
classify_save_adapted_templates	0	Save adapted templates to a file
classify_enable_adaptive_debugger	0	Enable match debugger
classify_nonlinear_norm	0	Non-linear stroke-density normalization
disable_character_fragments	1	Do not include character fragments in the results of the classifier
classify_debug_character_fragments	0	Bring up graphical debugging windows for fragments training
matcher_debug_separate_windows	0	Use two different windows for debugging the matching: One for the protos and one for the features.
classify_bln_numeric_mode	0	Assume the input is numbers [0-9].
load_system_dawg	1	Load system word dawg.
load_freq_dawg	1	Load frequent word dawg.
load_unambig_dawg	1	Load unambiguous word dawg.
load_punc_dawg	1	Load dawg with punctuation patterns.
load_number_dawg	1	Load dawg with number patterns.
load_bigram_dawg	1	Load dawg with special word bigrams.
use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
stopper_no_acceptable_choices	0	Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations
save_raw_choices	0	Deprecated - backward compatibility only
segment_nonalphabetic_script	0	Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch
save_doc_words	0	Save Document Words
merge_fragments_in_matrix	1	Merge the fragments in the ratings matrix and delete them after merging
wordrec_no_block	0	Don't output block information
wordrec_enable_assoc	1	Associator Enable
force_word_assoc	0	force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary.
fragments_guide_chopper	0	Use information from fragments to guide chopping process
chop_enable	1	Chop enable
chop_vertical_creep	0	Vertical creep
chop_new_seam_pile	1	Use new seam_pile
assume_fixed_pitch_char_segment	0	include fixed-pitch heuristics in char segmentation
wordrec_skip_no_truth_words	0	Only run OCR for words that had truth recorded in BlamerBundle
wordrec_debug_blamer	0	Print blamer debug messages
wordrec_run_blamer	0	Try to set the blame for errors
save_alt_choices	1	Save alternative paths found during chopping and segmentation search
language_model_ngram_on	0	Turn on/off the use of character ngram model
language_model_ngram_use_only_first_uft8_step	0	Use only the first UTF8 step of the given string when computing log probabilities.
language_model_ngram_space_delimited_language	1	Words are delimited by space
language_model_use_sigmoidal_certainty	0	Use sigmoidal score for certainty
tessedit_resegment_from_boxes	0	Take segmentation and labeling from box file
tessedit_resegment_from_line_boxes	0	Conversion of word/line box file to char box file
tessedit_train_from_boxes	0	Generate training data from boxed chars
tessedit_make_boxes_from_boxes	0	Generate more boxes from boxed chars
tessedit_dump_pageseg_images	0	Dump intermediate images made during page segmentation
tessedit_ambigs_training	0	Perform training for ambiguities
tessedit_adaption_debug	0	Generate and print debug information for adaption
applybox_learn_chars_and_char_frags_mode	0	Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters.
applybox_learn_ngrams_mode	0	Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally.
tessedit_display_outwords	0	Draw output words
tessedit_dump_choices	0	Dump char choices
tessedit_timing_debug	0	Print timing stats
tessedit_fix_fuzzy_spaces	1	Try to improve fuzzy spaces
tessedit_unrej_any_wd	0	Don't bother with word plausibility
tessedit_fix_hyphens	1	Crunch double hyphens?
tessedit_redo_xheight	1	Check/Correct x-height
tessedit_enable_doc_dict	1	Add words to the document dictionary
tessedit_debug_fonts	0	Output font info per char
tessedit_debug_block_rejection	0	Block and Row stats
tessedit_enable_bigram_correction	1	Enable correction based on the word bigram dictionary.
tessedit_enable_dict_correction	0	Enable single word correction based on the dictionary.
enable_noise_removal	1	Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise
debug_acceptable_wds	0	Dump word pass/fail chk
tessedit_minimal_rej_pass1	0	Do minimal rejection on pass 1 output
tessedit_test_adaption	0	Test adaption criteria
tessedit_matcher_log	0	Log matcher activity
test_pt	0	Test for point
paragraph_text_based	1	Run paragraph detection on the post-text-recognition (more accurate)
docqual_excuse_outline_errs	0	Allow outline errs in unrejection?
tessedit_good_quality_unrej	1	Reduce rejection on good docs
tessedit_use_reject_spaces	1	Reject spaces?
tessedit_preserve_blk_rej_perfect_wds	1	Only rej partially rejected words in block rejection
tessedit_preserve_row_rej_perfect_wds	1	Only rej partially rejected words in row rejection
tessedit_dont_blkrej_good_wds	0	Use word segmentation quality metric
tessedit_dont_rowrej_good_wds	0	Use word segmentation quality metric
tessedit_row_rej_good_docs	1	Apply row rejection to good docs
tessedit_reject_bad_qual_wds	1	Reject all bad quality wds
tessedit_debug_doc_rejection	0	Page stats
tessedit_debug_quality_metrics	0	Output data to debug file
bland_unrej	0	unrej potential with no checks
unlv_tilde_crunching	1	Mark v.bad words for tilde crunch
hocr_font_info	0	Add font info to hocr output
crunch_early_merge_tess_fails	1	Before word crunch?
crunch_early_convert_bad_unlv_chs	0	Take out ~^ early?
crunch_terrible_garbage	1	As it says
crunch_pot_garbage	1	POTENTIAL crunch garbage
crunch_leave_ok_strings	1	Don't touch sensible strings
crunch_accept_ok	1	Use acceptability in okstring
crunch_leave_accept_strings	0	Don't pot crunch sensible strings
crunch_include_numerals	0	Fiddle alpha figures
tessedit_prefer_joined_punct	0	Reward punctuation joins
tessedit_write_block_separators	0	Write block separators in output
tessedit_write_rep_codes	0	Write repetition char code
tessedit_write_unlv	0	Write .unlv output file
tessedit_create_txt	1	Write .txt output file
tessedit_create_hocr	0	Write .html hOCR output file
tessedit_create_pdf	0	Write .pdf output file
suspect_constrain_1Il	0	UNLV keep 1Il chars rejected
tessedit_minimal_rejection	0	Only reject tess failures
tessedit_zero_rejection	0	Don't reject ANYTHING
tessedit_word_for_word	0	Make output have exactly one word per WERD
tessedit_zero_kelvin_rejection	0	Don't reject ANYTHING AT ALL
tessedit_consistent_reps	1	Force all rep chars the same
tessedit_rejection_debug	0	Adaption debug
tessedit_flip_0O	1	Contextual 0O O0 flips
rej_trust_doc_dawg	0	Use DOC dawg in 11l conf. detector
rej_1Il_use_dict_word	0	Use dictword test
rej_1Il_trust_permuter_type	1	Don't double check
rej_use_tess_accepted	1	Individual rejection control
rej_use_tess_blanks	1	Individual rejection control
rej_use_good_perm	1	Individual rejection control
rej_use_sensible_wd	0	Extend permuter check
rej_alphas_in_number_perm	0	Extend permuter check
tessedit_create_boxfile	0	Output text with boxes
tessedit_write_images	0	Capture the image from the IPE
interactive_display_mode	0	Run interactively?
tessedit_override_permuter	1	According to dict_word
tessedit_use_primary_params_model	0	In multilingual mode use params model of the primary language
textord_tabfind_show_vlines	0	Debug line finding
textord_use_cjk_fp_model	0	Use CJK fixed pitch model
poly_allow_detailed_fx	0	Allow feature extractors to see the original outline
tessedit_init_config_only	0	Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis.
textord_equation_detect	0	Turn on equation detector
textord_tabfind_vertical_text	1	Enable vertical detection
textord_tabfind_force_vertical_text	0	Force using vertical text page mode
preserve_interword_spaces	0	Preserve multiple interword spaces
include_page_breaks	0	Include page separator string in output text after each image/page.
textord_tabfind_vertical_horizontal_mix	1	find horizontal lines such as headers in vertical page mode
load_fixed_length_dawgs	1	Load fixed length dawgs (e.g. for non-space delimited languages)
permute_debug	0	Debug char permutation process
permute_script_word	0	Turn on word script consistency permuter
segment_segcost_rating	0	incorporate segmentation cost in word rating?
permute_fixed_length_dawg	0	Turn on fixed-length phrasebook search permuter
permute_chartype_word	0	Turn on character type (property) consistency permuter
ngram_permuter_activated	0	Activate character-level n-gram-based permuter
permute_only_top	0	Run only the top choice permuter
use_new_state_cost	0	use new state cost heuristics for segmentation state evaluation
enable_new_segsearch	0	Enable new segmentation search path.
textord_single_height_mode	0	Script has no xheight, so use a single mode
tosp_old_to_method	0	Space stats use prechopping?
tosp_old_to_constrain_sp_kn	0	Constrain relative values of inter and intra-word gaps for old_to_method.
tosp_only_use_prop_rows	1	Block stats to use fixed pitch rows?
tosp_force_wordbreak_on_punct	0	Force word breaks on punct to break long lines in non-space delimited langs
tosp_use_pre_chopping	0	Space stats use prechopping?
tosp_old_to_bug_fix	0	Fix suspected bug in old code
tosp_block_use_cert_spaces	1	Only stat OBVIOUS spaces
tosp_row_use_cert_spaces	1	Only stat OBVIOUS spaces
tosp_narrow_blobs_not_cert	1	Only stat OBVIOUS spaces
tosp_row_use_cert_spaces1	1	Only stat OBVIOUS spaces
tosp_recovery_isolated_row_stats	1	Use row alone when inadequate cert spaces
tosp_only_small_gaps_for_kern	0	Better guess
tosp_all_flips_fuzzy	0	Pass ANY flip to context?
tosp_fuzzy_limit_all	1	Don't restrict kn->sp fuzzy limit to tables
tosp_stats_use_xht_gaps	1	Use within xht gap for wd breaks
tosp_use_xht_gaps	1	Use within xht gap for wd breaks
tosp_only_use_xht_gaps	0	Only use within xht gap for wd breaks
tosp_rule_9_test_punct	0	Don't chng kn to space next to punct
tosp_flip_fuzz_kn_to_sp	1	Default flip
tosp_flip_fuzz_sp_to_kn	1	Default flip
tosp_improve_thresh	0	Enable improvement heuristic
textord_no_rejects	0	Don't remove noise blobs
textord_show_blobs	0	Display unsorted blobs
textord_show_boxes	0	Display unsorted blobs
textord_noise_rejwords	1	Reject noise-like words
textord_noise_rejrows	1	Reject noise-like rows
textord_noise_debug	0	Debug row garbage detector
m_data_sub_dir	tessdata/	Directory for data files
tessedit_module_name	libtesseract304.dll	Module colocated with tessdata dir
classify_learn_debug_str		Class str to debug learning
user_words_file		A filename of user-provided words.
user_words_suffix		A suffix of user-provided words located in tessdata.
user_patterns_file		A filename of user-provided patterns.
user_patterns_suffix		A suffix of user-provided patterns located in tessdata.
output_ambig_words_file		Output file for ambiguities found in the dictionary
word_to_debug		Word for which stopper debug information should be printed to stdout
word_to_debug_lengths		Lengths of unichars in word_to_debug
tessedit_char_blacklist		Blacklist of chars not to recognize
tessedit_char_whitelist		Whitelist of chars to recognize
tessedit_char_unblacklist		List of chars to override tessedit_char_blacklist
tessedit_write_params_to_file		Write all parameters to the given file.
applybox_exposure_pattern	.exp	Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif
chs_leading_punct	('`"	Leading punctuation
chs_trailing_punct1	).,;:?!	1st Trailing punctuation
chs_trailing_punct2	)'`"	2nd Trailing punctuation
outlines_odd	%| 	Non standard number of outlines
outlines_2	ij!?%":;	Non standard number of outlines
numeric_punctuation	.,	Punct. chs expected WITHIN numbers
unrecognised_char	|	Output char for unidentified blobs
ok_repeated_ch_non_alphanum_wds	-?*=	Allow NN to unrej
conflict_set_I_l_1	Il1[]	Il1 conflict set
file_type	.tif	Filename extension
tessedit_load_sublangs		List of languages to load with this one
page_separator		Page separator (default is form feed control character)
classify_char_norm_range	0.2	Character Normalization Range ...
classify_min_norm_scale_x	0	Min char x-norm scale ...
classify_max_norm_scale_x	0.325	Max char x-norm scale ...
classify_min_norm_scale_y	0	Min char y-norm scale ...
classify_max_norm_scale_y	0.325	Max char y-norm scale ...
classify_max_rating_ratio	1.5	Veto ratio between classifier ratings
classify_max_certainty_margin	5.5	Veto difference between classifier certainties
matcher_good_threshold	0.125	Good Match (0-1)
matcher_reliable_adaptive_result	0	Great Match (0-1)
matcher_perfect_threshold	0.02	Perfect Match (0-1)
matcher_bad_match_pad	0.15	Bad Match Pad (0-1)
matcher_rating_margin	0.1	New template margin (0-1)
matcher_avg_noise_size	12	Avg. noise blob length
matcher_clustering_max_angle_delta	0.015	Maximum angle delta for prototype clustering
classify_misfit_junk_penalty	0	Penalty to apply when a non-alnum is vertically out of its expected textline position
rating_scale	1.5	Rating scaling factor
certainty_scale	20	Certainty scaling factor
tessedit_class_miss_scale	0.00390625	Scale factor for features not used
classify_adapted_pruning_factor	2.5	Prune poor adapted results this much worse than best result
classify_adapted_pruning_threshold	-1	Threshold at which classify_adapted_pruning_factor starts
classify_character_fragments_garbage_certainty_threshold	-3	Exclude fragments that do not look like whole characters from training and adaption
speckle_large_max_size	0.3	Max large speckle size
speckle_rating_penalty	10	Penalty to add to worst rating for noise
xheight_penalty_subscripts	0.125	Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK.
xheight_penalty_inconsistent	0.25	Score penalty (0.1 = 10%) added if an xheight is inconsistent.
segment_penalty_dict_frequent_word	1	Score multiplier for word matches which have good case and are frequent in the given language (lower is better).
segment_penalty_dict_case_ok	1.1	Score multiplier for word matches that have good case (lower is better).
segment_penalty_dict_case_bad	1.3125	Default score multiplier for word matches, which may have case issues (lower is better).
segment_penalty_ngram_best_choice	1.24	Multiplier for the best choice from the ngram model.
segment_penalty_dict_nonword	1.25	Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better).
segment_penalty_garbage	1.5	Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better).
certainty_scale	20	Certainty scaling factor
stopper_nondict_certainty_base	-2.5	Certainty threshold for non-dict words
stopper_phase2_certainty_rejection_offset	1	Reject certainty offset
stopper_certainty_per_char	-0.5	Certainty to add for each dict char above small word size.
stopper_allowable_character_badness	3	Max certainty variation allowed in a word (in sigma)
doc_dict_pending_threshold	0	Worst certainty for using pending dictionary
doc_dict_certainty_threshold	-2.25	Worst certainty for words that can be inserted into the document dictionary
wordrec_worst_state	1	Worst segmentation state
tessedit_certainty_threshold	-2.25	Good blob limit
chop_split_dist_knob	0.5	Split length adjustment
chop_overlap_knob	0.9	Split overlap adjustment
chop_center_knob	0.15	Split center adjustment
chop_sharpness_knob	0.06	Split sharpness adjustment
chop_width_change_knob	5	Width change adjustment
chop_ok_split	100	OK split limit
chop_good_split	50	Good split limit
segsearch_max_char_wh_ratio	2	Maximum character width-to-height ratio
language_model_ngram_small_prob	1e-006	To avoid overly small denominators use this as the floor of the probability returned by the ngram model.
language_model_ngram_nonmatch_score	-40	Average classifier score of a non-matching unichar.
language_model_ngram_scale_factor	0.03	Strength of the character ngram model relative to the character classifier
language_model_ngram_rating_factor	16	Factor to bring log-probs into the same range as ratings when multiplied by outline length
language_model_penalty_non_freq_dict_word	0.1	Penalty for words not in the frequent word dictionary
language_model_penalty_non_dict_word	0.15	Penalty for non-dictionary words
language_model_penalty_punc	0.2	Penalty for inconsistent punctuation
language_model_penalty_case	0.1	Penalty for inconsistent case
language_model_penalty_script	0.5	Penalty for inconsistent script
language_model_penalty_chartype	0.3	Penalty for inconsistent character type
language_model_penalty_font	0	Penalty for inconsistent font
language_model_penalty_spacing	0.05	Penalty for inconsistent spacing
language_model_penalty_increment	0.01	Penalty increment
noise_cert_basechar	-8	Hingepoint for base char certainty
noise_cert_disjoint	-1	Hingepoint for disjoint certainty
noise_cert_punc	-3	Threshold for new punc char certainty
noise_cert_factor	0.375	Scaling on certainty diff from Hingepoint
quality_rej_pc	0.08	good_quality_doc lte rejection limit
quality_blob_pc	0	good_quality_doc gte good blobs limit
quality_outline_pc	1	good_quality_doc lte outline error limit
quality_char_pc	0.95	good_quality_doc gte good char limit
test_pt_x	100000	xcoord
test_pt_y	100000	ycoord
tessedit_reject_doc_percent	65	%rej allowed before rej whole doc
tessedit_reject_block_percent	45	%rej allowed before rej whole block
tessedit_reject_row_percent	40	%rej allowed before rej whole row
tessedit_whole_wd_rej_row_percent	70	Number of row rejects in whole word rejects which prevents whole row rejection
tessedit_good_doc_still_rowrej_wd	1.1	rej good doc wd if more than this fraction rejected
quality_rowrej_pc	1.1	good_quality_doc gte good char limit
crunch_terrible_rating	80	crunch rating lt this
crunch_poor_garbage_cert	-9	crunch garbage cert lt this
crunch_poor_garbage_rate	60	crunch garbage rating lt this
crunch_pot_poor_rate	40	POTENTIAL crunch rating lt this
crunch_pot_poor_cert	-8	POTENTIAL crunch cert lt this
crunch_del_rating	60	POTENTIAL crunch rating lt this
crunch_del_cert	-10	POTENTIAL crunch cert lt this
crunch_del_min_ht	0.7	Del if word ht lt xht x this
crunch_del_max_ht	3	Del if word ht gt xht x this
crunch_del_min_width	3	Del if word width lt xht x this
crunch_del_high_word	1.5	Del if word gt xht x this above bl
crunch_del_low_word	0.5	Del if word gt xht x this below bl
crunch_small_outlines_size	0.6	Small if lt xht x this
fixsp_small_outlines_size	0.28	Small if lt xht x this
superscript_worse_certainty	2	How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline?
superscript_bettered_certainty	0.97	What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40%
superscript_scaledown_ratio	0.4	A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size.
subscript_max_y_top	0.5	Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript.
superscript_min_y_bottom	0.3	Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript.
suspect_rating_per_ch	999.9	Don't touch bad rating limit
suspect_accept_rating	-999.9	Accept good rating limit
tessedit_lower_flip_hyphen	1.5	Aspect ratio dot/hyphen test
tessedit_upper_flip_hyphen	1.8	Aspect ratio dot/hyphen test
rej_whole_of_mostly_reject_word_fract	0.85	if >this fract
min_orientation_margin	7	Min acceptable orientation margin
textord_tabfind_vertical_text_ratio	0.5	Fraction of textlines deemed vertical to use vertical page mode
textord_tabfind_aligned_gap_fraction	0.75	Fraction of height used as a minimum gap for aligned blobs.
bestrate_pruning_factor	2	Multiplying factor of current best rate to prune other hypotheses
segment_reward_script	0.95	Score multiplier for script consistency within a word. Being a 'reward' factor, it should be ≤ 1. Smaller value implies bigger reward.
segment_reward_chartype	0.97	Score multiplier for char type consistency within a word.
segment_reward_ngram_best_choice	0.99	Score multiplier for ngram permuter's best choice (only used in the Han script path).
heuristic_segcost_rating_base	1.25	base factor for adding segmentation cost into word rating. It's a multiplying factor, the larger the value above 1, the bigger the effect of segmentation cost.
heuristic_weight_rating	1	weight associated with char rating in combined cost of state
heuristic_weight_width	1000	weight associated with width evidence in combined cost of state
heuristic_weight_seamcut	0	weight associated with seam cut in combined cost of state
heuristic_max_char_wh_ratio	2	max char width-to-height ratio allowed in segmentation
segsearch_max_fixed_pitch_char_wh_ratio	2	Maximum character width-to-height ratio for fixed-pitch fonts
tosp_old_sp_kn_th_factor	2	Factor for defining space threshold in terms of space and kern sizes
tosp_threshold_bias1	0	how far between kern and space?
tosp_threshold_bias2	0	how far between kern and space?
tosp_narrow_fraction	0.3	Fract of xheight for narrow
tosp_narrow_aspect_ratio	0.48	narrow if w/h less than this
tosp_wide_fraction	0.52	Fract of xheight for wide
tosp_wide_aspect_ratio	0	wide if w/h less than this
tosp_fuzzy_space_factor	0.6	Fract of xheight for fuzz sp
tosp_fuzzy_space_factor1	0.5	Fract of xheight for fuzz sp
tosp_fuzzy_space_factor2	0.72	Fract of xheight for fuzz sp
tosp_gap_factor	0.83	gap ratio to flip sp->kern
tosp_kern_gap_factor1	2	gap ratio to flip kern->sp
tosp_kern_gap_factor2	1.3	gap ratio to flip kern->sp
tosp_kern_gap_factor3	2.5	gap ratio to flip kern->sp
tosp_ignore_big_gaps	-1	xht multiplier
tosp_ignore_very_big_gaps	3.5	xht multiplier
tosp_rep_space	1.6	rep gap multiplier for space
tosp_enough_small_gaps	0.65	Fract of kerns reqd for isolated row stats
tosp_table_kn_sp_ratio	2.25	Min difference of kn and sp in table
tosp_table_xht_sp_ratio	0.33	Expect spaces bigger than this
tosp_table_fuzzy_kn_sp_ratio	3	Fuzzy if less than this
tosp_fuzzy_kn_fraction	0.5	New fuzzy kn alg
tosp_fuzzy_sp_fraction	0.5	New fuzzy sp alg
tosp_min_sane_kn_sp	1.5	Don't trust spaces less than this time kn
tosp_init_guess_kn_mult	2.2	Thresh guess - mult kn by this
tosp_init_guess_xht_mult	0.28	Thresh guess - mult xht by this
tosp_max_sane_kn_thresh	5	Multiplier on kn to limit thresh
tosp_flip_caution	0	Don't autoflip kn to sp when large separation
tosp_large_kerning	0.19	Limit use of xht gap with large kns
tosp_dont_fool_with_small_kerns	-1	Limit use of xht gap with odd small kns
tosp_near_lh_edge	0	Don't reduce box if the top left is non blank
tosp_silly_kn_sp_gap	0.2	Don't let sp minus kn get too small
tosp_pass_wide_fuzz_sp_to_context	0.75	How wide fuzzies need context
textord_blob_size_bigile	95	Percentile for large blobs
textord_noise_area_ratio	0.7	Fraction of bounding box for noise
textord_blob_size_smallile	20	Percentile for small blobs
textord_initialx_ile	0.75	Ile of sizes for xheight guess
textord_initialasc_ile	0.9	Ile of sizes for xheight guess
textord_noise_sizelimit	0.5	Fraction of x for big t count
textord_noise_normratio	2	Dot to norm ratio for deletion
textord_noise_syfract	0.2	xh fract height error for norm blobs
textord_noise_sxfract	0.4	xh fract width error for norm blobs
textord_noise_hfract	0.015625	Height fraction to discard outlines as speckle noise
textord_noise_rowratio	6	Dot to norm ratio for deletion
textord_blshift_maxshift	0	Max baseline shift
textord_blshift_xfraction	9.99	Min size of baseline shift
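
Any of the parameters listed above can be placed in a config file, one parameter per line, with the parameter name followed by whitespace and then the value. The fragment below is only a sketch: the file name digits_only.cfg is hypothetical, and the chosen values are illustrative assumptions for recognizing digit-only text, not recommended settings. It restricts recognition to digits, keeps multiple spaces between words, and skips loading the two word dictionaries.

```
tessedit_char_whitelist 0123456789
preserve_interword_spaces 1
load_system_dawg 0
load_freq_dawg 0
```

Saved as tessdata/configs/digits_only.cfg (without BOM and with Unix line endings, as described in the Config file section), the file would be passed to Init through the configs array in the same way as config.cfg in the InitWithConfig example. The load_*_dawg parameters take effect during initialization, so a config file given to Init is a natural place for them.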

See Also

Reference