Tesseract.Net parameters
„Tesseract is extremely flexible, if you know how to control it. There is a large number of control parameters to modify its behaviour. While these change from time to time, most of them are fairly stable.“ (Tesseract ControlParams wiki)
There are two ways to set parameters: through a config file and through the API.
A config file is a plain text file without a BOM and with Unix line endings (LF). On Windows you can use an advanced text editor such as Notepad++ to save it this way.
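As an illustration, a config file might look like the sketch below. It assumes the usual Tesseract config layout of one parameter per line, with the name and the value separated by whitespace. The parameter names are taken from the table on this page; the values are only examples, not recommendations.

```
load_system_dawg 0
load_freq_dawg 0
preserve_interword_spaces 1
```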
The config file should be located in your tessdata/configs directory.
Your tessdata directory should look like this:
    tessdata\
        configs\
            config.cfg
        pdf.ttf
        pdf.ttx
        ...
        language files
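One way to produce a config file that meets the requirements above (no BOM, Unix line endings) from code is sketched below. It uses only standard .NET calls: File.WriteAllText with a UTF8Encoding created without a byte order mark, and lines joined with "\n". The ConfigWriter class, its Write method, the tessdata path and the chosen parameter values are illustrative assumptions and are not part of the Tesseract.Net API.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Tesseract.Net.
public static class ConfigWriter
{
    // Writes "name value" lines to <tessdataDir>/configs/<fileName>
    // as UTF-8 without a BOM and with LF line endings.
    public static string Write(string tessdataDir, string fileName, params string[] lines)
    {
        string configsDir = Path.Combine(tessdataDir, "configs");
        Directory.CreateDirectory(configsDir); // does nothing if the directory already exists

        string path = Path.Combine(configsDir, fileName);
        string content = string.Join("\n", lines) + "\n";

        // UTF8Encoding(false) suppresses the byte order mark.
        File.WriteAllText(path, content, new UTF8Encoding(false));
        return path;
    }

    public static void Main()
    {
        // Assumed path and example values; adjust them to your setup.
        string path = Write(@"path_to_tessdata_folder", "config.cfg",
            "load_system_dawg 0",
            "preserve_interword_spaces 1");
        Console.WriteLine(path);
    }
}
```

Building the content with an explicit "\n" rather than Environment.NewLine keeps the line endings the same on Windows and on Unix-like systems.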
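To initialize the engine with this config file, pass its name to Init:

    public void InitWithConfig()
    {
        using (var api = OcrApi.Create())
        {
            // Config file names to load during initialization
            string[] configs = { "config.cfg" };
            api.Init(@"path_to_tessdata_folder", "eng", OcrEngineMode.OEM_DEFAULT, configs);
        }
    }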
You can set a single parameter with the API function SetVariable, for example:

    public void Init()
    {
        using (var api = OcrApi.Create())
        {
            api.Init(@"path_to_tessdata_folder", "eng");
            // Set one parameter after initialization
            api.SetVariable("editor_image_xpos", "590");
        }
    }
If you need to set parameters during Tesseract initialization, create two arrays: one with the parameter names and one with their values. Here is an example:

    public void InitWithVariables()
    {
        using (var api = OcrApi.Create())
        {
            // variables[i] is set to values[i]
            string[] variables = { "editor_image_xpos", "editor_dbwin_width" };
            string[] values = { "590", "80" };
            api.Init(@"path_to_tessdata_folder", "eng", OcrEngineMode.OEM_DEFAULT, null, variables, values);
        }
    }
You can also use the API function ReadConfigFiles(String) (or ReadDebugConfigFiles(String)) to read Tesseract config files that contain non-init parameters.
Variable | Default value | Type | Description |
allow_blob_division | 1 | Boolean | Use divisible blobs chopping |
applybox_learn_chars_and_char_frags_mode | 0 | Boolean | Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters. |
applybox_learn_ngrams_mode | 0 | Boolean | Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally. |
assume_fixed_pitch_char_segment | 0 | Boolean | include fixed-pitch heuristics in char segmentation |
bland_unrej | 0 | Boolean | unrej potential with no checks |
chop_enable | 1 | Boolean | Chop enable |
chop_new_seam_pile | 1 | Boolean | Use new seam_pile |
chop_vertical_creep | 0 | Boolean | Vertical creep |
classify_bln_numeric_mode | 0 | Boolean | Assume the input is numbers [0-9]. |
classify_debug_character_fragments | 0 | Boolean | Bring up graphical debugging windows for fragments training |
classify_enable_adaptive_debugger | 0 | Boolean | Enable match debugger |
classify_enable_adaptive_matcher | 1 | Boolean | Enable adaptive classifier |
classify_enable_learning | 1 | Boolean | Enable adaptive classifier |
classify_nonlinear_norm | 0 | Boolean | Non-linear stroke-density normalization |
classify_save_adapted_templates | 0 | Boolean | Save adapted templates to a file |
classify_use_pre_adapted_templates | 0 | Boolean | Use pre-adapted classifier templates |
crunch_accept_ok | 1 | Boolean | Use acceptability in okstring |
crunch_early_convert_bad_unlv_chs | 0 | Boolean | Take out ~^ early? |
crunch_early_merge_tess_fails | 1 | Boolean | Before word crunch? |
crunch_include_numerals | 0 | Boolean | Fiddle alpha figures |
crunch_leave_accept_strings | 0 | Boolean | Don't pot crunch sensible strings |
crunch_leave_ok_strings | 1 | Boolean | Don't touch sensible strings |
crunch_terrible_garbage | 1 | Boolean | As it says |
devanagari_split_debugimage | 0 | Boolean | Whether to create a debug image for split shiro-rekha process. |
disable_character_fragments | 1 | Boolean | Do not include character fragments in the results of the classifier |
edges_children_fix | 0 | Boolean | Remove boxy parents of char-like children |
edges_debug | 0 | Boolean | turn on debugging for this module |
edges_use_new_outline_complexity | 0 | Boolean | Use the new outline complexity module |
enable_noise_removal | 1 | Boolean | Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise |
equationdetect_save_bi_image | 0 | Boolean | Save input bi image |
equationdetect_save_merged_image | 0 | Boolean | Save the merged image |
equationdetect_save_seed_image | 0 | Boolean | Save the seed image |
equationdetect_save_spt_image | 0 | Boolean | Save special character image |
force_word_assoc | 0 | Boolean | force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary. |
gapmap_debug | 0 | Boolean | Say which blocks have tables |
gapmap_no_isolated_quanta | 0 | Boolean | Ensure gaps not less than 2quanta wide |
gapmap_use_ends | 0 | Boolean | Use large space at start and end of rows |
hocr_char_boxes | 0 | Boolean | Add coordinates for each character to hocr output |
hocr_font_info | 0 | Boolean | Add font info to hocr output |
interactive_display_mode | 0 | Boolean | Run interactively? |
language_model_ngram_on | 0 | Boolean | Turn on/off the use of character ngram model |
language_model_ngram_space_delimited_language | 1 | Boolean | Words are delimited by space |
language_model_ngram_use_only_first_uft8_step | 0 | Boolean | Use only the first UTF8 step of the given string when computing log probabilities. |
language_model_use_sigmoidal_certainty | 0 | Boolean | Use sigmoidal score for certainty |
load_bigram_dawg | 1 | Boolean | Load dawg with special word bigrams. |
load_freq_dawg | 1 | Boolean | Load frequent word dawg. |
load_number_dawg | 1 | Boolean | Load dawg with number patterns. |
load_punc_dawg | 1 | Boolean | Load dawg with punctuation patterns. |
load_system_dawg | 1 | Boolean | Load system word dawg. |
load_unambig_dawg | 1 | Boolean | Load unambiguous word dawg. |
lstm_use_matrix | 1 | Boolean | Use ratings matrix/beam search with lstm |
matcher_debug_separate_windows | 0 | Boolean | Use two different windows for debugging the matching: One for the protos and one for the features. |
merge_fragments_in_matrix | 1 | Boolean | Merge the fragments in the ratings matrix and delete them after merging |
oldbl_corrfix | 1 | Boolean | Improve correlation of heights |
oldbl_xhfix | 0 | Boolean | Fix bug in modes threshold for xheights |
pageseg_apply_music_mask | 1 | Boolean | Detect music staff and remove intersecting components |
paragraph_text_based | 1 | Boolean | Run paragraph detection on the post-text-recognition (more accurate) |
poly_allow_detailed_fx | 0 | Boolean | Allow feature extractors to see the original outline |
poly_debug | 0 | Boolean | Debug old poly |
poly_wide_objects_better | 1 | Boolean | More accurate approx on wide things |
preserve_interword_spaces | 0 | Boolean | Preserve multiple interword spaces |
prioritize_division | 0 | Boolean | Prioritize blob division over chopping |
rej_1Il_trust_permuter_type | 1 | Boolean | Don't double check |
rej_1Il_use_dict_word | 0 | Boolean | Use dictword test |
rej_alphas_in_number_perm | 0 | Boolean | Extend permuter check |
rej_trust_doc_dawg | 0 | Boolean | Use DOC dawg in 11l conf. detector |
rej_use_good_perm | 1 | Boolean | Individual rejection control |
rej_use_sensible_wd | 0 | Boolean | Extend permuter check |
rej_use_tess_accepted | 1 | Boolean | Individual rejection control |
rej_use_tess_blanks | 1 | Boolean | Individual rejection control |
save_alt_choices | 1 | Boolean | Save alternative paths found during chopping and segmentation search |
save_doc_words | 0 | Boolean | Save Document Words |
segment_nonalphabetic_script | 0 | Boolean | Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch |
stopper_no_acceptable_choices | 0 | Boolean | Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations |
stream_filelist | 0 | Boolean | Stream a filelist from stdin |
suspect_constrain_1Il | 0 | Boolean | UNLV keep 1Il chars rejected |
tess_bn_matching | 0 | Boolean | Baseline Normalized Matching |
tess_cn_matching | 0 | Boolean | Character Normalized Matching |
tessedit_adaption_debug | 0 | Boolean | Generate and print debug information for adaption |
tessedit_ambigs_training | 0 | Boolean | Perform training for ambiguities |
tessedit_create_alto | 0 | Boolean | Write .xml ALTO file |
tessedit_create_boxfile | 0 | Boolean | Output text with boxes |
tessedit_create_hocr | 0 | Boolean | Write .html hOCR output file |
tessedit_create_lstmbox | 0 | Boolean | Write .box file for LSTM training |
tessedit_create_pdf | 0 | Boolean | Write .pdf output file |
tessedit_create_tsv | 0 | Boolean | Write .tsv output file |
tessedit_create_txt | 0 | Boolean | Write .txt output file |
tessedit_create_wordstrbox | 0 | Boolean | Write WordStr format .box output file |
tessedit_debug_block_rejection | 0 | Boolean | Block and Row stats |
tessedit_debug_doc_rejection | 0 | Boolean | Page stats |
tessedit_debug_fonts | 0 | Boolean | Output font info per char |
tessedit_debug_quality_metrics | 0 | Boolean | Output data to debug file |
tessedit_display_outwords | 0 | Boolean | Draw output words |
tessedit_do_invert | 1 | Boolean | Try inverting the image in `LSTMRecognizeWord` |
tessedit_dont_blkrej_good_wds | 0 | Boolean | Use word segmentation quality metric |
tessedit_dont_rowrej_good_wds | 0 | Boolean | Use word segmentation quality metric |
tessedit_dump_choices | 0 | Boolean | Dump char choices |
tessedit_dump_pageseg_images | 0 | Boolean | Dump intermediate images made during page segmentation |
tessedit_enable_bigram_correction | 1 | Boolean | Enable correction based on the word bigram dictionary. |
tessedit_enable_dict_correction | 0 | Boolean | Enable single word correction based on the dictionary. |
tessedit_enable_doc_dict | 1 | Boolean | Add words to the document dictionary |
tessedit_fix_fuzzy_spaces | 1 | Boolean | Try to improve fuzzy spaces |
tessedit_fix_hyphens | 1 | Boolean | Crunch double hyphens? |
tessedit_flip_0O | 1 | Boolean | Contextual 0O O0 flips |
tessedit_good_quality_unrej | 1 | Boolean | Reduce rejection on good docs |
tessedit_init_config_only | 0 | Boolean | Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis. |
tessedit_make_boxes_from_boxes | 0 | Boolean | Generate more boxes from boxed chars |
tessedit_minimal_rej_pass1 | 0 | Boolean | Do minimal rejection on pass 1 output |
tessedit_minimal_rejection | 0 | Boolean | Only reject tess failures |
tessedit_override_permuter | 1 | Boolean | According to dict_word |
tessedit_prefer_joined_punct | 0 | Boolean | Reward punctuation joins |
tessedit_preserve_blk_rej_perfect_wds | 1 | Boolean | Only rej partially rejected words in block rejection |
tessedit_preserve_row_rej_perfect_wds | 1 | Boolean | Only rej partially rejected words in row rejection |
tessedit_reject_bad_qual_wds | 1 | Boolean | Reject all bad quality wds |
tessedit_rejection_debug | 0 | Boolean | Adaption debug |
tessedit_resegment_from_boxes | 0 | Boolean | Take segmentation and labeling from box file |
tessedit_resegment_from_line_boxes | 0 | Boolean | Conversion of word/line box file to char box file |
tessedit_row_rej_good_docs | 1 | Boolean | Apply row rejection to good docs |
tessedit_test_adaption | 0 | Boolean | Test adaption criteria |
tessedit_timing_debug | 0 | Boolean | Print timing stats |
tessedit_train_from_boxes | 0 | Boolean | Generate training data from boxed chars |
tessedit_train_line_recognizer | 0 | Boolean | Break input into lines and remap boxes if present |
tessedit_unrej_any_wd | 0 | Boolean | Don't bother with word plausibility |
tessedit_use_primary_params_model | 0 | Boolean | In multilingual mode use params model of the primary language |
tessedit_use_reject_spaces | 1 | Boolean | Reject spaces? |
tessedit_word_for_word | 0 | Boolean | Make output have exactly one word per WERD |
tessedit_write_block_separators | 0 | Boolean | Write block separators in output |
tessedit_write_images | 0 | Boolean | Capture the image from the IPE |
tessedit_write_rep_codes | 0 | Boolean | Write repetition char code |
tessedit_write_unlv | 0 | Boolean | Write .unlv output file |
tessedit_zero_kelvin_rejection | 0 | Boolean | Don't reject ANYTHING AT ALL |
tessedit_zero_rejection | 0 | Boolean | Don't reject ANYTHING |
test_pt | 0 | Boolean | Test for point |
textonly_pdf | 0 | Boolean | Create PDF with only one invisible text layer |
textord_all_prop | 0 | Boolean | All doc is proportional text |
textord_biased_skewcalc | 1 | Boolean | Bias skew estimates with line length |
textord_blockndoc_fixed | 0 | Boolean | Attempt whole doc/block fixed pitch |
textord_blocksall_fixed | 0 | Boolean | Moan about prop blocks |
textord_blocksall_prop | 0 | Boolean | Moan about fixed pitch blocks |
textord_blocksall_testing | 0 | Boolean | Dump stats when moaning |
textord_chopper_test | 0 | Boolean | Chopper is being tested. |
textord_debug_baselines | 0 | Boolean | Debug baseline generation |
textord_debug_blob | 0 | Boolean | Print test blob information |
textord_debug_pitch_metric | 0 | Boolean | Write full metric stuff |
textord_debug_pitch_test | 0 | Boolean | Debug on fixed pitch test |
textord_debug_printable | 0 | Boolean | Make debug windows printable |
textord_debug_xheights | 0 | Boolean | Test xheight algorithms |
textord_disable_pitch_test | 0 | Boolean | Turn off dp fixed pitch algorithm |
textord_equation_detect | 0 | Boolean | Turn on equation detector |
textord_fast_pitch_test | 0 | Boolean | Do even faster pitch algorithm |
textord_fix_makerow_bug | 1 | Boolean | Prevent multiple baselines |
textord_fix_xheight_bug | 1 | Boolean | Use spline baseline |
textord_force_make_prop_words | 0 | Boolean | Force proportional word segmentation on all rows |
textord_fp_chopping | 1 | Boolean | Do fixed pitch chopping |
textord_heavy_nr | 0 | Boolean | Vigorously remove noise |
textord_interpolating_skew | 1 | Boolean | Interpolate across gaps |
textord_new_initial_xheight | 1 | Boolean | Use test xheight mechanism |
textord_no_rejects | 0 | Boolean | Don't remove noise blobs |
textord_noise_debug | 0 | Boolean | Debug row garbage detector |
textord_noise_rejrows | 1 | Boolean | Reject noise-like rows |
textord_noise_rejwords | 1 | Boolean | Reject noise-like words |
textord_ocropus_mode | 0 | Boolean | Make baselines for ocropus |
textord_old_baselines | 1 | Boolean | Use old baseline algorithm |
textord_old_xheight | 0 | Boolean | Use old xheight algorithm |
textord_oldbl_debug | 0 | Boolean | Debug old baseline generation |
textord_oldbl_merge_parts | 1 | Boolean | Merge suspect partitions |
textord_oldbl_paradef | 1 | Boolean | Use para default mechanism |
textord_oldbl_split_splines | 1 | Boolean | Split stepped splines |
textord_parallel_baselines | 1 | Boolean | Force parallel baselines |
textord_pitch_cheat | 0 | Boolean | Use correct answer for fixed/prop |
textord_pitch_scalebigwords | 0 | Boolean | Scale scores on big words |
textord_really_old_xheight | 0 | Boolean | Use original wiseowl xheight |
textord_restore_underlines | 1 | Boolean | Chop underlines and put back |
textord_show_blobs | 0 | Boolean | Display unsorted blobs |
textord_show_boxes | 0 | Boolean | Display unsorted blobs |
textord_show_expanded_rows | 0 | Boolean | Display rows after expanding |
textord_show_final_blobs | 0 | Boolean | Display blob bounds after pre-ass |
textord_show_final_rows | 0 | Boolean | Display rows after final fitting |
textord_show_fixed_cuts | 0 | Boolean | Draw fixed pitch cell boundaries |
textord_show_fixed_words | 0 | Boolean | Display forced fixed pitch words |
textord_show_initial_rows | 0 | Boolean | Display row accumulation |
textord_show_initial_words | 0 | Boolean | Display separate words |
textord_show_new_words | 0 | Boolean | Display separate words |
textord_show_page_cuts | 0 | Boolean | Draw page-level cuts |
textord_show_parallel_rows | 0 | Boolean | Display page correlated rows |
textord_show_row_cuts | 0 | Boolean | Draw row-level cuts |
textord_show_tables | 0 | Boolean | Show table regions |
textord_single_height_mode | 0 | Boolean | Script has no xheight, so use a single mode |
textord_space_size_is_variable | 0 | Boolean | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
textord_straight_baselines | 0 | Boolean | Force straight baselines |
textord_tabfind_find_tables | 1 | Boolean | run table detection |
textord_tabfind_force_vertical_text | 0 | Boolean | Force using vertical text page mode |
textord_tabfind_only_strokewidths | 0 | Boolean | Only run stroke widths |
textord_tabfind_show_blocks | 0 | Boolean | Show final block bounds |
textord_tabfind_show_columns | 0 | Boolean | Show column bounds |
textord_tabfind_show_finaltabs | 0 | Boolean | Show tab vectors |
textord_tabfind_show_initial_partitions | 0 | Boolean | Show partition bounds |
textord_tabfind_show_initialtabs | 0 | Boolean | Show tab candidates |
textord_tabfind_show_reject_blobs | 0 | Boolean | Show blobs rejected as noise |
textord_tabfind_show_vlines | 0 | Boolean | Debug line finding |
textord_tabfind_vertical_text | 1 | Boolean | Enable vertical detection |
textord_tablefind_recognize_tables | 0 | Boolean | Enables the table recognizer for table layout and filtering. |
textord_tablefind_show_mark | 0 | Boolean | Debug table marking steps in detail |
textord_tablefind_show_stats | 0 | Boolean | Show page stats used in table finding |
textord_test_landscape | 0 | Boolean | Tests refer to land/port |
textord_test_mode | 0 | Boolean | Do current test |
textord_use_cjk_fp_model | 0 | Boolean | Use CJK fixed pitch model |
tosp_all_flips_fuzzy | 0 | Boolean | Pass ANY flip to context? |
tosp_block_use_cert_spaces | 1 | Boolean | Only stat OBVIOUS spaces |
tosp_flip_fuzz_kn_to_sp | 1 | Boolean | Default flip |
tosp_flip_fuzz_sp_to_kn | 1 | Boolean | Default flip |
tosp_force_wordbreak_on_punct | 0 | Boolean | Force word breaks on punct to break long lines in non-space delimited langs |
tosp_fuzzy_limit_all | 1 | Boolean | Don't restrict kn->sp fuzzy limit to tables |
tosp_improve_thresh | 0 | Boolean | Enable improvement heuristic |
tosp_narrow_blobs_not_cert | 1 | Boolean | Only stat OBVIOUS spaces |
tosp_old_to_bug_fix | 0 | Boolean | Fix suspected bug in old code |
tosp_old_to_constrain_sp_kn | 0 | Boolean | Constrain relative values of inter and intra-word gaps for old_to_method. |
tosp_old_to_method | 0 | Boolean | Space stats use prechopping? |
tosp_only_small_gaps_for_kern | 0 | Boolean | Better guess |
tosp_only_use_prop_rows | 1 | Boolean | Block stats to use fixed pitch rows? |
tosp_only_use_xht_gaps | 0 | Boolean | Only use within xht gap for wd breaks |
tosp_recovery_isolated_row_stats | 1 | Boolean | Use row alone when inadequate cert spaces |
tosp_row_use_cert_spaces | 1 | Boolean | Only stat OBVIOUS spaces |
tosp_row_use_cert_spaces1 | 1 | Boolean | Only stat OBVIOUS spaces |
tosp_rule_9_test_punct | 0 | Boolean | Don't chng kn to space next to punct |
tosp_stats_use_xht_gaps | 1 | Boolean | Use within xht gap for wd breaks |
tosp_use_pre_chopping | 0 | Boolean | Space stats use prechopping? |
tosp_use_xht_gaps | 1 | Boolean | Use within xht gap for wd breaks |
unlv_tilde_crunching | 0 | Boolean | Mark v.bad words for tilde crunch |
use_ambigs_for_adaption | 0 | Boolean | Use ambigs for deciding whether to adapt to a character |
use_only_first_uft8_step | 0 | Boolean | Use only the first UTF8 step of the given string when computing log probabilities. |
wordrec_blob_pause | 0 | Boolean | Blob pause |
wordrec_debug_blamer | 0 | Boolean | Print blamer debug messages |
wordrec_display_all_blobs | 0 | Boolean | Display Blobs |
wordrec_display_splits | 0 | Boolean | Display splits |
wordrec_enable_assoc | 1 | Boolean | Associator Enable |
wordrec_run_blamer | 0 | Boolean | Try to set the blame for errors |
wordrec_skip_no_truth_words | 0 | Boolean | Only run OCR for words that had truth recorded in BlamerBundle |
certainty_scale | 20 | Double | Certainty scaling factor |
chop_center_knob | 0.15 | Double | Split center adjustment |
chop_good_split | 50 | Double | Good split limit |
chop_ok_split | 100 | Double | OK split limit |
chop_overlap_knob | 0.9 | Double | Split overlap adjustment |
chop_sharpness_knob | 0.06 | Double | Split sharpness adjustment |
chop_split_dist_knob | 0.5 | Double | Split length adjustment |
chop_width_change_knob | 5 | Double | Width change adjustment |
classify_adapted_pruning_factor | 2.5 | Double | Prune poor adapted results this much worse than best result |
classify_adapted_pruning_threshold | -1 | Double | Threshold at which classify_adapted_pruning_factor starts |
classify_char_norm_range | 0.2 | Double | Character Normalization Range ... |
classify_character_fragments_garbage_certainty_threshold | -3 | Double | Exclude fragments that do not look like whole characters from training and adaption |
classify_cp_angle_pad_loose | 45 | Double | Class Pruner Angle Pad Loose |
classify_cp_angle_pad_medium | 20 | Double | Class Pruner Angle Pad Medium |
classify_cp_angle_pad_tight | 10 | Double | Class Pruner Angle Pad Tight |
classify_cp_end_pad_loose | 0.5 | Double | Class Pruner End Pad Loose |
classify_cp_end_pad_medium | 0.5 | Double | Class Pruner End Pad Medium |
classify_cp_end_pad_tight | 0.5 | Double | Class Pruner End Pad Tight |
classify_cp_side_pad_loose | 2.5 | Double | Class Pruner Side Pad Loose |
classify_cp_side_pad_medium | 1.2 | Double | Class Pruner Side Pad Medium |
classify_cp_side_pad_tight | 0.6 | Double | Class Pruner Side Pad Tight |
classify_max_certainty_margin | 5.5 | Double | Veto difference between classifier certainties |
classify_max_rating_ratio | 1.5 | Double | Veto ratio between classifier ratings |
classify_max_slope | 2.41421 | Double | Slope above which lines are called vertical |
classify_min_slope | 0.414214 | Double | Slope below which lines are called horizontal |
classify_misfit_junk_penalty | 0 | Double | Penalty to apply when a non-alnum is vertically out of its expected textline position |
classify_norm_adj_curl | 2 | Double | Norm adjust curl ... |
classify_norm_adj_midpoint | 32 | Double | Norm adjust midpoint ... |
classify_pico_feature_length | 0.05 | Double | Pico Feature Length |
classify_pp_angle_pad | 45 | Double | Proto Pruner Angle Pad |
classify_pp_end_pad | 0.5 | Double | Proto Prune End Pad |
classify_pp_side_pad | 2.5 | Double | Proto Pruner Side Pad |
crunch_del_cert | -10 | Double | POTENTIAL crunch cert lt this |
crunch_del_high_word | 1.5 | Double | Del if word gt xht x this above bl |
crunch_del_low_word | 0.5 | Double | Del if word gt xht x this below bl |
crunch_del_max_ht | 3 | Double | Del if word ht gt xht x this |
crunch_del_min_ht | 0.7 | Double | Del if word ht lt xht x this |
crunch_del_min_width | 3 | Double | Del if word width lt xht x this |
crunch_del_rating | 60 | Double | POTENTIAL crunch rating lt this |
crunch_poor_garbage_cert | -9 | Double | crunch garbage cert lt this |
crunch_poor_garbage_rate | 60 | Double | crunch garbage rating lt this |
crunch_pot_poor_cert | -8 | Double | POTENTIAL crunch cert lt this |
crunch_pot_poor_rate | 40 | Double | POTENTIAL crunch rating lt this |
crunch_small_outlines_size | 0.6 | Double | Small if lt xht x this |
crunch_terrible_rating | 80 | Double | crunch rating lt this |
doc_dict_certainty_threshold | -2.25 | Double | Worst certainty for words that can be inserted into the document dictionary |
doc_dict_pending_threshold | 0 | Double | Worst certainty for using pending dictionary |
edges_boxarea | 0.875 | Double | Min area fraction of grandchild for box |
edges_childarea | 0.5 | Double | Min area fraction of child outline |
fixsp_small_outlines_size | 0.28 | Double | Small if lt xht x this |
gapmap_big_gaps | 1.75 | Double | xht multiplier |
language_model_ngram_nonmatch_score | -40 | Double | Average classifier score of a non-matching unichar. |
language_model_ngram_rating_factor | 16 | Double | Factor to bring log-probs into the same range as ratings when multiplied by outline length |
language_model_ngram_scale_factor | 0.03 | Double | Strength of the character ngram model relative to the character classifier |
language_model_ngram_small_prob | 0.000001 | Double | To avoid overly small denominators use this as the floor of the probability returned by the ngram model. |
language_model_penalty_case | 0.1 | Double | Penalty for inconsistent case |
language_model_penalty_chartype | 0.3 | Double | Penalty for inconsistent character type |
language_model_penalty_font | 0 | Double | Penalty for inconsistent font |
language_model_penalty_increment | 0.01 | Double | Penalty increment |
language_model_penalty_non_dict_word | 0.15 | Double | Penalty for non-dictionary words |
language_model_penalty_non_freq_dict_word | 0.1 | Double | Penalty for words not in the frequent word dictionary |
language_model_penalty_punc | 0.2 | Double | Penalty for inconsistent punctuation |
language_model_penalty_script | 0.5 | Double | Penalty for inconsistent script |
language_model_penalty_spacing | 0.05 | Double | Penalty for inconsistent spacing |
matcher_avg_noise_size | 12 | Double | Avg. noise blob length |
matcher_bad_match_pad | 0.15 | Double | Bad Match Pad (0-1) |
matcher_clustering_max_angle_delta | 0.015 | Double | Maximum angle delta for prototype clustering |
matcher_good_threshold | 0.125 | Double | Good Match (0-1) |
matcher_perfect_threshold | 0.02 | Double | Perfect Match (0-1) |
matcher_rating_margin | 0.1 | Double | New template margin (0-1) |
matcher_reliable_adaptive_result | 0 | Double | Great Match (0-1) |
min_orientation_margin | 7 | Double | Min acceptable orientation margin |
noise_cert_basechar | -8 | Double | Hingepoint for base char certainty |
noise_cert_disjoint | -1 | Double | Hingepoint for disjoint certainty |
noise_cert_factor | 0.375 | Double | Scaling on certainty diff from Hingepoint |
noise_cert_punc | -3 | Double | Threshold for new punc char certainty |
oldbl_dot_error_size | 1.26 | Double | Max aspect ratio of a dot |
oldbl_xhfract | 0.4 | Double | Fraction of est allowed in calc |
pitsync_joined_edge | 0.75 | Double | Dist inside big blob for chopping |
pitsync_offset_freecut_fraction | 0.25 | Double | Fraction of cut for free cuts |
quality_blob_pc | 0 | Double | good_quality_doc gte good blobs limit |
quality_char_pc | 0.95 | Double | good_quality_doc gte good char limit |
quality_outline_pc | 1 | Double | good_quality_doc lte outline error limit |
quality_rej_pc | 0.08 | Double | good_quality_doc lte rejection limit |
quality_rowrej_pc | 1.1 | Double | good_quality_doc gte good char limit |
rating_scale | 1.5 | Double | Rating scaling factor |
rej_whole_of_mostly_reject_word_fract | 0.85 | Double | if >this fract |
segment_penalty_dict_case_bad | 1.3125 | Double | Default score multiplier for word matches, which may have case issues (lower is better). |
segment_penalty_dict_case_ok | 1.1 | Double | Score multiplier for word matches that have good case (lower is better). |
segment_penalty_dict_frequent_word | 1 | Double | Score multiplier for word matches which have good case and are frequent in the given language (lower is better). |
segment_penalty_dict_nonword | 1.25 | Double | Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better). |
segment_penalty_garbage | 1.5 | Double | Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better). |
segsearch_max_char_wh_ratio | 2 | Double | Maximum character width-to-height ratio |
speckle_large_max_size | 0.3 | Double | Max large speckle size |
speckle_rating_penalty | 10 | Double | Penalty to add to worst rating for noise |
stopper_allowable_character_badness | 3 | Double | Max certainty variation allowed in a word (in sigma) |
stopper_certainty_per_char | -0.5 | Double | Certainty to add for each dict char above small word size. |
stopper_nondict_certainty_base | -2.5 | Double | Certainty threshold for non-dict words |
stopper_phase2_certainty_rejection_offset | 1 | Double | Reject certainty offset |
subscript_max_y_top | 0.5 | Double | Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript. |
superscript_bettered_certainty | 0.97 | Double | What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40% |
superscript_min_y_bottom | 0.3 | Double | Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript. |
superscript_scaledown_ratio | 0.4 | Double | A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size. |
superscript_worse_certainty | 2 | Double | How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline? |
suspect_accept_rating | -999.9 | Double | Accept good rating limit |
suspect_rating_per_ch | 999.9 | Double | Don't touch bad rating limit |
tessedit_certainty_threshold | -2.25 | Double | Good blob limit |
tessedit_class_miss_scale | 0.00390625 | Double | Scale factor for features not used |
tessedit_good_doc_still_rowrej_wd | 1.1 | Double | rej good doc wd if more than this fraction rejected |
tessedit_lower_flip_hyphen | 1.5 | Double | Aspect ratio dot/hyphen test |
tessedit_reject_block_percent | 45 | Double | %rej allowed before rej whole block |
tessedit_reject_doc_percent | 65 | Double | %rej allowed before rej whole doc |
tessedit_reject_row_percent | 40 | Double | %rej allowed before rej whole row |
tessedit_upper_flip_hyphen | 1.8 | Double | Aspect ratio dot/hyphen test |
tessedit_whole_wd_rej_row_percent | 70 | Double | Number of row rejects in whole word rejects which prevents whole row rejection |
test_pt_x | 100000 | Double | xcoord |
test_pt_y | 100000 | Double | ycoord |
textord_ascheight_mode_fraction | 0.08 | Double | Min pile height to make ascheight |
textord_ascx_ratio_max | 1.8 | Double | Max cap/xheight |
textord_ascx_ratio_min | 1.25 | Double | Min cap/xheight |
textord_balance_factor | 1 | Double | Ding rate for unbalanced char cells |
textord_blshift_maxshift | 0 | Double | Max baseline shift |
textord_blshift_xfraction | 9.99 | Double | Min size of baseline shift |
textord_chop_width | 1.5 | Double | Max width before chopping |
textord_descheight_mode_fraction | 0.08 | Double | Min pile height to make descheight |
textord_descx_ratio_max | 0.6 | Double | Max desc/xheight |
textord_descx_ratio_min | 0.25 | Double | Min desc/xheight |
textord_excess_blobsize | 1.3 | Double | New row made if blob makes row this big |
textord_expansion_factor | 1 | Double | Factor to expand rows by in expand_rows |
textord_fp_chop_snap | 0.5 | Double | Max distance of chop pt from vertex |
textord_fp_min_width | 0.5 | Double | Min width of decent blobs |
textord_fpiqr_ratio | 1.5 | Double | Pitch IQR/Gap IQR threshold |
textord_initialasc_ile | 0.9 | Double | Ile of sizes for xheight guess |
textord_initialx_ile | 0.75 | Double | Ile of sizes for xheight guess |
textord_linespace_iqrlimit | 0.2 | Double | Max iqr/median for linespace |
textord_max_pitch_iqr | 0.2 | Double | Xh fraction noise in pitch |
textord_min_blob_height_fraction | 0.75 | Double | Min blob height/top to include blob top into xheight stats |
textord_min_linesize | 1.25 | Double | * blob height for initial linesize |
textord_minxh | 0.25 | Double | fraction of linesize for min xheight |
textord_noise_area_ratio | 0.7 | Double | Fraction of bounding box for noise |
textord_noise_hfract | 0.015625 | Double | Height fraction to discard outlines as speckle noise |
textord_noise_normratio | 2 | Double | Dot to norm ratio for deletion |
textord_noise_rowratio | 6 | Double | Dot to norm ratio for deletion |
textord_noise_sizelimit | 0.5 | Double | Fraction of x for big t count |
textord_noise_sxfract | 0.4 | Double | xh fract width error for norm blobs |
textord_noise_syfract | 0.2 | Double | xh fract height error for norm blobs |
textord_occupancy_threshold | 0.4 | Double | Fraction of neighbourhood |
textord_oldbl_jumplimit | 0.15 | Double | X fraction for new partition |
textord_overlap_x | 0.375 | Double | Fraction of linespace for good overlap |
textord_pitch_rowsimilarity | 0.08 | Double | Fraction of xheight for sameness |
textord_projection_scale | 0.2 | Double | Ding rate for mid-cuts |
textord_skew_ile | 0.5 | Double | Ile of gradients for page skew |
textord_skew_lag | 0.02 | Double | Lag for skew on row accumulation |
textord_spacesize_ratiofp | 2.8 | Double | Min ratio space/nonspace |
textord_spacesize_ratioprop | 2 | Double | Min ratio space/nonspace |
textord_spline_outlier_fraction | 0.1 | Double | Fraction of line spacing for outlier |
textord_spline_shift_fraction | 0.02 | Double | Fraction of line spacing for quad |
textord_tabfind_aligned_gap_fraction | 0.75 | Double | Fraction of height used as a minimum gap for aligned blobs. |
textord_tabfind_vertical_text_ratio | 0.5 | Double | Fraction of textlines deemed vertical to use vertical page mode |
textord_tabvector_vertical_box_ratio | 0.5 | Double | Fraction of box matches required to declare a line vertical |
textord_tabvector_vertical_gap_fraction | 0.5 | Double | max fraction of mean blob width allowed for vertical gaps in vertical text |
textord_underline_offset | 0.1 | Double | Fraction of x to ignore |
textord_underline_threshold | 0.5 | Double | Fraction of width occupied |
textord_underline_width | 2 | Double | Multiple of line_size for underline |
textord_width_limit | 8 | Double | Max width of blobs to make rows |
textord_width_smooth_factor | 0.1 | Double | Smoothing width stats |
textord_words_def_fixed | 0.016 | Double | Threshold for definite fixed |
textord_words_def_prop | 0.09 | Double | Threshold for definite prop |
textord_words_default_maxspace | 3.5 | Double | Max believable third space |
textord_words_default_minspace | 0.6 | Double | Fraction of xheight |
textord_words_default_nonspace | 0.2 | Double | Fraction of xheight |
textord_words_definite_spread | 0.3 | Double | Non-fuzzy spacing region |
textord_words_initial_lower | 0.25 | Double | Max initial cluster size |
textord_words_initial_upper | 0.15 | Double | Min initial cluster spacing |
textord_words_maxspace | 4 | Double | Multiple of xheight |
textord_words_min_minspace | 0.3 | Double | Fraction of xheight |
textord_words_minlarge | 0.75 | Double | Fraction of valid gaps needed |
textord_words_pitchsd_threshold | 0.04 | Double | Pitch sync threshold |
textord_words_width_ile | 0.4 | Double | Ile of blob widths for space est |
textord_wordstats_smooth_factor | 0.05 | Double | Smoothing gap stats |
textord_xheight_error_margin | 0.1 | Double | Accepted variation |
textord_xheight_mode_fraction | 0.4 | Double | Min pile height to make xheight |
tosp_dont_fool_with_small_kerns | -1 | Double | Limit use of xht gap with odd small kns |
tosp_enough_small_gaps | 0.65 | Double | Fract of kerns reqd for isolated row stats |
tosp_flip_caution | 0 | Double | Don't autoflip kn to sp when large separation |
tosp_fuzzy_kn_fraction | 0.5 | Double | New fuzzy kn alg |
tosp_fuzzy_sp_fraction | 0.5 | Double | New fuzzy sp alg |
tosp_fuzzy_space_factor | 0.6 | Double | Fract of xheight for fuzz sp |
tosp_fuzzy_space_factor1 | 0.5 | Double | Fract of xheight for fuzz sp |
tosp_fuzzy_space_factor2 | 0.72 | Double | Fract of xheight for fuzz sp |
tosp_gap_factor | 0.83 | Double | gap ratio to flip sp->kern |
tosp_ignore_big_gaps | -1 | Double | xht multiplier |
tosp_ignore_very_big_gaps | 3.5 | Double | xht multiplier |
tosp_init_guess_kn_mult | 2.2 | Double | Thresh guess - mult kn by this |
tosp_init_guess_xht_mult | 0.28 | Double | Thresh guess - mult xht by this |
tosp_kern_gap_factor1 | 2 | Double | gap ratio to flip kern->sp |
tosp_kern_gap_factor2 | 1.3 | Double | gap ratio to flip kern->sp |
tosp_kern_gap_factor3 | 2.5 | Double | gap ratio to flip kern->sp |
tosp_large_kerning | 0.19 | Double | Limit use of xht gap with large kns |
tosp_max_sane_kn_thresh | 5 | Double | Multiplier on kn to limit thresh |
tosp_min_sane_kn_sp | 1.5 | Double | Don't trust spaces less than this time kn |
tosp_narrow_aspect_ratio | 0.48 | Double | narrow if w/h less than this |
tosp_narrow_fraction | 0.3 | Double | Fract of xheight for narrow |
tosp_near_lh_edge | 0 | Double | Don't reduce box if the top left is non blank |
tosp_old_sp_kn_th_factor | 2 | Double | Factor for defining space threshold in terms of space and kern sizes |
tosp_pass_wide_fuzz_sp_to_context | 0.75 | Double | How wide fuzzies need context |
tosp_rep_space | 1.6 | Double | rep gap multiplier for space |
tosp_silly_kn_sp_gap | 0.2 | Double | Don't let sp minus kn get too small |
tosp_table_fuzzy_kn_sp_ratio | 3 | Double | Fuzzy if less than this |
tosp_table_kn_sp_ratio | 2.25 | Double | Min difference of kn and sp in table |
tosp_table_xht_sp_ratio | 0.33 | Double | Expect spaces bigger than this |
tosp_threshold_bias1 | 0 | Double | how far between kern and space? |
tosp_threshold_bias2 | 0 | Double | how far between kern and space? |
tosp_wide_aspect_ratio | 0 | Double | wide if w/h less than this |
tosp_wide_fraction | 0.52 | Double | Fract of xheight for wide |
words_default_fixed_limit | 0.6 | Double | Allowed size variance |
words_default_fixed_space | 0.75 | Double | Fraction of xheight |
words_default_prop_nonspace | 0.25 | Double | Fraction of xheight |
words_initial_lower | 0.5 | Double | Max initial cluster size |
words_initial_upper | 0.15 | Double | Min initial cluster spacing |
xheight_penalty_inconsistent | 0.25 | Double | Score penalty (0.1 = 10%) added if an xheight is inconsistent. |
xheight_penalty_subscripts | 0.125 | Double | Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK. |
ambigs_debug_level | 0 | Integer | Debug level for unichar ambiguities |
applybox_debug | 1 | Integer | Debug level |
applybox_page | 0 | Integer | Page number to apply boxes from |
bidi_debug | 0 | Integer | Debug level for BiDi |
chop_centered_maxwidth | 90 | Integer | Width of (smaller) chopped blobs above which we don't care that a chop is not near the center. |
chop_debug | 0 | Integer | Chop debug |
chop_inside_angle | -50 | Integer | Min Inside Angle Bend |
chop_min_outline_area | 2000 | Integer | Min Outline Area |
chop_min_outline_points | 6 | Integer | Min Number of Points on Outline |
chop_same_distance | 2 | Integer | Same distance |
chop_seam_pile_size | 150 | Integer | Max number of seams in seam_pile |
chop_split_length | 10000 | Integer | Split Length |
chop_x_y_weight | 3 | Integer | X / Y length weight |
classify_adapt_feature_threshold | 230 | Integer | Threshold for good features during adaptive 0-255 |
classify_adapt_proto_threshold | 230 | Integer | Threshold for good protos during adaptive 0-255 |
classify_class_pruner_multiplier | 15 | Integer | Class Pruner Multiplier 0-255: |
classify_class_pruner_threshold | 229 | Integer | Class Pruner Threshold 0-255 |
classify_cp_cutoff_strength | 7 | Integer | Class Pruner CutoffStrength: |
classify_debug_level | 0 | Integer | Classify debug level |
classify_integer_matcher_multiplier | 10 | Integer | Integer Matcher Multiplier 0-255: |
classify_learning_debug_level | 0 | Integer | Learning Debug Level: |
classify_norm_method | 1 | Integer | Normalization Method ... |
classify_num_cp_levels | 3 | Integer | Number of Class Pruner Levels |
crunch_debug | 0 | Integer | As it says |
crunch_leave_lc_strings | 4 | Integer | Don't crunch words with long lower case strings |
crunch_leave_uc_strings | 4 | Integer | Don't crunch words with long upper case strings |
crunch_long_repetitions | 3 | Integer | Crunch words with long repetitions |
crunch_pot_indicators | 1 | Integer | How many potential indicators needed |
crunch_rating_max | 10 | Integer | For adj length in rating per ch |
dawg_debug_level | 0 | Integer | Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages |
debug_fix_space_level | 0 | Integer | Contextual fixspace debug |
debug_noise_removal | 0 | Integer | Debug reassignment of small outlines |
debug_x_ht_level | 0 | Integer | Reestimate debug |
devanagari_split_debuglevel | 0 | Integer | Debug level for split shiro-rekha process. |
edges_children_count_limit | 46 | Integer | Max holes allowed in blob |
edges_children_per_grandchild | 9 | Integer | Importance ratio for chucking outlines |
edges_max_children_layers | 4 | Integer | Max layers of nested children inside a character outline |
edges_max_children_per_outline | 15 | Integer | Max number of children inside a character outline |
edges_min_nonhole | 14 | Integer | Min pixels for potential char in box |
edges_patharea_ratio | 40 | Integer | Max lensq/area for acceptable child outline |
fixsp_done_mode | 1 | Integer | What constitutes done for spacing |
fixsp_non_noise_limit | 1 | Integer | How many non-noise blbs either side? |
hyphen_debug_level | 0 | Integer | Debug level for hyphenated words. |
jpg_quality | 85 | Integer | Set JPEG quality level |
language_model_debug_level | 0 | Integer | Language model debug level |
language_model_min_compound_length | 3 | Integer | Minimum length of compound words |
language_model_ngram_order | 8 | Integer | Maximum order of the character ngram model |
language_model_viterbi_list_max_num_prunable | 10 | Integer | Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs |
language_model_viterbi_list_max_size | 500 | Integer | Maximum size of viterbi lists recorded in BLOB_CHOICEs |
lstm_choice_mode | 0 | Integer | Includes alternative symbol choices in the hOCR output. Valid input values are 0, 1, 2 and 3; 0 is the default. With 1, the alternative symbol choices per timestep are included. With 2, the alternative symbol choices are accumulated per character. |
matcher_debug_flags | 0 | Integer | Matcher Debug Flags |
matcher_debug_level | 0 | Integer | Matcher Debug Level |
matcher_min_examples_for_prototyping | 3 | Integer | Reliable Config Threshold |
matcher_permanent_classes_min | 1 | Integer | Min # of permanent classes |
matcher_sufficient_examples_for_prototyping | 5 | Integer | Enable adaption even if the ambiguities have not been seen |
max_permuter_attempts | 10000 | Integer | Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options. |
min_characters_to_try | 50 | Integer | Specify minimum characters to try during OSD |
min_sane_x_ht_pixels | 8 | Integer | Reject any x-ht less than or equal to this |
multilang_debug_level | 0 | Integer | Print multilang debug info. |
noise_maxperblob | 8 | Integer | Max diacritics to apply to a blob |
noise_maxperword | 16 | Integer | Max diacritics to apply to a word |
ocr_devanagari_split_strategy | 0 | Integer | Whether to use the top-line splitting process for Devanagari documents while performing ocr. |
oldbl_holed_losscount | 10 | Integer | Max lost before fallback line used |
pageseg_devanagari_split_strategy | 0 | Integer | Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation. |
paragraph_debug_level | 0 | Integer | Print paragraph debug info. |
pitsync_fake_depth | 1 | Integer | Max advance fake generation |
pitsync_linear_version | 6 | Integer | Use new fast algorithm |
ptg_pdf_resolution | 300 | Integer | PPI of image in scanned PDF |
quality_min_initial_alphas_reqd | 2 | Integer | alphas in a good word |
repair_unchopped_blobs | 1 | Integer | Fix blobs that aren't chopped |
segsearch_debug_level | 0 | Integer | SegSearch debug level |
segsearch_max_futile_classifications | 20 | Integer | Maximum number of pain point classifications per chunk that did not result in finding a better word choice. |
segsearch_max_pain_points | 2000 | Integer | Maximum number of pain points stored in the queue |
stopper_debug_level | 0 | Integer | Stopper debug level |
stopper_smallword_size | 2 | Integer | Size of dict word to be treated as non-dict word |
superscript_debug | 0 | Integer | Debug level for sub and superscript fixer |
suspect_level | 99 | Integer | Suspect marker level |
suspect_short_words | 2 | Integer | Don't suspect dict wds longer than this |
tessedit_bigram_debug | 0 | Integer | Amount of debug output for bigram correction. |
tessedit_image_border | 2 | Integer | Rej blbs near image edge limit |
tessedit_ocr_engine_mode | 2 | Integer | Which OCR engine(s) to run (Tesseract, LSTM, both). Defaults to loading and running the most accurate available. |
tessedit_page_number | -1 | Integer | -1 -> All pages, else specific page to process |
tessedit_pageseg_mode | 6 | Integer | Page seg mode: 0=osd only, 1=auto+osd, 2=auto_only, 3=auto, 4=column, 5=block_vert, 6=block, 7=line, 8=word, 9=word_circle, 10=char, 11=sparse_text, 12=sparse_text+osd, 13=raw_line (Values from PageSegMode enum in publictypes.h) |
tessedit_parallelize | 0 | Integer | Run in parallel where possible |
tessedit_preserve_min_wd_len | 2 | Integer | Only preserve wds longer than this |
tessedit_reject_mode | 0 | Integer | Rejection algorithm |
tessedit_tess_adaption_mode | 39 | Integer | Adaptation decision algorithm for tess |
tessedit_truncate_wordchoice_log | 10 | Integer | Max words to keep in list |
textord_baseline_debug | 0 | Integer | Baseline debug level |
textord_debug_block | 0 | Integer | Block to do debug on |
textord_debug_bugs | 0 | Integer | Turn on output related to bugs in tab finding |
textord_debug_tabfind | 0 | Integer | Debug tab finding |
textord_dotmatrix_gap | 3 | Integer | Max pixel gap for broken fixed pitch |
textord_fp_chop_error | 2 | Integer | Max allowed bending of chop cells |
textord_lms_line_trials | 12 | Integer | Number of line fits to do |
textord_max_blob_overlaps | 4 | Integer | Max number of blobs a big blob can overlap |
textord_max_noise_size | 7 | Integer | Pixel size of noise |
textord_min_blobs_in_row | 4 | Integer | Min blobs before gradient counted |
textord_min_xheight | 10 | Integer | Min credible pixel xheight |
textord_noise_sizefraction | 10 | Integer | Fraction of size for maxima |
textord_noise_sncount | 1 | Integer | super norm blobs to save row |
textord_noise_translimit | 16 | Integer | Transitions for normal blob |
textord_pitch_range | 2 | Integer | Max range test on pitch |
textord_skewsmooth_offset | 4 | Integer | For smooth factor |
textord_skewsmooth_offset2 | 1 | Integer | For smooth factor |
textord_spline_medianwin | 6 | Integer | Size of window for spline segmentation |
textord_spline_minblobs | 8 | Integer | Min blobs in each spline segment |
textord_tabfind_show_images | 0 | Integer | Show image blobs |
textord_tabfind_show_partitions | 0 | Integer | Show partition bounds, waiting if >1 |
textord_tabfind_show_strokewidths | 0 | Integer | Show stroke widths |
textord_test_x | -2147483647 | Integer | coord of test pt |
textord_test_y | -2147483647 | Integer | coord of test pt |
textord_testregion_bottom | 2147483647 | Integer | Bottom edge of debug rectangle |
textord_testregion_left | -1 | Integer | Left edge of debug reporting rectangle |
textord_testregion_right | 2147483647 | Integer | Right edge of debug rectangle |
textord_testregion_top | -1 | Integer | Top edge of debug reporting rectangle |
textord_words_veto_power | 5 | Integer | Rows required to outvote a veto |
tosp_debug_level | 0 | Integer | Debug data |
tosp_enough_space_samples_for_median | 3 | Integer | or should we use mean |
tosp_few_samples | 40 | Integer | No.gaps reqd with 1 large gap to treat as a table |
tosp_redo_kern_limit | 10 | Integer | No.samples reqd to reestimate for row |
tosp_sanity_method | 1 | Integer | How to avoid being silly |
tosp_short_row | 20 | Integer | No.gaps reqd with few cert spaces to use certs |
user_defined_dpi | 0 | Integer | Specify DPI for input image |
wordrec_debug_level | 0 | Integer | Debug level for wordrec |
wordrec_display_segmentations | 0 | Integer | Display Segmentations |
wordrec_max_join_chunks | 4 | Integer | Max number of broken pieces to associate |
x_ht_acceptance_tolerance | 8 | Integer | Max allowed deviation of blob top outside of font data |
x_ht_min_change | 8 | Integer | Min change in xht before actually trying it |
applybox_exposure_pattern | .exp | String | Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif |
chs_leading_punct | ('`" | String | Leading punctuation |
chs_trailing_punct1 | ).,;:?! | String | 1st Trailing punctuation |
chs_trailing_punct2 | )'`" | String | 2nd Trailing punctuation |
classify_font_name | UnknownFont | String | Default font name to be used in training |
classify_learn_debug_str | | String | Class str to debug learning |
conflict_set_I_l_1 | Il1[] | String | Il1 conflict set |
debug_file | | String | File to send tprintf output to |
document_title | | String | Title of output document (used for hOCR and PDF output) |
dotproduct | auto | String | Function used for calculation of dot product |
file_type | .tif | String | Filename extension |
numeric_punctuation | ., | String | Punct. chs expected WITHIN numbers |
ok_repeated_ch_non_alphanum_wds | -?*= | String | Allow NN to unrej |
outlines_2 | ij!?%":; | String | Non standard number of outlines |
outlines_odd | %| | String | Non standard number of outlines |
output_ambig_words_file | | String | Output file for ambiguities found in the dictionary |
page_separator | | String | Page separator (default is form feed control character) |
tessedit_char_blacklist | | String | Blacklist of chars not to recognize |
tessedit_char_unblacklist | | String | List of chars to override tessedit_char_blacklist |
tessedit_char_whitelist | | String | Whitelist of chars to recognize |
tessedit_load_sublangs | | String | List of languages to load with this one |
tessedit_write_params_to_file | | String | Write all parameters to the given file. |
unrecognised_char | | | String | Output char for unidentified blobs |
user_patterns_file | | String | A filename of user-provided patterns. |
user_patterns_suffix | | String | A suffix of user-provided patterns located in tessdata. |
user_words_file | | String | A filename of user-provided words. |
user_words_suffix | | String | A suffix of user-provided words located in tessdata. |
word_to_debug | | String | Word for which stopper debug information should be printed to stdout |
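Parameters from the table above can be placed in a config file. As a minimal sketch, assuming the usual layout of one `name value` pair per line, the helper below writes such a file with Unix line endings and without a BOM, as config files require. `TessConfigWriter` is a hypothetical name used for illustration only; it is not part of Tesseract.Net.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not part of Tesseract.Net): builds a Tesseract config file.
public static class TessConfigWriter
{
    // Formats each parameter as "name value" on its own line, terminated by LF only.
    // Assumption: one whitespace-separated name/value pair per line.
    public static string Format(IEnumerable<KeyValuePair<string, string>> parameters)
    {
        var sb = new StringBuilder();
        foreach (var p in parameters)
        {
            sb.Append(p.Key).Append(' ').Append(p.Value).Append('\n');
        }
        return sb.ToString();
    }

    // Writes the config as UTF-8 without a byte order mark.
    public static void Write(string path, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        File.WriteAllText(path, Format(parameters), new UTF8Encoding(false));
    }
}
```

For example, passing `tessedit_char_whitelist` with the value `0123456789` and `tessedit_pageseg_mode` with the value `7` to `Write` gives a two-line file. Saved in the `configs` directory of tessdata, it can then be named in the `configs` array passed to `Init`.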
Variable | Default value | Description |
textord_debug_tabfind | 0 | Debug tab finding |
textord_debug_bugs | 0 | Turn on output related to bugs in tab finding |
textord_testregion_left | -1 | Left edge of debug reporting rectangle |
textord_testregion_top | -1 | Top edge of debug reporting rectangle |
textord_testregion_right | 2147483647 | Right edge of debug rectangle |
textord_testregion_bottom | 2147483647 | Bottom edge of debug rectangle |
textord_tabfind_show_partitions | 0 | Show partition bounds, waiting if >1 |
devanagari_split_debuglevel | 0 | Debug level for split shiro-rekha process. |
edges_max_children_per_outline | 16 | Max number of children inside a character outline |
edges_max_children_layers | 4 | Max layers of nested children inside a character outline |
edges_children_per_grandchild | 9 | Importance ratio for chucking outlines |
edges_children_count_limit | 46 | Max holes allowed in blob |
edges_min_nonhole | 14 | Min pixels for potential char in box |
edges_patharea_ratio | 40 | Max lensq/area for acceptable child outline |
textord_fp_chop_error | 2 | Max allowed bending of chop cells |
textord_tabfind_show_images | 0 | Show image blobs |
classify_num_cp_levels | 3 | Number of Class Pruner Levels |
textord_skewsmooth_offset | 4 | For smooth factor |
textord_skewsmooth_offset2 | 1 | For smooth factor |
textord_test_x | -2147483647 | coord of test pt |
textord_test_y | -2147483647 | coord of test pt |
textord_min_blobs_in_row | 4 | Min blobs before gradient counted |
textord_spline_minblobs | 8 | Min blobs in each spline segment |
textord_spline_medianwin | 6 | Size of window for spline segmentation |
textord_max_blob_overlaps | 4 | Max number of blobs a big blob can overlap |
textord_min_xheight | 10 | Min credible pixel xheight |
textord_lms_line_trials | 12 | Number of line fits to do |
oldbl_holed_losscount | 10 | Max lost before fallback line used |
editor_image_xpos | 590 | Editor image X Pos |
editor_image_ypos | 10 | Editor image Y Pos |
editor_image_menuheight | 50 | Add to image height for menu bar |
editor_image_word_bb_color | 7 | Word bounding box colour |
editor_image_blob_bb_color | 4 | Blob bounding box colour |
editor_image_text_color | 2 | Correct text colour |
editor_dbwin_xpos | 50 | Editor debug window X Pos |
editor_dbwin_ypos | 500 | Editor debug window Y Pos |
editor_dbwin_height | 24 | Editor debug window height |
editor_dbwin_width | 80 | Editor debug window width |
editor_word_xpos | 60 | Word window X Pos |
editor_word_ypos | 510 | Word window Y Pos |
editor_word_height | 240 | Word window height |
editor_word_width | 655 | Word window width |
pitsync_linear_version | 6 | Use new fast algorithm |
pitsync_fake_depth | 1 | Max advance fake generation |
textord_tabfind_show_strokewidths | 0 | Show stroke widths |
textord_dotmatrix_gap | 3 | Max pixel gap for broken fixed pitch |
textord_debug_block | 0 | Block to do debug on |
textord_pitch_range | 2 | Max range test on pitch |
textord_words_veto_power | 5 | Rows required to outvote a veto |
textord_debug_images | 0 | Use greyed image background for debug |
textord_debug_printable | 0 | Make debug windows printable |
stream_filelist | 0 | Stream a filelist from stdin |
textord_space_size_is_variable | 0 | If true, word delimiter spaces are assumed to have variable width, even though characters have fixed pitch. |
textord_tabfind_show_initial_partitions | 0 | Show partition bounds |
textord_tabfind_show_reject_blobs | 0 | Show blobs rejected as noise |
textord_tabfind_show_columns | 0 | Show column bounds |
textord_tabfind_show_blocks | 0 | Show final block bounds |
textord_tabfind_find_tables | 1 | run table detection |
textord_tabfind_show_color_fit | 0 | Show stroke widths |
devanagari_split_debugimage | 0 | Whether to create a debug image for split shiro-rekha process. |
textord_show_fixed_cuts | 0 | Draw fixed pitch cell boundaries |
edges_use_new_outline_complexity | 0 | Use the new outline complexity module |
edges_debug | 0 | turn on debugging for this module |
edges_children_fix | 0 | Remove boxy parents of char-like children |
equationdetect_save_bi_image | 0 | Save input bi image |
equationdetect_save_spt_image | 0 | Save special character image |
equationdetect_save_seed_image | 0 | Save the seed image |
equationdetect_save_merged_image | 0 | Save the merged image |
gapmap_debug | 0 | Say which blocks have tables |
gapmap_use_ends | 0 | Use large space at start and end of rows |
gapmap_no_isolated_quanta | 0 | Ensure gaps not less than 2quanta wide |
textord_heavy_nr | 0 | Vigorously remove noise |
textord_show_initial_rows | 0 | Display row accumulation |
textord_show_parallel_rows | 0 | Display page correlated rows |
textord_show_expanded_rows | 0 | Display rows after expanding |
textord_show_final_rows | 0 | Display rows after final fitting |
textord_show_final_blobs | 0 | Display blob bounds after pre-ass |
textord_test_landscape | 0 | Tests refer to land/port |
textord_parallel_baselines | 1 | Force parallel baselines |
textord_straight_baselines | 0 | Force straight baselines |
textord_old_baselines | 1 | Use old baseline algorithm |
textord_old_xheight | 0 | Use old xheight algorithm |
textord_fix_xheight_bug | 1 | Use spline baseline |
textord_fix_makerow_bug | 1 | Prevent multiple baselines |
textord_debug_xheights | 0 | Test xheight algorithms |
textord_biased_skewcalc | 1 | Bias skew estimates with line length |
textord_interpolating_skew | 1 | Interpolate across gaps |
textord_new_initial_xheight | 1 | Use test xheight mechanism |
textord_debug_blob | 0 | Print test blob information |
textord_really_old_xheight | 0 | Use original wiseowl xheight |
textord_oldbl_debug | 0 | Debug old baseline generation |
textord_debug_baselines | 0 | Debug baseline generation |
textord_oldbl_paradef | 1 | Use para default mechanism |
textord_oldbl_split_splines | 1 | Split stepped splines |
textord_oldbl_merge_parts | 1 | Merge suspect partitions |
oldbl_corrfix | 1 | Improve correlation of heights |
oldbl_xhfix | 0 | Fix bug in modes threshold for xheights |
textord_ocropus_mode | 0 | Make baselines for ocropus |
poly_debug | 0 | Debug old poly |
poly_wide_objects_better | 1 | More accurate approx on wide things |
wordrec_display_all_blobs | 0 | Display Blobs |
wordrec_display_all_words | 0 | Display Words |
wordrec_blob_pause | 0 | Blob pause |
wordrec_display_splits | 0 | Display splits |
textord_tabfind_only_strokewidths | 0 | Only run stroke widths |
textord_tabfind_show_initialtabs | 0 | Show tab candidates |
textord_tabfind_show_finaltabs | 0 | Show tab vectors |
textord_dump_table_images | 0 | Paint table detection output |
textord_show_tables | 0 | Show table regions |
textord_tablefind_show_mark | 0 | Debug table marking steps in detail |
textord_tablefind_show_stats | 0 | Show page stats used in table finding |
textord_tablefind_recognize_tables | 0 | Enables the table recognizer for table layout and filtering. |
textord_all_prop | 0 | All doc is proportional text |
textord_debug_pitch_test | 0 | Debug on fixed pitch test |
textord_disable_pitch_test | 0 | Turn off dp fixed pitch algorithm |
textord_fast_pitch_test | 0 | Do even faster pitch algorithm |
textord_debug_pitch_metric | 0 | Write full metric stuff |
textord_show_row_cuts | 0 | Draw row-level cuts |
textord_show_page_cuts | 0 | Draw page-level cuts |
textord_pitch_cheat | 0 | Use correct answer for fixed/prop |
textord_blockndoc_fixed | 0 | Attempt whole doc/block fixed pitch |
textord_show_initial_words | 0 | Display separate words |
textord_show_new_words | 0 | Display separate words |
textord_show_fixed_words | 0 | Display forced fixed pitch words |
textord_blocksall_fixed | 0 | Moan about prop blocks |
textord_blocksall_prop | 0 | Moan about fixed pitch blocks |
textord_blocksall_testing | 0 | Dump stats when moaning |
textord_test_mode | 0 | Do current test |
textord_pitch_scalebigwords | 0 | Scale scores on big words |
textord_restore_underlines | 1 | Chop underlines and put back |
textord_fp_chopping | 1 | Do fixed pitch chopping |
textord_force_make_prop_words | 0 | Force proportional word segmentation on all rows |
textord_chopper_test | 0 | Chopper is being tested. |
classify_font_name | UnknownFont | Default font name to be used in training |
fx_debugfile | FXDebug | Name of debugfile |
editor_image_win_name | EditorImage | Editor image window name |
editor_dbwin_name | EditorDBWin | Editor debug window name |
editor_word_name | BlnWords | BL normalized word window |
editor_debug_config_file | | Config file to apply to single words |
classify_training_file | MicroFeatures | Training file |
debug_file | | File to send tprintf output to |
textord_underline_threshold | 0.5 | Fraction of width occupied |
edges_childarea | 0.5 | Min area fraction of child outline |
edges_boxarea | 0.875 | Min area fraction of grandchild for box |
textord_fp_chop_snap | 0.5 | Max distance of chop pt from vertex |
gapmap_big_gaps | 1.75 | xht multiplier |
classify_cp_angle_pad_loose | 45 | Class Pruner Angle Pad Loose |
classify_cp_angle_pad_medium | 20 | Class Pruner Angle Pad Medium |
classify_cp_angle_pad_tight | 10 | Class Pruner Angle Pad Tight |
classify_cp_end_pad_loose | 0.5 | Class Pruner End Pad Loose |
classify_cp_end_pad_medium | 0.5 | Class Pruner End Pad Medium |
classify_cp_end_pad_tight | 0.5 | Class Pruner End Pad Tight |
classify_cp_side_pad_loose | 2.5 | Class Pruner Side Pad Loose |
classify_cp_side_pad_medium | 1.2 | Class Pruner Side Pad Medium |
classify_cp_side_pad_tight | 0.6 | Class Pruner Side Pad Tight |
classify_pp_angle_pad | 45 | Proto Pruner Angle Pad |
classify_pp_end_pad | 0.5 | Proto Pruner End Pad |
classify_pp_side_pad | 2.5 | Proto Pruner Side Pad |
textord_spline_shift_fraction | 0.02 | Fraction of line spacing for quad |
textord_spline_outlier_fraction | 0.1 | Fraction of line spacing for outlier |
textord_skew_ile | 0.5 | Ile of gradients for page skew |
textord_skew_lag | 0.02 | Lag for skew on row accumulation |
textord_linespace_iqrlimit | 0.2 | Max iqr/median for linespace |
textord_width_limit | 8 | Max width of blobs to make rows |
textord_chop_width | 1.5 | Max width before chopping |
textord_expansion_factor | 1 | Factor to expand rows by in expand_rows |
textord_overlap_x | 0.375 | Fraction of linespace for good overlap |
textord_minxh | 0.25 | fraction of linesize for min xheight |
textord_min_linesize | 1.25 | * blob height for initial linesize |
textord_excess_blobsize | 1.3 | New row made if blob makes row this big |
textord_occupancy_threshold | 0.4 | Fraction of neighbourhood |
textord_underline_width | 2 | Multiple of line_size for underline |
textord_min_blob_height_fraction | 0.75 | Min blob height/top to include blob top into xheight stats |
textord_xheight_mode_fraction | 0.4 | Min pile height to make xheight |
textord_ascheight_mode_fraction | 0.08 | Min pile height to make ascheight |
textord_descheight_mode_fraction | 0.08 | Min pile height to make descheight |
textord_ascx_ratio_min | 1.25 | Min cap/xheight |
textord_ascx_ratio_max | 1.8 | Max cap/xheight |
textord_descx_ratio_min | 0.25 | Min desc/xheight |
textord_descx_ratio_max | 0.6 | Max desc/xheight |
textord_xheight_error_margin | 0.1 | Accepted variation |
classify_min_slope | 0.414214 | Slope below which lines are called horizontal |
classify_max_slope | 2.41421 | Slope above which lines are called vertical |
classify_norm_adj_midpoint | 32 | Norm adjust midpoint ... |
classify_norm_adj_curl | 2 | Norm adjust curl ... |
oldbl_xhfract | 0.4 | Fraction of est allowed in calc |
oldbl_dot_error_size | 1.26 | Max aspect ratio of a dot |
textord_oldbl_jumplimit | 0.15 | X fraction for new partition |
classify_pico_feature_length | 0.05 | Pico Feature Length |
pitsync_joined_edge | 0.75 | Dist inside big blob for chopping |
pitsync_offset_freecut_fraction | 0.25 | Fraction of cut for free cuts |
textord_tabvector_vertical_gap_fraction | 0.5 | max fraction of mean blob width allowed for vertical gaps in vertical text |
textord_tabvector_vertical_box_ratio | 0.5 | Fraction of box matches required to declare a line vertical |
textord_projection_scale | 0.2 | Ding rate for mid-cuts |
textord_balance_factor | 1 | Ding rate for unbalanced char cells |
textord_wordstats_smooth_factor | 0.05 | Smoothing gap stats |
textord_width_smooth_factor | 0.1 | Smoothing width stats |
textord_words_width_ile | 0.4 | Ile of blob widths for space est |
textord_words_maxspace | 4 | Multiple of xheight |
textord_words_default_maxspace | 3.5 | Max believable third space |
textord_words_default_minspace | 0.6 | Fraction of xheight |
textord_words_min_minspace | 0.3 | Fraction of xheight |
textord_words_default_nonspace | 0.2 | Fraction of xheight |
textord_words_initial_lower | 0.25 | Max initial cluster size |
textord_words_initial_upper | 0.15 | Min initial cluster spacing |
textord_words_minlarge | 0.75 | Fraction of valid gaps needed |
textord_words_pitchsd_threshold | 0.04 | Pitch sync threshold |
textord_words_def_fixed | 0.016 | Threshold for definite fixed |
textord_words_def_prop | 0.09 | Threshold for definite prop |
textord_pitch_rowsimilarity | 0.08 | Fraction of xheight for sameness |
words_initial_lower | 0.5 | Max initial cluster size |
words_initial_upper | 0.15 | Min initial cluster spacing |
words_default_prop_nonspace | 0.25 | Fraction of xheight |
words_default_fixed_space | 0.75 | Fraction of xheight |
words_default_fixed_limit | 0.6 | Allowed size variance |
textord_words_definite_spread | 0.3 | Non-fuzzy spacing region |
textord_spacesize_ratiofp | 2.8 | Min ratio space/nonspace |
textord_spacesize_ratioprop | 2 | Min ratio space/nonspace |
textord_fpiqr_ratio | 1.5 | Pitch IQR/Gap IQR threshold |
textord_max_pitch_iqr | 0.2 | Xh fraction noise in pitch |
textord_fp_min_width | 0.5 | Min width of decent blobs |
textord_underline_offset | 0.1 | Fraction of x to ignore |
ambigs_debug_level | 0 | Debug level for unichar ambiguities |
tessedit_single_match | 0 | Top choice only from CP |
classify_debug_level | 0 | Classify debug level |
classify_norm_method | 1 | Normalization Method ... |
matcher_debug_level | 0 | Matcher Debug Level |
matcher_debug_flags | 0 | Matcher Debug Flags |
classify_learning_debug_level | 0 | Learning Debug Level: |
matcher_permanent_classes_min | 1 | Min # of permanent classes |
matcher_min_examples_for_prototyping | 3 | Reliable Config Threshold |
matcher_sufficient_examples_for_prototyping | 5 | Enable adaption even if the ambiguities have not been seen |
classify_adapt_proto_threshold | 230 | Threshold for good protos during adaptive 0-255 |
classify_adapt_feature_threshold | 230 | Threshold for good features during adaptive 0-255 |
classify_class_pruner_threshold | 229 | Class Pruner Threshold 0-255 |
classify_class_pruner_multiplier | 15 | Class Pruner Multiplier 0-255: |
classify_cp_cutoff_strength | 7 | Class Pruner CutoffStrength: |
classify_integer_matcher_multiplier | 10 | Integer Matcher Multiplier 0-255: |
il1_adaption_test | 0 | Don't adapt to i/I at beginning of word |
dawg_debug_level | 0 | Set to 1 for general debug info, to 2 for more details, to 3 to see all the debug messages |
hyphen_debug_level | 0 | Debug level for hyphenated words. |
max_viterbi_list_size | 10 | Maximum size of viterbi list. |
stopper_smallword_size | 2 | Size of dict word to be treated as non-dict word |
stopper_debug_level | 0 | Stopper debug level |
tessedit_truncate_wordchoice_log | 10 | Max words to keep in list |
fragments_debug | 0 | Debug character fragments |
max_permuter_attempts | 10000 | Maximum number of different character choices to consider during permutation. This limit is especially useful when user patterns are specified, since overly generic patterns can result in dawg search exploring an overly large number of options. |
repair_unchopped_blobs | 1 | Fix blobs that aren't chopped |
chop_debug | 0 | Chop debug |
chop_split_length | 10000 | Split Length |
chop_same_distance | 2 | Same distance |
chop_min_outline_points | 6 | Min Number of Points on Outline |
chop_seam_pile_size | 150 | Max number of seams in seam_pile |
chop_inside_angle | -50 | Min Inside Angle Bend |
chop_min_outline_area | 2000 | Min Outline Area |
chop_centered_maxwidth | 90 | Width of (smaller) chopped blobs above which we don't care that a chop is not near the center. |
chop_x_y_weight | 3 | X / Y length weight |
segment_adjust_debug | 0 | Segmentation adjustment debug |
wordrec_debug_level | 0 | Debug level for wordrec |
wordrec_max_join_chunks | 4 | Max number of broken pieces to associate |
segsearch_debug_level | 0 | SegSearch debug level |
segsearch_max_pain_points | 2000 | Maximum number of pain points stored in the queue |
segsearch_max_futile_classifications | 20 | Maximum number of pain point classifications per chunk that did not result in finding a better word choice. |
language_model_debug_level | 0 | Language model debug level |
language_model_ngram_order | 8 | Maximum order of the character ngram model |
language_model_viterbi_list_max_num_prunable | 10 | Maximum number of prunable (those for which PrunablePath() is true) entries in each viterbi list recorded in BLOB_CHOICEs |
language_model_viterbi_list_max_size | 500 | Maximum size of viterbi lists recorded in BLOB_CHOICEs |
language_model_min_compound_length | 3 | Minimum length of compound words |
wordrec_display_segmentations | 0 | Display Segmentations |
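tessedit_pageseg_mode | 6 | Page segmentation mode: 0=OSD only, 1=auto+OSD, 2=auto without OSD, 3=fully automatic, 4=single column, 5=single vertical block, 6=single block, 7=single line, 8=single word, 9=circled word, 10=single char (values from PageSegMode enum in publictypes.h) |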
tessedit_ocr_engine_mode | 0 | Which OCR engine(s) to run (Tesseract, Cube, both). Defaults to loading and running only Tesseract (no Cube, no combiner). (Values from OcrEngineMode enum in tesseractclass.h) |
pageseg_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing page-segmentation. |
ocr_devanagari_split_strategy | 0 | Whether to use the top-line splitting process for Devanagari documents while performing ocr. |
bidi_debug | 0 | Debug level for BiDi |
applybox_debug | 1 | Debug level |
applybox_page | 0 | Page number to apply boxes from |
tessedit_bigram_debug | 0 | Amount of debug output for bigram correction. |
debug_noise_removal | 0 | Debug reassignment of small outlines |
noise_maxperblob | 8 | Max diacritics to apply to a blob |
noise_maxperword | 16 | Max diacritics to apply to a word |
debug_x_ht_level | 0 | Reestimate debug |
quality_min_initial_alphas_reqd | 2 | alphas in a good word |
tessedit_tess_adaption_mode | 39 | Adaptation decision algorithm for tess |
tessedit_test_adaption_mode | 3 | Adaptation decision algorithm for tess |
paragraph_debug_level | 0 | Print paragraph debug info. |
cube_debug_level | 0 | Print cube debug info. |
tessedit_preserve_min_wd_len | 2 | Only preserve wds longer than this |
crunch_rating_max | 10 | For adj length in rating per ch |
crunch_pot_indicators | 1 | How many potential indicators needed |
crunch_leave_lc_strings | 4 | Don't crunch words with long lower case strings |
crunch_leave_uc_strings | 4 | Don't crunch words with long upper case strings |
crunch_long_repetitions | 3 | Crunch words with long repetitions |
crunch_debug | 0 | As it says |
fixsp_non_noise_limit | 1 | How many non-noise blobs either side? |
fixsp_done_mode | 1 | What constitutes done for spacing |
debug_fix_space_level | 0 | Contextual fixspace debug |
x_ht_acceptance_tolerance | 8 | Max allowed deviation of blob top outside of font data |
x_ht_min_change | 8 | Min change in xht before actually trying it |
superscript_debug | 0 | Debug level for sub and superscript fixer |
suspect_level | 99 | Suspect marker level |
suspect_space_level | 100 | Min suspect level for rejecting spaces |
suspect_short_words | 2 | Don't suspect dict words longer than this |
tessedit_reject_mode | 0 | Rejection algorithm |
tessedit_image_border | 2 | Rej blbs near image edge limit |
min_sane_x_ht_pixels | 8 | Reject any x-ht lt or eq than this |
tessedit_page_number | -1 | -1 -> all pages, else specific page to process |
tessdata_manager_debug_level | 0 | Debug level for TessdataManager functions. |
tessedit_parallelize | 0 | Run in parallel where possible |
tessedit_ok_mode | 5 | Acceptance decision algorithm |
segment_debug | 0 | Debug the whole segmentation process |
language_model_fixed_length_choices_depth | 3 | Depth of blob choice lists to explore when fixed length dawgs are on |
tosp_debug_level | 0 | Debug data |
tosp_enough_space_samples_for_median | 3 | or should we use mean |
tosp_redo_kern_limit | 10 | No.samples reqd to reestimate for row |
tosp_few_samples | 40 | No.gaps reqd with 1 large gap to treat as a table |
tosp_short_row | 20 | No.gaps reqd with few cert spaces to use certs |
tosp_sanity_method | 1 | How to avoid being silly |
textord_max_noise_size | 7 | Pixel size of noise |
textord_baseline_debug | 0 | Baseline debug level |
textord_noise_sizefraction | 10 | Fraction of size for maxima |
textord_noise_translimit | 16 | Transitions for normal blob |
textord_noise_sncount | 1 | super norm blobs to save row |
use_definite_ambigs_for_classifier | 0 | Use definite ambiguities when running character classifier |
use_ambigs_for_adaption | 0 | Use ambigs for deciding whether to adapt to a character |
allow_blob_division | 1 | Use divisible blobs chopping |
prioritize_division | 0 | Prioritize blob division over chopping |
classify_enable_learning | 1 | Enable adaptive classifier |
tess_cn_matching | 0 | Character Normalized Matching |
tess_bn_matching | 0 | Baseline Normalized Matching |
classify_enable_adaptive_matcher | 1 | Enable adaptive classifier |
classify_use_pre_adapted_templates | 0 | Use pre-adapted classifier templates |
classify_save_adapted_templates | 0 | Save adapted templates to a file |
classify_enable_adaptive_debugger | 0 | Enable match debugger |
classify_nonlinear_norm | 0 | Non-linear stroke-density normalization |
disable_character_fragments | 1 | Do not include character fragments in the results of the classifier |
classify_debug_character_fragments | 0 | Bring up graphical debugging windows for fragments training |
matcher_debug_separate_windows | 0 | Use two different windows for debugging the matching: One for the protos and one for the features. |
classify_bln_numeric_mode | 0 | Assume the input is numbers [0-9]. |
load_system_dawg | 1 | Load system word dawg. |
load_freq_dawg | 1 | Load frequent word dawg. |
load_unambig_dawg | 1 | Load unambiguous word dawg. |
load_punc_dawg | 1 | Load dawg with punctuation patterns. |
load_number_dawg | 1 | Load dawg with number patterns. |
load_bigram_dawg | 1 | Load dawg with special word bigrams. |
use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities. |
stopper_no_acceptable_choices | 0 | Make AcceptableChoice() always return false. Useful when there is a need to explore all segmentations |
save_raw_choices | 0 | Deprecated; backward compatibility only |
segment_nonalphabetic_script | 0 | Don't use any alphabetic-specific tricks. Set to true in the traineddata config file for scripts that are cursive or inherently fixed-pitch |
save_doc_words | 0 | Save Document Words |
merge_fragments_in_matrix | 1 | Merge the fragments in the ratings matrix and delete them after merging |
wordrec_no_block | 0 | Don't output block information |
wordrec_enable_assoc | 1 | Associator Enable |
force_word_assoc | 0 | Force associator to run regardless of what enable_assoc is. This is used for CJK where component grouping is necessary. |
fragments_guide_chopper | 0 | Use information from fragments to guide chopping process |
chop_enable | 1 | Chop enable |
chop_vertical_creep | 0 | Vertical creep |
chop_new_seam_pile | 1 | Use new seam_pile |
assume_fixed_pitch_char_segment | 0 | include fixed-pitch heuristics in char segmentation |
wordrec_skip_no_truth_words | 0 | Only run OCR for words that had truth recorded in BlamerBundle |
wordrec_debug_blamer | 0 | Print blamer debug messages |
wordrec_run_blamer | 0 | Try to set the blame for errors |
save_alt_choices | 1 | Save alternative paths found during chopping and segmentation search |
language_model_ngram_on | 0 | Turn on/off the use of character ngram model |
language_model_ngram_use_only_first_uft8_step | 0 | Use only the first UTF8 step of the given string when computing log probabilities. |
language_model_ngram_space_delimited_language | 1 | Words are delimited by space |
language_model_use_sigmoidal_certainty | 0 | Use sigmoidal score for certainty |
tessedit_resegment_from_boxes | 0 | Take segmentation and labeling from box file |
tessedit_resegment_from_line_boxes | 0 | Conversion of word/line box file to char box file |
tessedit_train_from_boxes | 0 | Generate training data from boxed chars |
tessedit_make_boxes_from_boxes | 0 | Generate more boxes from boxed chars |
tessedit_dump_pageseg_images | 0 | Dump intermediate images made during page segmentation |
tessedit_ambigs_training | 0 | Perform training for ambiguities |
tessedit_adaption_debug | 0 | Generate and print debug information for adaption |
applybox_learn_chars_and_char_frags_mode | 0 | Learn both character fragments (as is done in the special low exposure mode) as well as unfragmented characters. |
applybox_learn_ngrams_mode | 0 | Each bounding box is assumed to contain ngrams. Only learn the ngrams whose outlines overlap horizontally. |
tessedit_display_outwords | 0 | Draw output words |
tessedit_dump_choices | 0 | Dump char choices |
tessedit_timing_debug | 0 | Print timing stats |
tessedit_fix_fuzzy_spaces | 1 | Try to improve fuzzy spaces |
tessedit_unrej_any_wd | 0 | Don't bother with word plausibility |
tessedit_fix_hyphens | 1 | Crunch double hyphens? |
tessedit_redo_xheight | 1 | Check/Correct x-height |
tessedit_enable_doc_dict | 1 | Add words to the document dictionary |
tessedit_debug_fonts | 0 | Output font info per char |
tessedit_debug_block_rejection | 0 | Block and Row stats |
tessedit_enable_bigram_correction | 1 | Enable correction based on the word bigram dictionary. |
tessedit_enable_dict_correction | 0 | Enable single word correction based on the dictionary. |
enable_noise_removal | 1 | Remove and conditionally reassign small outlines when they confuse layout analysis, determining diacritics vs noise |
debug_acceptable_wds | 0 | Dump word pass/fail chk |
tessedit_minimal_rej_pass1 | 0 | Do minimal rejection on pass 1 output |
tessedit_test_adaption | 0 | Test adaption criteria |
tessedit_matcher_log | 0 | Log matcher activity |
test_pt | 0 | Test for point |
paragraph_text_based | 1 | Run paragraph detection on the post-text-recognition (more accurate) |
docqual_excuse_outline_errs | 0 | Allow outline errs in unrejection? |
tessedit_good_quality_unrej | 1 | Reduce rejection on good docs |
tessedit_use_reject_spaces | 1 | Reject spaces? |
tessedit_preserve_blk_rej_perfect_wds | 1 | Only rej partially rejected words in block rejection |
tessedit_preserve_row_rej_perfect_wds | 1 | Only rej partially rejected words in row rejection |
tessedit_dont_blkrej_good_wds | 0 | Use word segmentation quality metric |
tessedit_dont_rowrej_good_wds | 0 | Use word segmentation quality metric |
tessedit_row_rej_good_docs | 1 | Apply row rejection to good docs |
tessedit_reject_bad_qual_wds | 1 | Reject all bad quality wds |
tessedit_debug_doc_rejection | 0 | Page stats |
tessedit_debug_quality_metrics | 0 | Output data to debug file |
bland_unrej | 0 | unrej potential with no checks |
unlv_tilde_crunching | 1 | Mark v.bad words for tilde crunch |
hocr_font_info | 0 | Add font info to hocr output |
crunch_early_merge_tess_fails | 1 | Before word crunch? |
crunch_early_convert_bad_unlv_chs | 0 | Take out ~^ early? |
crunch_terrible_garbage | 1 | As it says |
crunch_pot_garbage | 1 | POTENTIAL crunch garbage |
crunch_leave_ok_strings | 1 | Don't touch sensible strings |
crunch_accept_ok | 1 | Use acceptability in okstring |
crunch_leave_accept_strings | 0 | Don't pot crunch sensible strings |
crunch_include_numerals | 0 | Fiddle alpha figures |
tessedit_prefer_joined_punct | 0 | Reward punctuation joins |
tessedit_write_block_separators | 0 | Write block separators in output |
tessedit_write_rep_codes | 0 | Write repetition char code |
tessedit_write_unlv | 0 | Write .unlv output file |
tessedit_create_txt | 1 | Write .txt output file |
tessedit_create_hocr | 0 | Write .html hOCR output file |
tessedit_create_pdf | 0 | Write .pdf output file |
suspect_constrain_1Il | 0 | UNLV keep 1Il chars rejected |
tessedit_minimal_rejection | 0 | Only reject tess failures |
tessedit_zero_rejection | 0 | Don't reject ANYTHING |
tessedit_word_for_word | 0 | Make output have exactly one word per WERD |
tessedit_zero_kelvin_rejection | 0 | Don't reject ANYTHING AT ALL |
tessedit_consistent_reps | 1 | Force all rep chars the same |
tessedit_rejection_debug | 0 | Adaption debug |
tessedit_flip_0O | 1 | Contextual 0O O0 flips |
rej_trust_doc_dawg | 0 | Use DOC dawg in 11l conf. detector |
rej_1Il_use_dict_word | 0 | Use dictword test |
rej_1Il_trust_permuter_type | 1 | Don't double check |
rej_use_tess_accepted | 1 | Individual rejection control |
rej_use_tess_blanks | 1 | Individual rejection control |
rej_use_good_perm | 1 | Individual rejection control |
rej_use_sensible_wd | 0 | Extend permuter check |
rej_alphas_in_number_perm | 0 | Extend permuter check |
tessedit_create_boxfile | 0 | Output text with boxes |
tessedit_write_images | 0 | Capture the image from the IPE |
interactive_display_mode | 0 | Run interactively? |
tessedit_override_permuter | 1 | According to dict_word |
tessedit_use_primary_params_model | 0 | In multilingual mode use params model of the primary language |
textord_tabfind_show_vlines | 0 | Debug line finding |
textord_use_cjk_fp_model | 0 | Use CJK fixed pitch model |
poly_allow_detailed_fx | 0 | Allow feature extractors to see the original outline |
tessedit_init_config_only | 0 | Only initialize with the config file. Useful if the instance is not going to be used for OCR but say only for layout analysis. |
textord_equation_detect | 0 | Turn on equation detector |
textord_tabfind_vertical_text | 1 | Enable vertical detection |
textord_tabfind_force_vertical_text | 0 | Force using vertical text page mode |
preserve_interword_spaces | 0 | Preserve multiple interword spaces |
include_page_breaks | 0 | Include page separator string in output text after each image/page. |
textord_tabfind_vertical_horizontal_mix | 1 | find horizontal lines such as headers in vertical page mode |
load_fixed_length_dawgs | 1 | Load fixed length dawgs (e.g. for non-space delimited languages) |
permute_debug | 0 | Debug char permutation process |
permute_script_word | 0 | Turn on word script consistency permuter |
segment_segcost_rating | 0 | incorporate segmentation cost in word rating? |
permute_fixed_length_dawg | 0 | Turn on fixed-length phrasebook search permuter |
permute_chartype_word | 0 | Turn on character type (property) consistency permuter |
ngram_permuter_activated | 0 | Activate character-level n-gram-based permuter |
permute_only_top | 0 | Run only the top choice permuter |
use_new_state_cost | 0 | use new state cost heuristics for segmentation state evaluation |
enable_new_segsearch | 0 | Enable new segmentation search path. |
textord_single_height_mode | 0 | Script has no xheight, so use a single mode |
tosp_old_to_method | 0 | Space stats use prechopping? |
tosp_old_to_constrain_sp_kn | 0 | Constrain relative values of inter and intra-word gaps for old_to_method. |
tosp_only_use_prop_rows | 1 | Block stats to use fixed pitch rows? |
tosp_force_wordbreak_on_punct | 0 | Force word breaks on punct to break long lines in non-space delimited langs |
tosp_use_pre_chopping | 0 | Space stats use prechopping? |
tosp_old_to_bug_fix | 0 | Fix suspected bug in old code |
tosp_block_use_cert_spaces | 1 | Only stat OBVIOUS spaces |
tosp_row_use_cert_spaces | 1 | Only stat OBVIOUS spaces |
tosp_narrow_blobs_not_cert | 1 | Only stat OBVIOUS spaces |
tosp_row_use_cert_spaces1 | 1 | Only stat OBVIOUS spaces |
tosp_recovery_isolated_row_stats | 1 | Use row alone when inadequate cert spaces |
tosp_only_small_gaps_for_kern | 0 | Better guess |
tosp_all_flips_fuzzy | 0 | Pass ANY flip to context? |
tosp_fuzzy_limit_all | 1 | Don't restrict kn->sp fuzzy limit to tables |
tosp_stats_use_xht_gaps | 1 | Use within xht gap for wd breaks |
tosp_use_xht_gaps | 1 | Use within xht gap for wd breaks |
tosp_only_use_xht_gaps | 0 | Only use within xht gap for wd breaks |
tosp_rule_9_test_punct | 0 | Don't change kn to space next to punct |
tosp_flip_fuzz_kn_to_sp | 1 | Default flip |
tosp_flip_fuzz_sp_to_kn | 1 | Default flip |
tosp_improve_thresh | 0 | Enable improvement heuristic |
textord_no_rejects | 0 | Don't remove noise blobs |
textord_show_blobs | 0 | Display unsorted blobs |
textord_show_boxes | 0 | Display unsorted blobs |
textord_noise_rejwords | 1 | Reject noise-like words |
textord_noise_rejrows | 1 | Reject noise-like rows |
textord_noise_debug | 0 | Debug row garbage detector |
m_data_sub_dir | tessdata/ | Directory for data files |
tessedit_module_name | libtesseract304.dll | Module colocated with tessdata dir |
classify_learn_debug_str | | Class str to debug learning |
user_words_file | | A filename of user-provided words. |
user_words_suffix | | A suffix of user-provided words located in tessdata. |
user_patterns_file | | A filename of user-provided patterns. |
user_patterns_suffix | | A suffix of user-provided patterns located in tessdata. |
output_ambig_words_file | | Output file for ambiguities found in the dictionary |
word_to_debug | | Word for which stopper debug information should be printed to stdout |
word_to_debug_lengths | | Lengths of unichars in word_to_debug |
tessedit_char_blacklist | | Blacklist of chars not to recognize |
tessedit_char_whitelist | | Whitelist of chars to recognize |
tessedit_char_unblacklist | | List of chars to override tessedit_char_blacklist |
tessedit_write_params_to_file | | Write all parameters to the given file. |
applybox_exposure_pattern | .exp | Exposure value follows this pattern in the image filename. The name of the image files are expected to be in the form [lang].[fontname].exp[num].tif |
chs_leading_punct | ('`" | Leading punctuation |
chs_trailing_punct1 | ).,;:?! | 1st Trailing punctuation |
chs_trailing_punct2 | )'`" | 2nd Trailing punctuation |
outlines_odd | %| | Non standard number of outlines |
outlines_2 | ij!?%":; | Non standard number of outlines |
numeric_punctuation | ., | Punct. chs expected WITHIN numbers |
unrecognised_char | | | Output char for unidentified blobs |
ok_repeated_ch_non_alphanum_wds | -?*= | Allow NN to unrej |
conflict_set_I_l_1 | Il1[] | Il1 conflict set |
file_type | .tif | Filename extension |
tessedit_load_sublangs | | List of languages to load with this one |
page_separator | | Page separator (default is form feed control character) |
classify_char_norm_range | 0.2 | Character Normalization Range ... |
classify_min_norm_scale_x | 0 | Min char x-norm scale ... |
classify_max_norm_scale_x | 0.325 | Max char x-norm scale ... |
classify_min_norm_scale_y | 0 | Min char y-norm scale ... |
classify_max_norm_scale_y | 0.325 | Max char y-norm scale ... |
classify_max_rating_ratio | 1.5 | Veto ratio between classifier ratings |
classify_max_certainty_margin | 5.5 | Veto difference between classifier certainties |
matcher_good_threshold | 0.125 | Good Match (0-1) |
matcher_reliable_adaptive_result | 0 | Great Match (0-1) |
matcher_perfect_threshold | 0.02 | Perfect Match (0-1) |
matcher_bad_match_pad | 0.15 | Bad Match Pad (0-1) |
matcher_rating_margin | 0.1 | New template margin (0-1) |
matcher_avg_noise_size | 12 | Avg. noise blob length |
matcher_clustering_max_angle_delta | 0.015 | Maximum angle delta for prototype clustering |
classify_misfit_junk_penalty | 0 | Penalty to apply when a non-alnum is vertically out of its expected textline position |
rating_scale | 1.5 | Rating scaling factor |
certainty_scale | 20 | Certainty scaling factor |
tessedit_class_miss_scale | 0.00390625 | Scale factor for features not used |
classify_adapted_pruning_factor | 2.5 | Prune poor adapted results this much worse than best result |
classify_adapted_pruning_threshold | -1 | Threshold at which classify_adapted_pruning_factor starts |
classify_character_fragments_garbage_certainty_threshold | -3 | Exclude fragments that do not look like whole characters from training and adaption |
speckle_large_max_size | 0.3 | Max large speckle size |
speckle_rating_penalty | 10 | Penalty to add to worst rating for noise |
xheight_penalty_subscripts | 0.125 | Score penalty (0.1 = 10%) added if there are subscripts or superscripts in a word, but it is otherwise OK. |
xheight_penalty_inconsistent | 0.25 | Score penalty (0.1 = 10%) added if an xheight is inconsistent. |
segment_penalty_dict_frequent_word | 1 | Score multiplier for word matches which have good case and are frequent in the given language (lower is better). |
segment_penalty_dict_case_ok | 1.1 | Score multiplier for word matches that have good case (lower is better). |
segment_penalty_dict_case_bad | 1.3125 | Default score multiplier for word matches, which may have case issues (lower is better). |
segment_penalty_ngram_best_choice | 1.24 | Multiplier for the best choice from the ngram model. |
segment_penalty_dict_nonword | 1.25 | Score multiplier for glyph fragment segmentations which do not match a dictionary word (lower is better). |
segment_penalty_garbage | 1.5 | Score multiplier for poorly cased strings that are not in the dictionary and generally look like garbage (lower is better). |
certainty_scale | 20 | Certainty scaling factor |
stopper_nondict_certainty_base | -2.5 | Certainty threshold for non-dict words |
stopper_phase2_certainty_rejection_offset | 1 | Reject certainty offset |
stopper_certainty_per_char | -0.5 | Certainty to add for each dict char above small word size. |
stopper_allowable_character_badness | 3 | Max certainty variation allowed in a word (in sigma) |
doc_dict_pending_threshold | 0 | Worst certainty for using pending dictionary |
doc_dict_certainty_threshold | -2.25 | Worst certainty for words that can be inserted into the document dictionary |
wordrec_worst_state | 1 | Worst segmentation state |
tessedit_certainty_threshold | -2.25 | Good blob limit |
chop_split_dist_knob | 0.5 | Split length adjustment |
chop_overlap_knob | 0.9 | Split overlap adjustment |
chop_center_knob | 0.15 | Split center adjustment |
chop_sharpness_knob | 0.06 | Split sharpness adjustment |
chop_width_change_knob | 5 | Width change adjustment |
chop_ok_split | 100 | OK split limit |
chop_good_split | 50 | Good split limit |
segsearch_max_char_wh_ratio | 2 | Maximum character width-to-height ratio |
language_model_ngram_small_prob | 1e-006 | To avoid overly small denominators use this as the floor of the probability returned by the ngram model. |
language_model_ngram_nonmatch_score | -40 | Average classifier score of a non-matching unichar. |
language_model_ngram_scale_factor | 0.03 | Strength of the character ngram model relative to the character classifier |
language_model_ngram_rating_factor | 16 | Factor to bring log-probs into the same range as ratings when multiplied by outline length |
language_model_penalty_non_freq_dict_word | 0.1 | Penalty for words not in the frequent word dictionary |
language_model_penalty_non_dict_word | 0.15 | Penalty for non-dictionary words |
language_model_penalty_punc | 0.2 | Penalty for inconsistent punctuation |
language_model_penalty_case | 0.1 | Penalty for inconsistent case |
language_model_penalty_script | 0.5 | Penalty for inconsistent script |
language_model_penalty_chartype | 0.3 | Penalty for inconsistent character type |
language_model_penalty_font | 0 | Penalty for inconsistent font |
language_model_penalty_spacing | 0.05 | Penalty for inconsistent spacing |
language_model_penalty_increment | 0.01 | Penalty increment |
noise_cert_basechar | -8 | Hingepoint for base char certainty |
noise_cert_disjoint | -1 | Hingepoint for disjoint certainty |
noise_cert_punc | -3 | Threshold for new punc char certainty |
noise_cert_factor | 0.375 | Scaling on certainty diff from Hingepoint |
quality_rej_pc | 0.08 | good_quality_doc lte rejection limit |
quality_blob_pc | 0 | good_quality_doc gte good blobs limit |
quality_outline_pc | 1 | good_quality_doc lte outline error limit |
quality_char_pc | 0.95 | good_quality_doc gte good char limit |
test_pt_x | 100000 | xcoord |
test_pt_y | 100000 | ycoord |
tessedit_reject_doc_percent | 65 | %rej allowed before rej whole doc |
tessedit_reject_block_percent | 45 | %rej allowed before rej whole block |
tessedit_reject_row_percent | 40 | %rej allowed before rej whole row |
tessedit_whole_wd_rej_row_percent | 70 | Number of row rejects in whole word rejects which prevents whole row rejection |
tessedit_good_doc_still_rowrej_wd | 1.1 | rej good doc wd if more than this fraction rejected |
quality_rowrej_pc | 1.1 | good_quality_doc gte good char limit |
crunch_terrible_rating | 80 | crunch rating lt this |
crunch_poor_garbage_cert | -9 | crunch garbage cert lt this |
crunch_poor_garbage_rate | 60 | crunch garbage rating lt this |
crunch_pot_poor_rate | 40 | POTENTIAL crunch rating lt this |
crunch_pot_poor_cert | -8 | POTENTIAL crunch cert lt this |
crunch_del_rating | 60 | POTENTIAL crunch rating lt this |
crunch_del_cert | -10 | POTENTIAL crunch cert lt this |
crunch_del_min_ht | 0.7 | Del if word ht lt xht x this |
crunch_del_max_ht | 3 | Del if word ht gt xht x this |
crunch_del_min_width | 3 | Del if word width lt xht x this |
crunch_del_high_word | 1.5 | Del if word gt xht x this above bl |
crunch_del_low_word | 0.5 | Del if word gt xht x this below bl |
crunch_small_outlines_size | 0.6 | Small if lt xht x this |
fixsp_small_outlines_size | 0.28 | Small if lt xht x this |
superscript_worse_certainty | 2 | How many times worse certainty does a superscript position glyph need to be for us to try classifying it as a char with a different baseline? |
superscript_bettered_certainty | 0.97 | What reduction in badness do we think sufficient to choose a superscript over what we'd thought. For example, a value of 0.6 means we want to reduce badness of certainty by at least 40% |
superscript_scaledown_ratio | 0.4 | A superscript scaled down more than this is unbelievably small. For example, 0.3 means we expect the font size to be no smaller than 30% of the text line font size. |
subscript_max_y_top | 0.5 | Maximum top of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a subscript. |
superscript_min_y_bottom | 0.3 | Minimum bottom of a character measured as a multiple of x-height above the baseline for us to reconsider whether it's a superscript. |
suspect_rating_per_ch | 999.9 | Don't touch bad rating limit |
suspect_accept_rating | -999.9 | Accept good rating limit |
tessedit_lower_flip_hyphen | 1.5 | Aspect ratio dot/hyphen test |
tessedit_upper_flip_hyphen | 1.8 | Aspect ratio dot/hyphen test |
rej_whole_of_mostly_reject_word_fract | 0.85 | if >this fract |
min_orientation_margin | 7 | Min acceptable orientation margin |
textord_tabfind_vertical_text_ratio | 0.5 | Fraction of textlines deemed vertical to use vertical page mode |
textord_tabfind_aligned_gap_fraction | 0.75 | Fraction of height used as a minimum gap for aligned blobs. |
bestrate_pruning_factor | 2 | Multiplying factor of current best rate to prune other hypotheses |
segment_reward_script | 0.95 | Score multiplier for script consistency within a word. Being a 'reward' factor, it should be ≤ 1. Smaller value implies bigger reward. |
segment_reward_chartype | 0.97 | Score multiplier for char type consistency within a word. |
segment_reward_ngram_best_choice | 0.99 | Score multiplier for ngram permuter's best choice (only used in the Han script path). |
heuristic_segcost_rating_base | 1.25 | Base factor for adding segmentation cost into word rating. It's a multiplying factor, the larger the value above 1, the bigger the effect of segmentation cost. |
heuristic_weight_rating | 1 | weight associated with char rating in combined cost of state |
heuristic_weight_width | 1000 | weight associated with width evidence in combined cost of state |
heuristic_weight_seamcut | 0 | weight associated with seam cut in combined cost of state |
heuristic_max_char_wh_ratio | 2 | max char width-to-height ratio allowed in segmentation |
segsearch_max_fixed_pitch_char_wh_ratio | 2 | Maximum character width-to-height ratio for fixed-pitch fonts |
tosp_old_sp_kn_th_factor | 2 | Factor for defining space threshold in terms of space and kern sizes |
tosp_threshold_bias1 | 0 | how far between kern and space? |
tosp_threshold_bias2 | 0 | how far between kern and space? |
tosp_narrow_fraction | 0.3 | Fract of xheight for narrow |
tosp_narrow_aspect_ratio | 0.48 | narrow if w/h less than this |
tosp_wide_fraction | 0.52 | Fract of xheight for wide |
tosp_wide_aspect_ratio | 0 | wide if w/h less than this |
tosp_fuzzy_space_factor | 0.6 | Fract of xheight for fuzz sp |
tosp_fuzzy_space_factor1 | 0.5 | Fract of xheight for fuzz sp |
tosp_fuzzy_space_factor2 | 0.72 | Fract of xheight for fuzz sp |
tosp_gap_factor | 0.83 | gap ratio to flip sp->kern |
tosp_kern_gap_factor1 | 2 | gap ratio to flip kern->sp |
tosp_kern_gap_factor2 | 1.3 | gap ratio to flip kern->sp |
tosp_kern_gap_factor3 | 2.5 | gap ratio to flip kern->sp |
tosp_ignore_big_gaps | -1 | xht multiplier |
tosp_ignore_very_big_gaps | 3.5 | xht multiplier |
tosp_rep_space | 1.6 | rep gap multiplier for space |
tosp_enough_small_gaps | 0.65 | Fract of kerns reqd for isolated row stats |
tosp_table_kn_sp_ratio | 2.25 | Min difference of kn and sp in table |
tosp_table_xht_sp_ratio | 0.33 | Expect spaces bigger than this |
tosp_table_fuzzy_kn_sp_ratio | 3 | Fuzzy if less than this |
tosp_fuzzy_kn_fraction | 0.5 | New fuzzy kn alg |
tosp_fuzzy_sp_fraction | 0.5 | New fuzzy sp alg |
tosp_min_sane_kn_sp | 1.5 | Don't trust spaces less than this times kn |
tosp_init_guess_kn_mult | 2.2 | Thresh guess - mult kn by this |
tosp_init_guess_xht_mult | 0.28 | Thresh guess - mult xht by this |
tosp_max_sane_kn_thresh | 5 | Multiplier on kn to limit thresh |
tosp_flip_caution | 0 | Don't autoflip kn to sp when large separation |
tosp_large_kerning | 0.19 | Limit use of xht gap with large kns |
tosp_dont_fool_with_small_kerns | -1 | Limit use of xht gap with odd small kns |
tosp_near_lh_edge | 0 | Don't reduce box if the top left is non-blank |
tosp_silly_kn_sp_gap | 0.2 | Don't let sp minus kn get too small |
tosp_pass_wide_fuzz_sp_to_context | 0.75 | How wide fuzzies need context |
textord_blob_size_bigile | 95 | Percentile for large blobs |
textord_noise_area_ratio | 0.7 | Fraction of bounding box for noise |
textord_blob_size_smallile | 20 | Percentile for small blobs |
textord_initialx_ile | 0.75 | Ile of sizes for xheight guess |
textord_initialasc_ile | 0.9 | Ile of sizes for xheight guess |
textord_noise_sizelimit | 0.5 | Fraction of x for big t count |
textord_noise_normratio | 2 | Dot to norm ratio for deletion |
textord_noise_syfract | 0.2 | xh fract height error for norm blobs |
textord_noise_sxfract | 0.4 | xh fract width error for norm blobs |
textord_noise_hfract | 0.015625 | Height fraction to discard outlines as speckle noise |
textord_noise_rowratio | 6 | Dot to norm ratio for deletion |
textord_blshift_maxshift | 0 | Max baseline shift |
textord_blshift_xfraction | 9.99 | Min size of baseline shift |