Usage

Installation

Run the following command in your terminal to install findanywhere:

pip install findanywhere

Installation is typically quick and does not require any additional setup.

Direct search

To search input.csv for the data specified in search_data.json, run the command below. Use --help to display further options. The help output adapts to the options already given: adding --source tabular, for instance, prints all options for tabular files.

findanywhere_search search_data.json input.csv \
--source tabular --threshold constant \
--threshold-constant 0.8 --similarity jaro_winkler
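
To run the same search from a Python script, the command can be assembled as an argument list and passed to subprocess. This is a minimal sketch that uses only the flags shown above; the helper name build_search_command is hypothetical and not part of findanywhere.

```python
import os
import shutil
import subprocess


def build_search_command(search_data, input_file, threshold=0.8):
    """Assemble the findanywhere_search call shown above as an argument list.

    Hypothetical helper for illustration; it is not part of findanywhere.
    """
    return [
        "findanywhere_search", search_data, input_file,
        "--source", "tabular",
        "--threshold", "constant",
        "--threshold-constant", str(threshold),
        "--similarity", "jaro_winkler",
    ]


command = build_search_command("search_data.json", "input.csv")

# Only run the search when the executable is installed and both files exist.
if shutil.which(command[0]) and all(os.path.exists(p) for p in command[1:3]):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.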

Setting Up Schema

Schemas let you save settings so that the same search can be run again later without further effort. Start by creating a schema for your project. The schema defines the parameters that the tool uses to search and analyze your data. Use the following command to create the schema:

findanywhere_schema tabular string_based_evaluation \
--threshold constant \
--out schema.yml

After creating the schema, customize it as needed. An example of a schema might look like this:

deduction:
  config: {}
  name: average
evaluation:
  config:
    aggregate: max
    similarity: token_best_fit_similarity
    similarity_parameter: {}
  name: string_based_evaluation
source:
  config:
    encoding: utf-8
    errors: surrogateescape
  name: tabular
threshold:
  config:
    constant: 0.9
  name: constant
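
As a rough intuition for how aggregate: max and a constant threshold might interact, the sketch below scores a search term against several candidate strings, keeps the best score, and compares it with the constant. This is an assumption about the general idea, not findanywhere's implementation: difflib.SequenceMatcher from the Python standard library stands in for token_best_fit_similarity, the comparison with >= is a guess, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; NOT findanywhere's token_best_fit_similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def passes_threshold(term, candidates, constant=0.9):
    """Aggregate per-candidate scores with max, then compare with a constant.

    Returns a (matched, best_score) tuple. Whether the real tool uses >= or >
    is an assumption made for this sketch.
    """
    best = max((similarity(term, c) for c in candidates), default=0.0)
    return best >= constant, best


matched, score = passes_threshold("Jon Smith", ["John Smith", "Jane Doe"])
print(matched, round(score, 2))
```

Raising the constant makes the search stricter, while lowering it admits more approximate matches.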

Processing Your Data

Once your schema is ready, you can run the tool on your datasets using the following command:

findanywhere schema.yml search_data.json input.csv --out result.json_line

When the run finishes, the filtered results are saved to a new file (result.json_line) in JSON Lines format.
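
Because a JSON Lines file holds one JSON document per line, the result file can be loaded with the Python standard library alone. The sketch below makes no assumption about which fields findanywhere writes; read_json_lines is a hypothetical helper name.

```python
import json
import os


def read_json_lines(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Print each result record if a result file from a previous run is present.
if os.path.exists("result.json_line"):
    for record in read_json_lines("result.json_line"):
        print(record)
```

Reading the file lazily, one line at a time, keeps memory use flat even for large result files.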

Running The Tool

Use the --out option followed by a file name to define where the results are stored. If no output file is specified, the results are printed to the console.

For a complete list of options, use the --help flag:

findanywhere --help