Usage
Installation
Run the following command in your terminal to install findanywhere:
pip install findanywhere
Installation is typically quick and does not require any additional setup.
Direct search
Use --help to display further options. The help output adapts to the choices already made; passing --source tabular, for instance, prints all options for tabular files. For example, searching input.csv for the data specified in search_data.json can be done with:
findanywhere_search search_data.json input.csv \
--source tabular --threshold constant \
--threshold-constant 0.8 --similarity jaro_winkler
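To give a feel for what --similarity jaro_winkler combined with --threshold-constant 0.8 means, here is a minimal Python sketch of the standard Jaro-Winkler similarity with a constant cutoff. It illustrates the general technique and is not findanywhere's own implementation. The function names are hypothetical, and treating a score equal to the threshold as a pass is an assumption.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity, a value between 0.0 and 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only if their positions
    # are no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are
    # counted as half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def passes_threshold(score: float, constant: float = 0.8) -> bool:
    """Assumption: a candidate is kept when its score reaches the constant."""
    return score >= constant


score = jaro_winkler("MARTHA", "MARHTA")
print(round(score, 4), passes_threshold(score))
```

With a constant threshold of 0.8, near matches such as transposed letters are kept, while unrelated strings score close to zero and are dropped.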
Setting Up Schema
Schemas let you save settings so that the same search can be run again later without further effort. Start by creating a schema for your project. The schema defines the parameters that the tool uses to search and analyze your data. Use the following command to create the schema:
findanywhere_schema tabular string_based_evaluation \
--threshold constant \
--out schema.yml
After creating the schema, customize it as needed. An example of a schema might look like this:
deduction:
  config: {}
  name: average
evaluation:
  config:
    aggregate: max
    similarity: token_best_fit_similarity
    similarity_parameter: {}
  name: string_based_evaluation
source:
  config:
    encoding: utf-8
    errors: surrogateescape
  name: tabular
threshold:
  config:
    constant: 0.9
  name: constant
Processing Your Data
Once your schema is ready, you can run the tool on your datasets using the following command:
findanywhere schema.yml search_data.json input.csv --out result.json_line
When the run finishes, the filtered results are saved to a new file (result.json_line) in JSON Lines format, with one JSON value per line.
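Because the result file holds one JSON value per line, it can be loaded with the Python standard library alone. The sketch below shows one way to do that. The helper names are hypothetical, and the sample records are placeholders for illustration only; they do not reflect the fields findanywhere actually writes.

```python
import json
from typing import Any, Iterable, Iterator, List


def parse_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_json_lines(path: str) -> List[Any]:
    """Read a whole JSON Lines file, e.g. result.json_line, into a list."""
    with open(path, encoding="utf-8") as handle:
        return list(parse_json_lines(handle))


# Placeholder records, NOT findanywhere's real output schema.
sample = ['{"id": 1, "score": 0.93}', "", '{"id": 2, "score": 0.81}']
records = list(parse_json_lines(sample))
print(len(records))
```

Against a real run, `read_json_lines("result.json_line")` would return one entry per result line; inspect the first entry to see which fields are present.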
Running The Tool
Use the --out option followed by a file name to define where the results are stored. If no output file is specified, the results are printed to the console.
For a complete list of options, use the --help flag:
findanywhere --help
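Since results go to the console when --out is omitted, a script can capture them directly. The following is a minimal sketch of doing so from Python with the standard subprocess module. It uses only the positional arguments and the --out option shown above; `build_command` and `run_search` are hypothetical helper names, and the sketch assumes the findanywhere executable is on the PATH.

```python
import subprocess
from typing import List, Optional


def build_command(
    schema: str, search_data: str, input_file: str, out: Optional[str] = None
) -> List[str]:
    """Assemble the argument list for a schema-based findanywhere run."""
    command = ["findanywhere", schema, search_data, input_file]
    if out is not None:
        # Only add --out when a result file is wanted.
        command += ["--out", out]
    return command


def run_search(schema: str, search_data: str, input_file: str) -> str:
    """Run the search without --out and return what was printed to stdout."""
    completed = subprocess.run(
        build_command(schema, search_data, input_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


print(build_command("schema.yml", "search_data.json", "input.csv"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.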