For brevity, in the documentation below we assume that you are using an alias:
alias seesv="sh /path/to/seesv/seesv.sh"
The examples here show command line use. If running interactively through the RAMADDA web interface the commands take the same form except for the seesv command call and the specification of the input file. So, for example, if you are calling the tool from the command line as:
seesv -columns column1,column2 -p file.csv > output.csv
The corresponding command interactively would be:
-columns column1,column2
seesv <a number of processing commands> <some output command> file.csv > output.csv
#or
seesv <a number of processing commands> <some output command> -o output.csv file.csv

#To show help:
seesv -help

#To show help on a topic:
seesv -help:some_topic

#With no processing commands and just printing the header:
seesv -printheader file.csv
#Or a shortcut:
seesv -ph file.csv

#With no processing commands and csv output:
seesv -p file.csv > output.csv

#With no processing commands and a record oriented output:
seesv -record file.csv > output.csv

#With the -columns command printing csv output:
seesv -columns column1,column2 -p file.csv > output.csv
column 1, Column 2, Some "funny" column ...
Then the column ids are:
column_1,column_2,some_funny_column
You can always use the -ids command to list the ids:
seesv -ids -printheader test.csv
#### selecting columns

#Subset test.csv and only include column1 and column2
seesv -columns column1,column2 -p test.csv > newtest.csv

#You can also use numeric indices - zero based
seesv -columns 0,1 -p test.csv > newtest.csv

#Specify a range of columns
#either using column names:
seesv -columns column1-columnN -p test.csv > newtest.csv
#or indices:
seesv -columns 1-5,6,7,10-15 -p test.csv > newtest.csv
#or mix and match numeric and names
seesv -columns 0-1,columnA,columnB,columnF-columnZ -p test.csv > newtest.csv

#Specify a step:
#seesv -columns start-end:step
#e.g. below gives 0,3,6,9,12,15
seesv -columns 0-15:3

#include all columns except column1
seesv -notcolumns column1 -p test.csv > newtest.csv

#subset columns and then change the values
seesv -columns column1,column2 -change column1,column2 "pattern" "new value" -p test.csv > newtest.csv

#Only include column1 and then only pass through unique values matching exactly
seesv -columns column1 -unique 0 exact -p test.csv > newtest.csv

#Only pass through unique values in the first column and then sort the output
seesv -columns column1 -unique 0 -sort column1 -p test.csv > newtest.csv
nameXXXvalue1XXXXXvalue2XXXXX
foo    1          2
bar    3          4
You would set the -widths to be:
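A plausible invocation (a sketch, assuming the widths are counted from the header line above - 7, 11, and 11 characters - and the file is test.txt):
seesv -widths 7,11,11 -p test.txt > test.csv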
{ "p1": { "p2": { "values": [ 1,2,3,4 ] } } }The command would be:
1 |
2 |
3 |
4 |
{ "p1": [ {"foo":"bar"}, { "p2": { "values": [ "1","2","3","4" ] } } ] }The command would be:
{ "p1": { "p2": { "values": [ { "name":"jim", "value":5 }, { "name":"jill", "value":10 } ] } } }The command is:
name | value |
jim | 5 |
jill | 10 |
{ "p1": { "p2": { "values": [ { "foo":"bar", "goodvalues": { "name":"jim", "value":5 } }, { "foo":"bar", "goodvalues": { "name":"jill", "value":10 } }, ] } } }You would use the second argument to specify an inner path to the object of interest:
{ "p1": { "p2": { "values": [ { "foo":"bar", "goodvalues": { "name":"jim", "value":5 }, "moregoodvalues": { "age":74, } }, { "foo":"bar", "goodvalues": { "name":"jill", "value":10 }, "moregoodvalues": { "age":80, } }, ] } } }Then the second argument can be a comma separated list of path specifications:
seesv -json "p1.p2.values" "goodvalues,moregoodvalues"Resulting in:
age | name | value |
74 | jim | 5 |
80 | jill | 10 |
export RAMADDA_TABULA=/path/to/tabula.sh
The properties are defined as:
Assume you have a file test.txt
Hello there how are you
Person: name: Jim age: 42
and some more text
Person: name: Joe age: 53
Person: name: Sally age: 93
and more
We are going to extract the names and ages with:
Assume you have a file test.txt
Person: name: Joe age: 53
Person: name: Sally age: 93
Extract the data with:
export seesv_db_<db>_url=JDBC URL
export seesv_db_<db>_tables=List of tables that can be accessed
So for the db "test" in a Derby database:
export seesv_db_test_url=jdbc:derby:/Users/home/derby/db1;create=true;
export seesv_db_test_tables=test_table1,test_table2,test_table3
name | value |
example name | example value |
example name | example value |
example name | example value |
example name | example value |
example name | example value |
name | value |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
<blank> | <blank> |
name1 | value1 |
foo | 1 |
bar | 2 |
name2 | value2 | value3 |
xxx | 1 | 3 |
yyy | 2 | 7 |
name1 | value1 | name2 | value2 | value3 |
foo | 1 | xxx | 1 | 3 |
bar | 2 | yyy | 2 | 7 |
name | value |
foo | bar |
name | value |
x | y |
name | value |
l | m |
name | value |
foo | bar |
x | y |
l | m |
value:joey smith 95 joe smith
value:sam jones 44 joe smith 42 joey smith
value:sam jones 44 joe smith 42 joey smith
value:sue washington 43 joe smith 42 joey smith 35 sam jones
value:sue washington 43 joe smith 42 joey smith 35 sam jones
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
name | value1 | value2 |
bill | doe | 4 |
bill | doe | 4 |
precinct |
p1 |
p2 |
p3 |
name | voter precinct |
jim | p1 |
joe | p1 |
sally | p7 |
... |
For example, say you have a file of addresses short.csv:
address | city | state |
123 oak | boulder | co |
345 main | denver | co |
... |
full address | city | state |
123 oak unit #5 | | |
789 pine apt 3 | | |
345 main | | |
... |
^123 oak.*
^345 main.*
...
And then only passes through the rows in full.csv that match one of these patterns.
The -ifnotmatchesfile command does the opposite: for a row to pass through, none of the patterns can match.
name | value |
joe | 5 |
jill | 2 |
jack | 1 |
jim | 7 |
name | value | name | value |
joe | 5 | jill | 2 |
jack | 1 | jim | 7 |
category | category | category |
fruit | vegetable | meat |
1 | 2 | 3 |
4 | 5 | 6 |
category - fruit | category - vegetable | category - meat |
1 | 2 | 3 |
4 | 5 | 6 |
letter |
a |
b |
c |
d |
number |
1 |
2 |
3 |
letter | number |
a | 1 |
a | 2 |
a | 3 |
b | 1 |
b | 2 |
b | 3 |
c | 1 |
c | 2 |
c | 3 |
d | 1 |
d | 2 |
d | 3 |
label |
xxxsome valueyyy |
label |
label:some value |
pattern1::replace_string1
pattern2::replace_string2
e.g.:
(?i).*foo.*::foobar
.*value.*::new value
name | value1 | value2 |
jane | foo | 1 |
jane | bar | 2 |
john | doe | 3 |
bill | doe | 4 |
bill | doe | 4 |
name | value1 | value2 |
jane | foo | 1 |
<blank> | bar | 2 |
john | doe | 3 |
bill | <blank> | 4 |
<blank> | <blank> | 4 |
export NUMLOOKUPAPI_API_KEY=
or
export NUMVERIFY_API_KEY=
This command is only available when running from the command line or, if running within RAMADDA, when the user is logged in. We do this because the SeeSV service within RAMADDA can be run by any user.
A cache file, ismobile.txt, of past phone numbers will be created in the local working directory so this command can be run multiple times without repeating the API calls for phone numbers it has already looked up.
To enable this command create an account at Twilio and access your API keys. Add the following environment variables:
export TWILIO_ACCOUNT_SID=
export TWILIO_AUTH_TOKEN=
export TWILIO_PHONE=
This command takes a column ID which holds the phone number, a campaign identifier, and a message template. The campaign identifier is used to keep a list of phone numbers that the message has already been sent to. For example, if you first run:
Some header label with some punctuation-+/ | Some other header label |
... |
some_header_label_with_some_punctuation | some_other_header_label |
... |
firstname fullname lastname name namewithmiddle prefix suffix title username address city country state stateabbr streetname timezone zipcode latitude longitude countrycode boolean asin ean13 ean8 gtin13 gtin8 imei isbn10 isbn13 isbngroup isbngs1 isbnregistrant color department material price productname promotioncode demonym educationalattainment maritalstatus race sex bic creditcard iban ssn digit digits:number_of_digits numberbetween:first:last randomdigit randomdigitnotzero randomDouble:maxNumberOfDecimals:min:max randomnumber cellphone phonenumber diseasename hospitalname medicinename symptoms
v2 = (value + offset1) * scale + offset2
You can specify random values for any of the arguments with:
#Generate a random number between 0-1
random:
#Generate a random number between 10 and 11
random:10
#Generate a random number between 10 and 30
random:10:30
For example, if you had a latitude value that you wanted to obfuscate by adding some random value between 1 and 2 you would do:
-scale latitude random:1:2 1 0
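With these arguments the formula becomes v2 = (latitude + random(1,2)) * 1 + 0, i.e., each latitude is shifted by a random amount between 1 and 2.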
The num_random_digits argument gives the number of random digits to generate.
If the num_places argument is <= zero then that is the number of decimal places
the number is truncated to. If there are fewer than num_places decimals then
those decimals are converted to "0". The randomized digits are then appended
to the number. For example, with R = random digit:
12345.6789 -> 12345.67RR
145.678 -> 145.67RR
9.0 -> 9.RR
If num_places > 0 then the number is converted to an integer and the num_places
least significant digits are converted to 0. If the number has fewer digits than
num_places then the number is converted to 0. The randomized digits are then
added to the number. For example, with R = random digit:
12345.678 -> 123RR
145.678 -> 1RR
1.678 -> RR
Region | Date | Units | Sales |
West | 2016 | 1 | 11.00 |
South | 2016 | 8 | 96.00 |
West | 2016 | 2 | 26.00 |
North | 2016 | 7 | 84.00 |
North | 2016 | 8 | 104.00 |
South | 2016 | 2 | 22.00 |
Region | 2016 | 2017 | 2018 | 2019 |
West | 15.0 | 51.0 | 49.0 | 42.0 |
South | 25.0 | 23.0 | 31.0 | 14.0 |
North | 15.0 | 27.0 | 56.0 | 39.0 |
East | 10.0 | 24.0 | 18.0 | 13.0 |
Region | 2016 - count | 2016 - min | 2016 - max | 2017 - count | 2017 - min | 2017 - max | 2018 - count | 2018 - min | 2018 - max | 2019 - count | 2019 - min | 2019 - max |
West | 15.0 | 11.0 | 96.0 | 51.0 | 11.0 | 104.0 | 49.0 | 11.0 | 112.0 | 42.0 | 13.0 | 120.0 |
South | 25.0 | 12.0 | 96.0 | 23.0 | 11.0 | 104.0 | 31.0 | 11.0 | 120.0 | 14.0 | 12.0 | 112.0 |
North | 15.0 | 12.0 | 104.0 | 27.0 | 12.0 | 96.0 | 56.0 | 11.0 | 120.0 | 39.0 | 13.0 | 120.0 |
East | 10.0 | 24.0 | 96.0 | 24.0 | 11.0 | 104.0 | 18.0 | 11.0 | 112.0 | 13.0 | 12.0 | 105.0 |
Region | Color | Units | Sales |
West | Red | 1 | 11.00 |
South | Blue | 8 | 96.00 |
West | Green | 2 | 26.00 |
North | Blue | 7 | 84.00 |
North | Green | 8 | 104.00 |
South | Red | 2 | 22.00 |
East | Blue | 5 | 60.00 |
West | Green | 2 | 26.00 |
Region | Units sum |
West | 683.0 |
South | 399.0 |
North | 672.0 |
East | 294.0 |
Region | Sales sum | Sales avg |
West | 8873.0 | 56.51592356687898 |
South | 5044.0 | 54.236559139784944 |
North | 8709.0 | 63.56934306569343 |
East | 3730.0 | 57.38461538461539 |
Region | Sales sum | Sales avg | Color |
West | 8873.0 | 56.51592356687898 | Red |
South | 5044.0 | 54.236559139784944 | Blue |
North | 8709.0 | 63.56934306569343 | Blue |
East | 3730.0 | 57.38461538461539 | Blue |
bins="18,30,40,50,60,70"This results in bins of:
0-18 18-30 30-40 40-50 etc.You can also specify a range with an optional step value. For example this:
bins="18-30"Will give you the bins:
18-19 19-20 20-21 ... 29-30 30-infinityOr you can specify a step
bins="18-30:2"Will give you the bins:
18-20 20-22 22-24 ...
To use Google geocoding set the environment variable:
GOOGLE_API_KEY=...
To use geocod.io set the environment variable:
GEOCIDEIO_API_KEY=...
For here.com set:
HERE_API_KEY=
To geocode, e.g., with the following file:
city | state |
Boulder | CO |
Laramie | WY |
Buffalo | NY |
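A sketch of the call, assuming the same three arguments (columns, prefix, suffix) as the internal-table examples below and an input file cities.csv:
seesv -geocode city,state "" "" -p cities.csv > geocoded.csv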
To use internal location tables set the prefix argument to one of:
seesv -geocode columns "county:" "" ...
seesv -geocode columns "state:" "" ...
seesv -geocode columns "zip:" "" ...
seesv -geocode columns "country:" "" ...
city |
Boulder |
Denver |
Grand Junction |
Pueblo |
precinct | latitude | longitude |
4171107648 | 40.197055 | -105.100351 |
2171207308 | 39.996986 | -105.084938 |
2173307410 | 40.039912 | -105.081393 |
name | precinct |
John Doe | 4171107648 |
Jane Doe | 4171107648 |
Jim Smith | 2171207308 |
latitude | longitude |
40° 1' 12 N | 107° 40' 10 W |
40:1:12 | 107:40:10 |
40:1:12 N | 107:40:10 W |
40:1 | 107:40 |
latitude | longitude |
40.0 | -107.0 |
40.019999999999996 | 107.66944444444445 |
40.019999999999996 | -107.66944444444445 |
40.016666666666666 | 107.66666666666667 |
latitude | longitude |
40.0 | -107.0 |
40.02 | 107.66944 |
40.02 | -107.66944 |
40.01667 | 107.66667 |
name | latitude | longitude |
City 1 | 40.197055 | -105.100351 |
City 2 | 39.996986 | -105.084938 |
City 3 | 40.039912 | -105.081393 |
name | latitude | longitude |
City 1 | 40.197055 | -105.100351 |
City 2 | 39.996986 | -105.084938 |
City 3 | 40.039912 | -105.081393 |
Alaska=West
AK=West
Alabama=South
AL=South
Arkansas=South
...
So with the following data:
name
Wyoming
California
Colorado
The region is added with:
PRECISELY_API_KEY=
Call:
some_really_long_field_name |
1 |
2 |
seesv_ext_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. The executable is called with the given arguments. Each row that is processed is converted into a CSV string which is passed to the external command on its stdin. A CSV text string is then read back from the external command. This line is then passed on in the processing flow. If you want to skip a line then return the string "_null_".
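As a sketch, a minimal filter script satisfying this protocol might look like the following (upper.sh is hypothetical; it upper-cases each incoming CSV line and drops rows containing "skipme"):
#!/bin/sh
# Hypothetical SeeSV external filter.
# Reads one CSV line per row on stdin, writes the transformed CSV line to stdout.
# Writing "_null_" tells SeeSV to drop the row from the flow.
while read -r line; do
  case "$line" in
    *skipme*) echo "_null_" ;;
    *) echo "$line" | tr '[:lower:]' '[:upper:]' ;;
  esac
done
It would be registered with export seesv_ext_upper=/path/to/upper.sh and invoked by its id, by analogy with the -exec command below (the exact flag name is an assumption).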
seesv_exec_<id>=/path/to/executable
Any command line arguments given up to but not including an argument that equals "-" are passed in to the executable. For every row the executable is called with the given arguments. Each of the arguments can contain any number of macros of the form:
seesv -exec <id> ${url} ${name}
The macros get replaced with the corresponding values in the given row.
So for example, if you have a list of urls that you want to fetch using cURL, set the environment variable:
export seesv_exec_curl=/usr/bin/curl
Then assuming you have the file:
url
url1.html
url2.html
Call the command:
seesv -exec curl ${url} - -p
This produces a new file:
url,result
url1.html,contents of url1
url2.html,contents of url2
name | value |
joe | 5 |
jill | 2 |
bill | 7 |
PREFIX
name:joe value:5
ROW DELIMITER
name:jill value:2
ROW DELIMITER
name:bill value:7
SUFFIX
Note: the row delimiter is only printed between rows.
If you wanted to generate custom XML, for example, call:
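A sketch, assuming a template-style command whose arguments mirror the PREFIX / row template / ROW DELIMITER / SUFFIX structure above (the -template name and exact signature here are illustrative, not confirmed):
seesv -template "<mytags>" "<tag name=\"${name}\" value=\"${value}\"></tag>" "" "</mytags>" test.csv > test.xml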
<mytags>
  <tag name="joe" value="5"></tag>
  <tag name="jill" value="2"></tag>
  <tag name="bill" value="7"></tag>
</mytags>
The format is:
Each min;max;number triple gives the min/max range for that column value and the number of steps the range is divided into.
The output file argument is a template used to create the destination output files. You can use ${ikey} for the indices or ${vkey} for the values.
For example, you can use this to do geographic tiling of a set of input data that contains latitude and longitude. If your input.csv file has latitude and longitude columns, the resulting output files would look like:
grid116_102.csv
grid116_103.csv
grid119_84.csv
grid119_85.csv
grid119_86.csv
...
The grid${ikey}.csv is the output file name macro where ${ikey} is made up of the indices. Using grid${vkey}.csv would give the following file names based on the values of the ranges:
grid30_31_-92_-91.csv
grid39_40_-76_-75.csv
grid38_39_-76_-75.csv
...
#fields=field1[prop1=value1 prop2=value2 ...],field2[],...,fieldN[propn=valuen ...]
v1,v2,v3
...
The addheader command is run as below passing in one set of name/value arguments. If the value has spaces then group the text with {...} brackets.
For example, say you have a file like this:
some date | year | some field |
2018-09-01 | 2023 | 10 |
2019-10-01 | 2024 | 20 |
2020-09-01 | 2024 | 30 |
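One way this header might be generated (a sketch; the property names follow the name/value form described above, with {...} grouping values that contain spaces):
seesv -addheader "some_date.label {Some Date} some_date.type date some_date.format yyyy-MM-dd year.label Year year.type date year.format yyyy some_field.label {Some Field} some_field.type integer some_field.chartable true" -p test.csv > withheader.csv
producing: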
#fields=some_date[label="Some Date" type="date" format="yyyy-MM-dd" ],year[label="Year" type="date" format="yyyy" ],temperature[label="Some Field" type="integer" chartable="true" ]
2018-09-01,2023,10
2019-10-01,2024,20
This specifies:
value1,value2,value3
10,20,30
You can specify the type as "value.*.type integer".
Adding a grouping to the pattern can be used to pull out text from the field name and use it in the value. For example, below would set the label:
The -db command takes the form:
-db "name value pairs" e.g.: -db " table.id <new id> table.name <new name> table.cansearch false table.canlist false table.icon <icon>, e.g., /db/database.png <column>.id <new id for column> <column>.label <new label> lt;column>.type <string|enumeration|double|int|date> <column>.format <yyyy MM dd HH mm ss format for dates> <column>.canlist false <column>.cansearch false install <true|false install the new db table> nukedb <true|false> careful! this deletes any prior created dbs yesreallynukethewholedb true - this double checks "In the interactive Convert Data SeeSV form if you press "Table" then the schema will be shown. Clicking on the column links brings up a popup menu that sets the id, label, type, etc for the column.
If you have the install flag set to true then when you press "Process" the db.xml file will be generated and the plugin will be loaded into RAMADDA. You can then add a new entry with the Database name from the "Pick a type..." menu.
If you have the nukedb and yesreallynukethewholedb flags set to true then when you press Process the database table will first be dropped and then recreated. This capability exists so that you can iteratively change the types of the columns. Note: any database entries and data that have been loaded will be removed.
id,label,type,prop1,value1,prop2,value2,...
e.g.:
#A String value
some_field,Some Field,string

#An integer value
some_number,Some Integer Column,int,default=30,showinhtml,false

#A double value
some_double,Some Double Number,double,cansearch,false

#An enumeration. Use \ to escape the commas
fruit,Fruit,enumeration,values,banana\,apple\,orange

#An enumeration that new values can also be added to
fruit_plus,Fruit Plus,enumerationplus,values,banana\,apple\,orange

#Use id:label in the values
sport,Sport,enumeration,values,baseball:Baseball\,football:Football\,hockey:Hockey

The id will be the column identifier in the database so it should be lower case. Press "Outputs" -> "Print text output" to generate the mytypes.xml file. Download it and copy it to your RAMADDA Home/plugins directory and restart your RAMADDA. The types.xml has a default wiki text that you can change.