Filter syntax

http://docs.tokenscript.org/Filter.html only gives examples of the most common filters, such as locality=Sydney.

A filter is used for finding tokens or data objects (such as events, attestations and secrets).

We never formally discussed what the syntax should be. My default, unexamined assumption used to be "just follow RFC2254" (RFC2254: The String Representation of LDAP Search Filters). The reason is that, before blockchain was invented, attestations and tokens were mostly stored in a directory service such as X.500, which later became LDAP, and the related technologies carried over to it. (Our concept of Operational Attribute is also borrowed from there.)

RFC2254 filters look like this:

example                                   meaning
locality=Sydney                           All tokens whose locality is Sydney, sydney or SYDNEY (e.g. AirBnB token)
locality=*                                All tokens that have a locality attribute
locality=S*                               All tokens whose locality starts with S
dateOfBirth=20200202*                     Born on 2020-02-02
price<=20832                              The attribute 'price' is less than or equal to 20832
(&(locality=Sydney)(price<=20832))        (AirBnB token) located in Sydney and price is less than or equal to 20832
(|(locality=Sydney)(locality=Paramatta))  Located in either Sydney or Paramatta
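To make the table concrete, here is a minimal sketch of an evaluator for this prefix syntax. It is not the TokenScript implementation: the names `parse`, `evaluate` and `matches` are made up for illustration, a token is assumed to be a plain dict of attributes, `<=` and `>=` are assumed to compare numerically, bare filters such as locality=Sydney are accepted without the outer parentheses, and RFC2254 escape sequences are ignored.

```python
import re

ITEM = re.compile(r"(\w+)(<=|>=|=)(.*)")

def parse(text, pos=0):
    """Parse one parenthesised filter at text[pos]; return (node, next_pos)."""
    if text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    if text[pos] in ("&", "|", "!"):
        op = text[pos]
        pos += 1
        children = []
        while text[pos] == "(":
            child, pos = parse(text, pos)
            children.append(child)
        if text[pos] != ")" or not children or (op == "!" and len(children) != 1):
            raise ValueError(f"malformed '{op}' filter near position {pos}")
        return (op, children), pos + 1
    end = text.index(")", pos)  # simple item: attr, operator, value
    m = ITEM.fullmatch(text[pos:end])
    if m is None:
        raise ValueError(f"malformed item: {text[pos:end]!r}")
    return ("item",) + m.groups(), end + 1

def evaluate(node, token):
    """Evaluate a parsed filter against a dict of token attributes."""
    if node[0] == "item":
        _, attr, op, value = node
        if attr not in token:
            return False
        actual = token[attr]
        if op == "=":
            # '*' is a wildcard; matching is case-insensitive, as in the table.
            pattern = ".*".join(re.escape(part) for part in value.split("*"))
            return re.fullmatch(pattern, str(actual), re.IGNORECASE) is not None
        # Assumption: ordering operators compare numbers.
        if op == "<=":
            return float(actual) <= float(value)
        return float(actual) >= float(value)
    op, children = node
    results = [evaluate(child, token) for child in children]
    if op == "&":
        return all(results)
    if op == "|":
        return any(results)
    return not results[0]  # op == "!"

def matches(filter_text, token):
    text = filter_text.strip()
    if not text.startswith("("):
        text = f"({text})"  # accept the bare form used in the table
    node, end = parse(text)
    if end != len(text):
        raise ValueError("unexpected trailing input")
    return evaluate(node, token)

token = {"locality": "sydney", "price": 15000}
print(matches("(&(locality=Sydney)(price<=20832))", token))  # True
```
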

But there are shortcomings to RFC2254.

First, Polish notation is easy for smart contracts to parse, but is the parsing gas it saves really relevant to users? Probably not. Bitcoin used a form of reverse Polish notation that, I guess, no later blockchain adopted. RFC 2254 documents something created in an age when (&(locality=Sydney)(price<=20832)) wasn't considered outrageous, but most of today's programmers were born after the 90s.

If we wish to encode expressions in Polish or reverse Polish notation for smart contracts to parse, then the suitable place to do it seems to be inside the cheque/attestation code. That is, the TokenScript file uses an SQL-like expression such as locality=Sydney AND price<=20832, or a JavaScript-like expression such as locality == Sydney && price <= 20832, and it is translated to Polish notation when the cheque is encoded.
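A rough sketch of that translation step, assuming a small infix grammar that nobody has agreed on yet: comparisons with the attribute on the left, the keywords AND, OR and NOT (with && and || accepted as synonyms), and parentheses for grouping. The function names `tokenize` and `to_prefix` are hypothetical, and the output is the RFC2254-style prefix string rather than whatever binary encoding a cheque would actually use.

```python
import re

# One token: a parenthesis, a logical keyword/symbol, or "attr op value".
TOKEN = re.compile(
    r"\s*(\(|\)|AND\b|OR\b|NOT\b|&&|\|\||\w+\s*(?:<=|>=|==|=)\s*[^\s()&|]+)"
)
ITEM = re.compile(r"(\w+)\s*(<=|>=|==|=)\s*(\S+)")

def tokenize(expr):
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def to_prefix(expr):
    """Translate an infix filter into RFC2254-style prefix notation."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        parts = [parse_and()]
        while peek() in ("OR", "||"):
            take()
            parts.append(parse_and())
        return parts[0] if len(parts) == 1 else "(|" + "".join(parts) + ")"

    def parse_and():
        parts = [parse_not()]
        while peek() in ("AND", "&&"):
            take()
            parts.append(parse_not())
        return parts[0] if len(parts) == 1 else "(&" + "".join(parts) + ")"

    def parse_not():  # highest precedence
        if peek() == "NOT":
            take()
            return "(!" + parse_not() + ")"
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        m = ITEM.fullmatch(tok)
        if m is None:
            raise ValueError(f"expected a comparison, got {tok!r}")
        attr, op, value = m.groups()
        return f"({attr}{'=' if op == '==' else op}{value})"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result

print(to_prefix("locality == Sydney && price <= 20832"))
# (&(locality=Sydney)(price<=20832))
```

With this split, authors only ever see the infix form, and the prefix form stays an encoding detail.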

Second, there are unreasonable restrictions, such as the missing < operator. You can write price<=20832 or !(price>=20832), but you can't write price<20832. This restriction has already been removed in αW Android/iOS.
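For illustration, the workaround RFC2254 forces on authors can be hidden behind a small rewrite. This is only a sketch: `comparison_to_rfc2254` is a made-up name, and it assumes the attribute is present on every token being filtered, since a negated filter may treat a missing attribute differently from a strict comparison.

```python
def comparison_to_rfc2254(attr, op, value):
    """Express one comparison using only the operators RFC2254 offers."""
    negated = {"<": ">=", ">": "<="}
    if op in negated:
        # a < b is rewritten as NOT (a >= b), and a > b as NOT (a <= b)
        return f"(!({attr}{negated[op]}{value}))"
    if op in ("=", "<=", ">="):
        return f"({attr}{op}{value})"
    raise ValueError(f"unsupported operator: {op!r}")

print(comparison_to_rfc2254("price", "<", 20832))  # (!(price>=20832))
```
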


Since we are free to invent whatever makes our technology easier to use, suppose RFC2254 is irrelevant: what would you do?

  • Should it be more SQL-like? (so locality CONTAINS Sydney instead of locality=*Sydney*)
  • Should it be more JavaScript-like? (so locality==Sydney instead of locality=Sydney)
  • Or maybe Polish notation is actually not a burden for developers, and also helps them not to assume it works the same as JS or SQL?

A cunning developer might consider delegating the filtering to a function, but I think there are advantages to being language-agnostic and staying at the literal level instead of the implementation level, similar to how Google treats search expressions.

  1. A NOT operator can be helpful too.

  2. It doesn't matter whether the format is locality==Sydney or locality=Sydney, because we will not assign values in a filter.

  3. locality=*Sydney* is better than (CONTAINS Sydney) IMHO, because it lets us create more specific filters, like date=202009* to get all events for September 2020.

I think it's better to use infix notation (i.e. like JavaScript). It's easier to read and test for TokenScript authors, even if it's slightly harder to parse.

Could we still restrict attribute names to be on the left of each expression using binary operators and values on the right? Ethereum event log topics are a constraint we have to work with.
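To show why that shape matters, here is a hedged sketch of mapping such a filter onto an Ethereum log query. In the JSON-RPC log filter, topics are positional: each position is null (anything), a single value, or a list of alternatives (OR), and the positions are ANDed together. The function `filter_to_topics` and its inputs are hypothetical, the values are left as raw strings, and a real implementation would have to encode each value as a 32-byte topic.

```python
def filter_to_topics(groups, indexed_attrs, signature_topic):
    """Map equality constraints onto a positional topics list.

    groups: {attribute: [accepted values]}; OR within an attribute,
            AND across attributes.
    indexed_attrs: the event's indexed parameters, in declaration order.
    signature_topic: topic 0, identifying the event itself.
    """
    if len(indexed_attrs) > 3:
        raise ValueError("an event has at most three indexed parameters")
    topics = [signature_topic] + [None] * len(indexed_attrs)
    for attr, values in groups.items():
        if attr not in indexed_attrs:
            raise ValueError(f"{attr!r} is not an indexed parameter")
        if not values:
            raise ValueError(f"no accepted values given for {attr!r}")
        if any("*" in str(v) for v in values):
            raise ValueError("wildcards cannot be expressed as topics")
        slot = indexed_attrs.index(attr) + 1
        # NOTE: raw values are placeholders; real topics are 32-byte words.
        topics[slot] = values[0] if len(values) == 1 else list(values)
    while topics[-1] is None:  # trailing "match anything" slots can be dropped
        topics.pop()
    return topics

print(filter_to_topics({"locality": ["Sydney", "Paramatta"]},
                       ["locality", "price"], "0xSIG"))
# ['0xSIG', ['Sydney', 'Paramatta']]
```

Only attribute-equals-value terms survive this mapping. Wildcards and ordering comparisons such as price<=20832 would have to be applied client-side after the logs are fetched.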

Also, different systems support different operators, so naturally it'd be better to support a minimal set (though supporting both < and > makes sense).