ioBroker parser adapter
This adapter uses Sentry libraries to automatically report exceptions and code errors to the developers. For more details and for information on how to disable the error reporting, see the Sentry-Plugin Documentation. Sentry reporting is used starting with js-controller 3.0.
This adapter parses data received via a URL or from a file by using regular expressions. For each rule configured in the adapter settings, a state is created under parser.<instance number> and is filled and updated with the parsed information.
1. Default poll interval
This default poll interval is used if no individual poll interval is specified for an entry in the configuration table (column: "Interval"). The interval is given in milliseconds and defines how often the link or file is read and the states are updated.
Note: Do not use an overly aggressive poll interval, especially for website URLs. For example, if you want to retrieve the price of your shares from a certain website and you are not a day trader, an interval of 24 hours (= 86400000 ms) is probably sufficient. If you retrieve data from certain URLs too often, the website may ban you and put you on a server blacklist. So please use the poll interval with care.
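Because the interval fields expect milliseconds, a small conversion helper can prevent mistakes. The following is a minimal sketch; the function name `interval_ms` is hypothetical and not part of the adapter.

```python
def interval_ms(hours=0, minutes=0, seconds=0):
    """Convert a human-readable duration to milliseconds."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return int(total_seconds * 1000)


print(interval_ms(hours=24))   # 86400000, i.e. once per day
print(interval_ms(minutes=5))  # 300000
```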
2. Request timeout
Specify how long the adapter waits for an HTTP response when doing website queries.
3. Delay between requests
Specify how long the adapter waits between making HTTP requests when performing remote queries. Useful when retrieving data from slow hosts or over slow connections to avoid overloading either one. Zero (default) means no delay.
This delay is applied on a per-host basis. If remote queries are configured to fetch from multiple remote hosts, the hosts are queried in parallel.
The delay is the minimum time between the start of one request and the start of the next. That is, if a query takes longer to read than this delay, the next one starts as soon as the read completes.
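The timing rule can be illustrated with a small simulation. This is only a sketch of the behaviour described above, not the adapter's code: it assumes that requests to one host run one after another, and the function name `request_start_times` is hypothetical.

```python
def request_start_times(durations_ms, delay_ms):
    """Return the start time (in ms) of each request sent to one host.

    The next request starts once the delay has elapsed since the previous
    start AND the previous read has completed, whichever comes later.
    """
    starts = []
    t = 0
    for duration in durations_ms:
        starts.append(t)
        t += max(delay_ms, duration)
    return starts


# Three requests taking 200 ms, 1500 ms and 300 ms, with a 1000 ms delay.
print(request_start_times([200, 1500, 300], 1000))  # [0, 1000, 2500]
# With a delay of zero, each request starts when the previous read completes.
print(request_start_times([200, 1500, 300], 0))     # [0, 200, 1700]
```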
4. Accept invalid certificates
Specify whether self-signed/invalid SSL/TLS certificates are accepted or declined when doing HTTPS requests.
5. Use insecure HTTP parser
Specify whether to use an insecure HTTP parser that accepts invalid HTTP headers. This may allow interoperability with non-conformant HTTP implementations. Using the insecure parser should be avoided.
Click the "Plus" button to add a new entry to the table.
Performance Note: If you enter the same URL or file name in several table rows and the values in the "Interval" column are the same, the content of the URL or file is retrieved only once and cached for processing all rows with a matching URL/file name and interval. This lets you apply multiple regular expressions (one per table row) to a single URL or file without retrieving the data from the source multiple times.
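The caching rule amounts to grouping table rows by source and interval. A minimal sketch follows, assuming each row is a dictionary with the hypothetical keys `name`, `link` and `interval`; the URL and all state names except `Shares.Microsoft.Current` are made up for illustration.

```python
from collections import defaultdict


def group_rows(rows):
    """Group table rows by (source, interval) so each source is read once."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["link"], row["interval"])].append(row["name"])
    return dict(groups)


rows = [
    {"name": "Shares.Microsoft.Current", "link": "https://example.com/quotes", "interval": 86400000},
    {"name": "Shares.Microsoft.High", "link": "https://example.com/quotes", "interval": 86400000},
    {"name": "Shares.Microsoft.Fast", "link": "https://example.com/quotes", "interval": 60000},
]

groups = group_rows(rows)
for (link, interval), names in groups.items():
    print(link, interval, names)
```

The first two rows share one retrieval because both the URL and the interval match. The third row uses a different interval and is therefore retrieved separately.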
Name - name of the state that is created under parser.<instance number>. Spaces are not allowed. You can use dots "." as separators to create subfolders. Example: Shares.Microsoft.Current will result in parser.<instance number>.Shares.Microsoft.Current.
URL or file name - either the URL of a website or the path to a file from which we want to retrieve information. Examples:
https://darksky.net/forecast/48.1371,11.5754/si24/de (weather information for Munich), or
/opt/iobroker/test/testdata.txt (file from within ioBroker).
- RegEx - regular expression that defines how to extract the data from the link. A good service for testing regular expressions is regex101. E.g. temp swip">(-?\d+)˚< for the line above.
- Item (German: "Num") - a regex can find (match) multiple entries. With this option you can define which match to be chosen. 0 = first match, 1 = second match, 2 = third match, etc. Default is 0 (first match).
Role - one of the roles:
- custom - the user defines the role via admin
- temperature - the value is temperature
- value - the value is a number (e.g. dimmer)
- blinds - the value is a blind position
- switch - the value is switch position (true/false)
- button - the value is a button
- indicator - boolean indicator
- Type - the type of variable per the pull-down menu.
Unit - Optional: unit of the value added to the state entry. E.g.
- Old - If activated, the state will not be updated if the value cannot be read or found in the provided data (URL or file), so it keeps its former value in this case.
- Subs - Optional: substitute URL or file name. This substitute URL/filename will be used if the URL/file name of the first column is not available.
Factor/Offset (for "Type" numbers only) - allows you to modify the retrieved data before it is written to the state:
- calculated value = extracted value * factor + offset, so that the value can be adjusted immediately
- Interval - poll interval in ms (milliseconds). If blank or 0, the default poll interval will be used. Please see further information above.
| Name | URL or file name | RegEx | Role | Type | Unit | Interval |
Note: While applying regex to the retrieved URL/file data, all line breaks will be replaced with spaces to allow multi-line search.
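The way the "RegEx", "Item" and "Factor/Offset" columns interact with the line-break replacement can be sketched as follows. This is an illustration under assumptions, not the adapter's implementation: the function `parse_value` and the sample text are hypothetical, the pattern is assumed to contain one pair of parentheses, and returning `None` for a missing match is a choice made for this sketch.

```python
import re


def parse_value(text, pattern, item=0, factor=1.0, offset=0.0):
    """Extract the item-th match of pattern from text and scale it.

    Returns None when there is no such match.
    """
    # Replace line breaks with spaces so a pattern can span several lines.
    flat = re.sub(r"\r\n|\r|\n", " ", text)
    matches = [m.group(1) for m in re.finditer(pattern, flat)]
    if item >= len(matches):
        return None
    # calculated value = extracted value * factor + offset
    return float(matches[item]) * factor + offset


sample = "Sensor A:\n21 units\nSensor B:\n34 units"
pattern = r"Sensor \w: (\d+)"

print(parse_value(sample, pattern))                        # 21.0 (item 0, first match)
print(parse_value(sample, pattern, item=1))                # 34.0 (item 1, second match)
print(parse_value(sample, pattern, factor=0.5, offset=1))  # 11.5
print(parse_value(sample, pattern, item=5))                # None
```

Note that the pattern contains a literal space after the colon. It matches only because the line break in the sample text has been replaced with a space first.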
About Regular expressions (RegExp)
Regular expressions are a powerful tool for parsing strings. More importantly, they let you extract specific values or text from a given string (such as the HTML of a web page or the text of a file) by applying rules.
For boolean types, the regex is rather simple. For numeric types, you should mark the number with parentheses "()". E.g. to extract the number from "The temperature is 5°C", use the expression " (\d+)".
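The temperature example can be tried out directly. The sketch below uses Python's `re` module, which behaves the same as JavaScript regular expressions for this pattern. The boolean part is an assumption: it treats a boolean value as "the pattern matched somewhere in the text".

```python
import re

text = "The temperature is 5°C"

# Numeric type: the parentheses mark the part that becomes the value.
match = re.search(r" (\d+)", text)
value = int(match.group(1)) if match else None
print(value)  # 5

# Boolean type (assumption): true if the pattern is found at all.
matched = re.search(r"temperature", text) is not None
print(matched)  # True
```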
Further information on RegExp:
- .at matches any three-character string ending with "at", including "hat", "cat", and "bat".
- [hc]at matches "hat" and "cat".
- [^b]at matches all strings matched by .at except "bat".
- [^hc]at matches all strings matched by .at other than "hat" and "cat".
- ^[hc]at matches "hat" and "cat", but only at the beginning of the string or line.
- [hc]at$ matches "hat" and "cat", but only at the end of the string or line.
- \[.\] matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]" and "[b]".
- s.* matches s followed by zero or more characters, for example: "s" and "saw" and "seed".
- [hc]+at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
- [hc]?at matches "hat", "cat", and "at".
- [hc]*at matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", "at", and so on.
- cat|dog matches "cat" or "dog".
- (\d+) - get the number from string
- now (\w+) later - get the word between "now" and "later"
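Several of the patterns listed above can be checked with a few lines of code. This is a quick sketch using Python's `re` module; the helper `full` and the sample strings "now soon later" and "Price: 42 EUR" are made up for illustration.

```python
import re


def full(pattern, s):
    """True when the pattern matches the whole string."""
    return re.fullmatch(pattern, s) is not None


print(full(r".at", "hat"), full(r".at", "at"))              # True False
print(full(r"[^b]at", "cat"), full(r"[^b]at", "bat"))       # True False
print(full(r"[hc]+at", "cchchat"), full(r"[hc]+at", "at"))  # True False
print(full(r"[hc]?at", "at"))                               # True
print(full(r"\[.\]", "[a]"))                                # True

# Parentheses capture the part of the match that is of interest.
print(re.search(r"now (\w+) later", "now soon later").group(1))  # soon
print(re.search(r"(\d+)", "Price: 42 EUR").group(1))             # 42
```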
Other useful expressions
- (-?\d+) get number (both negative and positive numbers)
- [+-]?([0-9]+\.?[0-9]*|\.[0-9]+) get a number with decimal places (and . as decimal separator)
- [+-]?([0-9]+,?[0-9]*|,[0-9]+) get a number with decimal places (and , as decimal separator)
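The following sketch shows decimal extraction with both separators. It uses a variant of the patterns in which the decimal point is escaped and the fractional digits are optional (`[0-9]*`), and it reads the whole match rather than the parenthesized part so that the sign is kept. The helper `first_number` and the sample strings are hypothetical.

```python
import re

# "." as decimal separator (escaped so that it matches a literal dot only).
DECIMAL_POINT = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")
# "," as decimal separator.
DECIMAL_COMMA = re.compile(r"[+-]?(?:[0-9]+,?[0-9]*|,[0-9]+)")


def first_number(text, pattern, decimal_sep="."):
    """Return the first number found in text as a float, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Normalize the separator so float() can parse the value.
    return float(m.group(0).replace(decimal_sep, "."))


print(first_number("Outside: -3.5 degrees", DECIMAL_POINT))  # -3.5
print(first_number("Preis: 12,99 EUR", DECIMAL_COMMA, ","))  # 12.99
print(first_number("no digits here", DECIMAL_POINT))         # None
```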
Values can have quality codes:
- 0 - OK
- 0x82 - The URL or file cannot be read.
- 0x44 - Number or string value not found in the text
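A script that reads the states may want to translate these codes into text. A minimal sketch, assuming the quality code is available as an integer; the names `QUALITY_CODES` and `describe_quality` are hypothetical.

```python
# Quality codes as listed above (0x82 == 130, 0x44 == 68).
QUALITY_CODES = {
    0x00: "OK",
    0x82: "The URL or file cannot be read",
    0x44: "Number or string value not found in the text",
}


def describe_quality(q):
    """Return a readable description for a quality code."""
    return QUALITY_CODES.get(q, f"unknown quality code 0x{q:02x}")


print(describe_quality(0))     # OK
print(describe_quality(130))   # The URL or file cannot be read
print(describe_quality(0x44))  # Number or string value not found in the text
print(describe_quality(1))     # unknown quality code 0x01
```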
- General: ioBroker Forum. German-speaking users: see ioBroker forum thread Parser-Adapter.
- In case of any issues, please check out ioBroker Parser Adapter: GitHub Issues.
- (Apollon77) In error cases return error as string
- (raintonr) added delay option for slow connections
- (bluefox) added compact mode
- (Apollon77) Always use raw response and not try to parse it
- (Apollon77) Allow to specify if self-signed/invalid SSL certificates are ignored or not (default is to ignore as till now)
- (Apollon77) Allow to specify if an "insecure HTTP parser" is used which also enables HTTP implementations that are not compliant to specifications
- (Apollon77) Allow to specify the HTTP request timeout
- (Apollon77) Check that a link is configured
- (Apollon77) Fix potential crash cases reported by Sentry
- (Apollon77) Set method to "GET" when requesting URLs
- (Apollon77) Ignore objects without configuration for parser and log it
- (Apollon77) Fix crash case reported by Sentry
- (Apollon77) if regex did not match set defined replacement value (or null)
- (Apollon77) Fix initialization of new parser objects
- IMPORTANT: js-controller 2.0 is required at least now!
- (Apollon77) ignore self signed ssl certificates
- (Apollon77) make sure object changes do not block further updates of values
- (Apollon77) Add Sentry to get crash reports
- (bluefox) Comma will be replaced automatically by point for the offset and for the factor
- (bluefox) fix parser
- (bluefox) Multi-line search allowed
- (bluefox) Iterations in regex were corrected
- (bluefox) Added additional option: old value
- (bluefox) Allow set the number of found item
- (Apollon77) fix handling of multiple fields for one URL
- (bluefox) fix error with timestamp
- (bluefox) Add visual test
- (bluefox) move to common group
- (bluefox) initial commit
The MIT License (MIT)
Copyright (c) 2017-2022 bluefox email@example.com
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.