Parser Example: BasicIntegerParser
The BasicIntegerParser example parses a string of integers separated by non-numeric characters. For a version of this parser that uses continuous load, see Parser Example: ContinuousIntegerParser.
Loading and Using the Example
Load and use the BasicIntegerParser example as follows.
=> CREATE LIBRARY BasicIntegerParserLib AS '/home/dbadmin/BIP.so';
=> CREATE PARSER BasicIntegerParser AS
   LANGUAGE 'C++' NAME 'BasicIntegerParserFactory' LIBRARY BasicIntegerParserLib;
=> CREATE TABLE t (i integer);
=> COPY t FROM stdin WITH PARSER BasicIntegerParser();
0
1
2
3
4
5
\.
Implementation
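The BasicIntegerParser class implements only the process() method from the API. (It also implements a helper method for type conversion.) This method scans the input for runs of digits, treating any non-digit character as a separator, and writes each completed integer to the output. When it reaches the end of the buffer without finding a separator, it sets the input.offset marker to the start of the unfinished number and checks the input state: at end-of-file it emits the last integer, if any, and returns DONE; otherwise it returns INPUT_NEEDED to request more data.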
virtual StreamState process(ServerInterface &srvInterface,
                            DataBuffer &input,
                            InputState input_state) {
    // WARNING: This implementation is not trying for efficiency.
    // It is trying for simplicity, for demonstration purposes.

    size_t start = input.offset;
    const size_t end = input.size;

    do {
        bool found_newline = false;
        size_t numEnd = start;
        for (; numEnd < end; numEnd++) {
            if (input.buf[numEnd] < '0' || input.buf[numEnd] > '9') {
                found_newline = true;
                break;
            }
        }

        if (!found_newline) {
            input.offset = start;
            if (input_state == END_OF_FILE) {
                // If we're at end-of-file,
                // emit the last integer (if any) and return DONE.
                if (start != end) {
                    writer->setInt(0, strToInt(input.buf + start, input.buf + numEnd));
                    writer->next();
                }
                return DONE;
            } else {
                // Otherwise, we need more data.
                return INPUT_NEEDED;
            }
        }

        writer->setInt(0, strToInt(input.buf + start, input.buf + numEnd));
        writer->next();

        start = numEnd + 1;
    } while (true);
}
};
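To see how this scan behaves when a number straddles two chunks of input, the following self-contained sketch reproduces the control flow of process() without the SDK. Everything in it is a stand-in introduced for illustration: the Buffer struct, the two enums, scanIntegers(), digitsToLong(), and demoTwoChunks() are hypothetical names, a std::vector takes the place of the writer, and digitsToLong() is only a guess at what a helper such as strToInt() might do. It also assumes that the caller presents the unconsumed bytes again, starting at the recorded offset, on the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for the SDK types (illustration only).
enum StreamState { INPUT_NEEDED, DONE };
enum InputState { OK, END_OF_FILE };

struct Buffer {
    const char *buf;    // start of the chunk
    std::size_t size;   // number of valid bytes in the chunk
    std::size_t offset; // first byte not yet consumed
};

// Hypothetical conversion helper: converts the digits in [first, last).
// An empty range yields 0; the real helper may behave differently.
long digitsToLong(const char *first, const char *last) {
    long value = 0;
    for (; first != last; ++first) {
        value = value * 10 + (*first - '0');
    }
    return value;
}

// Same control flow as process(), appending to a vector instead of a writer.
StreamState scanIntegers(Buffer &input, InputState state, std::vector<long> &out) {
    assert(input.offset <= input.size);
    std::size_t start = input.offset;
    const std::size_t end = input.size;

    while (true) {
        // Advance numEnd past the current run of digits.
        std::size_t numEnd = start;
        while (numEnd < end && input.buf[numEnd] >= '0' && input.buf[numEnd] <= '9') {
            ++numEnd;
        }

        if (numEnd == end) {
            // No separator found: the trailing number may be incomplete.
            input.offset = start;
            if (state == END_OF_FILE) {
                if (start != end) {
                    out.push_back(digitsToLong(input.buf + start, input.buf + numEnd));
                }
                return DONE;
            }
            return INPUT_NEEDED;
        }

        out.push_back(digitsToLong(input.buf + start, input.buf + numEnd));
        start = numEnd + 1; // skip the separator
    }
}

// Feeds the text "12,34" in two chunks: "12,3" and then "4".
std::vector<long> demoTwoChunks() {
    std::vector<long> out;

    std::string chunk1 = "12,3";
    Buffer first = {chunk1.data(), chunk1.size(), 0};
    scanIntegers(first, OK, out);

    // Assumption: unconsumed bytes are re-presented ahead of the new data.
    std::string chunk2 = chunk1.substr(first.offset) + "4";
    Buffer second = {chunk2.data(), chunk2.size(), 0};
    scanIntegers(second, END_OF_FILE, out);

    return out;
}
```

In this sketch the first call emits 12, leaves offset at 3 (the start of the unfinished "3"), and returns INPUT_NEEDED. The second call, made at end-of-file, sees "34", emits it, and returns DONE. Two adjacent separators would hand the helper an empty range; the stand-in returns 0 for that case, which may not match the real helper.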
In the factory, the plan() method is a no-op; there are no parameters to check. The prepare() method instantiates the parser using the macro provided by the SDK:
virtual UDParser* prepare(ServerInterface &srvInterface,
                          PerColumnParamReader &perColumnParamReader,
                          PlanContext &planCtxt,
                          const SizedColumnTypes &returnType) {
    return vt_createFuncObject<BasicIntegerParser>(srvInterface.allocator);
}
The getParserReturnType() method declares the single output:
virtual void getParserReturnType(ServerInterface &srvInterface,
                                 PerColumnParamReader &perColumnParamReader,
                                 PlanContext &planCtxt,
                                 const SizedColumnTypes &argTypes,
                                 SizedColumnTypes &returnType) {
    // We only and always have a single integer column
    returnType.addInt(argTypes.getColumnName(0));
}
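The division of labor among the three factory methods can be mimicked outside the SDK with a toy model. The sketch below is not SDK code: ColumnList, ToyParser, ToyParserFactory, and declaredColumnFor() are hypothetical names, and std::unique_ptr stands in for the allocator-based object creation that the SDK macro performs. It only shows that planning does nothing, that the return type is a single integer column named after the first target column, and that prepare() hands back a fresh parser object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a list of typed columns (not an SDK type).
struct ColumnList {
    std::vector<std::string> intColumns;
    void addInt(const std::string &name) { intColumns.push_back(name); }
    const std::string &getColumnName(std::size_t i) const { return intColumns.at(i); }
};

// Hypothetical parser type; the real parser would implement process().
struct ToyParser {};

struct ToyParserFactory {
    // There are no parameters to validate, so planning is a no-op.
    void plan() {}

    // Declare exactly one integer output column, named after the
    // first column of the target table.
    void getReturnType(const ColumnList &argTypes, ColumnList &returnType) const {
        returnType.addInt(argTypes.getColumnName(0));
    }

    // Create the parser instance that will do the parsing.
    std::unique_ptr<ToyParser> prepare() const {
        return std::unique_ptr<ToyParser>(new ToyParser());
    }
};

// Returns the output column name the toy factory declares for a target column.
std::string declaredColumnFor(const std::string &target) {
    ColumnList argTypes;
    ColumnList returnType;
    argTypes.addInt(target);

    ToyParserFactory factory;
    factory.plan();
    factory.getReturnType(argTypes, returnType);
    return returnType.getColumnName(0);
}
```

For the table t (i integer) used earlier, this model declares a single output column named i.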
As for all UDxs written in C++, the example ends by registering its factory:
RegisterFactory(BasicIntegerParserFactory);