I’ve started a side project: a “smart” chatterbot that should be able to respond to commands from other users on IRC (Internet Relay Chat).
The first hurdle is to make the bot connect to a network and stay connected. To do that, it needs to understand the IRC message format. The protocol was first specified in RFC 1459 and was later split into separate specifications: RFC 2810 (architecture), RFC 2811 (channel), RFC 2812 (client) and RFC 2813 (server).
All code in this article is available from the SmartAss GitHub repository under a very permissive license (MIT). The actual IRC code is in its own library. This IRC library is designed like an engine. There is no connection management. The user program is expected to provide the inputs and outputs (from the network, a file or whatever). Most communication with this library happens through events and callbacks.
To begin with, if you have some knowledge of IRC and the protocol, you can start with the IRC client document (RFC 2812) for making a client.
But before we get there, we must be able to break down each message from the server and make something meaningful out of it. To do that, we need a parser.
The specification for the IRC protocol says to extract data from a continuous stream of octets, in which messages are separated by carriage return + newline (\r\n).
In other words, from an incoming buffer of characters, grab each line and parse it. Sounds easy, and it is.
When receiving data from a socket, you get some bytes and their length. There is no guarantee that those bytes hold one full message, several messages, or any complete message at all. For brevity, we’ll assume for now that every read contains only complete messages.
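That assumption does not hold on a real socket, so here is a minimal sketch of how the leftover bytes could be kept between reads. The `LineBuffer` class and its method names are my own invention for illustration and are not part of the library described in this article.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the SmartAss library): accumulates raw bytes
// and hands back only the lines that are already terminated by "\r\n".
class LineBuffer {
public:
    // Append a chunk exactly as it came from the socket.
    void append(const char * data, std::size_t length) {
        m_pending.append(data, length);
    }

    // Remove and return every complete line, without the trailing "\r\n".
    // An unfinished tail stays buffered until the rest of it arrives.
    std::vector<std::string> takeLines() {
        static const std::string crlf = "\r\n";
        std::vector<std::string> lines;
        std::size_t start = 0;
        std::size_t end = 0;

        while ((end = m_pending.find(crlf, start)) != std::string::npos) {
            lines.push_back(m_pending.substr(start, end - start));
            start = end + crlf.size();
        }

        // Drop everything that was handed out, keep the partial remainder.
        m_pending.erase(0, start);
        return lines;
    }

private:
    std::string m_pending;
};
```

The idea is to call `append` after every read and then `takeLines`, so a message that arrives split over two reads only shows up once its terminating \r\n has been seen.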
Some libraries contain a string split method, but most of them split on any one of the characters in a delimiter string. That is not what we want. We want to split strings by \r\n, and only \r\n, as the IRC specification tells us. It’s not too difficult to roll our own. Put this in a header somewhere and include it when necessary.
// Not completely tested, use with caution ...
#include <cstddef>
#include <string>
#include <vector>

template<typename STR>
void split(std::vector<STR> & out, const STR & in, const STR & delimiter) {
    const auto npos = STR::npos;
    const auto delsize = delimiter.size();
    std::size_t offset = 0;
    std::size_t endpos = 0;
    std::size_t len = 0;

    // An empty delimiter would match at every offset and never advance
    if (delsize == 0) {
        out.push_back(in);
        return;
    }

    do {
        endpos = in.find(delimiter, offset);
        STR tmp;

        if (endpos != npos) {
            len = endpos - offset;
            tmp = in.substr(offset, len);
            out.push_back(tmp);

            // Prepare next round
            offset = endpos + delsize;
        } else {
            // Final, or nothing found
            tmp = in.substr(offset);
            out.push_back(tmp);
            break;
        }
    } while (endpos != npos);
}
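To make the difference between the two kinds of splitting concrete, here is a small side-by-side sketch. Both `splitAnyOf` and `splitExact` are hypothetical names written only for this comparison; `splitExact` follows the same idea as the split function above, but returns its result instead of filling an output parameter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: split on ANY single character found in `chars`.
std::vector<std::string> splitAnyOf(const std::string & in, const std::string & chars) {
    std::vector<std::string> out;
    std::size_t offset = 0;
    std::size_t pos = 0;
    while ((pos = in.find_first_of(chars, offset)) != std::string::npos) {
        out.push_back(in.substr(offset, pos - offset));
        offset = pos + 1;
    }
    out.push_back(in.substr(offset));
    return out;
}

// Hypothetical illustration: split only on the complete sequence `delimiter`.
std::vector<std::string> splitExact(const std::string & in, const std::string & delimiter) {
    std::vector<std::string> out;
    if (delimiter.empty()) {
        // Nothing to split on, return the input unchanged
        out.push_back(in);
        return out;
    }
    std::size_t offset = 0;
    std::size_t pos = 0;
    while ((pos = in.find(delimiter, offset)) != std::string::npos) {
        out.push_back(in.substr(offset, pos - offset));
        offset = pos + delimiter.size();
    }
    out.push_back(in.substr(offset));
    return out;
}
```

With the input "one\r\ntwo\nthree" and the delimiter string "\r\n", the any-of version produces an empty element between \r and \n and also breaks on the lone \n, while the exact version only breaks at the full \r\n sequence and leaves "two\nthree" in one piece.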
Then, for each line that is not empty, call the parser’s parseLine method to create an IrcMessage. An IrcMessage is an object containing the decoded data in an easy-to-use interface.
The IRC format is easy for a human to understand. Parsing it elegantly and efficiently is less simple. I’m not saying I’ve done so, but here is my attempt.
IrcMessage IrcParser::parseLine(const std::string & message) {
    if (message.empty()) {
        // Garbage in, garbage out
        return IrcMessage();
    }
If there is no data, then just silently ignore it.
    // https://tools.ietf.org/html/rfc1459 -- Original specification
    // https://tools.ietf.org/html/rfc2810 -- Architecture specification
    // https://tools.ietf.org/html/rfc2811 -- Channel specification
    // https://tools.ietf.org/html/rfc2812 -- Client specification
    // https://tools.ietf.org/html/rfc2813 -- Server specification
    //
    // <message>  ::= [':' <prefix> <SPACE> ] <command> <params> <crlf>
    // <prefix>   ::= <servername> | <nick> [ '!' <user> ] [ '@' <host> ]
    // <command>  ::= <letter> { <letter> } | <number> <number> <number>
    // <SPACE>    ::= ' ' { ' ' }
    // <params>   ::= <SPACE> [ ':' <trailing> | <middle> <params> ]
    //
    // <middle>   ::= <Any *non-empty* sequence of octets not including SPACE
    //                or NUL or CR or LF, the first of which may not be ':'>
    // <trailing> ::= <Any, possibly *empty*, sequence of octets not including
    //                NUL or CR or LF>
    //
    // <crlf>     ::= CR LF

    // Parameters are between command and trail
    auto trailDivider = message.find(" :");
    bool haveTrailDivider = trailDivider != message.npos;

    // Assemble outputs
    std::vector<std::string> parts;
    std::string prefix;
    std::string command;
    ParamType parameters;
    std::string trail;
The format can best be described like this, where everything in square brackets is optional:
[:prefix ]COMMAND[ parameter1 [parameter2]][ :trail]
    // With or without trail
    if (haveTrailDivider) {
        // Have trail, split by trail
        std::string uptotrail = message.substr(0, trailDivider);
        trail = message.substr(trailDivider + 2);
        boost::split(parts, uptotrail, boost::is_any_of(" "));
    } else {
        // No trail, everything are parameters
        boost::split(parts, message, boost::is_any_of(" "));
    }
Up to this point, we find out whether there is a trail and split it off into the trail variable, while splitting the first section by space.
    enum class DecoderState {
        PREFIX,
        COMMAND,
        PARAMETER
    } state;

    bool first = true;
    state = DecoderState::PREFIX;

    for (const std::string & part : parts) {
        switch (state) {
            // Prefix, or command... have to be decided
            case DecoderState::PREFIX:
            case DecoderState::COMMAND: {
                // Prefix, aka origin of message
                bool havePrefix = part[0] == ':';

                if (havePrefix && first) {
                    // Oh the sanity
                    if (part.size() < 2) {
                        return IrcMessage();
                    }

                    // Have prefix
                    state = DecoderState::COMMAND;
                    prefix = part.substr(1);
                    first = false;
                } else {
                    // Have command
                    state = DecoderState::PARAMETER;
                    command = part;
                }
                break;
            }
            case DecoderState::PARAMETER: {
                parameters.push_back(part);
                break;
            }
        }
    }
Then go through the remaining parts and figure out what they are. Once prefix and command have been decided, everything that follows is a parameter to the command.
For example, the following lines are a join and then a message to #channel.
:nick!user@10.0.0.1 JOIN #channel<<<
:nick!user@10.0.0.1 PRIVMSG #channel :test<<<
    // Construct an IrcMessage
    IrcMessage ircmsg(command, prefix, std::move(parameters), trail);
    m_Handles(ircmsg);
    return ircmsg;
}
After this, maintaining a stable IRC state is much easier, because all messages follow the same format as the type IrcMessage. The parser is in this file https://github.com/Studiofreya/smartass/blob/master/irclib/IrcParser.cpp.
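For readers who want to try the decoding steps without Boost or the rest of the library, here is a rough standalone sketch of the same idea. `ParsedLine` and `parseSketch` are hypothetical stand-ins, not the real IrcMessage or IrcParser, and the sketch uses stream extraction to split on whitespace, so unlike the code above it silently skips repeated spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for IrcMessage; the real class lives in the repository.
struct ParsedLine {
    std::string prefix;
    std::string command;
    std::vector<std::string> parameters;
    std::string trail;
};

// Sketch only: decodes one line that has already been stripped of "\r\n".
ParsedLine parseSketch(const std::string & message) {
    ParsedLine result;

    // Split off the trail at the first " :" if there is one.
    std::string head = message;
    const auto divider = message.find(" :");
    if (divider != std::string::npos) {
        head = message.substr(0, divider);
        result.trail = message.substr(divider + 2);
    }

    // Walk the whitespace-separated words: optional prefix, command, parameters.
    std::istringstream words(head);
    std::string word;
    bool first = true;
    while (words >> word) {
        if (first && word[0] == ':') {
            if (word.size() < 2) {
                // A lone ':' is not a usable prefix, give up on this line
                return ParsedLine();
            }
            result.prefix = word.substr(1);
        } else if (result.command.empty()) {
            result.command = word;
        } else {
            result.parameters.push_back(word);
        }
        first = false;
    }
    return result;
}
```

Feeding it the PRIVMSG example from earlier (without the line terminator) should give the prefix nick!user@10.0.0.1, the command PRIVMSG, a single parameter #channel and the trail test.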