Roll Your Own Analytics – PC Maffey

Google Analytics runs on over 56% of all websites. It’s the backbone of ad-tech across the web. Unfortunately, for site owners like me who just want to learn how people are using their website—while respecting their privacy—there simply aren’t any alternatives that meet all my requirements. So in two days, after a couple dead-ends, I built my own using React, AWS Lambda, and a spreadsheet. This is how.

My Requirements

No 3rd-party tracking – When you visit my site, no one else is listening in. That means, no calls to any 3rd-party domains from the client.* No tracking snippets or pixels. No 3rd-party cookies. (Also, don’t use Google Fonts!)

Anonymous – Don’t store any personal data, unless given with permission (e.g. email for newsletter). Don’t store IP addresses or superfluous session data that could be used for fingerprinting. This way, my site is GDPR compliant by default. And while not a hard requirement, I don’t use cookies to track visitors across multiple sessions.

Avoid ad-blockers – My goal with analytics is to learn how people use my site so I can improve it and serve them better. I’m not using ad-tech so there’s no point in getting blocked by 25% of visitors with an ad-blocker. That means doing 1st-party analytics, without using a 3rd-party tracking snippet—even self-hosted!*

No bloat – What’s the minimum number of network calls required to log a full session? Turns out, the answer is 1. Unlike every tracking snippet.*

Free – I spin up a lot of sites for various projects. Having a free tier when getting started is critical. So everything in this stack needs to be free up to a certain level—beyond that, I’m happy to upgrade and pay.

Easily replicable – I want to be able to copy and paste a few things and have analytics set up for a new site.

No server – I use (and love!) Netlify’s free static site hosting. It doesn’t provide access to server logs though, which rules out log-based analytics like GoAccess (which would satisfy most of the other requirements). Fortunately, Netlify does provide free proxied access to AWS Lambda functions…

The Stack

I wanted to set this up quickly, so I decided on the most easily accessible tools for my current go-to stack.* You could swap out any of these layers for your preferred tech. If you’re expecting higher usage, you’ll almost certainly want a more robust data store.*

Here’s how it works:

  • 1. Log a session on the client (React, state + context API)
  •    When the session ends…
  • 2. Call a Lambda function (AWS Lambda via Netlify)
  • 3. Store the data in Google Sheets (Google Sheets, Google Apps Script)

3.3 Define the Data Model

Working backwards, I start by defining what data I care about tracking. This is actually one of the biggest ancillary benefits of rolling your own analytics. I get to perfectly fit my analytics to my application. How much of GA do you really understand and use anyways?

My initial use case is a common one. I’m building featherbubble.com, an interactive children’s story. The site has a landing page and a waitlist conversion page. I’ll soon add a pilot episode people can read through to preview. So, a site with a few pages and a conversion.

Track sessions, not people, mostly events. All I want to know is referrer and some session metadata like language, timezone, and device. And then I want to log what happened using events* and summarize that in the session table.

I create a new Google Sheet (sheet.new) with two sheets: Sessions and Events. Put the column headers in the first row. A logged session looks like this:

Sessions

Here’s some more details about each field.*

Events

And details about the event fields.*

Again, you can customize these fields to your heart’s content. Instead of shoehorning all your data into the complicated event mappings of Category/Action/Label/Value fields (or Segment’s analytics.js `properties` field), you can simply create a new column with what you want to track.

3.2 Save the Data

I initially tried to use Google’s Sheets API. This is a dead-end. Even with an API key, while you can read data from a spreadsheet, “anonymous edits aren’t currently supported.” The API needs an auth token.

Fortunately, there’s an even easier way (without needing to walk through the complexity of OAuth configuration): Google Apps Script. From your spreadsheet menu, Tools > Script Editor, you can publish a script that will run operations on your spreadsheet.

I got the idea from here, and modified the script to:

  • a. Handle POST requests (instead of GET) to receive a JSON object of Session data, with an array of Events:

    {
      sessionId: '1234',
      device: 'iPad',
      ...,
      Events: [
        {
          sessionId: '1234',
          event: '/',
          ...,
        }
      ]
    }

  • b. Parse and write that data to the Sessions and Events sheets. The script will match whatever column headers you use for each sheet to the keys in your JSON blob.
  • c. Aggregate daily totals to a 3rd sheet called ‘Analytics’ (more on that below)
  • d. Send me an email if it errors

    Here’s the full gist. Copy that into the script editor, make your changes and save it. To publish, in the menu go to Publish > Deploy as web app...

    A few important gotchas about publishing:

    • You must execute the app as yourself (i.e. an authenticated Google account)
    • Give access to Anyone, even anonymous
    • Save the web app URL and treat it like an API key. Don’t publish it to Git.
    • When you make changes to the script, you will need to redeploy it and update the project version to New
    • You need to manually run the setup function. From the menu, Run function > setup. You only ever need to do this once.

    Now you’ve got an API endpoint you can use to store analytics data in your spreadsheet.
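
To make the header-matching step (b) more concrete, here is a minimal sketch of how it could work. It is not the gist's code: `rowFromHeaders` and `appendRecord` are hypothetical helper names, and the sketch assumes the script is bound to the spreadsheet, that the sheets are named Sessions and Events, and that row 1 of each sheet holds the column headers.

```javascript
// Hypothetical helper: build one row whose cells line up with the sheet's
// header row. Keys missing from the record become empty cells; keys with no
// matching column are ignored.
function rowFromHeaders(headers, record) {
  return headers.map(function (header) {
    return Object.prototype.hasOwnProperty.call(record, header) ? record[header] : ''
  })
}

// Hypothetical helper: append a record to the named sheet.
// Assumes row 1 of the sheet holds the column headers.
function appendRecord(spreadsheet, sheetName, record) {
  var sheet = spreadsheet.getSheetByName(sheetName)
  var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0]
  sheet.appendRow(rowFromHeaders(headers, record))
}

// Apps Script calls doPost(e) for POST requests to the published web app.
function doPost(e) {
  var session = JSON.parse(e.postData.contents)
  var events = session.Events || []
  delete session.Events

  // Assumption of this sketch: the script is bound to the spreadsheet.
  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet()
  appendRecord(spreadsheet, 'Sessions', session)
  events.forEach(function (event) {
    appendRecord(spreadsheet, 'Events', event)
  })

  return ContentService.createTextOutput(JSON.stringify({ result: 'success' }))
    .setMimeType(ContentService.MimeType.JSON)
}
```

Because rows are built from the header row, adding a new column to a sheet is enough to start storing a new field, as long as the JSON key uses the same name.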

    3.1 Bonus: Aggregate Daily Totals

    Getting the data in is only half the battle. I want to make my analytics easily consumable. For me, that means a daily summary, in my inbox. So, because my spreadsheet-fu is not very good, instead of figuring out pivot tables, I decided to write a script that aggregates the session data each day, adds it to a 3rd “Analytics” sheet, and emails me a report.

    Analytics

    Here’s what I’m aggregating.*

    Instead of running a daily cron job, whenever a new session is saved, I check to see if the previous day’s sessions have been aggregated. If not, I total them up and email myself a report.

    The dailyTotals function is included in the gist above.*
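
For readers who want the gist of the check-on-write idea without opening the script, here is a simplified illustration. It is not the dailyTotals function itself: the helper names and the session fields `timestamp`, `pageviews`, and `converted` are assumptions made for this sketch, and days are bucketed in UTC.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000

// Format a millisecond timestamp as a UTC date key such as '2019-03-01'.
function dayKey(ms) {
  return new Date(ms).toISOString().slice(0, 10)
}

// True when yesterday (relative to `now`) has not been aggregated yet.
// `lastAggregatedDay` is the date key of the newest row in the Analytics
// sheet, or undefined if the sheet is still empty.
function needsAggregation(lastAggregatedDay, now) {
  var yesterday = dayKey(now - DAY_MS)
  return !lastAggregatedDay || lastAggregatedDay < yesterday
}

// Total up the sessions that started on the given day.
function aggregateDay(sessions, day) {
  var totals = { date: day, sessions: 0, pageviews: 0, conversions: 0 }
  sessions.forEach(function (session) {
    if (dayKey(session.timestamp) !== day) return
    totals.sessions += 1
    totals.pageviews += session.pageviews || 0
    if (session.converted) totals.conversions += 1
  })
  return totals
}
```

Each time a session is saved, the script would call `needsAggregation` and, only when it returns true, append the result of `aggregateDay` to the Analytics sheet and send the email.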

    With the daily summaries in a spreadsheet, I can easily turn that into a chart and view by week, month, etc.

    2. The Lambda Function

    Serverless lambda functions are perfect for an analytics API. Logging doesn’t require a response. This makes the cold start issue with lambda functions negligible. Of course, there’s no reason you couldn’t use a regular server endpoint, especially if you’ve already got an API. But for a static site, the free and easy setup with Netlify along with the potential to scale effortlessly makes it an obvious choice.

    Using Netlify’s free tier,* access to AWS Lambda functions is easy to set up. I won’t go into the details, since there are plenty of resources out there already. Here are the important parts:

    • 1. Set up your functions – Check out Netlify’s create-react-app-lambda package or this guide for how to get started with Lambda functions in a React app. I’m using the netlify-lambda package to build my functions with webpack.
    • 2. Get a proxied endpoint – Once set up, you’ll be able to post data to example.com/.netlify/functions/my-function, and voilà, your analytics API calls are to your own domain.
    • 3. Parse and format data – When the lambda function receives the raw session data from the client, I have it do most of the work to format the data and do things like: generate a sessionId and add it to each event, count the number of PAGE events, check for a conversion, etc.
    • 4. Post it – Finally, I post the formatted Session data as JSON to the Google Sheets / Apps Script URL, which I’m storing as an environment variable in Netlify.

    Here’s my lambda function code:

    
    import 'regenerator-runtime/runtime'

    import fetch from 'node-fetch'

    const { GOOGLE_SHEETS_URL } = process.env

    import genId from './helpers/genId'
    import getDevice from './helpers/getDevice'
    import isBot from './helpers/isBot'

    export const handler = async event => {
      const data = JSON.parse(event.body)
    
      
      if (!data 
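
Steps 3 and 4 from the list above could be sketched roughly like this. This is an illustration under stated assumptions, not the author's function: the raw payload shape (an `events` array whose items carry a `label`), the `/waitlist/thanks` conversion path, the `pageviews` and `converted` field names, and the bodies of `genId` and `formatSession` are all hypothetical. It also assumes a Node runtime with a global `fetch` (Node 18+) rather than node-fetch.

```javascript
// Hypothetical ID generator; the original imports its own genId helper.
function genId() {
  return Date.now().toString(36) + Math.random().toString(36).slice(2, 10)
}

// Assumed example path that counts as a conversion (not from the original).
var CONVERSION_PATH = '/waitlist/thanks'

// Step 3: turn the raw client payload into the Session + Events shape.
function formatSession(raw, sessionId) {
  var events = (raw.events || []).map(function (e) {
    return Object.assign({}, e, { sessionId: sessionId })
  })
  var pages = events.filter(function (e) { return e.label === 'PAGE' })
  var session = Object.assign({}, raw, {
    sessionId: sessionId,
    pageviews: pages.length,
    converted: pages.some(function (e) { return e.event === CONVERSION_PATH }),
    Events: events,
  })
  delete session.events
  return session
}

// Step 4: post the formatted session to the Apps Script URL.
// Export this as `handler` in whatever module style your build uses.
async function handler(event) {
  var raw
  try {
    raw = JSON.parse(event.body)
  } catch (err) {
    return { statusCode: 400, body: 'Invalid JSON' }
  }
  if (!raw || !Array.isArray(raw.events)) {
    return { statusCode: 400, body: 'Missing events' }
  }

  var session = formatSession(raw, genId())
  await fetch(process.env.GOOGLE_SHEETS_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(session),
  })
  return { statusCode: 200, body: 'ok' }
}
```

Keeping the formatting in a plain function like `formatSession` makes it easy to test without deploying, and keeps the client payload as small as possible.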

    1.3 One Call (to rule them all)

    Now that I’ve got a serverless endpoint, I just need to log a visitor’s session and events from the client. There are a couple major differences with how I’m doing things vs. a standard analytics implementation.*

    I make a single call at the end of a session, posting the entire session log as JSON to my lambda function. How do I know when a session ends? Unfortunately, there’s no magic bullet for universal coverage across all browsers and all cases. Instead, I’m listening for several exit events and gracefully degrading based on what’s happening in the client’s browser. While I haven’t done a side-by-side test with a standard implementation, I’d estimate I’m covering around 95% of visitor sessions.*

    I’m sending the data with navigator.sendBeacon if it’s available,* which posts the session data in the background without waiting for a response. As a fallback, I have several levels of degradation that get called based on browser.

    Here’s the simplified version:

    
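    // Listen for several exit events; whichever fires first ends the session
    window.addEventListener('pagehide', endSession)
    window.addEventListener('beforeunload', endSession)
    window.addEventListener('unload', endSession)
    // Extra exit signal on iOS
    if (iOS) window.addEventListener('blur', endSession)

    let skip

    // Declared with `function` so it is hoisted above the listeners
    function endSession() {
      // Only send once, even if several exit events fire
      if (skip) return
      skip = true

      const data = SESSION_DATA
      const url = FUNCTION_URL

      const { vendor } = window.navigator

      // Prefer sendBeacon when available (skipped for Apple browsers)
      if (window.navigator.sendBeacon && !~vendor.indexOf('Apple')) {
        // sendBeacon returns true if the data was queued for delivery
        const beacon = window.navigator.sendBeacon(url, data)
        if (beacon) return
      }

      // Fall back to XMLHttpRequest; the request is synchronous on iOS
      const async = !iOS
      const request = new XMLHttpRequest()
      request.open('POST', url, async)
      request.setRequestHeader('Content-Type', 'application/json')
      request.send(data)
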
      if (!async 

    The benefit to this strategy is a lightweight solution, both for the client and the API—which is subject to usage rates. You could certainly implement a more standard approach to log each interaction as it happens, but it will likely cost much more at any kind of scale.

    1.2 Log Session Metadata

    With React, it’s easy to make a few reusable components for all my analytics. I use react-router, and render my main Analytics component with every route: <Route component={Analytics} />.

    This component has 3 main functions:

    • 1. Start session – Rather than calling this onMount, I’m registering a “load” event listener that starts the session once the document has fully loaded. This way, if a visitor bounces before the page finishes loading, I ignore it.
    •    Then, I log the initial PAGE event (more about event logging in the next section), save some session and page performance metadata in the component’s state, and then register my endSession event listeners.
    • 2. End session – As described in the previous section, when the session ends, I grab the session data from the component store and post it to my lambda function’s URL.
    • 3. Listen to route changes – When the path changes,* I log a page view as an event.

    if (location.pathname !== prevProps.location.pathname) {
      event(location.pathname, { label: 'PAGE' })
    }
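
The metadata saved when the session starts might be gathered with something like the following. The function name `collectSessionMetadata` and the exact field names are assumptions; the post only says it records the referrer plus session metadata like language, timezone, and device (the device is left out here on the assumption that it is worked out in the lambda function). The `win` parameter defaults to the browser's `window` and exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: read anonymous session metadata from a window-like object.
function collectSessionMetadata(win) {
  win = win || window
  var nav = win.navigator || {}
  var timezone = ''
  try {
    // IANA timezone name, e.g. 'Europe/Berlin'
    timezone = Intl.DateTimeFormat().resolvedOptions().timeZone || ''
  } catch (err) {
    // Older browsers may not support Intl; leave the timezone blank.
  }
  return {
    referrer: (win.document && win.document.referrer) || '',
    language: nav.language || '',
    timezone: timezone,
    startedAt: Date.now(), // milliseconds since the epoch
  }
}
```

None of these fields identifies a person on its own, which fits the "track sessions, not people" goal from the requirements.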

    1.1 Log Events

    The event logging component is a tad more complicated, as the event() function needs to be accessible anywhere in the code I want to log an event from.

    To accomplish this, I’m using React Context as a “global” store for both my array of events and the push function.

    I create an Events component for the context provider,* that stores the events in local state, and passes into Context.Provider a function that adds a new event to the array.

    
    event(name, properties = {}) {
      const event = {
        event: name,
        timestamp: new Date().getTime(),
        ...properties,
      }

      events.push(event)
      this.setState({ events })

      if (dev) {
        const { label, ...rest } = properties
        console.log(label + ': ' + name, JSON.stringify(rest))
      }
    }

    I can then create a Context.Consumer anywhere in my code to access the context. I try to avoid render functions as a pattern, so I turn my context consumer into a helpful higher-order component:

    
    const Event = Component => {
      const setContext = props => (
        <SetContext.Consumer>
          {context => <Component {...props} event={context} />}
        </SetContext.Consumer>
      )
      return setContext
    }

    // Usage in another component file
    import Event from '..'
    const MyComponent = ({ event }) => {}
    export default Event(MyComponent)
    

    You could probably do this even easier now with Hooks.

    The final step is to grab from context the array of events in the endSession() function and post it as part of the session data.

    A Final Note

    On performance – I’m just starting to use this analytics setup with a couple of brand new sites (including this one, first post!). As I learn how this performs with traffic, I will update this section with details.

    Asking for help – If you need help setting this up for yourself, feel free to send me an email, and I will do my best to answer your questions.

    Thanks for reading!