Meet the People and Tech of Engineering at Attentive

We’ll be diving into what makes our culture unique, and how we tackle challenges with smart solutions and a teamwork-first approach.

It’s been an exciting few years at Attentive. We’ve been riding a wave of hypergrowth, from building the next generation of Attentive’s AI marketing platform to growing our team. It’s a rollercoaster that demands agility, smart decision-making, and a whole lot of innovation.

Our customers expect state-of-the-art features, in keeping with an industry that’s surging into uncharted AI waters. They also need our services to be stable and reliable, and to scale the way enterprise customers require.

Creating an organization capable of developing top-tier software starts with crafting a high-performance culture. “Living our values” may sound like a cliché, but it presents a real opportunity to clarify what we should (and shouldn’t) be doing. By viewing these principles through the prism of engineering, we can identify concrete improvements to our culture.

Be One Unstoppable Team 

We all own pieces of a giant messaging service. It’s important to work together as engineers and cross-functionally to create thoughtful, well-structured interfaces between our services. We need to break through team boundaries and ensure we’re building the right software for the future.

Default To Action 

Indecision and diffusion of responsibility can stop us in our tracks. As Voltaire put it, “the best is the enemy of the good,” an idea the Pareto principle echoes: pursuing perfection can sometimes be our downfall. We need to build smartly for today, plan the right roadmap for tomorrow, and never stop moving forward.

Champion The Customer

Behind every tool we build are thousands of marketers, each with their own viewpoints, needs, and wishes. Listening to them isn’t just beneficial; it’s crucial. Seeing the world through their eyes guides us toward the right features, the most user-friendly UX, and the bugs that most need fixing.

Act Like An Owner

We’re all invested in the company, and we are truly fractional owners. Often, the most game-changing ideas bubble up from the grassroots level. Embracing an ownership mentality empowers us to propose new solutions and challenge existing norms without hesitation.

Our tech stack

OK, but what about the technology?

We use Java for our backend services and React for our frontend, Kubernetes (K8s) to orchestrate our containerized applications, and GraphQL to optimize data queries. We rely on Gradle to automate our builds, gRPC for fast inter-service communication, and CircleCI for continuous integration to streamline our deployment process. To keep our infrastructure robust and scalable, we use Terraform to manage infrastructure as code and Istio as our service mesh, while Playwright powers our end-to-end testing to protect the user experience. Monitoring and logging are critical to our operations, which is why we trust Datadog to keep an eye on things.

Our architecture consists of hundreds of services, all housed within a monorepo, which simplifies dependency management and makes our codebase easier to navigate. When it comes to data storage, we don’t believe in a one-size-fits-all solution; instead, we choose the best tool for each use case. For example, we might reach for Postgres (traditional SQL), DynamoDB (NoSQL), Redis (ephemeral key/value), or PlanetScale (sharded MySQL at massive scale).

We operate at significant infrastructure scale: 18,000 containers running over 200 Java services to serve our large customer base. Our streaming services are also a cornerstone of our operation, processing over 80 billion events per month. That throughput demands a thoughtful architecture that handles data at scale while maintaining performance and reliability.
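Monthly totals are easier to reason about as per-second rates. The sketch below does that back-of-the-envelope conversion, assuming a 30-day month and treating the article’s rounded figures as exact; the class and method names are illustrative only. Eighty billion events a month works out to roughly 31,000 events per second on average, and 18,000 containers across 200 services is about 90 containers per service (an upper bound on the average, since we run over 200 services).

```java
/**
 * Back-of-the-envelope conversions for the scale figures above.
 * Assumptions: a 30-day month, and the rounded inputs treated as exact.
 */
final class ScaleMath {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average events per second for a monthly event count (30-day month assumed). */
    static double averageEventsPerSecond(double eventsPerMonth) {
        return eventsPerMonth / (30 * SECONDS_PER_DAY);
    }

    /** Average containers per service; an upper bound when the service count is "over" N. */
    static double containersPerService(int containers, int services) {
        return (double) containers / services;
    }

    public static void main(String[] args) {
        System.out.printf("~%.0f events/s on average%n", averageEventsPerSecond(80e9));
        System.out.printf("~%.0f containers per service%n", containersPerService(18_000, 200));
    }
}
```

Averages like these hide daily and seasonal peaks, so they are a floor for capacity planning rather than a target.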

We also use several technologies to power our AI/ML capabilities. For orchestration and CI/CD, we use Argo to manage workflows efficiently and deploy our machine learning models smoothly. Our compute infrastructure runs on Kubernetes with GPU support, providing the resources that intensive computations need. We use popular machine learning frameworks such as PyTorch, XGBoost, and scikit-learn for model development and training. For data processing and analytics, we use Apache Spark, which allows for fast, scalable data processing. Additionally, we integrate Apache Iceberg tables with Snowflake to manage our data storage and ensure easy access to our data warehouse.

Messaging at scale

Our SMS and email platforms are all about catching customers at the right moment, using their preferred communication channels, and ensuring the message lands just in time. We manage this for over 8,000 brands.

Scale-wise, our traffic generally mirrors consumer shopping patterns. Traffic peaks during the day around noon ET and spikes significantly during holiday periods, the largest of which is Black Friday / Cyber Monday. Last year, we sent an astonishing 2.2 billion SMS messages that week (up 31% from 2022!), driving $1.8 billion in revenue for our brands.
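The holiday figures can also be turned into rates. This sketch, with illustrative names and the rounded inputs treated as exact, derives the prior-year volume implied by 31% growth (2.2 billion / 1.31, about 1.68 billion messages) and the average send rate across the whole week (about 3,600 messages per second).

```java
/**
 * Arithmetic on the Black Friday / Cyber Monday figures above.
 * Inputs are the article's rounded numbers; output is illustrative only.
 */
final class HolidayMath {
    static final long SECONDS_PER_WEEK = 7L * 86_400L;

    /** Prior-year volume implied by this year's volume and a year-over-year growth rate. */
    static double impliedPriorYear(double thisYear, double growthRate) {
        return thisYear / (1.0 + growthRate);
    }

    /** Average send rate over a full week, in messages per second. */
    static double averagePerSecond(double messagesPerWeek) {
        return messagesPerWeek / SECONDS_PER_WEEK;
    }

    public static void main(String[] args) {
        double smsLastYear = 2.2e9;
        System.out.printf("Implied prior-year volume: ~%.2f billion%n",
                impliedPriorYear(smsLastYear, 0.31) / 1e9);
        System.out.printf("Average rate over the week: ~%.0f messages/s%n",
                averagePerSecond(smsLastYear));
    }
}
```

A week-long average is well below the peak rate the pipeline has to sustain, because traffic concentrates around midday; capacity has to be planned against peaks, not averages.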

When we hit those peak traffic times, our messaging pipeline has to exhibit two critical qualities: reliability and throughput. Let’s talk about why these matter. 

Reliability is critical because our customers depend on us to drive a significant portion of their holiday sales. If we have issues (or worse, go down), we risk customer unhappiness and churn. To mitigate this, we use a few strategies. First, all of our services are carefully monitored: metrics flow to Datadog, which raises alerts based on metric thresholds and errors. Those alerts automatically create incidents, which our engineers use to page the appropriate team(s) for remediation. Second, we run integration and E2E tests on our systems before and after deployment to catch code defects as quickly as possible.
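To make “alerts based on metric thresholds” concrete, here is a minimal sketch of threshold evaluation over a window of metric samples. It is hypothetical: it is not Datadog’s API or evaluation logic, and the warning/alert thresholds and the averaging rule are assumptions chosen for illustration.

```java
import java.util.List;

/**
 * Hypothetical threshold monitor: averages a window of samples (for example,
 * per-minute error rates) and compares the result to warning and alert thresholds.
 * Not Datadog's implementation; illustration only.
 */
final class ThresholdMonitor {
    enum Status { OK, WARN, ALERT }

    private final double warnThreshold;
    private final double alertThreshold;

    ThresholdMonitor(double warnThreshold, double alertThreshold) {
        if (warnThreshold > alertThreshold) {
            throw new IllegalArgumentException("warn threshold must not exceed alert threshold");
        }
        this.warnThreshold = warnThreshold;
        this.alertThreshold = alertThreshold;
    }

    /** Evaluates the window average; an empty window is treated as OK (a real monitor might flag "no data"). */
    Status evaluate(List<Double> window) {
        double avg = window.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        if (avg >= alertThreshold) return Status.ALERT;
        if (avg >= warnThreshold) return Status.WARN;
        return Status.OK;
    }
}
```

For example, with a 1% warning threshold and a 5% alert threshold, error-rate samples of 2%, 8%, and 6% average to about 5.3% and evaluate to `ALERT`; in a setup like ours, that is the kind of signal that opens an incident and pages a team.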

Throughput is also key, because many of these offers are time-sensitive and need to be delivered as close to the target time as possible. To do this, we need to sustain a minimum of roughly 25,000 messages per second. Achieving this is surprisingly complex, because a message’s lifecycle spans multiple steps and services, and each service must maintain the target throughput without becoming a bottleneck.
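The bottleneck point follows from a simple rule: in a chain of services, sustained end-to-end throughput can be no higher than the slowest stage’s capacity. The sketch below applies that rule; the stage names and per-stage capacities are hypothetical numbers, not measurements of our systems, and the model ignores queues that can absorb short bursts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: a pipeline's sustained throughput is capped by its slowest stage.
 * Stage names and capacities used here are hypothetical.
 */
final class PipelineCapacity {
    /** Name of the stage with the lowest sustainable throughput; throws if there are no stages. */
    static String bottleneck(Map<String, Double> stageCapacity) {
        return stageCapacity.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow();
    }

    /** End-to-end sustained throughput: the minimum capacity across stages. */
    static double endToEnd(Map<String, Double> stageCapacity) {
        return stageCapacity.values().stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, Double> stages = new LinkedHashMap<>(); // msgs/s each stage can sustain
        stages.put("audience", 60_000.0);
        stages.put("personalization", 22_000.0);
        stages.put("send", 40_000.0);
        double target = 25_000.0;
        double e2e = endToEnd(stages);
        System.out.printf("end-to-end %.0f msgs/s; meets target: %b; bottleneck: %s%n",
                e2e, e2e >= target, bottleneck(stages));
    }
}
```

With these made-up numbers, two stages comfortably exceed the target, yet the pipeline as a whole tops out at 22,000 messages per second because of the personalization stage.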

As an example, here’s a high-level overview of the layers involved in a campaign message:

  • Campaign systems
    Serves the UX, which allows our brands to write the content to send, including any links and global macros. Evaluates whether the campaign should use dynamic features like send time optimization or A/B testing.
  • Audience resolution / segmentation
    Figures out which customers should receive the message, based on segmentation.
  • Personalized content
    Personalizes the message for each customer. Renders macros, individualized product recommendations, and any other dynamic content.
  • Send services
    Stores and queues these messages for send, handling flow control, batching, and sequencing.
  • Aggregators
    Sends the messages to our SMS and email aggregators, ensuring we can handle multiple webhooks and status updates per message.
  • Carriers
    Monitors the flow of messages to carriers and individual handsets, checks delivery status, and ensures we comply with specific carrier guidance.
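As a rough illustration of how a message moves through the first few layers, here is a toy, single-process sketch of audience resolution, personalization, and batching for send. Everything in it is hypothetical: the `{{macro}}` syntax, the record types, the segment predicate, and the batch size are assumptions, and in production these layers are separate services rather than method calls. It requires Java 16 or later (records, `Stream.toList`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy sketch of three campaign layers; all names and formats are hypothetical. */
final class CampaignPipeline {
    record Subscriber(String phone, Map<String, String> attributes) {}
    record Message(String phone, String body) {}

    // Hypothetical macro syntax: {{attribute_name}}
    private static final Pattern MACRO = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Audience resolution: keep only subscribers matching the segment predicate. */
    static List<Subscriber> resolveAudience(List<Subscriber> all, Predicate<Subscriber> segment) {
        return all.stream().filter(segment).toList();
    }

    /** Personalization: replace each {{name}} macro with the subscriber's attribute, or "" if absent. */
    static Message personalize(String template, Subscriber s) {
        Matcher m = MACRO.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = s.attributes().getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return new Message(s.phone(), out.toString());
    }

    /** Send layer: group messages into fixed-size batches for downstream delivery. */
    static List<List<Message>> batch(List<Message> messages, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<Message>> batches = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += batchSize) {
            batches.add(List.copyOf(messages.subList(i, Math.min(i + batchSize, messages.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Subscriber> all = List.of(
                new Subscriber("+15550001", Map.of("first_name", "Ada", "vip", "true")),
                new Subscriber("+15550002", Map.of("first_name", "Bo")));
        List<Message> messages = resolveAudience(all, s -> "true".equals(s.attributes().get("vip")))
                .stream()
                .map(s -> personalize("Hi {{first_name}}, early access starts now!", s))
                .toList();
        batch(messages, 500).forEach(b -> System.out.println(b.size() + " message(s) in batch"));
    }
}
```

Keeping each layer a small, pure transformation is what lets them be split into independently scaled services; the real pipeline adds the flow control, sequencing, and delivery tracking described above.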

Each of these services needs to respond to requests very quickly (milliseconds!) for the platform to hit our throughput targets. 
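Why milliseconds matter can be made precise with Little’s law, L = λ × W: the average number of requests in flight equals the arrival rate times the average time each request spends in the system. The sketch below applies it to a single layer, assuming (for illustration) that each message triggers one request to that layer at the 25,000 messages-per-second minimum cited above; the latencies are example values.

```java
/**
 * Little's law, L = lambda * W, applied to one service layer.
 * Assumption: one request per message to the layer; latencies are example values.
 */
final class LittlesLaw {
    /** Average in-flight requests for a throughput (requests/s) and mean latency (ms). */
    static double inFlight(double requestsPerSecond, double latencyMillis) {
        return requestsPerSecond * (latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        double target = 25_000; // messages per second
        for (double latencyMs : new double[] {5, 50}) {
            System.out.printf("%.0f ms per call -> ~%.0f requests in flight%n",
                    latencyMs, inFlight(target, latencyMs));
        }
    }
}
```

At 5 ms per call, a layer needs about 125 requests in flight to keep up; at 50 ms it needs about 1,250. A tenfold latency regression therefore demands roughly ten times the concurrency (threads, connections, or pods) just to hold the same throughput.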

Engineering team culture

There’s still considerable industry debate as to whether remote or in-person work is more effective. At Attentive, we’re trying a mixed approach. 

Of our ~200 engineers, around 65% are fully remote. Remote work gives our employees flexibility and gives us access to a larger candidate pool. But it can also create communication and team cohesion challenges, so it’s important to give our managers the right tools and guidance to lead distributed teams.

We also believe that in-person (hybrid) work can provide significant value for certain teams. Typically, these are roles that demand a high degree of cross-functional collaboration, such as user-facing product teams. We’re focusing on hybrid work in our New York and San Francisco offices. Engineers and Product Managers on these teams are expected to be in the office three days a week, so we’ve invested significantly in office space, amenities, daily lunch, opportunities for connection, and other perks to make their experience comfortable and conducive to highly collaborative work.

We also recognize the value in occasionally getting everyone together. We recently hosted a company-wide kickoff in Vegas to align everyone on our 2024 goals, get some face time with teammates that you may have only ever seen on Zoom, and engage in some wild karaoke.

Beyond offsites, we prioritize connection, coaching, and mentorship in a variety of ways—from ERGs that allow folks to build connections around shared experiences to weekly Eng All-Hands where we celebrate wins, share demos, and host regular Q&As with leadership.

Looking ahead

Over the coming months, we’ll be sharing more glimpses into the engineering work we’re doing at Attentive—showcasing some of the interesting problems we’ve tackled, as well as insights and lessons we’ve gathered along the way.

And if you’re as passionate about engineering as we are, we’re looking for new team members across all three of our engineering hubs: San Francisco, New York City, and (for those of you who love the comfort of your own home) remote. If you’ve ever wanted to be part of a dynamic team exploring the cutting edge of consumer messaging, now’s your chance. Come join us!
