Data Products Strategy: Using Generative AI

5 min read · May 9, 2024

Continuing the series of articles related to the implementation of a Data Products Strategy, today I wanted to share my thoughts on the impact of Generative AI on the development of these products.

Within the development lifecycle of data products, Generative AI can be used to accelerate and optimize their development.

Three areas where it can be applied are:

When analyzing our data product, for example, in obtaining information for market exploration or pattern discovery.
While experimenting to validate business value, such as implementing rapid changes or generating tests.
During communication and distribution of the data product, for instance, in automating product documentation creation, generating communication programs, or analyzing user feedback.

More generally, we could say that there are two ways to apply Generative AI when developing data products: as a tool to develop the product and as a capability to integrate into the product.

In this article, we will focus on the first option.

Development tool

Within the product lifecycle, there are several phases where AI can be applied as a tool for product development (not just at the code level):

To automate and synthesize information from various sources and identify patterns when researching the functionality of the data product.
To build prototypes more easily and quickly.
Automating user feedback evaluation.
During product development, by leveraging code generation capabilities, writing software documentation, developing automated tests, etc.

Impact on the product development phase

The greatest impact of AI usage is currently being detected in the product construction phase. In this phase, it’s important to note that not all costs are related to development. For example, tasks such as analysis, design, or testing do not depend solely on development, and projects also incur the usual management and support costs.

Therefore, the impact achieved in the development phase is diluted when it is measured against the rest of the phases of the project.

As a reference, take the metrics from the study conducted by GitHub, The economic impact of the AI-powered developer lifecycle and lessons from GitHub Copilot, which indicates that, according to their analysis, developers using GitHub Copilot can program 55% faster.

As mentioned earlier, this impact will not be the same in the other phases of the project. Below, I have made a rough calculation where:

I estimate, at a high level, the costs of the other phases based on the volume of development.
To do so, I apply a percentage for each of the other phases relative to development, based on my experience; this may vary depending on the type of project.
On top of this calculation, I apply an estimated efficiency percentage for each phase, based on the capabilities provided by market tools but not backed by any study (I have searched, but have not found any), so it may be 100% debatable.
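
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration of the arithmetic, not the actual figures used in this article: the phase names, their weights relative to development, and their efficiency percentages are all hypothetical placeholders. The only number taken from the text is the 55% from the GitHub study, and reading it as a 55% reduction in development effort is itself an assumption.

```python
# Rough blended-efficiency estimate across project phases.
# Every number except DEV_EFFICIENCY is a hypothetical placeholder.

# 55% is the figure cited from the GitHub study; treating it as a 55%
# reduction in development effort is an assumption of this sketch.
DEV_EFFICIENCY = 0.55

# phase -> (effort relative to development, assumed AI efficiency gain)
OTHER_PHASES = {
    "analysis": (0.30, 0.10),
    "design": (0.20, 0.10),
    "testing": (0.40, 0.30),
    "management_and_support": (0.25, 0.00),
}


def blended_efficiency(dev_efficiency, other_phases):
    """Return the share of total effort saved, taking development effort as 1.0."""
    total_effort = 1.0 + sum(ratio for ratio, _ in other_phases.values())
    saved_effort = dev_efficiency + sum(
        ratio * gain for ratio, gain in other_phases.values()
    )
    return saved_effort / total_effort


overall = blended_efficiency(DEV_EFFICIENCY, OTHER_PHASES)
print(f"Development efficiency: {DEV_EFFICIENCY:.0%}")
print(f"Overall efficiency:     {overall:.0%}")
```

Because the other phases add effort but are assumed to gain less than development does, the overall figure always comes out below the development one. Replacing the placeholder weights with values from your own projects is the quickest way to adapt the exercise.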

IMPORTANT: This exercise can be used as a starting point and refined according to experience, project type, and tool usage. If you see that it could be approached differently, please leave it in the comments. In the current state of the art, any feedback is welcome 🙂.

With these assumptions, the data would be as follows:

With this approach, the following development cost efficiency can be inferred:

As you can see, development efficiency is significant, but the impact is smaller across the overall set of tasks involved in creating software, which together ultimately make up the cost of developing a product.

Nevertheless, this impact remains quite substantial, making it worth considering.

Evolution of LLMs for Software Development

It’s also important to consider how these models are evolving in terms of code generation. With each new release of an LLM, the accuracy of the code it generates improves.

Below, you can see a benchmark of the various models available in the market, conducted by Vellum, where the one related to code development (in this case only Python) is highlighted in red.

GPT-4 is currently at 67% accuracy in this benchmark, but newer models like Claude 3 Opus are reaching an accuracy of 84.9%.

Evolution of Code Assistants

Moreover, there are ongoing projects attempting to go further, such as Devin, billed as the first AI developer. It is a platform with more autonomy in code development, capable of evolving existing code when given the corresponding GitHub repository and the implementation requirements.

GitHub is also working on GitHub Copilot Workspace, which has similar features.

Conclusions

Even though this field is still in active development, there are already tools and studies measuring the impact of such approaches on development costs.

Therefore, using these tools in the development phase should already be a must for every developer, not only to save time but also to improve the quality of deliverables through generated documentation and tests.

What will it be like to develop software in 2 years? It seems likely that it won’t be quite like what we know now 🙂.

Let me know your thoughts in the comments!