What tools do data analysts use?


Over the past few decades, computers have evolved from filing cabinets for data into technological crystal balls that promise to predict the future by analyzing data. The tools that do this fall under the umbrella term predictive analytics and essentially fulfill two functions:

  • the analysis of databases and other data stores in order to derive recommendations for future action,

  • as well as the preparation of the analysis data, which rarely has the required consistency.

The latter function includes both straightforward tasks such as standardizing formatting and the often time-consuming elimination of errors. A real challenge here is to maintain data integrity. Well-engineered predictive analytics tools master both requirements with ease. We have put together 15 of the most popular predictive analytics tools for you.
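To illustrate the second function - data preparation - here is a minimal sketch in plain Python. The records, field names, and plausibility check are hypothetical; real tools automate exactly this kind of normalization and error elimination at scale:

```python
from datetime import datetime

# Hypothetical raw records: inconsistent date formats, stray whitespace,
# and an obviously invalid revenue value that must be dropped.
raw_records = [
    {"date": "2023-01-05", "revenue": " 1200 "},
    {"date": "06.01.2023", "revenue": "1250"},
    {"date": "2023-01-07", "revenue": "-99999"},  # error sentinel
]

def parse_date(value: str) -> datetime:
    """Normalize the two date formats seen in the raw data."""
    for fmt in ("%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def clean(records):
    """Standardize formatting and drop records that fail a plausibility check."""
    cleaned = []
    for rec in records:
        revenue = int(rec["revenue"].strip())
        if revenue < 0:  # eliminate implausible values
            continue
        cleaned.append({"date": parse_date(rec["date"]), "revenue": revenue})
    return cleaned

print(clean(raw_records))
```

Only after such a step does the data have the consistency that the downstream analysis assumes.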


Alteryx

In recent years, Alteryx has focused on equipping its reporting and workflow management platform with predictive algorithms. The tool has a broad library as well as numerous interfaces for data import and supports a large number of common and less common data sources.

The Alteryx tool can be adapted in many ways and is aimed more at managers with data know-how than at developers who want to delve deeper into predictive analytics and link it with reporting and business intelligence on a broader basis. In addition, Alteryx offers specific solutions for individual departments, for example marketing or research.

Amazon Web Services

The AWS toolset for analyzing data streams for signals or patterns continues to grow, and the offerings are traditionally separated by product line. Amazon Forecast, for example, focuses on extrapolating economic time series in order to predict the sales figures expected for the next quarter and the resources required to meet demand. Amazon CodeGuru, on the other hand, looks for weak spots in source code in order to improve it.
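To illustrate the kind of extrapolation a service like Amazon Forecast automates (with far more sophisticated models), here is a minimal sketch in plain Python: an ordinary least-squares trend line fitted to eight hypothetical past quarters and extended one quarter ahead.

```python
# Quarterly sales for eight past quarters (hypothetical numbers).
sales = [102.0, 108.0, 115.0, 118.0, 126.0, 131.0, 139.0, 144.0]

def fit_trend(ys):
    """Ordinary least-squares fit of y = a + b*t for t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) \
        / sum((t - t_mean) ** 2 for t in ts)
    a = y_mean - b * t_mean
    return a, b

a, b = fit_trend(sales)
# Extrapolate the trend to the next (ninth) quarter.
next_quarter = a + b * len(sales)
print(f"forecast for next quarter: {next_quarter:.1f}")
```

Production forecasting services account for seasonality, holidays, and uncertainty intervals on top of such a trend; the sketch only shows the basic principle.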

Some AWS tools, such as Fraud Detector or Personalize, primarily support Amazon's own business - but they are now also offered to other companies that want to build their own e-commerce realm.


Board

Companies that want to keep using dashboards to visualize summarized data trends should take a closer look at what Board has to offer. The tool makes it possible to tap into a large number of data silos (ERP, SQL databases, etc.), analyze the information stored there, and output the results in the form of a report that provides insight into the business past as well as the future (predictive).

The focus is on summarizing data from as many sources as possible and "compressing" it into a standardized form, which in turn can then flow directly into the visualization or predictive analytics.


Dash

Dash's toolset is available in a free open-source version and an enterprise version and enables cloud-based management of predictive analytics models that are either already in use or currently under development.

The open-source version comes with Python libraries for data analysis and visualization; the enterprise version adds further tools, for example for Kubernetes, authentication, or GPU integration in deployments for large user groups. The paid version also offers more low-code extensions for creating dashboards and other interfaces.


Databricks

Databricks' toolset is based on the four major open-source frameworks Apache Spark, Delta Lake, TensorFlow and MLflow and is suitable for companies with large amounts of data. To integrate predictive analytics into workflows as smoothly as possible, the package also contains collaborative notebooks and data processing pipelines. Databricks also offers integrated versions of its toolset for AWS and Azure.


DataRobot

Companies that value the option of hosting their predictive analytics models on local hardware, in the cloud, or in a hybrid setup can manage their data and models with DataRobot. The tools combine automated machine learning with a set of routines focused on specific industries.


IBM

IBM's predictive analytics toolset comes from two different branches. SPSS was founded in the 1960s and became the basis for many companies that wanted to use statistics to optimize their production lines. The tool has long since left the punched-card era behind: today, even non-programmers can drag and drop data in a graphical user interface to generate detailed reports. IBM acquired SPSS in the summer of 2009 for around $1.2 billion.

Under the umbrella of the Watson brand, IBM has assembled another analytics toolset that is constantly being expanded. The Watson tools for predictive analytics largely rely on iterative machine learning algorithms that are trained on data and build data models. The tools can process numbers, images, and unstructured text.

  1. Lars Schwabe (Associate Director at Lufthansa Industry Solutions)
    “The success rate of predictive analytics projects has increased because companies have finally done the necessary preparatory work, for example the creation of modern data architectures. In addition, staff have become more knowledgeable and the tools have become better.”
  2. Daniel Eiduzzis (Solution Architect Analytics at Datavard)
    “Technically, companies have to open up and shouldn't be slavishly committed to one manufacturer. Today it is much more a matter of identifying, depending on the respective use case, the ideal instrument with which the questions at hand can be answered in the best possible way. A best-of-breed approach can therefore make sense here.”
  3. Jan Henrik Fischer (Head of Business Intelligence & Big Data at Seven Principles)
    “With the methods of predictive analytics and the parallel advance of digitization, we will understand processes better. This will affect all areas of a company without exception. The greatest potential lies in optimizing customer processes: with a deeper understanding of their needs, we will be able to serve customers better and more efficiently and increase their loyalty.”
  4. Vladislav Malicevic (Vice President Development & Support at Jedox)
    “Many companies have been experimenting with predictive analytics for a long time. So far, there has often been a lack of specific use cases with clear added value - the so-called business case. But the next phase in the technology lifecycle has already begun, and companies are no longer just conducting purely innovation-driven experiments. They are increasingly tying predictive analytics and AI projects to added value that is clearly defined in advance for specific departments or business processes, including the expected results and the possible effects on existing processes.”

Information Builders

Information Builders' data platform enables data architects to set up a visual pipeline that collects and cleanses data and then "throws" it into the analytics engine. If information is processed that should not be visible to everyone, "full data governance models" are available. Specific templates are also provided for individual sectors such as manufacturing, intended to give users particularly quick insights into their data.


MathWorks

With its MATLAB solution, MathWorks originally set out to support scientists doing research with large amounts of data. MATLAB has since come to master far more than numerical data analysis: the MATLAB product line now focuses on optimizing statistical analyses, while the Simulink product group is used for simulation and modeling. The company also offers special toolboxes for many individual markets, for example autonomous mobility or image processing.


Python

Python is now one of the most popular programming languages overall - and one of the most popular languages for data analysis in science. Many research institutions use Python code to analyze their data, and data scientists commonly bundle data and analysis code together in Jupyter notebooks. Python tools such as PyCharm, Spyder or IDLE bring new, innovative approaches into play, which, however, often still require a bit of fine-tuning and are therefore primarily suitable for data scientists and software developers.
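As a taste of notebook-style analysis using nothing but the standard library, here is a hypothetical measurement series summarized with Python's built-in statistics module:

```python
import statistics

# Hypothetical measurement series, as it might appear in a notebook cell.
measurements = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8, 4.3]

# Basic descriptive statistics: central tendency and spread.
summary = {
    "mean": statistics.mean(measurements),
    "median": statistics.median(measurements),
    "stdev": statistics.stdev(measurements),
}
print(summary)
```

In practice, libraries such as NumPy and pandas take over from here, but the standard library already covers simple exploratory work.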


R

From a technical point of view, R is just an open-source programming language for data analysis, largely maintained by the academic community. Integrated environments such as RStudio, Radiant, or Visual Studio with its R tooling are of good quality, but are aimed more at hardcore data scientists and programmers. Anyone looking for the community's latest experimental ideas will surely find them here. Many of the tools listed in this article allow R code to be integrated in the form of modules.

RapidMiner

RapidMiner is designed so that predictive data models can be created automatically, without assistance, in the shortest possible time. The developers also offer Jupyter notebooks with "automated selection" and "guided data preparation". The available models are based on principles such as classic machine learning, Bayesian statistics, and various forms of clustering. Explanations of the individual models provide information on exactly how they derive their results.
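To illustrate one of the principles mentioned above - clustering - here is a deliberately tiny one-dimensional k-means sketch in plain Python. The data and parameters are hypothetical, and this is the textbook algorithm, not RapidMiner code:

```python
import random

def kmeans_1d(points, k, iterations=20, seed=0):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)

# Two well-separated groups of values (hypothetical data).
data = [1.0, 1.2, 0.8, 1.1, 9.8, 10.1, 10.3, 9.9]
print(kmeans_1d(data, k=2))
```

Tools like RapidMiner wrap such algorithms in automated model selection and add the per-model explanations described above.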


SAP

Many companies rely on SAP to manage their supply chains, so it is fitting that the Walldorf-based company's reporting tools now also support predictive analytics. For example, machine learning models trained on historical data can generate predictions. The software, which can run either on premises or in the cloud, also has AI capabilities. Consistent, cross-departmental user interfaces and extensive options on mobile devices round off SAP's predictive analytics package.

SAS Advanced Analytics

SAS's predictive analytics toolset bundles almost two dozen different packages on one platform that converts data into both insights and predictions. The focus of the SAS toolset is on the analysis of unstructured text.


Tableau

Tableau has made a name for itself with its almost artistic presentation of reporting information and was bought by Salesforce in 2019. Tableau dashboards can now, with the help of an embedded analytics model, interactively inform users about the results of data analysis. (fm)

This article is based on an article from our US sister publication cio.com.