TIMi's Binary Ranking System

TIMi covers all the steps of the modelization process.

1. Data Acquisition

You can analyze datasets stored inside:

  • SAS dataset files (.SAS7bdat files)
  • CSV files (Comma-Separated Values files)
    (The CSV-text file can be compressed or not, supported formats are .rar, .zip, .gz and .lzo)
  • Text files (fixed length lines) 
    (The text file can be compressed or not, supported formats are .rar, .zip, .gz and .lzo)
  • Microsoft Access database
  • Any database that provides OleDB or ODBC access

    A complete list is available here
    This list covers:

SAS, AS/400, VSAM, VSAM-VSE, VSAM-MVS, dBase, Acceler8-DB, Microsoft SQL Server, ALLBASE, Btrieve, C-ISAM/D-ISAM, CorVision, DB2, IBM DB2/400 on iSeries (AS/400), Enscribe, IDMS, IMAGE, IMS/DB, Informix, Informix OnLine Dynamic Server, Ingres/Ingres II, Jasmine, jBASE, MUMPS, NonStop SQL/MP, ObjectStore, Oracle, QueryObject, Rdb, Red Brick, RMS, Sybase, SQLite, Firebird/Interbase, MySQL, ADABAS, Approach, DataFlex, DBMS (CODASYL), DMS II (CODASYL), DMS 2200 (CODASYL), Domino, FoxPro, IMS, Lotus, Micro Focus, Microsoft Access, Microsoft Excel, Paradox, PowerFlex, PostgreSQL, Centura, Datacom, OS/390 sequential files, Pervasive SQL, Progress, SAP, Advantage Database Server, ADDS, D3, General Automation, Mentor, mvBase, mvEnterprise, Pick, Reality, Reality/X, Sequoia, Unidata, Universe, Ultimate, UltPlus, SQLBase, Essbase, Peoplesoft, Lawson, Active Directory Provider, Analysis Services Provider, Commerce Server Provider, Provider for Internet Publishing, Index Server Provider, SNA Server, Office documents, Teradata, OpenLink Virtuoso, Microsoft Exchange 5.5 and 2000, MAPI-compliant sources, CodeBase Server, Clipper, XML, HTML tables, LINC II, MCP Data Files, Successware Engine, Apollo Database Server, Outlook 2000.

TIMi is fully integrated with SAS. It natively reads SAS dataset files, SAS variable labels, and permanent SAS variable formats. The generated models can also be exported as native Base SAS code. TIMi can be seen as a true plug-in inside the SAS system.

The preferred format of TIMi is, however, the compressed CSV file. TIMi natively reads files compressed in all standard compression formats: .ZIP, .RAR, .GZ, and .LZO. Usually, the compression ratio of CSV dataset files is around 90%. This means, for example, that a compressed CSV file of 30 MB has an original size of 300 MB. When working on compressed dataset files, TIMi does NOT need to create a temporary "decompressed file": it reads datasets directly in their compressed form. This unique feature allows you to easily manipulate otherwise large, cumbersome dataset files.
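
As a rough illustration of reading a compressed CSV without first writing a decompressed copy to disk, here is a minimal Python sketch using only the standard library. It is not TIMi's implementation: the file name, the column names, and the function name are hypothetical, and only the .gz format is covered.

```python
import csv
import gzip
import os
import tempfile


def count_rows_gz(path):
    """Stream a gzip-compressed CSV and count its data rows without a temp file."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)            # first line holds the column names
        n_rows = sum(1 for _ in reader)  # rows are decompressed on the fly
    return header, n_rows


# Build a tiny hypothetical dataset so the sketch is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "customers.csv.gz")
with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["age", "target"])
    writer.writerows([[34, 0], [51, 1], [27, 0]])

header, n_rows = count_rows_gz(demo_path)
```

Because the rows are consumed one at a time from the decompression stream, memory use stays roughly constant regardless of the uncompressed file size.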

Furthermore, TIMi is built to minimize I/O access. It means that you can store all your datasets on a central remote computer and still have minimal stress on your organization’s network. This is especially true when using compressed dataset formats such as the .RAR or .GZ compressed CSV files.

 

2. Data Quality Control

TIMi contains a module to check the quality of your data prior to the modelization process or segmentation analysis. TIMi generates a Microsoft Word document detailing all the classical information about your dataset. TIMi also generates graphical illustrations of the distribution of the dataset’s variables.

 

3. Univariate Analysis

What can we predict using a predictive model with only one variable/column as input? TIMi constructs one univariate model per column and displays the analysis results graphically. It gives you a "reference" or "base line" of what could be achieved using a very simple model. TIMi’s univariate analysis engine also produces a graphical analysis of the distribution of all variables within your dataset.
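
To make the idea of a one-variable baseline concrete, here is a small Python sketch that scores each column on its own against a binary target. It rests on assumptions that are not in the text above: the raw column value is used directly as the score, and the quality measure is the area under the ROC curve (AUC). How TIMi builds and evaluates its univariate models is not described here, so treat this only as an illustration. The column names and data are made up.

```python
def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) formula; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def univariate_baseline(columns, labels):
    """Score each column alone; an AUC near 0 is as informative as one near 1."""
    result = {}
    for name, values in columns.items():
        a = auc(values, labels)
        result[name] = max(a, 1 - a)
    return result


# Hypothetical toy data: "age" separates the target perfectly, "noise" does not.
columns = {
    "age": [25, 32, 47, 51, 62, 38],
    "noise": [3, 1, 2, 3, 1, 2],
}
labels = [0, 0, 1, 1, 1, 0]
baseline = univariate_baseline(columns, labels)
```

A multivariate model is only worth its extra complexity if it clearly beats the best value in `baseline`.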

 

4. Multivariate Analysis & Model Assessment

Through extensive use of cross-validation techniques, linear algebra operations coded directly in assembly language, and state-of-the-art mathematical techniques such as LARS (Least Angle Regression), TIMi delivers predictive models of unrivaled accuracy. Not only are the delivered models more precise than those of any other modelization software, but they are also computed in less time. Furthermore, TIMi is the only software currently available that is able to "digest" the vast amounts of data contained in current enterprise-level data marts. As an added bonus, TIMi produces models that are directly understandable by a business user. To demonstrate what the model is doing, TIMi also creates a Microsoft Word document with a precise analysis of the target's profile.

To summarize, TIMi:

  •   delivers the best predictive models
  •   delivers the most comprehensible models (from a business point-of-view)
  •   is faster than the other modelization software (in some common situations, 100 times faster)
  •   is more scalable than the other modelization software
  •   produces several reports that contain a precise analysis of the target's profile
    • These reports include dozens of ready-to-use graphics and charts. There is no need to "export" graphs and tables outside TIMi because all the reports, tables and graphics are generated directly as nicely formatted Microsoft Word documents. Once the analysis is finished, TIMi generates these easily printable and distributable reports. There is no complex interface to use: all is there in a Microsoft Word document. All the data used to produce the graphs in the Microsoft Word report are also directly available in Excel, so that you can manipulate them easily.
  •   is user-friendly
  •   works perfectly in extreme situations when the target size is less than 0.1% of the learning dataset size.

 

Here are some numbers that demonstrate TIMi’s scalability advantage over other solutions:

Learning dataset: number of columns, number of rows, number of rows after automatic sampling.

Number of   Number of     Rows after    Computing time to       Peak memory
columns     rows          automatic     construct one model     consumption
                          sampling                              [MByte]
---------   ----------    ----------    --------------------    -----------
100         10,000        10,000        01 sec                  24
100         100,000       100,000       14 sec                  27
100         500,000       500,000       1 min 10 sec            39
100         1,000,000     1,000,000     2 min 20 sec            54
100         2,000,000     2,000,000     4 min 39 sec            84
100         4,000,000     4,000,000     9 min 19 sec            144
100         20,000,000    20,000,000    46 min 33 sec           624
200         10,000        10,000        03 sec                  24
200         100,000       100,000       31 sec                  30
200         500,000       500,000       2 min 35 sec            54
200         1,000,000     1,000,000     5 min 10 sec            84
200         2,000,000     2,000,000     10 min 20 sec           144
200         4,000,000     4,000,000     20 min 40 sec           264
200         20,000,000    16,271,183    1 hour 24 min 03 sec    1000
500         10,000        10,000        9 sec                   25
500         100,000       100,000       1 min 29 sec            39
500         500,000       500,000       7 min 25 sec            99
500         1,000,000     1,000,000     14 min 49 sec           174
500         2,000,000     2,000,000     29 min 38 sec           324
500         4,000,000     4,000,000     59 min 16 sec           624
500         20,000,000    6,508,473     1 hour 36 min 26 sec    1000
1,000       10,000        10,000        20 sec                  27
1,000       100,000       100,000       3 min 17 sec            54
1,000       500,000       500,000       16 min 26 sec           174
1,000       1,000,000     1,000,000     32 min 53 sec           324
1,000       2,000,000     2,000,000     1 hour 5 min 46 sec     624
1,000       4,000,000     3,254,237     1 hour 47 min 00 sec    1000
1,000       20,000,000    3,254,237     1 hour 47 min 00 sec    1000
10,000      10,000        10,000        4 min 39 sec            54
10,000      100,000       100,000       46 min 27 sec           324
10,000      500,000       500,000       3 hour 52 min 14 sec    1524
10,000      1,000,000     1,000,000     7 hour 44 min 28 sec    3024
10,000      2,000,000     2,000,000     15 hour 28 min 55 sec   6024
10,000      4,000,000     325,424       2 hour 31 min 09 sec    1000
10,000      20,000,000    325,424       2 hour 31 min 09 sec    1000
20,000      10,000        10,000        10 min 18 sec           84
20,000      100,000       100,000       1 hour 43 min 04 sec    624
20,000      200,000       200,000       3 hour 26 min 08 sec    1224
20,000      1,000,000     246,045       4 hour 13 min 36 sec    1500
20,000      2,000,000     246,045       4 hour 13 min 36 sec    1500
20,000      4,000,000     246,045       4 hour 13 min 36 sec    1500
20,000      20,000,000    246,045       4 hour 13 min 36 sec    1500


The computing times given in the table above are the times required to build a complete predictive model, including all the reports. The red cells (rows where the number of rows after sampling is lower than the original number of rows) indicate that TIMi automatically sampled the learning dataset to avoid using too much memory. The sampling is completely transparent to the user.

NOTE: The computing times displayed in the table above are indicative only. The equation used to fill in the table is:

computing_time_in_seconds = 7E-7 * nRows_after_sampling * (nCol^1.15)
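
As a worked check of this kind of formula, the Python sketch below evaluates it and formats the result like the table entries. The function names are illustrative, and the exponent is passed as a parameter rather than hard-coded: the unsampled rows of the table are reproduced with an exponent of 1.15 (for example, 100 columns and 1,000,000 rows give about 140 seconds, i.e. 2 min 20 sec).

```python
def estimated_seconds(n_rows_after_sampling, n_cols, exponent):
    """Indicative computing time: 7E-7 * rows * cols**exponent."""
    return 7e-7 * n_rows_after_sampling * (n_cols ** exponent)


def format_duration(seconds):
    """Render seconds as 'H hour M min S sec', omitting leading zero units."""
    total = round(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour")
    if hours or minutes:
        parts.append(f"{minutes} min")
    parts.append(f"{secs} sec")
    return " ".join(parts)


# 100 columns, 1,000,000 rows, exponent 1.15 (value inferred from the table rows).
example = format_duration(estimated_seconds(1_000_000, 100, 1.15))
```

The estimate is linear in the number of rows and slightly more than linear in the number of columns, which is why doubling the rows doubles the time in the table.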

The time needed to apply a model on a new unknown dataset is negligible: TIMi’s C++ engine usually scores 250,000 rows per second. There is, of course, no limit to the number of rows that you can score. You can also score your database using models exported as SAS data step scripts. TIMi’s scoring engine performs additional computations to estimate whether or not your models are outdated. The results of these computations are reported inside an XML file (for automatic industrialization of the process) and inside a graphical Microsoft Word report.

Currently no other software is able to handle such large amounts of data. TIMi uses an ultra-fast proprietary internal compression algorithm to be able to manipulate large datasets in central core memory.

If you want more speed, you can ask TIMi to sample the learning dataset. This is as easy as adjusting the slider inside the user-friendly graphical interface. Of course, sampling can be quite dangerous because it tends to produce mediocre models. This is especially true when the target size is less than 1% of the learning dataset size, which is a very common situation in banking, telecommunication, and pharmaceutical applications.

Computing time can also be reduced by using dual-core or quad-core processors. The whole software is multithreaded and fully exploits multi-core processors.

TIMi opens new doors for analyzing your data:

There is nearly no limitation to the number of columns that you can analyze. Furthermore, TIMi is not sensitive to skewed or noisy data distributions. What does this mean in practice? It means, for example, that you can add to your datasets many new columns/variables that are the ratio of two original variables. You don't have to worry about such details as: "Oh no! If I add these 10 new ratios, I will have more than 300 columns to analyze and my data mining software will crash! I have to make an arbitrary choice about which columns I will keep inside my learning dataset...". Data miners using TIMi have created datasets of 20,000 columns based on original datasets of only 1,500 columns. TIMi opens new ways of extracting the information contained inside your data.



5. Campaign optimization

A unique module named "Profit Explorer" allows you to optimize your campaigns. You can easily see, in real time, what the ROI of a given campaign will be as a function of different business parameters.
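
The kind of computation behind such an exploration can be sketched in a few lines of Python. This is not the Profit Explorer itself: the function names, the two business parameters (cost per contact and revenue per response), and the toy data are all assumptions made for illustration, and the sketch reports cumulative profit rather than a full ROI figure.

```python
def profit_by_depth(scores, responded, cost_per_contact, revenue_per_response):
    """Cumulative profit when contacting the top-k scored customers, for each k."""
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    profits = []
    running = 0.0
    for _, hit in ranked:
        running += revenue_per_response * hit - cost_per_contact
        profits.append(running)
    return profits


def best_depth(profits):
    """Number of customers to contact that maximizes profit (0 if none is profitable)."""
    best_k, best_profit = 0, 0.0
    for k, profit in enumerate(profits, start=1):
        if profit > best_profit:
            best_k, best_profit = k, profit
    return best_k, best_profit


# Hypothetical model scores and observed responses (1 = responded).
scores = [0.9, 0.8, 0.6, 0.4, 0.2, 0.1]
responded = [1, 1, 0, 1, 0, 0]
profits = profit_by_depth(scores, responded, cost_per_contact=2.0, revenue_per_response=5.0)
depth, profit = best_depth(profits)
```

Changing `cost_per_contact` or `revenue_per_response` and recomputing shows how the most profitable campaign size shifts with the business parameters.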

 

6. Model Exploitation & Industrialization

Putting complex models in production has always been difficult and time-consuming. With TIMi, this is no longer the case. Models produced with TIMi can be applied directly on your raw datasets. There is no need for complex pre-processing operations such as cleaning, recoding, etc. All you have to do is select:

  •   The model to apply
           (it's a simple XML file that contains everything needed to apply the model)
  •   The data set on which the model will be applied.
           (The supported datasets formats are the same as for the data acquisition)

TIMi generates an alert when you try to put in production a predictive model that is outdated. The alert is based on extensive statistical tests that assess if all the variables used by the predictive model on the new dataset still have the same statistical distribution as the variables inside the learning dataset.
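
The text does not say which statistical tests TIMi uses. As one possible illustration of a distribution check, the Python sketch below computes the two-sample Kolmogorov-Smirnov statistic for each variable and flags those whose distribution has shifted. The function names, the toy data, and the 0.3 threshold are assumptions chosen for the example, not TIMi's method or defaults.

```python
import bisect


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    for x in sorted(set(ref) | set(cur)):
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drifted_variables(learning, scoring, threshold=0.3):
    """Names of variables whose distribution gap exceeds the (arbitrary) threshold."""
    return [name for name in learning
            if ks_statistic(learning[name], scoring[name]) > threshold]


# Hypothetical data: "age" is stable, "income" has shifted upward.
learning = {"age": [25, 30, 35, 40, 45, 50], "income": [20, 25, 30, 35, 40, 45]}
scoring = {"age": [26, 31, 34, 41, 44, 51], "income": [60, 65, 70, 75, 80, 85]}
alerts = drifted_variables(learning, scoring)
```

A check of this kind needs only the input variables of the new dataset, so it can raise an alert before any outcome data is available.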

More traditional approaches rely on "backtesting analysis" to find out if the model is outdated. The "backtesting approach" has a major drawback: you see that the model is outdated only after using it, by which point you are already losing money.

TIMi models can also be exported in SAS-Base code, PMML, or optimized C code. However, exported models are not able to alert you in all cases when the model is outdated.

 

7. Population Segmentation Analysis

Let's assume that you have several models that each "select" a different part of the population. For example, you have:

  •   Model "A", which finds people interested in product A.
  •   Model "B", which finds people interested in product B.
  •   Model "C", which finds people interested in product C.
  •   Model "D", which finds people interested in product D.

TIMi allows you to easily compute where the largest differences (in terms of customer profile) are between the customers inside the segments A, B, C and D. For example, you can easily and graphically compare the distribution of the Age variable between the different segments.

With TIMi, it's very easy (with one mouse-click) to explain your segmentation.

 

8. Optimal Resource Allocation

Let's assume that you want to sell a product in three shops: S1, S2 and S3. You build a "propensity to buy" model in order to easily compare the optimal "sell rate" of each shop compared to the current "sell rate". For example, let's assume that shop S1 has the potential to sell your product to 10% of its customers. If the current "sell rate" of shop S1 is only 5%, it means that it could use some help. You should remove some salesmen from shops S2 and S3 and put them in shop S1. On the other hand, if the current "sell rate" of shop S1 is 20%, it means that it is over-exploiting its pool of customers.
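
The comparison described above amounts to a simple difference per shop. The Python sketch below uses the S1 figures from the example (10% potential, 5% current); the values for S2 and S3 and the function name are made up for illustration.

```python
def allocation_gaps(potential_rate, current_rate):
    """Potential minus current sell rate per shop; positive means untapped potential."""
    return {shop: potential_rate[shop] - current_rate[shop] for shop in potential_rate}


# S1 matches the example in the text; S2 and S3 are hypothetical.
potential = {"S1": 0.10, "S2": 0.08, "S3": 0.12}
current = {"S1": 0.05, "S2": 0.08, "S3": 0.20}

gaps = allocation_gaps(potential, current)
under_exploited = [shop for shop, gap in gaps.items() if gap > 0]  # could use more salesmen
over_exploited = [shop for shop, gap in gaps.items() if gap < 0]   # pool is over-exploited
```

In practice the potential rate per shop would come from averaging the "propensity to buy" scores of that shop's customers.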

The same reasoning could be applied to variables other than the "Shop" variable. For example, you can easily see (with one mouse-click) if you are over-exploiting customers in a given age range or geographical region.

With TIMi, you can easily see where the commercial potential is and optimize your resources to maximize your revenues.

 

9. Batch mode - Industrialization of Even the Most Complex Tasks

The standard way of using TIMi is through its user-friendly graphical interface, using simple mouse-clicks.

It is also possible to run all the TIMi modules from a shell or SAS command line. All the software modules of TIMi can be configured using human-readable XML files. The whole procedure of creating, updating, testing, and deploying new models can easily be completely automated.

For example, we can imagine that an alert is generated because a model is outdated. Following this alert, TIMi is launched and a new predictive model is created automatically (using some optimal modelization parameters that were saved inside an XML file). Next, the model is tested. If the test procedure is successful the model is directly applied on the new dataset. The whole procedure is scripted and does not require human intervention.

The production environment can be completely automated through the use of simple script files (SAS or batch files).

 

10. Integrating TIMi in Other Tools (OEM Integration)

All of TIMi’s input configuration and output report files are simple text-based XML files. By default, these XML reports are converted to nicely formatted Microsoft Word documents. All of TIMi’s functionality is available through the command line. Internally, TIMi uses a polymorphic mechanism to be able to quickly extend the data acquisition procedure to practically any kind of data source. As a result, it's very easy to integrate TIMi in any tool. Integration usually takes less than two weeks.


Contact 20Q Group to learn more about TIMi Data Mining and Predictive Analytics

Are you ready to learn more about how TIMi can help you explore your data?

 Contact us here or call +1 414 367 9207.
